Tag Archives: Knowledge Graph

The Diffbot Knowledge Graph and Extraction Tools

DBpedia Member Features – In the last few weeks, we gave DBpedia members the chance to present special products, tools and applications and share them with the community. We already published several posts in which DBpedia members provided unique insights. This week we will continue with Diffbot. They will present the Diffbot Knowledge Graph and various extraction tools. Have fun while reading!

by Diffbot

Diffbot’s mission to “structure the world’s knowledge” began with Automatic Extraction APIs meant to pull structured data from most pages on the public web by leveraging machine learning rather than hand-crafted rules.

More recently, Diffbot has emerged as one of only three Western entities to crawl a vast majority of the web, utilizing our Automatic Extraction APIs to make the world’s largest commercially-available Knowledge Graph.

A Knowledge Graph At The Scale Of The Web

The Diffbot Knowledge Graph is automatically constructed by crawling and extracting data from over 60 billion web pages. It currently represents over 10 billion entities and 1 trillion facts about People, Organizations, Products, Articles, and Events, among other entity types.

Users can access the Knowledge Graph programmatically through an API. Other ways to access the Knowledge Graph include a visual query interface and a range of integrations (e.g., Excel, Google Sheets, Tableau). 

Visually querying the web like a database


Whether you’re consuming Diffbot KG data in a visual “low code” way or programmatically, we’ve continually added features to our powerful query language (Diffbot Query Language, or DQL) to allow users to “query the web like a database.” 
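
To give a flavour of DQL, here is a minimal sketch that queries the Knowledge Graph over HTTP from Python. The endpoint URL, parameter names and response layout are assumptions based on Diffbot's public documentation (a token from a Diffbot account is required), so treat it as an illustration rather than a definitive client:

import requests

DIFFBOT_TOKEN = "YOUR_TOKEN"  # placeholder; issued with a Diffbot account

# Illustrative DQL query: software companies headquartered in San Francisco
query = 'type:Organization industries:"software" locations.city.name:"San Francisco"'

resp = requests.get(
    "https://kg.diffbot.com/kg/v3/dql",  # assumed endpoint; check the current docs
    params={"token": DIFFBOT_TOKEN, "type": "query", "query": query, "size": 5},
)
for item in resp.json().get("data", []):
    print(item["entity"]["name"])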

Guilt-Free Public Web Data

Current use cases for Diffbot’s Knowledge Graph and web data extraction products run the gamut and include data enrichment; lead enrichment; market intelligence; global news monitoring; large-scale product data extraction for ecommerce and supply chain; sentiment analysis of articles, discussions, and products; and data for machine learning. For all of the billions of facts in Diffbot’s KG, data provenance is preserved with the original source (a public URL) of each fact.

Entities, Relationships, and Sentiment From Private Text Corpora 

The team of researchers at Diffbot has been developing new natural language processing techniques for years to improve their extraction and KG products. In October 2020, Diffbot made this technology commercially available to all via the Natural Language API.

Our Natural Language API Demo Parsing Text Input About Diffbot Founder, Mike Tung

Our Natural Language API pulls out entities, relationships/facts, categories and sentiment from free-form texts. This allows organizations to turn unstructured texts into structured knowledge graphs. 
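
A minimal sketch of calling the Natural Language API from Python is shown below; the endpoint, parameters and response keys are assumptions based on Diffbot's public documentation, so consult the current API reference before relying on them:

import requests

DIFFBOT_TOKEN = "YOUR_TOKEN"  # placeholder; issued with a Diffbot account

text = "Diffbot was founded by Mike Tung in Palo Alto."

resp = requests.post(
    "https://nl.diffbot.com/v1/",  # assumed endpoint; check the current docs
    params={"fields": "entities,facts,sentiment", "token": DIFFBOT_TOKEN},
    json={"content": text, "lang": "en"},
)
result = resp.json()
# Print the extracted entities with their sentiment, then the extracted facts
for entity in result.get("entities", []):
    print(entity.get("name"), entity.get("sentiment"))
for fact in result.get("facts", []):
    print(fact)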

Diffbot and DBpedia

In addition to extracting data from web pages, Diffbot’s Knowledge Graph compiles public web data from many structured sources. One important source of knowledge is DBpedia. Diffbot also contributes to DBpedia by providing access to our extraction and KG services and collaborating with researchers in the DBpedia community. For a recent collaboration between DBpedia and Diffbot, be sure to check out the Diffbot track in DBpedia’s Autumn Hackathon for 2020.

A big thank you to Diffbot, especially Filipe Mesquita, for presenting their innovative Knowledge Graph.

Yours,

DBpedia Association

FinScience: leveraging DBpedia tools for fintech applications

DBpedia Member Features – In the last few weeks, we gave DBpedia members the chance to present special products, tools and applications and share them with the community. We already published several posts in which DBpedia members provided unique insights. This week we will continue with FinScience. They will present their latest products, solutions and challenges. Have fun while reading!

by FinScience

A brief presentation of who we are

FinScience is an Italian data-driven fintech company founded in 2017 in Milan by former Google senior managers and Alternative Data experts, who have combined their digital and financial expertise. FinScience thus originates from this merger of the world of Finance and the world of Data Science.
The company leverages the founders’ experience in Data Governance, Data Modeling and Data Platform solutions. This expertise was further enriched through the company’s technical role in the European consortium SSIX (a Horizon 2020 project) focused on building social sentiment indices for financial purposes. FinScience applies proprietary AI-based technologies to combine financial data/insights with alternative data in order to generate new investment ideas, ESG scores and non-conventional lists of companies that can be included in investment products by financial operators.

FinScience’s data analysis pipeline is strongly grounded in the DBpedia ontology: in our experience, the greatest value lies in the possibility to connect knowledge across different languages, to query automatically extracted structured information and to rely on frequently updated models.

Products and solutions

FinScience retrieves content from the web daily: about 1.5 million web pages across roughly 35,000 different domains are visited every day. The content of these pages is extracted, interpreted and analysed via Natural Language Processing techniques to identify valuable information and sources. Thanks to the structured information based on the DBpedia ontology, we can apply our proprietary AI algorithms to suggest the right investment opportunities to our customers.

Our products are mainly based on the integration of this purely digital data – we call it “alternative data” – with traditional sources coming from the world of finance and sustainability. We describe these products briefly:

  • FinScience Platform for traders: it leverages the power of machine learning to help traders monitor specific companies, spot new trends in the financial market and access a high added-value selection of companies and themes.
  • ESG scoring: we provide an assessment of corporate ESG performance by combining internal data (traditional, self-disclosed data) with external ‘alternative’ data (stakeholder-generated data) in order to measure the gap between what companies communicate and how stakeholders perceive their sustainability commitments.
  • Thematic selections of listed companies: we create Trend-Driven selections oriented towards innovative themes: our data, together with the analysis of financial specialists, contribute to the selection of a set of listed companies related to trending themes such as the Green New Deal, 5G technology or new medtech applications.

FinScience and DBpedia

As mentioned before, FinScience is strongly grounded in the DBpedia ontology, since we employ DBpedia Spotlight to perform Named Entity Recognition (NER), i.e. the automatic annotation of entities in a text. The NER task is performed in a two-step procedure. The first step consists of annotating the named entities in a text using DBpedia Spotlight. In particular, Spotlight links a mention in the text (identified by its name and its context within the text) to the DBpedia entity that maximizes the joint probability of occurrence of both. The model is pre-trained on texts extracted from Wikipedia. Note that each entity is represented by a link to a DBpedia page (see, e.g., http://dbpedia.org/page/Eni), a DBpedia type indicating the type of the entity according to this ontology, and other information.
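
For illustration, here is a minimal sketch of this first step against the public Spotlight endpoint (the example text and the confidence threshold are our own choices, not FinScience’s production settings):

import requests

text = "Eni is an Italian multinational energy company headquartered in Rome."

resp = requests.get(
    "https://api.dbpedia-spotlight.org/en/annotate",
    params={"text": text, "confidence": 0.5},
    headers={"Accept": "application/json"},
)
# Each annotation links a surface form in the text to a DBpedia entity and its types
for res in resp.json().get("Resources", []):
    print(res["@surfaceForm"], "->", res["@URI"], "|", res["@types"])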

Another interesting feature of this approach is that we have a one-to-one mapping between the Italian and English entities (and in general any language supported by DBpedia), allowing us to have a unified representation of an entity in the two languages. We are able to obtain this kind of information by exploiting the potential of DBpedia’s Virtuoso endpoint, which allows us to access the DBpedia dataset via SPARQL. By identifying the entities mentioned in online content, we can understand which topics are mentioned and thus identify companies and trends that are spreading in the digital ecosystem, as well as analyse how they are related to each other.
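
Such interlanguage links can be retrieved from DBpedia’s public Virtuoso endpoint with a short SPARQL query. The sketch below (our own example, not FinScience’s code) looks up the Italian counterpart of the English entity Eni:

import requests

query = """
PREFIX owl: <http://www.w3.org/2002/07/owl#>
SELECT ?italian WHERE {
  <http://dbpedia.org/resource/Eni> owl:sameAs ?italian .
  FILTER(STRSTARTS(STR(?italian), "http://it.dbpedia.org/"))
}
"""

resp = requests.get(
    "https://dbpedia.org/sparql",  # DBpedia's public Virtuoso endpoint
    params={"query": query, "format": "application/sparql-results+json"},
)
for binding in resp.json()["results"]["bindings"]:
    print(binding["italian"]["value"])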

Challenges and next steps

One of the toughest challenges for us is to find an optimal way to update the models used by DBpedia Spotlight. Every day new entities and concepts arise, and we want to recognise them in the news we analyse. And that is not all. In addition to recognising new concepts, we need to be able to track an entity through all the updated versions of the model. In this way, we will not only be able to identify entities, but we will also have evidence of when some concepts were first generated, and we will know how they have changed over time, regardless of the names that have been used to identify them.

We are strongly involved in the DBpedia community and we try to contribute with our know-how. In particular, FinScience will contribute to infrastructure and Dockerfiles, as well as to finding issues in newly released projects (for instance, wikistats-extractor).

A big thank you to FinScience for presenting their products, challenges and contribution to DBpedia.

Yours,

DBpedia Association

PoolParty Semantic Suite: The Ideal Tool To Build And Manage Enterprise Knowledge Graphs

DBpedia Member Features – In the coming weeks, we will give DBpedia members the chance to present special products, tools and applications and share them with the community. We will publish several posts in which DBpedia members provide unique insights. This week the Semantic Web Company will present use cases for the PoolParty Semantic Suite. Have fun while reading!

by the Semantic Web Company

About 80 to 90 percent of the information companies generate is extremely diverse and unstructured — stored in text files, e-mails or similar documents, which makes it difficult to search and analyze. Knowledge graphs have become a well-known solution to this problem because they make it possible to extract information from text and link it to other data sources, whether structured or not. However, building a knowledge graph at enterprise scale can be challenging and time-consuming.

PoolParty Semantic Suite is the most complete and secure semantic platform on the global market. It is also the ideal tool to help companies build and manage Enterprise Knowledge Graphs. With PoolParty in place, you will have no problems extracting value from large amounts of heterogeneous data, no matter if it’s stored in a relational database or in text files. The platform provides comprehensive tools for the management of enterprise knowledge graphs along the entire life cycle. Here is a list of the main use cases for the PoolParty Semantic Suite:

Data linking and enrichment

Driven by the Linked Data initiative, increasing amounts of viable data sets about various topics have been published on the Semantic Web. PoolParty allows users to draw on these online resources, amongst them DBpedia, to easily and quickly enrich a thesaurus with more data.

Search and recommender engines

Arrive at enriched and in-depth search results that provide relevant facts and contextualized answers to your specific questions, rather than a broad search result with many (ir)relevant documents and messages – but no valuable input. PoolParty Semantic Suite can be used to implement semantic search and recommendations that are relevant to your users.

Text Mining and Auto Tagging

Manually tagging an entire database is very time-consuming and often leads to inconsistent search results. PoolParty’s graph-based text mining can improve this process, making it faster, more consistent and more precise. This is achieved by using advanced text mining algorithms and Natural Language Processing to automatically extract relevant entities, terms and other metadata from text and documents, helping drive in-depth text analytics.

Data Integration and Data Fabric

The Semantic Data Fabric is a new solution to data silos that combines two best-of-breed technologies, data catalogs and knowledge graphs, based on Semantic AI. With a semantic data fabric, companies can combine text and documents (unstructured) with data residing in relational databases and data warehouses (structured) to create a comprehensive view of their customers, employees, products, and other vital areas of business.

Taxonomies, Ontologies and Knowledge Graphs That Scale

With release 8.0 of the PoolParty Semantic Suite, users have even more options to conveniently generate, edit, and use knowledge graphs. In addition, the powerful and performant GraphDB by Ontotext has been added as PoolParty’s recommended embedded store and it is shipped as an add-on module. GraphDB is an enterprise-level graph database with state-of-the-art performance, scalability and security. This provides greater robustness to PoolParty and allows you to work with much larger taxonomies effectively.

A big thank you to the Semantic Web Company for presenting use cases for the PoolParty Semantic Suite.

Yours,

DBpedia Association


TerminusDB and DBpedia

DBpedia Member Features – In the coming weeks, we will give DBpedia members the chance to present special products, tools and applications and share them with the community. We will publish several posts in which DBpedia members provide unique insights. This week TerminusDB will show you how to use TerminusDB’s unique collaborative features to access DBpedia data. Have fun while reading!

by Luke Feeney from TerminusDB

This post introduces TerminusDB as a member of the DBpedia Association – proudly supporting the important work of DBpedia. It will also show you how to use TerminusDB’s unique collaborative features to access DBpedia data.

TerminusDB – an Open Source Knowledge Graph

TerminusDB is an open-source knowledge graph database that provides reliable, private & efficient revision control & collaboration. If you want to collaborate with colleagues or build data-intensive applications, nothing will make you more productive.

TerminusDB provides the full suite of revision control features and TerminusHub allows users to manage access to databases and collaboratively work on shared resources.

  • Flexible data storage, sharing, and versioning capabilities
  • Collaboration for your team or integrated in your app
  • Work locally then sync when you push your changes
  • Easy querying, cleaning, and visualization
  • Integrate powerful version control and collaboration for your enterprise and individual customers.

The TerminusDB project originated in Trinity College Dublin in Ireland in 2015. From its earliest origins, TerminusDB worked with DBpedia through the ALIGNED project, which was a research project funded by Horizon 2020 that focused on building quality-centric software for data management.

ALIGNED Project with early TerminusDB (then called ‘Dacura’) and DBpedia


While working on this project, and especially on the architecture behind Seshat: The Global History Databank, we needed a solution that could enable collaboration among a highly distributed team on a shared database whose primary function was the curation of high-quality datasets with a very rich structure. While the scale of the data was not particularly large, the complexity was extremely high. Unfortunately, the linked-data and RDF toolchains were severely lacking – we evaluated several tools in an attempt to architect a solution; however, in the end we were forced to build an end-to-end solution ourselves.

Evolution of TerminusDB

In general, we think that computers are fantastic things because they allow you to leverage much more evidence when making decisions than would otherwise be possible. It is possible to write computer programs that automate the ingestion and analysis of unimaginably large quantities of data.

If the data is well chosen, it is almost always the case that computational analysis reveals new and surprising insights simply because it incorporates more evidence than could possibly be captured by a human brain. And because the universe is chaotic and there are combinatorial explosions of possibilities all over the place, evidence is always better than intuition when seeking insight.

As anybody who has grappled with computers and large quantities of data will know, it’s not as simple as that. Computers should be able to do most of this for us. It makes no sense that we are still writing the same simple and tedious data validation and transformation programs over and over ad infinitum. There must be a better way.

This is the problem that we set out to solve with TerminusDB. We identified two indispensable characteristics that were lacking in data management tools:

  1. A rich and universally machine-interpretable modelling language. If we want computers to be able to transform data between different representations automatically, they need to be able to describe their data models to one another.
  2. Effective revision control. Revision control technologies have been instrumental in turning software production from a craft to an engineering discipline because they make collaboration and coordination between large groups much more fault tolerant. The need for such capabilities is obvious when dealing with data – where the existence of multiple versions of the same underlying dataset is almost ubiquitous and with only the most primitive tool support.

TerminusDB and DBpedia

Team TerminusDB took part in the DBpedia Autumn Hackathon 2020. As you know, DBpedia is an extract of the structured data from Wikipedia.

Our Hackathon Project Board

You can read all about our DBpedia Autumn Hackathon adventures in this blog post.

Open Source

Unlike many systems in the graph database world, TerminusDB is committed to open source. We believe in the principles of open source, open data and open science. We welcome all those data people who want to contribute to the general good of the world. This is very much in alignment with the DBpedia Association and community.

DBpedia on TerminusHub

TerminusHub is the collaboration point between TerminusDBs. You can push data to your colleagues and collaborators, pull updates (efficiently – just the diffs) and clone databases that are made available on the Hub (by the TerminusDB team or by others). Think of it as GitHub, but for data.

The DBpedia database is available on TerminusHub. You can clone the full DB in a couple of minutes (depending on your internet connection, of course) and get querying. TerminusDB uses succinct data structures to compress everything, which makes sharing large databases feasible – interested parties can find more technical detail here: https://github.com/terminusdb/terminusdb/blob/dev/docs/whitepaper/terminusdb.pdf.

TerminusDB in the DBpedia Association

We will contribute to DBpedia by working to improve the quality of data available, by introducing new datasets that can be integrated with DBpedia, and by participating fully in the community.

We are looking forward to a bright future together.

A big thank you to Luke and TerminusDB presenting how TerminusDB works and how they would like to work with DBpedia in the future.

Yours,

DBpedia Association

Call for Participants: DBpedia Autumn Hackathon

Dear DBpedians, Linked Data savvies and Ontologists,

We would like to invite you to join the DBpedia Autumn Hackathon 2020 as a new format to contribute to DBpedia, gain fame, win small prizes and experience the latest technology provided by DBpedia Association members. 
The hackathon is part of the Knowledge Graphs in Action conference on October 6, 2020. 

Timeline 

  • Registration of participants – main communication channel will be the #hackathon channel in DBpedia Slack (sign up, then add yourself to the channel). If you wish to receive a reminder email on Sep 21st, 2020 you can leave your email address in this form.
  • Until September 14th – preparation phase, participating organizations prepare details and track formation. Additional tracks can be proposed, please contact dbpedia-events@infai.org.
  • Announcement of details for each track, including prizes, participating data, demos as well as tools and tasks. Please check updates on the Hackathon website. – September 21st, 2020
  • Hacking period, coordinated via DBpedia slack September 21st to October 1st, 2020
  • Submission of hacking result (3 min video and 2-3 paragraph summary with links, if not stated otherwise in the track) – October 1st, 2020 at 23:59 Hawaii Time
  • Final Event: each track chair will present a short recap of the track and announce prizes or summarize the results of the hacking. – October 5th, 2020 at 16:00 CEST
  • Knowledge Graphs in Action Event (see program) – October 6th, 2020 at 9:50 – 15:30 CEST
  • Results and videos are documented on the DBpedia Website and the DBpedia Youtube channel.

Member Tracks 

The member tracks are hosted by DBpedia Association members, who are technology leaders in the area of Knowledge Engineering. Additional tracks can be proposed until Sep 14th, please contact dbpedia-events@infai.org.

  • timbr SQL Knowledge Graph: Learn how to model, map and query ontologies in timbr and then model an ontology of GDELT, map it to the GDELT database, and answer a number of questions that currently are quite impossible to get from the BigQuery GDELT database. Cash prizes planned. 
  • GNOSS Knowledge Graph Builder: Give meaning to your organisation’s documents and data with a Knowledge Graph. 
  • ImageSnippets: Labeling images with semantic descriptions. Use DBpedia spotlight and an entity matching lookup to select DBpedia terms to describe images. Then explore the resulting dataset through searches over inference graphs and explore the ImageSnippets dataset through our SPARQL endpoint. Prizes planned. 
  • Diffbot: Build Your Own Knowledge Graph! Use the Natural Language API to extract triples from natural language text and expand these triples with data from the Diffbot Knowledge Graph (10+ billion entities, 1+ trillion facts). Check out the demo. All participants will receive access to the Diffbot KG and tools for (non-commercial) research for one year ($10,000 value).

Dutch National Knowledge Graph Track

Following the DBpedia FlexiFusion approach, we are currently flexi-fusing a huge, DBpedia-style knowledge graph that will connect many Linked Data sources and data silos relevant to the Netherlands. We hope that this will eventually crystallize a well-connected sub-community linked open data (LOD) cloud, in the same manner as DBpedia crystallized the original LOD cloud, with some improvements (you could call it LOD Mark II). Data and hackathon details will be announced on September 21st.


Improve DBpedia Track

A community track where everybody can participate and contribute to improving existing DBpedia components, in particular the extraction framework, the mappings, the ontology, data quality test cases, new extractors, links and other extensions. The best individual contributions will be acknowledged on the DBpedia website via the contributor’s WebID/FOAF profile.

(chaired by Milan Dojchinovski and Marvin Hofer from the DBpedia Association & InfAI and the DBpedia Hacking Committee, please message @m1ci to volunteer to the hacking committee)

DBpedia Open Innovation Track 

(not part of the hackathon, pre-announcement)

For the DBpedia Spring Event 2021, we are planning an Open Innovation Track, where DBpedians can showcase their applications. This endeavour will not be part of the hackathon, as we are looking for significant showcases with development efforts of months and years, built on the core infrastructure of DBpedia such as the SPARQL endpoint, the data, Lookup, Spotlight, DBpedia Live, etc. Details will be announced during the Hackathon Final Event on October 5.

(chaired by Heiko Paulheim et al.)

Stay tuned and check Twitter, Facebook and our Website or subscribe to our Newsletter for latest news and information.

The DBpedia Organizing Team


‘Knowledge Graphs in Action’ online event on Oct 6, 2020

Due to current circumstances, the SEMANTiCS onsite conference 2020 unfortunately had to be postponed until September 2021. To bridge the gap, DBpedia, PLDN and EuroSDR will organize a SEMANTiCS satellite event online on October 6, 2020. We have set up an exciting themed program around ‘Knowledge Graphs in Action: DBpedia, Linked Geodata and Geo-information Integration’.

This new event is a combination of two existing ones: the DBpedia Community Meeting, which is regularly held as part of SEMANTiCS, and the annual Spatial Linked Data conference organised by EuroSDR and Platform Linked Data Nederland. We fused the two together and, as a bonus, added a track about Geo-information Integration hosted by EuroSDR. For the joint opening session, we recruited four amazing keynote speakers to kick off the event.

Highlights of the Knowledge Graphs in Action event

– Hackathon (starts 2 weeks earlier)

– Keynote by Carsten Hoyer-Click, German Aerospace Center

– Keynote by Marinos Kavouras, National Technical University of Athens

– Keynote by Peter Mooney, Maynooth University

– Spatial Linked Data Country Session

– DBpedia Chapter Session

– Self Service GIS Session

– DBpedia Showcase Session

Quick Facts

– Web URL: https://wiki.dbpedia.org/meetings/KnowledgeGraphsInAction

– When: October 6, 2020

– Where: The conference will take place fully online.

Schedule

– Please check the schedule for the upcoming Knowledge Graphs in Action event here: https://wiki.dbpedia.org/meetings/KnowledgeGraphsInAction  

Registration 

– Attending the conference is free. Registration is required though. Please get in touch with us if you have any problems during the registration stage. Register here to be part of the meeting: https://wiki.dbpedia.org/meetings/KnowledgeGraphsInAction 

Organisation

– Benedicte Bucher, University Gustave Eiffel, IGN, EuroSDR

– Erwin Folmer, Kadaster, University of Twente, Platform Linked Data Netherlands

– Rob Lemmens, University of Twente

– Sebastian Hellmann, AKSW/KILT, DBpedia Association

– Julia Holze, DBpedia Association

Don’t think twice and register now! Join the Knowledge Graphs in Action event on October 6, 2020 to catch up with the latest research results and developments in the Semantic Web community. Register here and meet us and other SEMANTiCS enthusiasts.

For latest news and updates check Twitter, LinkedIn, the DBpedia blog and our Website or subscribe to our newsletter.

We are looking forward to meeting you online!

Julia

on behalf of the DBpedia Association

ImageSnippets and DBpedia

by Margaret Warren

The following post introduces ImageSnippets and shows how this tool benefits from the use of DBpedia.

ImageSnippets – A Tool for Image Curation

For over two decades, ImageSnippets has been evolving as an ontology- and data-driven framework for image annotation research. Representing the informal knowledge people have about the context and provenance of images as RDF/linked data is challenging, but it has also been an enlightening and engaging journey: not only applying formal semantic web theory to building image graphs, but also weaving our interests together with what others have been doing in the fields of semantic annotation and knowledge graph building over these many years.

DBpedia provides the entities for our RDF descriptions

Since the beginning, we have always made use of DBpedia and other publicly available datasets to provide the entities for use in our RDF descriptions. Though ImageSnippets can be used to build special vocabularies around niche domains, our primary research is around relation ontology building, and we prefer to avoid creating new entities unless we absolutely cannot find them through any other service.

When we first went live with our basic system in 2013, we began hand-building tens of thousands of triples using terms primarily from DBpedia (the core of the linked data cloud). While there would often be an overlap of terms with other datasets – almost a case of too many choices – we formed a best practice of preferentially using DBpedia terms as often as possible, because they gave us the most utility for reasoning using the SKOS concepts built into the DBpedia service. We have also made extensive use of DBpedia Spotlight for named-entity extraction.
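
As a concrete illustration of that utility, the sketch below (our own example, not ImageSnippets’ internal code) fetches the DBpedia categories of a resource and climbs one level up the SKOS hierarchy:

import requests

query = """
PREFIX skos: <http://www.w3.org/2004/02/skos/core#>
PREFIX dct:  <http://purl.org/dc/terms/>
SELECT DISTINCT ?category ?broader WHERE {
  <http://dbpedia.org/resource/Labrador_Retriever> dct:subject ?category .
  ?category skos:broader ?broader .
} LIMIT 20
"""

resp = requests.get(
    "https://dbpedia.org/sparql",
    params={"query": query, "format": "application/sparql-results+json"},
)
# Each row pairs a category of the resource with one of its broader SKOS concepts
for b in resp.json()["results"]["bindings"]:
    print(b["category"]["value"], "->", b["broader"]["value"])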

How to combine DBpedia & Wikidata and make it useful for ImageSnippets

But the addition of the Wikidata Query Service over the past 18 months or so has now given us an even more unique challenge: how to work with both! Since DBpedia and Wikidata both have class relationships that we can reason from, we found ourselves in a position to be able to examine both DBpedia and Wikidata in concert with each other through the use of mapping techniques between the two datasets.
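
The mapping itself is readily available: DBpedia publishes owl:sameAs links to Wikidata, so a single SPARQL query bridges the two graphs. This is a minimal sketch of the general technique, not ImageSnippets’ own pipeline:

import requests

query = """
PREFIX owl: <http://www.w3.org/2002/07/owl#>
SELECT ?wikidata WHERE {
  <http://dbpedia.org/resource/Eiffel_Tower> owl:sameAs ?wikidata .
  FILTER(STRSTARTS(STR(?wikidata), "http://www.wikidata.org/entity/"))
}
"""

resp = requests.get(
    "https://dbpedia.org/sparql",
    params={"query": query, "format": "application/sparql-results+json"},
)
# Prints the Wikidata entity URI(s) corresponding to the DBpedia resource
print([b["wikidata"]["value"] for b in resp.json()["results"]["bindings"]])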

How it works: ImageSnippets & DBpedia

When an image is saved, we build inference graphs over results from both DBpedia and Wikidata. These graphs can be revealed with simple SPARQL queries at our endpoint, and queries over subclasses, taxons and SKOS concepts can find image results in our custom search tool. We have also recently added a pathfinder utility – highly useful for semantic explainability, as it returns the precise path of connections from an originating source entity to the target entity used in our custom image search.

Sometimes a query will produce very unintuitive results, and the pathfinder tool enables us to quickly locate semantic errors which lead to clearly erroneous misclassifications (for example, a search for the Wikidata subclass of ‘communication medium’ reveals images of restaurants and hotels because of misclassifications in Wikidata). In this way we can quickly troubleshoot the results of queries, using the images as visual cues to explore the accuracy of the semantic modelling in both datasets.


We are very excited about the new directions that can come from knitting the two knowledge graphs together through our visual interface, and we believe there is great potential for ImageSnippets to serve a more complex role in cleaning and aligning the two datasets, using the images as our guides.

A big thank you to Margaret Warren for providing some insights into her work at ImageSnippets.

Yours,

DBpedia Association

SEMANTiCS 2019 Interview: Katja Hose

Today’s post features an interview with our DBpedia Day keynote speaker Katja Hose, a Professor of Computer Science at Aalborg University, Denmark. In this interview, Katja talks about increasing the reliability of knowledge graph access as well as her expectations for SEMANTiCS 2019.

Prior to joining Aalborg University, Katja was a postdoc at the Max Planck Institute for Informatics in Saarbrücken. She received her doctoral degree in Computer Science from Ilmenau University of Technology in Germany.

Can you tell us something about your research focus?

The most important focus of my research has been querying the Web of Data, in particular, efficient query processing over distributed knowledge graphs and Linked Data. This includes indexing, source selection, and efficient query execution. Unfortunately, it happens all too often that the services needed to access remote knowledge graphs are temporarily not available, for instance, because a software component crashed. Hence, we are currently developing a decentralized architecture for knowledge sharing that will make access to knowledge graphs a reliable service, which I believe is the key to a wider acceptance and usage of this technology.

How do you personally contribute to the advancement of semantic technologies?

I contribute by doing research, advancing the state of the art, and applying semantic technologies to practical use cases.  The most important achievements so far have been our works on indexing and federated query processing, and we have only recently published our first work on a decentralized architecture for sharing and querying semantic data. I have also been using semantic technologies in other contexts, such as data warehousing, fact-checking, sustainability assessment, and rule mining over knowledge bases.

Overall, I believe the greatest ideas and advancements come when trying to apply semantic technologies to real-world use cases and problems, and that is what I will keep on doing.

Which trends and challenges do you see for linked data and the semantic web?

The goal and the idea behind Linked Data and the Semantic Web is the second-best invention after the Internet. But unlike the Internet, Linked Data and the Semantic Web are only slowly being adopted by a broader community and by industry.

I think part of the reason is that from a company’s point of view, there are not many incentives and added benefit of broadly sharing the achievements. Some companies are simply reluctant to openly share their results and experiences in the hope of retaining an advantage over their competitors. I believe that if these success stories were shared more openly, and this is the trend we are witnessing right now, more companies will see the potential for their own problems and find new exciting use cases.

Another particular challenge, which we will have to overcome, is that it is currently still far too difficult to obtain and maintain an overview of what data is available and formulate a query as a non-expert in SPARQL and the particular domain… and of course, there is the challenge that accessing these datasets is not always reliable.

As artificial intelligence becomes more and more important, what is your vision of AI?

AI and machine learning are indeed becoming more and more important. I do believe that these technologies will bring us a huge step ahead. The process has already begun. But we also need to be aware that we are currently in the middle of a big hype where everybody wants to use AI and machine learning – although many people actually do not truly understand what it is and if it is actually the best solution to their problems. It reminds me a bit of the old saying “if the only tool you have is a hammer, then every problem looks like a nail”. Only time will tell us which problems truly require machine learning, and I am very curious to find out which solutions will prevail.

However, the current state of the art is still very far away from the AI systems that we all know from Science Fiction. Existing systems operate like black boxes on well-defined problems and lack true intelligence and understanding of the meaning of the data. I believe that the key to making these systems trustworthy and truly intelligent will be their ability to explain their decisions and their interpretation of the data in a transparent way.

What are your expectations about Semantics 2019 in Karlsruhe?

First and foremost, I am looking forward to meeting a broad range of people interested in semantic technologies. In particular, I would like to get in touch with industry-based research and to be exposed 

The End

We would like to thank Katja Hose for her insights and are happy to have her as one of our keynote speakers.

Visit SEMANTiCS 2019 in Karlsruhe, Sep 9-12 and get your tickets for our community meeting here. We are looking forward to meeting you during DBpedia Day.

Yours DBpedia Association

RDF2NL: Generating Texts from RDF Data

RDF2NL is featured in the following guest post by Diego Moussalem (Dice Research Group & Portuguese DBpedia Chapter).

Hi DBpedians,

During the DBpedia Day in Leipzig, I gave a talk about how to use the facts contained in the DBpedia Knowledge Graph for generating coherent sentences and texts.

We essentially rely on Natural Language Generation (NLG) techniques to accomplish this task. NLG is the process of generating coherent natural language text from non-linguistic data (Reiter and Dale, 2000). While there is community agreement on the text and speech output of these systems, there is far less consensus on what the input should be (Gatt and Krahmer, 2017). A large variety of inputs have been used for NLG systems, including images (Xu et al., 2015), numeric data (Gkatzia et al., 2014) and semantic representations (Theune et al., 2001).

Why not generate text from Knowledge graphs? 

The generation of natural language from the Semantic Web was introduced some years ago (Ngonga Ngomo et al., 2013; Bouayad-Agha et al., 2014; Staykova, 2014). However, it has recently gained substantial attention, and challenges have been proposed to investigate the quality of automatically generated texts from RDF (Colin et al., 2016). Moreover, RDF has demonstrated a promising ability to support the creation of NLG benchmarks (Gardent et al., 2017). Still, English is the only language which has been widely targeted. Thus, we proposed RDF2NL, which can generate texts in languages other than English by relying on different language versions of SimpleNLG.

What is RDF2NL?

While the exciting avenue of using deep learning techniques in NLG approaches (Gatt and Krahmer, 2017) is open to this task, and deep learning has already shown promising results for RDF data (Sleimi and Gardent, 2016), the morphological richness of some languages led us to develop a rule-based approach. This was to ensure that we could identify the challenges imposed by each language from the SW perspective before applying Machine Learning (ML) algorithms. RDF2NL is able to generate either a single sentence or a summary of a given resource. RDF2NL is based on Ngonga Ngomo et al.’s LD2NL, and it also uses the Brazilian, Spanish, French, German and Italian adaptations of SimpleNLG for the realization task.
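
To make the idea tangible, here is a toy, rule-based verbalizer in the spirit of RDF2NL. It is purely illustrative: the real system uses SimpleNLG for surface realization and covers far more of the ontology, while this sketch simply maps predicates to sentence templates:

# Toy rule-based triple verbalizer (illustrative only, not the RDF2NL implementation)
TEMPLATES = {
    "http://dbpedia.org/ontology/birthPlace": "{s} was born in {o}.",
    "http://dbpedia.org/ontology/capital":    "The capital of {s} is {o}.",
}

def label(uri: str) -> str:
    """Derive a readable label from a DBpedia resource URI."""
    return uri.rsplit("/", 1)[-1].replace("_", " ")

def verbalize(subject: str, predicate: str, obj: str) -> str:
    """Turn one RDF triple into an English sentence via a predicate template."""
    template = TEMPLATES.get(predicate, "{s} is related to {o}.")
    return template.format(s=label(subject), o=label(obj))

print(verbalize(
    "http://dbpedia.org/resource/Machado_de_Assis",
    "http://dbpedia.org/ontology/birthPlace",
    "http://dbpedia.org/resource/Rio_de_Janeiro",
))
# -> "Machado de Assis was born in Rio de Janeiro."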

An example of RDF2NL application:

We envisioned a promising application of RDF2PT: supporting the automatic creation of benchmarking datasets for Named Entity Recognition (NER) and Entity Linking (EL) tasks. In Brazilian Portuguese, there is a lack of gold-standard datasets for these tasks, which makes investigating these problems difficult for the scientific community. Our aim was to create Brazilian Portuguese silver-standard datasets that can be uploaded into GERBIL for easy evaluation. To this end, we implemented RDF2PT (the Portuguese version of RDF2NL) in BENGAL, which is an approach for automatically generating NER benchmarks based on RDF triples and Knowledge Graphs. This application has already resulted in promising datasets which we have used to investigate the capability of multilingual entity linking systems to recognize and disambiguate entities in Brazilian Portuguese texts. Some results can be found below:
NER – http://gerbil.aksw.org/gerbil/experiment?id=201801050043
NED – http://gerbil.aksw.org/gerbil/experiment?id=201801110012

More application scenarios

  • Summarize or Explain KBs to non-experts
  • Create news automatically (automated journalism)
  • Summarize medical records
  • Generate technical manuals
  • Support the training of other NLP tasks
  • Generate product descriptions (eBay)

Deep Learning into RDF2NL

After devising our rule-based approach, we realized that RDF2NL is really good at selecting adequate content from the RDF triples, but the fluency of its generated texts remains a challenge. Therefore, we decided to move forward and work with neural network models to improve the fluency of the texts, as such models have already shown promising results in the generation of translations. We focused first on the generation of referring expressions, an essential part of generating text: it decides how the NLG model will present the information about a given entity. For example, the referring expressions of the entity Barack Obama can be “the former president of the USA”, “Obama”, “Barack”, “he” and so on. Afterwards, we have been working on combining different NLG sub-tasks into single neural models to improve the fluency of our texts.

GSoC on it – Stay tuned!  

Apart from trying to improve the fluency of our models, we previously relied on different language versions of SimpleNLG for the realization task. Nowadays, we are investigating the generation of multiple languages using a single neural model. Our student has been working hard to provide nice results and we are basically at the end of our GSoC project. So stay tuned to learn the outcome of this exciting project.

Many thanks to Diego for his contribution. If you want to write a guest post, share your results on the DBpedia Blog, and thus give your work more visibility and outreach, just ping us via dbpedia@infai.org.

Yours

DBpedia Association

timbr – the DBpedia SQL Semantic Knowledge Platform

With timbr, WPSemantix and the DBpedia Association launch the first SQL Semantic Knowledge Graph that integrates Wikipedia and Wikidata knowledge into SQL engines.

In part three of DBpedia’s growth hack blog series, we feature timbr, the latest development at DBpedia in collaboration with WPSemantix. Read on to find out how it works.

timbr – DBpedia SQL Semantic Knowledge Platform

Tel Aviv, Israel and Leipzig, Germany – July 18, 2019 – WP-Semantix (WPS), the “SQL Knowledge Graph Company™”, and the DBpedia Association – Institut für Angewandte Informatik e.V. – today announced the launch of the timbr-DBpedia SQL Semantic Knowledge Platform, a unique version of WPS’ timbr SQL Semantic Knowledge Graph that integrates the timbr-DBpedia ontology, timbr’s ontology explorer/visualizer and timbr’s SQL query service. For the first time, it provides semantic access to DBpedia knowledge in SQL, thus facilitating the integration of DBpedia knowledge into standard data warehouses and data lakes.

DBpedia

DBpedia is the crowd-sourced community effort to extract structured content from the information created in various Wikimedia projects and publish it as files on the Databus and via online databases. This structured information resembles an open knowledge graph which has been available to everyone on the Web for over a decade. Knowledge graphs are a new kind of database developed to store knowledge in a machine-readable form, organized as connected, relationship-rich data. After the publication of DBpedia (in parallel to Freebase) 12 years ago, knowledge graphs have become very successful, and Google uses a similar approach to create the knowledge cards displayed in search results.

Query the world’s knowledge in standard SQL

Amit Weitzner, founder and CEO at WPS commented: “Knowledge graphs use specialized languages, require resource-intensive, dedicated infrastructure and require costly ETL operations. That is, they did until timbr came along. timbr employs SQL – the most widely known database language, to eliminate the technological barriers to entry for using knowledge graphs and to implement Semantic Web principles to provide knowledge graph functionality in SQL. timbr enables modelling of data as connected, context-enriched concepts with inference and graph traversal capabilities while being queryable in standard SQL, to represent knowledge in data warehouses and data lakes. timbr-DBpedia is our first vertical application and we are very excited by the prospects of our cooperation with the DBpedia team to enable the largest user base to query the world’s knowledge in standard SQL.”

Sebastian Hellmann, executive director of the DBpedia Association commented that:

“our vision of the DBpedia Databus – transforming Linked Data into a networked data economy, is becoming a reality thanks to tools such as timbr-DBpedia which take full advantage of our unique data sets and data architecture. We look forward to working with WPS to also enable access to new data sets as they become available .”

timbr will help to explore the power of semantic technologies

Prof. James Hendler, a pioneer and world-leading authority in Semantic Web technologies and a member of WPS’ advisory board, commented: “timbr can be a game-changing solution by enabling the semantic inference capabilities needed in many modelling applications to be done in SQL. This approach will enable many users to get the advantages of semantic AI technologies and data integration without the learning curve of many current systems. By giving more people access to the semantic version of Wikipedia, timbr-DBpedia will definitely contribute to allowing the majority of the market to explore the power of semantic technologies.”

timbr-DBpedia is available as a query service or licensed for use as SaaS or on-premises. See the DBpedia website: wiki.dbpedia.org/timbr.
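
As an illustration of what “semantic access in SQL” looks like, the hypothetical sketch below queries a timbr concept as if it were an ordinary table over ODBC. The connection string, schema and column names are our own assumptions, not timbr’s documented interface; see wiki.dbpedia.org/timbr for the actual details:

import pyodbc  # timbr exposes JDBC/ODBC connectivity according to the announcement

# Assumed DSN configured locally for the timbr-DBpedia service (hypothetical)
conn = pyodbc.connect("DSN=timbr-dbpedia")
cursor = conn.cursor()

# Query a semantic concept as an ordinary SQL table; the ontology supplies the
# semantics, so instances of subclasses of `person` would be returned as well
# (concept and column names are illustrative).
cursor.execute("""
    SELECT name, birth_place
    FROM dbpedia.person
    WHERE birth_place = 'Leipzig'
    LIMIT 10
""")
for row in cursor.fetchall():
    print(row.name, row.birth_place)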

About WPSemantix

WP-Semantix Ltd. (wpsemantix.com) is the developer of the timbr SQL semantic knowledge platform, a dynamic abstraction layer over relational and non-relational data, facilitating declaration and powerful exploration of semantically rich ontologies using a standard SQL query interface. timbr is natively accessible in Apache Spark, Python, R and SQL to empower data scientists to perform complex analytics and generate sophisticated ML algorithms.  Its JDBC interface provides seamless integration with the most popular business intelligence solutions to make complex analytics accessible to analysts and domain experts across the organization.

WP-Semantix, timbr, “SQL Knowledge Graph”, “SQL Semantic Knowledge Graph” and associated marks and trademarks are registered trademarks of WP Semantix Ltd.

DBpedia is looking forward to this cooperation. Follow us on Twitter for the latest information and stay tuned for part four of our growth hack series. The next post features GlobalFactSyncRe. Curious? You have to be a little more patient and wait until Thursday, July 25th.

Yours DBpedia Association