Due to current circumstances, the SEMANTiCS Onsite Conference 2020 had, unfortunately, to be postponed till September 2021. To bridge the gap until 2021, DBpedia, PLDN and EuroSDR will organize a SEMANTiCS satellite event online, on October 6, 2020. We set up an exciting themed program around ‘Knowledge Graphs in Action: DBpedia, Linked Geodata and Geo-information Integration’.
This new event is a combination of two already existing ones: the DBpedia Community Meeting, which is regularly held as part of the SEMANTiCS, and the annual Spatial Linked Data conference organised by EuroSDR and the Platform Linked Data Nederland. We fused both together and as a bonus, we added a track about Geo-information Integration hosted by EuroSDR. For the joint opening session, we recruited four amazing keynote speakers to kick the event off.
Highlights of the Knowledge Graph in Action event
– Hackathon (starts 2 weeks earlier)
– Keynote by Carsten Hoyer-Klick, German Aerospace Center
– Keynote by Marinos Kavouras, National Technical University of Athens
– Benedicte Bucher, University Gustave Eiffel, IGN, EuroSDR
– Erwin Folmer, Kadaster, University of Twente, Platform Linked Data Netherlands
– Rob Lemmens, University of Twente
– Sebastian Hellmann, AKSW/KILT, DBpedia Association
– Julia Holze, DBpedia Association
Don’t think twice and register now! Join the Knowledge Graph in Action event on October 6, 2020 to catch up with the latest research results and developments in the Semantic Web Community. Register here and meet us and other SEMANTiCS enthusiasts.
The following post introduces you to ImageSnippets and shows how this tool benefits from the use of DBpedia.
ImageSnippets – A Tool for Image Curation
For over two decades, ImageSnippets has been evolving as an ontology and data-driven framework for image annotation research. Representing the informal knowledge people have about the context and provenance of images as RDF/linked data is challenging, but it has also been an enlightening and engaging journey, not only in applying formal semantic web theory to building image graphs but also in weaving together our interests with what others have been doing in the field of semantic annotation and knowledge graph building over these many years.
DBpedia provides the entities for our RDF descriptions
Since the beginning, we have always made use of DBpedia and other publicly available datasets to provide the entities for use in our RDF descriptions. Though ImageSnippets can be used to build special vocabularies around niche domains, our primary research is around relation ontology building and we prefer to avoid the creation of new entities unless we absolutely cannot find them through any other service.
When we first went live with our basic system in 2013, we began hand-building tens of thousands of triples using terms primarily from DBpedia (the core of the linked data cloud). While there would often be an overlap of terms with other datasets – almost a case of too many choices – we formed a best practice of preferentially using DBpedia terms as often as possible, because they gave us the most utility for reasoning using the SKOS concepts built into the DBpedia service. We have also made extensive use of DBpedia Spotlight for named-entity extraction.
How to combine DBpedia & Wikidata and make it useful for ImageSnippets
But the addition of the Wikidata Query Service over the past 18 months or so has now given us an even more unique challenge: how to work with both! Since DBpedia and Wikidata both have class relationships that we can reason from, we found ourselves in a position to be able to examine both DBpedia and Wikidata in concert with each other through the use of mapping techniques between the two datasets.
How it works: ImageSnippets & DBpedia
When an image is saved, we build inference graphs over results from both DBpedia and Wikidata. These graphs can be revealed with simple SPARQL queries at our endpoint and queries from subclasses, taxons and SKOS concepts can find image results in our custom search tool. We have also just recently added a pathfinder utility – highly useful for semantic explainability as it will return the precise path of connections from an originating source entity to the target entity that was used in our custom image search.
Sometimes a query will produce very unintuitive results, and the pathfinder tool enables us to quickly locate the semantic errors that lead to clear misclassifications (for example, a search for the Wikidata subclass of ‘communication medium’ reveals images of restaurants and hotels because of misclassifications in Wikidata). In this way we can quickly troubleshoot the results of queries, using the images as visual cues to explore the accuracy of the semantic modelling in both datasets.
We are very excited about the new directions that we feel can come from knitting together the two knowledge graphs through our visual interface, and we believe there is great potential for ImageSnippets to serve a more complex role in cleaning and aligning the two datasets, using the images as our guides.
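To make the pathfinder idea concrete, here is a minimal sketch in Python. It is not the ImageSnippets implementation: the triples, the prefixes and the function name find_path are assumptions made up for illustration. It only shows how a breadth-first search over triples can return the exact chain of connections from a source entity to a target entity.

```python
from collections import deque

# Hypothetical triples for illustration only; not real ImageSnippets data.
TRIPLES = [
    ("img:42", "depicts", "dbr:Golden_Retriever"),
    ("dbr:Golden_Retriever", "rdf:type", "dbo:Dog"),
    ("dbo:Dog", "rdfs:subClassOf", "dbo:Mammal"),
    ("dbo:Mammal", "rdfs:subClassOf", "dbo:Animal"),
    ("dbo:Dog", "owl:equivalentClass", "wd:Q144"),
]


def find_path(triples, source, target):
    """Return the shortest chain of triples leading from source to target, or None."""
    # Index outgoing edges by subject so each node's neighbours are easy to look up.
    outgoing = {}
    for s, p, o in triples:
        outgoing.setdefault(s, []).append((p, o))

    queue = deque([(source, [])])
    visited = {source}
    while queue:
        node, path = queue.popleft()
        if node == target:
            return path
        for p, o in outgoing.get(node, []):
            if o not in visited:
                visited.add(o)
                queue.append((o, path + [(node, p, o)]))
    return None


if __name__ == "__main__":
    for step in find_path(TRIPLES, "img:42", "dbo:Animal"):
        print(step)
```

Printing each step of the returned path shows why an image matched a query, which is the kind of trace that helps spot a misclassified link.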
A big thank you to Margaret Warren for providing some insights into her work at ImageSnippets.
Today’s post features an interview with our DBpedia Day keynote speaker Katja Hose, a Professor of Computer Science at Aalborg University, Denmark. In this interview, Katja talks about increasing the reliability of Knowledge Graph access as well as her expectations for SEMANTiCS 2019.
Prior to joining Aalborg University, Katja was a postdoc at the Max Planck Institute for Informatics in Saarbrücken. She received her doctoral degree in Computer Science from Ilmenau University of Technology in Germany.
Can you tell us something about your research focus?
The most important focus of my research has been querying the Web of Data, in particular, efficient query processing over distributed knowledge graphs and Linked Data. This includes indexing, source selection, and efficient query execution. Unfortunately, it happens all too often that the services needed to access remote knowledge graphs are temporarily not available, for instance, because a software component crashed. Hence, we are currently developing a decentralized architecture for knowledge sharing that will make access to knowledge graphs a reliable service, which I believe is the key to a wider acceptance and usage of this technology.
How do you personally contribute to the advancement of semantic technologies?
I contribute by doing research, advancing the state of the art, and applying semantic technologies to practical use cases. The most important achievements so far have been our works on indexing and federated query processing, and we have only recently published our first work on a decentralized architecture for sharing and querying semantic data. I have also been using semantic technologies in other contexts, such as data warehousing, fact-checking, sustainability assessment, and rule mining over knowledge bases.
Overall, I believe the greatest ideas and advancements come when trying to apply semantic technologies to real-world use cases and problems, and that is what I will keep on doing.
Which trends and challenges do you see for linked data and the semantic web?
The goal and the idea behind Linked Data and the Semantic Web is the second-best invention after the Internet. But unlike the Internet, Linked Data and the Semantic Web are only slowly being adopted by a broader community and by industry.
I think part of the reason is that, from a company’s point of view, there are few incentives and little added benefit in broadly sharing achievements. Some companies are simply reluctant to openly share their results and experiences in the hope of retaining an advantage over their competitors. I believe that if these success stories are shared more openly, and this is the trend we are witnessing right now, more companies will see the potential for their own problems and find new exciting use cases.
Another particular challenge, which we will have to overcome, is that it is currently still far too difficult to obtain and maintain an overview of what data is available and formulate a query as a non-expert in SPARQL and the particular domain… and of course, there is the challenge that accessing these datasets is not always reliable.
As artificial intelligence becomes more and more important, what is your vision of AI?
AI and machine learning are indeed becoming more and more important. I do believe that these technologies will bring us a huge step ahead. The process has already begun. But we also need to be aware that we are currently in the middle of a big hype where everybody wants to use AI and machine learning – although many people actually do not truly understand what it is and if it is actually the best solution to their problems. It reminds me a bit of the old saying “if the only tool you have is a hammer, then every problem looks like a nail”. Only time will tell us which problems truly require machine learning, and I am very curious to find out which solutions will prevail.
However, the current state of the art is still very far away from the AI systems that we all know from Science Fiction. Existing systems operate like black boxes on well-defined problems and lack true intelligence and understanding of the meaning of the data. I believe that the key to making these systems trustworthy and truly intelligent will be their ability to explain their decisions and their interpretation of the data in a transparent way.
What are your expectations about SEMANTiCS 2019 in Karlsruhe?
First and foremost, I am looking forward to meeting a broad range of people interested in semantic technologies. In particular, I would like to get in touch with industry-based research and to be exposed
We would like to thank Katja Hose for her insights and are happy to have her as one of our keynote speakers.
During the DBpedia Day in Leipzig, I gave a talk about how to use the facts contained in the DBpedia Knowledge Graph for generating coherent sentences and texts.
We essentially rely on Natural Language Generation (NLG) techniques for accomplishing this task. NLG is the process of generating coherent natural language text from non-linguistic data (Reiter and Dale, 2000). Despite community agreement on the actual text and speech output of these systems, there is far less consensus on what the input should be (Gatt and Krahmer, 2017). A large number of inputs have been used for NLG systems, including images (Xu et al., 2015), numeric data (Gkatzia et al., 2014) and semantic representations (Theune et al., 2001).
Why not generate text from Knowledge graphs?
The generation of natural language from the Semantic Web was already introduced some years ago (Ngonga Ngomo et al., 2013; Bouayad-Agha et al., 2014; Staykova, 2014). However, it has recently gained substantial attention, and some challenges have been proposed to investigate the quality of automatically generated texts from RDF (Colin et al., 2016). Moreover, RDF has demonstrated a promising ability to support the creation of NLG benchmarks (Gardent et al., 2017). Still, English is the only language that has been widely targeted. Thus, we proposed RDF2NL, which can generate texts in languages other than English by relying on different language versions of SimpleNLG.
What is RDF2NL?
While the exciting avenue of using deep learning techniques in NLG approaches (Gatt and Krahmer, 2017) is open to this task and deep learning has already shown promising results for RDF data (Sleimi and Gardent, 2016), the morphological richness of some languages led us to develop a rule-based approach. This was to ensure that we could identify the challenges imposed by each language from the Semantic Web (SW) perspective before applying Machine Learning (ML) algorithms. RDF2NL is able to generate either a single sentence or a summary of a given resource. RDF2NL is based on LD2NL by Ngonga Ngomo et al., and it also uses the Brazilian Portuguese, Spanish, French, German and Italian adaptations of SimpleNLG for the realization task.
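The rule-based idea can be illustrated with a toy Python sketch. This is not RDF2NL code, and it does not use SimpleNLG: the TEMPLATES table, the property names and the functions label, verbalize and summarize are all hypothetical. It only shows the general shape of mapping a triple to a sentence and joining several sentences into a short summary.

```python
# Hypothetical property-to-sentence templates; RDF2NL's actual rules are not shown in this post.
TEMPLATES = {
    "birthPlace": "{s} was born in {o}",
    "occupation": "{s} works as {o}",
    "spouse": "{s} is married to {o}",
}


def label(term):
    """Turn a local name such as 'Albert_Einstein' into a readable label."""
    return term.replace("_", " ")


def verbalize(triple):
    """Render one (subject, property, object) triple as an English sentence."""
    s, p, o = triple
    # Unknown properties fall back to a generic pattern.
    template = TEMPLATES.get(p, "{s} has " + p + " {o}")
    return template.format(s=label(s), o=label(o)) + "."


def summarize(triples):
    """Join the sentences for all triples of a resource into a short summary."""
    return " ".join(verbalize(t) for t in triples)


if __name__ == "__main__":
    facts = [
        ("Albert_Einstein", "birthPlace", "Ulm"),
        ("Albert_Einstein", "spouse", "Elsa_Einstein"),
    ]
    print(summarize(facts))
```

A real system also has to handle morphology, aggregation and referring expressions per language, which is where the realization engines mentioned above come in.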
An example of RDF2NL application:
We envisioned a promising application of RDF2PT, which aims to support the automatic creation of benchmarking datasets for Named Entity Recognition (NER) and Entity Linking (EL) tasks. In Brazilian Portuguese, there is a lack of gold standard datasets for these tasks, which makes the investigation of these problems difficult for the scientific community. Our aim was to create Brazilian Portuguese silver standard datasets that can be uploaded into GERBIL for easy evaluation. To this end, we implemented RDF2PT (the Portuguese version of RDF2NL) in BENGAL, which is an approach for automatically generating NER benchmarks based on RDF triples and Knowledge Graphs. This application has already resulted in promising datasets, which we have used to investigate the capability of multilingual entity linking systems for recognizing and disambiguating entities in Brazilian Portuguese texts. Some results can be found here: NER – http://gerbil.aksw.org/gerbil/experiment?id=201801050043 and NED – http://gerbil.aksw.org/gerbil/experiment?id=201801110012
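As a rough illustration of why generated text makes a convenient silver standard, the Python sketch below records the character offsets of known entities in a generated sentence. It is not BENGAL code and does not reproduce its output format: the function annotate and the example sentence are assumptions. The point is that the generator already knows which resource each mention refers to, so annotations come for free.

```python
def annotate(sentence, entities):
    """Return sorted (start, end, uri) spans for each entity surface form found in the sentence."""
    spans = []
    for surface, uri in entities:
        start = sentence.find(surface)  # first occurrence only, to keep the sketch small
        if start != -1:
            spans.append((start, start + len(surface), uri))
    return sorted(spans)


if __name__ == "__main__":
    # Hypothetical generated sentence in Brazilian Portuguese.
    sentence = "Albert Einstein nasceu em Ulm."
    entities = [
        ("Albert Einstein", "http://dbpedia.org/resource/Albert_Einstein"),
        ("Ulm", "http://dbpedia.org/resource/Ulm"),
    ]
    for start, end, uri in annotate(sentence, entities):
        print(start, end, sentence[start:end], uri)
```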
More application scenarios
Summarize or Explain KBs to non-experts
Create news automatically (automated journalism)
Summarize medical records
Generate technical manuals
Support the training of other NLP tasks
Generate product descriptions (eBay)
Deep Learning into RDF2NL
After devising our rule-based approach, we realized that RDF2NL is really good at selecting adequate content from the RDF triples, but the fluency of its generated texts remains a challenge. Therefore, we decided to move forward and work with neural network models to improve the fluency of texts, as they have already shown promising results in the generation of translations. We first focused on the generation of referring expressions, which is an essential part of generating texts: it decides how the NLG model will present the information about a given entity. For example, the referring expressions of the entity Barack Obama can be “the former president of USA”, “Obama”, “Barack”, “He” and so on. Since then, we have been working on combining different NLG sub-tasks into single neural models to improve the fluency of our texts.
GSoC on it – Stay tuned!
Apart from trying to improve the fluency of our models, we previously relied on different language versions of SimpleNLG for the realization task. We are now investigating the generation of multiple languages with a single neural model. Our student has been working hard to provide nice results and we are basically at the end of our GSoC project. So stay tuned to learn the outcome of this exciting project.
Many thanks to Diego for his contribution. If you want to write a guest post, share your results on the DBpedia Blog, and thus give your work more visibility and outreach, just ping us via email@example.com.
With timbr, WPSemantix and the DBpedia Association launch the first SQL Semantic Knowledge Graph that integrates Wikipedia and Wikidata Knowledge into SQL engines.
In part three of DBpedia’s growth hack blog series, we feature timbr, the latest development at DBpedia in collaboration with WPSemantix. Read on to find out how it works.
timbr – DBpedia SQL Semantic Knowledge Platform
Tel Aviv, Israel and Leipzig, Germany – July 18, 2019 – WP-Semantix (WPS) – the “SQL Knowledge Graph Company™” and DBpedia Association – Institut für Angewandte Informatik e.V., announced today the launch of the timbr-DBpedia SQL Semantic Knowledge Platform, a unique version of WPS’ timbr SQL Semantic Knowledge Graph that integrates timbr-DBpedia ontology, timbr’s ontology explorer/visualizer and timbr’s SQL query service, to provide for the first time semantic access to DBpedia knowledge in SQL and to thus facilitate DBpedia knowledge integration into standard data warehouses and data lakes.
DBpedia is the crowd-sourced community effort to extract structured content from the information created in various Wikimedia projects and publish it as files on the Databus and via online databases. This structured information resembles an open knowledge graph which has been available for everyone on the Web for over a decade. Knowledge graphs are a new kind of database developed to store knowledge in a machine-readable form, organized as connected, relationship-rich data. Since the publication of DBpedia (in parallel to Freebase) 12 years ago, knowledge graphs have become very successful, and Google uses a similar approach to create the knowledge cards displayed in search results.
Query the world’s knowledge in standard SQL
Amit Weitzner, founder and CEO at WPS commented: “Knowledge graphs use specialized languages, require resource-intensive, dedicated infrastructure and require costly ETL operations. That is, they did until timbr came along. timbr employs SQL – the most widely known database language, to eliminate the technological barriers to entry for using knowledge graphs and to implement Semantic Web principles to provide knowledge graph functionality in SQL. timbr enables modelling of data as connected, context-enriched concepts with inference and graph traversal capabilities while being queryable in standard SQL, to represent knowledge in data warehouses and data lakes. timbr-DBpedia is our first vertical application and we are very excited by the prospects of our cooperation with the DBpedia team to enable the largest user base to query the world’s knowledge in standard SQL.”
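To give a feel for what inference and graph traversal in standard SQL can mean, here is a small Python sketch using the built-in sqlite3 module and a recursive common table expression. It does not show timbr's syntax or the timbr-DBpedia schema: the tables subclass_of and instance_of, their contents and the function instances_of are assumptions invented for this illustration.

```python
import sqlite3

# In-memory toy data; table and column names are hypothetical, not timbr's schema.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE subclass_of (child TEXT, parent TEXT);
CREATE TABLE instance_of (entity TEXT, class_name TEXT);
INSERT INTO subclass_of VALUES ('Scientist', 'Person'), ('Physicist', 'Scientist');
INSERT INTO instance_of VALUES
    ('Albert_Einstein', 'Physicist'),
    ('Marie_Curie', 'Scientist'),
    ('Leipzig', 'City');
""")

# The recursive CTE collects the requested class and all of its transitive subclasses.
QUERY = """
WITH RECURSIVE descendants(class_name) AS (
    SELECT ?
    UNION
    SELECT s.child
    FROM subclass_of AS s
    JOIN descendants AS d ON s.parent = d.class_name
)
SELECT entity
FROM instance_of
WHERE class_name IN (SELECT class_name FROM descendants)
ORDER BY entity
"""


def instances_of(class_name):
    """Return all entities whose class is class_name or one of its subclasses."""
    return [row[0] for row in conn.execute(QUERY, (class_name,))]


if __name__ == "__main__":
    print(instances_of("Person"))
```

Asking for instances of Person returns entities typed only as Physicist or Scientist, which is the kind of subclass inference a semantic layer would otherwise hide behind its own query interface.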
Sebastian Hellmann, executive director of the DBpedia Association commented that:
timbr will help to explore the power of semantic technologies
Prof. James Hendler, pioneer and a world-leading authority in Semantic Web technologies and WPS’ advisory board member, commented: “timbr can be a game-changing solution by enabling the semantic inference capabilities needed in many modelling applications to be done in SQL. This approach will enable many users to get the advantages of semantic AI technologies and data integration without the learning curve of many current systems. By giving more people access to the semantic version of Wikipedia, timbr-DBpedia will definitely contribute to allowing the majority of the market to explore the power of semantic technologies.”
timbr-DBpedia is available as a query service or licensed for use as SaaS or on-premises. See the DBpedia website: wiki.dbpedia.org/timbr.
WP-Semantix Ltd. (wpsemantix.com) is the developer of the timbr SQL semantic knowledge platform, a dynamic abstraction layer over relational and non-relational data, facilitating declaration and powerful exploration of semantically rich ontologies using a standard SQL query interface. timbr is natively accessible in Apache Spark, Python, R and SQL to empower data scientists to perform complex analytics and generate sophisticated ML algorithms. Its JDBC interface provides seamless integration with the most popular business intelligence solutions to make complex analytics accessible to analysts and domain experts across the organization.
WP-Semantix, timbr, “SQL Knowledge Graph”, “SQL Semantic Knowledge Graph” and associated marks and trademarks are registered trademarks of WP Semantix Ltd.
DBpedia is looking forward to this cooperation. Follow us on Twitter for the latest information and stay tuned for part four of our growth hack series. The next post features the GlobalFactSyncRe. Curious? You have to be a little more patient and wait till Thursday, July 25th.
One lightning event after the other. Just four weeks after our Amsterdam Community Meeting, we crossed the Atlantic for the third time to meet with over 110 US-based DBpedia enthusiasts. This time, the DBpedia Community met in Cupertino, California and was hosted at Apple Inc.
First and foremost, we would like to thank Apple for the warm welcome and the hosting of the event.
After a Meet & Greet with refreshments, Taylor Rhyne, Eng. Product Manager at Apple, and Pablo N. Mendes, Researcher at Apple and chair of the DBpedia Community Committee, opened the main event with a short introduction setting the tone for the following 2 hours.
The main event attracted attendees with eleven invited talks from major companies of the Bay Area actively using DBpedia or interested in knowledge graphs in general such as Diffbot, IBM, Wikimedia, NTENT, Nuance, Volley and Stardog Union.
Tommaso Soru (University of Leipzig), DBpedia mentor in our Google Summer of Code (GSoC) projects, opened the invited talks session with the updates from the DBpedia developer community. This year, DBpedia participated in the GSoC 2017 program with 7 different projects including “First Chatbot for DBpedia”, which was selected as Best DBpedia GSoC Project 2017. His presentation is available here.
DBpedia would like to thank the following people for organizing and hosting our Community Meeting in Cupertino, California.
For continuous hosting of the main DBpedia Endpoint
Invited Talks – A Short Recap
Filipe Mesquita (Diffbot) introduced the new DBpedia NLP Department, born from a recent partnership between our organization and the California-based company, which aims at creating the most accurate and comprehensive database of human knowledge. His presentation is available here. Dan Gruhl (IBM Research) gave a presentation about the in-house development of an omnilingual ontology and how DBpedia data supported this endeavor. Stas Malyshev, representing Dario Taraborelli (both Wikimedia Foundation), presented the current state of the structured data initiatives at Wikidata and the query capabilities for Wikidata. Their slides are available here and here. Ricardo Baeza-Yates (NTENT) gave a short talk on mobile semantic search.
The second part of the event saw Peter F. Patel-Schneider (Nuance) giving a presentation titled “DBpedia from the Fringe”, with some insights on how DBpedia could be further improved. Shortly after, Sebastian Hellmann, Executive Director of the DBpedia Association, took the stage and presented the current state of the association, including achievements and future goals. Sanjay Krishnan (U.C. Berkeley) talked about the link between AlphaGo and data cleansing. You can find his slides here. Bill Andersen (Volley.com) argued for the use of extremely precise and fine-grained approaches to deal with small data. His presentation is available here. Finally, Michael Grove (Stardog Union) stressed the view of knowledge graphs as knowledge toolkits backed by a graph data model.
The event concluded with refreshments, snacks and drinks served in the atrium, giving participants the chance to talk about the presented topics, discuss the latest developments in the field of knowledge graphs and network with each other. In the end, this closing session ran much longer than planned.
GSoC Mentor Summit
Shortly after the CA Community Meeting, our DBpedia mentors Tommaso Soru and Magnus Knuth participated in the Google Summer of Code Mentor Summit held in Sunnyvale, California. During free sessions hosted by mentors of diverse open source organizations, Tommaso and Magnus presented selected projects in their lightning talks. Beyond open source, open data topics were addressed in multiple sessions, as they are relevant not only for research but also for software projects, where there is a strong need for open data. The meetings paved the way for new collaborations in various areas, e.g. the field of question answering over the DBpedia knowledge corpus, in particular the use of Neural SPARQL Machines for the translation of natural language into structured queries. We expect that this hot deep-learning topic will be featured in the next edition of GSoC projects. Overall, it has been a great experience to meet so many open source fellows from all over the world.
After the event is before another ….
Connected Data London, November 16th, 2017.
Sebastian Hellmann, executive director of the DBpedia Association, will present Data Quality and Data Usage in a large-scale Multilingual Knowledge Graph during the content track at Connected Data London. He will also join the panelists in the late afternoon panel discussion about Linked Open Data: Is it failing or just getting out of the blocks? Feel free to join the event and support DBpedia.
A message for all DBpedia enthusiasts – our next Community Meeting
We are currently planning our next Community Meeting and would like to invite DBpedia enthusiasts and chapters who would like to host a meeting to send us their ideas to firstname.lastname@example.org. The meeting is scheduled for the beginning of 2018. Any suggestions regarding place, time, program and topics are welcome!