Category Archives: DBpedia Databus

A year with DBpedia – Retrospective Part Two

Welcome to the second part of our journey around the world with DBpedia. This time we are taking you to Greece, Germany, Australia and, finally, France.

Let the travels begin.

Welcome to Thessaloniki, Greece & ESWC

DBpedians from the Portuguese Chapter presented their research results at ESWC 2018 in Thessaloniki, Greece. The team around Diego Moussallem developed a demo that extends MAG to support entity linking in 40 different languages, with a special focus on low-resource languages such as Ukrainian, Greek, Hungarian, Croatian, Portuguese, Japanese and Korean. The demo relies on online web services which allow easy access to their entity linking approaches. Furthermore, it can disambiguate against DBpedia and Wikidata. MAG is currently used in diverse projects and has been adopted widely by the Semantic Web community. Check out the demo via http://bit.ly/2RWgQ2M. Further information about the development can be found in the research paper, available here.


Welcome back to Leipzig, Germany

With our new credo “connecting data is about linking people and organizations”, we finalized our concept of the DBpedia Databus halfway through 2018. This global DBpedia platform aims at sharing the efforts of open knowledge graph (OKG) governance, collaboration and curation in order to maximize societal value and develop a linked data economy.

With this new strategy, we wanted to meet some DBpedia enthusiasts of the German DBpedia community. Fortunately, the LSWT (Leipzig Semantic Web Tag) 2018, hosted in Leipzig, home of the DBpedia Association, proved to be the right opportunity. It was the perfect platform to exchange ideas with researchers, industry and other organizations about current developments and future applications of the DBpedia Databus. Apart from hosting a hands-on DBpedia workshop for newbies, we also organized a well-received WebID tutorial. Finally, the event gave us the opportunity to position the new DBpedia Databus as a global open knowledge network that aims at providing unified and global access to knowledge (graphs).

Welcome down under – Melbourne, Australia

Further research results relying on DBpedia were presented at ACL 2018 in Melbourne, Australia, July 15th to 20th, 2018. The research built on DBpedia data from the WebNLG corpus, created for a challenge in which participants automatically converted non-linguistic data from the Semantic Web into a textual format. This data was then used to train a neural network model for generating referring expressions of a given entity. For example, if Jane Doe is a person’s official name, referring expressions of that person would be “Jane”, “Ms Doe”, “J. Doe”, or “the blonde woman from the USA”.

If you want to dig deeper but missed ACL this year, the paper is available here.


Welcome to Lyon, France

In July the DBpedia Association travelled to France. With the organizational support of Thomas Riechert (HTWK, InfAI) and Inria, we finally met the French DBpedia community in person and presented the DBpedia Databus. Additionally, we got to meet the French DBpedia Chapter and the researchers and developers around Oscar Rodríguez Rocha and Catherine Faron Zucker. They presented current research results revolving around an approach to automatically generate educational quizzes from DBpedia. Their aim is a useful tool for the French educational system that:

  • helps to test and evaluate the knowledge acquired by learners, and
  • supports lifelong learning on various topics and subjects.

The French DBpedia team followed a four-step approach:

  1. Quizzes are first formalized with Semantic Web standards: questions are represented as SPARQL queries and answers as RDF graphs (see the sketch after this list).
  2. Natural language questions, answers and distractors are generated from this formalization.
  3. Different strategies are defined to extract multiple-choice questions, correct answers and distractors from DBpedia.
  4. A measure of the information content of the elements of an ontology, and of the set of questions contained in a quiz, is defined.
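
To make step 1 concrete, here is a minimal sketch of how a quiz question could be formalized as a SPARQL query against the public DBpedia endpoint. The concrete question, properties and limit are our own illustrative choices, not taken from the paper:

```python
# Minimal sketch: a quiz question formalized as a SPARQL query against DBpedia.
# The question and the chosen properties are illustrative, not from the paper.
from SPARQLWrapper import SPARQLWrapper, JSON

# Question: "Which people were born in Lyon?"
# The correct answers are the resources in the result set; distractors
# could, for instance, be drawn from people born in other cities.
QUIZ_QUERY = """
PREFIX dbo: <http://dbpedia.org/ontology/>
PREFIX dbr: <http://dbpedia.org/resource/>

SELECT DISTINCT ?person WHERE {
    ?person a dbo:Person ;
            dbo:birthPlace dbr:Lyon .
} LIMIT 10
"""

sparql = SPARQLWrapper("https://dbpedia.org/sparql")
sparql.setQuery(QUIZ_QUERY)
sparql.setReturnFormat(JSON)
results = sparql.query().convert()

for binding in results["results"]["bindings"]:
    print(binding["person"]["value"])
```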

Oscar R. Rocha and Catherine F. Zucker also published a paper explaining their detailed approach to automatically generating quizzes from DBpedia according to official French educational standards.



Thank you to all the DBpedia enthusiasts we met during our journey.

With this journey from Europe to Australia and back, we provided you with insights into research based on DBpedia as well as a glimpse into the French DBpedia Chapter. In the final part of our journey, coming up next week, we will take you to Vienna, San Francisco and London. In the meantime, stay tuned and visit our Twitter channel or subscribe to our DBpedia Newsletter.


Have a great week.

Yours DBpedia Association

The Release Circle – A Glimpse behind the Scenes

As you already know, with the new DBpedia strategy our mode of publishing releases has changed. The new DBpedia release process follows a three-step approach, starting with the Extraction, moving on to ID-Management and ending with the Fusion, which finalizes the release. Our DBpedia releases are currently published on a monthly basis. In this post, we give you insight into the individual steps of the release process and into what our developers actually do when preparing a DBpedia release.

Extraction – Step one of the Release

The good news is, our new release mode is taking shape and has noticeably picked up speed. Finally, the 2018-08 release and, additionally, the 2018.09.12 and 2018.10.16 releases are now available in our LTS repository.

The 2018-08 release was generated on the basis of the Wikipedia datasets extracted in early August and currently comprises 136 languages. The extraction release contains the raw data generated by the DBpedia extraction framework. Post-processing steps, such as data deduplication or URI normalization, are omitted and moved to later parts of the release process, so we can provide direct, transparent access to the generated data at every step. Until we manage two releases per month, our data is mostly based on the second Wikipedia datasets of the previous month. In line with that, the 2018.09.12 release is based on late-August data and the recent 2018.10.16 release is based on Wikipedia datasets extracted on September 20th. They all comprise 136 languages and have contained a stable list of datasets since the 2018-08 release.

Our releases are now ready for parsing and external use. Additionally, there will be a new Wikidata-based release this week.

ID-Management – Step two of the Release

For a complete “new DBpedia” release, the DBpedia ID-Management and the Fusion of the data have to be added to the process. The Databus ID Management is a process to unify the various IRIs that different data providers coin for the same entities. Taking datasets with overlapping domains of interest from multiple data providers, the set of IRIs denoting the entities in the source datasets is determined heuristically (e.g. excluding RDF/OWL types/classes).

Afterwards, these selected IRIs are assigned a numeric primary key, the ‘Singleton ID’. The core of the ID Management process happens in the next step: based on the large set of high-confidence owl:sameAs assertions in the input data, the connected components induced by the corresponding sameAs-graph are computed. In other words: the groups of all entities from the input datasets that are (transitively) reachable from one another are determined. We dubbed these groups the sameAs-clusters. For each sameAs-cluster we pick one member as representative, which determines the ‘Cluster ID’ or ‘Global Identifier’ for all cluster members.
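
A minimal local sketch of this clustering step, using networkx in place of the actual Spark-based workflow; the sample IRIs and the choice of the lexicographically smallest IRI as representative are our own illustrative assumptions:

```python
# Minimal local sketch of the sameAs-clustering step. The production
# workflow runs on Apache Spark; networkx is used here for illustration.
# Picking the lexicographically smallest IRI as cluster representative
# is our own assumption, not the documented selection rule.
import networkx as nx

# High-confidence owl:sameAs assertions (subject, object) from the input data.
same_as_edges = [
    ("http://dbpedia.org/resource/Leipzig", "http://de.dbpedia.org/resource/Leipzig"),
    ("http://de.dbpedia.org/resource/Leipzig", "http://www.wikidata.org/entity/Q2079"),
    ("http://dbpedia.org/resource/Lyon", "http://fr.dbpedia.org/resource/Lyon"),
]

graph = nx.Graph()
graph.add_edges_from(same_as_edges)

# Each connected component is one sameAs-cluster; the representative
# determines the Cluster ID / Global Identifier for all members.
cluster_id = {}
for component in nx.connected_components(graph):
    representative = min(component)
    for iri in component:
        cluster_id[iri] = representative

# All three Leipzig IRIs map to the same Global Identifier.
print(cluster_id["http://www.wikidata.org/entity/Q2079"])
```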

Apart from being an essential preparatory step for the Fusion, these Global Identifiers serve a purpose in their own right as unified Linked Data identifiers for groups of Linked Data entities that should be viewed as equivalent or ‘the same thing’.

A processing workflow based on Apache Spark that performs the process described above for large quantities of RDF input data is already in place and has been run successfully on a large set of DBpedia inputs.


Fusion – Step three of the Release

Based on the Extraction and the ID-Management, the Data Fusion finalizes the last step of the DBpedia release cycle. With the goal of improving data quality and data coverage, the process uses the DBpedia global IRI clusters to fuse and enrich the source datasets. The fused data contains all resources of the input datasets. For each property, the fusion process first decides whether it is functional (owl:FunctionalProperty determination), which determines how many values are selected. For functional properties, the value selection is based on a preference over the source datasets, for example preferring values from the English DBpedia over the German DBpedia.
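
A minimal sketch of such preference-based value selection; the preference order, property and sample values are illustrative assumptions, not the exact rules of the DBpedia fusion process:

```python
# Minimal sketch of preference-based value selection during fusion.
# The source preference order and the sample data are illustrative
# assumptions, not the exact rules of the DBpedia fusion process.

# Values for one (subject, property) pair, collected per source dataset.
values_by_source = {
    "en.dbpedia.org": ["587857"],
    "de.dbpedia.org": ["581980"],
}

# Preferred sources first, e.g. English DBpedia over German DBpedia.
SOURCE_PREFERENCE = ["en.dbpedia.org", "de.dbpedia.org"]

def select_values(values_by_source, functional):
    """For a functional property, keep one value from the most preferred
    source that provides any; otherwise, merge all distinct values."""
    if functional:
        for source in SOURCE_PREFERENCE:
            if values_by_source.get(source):
                return values_by_source[source][:1]
        return []
    merged = []
    for values in values_by_source.values():
        merged.extend(v for v in values if v not in merged)
    return merged

# A property like dbo:populationTotal is treated as functional: one value wins.
print(select_values(values_by_source, functional=True))   # ['587857']
print(select_values(values_by_source, functional=False))  # ['587857', '581980']
```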

The enrichment improves the coverage of entity properties and values for resources that are only contained in the source data. Furthermore, we create provenance data to keep track of the origin of each triple. This provenance data is also used for the HTTP-based resource view at http://global.dbpedia.org.
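
A toy sketch of how per-triple provenance can be tracked during fusion; the record structure and sample data are our own illustration, not DBpedia's actual provenance format:

```python
# Toy sketch: remembering the origin of each fused triple.
# The structure is illustrative, not DBpedia's actual provenance format.
from collections import defaultdict

fused = set()                    # the fused triples
provenance = defaultdict(set)    # triple -> set of source datasets

def add_triple(triple, source):
    """Add a triple to the fused dataset and record where it came from."""
    fused.add(triple)
    provenance[triple].add(source)

triple = ("dbr:Leipzig", "dbo:populationTotal", "587857")
add_triple(triple, "en.dbpedia.org")
add_triple(triple, "de.dbpedia.org")

# The recorded origins can later back views such as http://global.dbpedia.org.
print(provenance[triple])  # {'en.dbpedia.org', 'de.dbpedia.org'}
```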

At the moment, the fused and enriched data is available for the generic and mapping-based extractions. More datasets are still in progress. The DBpedia fusion data is being uploaded to http://downloads.dbpedia.org/repo/dev/fusion/

Please note that we are still in the midst of beta-testing our data release tool, so in case you come across any errors, reporting them to us is much appreciated, as it fuels the testing process.

Further information regarding the releases progress can be found here: http://dev.dbpedia.org/

Next steps

We will add more releases to the repository on a monthly basis, aiming for a bi-weekly release mode as soon as possible. Between releases, any mistakes or errors you find and report in this data can be fixed for the upcoming release. Currently, the generated metadata in the DataID file is not stable; it still needs to be improved and will change in the near future. We are now working on the next release and will inform you as soon as it is published.

Yours DBpedia Association

This blog post was written with the help of our DBpedia developers Robert Bielinski, Markus Ackermann and Marvin Hofer, who were responsible for the work on the DBpedia releases. We would like to thank them for their great work.


Grüezi Community!

More than 110 DBpedia enthusiasts joined the Community Meeting in Vienna.

After the success of the last two community meetings in Amsterdam and Leipzig, we thought it was time to meet you at the SEMANTiCS conference again. This year's SEMANTiCS opened with the DBpedia Day on September 10th, 2018 in Vienna.

First and foremost, we would like to thank the Institute for Applied Informatics for supporting our community and many thanks to the Technical University Vienna and the SEMANTiCS for hosting our community meeting.

Opening Session
Javier Fernández

Javier David Fernández García, Vienna University of Economics and Business, opened the meeting with his keynote “Linked Open Data cloud – act now before it's too late”. He reflected on the challenges of arriving at a truly machine-readable and decentralized Web of Data, reviewed the current state of affairs, highlighted key technical and non-technical challenges, and outlined potential solution strategies.

The second keynote speaker was Mathieu d'Aquin, Professor of Informatics at the Insight Centre for Data Analytics at NUI Galway. Mathieu, who specializes in data analytics, closed the meeting with his keynote “Dealing with Open Domain Data”.

Mathieu d’Aquin


Showcase Session
Beyza Yaman

Patrik Schneider started the DBpedia Showcase Session with his presentation of the NII (Japan) research showcase “A Knowledge Graph Management Framework for DBpedia”. Shortly after, Jan Forberg from AKSW/KILT Leipzig promoted the usage of WebIDs in a short how-to tutorial session. Adam Sanchez from Université Grenoble Alpes talked about the RDFization of a relational database from the medicine domain using Ontop. He was followed by Beyza Yaman, University of Genoa, who talked about exploiting context-dependent quality metadata for Linked Data source selection. Afterwards, Robert Bielinski from AKSW/KILT Leipzig introduced the new DBpedia release circle using Apache Spark. Closing the Showcase Session, Tomas Kliegr, University of Economics Prague, presented a showcase using DBpedia to study cognitive biases affecting the interpretation of machine learning results.


For further details of the presentations, follow the links to the slides below.

  • WebID Creation by Jan Forberg, AKSW/KILT slides
  • RDFization by Adam Sanchez, Université Grenoble Alpes slides
  • Exploiting Context-Dependent Quality Metadata by Beyza Yaman, University of Genoa slides
  • Extracting Data using Apache Spark by Robert Bielinski, AKSW/KILT slides
  • Using DBpedia to study cognitive biases affecting interpretation of machine learning results by Tomas Kliegr, University of Economics Prague slides

Parallel Session
Gary Munnelly

As a regular part of the DBpedia Community Meeting, we had two parallel sessions in the afternoon where DBpedians could discuss technical issues. Participants interested in NLP-related topics joined the NLP & DBpedia session, which Milan Dojchinovski (AKSW/KILT) chaired with four very stimulating talks. The slides of the presentations given during this session are available on our website.


Diego Moussallem

At the same time, the DBpedia Association Hour provided a platform for the community to discuss technical questions, especially around the DBpedia Databus. Sebastian Hellmann presented the DBpedia Databus and explained the advantages of global IDs. Shortly after, Marvin Hofer (AKSW/KILT) demonstrated the new DBpedia global ID web interface. Please find his slides here.


Afternoon Track
Enno Meijers

The 12th edition of the DBpedia Community Meeting also featured a special chapter session, chaired by Enno Meijers from the Dutch DBpedia Language Chapter. The speakers presented the latest technical and organizational developments of their respective chapters.

The slides of the presentations in this session are likewise available on our website.


This session mainly served as an exchange platform for the different DBpedia chapters. For the first time, representatives of the European chapters discussed problems and challenges of DBpedia from their point of view. Furthermore, each chapter presented its tools, applications and projects.

Jens Grivolla

Summing up, the 12th DBpedia Community Meeting brought together more than 110 DBpedia enthusiasts from Europe who engaged in vital discussions about Linked Data, the DBpedia Databus as well as DBpedia use cases and services.


In case you missed the event, all slides and presentations are also available on our Website. Further insights, feedback and photos about the event are available on Twitter via #DBpediaDay.

We are now looking forward to more DBpedia meetings in the coming years. So stay tuned and check Twitter, Facebook and the Website, or subscribe to our Newsletter for the latest news and updates.

Yours

DBpedia Association

Beta-tests for the DBpedia Databus commence

Finally, we are proud to announce that the beta-testing of our data release tool for data releases on the DBpedia Databus is about to start.

In the past weeks, our developers at DBpedia have been building a new data release tool for publishing datasets on the DBpedia Databus. In that context, we are still looking for beta-testers who have a dataset they wish to release. Sign up here and benefit from increased visibility for your dataset and the work you have done.

We are now preparing the first internal test with our own dataset to ensure the data release tool is ready for the testers. During the testing process, beta-testers will discuss problems, challenges and ideas for improvement via the DBpedia #releases channel on Slack, profiting from each other's knowledge and skills. Issues are documented via GitHub.

The whole testing process for the data release tool follows a four-milestone plan:

Milestone One: Every tester needs a WebID to release data on the DBpedia Databus. In case you are interested in how to set up a WebID, our tutorial will help you a great deal; a minimal example profile is also sketched after the milestones below.

Milestone Two: For their datasets, testers will generate DataIDs, which provide detailed descriptions of the datasets and their different manifestations, as well as relations to agents like persons or organizations with regard to their rights and responsibilities.

Milestone Three: This milestone is achieved once an RSS feed can be generated. Additionally, bugs that arose during the previous phases should have been fixed by then. We also want to collect the testers' particular demands and wishes that would benefit the tool or the process. A second release can be attempted to check how the integrated fixes and changes work out.

Milestone Four: This milestone marks the final upload of the dataset to the DBpedia Databus, which will hopefully be possible in about three weeks.
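
Referring back to Milestone One: below is a minimal sketch of what a WebID profile document can look like, following WebID-TLS conventions; the name, key values and document location are purely illustrative:

```python
# Minimal sketch of a WebID profile document (WebID-TLS style).
# The name, modulus and document location are purely illustrative.
from rdflib import Graph

WEBID_PROFILE = """
@prefix foaf: <http://xmlns.com/foaf/0.1/> .
@prefix cert: <http://www.w3.org/ns/auth/cert#> .
@prefix xsd:  <http://www.w3.org/2001/XMLSchema#> .

<#me> a foaf:Person ;
    foaf:name "Jane Doe" ;
    cert:key [
        a cert:RSAPublicKey ;
        # public key of the client certificate used for WebID-TLS login
        cert:modulus "00cafebabe"^^xsd:hexBinary ;
        cert:exponent 65537
    ] .
"""

# Parse and sanity-check the profile before publishing it at a stable,
# dereferenceable URL, e.g. https://example.org/webid.ttl#me (illustrative).
graph = Graph()
graph.parse(data=WEBID_PROFILE, format="turtle")
print(graph.serialize(format="turtle"))
```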

For updates on the beta-test, follow this link.

Looking forward to working with you…

Yours,

DBpedia Association


PS: In case you want to get one of the last spots in the beta-testing team, just sign up here, get yourself a WebID and start testing.