Do you want to stay informed about upcoming DBpedia events, releases and technical developments? The DBpedia newsletter keeps you up to date and gives you the opportunity to send us feedback.
Four times per year we will inform the DBpedia community about meetings, new collaborations and other topics related to DBpedia. So make sure to subscribe to our newsletter so you do not miss any news.
DBpedia will participate in the Google Summer of Code program (GSoC) for the fifth time, and we are now looking for students who want to share their ideas with us. GSoC regularly helps our community grow, and we can offer you more and more opportunities. We are excited about our new ideas, and we hope you will be too!
What is GSoC?
Google Summer of Code is a global program focused on bringing more student developers into open source software development. Funds are given to students (BSc, MSc, PhD) to work for three months on a specific task. First, open source organizations announce their student projects; students then contact the mentor organizations they want to work with and write up a project proposal for the summer. After a selection phase, students are matched with a specific project and a set of mentors to work on the project during the summer.
Sören Auer and the DBpedia Board members prepared a survey to assess the direction of the DBpedia Association. We would like to know what you think our priorities should be and how you would like the association's funds to be used.
Your opinion counts, so please contribute actively to developing a better DBpedia. If you use DBpedia and want us to keep moving forward, we kindly invite you to vote here: https://goo.gl/forms/rDqLcwL823Ok09Uw2
We will publish the results in anonymized, aggregated form on the DBpedia website.
As in previous years, we would like your input on DBpedia-related project ideas for GSoC 2017.
For those who are unfamiliar with GSoC (Google Summer of Code), Google pays students (BSc, MSc, PhD) to work for 3 months on an open source project. Open source organizations announce their student projects and students apply for projects they like. After a selection phase, students are matched with a specific project and a set of mentors to work on the project during the summer.
The DBpedia community and members from over 20 countries work hard to localize and internationalize DBpedia, to support the extraction of non-English Wikipedia editions and to build data communities around a certain language, region or special interest. The chapters are part of the DBpedia executives and have taken on responsibility for contributing to the DBpedia infrastructure.
Other partners such as imec/Ghent University and the Institute for Sound and Vision have also signed and become executive partners of the DBpedia Association. The Vrije Universiteit will join soon. This is a cooperation between these organizations and the NL-DBpedia community.
We hereby announce the release of DBpedia 2016-04. The new release is based on updated Wikipedia dumps from March/April 2016 and features a significantly expanded base of information as well as richer and (hopefully) cleaner data based on the DBpedia ontology.
The English version of the DBpedia knowledge base currently describes 6.0M entities, of which 4.6M have abstracts, 1.53M have geo coordinates and 1.6M have depictions. In total, 5.2M resources are classified in a consistent ontology, consisting of 1.5M persons, 810K places (including 505K populated places), 490K works (including 135K music albums, 106K films and 20K video games), 275K organizations (including 67K companies and 53K educational institutions), 301K species and 5K diseases. The total number of resources in the English DBpedia is 16.9M; besides the 6.0M entities, this includes 1.7M SKOS concepts (categories), 7.3M redirect pages, 260K disambiguation pages and 1.7M intermediate nodes.
Altogether, the DBpedia 2016-04 release consists of 9.5 billion (2015-10: 8.8 billion) pieces of information (RDF triples), of which 1.3 billion (2015-10: 1.1 billion) were extracted from the English edition of Wikipedia, 5.0 billion (2015-04: 4.4 billion) were extracted from other language editions and 3.2 billion (2015-10: 3.2 billion) come from DBpedia Commons and Wikidata. Overall, we observed a growth of about 2% in mapping-based statements.
Detailed statistics can be found on the DBpedia website, and general information on the DBpedia datasets here.
The DBpedia community added new classes and properties to the DBpedia ontology via the mappings wiki. The DBpedia 2016-04 ontology encompasses:
The editor community of the mappings wiki also defined many new mappings from Wikipedia templates to DBpedia classes. For the DBpedia 2016-04 extraction, we used a total of 5800 template mappings (DBpedia 2015-10: 5553 mappings). For the second time, the top language, measured by the number of mappings, is Dutch (646 mappings), followed by English (604 mappings).
In addition to the datasets normalized to English DBpedia URIs (en-uris), we now also provide normalized datasets based on the DBpedia Wikidata (DBw) datasets (wkd-uris). These sorted datasets will be the foundation for the upcoming fusion process with Wikidata. From the following releases on, only the DBw-based URIs will be provided.
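To illustrate what normalizing a dataset to Wikidata-based URIs means, here is a minimal sketch that rewrites subjects and objects through a mapping from English DBpedia URIs to DBw URIs and sorts the result. The `normalize` function, the `TO_WKD` dictionary and the `http://wikidata.dbpedia.org/resource/Q64` URI are assumptions for illustration only; this is not the code used by the extraction framework.

```python
from typing import Dict, List, Tuple

Triple = Tuple[str, str, str]

def normalize(triples: List[Triple], to_wkd: Dict[str, str]) -> List[Triple]:
    """Rewrite subject and object URIs via a hypothetical en-uri -> wkd-uri mapping.

    Predicates and any term without a mapping entry are left unchanged.
    The output is sorted by subject, then predicate, then object.
    """
    rewritten = [(to_wkd.get(s, s), p, to_wkd.get(o, o)) for s, p, o in triples]
    return sorted(rewritten)

# Hypothetical mapping entry, for illustration only.
TO_WKD = {"http://dbpedia.org/resource/Berlin": "http://wikidata.dbpedia.org/resource/Q64"}
```

Keeping unmapped terms as they are is a design choice of this sketch; a real pipeline might instead drop or report triples whose resources have no Wikidata counterpart.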
We now filter out triples from the Raw Infobox Extractor that are already mapped. For example, a resource no longer contains both “<x> dbo:birthPlace <z>” and “<x> dbp:birthPlace|dbp:placeOfBirth|… <z>”. These triples are now moved to the “infobox-properties-mapped” datasets, which are not loaded on the main endpoint. See issue 22 for more details.
We made major improvements to our citation extraction. See here for more details.
We incorporated Heiko Paulheim's statistical distribution approach for creating type statements automatically and provide them as an additional dataset (instance_types_sdtyped_dbo).
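The filtering step can be sketched as splitting the raw infobox triples into those that are kept and those that duplicate a mapping-based fact. This sketch assumes a simplified rule: a raw triple counts as already mapped when a mapping-based triple links the same subject to the same object. The actual extractor may use a different criterion, and `split_raw_infobox` is a hypothetical name.

```python
from typing import Iterable, List, Set, Tuple

Triple = Tuple[str, str, str]

def split_raw_infobox(raw: Iterable[Triple], mapped: Iterable[Triple]) -> Tuple[List[Triple], List[Triple]]:
    """Split raw infobox (dbp:) triples into (kept, already_mapped).

    Simplified rule (an assumption): a raw triple is already mapped if some
    mapping-based (dbo:) triple links the same subject to the same object.
    """
    mapped_pairs: Set[Tuple[str, str]] = {(s, o) for s, _, o in mapped}
    kept: List[Triple] = []
    already_mapped: List[Triple] = []
    for s, p, o in raw:
        target = already_mapped if (s, o) in mapped_pairs else kept
        target.append((s, p, o))
    return kept, already_mapped
```

In this sketch the second list corresponds to what would go into the “infobox-properties-mapped” dataset, and only the first list would be loaded on the endpoint.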
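The idea behind the statistical distribution approach is that the properties a resource uses hint at its type: objects of a birth-place property, for instance, tend to be places. The sketch below is a simplified, hypothetical rendition, not the code integrated into the DIEF. It scores each type t as score(t) = Σ_p w_p · P(t | p) / Σ_p w_p, where p ranges over the (direction, property) roles the resource plays and the weight w_p = Σ_t (P(t) − P(t | p))² measures how far a role's type distribution deviates from the overall distribution. Thresholds and the handling of repeated properties are left out.

```python
from collections import Counter, defaultdict
from typing import Dict, Iterable, Set, Tuple

Triple = Tuple[str, str, str]

def type_distribution(resources: Iterable[str], types: Dict[str, Set[str]]) -> Dict[str, float]:
    """Share of the typed resources among `resources` that carry each type."""
    typed = [r for r in resources if r in types]
    if not typed:
        return {}
    counts = Counter(t for r in typed for t in types[r])
    return {t: c / len(typed) for t, c in counts.items()}

def sdtype_scores(triples: Iterable[Triple], types: Dict[str, Set[str]], resource: str) -> Dict[str, float]:
    """Simplified SDType-style type scores for `resource` (illustrative sketch)."""
    # Group resources by the (direction, property) role they play.
    users = defaultdict(set)
    for s, p, o in triples:
        users[("out", p)].add(s)
        users[("in", p)].add(o)
    prior = type_distribution(types.keys(), types)  # a priori P(t)
    scores: Counter = Counter()
    total_weight = 0.0
    for role, members in users.items():
        if resource not in members:
            continue
        cond = type_distribution(members, types)  # P(t | role)
        if not cond:
            continue  # no typed resource plays this role, nothing to learn
        # Weight: deviation of this role's distribution from the prior.
        weight = sum((prior[t] - cond.get(t, 0.0)) ** 2 for t in prior)
        total_weight += weight
        for t, share in cond.items():
            scores[t] += weight * share
    if total_weight == 0:
        return {}
    return {t: s / total_weight for t, s in scores.items()}
```

With known types for a few people and cities, an untyped resource that only appears as the object of a birth-place property receives the city type with full confidence in this toy setting.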
In case you missed it, what we changed in the previous release (2015-10):
English DBpedia switched to IRIs. This can be a breaking change for applications that have stored DBpedia resource URIs / links and now need to update them. We provide the “uri-same-as-iri” dataset for English to ease the transition.
The instance-types dataset is now split into two files: instance-types (containing only direct types) and instance-types-transitive (containing the transitive types of a resource based on the DBpedia ontology).
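The “uri-same-as-iri” dataset is the reference for this transition. As a rough illustration of how the two forms differ, the sketch below assumes that a stored URI differs from its IRI only by the percent-encoding of non-ASCII characters, and decodes just those byte sequences while leaving ASCII escapes such as %3F untouched. `uri_to_iri` is a hypothetical helper, not part of any DBpedia tool.

```python
import re
from urllib.parse import unquote

# Runs of percent-encoded bytes >= 0x80, i.e. the UTF-8 bytes of non-ASCII characters.
_NON_ASCII_RUN = re.compile(r"(?:%[89A-Fa-f][0-9A-Fa-f])+")

def uri_to_iri(uri: str) -> str:
    """Decode percent-encoded non-ASCII characters; keep ASCII escapes like %3F."""
    return _NON_ASCII_RUN.sub(lambda m: unquote(m.group(0)), uri)
```

Decoding every escape with `unquote` alone would be simpler, but it would also turn reserved characters such as `%3F` into `?` and change the meaning of the identifier.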
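Deriving transitive types from direct types amounts to walking the subclass hierarchy of the ontology. The sketch below assumes the hierarchy is available as a dictionary from each class to its direct superclasses and returns only the inferred supertypes; whether the actual instance-types-transitive file follows exactly this convention is not stated here. `transitive_types` and the `PARENTS` excerpt are hypothetical.

```python
from typing import Dict, Iterable, Set

def transitive_types(direct: Iterable[str], parents: Dict[str, Set[str]]) -> Set[str]:
    """Return all superclasses reachable from the direct types, excluding the direct types."""
    direct_set = set(direct)
    inferred: Set[str] = set()
    stack = list(direct_set)
    while stack:
        for parent in parents.get(stack.pop(), set()):
            if parent not in inferred:  # also guards against cycles
                inferred.add(parent)
                stack.append(parent)
    return inferred - direct_set

# Hypothetical excerpt of an rdfs:subClassOf hierarchy, for illustration only.
PARENTS = {
    "dbo:City": {"dbo:Settlement"},
    "dbo:Settlement": {"dbo:PopulatedPlace"},
    "dbo:PopulatedPlace": {"dbo:Place"},
}
```

Using a set of parents per class keeps the sketch correct even if a class has more than one superclass.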
The mappingbased-properties file is now split into three (3) files:
“geo-coordinates-mappingbased”, which contains the coordinates originating from the mappings wiki. The “geo-coordinates” dataset continues to provide the coordinates originating from the GeoExtractor
“mappingbased-literals”, which contains mapping-based facts with literal values
“mappingbased-objects”, which contains mapping-based facts with object values
“mappingbased-objects-disjoint-[domain|range]” contains facts that were filtered out of the “mappingbased-objects” datasets as errors but are still provided
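One way such errors can be detected is by checking a fact against the domain and range of its property: if the subject (or object) has a type declared disjoint with the expected class, the fact goes to a separate file. The sketch below illustrates this idea under stated assumptions; the function names, the disjointness axiom and the convention of checking the domain first are hypothetical and not taken from the extraction framework. The resource types are assumed to already include transitive types.

```python
from typing import Dict, FrozenSet, Iterable, List, Optional, Set, Tuple

Triple = Tuple[str, str, str]

def _clashes(resource_types: Set[str], expected: Optional[str], disjoint: Set[FrozenSet[str]]) -> bool:
    """True if any type of the resource is declared disjoint with the expected class."""
    return expected is not None and any(frozenset((t, expected)) in disjoint for t in resource_types)

def split_disjoint(
    triples: Iterable[Triple],
    types: Dict[str, Set[str]],
    domain: Dict[str, str],
    range_: Dict[str, str],
    disjoint: Set[FrozenSet[str]],
) -> Tuple[List[Triple], List[Triple], List[Triple]]:
    """Split facts into (kept, disjoint_domain, disjoint_range)."""
    kept: List[Triple] = []
    bad_domain: List[Triple] = []
    bad_range: List[Triple] = []
    for s, p, o in triples:
        if _clashes(types.get(s, set()), domain.get(p), disjoint):
            bad_domain.append((s, p, o))
        elif _clashes(types.get(o, set()), range_.get(p), disjoint):
            bad_range.append((s, p, o))
        else:
            kept.append((s, p, o))
    return kept, bad_domain, bad_range
```

A fact that violates both constraints ends up in the domain list only, which is an arbitrary choice of this sketch.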
We added a new extractor for citation data that provides two files:
citation links: linking resources to citations
citation data: tries to extract additional data from the citations. This is quite an interesting dataset, but we need help cleaning it up
All datasets are available in .ttl and .tql serialization (the nt and nq datasets were dropped for reasons of redundancy and server capacity).
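To give an idea of what linking resources to citations involves, the sketch below pulls the `url=` parameter out of `{{cite ...}}` templates in a page's wikitext and pairs each URL with the page's resource. It is a hypothetical, regex-based simplification (nested templates and other citation formats are not handled) and not the citation extractor shipped with DBpedia.

```python
import re
from typing import List, Tuple

# {{cite ...}} templates and their url= parameter; nested templates are not handled.
_CITE_URL = re.compile(r"\{\{\s*cite[^}]*?\|\s*url\s*=\s*([^|}\s]+)", re.IGNORECASE)

def citation_links(resource: str, wikitext: str) -> List[Tuple[str, str]]:
    """Return (resource, cited URL) pairs found in the page's cite templates."""
    return [(resource, url) for url in _CITE_URL.findall(wikitext)]
```

Because the pattern requires `url` directly after a pipe, parameters such as `archive-url` are not picked up by mistake.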
Dataset normalization: We are going to normalize datasets based on Wikidata URIs and no longer on the English language edition, as a prerequisite for finally starting the fusion process with Wikidata.
RML Integration: Wouter Maroy has already provided the necessary groundwork on GitHub for switching the mappings wiki to an RML-based approach. We are not there yet, but this is at the top of our list of changes.
Starting with the next release, we will add datasets with NIF annotations of the abstracts (as we already provided for the 2015-04 release). We will eventually extend the NIF annotation dataset to cover the whole Wikipedia article of a resource.
SDTypes: We extended the coverage of the automatically created type statements (instance_types_sdtyped_dbo) to English, German and Dutch (see above).
Extensions: In the extension folder (2016-04/ext) we provide two new datasets, both of which should be considered experimental:
DBpedia World Facts: This dataset is authored by the DBpedia Association itself. It lists all countries, all currencies in use and (most) languages spoken in the world, as well as how these concepts relate to each other (spoken in, primary language etc.) and useful properties like ISO codes (ontology diagram). This dataset extends the very useful LEXVO dataset with facts from DBpedia and the CIA Factbook. Please report any errors or suggestions regarding this dataset to Markus.
Lector Facts: This experimental dataset was provided by Matteo Cannaviccio and demonstrates his approach to generating facts by using common sequences of words (i.e. phrases) that are frequently used to describe instances of binary relations in a text. We are looking into using this approach as a regular extraction step, and your feedback would be very helpful.
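The phrase-based idea can be sketched in three steps: find sentences whose linked entity pair already holds a known relation, keep the phrases that frequently connect such pairs, and apply those phrases to pairs that lack the fact. This is a hypothetical simplification that assumes sentences are already reduced to (entity, phrase, entity) tuples; it is not Matteo Cannaviccio's implementation, and `min_count` is an illustrative parameter.

```python
from collections import Counter, defaultdict
from typing import Dict, Iterable, Set, Tuple

Fact = Tuple[str, str, str]

def lector_style_facts(sentences: Iterable[Tuple[str, str, str]], known: Set[Fact], min_count: int = 2) -> Set[Fact]:
    """Derive new facts from phrases that frequently express known relations.

    sentences: (entity1, phrase between the two mentions, entity2), entities already linked.
    known: existing facts (subject, relation, object).
    """
    sentences = list(sentences)
    relations_of_pair: Dict[Tuple[str, str], Set[str]] = defaultdict(set)
    for s, r, o in known:
        relations_of_pair[(s, o)].add(r)
    # 1. Count which phrases connect entity pairs that already hold a relation.
    phrase_counts = Counter(
        (r, phrase)
        for e1, phrase, e2 in sentences
        for r in relations_of_pair.get((e1, e2), ())
    )
    # 2. Keep frequent phrases as evidence for their relation.
    relations_of_phrase: Dict[str, Set[str]] = defaultdict(set)
    for (r, phrase), count in phrase_counts.items():
        if count >= min_count:
            relations_of_phrase[phrase].add(r)
    # 3. Apply the learned phrases to pairs that do not have the fact yet.
    return {
        (e1, r, e2)
        for e1, phrase, e2 in sentences
        for r in relations_of_phrase.get(phrase, ())
        if (e1, r, e2) not in known
    }
```

For example, if “was born in” links two known birthPlace pairs, a sentence “Planck was born in Kiel” yields the new fact (Planck, birthPlace, Kiel).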
Lots of thanks to
Markus Freudenberg (University of Leipzig / DBpedia Association) for taking over the whole release process and creating the revamped download & statistics pages.
Dimitris Kontokostas (University of Leipzig / DBpedia Association) for conveying his considerable knowledge of the extraction and release process.
All editors that contributed to the DBpedia ontology mappings via the Mappings Wiki.
The whole DBpedia Internationalization Committee for pushing the DBpedia internationalization forward.
Heiko Paulheim (University of Mannheim) for providing the necessary code for his algorithm to generate additional type statements for formerly untyped resources and to identify and remove wrong statements, which is now part of the DIEF.
Václav Zeman, Thomas Klieger and the whole LHD team (University of Prague) for their contribution of additional DBpedia types
Marco Fossati (FBK) for contributing the DBTax types
Alan Meehan (TCD) for performing a big external link cleanup
Aldo Gangemi (LIPN University, France & ISTC-CNR, Italy) for providing the links from DOLCE to DBpedia ontology.
Kingsley Idehen, Patrick van Kleef, and Mitko Iliev (all OpenLink Software) for loading the new data set into the Virtuoso instance that provides 5-Star Linked Open Data publication and SPARQL Query Services.
OpenLink Software (http://www.openlinksw.com/) collectively for providing the SPARQL Query Services and Linked Open Data publishing infrastructure for DBpedia in addition to their continuous infrastructure support.
Ruben Verborgh from Ghent University – iMinds for publishing the dataset as Triple Pattern Fragments, and iMinds for sponsoring DBpedia’s Triple Pattern Fragments server.
Ali Ismayilov (University of Bonn) for extending the DBpedia Wikidata dataset.
Vladimir Alexiev (Ontotext) for leading a successful mapping and ontology clean up effort.
All the GSoC students and mentors who directly or indirectly influenced the DBpedia release
After the success of the last two community meetings in Palo Alto and in The Hague, we thought it was time to meet in Leipzig, where the DBpedia Association is located. During SEMANTiCS 2016 in Leipzig (Sep 12-15), the DBpedia community met on the 15th of September. First and foremost, we would like to thank the Institute for Applied Informatics for supporting our community, the University of Leipzig for hosting our meeting, and SEMANTiCS for hosting and sponsoring the meeting.
During the opening session, Lydia Pintscher, product manager of Wikidata, presented Wikidata: bringing structured data to Wikipedia with 16000 volunteers. Lydia described similarities and differences between DBpedia and Wikidata and talked about upcoming steps for Wikidata. Harald Sack from the Hasso-Plattner-Institut also spoke during the opening session. He introduced the dwerft Project – DBpedia and Linked Data for the Media Value Chain, which aims at the common technology platform »Linked Production Data Cloud«.
The DBpedia showcase session started with the DBpedia 2016-04 release update by Markus Freudenberg (AKSW/KILT). At this session, six speakers presented how to utilize DBpedia in novel and interesting ways. For example:
Miel Vander Sande (iMinds) talked about DBpedia Archives as Memento with Triple Pattern Fragments.
Jörn Hees (DFKI) introduced us to Human associations in the Semantic Web and DBpedia.
Peter de Laat from GoUnitive urged the community to personalize user interaction in a Linked Data environment.
DBpedia Association hour
The 7th edition of the community meeting included the first DBpedia Association hour, which provided a platform for the community to discuss and give feedback. Sebastian Hellmann (AKSW, KILT), Julia Holze (DBpedia Association) and Dimitris Kontokostas (AKSW, KILT) gave an update on the DBpedia Association status. We talked about our technical progress, DBpedia funding and visions. Sebastian Hellmann introduced the Board of Trustees, which is the main decision-making body of the DBpedia Association and oversees the association and its work as its ultimate corporate authority.
Enno Meijers (KB) of the Dutch DBpedia chapter announced a successful cooperation between Huygens ING, iMinds/Univ. Gent, Vrije Universiteit Amsterdam, Institute for Sound and Vision, Koninklijke Bibliotheek (KB) and the NL-DBpedia community. By signing the Memorandum of Understanding (MoU), they officially support the goals of the DBpedia Association and strengthen the Dutch chapter and community.
The afternoon sessions highlighted two important fields of research and development, namely the DBpedia ontology and DBpedia & NLP. At the DBpedia ontology session, Wouter Maroy (iMinds) presented DBpedia RML mappings, which he created during this year’s Google Summer of Code project, and Gerard Kuys (Ordina) discussed the question ‘Does extraction prelude structure?’ with the DBpedia ontology group. At the same time, Milan Dojchinovski (AKSW/KILT) chaired the DBpedia & NLP session with eight very interesting talks. You will find all presentations given during this session on our website. The last two presentations, Analyzing and improving the Polish Wikipedia Citations (part of the Wikipedia References & Citations challenge) and Greek DBpedia updates, were given by Krzysztof Węcel (Poznan University) and Sotiris Karampatakis (OKF Greece).
In the closing session, we wrapped up the meeting and awarded our prizes:
The “DBpedia Excellence in Engineering” went to Markus Freudenberg for keeping up with the DBpedia releases
The “Citations Challenge prize” went to Krzysztof Węcel for his very thorough citation analysis.
All slides and presentations are also available on our Website and you will find more feedback and photos about the event on Twitter via #DBpediaLeipzig2016.
Summing up, the event brought together more than 150 DBpedians from Europe who engaged in vital conversations about interesting projects and approaches to questions/problems revolving around DBpedia. We would like to thank the organizers Magnus Knuth (HPI, DBpedia German & Commons), Monika Solanki (University of Oxford) and representatives of the DBpedia Association such as Dimitris Kontokostas, Sebastian Hellmann and Julia Holze for devoting their time to the organization of the meeting and the program.
We are now looking forward to the 8th DBpedia Community Meeting (which is most probably coming sooner than you think, across the Atlantic). Check our website for further updates or follow #DBpedia on Twitter.
During the SEMANTiCS 2016 in Leipzig, Sep 12-15, the DBpedia community will get together on the 15th of September for the 7th edition of the DBpedia Community Meeting. The meeting will take place at the University of Leipzig (Augustusplatz 10, 04109 Leipzig, Germany). See here for detailed directions.
Over 140 participants have registered for the next DBpedia Community Meeting, and only a few seats are left. So come and get your ticket to be part of this event.
The 7th edition of this event includes the first DBpedia Association hour, which provides a platform for the community to discuss and give feedback. On top of that, we will have a DBpedia showcase session on DBpedia+ Data Stack 2016-04 – Release and talks about Human associations in the Semantic Web and DBpedia, DBpedia Archives as Memento with Triple Pattern Fragments and Towards a Unified PageRank for DBpedia and Wikidata. Our event also features a dev & tutorial session to learn about DBpedia, as well as a DBpedia ontology session and a DBpedia & NLP session.
Lydia Pintscher, product manager of Wikidata, will speak about Wikidata: bringing structured data to Wikipedia with 16000 volunteers, and Harald Sack from the Hasso-Plattner-Institut will speak about the dwerft Project – DBpedia and Linked Data for the Media Value Chain. At the end of the meeting, there will be a session for the “DBpedia references and citations challenge”; submissions will be judged by the Organizing Committee and the best two will receive a prize.
Attending the DBpedia Community Meeting is free, but you need to register here. If you would like to support DBpedia with a little more than your presence at the event, you can choose a DBpedia support ticket. Have a look here:
This data holds huge potential, especially for the Wikidata challenge of providing a reference source for every statement. It describes not only a lot of bibliographic data, but also many web pages and other sources around the web.
The data we extract at the moment is quite raw and can be improved in many different ways. Some of the potential improvements are:
We welcome contributions that improve the existing citation dataset in any way, and we are open to collaborating and helping. Results will be presented at the next DBpedia meeting on 15 September 2016 in Leipzig, co-located with SEMANTiCS 2016. Each participant should submit a short description of his/her contribution by Monday 12 September 2016 and present his/her work at the meeting. Comments and questions can be posted on the DBpedia discussion & developer lists or on our new DBpedia ideas page.
Submissions will be judged by the Organizing Committee and the best two will receive a prize.
DBpedia Tutorial on Semantic Knowledge Integration in established Data (IT) Environments
Enriching data with a semantic layer and linking entities is key to what is loosely called Smart Data. An easy, yet comprehensive way of achieving this is the use of Linked Data standards.
In this DBpedia tutorial, we will introduce
the basic ideas of Linked Data and other Semantic Web standards
existing open datasets that can be freely reused (including DBpedia of course)
software and services in the DBpedia infrastructure such as the DBpedia SPARQL service, the lookup service and the DBpedia Spotlight Entity Linking service
common business use cases that will help to put the lessons learned into practice
an integration example in a hypothetical environment
In particular, we would like to show how to seamlessly integrate Linked Data technologies into existing IT- and data-environments and discuss how to link private corporate data knowledge graphs to DBpedia and Linked Open Data. Another special focus is on finding links in text and unstructured data.
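As a small taste of how existing code can talk to the DBpedia SPARQL service, the sketch below builds a GET request following the SPARQL 1.1 Protocol (a `query` parameter plus an `Accept` header for JSON results) and extracts values from the standard SPARQL JSON results format. The endpoint URL `https://dbpedia.org/sparql` is the public service; the helper names and the example query are illustrative assumptions, and `run` needs network access.

```python
import json
from urllib.parse import urlencode
from urllib.request import Request, urlopen

ENDPOINT = "https://dbpedia.org/sparql"

# Illustrative query: the English abstract of Leipzig.
QUERY = """
PREFIX dbo: <http://dbpedia.org/ontology/>
PREFIX dbr: <http://dbpedia.org/resource/>
SELECT ?abstract WHERE {
  dbr:Leipzig dbo:abstract ?abstract .
  FILTER (lang(?abstract) = "en")
}
"""

def build_request(query: str, endpoint: str = ENDPOINT) -> Request:
    """GET request per the SPARQL 1.1 Protocol, asking for JSON results."""
    url = endpoint + "?" + urlencode({"query": query})
    return Request(url, headers={"Accept": "application/sparql-results+json"})

def values(results: dict, var: str) -> list:
    """Extract the values bound to `var` from a SPARQL JSON results document."""
    return [b[var]["value"] for b in results["results"]["bindings"] if var in b]

def run(query: str) -> dict:
    """Send the query (requires network access) and parse the JSON response."""
    with urlopen(build_request(query), timeout=30) as response:
        return json.load(response)
```

Calling `values(run(QUERY), "abstract")` should return the English abstract of Leipzig, assuming the public endpoint is reachable; in a corporate setting the same pattern works against a private SPARQL endpoint by passing a different `endpoint`.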