
DBpedia Version 2014 released

September 9, 2014 - 10:58 am by ChrisBizer

Hi all,

we are happy to announce the release of DBpedia 2014.

The most important improvements of the new release compared to DBpedia 3.9 are:

1. the new release is based on updated Wikipedia dumps dating from April / May 2014 (the 3.9 release was based on dumps from March / April 2013), leading to an overall increase in the number of things described in the English edition from 4.26 million to 4.58 million.

2. the DBpedia ontology is enlarged and the number of infobox to ontology mappings has risen, leading to richer and cleaner data.

The English version of the DBpedia knowledge base currently describes 4.58 million things, out of which 4.22 million are classified in a consistent ontology (http://wiki.dbpedia.org/Ontology2014), including 1,445,000 persons, 735,000 places (including 478,000 populated places), 411,000 creative works (including 123,000 music albums, 87,000 films and 19,000 video games), 241,000 organizations (including 58,000 companies and 49,000 educational institutions), 251,000 species and 6,000 diseases.

We provide localized versions of DBpedia in 125 languages. All these versions together describe 38.3 million things, out of which 23.8 million are localized descriptions of things that also exist in the English version of DBpedia. The full DBpedia data set features 38 million labels and abstracts in 125 different languages, 25.2 million links to images and 29.8 million links to external web pages; 80.9 million links to Wikipedia categories, and 41.2 million links to YAGO categories. DBpedia is connected with other Linked Datasets by around 50 million RDF links.

Altogether the DBpedia 2014 release consists of 3 billion pieces of information (RDF triples), out of which 580 million were extracted from the English edition of Wikipedia and 2.46 billion from other language editions.

Detailed statistics about the DBpedia data sets in 28 popular languages are provided on the Dataset Statistics page (http://wiki.dbpedia.org/Datasets2014/DatasetStatistics).

The main changes between DBpedia 3.9 and 2014 are described below. For additional, more detailed information please refer to the DBpedia Change Log (http://wiki.dbpedia.org/Changelog).

1. Enlarged Ontology

The DBpedia community added new classes and properties to the DBpedia ontology via the mappings wiki. The DBpedia 2014 ontology encompasses

  • 685 classes (DBpedia 3.9: 529)
  • 1,079 object properties (DBpedia 3.9: 927)
  • 1,600 datatype properties (DBpedia 3.9: 1,290)
  • 116 specialized datatype properties (DBpedia 3.9: 116)
  • 47 owl:equivalentClass and 35 owl:equivalentProperty mappings to http://schema.org

2. Additional Infobox to Ontology Mappings

The editor community of the mappings wiki also defined many new mappings from Wikipedia templates to DBpedia classes. For the DBpedia 2014 extraction, we used 4,339 mappings (DBpedia 3.9: 3,177 mappings), which are distributed as follows over the languages covered in the release.

  • English: 586 mappings
  • Dutch: 469 mappings
  • Serbian: 450 mappings
  • Polish: 383 mappings
  • German: 295 mappings
  • Greek: 281 mappings
  • French: 221 mappings
  • Portuguese: 211 mappings
  • Slovenian: 170 mappings
  • Korean: 148 mappings
  • Spanish: 137 mappings
  • Italian: 125 mappings
  • Belarusian: 125 mappings
  • Hungarian: 111 mappings
  • Turkish: 91 mappings
  • Japanese: 81 mappings
  • Czech: 66 mappings
  • Bulgarian: 61 mappings
  • Indonesian: 59 mappings
  • Catalan: 52 mappings
  • Arabic: 52 mappings
  • Russian: 48 mappings
  • Basque: 37 mappings
  • Croatian: 36 mappings
  • Irish: 17 mappings
  • Wiki-Commons: 12 mappings
  • Welsh: 7 mappings
  • Bengali: 6 mappings
  • Slovak: 2 mappings

3. Extended Type System to cover Articles without Infobox

Until the DBpedia 3.8 release, a concept was only assigned a type (like person or place) if the corresponding Wikipedia article contained an infobox indicating this type. Starting from the 3.9 release, we provide type statements for articles without an infobox, inferred from the link structure within the DBpedia knowledge base using the algorithm described in Paulheim/Bizer 2014 (http://www.heikopaulheim.com/documents/ijswis_2014.pdf). For the new release, an improved version of the algorithm was run to produce type information for 400,000 things that were formerly not typed. A similar algorithm (presented in the same paper) was used to identify and remove potentially wrong statements from the knowledge base.
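To give a rough feel for the link-based idea, here is a deliberately simplified Python sketch of the core intuition: score candidate types for an untyped resource from the type distributions observed for the properties that point at it. The data structures and numbers below are invented for illustration; this is not the published SDType implementation.

    from collections import defaultdict

    # Hypothetical input: for each property, how often its object carries a
    # given type (estimated from the already-typed part of the knowledge base).
    type_distribution = {
        "dbo:birthPlace": {"dbo:Place": 0.95, "dbo:PopulatedPlace": 0.80},
        "dbo:author": {"dbo:Person": 0.90, "dbo:Writer": 0.55},
    }

    def infer_types(incoming_properties, threshold=0.5):
        """Average per-property type probabilities and keep types above threshold."""
        scores = defaultdict(float)
        for prop in incoming_properties:
            for rdf_type, probability in type_distribution.get(prop, {}).items():
                scores[rdf_type] += probability / len(incoming_properties)
        return {t: s for t, s in scores.items() if s >= threshold}

    # An untyped resource that is the object of two dbo:birthPlace statements
    # and one dbo:author statement (purely invented example).
    print(infer_types(["dbo:birthPlace", "dbo:birthPlace", "dbo:author"]))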

 4. New and updated RDF Links into External Data Sources

We updated the following RDF link sets pointing at other Linked Data sources: Freebase, Wikidata, Geonames and GADM. For an overview of all data sets that are interlinked from DBpedia, please refer to http://wiki.dbpedia.org/Interlinking.

Accessing the DBpedia 2014 Release 

 You can download the new DBpedia datasets from http://wiki.dbpedia.org/Downloads.

 As usual, the new dataset is also available as Linked Data and via the DBpedia SPARQL endpoint at http://dbpedia.org/sparql.
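For programmatic access, the endpoint speaks the standard SPARQL protocol over HTTP. The following minimal Python sketch (using the requests library) sends a SELECT query and prints the results; the query and the result handling are illustrative assumptions, not an official client.

    import requests

    ENDPOINT = "http://dbpedia.org/sparql"

    QUERY = """
    PREFIX dbo: <http://dbpedia.org/ontology/>
    PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
    SELECT ?film ?label WHERE {
      ?film a dbo:Film ;
            rdfs:label ?label .
      FILTER (lang(?label) = "en")
    }
    LIMIT 5
    """

    def run_query(query):
        """Send a SPARQL SELECT query and return the JSON result bindings."""
        response = requests.get(
            ENDPOINT,
            params={"query": query, "format": "application/sparql-results+json"},
            timeout=30,
        )
        response.raise_for_status()
        return response.json()["results"]["bindings"]

    for row in run_query(QUERY):
        print(row["film"]["value"], "-", row["label"]["value"])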

Credits

 Lots of thanks to

  1. Daniel Fleischhacker (University of Mannheim) and Volha Bryl (University of Mannheim) for improving the DBpedia extraction framework, for extracting the DBpedia 2014 data sets for all 125 languages, for generating the updated RDF links to external data sets, and for generating the statistics about the new release.
  2. All editors that contributed to the DBpedia ontology mappings via the Mappings Wiki.
  3.  The whole DBpedia Internationalization Committee for pushing the DBpedia internationalization forward.
  4. Dimitris Kontokostas (University of Leipzig) for improving the DBpedia extraction framework and loading the new release onto the DBpedia download server in Leipzig.
  5. Heiko Paulheim (University of Mannheim) for re-running his algorithm to generate additional type statements for formerly untyped resources and to identify and remove wrong statements.
  6. Petar Ristoski (University of Mannheim) for generating the updated links pointing at the GADM database of Global Administrative Areas. Petar will also generate an updated release of DBpedia as Tables soon.
  7. Aldo Gangemi (LIPN University, France & ISTC-CNR, Italy) for providing the links from DOLCE to the DBpedia ontology.
  8.  Kingsley Idehen, Patrick van Kleef, and Mitko Iliev (all OpenLink Software) for loading the new data set into the Virtuoso instance that serves the Linked Data view and SPARQL endpoint.
  9.  OpenLink Software (http://www.openlinksw.com/) altogether for providing the server infrastructure for DBpedia.
  10. Michael Moore (University of Waterloo, as an intern at the University of Mannheim) for implementing the anchor text extractor and for contributing to the statistics scripts.
  11. Ali Ismayilov (University of Bonn) for implementing Wikidata extraction, on which the interlanguage link generation was based.
  12. Gaurav Vaidya (University of Colorado Boulder) for implementing and running Wikimedia Commons extraction.
  13. Andrea Di Menna, Jona Christopher Sahnwaldt, Julien Cojan, Julien Plu, Nilesh Chakraborty and others who contributed improvements to the DBpedia extraction framework via the source code repository on GitHub.
  14.  All GSoC mentors and students for working directly or indirectly on this release: https://github.com/dbpedia/extraction-framework/graphs/contributors

 The work on the DBpedia 2014 release was financially supported by the European Commission through the project LOD2 - Creating Knowledge out of Linked Data (http://lod2.eu/).

More information about DBpedia is found at http://dbpedia.org/About as well as in the new overview article about the project available at  http://wiki.dbpedia.org/Publications.

Have fun with the new DBpedia 2014 release!

Cheers,

Daniel Fleischhacker, Volha Bryl, and Christian Bizer


DBpedia Spotlight V0.7 released

July 21, 2014 - 9:58 am by ChrisBizer

DBpedia Spotlight is an entity linking tool for connecting free text to DBpedia through the recognition and disambiguation of entities and concepts from the DBpedia KB.
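As a rough illustration of how a client might call such an annotation service over HTTP, here is a minimal Python sketch. The host, port and parameter names are assumptions based on common Spotlight setups, not an authoritative API reference; see the release notes linked below for the definitive documentation.

    import requests

    # Assumed local Spotlight server; adjust host/port to your own deployment.
    SPOTLIGHT_URL = "http://localhost:2222/rest/annotate"

    def annotate(text, confidence=0.5):
        """Return the entities Spotlight finds in `text` as parsed JSON."""
        response = requests.get(
            SPOTLIGHT_URL,
            params={"text": text, "confidence": confidence},
            headers={"Accept": "application/json"},
            timeout=30,
        )
        response.raise_for_status()
        return response.json()

    result = annotate("Berlin is the capital of Germany.")
    for resource in result.get("Resources", []):
        print(resource["@surfaceForm"], "->", resource["@URI"])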

We are happy to announce Version 0.7 of DBpedia Spotlight, which is also the first official release of the probabilistic/statistical implementation.

More information about DBpedia Spotlight V0.7, as well as updated evaluation results, can be found in this paper:

Joachim Daiber, Max Jakob, Chris Hokamp, Pablo N. Mendes: Improving Efficiency and Accuracy in Multilingual Entity Extraction. ISEM2013.

The changes to the statistical implementation include:

  • smaller and faster models through quantization of counts, optimization of search and some pruning
  • better handling of case
  • various fixes in Spotlight and PigNLProc
  • models can now be created without requiring a Hadoop and Pig installation
  • UIMA support by @mvnural
  • support for confidence value

See the release notes at [1] and the updated demos at [4].

Models for Spotlight 0.7 can be found here [2].

Additionally, we now provide the raw Wikipedia counts, which we hope will prove useful for research and development of new models [3].

A big thank you to all developers who made contributions to this version (with special thanks to Faveeo and Idio). Huge thanks to Jo for his leadership and continued support to the community.

Cheers,
Pablo Mendes,

on behalf of Joachim Daiber and the DBpedia Spotlight developer community.

[1] - https://github.com/dbpedia-spotlight/dbpedia-spotlight/releases/tag/release-0.7

[2] - http://spotlight.sztaki.hu/downloads/

[3] - http://spotlight.sztaki.hu/downloads/raw

[4] - http://dbpedia-spotlight.github.io/demo/

(This message is an adaptation of Joachim Daiber’s message to the DBpedia Spotlight list. Edited to suit this broader community and give credit to him.)

Call for Ideas and Mentors for GSoC 2014 DBpedia + Spotlight joint proposal (please contribute within the next few days)

February 12, 2014 - 8:32 am by Dimitris Kontokostas

We started to draft a document for submission at Google Summer of Code 2014:
http://dbpedia.org/gsoc2014

We are still in need of ideas and mentors. If you have any improvements to DBpedia or DBpedia Spotlight that you would like to see implemented, please submit them in the ideas section now. Note that accepted GSoC students will receive about 5000 USD for three months of work, which can help you estimate the effort and size of proposed ideas. It is also ok to extend/amend existing ideas (as long as you don't hijack them). Please edit here:
https://docs.google.com/document/d/13YcM-LCs_W3-0u-s24atrbbkCHZbnlLIK3eyFLd7DsI/edit?pli=1

Becoming a mentor is also a very good way to get involved with DBpedia. As a mentor you will also be able to vote on proposals after Google accepts our project. Note that it is also ok, if you are a researcher with a suitable student, to submit an idea and become its mentor. After acceptance by Google, the student then has to apply for the idea and be accepted.

Please take some time this week to add your ideas and apply as a mentor, if applicable. Feel free to improve the introduction as well and comment on the rest of the document.

Information on GSoC in general can be found here:
http://www.google-melange.com/gsoc/homepage/google/gsoc2014

Thank you for your help,
Sebastian and Dimitris

Making sense out of the Wikipedia categories (GSoC2013)

November 29, 2013 - 2:21 pm by Dimitris Kontokostas

(Part of our DBpedia+spotlight @ GSoC mini blog series)

Mentor: Marco Fossati @hjfocs <fossati[at]spaziodati.eu>
Student: Kasun Perera <kkasunperera[at]gmail.com>

The latest version of the DBpedia ontology has 529 classes. It is not well balanced and shows a lack of coverage in terms of encyclopedic knowledge representation.

Furthermore, the current typing approach involves a costly manual mapping effort and heavily depends on the presence of infoboxes in Wikipedia articles.

Hence, a large number of DBpedia instances are either untyped, due to a missing mapping or a missing infobox, or have a type that is too generic or too specialized, due to the nature of the ontology.

The goal of this project is to identify a set of meaningful Wikipedia categories that can be used to extend the coverage of DBpedia instances.

How we used the Wikipedia category system

Wikipedia categories are organized in some kind of really messy hierarchy, which is of little use from an ontological point of view.

We investigated how to process this chaotic world.

Here’s what we have done

We have identified a set of meaningful categories by combining the following approaches:

  1. Algorithmic, programmatically traversing the whole Wikipedia category system.

Wow! This was really the hardest part. Kasun did a great job! Special thanks to the category guru Christian Consonni for shedding light in the darkness of such a weird world.

  2. Linguistic, identifying conceptual categories with NLP techniques.

We got inspired by the YAGO guys.

  3. Multilingual, leveraging interlanguage links.

Kudos to Aleksander Pohl for the idea.

  4. Post-mortem, cleaning out stuff that was still not relevant.

No resurrection without Freebase!

Outcomes

We identified a total of 3,751 candidate categories that can be used to type the instances.

We produced a dataset in the following format:

<Wikipedia_article_page> rdf:type <article_category>

You can access the full dump here. This has not been validated by humans yet.
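For anyone who wants to inspect the dump programmatically, here is a minimal Python sketch that groups the categories by article. It assumes one N-Triples-style statement per line; the file name is hypothetical.

    from collections import defaultdict

    def load_article_categories(path="article_category_types.nt"):
        """Group <article> rdf:type <category> statements by article."""
        categories = defaultdict(set)
        with open(path, encoding="utf-8") as dump:
            for line in dump:
                parts = line.split()
                if len(parts) < 3:
                    continue  # skip blank or malformed lines
                article, predicate, category = parts[0], parts[1], parts[2]
                # accept both the prefixed and the full rdf:type predicate
                if predicate == "rdf:type" or predicate.endswith("#type>"):
                    categories[article].add(category.rstrip("."))
        return categories

    grouped = load_article_categories()
    for article, cats in list(grouped.items())[:5]:
        print(article, "->", ", ".join(sorted(cats)))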

If you feel like having a look at it, please tell us what you think about it.

Take a look at Kasun's progress page for more details.

DBpedia as Tables released

November 25, 2013 - 11:30 am by ChrisBizer

As some of the potential users of DBpedia might not be familiar with the RDF data model and the SPARQL query language, we provide some of the core DBpedia 3.9 data also in tabular form as comma-separated values (CSV) files, which can easily be processed using standard tools such as spreadsheet applications, relational databases or data mining tools.

For each class in the DBpedia ontology (such as Person, Radio Station, Ice Hockey Player, or Band) we provide a single CSV file which contains all instances of this class. Each instance is described by its URI, an English label and a short abstract, the mapping-based infobox data describing the instance (extracted from the English edition of Wikipedia), and geo-coordinates (if applicable).

Altogether we provide 530 CSV files in the form of a single ZIP file (size: 3 GB compressed, 73.4 GB uncompressed).
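As an illustration of how such a per-class file can be consumed, here is a small Python sketch using pandas. The file name and the column name used below are assumptions for demonstration purposes; the actual layout is documented on the DBpedia as Tables page.

    import pandas as pd

    def load_class_table(path="Person.csv"):
        """Read one per-class table and print a few basic facts about it."""
        table = pd.read_csv(path, low_memory=False)
        print(len(table), "instances,", len(table.columns), "columns")
        print(table.columns[:10].tolist())  # URI, label, abstract, infobox properties, ...
        return table

    persons = load_class_table()
    # Example: count persons with a filled-in birth date column
    # (the column name is an assumption about the mapping-based infobox data).
    if "birthDate" in persons.columns:
        print("with birthDate:", persons["birthDate"].notna().sum())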

More information about the file format as well as the download link is found at DBpedia as Tables.

DBpedia 3.9 released, including wider infobox coverage, additional type statements, and new YAGO and Wikidata links

September 17, 2013 - 4:57 pm by Christopher Sahnwaldt

Hi all,

we are happy to announce the release of DBpedia 3.9.

The most important improvements of the new release compared to DBpedia 3.8 are:

1. the new release is based on updated Wikipedia dumps dating from March / April 2013 (the 3.8 release was based on dumps from June 2012), leading to an overall increase in the number of concepts in the English edition from 3.7 million to 4.0 million.

2. the DBpedia ontology is enlarged and the number of infobox to ontology mappings has risen, leading to richer and cleaner concept descriptions.

3. we extended the DBpedia type system to also cover Wikipedia articles that do not contain an infobox.

4. we provide links pointing from DBpedia concepts to Wikidata concepts and updated the links pointing at YAGO concepts and classes, making it easier to integrate knowledge from these sources.

The English version of the DBpedia knowledge base currently describes 4.0 million things, out of which 3.22 million are classified in a consistent ontology, including 832,000 persons, 639,000 places (including 427,000 populated places), 372,000 creative works (including 116,000 music albums, 78,000 films and 18,500 video games), 209,000 organizations (including 49,000 companies and 45,000 educational institutions), 226,000 species and 5,600 diseases.

We provide localized versions of DBpedia in 119 languages. All these versions together describe 24.9 million things, out of which 16.8 million overlap (are interlinked) with the concepts from the English DBpedia. The full DBpedia data set features labels and abstracts for 12.6 million unique things in 119 different languages; 24.6 million links to images and 27.6 million links to external web pages; 45.0 million external links into other RDF datasets, 67.0 million links to Wikipedia categories, and 41.2 million links to YAGO categories.

Altogether the DBpedia 3.9 release consists of 2.46 billion pieces of information (RDF triples) out of which 470 million were extracted from the English edition of Wikipedia, 1.98 billion were extracted from other language editions, and about 45 million are links to external data sets.

Detailed statistics about the DBpedia data sets in 24 popular languages are provided at Dataset Statistics.

The main changes between DBpedia 3.8 and 3.9 are described below. For additional, more detailed information please refer to the Change Log.

1. Enlarged Ontology

The DBpedia community added new classes and properties to the DBpedia ontology via the mappings wiki. The DBpedia 3.9 ontology encompasses

  • 529 classes (DBpedia 3.8: 359)
  • 927 object properties (DBpedia 3.8: 800)
  • 1290 datatype properties (DBpedia 3.8: 859)
  • 116 specialized datatype properties (DBpedia 3.8: 116)
  • 46 owl:equivalentClass and 31 owl:equivalentProperty mappings to http://schema.org

2. Additional Infobox to Ontology Mappings

The editors of the mappings wiki also defined many new mappings from Wikipedia templates to DBpedia classes. For the DBpedia 3.9 extraction, we used 3,177 mappings (DBpedia 3.8: 2,347 mappings), which are distributed over the languages covered in the release.

3. Extended Type System to cover Articles without Infobox

Until the DBpedia 3.8 release, a concept was only assigned a type (like person or place) if the corresponding Wikipedia article contained an infobox indicating this type. The new 3.9 release now also contains type statements for articles without an infobox, inferred from the link structure within the DBpedia knowledge base using the algorithm described in Paulheim/Bizer 2013. Applying the algorithm allowed us to provide type information for 440,000 concepts that were formerly not typed. A similar algorithm was also used to identify and remove potentially wrong links from the knowledge base.

4. New and updated RDF Links into External Data Sources

We added RDF links to Wikidata and updated the following RDF link sets pointing at other Linked Data sources: YAGO, Freebase, Geonames, GADM and EUNIS. For an overview of all data sets that are interlinked from DBpedia, please refer to DBpedia Interlinking.

5. New Find Related Concepts Service

We offer a new service for finding resources that are related to a given DBpedia seed resource. More information about the service is found at DBpedia FindRelated.

Accessing the DBpedia 3.9 Release

You can download the new DBpedia datasets from http://wiki.dbpedia.org/Downloads39.

As usual, the dataset is also available as Linked Data and via the DBpedia SPARQL endpoint at http://dbpedia.org/sparql.

Credits

Lots of thanks to

  • Jona Christopher Sahnwaldt (Freelancer funded by the University of Mannheim, Germany) for improving the DBpedia extraction framework, for extracting the DBpedia 3.9 data sets for all 119 languages, and for generating the updated RDF links to external data sets.
  • All editors that contributed to the DBpedia ontology mappings via the Mappings Wiki.
  • Heiko Paulheim (University of Mannheim, Germany) for inventing and implementing the algorithm to generate additional type statements for formerly untyped resources.
  • The whole Internationalization Committee for pushing the DBpedia internationalization forward.
  • Dimitris Kontokostas (University of Leipzig) for improving the DBpedia extraction framework and loading the new release onto the DBpedia download server in Leipzig.
  • Volha Bryl (University of Mannheim, Germany) for generating the statistics about the new release.
  • Petar Ristoski (University of Mannheim, Germany) for generating the updated links pointing at the GADM database of Global Administrative Areas.
  • Kingsley Idehen, Patrick van Kleef, and Mitko Iliev (all OpenLink Software) for loading the new data set into the Virtuoso instance that serves the Linked Data view and SPARQL endpoint.
  • OpenLink Software (http://www.openlinksw.com/) altogether for providing the server infrastructure for DBpedia.
  • Julien Cojan, Andrea Di Menna, Ahmed Ktob, Julien Plu, Jim Regan and others who contributed improvements to the DBpedia extraction framework via the source code repository on GitHub.

The work on the DBpedia 3.9 release was financially supported by the European Commission through the project LOD2 - Creating Knowledge out of Linked Data (http://lod2.eu/).

More information about DBpedia is found at http://dbpedia.org/About as well as in the new overview article about the project.

Have fun with the new DBpedia release!

Cheers,

Chris Bizer and Christopher Sahnwaldt

New DBpedia Overview Article

June 24, 2013 - 5:48 pm by JensLehmann

We are pleased to announce that a new overview article for DBpedia is available.

The article covers several aspects of the DBpedia community project:

  • The DBpedia extraction framework.
  • The mappings wiki as the central structure for maintaining the community-curated DBpedia ontology.
  • Statistics on the multilingual support in DBpedia.
  • DBpedia live synchronisation with Wikipedia.
  • Statistics on the interlinking of DBpedia with other parts of the LOD cloud (incoming and outgoing links).
  • Several usage statistics: What kind of queries are asked against DBpedia and how did that change over the past years? How much traffic do the official static and live endpoints as well as the download server have? What are the most popular DBpedia datasets?
  • A description of use cases and applications of DBpedia in several areas (drop me a mail if important applications are missing).
  • The relation of DBpedia to the YAGO, Freebase and WikiData projects.
  • Future challenges for the DBpedia project.

After our ISWC 2009 paper on DBpedia, this is the (long overdue) new reference article for DBpedia, which should provide a good introduction to the project. We submitted the article as a system report to the Semantic Web journal.

Download article as PDF.

DBpedia+Spotlight accepted @ Google Summer of Code 2013

April 10, 2013 - 4:11 pm by Dimitris Kontokostas

Google Summer of Code (GSoC) is a global program that offers post-secondary student developers (ages 18 and older, BSc, MSc, PhD) stipends to write code for various open source software projects. Since its inception in 2005, the program has brought together over 6,000 successful student participants and over 3,000 mentors from over 100 countries worldwide, all for the love of code.

DBpedia participated successfully in last year's GSoC as DBpedia Spotlight. We were granted 4 student slots (out of a total of 37 applications) and managed to enhance DBpedia Spotlight in terms of runtime performance, accuracy and extra functionality. We are thrilled to announce that we were accepted again in GSoC 2013. We are participating with all DBpedia-family products this time - that is DBpedia, DBpedia Spotlight and DBpedia Wiktionary - and we hope to share the same luck again.

This year we have brand new and exciting ideas, so if you know energetic students (BSc, MSc, PhD) interested in working with DBpedia, text processing, and semantics, please encourage them to apply!

If you are a student, the application period starts in 2 weeks (deadline May 3rd). Judging from last year's competition, writing a good application can be a really hard task, so you should start preparing now. We have already created a dedicated mailing list and a few warm-up tasks (to get you familiar with our technologies), and we will of course always be available for any questions.

So go ahead, choose your idea, write your application and impress us;)

http://www.google-melange.com/gsoc/org/google/gsoc2013/dbpediaspotlight

On behalf of the DBpedia GSoC team,

Dimitris Kontokostas

DBpedia 3.8 released, including enlarged Ontology and additional localized Versions

August 6, 2012 - 3:48 pm by ChrisBizer

Hi all,

we are happy to announce the release of DBpedia 3.8.


The most important improvements of the new release compared to DBpedia 3.7 are:

1. the new release is based on updated Wikipedia dumps dating from late May / early June 2012.
2. the DBpedia ontology is enlarged and the number of infobox to ontology mappings has risen.
3. the DBpedia internationalization has progressed and we now provide localized versions of DBpedia in even more languages.

The English version of the DBpedia knowledge base currently describes 3.77 million things, out of which 2.35 million are classified in a consistent ontology, including 764,000 persons, 573,000 places (including 387,000 populated places), 333,000 creative works (including 112,000 music albums, 72,000 films and 18,000 video games), 192,000 organizations (including 45,000 companies and 42,000 educational institutions), 202,000 species and 5,500 diseases.

We provide localized versions of DBpedia in 111 languages. All these versions together describe 20.8 million things, out of which 10.5 million overlap (are interlinked) with concepts from the English DBpedia. The full DBpedia data set features labels and abstracts for 10.3 million unique things in 111 different languages; 8.0 million links to images and 24.4 million HTML links to external web pages; 27.2 million data links into external RDF data sets, 55.8 million links to Wikipedia categories, and 8.2 million links to YAGO categories. The dataset consists of 1.89 billion pieces of information (RDF triples), out of which 400 million were extracted from the English edition of Wikipedia, 1.46 billion were extracted from other language editions, and about 27 million are data links into external RDF data sets.

The main changes between DBpedia 3.7 and 3.8 are described below. For additional, more detailed information please refer to the change log.

1. Enlarged Ontology

The DBpedia community added many new classes and properties on the mappings wiki. The DBpedia 3.8 ontology encompasses

  • 359 classes (DBpedia 3.7: 319)
  • 800 object properties (DBpedia 3.7: 750)
  • 859 datatype properties (DBpedia 3.7: 791)
  • 116 specialized datatype properties (DBpedia 3.7: 102)
  • 45 owl:equivalentClass and 31 owl:equivalentProperty mappings to http://schema.org


2. Additional Infobox to Ontology Mappings

The editors of the mappings wiki also defined many new mappings from Wikipedia templates to DBpedia classes. For the DBpedia 3.8 extraction, we used 2,347 mappings, among them:

  • Polish: 382 mappings
  • English: 345 mappings
  • German: 211 mappings
  • Portuguese: 207 mappings
  • Greek: 180 mappings
  • Slovenian: 170 mappings
  • Korean: 146 mappings
  • Hungarian: 111 mappings
  • Spanish: 107 mappings
  • Turkish: 91 mappings
  • Czech: 66 mappings
  • Bulgarian: 61 mappings
  • Catalan: 52 mappings
  • Arabic: 51 mappings


3. New local DBpedia Chapters


We are also happy to see the number of local DBpedia chapters in different countries rising. Since the 3.7 DBpedia release we have welcomed the French, Italian and Japanese chapters. In addition, we expect the Dutch DBpedia chapter to go online during the next months (in cooperation with http://bibliotheek.nl/). The DBpedia chapters provide local SPARQL endpoints and dereferenceable URIs for the DBpedia data in their corresponding language. The DBpedia Internationalization page provides an overview of the current state of the DBpedia internationalization effort.

4. New and updated RDF Links into External Data Sources

We have added new RDF links pointing at resources in the following Linked Data sources: Amsterdam Museum, BBC Wildlife Finder, CORDIS, DBTune, Eurostat (Linked Statistics), GADM, LinkedGeoData, OpenEI (Open Energy Info). In addition, we have updated many of the existing RDF links pointing at other Linked Data sources.


5. New Wiktionary2RDF Extractor

We developed a DBpedia extractor that is configurable for any Wiktionary edition. It generates a comprehensive ontology about languages for use as a semantic lexical resource in linguistics. The data currently includes language, part of speech, senses with definitions, synonyms, taxonomies (hyponyms, hyperonyms, synonyms, antonyms) and translations for each lexical word. It is furthermore hosted as Linked Data and can serve as a central linking hub for LOD in linguistics. Currently available languages are English, German, French and Russian. In the next weeks we plan to add Vietnamese and Arabic. The goal is to allow the addition of languages just by configuration, without the need for programming skills, enabling collaboration as in the Mappings Wiki. For more information visit http://wiktionary.dbpedia.org/

6. Improvements to the Data Extraction Framework

  • In addition to N-Triples and N-Quads, the framework was extended to write triple files in Turtle format
  • Extraction steps that looked for links between different Wikipedia editions were replaced by more powerful post-processing scripts
  • Preparation time and effort for abstract extraction is minimized, extraction time is reduced to a few milliseconds per page
  • To save file system space, the framework can compress DBpedia triple files while writing and decompress Wikipedia XML dump files while reading
  • Using some bit twiddling, we can now load all ~200 million inter-language links into a few GB of RAM and analyze them
  • Users can download the ontology and mappings from the mappings wiki and store them in files to avoid re-downloading them for each extraction, which takes a lot of time and makes extraction results less reproducible
  • We now use IRIs for all languages except English, which uses URIs for backwards compatibility
  • We now resolve redirects in all datasets where the object URIs are DBpedia resources
  • We check that extracted dates are valid (e.g. February never has 30 days) and that their format is valid according to their XML Schema type, e.g. xsd:gYearMonth (see the sketch after this list)
  • We improved the removal of HTML character references from the abstracts
  • When extracting raw infobox properties, we make sure that the predicate URI can be used in RDF/XML by appending an underscore if necessary
  • Page IDs and Revision IDs datasets now use the DBpedia resource as subject URI, not the Wikipedia page URL
  • We use foaf:isPrimaryTopicOf instead of foaf:page for the link from DBpedia resource to Wikipedia page
  • New inter-language link datasets for all languages
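As a simplified illustration of the date checks mentioned in the list above, the following Python sketch rejects impossible calendar dates and values that do not match the xsd:gYearMonth lexical form. It is plain illustrative code, not the extraction framework's own implementation.

    import datetime
    import re

    def is_valid_date(year, month, day):
        """True only for real calendar dates, so 2012-02-30 is rejected."""
        try:
            datetime.date(year, month, day)
            return True
        except ValueError:
            return False

    def matches_gyearmonth(value):
        """True if `value` has the xsd:gYearMonth lexical form, e.g. '2012-06'."""
        return re.fullmatch(r"-?\d{4,}-(0[1-9]|1[0-2])", value) is not None

    print(is_valid_date(2012, 2, 30))     # False: February never has 30 days
    print(matches_gyearmonth("2012-06"))  # True
    print(matches_gyearmonth("2012-13"))  # False: month out of range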


Accessing the DBpedia 3.8 Release

You can download the new DBpedia dataset from http://dbpedia.org/Downloads38.

As usual, the dataset is also available as Linked Data and via the DBpedia SPARQL endpoint at http://dbpedia.org/sparql.

Credits

Lots of thanks to

  • Jona Christopher Sahnwaldt (Freie Universität Berlin, Germany) for improving the DBpedia extraction framework and for extracting the DBpedia 3.8 data sets.
  • Dimitris Kontokostas (Aristotle University of Thessaloniki, Greece) for implementing the language generalizations to the extraction framework.
  • Uli Zellbeck and Anja Jentzsch (Freie Universität Berlin, Germany) for generating the new and updated RDF links to external datasets using the Silk interlinking framework.
  • Jonas Brekle (Universität Leipzig, Germany) and Sebastian Hellmann (Universität Leipzig, Germany) for their work on the new Wiktionary2RDF extractor.
  • All editors that contributed to the DBpedia ontology mappings via the Mappings Wiki.
  • The whole Internationalization Committee for pushing the DBpedia internationalization forward.
  • Kingsley Idehen and Patrick van Kleef (both OpenLink Software) for loading the dataset into the Virtuoso instance that serves the Linked Data view and SPARQL endpoint. OpenLink Software (http://www.openlinksw.com/) altogether for providing the server infrastructure for DBpedia.

The work on the DBpedia 3.8 release was financially supported by the European Commission through the projects LOD2 - Creating Knowledge out of Linked Data (http://lod2.eu/, improvements to the extraction framework) and LATC - LOD Around the Clock (http://latc-project.eu/, creation of external RDF links).

More information about DBpedia is found at http://dbpedia.org/About

Have fun with the new DBpedia release!

Cheers,

Chris Bizer

DBpedia Spotlight has been selected for Google Summer of Code. Please apply now!

March 22, 2012 - 1:18 pm by ChrisBizer

The Google Summer of Code (GSoC) is a global program that offers student developers (BSc, MSc, PhD) stipends to write code for open source software projects. It has had thousands of participants since the first edition in 2005, connecting prospective students with mentors from open source communities such as Debian, KDE, Gnome, Apache Software Foundation, Mozilla, etc.

For the students, it is a great chance to get real-world software development experience. For the open source communities, it is a chance to expand their development community. For everybody else, more source code is created and released for the benefit of all!

We are thrilled to announce that our open source project DBpedia Spotlight has been selected for the Google Summer of Code 2012.

We are now seeking students interested in working with us to enhance operational aspects of DBpedia Spotlight, as well as to engage in research activities in collaboration with our team. If you are an energetic developer, passionate for open source and interested in areas related to DBpedia Spotlight, please get in touch with us!

We have shared a number of project ideas to get you started.

To apply, visit: http://www.google-melange.com/gsoc/org/google/gsoc2012/dbpediaspotlight

If you would like to see DBpedia Spotlight in action, helping you to explore available projects within GSoC 2012, please visit our demonstration page at: http://spotlight.dbpedia.org/gsoc/