DBpedia 3.4 released

We are happy to announce the release of DBpedia 3.4. The new release is based on Wikipedia dumps dating from September 2009.

The new DBpedia data set describes more than 2.9 million things, including 282,000 persons, 339,000 places, 88,000 music albums, 44,000 films, 15,000 video games, 119,000 organizations, 130,000 species and 4,400 diseases. The DBpedia data set now features labels and abstracts for these things in 91 different languages; 807,000 links to images and 3,840,000 links to external web pages; 4,878,100 external links into other RDF datasets, 415,000 Wikipedia categories, and 75,000 YAGO categories. The data set consists of 479 million pieces of information (RDF triples), of which 190 million were extracted from the English edition of Wikipedia and 289 million were extracted from other language editions.

The new release provides the following improvements and changes compared to the DBpedia 3.3 release:

  1. the data set has been extracted from more recent Wikipedia dumps.
  2. the data set now provides labels, abstracts and infobox data in 91 different languages.
  3. we provide two different versions of the DBpedia Infobox Ontology (loose and strict) in order to meet different application requirements. Please refer to http://wiki.dbpedia.org/Datasets#h18-11 for details.
  4. as Wikipedia has moved to dual-licensing, we also dual-license DBpedia. The DBpedia 3.4 data set is licensed under the terms of the Creative Commons Attribution-ShareAlike 3.0 license and the GNU Free Documentation License.
  5. the mapping-based infobox data extractor has been improved and now normalizes units of measurement.
  6. various bug fixes and improvements throughout the code base. For the complete list, please refer to the change log at http://wiki.dbpedia.org/Changelog.

You can download the new DBpedia dataset from http://wiki.dbpedia.org/Downloads34. As usual, the dataset is also available as Linked Data and via the DBpedia SPARQL endpoint at http://dbpedia.org/sparql.
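As a quick illustration of using the SPARQL endpoint, here is a minimal Python sketch that relies only on the standard library. It assumes the endpoint follows the standard SPARQL protocol (the query is passed as the `query` parameter of an HTTP GET) and can return W3C SPARQL JSON results. The example query, which lists the `rdfs:label` values of the Berlin resource (one per language, given the multilingual labels in this release), and the helper names `build_request`, `labels_by_language` and `fetch_labels` are hypothetical illustrations, not part of DBpedia. What the endpoint serves today may differ from the 3.4 data set.

```python
import json
import urllib.parse
import urllib.request

# Public endpoint named in the announcement; its current contents may differ from 3.4.
ENDPOINT = "http://dbpedia.org/sparql"

# Hypothetical example query: all rdfs:label values of the Berlin resource.
QUERY = """
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
SELECT ?label WHERE {
  <http://dbpedia.org/resource/Berlin> rdfs:label ?label .
}
LIMIT 100
"""


def build_request(endpoint: str, query: str) -> urllib.request.Request:
    """Build a SPARQL-protocol GET request with the query URL-encoded in 'query'."""
    url = endpoint + "?" + urllib.parse.urlencode({"query": query})
    # Ask for the W3C SPARQL JSON results format.
    return urllib.request.Request(
        url, headers={"Accept": "application/sparql-results+json"}
    )


def labels_by_language(results: dict) -> dict:
    """Map language tag -> label value from a SPARQL JSON results document."""
    out = {}
    for binding in results["results"]["bindings"]:
        label = binding.get("label")
        if label is None:
            continue  # the variable is unbound in this solution
        # Literal bindings carry an optional "xml:lang" key; "" means no language tag.
        out[label.get("xml:lang", "")] = label["value"]
    return out


def fetch_labels(endpoint: str = ENDPOINT, query: str = QUERY) -> dict:
    """Send the query over the network and return labels keyed by language tag."""
    with urllib.request.urlopen(build_request(endpoint, query), timeout=30) as resp:
        return labels_by_language(json.load(resp))
```

Calling `fetch_labels()` sends the request over the network and returns a dictionary mapping language tags (for example `"de"`) to labels. The parsing step is kept separate from the HTTP call so that it can be checked against a saved results document without contacting the server.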

Lots of thanks to

  • Anja Jentzsch, Christopher Sahnwaldt, Robert Isele, and Paul Kreis (all Freie Universität Berlin) for improving the DBpedia extraction framework and for extracting the new data set.
  • Jens Lehmann and Sören Auer (Universität Leipzig) for providing the new data set via the DBpedia download server at Universität Leipzig.
  • Kingsley Idehen and Mitko Iliev for loading the new data set into the Virtuoso instance that serves the Linked Data view and SPARQL endpoint, and OpenLink Software (http://www.openlinksw.com/) as a whole for providing the server infrastructure for DBpedia.
  • Neofonie GmbH, Berlin (http://www.neofonie.de/index.jsp), for supporting the DBpedia project by paying Christopher Sahnwaldt.

The next steps for the DBpedia project will be to

  1. synchronize Wikipedia and DBpedia by deploying the DBpedia live extraction which updates the DBpedia knowledge base immediately when a Wikipedia article changes.
  2. enable the DBpedia user community to edit and maintain, in a public wiki, the DBpedia ontology and the infobox mappings used by the extraction framework.
  3. increase the quality of the extracted data by improving and fine-tuning the extraction code.

All this will hopefully happen soon.

Have fun with the new data set!

Cheers

Chris Bizer