
Archive for the category 'Dataset releases'

DBpedia 3.4 released

November 11, 2009 - 1:52 pm by ChrisBizer

We are happy to announce the release of DBpedia 3.4. The new release is based on Wikipedia dumps dating from September 2009.

The new DBpedia data set describes more than 2.9 million things, including 282,000 persons, 339,000 places, 88,000 music albums, 44,000 films, 15,000 video games, 119,000 organizations, 130,000 species and 4,400 diseases. The DBpedia data set now features labels and abstracts for these things in 91 different languages; 807,000 links to images and 3,840,000 links to external web pages; 4,878,100 external links into other RDF datasets, 415,000 Wikipedia categories, and 75,000 YAGO categories. The data set consists of 479 million pieces of information (RDF triples), of which 190 million were extracted from the English edition of Wikipedia and 289 million were extracted from other language editions.

The new release provides the following improvements and changes compared to the DBpedia 3.3 release:

  1. the data set has been extracted from more recent Wikipedia dumps.
  2. the data set now provides labels, abstracts and infobox data in 91 different languages.
  3. we provide two different versions of the DBpedia Infobox Ontology (loose and strict) in order to meet different application requirements. Please refer to http://wiki.dbpedia.org/Datasets#h18-11 for details.
  4. as Wikipedia has moved to dual-licensing, we also dual-license DBpedia. The DBpedia 3.4 data set is licensed under the terms of the Creative Commons Attribution-ShareAlike 3.0 license and the GNU Free Documentation License.
  5. the mapping-based infobox data extractor has been improved and now normalizes units of measurement.
  6. various bug fixes and improvements throughout the code base. Please refer to the change log for the complete list http://wiki.dbpedia.org/Changelog
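Item 5 says the mapping-based extractor now normalizes units of measurement, but the post does not describe how. The following is a minimal Python sketch of the idea, assuming a small hypothetical conversion table (`_TO_SI`) and a hypothetical helper `normalize_quantity`; it converts a raw infobox value such as `5,280 ft` to an SI unit and is not the DBpedia extraction framework's actual code.

```python
import re

# Hypothetical conversion table: unit symbol -> (SI unit name, factor to SI).
_TO_SI = {
    "m": ("metre", 1.0),
    "km": ("metre", 1000.0),
    "ft": ("metre", 0.3048),
    "mi": ("metre", 1609.344),
    "kg": ("kilogram", 1.0),
    "lb": ("kilogram", 0.45359237),
}

# A number (optionally with thousands separators and decimals) followed by a unit symbol.
_QUANTITY = re.compile(r"^\s*(\d[\d,]*(?:\.\d+)?)\s*([A-Za-z]+)\s*$")


def normalize_quantity(raw: str) -> tuple[float, str]:
    """Parse a raw value such as '5,280 ft' and return (value in SI unit, SI unit)."""
    match = _QUANTITY.match(raw)
    if match is None:
        raise ValueError(f"unrecognized quantity: {raw!r}")
    number, unit = match.groups()
    if unit not in _TO_SI:
        raise ValueError(f"unknown unit: {unit!r}")
    si_unit, factor = _TO_SI[unit]
    return float(number.replace(",", "")) * factor, si_unit
```

For example, `normalize_quantity("5,280 ft")` yields approximately `1609.344` together with `"metre"`, so values written in different units end up comparable.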

You can download the new DBpedia dataset from http://wiki.dbpedia.org/Downloads34. As usual, the dataset is also available as Linked Data and via the DBpedia SPARQL endpoint at http://dbpedia.org/sparql.
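For programmatic access, a SPARQL endpoint accepts a query as the `query` parameter of an HTTP GET request, as defined by the SPARQL Protocol. A minimal Python sketch, assuming the endpoint URL above still behaves this way; `build_query_url` and `run_query` are hypothetical helper names, and the example query simply asks for some of the language-tagged labels of Berlin.

```python
from urllib.parse import urlencode
from urllib.request import Request, urlopen

ENDPOINT = "http://dbpedia.org/sparql"

# Ask for labels of Berlin; the data set provides labels in many languages.
QUERY = """SELECT ?label WHERE {
  <http://dbpedia.org/resource/Berlin> <http://www.w3.org/2000/01/rdf-schema#label> ?label .
} LIMIT 10"""


def build_query_url(endpoint: str, query: str) -> str:
    """Encode the query as the 'query' parameter of a GET request (SPARQL Protocol)."""
    return f"{endpoint}?{urlencode({'query': query})}"


def run_query(endpoint: str, query: str) -> bytes:
    """Send the query and request SPARQL JSON results; requires network access."""
    request = Request(
        build_query_url(endpoint, query),
        headers={"Accept": "application/sparql-results+json"},
    )
    with urlopen(request, timeout=30) as response:
        return response.read()
```

Calling `run_query(ENDPOINT, QUERY)` performs the HTTP request and returns the raw response body, which can then be decoded with `json.loads`.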

Lots of thanks to

  • Anja Jentzsch, Christopher Sahnwaldt, Robert Isele, and Paul Kreis (all Freie Universität Berlin) for improving the DBpedia extraction framework and for extracting the new data set.
  • Jens Lehmann and Sören Auer (Universität Leipzig) for providing the new data set via the DBpedia download server at Universität Leipzig.
  • Kingsley Idehen and Mitko Iliev for loading the new data set into the Virtuoso instance that serves the Linked Data view and SPARQL endpoint, and OpenLink Software (http://www.openlinksw.com/) as a whole for providing the server infrastructure for DBpedia.
  • Neofonie GmbH, Berlin, (http://www.neofonie.de/index.jsp) for supporting the DBpedia project by paying Christopher Sahnwaldt.

The next steps for the DBpedia project will be to

  1. synchronize Wikipedia and DBpedia by deploying the DBpedia live extraction which updates the DBpedia knowledge base immediately when a Wikipedia article changes.
  2. enable the DBpedia user community to edit and maintain the DBpedia ontology and the infobox mappings that are used by the extraction framework in a public Wiki.
  3. increase the quality of the extracted data by improving and fine-tuning the extraction code.

All this will hopefully happen soon.

Have fun with the new data set!

Cheers

Chris Bizer

DBpedia now part of Amazon Public Data Sets

February 27, 2009 - 7:41 pm by Sören

Kingsley announced on Tuesday that the first data sets from the LOD community, including DBpedia, have been uploaded to Amazon’s public data set hosting facility. You can now do the following:

  1. Download DBpedia data from Amazon’s hosting facility at no cost to your own data center and then build your own personal or service specific edition of DBpedia
  2. Download to an EC2 AMI and build it yourself using Virtuoso or any other Quad / Triple Store
  3. Use the DBpedia EC2 AMI which we provide (which will produce a rendition in 1.5 hrs)

We especially thank our colleagues and new Linked Data supporters at both Amazon Web Services and Infochimps.org for their assistance in getting this very taxing process in motion.

DBpedia version 3.2 released including the new DBpedia Ontology

November 17, 2008 - 1:54 pm by ChrisBizer

We are happy to announce the release of DBpedia version 3.2.

The new knowledge base has been extracted from the October 2008 Wikipedia dumps. Compared to the last release, the new knowledge base provides three major improvements:

1. DBpedia Ontology

DBpedia now features a shallow, cross-domain ontology, which has been manually created based on the most commonly used infoboxes within Wikipedia. The ontology currently covers over 170 classes, which form a subsumption hierarchy and have 940 properties. The ontology is instantiated by a new infobox data extraction method which is based on hand-generated mappings of Wikipedia infoboxes to the DBpedia ontology. The mappings define fine-grained rules on how to parse infobox values. The mappings also compensate for weaknesses in the Wikipedia infobox system, like having different infoboxes for the same class (currently 350 Wikipedia templates are mapped to 170 ontology classes), using different property names for the same property (currently 2,350 template properties are mapped to 940 ontology properties), and not having clearly defined datatypes for properties. Therefore, the instance data within the infobox ontology is much cleaner and better structured than the infobox data within the DBpedia infobox dataset, which is generated using the old infobox extraction code. The DBpedia Ontology currently contains about 882,000 instances.

More information about the ontology can be found at http://wiki.dbpedia.org/Ontology
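The mapping idea described above (several templates mapped to one class, several template property names mapped to one ontology property with a defined datatype) can be sketched in Python as follows. The template names, the `populationTotal` property, and the `map_infobox` helper are illustrative assumptions, not excerpts from the actual DBpedia mappings.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PropertyMapping:
    ontology_property: str
    datatype: type  # Python type the cleaned raw value is converted to


# Hypothetical excerpt: two infobox templates mapped to the same ontology class.
TEMPLATE_TO_CLASS = {
    "Infobox City": "City",
    "Infobox Settlement": "City",
}

# Hypothetical excerpt: three template property names mapped to one ontology property.
PROPERTY_MAPPINGS = {
    "population": PropertyMapping("populationTotal", int),
    "population_total": PropertyMapping("populationTotal", int),
    "pop": PropertyMapping("populationTotal", int),
}


def map_infobox(template: str, fields: dict[str, str]) -> tuple[str, dict[str, object]]:
    """Return the ontology class and typed ontology properties for one infobox."""
    ontology_class = TEMPLATE_TO_CLASS[template]
    properties: dict[str, object] = {}
    for name, raw in fields.items():
        mapping = PROPERTY_MAPPINGS.get(name)
        if mapping is None:
            continue  # unmapped template properties are ignored in this sketch
        cleaned = raw.replace(",", "").strip()
        properties[mapping.ontology_property] = mapping.datatype(cleaned)
    return ontology_class, properties
```

Because both templates and all three property spellings converge on the same class and property, instances extracted from different infoboxes become directly comparable.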

2. RDF Links to Freebase

Freebase is an open-license database which provides data about millions of things from various domains. Freebase has recently released a Linked Data interface to its content. As there is a big overlap between DBpedia and Freebase, we have added 2.4 million RDF links to DBpedia pointing at the corresponding things in Freebase. These links can be used to smush and fuse data about a thing from DBpedia and Freebase.

For more information about the Freebase links see
http://blog.dbpedia.org/2008/11/15/dbpedia-is-now-interlinked-with-freebase-links-to-opencyc-updated/
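Smushing here means treating URIs connected by such links as names for the same thing and merging their statements. A minimal Python sketch, assuming the links are `owl:sameAs` pairs and that triples are plain `(subject, predicate, object)` tuples; it uses a union-find structure and picks the lexicographically smallest URI in each group as the canonical name. The function name `smush` and the Freebase URI used in testing are hypothetical.

```python
from collections import defaultdict


def smush(triples, same_as_links):
    """Group (predicate, object) pairs under one canonical URI per sameAs group.

    triples: iterable of (subject, predicate, object) tuples.
    same_as_links: iterable of (uri_a, uri_b) pairs assumed to denote the same thing.
    """
    parent: dict[str, str] = {}

    def find(uri: str) -> str:
        parent.setdefault(uri, uri)
        while parent[uri] != uri:
            parent[uri] = parent[parent[uri]]  # path halving keeps trees shallow
            uri = parent[uri]
        return uri

    def union(a: str, b: str) -> None:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            # The lexicographically smaller root becomes the canonical URI.
            if root_b < root_a:
                root_a, root_b = root_b, root_a
            parent[root_b] = root_a

    for a, b in same_as_links:
        union(a, b)

    merged: dict[str, set[tuple[str, str]]] = defaultdict(set)
    for subject, predicate, obj in triples:
        merged[find(subject)].add((predicate, obj))
    return dict(merged)
```

Only subjects are canonicalized in this sketch; a fuller version would also rewrite object URIs that belong to a sameAs group.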

3. Cleaner Abstracts

In the old DBpedia dataset, the abstracts for some languages contained Wikipedia markup and other stray characters. For the 3.2 release, we have improved DBpedia’s abstract extraction code, which results in much cleaner abstracts that can safely be displayed in user interfaces.
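The post does not say how the improved abstract extraction works. As an illustration only, a regex-based cleanup that removes common markup (references, templates, bold and italic quotes) and keeps the visible text of internal and external links might look like this; `clean_abstract` is a hypothetical helper, and regular expressions only approximate real wikitext parsing.

```python
import re

_REF = re.compile(r"<ref[^>]*/>|<ref[^>]*>.*?</ref>", re.DOTALL)
_TEMPLATE = re.compile(r"\{\{[^{}]*\}\}")  # matches innermost {{...}} only
_INTERNAL_LINK = re.compile(r"\[\[(?:[^|\]]*\|)?([^\]]*)\]\]")  # keep the link label
_EXTERNAL_LINK = re.compile(r"\[https?://[^\s\]]+\s+([^\]]*)\]")  # keep the link text
_EMPHASIS = re.compile(r"'{2,}")  # '' italic and ''' bold markers
_WHITESPACE = re.compile(r"\s+")


def clean_abstract(wikitext: str) -> str:
    """Strip common wiki markup from an abstract, keeping readable text."""
    text = _REF.sub("", wikitext)
    # Remove nested templates by repeatedly stripping the innermost ones.
    previous = None
    while previous != text:
        previous = text
        text = _TEMPLATE.sub("", text)
    text = _INTERNAL_LINK.sub(r"\1", text)
    text = _EXTERNAL_LINK.sub(r"\1", text)
    text = _EMPHASIS.sub("", text)
    return _WHITESPACE.sub(" ", text).strip()
```

For instance, `'''Berlin''' is the [[capital city|capital]] of [[Germany]].` becomes `Berlin is the capital of Germany.`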

Access the new DBpedia knowledge base 

The new DBpedia release can be downloaded from:

http://wiki.dbpedia.org/Downloads32

and is also available via the DBpedia SPARQL endpoint at

http://dbpedia.org/sparql

and via DBpedia’s Linked Data interface. Example URIs:

http://dbpedia.org/resource/Berlin
http://dbpedia.org/page/Oliver_Stone
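Resource URIs such as the first example identify the thing itself, while `/page/` URIs identify an HTML document about it. DBpedia's Linked Data interface is commonly described as redirecting requests for `/resource/` URIs either to the HTML page or to an RDF document under `/data/`, depending on the HTTP `Accept` header. A minimal sketch of that decision, assuming exactly this URI pattern and ignoring `q`-values for simplicity; `negotiate` is a hypothetical helper, not part of any DBpedia software.

```python
RESOURCE_PREFIX = "http://dbpedia.org/resource/"
# Assumed set of media types for which the RDF document is served.
RDF_MEDIA_TYPES = {"application/rdf+xml"}


def negotiate(resource_uri: str, accept: str) -> str:
    """Return the document URI a /resource/ URI is assumed to redirect to."""
    if not resource_uri.startswith(RESOURCE_PREFIX):
        raise ValueError(f"not a DBpedia resource URI: {resource_uri}")
    name = resource_uri[len(RESOURCE_PREFIX):]
    # Media types from the Accept header, with parameters such as q=0.9 dropped.
    requested = {part.split(";")[0].strip() for part in accept.split(",")}
    if requested & RDF_MEDIA_TYPES:
        return "http://dbpedia.org/data/" + name
    return "http://dbpedia.org/page/" + name
```

A browser sending `text/html` would thus land on the `/page/` document, while an RDF client asking for `application/rdf+xml` would receive the `/data/` document.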

Lots of thanks to everybody who contributed to the DBpedia 3.2 release!

Especially:

1. Georgi Kobilarov (Freie Universität Berlin) who designed and implemented the new infobox extraction framework.
2. Anja Jentzsch (Freie Universität Berlin) who contributed to implementing the new extraction framework and wrote the infobox to ontology class mappings.
3. Paul Kreis (Freie Universität Berlin) who improved the datatype extraction code.
4. Andreas Schultz (Freie Universität Berlin) for generating the Freebase to DBpedia RDF links.
5. Everybody at OpenLink Software for hosting DBpedia on a Virtuoso server and for providing the statistics about the new DBpedia knowledge base.

Have fun with the new DBpedia knowledge base!

DBpedia 3.1 breaks 100 million triples barrier

August 18, 2008 - 9:58 am by JensLehmann

Today, we released DBpedia 3.1. As in previous years, Wikipedia has grown considerably over the past months. The new extraction contains 116.7 million triples, an increase of 27% over the previous version.
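As a rough consistency check, assuming the 27% is measured against the 92 million triples reported for DBpedia 3.0 (a figure that excludes the internal Wikipedia page links):

```latex
\frac{116.7 \times 10^{6}}{92 \times 10^{6}} \approx 1.268
\quad\Longrightarrow\quad
\text{increase} \approx 26.8\,\% \approx 27\,\%
```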

Apart from the more recent Wikipedia dumps we used, some notable improvements are a much better YAGO mapping, providing a more complete (more classes assigned to instances) and accurate (95% accuracy) class hierarchy for DBpedia. The Geo extractor code has been improved and is now run for all 14 languages. URI validation has switched to the PEAR validation class.

See the Downloads and ChangeLog pages for details.

DBpedia 3.0 Release

February 10, 2008 - 9:08 pm by JensLehmann

We announce the availability of the DBpedia 3.0 final release.

Downloads are available at http://wiki.dbpedia.org/Downloads. For a list of changes since DBpedia 2.0, see the Changelog. Most notably, multi-language support was improved, new linked data sets added, and extraction code improved. Compared to the 3.0 release candidate, a number of extraction framework and data set bugs reported at our sourceforge.net bug tracker were fixed.

Overall, the combined download size of all provided NT and CSV files is 5.0 GB (uncompressed: 48.1 GB). The available data sets contain 92M triples (excluding 126M triples for internal Wikipedia links). DBpedia’s coverage grows to 2.4M entities for the English edition in this release, thanks to the hard-working Wikipedia contributors.

The extraction was performed on a server of the AKSW research group. I would like to thank Jörg Schüppel, Sören Auer, Chris Bizer, Richard Cyganiak, Georgi Kobilarov, the OpenLink team, and many other contributors for their DBpedia support.

DBpedia-Cyc linkage

October 3, 2007 - 7:00 pm by Sören

The commonsense knowledge base Cyc, or OpenCyc, seems to follow a rather top-down approach when compared to DBpedia: abstract concepts and entities were represented first, and only later did Cyc start to include more domain knowledge. This seems reasonable, since domain knowledge changes faster and there is much more of it. On the other hand, domain knowledge is usually what people need to solve real problems within their domains. DBpedia contains primarily domain knowledge, so a combination of the two, Cyc and DBpedia, could really be a winning team.

Together with the committed OpenCyc community we produced a first DBpedia-Cyc linkage, which is now available as a DBpedia dataset from the downloads section. The dataset will soon also be loaded into the DBpedia SPARQL endpoint and made available as linked data. More information about the linkage can be found at: http://wiki.dbpedia.org/OpenCyc

DBpedia 2.0 released

September 5, 2007 - 6:51 pm by Georgi Kobilarov

After a good deal of work on improving the DBpedia information extraction framework, we have released a new version of the DBpedia dataset today.

The renewed DBpedia dataset describes 1,950,000 “things”, including at least 80,000 persons, 70,000 places, 35,000 music albums, 12,000 films. It contains 657,000 links to images, 1,600,000 links to relevant external web pages and 440,000 external links into other RDF datasets. Altogether, the DBpedia dataset now consists of around 103 million RDF triples.

We worked on improving the data quality in order to make the dataset more usable and useful to developers, and fixed a lot of bugs submitted by our growing developer community. We also reworked our framework to enable developers to extend the dataset with their own extractors.

We are grateful for all contributions and are looking forward to supporting new projects based on DBpedia data.


New dataset containing links between DBpedia instances

March 27, 2007 - 10:07 pm by Sören

A new dataset containing links between DBpedia instances is available for download. The dataset was created from the internal pagelinks between Wikipedia articles.
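A minimal Python sketch of turning one internal page link into an N-Triples statement, assuming article titles map to `http://dbpedia.org/resource/` URIs with spaces replaced by underscores. The predicate URI is a hypothetical placeholder (the post does not name the property used), the helper names are illustrative, and the percent-encoding rule is a simplification of how real resource URIs are minted.

```python
from urllib.parse import quote

RESOURCE_PREFIX = "http://dbpedia.org/resource/"
# Hypothetical placeholder: the post does not name the property used for page links.
LINK_PREDICATE = "http://example.org/hypothetical/wikiPageLink"


def title_to_uri(title: str) -> str:
    """Map an article title to a resource URI: spaces become underscores,
    other unsafe characters are percent-encoded as UTF-8 (a simplification)."""
    return RESOURCE_PREFIX + quote(title.replace(" ", "_"), safe="(),")


def pagelink_triple(source_title: str, target_title: str) -> str:
    """Render one internal page link as an N-Triples line."""
    return f"<{title_to_uri(source_title)}> <{LINK_PREDICATE}> <{title_to_uri(target_title)}> ."
```

Applied to every (source, target) pair of the Wikipedia page link table, this yields one triple per internal link.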

New release of the DBpedia dataset

March 17, 2007 - 10:14 pm by Sören

Changes: improved quality of short abstracts; long abstracts added; links to external web pages added; links to Wikipedia pages in other languages added.

The DBpedia dataset is now available for download (25 million triples; 440 MB zipped in total).

February 15, 2007 - 10:17 pm by Sören