Archive for the category 'Dataset releases'

DBpedia version 3.2 released including the new DBpedia Ontology

November 17, 2008 - 1:54 pm by ChrisBizer

We are happy to announce the release of DBpedia version 3.2.

The new knowledge base has been extracted from the October 2008 Wikipedia dumps. Compared to the last release, the new knowledge base provides three major improvements:

1. DBpedia Ontology

DBpedia now features a shallow, cross-domain ontology, which has been manually created based on the most commonly used infoboxes within Wikipedia. The ontology currently covers over 170 classes, which form a subsumption hierarchy and have 940 properties. The ontology is instantiated by a new infobox data extraction method that is based on hand-generated mappings of Wikipedia infoboxes to the DBpedia ontology. The mappings define fine-grained rules on how to parse infobox values. They also compensate for weaknesses in the Wikipedia infobox system, such as having different infoboxes for the same class (currently 350 Wikipedia templates are mapped to 170 ontology classes), using different property names for the same property (currently 2350 template properties are mapped to 940 ontology properties), and not having clearly defined datatypes for properties. Therefore, the instance data within the infobox ontology is much cleaner and better structured than the infobox data within the DBpedia infobox dataset, which is generated using the old infobox extraction code. The DBpedia Ontology currently contains about 882,000 instances.

More information about the ontology is found at http://wiki.dbpedia.org/Ontology
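
As a rough illustration of the mapping idea described above, the following Python sketch maps several templates to one class and several template property names to one ontology property, and parses values by datatype. All template names, property names and the mapping tables are hypothetical and are not taken from the actual DBpedia mappings or extraction framework.

```python
import re

# Hypothetical mapping tables (illustrative only, not the real DBpedia mappings).
# Several templates can map to the same ontology class.
TEMPLATE_TO_CLASS = {
    "Infobox Town": "City",
    "Infobox City": "City",
}

# (ontology class, template property) -> (ontology property, datatype).
# Several template property names can map to the same ontology property.
PROPERTY_MAPPINGS = {
    ("City", "population"): ("populationTotal", "integer"),
    ("City", "pop"): ("populationTotal", "integer"),
    ("City", "mayor"): ("leaderName", "string"),
}


def parse_value(raw, datatype):
    """Parse a raw infobox value according to the datatype named in the mapping."""
    if datatype == "integer":
        digits = re.sub(r"[^\d]", "", raw)  # drop separators such as "," or " "
        return int(digits) if digits else None
    return raw.strip()


def map_infobox(template, fields):
    """Return (ontology class, {ontology property: value}), or None if unmapped."""
    cls = TEMPLATE_TO_CLASS.get(template)
    if cls is None:
        return None
    result = {}
    for name, raw in fields.items():
        mapping = PROPERTY_MAPPINGS.get((cls, name))
        if mapping is None:
            continue  # template property without a mapping is skipped
        prop, datatype = mapping
        value = parse_value(raw, datatype)
        if value is not None:
            result[prop] = value
    return cls, result
```

Calling map_infobox("Infobox Town", {"pop": "1,234"}) and map_infobox("Infobox City", {"population": "1234"}) would yield the same class and the same typed property value, which is the kind of normalization the mappings are meant to provide.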

2. RDF Links to Freebase

Freebase is an open-license database which provides data about millions of things from various domains. Freebase has recently released a Linked Data interface to their content. As there is a big overlap between DBpedia and Freebase, we have added 2.4 million RDF links to DBpedia pointing at the corresponding things in Freebase. These links can be used to smush and fuse data about a thing from DBpedia and Freebase.

For more information about the Freebase links see
http://blog.dbpedia.org/2008/11/15/dbpedia-is-now-interlinked-with-freebase-links-to-opencyc-updated/
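
A minimal Python sketch of how such links might be used to fuse data is shown below. It assumes that the links are published as N-Triples statements using owl:sameAs (an assumption, since the announcement only speaks of "RDF links"), and the Freebase-side URI and property names in the usage note are placeholders, not real identifiers.

```python
import re

# Matches a simple N-Triples statement whose subject, predicate and object are URIs.
TRIPLE = re.compile(r"^<([^>]+)>\s+<([^>]+)>\s+<([^>]+)>\s*\.\s*$")

# Assumption: the link predicate is owl:sameAs.
SAME_AS = "http://www.w3.org/2002/07/owl#sameAs"


def read_links(lines):
    """Return {DBpedia URI: linked URI} for every owl:sameAs statement."""
    links = {}
    for line in lines:
        match = TRIPLE.match(line.strip())
        if match and match.group(2) == SAME_AS:
            links[match.group(1)] = match.group(3)
    return links


def fuse(dbpedia_data, freebase_data, links):
    """Merge property values of linked resources under the DBpedia URI.

    Both inputs are dicts of the form {resource URI: {property: set of values}}.
    """
    fused = {}
    for dbp_uri, props in dbpedia_data.items():
        merged = {key: set(values) for key, values in props.items()}
        fb_uri = links.get(dbp_uri)
        for key, values in freebase_data.get(fb_uri, {}).items():
            merged.setdefault(key, set()).update(values)
        fused[dbp_uri] = merged
    return fused
```

For example, given a link from http://dbpedia.org/resource/Berlin to a placeholder URI such as http://example.org/freebase/berlin, fuse() would return one record for the DBpedia URI containing the union of the values from both sources.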

3. Cleaner Abstracts

In the old DBpedia dataset, the abstracts for different languages sometimes contained Wikipedia markup and other strange characters. For the 3.2 release, we have improved DBpedia’s abstract extraction code, which results in much cleaner abstracts that can safely be displayed in user interfaces.
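
To give an idea of what cleaning an abstract involves, here is a small Python sketch that strips a few common kinds of wiki markup with regular expressions. It is an illustration only and not DBpedia's actual abstract extraction code; it handles only simple, non-nested markup.

```python
import re


def clean_abstract(text):
    """Remove a few common kinds of wiki markup from a piece of article text."""
    # Drop simple (non-nested) templates such as {{fact}}.
    text = re.sub(r"\{\{[^{}]*\}\}", "", text)
    # Drop self-closing references, then <ref>...</ref> footnotes.
    text = re.sub(r"<ref[^>]*/>", "", text)
    text = re.sub(r"<ref[^>]*>.*?</ref>", "", text, flags=re.DOTALL)
    # Drop any remaining HTML-like tags but keep their content.
    text = re.sub(r"<[^>]+>", "", text)
    # [[Target|label]] -> label, [[Target]] -> Target.
    text = re.sub(r"\[\[(?:[^\]|]*\|)?([^\]]*)\]\]", r"\1", text)
    # Remove bold and italic quote markers.
    text = re.sub(r"'{2,}", "", text)
    # Collapse runs of whitespace.
    return re.sub(r"\s+", " ", text).strip()
```

A real extractor has to deal with nested templates, tables and language-specific conventions, which a handful of regular expressions like these cannot cover.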

Access the new DBpedia knowledge base 

The new DBpedia release can be downloaded from:

http://wiki.dbpedia.org/Downloads32

and is also available via the DBpedia SPARQL endpoint at

http://dbpedia.org/sparql

and via DBpedia’s Linked Data interface. Example URIs:

http://dbpedia.org/resource/Berlin
http://dbpedia.org/page/Oliver_Stone
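
As a sketch of how these access points might be used from a program, the following Python code builds (but does not send) an HTTP request for the SPARQL endpoint and one for a Linked Data URI. The endpoint and resource URIs are those given above; the example query, the Accept header values and the helper function names are assumptions made for illustration, and the formats the server actually returns may differ.

```python
from urllib.parse import urlencode
from urllib.request import Request

SPARQL_ENDPOINT = "http://dbpedia.org/sparql"

# Example query (an assumption): list up to ten property/value pairs for Berlin.
QUERY = """
SELECT ?property ?value WHERE {
  <http://dbpedia.org/resource/Berlin> ?property ?value .
} LIMIT 10
"""


def build_sparql_request(query):
    """Build a GET request that passes the query in the 'query' URL parameter."""
    params = urlencode({"query": query})
    return Request(
        SPARQL_ENDPOINT + "?" + params,
        headers={"Accept": "application/sparql-results+json"},
    )


def build_linked_data_request(resource_uri):
    """Build a request that asks for an RDF description of a resource URI."""
    return Request(resource_uri, headers={"Accept": "application/rdf+xml"})
```

Passing either request object to urllib.request.urlopen would perform the actual HTTP call; that step is left out here so the sketch runs without network access.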

Many thanks to everybody who contributed to the DBpedia 3.2 release!

Especially:

1. Georgi Kobilarov (Freie Universität Berlin) who designed and implemented the new infobox extraction framework.
2. Anja Jentsch (Freie Universität Berlin) who contributed to implementing the new extraction framework and wrote the infobox to ontology class mappings.
3. Paul Kreis (Freie Universität Berlin) who improved the datatype extraction code.
4. Andreas Schultz (Freie Universität Berlin) who generated the Freebase to DBpedia RDF links.
5. Everybody at OpenLink Software for hosting DBpedia on a Virtuoso server and for providing the statistics about the new DBpedia knowledge base.

Have fun with the new DBpedia knowledge base!

DBpedia 3.1 breaks 100 million triples barrier

August 18, 2008 - 9:58 am by JensLehmann

Today, we released DBpedia 3.1. As in past years, Wikipedia has grown considerably over the past months. The new extraction contains 116.7 million triples, an increase of 27% over the previous version.

Apart from the use of more recent Wikipedia dumps, notable improvements include a much better YAGO mapping, which provides a more complete (more classes assigned to instances) and more accurate (95% accuracy) class hierarchy for DBpedia. The Geo extractor code has been improved and is now run for all 14 languages. URI validation has switched to the PEAR validation class.

Downloads | ChangeLog

DBpedia 3.0 Release

February 10, 2008 - 9:08 pm by JensLehmann

We announce the availability of the DBpedia 3.0 final release.

Downloads are available at http://wiki.dbpedia.org/Downloads. For a list of changes since DBpedia 2.0, see the Changelog. Most notably, multi-language support was improved, new linked data sets added, and extraction code improved. Compared to the 3.0 release candidate, a number of extraction framework and data set bugs reported at our sourceforge.net bug tracker were fixed.

Overall, the combined download size of all provided NT and CSV files is 5.0 GB (uncompressed: 48.1 GB). The available data sets contain 92M triples (excluding 126M triples for internal Wikipedia links). DBpedia’s coverage grows to 2.4M entities for the English edition in this release, thanks to the hard-working Wikipedia contributors.

The extraction was performed on a server of the AKSW research group. I would like to thank Jörg Schüppel, Sören Auer, Chris Bizer, Richard Cyganiak, Georgi Kobilarov, the OpenLink team, and many other contributors for their DBpedia support.

DBpedia-Cyc linkage

October 3, 2007 - 7:00 pm by Sören

Compared to DBpedia, the commonsense knowledge base Cyc (or OpenCyc) seems to follow a rather top-down approach: more abstract concepts and entities were represented first, and only later did Cyc start to include more domain knowledge. This seems reasonable, since domain knowledge changes faster and there is much more of it. On the other hand, domain knowledge is usually what people need to solve real problems within their domains. DBpedia contains primarily domain knowledge, hence a combination of both, Cyc and DBpedia, could really be a winning team.

Together with the committed OpenCyc community we produced a first DBpedia-Cyc linkage, which is now available as a DBpedia dataset from the downloads section. The dataset will soon also be loaded into the DBpedia SPARQL endpoint and made available as linked data. More information about the linkage can be found at: http://wiki.dbpedia.org/OpenCyc

DBpedia 2.0 released

September 5, 2007 - 6:51 pm by Georgi Kobilarov

After quite some work on improving the DBpedia information extraction framework, we have released a new version of the DBpedia dataset today.

The renewed DBpedia dataset describes 1,950,000 “things”, including at least 80,000 persons, 70,000 places, 35,000 music albums, and 12,000 films. It contains 657,000 links to images, 1,600,000 links to relevant external web pages and 440,000 external links into other RDF datasets. Altogether, the DBpedia dataset now consists of around 103 million RDF triples.

We worked on improving the data quality in order to make the dataset more usable and useful to developers, and fixed a lot of bugs submitted by our growing developer community. We also reworked our framework to enable developers to extend the dataset with their own extractors.

We are grateful for all contributions and are looking forward to supporting new projects based on DBpedia data.

New dataset containing links between DBpedia instances

March 27, 2007 - 10:07 pm by Sören

A new dataset containing links between DBpedia instances is available for download. The dataset was created from the internal pagelinks between Wikipedia articles.

New release of the DBpedia dataset

March 17, 2007 - 10:14 pm by Sören

Changes: Improved quality of short abstracts, additional long abstracts, links to external webpages added, links to Wikipedia pages in different languages added.

The DBpedia dataset is now available for download (25 million triples, zipped 440 MB altogether).

February 15, 2007 - 10:17 pm by Sören