
Archive for the category 'Dataset releases'

DBpedia 3.1 breaks 100 million triples barrier

August 18, 2008 - 9:58 am by JensLehmann - No comments »

Today, we released DBpedia 3.1. As in previous years, Wikipedia has grown considerably over the past months. The new extraction contains 116.7 million triples, an increase of 27% over the previous version.
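As a quick sanity check on that figure, here is a minimal sketch of the arithmetic. It assumes the baseline is the 92M triples reported for DBpedia 3.0 (which excluded the internal Wikipedia links); the helper name `percent_increase` is ours, not part of any DBpedia tooling.

```python
def percent_increase(old: float, new: float) -> float:
    """Return the relative growth from old to new, in percent."""
    return (new - old) / old * 100.0


# Triple counts in millions. The 3.0 baseline is an assumption taken
# from the DBpedia 3.0 announcement (92M, pagelinks excluded).
TRIPLES_3_0 = 92.0
TRIPLES_3_1 = 116.7

growth = percent_increase(TRIPLES_3_0, TRIPLES_3_1)
print(f"Growth from 3.0 to 3.1: {growth:.1f}%")
```

Under that assumption the result rounds to the 27% quoted above.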

Apart from the more recent Wikipedia dumps we used, notable improvements include a much better YAGO mapping, which provides a more complete (more classes assigned to instances) and more accurate (95% accuracy) class hierarchy for DBpedia. The Geo extractor code has been improved and now runs for all 14 languages. URI validation now uses the PEAR validation class.

Downloads | ChangeLog

DBpedia 3.0 Release

February 10, 2008 - 9:08 pm by JensLehmann - One comment »

We announce the availability of the DBpedia 3.0 final release.

Downloads are available at http://wiki.dbpedia.org/Downloads. For a list of changes since DBpedia 2.0, see the Changelog. Most notably, multi-language support was improved, new linked data sets were added, and the extraction code was improved. Compared to the 3.0 release candidate, a number of extraction framework and data set bugs reported at our sourceforge.net bug tracker were fixed.

Overall, the combined download size of all provided NT and CSV files is 5.0 GB (uncompressed: 48.1 GB). The available data sets contain 92M triples (excluding 126M triples for internal Wikipedia links). DBpedia’s coverage grows to 2.4M entities for the English edition in this release, thanks to the hard-working Wikipedia contributors.

The extraction was performed on a server of the AKSW research group. I would like to thank Jörg Schüppel, Sören Auer, Chris Bizer, Richard Cyganiak, Georgi Kobilarov, the OpenLink team, and many other contributors for their DBpedia support.

DBpedia-Cyc linkage

October 3, 2007 - 7:00 pm by Sören - One comment »

Compared to DBpedia, the commonsense knowledge base Cyc (or OpenCyc) seems to follow a rather top-down approach: more abstract concepts and entities were represented first, and only later did Cyc start to include more domain knowledge. This seems reasonable, since domain knowledge changes faster and there is much more of it. On the other hand, domain knowledge is usually what people need to solve real problems within their domains. DBpedia contains primarily domain knowledge, so a combination of Cyc and DBpedia could really be a winning team.

Together with the committed OpenCyc community we produced a first DBpedia-Cyc linkage, which is now available as a DBpedia dataset from the downloads section. The dataset will soon also be loaded into the DBpedia SPARQL endpoint and made available as linked data. More information about the linkage can be found at: http://wiki.dbpedia.org/OpenCyc

DBpedia 2.0 released

September 5, 2007 - 6:51 pm by Georgi Kobilarov - One comment »

After quite some work on improving the DBpedia information extraction framework, we have released a new version of the DBpedia dataset today.

The renewed DBpedia dataset describes 1,950,000 “things”, including at least 80,000 persons, 70,000 places, 35,000 music albums, and 12,000 films. It contains 657,000 links to images, 1,600,000 links to relevant external web pages, and 440,000 external links into other RDF datasets. Altogether, the DBpedia dataset now consists of around 103 million RDF triples.

We worked on improving the data quality to make the dataset more usable and useful to developers, and fixed a lot of bugs submitted by our growing developer community. We also reworked our framework to enable developers to extend the dataset with their own extractors.

We are grateful for all contributions and are looking forward to supporting new projects based on DBpedia data.

New dataset containing links between DBpedia instances

March 27, 2007 - 10:07 pm by Sören - No comments »

A new dataset containing links between DBpedia instances is available for download. The dataset was created from the internal pagelinks between Wikipedia articles.
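To illustrate the idea, here is a minimal sketch of how a list of Wikipedia page links could be turned into N-Triples. This is not the actual extraction code: the function names, the URI encoding rules, and in particular the predicate URI `http://dbpedia.org/property/wikilink` are assumptions made for this example.

```python
from urllib.parse import quote

RESOURCE_BASE = "http://dbpedia.org/resource/"
# Assumed predicate URI, used here for illustration only.
LINK_PREDICATE = "http://dbpedia.org/property/wikilink"


def resource_uri(title: str) -> str:
    """Map a Wikipedia article title to a DBpedia-style resource URI."""
    name = title.strip().replace(" ", "_")
    # Percent-encode everything except a few characters common in titles.
    return RESOURCE_BASE + quote(name, safe="_(),:")


def pagelinks_to_ntriples(links):
    """Yield one N-Triples line per (source_title, target_title) pair."""
    for source, target in links:
        yield (
            f"<{resource_uri(source)}> "
            f"<{LINK_PREDICATE}> "
            f"<{resource_uri(target)}> ."
        )


if __name__ == "__main__":
    sample_links = [("Leipzig", "Saxony"), ("Leipzig", "Johann Sebastian Bach")]
    for line in pagelinks_to_ntriples(sample_links):
        print(line)
```

Each link between two articles becomes one triple whose subject and object are both DBpedia instances, which is why such a dataset can easily outgrow the rest of the data.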

New release of the DBpedia dataset

March 17, 2007 - 10:14 pm by Sören - No comments »

Changes: Improved quality of short abstracts, additional long abstracts, links to external webpages added, links to Wikipedia pages in different languages added.

The DBpedia dataset is now available for download (25 million triples, zipped 440 MB altogether).

February 15, 2007 - 10:17 pm by Sören - No comments »