Category Archives: Inter-linkage

New DBpedia Overview Article

We are pleased to announce that a new overview article for DBpedia is available.

The article covers several aspects of the DBpedia community project:

  • The DBpedia extraction framework.
  • The mappings wiki as the central structure for maintaining the community-curated DBpedia ontology.
  • Statistics on the multilingual support in DBpedia.
  • DBpedia live synchronisation with Wikipedia.
  • Statistics on the interlinking of DBpedia with other parts of the LOD cloud (incoming and outgoing links).
  • Several usage statistics: What kind of queries are asked against DBpedia and how did that change over the past years? How much traffic do the official static and live endpoint as well as the download server have? What are the most popular DBpedia datasets?
  • A description of use cases and applications of DBpedia in several areas (drop me a mail if important applications are missing).
  • The relation of DBpedia to the YAGO, Freebase and Wikidata projects.
  • Future challenges for the DBpedia project.

After our ISWC 2009 paper on DBpedia, this is the (long overdue) new reference article for DBpedia, which should provide a good introduction to the project. We submitted the article as a system report to the Semantic Web journal.

Download article as PDF.

DBpedia 3.8 released, including enlarged Ontology and additional localized Versions

Hi all,

we are happy to announce the release of DBpedia 3.8.


The most important improvements of the new release compared to DBpedia 3.7 are:

1. the new release is based on updated Wikipedia dumps dating from late May / early June 2012.
2. the DBpedia ontology is enlarged and the number of infobox to ontology mappings has risen.
3. the DBpedia internationalization has progressed and we now provide localized versions of DBpedia in even more languages.

The English version of the DBpedia knowledge base currently describes 3.77 million things, out of which 2.35 million are classified in a consistent ontology, including 764,000 persons, 573,000 places (including 387,000 populated places), 333,000 creative works (including 112,000 music albums, 72,000 films and 18,000 video games), 192,000 organizations (including 45,000 companies and 42,000 educational institutions), 202,000 species and 5,500 diseases.

We provide localized versions of DBpedia in 111 languages. All these versions together describe 20.8 million things, out of which 10.5 million overlap (are interlinked) with concepts from the English DBpedia. The full DBpedia data set features labels and abstracts for 10.3 million unique things in 111 different languages; 8.0 million links to images and 24.4 million HTML links to external web pages; 27.2 million data links into external RDF data sets, 55.8 million links to Wikipedia categories, and 8.2 million YAGO categories. The dataset consists of 1.89 billion pieces of information (RDF triples), out of which 400 million were extracted from the English edition of Wikipedia, 1.46 billion were extracted from other language editions, and about 27 million are data links into external RDF data sets.

The main changes between DBpedia 3.7 and 3.8 are described below. For additional, more detailed information please refer to the change log.

1. Enlarged Ontology

The DBpedia community added many new classes and properties on the mappings wiki. The DBpedia 3.8 ontology encompasses

  • 359 classes (DBpedia 3.7: 319)
  • 800 object properties (DBpedia 3.7: 750)
  • 859 datatype properties (DBpedia 3.7: 791)
  • 116 specialized datatype properties (DBpedia 3.7: 102)
  • 45 owl:equivalentClass and 31 owl:equivalentProperty mappings to
    http://schema.org


2. Additional Infobox to Ontology Mappings

The editors of the mappings wiki also defined many new mappings from Wikipedia templates to DBpedia classes. For the DBpedia 3.8 extraction, we used 2347 mappings, among them

  • Polish: 382 mappings
  • English: 345 mappings
  • German: 211 mappings
  • Portuguese: 207 mappings
  • Greek: 180 mappings
  • Slovenian: 170 mappings
  • Korean: 146 mappings
  • Hungarian: 111 mappings
  • Spanish: 107 mappings
  • Turkish: 91 mappings
  • Czech: 66 mappings
  • Bulgarian: 61 mappings
  • Catalan: 52 mappings
  • Arabic: 51 mappings


3. New local DBpedia Chapters


We are also happy to see the number of local DBpedia chapters in different countries rising. Since the 3.7 DBpedia release we have welcomed the French, Italian and Japanese chapters. In addition, we expect the Dutch DBpedia chapter to go online in the coming months (in cooperation with http://bibliotheek.nl/). The DBpedia chapters provide local SPARQL endpoints and dereferenceable URIs for the DBpedia data in their corresponding language. The DBpedia Internationalization page provides an overview of the current state of the DBpedia Internationalization effort.

4. New and updated RDF Links into External Data Sources

We have added new RDF links pointing at resources in the following Linked Data sources: Amsterdam Museum, BBC Wildlife Finder, CORDIS, DBTune, Eurostat (Linked Statistics), GADM, LinkedGeoData, OpenEI (Open Energy Info). In addition, we have updated many of the existing RDF links pointing at other Linked Data sources.


5. New Wiktionary2RDF Extractor

We developed a DBpedia extractor that is configurable for any Wiktionary edition. It generates a comprehensive ontology about languages for use as a semantic lexical resource in linguistics. The data currently includes language, part of speech, senses with definitions, synonyms, taxonomies (hyponyms, hyperonyms, synonyms, antonyms) and translations for each lexical word. Furthermore, it is hosted as Linked Data and can serve as a central linking hub for LOD in linguistics. Currently available languages are English, German, French and Russian. In the next weeks we plan to add Vietnamese and Arabic. The goal is to allow the addition of languages just by configuration, without the need for programming skills, enabling collaboration as in the Mappings Wiki. For more information visit http://wiktionary.dbpedia.org/

6. Improvements to the Data Extraction Framework

  • In addition to N-Triples and N-Quads, the framework can now write triple files in Turtle format
  • Extraction steps that looked for links between different Wikipedia editions were replaced by more powerful post-processing scripts
  • Preparation time and effort for abstract extraction is minimized, extraction time is reduced to a few milliseconds per page
  • To save file system space, the framework can compress DBpedia triple files while writing and decompress Wikipedia XML dump files while reading
  • Using some bit twiddling, we can now load all ~200 million inter-language links into a few GB of RAM and analyze them
  • Users can download ontology and mappings from mappings wiki and store them in files to avoid downloading them for each extraction, which takes a lot of time and makes extraction results less reproducible
  • We now use IRIs for all languages except English, which uses URIs for backwards compatibility
  • We now resolve redirects in all datasets where the object URIs are DBpedia resources
  • We check that extracted dates are valid (e.g. February never has 30 days) and that their format is valid according to their XML Schema type, e.g. xsd:gYearMonth
  • We improved the removal of HTML character references from the abstracts
  • When extracting raw infobox properties, we make sure that the predicate URI can be used in RDF/XML by appending an underscore if necessary
  • Page IDs and Revision IDs datasets now use the DBpedia resource as subject URI, not the Wikipedia page URL
  • We use foaf:isPrimaryTopicOf instead of foaf:page for the link from DBpedia resource to Wikipedia page
  • New inter-language link datasets for all languages
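
The date check mentioned in the list above can be illustrated with a small Python sketch. This is not the code of the extraction framework; the function names and the regular expression (which covers only xsd:gYearMonth values without a time zone) are assumptions made for this example.

```python
import re
from datetime import date

# Illustrative subset of the xsd:gYearMonth lexical form (no time zone part).
G_YEAR_MONTH = re.compile(r"-?\d{4,}-(0[1-9]|1[0-2])")


def is_valid_calendar_date(year: int, month: int, day: int) -> bool:
    """Return True if year/month/day form a real calendar date."""
    try:
        date(year, month, day)  # raises ValueError for e.g. February 30
    except ValueError:
        return False
    return True


def is_valid_g_year_month(lexical: str) -> bool:
    """Return True if the whole string matches the gYearMonth pattern above."""
    return G_YEAR_MONTH.fullmatch(lexical) is not None
```

Note that Python's date type only supports the years 1 to 9999, so a real extractor would need a different check for dates outside that range.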


Accessing the DBpedia 3.8 Release

You can download the new DBpedia dataset from
http://dbpedia.org/Downloads38.

As usual, the dataset is also available as Linked Data and via the DBpedia SPARQL endpoint at
http://dbpedia.org/sparql
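
As a minimal sketch of how the endpoint can be addressed from a script, the following Python snippet builds a query URL. The "query" parameter comes from the SPARQL protocol; the "format" parameter and the example query are assumptions for illustration and may need adjusting for the actual endpoint configuration.

```python
from urllib.parse import urlencode

ENDPOINT = "http://dbpedia.org/sparql"  # endpoint named in the announcement

# Hypothetical example query: English label of the Berlin resource.
EXAMPLE_QUERY = """
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
SELECT ?label WHERE {
  <http://dbpedia.org/resource/Berlin> rdfs:label ?label .
  FILTER (lang(?label) = "en")
}
""".strip()


def build_query_url(query: str,
                    result_format: str = "application/sparql-results+json") -> str:
    """Return a GET URL that submits the query to the endpoint."""
    # "format" is an assumed, server-specific parameter for the result type.
    params = {"query": query, "format": result_format}
    return f"{ENDPOINT}?{urlencode(params)}"
```

Passing the URL returned by build_query_url(EXAMPLE_QUERY) to urllib.request.urlopen would send the request; the sketch stops before any network access.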

Credits

Lots of thanks to

  • Jona Christopher Sahnwaldt (Freie Universität Berlin, Germany) for improving the DBpedia extraction framework and for extracting the DBpedia 3.8 data sets.
  • Dimitris Kontokostas (Aristotle University of Thessaloniki, Greece) for implementing the language generalizations to the extraction framework.
  • Uli Zellbeck and Anja Jentzsch (Freie Universität Berlin, Germany) for generating the new and updated RDF links to external datasets using the Silk interlinking framework.
  • Jonas Brekle (Universität Leipzig, Germany) and Sebastian Hellmann (Universität Leipzig, Germany) for their work on the new Wiktionary2RDF extractor.
  • All editors that contributed to the DBpedia ontology mappings via the Mappings Wiki.
  • The whole Internationalization Committee for pushing the DBpedia internationalization forward.
  • Kingsley Idehen and Patrick van Kleef (both OpenLink Software) for loading the dataset into the Virtuoso instance that serves the Linked Data view and SPARQL endpoint. OpenLink Software (http://www.openlinksw.com/) altogether for providing the server infrastructure for DBpedia.

The work on the DBpedia 3.8 release was financially supported by the European Commission through the projects LOD2 – Creating Knowledge out of Linked Data (http://lod2.eu/, improvements to the extraction framework) and LATC – LOD Around the Clock (http://latc-project.eu/, creation of external RDF links).

More information about DBpedia is found at http://dbpedia.org/About

Have fun with the new DBpedia release!

Cheers,

Chris Bizer

DBpedia 3.7 released, including 15 localized Editions

Hi all,

we are happy to announce the release of DBpedia 3.7. The new release is based on Wikipedia dumps dating from late July 2011.

The new DBpedia data set describes more than 3.64 million things, of which 1.83 million are classified in a consistent ontology, including 416,000 persons, 526,000 places, 106,000 music albums, 60,000 films, 17,500 video games, 169,000 organizations, 183,000 species and 5,400 diseases.

The DBpedia data set features labels and abstracts for 3.64 million things in up to 97 different languages; 2,724,000 links to images and 6,300,000 links to external web pages; 6,200,000 external links into other RDF datasets, and 740,000 Wikipedia categories. The dataset consists of 1 billion pieces of information (RDF triples) out of which 385 million were extracted from the English edition of Wikipedia and roughly 665 million were extracted from other language editions and links to external datasets.

Localized Editions

Until now, we extracted data from non-English Wikipedia pages only if an equivalent English page existed, as we wanted to have a single URI to identify a resource across all 97 languages. However, many pages in the non-English Wikipedia editions do not have an equivalent English page (especially small towns in different countries, e.g. the Austrian village Endach, or legal and administrative terms that are only relevant for a single country). Relying on English URIs only therefore had the negative effect that DBpedia did not contain data for these entities, and many DBpedia users have complained about this shortcoming.

As part of the DBpedia 3.7 release, we now provide 15 localized DBpedia editions for download that contain data from all Wikipedia pages in a specific language. These localized editions cover the following languages: ca, de, el, es, fr, ga, hr, hu, it, nl, pl, pt, ru, sl, tr. The URIs identifying entities in these i18n data sets are constructed directly from the non-English title and a language-specific URI namespace (e.g. http://ru.dbpedia.org/resource/Berlin), so there are now 16 different URIs in DBpedia that refer to Berlin. We also extract the inter-language links from the different Wikipedia editions. Thus, whenever an inter-language link between a non-English Wikipedia page and its English equivalent exists, the resulting owl:sameAs link can be used to relate the localized DBpedia URI to the equivalent in the main (English) DBpedia edition. The localized DBpedia editions are provided for download on the DBpedia download page (http://wiki.dbpedia.org/Downloads37). Note that we do not provide public SPARQL endpoints for the localized editions, nor do the localized URIs dereference. This might change in the future, as more local DBpedia chapters are set up in different countries as part of the DBpedia internationalization effort (http://dbpedia.org/Internationalization).
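
The URI scheme and the resulting owl:sameAs links can be sketched in a few lines of Python. This is a simplified illustration, not the extraction framework: the function names are made up for this example, and the title handling (only replacing spaces with underscores) ignores the encoding rules a real extractor has to apply.

```python
OWL_SAME_AS = "http://www.w3.org/2002/07/owl#sameAs"


def resource_uri(title: str, lang: str = "en") -> str:
    """Build a DBpedia resource URI from a page title and a language code."""
    # The main (English) edition uses dbpedia.org without a language prefix.
    host = "dbpedia.org" if lang == "en" else f"{lang}.dbpedia.org"
    local_name = title.strip().replace(" ", "_")  # simplification
    return f"http://{host}/resource/{local_name}"


def same_as_triple(lang: str, local_title: str, english_title: str) -> str:
    """Return one N-Triples line linking a localized URI to the English one."""
    subject = resource_uri(local_title, lang)
    obj = resource_uri(english_title, "en")
    return f"<{subject}> <{OWL_SAME_AS}> <{obj}> ."
```

For example, resource_uri("Berlin", "ru") yields the URI quoted above, and same_as_triple("ru", "Berlin", "Berlin") relates it to http://dbpedia.org/resource/Berlin.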

Other Changes

Beside the new localized editions, the DBpedia 3.7 release provides the following improvements and changes compared to the last release:

1. Framework

  • Redirects are resolved in a post-processing step, which increases inter-connectivity by 13% (applied for English data sets)
  • Extractor configuration using the dependency injection principle
  • Simple threaded loading of mappings in server
  • Improved international language parsing support thanks to the members of the Internationalization Committee: http://dbpedia.org/Internationalization

2. Bugfixes

  • Encode homepage URLs to conform with N-Triples spec
  • Correct reference parsing
  • Recognize MediaWiki parser functions
  • Raw infobox extraction produces more object properties again
  • skos:related for category links starting with “:” and having an anchor text
  • Restrict objects to Main namespace in MappingExtractor
  • Double rounding (e.g. a person’s height should not be 1800.00000001 cm)
  • Start position in abstract extractor
  • Server can handle template names containing a slash
  • Encoding issues in YAGO dumps

3. Ontology

  • 320 ontology classes
  • 750 object properties
  • 893 datatype properties
  • owl:equivalentClass and owl:equivalentProperty mappings to http://schema.org

Note that the ontology is now a directed acyclic graph. Classes can have multiple superclasses, which was important for the mappings to schema.org. A taxonomy can still be constructed by ignoring all superclasses except the one that is specified first in the list, which is considered the most important.
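
The reduction from the class graph to a taxonomy can be sketched as follows. The class names and their superclass lists are a hypothetical excerpt chosen for illustration, not the actual DBpedia ontology.

```python
from typing import Dict, List, Optional

# Hypothetical excerpt: each class lists its superclasses, most important first.
SUPERCLASSES: Dict[str, List[str]] = {
    "Thing": [],
    "Place": ["Thing"],
    "Organisation": ["Thing"],
    "School": ["Organisation", "Place"],
}


def to_taxonomy(dag: Dict[str, List[str]]) -> Dict[str, Optional[str]]:
    """Keep only the first-listed superclass of every class."""
    return {cls: (parents[0] if parents else None) for cls, parents in dag.items()}


def path_to_root(cls: str, taxonomy: Dict[str, Optional[str]]) -> List[str]:
    """Follow the single remaining superclass link up to the root class."""
    path = [cls]
    while taxonomy[path[-1]] is not None:
        path.append(taxonomy[path[-1]])
    return path
```

In this excerpt, School keeps Organisation as its only parent, so its path to the root is School, Organisation, Thing.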

4. Mappings

  • Dynamic statistics for infobox mappings showing the overall and individual coverage of the mappings in each language: http://mappings.dbpedia.org/index.php/Mapping_Statistics
  • Improved DBpedia Ontology as well as improved Infobox mappings using http://mappings.dbpedia.org/. These improvements are largely due to collective work by the community before and during the DBpedia Mapping Creation Sprint. For English, there are 17.5 million RDF statements based on mappings (13.8 million in version 3.6) (see also http://dbpedia.org/Downloads37#ontologyinfoboxproperties).
  • ConstantProperty mappings to capture information from the template title (e.g. Infobox_Australian_Road {{TemplateMapping | mapToClass = Road | mappings = {{ConstantMapping | ontologyProperty = country | value = Australia }}}})
  • Language specification for string properties in PropertyMappings (e.g. Infobox_japan_station: {{PropertyMapping | templateProperty = name | ontologyProperty = foaf:name | language = ja}} )
  • Multiplication factor in PropertyMappings (e.g. Infobox_GB_station: {{PropertyMapping | templateProperty = usage0910 | ontologyProperty = passengersPerYear | factor = 1000000}}, because it’s always specified in millions)
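
To illustrate the multiplication factor from the last example, here is a minimal Python sketch. The dictionary mirrors the PropertyMapping for Infobox_GB_station shown above; the function name and the raw value "1.234" are hypothetical. Decimal arithmetic is used so that the scaled value does not pick up binary floating-point noise.

```python
from decimal import Decimal
from typing import Dict, Optional, Tuple

# Mirrors the PropertyMapping example above (usage is given in millions).
MAPPING = {
    "templateProperty": "usage0910",
    "ontologyProperty": "passengersPerYear",
    "factor": 1000000,
}


def apply_property_mapping(infobox: Dict[str, str],
                           mapping: dict) -> Optional[Tuple[str, Decimal]]:
    """Return (ontology property, scaled value), or None if the value is absent."""
    raw = infobox.get(mapping["templateProperty"])
    if raw is None:
        return None
    value = Decimal(raw.strip()) * int(mapping.get("factor", 1))
    return mapping["ontologyProperty"], value
```

With a hypothetical infobox value of "1.234", the mapping produces passengersPerYear = 1234000.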

5. RDF Links to External Data Sources

  • New RDF links pointing at resources in the following Linked Data sources: UMBEL, EUNIS, LinkedMDB, GeoSpecies
  • Updated RDF links pointing at resources in the following Linked Data sources: Freebase, WordNet, Opencyc, New York Times, Drugbank, Diseasome, Flickrwrapper, Sider, Factbook, DBLP, Eurostat, Dailymed, Revyu

Accessing the new DBpedia Release

You can download the new DBpedia dataset from http://dbpedia.org/Downloads37.

As usual, the dataset is also available as Linked Data and via the DBpedia SPARQL endpoint (http://dbpedia.org/sparql).

Credits

Lots of thanks to

  • All editors that contributed to the DBpedia ontology mappings via the Mappings Wiki.
  • Max Jakob (Freie Universität Berlin, Germany) for improving the DBpedia extraction framework and for extracting the new datasets.
  • Dimitris Kontokostas (Aristotle University of Thessaloniki, Greece) for providing language generalizations to the extraction framework.
  • Paul Kreis (Freie Universität Berlin, Germany) for administering the ontology and for delivering the mapping statistics and schema.org mappings.
  • Uli Zellbeck (Freie Universität Berlin, Germany) for providing the links to external datasets using the Silk framework.
  • The whole Internationalization Committee for expanding some DBpedia extractors to a number of languages:
    http://dbpedia.org/Internationalization.
  • Kingsley Idehen and Mitko Iliev (both OpenLink Software) for loading the dataset into the Virtuoso instance that serves the Linked Data view and SPARQL endpoint. OpenLink Software (http://www.openlinksw.com/) altogether for providing the server infrastructure for DBpedia.

The work on the new release was financially supported by:

  • The European Commission through the project LOD2 – Creating Knowledge out of Linked Data (http://lod2.eu/, improvements to the extraction framework).
  • The European Commission through the project LATC – LOD Around the Clock (http://latc-project.eu/, creation of external RDF links).
  • Vulcan Inc. as part of its Project Halo (http://www.projecthalo.com/).

More information about DBpedia is found at http://dbpedia.org/About

Have fun with the new data set!

Cheers,

Chris Bizer

Links to DBpedia from Ontos NLP web services

The NLP specialist Ontos extends the quality and amount of information for developers by integrating its news portal into the Linked Data Cloud. Ontos’ GUIDs for objects are now dereferenceable; the resulting RDF contains owl:sameAs links to DBpedia, Freebase and others (see e.g. the entry for Barack Obama).

Within the news portal, Ontos crawls news articles from diverse online sources, uses its NLP technology to extract facts (objects and relations between them), merges this information with existing data and stores it together with references to the original news article, all fully automatically. Facts from Ontos’ portal are accessible via a RESTful HTTP API. Fetching data is free; to receive an API key, developers have to register (e-mail address only!) at Ontos’ homepage.

For humans, Ontos provides a search interface at http://www.ontosearch.com. It allows users to look up objects in the database and view the respective summaries in HTML or RDF.

Please note that the generated RDF currently contains only a small part of the existing information (e.g. no article references yet). Ontos will extend the respective content step by step.

3sat TV magazine features Linked Data and DBpedia

The 3sat computer magazine ‘neues’ has broadcast a feature about Linked Data and DBpedia and the roles both efforts are playing in the evolution of the Web into a medium for the publication and linkage of data.

See:

Background information:

DBpedia version 3.2 released including the new DBpedia Ontology

We are happy to announce the release of DBpedia version 3.2.

The new knowledge base has been extracted from the October 2008 Wikipedia dumps. Compared to the last release, the new knowledge base provides three major improvements:

1. DBpedia Ontology

DBpedia now features a shallow, cross-domain ontology, which has been manually created based on the most commonly used infoboxes within Wikipedia. The ontology currently covers over 170 classes which form a subsumption hierarchy and have 940 properties. The ontology is instantiated by a new infobox data extraction method which is based on hand-generated mappings of Wikipedia infoboxes to the DBpedia ontology. The mappings define fine-grained rules on how to parse infobox values. The mappings also compensate for weaknesses in the Wikipedia infobox system, like having different infoboxes for the same class (currently 350 Wikipedia templates are mapped to 170 ontology classes), using different property names for the same property (currently 2350 template properties are mapped to 940 ontology properties), and not having clearly defined datatypes for properties. Therefore, the instance data within the infobox ontology is much cleaner and better structured than the infobox data within the DBpedia infobox dataset, which is generated using the old infobox extraction code. The DBpedia Ontology currently contains about 882,000 instances.
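
The idea behind the mappings can be sketched in Python: several templates map to one class, and several template property names map to one ontology property. The template names, property names and the function below are hypothetical and only illustrate the principle; the real mappings also define parsing rules and datatypes.

```python
from typing import Dict, Tuple

# Hypothetical mappings: two templates describe the same ontology class ...
TEMPLATE_TO_CLASS = {"Infobox_film": "Film", "Infobox_movie": "Film"}

# ... and different template property names denote the same ontology property.
PROPERTY_TO_ONTOLOGY = {
    "director": "director",
    "directed_by": "director",
    "runtime": "runtime",
    "length": "runtime",
}


def map_infobox(template: str,
                properties: Dict[str, str]) -> Tuple[str, Dict[str, str]]:
    """Map a template instance to an ontology class and ontology properties."""
    ontology_class = TEMPLATE_TO_CLASS[template]
    mapped = {
        PROPERTY_TO_ONTOLOGY[name]: value
        for name, value in properties.items()
        if name in PROPERTY_TO_ONTOLOGY  # unmapped properties are dropped
    }
    return ontology_class, mapped
```

Both hypothetical templates end up as instances of Film with a uniform director property, regardless of which property name the infobox used.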

More information about the ontology is found at http://wiki.dbpedia.org/Ontology

2. RDF Links to Freebase

Freebase is an open-license database which provides data about millions of things from various domains. Freebase has recently released a Linked Data interface to its content. As there is a big overlap between DBpedia and Freebase, we have added 2.4 million RDF links to DBpedia pointing at the corresponding things in Freebase. These links can be used to smush and fuse data about a thing from DBpedia and Freebase.

For more information about the Freebase links see
http://blog.dbpedia.org/2008/11/15/dbpedia-is-now-interlinked-with-freebase-links-to-opencyc-updated/

3. Cleaner Abstracts

In the old DBpedia dataset, the abstracts for different languages sometimes contained Wikipedia markup and other strange characters. For the 3.2 release, we have improved DBpedia’s abstract extraction code, which results in much cleaner abstracts that can safely be displayed in user interfaces.

Access the new DBpedia knowledge base 

The new DBpedia release can be downloaded from:

http://wiki.dbpedia.org/Downloads32

and is also available via the DBpedia SPARQL endpoint at

http://dbpedia.org/sparql

and via DBpedia’s Linked Data interface. Example URIs:

http://dbpedia.org/resource/Berlin
http://dbpedia.org/page/Oliver_Stone
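
As a rough sketch of how a client could ask the Linked Data interface for RDF instead of HTML, the following Python snippet prepares an HTTP request with an Accept header. The function name and the default media type are assumptions for this example; which formats the server actually offers is not stated here.

```python
from urllib.request import Request


def linked_data_request(resource_uri: str,
                        media_type: str = "application/rdf+xml") -> Request:
    """Prepare (but do not send) a GET request that asks for an RDF format."""
    return Request(resource_uri, headers={"Accept": media_type})
```

Passing the prepared request to urllib.request.urlopen would perform the lookup and follow any redirect the server issues.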

Lots of thanks to everybody who contributed to the DBpedia 3.2 release!

Especially:

1. Georgi Kobilarov (Freie Universität Berlin) who designed and implemented the new infobox extraction framework.
2. Anja Jentzsch (Freie Universität Berlin) who contributed to implementing the new extraction framework and wrote the infobox to ontology class mappings.
3. Paul Kreis (Freie Universität Berlin) who improved the datatype extraction code.
4. Andreas Schultz (Freie Universität Berlin) for generating the Freebase to DBpedia RDF links.
5. Everybody at OpenLink Software for hosting DBpedia on a Virtuoso server and for providing the statistics about the new DBpedia knowledge base.

Have fun with the new DBpedia knowledge base!

DBpedia is now interlinked with Freebase. Links to OpenCyc updated.

Freebase is an open-license database which provides data about millions of things from various domains. Freebase has recently released a Linked Data interface to its content (see release note). As there is a big overlap between DBpedia and Freebase, we have added 2.4 million RDF links to DBpedia pointing at the corresponding things in Freebase. These links can be used to smush and fuse data about a thing from DBpedia and Freebase. For instance, you can use the Marbles Linked Data browser to view data about the Lord of the Rings from Freebase and DBpedia smushed together.

We have also updated the RDF links to OpenCyc, which allow you to use DBpedia instance data together with conceptual knowledge of OpenCyc.

Example Freebase Link

http://dbpedia.org/resource/Woody_Allen owl:sameAs  http://rdf.freebase.com/ns/guid.9202a8c04000641f800000000004064f

Example OpenCyc Link

http://dbpedia.org/resource/Tetris owl:sameAs http://sw.opencyc.org/2008/06/10/concept/Mx4rv9-ZUpwpEbGdrcN5Y29ycA
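
Smushing along owl:sameAs links can be sketched with a small union-find structure in Python. This is an illustration of the general technique, not the way Marbles or any other tool implements it; the function name and the shape of the input data are assumptions.

```python
from collections import defaultdict
from typing import Dict, Iterable, Set, Tuple

Fact = Tuple[str, str, str]  # (subject URI, property, value)


def smush(same_as_links: Iterable[Tuple[str, str]],
          facts: Iterable[Fact]) -> Dict[str, Set[Tuple[str, str]]]:
    """Merge the facts of all URIs that are connected by owl:sameAs links."""
    parent: Dict[str, str] = {}

    def find(uri: str) -> str:
        # Return the representative URI of the equivalence class of `uri`.
        parent.setdefault(uri, uri)
        while parent[uri] != uri:
            parent[uri] = parent[parent[uri]]  # path halving
            uri = parent[uri]
        return uri

    for left, right in same_as_links:
        parent[find(right)] = find(left)  # the left-hand URI stays representative

    merged: Dict[str, Set[Tuple[str, str]]] = defaultdict(set)
    for subject, prop, value in facts:
        merged[find(subject)].add((prop, value))
    return dict(merged)
```

Given the Woody Allen link above and facts stated about either URI, the result contains a single entry under the DBpedia URI that holds the facts from both sources.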

The links are available via the DBpedia Linked Data interface and via SPARQL endpoint and can also be downloaded as single files:

DBpedia Mobile won the 2nd prize of the Semantic Web Challenge 2008

We are happy to announce that DBpedia Mobile has won the 2nd prize of the Semantic Web Challenge at the 7th International Semantic Web Conference.

DBpedia Mobile is a location-aware client for the Semantic Web that can be used on an iPhone and other mobile devices. Based on the current GPS position of a mobile device, DBpedia Mobile renders a map indicating nearby locations from the DBpedia dataset. Starting from this map, the user can explore background information about their surroundings by navigating along data links into other Web data sources. DBpedia Mobile has been designed for the use case of a tourist exploring a city. As the application is not restricted to a fixed set of data sources but can retrieve and display data from arbitrary Web data sources, DBpedia Mobile can also be employed within other use cases, including ones unforeseen by its developers. Besides accessing Web data, DBpedia Mobile also enables users to publish their current location, pictures and reviews to the Semantic Web so that they can be used by other Semantic Web applications. Instead of simply being tagged with geographical coordinates, published content is interlinked with a nearby DBpedia resource and thus contributes to the overall richness of the Geospatial Semantic Web.

For more information about DBpedia Mobile please refer to:

DBpedia Mobile released.

Freie Universität Berlin has released DBpedia Mobile.  Based on the current GPS position of a mobile device, DBpedia Mobile renders a map containing information about nearby locations from the DBpedia dataset (currently around 300,000 locations). DBpedia Mobile uses the Marbles Linked Data Browser to render Fresnel-based views for selected resources, as well as its SPARQL capabilities to build the map view. Starting from the map, users can explore background information about locations and can navigate into DBpedia and other interlinked datasets such as GeoNames, Revyu, EuroStat and Flickr.
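
Selecting nearby locations for a GPS position can be illustrated with a short Python sketch. DBpedia Mobile itself builds its map view through SPARQL, as described above; the haversine filter below is a swapped-in, client-side stand-in for that step, and the function names and Earth radius constant are assumptions for this example.

```python
import math
from typing import Dict, List, Tuple

EARTH_RADIUS_KM = 6371.0  # mean Earth radius, an approximation


def haversine_km(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance in kilometres between two points in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    d_phi = phi2 - phi1
    d_lambda = math.radians(lon2 - lon1)
    a = (math.sin(d_phi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(d_lambda / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def nearby(position: Tuple[float, float],
           locations: Dict[str, Tuple[float, float]],
           radius_km: float) -> List[str]:
    """Return the names of all locations within radius_km of the position."""
    lat, lon = position
    return [
        name
        for name, (loc_lat, loc_lon) in locations.items()
        if haversine_km(lat, lon, loc_lat, loc_lon) <= radius_km
    ]
```

A linear scan like this is fine for a handful of points; for a dataset of around 300,000 locations, a spatial index or a bounding-box query on the server side would be the more realistic choice.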

More information about DBpedia Mobile is found on the project website.

DBpedia-Cyc linkage

The commonsense knowledge base Cyc, or OpenCyc, seems to follow a rather top-down approach when compared to DBpedia: more abstract concepts and entities were represented first, and only later did Cyc start to include more domain knowledge. This seems reasonable, since domain knowledge changes faster and there is much more of it. On the other hand, domain knowledge is usually what people need to solve real problems within their domains. DBpedia contains primarily domain knowledge, hence a combination of both, Cyc and DBpedia, could really be a winning team.

Together with the committed OpenCyc community we produced a first DBpedia-Cyc linkage, which is now available as a DBpedia dataset from the downloads section. The dataset will soon also be loaded into the DBpedia SPARQL endpoint and made available as linked data. More information about the linkage can be found at: http://wiki.dbpedia.org/OpenCyc