Monthly Archives: November 2009

German government proclaims Faceted Wikipedia/DBpedia Search one of the 365 most innovative ideas in Germany

The German federal government has proclaimed Faceted Wikipedia Search as one of the 365 most innovative ideas in Germany in the context of the Deutschland – Land der Ideen competition. The competition showcases innovative ideas in areas such as science and technology, business, education, art and ecology. The patron of the competition is the German President Horst Köhler.

Faceted Wikipedia/DBpedia Search allows users to pose complex queries against Wikipedia, such as “Which rivers flow into the Rhine and are longer than 50 kilometers?” or “Which skyscrapers in China have more than 50 floors and were constructed before the year 2000?”. Unlike the results of search engines such as Google or Yahoo, the answers are not produced by keyword matching; they are generated from structured information that has been extracted from many different Wikipedia articles. Faceted Wikipedia/DBpedia Search thus lets users query Wikipedia like a structured database and truly exploit Wikipedia’s collective intelligence.
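
To make the contrast with keyword matching concrete, the following minimal sketch answers the river question over a handful of structured records. It is only an illustration: the `River` record, its field names, and the sample rows are hypothetical placeholders, not DBpedia data and not the actual search implementation.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class River:
    """Hypothetical structured record, standing in for extracted facts."""
    name: str
    flows_into: str
    length_km: float


# Placeholder rows for illustration only; the values are not real measurements.
RIVERS = [
    River("River A", "Rhine", 120.0),
    River("River B", "Rhine", 35.0),
    River("River C", "Danube", 300.0),
]


def tributaries_longer_than(rivers, mouth, min_length_km):
    """Return names of rivers that flow into `mouth` and exceed `min_length_km`."""
    return [
        r.name
        for r in rivers
        if r.flows_into == mouth and r.length_km > min_length_km
    ]


print(tributaries_longer_than(RIVERS, "Rhine", 50))
```

Because each condition is evaluated against a typed field rather than against words in the article text, constraints such as "longer than 50 kilometers" can be combined freely.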

Faceted Wikipedia/DBpedia Search can be tested online at http://dbpedia.neofonie.de/browse/

Faceted Wikipedia/DBpedia Search has been jointly developed by neofonie GmbH, Berlin and the Web-based Systems Group at Freie Universität Berlin. Technically, Faceted Wikipedia/DBpedia Search is based on the DBpedia data extraction framework and neofonie search technology.

The DBpedia data extraction framework extracts structured data from Wikipedia, such as the content of infoboxes, which summarize relevant facts in a table on the top right-hand side of Wikipedia articles. The extracted data is represented using the Resource Description Framework (RDF), a data model for web-based systems. Currently, the framework extracts around 190 million facts from the English edition of Wikipedia and 289 million facts from Wikipedia editions in 90 further languages. The DBpedia data extraction framework is developed by the Web-based Systems Group at Freie Universität Berlin and the Agile Knowledge Engineering and Semantic Web group at Universität Leipzig.
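
As a rough illustration of what turning an infobox into RDF facts involves, the sketch below converts flat `| key = value` infobox lines into subject/predicate/object triples. It is not the extraction framework's code: the sample wikitext, the regular expression, and the function name `infobox_to_triples` are assumptions made for this example, and real infoboxes contain nested templates and links that this sketch ignores.

```python
import re

RESOURCE_NS = "http://dbpedia.org/resource/"
PROPERTY_NS = "http://dbpedia.org/property/"

# Simplified, made-up infobox markup used only for this example.
SAMPLE_WIKITEXT = """{{Infobox river
| name = Example River
| length_km = 120
| mouth = Rhine
}}"""

# Matches one flat "| key = value" line; nested markup is not handled.
FIELD_PATTERN = re.compile(r"^\|\s*(\w+)\s*=\s*(.+?)\s*$", re.MULTILINE)


def infobox_to_triples(article_title, wikitext):
    """Turn flat infobox fields into (subject, predicate, object) triples."""
    subject = RESOURCE_NS + article_title.replace(" ", "_")
    return [
        (subject, PROPERTY_NS + key, value)
        for key, value in FIELD_PATTERN.findall(wikitext)
    ]


for triple in infobox_to_triples("Example River", SAMPLE_WIKITEXT):
    print(triple)
```

Each infobox field becomes one triple whose subject identifies the article, which is what allows facts from many articles to be queried together.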

The neofonie search engine, neofonie search, is employed to execute complex queries over the extracted data. neofonie search aggregates RDF data from DBpedia with full-text data from Wikipedia. The aggregated data is then divided into hierarchical facets, composed of 200 types with 2.9 million values. In addition to providing the search technology and processing power, neofonie is also responsible for the hosting of the Faceted Wikipedia/DBpedia Search on the Amazon Elastic Compute Cloud (Amazon EC2).
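
The idea of dividing data into facets can be sketched in a few lines. The example below counts facet values over a small result set and narrows the set when a value is selected. It is a toy model under stated assumptions: the items, facet names, and the helpers `facet_counts` and `narrow` are hypothetical, the facets here are flat rather than hierarchical, and nothing in it reflects how neofonie search is implemented.

```python
from collections import Counter

# Hypothetical items; each maps a facet name to a value.
ITEMS = [
    {"type": "Skyscraper", "country": "China", "floors": 88},
    {"type": "Skyscraper", "country": "China", "floors": 40},
    {"type": "Skyscraper", "country": "Germany", "floors": 56},
    {"type": "River", "country": "Germany"},
]


def facet_counts(items, facet):
    """Count how many items carry each value of the given facet."""
    return Counter(item[facet] for item in items if facet in item)


def narrow(items, facet, value):
    """Keep only the items whose facet equals the selected value."""
    return [item for item in items if item.get(facet) == value]


skyscrapers = narrow(ITEMS, "type", "Skyscraper")
print(facet_counts(ITEMS, "type"))
print(facet_counts(skyscrapers, "country"))
```

Selecting a facet value shrinks the result set, and the counts for the remaining facets are then recomputed over the smaller set.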

As DBpedia covers a wide range of domains and has a high degree of conceptual overlap with various other open-license datasets, an increasing number of data publishers have started to set data-level links from their data sources to DBpedia, making DBpedia one of the crystallization points of the emerging Web of Linked Data. In the future, the links between databases will allow applications like Faceted Wikipedia Search to answer queries based not only on Wikipedia knowledge but also on knowledge from a world wide web of databases.

Faceted Wikipedia Search will be presented as part of the Land der Ideen series on April 12th, 2010 at neofonie, Berlin.

DBpedia 3.4 released

We are happy to announce the release of DBpedia 3.4. The new release is based on Wikipedia dumps dating from September 2009.

The new DBpedia data set describes more than 2.9 million things, including 282,000 persons, 339,000 places, 88,000 music albums, 44,000 films, 15,000 video games, 119,000 organizations, 130,000 species and 4,400 diseases. The DBpedia data set now features labels and abstracts for these things in 91 different languages; 807,000 links to images and 3,840,000 links to external web pages; 4,878,100 external links into other RDF datasets; 415,000 Wikipedia categories; and 75,000 YAGO categories. The data set consists of 479 million pieces of information (RDF triples), of which 190 million were extracted from the English edition of Wikipedia and 289 million were extracted from other language editions.

The new release provides the following improvements and changes compared to the DBpedia 3.3 release:

  1. the data set has been extracted from more recent Wikipedia dumps.
  2. the data set now provides labels, abstracts and infobox data in 91 different languages.
  3. we provide two different versions of the DBpedia Infobox Ontology (loose and strict) in order to meet different application requirements. Please refer to http://wiki.dbpedia.org/Datasets#h18-11 for details.
  4. as Wikipedia has moved to dual-licensing, we also dual-license DBpedia. The DBpedia 3.4 data set is licensed under the terms of the Creative Commons Attribution-ShareAlike 3.0 license and the GNU Free Documentation License.
  5. the mapping-based infobox data extractor has been improved and now normalizes units of measurement.
  6. various bug fixes and improvements throughout the code base. Please refer to the change log for the complete list: http://wiki.dbpedia.org/Changelog
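
To illustrate what normalizing units of measurement (item 5 above) can mean in practice, here is a minimal sketch that converts length strings to metres. The release notes do not describe the extractor's actual rules, so the choice of metres as the target unit, the supported units, and the function name `normalize_length` are all assumptions made for this example.

```python
import re

# Conversion factors to metres. The mile and foot factors are exact by definition.
TO_METRES = {
    "m": 1.0,
    "km": 1000.0,
    "mi": 1609.344,
    "ft": 0.3048,
}

# A number followed by a unit abbreviation, for example "50 km".
VALUE_PATTERN = re.compile(r"^\s*([0-9]+(?:\.[0-9]+)?)\s*([a-zA-Z]+)\s*$")


def normalize_length(raw):
    """Parse strings such as '50 km' and return the length in metres.

    Returns None when the text or the unit is not recognized.
    """
    match = VALUE_PATTERN.match(raw)
    if match is None:
        return None
    number, unit = match.groups()
    factor = TO_METRES.get(unit)
    if factor is None:
        return None
    return float(number) * factor


print(normalize_length("50 km"))
print(normalize_length("2.5 mi"))
```

Storing every length in the same unit is what makes numeric comparisons across articles meaningful, regardless of the unit an infobox author chose.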

You can download the new DBpedia dataset from http://wiki.dbpedia.org/Downloads34. As usual, the dataset is also available as Linked Data and via the DBpedia SPARQL endpoint at http://dbpedia.org/sparql.
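
For readers who want to try the SPARQL endpoint from a script, the following sketch builds a standard SPARQL protocol GET request using only the Python standard library. The query and the resource URI `http://dbpedia.org/resource/Berlin` are example choices made here, not taken from the release notes, and the helper names `build_request` and `run_query` are hypothetical.

```python
import json
import urllib.parse
import urllib.request

ENDPOINT = "http://dbpedia.org/sparql"

# Example query (an assumption for this sketch): labels of one resource.
QUERY = """
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
SELECT ?label WHERE {
  <http://dbpedia.org/resource/Berlin> rdfs:label ?label .
}
LIMIT 5
"""


def build_request(query, endpoint=ENDPOINT):
    """Build a SPARQL protocol GET request that asks for JSON results."""
    url = endpoint + "?" + urllib.parse.urlencode({"query": query})
    return urllib.request.Request(
        url, headers={"Accept": "application/sparql-results+json"}
    )


def run_query(query, endpoint=ENDPOINT):
    """Send the query and return the decoded JSON result document.

    Requires network access; the endpoint's availability is not guaranteed.
    """
    with urllib.request.urlopen(build_request(query, endpoint), timeout=30) as resp:
        return json.load(resp)


# Building the request does not touch the network.
print(build_request(QUERY).full_url)
```

Calling `run_query(QUERY)` sends the request; in the SPARQL JSON results format the rows are found under `results["results"]["bindings"]`.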

Lots of thanks to

  • Anja Jentzsch, Christopher Sahnwaldt, Robert Isele, and Paul Kreis (all Freie Universität Berlin) for improving the DBpedia extraction framework and for extracting the new data set.
  • Jens Lehmann and Sören Auer (Universität Leipzig) for providing the new data set via the DBpedia download server at Universität Leipzig.
  • Kingsley Idehen and Mitko Iliev for loading the new data set into the Virtuoso instance that serves the Linked Data view and SPARQL endpoint, and OpenLink Software (http://www.openlinksw.com/) as a whole for providing the server infrastructure for DBpedia.
  • Neofonie GmbH, Berlin, (http://www.neofonie.de/index.jsp) for supporting the DBpedia project by paying Christopher Sahnwaldt.

The next steps for the DBpedia project will be to

  1. synchronize Wikipedia and DBpedia by deploying the DBpedia live extraction, which updates the DBpedia knowledge base immediately when a Wikipedia article changes.
  2. enable the DBpedia user community to edit and maintain the DBpedia ontology and the infobox mappings that are used by the extraction framework in a public Wiki.
  3. increase the quality of the extracted data by improving and fine-tuning the extraction code.

All this will hopefully happen soon.

Have fun with the new data set!

Cheers

Chris Bizer