In cooperation with Thomas Riechert (HTWK/InfAI), the DBpedia Association organized our second DBpedia meetup this year, this time in Lyon. On July 3rd, 2018, we met the French DBpedia Community at the ENS in person and presented the vision of the new DBpedia Databus, an opportunity which simplifies the work with data.
First and foremost, we would like to thank the Institute for Applied Informatics for supporting our community and the LARHRA Laboratory as well as the ENS for hosting our community meetup. Special thanks go to Thomas Riechert and Vincent Alamercery (LARHRA Lyon) for organizing the event.
In the following months, Elmahdi plans to work on the DBpedia historic live version and the DBpedia wiki commons. His research will be presented during our 12th DBpedia Community meeting on September 10th, in Vienna.
Following Elmahdi, Francesco Beretta presented LARHRA laboratory and its different research areas. In particular, he introduced the Data for History Consortium which is an international consortium founded in 2017 with the aim of improving geo-historical data interoperability in the semantic web.
The afternoon track started out with an inspiring presentation by Adam Sanchez from the University of Grenoble. He talked about ‘RDFization of a relational database from medicine domain using Ontop’ (slides) and introduced the Ontop mappings. Afterwards, Oscar Rodríguez Rocha (University of Côte d’Azur) showcased the application ‘Automatic Generation Educational Quizzes’ from DBpedia (slides) and explained how the automatic generation of quizzes works based on the game Les Incollables.
The meeting concluded with a dynamic discussion on the DBpedia Databus and potential collaborations between the DBpedia Association and the French DBpedia Chapter.
All slides and presentations are available on our Website. You can find more feedback and photos about the event on Twitter via #DBpediaLyon.
You still can’t get enough of DBpedia?
Don’t worry, we already have another meeting of the DBpedia community in the pipeline. Our 12th DBpedia Community meeting is scheduled for September 10th, and preparations for the program are already in full swing. Our DBpedia Day will kick off this year’s edition of SEMANTiCS 2018, hosted at TU Vienna, and bring the European DBpedia community together.
You want to contribute? Please submit your proposal and be a part of our amazing program. Register here and meet us and other DBpedia enthusiasts in Vienna. We are looking forward to your contribution.
Working with data is hard and repetitive. That is why we are more than happy to announce the launch of the alpha version of our DBpedia Databus, a way that simplifies working with data.
We have studied the data network for 10 years now, and we conclude that organizations with open data are struggling to work together properly. Even though they could and should collaborate, they are hindered by technical and organizational barriers. They duplicate work on the same data. On the other hand, companies selling data cannot do so in a scalable way. The consumers are left empty-handed, trapped between the choice of inferior open data or buying from a jungle-like market.
We need to rethink the incentives for linking data
We envision a hub where everybody uploads data. In that hub, useful operations like versioning, cleaning, transformation, mapping, linking, merging and hosting are done automagically on a central communication system, the bus, and then dispersed again in a decentralized network to the consumers and applications. On the Databus, data flows from data producers through the platform to the consumers (left to right), while any errors or feedback flow in the opposite direction and reach the data source, providing a continuous integration service and improving the data at the source.
The DBpedia Databus is a platform that allows exchanging, curating and accessing data between multiple stakeholders. Any data entering the bus will be versioned, cleaned, mapped, linked and its licenses and provenance tracked. Hosting in multiple formats will be provided to access the data either as dump download or as API.
Publishing data on the Databus means connecting and comparing your data to the network
If you are grinding your teeth about how to publish data on the web, you can just use the Databus to do so. Data loaded on the bus will be highly visible, available and queryable. You should think of it as a service:
Visibility guarantees that your citations and reputation go up.
Besides a web download, we can also provide a Linked Data interface, SPARQL-endpoint, Lookup (autocomplete) or other means of availability (like AWS or Docker images).
Any distribution we are doing will funnel feedback and collaboration opportunities your way to improve your dataset and your internal data quality.
You will receive an enriched dataset, which is connected and complemented with any other available data (see the same folder names in data and fusion folders).
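As a rough illustration of what querying through a SPARQL endpoint looks like from the consumer side, here is a minimal Python sketch that builds a SPARQL-protocol GET request without sending it. It assumes the public DBpedia endpoint at https://dbpedia.org/sparql as a stand-in; a dataset hosted via the Databus may be served from a different URL. The example query and the helper name build_sparql_request are made up for illustration.

```python
from urllib.parse import urlencode
from urllib.request import Request

# Assumption: the public DBpedia endpoint is used as a stand-in here;
# a Databus-hosted dataset may live behind a different endpoint URL.
ENDPOINT = "https://dbpedia.org/sparql"

# Illustrative query: fetch one English label for a DBpedia resource.
QUERY = """
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
SELECT ?label WHERE {
  <http://dbpedia.org/resource/Lyon> rdfs:label ?label .
  FILTER (lang(?label) = "en")
} LIMIT 1
"""


def build_sparql_request(endpoint: str, query: str) -> Request:
    """Build (but do not send) a SPARQL-protocol GET request."""
    # The SPARQL protocol passes the query text in the "query" parameter.
    url = endpoint + "?" + urlencode({"query": query})
    # Ask for the standard JSON results format via content negotiation.
    return Request(url, headers={"Accept": "application/sparql-results+json"})


request = build_sparql_request(ENDPOINT, QUERY)
print(request.full_url)
```

Passing the request object to urllib.request.urlopen would send it; that step is left out so the sketch runs without network access.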
How it works at the moment
Integration of data is easy with the Databus. We have been integrating and loading additional datasets alongside DBpedia for the world to query. Popular datasets are ICD10 (medical data) and organizations and persons. We are still in an initial state, but we already loaded 10 datasets (6 from DBpedia, 4 external) on the bus using these phases:
Acquisition: data is downloaded from the source and logged in.
Conversion: data is converted to N-Triples and cleaned (Syntax parsing, datatype validation, and SHACL).
Mapping: the vocabulary is mapped on the DBpedia Ontology and converted (We have been doing this for Wikipedia’s Infoboxes and Wikidata, but now we do it for other datasets as well).
Linking: Links are mainly collected from the sources, cleaned and enriched.
IDying: All entities found are given a new Databus ID for tracking.
Clustering: IDs are merged into clusters using one of the Databus IDs as cluster representative.
Data Comparison: Each dataset is compared with all other datasets. We have an algorithm that decides on the best value, but the main goal here is transparency, i.e. to see which data value was chosen and how it compares to the other sources.
A main knowledge graph fused from all the sources, i.e. a transparent aggregate.
For each source, we are producing a local fused version called the “Databus Complement”. This is a major feedback mechanism for all data providers, where they can see what data they are missing, what data differs in other sources and what links are available for their IDs.
You can compare all data via a web service.
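To make the clustering and data comparison phases more concrete, here is a small Python sketch under stated assumptions. It is not the Databus implementation: the IDs, sources and values are invented, the cluster representative is simply the lexicographically smallest ID, and the "best value" is picked by a plain majority vote, which stands in for the actual decision algorithm that is not described here. What it tries to show is the transparency idea: every candidate value is kept next to the chosen one.

```python
from collections import Counter, defaultdict


def cluster_ids(links):
    """Group IDs connected by links; return {id: cluster representative}."""
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving keeps trees shallow
            x = parent[x]
        return x

    for a, b in links:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            # Assumption: the smaller ID becomes the representative,
            # an arbitrary but deterministic choice for this sketch.
            keep, drop = sorted((root_a, root_b))
            parent[drop] = keep
    return {x: find(x) for x in list(parent)}


def fuse(facts, representative):
    """Fuse (source, entity_id, property, value) facts per cluster."""
    candidates = defaultdict(list)
    for source, entity, prop, value in facts:
        cluster = representative.get(entity, entity)
        candidates[(cluster, prop)].append((source, value))
    fused = {}
    for key, sourced_values in candidates.items():
        # Assumption: majority vote stands in for the real decision logic.
        counts = Counter(value for _, value in sourced_values)
        chosen, _ = counts.most_common(1)[0]
        # Keep all candidates so the choice stays transparent.
        fused[key] = {"chosen": chosen, "candidates": sourced_values}
    return fused


# Hypothetical links and facts, invented for illustration only.
links = [("id:b", "id:a"), ("id:c", "id:b"), ("id:x", "id:y")]
facts = [
    ("source-1", "id:a", "height", "10"),
    ("source-2", "id:b", "height", "12"),
    ("source-3", "id:c", "height", "12"),
]

representative = cluster_ids(links)
fused = fuse(facts, representative)
print(fused[("id:a", "height")])
```

A per-source view in the spirit of the “Databus Complement” could be derived from the same structure by listing, for each source, the candidates that differ from its own value.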
Contact us via email@example.com if you would like to have additional datasets integrated and maintained alongside DBpedia.
From your point of view
If you are selling data, the Databus provides numerous opportunities for you. You can link your offering to the open entities in the Databus. This allows consumers to discover your services better by showing them with each request.
Open data on the Databus will be a commodity. We are greatly reducing the cost of understanding the data, retrieving and reformatting it. We are constantly extending ways of using the data and are willing to implement any formats and APIs you need. If you are lacking a certain kind of data, we can also scout for it and load it onto the Databus.
Is it free?
Maintaining the Databus is a lot of work, and the servers incur a high cost. As a rule of thumb, we are providing everything for free that we can afford to provide for free. DBpedia was providing everything for free in the past, but this is not a healthy model, as we can neither maintain quality properly nor grow.
On the Databus everything is provided “As is” without any guarantees or warranty. Improvements can be done by the volunteer community. The DBpedia Association will provide a business interface to allow guarantees, major improvements, stable maintenance, and hosting.
Final databases are licensed under ODC-By. This covers our work on recomposition of data. Each fact is individually licensed, e.g. Wikipedia abstracts are CC-BY-SA, some are CC-BY-NC, some are copyrighted. This means that data is available for research, informational and educational purposes. We recommend contacting us for any professional use of the data (clearing) so we can guarantee that legal matters are handled correctly. Otherwise, professional use is at your own risk.
The Databus data is available at http://downloads.dbpedia.org/databus/ ordered into three main folders:
Data: the data that is loaded on the Databus at the moment
Global: a folder that contains provenance data and the mappings to the new IDs
Fusion: the output of the Databus
Most notably you can find:
Provenance mapping of the new ids in global/persistence-core/cluster-iri-provenance-ntriples/<http://downloads.dbpedia.org/databus/global/persistence-core/cluster-iri-provenance-ntriples/> and global/persistence-core/global-ids-ntriples/<http://downloads.dbpedia.org/databus/global/persistence-core/global-ids-ntriples/>
The final fused version for the core: fusion/core/fused/<http://downloads.dbpedia.org/databus/fusion/core/fused/>
A detailed JSON-LD file for data comparison: fusion/core/json/<http://downloads.dbpedia.org/databus/fusion/core/json/>
Complements, i.e. the enriched Dutch DBpedia Version: fusion/core/nl.dbpedia.org/<http://downloads.dbpedia.org/databus/fusion/core/nl.dbpedia.org/>
(Note that the file and folder structure are still subject to change)
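For scripts that fetch these folders, a small lookup table can keep the paths in one place. The following Python sketch only assembles the URLs listed above; the dictionary keys and the helper name folder_url are hypothetical labels chosen for this example, and since the folder structure is subject to change, the paths may need updating.

```python
from urllib.parse import urljoin

# Base URL and relative paths as listed in this post.
BASE = "http://downloads.dbpedia.org/databus/"

# The keys are hypothetical labels chosen for this sketch.
FOLDERS = {
    "cluster_provenance": "global/persistence-core/cluster-iri-provenance-ntriples/",
    "global_ids": "global/persistence-core/global-ids-ntriples/",
    "fused_core": "fusion/core/fused/",
    "comparison_json": "fusion/core/json/",
    "complement_nl": "fusion/core/nl.dbpedia.org/",
}


def folder_url(name: str) -> str:
    """Return the absolute URL of one of the folders named above."""
    return urljoin(BASE, FOLDERS[name])


for name in FOLDERS:
    print(name, folder_url(name))
```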
Only 8 days left to reserve your seat for our 3rd US DBpedia Community Meeting. We are happy to announce that the 11th DBpedia Meeting will be held in Cupertino, California on October 12th 2017, hosted by Apple Inc.
The meetup focuses on connecting the community interested in DBpedia and Knowledge Graphs in general, and has included lightning talks by distinguished speakers (e.g. from Stanford, Google, IBM Watson, Netflix, LinkedIn, Wikimedia Foundation, Nuance, etc.). Talk topics have also extended to natural language processing, knowledge representation, information extraction, integration and retrieval, graph databases, knowledge base embeddings and machine learning.
We are looking forward to meeting again in person with the US-based DBpedia Community.
After the success of the last two community meetings in Sunnyvale and in Galway, we thought it was time to go Orange again. During the SEMANTiCS 2017 in Amsterdam, Sep 11-14, the DBpedia Community met on the 14th of September. First and foremost, we would like to thank the Institute for Applied Informatics for supporting our community and many thanks to the Meervaart Theatre and the SEMANTiCS for hosting our community meeting.
During the opening session, Chris Welty, Google Researcher, presented Even the Changes Are Changing: A New Age of Cognitive Computing. He introduced the impact and challenges of question answering & AI as well as the development of Jeopardy through technical changes. Victor de Boer from the VU University talked about Semantic Technology for Development: Semantic Web without the Web?. He demonstrated the use of semantic technology in the challenging technical environment of developing countries. Both talks illustrated the ever-growing importance of semantic technology and AI, each placed at opposite sides of the technology spectrum, from Raspberry Pis to High Performance Clusters.
The DBpedia Showcase Session started with an interactive interview. Sebastian Hellmann (AKSW/KILT) talked with Jan-Bart de Vreede (Kennisnet, former member of the Wikimedia Foundation) about the challenges of growing an open community and creating a more formal structure. They discussed advantages, pitfalls and what lessons can be learned from other communities such as Wikimedia. Afterwards Markus Freudenberg (AKSW/KILT) introduced the highlights of the 2016-10 DBpedia Release.
At this session, five speakers presented how to utilize DBpedia in novel and interesting ways, including:
Virtuoso 8 and Scalable Attributed-based Access Controls (ABAC) by Patrick van Kleef (Openlink Software)
Learning to Associate DBpedia Entities like Humans by Joern Hees (DFKI) (demo)
Towards Using UnifiedViews for Executing DBpedia Data Extraction and Curation Tasks by Tomas Knap (Semantic Web Company)
Sustainable Linked Data Generation: The Case of DBpedia by Wouter Maroy (imec)
As a regular part of the DBpedia Community Meeting, we had two parallel sessions in the afternoon where DBpedia newbies can learn about what DBpedia is and how to use the DBpedia datasets. Participants who wanted to learn DBpedia basics joined the tutorial session by Markus Freudenberg (DBpedia Release Manager). The DBpedia Association Hour provided a platform for the community to discuss the results of the DBpedia Strategy Survey 2017. This survey was prepared by Sören Auer and the DBpedia Board members to get to know what the DBpedia Community thinks about DBpedia’s strategic priorities and how the funds of the DBpedia Association should be spent. Even though 45 minutes were not enough to review all survey questions, this session proved to be beneficial due to a really agile and dynamic discussion. Better cooperation and communication between the Association and the different national and language chapters was one key measure embraced by the community to facilitate problem solving and DBpedia’s organization.
The sessions in the afternoon highlighted two important fields of research and development, namely DBpedia Ontology and DBpedia & NLP. At the DBpedia Ontology Session, Gustavo Publio (AKSW/KILT) presented data quality issues in DBpedia and highlighted the challenges of redesigning the DBpedia Ontology (slides). Wouter Maroy (imec) and Ismael Rodríguez (Polytechnic University of Catalonia) showcased the DBpedia Mappings Front-End Administration, which they created during this year’s Google Summer of Code project. If you are interested in career opportunities at DBpedia, check out Wouter’s success story here.
At the same time, Milan Dojchinovski (AKSW/KILT) chaired the DBpedia & NLP session with five very interesting talks. In the following you will find all presentations given during this session:
DBpedia Spotlight 1.0 – A new Release by Sandro Coelho (DBpedia Association) slides
TextExt winner – Lector: RDF Triples Extraction from Wikipedia Text by Matteo Cannaviccio, Roma Tre University slides
Chaudron: Extending DBpedia with Measures by Julien Subercaze, Télécom Saint-Etienne & Université Jean Monnet slides
Dutch DBpedia Hour & Joint Workshop
Enno Meijers (National Library of the Netherlands) chaired the Dutch DBpedia Hour. In this open session members of the Dutch DBpedia Language Chapter discussed tasks and responsibilities for sustaining and developing the Dutch DBpedia as well as communication, technical infrastructure and content improvement of the DBpedia Dutch Language Chapter. The reference for this discussion was the tasks and responsibilities stated in the Memorandum of Understanding signed by Huygens ING, Koninklijke Bibliotheek, Vrije Universiteit Amsterdam, iMec and Beeld en Geluid. The outcome of this session was an agreement on the approach for creating an operational plan.
Simultaneously, DBpedia joined a session with the Workshop “Linked Data Quality Assessment and Improvement from Academia to Industry”. The presentations are available below:
In the closing session, Sebastian Hellmann (AKSW/KILT) announced a new collaboration to strengthen the DBpedia NLP Department. Via videostream we talked with Mike Tung and Filipe Mesquita from diffbot, about NLP and the relation extraction from Wikipedia articles. If you are interested in the new collaboration, please check diffbot’s slides here.
All slides and presentations are also available on our Website and you will find more feedback and photos about the event on Twitter via #DBpediaAmsterdam17.
We would like to thank the DBpedia Dutch language chapter, especially Enno Meijers (National Library of the Netherlands), Lieke Verhelst (Linked Data Factory, Informagic), Victor de Boer (Vrije Universiteit Amsterdam), Roland Cornelissen (metamatter), Gerald Wildenbeest (Saxion), Gerard Kuys (Ordina), Maarten Brinkerink (The Netherlands Institute for Sound and Vision) as well as Julia Holze (DBpedia Association), Dimitris Kontokostas (DBpedia Chapter Coordinator) and Sebastian Hellmann (AKSW/KILT, DBpedia Association) for devoting their time to curating the program and organizing the meeting.
Special thanks go to Katharina Weissenberg and Anna Keil for supporting the meeting by taking pictures of the community and the event.
We are now looking forward to the 11th DBpedia Community Meeting which will be held on 12th of October 2017 in Cupertino, California. Visit our event page for further updates.
We are happy to announce that the 10th DBpedia Community Meeting will be held in Amsterdam, Netherlands. During the SEMANTiCS 2017, Sep 11-14, the DBpedia Community will get together on the 14th of September for the DBpedia Day.
What cool things do you do with DBpedia? Present your tools and datasets at the DBpedia Community Meeting. Please submit your proposal in our form.
– Where: Meervaart Theatre, Meer en Vaart 300, 1068 LE Amsterdam, Netherlands
– Call for Contribution: Please submit your proposal in our form.
– Attending the DBpedia Community Meeting costs €40 (excl. registration fee and VAT). DBpedia members get free admission; please contact your nearest DBpedia chapter or the DBpedia Association for a promotion code.
If you can’t stand it till the end of the SEMANTiCS, you can already participate in the workshop “Two worlds, one goal: A Reliable Linked Data ecosystem for media” held by DBpedia and Wolters Kluwer on the 11th of September. This half-day workshop aims at exploring major topics for publishers and libraries from DBpedia’s and Wolters Kluwer’s perspective. Therefore, both communities will dive into core areas like Interlinking, Metadata and Data Quality and address challenges such as fundamental requirements when publishing data on the web. Did we spark your interest? Check our detailed program here and get your ticket today.
After our 2nd Community Meeting in the US, we delighted the Irish DBpedia Community with the 9th DBpedia Community Meeting, which was co-located with the Language, Data and Knowledge Conference 2017 in Galway at the premises of the NUI Galway.
First and foremost, we would like to thank John McCrae (Insight Centre for Data Analytics, NUI Galway) and the LDK Conference for co-hosting and supporting the event.
The focus of this Community Meeting was the Irish DBpedia and Linked Data Community in Ireland. Therefore we invited local data scientists as well as European DBpedia enthusiasts to discuss the state of Irish Linked Data.
The meeting started with two compelling keynotes by Brian Ó Raghallaigh, Dublin City University and Logainm.ie, and Sharon Flynn, NUI Galway and Wikimedia Ireland. Brian presented Logainm.ie, a data use case about placenames in Ireland with a special focus on linked Logainm and machine-readable data.
His insightful presentation was followed by Sharon Flynn talking about Wikimedia in Ireland and the challenges of “this monumental undertaking” with particular reference to the Wikimedia Community in Ireland.
For more details on the content of the presentations, follow the links to the slides.
As a regular part of the DBpedia Community Meeting we have two parallel sessions in the afternoon where DBpedia newbies can learn about what DBpedia is and how to use the DBpedia data sets.
Participants who wanted to learn DBpedia basics joined the DBpedia Tutorial Session by Markus Freudenberg (DBpedia Release Manager). The DBpedia Association Hour provided a platform for the community to discuss and give feedback.
Additionally, Sebastian Hellmann and Julia Holze, members of the DBpedia Association, updated the participants about the growing number of the DBpedia Association members, the formalized DBpedia language chapters, the established DBpedia Community Committee and they informed about technical developments such as the DBpedia API.
Ontology Engineering and Software Alignment in the ALIGNED Project
The afternoon session started with the DBpedia 2016-10 release update by Markus Freudenberg (DBpedia Release Manager). Following this, Kevin Chekov Feeney, (Trinity College Dublin) presented the software alignment in the ALIGNED project. He talked about “Generating correct-by-construction semantic datasets from unstructured, semi-structured and badly structured data sources”.
At this point, we would also like to thank the ALIGNED project for the development of DBpedia as a project use case and for covering parts of the travel cost.
Session about Irish Linked Data Projects
Chaired by Rob Brennan and Bianca Pereira, the speakers in the last session presented new Irish Linked Data Projects, for example GeoHive, BIOOPENER and the TCD Open Linked Data Engagement Fund Project. The following panel session gave DBpedia and Linked Data enthusiasts a platform for exchange and discussion. The outcome of this session was a roadmap for Irish Linked Data, created together with all participants.
Following, you find a list of all presentations of this session:
Closing this session, John McCrae announced that the next edition of the Language, Data and Knowledge (LDK) Conference is scheduled for 2019 in Germany. We at the DBpedia Association are now looking forward to welcoming the LDK Community in Leipzig!
Social Evening Event
The Community Meeting slowly came to an end with our social evening event, which was held at the PorterShed in Galway. The evening session revolved around the topic How to exploit data commercially? and featured two short impulse talks. Paul Buitelaar started the session by presenting “Kibi”, which is an Open Source platform for Data Intelligence based on the search engine Elasticsearch. Finally, Sebastian Hellmann talked about “Improving the Utility of DBpedia by co-designing a public and commercial DBpedia API” (slides).
Summing up, the 9th DBpedia Community Meeting brought together more than 45 DBpedia enthusiasts from Ireland and Europe who engaged in vital discussions about Linked Data, DBpedia use cases and services.
Special thanks go to LDK 2017 for hosting the meeting.
Thanks Ireland and hello Amsterdam!
We are looking forward to the next DBpedia Community Meeting which will be held in Amsterdam, Netherlands. Co-located with the SEMANTiCS17, the Community will get together on the 14th of September on the DBpedia Day.
Sören Auer and the DBpedia Board members prepared a survey to assess the direction of the DBpedia Association. We wanted to know what the DBpedia Community thinks about DBpedia’s strategic priorities and how the funds of the DBpedia Association should be spent. Between February 2017 and April 2017, a total of 40 members of the DBpedia Community actively participated in the survey and voted as follows:
1. What should be the priorities of the DBpedia Association in the next year?
To give an overview of the various priorities that were mentioned, the following digest groups the answers into four categories. The most frequent answer was to increase the data quality, followed by the enlargement of the DBpedia Community through broader dissemination.
2. What should be the priorities of the DBpedia Association in the next three years?
In contrast to question one, this question concerns the priorities the DBpedia Association should focus on during the next three years. As in the previous overview, the specified priorities are divided into four categories.
3. What is your main interest in DBpedia?
The chart above depicts the several main interests in DBpedia. The majority of participants have an “academic & professional” (45.7%) interest in DBpedia, followed by “professional” (28.6%) and “academic” (20.0%) interests. Only 2.9% of the answers are student-related interests.
4. How should the funds of the association be used?
With respect to “How should the funds of the association be used?”, most attendees chose “service provisioning”. The “development of new DBpedia features” was the second most popular choice. Nevertheless, “Community building” and “release production” also scored many votes.
5. How should the DBpedia Association collaborate with national/language chapters?
Agreeing on strategic goals; making sure that national contributions can be spread to other chapters, thus increasing the overall usability of DBpedia; keeping track of good practices
Facilitating grassroots initiatives – so mainly promote and stimulate national/language initiatives
Local events related to DBpedia tasks
Regular events to share ideas and data
Join other languages members onto DBpedia
As an umbrella organization: support, mediation, and representation
Regular exchange and involvement
Consult, try to figure out common priorities
6. Should DBpedia open itself to contain and curate more data not directly extracted from Wikipedia?
As the chart above clearly depicts, more than half of the participants are in favor of DBpedia comprising datasets not directly derived or extracted from Wikipedia. In contrast, 34.3% hold the opposite opinion and would prefer DBpedia to focus solely on data extraction from Wikipedia.
If yes, which other datasets should DBpedia prioritize for fusion to improve its coverage and quality?
7. Which of the following features do you consider most important?
The following diagram gives an overview of particular features and their importance from the participants’ point of view. As the result of question one reveals, data quality is considered the most important issue by the survey participants (23.7%). The second most important features, with 17.2% each, are: the provision of datasets extracted from the Wikipedia article text, substantial collaboration/integration with WikiData and a provision of better search, respectively an exploration of user interfaces.
8. Any other question, feedback, opinion, ideas or suggestion you would like to send to the association.
Increased support of non-RDF publication formats is probably wise as an insurance policy that DBpedia will stay relevant.
In users mailing-list being more open-minded in an easy manner and always signalling provocative postings are welcome. And I fear it is a bit late for this survey, but better late than never, my greetings to all making some thoughts about this stuff.
DBpedia Spotlight should return Wikidata URIs by default, for stability
Use a richer ontology without contradictions, e.g. Book-Physical vs. Book-Conceptual Work
Thank you for your input and your participation! Your priorities and opinions are of vital importance for the success of DBpedia in the future. We will discuss the implementation of your answers during our next DBpedia Board Meetings in order to find a reasonable strategic direction of the DBpedia Association for the next years.
We are happy to announce that the 9th DBpedia Community meeting will be held in Galway, Ireland on June 21st 2017. DBpedia will be part of the Language, Data and Knowledge conference (LDK) in Galway. This new biennial conference series aims at bringing together researchers from across disciplines. The DBpedia Meeting is part of the conference and is scheduled for the last day.
Only a few seats are left, so come and get your ticket to be part of the 9th DBpedia Community meeting in Galway.
Keynote #1: Logainm.ie data use cases by Brian Ó Raghallaigh (Dublin City University & Logainm)
Keynote #2: Wikimedia in Ireland: A Monumental Undertaking by Sharon Flynn (NUI Galway & Wikimedia Ireland)
DBpedia Association hour
A session about Irish Linked data projects (and DBpedia)
The social event will be held in the evening (starting at 6pm) at the PorterShed around the topic How to exploit data commercially? featuring several short impulse talks. We still have some remaining slots and would welcome you to present your success stories as well as use cases, but also tell us about your problems regarding the commercialisation of data. If you are interested in presenting, please email firstname.lastname@example.org.
Do you want to stay informed about upcoming DBpedia events, releases and technical developments? Through the DBpedia newsletter you get the possibility to be always up to date and to provide feedback to us.
Four times per year we will inform the DBpedia community about meetings, new collaborations and other topics related to DBpedia. So make sure to subscribe to our NEWSLETTER and do not miss any news.
DBpedia will participate for a fifth time in the Google Summer of Code program (GSoC) and now we are looking for students who will share their ideas with us. We are regularly growing our community through GSoC and can deliver more and more opportunities to you. We got excited with our new ideas, we hope you will get excited too!
What is GSoC?
Google Summer of Code is a global program focused on bringing more student developers into open source software development. Funds will be given to students (BSc, MSc, PhD) to work for three months on a specific task. First, open source organizations announce their student projects; then students contact the mentor organizations they want to work with and write up a project proposal for the summer. After a selection phase, students are matched with a specific project and a set of mentors to work on the project during the summer.