Tag Archives: DataBus

Grüezi Community!

More than 110 DBpedia enthusiasts joined the Community Meeting in Vienna.

After the success of the last two community meetings in Amsterdam and Leipzig, we thought it was time to meet you at the SEMANTiCS conference again. This year’s SEMANTiCS opened with the DBpedia Day on September 10th, 2018 in Vienna.

First and foremost, we would like to thank the Institute for Applied Informatics for supporting our community and many thanks to the Technical University Vienna and the SEMANTiCS for hosting our community meeting.

Opening Session

Javier David Fernández García, Vienna University of Economics, opened the meeting with his keynote Linked Open Data cloud – act now before it’s too late. He reflected on challenges towards arriving at a truly machine-readable and decentralized Web of Data. Javier reviewed the current state of affairs, highlighted key technical and non-technical challenges, and outlined potential solution strategies.

The second keynote speaker was Mathieu d’Aquin, Professor of Informatics at the Insight Centre for Data Analytics at NUI Galway. Mathieu, who specializes in data analytics, rounded off the opening session with his keynote Dealing with Open Domain Data.

 

Showcase Session

Patrik Schneider started the DBpedia Showcase Session with his presentation of the “NII (Japan) Research Showcase – A Knowledge Graph Management Framework for DBpedia”. Shortly after, Jan Forberg, from AKSW/KILT Leipzig, promoted the usage of WebIDs in a short how-to tutorial session. Adam Sanchez, from Université Grenoble Alpes, talked about the RDFization of a relational database from the medical domain using Ontop. He was followed by Beyza Yaman, University of Genoa, who talked about Exploiting Context-Dependent Quality Metadata for Linked Data Source Selection. Afterwards, Robert Bielinski, from AKSW/KILT Leipzig, introduced the new DBpedia release cycle based on Apache Spark. Closing the Showcase Session, Tomas Kliegr, University of Economics Prague, presented a showcase using DBpedia to study cognitive biases affecting the interpretation of machine learning results.

 

For further details of the presentations follow the links to the slides.

  • WebID Creation by Jan Forberg, AKSW/KILT slides
  • RDFization by Adam Sanchez, Université Grenoble Alpes slides
  • Exploiting Context-Dependent Quality Metadata by Beyza Yaman, University of Genoa slides
  • Extracting Data using Apache Spark by Robert Bielinski, AKSW/KILT slides
  • Using DBpedia to study cognitive biases affecting interpretation of machine learning results by Tomas Kliegr, University of Economics Prague slides

Parallel Session

Gary Munnelly

As a regular part of the DBpedia Community Meeting, we had two parallel sessions in the afternoon where DBpedians could discuss technical issues. Participants interested in NLP-related topics joined the NLP & DBpedia session. Milan Dojchinovski (AKSW/KILT) chaired this session with four very stimulating talks. Below you will find all presentations given during this session:

 

Diego Moussallem

At the same time, the DBpedia Association Hour provided a platform for the community to discuss technical questions and especially the DBpedia Databus. Sebastian Hellmann presented the DBpedia Databus and explained the advantages of global IDs. Shortly after, Marvin Hofer (AKSW/KILT) demonstrated the new DBpedia global ID web interface. Please find his slides here.

 

Afternoon Track

The 12th edition of the DBpedia Community Meeting also covered a special chapter session, chaired by Enno Meijers, from the Dutch DBpedia Language Chapter. The speakers presented the latest technical or organizational developments of their respective chapters.

Below is a list of all presentations of this session:

 

This session mainly served as an exchange platform for the different DBpedia chapters. For the first time, representatives of the European chapters discussed problems and challenges of DBpedia from their point of view. Furthermore, each chapter presented its tools, applications and projects.

Jens Grivolla

Summing up, the 12th DBpedia Community Meeting brought together more than 110 DBpedia enthusiasts from Europe who engaged in vital discussions about Linked Data, the DBpedia Databus as well as DBpedia use cases and services.

 

In case you missed the event, all slides and presentations are also available on our Website. Further insights, feedback and photos about the event are available on Twitter via #DBpediaDay.

We are now looking forward to more DBpedia meetings in the coming years. So, stay tuned and check Twitter, Facebook and the Website or subscribe to our Newsletter for the latest news and updates.

Yours,

DBpedia Association

Beta-Test Updates

While everyone at the DBpedia Association was preparing for the SEMANTiCS Conference in Vienna, we also managed to reach an important milestone regarding the beta-test for our data release tool.

First and foremost, 3,500 files have already been published with the plugin. These files will be part of the new DBpedia release and are available on our LTS repository.

Secondly, the documentation of the testing has been brought into good shape. Feel free to drop by and check it out.
Thirdly, we reached our first interoperability goal. The current metadata is sufficient to produce RSS 1.0 feeds. See here for further information. We also defined a loose roadmap at the top of the readme, where interoperability with DCAT and DCAT-AP has high priority.
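
To give a rough idea of what producing an RSS 1.0 feed from release metadata can look like, here is a minimal sketch using only the Python standard library. The function name `build_rss10`, the record keys (`url`, `title`, `date`) and the example.org URLs are assumptions made for illustration; the feed actually produced by the release tool may be structured differently.

```python
import xml.etree.ElementTree as ET

# Namespaces used by RSS 1.0 (RDF Site Summary) and Dublin Core.
RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
RSS = "http://purl.org/rss/1.0/"
DC = "http://purl.org/dc/elements/1.1/"


def build_rss10(channel_url, title, description, items):
    """Build an RSS 1.0 document as a string.

    `items` is a list of dicts with the (hypothetical) keys
    'url', 'title' and 'date', one per published file.
    """
    ET.register_namespace("rdf", RDF)
    ET.register_namespace("rss", RSS)
    ET.register_namespace("dc", DC)

    root = ET.Element(f"{{{RDF}}}RDF")

    # The channel describes the feed and lists its items in an rdf:Seq.
    channel = ET.SubElement(root, f"{{{RSS}}}channel", {f"{{{RDF}}}about": channel_url})
    ET.SubElement(channel, f"{{{RSS}}}title").text = title
    ET.SubElement(channel, f"{{{RSS}}}link").text = channel_url
    ET.SubElement(channel, f"{{{RSS}}}description").text = description
    seq = ET.SubElement(ET.SubElement(channel, f"{{{RSS}}}items"), f"{{{RDF}}}Seq")
    for record in items:
        ET.SubElement(seq, f"{{{RDF}}}li", {f"{{{RDF}}}resource": record["url"]})

    # Each item is a top-level element next to the channel.
    for record in items:
        item = ET.SubElement(root, f"{{{RSS}}}item", {f"{{{RDF}}}about": record["url"]})
        ET.SubElement(item, f"{{{RSS}}}title").text = record["title"]
        ET.SubElement(item, f"{{{RSS}}}link").text = record["url"]
        ET.SubElement(item, f"{{{DC}}}date").text = record["date"]

    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    # Made-up example records, not real release files.
    files = [
        {"url": "http://example.org/release/a.nt", "title": "a.nt", "date": "2018-09-01"},
        {"url": "http://example.org/release/b.nt", "title": "b.nt", "date": "2018-09-02"},
    ]
    print(build_rss10("http://example.org/release/feed.rdf", "Example release", "Test feed", files))
```

Because RSS 1.0 is itself RDF, the same metadata records could later be mapped to other vocabularies such as DCAT without changing how they are collected.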

 

Now we have some time to support you and work one on one and also prepare the configurations to help you set up the data releases. Lastly, we already received data from DNB and SUMO, so we will start to look into these more closely.

Thanks to all the beta-testers for your nice work.

We will keep you posted.

Yours,

DBpedia Association

Beta-tests for the DBpedia Databus commence

Finally, we are proud to announce that the beta-testing of our data release tool for data releases on the DBpedia Databus is about to start.

In the past weeks, our developers at DBpedia have been developing a new data release tool to release datasets on the DBpedia Databus. In that context, we are still looking for beta-testers who have a dataset they wish to release. Sign up here and benefit from increased visibility for your dataset and your work.

We are now preparing the first internal test with our own dataset to ensure the data release tool is ready for the testers. During the testing process, beta-testers will discuss occurring problems, challenges and ideas for improvement via the DBpedia #releases channel on Slack to profit from each other’s knowledge and skills. Issues are documented via GitHub.

The whole testing process for the data release tool follows a four-milestone plan:

Milestone One: Every tester needs to have a WebID to release data on the DBpedia Databus. In case you are interested in how to set up a WebID, our tutorial will help you a great deal.
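
For readers who have never seen a WebID: it is a URI that identifies an agent and resolves to an RDF profile document, which for WebID-TLS also carries the agent's public key. The sketch below renders such a profile as Turtle. The function name `webid_profile`, the example.org URI and the placeholder key values are assumptions made for illustration; our tutorial remains the reference for what the release tool actually expects.

```python
def webid_profile(webid, name, modulus_hex, exponent=65537):
    """Render a minimal WebID profile document in Turtle.

    `webid` is the agent URI (typically the document URL plus a fragment),
    `modulus_hex` and `exponent` describe the RSA public key of the
    client certificate. All values passed in here are illustrative.
    """
    document = webid.split("#", 1)[0]
    # Escape backslashes and quotes so the name stays a valid Turtle literal.
    safe_name = name.replace("\\", "\\\\").replace('"', '\\"')
    return f"""@prefix foaf: <http://xmlns.com/foaf/0.1/> .
@prefix cert: <http://www.w3.org/ns/auth/cert#> .
@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .

<{document}> a foaf:PersonalProfileDocument ;
    foaf:maker <{webid}> ;
    foaf:primaryTopic <{webid}> .

<{webid}> a foaf:Person ;
    foaf:name "{safe_name}" ;
    cert:key [
        a cert:RSAPublicKey ;
        cert:modulus "{modulus_hex}"^^xsd:hexBinary ;
        cert:exponent {exponent}
    ] .
"""


if __name__ == "__main__":
    # Placeholder modulus; a real one comes from your certificate.
    print(webid_profile("http://example.org/webid.ttl#this", "Jane Tester", "00cafe"))
```

The resulting file would be published at the document URL, so that dereferencing the WebID returns the profile.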

Milestone Two: For their datasets, testers will generate DataIDs that provide detailed descriptions of the datasets and their different manifestations, as well as relations to agents like persons or organizations with regard to their rights and responsibilities.

Milestone Three: This milestone is considered achieved if an RSS feed can be generated. Additionally, bugs that arose during the previous phases should have been fixed. We also want to collect the testers’ particular demands and wishes that would benefit the tool or the process. A second release can be attempted to check how the integrated fixes and changes work out.

Milestone Four: This milestone marks the final upload of the dataset to the DBpedia Databus, which is hopefully possible in about 3 weeks.

 

In case you want to get one of the last spots in the beta-testing team, just sign up here and get yourself a WebID and start testing.

Looking forward to working with you…

Yours,

DBpedia Association

The DBpedia Databus – transforming Linked Data into a networked data economy

Working with data is hard and repetitive. That is why we are more than happy to announce the launch of the alpha version of our DBpedia Databus, a way that simplifies working with data. 

We have studied the data network for 10 years now and we conclude that organizations with open data are struggling to work together properly. Even though they could and should collaborate, they are hindered by technical and organizational barriers. They duplicate work on the same data. On the other hand, companies selling data cannot do so in a scalable way. The consumers are left empty-handed and trapped between the choice of inferior open data or buying from a jungle-like market.

We need to rethink the incentives for linking data

Vision

We envision a hub where everybody uploads data. In that hub, useful operations like versioning, cleaning, transformation, mapping, linking, merging and hosting are done automagically on a central communication system, the bus, and the results are then dispersed again in a decentralized network to the consumers and applications. On the Databus, data flows from data producers through the platform to the consumers (left to right). Any errors or feedback flow in the opposite direction and reach the data source, which provides a continuous integration service and improves the data at the source.

The DBpedia Databus is a platform that allows exchanging, curating and accessing data between multiple stakeholders. Any data entering the bus will be versioned, cleaned, mapped, linked and its licenses and provenance tracked. Hosting in multiple formats will be provided to access the data either as dump download or as API.

Publishing data on the Databus means connecting and comparing your data to the network

If you are grinding your teeth about how to publish data on the web, you can just use the Databus to do so. Data loaded on the bus will be highly visible, available and queryable. You should think of it as a service:

  • Visibility guarantees that your citations and reputation go up.
  • Besides a web download, we can also provide a Linked Data interface, SPARQL-endpoint, Lookup (autocomplete) or other means of availability (like AWS or Docker images).
  • Any distribution we are doing will funnel feedback and collaboration opportunities your way to improve your dataset and your internal data quality.
  • You will receive an enriched dataset, which is connected and complemented with any other available data (see the same folder names in data and fusion folders).

 How it works at the moment

Integration of data is easy with the Databus. We have been integrating and loading additional datasets alongside DBpedia for the world to query. Popular datasets are ICD10 (medical data) and organizations and persons. We are still in an initial state, but we have already loaded 10 datasets (6 from DBpedia, 4 external) on the bus using these phases:

  1. Acquisition: data is downloaded from the source and logged.
  2. Conversion: data is converted to N-Triples and cleaned (Syntax parsing, datatype validation, and SHACL).
  3. Mapping: the vocabulary is mapped on the DBpedia Ontology and converted (We have been doing this for Wikipedia’s Infoboxes and Wikidata, but now we do it for other datasets as well).
  4. Linking: Links are mainly collected from the sources, cleaned and enriched.
  5. IDying: All entities found are given a new Databus ID for tracking.
  6. Clustering: IDs are merged into clusters, using one of the Databus IDs as the cluster representative.
  7. Data Comparison: Each dataset is compared with all other datasets. We have an algorithm that decides on the best value, but the main goal here is transparency, i.e. to see which data value was chosen and how it compares to the other sources.
  8. A main knowledge graph fused from all the sources, i.e. a transparent aggregate.
  9. For each source, we are producing a local fused version called the “Databus Complement”. This is a major feedback mechanism for all data providers, where they can see what data they are missing, what data differs in other sources and what links are available for their IDs.
  10. You can compare all data via a web service.
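
To make the clustering and data comparison phases more concrete, here is a small sketch. It merges IDs that are connected by links into clusters with a union-find structure and then compares the values that different sources report for the same cluster. Everything in it is an assumption made for illustration: the function names, the made-up IDs and facts, the rule that the lexicographically smallest ID represents a cluster, and the simple majority vote, which merely stands in for the algorithm we actually use to decide on the best value.

```python
from collections import Counter, defaultdict


def cluster_ids(ids, links):
    """Map every ID to a cluster representative using union-find.

    `links` is an iterable of (id_a, id_b) pairs that denote the same entity.
    The lexicographically smallest ID of a cluster becomes its representative.
    """
    parent = {i: i for i in ids}

    def find(x):
        parent.setdefault(x, x)  # IDs that only occur in links are added on the fly
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for a, b in links:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            small, large = sorted((root_a, root_b))
            parent[large] = small
    return {i: find(i) for i in list(parent)}


def compare_values(facts, cluster_of):
    """Compare values per (cluster, property) across sources.

    `facts` holds (source, entity_id, property, value) tuples. The result keeps
    every candidate value next to the chosen one, so the choice stays transparent.
    """
    grouped = defaultdict(list)
    for source, entity, prop, value in facts:
        grouped[(cluster_of[entity], prop)].append((source, value))

    report = {}
    for key, candidates in grouped.items():
        counts = Counter(value for _, value in candidates)
        # Majority vote; ties are broken by picking the smallest value.
        chosen = min(counts, key=lambda v: (-counts[v], v))
        report[key] = {"chosen": chosen, "candidates": sorted(candidates)}
    return report


if __name__ == "__main__":
    clusters = cluster_ids(["id:1", "id:2", "id:3", "id:4"], [("id:1", "id:2"), ("id:2", "id:3")])
    facts = [
        ("source-a", "id:1", "birthYear", "1879"),
        ("source-b", "id:2", "birthYear", "1879"),
        ("source-c", "id:3", "birthYear", "1878"),
    ]
    for key, entry in compare_values(facts, clusters).items():
        print(key, entry)
```

Keeping all candidate values in the report, rather than only the winner, mirrors the transparency goal of the comparison phase: a data provider can see where its value differs from the other sources.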

Contact us via dbpedia@infai.org if you would like to have additional datasets integrated and maintained alongside DBpedia.

From your point of view

Data Sellers

If you are selling data, the Databus provides numerous opportunities for you. You can link your offering to the open entities in the Databus. This allows consumers to discover your services better by showing it with each request.

Data Consumers

Open data on the Databus will be a commodity. We are greatly reducing the cost of understanding, retrieving and reformatting the data. We are constantly extending the ways of using the data and are willing to implement any formats and APIs you need. If you are lacking a certain kind of data, we can also scout for it and load it onto the Databus.

Is it free?

Maintaining the Databus is a lot of work, and the servers incur high costs. As a rule of thumb, we are providing everything for free that we can afford to provide for free. DBpedia was providing everything for free in the past, but this is not a healthy model, as we can neither maintain quality properly nor grow.

On the Databus everything is provided “As is” without any guarantees or warranty. Improvements can be done by the volunteer community. The DBpedia Association will provide a business interface to allow guarantees, major improvements, stable maintenance, and hosting.

License

Final databases are licensed under ODC-By. This covers our work on recomposition of data. Each fact is individually licensed, e.g. Wikipedia abstracts are CC-BY-SA, some are CC-BY-NC, some are copyrighted. This means that data is available for research, informational and educational purposes. We recommend contacting us for any professional use of the data (clearing) so we can guarantee that legal matters are handled correctly. Otherwise, professional use is at your own risk.

Current Statistics

The Databus data is available at http://downloads.dbpedia.org/databus/ ordered into three main folders:

  • Data: the data that is loaded on the Databus at the moment
  • Global: a folder that contains provenance data and the mappings to the new IDs
  • Fusion: the output of the Databus

Most notably you can find:

  • Provenance mapping of the new IDs in global/persistence-core/cluster-iri-provenance-ntriples/ (http://downloads.dbpedia.org/databus/global/persistence-core/cluster-iri-provenance-ntriples/) and global/persistence-core/global-ids-ntriples/ (http://downloads.dbpedia.org/databus/global/persistence-core/global-ids-ntriples/)
  • The final fused version for the core: fusion/core/fused/ (http://downloads.dbpedia.org/databus/fusion/core/fused/)
  • A detailed JSON-LD file for data comparison: fusion/core/json/ (http://downloads.dbpedia.org/databus/fusion/core/json/)
  • Complements, e.g. the enriched Dutch DBpedia version: fusion/core/nl.dbpedia.org/ (http://downloads.dbpedia.org/databus/fusion/core/nl.dbpedia.org/)

(Note that the file and folder structure are still subject to change)

Sources

 

Upcoming Developments

Data market
  • Build your own data inventory and merchandise your data via Linked Data or via secure named graphs in the DBpedia SPARQL Endpoint (WebID + TLS + OpenLink’s Virtuoso database)
DBpedia Marketplace
  • Offer your Linked Data tools, services, products
  • Incubate new research into products
  • Example: Support for RDFUnit (https://github.com/AKSW/RDFUnit created by the SHACL editor), assistance with SHACL writing and deployment of the open-source software

 

DBpedia and the Databus will transform Linked Data into a networked data economy

 

For any questions or inquiries related to the new DBpedia Databus, please contact us via dbpedia@infai.org

 

Yours,

DBpedia Association