Tag Archives: DBpedia Databus

DBpedia Workshop at LDAC

More than 90 DBpedia enthusiasts joined the DBpedia Workshop co-located with LDAC2020.

On June 19, 2020, we organized a DBpedia workshop co-located with the LDAC workshop series to exchange knowledge regarding new technologies and innovations in the fields of Linked Data and the Semantic Web. This workshop series provides a focused overview of technical and applied research on the use of Semantic Web, Linked Data and Web of Data technologies in the architecture and construction domains (design, engineering, construction, operation, etc.). The workshop aims at gathering researchers, industry stakeholders, and standardization bodies of the broader Linked Building Data (LBD) community.

First and foremost, we would like to thank the LDAC committee for hosting our virtual meeting, and many thanks go to Beyza Yaman, Milan Dojchinovski, Johannes Frey and Kris McGlinn for organizing and chairing the DBpedia workshop.

In the following, we give you a brief retrospective of the presentations.

Opening & Keynote 

The first virtual DBpedia meeting opened with the keynote presentation ‘RDF Data quality assessment – connecting the pieces’ by Dimitris Kontokostas (Diffbot, US). He gave an overview of the latest developments and achievements around data quality. His presentation focused on defining data quality and identifying data quality issues.

Sebastian Hellmann gave a brief overview of DBpedia’s history. Furthermore, he presented the updated DBpedia organisational architecture, including the vision of the new DBpedia chapters and the benefits of DBpedia membership.

Shortly after, Milan Dojchinovski (InfAI/CTU in Prague) gave a presentation on ‘Querying and Integrating (Architecture and Construction) Data with DBpedia’. ‘The New DBpedia Release Cycle’ was introduced by Marvin Hofer (InfAI). Closing the showcase session, Johannes Frey (InfAI) presented the Databus Archivo and demonstrated the downloading process with the DBpedia Databus.

For further details on the presentations, follow the links to the slides.

  • Keynote: RDF Data quality assessment – connecting the pieces, by Dimitris Kontokostas, Diffbot, US (slides)
  • Overview of DBpedia Organisational Architecture, by Sebastian Hellmann, Julia Holze, Bettina Klimek, Milan Dojchinovski, InfAI / DBpedia Association (slides)
  • Querying and Integrating (Architecture and Construction) Data with DBpedia by Milan Dojchinovski, InfAI/CTU in Prague (slides)
  • The New DBpedia Release Cycle by Marvin Hofer and Milan Dojchinovski, InfAI (slides)
  • Databus Archivo and Downloading with the Databus by Johannes Frey, Fabian Goetz and Milan Dojchinovski, InfAI (slides)

Geospatial Data & DBpedia Session

After the opening session we had the Geospatial Data & DBpedia Session. Milan Dojchinovski (InfAI/CTU in Prague) chaired this session with three very stimulating talks. Hereafter you will find all presentations given during this session:

  • Linked Geospatial Data & Data Quality by Wouter Beek, Triply Ltd. (slides)
  • Contextualizing OSi’s Geospatial Data with DBpedia by Christophe Debruyne, Vrije Universiteit Brussel and ADAPT at Trinity College Dublin
  • Linked Spatial Data: Beyond The Linked Open Data Cloud by Chaidir A. Adlan, The Deutsche Gesellschaft für Internationale Zusammenarbeit GmbH (slides)

Data Quality & DBpedia Session

The first online DBpedia workshop also covered a special data quality session. Johannes Frey (InfAI) chaired this session with three very stimulating talks. Hereafter you will find all presentations given during this session:

  • SeMantic AnsweR Type prediction with DBpedia – ISWC 2020 Challenge by Nandana Mihindukulasooriya, MIT-IBM Watson AI Lab (slides)
  • RDF Doctor: A Holistic Approach for Syntax Error Detection and Correction of RDF Data by Ahmad Hemid, Fraunhofer IAIS (slides)
  • The Best of Both Worlds: Unlocking the Power of (big) Knowledge Graphs with SANSA by Gezim Sejdiu, Deutsche Post DHL Group and University of Bonn (slides)
  • Closing words by the workshop organizers

In case you missed the event, all slides and presentations are also available on the DBpedia workshop website. Further insights, feedback and photos from the event are available on Twitter (#DBpediaDay hashtag).

We are now looking forward to our first DBpedia Stack tutorial, which will be held online on July 1st, 2020. Over the last year, the DBpedia core team has consolidated a great amount of technology around DBpedia. The tutorial primarily targets developers (in particular of DBpedia chapters) who wish to learn how to replicate local infrastructure, such as loading and hosting their own SPARQL endpoint. A core focus will also be the new DBpedia Stack, which contains several dockerized applications that automatically load data from the Databus. Attending the DBpedia Stack tutorial is free, and the session will be held online. Please register to be part of the meeting.

Stay tuned and check Twitter, Facebook and our website, or subscribe to our newsletter for the latest news and information.

Julia and Milan 

on behalf of the DBpedia Association

New Prototype: Databus Collection Feature

We are thrilled to announce that the Databus Collection Feature for the DBpedia Databus has been developed and is now available as a prototype. It simplifies the way you bundle your data and use it in your applications.

A new Databus Collection Feature? How come, and how does it work? Read below and find out how using the DBpedia Databus becomes easier by the day and with each new tool.

Motivation

With more and more data being uploaded to the Databus, we started to develop test applications using that data. The SPARQL endpoint offers a central hub to access all metadata for datasets uploaded to the Databus, provided you know how to write SPARQL queries. The metadata includes the download links of the data files – it was, therefore, possible to pass a SPARQL query to an application, download the actual data and then use it for whatever purpose the app had.
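
To give a flavour of what such a metadata query looks like, here is a minimal sketch against the Databus SPARQL endpoint. It assumes the DataID/DCAT vocabulary used for Databus metadata and a hypothetical artifact URI, so treat it as illustrative rather than canonical:

PREFIX dataid: <http://dataid.dbpedia.org/ns/core#>
PREFIX dcat:   <http://www.w3.org/ns/dcat#>

# Illustrative sketch: list the download URLs of all files
# that belong to one (hypothetical) Databus artifact.
SELECT DISTINCT ?file WHERE {
    ?dataset dataid:artifact <https://databus.dbpedia.org/dbpedia/mappings/mappingbased-objects> .
    ?dataset dcat:distribution ?distribution .
    ?distribution dcat:downloadURL ?file .
}

An application can send this query to the endpoint, read the ?file URLs from the result and download the actual data from there.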

The Databus Collection Editor

The DBpedia Databus now provides an editor for collections. A collection is basically a labelled SPARQL query that is retrievable via URI. Hence, with the collection editor you can group Databus groups and artifacts into a bundle and publish your selection using your Databus account. It is now a breeze to select the data you need, share the exact selection with others and/or use it in existing or self-made applications.

If you are not familiar with SPARQL and data queries, you can think of the feature as a shopping cart for data: You create a new cart, put data in it and tell your friends or applications where to find it. Quite neat, right?

In the following section, we will cover the user interface of the collection editor.

The Editor UI

Firstly, you can find the collection editor by going to the DBpedia Databus and following the Collections link at the top, or you can get there directly by clicking here.

What you will see is the following:

General Collection Info

Secondly, since you do not have any collections yet, the editor has already created an empty collection named “Unnamed” for you. On the right side, next to the label and description, you will find a pen icon. By clicking the icon or the label itself, you can edit its content. The collection has not been published yet, so the Collection URI is blank.

Whenever you are not logged in or the collection has not been published yet, the editor will also notify you that your changes are only saved in your local browser cache and NOT remotely on our server. Keep that in mind when clearing your cache. Publishing the collection, however, is easy: simply log into (or create) your Databus account and hit the publish button in the action bar. This will open a modal where you can pick your unique collection id and hit publish again. That’s it!

The Collection Info section will now show the collection URI. Following the link will take you to the HTML representation of your collection that will be visible to others. Hitting the Edit button in the action bar will bring you back to the editor.

Collection Hierarchy

Let’s have a look at the core piece of the collection editor: the hierarchy view. A collection can be a bundle of different Databus groups and artifacts but is not limited to that. If you know how to write a SPARQL query, you can easily extend your collection with more powerful selections. Therefore, the hierarchy is split into two nodes:

  • Generated Queries: Contains all queries that are generated from your selection in the UI
  • Custom Queries: Contains all custom written SPARQL queries

Both hierarchy nodes have a “+” icon. Clicking this button will let you add generated or custom queries, respectively.

Custom Queries

If you hit the “+” icon on the Custom Queries node, a new node called “Custom Query” will appear in the hierarchy. You can remove a custom query by clicking the trashcan icon in the hierarchy. Clicking the node itself will take you to a SPARQL input field where you can edit the query.

To make your collection more understandable for others, you can even document the query by adding a label and description.

Writing Your Own Custom Queries

A collection query is a SPARQL query of the form:

SELECT DISTINCT ?file WHERE {
    {
        [SUBQUERY]
    }
    UNION
    {
        [SUBQUERY]
    }
    UNION
    ...
    UNION
    {
        [SUBQUERY]
    }
}

All selections made by generated and custom queries will be joined into a single result set with a single column called “file”. It is therefore important that your custom query binds data to a variable called “file” as well.
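
As an illustration, a [SUBQUERY] could look roughly like the sketch below, which binds the download URLs of a single artifact’s files to ?file; the DataID/DCAT patterns and the artifact URI are assumptions for the example, not actual editor output:

    # Hypothetical [SUBQUERY] body: bind the download URLs of
    # one artifact's files to the required ?file variable.
    ?dataset dataid:artifact <https://databus.dbpedia.org/dbpedia/generic/labels> .
    ?dataset dcat:distribution ?distribution .
    ?distribution dcat:downloadURL ?file .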

Generated Queries

Clicking the “+” icon on the Generated Queries node will take you to a search field. Make use of the indexed search on the Databus to find and add the groups and artifacts you need. If you want to refine your search, don’t worry: you can do that in the next step!

Once the artifact or group has been added to your collection, the Add to Collection button will turn green. When you are done, you can go back to the editor with the Back to Hierarchy button.

Your hierarchy will now contain several new nodes.

Group Facets, Artifact Facets and Overrides

Groups and artifacts that have been added to the collection will show up as nodes in the hierarchy. Clicking a node will open a filter where you can refine your dataset selection. Setting a filter on a group node will apply it to all artifact nodes unless you override that setting in an artifact node manually. The filter set in the group node is shown in the artifact facets in dark grey; any overrides in the artifact facets are highlighted in green.

Group Nodes

A group node provides a list of filters that will be applied to all artifacts of that group.

Artifact Nodes

Artifact nodes will then actually select data files which will be visible in the faceted view. The facets are generated dynamically from the available variants declared in the metadata.

Example: here we selected the latest version of the Databus dump as N-Triples. This collection is already in use: the collection URI is passed to the new generic lookup application, which then creates the search function for the Databus website. If you are interested in how to configure the lookup application, you can go here: https://github.com/dbpedia/lookup-application. Additionally, there will be another blog post about the lookup application within the next few weeks.
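
For the sake of illustration, the generated query behind such a facet selection might look roughly like the following; the artifact URI and the format facet property are assumptions based on the DataID vocabulary, not a verbatim copy of the editor’s output:

PREFIX dataid: <http://dataid.dbpedia.org/ns/core#>
PREFIX dcat:   <http://www.w3.org/ns/dcat#>

# Sketch: select the N-Triples files of one (hypothetical)
# artifact; the format facet becomes an extra triple pattern.
SELECT DISTINCT ?file WHERE {
    ?dataset dataid:artifact <https://databus.dbpedia.org/dbpedia/databus/databus-data> .
    ?dataset dcat:distribution ?distribution .
    ?distribution dcat:downloadURL ?file .
    ?distribution dataid:formatExtension "nt" .
}

A version facet would add a further pattern on the dataset’s version in the same way.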

Use Cases

The DBpedia Databus Collections are useful in many ways.

  • You can share a specific dataset with your community or colleagues.
  • You can re-use datasets others created.
  • You can plug collections into Databus-ready applications and avoid spending time on the download and setup process.
  • You can point to a specific piece of data (e.g. for testing) with a single URI in your publications.
  • You can help others to create data queries more easily.

We hope you enjoy the Databus Collection Feature and we would love to hear your feedback! You can leave your thoughts and suggestions in the new DBpedia Forum. Feedback of any kind is highly appreciated, since we want to improve the prototype in a way that is as fast and user-driven as possible! Cheers!

A big thanks goes to DBpedia developer Jan Forberg who finalized the Databus Collection Feature and compiled this text.

Yours

DBpedia Association

One Billion derived Knowledge Graphs

… by and for Consumers until 2025

One Billion – what a mission! We are proud to announce that the DBpedia Databus website at https://databus.dbpedia.org and the SPARQL API at https://databus.dbpedia.org/(repo/sparql|yasgui) (docu) are now in public beta!

The system is usable (eat-your-own-dog-food tested), following a “working software over comprehensive documentation” approach. Due to its many components (website, SPARQL endpoints, Keycloak, mods, upload client, download client, and data debugging), we estimate approximately six months in beta to fix bugs, implement all features and improve the details.

But, let’s start from the beginning

The DBpedia Databus is a platform to capture the effort invested by data consumers who needed better data quality (fitness for use) in order to use the data, and to give those improvements back to the data source and other consumers. The Databus enables anybody to build an automated DBpedia-style extraction, mapping and testing pipeline for any data they need. It incorporates features from DNS, Git, RSS, online forums and Maven to harness the full work power of data consumers.

Our vision

Professional consumers of data worldwide have already built stable cleaning and refinement chains for all available datasets, but their efforts are invisible and not reusable. Deep, cleaned data silos exist beyond the reach of publishers and other consumers, trapped locally in pipelines. Data is not oil that flows out of inflexible pipelines. The Databus breaks existing pipelines into individual components that together form a decentralized, but centrally coordinated, data network. In this set-up, data can flow back to previous components, to the original sources, or end up being consumed by external components.

One Billion interconnected, quality-controlled Knowledge Graphs until 2025

The Databus provides a platform for re-publishing these files with very little effort (leaving file traffic as the only cost factor), while offering the full benefits of built-in system features such as automated publication, structured querying, automatic ingestion, as well as pluggable automated analysis, data testing via continuous integration, and automated application deployment (software with data). The impact is highly synergistic: just a few thousand professional consumers and research projects can expose millions of cleaned datasets, on par with what has long existed in deep silos and pipelines.

Towards a data consumer network

As we are inverting the paradigm from a publisher-centric view to a data consumer network, we will open the download valve to enable discovery of and access to massive amounts of cleaner data than published by the original source. The main DBpedia Knowledge Graph alone has 600k file downloads per year, complemented by downloads at over 20 chapters, e.g. http://es.dbpedia.org, as well as over 8 million daily hits on the main Virtuoso endpoint.

Community extensions from the alpha phase, such as DBkWik and LinkedHypernyms, are being loaded onto the bus and consolidated. We expect this number to reach over 100 by the end of the year. Companies and organisations who have previously uploaded their backlinks here will be able to migrate to the Databus. Other datasets are being cleaned and posted. In two of our research projects, LOD-GEOSS and PLASS, we will re-publish open datasets, clean them and create collections, which will result in DBpedia-style knowledge graphs for energy systems and supply-chain management.

A new era for decentralized collaboration on data quality

DBpedia was established around producing a queryable knowledge graph derived from Wikipedia content that is able to answer questions like “What have Innsbruck and Leipzig in common?” A community and consumer network quickly formed around this highly useful data, resulting in a large, well-structured, open knowledge graph that seeded the Linked Open Data Cloud, the largest knowledge graph on earth. The main lesson learned after these 13 years is that current data “copy” or “download” processes are inefficient by a magnitude that can only be grasped from a global perspective. Consumers spend tremendous effort fixing errors on the client side. If one unparseable line needs 15 minutes to find and fix, we are talking about 104 days of work for 10,000 downloads. Providers, on the other hand, will never have the resources to fix the last error, as cost increases exponentially (20/80 rule).

One billion knowledge graphs in mind – the progress so far

Discarding faulty data often means that a substitute source has to be found, which costs hours of research and might lead to similar problems. From the dozens of DBpedia Community meetings we have held, we can summarize that for each clean-up procedure, data transformation, linkset or schema mapping that a consumer creates client-side, dozens of consumers have invested the same effort client-side before them, and none of it reaches the source or other consumers with the same problem. Holding the community meetings showed us just the tip of the iceberg.

As a foundation, we implemented a mappings wiki that allowed consumers to improve data quality centrally. The next advancement was the creation of the SHACL standard by our former CTO and board member Dimitris Kontokostas. SHACL allows consumers to specify repeatable tests on graph structures and datatypes, which is an effective way to systematically assess data quality. We established the DBpedia Databus as a central platform to better capture decentrally created, client-side value by consumers.
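
To make this concrete, the kind of repeatable test that SHACL expresses declaratively corresponds to checks like the following SPARQL sketch (illustrative, using the DBpedia ontology): it reports every person with more than one birth date.

PREFIX dbo: <http://dbpedia.org/ontology/>

# Illustrative data-quality check: a person should have at
# most one birth date; report all violations.
SELECT ?person (COUNT(?date) AS ?dates) WHERE {
    ?person a dbo:Person .
    ?person dbo:birthDate ?date .
}
GROUP BY ?person
HAVING (COUNT(?date) > 1)

In SHACL, the same constraint would be written declaratively as a shape with sh:maxCount 1 on dbo:birthDate attached to the target class, so it can be re-run on every release.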

It is an open system, and therefore the value that is captured flows right back to everybody.

The full document, “DBpedia’s Databus and strategic initiative to facilitate ‘One Billion derived Knowledge Graphs by and for Consumers’ until 2025”, is available here.

If you have any feedback or questions, please use the DBpedia Forum, the “report issues” button, or dbpedia@infai.org.

Yours,

DBpedia Association

DBpedia Growth Hack – Fall/Winter 2019

*UPDATE* – We are now five weeks into our growth hack. Read on below to find out how it all started. Click here to follow up on each of our milestones.

A growth hack – how come?

Things have gone a bit quiet around DBpedia: no new releases, no clear direction to go. Did DBpedia stop? Actually, no. There were community and board member meetings, discussions, and 500 messages per week on dbpedia.slack.com.

We are still there. We, as a community, restructured, and now we are done. This means that DBpedia will now work in a more focused way to build on its technology leadership role in the Web of Data and thus – with our very own DBpedia Growth Hack – bring new innovation and free fuel to everybody.

What is this growth hack?

We restructured in two areas:

  1. The agility of knowledge delivery – our release cycle was too slow and too expensive. We were unable to include substantial contributions from DBpedians. Therefore, quality and features stagnated.
  2. Transparent processes – DBpedia has a crafty community with highly skilled knowledge engineers backing it. At some point, we grew too much and became lumpy, with a big monolithic system that nobody could improve because of side effects. So we designed a massive curation infrastructure where information can be retrieved and adjusted, and errors discussed and fixed.

We have been consistently working on this restructuring for two years now, and the infrastructure is ready as a horizontal prototype, meaning each part works and everybody can start using it. We ate our own dog food and built the first application.

(Frey et al., DBpedia FlexiFusion – Best of Wikipedia > Wikidata > Your Data, accepted at ISWC 2019.)

Now we will go through each part, polish and document it, and report on each in its own blog post. Stay tuned!

Is DBpedia Academic or Industrial?

The Semantic Web has a history of being labelled as too academic, and part of that label has coloured DBpedia as well. Here is our personal truth: it is an engineering project and therefore it swings both ways. It is a great academic success, with 25,000 papers using the data and enabling research and innovation; the free data drives data-driven research. Also, we are probably THE fastest pathway from lab to market, as our industry adoption has unprecedented speed. Proof will follow in the blog posts of the Growth Hack series.

Blog Posts of the Growth Hack series:

(not necessarily in that order, depending on how fast we can polish & document)

  • Query DBpedia as SQL – a first service on the Databus
  • DBpedia Live Extraction – Realtime updates of Wikipedia
  • DBpedia Business Models – How to earn money with DBpedia & the Databus
  • MARVIN Release Bot – together with https://blogs.tib.eu/wp/tib/, including an update of https://wiki.dbpedia.org/Datasets
  • The new forum at https://forum.dbpedia.org is already open for registration, but still needs some structure. It is intended as a replacement for support.dbpedia.org.

In addition, here are some announcements of ongoing projects:

  • GlobalFactSync (GFS) – syncing facts between Wikipedia and Wikidata
  • Energy Databus – LOD-GEOSS project focusing on energy system data on the bus
  • Supply-Chain-Management Databus – PLASS project focusing on SCM data on the bus

So, stay tuned for our upcoming posts and follow our journey.

Yours

DBpedia Association

Artificial Intelligence (AI) and DBpedia

Artificial Intelligence (AI) is currently the central subject of the newly announced ‘Year of Science’ by the German Federal Ministry of Education and Research. In recent years, new approaches to facilitating AI were explored, new mindsets established, new tools developed and new technologies implemented. AI is THE key technology of the 21st century. Together with Machine Learning (ML), it transforms society faster than ever before and will lead humankind to its digital future.

In this digital transformation era, success will be based on using analytics to discover the insights locked in the massive volume of data being generated today. Success with AI and ML depends on having the right infrastructure to process the data.[1]

The Value of Data Governance

One key element in facilitating ML and AI for the digital future of Europe is ‘decentralized semantic data flows’, as stated by Sören Auer, a founding member of DBpedia and current director of TIB, during a meeting about the digital future in Germany at the Bundestag. He further commented that major AI breakthroughs were indeed facilitated by easily accessible datasets, whereas the algorithms used were comparatively old.

In conclusion, Auer reasons that the actual value lies in data governance. In fact, in order to guarantee progress in AI, the development of a common and transparent understanding of data is necessary. [2]

DBpedia Databus – Digital Factory Platform

The DBpedia Databus – our digital factory platform – is one of many drivers that will help to build the much-needed data infrastructure for ML and AI to prosper. With the DBpedia Databus, we create a hub that facilitates a ‘networked data economy’ revolving around the publication of data. Upholding the motto ‘Unified and Global Access to Knowledge’, the Databus facilitates exchanging, curating and accessing data between multiple stakeholders – always, anywhere. Publishing data on the Databus means connecting and comparing your data to the network. Check our current DBpedia releases via http://downloads.dbpedia.org/repo/dev/.

DBpedia Day & AI for Smart Agriculture

Furthermore, you can learn about the DBpedia Databus during our 13th DBpedia Community meeting, co-located with the LDK conference in Leipzig in May 2019. Additionally, as a special treat, we also offer an AI side-event on May 23rd, 2019.

May we present the think tank and hackathon “Artificial Intelligence for Smart Agriculture”. The goal of this event is to develop new ideas and small tools which can demonstrate the use of AI in the agricultural domain or the use of AI for a sustainable bio-economy. In that regard, a special focus will be on the use and the impact of linked data for AI components.

In short, the two-part event, co-located with LSWT & DBpedia Day, comprises workshops, on-site team hacking as well as presentations of results. The activity is supported by the projects DataBio and Bridge2Era as well as CIAOTECH/PNO. All participating teams are invited to join and present their projects. Further information is available here. Please submit your ideas and projects here.


Finally, the DBpedia Association is looking forward to meeting you in Leipzig, home of our head office. Pay us a visit!

____

Resources:

[1] Zeus Kerravala; The Success of ARTIFICIAL INTELLIGENCE AND MACHINE LEARNING Requires an Architectural Approach to Infrastructure. ZK Research: A Division of Kerravala Consulting © 2018 ZK Research, available via http://bit.ly/2UwTJRo

[2] Sören Auer; statement at the Bundestag during a meeting on AI. A summary is available via https://www.tib.eu/de/service/aktuelles/detail/tib-direktor-als-experte-zu-kuenstlicher-intelligenz-ki-im-deutschen-bundestag/

Call for Participation – LDK Conference & DBpedia Day

Call for Participation – LDK Conference

With the advent of digital technologies, an ever-increasing amount of language data is now available across various application areas and industry sectors, making language data more and more valuable. In that context, we would like to draw your attention to the 2nd Language, Data and Knowledge conference (LDK for short), which will be held in Leipzig from May 20th to 22nd, 2019.

The Conference

This new biennial conference series aims at bringing together researchers from across disciplines concerned with language data in data science and knowledge-based applications.

Keynote Speakers

We are happy that Christian Bizer, a founding member of DBpedia, will be one of the three amazing keynote speakers who open the LDK conference. Apart from Christian, Christiane Fellbaum from Princeton University and Eduard Werner from Leipzig University will share their thoughts on current language data issues to start vital discussions revolving around language data.

Be part of this event in Leipzig and catch up with the latest research outcomes in the areas of acquisition, provenance, representation, maintenance, usability and quality, as well as legal, organizational and infrastructure aspects of language data.

DBpedia Community Meeting

To get the full Leipzig experience, we would also like to invite you to our DBpedia Community meeting, which is co-located with LDK and will be held on May 23rd, 2019. Contributions are still welcome – just get in touch via dbpedia@infai.org.

We also offer an interesting side-event, the think tank and hackathon “Artificial Intelligence for Smart Agriculture”. Visit our website for further information.

Join LDK conference 2019 and our DBpedia Community Meeting to catch up with the latest research and developments in the Semantic Web Community. 


Yours,

DBpedia Association