One Billion derived Knowledge Graphs

… by and for Consumers by 2025

One Billion – what a mission! We are proud to announce that the DBpedia Databus website at https://databus.dbpedia.org and the SPARQL API at https://databus.dbpedia.org/(repo/sparql|yasgui) (see the documentation) are now in public beta!
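
To give a first impression, here is a minimal sketch of querying the SPARQL API from Python with the SPARQLWrapper library. The query is deliberately generic (it merely lists named graphs), since the Databus metadata vocabulary is documented at the links above and may still change during the beta.

```python
# Minimal sketch: query the public Databus SPARQL endpoint.
# The SELECT query is generic; adapt it to the Databus vocabulary.
from SPARQLWrapper import SPARQLWrapper, JSON

sparql = SPARQLWrapper("https://databus.dbpedia.org/repo/sparql")
sparql.setQuery("SELECT DISTINCT ?g WHERE { GRAPH ?g { ?s ?p ?o } } LIMIT 10")
sparql.setReturnFormat(JSON)

results = sparql.query().convert()
for binding in results["results"]["bindings"]:
    print(binding["g"]["value"])
```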

The system is usable (eat-your-own-dog-food tested), following a “working software over comprehensive documentation” approach. Due to its many components (website, SPARQL endpoints, Keycloak, mods, upload client, download client, and data debugging), we estimate approximately six months in beta to fix bugs, implement the remaining features and improve the details.

But let’s start from the beginning.

The DBpedia Databus is a platform that captures the effort invested by data consumers who needed better data quality (fitness for use) in order to use the data, and lets them give improvements back to the data source and to other consumers. The DBpedia Databus enables anybody to build automated DBpedia-style extraction, mapping and testing for any data they need. The Databus incorporates features from DNS, Git, RSS, online forums and Maven to harness the full working power of data consumers.

Our vision

Professional data consumers worldwide have already built stable cleaning and refinement chains for all available datasets, but their efforts are invisible and not reusable. Deep, cleaned data silos exist beyond the reach of publishers and of other consumers, trapped locally in pipelines. Data is not oil that flows out of inflexible pipelines. The Databus breaks existing pipelines into individual components that together form a decentralized, but centrally coordinated data network. In this set-up, data can flow back to previous components, to the original sources, or end up being consumed by external components.

One Billion interconnected, quality-controlled Knowledge Graphs by 2025

The Databus provides a platform for re-publishing these files with very little effort (leaving file traffic as the only cost factor), while offering the full benefits of built-in system features such as automated publication, structured querying, automatic ingestion, as well as pluggable automated analysis, data testing via continuous integration, and automated application deployment (software with data). The impact is highly synergistic: just a few thousand professional consumers and research projects can expose millions of cleaned datasets, on par with what has long existed in deep silos and pipelines.

Towards a data consumer network

As we are inverting the paradigm from a publisher-centric view to a data consumer network, we will open the download valve to enable discovery of, and access to, massive amounts of data that are cleaner than what the original sources publish. The main DBpedia Knowledge Graph alone has 600k file downloads per year, complemented by downloads at over 20 chapters, e.g. http://es.dbpedia.org, as well as over 8 million daily hits on the main Virtuoso endpoint.

Community extensions from the alpha phase, such as DBkWik and LinkedHypernyms, are being loaded onto the bus and consolidated. We expect the number of datasets to reach over 100 by the end of the year. Companies and organisations who have previously uploaded their backlinks here will be able to migrate to the Databus. Further datasets are being cleaned and posted. In two of our research projects, LOD-GEOSS and PLASS, we will re-publish open datasets, clean them and create collections, resulting in DBpedia-style knowledge graphs for energy systems and supply-chain management.

A new era for decentralized collaboration on data quality

DBpedia was established around producing a queryable knowledge graph derived from Wikipedia content that is able to answer questions like “What do Innsbruck and Leipzig have in common?” A community and consumer network quickly formed around this highly useful data, resulting in a large, well-structured, open knowledge graph that seeded the Linked Open Data Cloud, the largest knowledge graph on earth. The main lesson learned after these 13 years is that current data “copy” or “download” processes are inefficient by a magnitude that can only be grasped from a global perspective. Consumers spend tremendous effort fixing errors on the client side. If one unparseable line needs 15 minutes to find and fix, 10,000 downloads amount to 150,000 minutes, or roughly 104 days of work. Providers, on the other hand, will never have the resources to fix the last error, as cost increases exponentially (the 80/20 rule).

One billion knowledge graphs in mind – the progress so far

Discarding faulty data often means that a substitute source has to be found, which costs hours of research and might lead to similar problems. From the dozens of DBpedia Community meetings we have held, we can summarize that for each clean-up procedure, data transformation, linkset or schema mapping that a consumer creates client-side, dozens of consumers have invested the same effort before them, and none of it reaches the source or the other consumers facing the same problem. The community meetings showed us just the tip of the iceberg.

As a foundation, we implemented a mappings wiki that allowed consumers to improve data quality centrally. The next advancement was the creation of the SHACL standard by our former CTO and board member Dimitris Kontokostas. SHACL allows consumers to specify repeatable tests on graph structures and datatypes, which is an effective way to systematically assess data quality. We established the DBpedia Databus as a central platform to better capture the decentrally created, client-side value contributed by consumers.
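
To make the SHACL approach concrete, here is a minimal sketch using rdflib and the pySHACL library. The shape and data are illustrative examples rather than actual DBpedia test definitions: the shape requires every ex:Person to carry exactly one xsd:date birth date, and the validation report pinpoints the resource that violates it.

```python
# Minimal sketch: a repeatable SHACL test on structure and datatype,
# validated locally with pySHACL. All names are illustrative.
from rdflib import Graph
from pyshacl import validate

shapes = Graph().parse(data="""
@prefix sh:  <http://www.w3.org/ns/shacl#> .
@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .
@prefix ex:  <http://example.org/> .

ex:PersonShape a sh:NodeShape ;
    sh:targetClass ex:Person ;
    sh:property [
        sh:path ex:birthDate ;
        sh:datatype xsd:date ;
        sh:minCount 1 ;
        sh:maxCount 1 ;
    ] .
""", format="turtle")

data = Graph().parse(data="""
@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .
@prefix ex:  <http://example.org/> .

ex:alice a ex:Person ;
    ex:birthDate "1879-03-14"^^xsd:date .

ex:bob a ex:Person .   # missing birth date -> violation
""", format="turtle")

conforms, _, report_text = validate(data, shacl_graph=shapes)
print(conforms)      # False: ex:bob violates the shape
print(report_text)   # human-readable validation report
```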

It is an open system; therefore, the value that is captured flows right back to everybody.

The full document, “DBpedia’s Databus and strategic initiative to facilitate ‘One Billion derived Knowledge Graphs by and for Consumers’ until 2025”, is available here.

If you have any feedback or questions, please use the DBpedia Forum, the “report issues” button, or dbpedia@infai.org.

Yours,

DBpedia Association

DBpedia Growth Hack – Fall/Winter 2019

*UPDATE* – We are now five weeks into our growth hack. Read on below to find out how it all started. Click here to follow up on each of our milestones.

A growth hack – how come?

Things have gone a bit quiet around DBpedia. No new releases, no clear direction. Did DBpedia stop? Actually, no. There were community and board member meetings, discussions, and 500 messages per week on dbpedia.slack.com.

We are still here. We, as a community, restructured, and now we are done. This means that DBpedia will now work in a more focused way to build on its technology leadership role in the Web of Data and thus, with our very own DBpedia Growth Hack, bring new innovation and free fuel to everybody.

What is this growth hack?

We restructured in two areas:

  1. The agility of knowledge delivery – our release cycle was too slow and too expensive. We were unable to include substantial contributions from DBpedians. Therefore, quality and features stagnated.
  2. Transparent processes – DBpedia has a resourceful community with highly skilled knowledge engineers backing it. At some point, we grew too much and became lumpy, with a big monolithic system that nobody could improve because of side effects. So we designed a massive curation infrastructure where information can be retrieved and adjusted, and errors can be discussed and fixed.

We have been consistently working on this restructuring for two years, and we now have the infrastructure ready as a horizontal prototype, meaning each part works and everybody can start using it. We ate our own dog food and built the first application:

(Frey et al.: DBpedia FlexiFusion – Best of Wikipedia > Wikidata > Your Data, accepted at ISWC 2019.)

Now we will go through each part, polish and document it, and report on it with a blog post each. Stay tuned!

Is DBpedia Academic or Industrial?

The Semantic Web has a history of being labelled as too academic, and some of that label has coloured DBpedia as well. Here is our personal truth: it is an engineering project, and therefore it swings both ways. It is a great academic success, with 25,000 papers using the data and enabling research and innovation. The free data drives data-driven research. We are also probably THE fastest pathway from lab to market, as our industry adoption has unprecedented speed. Proof will follow in the blog posts of the Growth Hack series.

Blog Posts of the Growth Hack series:

(not necessarily in that order, depending on how fast we can polish & document)

  • Query DBpedia as SQL – a first service on the Databus
  • DBpedia Live Extraction – Realtime updates of Wikipedia
  • DBpedia Business Models – How to earn money with DBpedia & the Databus
  • MARVIN Release Bot – together with https://blogs.tib.eu/wp/tib/, including an update of https://wiki.dbpedia.org/Datasets
  • The new forum at https://forum.dbpedia.org is already open for registration, but still needs some structure. It is intended as a replacement for support.dbpedia.org

In addition, some announcements of ongoing projects:

  • GlobalFactSync (GFS) – syncing facts between Wikipedia and Wikidata
  • Energy Databus: LOD GEOSS project focusing on energy system data on the bus
  • Supply-Chain-Management Databus – PLASS project focusing on SCM data on the bus

So, stay tuned for our upcoming posts and follow our journey.

Yours,

DBpedia Association

Artificial Intelligence (AI) and DBpedia

Artificial Intelligence (AI) is the central subject of the just-announced ‘Year of Science’ by the German Federal Ministry of Education and Research. In recent years, new approaches to facilitating AI were explored, new mindsets were established, new tools were developed and new technologies were implemented. AI is THE key technology of the 21st century. Together with Machine Learning (ML), it transforms society faster than ever before and will lead humankind to its digital future.

In this digital transformation era, success will be based on using analytics to discover the insights locked in the massive volume of data being generated today. Success with AI and ML depends on having the right infrastructure to process the data.[1]

The Value of Data Governance

One key element to facilitate ML and AI for the digital future of Europe is ‘decentralized semantic data flows’, as stated by Sören Auer, a founding member of DBpedia and current director of TIB, during a meeting about the digital future in Germany at the Bundestag. He further commented that major AI breakthroughs were indeed facilitated by easily accessible datasets, whereas the algorithms used were comparatively old.

In conclusion, Auer reasons that the actual value lies in data governance. In fact, in order to guarantee progress in AI, the development of a common and transparent understanding of data is necessary. [2]

DBpedia Databus – Digital Factory Platform

The DBpedia Databus – our digital factory platform – is one of many drivers that will help build the much-needed data infrastructure for ML and AI to prosper. With the DBpedia Databus, we create a hub that facilitates a ‘networked data economy’ revolving around the publication of data. Upholding the motto ‘Unified and Global Access to Knowledge’, the Databus facilitates exchanging, curating and accessing data between multiple stakeholders – anytime, anywhere. Publishing data on the Databus means connecting and comparing your data to the network. Check our current DBpedia releases via http://downloads.dbpedia.org/repo/dev/.
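
For consumers, accessing a release boils down to a plain HTTP download. Below is a minimal sketch using Python’s requests library; the file path is a hypothetical placeholder, so browse the repo URL above for the actual group/artifact/version layout.

```python
# Minimal sketch: fetch one release file from the downloads repo.
# The path below is hypothetical; check the repo for real artifacts.
import requests

BASE = "http://downloads.dbpedia.org/repo/dev/"
PATH = "some-group/some-artifact/2019.08.01/labels.ttl.bz2"  # hypothetical

resp = requests.get(BASE + PATH, stream=True)
resp.raise_for_status()
with open(PATH.rsplit("/", 1)[-1], "wb") as out:
    for chunk in resp.iter_content(chunk_size=1 << 16):
        out.write(chunk)
```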

DBpedia Day & AI for Smart Agriculture

Furthermore, you can learn about the DBpedia Databus during our 13th DBpedia Community Meeting, co-located with the LDK conference in Leipzig in May 2019. Additionally, as a special treat for you, we also offer an AI side event on May 23rd, 2019.

May we present the think tank and hackathon “Artificial Intelligence for Smart Agriculture”. The goal of this event is to develop new ideas and small tools which demonstrate the use of AI in the agricultural domain or the use of AI for a sustainable bio-economy. In that regard, a special focus will be on the use and impact of linked data for AI components.

In short, the two-part event, co-located with LSWT & DBpedia Day, comprises workshops, on-site team hacking as well as presentations of results. The activity is supported by the projects DataBio and Bridge2Era as well as by CIAOTECH/PNO. All participating teams are invited to join and present their projects. Further information is available here. Please submit your ideas and projects here.


Finally, the DBpedia Association is looking forward to meeting you in Leipzig, home of our head office. Pay us a visit!

____

Resources:

[1] Zeus Kerravala: The Success of Artificial Intelligence and Machine Learning Requires an Architectural Approach to Infrastructure. ZK Research: A Division of Kerravala Consulting, © 2018 ZK Research. Available via http://bit.ly/2UwTJRo

[2] Sören Auer: Statement at the Bundestag during a meeting on AI. A summary is available via https://www.tib.eu/de/service/aktuelles/detail/tib-direktor-als-experte-zu-kuenstlicher-intelligenz-ki-im-deutschen-bundestag/

Call for Participation – LDK Conference & DBpedia Day

Call for Participation – LDK Conference

With the advent of digital technologies, an ever-increasing amount of language data is now available across various application areas and industry sectors, making language data more and more valuable. In that context, we would like to draw your attention to the 2nd Language, Data and Knowledge (LDK) conference, which will be held in Leipzig from May 20th to 22nd, 2019.

The Conference

This new biennial conference series aims to bring together researchers from across disciplines concerned with language data in data science and knowledge-based applications.

Keynote Speakers

We are happy that Christian Bizer, a founding member of DBpedia, will be one of the three keynote speakers opening the LDK conference. Apart from Christian, Christiane Fellbaum from Princeton University and Eduard Werner from Leipzig University will share their thoughts on current language data issues to start vital discussions revolving around language data.

Be part of this event in Leipzig and catch up with the latest research outcomes in the areas of acquisition, provenance, representation, maintenance, usability and quality, as well as legal, organizational and infrastructure aspects of language data.

DBpedia Community Meeting

To get the full Leipzig experience, we would also like to invite you to our DBpedia Community Meeting, which is co-located with LDK and will be held on May 23rd, 2019. Contributions are still welcome; just get in touch via dbpedia@infai.org.

We also offer an interesting side event, the think tank and hackathon “Artificial Intelligence for Smart Agriculture”. Visit our website for further information.

Join LDK conference 2019 and our DBpedia Community Meeting to catch up with the latest research and developments in the Semantic Web Community. 


Yours,

DBpedia Association