All posts by Sandra Praetor

Global Fact Sync – Synchronizing Wikidata & Wikipedia’s infoboxes

How is data edited in Wikipedia/Wikidata? Where does it come from? And how can we synchronize it globally?  

The GlobalFactSync (GFS) Project — funded by the Wikimedia Foundation — started in June 2019 and has two goals:

  • Answer the three questions above.
  • Build an information system to synchronize facts between all Wikipedia language-editions and Wikidata. 

We are now seven weeks into the project (10+ months to go), and we are releasing our first prototypes to gather feedback.

How – Synchronization vs Consensus

We follow a strict “human(s)-in-the-loop” approach to synchronization. The final decision whether or not to synchronize a value should rest with a human editor who understands consensus and its implications. There will be no automatic imports. Our focus is to drastically reduce the time needed to research all references for individual facts.

A trivial example that illustrates our reasoning is the release date of the single “Boys Don’t Cry” (March 16th, 1989) in the English, Japanese, and French Wikipedia, in Wikidata, and finally in the external open database MusicBrainz. A human editor might need 15–30 minutes to find and open all the different sources, while our current prototype can spot the differences and display them in five seconds.
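The kind of check the prototype performs can be sketched in a few lines of Python (the data model, source names, and values below are purely illustrative, not the actual GFS implementation):

```python
from collections import Counter

def spot_differences(values_by_source):
    """Group sources by the value they report and flag disagreement.

    Returns (consensus, deviations): the most frequently reported
    value and a map of the sources that report something else.
    """
    counts = Counter(values_by_source.values())
    consensus, _ = counts.most_common(1)[0]
    deviations = {src: val for src, val in values_by_source.items()
                  if val != consensus}
    return consensus, deviations

# Hypothetical inputs for the release-date example above.
sources = {
    "enwiki": "1989-03-16",
    "jawiki": "1989-03-16",
    "frwiki": "1989-03-16",
    "wikidata": "1989-03-16",
    "musicbrainz": "1989-04-16",  # invented discrepancy
}
consensus, deviations = spot_differences(sources)
# deviations now points the editor straight at the outlier.
```

Once the differing sources are known, the remaining (human) work is deciding which value is right and fixing it at the source.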

We already had our first successful edit where a Wikipedia editor fixed the discrepancy with our prototype: “I’ve updated Wikidata so that all five sources are in agreement.” We are now working on the following tasks:

  • Scaling the system to all infoboxes, Wikidata and selected external databases (see below on the difficulties there)
  • Making the system:
    •  “live” without stale information
    • “reliable” with fewer technical errors when extracting and indexing data
    • “better referenced” by not only synchronizing facts but also references 

Contributions and Feedback

To ensure that GlobalFactSync will serve and help the Wikiverse, we encourage everyone to try our data and microservices and leave us some feedback, either on our Meta-Wiki page or via email. Over the following 10+ months, we intend to improve and build upon these initial results. At the same time, these microservices are available for every developer to use and to build useful applications on. The most promising contributions will be rewarded and receive the book “Engineering Agile Big-Data Systems”. Please post feedback or any tool or GUI here. In case you need changes to be made to the API, please let us know, too.
For the ambitious future developers among you, we have some budget left that we will dedicate to an internship. In order to apply, just mention it in your feedback post. 

Finally, to talk to us and other GlobalFactSync users, you may want to visit WikidataCon and Wikimania, where we will present the latest developments and the progress of our project.

Data, APIs & Microservices (Technical prototypes) 

Data Processing and Infobox Extraction

For GlobalFactSync we use data from Wikipedia infoboxes in different languages, as well as from Wikidata and DBpedia, and fuse them into one large, consolidated dataset – a PreFusion dataset (in JSON-LD). More information on the fusion process, which is the engine behind GFS, can be found in the FlexiFusion paper. One of our next steps is to integrate MusicBrainz as an external dataset into this process. We hope to add even more external datasets to increase the amount of available information and references.
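The idea behind a pre-fused dataset can be pictured as follows: instead of merging values away, every value is kept together with its provenance (the function and field names here are an illustrative sketch, not the actual PreFusion schema):

```python
def prefuse(records):
    """Consolidate per-source records into one pre-fused entity.

    records: list of (source, {property: value}) pairs.
    Returns {property: [{"value": ..., "source": ...}, ...]} so that
    nothing is lost and every value stays attributable.
    """
    fused = {}
    for source, props in records:
        for prop, value in props.items():
            fused.setdefault(prop, []).append(
                {"value": value, "source": source})
    return fused

# Hypothetical population values for one entity from three sources.
fused = prefuse([
    ("enwiki", {"population": 28000}),
    ("dewiki", {"population": 28684}),
    ("wikidata", {"population": 28684}),
])
```

A later fusion step can then pick, rank, or flag values per property without ever discarding the original sources.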

First microservices 

We deployed a set of microservices to show the current state of our toolchain.

  • [Initial User Interface] The GlobalFactSync UI prototype (available at http://global.dbpedia.org) shows all extracted information available for one entity across different sources. It can be used to analyze the factual consensus between different Wikipedia articles about the same thing. Example: look at the variety of population counts for Grimma.
  • [Reference Data Download] We ran the Reference Extraction Service over 10 Wikipedia languages. Download dumps here.
  • [ID service] Last but not least, we offer the Global ID Resolution Service. It ties together all available identifiers for one thing (i.e. at the moment all DBpedia/Wikipedia and Wikidata identifiers – MusicBrainz coming soon…) and shows their stable DBpedia Global ID. 
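Conceptually, the ID resolution can be thought of as a lookup from any local identifier to a cluster of equivalent identifiers sharing one stable global ID (the cluster data and the global ID format below are invented for illustration; the real service is the one linked above):

```python
# Invented example cluster: equivalent local identifiers for one entity.
CLUSTERS = {
    "gfs:4f2a": {
        "enwiki": "Grimma",
        "dewiki": "Grimma",
        "wikidata": "Q14817",
    },
}

def resolve(local_id):
    """Return the global ID and all sibling identifiers for a local ID,
    or (None, {}) if the identifier is unknown."""
    for global_id, members in CLUSTERS.items():
        if local_id in members.values():
            return global_id, members
    return None, {}

global_id, siblings = resolve("Q14817")
```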

Finding sync targets

In order to test our algorithms, we started by looking at various groups of subjects, our so-called sync targets. Based on these different subjects, a set of problems was identified, with varying layers of complexity:

  • identity check/check for ambiguity — Are we talking about the same entity? 
  • fixed vs. varying property — Some properties vary depending on nationality (e.g., release dates), or point in time (e.g., population count).
  • reference — Depending on the outcome of the identity check and on whether the property is fixed or varying, the appropriate reference might differ. Also, for some targets no queryable online reference might be available.
  • normalization/conversion of values — Depending on the language/nationality of the article, properties can be expressed in varying units (e.g., currency, metric vs. imperial system).
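The last point can be illustrated with a small sketch: before two values from different language editions are compared, both are converted to a canonical unit, and a tolerance absorbs rounding differences (the conversion table and tolerance are illustrative assumptions, not GFS code):

```python
# Conversion factors to a canonical unit (metres); illustrative only.
TO_METRES = {"m": 1.0, "km": 1000.0, "mi": 1609.344, "ft": 0.3048}

def normalize(value, unit):
    """Convert a (value, unit) pair to metres."""
    return value * TO_METRES[unit]

def same_quantity(a, b, tolerance=0.01):
    """Compare two (value, unit) pairs up to a relative tolerance,
    so rounding in one language edition is not flagged as a conflict."""
    x, y = normalize(*a), normalize(*b)
    return abs(x - y) <= tolerance * max(abs(x), abs(y))

# 26 miles and 41.8 km describe (roughly) the same distance.
result = same_quantity((26, "mi"), (41.8, "km"))
```

Currencies are harder than physical units, since the conversion factor itself varies over time and needs its own reference.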

The check for ambiguity is the most crucial step to ensure that the infoboxes being compared do refer to the same entity. We found instances where the Wikipedia page and the infobox shown on that page were presenting information about different subjects (e.g., see here).

Examples

The group ‘NBA players’ was identified as a good sync target to start with. There are no ambiguity issues, it is a clearly defined group of persons, and the number of varying properties is very limited. Information seems to be derived mainly from two websites (nba.com and basketball-reference.com), and normalization is only a minor issue. ‘Video games’ also proved to be an easy sync target, with the main problem being varying properties such as different release dates for different platforms (Microsoft Windows, Linux, macOS, Xbox) and different regions (NA vs. EU).

More difficult topics, such as ‘cars’, ‘music albums’, and ‘music singles’, showed more potential for ambiguity as well as property variability. A major concern we found was Wikipedia pages that contain multiple infoboxes (often seen for pages referring to a certain type of car, such as this one). Reference and fact extraction can be done for each infobox, but currently we run into trouble once we fuse this data.

Further information about sync targets and their challenges can be found on our Meta-Wiki discussion page, where Wikipedians who deal with infoboxes on a regular basis can also share their insights on the matter. Some issues were also found regarding the mapping of properties. In order to make GlobalFactSync as applicable as possible, we rely on the DBpedia community to help us improve the mappings. If you are interested in participating, you can connect with us at http://mappings.dbpedia.org and in the DBpedia forum.

Bottom line

We value your feedback.

Your DBpedia Association

timbr – the DBpedia SQL Semantic Knowledge Platform

With timbr, WPSemantix and the DBpedia Association launch the first SQL Semantic Knowledge Graph that integrates Wikipedia and Wikidata knowledge into SQL engines.

In part three of DBpedia’s growth hack blog series, we feature timbr, the latest development at DBpedia in collaboration with WPSemantix. Read on to find out how it works.

timbr – DBpedia SQL Semantic Knowledge Platform

Tel Aviv, Israel and Leipzig, Germany – July 18, 2019 – WP-Semantix (WPS) – the “SQL Knowledge Graph Company™” – and the DBpedia Association – Institut für Angewandte Informatik e.V. – today announced the launch of the timbr-DBpedia SQL Semantic Knowledge Platform, a unique version of WPS’ timbr SQL Semantic Knowledge Graph. It integrates the timbr-DBpedia ontology, timbr’s ontology explorer/visualizer and timbr’s SQL query service to provide, for the first time, semantic access to DBpedia knowledge in SQL, and thus to facilitate the integration of DBpedia knowledge into standard data warehouses and data lakes.

DBpedia

DBpedia is the crowd-sourced community effort to extract structured content from the information created in various Wikimedia projects and to publish it as files on the Databus and via online databases. This structured information resembles an open knowledge graph, which has been available to everyone on the Web for over a decade. Knowledge graphs are a new kind of database developed to store knowledge in a machine-readable form, organized as connected, relationship-rich data. Since the publication of DBpedia (in parallel to Freebase) 12 years ago, knowledge graphs have become very successful, and Google uses a similar approach to create the knowledge cards displayed in its search results.

Query the world’s knowledge in standard SQL

Amit Weitzner, founder and CEO at WPS, commented: “Knowledge graphs use specialized languages, require resource-intensive, dedicated infrastructure and require costly ETL operations. That is, they did until timbr came along. timbr employs SQL – the most widely known database language – to eliminate the technological barriers to entry for using knowledge graphs and to implement Semantic Web principles to provide knowledge graph functionality in SQL. timbr enables modelling of data as connected, context-enriched concepts with inference and graph traversal capabilities while being queryable in standard SQL, to represent knowledge in data warehouses and data lakes. timbr-DBpedia is our first vertical application and we are very excited by the prospects of our cooperation with the DBpedia team to enable the largest user base to query the world’s knowledge in standard SQL.”

Sebastian Hellmann, executive director of the DBpedia Association commented that:

“our vision of the DBpedia Databus – transforming Linked Data into a networked data economy – is becoming a reality thanks to tools such as timbr-DBpedia, which take full advantage of our unique data sets and data architecture. We look forward to working with WPS to also enable access to new data sets as they become available.”

timbr will help to explore the power of semantic technologies

Prof. James Hendler, pioneer and a world-leading authority in Semantic Web technologies and WPS’ advisory board member commented “timbr can be a game-changing solution by enabling the semantic inference capabilities needed in many modelling applications to be done in SQL. This approach will enable many users to get the advantages of semantic AI technologies and data integration without the learning curve of many current systems. By giving more people access to the semantic version of Wikipedia, timbr-DBpedia will definitely contribute to allowing the majority of the market to explore the power of semantic technologies.”

timbr-DBpedia is available as a query service or licensed for use as SaaS or on-premises. See the DBpedia website: wiki.dbpedia.org/timbr.

About WPSemantix

WP-Semantix Ltd. (wpsemantix.com) is the developer of the timbr SQL semantic knowledge platform, a dynamic abstraction layer over relational and non-relational data, facilitating declaration and powerful exploration of semantically rich ontologies using a standard SQL query interface. timbr is natively accessible in Apache Spark, Python, R and SQL to empower data scientists to perform complex analytics and generate sophisticated ML algorithms.  Its JDBC interface provides seamless integration with the most popular business intelligence solutions to make complex analytics accessible to analysts and domain experts across the organization.

WP-Semantix, timbr, “SQL Knowledge Graph”, “SQL Semantic Knowledge Graph” and associated marks and trademarks are registered trademarks of WP Semantix Ltd.

DBpedia is looking forward to this cooperation. Follow us on Twitter for the latest information and stay tuned for part four of our growth hack series. The next post features the GlobalFactSyncRe. Curious? You have to be a little more patient and wait till Thursday, July 25th.

Yours DBpedia Association

DBpedia Forum – New Ways to Exchange about DBpedia

From now on, in addition to our newsletter and Slack as means of communication, we have a new platform for exchange and support around DBpedia – the DBpedia Forum.

With part II of our growth hack series, we would like to introduce you to the latest feature of our development – the new DBpedia Forum.

Why a new forum?

DBpedia has an inclusionist model and DBpedia is huge. At the core, there is data extracted from Wikipedia and Wikidata. Around this, there are derived datasets like the fusion/enrichment and also LHD. Additionally, we offer services such as DBpedia Spotlight, DBpedia Lookup, SameAs, and not to forget the main endpoint http://dbpedia.org/sparql as well as our DBpedia Chapters. All of this is surrounded by 25k academic papers and a vivid business network.

Since we have this inclusionist model, we believe that access to data and knowledge should be global and unified (and free where possible). That is exactly why we established the DBpedia Forum –  to further this mission. 

Welcome!

The DBpedia Forum is a shared community resource – a place to share skills, knowledge, and interests through ongoing conversation about DBpedia and related topics. Among other things, it is meant to replace our old support page for assistance with DBpedia. In the long run, we will shut down the (former) support page, as it no longer serves our growing needs.

This is what the forum currently looks like. Traffic and communication are still a little low. Start your conversation about DBpedia here and now.

Where are all the DBpedians?

We figure most of you are already actively involved in exchanges about DBpedia. However, most of that activity is scattered all over the web, which makes it hard for us and others to keep track of. With the new forum, we offer you a playground for vivid exchange and a place to meet and greet fellow DBpedians – a platform for everyone’s benefit.

The DBpedia Forum simplifies communication

Make this a great place for discussion by contributing yourself. It is super easy: just visit https://forum.dbpedia.org/, browse the topics, and find the info that helps you, or add your own. If you want to contribute, just register and off you go. Improve the discussion by joining conversations that are already happening. Help us shape the future of the DBpedia community by engaging in discussions that make this forum an interesting place to be.

Transparency is all

To assist with maintaining an appropriate code of conduct, the forum utilizes Discourse’s built-in tools, which enable the community to collectively identify the best (and worst) contributions. The forum tracks bookmarks, likes, flags, replies, edits, and much more. That is similar to the ranking in the old support system, but much more transparent and much more fun.

For the hunter-gatherers among you: you can also earn badges for various activities – as long as you are active. And if you feel very passionate about a certain topic, we would gladly make you a moderator – just let us know.

Now is the time

Since you are already talking about DBpedia somewhere on the WWW, why not do it here and now for everyone else to follow? Your knowledge and skills are key, not only for individuals in this forum but also for the whole DBpedia community. 

Happy posting and stay tuned for part III in the growth hack series. The next post will feature timbr – DBpedia SQL Semantic Knowledge Platform.

Yours,

DBpedia Association

DBpedia Growth Hack – Fall/Winter 2019

*UPDATE* – We are now 5 weeks into our growth hack. Read on below to find out how it all started. Click here to follow up on each of our milestones.

A growth hack – how come?

Things have gone a bit quiet around DBpedia. No new releases, no clear direction. Did DBpedia stop? Actually, no. There were community and board member meetings, discussions, and 500 messages per week on dbpedia.slack.com.

We are still here. We, as a community, restructured, and now we are done, which means that DBpedia will now work in a more focused way to build on its technology leadership role in the Web of Data and thus – with our very own DBpedia Growth Hack – bring new innovation and free fuel to everybody.

What is this growth hack?

We restructured in two areas:

  1. The agility of knowledge delivery – our release cycle was too slow and too expensive. We were unable to include substantial contributions from DBpedians. As a result, quality and features stagnated.
  2. Transparent processes – DBpedia has a crafty community with highly skilled knowledge engineers backing it. At some point, we grew too much and became lumpy: a big monolithic system that nobody could improve because of side effects. So we designed a massive curation infrastructure where information can be retrieved and adjusted, and errors can be discussed and fixed.

We have been consistently working on this restructuring for two years now, and we have the infrastructure ready as a horizontal prototype, meaning each part works and everybody can start using it. We ate our own dog food and built the first application.

(Frey et al., DBpedia FlexiFusion – Best of Wikipedia > Wikidata > Your Data, accepted at ISWC 2019.)

Now we will go through each part, polish & document it, and report on it with a blog post each. Stay tuned!

Is DBpedia Academic or Industrial?

The Semantic Web has a history of being labelled as too academic, and some of that label has coloured DBpedia as well. Here is our personal truth: it is an engineering project, and therefore it swings both ways. It is a great academic success, with 25,000 papers using the data and enabling research and innovation. The free data drives data-driven research. Also, we are probably THE fastest pathway from lab to market, as our industry adoption has unprecedented speed. Proof will follow in the blog posts of the Growth Hack series.

Blog Posts of the Growth Hack series:

(not necessarily in that order, depending on how fast we can polish & document)

  • Query DBpedia as SQL – a first service on the Databus
  • DBpedia Live Extraction – Realtime updates of Wikipedia
  • DBpedia Business Models – How to earn money with DBpedia & the Databus
  • MARVIN Release Bot – together with https://blogs.tib.eu/wp/tib/ incl. an update of https://wiki.dbpedia.org/Datasets
  • The new forum https://forum.dbpedia.org is already open for registration, but needs some structure. Intended as a replacement for support.dbpedia.org

In addition, some announcements of ongoing projects:

  • GlobalFactSync (GFS) – syncing facts between Wikipedia and Wikidata
  • Energy Databus – LOD GEOSS project focusing on energy system data on the bus
  • Supply-Chain-Management Databus – PLASS project focusing on SCM data on the bus

So, stay tuned for our upcoming posts and follow our journey.

Yours

DBpedia Association

Home Sweet Home – The 13th DBpedia Community Meeting

For the second time now, we co-located one of our DBpedia community meetings with the LDK conference. After the previous edition in Galway two years ago, it was Leipzig’s turn to host the event. Thus, the 13th DBpedia Community Meeting took place in this beautiful city, which is also home to the DBpedia Association’s head office. Win-win, we’d say.

After a very successful LDK conference on May 20th–21st, representatives of the European DBpedia community met at Villa Ida Mediencampus on Thursday, May 23rd, to present their work with DBpedia and to exchange ideas about the DBpedia Databus.

For those of you who missed it or for those who want a little retrospective on the day, this blog post provides you with a short LDK-wrap-up as well as a recap of our DBpedia Day.

First things first

First and foremost, we would like to thank LDK organizers for co-locating our meeting and thus enabling fruitful synergies, and a platform for the DBpedia community to exchange.

LDK

The first presentation, which kicked off the conference, was given by Prof. Christiane Fellbaum from Princeton University. Her talk, “Mapping the Lexicons of Signs and Words”, focused on her research on mapping WordNet and SignStudy, a resource for American Sign Language. Shortly after, Prof. Eduard Werner from Leipzig University gave a very exciting talk on the Sorbian languages. He discussed their nature, their historical background, and the unfortunate imminent extinction of Lower Sorbian due to a decline in native speakers.

The first day of LDK was full of exciting presentations on various language-oriented topics. Researchers exchanged ideas about linguistic vocabularies, SPARQL query recommendations, role and reference grammar, language detection, entity recognition, machine translation, under-resourced languages, metaphor identification, event detection, and linked data in general. The day ended with fruitful discussions during the poster session. Afterwards, LDK visitors had the chance to mingle with locals in some of Leipzig’s most exciting bars during a pub crawl.

Prof. Christian Bizer from the University of Mannheim opened the second day with a keynote on “Schema.org Annotations and Web Tables: Underexploited Semantic Nuggets on the Web?”. In his talk, he gave a nice overview of knowledge-extraction research around the large-scale Web Data Commons corpus, its findings, open challenges, and possible uses of the corpus.

The second day was busy with four sessions, each populated with presentations on exciting topics ranging from relation classification, dictionary linking and entity linking, to terminology models, topical thesauri and morphology.

The series of presentations ended with an organ prelude played by David Timm, the University Music Director at Leipzig University. Finally, the day and the conference concluded with a conference dinner at Moritzbastei, one of Leipzig’s famous cultural centres.

DBpedia Day

On May 23rd, the DBpedia Community met for the 13th DBpedia community meeting. The event attracted more than 60 participants who extended their LDK experience or followed our call to Leipzig.

Opening & keynotes

The meeting was opened by Dr. Sebastian Hellmann, the executive director of the DBpedia Association. He gave an overview of the latest developments and achievements around DBpedia, with the main focus on the DBpedia Databus technologies. The first keynote was given by Dr. Peter Haase from metaphacts, with an unusual interactive presentation on “Linked Data Fun with DBpedia”. The second keynote speaker was Prof. Heiko Paulheim, presenting findings, challenges and results from his work on the construction of the DBkWik knowledge graph by exploiting the DBpedia extraction framework.

Showcase session

The showcase session started with a presentation by Krzysztof Węcel on “Citations and references in DBpedia”, followed by Peter Nancke with a presentation on the “TeBaQA Question Answering System”, Maribel Acosta Deibe speaking about “Crowdsourcing the Quality of DBpedia”, and finally a presentation by Angus Addlesee on “Data Reconciliation using DBpedia”.

NLP & DBpedia session

The DBpedia & NLP session was opened by Diego Moussallem, presenting results from his work on “Generating Natural Language from RDF Data”. The second presentation was given by Christian Jilek on “Named Entity Recognition for Real-Time Applications”, which also won the best research paper award at the LDK conference. Next, Jonathan Kobbe presented the conference’s best student paper, on “Argumentative Relation Classification”. Finally, Edgard Marx closed the session with an overview presentation, “From the word to the resource”.

 

Side-Event – Hackathon

The “Artificial Intelligence for Smart Agriculture” Hackathon focused on enhancing the usability of automatic analysis tools which utilize semantic big data for agriculture, as well as conducting an outreach of the DataBio project for the DBpedia community. The event was supported by PNO, Spacebel, PSNC, and InfAI e.V.

We improved the visualization module of Albatross, a platform for processing and analyzing Linked Open Data, and added functionalities to geo-L, the geospatial link discovery tool.  

In addition, we presented a paper about Linked Data publication pipelines, focusing on agri-related data, at the co-located LSWT conference.

Wrap Up

After the event, DBpedians joined the DBpedia Association in the nearby pub Gosenschenke to delve into more vital talks about the Semantic Web world, Linked Data & DBpedia.

In case you missed the event, all slides and presentations are available on our website. Further insights, feedback, and photos of the event can be found on Twitter via #DBpediaLeipzig.

We are already looking forward to the next DBpedia Community Meeting, on September 12th in Karlsruhe, Germany, co-located with the SEMANTiCS Conference. Contributions are still welcome – just ping us via dbpedia@infai.org and show us what you’ve got. You should also get in touch with us if you want to host a DBpedia meetup yourself; we will help you with the program, dissemination, or organizational matters of the event if need be.

Stay tuned, check Twitter, Facebook, and the website, or subscribe to our newsletter for the latest news and updates.

 

Your DBpedia Association

Artificial Intelligence (AI) and DBpedia

Artificial Intelligence (AI) is the central subject of the recently announced ‘Year of Science’ by the German Federal Ministry. In recent years, new approaches to facilitating AI were explored, new mindsets were established, new tools were developed, and new technologies were implemented. AI is THE key technology of the 21st century. Together with Machine Learning (ML), it is transforming society faster than ever before and will lead humankind into its digital future.

In this era of digital transformation, success will be based on using analytics to discover the insights locked in the massive volume of data being generated today. Success with AI and ML depends on having the right infrastructure to process that data. [1]

The Value of Data Governance

One key element in facilitating ML and AI for the digital future of Europe is ‘decentralized semantic data flows’, as stated by Sören Auer, a founding member of DBpedia and current director of TIB, during a meeting about the digital future in Germany at the Bundestag. He further commented that major AI breakthroughs were indeed facilitated by easily accessible datasets, whereas the algorithms used were comparatively old.

In conclusion, Auer reasons that the actual value lies in data governance. In fact, in order to guarantee progress in AI, the development of a common and transparent understanding of data is necessary. [2]

DBpedia Databus – Digital Factory Platform

The DBpedia Databus – our digital factory platform – is one of many drivers that will help to build the much-needed data infrastructure for ML and AI to prosper. With the DBpedia Databus, we create a hub that facilitates a ‘networked data economy’ revolving around the publication of data. Upholding the motto “Unified and Global Access to Knowledge”, the Databus facilitates exchanging, curating, and accessing data between multiple stakeholders – anytime, anywhere. Publishing data on the Databus means connecting and comparing (your) data to the network. Check our current DBpedia releases via http://downloads.dbpedia.org/repo/dev/.

DBpedia Day & AI for Smart Agriculture

Furthermore, you can learn about the DBpedia Databus during our 13th DBpedia Community Meeting, co-located with the LDK conference in Leipzig in May 2019. Additionally, as a special treat, we also offer an AI side-event on May 23rd, 2019.

May we present the think tank and hackathon “Artificial Intelligence for Smart Agriculture”. The goal of this event is to develop new ideas and small tools that demonstrate the use of AI in the agricultural domain or for a sustainable bio-economy. In that regard, a special focus will be on the use and impact of linked data for AI components.

In short, the two-part event, co-located with LSWT & DBpedia Day, comprises workshops, on-site team hacking, and presentations of results. The activity is supported by the projects DataBio and Bridge2Era as well as CIAOTECH/PNO. All participating teams are invited to join and present their projects. Further information is available here. Please submit your ideas and projects here.

 

Finally, the DBpedia Association is looking forward to meeting you in Leipzig, home of our head office. Pay us a visit!

____

Resources:

[1] Zeus Kerravala; The Success of ARTIFICIAL INTELLIGENCE AND MACHINE LEARNING Requires an Architectural Approach to Infrastructure. ZK Research: A Division of Kerravala Consulting © 2018 ZK Research, available via http://bit.ly/2UwTJRo

[2] Sören Auer; Statement at the Bundestag during a meeting in AI, Summary is available via https://www.tib.eu/de/service/aktuelles/detail/tib-direktor-als-experte-zu-kuenstlicher-intelligenz-ki-im-deutschen-bundestag/

Call for Participation – LDK Conference & DBpedia Day

Call for Participation LDK – Conference

With the advent of digital technologies, an ever-increasing amount of language data is now available across various application areas and industry sectors, making language data more and more valuable. In that context, we would like to draw your attention to the 2nd Language, Data and Knowledge conference (LDK), which will be held in Leipzig from May 20th to 22nd, 2019.

The Conference

This new biennial conference series aims at bringing together researchers from across disciplines concerned with language data in data science and knowledge-based applications.

Keynote Speakers

We are happy that Christian Bizer, a founding member of DBpedia, will be one of the three amazing keynote speakers opening the LDK conference. Apart from Christian, Christiane Fellbaum from Princeton University and Eduard Werner from Leipzig University will share their thoughts on current language data issues to start vital discussions revolving around language data.

Be part of this event in Leipzig and catch up with the latest research outcomes in the areas of acquisition, provenance, representation, maintenance, usability, quality as well as legal, organizational and infrastructure aspects of language data.  

DBpedia Community Meeting

To get the full Leipzig experience, we would also like to invite you to our DBpedia Community Meeting, which is co-located with LDK and will be held on May 23rd, 2019. Contributions are still welcome – just get in touch via dbpedia@infai.org.

We also offer an interesting side-event, the Thinktank and Hackathon “Artificial Intelligence for Smart Agriculture”. Visit our website for further information.

Join LDK conference 2019 and our DBpedia Community Meeting to catch up with the latest research and developments in the Semantic Web Community. 

 

Yours DBpedia Association

One of 206 – GSoC 2019 – Call for students

 

Pinky: Gee, Brain, what are we gonna do this year?
Brain: The same thing we do every year, Pinky. Taking over GSoC.

Exactly what DBpedia plans to do this summer. We have been accepted as one of 206 open source organizations to participate in Google Summer of Code  (GSoC) again. Yes, ONE OF 206, let that sink in. The upcoming GSoC marks the 15th consecutive year of the program and is the 8th year in a row for DBpedia.

What is GSoC? 

Google Summer of Code is a global program focused on bringing student developers into open source software development. Funds are given to students (BSc, MSc, PhD) to work for three months on a specific task. For GSoC newbies, this short video and the information provided on the GSoC website explain all there is to know about the program.

Time for a New Narrative

In past years, we mentored many successful projects, some by female students, but our applicants were mostly male. Now it is time to change this narrative and work towards more diversity in science. This year, we at DBpedia are more determined than ever to encourage female students to apply for our projects. We have also engaged excellent female mentors to raise the share of women in our mentor team. We are proud of all the female DBpedians who help to shape the future of DBpedia.

Over the next four weeks, we invite all students, female and male, who are interested in the Semantic Web and open source development to apply for our projects. You can also contribute your own ideas to work on during the summer. We regularly grow our community through GSoC and can offer more and more opportunities to you.

And this is how it works: 3 steps to GSoC stardom

  1. Open source organizations such as DBpedia announce their project ideas.
  2. Students contact the mentor organizations they want to work with and write up a project proposal.
  3. After a selection phase, students are matched with a specific project and a set of mentors to work on the project during the summer.
To all the smart brains out there: if you are a student who wants to work with us during summer 2019, check our list of project ideas and warm-up tasks, or come up with your own idea and get in touch with us.

Further information on the application procedure is available in our DBpedia Guidelines. There you will find information on how to contact us and how to apply for GSoC. Please also note the official GSoC 2019 timeline for your proposal submission and make sure to submit on time. Unfortunately, extensions cannot be granted. The final submission deadline is April 9th, 2019, 8 pm CET.

Finally, check our website for information on DBpedia, follow us on Twitter or subscribe to our newsletter.

And in case you still have questions, please do not hesitate to contact us via praetor@infai.org.

We are thrilled to meet you and your ideas.

Your DBpedia GSoC Team

A year with DBpedia – Retrospective Part 3

This is the final part of our journey around the world with DBpedia. This time we take you from Austria to Mountain View, California, and on to London, UK.

Come on, let’s do this.

Welcome to Vienna, Austria  – Semantics

More than 110 DBpedia enthusiasts joined our Community Meeting in Vienna on September 10th, 2018. The event was again co-located with SEMANTiCS, a very successful collaboration. Luckily, we secured two brilliant keynote speakers to open our meeting. Javier David Fernández García, Vienna University of Economics, opened the meeting with his keynote Linked Open Data cloud – act now before it’s too late. He reflected on the challenges of arriving at a truly machine-readable and decentralized Web of Data, reviewed the current state of affairs, highlighted key technical and non-technical challenges, and outlined potential solution strategies. The second keynote speaker was Mathieu d’Aquin, Professor of Informatics at the Insight Centre for Data Analytics at NUI Galway. Mathieu, who specializes in data analytics, completed the meeting with his keynote Dealing with Open-Domain Data.

The 12th edition of the DBpedia Community Meeting also featured a special chapter session, chaired by Enno Meijers from the Dutch DBpedia Language Chapter. The speakers presented the latest technical and organizational developments of their respective chapters. The session mainly served as an exchange platform for the different DBpedia chapters: for the first time, representatives of the European chapters discussed problems and challenges of DBpedia from their point of view. Furthermore, each chapter’s representative presented tools, applications, and projects.

In case you missed the event, a more detailed article can be found here. All slides and presentations are also available on our website. Further insights, feedback, and photos about the event are available on Twitter via #DBpediaDay.

Welcome to Mountain View  – GSoC mentor summit

GSoC was a vital part of DBpedia’s endeavors in 2018. We had three very talented students who, with the help of our great mentors, made it to the finish line of the program. You can read about their projects and success stories in a dedicated post here.

After a successful three months of mentoring, two of our mentors had the opportunity to attend the annual Google Summer of Code mentor summit. Mariano Rico and Thiago Galery represented DBpedia at this year’s event. They engaged in vital discussions about this year’s program and about lessons learned, highlights, and drawbacks they experienced during the summer. A special focus was put on how to engage potential GSoC students as early as possible to secure as much commitment as possible. The ideas the two mentors brought back in their suitcases will help improve DBpedia’s part of the program for 2019. And apparently, chocolate was a very big thing there ;).

In case you have a project idea for GSoC 2019 or want to mentor a DBpedia project next year, just drop us a line via dbpedia@infai.org. Also, as we intend to participate in the upcoming edition, please spread the word amongst students, especially female students, who fancy spending their summer coding on a DBpedia project. Thank you.

 

Welcome to London, England – Connected Data London 2018

In early November, we were invited to Connected Data London again. After our first visit in 2017, this great event seems to be becoming a regular fixture in our DBpedia schedule.

Sebastian Hellmann, Executive Director of the DBpedia Association, participated as a panelist in the discussion on “Building Knowledge Graphs in the Real World”. Together with speakers from Thomson Reuters, Zalando, and Textkernel, he discussed definitions of knowledge graphs, best practices for building and using them, as well as the recent hype around them.

Visitors of CNDL2018 had the chance to grab a copy of our brand-new flyer and talk with us about the DBpedia Databus. The event gave us the opportunity to meet early adopters of our Databus, a decentralized data publication, integration, and subscription platform. Thank you very much for that opportunity.

A year went by

2018 has gone by so fast and brought so much for DBpedia. The DBpedia Association got the chance to meet more of DBpedia’s language chapters, and we developed the DBpedia Databus to the point that it can finally be launched in spring 2019. DBpedia is a community project that relies on people, and with the DBpedia Databus we are creating a platform that enables publishing and fosters a networked data economy around it. So stay tuned for exciting news coming up next year. Until then, we would like to thank all DBpedia enthusiasts around the world for their research with DBpedia and their support of and contributions to DBpedia. Kudos to you.

 

All that remains to say is have yourself a very merry Christmas and a dazzling New Year. May 2019 be peaceful, exciting and prosperous.

 

Yours – being in a cheerful and festive mood –

 

DBpedia Association

 

A year with DBpedia – Retrospective Part Two

Retrospective Part II. Welcome to the second part of our journey around the world with DBpedia. This time we take you to Greece, Germany, Australia, and finally France.

Let the travels begin.

Welcome to Thessaloniki, Greece & ESWC

DBpedians from the Portuguese Chapter presented their research results during ESWC 2018 in Thessaloniki, Greece. The team around Diego Moussalem developed a demo extending MAG to support entity linking in 40 different languages, with a special focus on low-resource languages such as Ukrainian, Greek, Hungarian, Croatian, Portuguese, Japanese, and Korean. The demo relies on online web services which allow easy access to their entity linking approaches. Furthermore, it can disambiguate against DBpedia and Wikidata. Currently, MAG is used in diverse projects and is widely used by the Semantic Web community. Check the demo via http://bit.ly/2RWgQ2M. Further information about the development can be found in a research paper, available here.

 

Welcome back to Leipzig Germany

With our new credo “connecting data is about linking people and organizations”, halfway through 2018 we finalized our concept of the DBpedia Databus. This global DBpedia platform aims at sharing the efforts of open knowledge graph (OKG) governance, collaboration, and curation to maximize societal value and develop a linked data economy.

With this new strategy, we wanted to meet some DBpedia enthusiasts of the German DBpedia Community. Fortunately, the LSWT (Leipzig Semantic Web Tag) 2018, hosted in Leipzig, home of the DBpedia Association, proved to be the right opportunity. It was the perfect platform to exchange with researchers, industry, and other organizations about current developments and future applications of the DBpedia Databus. Apart from hosting a hands-on DBpedia workshop for newbies, we also organized a well-received WebID tutorial. Finally, the event gave us the opportunity to position the new DBpedia Databus as a global open knowledge network that aims at providing unified, global access to knowledge (graphs).

Welcome down under – Melbourne Australia

Further research results that rely on DBpedia were presented during ACL 2018 in Melbourne, Australia, July 15th to 20th, 2018. The research built on DBpedia data from the WebNLG corpus, which originated in a challenge where participants automatically converted non-linguistic data from the Semantic Web into a textual format. This data was then used to train a neural network model for generating referring expressions of a given entity. For example, if Jane Doe is a person’s official name, referring expressions of that person would be “Jane”, “Ms Doe”, “J. Doe”, or “the blonde woman from the USA”.

If you want to dig deeper but missed ACL this year, the paper is available here.

 

Welcome to Lyon, France

In July, the DBpedia Association travelled to France. With the organizational support of Thomas Riechert (HTWK, InfAI) and Inria, we finally met the French DBpedia Community in person and presented the DBpedia Databus. Additionally, we got to meet the French DBpedia Chapter and the researchers and developers around Oscar Rodríguez Rocha and Catherine Faron Zucker. They presented current research results revolving around an approach to automate the generation of educational quizzes from DBpedia. Their aim was to provide a useful tool for the French educational system that:

  • helps to test and evaluate the knowledge acquired by learners, and
  • supports lifelong learning on various topics and subjects.

The French DBpedia team followed a 4-step approach:

  1. Quizzes are first formalized with Semantic Web standards: questions are represented as SPARQL queries and answers as RDF graphs.
  2. Natural language questions, answers, and distractors are generated from this formalization.
  3. Different strategies were defined to extract multiple-choice questions, correct answers, and distractors from DBpedia.
  4. A measure was defined for the information content of the elements of an ontology and of the set of questions contained in a quiz.
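To make the first three steps concrete, here is a minimal, illustrative sketch (not the authors’ code): a question is formalized as a SPARQL query over DBpedia, the correct answer corresponds to the query result, and distractors are drawn from a pool of entities of the same type. The resource names are real DBpedia entities, but the query string, the `make_quiz_item` helper, and the distractor-selection strategy are simplified stand-ins for the strategies described in the paper.

```python
# Step 1 (formalization): the question as a SPARQL query; its result
# (the RDF value bound to ?capital) is the correct answer.
QUESTION_QUERY = """
SELECT ?capital WHERE {
  <http://dbpedia.org/resource/France> dbo:capital ?capital .
}
"""

def make_quiz_item(subject, predicate_label, correct, same_type_pool, n_distractors=3):
    """Steps 2-3: generate a natural-language question, the correct
    answer, and distractors (same-type entities that are not the answer)."""
    question = f"What is the {predicate_label} of {subject}?"
    distractors = [e for e in same_type_pool if e != correct][:n_distractors]
    return {"question": question, "answer": correct, "distractors": distractors}

# Mock result of the SPARQL query above, plus a pool of other capitals.
item = make_quiz_item(
    subject="France",
    predicate_label="capital",
    correct="Paris",
    same_type_pool=["Berlin", "Paris", "Madrid", "Rome"],
)
print(item["question"])     # What is the capital of France?
print(item["answer"])       # Paris
print(item["distractors"])  # ['Berlin', 'Madrid', 'Rome']
```

In the actual system the pool of distractors would itself come from a SPARQL query (e.g. other entities of type `dbo:City`), and step 4’s information-content measure would then rank the generated questions.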

Oscar R. Rocha and Catherine F. Zucker also published a paper explaining the detailed approach to automatically generate quizzes from DBpedia according to official French educational standards. 

 

 

Thank you to all the DBpedia enthusiasts we met during our journey.

With this journey from Europe to Australia and back, we provided you with insights into research based on DBpedia as well as a glimpse into the French DBpedia Chapter. In the final part of our journey, coming up next week, we will take you to Vienna, Mountain View, and London. In the meantime, stay tuned and visit our Twitter channel or subscribe to our DBpedia Newsletter.

 

Have a great week.

Yours DBpedia Association