All posts by Sandra Praetor

The DBpedia Databus – transforming Linked Data into a networked data economy

Working with data is hard and repetitive. That is why we are more than happy to announce the launch of the alpha version of our DBpedia Databus, a platform that simplifies working with data.

We have been studying the data network for ten years now, and we conclude that organizations with open data struggle to work together properly. Even though they could and should collaborate, they are hindered by technical and organizational barriers, and they duplicate work on the same data. Companies selling data, on the other hand, cannot do so in a scalable way. Consumers are left empty-handed, trapped between the choice of inferior open data and buying from a jungle-like market.

We need to rethink the incentives for linking data

Vision

We envision a hub where everybody uploads data. In that hub, useful operations like versioning, cleaning, transformation, mapping, linking, merging and hosting are done automagically on a central communication system, the bus, and the results are then dispersed again to consumers and applications in a decentralized network. On the Databus, data flows from data producers through the platform to the consumers (left to right), while errors and feedback flow in the opposite direction, back to the data source, providing a continuous integration service that improves the data at the source.

The DBpedia Databus is a platform that allows multiple stakeholders to exchange, curate and access data. Any data entering the bus will be versioned, cleaned, mapped and linked, and its licenses and provenance tracked. Hosting in multiple formats will be provided, so the data can be accessed either as a dump download or via an API.

Publishing data on the Databus means connecting and comparing your data to the network

If you are gritting your teeth over how to publish data on the web, you can simply use the Databus to do so. Data loaded onto the bus will be highly visible, available and queryable. You should think of it as a service:

  • Visibility guarantees that your citations and reputation go up.
  • Besides a web download, we can also provide a Linked Data interface, SPARQL endpoint, Lookup (autocomplete) or other means of availability (like AWS or Docker images).
  • Any distribution we do will funnel feedback and collaboration opportunities your way to improve your dataset and your internal data quality.
  • You will receive an enriched dataset, which is connected to and complemented with any other available data (see the same folder names in the data and fusion folders).
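
To illustrate the SPARQL access mentioned above, here is a minimal sketch of how a client builds a query request following the SPARQL 1.1 Protocol. The endpoint is DBpedia's public one; the queried resource (dbr:Leipzig) is only an example.

```python
from urllib.parse import urlencode

# Minimal sketch of API-style access via the SPARQL 1.1 Protocol.
# The queried resource, dbr:Leipzig, is just an illustrative example.

ENDPOINT = "http://dbpedia.org/sparql"

QUERY = """SELECT ?p ?o WHERE {
  <http://dbpedia.org/resource/Leipzig> ?p ?o .
} LIMIT 10"""

def sparql_request_url(endpoint: str, query: str) -> str:
    """Build a GET request URL asking the endpoint for JSON results."""
    params = {"query": query, "format": "application/sparql-results+json"}
    return endpoint + "?" + urlencode(params)

# The resulting URL can be fetched with any HTTP client (urllib, curl, ...).
print(sparql_request_url(ENDPOINT, QUERY))
```

The same URL-encoding applies whether you call the endpoint from curl, JavaScript or a SPARQL client library.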

How it works at the moment

Integrating data is easy with the Databus. We have been integrating and loading additional datasets alongside DBpedia for the world to query. Popular datasets include ICD-10 (medical data) and data about organizations and persons. We are still at an early stage, but we have already loaded 10 datasets (6 from DBpedia, 4 external) onto the bus using these phases:

  1. Acquisition: data is downloaded from the source and logged in.
  2. Conversion: data is converted to N-Triples and cleaned (syntax parsing, datatype validation, and SHACL).
  3. Mapping: the vocabulary is mapped onto the DBpedia Ontology and converted (we have been doing this for Wikipedia’s infoboxes and Wikidata, and now we do it for other datasets as well).
  4. Linking: links are mainly collected from the sources, cleaned and enriched.
  5. IDying: all entities found are given a new Databus ID for tracking.
  6. Clustering: IDs are merged into clusters, using one of the Databus IDs as the cluster representative.
  7. Data comparison: each dataset is compared with all other datasets. We have an algorithm that decides on the best value, but the main goal here is transparency, i.e. to see which data value was chosen and how it compares to the other sources.
  8. Fusion: a main knowledge graph is fused from all the sources, i.e. a transparent aggregate.
  9. Complements: for each source, we produce a local fused version called the “Databus Complement”. This is a major feedback mechanism for all data providers, who can see what data they are missing, what data differs in other sources and what links are available for their IDs.
  10. Comparison service: you can compare all data via a web service.
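
As an illustration of the kind of cleaning done in the Conversion phase, here is a minimal, purely illustrative N-Triples line check. This is not the actual Databus tooling, which uses a full parser plus datatype validation and SHACL rather than a regular expression.

```python
import re

# Purely illustrative sketch of a syntax check for N-Triples lines;
# real pipelines use a full N-Triples parser plus datatype and SHACL
# validation, not a regular expression.

IRI = r'<[^<>"{}|^`\\\x00-\x20]*>'
BNODE = r'_:[A-Za-z0-9]+'
LITERAL = (r'"(?:[^"\\]|\\.)*"'                                      # quoted string with escapes
           r'(?:@[A-Za-z]+(?:-[A-Za-z0-9]+)*|\^\^' + IRI + r')?')    # optional lang tag or datatype
TRIPLE = re.compile(
    r'^(?:' + IRI + '|' + BNODE + r')\s+'                            # subject
    + IRI + r'\s+'                                                   # predicate
    + r'(?:' + IRI + '|' + BNODE + '|' + LITERAL + r')\s*\.$'        # object and final dot
)

def valid_ntriples_line(line: str) -> bool:
    """Return True if a line looks like a well-formed N-Triples statement."""
    stripped = line.strip()
    if not stripped or stripped.startswith("#"):
        return True  # blank lines and comments are fine
    return TRIPLE.match(stripped) is not None

good = '<http://example.org/s> <http://example.org/p> "42"^^<http://www.w3.org/2001/XMLSchema#integer> .'
bad = '<http://example.org/s> <http://example.org/p> missing-object .'
print(valid_ntriples_line(good), valid_ntriples_line(bad))  # True False
```

Lines that fail such checks would be rejected or repaired before the later phases see them.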

Contact us via dbpedia@infai.org if you would like to have additional datasets integrated and maintained alongside DBpedia.

From your point of view

Data Sellers

If you are selling data, the Databus provides numerous opportunities for you. You can link your offering to the open entities on the Databus, which lets consumers discover your services more easily by displaying them alongside each relevant request.

Data Consumers

Open data on the Databus will be a commodity. We are driving down the cost of understanding, retrieving and reformatting the data. We are constantly extending the ways the data can be used and are willing to implement any formats and APIs you need. If you are lacking a certain kind of data, we can also scout for it and load it onto the Databus.

Is it free?

Maintaining the Databus is a lot of work, and the servers incur high costs. As a rule of thumb, we provide for free everything that we can afford to provide for free. DBpedia provided everything for free in the past, but this is not a healthy model, as we can neither maintain quality properly nor grow.

On the Databus, everything is provided “as is”, without any guarantees or warranty. Improvements can be made by the volunteer community. The DBpedia Association will provide a business interface to allow guarantees, major improvements, stable maintenance, and hosting.

License

The final databases are licensed under ODC-By. This covers our work on the recomposition of data. Each fact is individually licensed: Wikipedia abstracts, for example, are CC BY-SA, some facts are CC BY-NC, and some are copyrighted. This means that the data is available for research, informational and educational purposes. We recommend contacting us for any professional use of the data (clearing), so we can guarantee that legal matters are handled correctly. Otherwise, professional use is at your own risk.

Current Statistics

The Databus data is available at http://downloads.dbpedia.org/databus/ ordered into three main folders:

  • Data: the data that is loaded on the Databus at the moment
  • Global: a folder that contains provenance data and the mappings to the new IDs
  • Fusion: the output of the Databus

Most notably you can find:

  • Provenance mapping of the new IDs in http://downloads.dbpedia.org/databus/global/persistence-core/cluster-iri-provenance-ntriples/ and http://downloads.dbpedia.org/databus/global/persistence-core/global-ids-ntriples/
  • The final fused version for the core: http://downloads.dbpedia.org/databus/fusion/core/fused/
  • A detailed JSON-LD file for data comparison: http://downloads.dbpedia.org/databus/fusion/core/json/
  • Complements, i.e. the enriched Dutch DBpedia version: http://downloads.dbpedia.org/databus/fusion/core/nl.dbpedia.org/

(Note that the file and folder structure is still subject to change.)

Sources

 

Upcoming Developments

Data market
  • build your own data inventory and merchandise your data via Linked Data or via secure named graphs in the DBpedia SPARQL Endpoint (WebID + TLS + OpenLink’s Virtuoso database)
DBpedia Marketplace
  • Offer your Linked Data tools, services, products
  • Incubate new research into products
  • Example: Support for RDFUnit (https://github.com/AKSW/RDFUnit created by the SHACL editor), assistance with SHACL writing and deployment of the open-source software

 

DBpedia and the Databus will transform Linked Data into a networked data economy

 

For any questions or inquiries related to the new DBpedia Databus, please contact us via dbpedia@infai.org

 

Yours,

DBpedia Association

DBpedia supports young developers

Supporting young and aspiring developers has always been part of DBpedia’s philosophy. Through various internships and collaborations with programmes such as Google Summer of Code, we were able not only to meet aspiring developers but also to establish long-lasting relationships with these DBpedians, ensuring sustainable progress for and with DBpedia. For six years now, we have been part of Google Summer of Code, one of our favorite programmes. This year, we are also taking part in Coding da Vinci, a German cultural data hackathon, where we support young hackers, coders and smart minds with DBpedia datasets.

DBpedia at Google Summer of Code 2018

This year, DBpedia will participate for the sixth time in a row in the Google Summer of Code program (GSoC). Together with our amazing mentors, we drafted 9 project ideas which GSoC applicants could apply to. Since March 12th, we have received many proposal drafts, out of which 12 final project proposals were submitted. Competition is very high, as student slots are always limited. Our DBpedia mentors critically reviewed all proposals to assess their potential and allocate one of the rare open slots in the GSoC program. Finally, on Monday, April 23rd, our 6 finalists were announced. We are very proud and looking forward to the upcoming months of coding. The following projects have been accepted and will hopefully be realized during the summer.

Our gang of DBpedia mentors comprises very experienced developers who have been working with us on this project for several years now. Speaking of sustainability, we also have former GSoC students on board, who get the chance to mentor projects building on ideas from past GSoCs. And while students and mentors start bonding, we are really looking forward to the upcoming months of coding – may it be inspiring, fun and fruitful.

 

DBpedia @ Coding da Vinci 2018

As already mentioned in the previous newsletter, DBpedia is part of CodingDaVinciOst 2018. Founded in Berlin in 2014, Coding da Vinci is a platform for cultural heritage institutions and the hacker, developer, designer, and gamer community to jointly develop new creative applications from cultural open data during a series of hackathon events. In this year’s edition, DBpedia provides its datasets to support more than 30 cultural institutions, enriching their datasets so that hackathon participants can make the most out of the data. Among the participating cultural institutions are, for example, the university libraries of Chemnitz, Jena, Halle, Freiberg, Dresden and Leipzig as well as the Sächsisches Staatsarchiv, Museum für Druckkunst Leipzig, Museum für Naturkunde Berlin, Duchess Anna Amalia Library, and the Museum Burg Posterstein.

CodingDaVinciOst 2018, the current edition of the hackathon, hosted a kick-off weekend at the Bibliotheca Albertina, the University Library in Leipzig. During the event, DBpedia offered a hands-on workshop for newbies and interested hackathon participants who wanted to learn about how to enrich their project ideas with DBpedia or how to solve potential problems in their projects with DBpedia.

We are now looking forward to the upcoming weeks of coding and hacking and can’t wait to see the results on June 18th, when the final projects will be presented and awarded. We wish all the coders and hackers a pleasant and happy hacking time. Check our DBpedia Twitter for updates and latest news.  

If you have any questions, like to support us in any way or if you like to learn more about DBpedia, just drop us a line via dbpedia@infai.org

Yours,
DBpedia Association

DBpedia and GSoC 2018 – Call for Students

This year, DBpedia will participate for the sixth time in a row in the Google Summer of Code program (GSoC). We are regularly growing our community through GSoC and are currently looking for students who want to join us for a summer of coding. Read below for further details about GSoC and how to apply.

What is GSoC?

Google Summer of Code is a global program focused on bringing more student developers into open source software development. Students (BSc, MSc, Ph.D.) receive funds to work for three months on a specific task. First, open source organizations announce their student projects; then students contact the mentor organizations they want to work with and write up a project proposal for the summer. After a selection phase, students are matched with a specific project and a set of mentors and work on the project during the summer.

If you are a GSoC student who wants to apply to our organization, please check our guidelines before you start drafting your project proposal.

This year’s GSoC timeline is as follows:

March 12th, 2018: Student applications open (students can register and submit their applications to mentor organizations)
April 9th, 2018: Student application deadline
April 23rd, 2018: Accepted students are announced and paired with a mentor; the bonding period begins
May 14th, 2018: Coding officially begins!
August 6th, 2018: Final week: students submit their final work product and their final mentor evaluation
August 22nd, 2018: Final results of Google Summer of Code 2018 announced

Check our website, follow us on Twitter or subscribe to our newsletter for further updates.

We are looking forward to your application.

Your DBpedia Association

Behind the scenes of DBpedia

DBpedia is part of a large network of industry and academia: companies and organizations as well as 20 universities, including student members. Our aim is to qualify aspiring developers and knowledge graph enthusiasts by working together with industry partners on DBpedia-related tasks. The final goal is that DBpedia can be effectively integrated into organizations and businesses, taking their knowledge graphs to the next level. We intend to foster collaboration between DBpedia and organizations that share an interest in, and want to profit from, open knowledge graph governance.

Therefore we have developed an internship and mentoring program to:

  1. qualify a motivated student or beginner-level developer. He/she learns about DBpedia, your organization, and your data; collaborative coaching may qualify the intern for hiring.
  2. move your organization’s data space closer to DBpedia. Internship goals are defined by DBpedia mentors and your organization and focus on the concrete needs of your business.
  3. gain insight into your needs, helping us shape our strategy for the future.

Springer Nature was the first partner we collaborated with in our new program. We set out on an endeavor to interlink Springer Nature’s SciGraph and DBpedia datasets.

With Beyza Yaman, who managed to prevail against 7 other international competitors, we found the perfect partner in crime to tackle this challenge. Read her interview below to find out more about the internship.

 

Who are you?

My name is Beyza Yaman and I am a Ph.D. student in the Department of Computer Science and Engineering (DIBRIS) at University of Genoa (Italy). I am working on the problem of source selection on Linked Open Data for live queries proposing a context and quality dependent solution. Beside my studies, I like to meet new people, learn their cultures and discover new places, especially by walking/hiking events.

Why DBpedia? What is your main interest in DBpedia and what was your motivation to apply for our collaborative internship?

I have already been using DBpedia datasets for my experiments. Besides being the core of the Linked Data Cloud, DBpedia is one of the platforms which brings applied semantic technologies forward, ahead of most other data technologies. Also, the collaboration with Springer Nature, one of the best publishing companies, was the cherry on the cake! Springer is an innovative company which applies the latest technologies to its requirements. Thus, being involved in a project with different grounds seemed to be a fruitful experience. When I saw the announcement of the internship, I thought this was a great opportunity not to be missed!

What have you been responsible for?

My tasks during the internship were:

  • Link discovery between the SciGraph and DBpedia datasets
  • Text mining of DBpedia entities from Springer Nature publication content
  • Creation of useful linkset files for the resulting links

 

As the web of data grows into an interlinked data space, data sources should be connected to discover further insights from the data by creating meaningful relations. Moreover, further information (e.g. quality) about these link sets forms another aspect of the Semantic Web objectives. Thus, we worked on interlinking the SciGraph and DBpedia datasets by using the Link Discovery approach for the structured content and the Named Entity Recognition approach for unstructured text. We were able to integrate SciGraph data with DBpedia resources, which improves identity resolution in the existing resources, and to enrich the SciGraph data with additional relations by annotating SciGraph content with DBpedia links, which increases the discoverability of the data. One of the challenges we faced was the huge amount of data, and, actually, we have produced even more of it for Linked Data users. You can follow our work, use the data and give us feedback via this repository (https://github.com/dbpedia/sci-graph-links).
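
To make the structured-content side concrete, here is a minimal, purely illustrative sketch of label-based link discovery. The actual project used dedicated link discovery tooling, and the SciGraph IRI below is made up for demonstration.

```python
import re

# Illustrative only: match entities from two {iri: label} maps by
# normalized label and emit owl:sameAs links in N-Triples syntax.
# The scigraph.example IRI is a made-up placeholder.

def normalize(label: str) -> str:
    """Lowercase and collapse punctuation/whitespace for cheap matching."""
    return re.sub(r"[^a-z0-9]+", " ", label.lower()).strip()

def discover_links(source: dict, target: dict) -> list:
    """Return owl:sameAs triples for every exact normalized-label match."""
    index = {normalize(lbl): iri for iri, lbl in target.items()}
    links = []
    for iri, lbl in source.items():
        match = index.get(normalize(lbl))
        if match is not None:
            links.append(f"<{iri}> <http://www.w3.org/2002/07/owl#sameAs> <{match}> .")
    return links

scigraph = {"http://scigraph.example/org/42": "University of Genoa"}
dbpedia = {"http://dbpedia.org/resource/University_of_Genoa": "University of Genoa"}
print(discover_links(scigraph, dbpedia)[0])
```

Real link discovery frameworks refine this idea with richer similarity measures, blocking strategies and link specifications, but the output is the same kind of linkset file.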

What did you learn from the project?

It has been a fantastic experience which helped me expand my theoretical knowledge with a lot of practical aspects. I worked with Markus Freudenberg from DBpedia and Tony Hammond, Michele Pasin and Evangelos Theodoridis from Springer Nature. Working with technically well-equipped researchers and professionals on the subject has been very influential for my research. Especially, working with a team of academics and professionals in collaboration has taught me two different ways of looking at the project. I learned more about SciGraph data and DBpedia, as well as many ways of dealing with huge amounts of data, the tools used in the DBpedia and Linked Data environment, and the importance of open-source data and code. Besides the project, I had the chance to witness the development phases of DBpedia in the Knowledge Integration and Linked Data Technologies (KILT) group (Leipzig) with a bunch of cool guys and girls who made my stay more enjoyable. I also met a lot of researchers with Semantic Web experience, which has widely extended my point of view.

What are your next plans? How do you want to contribute to DBpedia in the future?

I would like to finish my Ph.D. and extend my knowledge by getting involved in new exciting projects like this one. Publishing what we have done, plus further quality improvements, might be a nice follow-up for the work and the Linked Data community. Besides, I would like to contribute to the development of the Turkish DBpedia chapter, which is unfortunately still missing. In this way, we can promote the usage and development of DBpedia and Linked Data to the Turkish research community and companies as well.

There will also be a report on the collaboration between Springer Nature and DBpedia that will cover the technical details of linking DBpedia and SciGraph datasets. We will keep you informed about news via Twitter and our Website.

We are really happy to have worked with her and are now looking forward to a Turkish DBpedia chapter. If you are a DBpedia enthusiast and want to help start the Turkish DBpedia chapter, just get in touch with Beyza or contact us.

Did her story inspire you? Do you want to become an intern at DBpedia? Check our Website, Twitter, and Social Media and don’t miss any internship updates.

Last but not least, we would like to thank Springer Nature for their cooperation and commitment to the project.

In case you would like to collaborate with us to find a developer who helps integrate DBpedia into your business, get in touch with us via dbpedia@infai.org.

 

Yours,

DBpedia Association

Keep using DBpedia!

Just recently, DBpedia Association member and hosting specialist, OpenLink released the DBpedia Usage report, a periodic report on the DBpedia SPARQL endpoint and associated Linked Data deployment.

The report not only gives some historical insight into DBpedia’s usage, number of visits and hits per day but especially shows statistics collected between October 2016 and December 2017. The report covers more than a year of logs from the DBpedia web service operated by OpenLink Software at http://dbpedia.org/sparql/.  

Before we highlight a few aspects of DBpedia’s usage, we would like to thank OpenLink for the continuous hosting of the DBpedia endpoint and for the creation of this report.

The graph shows the average number of hits/requests per day made to the DBpedia service during each of the releases.
The graph shows the average number of unique visits per day made to the DBpedia service during each of the releases.

As you can see in the following tables, there has been a massive increase in the number of hits, coinciding with the DBpedia 2015–10 release on April 1st, 2016.

This boost can be attributed to intensive promotion of DBpedia via community meetings, communication with various partners in the Linked Data community, and social media presence, in order to increase backlinks.

Since then, not only have the numbers of hits increased, but DBpedia has also provided better data quality. We are constantly working on improving accessibility, data quality and the stability of the SPARQL endpoint. Kudos to OpenLink for maintaining the technical baseline for DBpedia.

The table shows the usage overview of last year.

The full report is available here.

 

Subscribe to the DBpedia Newsletter, check our DBpedia Website and follow us on Twitter, Facebook, and LinkedIn for the latest news.

Thanks for reading and keep using DBpedia!

Yours, DBpedia Association

 

To the DBpedia Community!

Can you believe ..?

… that it has already been eleven years since the first DBpedia dataset was released? Eleven years of development, improvements and growth, and now our latest DBpedia release comprises 13 billion pieces of information. We want to take this opportunity to send a big thank you to all contributors, developers, coders, hosters, funders, believers and DBpedia enthusiasts who made that possible. Thank you for your support.

But apart from our datasets, there is much more that DBpedia has been doing, especially during the past year. Think of the success story of Wouter Maroy, a GSoC 2016 student who got the opportunity to do a six-week internship at our DBpedia office in Leipzig and who is still contributing to DBpedia’s progress.

All in all, 2017 was highly successful and full of exciting events. Remember our 10th DBpedia Community Meeting in Amsterdam, featuring an inspiring keynote by Dr. Chris Welty, one of the developers of IBM’s Watson computer. Our DBpedia meetings are always a great way to bring the community closer together and to meet not only our DBpedia audience but also new faces. Therefore, we have already started to plan our community meetings for 2018.

We hope to see you in Poznan, Poland, in spring and to meet you during the SEMANTiCS Conference in Vienna, from 10th – 13th of September 2018. Additionally, if everything goes according to plan, we will be mentoring young DBpedia enthusiasts throughout summer in GSoC 2018 and meet the US DBpedia community in autumn this year. Follow us on Twitter or check our Website for the latest News.

And last but not least, this year we are planning something special. DBpedia intends to participate in Coding da Vinci, Germany’s first open cultural hackathon, which happens to take place in Leipzig, right around the corner. Aspiring data enthusiasts will develop new creative applications from cultural open data. The kick-off is in early April, followed by 9 weeks of cooperative coding. We are eagerly awaiting the start of this event.

We do hope we will meet you and some new faces during our events this year. The DBpedia Association wants to get to know you, because DBpedia is a community effort and would not continue to develop, improve and grow without you. Thank you and see you soon…

Subscribe to the DBpedia Newsletter, check our DBpedia Website and follow us on Twitter, Facebook and LinkedIn for latest news.

Your DBpedia Association

Meeting with the US-DBpedians – A Wrap-Up

One lightning event after the other: just four weeks after our Amsterdam Community Meeting, we crossed the Atlantic for the third time to meet with over 110 US-based DBpedia enthusiasts. This time, the DBpedia community met in Cupertino, California, hosted by Apple Inc.

Main Event

First and foremost, we would like to thank Apple for the warm welcome and the hosting of the event.

After a Meet & Greet with refreshments, Taylor Rhyne, Eng. Product Manager at Apple, and Pablo N. Mendes, Researcher at Apple and chair of the DBpedia Community Committee, opened the main event with a short introduction setting the tone for the following 2 hours.

The main event attracted attendees with eleven invited talks from major Bay Area companies actively using DBpedia or interested in knowledge graphs in general, such as Diffbot, IBM, Wikimedia, NTENT, Nuance, Volley and Stardog Union.

Tommaso Soru – University of Leipzig

Tommaso Soru (University of Leipzig), DBpedia mentor in our Google Summer of Code (GSoC) projects, opened the invited talks session with updates from the DBpedia developer community. This year, DBpedia participated in the GSoC 2017 program with 7 different projects, including “First Chatbot for DBpedia”, which was selected as Best DBpedia GSoC Project 2017. His presentation is available here.

DBpedia would like to thank the following people for organizing and hosting our Community Meeting in Cupertino, California.

Further Acknowledgments

 

  • Apple Inc.: for sponsoring catering and hosting our meetup on their campus
  • Google Summer of Code 2017: an amazing program and the reason some of our core DBpedia devs were visiting California
  • ALIGNED – Software and Data Engineering: for funding the development of DBpedia as a project use case and covering part of the travel costs
  • Institute for Applied Informatics: for supporting the DBpedia Association
  • OpenLink Software: for continuous hosting of the main DBpedia endpoint

Invited Talks- A Short Recap

Filipe Mesquita (Diffbot) introduced the new DBpedia NLP Department, born from a recent partnership between our organization and the California-based company, which aims at creating the most accurate and comprehensive database of human knowledge. His presentation is available here. Dan Gruhl (IBM Research) held a presentation about the in-house development of an omnilingual ontology and how DBpedia data supported this endeavor.

Filipe Mesquita – Diffbot

Stas Malyshev, representing Dario Taraborelli (both Wikimedia Foundation), presented the current state of the structured data initiatives at Wikidata and its query capabilities. Their slides are available here and here. Ricardo Baeza-Yates (NTENT) gave a short talk on mobile semantic search.

The second part of the event saw Peter F. Patel-Schneider (Nuance) holding a presentation titled “DBpedia from the Fringe”, giving some insights into how DBpedia could be further improved. Shortly after, Sebastian Hellmann, Executive Director of the DBpedia Association, took the stage and presented the state of the association, including achievements and future goals. Sanjay Krishnan (U.C. Berkeley) talked about the link between AlphaGo and data cleansing. You can find his slides here. Bill Andersen (Volley.com) argued for the use of extremely precise and fine-grained approaches to deal with small data. His presentation is available here. Finally, Michael Grove (Stardog Union) stressed the view of knowledge graphs as knowledge toolkits backed by a graph data model.

Michael Grove – Stardog Union

The event concluded with refreshments, snacks and drinks served in the atrium allowing to talk about the presented topics, discuss the latest developments in the field of knowledge graphs and network between all participants. In the end, this closing session was way longer than had been planned.

GSoC Mentor Summit

Shortly after the CA Community Meeting, our DBpedia mentors Tommaso Soru and Magnus Knuth participated in the Google Summer of Code Mentor Summit held in Sunnyvale, California. During free sessions hosted by mentors of diverse open source organizations, Tommaso and Magnus presented selected projects in their lightning talks. Beyond open source, open data topics were targeted in multiple sessions, as this is not only relevant for research; there is also a strong need for it in software projects. The meetings paved the way for new collaborations in various areas, e.g. the field of question answering over the DBpedia knowledge corpus, in particular the use of Neural SPARQL Machines for the translation of natural language into structured queries. We expect this hot deep-learning topic to be featured in the next edition of GSoC projects. Overall, it has been a great experience to meet so many open source fellows from all over the world.

Upcoming events

After the event is before another ….

Connected Data London, November 16th, 2017.

Sebastian Hellmann, Executive Director of the DBpedia Association, will present “Data Quality and Data Usage in a large-scale Multilingual Knowledge Graph” during the content track at Connected Data London. He will also join the panelists in the late-afternoon panel discussion “Linked Open Data: Is it failing or just getting out of the blocks?” Feel free to join the event and support DBpedia.

A message for all DBpedia enthusiasts – our next Community Meeting

We are currently planning our next Community Meeting and would like to invite DBpedia enthusiasts and chapters who would like to host a meeting to send us their ideas at dbpedia@infai.org. The meeting is scheduled for the beginning of 2018. Any suggestions regarding place, time, program and topics are welcome!

Check our website for further updates, follow us on Twitter or subscribe to our newsletter.

We will keep you posted

Your DBpedia Association

GSoC 2017 – Recap and Results

We are very pleased to announce that all of this year’s Google Summer of Code students made it successfully through the program and passed their projects. All code has been submitted and merged, and is ready to be examined by the rest of the world.

Marco Fossati, Dimitris Kontokostas, Tommaso Soru, Domenico Potena, Emanuele Storti, Anastasia Dimou, Wouter Maroy, Peng Xu, Sandro Coelho and Ricardo Usbeck, members of the DBpedia community, did a great job mentoring 7 students from around the world. All of the students enjoyed the experience gained during the program and will hopefully continue to contribute to DBpedia in the future.

“GSoC is the perfect opportunity to learn from experts, get to know new communities, design principles and workflows.” (Ram G Athreya)

Now we would like to take the opportunity to give you a little recap of the projects mentored by DBpedia members during the past months. Just click below for more details.

 

DBpedia Mappings Front-End Administration by Ismael Rodriguez

The goal of the project was to create a front-end application that provides a user-friendly interface so the DBpedia community can easily view, create and administrate DBpedia mapping rules using RML. The developed system includes user administration features, help posts, GitHub mappings synchronization, and rich RML-related features such as syntax highlighting, RML code generation from templates, RML validation, extraction and statistics. Part of these features are possible thanks to interaction with the DBpedia Extraction Framework. In the end, all the required functionalities and goals have been developed, with many functional tests and the approval of the DBpedia community. The system is ready for production deployment. For further information, please visit the project blog. Mentors: Anastasia Dimou and Wouter Maroy (Ghent University), Dimitris Kontokostas (GeoPhy HQ).

Chatbot for DBpedia by Ram G Athreya

DBpedia Chatbot is a conversational chatbot for DBpedia which is accessible through the following platforms: a Web Interface, Slack and Facebook Messenger.

The bot is capable of responding to users in the form of simple short text messages or through more elaborate interactive messages. Users can communicate or respond to the bot through text and also through interactions (such as clicking on buttons or links). The bot tries to answer text-based questions of the following types: natural language questions, location information, service checks, language chapters, templates and banter. For more information, please follow the link to the project site. Mentor: Ricardo Usbeck (AKSW).

Knowledge Base Embeddings for DBpedia by Nausheen Fatma

Knowledge base embeddings have been an active area of research. In recent years a lot of work, such as TransE, TransR, RESCAL and SSP, has been done on knowledge base embeddings. However, none of these approaches has used DBpedia to validate their approach. In this project, I wanted to achieve the following tasks: i) run the existing techniques for KB embeddings on standard datasets; ii) create an equivalent standard dataset from DBpedia for evaluation; iii) evaluate across domains; iv) compare and analyse the performance and consistency of the various approaches on the DBpedia dataset as well as other standard datasets; v) report any challenges that come up when implementing the approaches for DBpedia. For more information, please follow the links to her project blog and GitHub repository. Mentors: Tommaso Soru (AKSW) and Sandro Coelho (KILT).
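To give a feel for the translation-based family of models mentioned above, here is a minimal, self-contained sketch of TransE-style scoring in plain Python. The entities, the relation and all vector values are invented for illustration; the actual project evaluated full implementations of TransE, TransR, RESCAL and others.

```python
# Minimal TransE-style scoring sketch (illustrative only).
# TransE models a true triple (h, r, t) as a translation: h + r ≈ t,
# so a lower L2 distance between h + r and t means a more plausible triple.

def transe_score(h, r, t):
    """L2 distance between h + r and t; lower is more plausible."""
    return sum((hi + ri - ti) ** 2 for hi, ri, ti in zip(h, r, t)) ** 0.5

# Toy 3-dimensional embeddings (hypothetical values, hand-picked so that
# Berlin + capitalOf lands exactly on Germany).
emb = {
    "dbr:Berlin":    [0.9, 0.1, 0.0],
    "dbr:Germany":   [1.0, 0.0, 0.5],
    "dbr:France":    [0.0, 1.0, 0.5],
    "dbo:capitalOf": [0.1, -0.1, 0.5],
}

plausible   = transe_score(emb["dbr:Berlin"], emb["dbo:capitalOf"], emb["dbr:Germany"])
implausible = transe_score(emb["dbr:Berlin"], emb["dbo:capitalOf"], emb["dbr:France"])
assert plausible < implausible  # the true triple scores the smaller distance
```

In a real system the embeddings are of course learned by minimizing this distance for observed triples while pushing it up for corrupted ones, rather than being hand-picked.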

Knowledge Base Embeddings for DBpedia by Akshay Jagatap

The project defined embeddings to represent classes, instances and properties by implementing Random Vector Accumulators with additional features in order to better encode the semantic information held by the Wikipedia corpus and DBpedia graphs. To test the quality of embeddings generated by the RVA, lexical memory vectors of locations were generated and tested on a modified subset of the Google Analogies Test Set. Check out further information via Akshay’s GitHub-repo. Mentors: Tommaso Soru (AKSW) and Xu Peng (University of Alberta).

The Table Extractor by Luca Vergili

Wikipedia is full of data hidden in tables. The aim of this project was to explore the possibilities of exploiting the data represented in tables on Wiki pages, in order to populate the different chapters of DBpedia with new data of interest. The Table Extractor is the engine of this data “revolution”: it serves the final purpose of extracting the semi-structured data from all those tables now scattered across most Wiki pages. On this page you can find the datasets (English and Italian) extracted with the Table Extractor. You can also read the log files to see all the operations performed while creating the RDF triples. We also recommend this page, which contains the idea behind the project and an example of results extracted from the log files and the .ttl dataset. For more details, see Luca’s GitHub repository. Mentors: Domenico Potena and Emanuele Storti (Università Politecnica delle Marche).
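The core idea can be sketched in a few lines: pair each header cell of a parsed table row with its data cell and emit one triple per pairing. The subject, property names and row layout below are hypothetical; the real extractor handles many table shapes and writes a full .ttl dataset.

```python
# Toy sketch of the Table Extractor idea: turn one parsed Wikipedia table
# row into RDF triples. All names and values here are invented.

DBR = "http://dbpedia.org/resource/"
DBO = "http://dbpedia.org/ontology/"

def row_to_triples(subject, header, row):
    """Pair each header cell with the matching data cell in the row."""
    s = DBR + subject.replace(" ", "_")
    return [(s, DBO + col.lower().replace(" ", "_"), cell)
            for col, cell in zip(header, row) if cell]

header = ["Year", "Team", "Goals"]
row = ["2015", "Juventus", "12"]
triples = row_to_triples("Example Player", header, row)

for s, p, o in triples:
    print(f'<{s}> <{p}> "{o}" .')
```

Real table extraction is far messier (merged cells, typed literals, deciding which column identifies the subject), but the header-to-property mapping is the heart of it.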

 

Unsupervised Learning of DBpedia Taxonomy by Shashank Motepalli

Wikipedia represents a comprehensive cross-domain source of knowledge with millions of contributors. The DBpedia project tries to extract structured information from Wikipedia and transform it into RDF.

The main classification system of DBpedia depends on human curation, which causes it to lack coverage, resulting in a large amount of untyped resources. DBTax provides an unsupervised approach that automatically learns a taxonomy from the Wikipedia category system and extensively assigns types to DBpedia entities, through the combination of several NLP and interdisciplinary techniques. It provides a robust backbone for DBpedia knowledge and has the benefit of being easy to understand for end users. Details about his work and his code can be found on the project’s site. Mentors: Marco Fossati (Università degli Studi di Trento) and Dimitris Kontokostas (GeoPhy HQ).

The Wikipedia List-Extractor by Krishanu Konar

This project aimed to build upon the already existing List-Extractor project by Federica from GSoC 2016. The project focused on the extraction of relevant but hidden data which lies inside lists on Wikipedia pages. Wikipedia, being the world’s largest encyclopedia, holds a huge amount of information in the form of text. While key facts and figures are encapsulated in a resource’s infobox, and some detailed statistics appear in tables, there is also a lot of data in lists, which are quite unstructured and therefore difficult to turn into semantic relationships. The main objective of the project was to create a tool that can extract information from Wikipedia lists and form appropriate RDF triples that can be inserted into the DBpedia dataset. For details on the code and the project, check Krishanu’s blog and GitHub repository. Mentors: Marco Fossati (Università degli Studi di Trento), Domenico Potena and Emanuele Storti (Università Politecnica delle Marche).
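A tiny sketch of the list-extraction step: pull the linked titles out of a wikitext bullet list and attach them to the page's subject via a property. The wikitext snippet, the subject and the `dbo:notableWork` property choice are invented for the example; the real List-Extractor infers an appropriate property from the list's section heading.

```python
import re

# Rough sketch of the List-Extractor idea: wiki list items start with '*'
# and internal links look like [[Title]] or [[Title|label]]. Everything
# below (the snippet, the subject, the property) is hypothetical.

wikitext = """
* [[The Hobbit]] (1937)
* [[The Lord of the Rings]] (1954)
"""

def extract_list_items(text):
    # Capture the link target of the first wiki link on each bullet line,
    # stopping at ']' or '|' so piped labels are ignored.
    return re.findall(r"^\*\s*\[\[([^\]|]+)", text, flags=re.MULTILINE)

triples = [("dbr:J._R._R._Tolkien", "dbo:notableWork", "dbr:" + t.replace(" ", "_"))
           for t in extract_list_items(wikitext)]
```

The hard part the project tackled is exactly what this sketch skips: deciding, from context, which property a given list actually expresses.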

Read more

We are regularly growing our community through GSoC and can deliver more and more opportunities to you. Ideas and applications for the next edition of GSoC are very much welcome. Just contact us via email or check our website for details.

Again, DBpedia is planning to be a vital part of the GSoC Mentor Summit, from October 13th to 15th, at the Google Campus in Sunnyvale, California. This summit is a way to say thank you to the mentors for the great job they did during the program. Moreover, it is a platform to discuss what can be done to improve GSoC and how to keep students involved in their communities post-GSoC.

And there is more good news to tell. DBpedia wants to meet up with the US community during the 11th DBpedia Community Meeting in California. We are currently working on the program and will keep you posted as soon as registration opens.

So, stay tuned and check Twitter, Facebook and the website, or subscribe to our newsletter for the latest news and updates.

See you soon!

Yours,

DBpedia Association

Career Opportunities at DBpedia – A Success Story

Google Summer of Code is a global program focused on introducing students to open source software development.

During the three-month summer break from university, students work on a programming project with an open source organization, like DBpedia.

We have been part of this exciting program for more than five years now. Many exciting projects have developed as a result of intense coding during hot summers. By presenting Wouter Maroy, who was a GSoC student in 2016 and is a mentor in this year’s program, we would like to give you a glimpse behind the scenes and show you how important the program is to DBpedia.


Success Story: Wouter Maroy

Who are you?

I’m Wouter Maroy, a 23-year-old Master’s student in Computer Science Engineering at Ghent University (Belgium). I’m affiliated with IDLab – imec. Linked Data and Big Data technologies are my two favorite fields of interest. Besides my passion for Computer Science, I like to travel, explore and look for adventures. I’m a student who enjoys his student life in Ghent.

What is your main interest in DBpedia and what was your motivation to apply for a DBpedia project at GSoC 2016?

I took courses during my Bachelor’s with lectures about Linked Data and the Semantic Web, which of course included DBpedia; it’s an interesting research field. Before my GSoC 2016 application I did some work on Semantic Web technologies and on a technology (RML) that was required for a GSoC 2016 project listed by DBpedia. I wanted to get involved in Open Source and DBpedia, so I applied.

What did you do?

Up until now, DBpedia has used a custom mapping language to generate structured data from raw Wikipedia infobox data. A next step was to move this process to a more modular and generic approach that leads to higher-quality Linked Data generation. This new approach relied on the integration of RML, the RDF Mapping Language, and was the goal of the GSoC 2016 project I applied for. Understanding all the necessary details of the GSoC project required some effort and research before I started coding. I also had to learn a new programming language (Scala). I had good assistance from my mentors, so this turned out very well in the end. DBpedia’s Extraction Framework, which is used for extracting structured data from Wikipedia, has quite a large codebase. It was the first project of this size I was involved in. I learned a lot from reading its codebase and from contributing code during these months.
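RML mappings themselves are RDF documents processed by an RML engine, but the declarative idea behind the switch can be illustrated language-agnostically: the mapping rule is plain data, and a single generic engine applies any rule to any raw record. The field names, predicates and record below are invented for the example and are not actual RML.

```python
# Toy illustration of declarative (RML-style) mapping: instead of custom
# extraction code per infobox, the rule is data and one generic engine
# interprets it. Everything here is hypothetical, not real RML syntax.

mapping_rule = {
    # Template for the subject IRI, filled from the raw record.
    "subject": "http://dbpedia.org/resource/{name}",
    # (predicate, object-template) pairs, also filled from the record.
    "predicate_object": [
        ("http://dbpedia.org/ontology/birthPlace", "{birth_place}"),
        ("http://dbpedia.org/ontology/birthYear", "{birth_year}"),
    ],
}

def apply_mapping(rule, record):
    """Generic engine: instantiate the rule's templates with one record."""
    s = rule["subject"].format(**record)
    return [(s, p, o.format(**record)) for p, o in rule["predicate_object"]]

infobox = {"name": "Ada_Lovelace", "birth_place": "London", "birth_year": "1815"}
triples = apply_mapping(mapping_rule, infobox)
```

The payoff of this style is exactly what the project was after: changing what gets extracted means editing a mapping document, not the extraction code.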

Dimitris Kontokostas and Anastasia Dimou were my two mentors. They guided me well throughout the project. I interacted daily with them through Slack, and each week we had a conference call to discuss the project. After many months of research, coding and discussing, we managed to deliver a working prototype at the end of the project. The work we did was presented in Leipzig on the DBpedia Day during SEMANTiCS 2016. Additionally, this work will also be presented at ISWC 2017.

You can check out his project here.

How do you currently contribute to improving DBpedia?

I’m mentoring a GSoC 2017 project together with Dimitris Kontokostas and Anastasia Dimou as a follow-up to the work that was done in our GSoC 2016 project last year. Ismael Rodriguez is the new student participating in the project, and he has already delivered great work! Besides being a mentor for GSoC 2017, I make sure that the integration of RML into DBpedia is going in the right direction in general (managing, coding, …). For this reason, I worked at the KILT/DBpedia office in Leipzig for six weeks during the summer. Joining and getting to know the team was a great experience.

What did you gain from the project?

Throughout the project I practiced coding, working in a team, … I learned more about DBpedia, RML, Linked Data and other related technologies. I’m glad I had the opportunity to learn this much from the project. I would recommend it to all students who are curious about DBpedia, who are eager to learn and who want to earn a stipend during summer through coding. You’ll learn a lot and you’ll have a good time!

Final words to future GSoC applicants for DBpedia projects.

Give it a shot! Really, it’s a lot of fun! Coding for DBpedia through GSoC is a great, unique experience, and anyone who is enthusiastic about coding and the DBpedia project should definitely go for it.

Check our website for further updates, follow us on Twitter or subscribe to our newsletter.

Yours

DBpedia Association

 

Fáilte, Éirinn go Brách

Thanks to LDK2017 for co-hosting the DBpedia Community Meeting

After our 2nd Community Meeting in the US, we delighted the Irish DBpedia community with the 9th DBpedia Community Meeting, which was co-located with the Language, Data and Knowledge Conference 2017 in Galway at the premises of NUI Galway.

First and foremost, we would like to thank John McCrae (Insight Centre for Data Analytics, NUI Galway) and the LDK Conference for co-hosting and supporting the event.

 

The focus of this Community Meeting was the Irish DBpedia and Linked Data community. We therefore invited local data scientists as well as European DBpedia enthusiasts to discuss the state of Irish Linked Data.

The meeting started with two compelling keynotes by Brian Ó Raghallaigh, Dublin City University and Logainm.ie, and Sharon Flynn, NUI Galway and Wikimedia Ireland. Brian presented Logainm.ie, a data use case about placenames in Ireland with a special focus on linked Logainm and machine-readable data.

Brian Ó Raghallaigh

His insightful presentation was followed by Sharon Flynn talking about Wikimedia in Ireland and the challenges of “this monumental undertaking” with particular reference to the Wikimedia Community in Ireland.

Sharon Flynn

For more details on the content of the presentations, follow the links to the slides.

Brian’s and Sharon’s slides

 

 

Showcase Session

Eoin McCuirc

Eoin McCuirc opened the DBpedia Showcase Session with “MY sweet LOD”, an insightful presentation on Linked Open Data in Ireland from the perspective of a statistics office.

Shortly after, Ronald Stamper, Chairman of Measur Ltd., elaborated on semantic normal form, ontologies and the perils of paradigm change.

Ben de Meester

Ben De Meester, from Ghent University, presented a DBpedia Showcase on Declarative Data Transformation for Linked Data Generation.

He was followed by Alan Meehan, who presented the SUMMR Interlink Validation tool, which validates interlinks from a source dataset to multiple targets.

Fred Durao

Closing the Showcase Session, Frederico Araujo Durao, Insight Centre for Data Analytics – University College Cork (UCC), presented a demo of his Linked Data browser.

 

For further details of the presentations follow the links to the slides.

Parallel sessions

As a regular part of the DBpedia Community Meeting, we held two parallel sessions in the afternoon where DBpedia newbies could learn what DBpedia is and how to use the DBpedia datasets.

Markus Freudenberg giving a DBpedia Tutorial

 

Participants who wanted to learn DBpedia basics joined the DBpedia Tutorial Session by Markus Freudenberg (DBpedia Release Manager). The DBpedia Association Hour provided a platform for the community to discuss and give feedback.

 

 

Sebastian Hellmann and Julia Holze @ the DBpedia Association Hour

Additionally, Sebastian Hellmann and Julia Holze, members of the DBpedia Association, updated the participants on the growing number of DBpedia Association members, the formalized DBpedia language chapters and the established DBpedia Community Committee, and informed them about technical developments such as the DBpedia API.

 

Ontology Engineering and Software Alignment in the ALIGNED Project

The afternoon session started with the DBpedia 2016-10 release update by Markus Freudenberg (DBpedia Release Manager). Following this, Kevin Chekov Feeney, (Trinity College Dublin) presented the software alignment in the ALIGNED project. He talked about “Generating correct-by-construction semantic datasets from unstructured, semi-structured and badly structured data sources”.

Kevin Feeney – ALIGNED

 

 

At this point, we would also like to thank the ALIGNED project for the development of DBpedia as a project use case and for covering part of the travel costs.

 

 

Session about Irish Linked Data Projects

Chaired by Rob Brennan and Bianca Pereira, the speakers in the last session presented new Irish Linked Data projects, for example GeoHive, BIOOPENER and the TCD Open Linked Data Engagement Fund Project. The following panel session gave DBpedia and Linked Data enthusiasts a platform for exchange and discussion. The outcome of this session was a roadmap for Irish Linked Data, created together with all participants.

Below, you can find a list of all presentations of this session:

Closing this session, John McCrae announced that the next edition of the Language, Data and Knowledge (LDK) Conference is scheduled for 2019 in Germany. We at the DBpedia Association are now looking forward to welcoming the LDK community to Leipzig!

Social Evening Event

The Community Meeting slowly came to an end with our social evening event, which was held at the PorterShed in Galway. The evening session revolved around the question of how to exploit data commercially and featured two short impulse talks. Paul Buitelaar started the session by presenting “Kibi”, an open-source platform for Data Intelligence based on the search engine Elasticsearch. Finally, Sebastian Hellmann talked about “Improving the Utility of DBpedia by co-designing a public and commercial DBpedia API” (slides).

Summing up, the 9th DBpedia Community Meeting brought together more than 45 DBpedia enthusiasts from Ireland and Europe who engaged in vital discussions about Linked Data, DBpedia use cases and services.

You can find feedback about the event on Twitter via #DBpediaGalway17.

We would like to thank Bianca Pereira and Caoilfhionn Lane from the Insight Centre for Data Analytics, NUI Galway, as well as Rob Brennan from the ADAPT Research Centre, Trinity College Dublin, for devoting their time to curating the program and organizing the meeting.

Special thanks go to LDK 2017 for hosting the meeting.

Thanks Ireland and hello Amsterdam!

We are looking forward to the next DBpedia Community Meeting, which will be held in Amsterdam, Netherlands. Co-located with SEMANTiCS17, the community will get together on the 14th of September for the DBpedia Day.

 

Check our website for further updates, follow us on Twitter or subscribe to our newsletter.

Your DBpedia Association