We are very pleased to announce that all of this year’s Google Summer of Code students made it successfully through the program and passed their projects. All code has been submitted and merged, and is ready to be examined by the rest of the world.
Marco Fossati, Dimitris Kontokostas, Tommaso Soru, Domenico Potena, Emanuele Storti, Anastasia Dimou, Wouter Maroy, Peng Xu, Sandro Coelho and Ricardo Usbeck, members of the DBpedia Community, did a great job in mentoring 7 students from around the world. All of the students enjoyed the experience they gained during the program and will hopefully continue to contribute to DBpedia in the future.
“GSoC is the perfect opportunity to learn from experts, get to know new communities, design principles and workflows.” (Ram G Athreya)
Now, we would like to take this opportunity to give you a short recap of the projects mentored by DBpedia members during the past months. Just click below for more details.
The goal of the project was to create a front-end application that provides a user-friendly interface so the DBpedia community can easily view, create and administrate DBpedia mapping rules using RML. The developed system includes user administration features, help posts, GitHub mappings synchronization, and rich RML-related features such as syntax highlighting, RML code generation from templates, RML validation, extraction and statistics. Some of these features are possible thanks to the interaction with the DBpedia Extraction Framework. In the end, all the required functionalities and goals have been implemented, backed by many functional tests and the approval of the DBpedia community. The system is ready for production deployment. For further information, please visit the project blog. Mentors: Anastasia Dimou and Wouter Maroy (Ghent University), Dimitris Kontokostas (GeoPhy HQ).
Chatbot for DBpedia by Ram G Athreya
DBpedia Chatbot is a conversational chatbot for DBpedia which is accessible through the following platforms: a Web Interface, Slack and Facebook Messenger.
The bot is capable of responding to users in the form of simple short text messages or through more elaborate interactive messages. Users can communicate with or respond to the bot through text and also through interactions (such as clicking on buttons/links). The bot tries to answer text-based questions of the following types: natural language questions, location information, service checks, language chapters, templates and banter. For more information, please follow the link to the project site. Mentor: Ricardo Usbeck (AKSW).
Knowledge Base Embeddings for DBpedia by Nausheen Fatma
Knowledge base embeddings have been an active area of research. In recent years, a lot of research work such as TransE, TransR, RESCAL, SSP, etc. has been done to obtain knowledge base embeddings. However, none of these approaches have used DBpedia to validate their approach. The project set out to achieve the following tasks: i) Run the existing techniques for KB embeddings on standard datasets. ii) Create an equivalent standard dataset from DBpedia for evaluations. iii) Evaluate across domains. iv) Compare and analyse the performance and consistency of the various approaches on the DBpedia dataset along with other standard datasets. v) Report any challenges that come up when implementing the approaches for DBpedia. For more information, please follow the links to her project blog and GitHub repository. Mentors: Tommaso Soru (AKSW) and Sandro Coelho (KILT).
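For readers new to the topic, here is a minimal sketch of the idea behind TransE, one of the approaches named above: a relation is modelled as a translation in vector space, so for a true triple the head vector plus the relation vector should land close to the tail vector. The vectors and entity names below are toy values made up for illustration. They are not taken from the project or from any trained model.

```python
import math


def transe_score(head, relation, tail):
    """Return the TransE distance ||h + r - t||_2 (lower means more plausible)."""
    return math.sqrt(sum((h + r - t) ** 2 for h, r, t in zip(head, relation, tail)))


# Toy 2-dimensional embeddings, invented purely for this example.
berlin = [1.0, 0.0]
capital_of = [0.0, 1.0]
germany = [1.0, 1.0]
france = [3.0, 1.0]

plausible = transe_score(berlin, capital_of, germany)
implausible = transe_score(berlin, capital_of, france)

print(plausible, implausible)
```

With these toy values, the triple that fits the translation gets a smaller distance than the one that does not. Real systems learn the vectors from the knowledge base rather than setting them by hand.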
Knowledge Base Embeddings for DBpedia by Akshay Jagatap
The project defined embeddings to represent classes, instances and properties by implementing Random Vector Accumulators with additional features in order to better encode the semantic information held by the Wikipedia corpus and DBpedia graphs. To test the quality of embeddings generated by the RVA, lexical memory vectors of locations were generated and tested on a modified subset of the Google Analogies Test Set. Check out further information via Akshay’s GitHub-repo. Mentors: Tommaso Soru (AKSW) and Xu Peng (University of Alberta).
The Table Extractor by Luca Vergili
Wikipedia is full of data hidden in tables. The aim of this project was to explore ways of exploiting the data presented as tables in Wiki pages, in order to populate the different chapters of DBpedia with new data of interest. The Table Extractor is meant to be the engine of this data “revolution”: its final purpose is to extract the semi-structured data from all the tables now scattered across most Wiki pages. On this page you can find the datasets (English and Italian) extracted using the Table Extractor. You can also read the log file to see all the operations performed to create the RDF triples. This page is also recommended: it explains the idea behind the project and shows an example of results extracted from the log files and the .ttl dataset. For more details, see Luca’s GitHub repository. Mentors: Domenico Potena and Emanuele Storti (Università Politecnica delle Marche).
Unsupervised Learning of DBpedia Taxonomy by Shashank Motepalli
Wikipedia represents a comprehensive cross-domain source of knowledge with millions of contributors. The DBpedia project tries to extract structured information from Wikipedia and transform it into RDF.
The main classification system of DBpedia depends on human curation, which causes it to lack coverage, resulting in a large number of untyped resources. DBTax provides an unsupervised approach that automatically learns a taxonomy from the Wikipedia category system and extensively assigns types to DBpedia entities, through the combination of several NLP and interdisciplinary techniques. It provides a robust backbone for DBpedia knowledge and has the benefit of being easy to understand for end users. Details about his work and his code can be found on the project’s site. Mentors: Marco Fossati (Università degli Studi di Trento) and Dimitris Kontokostas (GeoPhy HQ).
The Wikipedia List-Extractor by Krishanu Konar
This project aimed to build upon the existing list-extractor project by Federica from GSoC 2016. The project focused on the extraction of relevant but hidden data which lies inside lists in Wikipedia pages. Wikipedia, being the world’s largest encyclopedia, holds a huge amount of information in the form of text. While key facts and figures are encapsulated in the resource’s infobox, and some detailed statistics are present in the form of tables, there is also a lot of data in the form of lists, which are quite unstructured and hence difficult to turn into semantic relationships. The main objective of the project was to create a tool that can extract information from Wikipedia lists and form appropriate RDF triples that can be inserted into the DBpedia dataset. For details on the code and the project, check Krishanu’s blog and GitHub repository. Mentors: Marco Fossati (Università degli Studi di Trento), Domenico Potena and Emanuele Storti (Università Politecnica delle Marche).
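To give a rough feel for what "forming RDF triples from a list" means, here is a small sketch that is not the project's actual code. It assumes the list items have already been resolved to resource URIs and that a suitable property has already been chosen, which are the hard parts the real tool has to solve. The function name, the example resources and the choice of property are hypothetical and used only for illustration.

```python
def list_to_ntriples(page_uri, property_uri, item_uris):
    """Serialize one N-Triples line per list item: <item> <property> <page> ."""
    return [f"<{item}> <{property_uri}> <{page_uri}> ." for item in item_uris]


# Hypothetical example: a bibliography list found on a writer's page.
writer = "http://dbpedia.org/resource/Example_Writer"
author_property = "http://dbpedia.org/ontology/author"
works = [
    "http://dbpedia.org/resource/Example_Novel",
    "http://dbpedia.org/resource/Example_Play",
]

triples = list_to_ntriples(writer, author_property, works)
for line in triples:
    print(line)
```

Each bullet of the list becomes one statement linking the listed work to the page it was found on.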
We are regularly growing our community through GSoC and can deliver more and more opportunities to you. Ideas and applications for the next edition of GSoC are very much welcome. Just contact us via email or check our website for details.
Again, DBpedia is planning to be a vital part of the GSoC Mentor Summit, from October 13th to 15th, at the Google Campus in Sunnyvale, California. This summit is a way to say thank you to the mentors for the great job they did during the program. Moreover, it is a platform to discuss what can be done to improve GSoC and how to keep students involved in their communities post-GSoC.
And there is more good news to tell. DBpedia wants to meet up with the US community during the 11th DBpedia Community Meeting in California. We are currently working on the program and will let you know as soon as registration opens.
See you soon!