Tag Archives: Google Summer of Code

Better late than never – GSoC 2019 recap & outlook for GSoC 2020

  • Pinky: Gee, Brain, what are we gonna do this year?
  • Brain: The same thing we do every year, Pinky. Taking over GSoC.

And this is exactly what we did. We were accepted as one of 206 open source organizations to participate in Google Summer of Code (GSoC) again. More than 25 students followed our call for project ideas. In the end, we chose six amazing students and their project proposals to work with during summer 2019.
In the following post, we will give you some insights into the project ideas and how they turned out. Additionally, we will shed some light on our amazing team of mentors, who devoted a lot of time and expertise to mentoring our students.

Meet the students and their projects

A Neural QA Model for DBpedia by Anand Panchbhai

With the booming amount of information being continuously added to the internet, organising the facts and serving this information to users becomes a very difficult task. Currently, DBpedia hosts billions of data points and corresponding relations in the RDF format. Accessing data on DBpedia via a SPARQL query is difficult for amateur users who do not know how to write a query. This project tried to make this humongous body of linked data available to a larger user base in their natural languages (currently restricted to English). The primary objective of the project was to translate natural language questions into valid SPARQL queries. Click here if you want to check his final code.
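
To make the translation task concrete, here is a minimal sketch of what going from a question to a SPARQL query looks like. It is not the project's neural model: the regex templates, the function name `question_to_sparql` and the naive entity handling are assumptions made purely for illustration, and a trained model would learn such mappings from data instead.

```python
import re

# Hypothetical question templates: regex pattern -> DBpedia ontology property.
# A neural model would learn such mappings from data; here they are hard-coded.
TEMPLATES = [
    (re.compile(r"^what is the capital of (?P<entity>.+?)\??$", re.IGNORECASE), "dbo:capital"),
    (re.compile(r"^who is the author of (?P<entity>.+?)\??$", re.IGNORECASE), "dbo:author"),
]

PREFIXES = (
    "PREFIX dbo: <http://dbpedia.org/ontology/>\n"
    "PREFIX dbr: <http://dbpedia.org/resource/>\n"
)


def question_to_sparql(question):
    """Return a SPARQL query string for a supported question, else None."""
    for pattern, prop in TEMPLATES:
        match = pattern.match(question.strip())
        if match:
            # Naive entity linking (an assumption): spaces become underscores.
            resource = match.group("entity").strip().replace(" ", "_")
            return f"{PREFIXES}SELECT ?answer WHERE {{ dbr:{resource} {prop} ?answer }}"
    return None


if __name__ == "__main__":
    print(question_to_sparql("What is the capital of Germany?"))
```

Questions outside the hard-coded templates return `None`, which is exactly the gap a learned model is meant to close.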

Multilingual Neural RDF Verbalizer for DBpedia by Dwaraknath Gnaneshwar

Recently, the generation of natural language from RDF data has gained substantial attention and has also been proven to support the creation of Natural Language Generation benchmarks. However, most models are aimed at generating coherent sentences in English, while other languages have enjoyed comparatively less attention from researchers. RDF data is usually in the form of triples, <subject, predicate, object>. The subject denotes the resource, the predicate denotes a trait or aspect of the resource and expresses the relationship between subject and object, and the object is the related resource or value. In this project, we aimed to create a multilingual Neural Verbalizer, i.e., a system generating high-quality natural-language text from sets of RDF triples in multiple languages using one stand-alone, end-to-end trainable model. You can follow up on the progress and outcome of the project here.
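
As an illustration of the triple structure described above, the following sketch represents triples in Python and flattens a set of them into a single token sequence, which is one way a sequence-to-sequence model could consume them. The marker tokens (`<S>`, `<P>`, `<O>`), the target-language tag and the function name `linearize` are assumptions for this example, not the project's actual input format.

```python
from typing import NamedTuple


class Triple(NamedTuple):
    """One RDF statement: <subject, predicate, object>."""
    subject: str
    predicate: str
    obj: str


def linearize(triples, target_lang):
    """Flatten triples into one string, prefixed with a target-language tag.

    The tag and the <S>/<P>/<O> markers are hypothetical conventions.
    """
    parts = [f"<{target_lang}>"]
    for t in triples:
        parts += [
            "<S>", t.subject.replace("_", " "),
            "<P>", t.predicate,
            "<O>", t.obj.replace("_", " "),
        ]
    return " ".join(parts)


if __name__ == "__main__":
    triples = [Triple("Berlin", "country", "Germany")]
    print(linearize(triples, "de"))
```

Putting the language tag in the input is what would let a single model serve several output languages in this sketch.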

Predicate Detection using Word Embeddings for Question Answering over Linked Data by Yajing Bian

Knowledge-based question-answering (KBQA) systems have demonstrated an ability to generate answers to natural language questions from information stored in a large-scale knowledge base. Generally, such a system completes the analysis in three steps: identifying named entities, detecting predicates and generating SPARQL queries. Of these three steps, predicate detection identifies the KB relation(s) a question refers to. To build a predicate detection structure, we first identified all possible named entities, then collected all predicates corresponding to these entities. The next step is to calculate the similarity between the question and the candidate predicates using a multi-granularity neural network model (MGNN). To find the globally optimal entity-predicate assignment, we use a joint model based on the results of the entity linking and predicate detection processes, rather than taking the local predictions (i.e. the most likely entity or predicate) as the final result. More details on the project are available here.
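
The ranking step described above can be sketched as follows. Note that this is a deliberately simple stand-in: it scores candidate predicates by cosine similarity over bag-of-words counts instead of the MGNN used in the project, and all function names are hypothetical.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase word tokens; camelCase names such as 'birthPlace' are split."""
    spaced = re.sub(r"(?<=[a-z])(?=[A-Z])", " ", text)
    return re.findall(r"[a-z]+", spaced.lower())


def cosine(a, b):
    """Cosine similarity between two token-count vectors (Counter objects)."""
    dot = sum(a[t] * b[t] for t in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def rank_predicates(question, candidates):
    """Return (score, predicate) pairs, best match first."""
    q = Counter(tokenize(question))
    scored = [(cosine(q, Counter(tokenize(p))), p) for p in candidates]
    return sorted(scored, reverse=True)


if __name__ == "__main__":
    question = "What is the birth place of Angela Merkel?"
    for score, predicate in rank_predicates(question, ["birthPlace", "deathPlace", "almaMater"]):
        print(f"{predicate}: {score:.2f}")
```

A learned embedding model would also match paraphrases such as "Where was Angela Merkel born?", which plain word overlap cannot.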

A tool to generate RDF triples from DBpedia abstract by Jayakrishna Sahit

The main aim of this project was to research and develop a tool to generate highly trustworthy RDF triples from DBpedia abstracts. To develop such a tool, we implemented algorithms that take the output generated by the syntactic analyzer along with DBpedia Spotlight’s named entity identifiers. Further information and the project’s results can be found here.

A transformer of Attention Mechanism for Long-context QA by Stuart Chan

In this GSoC project, I chose to employ a transformer language model with an attention mechanism to automatically discover query templates for the neural question-answering knowledge-based model. The ultimate goal was to train the attention-based NSpM model on DBpedia and evaluate it against the QALD benchmark. Check here for more details on the project.

Workflow for linking External datasets by Jaydeep Chakraborty

The goal of the project was to create a workflow for entity linking between DBpedia and external data sets. We aimed at an approach for ontology alignment through the use of an unsupervised mixed neural network. We explored reading and parsing the ontology and extracted all necessary information about concepts and instances. Additionally, we generated semantic vectors for each entity from meta information such as entity hierarchy, object properties, data properties, and restrictions, and designed a user-interface-based system that shows all necessary information about the workflow. Further info, download details and project results are available here.

Meet our Mentors

First of all, a big shout out and thank you to all mentors and co-mentors who helped our students to succeed in their endeavours.

  • Aman Mehta, former GSoC student and current junior mentor, recently interned as a software engineer at Facebook, London.
  • Beyza Yaman, a senior mentor and organizational admin, Post-Doctoral Researcher based in ADAPT, Dublin City University, former Springer Nature-DBpedia intern and former research associate at the InfAI/University of Leipzig. She is responsible for the Turkish DBpedia and her fields of interest are information retrieval, data extraction and integration over Linked Data.
  • Tommaso Soru, senior mentor and organizational admin. He is a Machine Learning & AI enthusiast, Data Scientist at Data Lens Ltd in London and a PhD candidate at the University of Leipzig.

“DBpedia is my window to the world of semantic data, not only for its intuitive interface but also because its knowledge is organised in a simple and uncomplicated way”

Tommaso Soru, GSoC 2019
  • Amandeep Srivastava, Junior Mentor and analyst at Goldman Sachs. He’s a huge fan of Christopher Nolan and likes to read fiction books in his free time.
  • Diego Moussalem, senior mentor, Senior Researcher at Paderborn University, an active and vital member of the Portuguese DBpedia Chapter.
  • Luca Virgili, currently a Computer Science PhD student at the Polytechnic University of Marche. He was a GSoC student for a year and a GSoC mentor for 2 years in DBpedia.
  • Bharat Suri, former GSoC student, junior mentor, Master’s degree in Computer Science at The Ohio State University.

“I have thoroughly enjoyed both my years of GSoC with DBpedia and I plan to stay and help out in whichever way I can”

Bharat Suri, GSoC 2019
  • Mariano Rico, senior mentor, Senior Doctor Researcher at Ontology Engineering Group, Universidad Politécnica de Madrid.
  • Nausheen Fatma, senior mentor, Data Scientist, Natural Language Processing, Machine Learning at Info Edge (naukri.com).
  • Ram G Athreya, long-term GSoC mentor, Research Engineer at Viv Labs, Bay Area, San Francisco.
  • Ricardo Usbeck, team leader ‘Conversational AI and Knowledge Graphs’ at Fraunhofer IAIS.
  • Rricha Jalota, former GSoC student, current senior mentor, developer in the Data Science Group at University of Paderborn, Germany.

“The reason why I love collaborating with DBpedia (apart from the fact that, it’s a powerhouse of knowledge-driven applications) is not only it gave me my first big break to the amazing field of NLP but also to the world of open-source!”

Rricha Jalota, GSoC 2019

In addition, we would like to thank the rest of our mentor team, namely Thiago Castro Ferreira, Aashay Singhal and Krishanu Konar, former GSoC student and current senior mentor, for their great work.

Mentor Summit Recap 

This GSoC marked the 15th consecutive year of the program and was the 8th season in a row for DBpedia. As in every year, two of our mentors, Rricha Jalota and Aashay Singhal, joined the annual GSoC mentor summit. Selected mentors get the chance to meet each other and engage in a vital exchange of knowledge and expertise around various topics, both GSoC-related and unrelated. Apart from more entertaining activities such as games, a scavenger hunt and a guided trip through Munich, mentors also discussed pressing questions such as “why is it important to fail your students” or “how can we have our GSoC students stay and contribute for long”.

After GSoC is before the next GSoC

If you are interested in mentoring a DBpedia GSoC project, or if you want to contribute to a project of your own, we are happy to have you on board. There are a few things to get you started.

Likewise, if you are an ambitious student who is interested in open source development and working with DBpedia, you are more than welcome to either contribute your own project idea or apply for the project ideas we will offer starting in early 2020.

Stay tuned, check Twitter or the DBpedia Forum frequently to stay in touch, and don’t miss your chance to become a crucial force in this endeavour as well as a vital member of the DBpedia community.

See you soon,

yours

DBpedia Association

Meet the DBpedia Chatbot

This year’s GSoC is slowly coming to an end, with final evaluations already being submitted. To bridge the waiting time until the final results are published, we would like to draw your attention to a former project and great tool that was developed during last year’s GSoC.

Meet the DBpedia Chatbot. 

DBpedia Chatbot is a conversational Chatbot for DBpedia which is accessible through the following platforms:

  1. A Web Interface
  2. Slack
  3. Facebook Messenger

Main Purpose

The bot is capable of responding to users in the form of simple short text messages or through more elaborate interactive messages. Users can communicate or respond to the bot through text and also through interactions (such as clicking on buttons/links). There are 4 main purposes for the bot. They are:

  1. Answering factual questions
  2. Answering questions related to DBpedia
  3. Exposing the research work being done in DBpedia as product features
  4. Casual conversation/banter
Question Types

The bot tries to answer text-based questions of the following types:

Natural Language Questions
  1. Give me the capital of Germany
  2. Who is Obama?
Location Information
  1. Where is the Eiffel Tower?
  2. Where is France’s capital?
Service Checks

Users can ask the bot to check if vital DBpedia services are operational.

  1. Is DBpedia down?
  2. Is lookup online?
Language Chapters

Users can ask basic information about specific DBpedia local chapters.

  1. DBpedia Arabic
  2. German DBpedia
Templates

These are predominantly questions related to DBpedia for which the bot provides predefined templatized answers. Some examples include:

  1. What is DBpedia?
  2. How can I contribute?
  3. Where can I find the mapping tool?
Banter

Messages which are casual in nature fall under this category. For example:

  1. Hi
  2. What is your name?
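
To illustrate how a message could be routed to one of the question types listed above, here is a small rule-based sketch. The keyword patterns, the category labels and the function name `classify` are assumptions for this example; they are not taken from the chatbot's code base, which may work quite differently.

```python
import re

# Hypothetical routing rules, checked in order; the first match wins.
RULES = [
    ("banter", re.compile(r"^(hi|hello|hey)\b|what is your name", re.I)),
    ("service_check", re.compile(r"\bis .+ (down|online|up)\b", re.I)),
    ("language_chapter", re.compile(r"^(dbpedia \w+|\w+ dbpedia)$", re.I)),
    ("template", re.compile(r"what is dbpedia|how can i contribute|mapping tool", re.I)),
    ("location", re.compile(r"^where is\b", re.I)),
]


def classify(message):
    """Return the question type of a message; default to a natural language question."""
    text = message.strip().rstrip("?!.").strip()
    for label, pattern in RULES:
        if pattern.search(text):
            return label
    return "natural_language"


if __name__ == "__main__":
    for msg in ["Is DBpedia down?", "Where is the Eiffel Tower?", "German DBpedia", "Who is Obama?"]:
        print(f"{msg!r} -> {classify(msg)}")
```

Rule order matters in this sketch: "Where can I find the mapping tool?" is caught by the template rule before the location rule is ever tried.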

If you would like to have a closer look at the internal processes and how the chatbot was developed, check out the DBpedia GitHub pages.

DBpedia Chatbot was published on wiki.dbpedia.org and is one of many other projects and applications featuring DBpedia.

In case you want your DBpedia-based tool or demo to be published on our website, just follow the link and submit your information; we will do the rest.

 

Yours

DBpedia Association

Career Opportunities at DBpedia – A Success Story

Google Summer of Code is a global program focused on introducing students to open source software development.

During the three-month summer break from university, students work on a programming project with an open source organization, like DBpedia.

We have been part of this exciting program for more than 5 years now. Many exciting projects have developed as a result of intense coding during hot summers. By presenting Wouter Maroy, who was a GSoC student in 2016 and is currently a mentor in this year’s program, we would like to give you a glimpse behind the scenes and show you how important the program is to DBpedia.


Success Story: Wouter Maroy

Who are you?

I’m Wouter Maroy, a 23-year-old Master’s student in Computer Science Engineering at Ghent University (Belgium). I’m affiliated with IDLab – imec. Linked Data and Big Data technologies are my two favorite fields of interest. Besides my passion for Computer Science, I like to travel, explore and look for adventures. I’m a student who enjoys his student life in Ghent.

What is your main interest in DBpedia and what was your motivation to apply for a DBpedia project at GSoC 2016?

I took courses during my Bachelors with lectures about Linked Data and the Semantic Web which of course included DBpedia; it’s an interesting research field. Before my GSoC 2016 application I did some work on Semantic Web technologies and on a technology (RML) that was required for a GSoC 2016 project that was listed by DBpedia. I wanted to get involved in Open Source and DBpedia, so I applied.

What did you do?

DBpedia has used a custom mapping language up until now to generate structured data from raw data from Wikipedia infoboxes. A next step was to improve this process towards a more modular and generic approach that leads to higher quality linked data generation. This new approach relied on the integration of RML, the RDF Mapping Language, and was the goal of the GSoC 2016 project I applied for. Understanding all the necessary details about the GSoC project required some effort and research before I started coding. I also had to learn a new programming language (Scala). I had good assistance from my mentors, so this turned out very well in the end. DBpedia’s Extraction Framework, which is used for extracting structured data from Wikipedia, has quite a large codebase. It was the first project of this size I was involved in. I learned a lot from reading its codebase and from contributing by writing code during these months.

Dimitris Kontokostas and Anastasia Dimou were my two mentors. They guided me well throughout the project. I interacted daily with them through Slack and each week we had a conference call to discuss the project. After many months of research, coding and discussing, we managed to deliver a working prototype at the end of the project. The work we did was presented in Leipzig on the DBpedia day during SEMANTiCS ’16. Additionally, this work will also be presented at ISWC 2017.

You can check out his project here.

How do you currently contribute to improve DBpedia?

I’m mentoring a GSoC 2017 project together with Dimitris Kontokostas and Anastasia Dimou as a follow-up on the work that was done in our GSoC 2016 project last year. Ismael Rodriguez is the new student who is participating in the project and he has already delivered great work! Besides being a mentor for GSoC 2017, I make sure that the integration of RML into DBpedia is going in the right direction in general (managing, coding,…). For this reason, I worked at the KILT/DBpedia office in Leipzig during summer for 6 weeks. Joining and getting to know the team was a great experience.

What did you gain from the project?

Throughout the project I practiced coding, working in a team, … I learned more about DBpedia, RML, Linked Data and other related technologies. I’m glad I had the opportunity to learn this much from the project. I would recommend it to all students who are curious about DBpedia, who are eager to learn and who want to earn a stipend during summer through coding. You’ll learn a lot and you’ll have a good time!

Final words to future GSoC applicants for DBpedia projects.

Give it a shot! Really, it’s a lot of fun! Coding for DBpedia through GSoC is a great, unique experience and anyone who is enthusiastic about coding and the DBpedia project should definitely go for it.

Check our website for further updates, follow us on Twitter or subscribe to our newsletter.

Yours

DBpedia Association