Category Archives: Uncategorized

GSoC2020 – Call for Contribution

James: Sherry with the soup, yes… Oh, by the way, the same procedure as last year, Miss Sophie?

Miss Sophie: Same procedure as every year, James.

…and we are proud of it. We are very grateful to be accepted as an open-source organization in this year's Google Summer of Code (GSoC2020) edition, again. The upcoming GSoC2020 marks the 16th consecutive year of the program and the 9th year in a row for DBpedia.

We did it again – we are a mentoring organization!

What is GSoC2020? 

Google Summer of Code is a global program focused on bringing student developers into open source software development. Funds will be given to students (BSc, MSc, PhD) to work for three months on a specific task. For GSoC newbies, this short video and the information provided on their website will explain all there is to know about GSoC2020.

This year’s Narrative

Last year we tried to increase female participation in the program, and we will continue to do so this year. We explicitly want to encourage female students to apply for our projects. To that end, we have already engaged excellent female mentors to also raise the percentage of women on our mentor team.

In the following weeks, we invite all students, female and male alike, who are interested in Semantic Web and Open Source development to apply for our projects. You can also contribute your own ideas to work on during the summer. 

And this is how it works: 4 steps to GSoC2020 stardom

  1. Open source organizations such as DBpedia announce their project ideas. You can find our project ideas here.
  2. Students contact the mentor organizations they want to work with and write up a project proposal. Please get in touch with us via the DBpedia Forum or dbpedia@infai.org as soon as possible.
  3. The official application period at GSoC starts on March 16th. Please note that you have to submit your final application not through our Forum but through the GSoC website.
  4. After a selection phase, students are matched with a specific project and a set of mentors to work on the project during the summer.

To all the smart brains out there: if you are a student who wants to work with us during summer 2020, check our list of project ideas and warm-up tasks, or come up with your own idea and get in touch with us.

Application Procedure

Further information on the application procedure is available in our DBpedia Guidelines. There you will find information on how to contact us and how to appropriately apply for GSoC2020. Please also note the official GSoC 2020 timeline for your proposal submission and make sure to submit on time. Unfortunately, extensions cannot be granted. The final submission deadline is March 31st, 2020, 8 pm CEST.

Finally, check our website for information on DBpedia, follow us on Twitter or subscribe to our newsletter.

And in case you still have questions, please do not hesitate to contact us via praetor@infai.org.

We are thrilled to meet you and your ideas.

Your DBpedia-GSoC-Team


New Prototype: Databus Collection Feature

We are thrilled to announce that our Databus Collection Feature for the DBpedia Databus has been developed and is now available as a prototype. It simplifies the way to bundle your data and use it in your application.

A new Databus Collection Feature? How come, and how does it work? Read below and find out how using the DBpedia Databus becomes easier by the day and with each new tool.

Motivation

With more and more data being uploaded to the Databus, we started to develop test applications using that data. The SPARQL endpoint offers a central hub to access all metadata for datasets uploaded to the Databus, provided you know how to write SPARQL queries. The metadata includes the download links of the data files; it was therefore possible to pass a SPARQL query to an application, download the actual data and then use it for whatever purpose the app had.
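For illustration, such a metadata query might look roughly like the following sketch. The property names (`dataid:artifact`, `dcat:distribution`, `dcat:downloadURL`) and the artifact IRI are assumptions for illustration and may differ from the vocabulary actually used on the Databus:

```sparql
PREFIX dataid: <http://dataid.dbpedia.org/ns/core#>
PREFIX dcat:   <http://www.w3.org/ns/dcat#>

# Fetch the download links of all files belonging to one artifact
SELECT ?file WHERE {
    ?dataset dataid:artifact <https://databus.dbpedia.org/dbpedia/mappings/mappingbased-objects> .
    ?dataset dcat:distribution ?distribution .
    ?distribution dcat:downloadURL ?file .
}
```

An application could send a query of this shape to the Databus SPARQL endpoint and then download each file in the result set.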

The Databus Collection Editor

The DBpedia Databus now provides an editor for collections. A collection is basically a labelled SPARQL query that is retrievable via URI. Hence, with the collection editor you can group Databus groups and artifacts into a bundle and publish your selection using your Databus account. It is now a breeze to select the data you need, share the exact selection with others and/or use it in existing or self-made applications.

If you are not familiar with SPARQL and data queries, you can think of the feature as a shopping cart for data: You create a new cart, put data in it and tell your friends or applications where to find it. Quite neat, right?

In the following section, we will cover the user interface of the collection editor.

The Editor UI

Firstly, you can find the collection editor by going to the DBpedia Databus and following the Collections link at the top or you can get there directly by clicking here.

What you will see is the following:

General Collection Info

Secondly, since you do not have any collections yet, the editor has already created an empty collection named “Unnamed” for you. On the right side, next to the label and description, you will find a pen icon. By clicking the icon or the label itself you can edit its content. The collection is not published yet, so the Collection URI is blank.

Whenever you are not logged in or the collection has not been published yet, the editor will also notify you that your changes are only saved in your local browser cache and NOT remotely on our server. Keep that in mind when clearing your cache. Publishing the collection, however, is easy: simply log into (or create) your Databus account and hit the publish button in the action bar. This will open up a modal where you can pick your unique collection id and hit publish again. That’s it!

The Collection Info section will now show the collection URI. Following the link will take you to the HTML representation of your collection that will be visible to others. Hitting the Edit button in the action bar will bring you back to the editor.

Collection Hierarchy

Let’s have a look at the core piece of the collection editor: the hierarchy view. A collection can be a bundle of different Databus groups and artifacts but is not limited to that. If you know how to write a SPARQL query, you can easily extend your collection with more powerful selections. Therefore, the hierarchy is split into two nodes:

  • Generated Queries: Contains all queries that are generated from your selection in the UI
  • Custom Queries: Contains all custom written SPARQL queries

Both hierarchy nodes have a “+” icon. Clicking on this button lets you add generated or custom queries, respectively.

Custom Queries

If you hit the “+” icon on the Custom Queries node, a new node called “Custom Query” will appear in the hierarchy. You can remove a custom query by clicking on the trashcan icon in the hierarchy. If you click the node it will take you to a SPARQL input field where you can edit the query.

To make your collection more understandable for others, you can even document the query by adding a label and description.

Writing Your Own Custom Queries

A collection query is a SPARQL query of the form:

SELECT DISTINCT ?file WHERE {
    {
        [SUBQUERY]
    }
    UNION
    {
        [SUBQUERY]
    }
    UNION
    ...
    UNION
    {
        [SUBQUERY]
    }
}

All selections made by generated and custom queries will be joined into a single result set with a single column called “file”. Thus, it is important that your custom query binds data to a variable called “file” as well.
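As a sketch of such a subquery (the artifact IRI and the `dataid`/`dcat` property names are assumptions for illustration), a custom query selecting the download URLs of one artifact and binding them to “file” could look like this:

```sparql
PREFIX dataid: <http://dataid.dbpedia.org/ns/core#>
PREFIX dcat:   <http://www.w3.org/ns/dcat#>

# Custom query: results must be bound to ?file so they can be
# joined with the result sets of the generated queries
SELECT DISTINCT ?file WHERE {
    ?dataset dataid:artifact <https://databus.dbpedia.org/dbpedia/generic/labels> .
    ?dataset dcat:distribution ?distribution .
    ?distribution dcat:downloadURL ?file .
}
```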

Generated Queries

Clicking the “+” icon on the Generated Queries node will take you to a search field. Make use of the indexed search on the Databus to find and add the groups and artifacts you need. If you want to refine your search, don’t worry: you can do that in the next step!

Once the artifact or group has been added to your collection, the Add to Collection button will turn green. Once you are done, you can go back to the editor with the Back to Hierarchy button.

Your hierarchy will now contain several new nodes.

Group Facets, Artifact Facets and Overrides

Groups and artifacts that have been added to the collection will show up as nodes in the hierarchy. Clicking a node will open a filter where you can refine your dataset selection. Setting a filter on a group node will apply it to all artifact nodes unless you override that setting in an artifact node manually. The filter set in the group node is shown in the artifact facets in dark grey. Any overrides in the artifact facets will be highlighted in green:

Group Nodes

A group node will provide a list of filters that will be applied to all artifacts of that group:

Artifact Nodes

Artifact nodes will then actually select data files which will be visible in the faceted view. The facets are generated dynamically from the available variants declared in the metadata.

Example: Here we selected the latest version of the Databus dump as N-Triples. This collection is already in use: the collection URI is passed to the new generic lookup application, which then creates the search function for the Databus website. If you are interested in how to configure the lookup application, you can go here: https://github.com/dbpedia/lookup-application. Additionally, there will be another blog post about the lookup within the next few weeks.

Use Cases

The DBpedia Databus Collections are useful in many ways.

  • You can share a specific dataset with your community or colleagues.
  • You can re-use datasets others created.
  • You can plug collections into Databus-ready applications and avoid spending time on the download and setup process.
  • You can point to a specific piece of data (e.g. for testing) with a single URI in your publications.
  • You can help others to create data queries more easily.

We hope you enjoy the Databus Collection Feature and we would love to hear your feedback! You can leave your thoughts and suggestions in the new DBpedia Forum. Feedback of any kind is highly appreciated since we want to improve the prototype as fast and as user-driven as possible! Cheers!

A big thanks goes to DBpedia developer Jan Forberg who finalized the Databus Collection Feature and compiled this text.

Yours

DBpedia Association

More than 50 DBpedia enthusiasts joined the Community Meeting in Karlsruhe.

SEMANTiCS is THE leading European conference in the field of semantic technologies and the platform for professionals who make semantic computing work, understand its benefits and know its limitations.

Since we at DBpedia have a long-standing partnership with SEMANTiCS, we also joined this year’s event in Karlsruhe. September 12, the last day of the conference, was dedicated to the DBpedia community.

First and foremost, we would like to thank the Institute for Applied Informatics for supporting our community and many thanks to FIZ Karlsruhe for hosting our community meeting.

In the following, we will give you a brief retrospective of the presentations.

Opening Session

Katja Hose – “Querying the web of data”

… on the search for the killer app.

The concept of Linked Open Data and the promise of the Web of Data have been around for over a decade now. Yet, the great potential of free access to a broad range of data that these technologies offer has not yet been fully exploited. This talk will, therefore, review the current state of the art, highlight the main challenges from a query processing perspective, and sketch potential ways to solve them. Slides are available here.

Dan Weitzner – “timbr-DBpedia – Exploration and Query of DBpedia in SQL”

The timbr SQL Semantic Knowledge Platform enables the creation of virtual knowledge graphs in SQL. The DBpedia version of timbr supports query of DBpedia in SQL and seamless integration of DBpedia data into data warehouses and data lakes. We already published a detailed blogpost about timbr where you can find all relevant information about this amazing new DBpedia Service.

Showcase Session

Maribel Acosta – “A closer look at the changing dynamics of DBpedia mappings”

Her presentation looked at the mappings wiki and how different language chapters use and edit it. Slides are available here.

Mariano Rico – “Polishing a diamond: techniques and results to enhance the quality of DBpedia data”

DBpedia is more than a source for creating papers. It is also being used by companies as a remarkable data source. This talk focused on how we can detect errors and how to improve the data, from the perspective of both academic researchers and private companies. We show the case for the Spanish DBpedia (the second-largest DBpedia after the English chapter) through a set of techniques, paying attention to results and further work. Slides are available here.

Guillermo Vega-Gorgojo – “Clover Quiz: exploiting DBpedia to create a mobile trivia game”

Clover Quiz is a turn-based multiplayer trivia game for Android devices with more than 200K multiple choice questions (in English and Spanish) about different domains generated out of DBpedia. Questions are created off-line through a data extraction pipeline and a versatile template-based mechanism. A back-end server manages the question set and the associated images, while a mobile app has been developed and released in Google Play. The game is available free of charge and has been downloaded by +10K users, answering more than 1M questions. Therefore, Clover Quiz demonstrates the advantages of semantic technologies for collecting data and automating the generation of multiple-choice questions in a scalable way. Slides are available here.

Fabian Hoppe and Tabea Tiez – “The Return of German DBpedia”

Fabian and Tabea will present the latest news on the German DBpedia chapter as it returns to the language chapter family after an extended offline period. They will talk about the data set, discuss a few challenges along the way and give insights into future perspectives of the German chapter. Slides are available here.

Wlodzimierz Lewoniewski and Krzysztof Węcel – “References extraction from Wikipedia infoboxes”

In Wikipedia’s infoboxes, some facts have references, which can be useful for checking the reliability of the provided data. We present challenges and methods connected with the metadata extraction of Wikipedia’s sources. We used the DBpedia Extraction Framework along with our own extensions in Python to provide statistics about citations in 10 language versions. The provided methods can be used to verify and synchronize facts depending on the quality assessment of sources. Slides are available here.

He gave further insight into the process of extracting references from Wikipedia infoboxes, which we will use in our GFS project.

Afternoon Session

Sebastian Hellmann, Johannes Frey, Marvin Hofer – “The DBpedia Databus – How to build a DBpedia for each of your Use Cases”

The DBpedia Databus is a platform that is intended for data consumers. It will enable users to build an automated DBpedia-style Knowledge Graph for any data they need. The big benefit is that users not only have access to data, but are also encouraged to apply improvements and, therefore, will enhance the data source and benefit other consumers. We want to use this session to officially introduce the Databus, which is currently in beta and demonstrate its power as a central platform that captures decentrally created client-side value by consumers.  

We will give insight on how the new monthly DBpedia releases are built and validated to copy and adapt for your use cases. Slides are available here.

Interactive session, moderator: Sebastian Hellmann – “DBpedia Connect & DBpedia Commerce – Discussing the new Strategy of DBpedia”

In order to keep growing and improving, DBpedia has been undergoing a growth hack for the last couple of months. As part of this process, we developed two new subdivisions of DBpedia: DBpedia Connect and DBpedia Commerce. The former is a low-code platform to interconnect your public or private databus data with the unified, global DBpedia graph and export the interconnected and enriched knowledge graph into your infrastructure. DBpedia Commerce is an access and payment platform to transform Linked Data into a networked data economy. It will allow DBpedia to offer any data, mod, application or service on the market. During this session, we will provide more insight into these as well as an overview of how DBpedia users can best utilize them. Slides are available here.

In case you missed the event, all slides and presentations are also available on our website. Further insights, feedback and photos about the event are available on Twitter via #DBpediaDay.

We are now looking forward to more DBpedia meetings next year. So, stay tuned and check Twitter, Facebook and the Website or subscribe to our Newsletter for the latest news and information.

If you want to organize a DBpedia Community meeting yourself, just get in touch with us via dbpedia@infai.org regarding program and organization.

Yours

DBpedia Association

A year with DBpedia – A Retrospective Part One

Looking back, 2018 was a very successful year for DBpedia. First and foremost, we refined our strategy and developed our concept of the DBpedia Databus, a central communication system that allows exchanging, curating and accessing data between multiple stakeholders. The Databus simplifies working with data and will be launched in early 2019. 

Moreover, we travelled many miles in 2018, not only to visit our language chapters but also to meet enthusiasts from our community at workshops and conferences worldwide.

In the upcoming blog series, we would like to take you on a retrospective tour around the world, giving you insights into a year with DBpedia. We will start out with stopovers in Japan, Poland and Germany and will continue our journey to other continents in the following two weeks.

Sit back and read on.

Big Spring in Japan – Welcome to Miyazaki

Welcome to Miyazaki, to LREC, the Language Resources and Evaluation Conference 2018, and meet RDF2PT. No idea what that is and what it has to do with DBpedia? Read on!

The generation of natural language from RDF data has recently gained significant attention due to the continuous growth of Linked Data. Proposing the RDF2PT approach, a research team around Diego Moussalem, part of the Portuguese DBpedia Chapter, described how RDF data is verbalized to Brazilian Portuguese texts. They highlighted the steps taken to generate Portuguese texts and addressed challenges with grammatical gender, classes, resources and properties. The results suggest that RDF2PT generates texts that can be easily understood by humans. It also helps to identify some of the challenges related to the automatic generation of Brazilian Portuguese (especially from RDF).

The full paper is available via https://arxiv.org/pdf/1802.08150.pdf

Welcome to Poznan, Poland

Our community is our asset. In order to grow it and encourage contributions, the DBpedia Association continuously organizes community meetups to address the interests of our multi-faceted community. In late May, we travelled to Poland to meet Polish DBpedia enthusiasts at our meetup in Poznań. The idea was to find out what the Polish DBpedia community uses DBpedia for, which applications and tools they have, and what they are currently developing. Members of the chapter presented, among others, results of the primary research project “Quality of Data in DBpedia”. Attendees engaged in lively discussions about uses of DBpedia applications and tools and listened to a presentation by Professor Witold Abramowicz, chair of the Department of Information Systems at Poznań University of Economics and head of SmartBrain, who talked about opportunities and challenges of data science.

Further information on the Polish DBpedia Chapter can be found on their website.

Welcome to Leipzig, home to the DBpedia Association

For the first time ever, DBpedia was part of the German culture hackathon Coding da Vinci, held at the Bibliotheca Albertina, the University Library of Leipzig University, in June 2018. In this year’s edition, we not only offered a hands-on workshop but also provided our DBpedia datasets. This data supported more than 30 cultural institutions in enriching their own datasets. In turn, hackathon participants could creatively develop new tools, apps, games, quizzes, etc. out of the data.

One of the projects that used DBpedia as a source was Birdory, a memory game using bird voices and pictures. The goal is, much like in regular memory games, to match the correct picture to the bird sound that is played. The data used for the game was taken from the Museum für Naturkunde Berlin (bird voices) as well as from DBpedia (pictures). So in case you need some me-time during Christmas gatherings, you might want to check it out via https://birdory.firebaseapp.com/.

 

In our upcoming blog post next week we will take you to Thessaloniki (Greece), Australia and, again, Leipzig. In the meantime, stay tuned and visit our Twitter channel or subscribe to our DBpedia Newsletter.

Have a great week,

Yours DBpedia Association

The Release Circle – A Glimpse behind the Scenes

As you already know, with the new DBpedia strategy our mode of publishing releases has changed. The new DBpedia release process follows a three-step approach, starting with the Extraction, followed by the ID-Management and finally the Fusion, which completes the release process. Our DBpedia releases are currently published on a monthly basis. In this post, we give you insight into the individual steps of the release process and into what our developers actually do when preparing a DBpedia release.

Extraction – Step one of the Release

The good news is, our new release mode is taking shape and has noticeably picked up speed. Finally, the 2018-08 release and, additionally, the 2018.09.12 and 2018.10.16 releases are now available in our LTS repository.

The 2018-08 release was generated on the basis of the Wikipedia datasets extracted in early August and currently comprises 136 languages. The extraction release contains the raw extracted data generated by the DBpedia extraction framework. Post-processing steps, such as data deduplication or URI normalization, are omitted and moved to later parts of the release process. Thus, we can provide direct, transparent access to the generated data at every step. Until we manage two releases per month, our data is mostly based on the second Wikipedia datasets of the previous month. In line with that, the 2018.09.12 release is based on late-August data and the recent 2018.10.16 release is based on Wikipedia datasets extracted on September 20th. They all comprise 136 languages and contain a stable list of datasets since the 2018-08 release.

Our releases are now ready for parsing and external use. Additionally, there will be a new Wikidata-based release this week.

ID-Management – Step two of the Release

For a complete “new DBpedia” release, the DBpedia ID-Management and the Fusion of the data have to be added to the process. The Databus ID Management is a process to unify the various IRIs that different data providers coin for the same entities. Taking datasets with overlapping domains of interest from multiple data providers, the set of IRIs denoting the entities in the source datasets is determined heuristically (e.g. excluding RDF/OWL types/classes).

Afterwards, these selected IRIs are assigned a numeric primary key, the ‘Singleton ID’. The core of the ID Management process happens in the next step: based on the large set of high-confidence owl:sameAs assertions in the input data, the connected components induced by the corresponding sameAs-graph are computed. In other words: the groups of all entities from the input datasets that are (transitively) reachable from one another are determined. We dubbed these groups sameAs-clusters. For each sameAs-cluster we pick one member as representative, which determines the ‘Cluster ID’ or ‘Global Identifier’ for all cluster members.
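The clustering step can be illustrated with a small union-find sketch. This is a simplified, in-memory illustration of the idea, not the actual Spark-based workflow; the prefixed identifiers are made up for the example, and the representative is chosen here simply as the lexicographic minimum:

```python
def same_as_clusters(pairs):
    """Group entities into sameAs-clusters via union-find over owl:sameAs pairs."""
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for a, b in pairs:
        parent[find(a)] = find(b)  # union the two components

    clusters = {}
    for node in parent:
        clusters.setdefault(find(node), set()).add(node)
    # pick one member per cluster as representative ("Cluster ID")
    return {min(members): members for members in clusters.values()}

# hypothetical owl:sameAs assertions from several input datasets
pairs = [
    ("dbr:Leipzig", "wd:Q2079"),
    ("wd:Q2079", "dbpedia-de:Leipzig"),
    ("dbr:Berlin", "wd:Q64"),
]
clusters = same_as_clusters(pairs)
```

Here the three Leipzig identifiers end up in one sameAs-cluster and the two Berlin identifiers in another, each keyed by its representative.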

Apart from being an essential preparatory step for the Fusion, these Global Identifiers serve a purpose in their own right as unified Linked Data identifiers for groups of Linked Data entities that should be viewed as equivalent or ‘the same thing’.

A processing workflow based on Apache Spark to perform the process described above for large quantities of RDF input data is already in place and has been run successfully for a large set of DBpedia inputs.

Fusion – Step three of the Release

Based on the Extraction and the ID-Management, the Data Fusion finalizes the last step of the DBpedia release cycle. With the goal of improving data quality and data coverage, the process uses the DBpedia global IRI clusters to fuse and enrich the source datasets. The fused data contains all resources of the input datasets. The fusion process is based on a functional-property decision (owl:FunctionalProperty determination) that determines the number of selected values. Further, the value selection for these functional properties is based on a preference ordering over the source datasets, e.g. preferring values from the English DBpedia over the German DBpedia.
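The value-selection idea can be sketched as follows. This is a simplified illustration, not the actual Fusion implementation; the source labels and the preference order are assumptions for the example:

```python
# Assumed preference ordering over source datasets: English DBpedia first
SOURCE_PREFERENCE = ["en", "de", "fr"]

def fuse_functional(values_by_source, preference=SOURCE_PREFERENCE):
    """For a functional property, pick a single value by source preference.

    Returns the selected value and its source dataset, which doubles
    as provenance information for the fused triple.
    """
    for source in preference:
        if source in values_by_source:
            return values_by_source[source], source
    return None, None

# two sources disagree on a functional property; the English value wins
value, provenance = fuse_functional({"de": "Leipzig (Sachsen)", "en": "Leipzig"})
```

With this sketch, `value` is the English candidate and `provenance` records that it came from the "en" source.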

The enrichment improves entity property and value coverage for resources only contained in the source data. Furthermore, we create provenance data to keep track of the origin of each triple. This provenance data is also used for the HTTP-based http://global.dbpedia.org resource view.

At the moment, the fused and enriched data is available for the generic and mapping-based extractions. More datasets are still in progress. The DBpedia Fusion data is being uploaded to http://downloads.dbpedia.org/repo/dev/fusion/

Please note we are still in the midst of the beta testing for our data release tool, so in case you do come across any errors, reporting them to us is much appreciated to fuel the testing process.

Further information regarding the releases progress can be found here: http://dev.dbpedia.org/

Next steps

We will add more releases to the repository on a monthly basis, aiming for a bi-weekly release mode as soon as possible. In between the intervals, any mistakes or errors you find and report in this data can be fixed for the upcoming release. Currently, the generated metadata in the DataID file is not stable; it will fluctuate, still needs to be improved and will change in the near future. We are now working on the next release and will inform you as soon as it is published.

Yours DBpedia Association

This blog post was written with the help of our DBpedia developers Robert Bielinski, Markus Ackermann and Marvin Hofer, who were responsible for the work done with respect to the DBpedia releases. We would like to thank them for their great work.

DBpedia supports young developers

Supporting young and aspiring developers has always been part of DBpedia’s philosophy. Through various internships and collaborations with programmes such as Google Summer of Code, we were able not only to meet aspiring developers but also to establish long-lasting relationships with these DBpedians, ensuring sustainable progress for and with DBpedia. For 6 years now, we have been part of Google Summer of Code, one of our favorite programmes. This year, we are also taking part in Coding da Vinci, a German-based cultural data hackathon, where we support young hackers, coders and smart minds with DBpedia datasets.

DBpedia at Google Summer of Code 2018

This year, DBpedia will participate for the sixth time in a row in the Google Summer of Code program (GSoC). Together with our amazing mentors, we drafted 9 project ideas which GSoC applicants could apply to. Since March 12th, we received many proposal drafts, out of which 12 final project proposals were submitted. Competition is very high as student slots are always limited. Our DBpedia mentors critically reviewed all proposals for their potential and for allocating them one of the rare open slots in the GSoC program. Finally, on Monday, April 23rd, our 6 finalists were announced. We are very proud and looking forward to the upcoming months of coding. The following projects have been accepted and will hopefully be realized during the summer.

Our gang of DBpedia mentors comprises very experienced developers who have been working with us for several years now. Speaking of sustainability, we also have former GSoC students on board, who get the chance to mentor projects building on ideas of past GSoCs. And while students and mentors start bonding, we are really looking forward to the upcoming months of coding – may it be inspiring, fun and fruitful.

DBpedia @ Coding da Vinci 2018

As already mentioned in the previous newsletter, DBpedia is part of CodingDaVinciOst 2018. Founded in Berlin in 2014, Coding da Vinci is a platform for cultural heritage institutions and the hacker, developer, designer, and gamer community to jointly develop new creative applications from cultural open data during a series of hackathon events. In this year’s edition, DBpedia provides its datasets to support more than 30 cultural institutions, enriching their datasets so that participants of the hackathon can make the most of the data. Among the participating cultural institutions are, for example, the university libraries of Chemnitz, Jena, Halle, Freiberg, Dresden and Leipzig as well as the Sächsisches Staatsarchiv, the Museum für Druckkunst Leipzig, the Museum für Naturkunde Berlin, the Duchess Anna Amalia Library, and the Museum Burg Posterstein.

CodingDaVinciOst 2018, the current edition of the hackathon, hosted a kick-off weekend at the Bibliotheca Albertina, the University Library in Leipzig. During the event, DBpedia offered a hands-on workshop for newbies and interested hackathon participants who wanted to learn about how to enrich their project ideas with DBpedia or how to solve potential problems in their projects with DBpedia.

We are now looking forward to the upcoming weeks of coding and hacking and can’t wait to see the results on June 18th, when the final projects will be presented and awarded. We wish all the coders and hackers a pleasant and happy hacking time. Check our DBpedia Twitter for updates and latest news.  

If you have any questions, like to support us in any way or if you like to learn more about DBpedia, just drop us a line via dbpedia@infai.org

Yours,
DBpedia Association

Keep using DBpedia!

Just recently, DBpedia Association member and hosting specialist OpenLink released the DBpedia Usage Report, a periodic report on the DBpedia SPARQL endpoint and the associated Linked Data deployment.

The report not only gives some historical insight into DBpedia’s usage, number of visits and hits per day but especially shows statistics collected between October 2016 and December 2017. The report covers more than a year of logs from the DBpedia web service operated by OpenLink Software at http://dbpedia.org/sparql/.  

Before we highlight a few aspects of DBpedia’s usage, we would like to thank OpenLink for the continuous hosting of the DBpedia endpoint and for creating this report.

The graph shows the average number of hits/requests per day that were made to the DBpedia service during each of the releases.

The graph shows the average number of unique visits per day made to the DBpedia service during each of the releases.

Speaking of which, as you can see in the following tables, there has been a massive increase in the number of hits coinciding with the DBpedia 2015–10 release on April 1st, 2016.
This boost can be attributed to intensive promotion of DBpedia via community meetings, communication with various partners in the Linked Data community, and social media presence among the community, in order to increase backlinks.

Since then, not only have the numbers of hits increased, but DBpedia has also improved its data quality. We are constantly working on the accessibility, data quality and stability of the SPARQL endpoint. Kudos to OpenLink for maintaining the technical baseline for DBpedia.

The table shows the usage overview of last year.

The full report is available here.

 

Subscribe to the DBpedia Newsletter, check our DBpedia Website and follow us on Twitter, Facebook, and LinkedIn for the latest news.

Thanks for reading and keep using DBpedia!

Yours,
DBpedia Association

 

Career Opportunities at DBpedia – A Success Story

Google Summer of Code is a global program focused on introducing students to open source software development.

During the 3-month summer break from university, students work on a programming project with an open source organization like DBpedia.

We have been part of this exciting program for more than 5 years now, and many exciting projects have developed as a result of intense coding during hot summers. By presenting Wouter Maroy, who was a GSoC student in 2016 and is a mentor in this year’s program, we would like to give you a glimpse behind the scenes and show you how important the program is to DBpedia.


Success Story: Wouter Maroy

Who are you?

I’m Wouter Maroy, a 23-year-old Master’s student in Computer Science Engineering at Ghent University (Belgium). I’m affiliated with IDLab – imec. Linked Data and Big Data technologies are my two favorite fields of interest. Besides my passion for Computer Science, I like to travel, explore and look for adventures. I’m a student who enjoys his student life in Ghent.

What is your main interest in DBpedia and what was your motivation to apply for a DBpedia project at GSoC 2016?

I took courses during my Bachelor’s with lectures about Linked Data and the Semantic Web, which of course included DBpedia; it’s an interesting research field. Before my GSoC 2016 application I had done some work with Semantic Web technologies and with a technology (RML) that was required for a GSoC 2016 project listed by DBpedia. I wanted to get involved in open source and DBpedia, so I applied.

What did you do?

Until recently, DBpedia used a custom mapping language to generate structured data from raw Wikipedia infoboxes. A next step was to move this process to a more modular and generic approach that leads to higher-quality Linked Data generation. This new approach relied on the integration of RML, the RDF Mapping Language, and was the goal of the GSoC 2016 project I applied for. Understanding all the necessary details of the project required some effort and research before I could start coding. I also had to learn a new programming language (Scala). I had good assistance from my mentors, so this turned out very well in the end. DBpedia’s Extraction Framework, which is used for extracting structured data from Wikipedia, has quite a large codebase. It was the first project of this size I was involved in, and I learned a lot from reading its code and from contributing during these months.

Dimitris Kontokostas and Anastasia Dimou were my two mentors. They guided me well throughout the project. I interacted daily with them through Slack, and each week we had a conference call to discuss the project. After many months of research, coding and discussing, we managed to deliver a working prototype at the end of the project. The work was presented in Leipzig on the DBpedia Day during SEMANTiCS 2016 and will also be presented at ISWC 2017.

You can check out his project here.
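The infobox-to-triples idea Wouter describes can be sketched in a few lines. Everything below is a hypothetical toy, far simpler than either DBpedia’s mapping language or RML, which also handle datatypes, languages, units and nested templates:

```python
# Illustrative sketch only: map raw infobox key/value pairs to RDF
# triples via a simple {infobox key -> ontology property URI} table.
# The mapping, property URIs and helper name are invented for this example.

def infobox_to_ntriples(resource, infobox, mapping):
    """Return N-Triples lines for one Wikipedia resource."""
    subject = f"<http://dbpedia.org/resource/{resource}>"
    triples = []
    for key, value in infobox.items():
        prop = mapping.get(key)
        if prop is None:          # keys without a mapping rule are skipped
            continue
        triples.append(f'{subject} <{prop}> "{value}" .')
    return triples

mapping = {"birth_place": "http://dbpedia.org/ontology/birthPlace"}
print(infobox_to_ntriples("Ada_Lovelace",
                          {"birth_place": "London", "known_for": "Computing"},
                          mapping))
```

The appeal of a declarative mapping language is that the `mapping` table lives outside the code: the community can edit the rules without touching the extraction framework itself.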

How do you currently contribute to improving DBpedia?

I’m mentoring a GSoC 2017 project together with Dimitris Kontokostas and Anastasia Dimou as a follow-up to the work done in our GSoC 2016 project last year. Ismael Rodriguez is the new student participating in the project, and he has already delivered great work! Besides being a mentor for GSoC 2017, I make sure that the integration of RML into DBpedia is going in the right direction in general (managing, coding, …). For this reason, I worked at the KILT/DBpedia office in Leipzig for 6 weeks during the summer. Joining and getting to know the team was a great experience.

What did you gain from the project?

Throughout the project I practiced coding and working in a team, and I learned more about DBpedia, RML, Linked Data and other related technologies. I’m glad I had the opportunity to learn this much from the project. I would recommend it to all students who are curious about DBpedia, who are eager to learn and who want to earn a stipend through coding during the summer. You’ll learn a lot and you’ll have a good time!

Final words to future GSoC applicants for DBpedia projects.

Give it a shot! Really, it’s a lot of fun! Coding for DBpedia through GSoC is a great, unique experience, and anyone who is enthusiastic about coding and the DBpedia project should definitely go for it.

Check our website for further updates, follow us on Twitter or subscribe to our newsletter.

Yours

DBpedia Association

 

Smart Minds Wanted

New Internship Opportunity @ Springer Nature

In conjunction with Springer Nature, DBpedia offers a 3-month internship at Springer Nature in London, UK and at DBpedia in Leipzig, Germany.

Internship Details

Position: DBpedia Intern
Main Employer: DBpedia Association
Deadline: June 30th, 2017
Duration: 3 months/full-time; the internship will start in the second half of 2017
Location: 50% in London (UK) and 50% in Leipzig (GER)
Type of students desired: Undergraduate, Graduate (junior role)
Compensation: You will receive a stipend of 1300€ per month plus reimbursement of your travel and visa costs (up to 1000€ in total)

The student intern will be responsible for assisting with mappings for DBpedia at Springer Nature. Your tasks include, but are not restricted to, improving the quality of the extraction of DBpedia scholarly references/Wikipedia citations to Springer Nature URIs, and text mining of DBpedia entities from Springer Nature publication content.

Did we spark your interest? Check our website for further information or apply directly via our online application form.

We are looking forward to meeting all the whiz kids out there.

Yours,

DBpedia Association

GSoC 2017 – may the code be with you

GSoC students have finally been selected.

We are very excited to announce this year’s final students for our projects at the Google Summer of Code program (GSoC).

Google Summer of Code is a global program focused on bringing more student developers into open source software development. Stipends are awarded to students to work on a specific DBpedia related project together with a set of dedicated mentors during summer 2017 for the duration of three months.

For the past 5 years, DBpedia has been a vital part of the GSoC program, and since our very first participation many DBpedia projects have been successfully completed.

In this year’s GSoC edition, DBpedia received more than 20 submissions for selected DBpedia projects. Our mentors read many promising proposals and evaluated them, and now the crème de la crème of students have snatched a spot for this summer. In the end, 7 students from around the world were selected and will work together with their assigned mentors on their projects. DBpedia developers and mentors are really excited about these 7 promising student projects.

List of students and projects:

Want to read more about their specific projects? Just click below… or check the GSoC pages for details.

 Ismael Rodriguez – Project Description: Although the DBpedia Extraction Framework was adapted to support RML mappings thanks to a project from last year’s GSoC, the user interface to create mappings is still provided by a MediaWiki installation that does not support RML mappings and requires Semantic Web expertise. The goal of the project is to create a front-end application with a user-friendly interface so the DBpedia community can easily view, create and administrate DBpedia mapping rules using RML. Moreover, it should also facilitate data transformations and overall DBpedia dataset generation. Mentors: Anastasia Dimou, Dimitris Kontokostas, Wouter Maroy

Ram Ganesan Athreya – Project Description: The requirement of the project is to build a conversational chatbot for DBpedia which would be deployed in at least two social networks. There are three main challenges in this task: first, understanding the query presented by the user; second, fetching relevant information based on the query through DBpedia; and finally, tailoring the responses to the standards of each platform and developing subsequent user interactions with the chatbot. Based on my understanding, the process of understanding the query would be undertaken by one of the mentioned QA systems (HAWK, QANARY, openQA). Based on the response from these systems, we need to query the DBpedia dataset using SPARQL and present the data back to the user in a meaningful way. Ideally, both the presentation and the interaction flow need to be tailored to the individual social network. I would like to stress that although the primary medium of interaction is text, platforms such as Facebook insist that a proper mix of chat and interactive elements such as images, buttons etc. leads to better user engagement, so I would like to incorporate these elements as part of my proposal.

Mentor: Ricardo Usbeck

 

Nausheen Fatma – Project Description: Knowledge base embeddings have been an active area of research. In recent years a lot of work, such as TransE, TransR, RESCAL and SSP, has been done on knowledge base embeddings. However, none of these approaches have used DBpedia to validate their approach. In this project, I want to achieve the following tasks: i) run the existing techniques for KB embeddings on standard datasets; ii) create an equivalent standard dataset from DBpedia for evaluation; iii) evaluate across domains; iv) compare and analyse the performance and consistency of the various approaches on the DBpedia dataset along with other standard datasets; v) report any challenges that come up while implementing the approaches for DBpedia. Along the way, I will also try my best to come up with a new research approach to the problem.

Mentors: Sandro Athaide Coelho, Tommaso Soru

 

Akshay Jagatap – Project Description: The project aims at defining embeddings to represent classes, instances and properties. Such a model tries to quantify semantic similarity as a measure of distance in the vector space of the embeddings. I believe this can be done by implementing Random Vector Accumulators with additional features in order to better encode the semantic information held by the Wikipedia corpus and DBpedia graphs.

Mentors: Pablo Mendes, Sandro Athaide Coelho, Tommaso Soru
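As a rough illustration of the general idea behind random vector accumulators (and emphatically not the project’s actual model), every symbol gets a fixed sparse random “index vector”, and an entity’s embedding is the sum of its neighbours’ index vectors, so entities with similar neighbourhoods end up close in the vector space. Dimensions, sparsity and all names below are invented:

```python
import math
import random

DIM = 16  # toy dimensionality; real models use hundreds of dimensions

def index_vector(name, dim=DIM):
    """A fixed sparse random vector for a symbol, derived from its name."""
    rng = random.Random(name)  # same name -> same vector, deterministically
    return [rng.choice((-1, 0, 0, 1)) for _ in range(dim)]

def embed(entity, neighbours):
    """Accumulate the index vectors of an entity's graph neighbours.
    (The entity's own name is unused in this toy version.)"""
    acc = [0] * DIM
    for n in neighbours:
        for i, x in enumerate(index_vector(n)):
            acc[i] += x
    return acc

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm = lambda v: math.sqrt(sum(x * x for x in v)) or 1.0
    return dot / (norm(a) * norm(b))

# Entities sharing most neighbours end up with a high cosine similarity.
v1 = embed("Leipzig", ["Germany", "Saxony", "City"])
v2 = embed("Dresden", ["Germany", "Saxony", "Town"])
print(cosine(v1, v2))
```

The attraction of this family of methods is that the index vectors never need to be trained; only the cheap accumulation step touches the corpus or graph.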

 

Luca Virgili – Project Description: In Wikipedia a lot of data is hidden in tables. What we want to do is read all the tables in a page correctly. First of all, we need a tool that allows us to capture the tables represented in a Wikipedia page. After that, we have to understand what we have read. Both operations seem easy, but many problems can arise. The main issue we have to solve stems from how people build tables: everyone has a particular style for representing information, so in one table we may read something that doesn’t appear in another structure. I propose to improve last year’s project and to create a general way of reading data from Wikipedia tables. I want to revise the parser for Wikipedia pages to understand as many types of tables as possible. Furthermore, I’d like to build an algorithm that compares a column’s elements (as previously read by the parser) to an ontology, so it can work out how the user wrote the information. In this way we only need to define a few mapping rules, and we can build more generalized software.

Mentors: Emanuele Storti, Domenico Potena

 

Shashank Motepalli – Project Description: DBpedia tries to extract structured information from Wikipedia and make it available on the Web. In this way, the DBpedia project develops a gigantic source of knowledge. However, the current system for building the DBpedia Ontology relies on infobox extraction. Infoboxes, being human-curated, limit the coverage of DBpedia, either due to the lack of infoboxes on some pages or due to over-specific or very general taxonomies. These factors have motivated the need for DBTax. DBTax follows an unsupervised approach to learning a taxonomy from the Wikipedia category system. It applies several interdisciplinary NLP techniques to assign types to DBpedia entities. The primary goal of the project is to streamline and improve the approach that was proposed, making it easy to run on a new DBpedia release, and in addition to extend the DBTax taxonomy learning to other Wikipedia languages.

Mentors: Marco Fossati, Dimitris Kontokostas

 

Krishanu Konar – Project Description: Wikipedia, being the world’s largest encyclopedia, contains a humongous amount of information in the form of text. Key facts and figures are encapsulated in a resource’s infobox, and some detailed statistics appear as tables, but a lot of data is also present in the form of lists, which are quite unstructured and hence difficult to turn into semantic relationships. The project focuses on the extraction of relevant but hidden data that lies inside lists on Wikipedia pages. The main objective is to create a tool that can extract information from Wikipedia lists and form appropriate RDF triples that can be inserted into the DBpedia dataset.

Mentor: Marco Fossati 

Read more

Congrats to all selected students! We will keep our fingers crossed now and patiently wait until early September, when final project results are published.

An encouraging note to the less successful students.

The competition for GSoC slots is always at a very high level, and DBpedia only has a limited number of slots available for students. In case you weren’t among those selected, do not give up on DBpedia just yet. There are plenty of opportunities to prove your abilities and be part of the DBpedia experience. You, above all, know DBpedia by heart. Hence, contributing to our support system is not only a great way to be part of the DBpedia community but also an opportunity to be vital to DBpedia’s development. Above all, it is a chance for current DBpedia mentors to get to know you better, to support you, and to help you develop your ideas from the very beginning.

Go on, you smart brains, dare to become a top DBpedia expert and provide good support for other DBpedia users. Sign up on our support page or check out the following ways to contribute:

Get involved:
  • Join our DBpedia-discussion mailing list, where we discuss current DBpedia developments. NOTE: mails announcing tools or calls for papers unrelated to DBpedia are not allowed. This is a community discussion list.
  • If you would like to join DBpedia developer and technical discussions, sign up on Slack
  • Developer Discussion
  • Become a DBpedia Student and sign up for free at the DBpedia Association. We offer special programs that provide training and other opportunities to learn about DBpedia and extend your Semantic Web and programming skills.

We are looking forward to working with you!

You haven’t had enough of DBpedia yet? Stay tuned and join us on Facebook and Twitter or subscribe to our newsletter for the latest news!

 

Have a great weekend!

Yours,

DBpedia Association