
More than 130 knowledge graph enthusiasts joined the KGiA event.

Opening the KG in Action event

The SEMANTiCS Onsite Conference 2020 had to be postponed until September 2021. To bridge the gap, we took the opportunity to organize the Knowledge Graphs in Action (KGiA) online track as a SEMANTiCS satellite event on October 6, 2020. This new online conference combined two existing events: the DBpedia Community Meeting and the annual Spatial Linked Data conference organized by EuroSDR and the Platform Linked Data Netherlands. We combined the best of both and, as a bonus, added a track on Geo-information Integration organized by EuroSDR. Four keynote speakers presented in special joint sessions.

First and foremost, we would like to thank SEMANTiCS, EuroSDR and the Platform Linked Data Netherlands for organizing the KGiA online event, and many thanks to all chairs who supported the conference.

Below, we give a brief retrospective of the keynote presentations and talks.

Opening & Keynote #1

The Knowledge Graphs in Action conference opened with the keynote presentation ‘Data Infrastructure for Energy System Models’ by Carsten Hoyer-Klick (German Aerospace Center). He presented LOD GEOSS, a project developing a distributed data infrastructure for the analysis of energy systems. The project develops networked database concepts, based on the ideas of linked open data and the semantic web, for the input and output data of energy system models used in energy systems analysis. Afterwards, the conference chairs offered three parallel morning sessions.

Morning Sessions 

Session 1: Spatial Linked Data Country Update

In this session, seven speakers presented the uptake of, and latest progress in, Spatial Linked Data adoption in European countries, both within national mapping agencies and beyond.

Session 2: VGI country presentations

National Mapping and Cadastral Agencies (NMCAs) increasingly use crowdsourced geo-information (CGI) in spatial data applications. Applications range from using CGI to keep spatial data up to date to adding extra content, such as land use, building entrances, road barriers, sensors placed in public space and much more. This session hosted five presentations from NMCAs showing the status of CGI integration in their mapping applications and processes.

Session 3: DBpedia Member presentations

Members of the DBpedia Association presented their latest tools, applications and technical developments in this session. Filipe Mesquita (Diffbot) opened the member session with his talk ‘Beyond Human Curation: How Diffbot Is Building A Knowledge Graph of the Web’. ImageSnippets, timbr.ai and GNOSS also gave engaging talks about their technical developments. Vassil Momtchev from Ontotext closed the session with insights into GraphDB 9.4.

For further details on the presentations, follow the links to the slides on the event page.

Afternoon Sessions 

Keynote #2

The afternoon sessions started with a keynote by Peter Mooney (Maynooth University) on the opportunities for a more integrated approach to geo-information integration.

Dutch National Graph as a Digital Twin

After the second keynote, Sebastian Hellmann, CEO of the DBpedia Association, presented the development and methodology of the National Knowledge Graph for the Netherlands. In cooperation with Dutch partners, DBpedia spent two months developing this new knowledge graph. His insightful presentation was followed by Benedicte Bucher (University Gustave Eiffel) talking about ‘Knowledge Graph on spatial digital assets in European’. She also presented the EuroSDR LDG initiative in detail.

Afternoon Parallel Sessions

Session 4: Transforming Linked Data into a networked data economy – DBpedia Chapter Session

In the DBpedia Chapter Session, members of different DBpedia chapters gave an overview of the data landscape in their countries. They presented the business opportunities they had identified as well as important challenges, such as the automated clearance of licenses. Enno Meijers (National Library of the Netherlands) summarized the data landscape in the Netherlands. There were also presentations on the data landscape in Brazil, Spain, Austria and Poland.

Session 5: EuroSDR VGI data wrangling

This session aimed to uncover new combinations and integrations of CGI data with NMCA data that demonstrate added value for map creation and map usage. Data wrangling (the process of creating small, reproducible data processing workflows) was applied to this work by using and combining existing geospatial software (desktop, web and mobile). The session presented the results of this data wrangling process.

Session 6: Spatial Session

In this session, two speakers presented how they built knowledge graphs; in the second part, three presenters gave insights into tooling and the state of the art in working with Linked Data.

For further details on the presentations, follow the links to the slides on the event page.

Keynote #3 and #4

Keynote #3, ‘Spatial Knowledge in Action – Deep semantics, geospatial thinking, and new cartographies’, was given by Marinos Kavouras (National Technical University of Athens). Marinos argued that maps and modern cartographic language have a new role for society at large, as an indispensable communication and cognitive tool. The KG in Action conference ended with the keynote presentation ‘Know, Know Where, KnowWhereGraph’ by Krzysztof Janowicz (University of California). During his live talk from California, Krzysztof gave an overview of ideas and hopes for creating geo-specific knowledge graphs, and geo-enrichment services on top of such a graph, to address some of the aforementioned challenges.

In case you missed the event, all slides and presentations are available on the DBpedia website. We will upload all recordings to the DBpedia YouTube channel. Further insights, feedback and photos from the event are available on Twitter (#KGiA hashtag).

We are now looking forward to 2021. We plan to hold meetings at the Knowledge Graph Conference and at the SEMANTiCS conference in Amsterdam. Stay safe, and check Twitter, LinkedIn and our website or subscribe to our newsletter for the latest news and information.

Yours,

DBpedia Association

Who are these DBpedia users? (And why?)

Guest article by Victor de Boer, Vrije Universiteit Amsterdam, NL, member of NL-DBpedia

Who uses DBpedia anyway?

This question started a research project for Frank Walraven, an Information Sciences Master student at Vrije Universiteit Amsterdam (VUA). The question came up during one of the meetings of the Dutch DBpedia chapter, of which VUA is a member.

If DBpedia users and their usage are better understood, this can lead to better service for those DBpedia users, for example by prioritizing the enrichment or improvement of specific sections of DBpedia. Characterizing the use(r)s of a Linked Open Dataset is inherently challenging because, in an open web world, it is difficult to tell who is accessing your digital resources.

Frank conducted his MSc research project at the Dutch National Library and used a hybrid approach, combining a data-driven method based on user log analysis with a short survey, to get to know the users of the dataset.

As a scope, Frank selected only the Dutch DBpedia dataset. For the data-driven part of the method, he used a complete log of HTTP requests to the Dutch DBpedia. This log file consisted of over 4.5 million entries and covered both URI lookups and SPARQL endpoint requests. For this research, he included only a subset of the URI lookups.
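
To make the split between the two request types concrete, here is a minimal Python sketch that parses log lines and counts URI lookups versus SPARQL requests. It assumes the log uses the Apache "combined" format and that SPARQL requests go to a path starting with /sparql while resource lookups use /resource/ or /page/; the post does not describe the actual log format, and the sample lines are made up (they use documentation IP ranges).

```python
import re
from collections import Counter

# Assumption: each entry follows the Apache "combined" log format.
LOG_LINE = re.compile(
    r'(?P<ip>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>[A-Z]+) (?P<path>\S+) [^"]*" (?P<status>\d{3}) \S+ '
    r'"(?P<referrer>[^"]*)" "(?P<agent>[^"]*)"'
)

def classify(path: str) -> str:
    """Label a request path; these path prefixes are illustrative assumptions."""
    if path.startswith("/sparql"):
        return "sparql"
    if path.startswith(("/resource/", "/page/")):
        return "uri_lookup"
    return "other"

def parse_log(lines):
    """Yield (ip, kind, path, user_agent) for every line matching the format."""
    for line in lines:
        m = LOG_LINE.match(line)
        if m:  # silently skip malformed lines
            yield m["ip"], classify(m["path"]), m["path"], m["agent"]

# Synthetic sample lines, not taken from the real Dutch DBpedia log.
sample = [
    '192.0.2.1 - - [01/Mar/2018:10:00:00 +0100] "GET /resource/Android_TV HTTP/1.1" 200 512 "-" "Mozilla/5.0"',
    '198.51.100.7 - - [01/Mar/2018:10:00:01 +0100] "GET /sparql?query=SELECT HTTP/1.1" 200 2048 "-" "curl/7.58"',
    "not a log line",
]
counts = Counter(kind for _, kind, _, _ in parse_log(sample))
print(dict(counts))  # → {'uri_lookup': 1, 'sparql': 1}
```

Filtering the parsed records on kind == "uri_lookup" would then give the kind of subset the study focused on.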

Step I – Analysis of IP Addresses of DBpedia Users

As a first analysis step, the requests’ originating IP addresses were categorized. Five classes (A-E) were identified, with the vast majority of IP addresses falling in class “A”: very large networks and bots. Most of the IP addresses in this class could be traced back to search engine indexing bots, such as those from Yahoo or Google. In classes B-E, Frank manually traced the 30 most frequently encountered IP addresses. He concluded that even there, 60% of the requests came from bots, 10% definitely did not, and 30% remained unclear.
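
A simple heuristic could support the manual tracing step. The Python sketch below ranks client IPs by request count, labels each of the top 30 as a bot when one of its user-agent strings contains a typical crawler marker, and reports the share of requests per label. The marker list is a hypothetical assumption and this is not the method used in the thesis: user agents can be spoofed, so anything not flagged still needs the kind of manual check Frank did.

```python
from collections import Counter, defaultdict

# Hypothetical substrings that commonly appear in crawler user agents.
BOT_MARKERS = ("bot", "crawler", "spider", "slurp")

def label_top_ips(requests, n=30):
    """requests: iterable of (ip, user_agent) pairs.

    Returns {ip: (label, request_count)} for the n busiest IPs. The label is
    "bot" if any user agent seen for that IP contains a bot marker, and
    "check manually" otherwise.
    """
    counts = Counter()
    agents = defaultdict(set)
    for ip, agent in requests:
        counts[ip] += 1
        agents[ip].add(agent.lower())
    labelled = {}
    for ip, count in counts.most_common(n):
        is_bot = any(m in a for a in agents[ip] for m in BOT_MARKERS)
        labelled[ip] = ("bot" if is_bot else "check manually", count)
    return labelled

def request_shares(labelled):
    """Percentage of the top IPs' requests that falls under each label."""
    per_label = Counter()
    for label, count in labelled.values():
        per_label[label] += count
    total = sum(per_label.values())
    if total == 0:
        return {}
    return {label: 100 * c / total for label, c in per_label.items()}

# Synthetic sample using documentation IP ranges.
sample = [
    ("192.0.2.1", "Mozilla/5.0 (compatible; Googlebot/2.1)"),
    ("192.0.2.1", "Mozilla/5.0 (compatible; Googlebot/2.1)"),
    ("192.0.2.2", "Mozilla/5.0 (compatible; Yahoo! Slurp)"),
    ("198.51.100.7", "Mozilla/5.0 (X11; Linux x86_64)"),
]
labelled = label_top_ips(sample)
print(request_shares(labelled))  # → {'bot': 75.0, 'check manually': 25.0}
```

Shares are computed over requests rather than over IPs, matching how the percentages in the study are phrased.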


Step II – Identification of Page Requests

The second analysis step in the data-driven method consisted of identifying which types of pages were requested most. To cluster the thousands of requested DBpedia URIs, Frank retrieved the ‘categories’ of the pages. These categories are extracted from Wikipedia category links. For example, the “Android_TV” resource has two categories: “Google” and “Android_(operating_system)”. By following skos:broader links, a ‘level 2 category’ could also be found, aggregating to an even higher level of abstraction. As not all resources have such categories, this does not give a complete picture, but it does provide some idea of the most popular categories of requested items. After normalizing for categories with large numbers of incoming links, such as the category “non-endangered animal”, the most popular categories were:

  1. Domestic and international movies,
  2. Music,
  3. Sports,
  4. Dutch and international municipality information, and
  5. Books.
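
The category aggregation and normalization can be sketched in Python as follows. It assumes the resource-to-category links (on DBpedia typically exposed via dct:subject) and the skos:broader links have already been fetched into dictionaries, and it normalizes by dividing each category's request count by its number of member resources. That normalization formula and the example data (counts, the second resource, the broader categories) are illustrative assumptions; the post does not specify how the normalization was done.

```python
from collections import Counter

def category_scores(lookups, subjects, broader, level=1):
    """Score categories by URI lookups per member resource.

    lookups:  Counter mapping resource -> number of URI lookups
    subjects: resource -> list of its categories (level 1)
    broader:  category -> list of broader categories (skos:broader)
    level:    1 uses direct categories, 2 goes one skos:broader step up
    """
    hits = Counter()     # total lookups of resources in each category
    members = Counter()  # number of resources in each category
    for resource, cats in subjects.items():
        cats = set(cats)
        if level == 2:
            # Replace each category by its broader categories; resources
            # whose categories have no broader link drop out.
            cats = {b for c in cats for b in broader.get(c, [])}
        for cat in cats:
            members[cat] += 1
            hits[cat] += lookups.get(resource, 0)
    # Assumed normalization: divide by category size so that very large
    # categories do not dominate the ranking merely because they are large.
    return {cat: hits[cat] / members[cat] for cat in members}

# The categories of Android_TV come from the post; everything else
# (counts, the second resource, broader links) is hypothetical.
subjects = {
    "Android_TV": ["Google", "Android_(operating_system)"],
    "Hypothetical_phone": ["Google"],
}
broader = {
    "Google": ["Technology_companies"],
    "Android_(operating_system)": ["Operating_systems"],
}
lookups = Counter({"Android_TV": 12, "Hypothetical_phone": 3})

level1 = category_scores(lookups, subjects, broader)
level2 = category_scores(lookups, subjects, broader, level=2)
print(sorted(level1, key=level1.get, reverse=True))
# → ['Android_(operating_system)', 'Google']
```

Sorting the scores gives a ranking like the list above; passing level=2 aggregates one step up the skos:broader hierarchy.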

Survey

Additionally, Frank set up a user survey to corroborate this evidence. The survey asked about the how and why of the respondents’ use of the Dutch DBpedia, including the categories they were most interested in.

The survey was distributed via the Dutch DBpedia website and Twitter. However, it attracted only five respondents. This illustrates the difficulty of the problem: users of the DBpedia resource are not necessarily easy to reach through communication channels. The five respondents were all closely related to the chapter, but the results were interesting nonetheless. Most respondents used the DBpedia SPARQL endpoint. The full results of the survey can be found in Frank’s thesis, but in terms of corroboration, four of the five categories found by the data-driven method also appeared in the survey’s top five. The fifth category identified in the survey was ‘geography’, which could be matched to the fifth category from the data-driven method.

Conclusion

Frank’s research shows that characterizing DBpedia users remains a challenging problem. Yet, by combining data-driven and user-driven methods, it is indeed possible to get an indication of the most-used categories on DBpedia. Within the Dutch DBpedia Chapter, we are currently considering follow-up research questions based on Frank’s research. For further information about the work of the Dutch DBpedia chapter, please visit their website.

A big thanks to the Dutch DBpedia Chapter for supervising this research and providing insights via this post.

Yours,

DBpedia Association