Tag Archives: survey

Who are these DBpedia users? … (and why?)

Guest article by Victor de Boer, Vrije Universiteit Amsterdam, NL, member of NL-DBpedia

Who uses DBpedia anyway?…

This question started a research project for Frank Walraven, an Information Sciences Master student at Vrije Universiteit Amsterdam (VUA). The question came up during one of the meetings of the Dutch DBpedia chapter, of which VUA is a member.

If DBpedia users and their usage are better understood, this can lead to better servicing of those DBpedia users by, for example, prioritizing the enrichment or improvement of specific sections of DBpedia. Characterizing use(r)s of a Linked Open Dataset is an inherently challenging task because in an open web world it is difficult to tell who is accessing your digital resources.

Frank conducted his MSc project research at the Dutch National Library and used a hybrid approach that combined a data-driven method based on user log analysis with a short survey to get to know the users of the dataset.

Frank limited the scope to the Dutch DBpedia dataset. For the data-driven part of the method, he used a complete user log of HTTP requests on the Dutch DBpedia. This log file consisted of over 4.5 million entries and logged both URI lookups and SPARQL endpoint requests. For this research, he only included a subset of the URI lookups.

Analysis of IP Addresses of DBpedia Users

As a first analysis step, the IP addresses from which the requests originated were categorized. Five classes can be identified (A-E), with the vast majority of IP addresses being in class “A”: very large networks and bots. Most of the IP addresses in this class could be traced back to search engine indexing bots such as those from Yahoo or Google. In classes B-E, Frank manually traced the top 30 most frequently encountered IP addresses. He concluded that even there, 60% of the requests came from bots and 10% definitely did not, with 30% remaining unclear.
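The counting step behind this kind of analysis can be sketched as follows. This is a minimal illustration and not Frank’s actual tooling: it assumes a log in which the client IP is the first whitespace-separated field of each line (as in the Common Log Format), and the function names `top_ips` and `share_by_label`, the sample log lines and the manual labels are all hypothetical.

```python
from collections import Counter


def top_ips(log_lines, n=30):
    """Return the n most frequent client IPs as (ip, request_count) pairs."""
    counts = Counter()
    for line in log_lines:
        fields = line.split(maxsplit=1)
        if fields:  # ignore empty lines
            counts[fields[0]] += 1
    return counts.most_common(n)


def share_by_label(ip_counts, labels):
    """Sum requests per manually assigned label and return each label's share."""
    totals = Counter()
    for ip, count in ip_counts:
        # IPs without a manual label are counted as "unclear".
        totals[labels.get(ip, "unclear")] += count
    grand_total = sum(totals.values())
    if grand_total == 0:
        return {}
    return {label: count / grand_total for label, count in totals.items()}


# Hypothetical sample data: documentation-range IPs and made-up requests.
sample_log = [
    '192.0.2.1 - - [01/Jan/2017:00:00:01 +0100] "GET /resource/Amsterdam HTTP/1.1" 200 512',
    '192.0.2.1 - - [01/Jan/2017:00:00:02 +0100] "GET /resource/Utrecht HTTP/1.1" 200 498',
    '198.51.100.7 - - [01/Jan/2017:00:00:03 +0100] "GET /resource/Android_TV HTTP/1.1" 200 301',
    '192.0.2.1 - - [01/Jan/2017:00:00:04 +0100] "GET /resource/Rotterdam HTTP/1.1" 200 520',
    '203.0.113.9 - - [01/Jan/2017:00:00:05 +0100] "GET /resource/Ajax HTTP/1.1" 200 275',
    '198.51.100.7 - - [01/Jan/2017:00:00:06 +0100] "GET /resource/Google HTTP/1.1" 200 333',
]
# Labels a person would assign after tracing each address by hand (assumed here).
manual_labels = {"192.0.2.1": "bot", "198.51.100.7": "not a bot"}

ranking = top_ips(sample_log, n=30)
shares = share_by_label(ranking, manual_labels)
print(ranking)
print(shares)
```

The labelling itself stays manual in this sketch, which mirrors the description above: the code only ranks addresses and sums up the request shares once a person has decided what each address is.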

Step II – Identification of Page Requests

The second analysis step in the data-driven method consisted of identifying what types of pages were most requested. To cluster the thousands of DBpedia URI requests, Frank retrieved the ‘categories’ of the pages. These categories are extracted from Wikipedia category links. An example is the “Android_TV” resource, which has two categories: “Google” and “Android_(operating_system)”. Following skos:broader links, a ‘level 2 category’ could also be found to aggregate to an even higher level of abstraction. As not all resources have such categories, this does not give a complete picture, but it does provide some idea of the most popular categories of items requested. After normalizing for categories with large numbers of incoming links (for example, the category “non-endangered animal”), the most popular categories were:

  • 1. Domestic & International movies,
  • 2. Music,
  • 3. Sports,
  • 4. Dutch & International municipality information and
  • 5. Books.
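The aggregation described above can be sketched roughly as follows. The post does not spell out how the normalization was done, so this sketch assumes one simple option: dividing the requests attributed to a category by the number of resources linked to it. The function name `category_popularity`, the request counts, the “Chromecast” and “Amsterdam” entries and the broader category names are hypothetical; only the “Android_TV” example and its two categories come from the text.

```python
from collections import Counter


def category_popularity(requests, resource_categories, broader=None):
    """Return average requests per resource for each category.

    If `broader` (category -> list of broader categories) is given, every
    category is first lifted one level, mimicking a skos:broader step.
    """
    def lift(category):
        if broader is None:
            return [category]
        # Categories without a known broader category are kept as they are.
        return broader.get(category, [category])

    hits = Counter()  # requests attributed to each category
    size = Counter()  # number of resources linked to each category
    for resource, categories in resource_categories.items():
        # A set avoids counting a resource twice for the same lifted category.
        lifted = {b for c in categories for b in lift(c)}
        for category in lifted:
            size[category] += 1
            hits[category] += requests.get(resource, 0)
    # Assumed normalization: divide by the number of linked resources.
    return {category: hits[category] / size[category] for category in size}


# Hypothetical request counts per resource.
requests = {"Android_TV": 4, "Chromecast": 2, "Amsterdam": 9}
resource_categories = {
    "Android_TV": ["Google", "Android_(operating_system)"],
    "Chromecast": ["Google"],
    "Amsterdam": ["Municipalities"],
}
# Hypothetical skos:broader links (category -> broader categories).
broader = {
    "Google": ["Technology_companies"],
    "Android_(operating_system)": ["Operating_systems"],
}

level1 = category_popularity(requests, resource_categories)
level2 = category_popularity(requests, resource_categories, broader)
ranking = sorted(level2.items(), key=lambda item: item[1], reverse=True)
print(level1)
print(ranking)
```

Without the division step, a category that simply contains many resources would dominate the ranking regardless of how often each of its pages is requested, which is the effect the normalization is meant to dampen.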

Survey

Additionally, Frank set up a user survey to corroborate this evidence. The survey contained questions about the how and why of the respondents’ use of the Dutch DBpedia, including the categories they were most interested in.

The survey was distributed via the Dutch DBpedia website and via Twitter. However, the endeavour only attracted five respondents. This illustrates the difficulty of the problem: users of the DBpedia resource are not necessarily easy to reach through communication channels. The five respondents were all quite closely related to the chapter, but the results were interesting nonetheless. Most of them used the DBpedia SPARQL endpoint. The full results of the survey can be found in Frank’s thesis, but in terms of corroboration, the survey revealed that four out of the five categories found with the data-driven method were also among the top five results from the survey. The fifth one identified in the survey was ‘geography’, which could be matched to the remaining category from the data-driven method.

Conclusion

Frank’s research shows that characterizing DBpedia users remains a challenging problem. Yet, using a combination of data-driven and user-driven methods, it is indeed possible to get an indication of the most-used categories on DBpedia. Within the Dutch DBpedia Chapter, we are currently considering follow-up research questions based on Frank’s research. For further information about the work of the Dutch DBpedia chapter, please visit their website.

A big thanks to the Dutch DBpedia Chapter for supervising this research and providing insights via this post.

Yours

DBpedia Association

DBpedia Chapters – Survey Evaluation – Episode Two

Welcome back to part two of the evaluation of the surveys we conducted with the DBpedia chapters.

Survey Evaluation – Episode Two

The second survey focused on technical matters. We asked the chapters about their usage of DBpedia services and tools, technical problems and challenges, and potential ways to overcome them. Have a look below.

Again, only nine out of 21 DBpedia chapters participated in this survey. And again, that means the results only represent roughly 43% of the DBpedia chapter population.

The good news is, all chapters maintain a local DBpedia endpoint. Yay! More than 55% of the chapters perform their own extraction. The rest apply a hybrid approach, reusing some datasets from DBpedia releases and extracting others on their own.

Datasets, Services and Applications

In terms of the frequency of dataset updates, the situation is as follows: 44.4% of the chapters update them once a year. The answers of the remaining chapters are split in equal shares, depending on various factors. See the graph below.

When it comes to the maintenance of links to local datasets, most of the chapters do not have additional ones. However, some do maintain links to, for example, Greek WordNet, the National Library of Greece Authority record, Geonames.jp and the Japanese WordNet. Furthermore, some of the chapters even host other datasets of local interest, but mostly in a separate endpoint, so they keep separate graphs.

Apart from hosting their own endpoint, most chapters maintain one or more additional services such as Spotlight, LodLive or LodView.

Moreover, the chapters have additional applications they developed on top of DBpedia data and services.

They also gave us some reasons why they were not able to deploy DBpedia-related services. See their replies below.

DBpedia Chapter set-up

Lastly, we asked the technical heads of the chapters what the hardest task in setting up their chapter had been. The answers, again, vary, as the starting position of each chapter differed. Read a few of their replies below.

The hardest technical task for setting up the chapter was:

  • to keep Virtuoso up to date
  • the chapter specific setup of DBpedia plugin in Virtuoso
  • the Extraction Framework
  • configuring Virtuoso for serving data using server’s FQDN and Nginx proxying
  • setting up the Extraction Framework, especially for abstracts
  • correctly setting up the extraction process and the DBpedia facet browser
  • fixing internationalization issues, and updating the endpoint
  • keeping the extraction framework working and up to date
  • updating the server to the specific requirements for further compilation – we work on Debian

 

Final words

With all the data and results we gathered, we will get together with our chapter coordinator to develop a strategy for addressing the technical and organizational issues the surveys revealed. In doing so, we hope to facilitate a better exchange between the chapters and with us, the DBpedia Association. Moreover, we intend to minimize barriers to setting up and maintaining a DBpedia chapter so that our chapter community may thrive and prosper.

In the meantime, spread your work and share it with the community. Do not forget to follow and tag us on Twitter (@dbpedia). You may also want to subscribe to our newsletter.

We will keep you posted about any updates and news.

Yours

DBpedia Association

DBpedia Chapters – Survey Evaluation – Episode One

DBpedia Chapters – Challenge Accepted

The DBpedia community currently comprises more than 20 language chapters, ranging from Basque and Japanese to Portuguese and Ukrainian. Managing such a variety of chapters is a huge challenge for the DBpedia Association because individual requirements are as diverse as the different languages the chapters represent. There are chapters that started out back in 2012, such as DBpediaNL. Others, like the Catalan chapter, are brand new and have different resources and needs.

So, in order to optimize chapter development, we aim to formalize an official DBpedia Chapter Consortium. It permits a close dialogue with the chapters in order to address all relevant matters regarding communication, organization as well as technical issues. We want to provide the community with the best basis to set up new chapters and to maintain or develop the existing ones.

Our main targets for this are to: 

  • improve general chapter organization,
  • unite all DBpedia chapters with central DBpedia,
  • promote better communication and understanding, and
  • create synergies for further developments and make it easier to access information about what is done by all DBpedia bodies.

As a first step, we needed to collect information about the current state of things. Hence, we conducted two surveys: one directed at chapter leads and the other at technical heads.

In this blog post, we would like to present the results of the survey conducted with chapter leads. It addressed matters of communication and organizational relevance. Unfortunately, only nine out of 21 chapters participated, so the outcome of the survey speaks only for roughly 43% of all DBpedia chapters.

Chapter-Survey – Episode One

Most chapters have very few personnel committed to the work done for the chapter, for various reasons. 66% of the chapters have only one to four people involved in the core work. Only one chapter has about ten people working on it.

Overall, the chapters use various marketing channels for promotion, visibility and outreach. The website, event participation, Twitter and Facebook are among the most popular channels they use.

The following chart shows how chapters currently communicate organizational and communication issues within their respective chapter and to the DBpedia Association.

The second chart makes explicit that one third of the chapters favour an exchange among chapters and with the DBpedia Association via the discussion mailing list as well as regular chapter calls.

The survey results show that 66.6% of the chapters do not consider their current mode of communication efficient enough. They think that their communication with the DBpedia Association should improve.

As pointed out before, most chapters only have few personnel resources. It is no wonder that most of them need help to improve the work and impact of chapter results. The following chart shows the kind of support chapters require to improve their overall work, organization and communication. Most noteworthy, technical, marketing and organizational support are the top three aspects the chapters need help with.

The good news is that all of the chapters maintain a DBpedia website. However, the frequency of updates varies among them. See the chart on the right.

Earlier this year, we announced that we would like to align all chapter websites with the main DBpedia website. That includes a common structure and a corporate design similar to the main one. Above all, this is important for the overall image and recognition factor of DBpedia in the tech community. With respect to that, we inquired whether chapters would like to participate in an alignment of the websites or not.

With respect to the marketing support the chapters require from the Association, more than 50% of the chapters would like to be promoted frequently via the main DBpedia Twitter channel.

Good news: just forward us your news or tag us with @dbpedia and we will share ’em.

Almost there.

Finally, we asked about the chapters’ requirements to improve their work and the impact of their chapters’ results.

Bottom line

All in all, we are very grateful for your contribution. This data will help us to develop a strategy to work towards the targets mentioned above. We will now use it to conceptualize a small program to assist chapters in their organization and marketing endeavours. Furthermore, the information given will also help us to tackle the different issues that arose, implement the necessary support and improve chapter development and chapter visibility.

In episode two, we will delve into the results of the technical survey. Sit tight and follow us on Twitter, Facebook and LinkedIn, or subscribe to our newsletter.

Finally, one last remark. If you want to promote news of your chapter or otherwise like to increase its visibility, you are always welcome to:

  • forward us the respective information to be promoted via our marketing channels 
  • use your own Twitter channel and tag your post with @dbpedia,  so we can retweet your news. 
  • always use #dbpediachapters

Looking forward to your news.

Yours

DBpedia Association

Results of the DBpedia Strategy Survey 2017

Sören Auer and the DBpedia Board members prepared a survey to assess the direction of the DBpedia Association. We wanted to know what the DBpedia Community thinks about DBpedia’s strategic priorities and how the funds of the DBpedia Association should be spent. Between February 2017 and April 2017, a total of 40 members of the DBpedia Community actively participated in the survey and voted as follows:

1. What should be the priorities of the DBpedia Association in the next year?

To give an overview of the various priorities that were mentioned, the following digest groups the answers into four categories. The most frequent answer was to increase the data quality, followed by the enlargement of the DBpedia Community through broader dissemination.

2. What should be the priorities of the DBpedia Association in the next three years?

In contrast to question one, this question concerns the priorities the DBpedia Association should focus on during the next three years. As in the previous overview, the specified priorities are divided into four categories.

3. What is your main interest in DBpedia?

The chart above depicts the participants’ main interests in DBpedia. The largest group of participants has an “academic & professional” (45.7%) interest in DBpedia, followed by “professional” (28.6%) and “academic” (20.0%) interests. Only 2.9% of the answers indicate a student-related interest.

4. How should the funds of the association be used?

With respect to “How should the funds of the association be used?”, most participants chose “service provisioning”. The “development of new DBpedia features” was the second most popular choice. Nevertheless, “community building” and “release production” also scored many votes.

5. How should the DBpedia Association collaborate with national/language chapters?

  • Agreeing on strategic goals; making sure that national contributions can be spread to other chapters, thus increasing the overall usability of DBpedia; keeping track of good practices
  • Facilitating grassroots initiatives – so mainly promote and stimulate national/language initiatives
  • Local events related to DBpedia tasks
  • Regular events to share ideas and data
  • Join other languages members onto DBpedia
  • As an umbrella organization: support, mediation, and representation
  • Regular exchange and involvement
  • Consult, try to figure out common priorities

6. Should DBpedia open itself to contain and curate more data not directly extracted from Wikipedia?

As the chart above clearly depicts, more than half of the participants are in favor of DBpedia comprising datasets not directly derived or extracted from Wikipedia. In contrast, 34.3% hold the opposite opinion and would prefer DBpedia to focus solely on data extraction from Wikipedia.

  • If yes, which other datasets should DBpedia prioritize for fusion to improve its coverage and quality?

7. Which of the following features do you consider most important?

The following diagram gives an overview of particular features and their importance from the participants’ point of view. As the results of question one already revealed, data quality is considered the most important issue by the survey participants (23.7%). The second most important features, with 17.2% each, are the provision of datasets extracted from the Wikipedia article text, substantial collaboration/integration with Wikidata, and the provision of better search and exploration user interfaces.

8. Any other question, feedback, opinion, ideas or suggestion you would like to send to the association.

  • KUTGW
  • Increased support of non-RDF publication formats is probably wise as an insurance policy that DBpedia will stay relevant.
  • In users mailing-list being more open-minded in an easy manner and always signalling provocative postings are welcome. And I fear it is a bit late for this survey, but better late than never, my greetings to all making some thoughts about this stuff.
  • DBpedia Spotlight should return Wikidata URIs by default, for stability
  • Use a richer ontology without contradictions, e.g. Book-Physical vs. Book-Conceptual Work

Thank you for your input and your participation! Your priorities and opinions are of vital importance for the success of DBpedia in the future. We will discuss the implementation of your answers during our next DBpedia Board Meetings in order to find a reasonable strategic direction of the DBpedia Association for the next years.

Check our website for further updates, follow us on #twitter or subscribe to our newsletter.

Your

DBpedia Association

DBpedia strategy survey

Dear DBpedians,

Sören Auer and the DBpedia Board members prepared a survey to assess the direction of the DBpedia Association. We would like to know what you think should be our priorities and how you would like the funds of the association to be used.

Your opinion counts so please contribute actively in developing a better DBpedia. If you use DBpedia and want us to keep going forward, we kindly invite you to vote here: https://goo.gl/forms/rDqLcwL823Ok09Uw2

We will publish the results in anonymized, aggregated form on the DBpedia website.

We are looking forward to your input. Check our website for further updates, follow us on #twitter or subscribe to our newsletter.

Your DBpedia Association