Monthly Archives: December 2020

2020 – Oh What a Challenging Year

Can you believe it? Thirteen years ago the first DBpedia dataset was released. Thirteen years of development, improvements and growth. More than 2,600 GB of data are now available on the DBpedia Databus. We want to take this as an opportunity to send a big thank you to all contributors, developers, coders, hosters, funders, believers and DBpedia enthusiasts who made this possible. Thank you for your support!

In the upcoming blog series, we would like to take you on a retrospective tour through 2020 and give you insights into a year with DBpedia. We will highlight our past events and the development of the DBpedia dataset.

A year with DBpedia and the DBpedia dataset – Retrospective Part 1

DBpedia Workshop colocated with LDAC2020

On June 19, 2020 we organized a DBpedia workshop co-located with the LDAC workshop series to exchange knowledge on new technologies and innovations in the fields of Linked Data and the Semantic Web. Dimitris Kontokostas (Diffbot, US) opened the meeting with his delightful keynote presentation 'RDF Data Quality Assessment – Connecting the Pieces', which focused on defining data quality and identifying data quality issues. The keynote was followed by many community-based presentations, making for an exciting workshop day.

Most Influential Scholars

DBpedia has become a high-impact, high-visibility project because of our foundation in excellent knowledge engineering as the pivot point between scientific methods, innovation and industrial-grade output. The drivers behind DBpedia include six of the top 10 most influential scholars in Knowledge Engineering as well as the C-level executives of our members. Check all details here: https://www.aminer.cn/ai2000/country/Germany

DBpedia (dataset) and Google Summer of Code 2020

For the 9th year in a row, we were part of this incredible journey of young, ambitious developers who joined us as an open-source organization to work on a GSoC coding project over the summer. With 45 project proposals, this GSoC edition marked a new record for DBpedia. Even though Covid-19 changed a lot in the world, it couldn't shake GSoC. If you want deeper insights into our GSoC students' work, you can find their blogs and repos here: https://blog.dbpedia.org/2020/10/12/gsoc2020-recap/

DBpedia Tutorial Series 2020

Stack slide from the tutorial

During this year we organized three amazing tutorials in which more than 120 DBpedians took part. Over the last year, the DBpedia core team has consolidated a great amount of technology around DBpedia. These tutorials are targeted at developers (in particular from DBpedia Chapters) who wish to learn how to replicate local infrastructure, such as loading and hosting their own SPARQL endpoint. A core focus was the new DBpedia Stack, which contains several dockerized applications that automatically load data from the DBpedia Databus. We will continue organizing more tutorials in 2021. Looking forward to meeting you online! In case you missed the DBpedia Tutorial Series 2020, watch all videos here.

In our upcoming blog post after the holidays we will give you more insights into past events and technical achievements. We are now looking forward to the year 2021. The DBpedia team plans to have meetings at the Knowledge Graph Conference, the LDK conference in Zaragoza, Spain, and the SEMANTiCS conference in Amsterdam, Netherlands. We wish you a merry Christmas and a happy New Year. In the meantime, stay tuned and visit our Twitter channel or subscribe to our DBpedia Newsletter.

Yours DBpedia Association

FinScience: leveraging DBpedia tools for fintech applications

DBpedia Member Features – In the last few weeks, we gave DBpedia members the chance to present special products, tools and applications and share them with the community. We already published several posts in which DBpedia members provided unique insights. This week we will continue with FinScience. They will present their latest products, solutions and challenges. Have fun while reading!

by FinScience

A brief presentation of who we are

FinScience is an Italian data-driven fintech company founded in 2017 in Milan by former Google senior managers and alternative-data experts who have combined their digital and financial expertise. FinScience thus originates from this merger of the world of finance and the world of data science.
The company leverages its founders' experience in Data Governance, Data Modeling and Data Platform solutions. These were further enriched through the company's technical role in the European consortium SSIX (Horizon 2020 program), which focused on building social sentiment indices for financial purposes. FinScience applies proprietary AI-based technologies to combine financial data and insights with alternative data in order to generate new investment ideas, ESG scores and non-conventional lists of companies that can be included in investment products by financial operators.

FinScience's data analysis pipeline is strongly grounded in the DBpedia ontology: in our experience, the greatest value lies in the possibility to connect knowledge in different languages, to query automatically extracted structured information and to work with rather frequently updated models.

Products and solutions

FinScience retrieves content from the web daily: about 1.5 million web pages are visited every day across about 35,000 different domains. The content of these pages is extracted, interpreted and analysed via Natural Language Processing techniques to identify valuable information and sources. Thanks to the structured information based on the DBpedia ontology, we can apply our proprietary AI algorithms to suggest the right investment opportunities to our customers. Our products are mainly based on the integration of this purely digital data – we call it "alternative data" – with traditional sources from the world of finance and sustainability. We describe these products briefly:

  • FinScience Platform for traders: it leverages the power of machine learning to help traders monitor specific companies, spot new trends in the financial market and access a high added-value selection of companies and themes.
  • ESG scoring: we provide an assessment of corporate ESG performance by combining internal data (traditional, self-disclosed data) with external 'alternative' data (stakeholder-generated data) in order to measure the gap between what companies communicate and how stakeholders perceive their sustainability commitments.
  • Thematic selections of listed companies: we create trend-driven selections oriented towards innovative themes: our data, together with the analysis of financial specialists, contribute to the selection of a set of listed companies related to trending themes such as the Green New Deal, 5G technology or new medtech applications.

FinScience and DBpedia

As mentioned before, FinScience is strongly grounded in the DBpedia ontology, since we employ Spotlight to perform Named Entity Recognition (NER), i.e. the automatic annotation of entities in a text. The NER task is performed with a two-step procedure. The first step consists of annotating the named entities of a text using DBpedia Spotlight. In particular, Spotlight links a mention in the text (identified by its name and its context within the text) to the DBpedia entity that maximizes the joint probability of occurrence of both. The model is pre-trained on texts extracted from Wikipedia. Note that each entity is represented by a link to a DBpedia page (see, e.g., http://dbpedia.org/page/Eni), a DBpedia type indicating the type of the entity according to the DBpedia ontology, and other information.
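To make the annotation step concrete, here is a minimal sketch of how a Spotlight JSON response can be parsed into a list of entities. The sample response below is illustrative and abridged (not actual FinScience code or data); the "@"-prefixed field names follow Spotlight's JSON convention.

```python
import json

# Abridged, illustrative example of a DBpedia Spotlight /annotate JSON
# response; field names follow Spotlight's "@"-prefixed convention.
SAMPLE_RESPONSE = """
{
  "@text": "Eni announced a new partnership in Milan.",
  "Resources": [
    {"@URI": "http://dbpedia.org/resource/Eni",
     "@surfaceForm": "Eni",
     "@offset": "0",
     "@similarityScore": "0.9989",
     "@types": "DBpedia:Company,DBpedia:Organisation"},
    {"@URI": "http://dbpedia.org/resource/Milan",
     "@surfaceForm": "Milan",
     "@offset": "36",
     "@similarityScore": "0.9421",
     "@types": "DBpedia:City,DBpedia:Place"}
  ]
}
"""

def extract_entities(response_text):
    """Map each Spotlight annotation to surface form, DBpedia URI and types."""
    doc = json.loads(response_text)
    return [
        {
            "surface": r["@surfaceForm"],
            "uri": r["@URI"],
            "types": r["@types"].split(",") if r.get("@types") else [],
            "offset": int(r["@offset"]),
        }
        for r in doc.get("Resources", [])
    ]

entities = extract_entities(SAMPLE_RESPONSE)
for e in entities:
    print(e["surface"], "->", e["uri"])
```

In a live setting, the same parsing would be applied to the response of a request to a Spotlight `/annotate` endpoint.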

Another interesting feature of this approach is that we have a one-to-one mapping between the Italian and English entities (and in general any language supported by DBpedia), allowing us to have a unified representation of an entity across languages. We obtain this kind of information by exploiting the potential of DBpedia's Virtuoso endpoint, which allows us to access the DBpedia dataset via SPARQL. By identifying the entities mentioned in online content, we can understand which topics are mentioned and thus identify companies and trends that are spreading in the digital ecosystem, as well as analyze how they are related to each other.
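The cross-language lookup described above can be sketched as a SPARQL request against the public DBpedia Virtuoso endpoint, following owl:sameAs links to an Italian-chapter URI. This is an illustrative sketch, not FinScience's actual code; it only builds the request URL, and fetching it requires network access.

```python
from urllib.parse import urlencode

# Public DBpedia Virtuoso SPARQL endpoint.
ENDPOINT = "https://dbpedia.org/sparql"

def build_request_url(entity_uri, lang_prefix="http://it.dbpedia.org/resource/"):
    """Build a GET URL asking for the Italian counterpart of an English entity."""
    query = f"""
PREFIX owl: <http://www.w3.org/2002/07/owl#>
SELECT ?same WHERE {{
  <{entity_uri}> owl:sameAs ?same .
  FILTER(STRSTARTS(STR(?same), "{lang_prefix}"))
}}"""
    # Virtuoso accepts the query and result format as GET parameters.
    return ENDPOINT + "?" + urlencode({
        "query": query,
        "format": "application/sparql-results+json",
    })

url = build_request_url("http://dbpedia.org/resource/Eni")
print(url)  # fetch this URL to obtain SPARQL JSON results
```

The same pattern generalizes to any DBpedia language chapter by changing the `lang_prefix` argument.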

Challenges and next steps

One of the toughest challenges for us is finding an optimal way to update the models used by DBpedia Spotlight. Every day new entities and concepts arise, and we want to recognise them in the news we analyze. And that is not all: in addition to recognizing new concepts, we need to be able to track an entity through all the updated versions of the model. In this way, we will not only be able to identify entities, but we will also have evidence of when some concepts first emerged, and we will know how they have changed over time, regardless of the names that have been used to identify them.

We are strongly involved in the DBpedia community and we try to contribute with our know-how. In particular, FinScience will contribute to infrastructure and Dockerfiles, as well as to finding issues in newly released projects (for instance, wikistats-extractor).

A big thank you to FinScience for presenting their products, challenges and contributions to DBpedia.

Yours,

DBpedia Association

DBpedia Archivo – Call to improve the web of ontologies

Dear all, 

We are proud to announce DBpedia Archivo – an augmented ontology archive and interface to implement FAIRer ontologies. Each ontology is rated with up to 4 stars measuring basic FAIR features. We discovered 890 ontologies, reaching on average 1.95 out of 4 stars. Many of them have no or unclear licenses, or have issues with retrieval and parsing.

DBpedia Archivo: Community action on individual ontologies

We would like to call on all ontology maintainers and consumers to help us increase the average star rating of the web of ontologies by fixing and improving its ontologies. You can easily check an ontology at https://archivo.dbpedia.org/info. If you are an ontology maintainer, just release a patched version – Archivo will automatically pick it up 8 hours later. If you are a user of an ontology and want your consumed data to become FAIRer, please inform the ontology maintainer about the issues found with Archivo.
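For consumers who want to script the check above, a minimal sketch of building the Archivo info-page URL for a given ontology URI might look like this. The `o` query parameter is an assumption based on the public Archivo web interface; verify against https://archivo.dbpedia.org/info.

```python
from urllib.parse import urlencode

# Archivo's info page for a single ontology (parameter name "o" is an
# assumption based on the public web interface, not an official API spec).
ARCHIVO_INFO = "https://archivo.dbpedia.org/info"

def archivo_info_url(ontology_uri):
    """Return the Archivo info-page URL for the given ontology URI."""
    return ARCHIVO_INFO + "?" + urlencode({"o": ontology_uri})

print(archivo_info_url("http://www.w3.org/ns/shacl#"))
```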

The star rating is very basic and only requires fixing small things. However, the impact on technical and legal usability can be immense.

Community action on all ontologies (quality, FAIRness, conformity)

Archivo is extensible and allows contributions to give consumers a central place to encode their requirements. We envision fostering adherence to standards and strengthening incentives for publishers to build a better (FAIRer) web of ontologies.

  1. SHACL (https://www.w3.org/TR/shacl/, co-edited by DBpedia’s CTO D. Kontokostas) enables easy testing of ontologies. Archivo offers free SHACL continuous integration testing for ontologies. Anyone can implement their SHACL tests and add them to the SHACL library on GitHub. We believe that there are many synergies, e.g. SHACL tests written for your ontology are helpful for others as well.
  2. We are looking for ontology experts to join DBpedia and discuss further validation (e.g. stars) to increase FAIRness and quality of ontologies. We are forming a steering committee and also a PC for the upcoming Vocarnival at SEMANTiCS 2021. Please message hellmann@informatik.uni-leipzig.de if you would like to join. We would like to extend the Archivo platform with relevant visualisations, tests, editing aides, mapping management tools and quality checks. 
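As an illustration of the kind of SHACL test one might contribute, the following hypothetical shape (names under the example.org namespace are made up for illustration) checks that every ontology header declares a license, one of the issues mentioned above:

```turtle
@prefix sh:  <http://www.w3.org/ns/shacl#> .
@prefix owl: <http://www.w3.org/2002/07/owl#> .
@prefix dct: <http://purl.org/dc/terms/> .
@prefix ex:  <http://example.org/archivo-tests#> .

# Hypothetical SHACL test: every owl:Ontology should declare a license.
ex:LicenseShape
    a sh:NodeShape ;
    sh:targetClass owl:Ontology ;
    sh:property [
        sh:path dct:license ;
        sh:minCount 1 ;
        sh:message "Ontology header should declare a dct:license." ;
    ] .
```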

How does DBpedia Archivo work?

Each week, Archivo runs several discovery algorithms to scan for new ontologies. Once discovered, Archivo checks each ontology every 8 hours. When changes are detected, Archivo downloads, rates and archives the latest snapshot persistently on the DBpedia Databus.

Archivo’s mission

Archivo’s mission is to improve FAIRness (findability, accessibility, interoperability, and reusability) of all available ontologies on the Semantic Web. Archivo is not a guideline, it is fully automated, machine-readable and enforces interoperability with its star rating.

– Ontology developers can implement against Archivo until they reach more stars. The stars and tests are designed to guarantee the interoperability and fitness of the ontology.

– Ontology users can better find, access and re-use ontologies. Snapshots are persisted in case the original is no longer reachable, adding a layer of reliability to the decentralised web of ontologies.

Please find the current paper about DBpedia Archivo here: https://svn.aksw.org/papers/2020/semantics_archivo/public.pdf 

Let’s all join together to make the web of ontologies more reliable and stable.

Yours,

Johannes Frey, Denis Streitmatter, Fabian Götz, Sebastian Hellmann and Natanael Arndt

PoolParty Semantic Suite: The Ideal Tool To Build And Manage Enterprise Knowledge Graphs

DBpedia Member Features – In the coming weeks, we will give DBpedia members the chance to present special products, tools and applications and share them with the community. We will publish several posts in which DBpedia members provide unique insights. This week the Semantic Web Company will present use cases for the PoolParty Semantic Suite. Have fun while reading!

by the Semantic Web Company

About 80 to 90 percent of the information companies generate is extremely diverse and unstructured, stored in text files, e-mails or similar documents, which makes it difficult to search and analyze. Knowledge graphs have become a well-known solution to this problem because they make it possible to extract information from text and link it to other data sources, whether structured or not. However, building a knowledge graph at enterprise scale can be challenging and time-consuming.

PoolParty Semantic Suite is the most complete and secure semantic platform on the global market. It is also the ideal tool to help companies build and manage Enterprise Knowledge Graphs. With PoolParty in place, you will have no problems extracting value from large amounts of heterogeneous data, no matter if it’s stored in a relational database or in text files. The platform provides comprehensive tools for the management of enterprise knowledge graphs along the entire life cycle. Here is a list of the main use cases for the PoolParty Semantic Suite:

Data linking and enrichment

Driven by the Linked Data initiative, increasing amounts of viable data sets about various topics have been published on the Semantic Web. PoolParty allows users to use these online resources, amongst them DBpedia, to easily and quickly enrich a thesaurus with more data.

Search and recommender engines

Arrive at enriched and in-depth search results that provide relevant facts and contextualized answers to your specific questions, rather than a broad search result with many (ir)relevant documents and messages – but no valuable input. PoolParty Semantic Suite can be used to implement semantic search and recommendations that are relevant to your users.

Text Mining and Auto Tagging

Manually tagging an entire database is very time-consuming and often leads to inconsistent search results. PoolParty’s graph-based text mining can improve this process, making it faster, more consistent and more precise. This is achieved by using advanced text mining algorithms and Natural Language Processing to automatically extract relevant entities, terms and other metadata from text and documents, helping drive in-depth text analytics.

Data Integration and Data Fabric

The Semantic Data Fabric is a new solution to data silos that combines best-of-breed technologies, data catalogs and knowledge graphs, based on Semantic AI. With a semantic data fabric, companies can combine text and documents (unstructured) with data residing in relational databases and data warehouses (structured) to create a comprehensive view of their customers, employees, products, and other vital areas of business.

Taxonomies, Ontologies and Knowledge Graphs That Scale

With release 8.0 of the PoolParty Semantic Suite, users have even more options to conveniently generate, edit, and use knowledge graphs. In addition, the powerful and performant GraphDB by Ontotext has been added as PoolParty’s recommended embedded store and it is shipped as an add-on module. GraphDB is an enterprise-level graph database with state-of-the-art performance, scalability and security. This provides greater robustness to PoolParty and allows you to work with much larger taxonomies effectively.

A big thank you to the Semantic Web Company for presenting use cases for the PoolParty Semantic Suite.

Yours,

DBpedia Association