DBpedia Member Features – In the last few weeks, we gave DBpedia members the chance to present special products, tools and applications and share them with the community. We already published several posts in which DBpedia members provided unique insights. This week we will continue with Diffbot. They will present the Diffbot Knowledge Graph and various extraction tools. Have fun while reading!
Diffbot’s mission to “structure the world’s knowledge” began with Automatic Extraction APIs meant to pull structured data from most pages on the public web by leveraging machine learning rather than hand-crafted rules.
More recently, Diffbot has emerged as one of only three Western entities to crawl a vast majority of the web, utilizing our Automatic Extraction APIs to build the world’s largest commercially available Knowledge Graph.
A Knowledge Graph At The Scale Of The Web
The Diffbot Knowledge Graph is automatically constructed by crawling and extracting data from over 60 billion web pages. It currently represents over 10 billion entities and 1 trillion facts about People, Organizations, Products, Articles, and Events, among other types.
Users can access the Knowledge Graph programmatically through an API. Other ways to access the Knowledge Graph include a visual query interface and a range of integrations (e.g., Excel, Google Sheets, Tableau).
Whether you’re consuming Diffbot KG data in a visual “low code” way or programmatically, we’ve continually added features to our powerful query language (Diffbot Query Language, or DQL) to allow users to “query the web like a database.”
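As a concrete sketch of what “querying the web like a database” looks like programmatically, the snippet below builds a DQL search request. The endpoint path, parameter names, and the example query are assumptions based on Diffbot’s public documentation, so check the current docs before relying on them.

```python
import urllib.parse

# Assumed Knowledge Graph search endpoint; verify against Diffbot's docs.
DQL_ENDPOINT = "https://kg.diffbot.com/kg/v3/dql"

def build_dql_request(token: str, query: str, size: int = 10) -> str:
    """Build a GET URL for a Diffbot Query Language (DQL) search."""
    params = {"token": token, "type": "query", "size": size, "query": query}
    return DQL_ENDPOINT + "?" + urllib.parse.urlencode(params)

# Example DQL query: organizations headquartered in Berlin.
url = build_dql_request("YOUR_TOKEN",
                        'type:Organization location.city.name:"Berlin"')
```

Sending a GET to the resulting URL returns a JSON page of matching entities, each carrying the provenance URLs mentioned below.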
Guilt-Free Public Web Data
Current use cases for Diffbot’s Knowledge Graph and web data extraction products run the gamut and include data enrichment; lead enrichment; market intelligence; global news monitoring; large-scale product data extraction for ecommerce and supply chain; sentiment analysis of articles, discussions, and products; and data for machine learning. For all of the billions of facts in Diffbot’s KG, data provenance is preserved with the original source (a public URL) of each fact.
Entities, Relationships, and Sentiment From Private Text Corpora
The team of researchers at Diffbot has been developing new natural language processing techniques for years to improve its extraction and KG products. In October 2020, Diffbot made this technology commercially available to all via the Natural Language API.
Our Natural Language API pulls out entities, relationships/facts, categories and sentiment from free-form texts. This allows organizations to turn unstructured texts into structured knowledge graphs.
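A minimal sketch of such a call is below. The endpoint, field names, and payload shape are assumptions based on Diffbot’s public documentation for the Natural Language API; verify them before use.

```python
import json

# Assumed Natural Language API endpoint; verify against Diffbot's docs.
NL_ENDPOINT = "https://nl.diffbot.com/v1/"

def build_nl_request(token: str, text: str, lang: str = "en"):
    """Prepare the URL and JSON payload for an entity/fact/sentiment analysis."""
    url = f"{NL_ENDPOINT}?fields=entities,facts,sentiment&token={token}"
    payload = json.dumps([{"content": text, "lang": lang}])
    return url, payload

url, payload = build_nl_request("YOUR_TOKEN",
                                "The company opened a new office in Milan.")
# POST `payload` to `url` with Content-Type: application/json; the response
# lists the extracted entities, facts, and a sentiment score.
```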
Diffbot and DBpedia
In addition to extracting data from web pages, Diffbot’s Knowledge Graph compiles public web data from many structured sources. One important source of knowledge is DBpedia. Diffbot also contributes to DBpedia by providing access to our extraction and KG services and collaborating with researchers in the DBpedia community. For a recent collaboration between DBpedia and Diffbot, be sure to check out the Diffbot track in DBpedia’s Autumn Hackathon for 2020.
A big thank you to Diffbot, especially Filipe Mesquita for presenting their innovative Knowledge Graph.
DBpedia Member Features – In the last few weeks, we gave DBpedia members the chance to present special products, tools and applications and share them with the community. We already published several posts in which DBpedia members provided unique insights. This week we will continue with FinScience. They will present their latest products, solutions and challenges. Have fun while reading!
A brief presentation of who we are
FinScience is an Italian data-driven fintech company founded in 2017 in Milan by former Google senior managers and alternative-data experts who combined their digital and financial expertise. FinScience thus originates from this merger of the world of Finance and the world of Data Science. The company leverages its founders’ experience in Data Governance, Data Modeling and Data Platform solutions, further enriched through its technical role in the European consortium SSIX (a Horizon 2020 project) focused on building a social sentiment index for financial purposes. FinScience applies proprietary AI-based technologies to combine financial data and insights with alternative data in order to generate new investment ideas, ESG scores and non-conventional lists of companies that can be included in investment products by financial operators.
FinScience’s data analysis pipeline is strongly grounded in the DBpedia ontology: the greatest value, in our experience, lies in the ability to connect knowledge across different languages, to query automatically extracted structured information, and to rely on rather frequently updated models.
Products and solutions
FinScience retrieves content from the web daily. About 1.5 million web pages are visited every day across about 35,000 different domains. The content of these pages is extracted, interpreted and analysed via Natural Language Processing techniques to identify valuable information and sources. Thanks to the structured information based on the DBpedia ontology, we can apply our proprietary AI algorithms to suggest the right investment opportunities to our customers. Our products are mainly based on the integration of this purely digital data – we call it “alternative data” – with traditional sources coming from the world of finance and sustainability. We describe these products briefly:
FinScience Platform for traders: it leverages the power of machine learning to help traders monitor specific companies, spot new trends in the financial market, and gain access to a high added-value selection of companies and themes.
ESG scoring: we provide an assessment of corporate ESG performance by combining internal data (traditional, self-disclosed data) with external ‘alternative’ data (stakeholder-generated data) in order to measure the gap between what companies communicate and how stakeholders perceive their sustainability commitments.
Thematic selections of listed companies: we create trend-driven selections oriented towards innovative themes: our data, together with the analysis of financial specialists, contribute to the selection of a set of listed companies related to trending themes such as the Green New Deal, 5G technology or new medtech applications.
FinScience and DBpedia
As mentioned before, FinScience is strongly grounded in the DBpedia ontology, since we employ DBpedia Spotlight to perform Named Entity Recognition (NER), namely the automatic annotation of entities in a text. The NER task is performed with a two-step procedure. The first step consists in annotating the named entities of a text using DBpedia Spotlight. In particular, Spotlight links a mention in the text (identified by its name and its context within the text) to the DBpedia entity that maximizes the joint probability of occurrence of both. The model is pre-trained on texts extracted from Wikipedia. Note that each entity is represented by a link to a DBpedia page (see, e.g., http://dbpedia.org/page/Eni), a DBpedia type indicating the type of the entity according to the ontology, and other information.
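A minimal sketch of this first annotation step against the publicly documented DBpedia Spotlight endpoint (the confidence threshold here is illustrative):

```python
import urllib.parse

def build_spotlight_request(text: str, confidence: float = 0.5,
                            lang: str = "en") -> str:
    """Build a GET URL for the public DBpedia Spotlight annotate endpoint.

    Send it with an "Accept: application/json" header; each annotation in
    the response carries a DBpedia URI, the matched surface form, and types.
    """
    base = f"https://api.dbpedia-spotlight.org/{lang}/annotate"
    params = urllib.parse.urlencode({"text": text, "confidence": confidence})
    return base + "?" + params

url = build_spotlight_request("Eni is an Italian oil and gas company.")
```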
Another interesting feature of this approach is that we have a one-to-one mapping between the Italian and English entities (and in general any language supported by DBpedia), allowing us to have a unified representation of an entity across languages. We obtain this kind of information by exploiting the DBpedia Virtuoso endpoint, which allows us to access the DBpedia dataset via SPARQL. By identifying the entities mentioned in online content, we can understand which topics are mentioned and thus identify companies and trends that are spreading in the digital ecosystem, as well as analyse how they are related to each other.
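As an illustration, this cross-language mapping can be retrieved with a single SPARQL query over the owl:sameAs interlanguage links. A sketch against the public endpoint (result parsing omitted):

```python
import urllib.parse

ENDPOINT = "https://dbpedia.org/sparql"

# Ask the public Virtuoso endpoint for the Italian-chapter counterpart of an
# English DBpedia entity via its owl:sameAs interlanguage links.
QUERY = """\
PREFIX owl: <http://www.w3.org/2002/07/owl#>
SELECT ?same WHERE {
  <http://dbpedia.org/resource/Eni> owl:sameAs ?same .
  FILTER (STRSTARTS(STR(?same), "http://it.dbpedia.org/resource/"))
}"""

def build_sparql_request(endpoint: str, query: str) -> str:
    """Build a GET URL that returns SPARQL results as JSON."""
    params = urllib.parse.urlencode(
        {"query": query, "format": "application/sparql-results+json"})
    return endpoint + "?" + params

url = build_sparql_request(ENDPOINT, QUERY)
```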
Challenges and next steps
One of the toughest challenges for us is finding an optimal way to update the models used by DBpedia Spotlight. Every day new entities and concepts arise, and we want to recognise them in the news we analyse. And that is not all. In addition to recognizing new concepts, we need to be able to track an entity through all the updated versions of the model. In this way, we will not only be able to identify entities, but we will also have evidence of when some concepts were first generated. And we will know how they have changed over time, regardless of the names that have been used to identify them.
We are strongly involved in the DBpedia community and we try to contribute with our know-how. In particular, FinScience will contribute to infrastructure and Dockerfiles, as well as to finding issues in newly released projects (for instance, wikistats-extractor).
A big thank you to FinScience for presenting their products, challenges and contribution to DBpedia.
DBpedia Member Features – In the coming weeks, we will give DBpedia members the chance to present special products, tools and applications and share them with the community. We will publish several posts in which DBpedia members provide unique insights. This week the Semantic Web Company will present use cases for the PoolParty Semantic Suite. Have fun while reading!
by the Semantic Web Company
About 80 to 90 percent of the information companies generate is extremely diverse and unstructured, stored in text files, e-mails or similar documents, which makes it difficult to search and analyze. Knowledge graphs have become a well-known solution to this problem because they make it possible to extract information from text and link it to other data sources, whether structured or not. However, building a knowledge graph at enterprise scale can be challenging and time-consuming.
PoolParty Semantic Suite is the most complete and secure semantic platform on the global market. It is also the ideal tool to help companies build and manage Enterprise Knowledge Graphs. With PoolParty in place, you will have no problems extracting value from large amounts of heterogeneous data, no matter if it’s stored in a relational database or in text files. The platform provides comprehensive tools for the management of enterprise knowledge graphs along the entire life cycle. Here is a list of the main use cases for the PoolParty Semantic Suite:
Data linking and enrichment
Driven by the Linked Data initiative, increasing amounts of viable data sets about various topics have been published on the Semantic Web. PoolParty allows users to draw on these online resources, among them DBpedia, to easily and quickly enrich a thesaurus with more data.
Search and recommender engines
Arrive at enriched and in-depth search results that provide relevant facts and contextualized answers to your specific questions, rather than a broad search result with many (ir)relevant documents and messages – but no valuable input. PoolParty Semantic Suite can be used to implement semantic search and recommendations that are relevant to your users.
Text Mining and Auto Tagging
Manually tagging an entire database is very time-consuming and often leads to inconsistent search results. PoolParty’s graph-based text mining can improve this process, making it faster, more consistent and more precise. This is achieved by using advanced text mining algorithms and Natural Language Processing to automatically extract relevant entities, terms and other metadata from text and documents, helping drive in-depth text analytics.
Data Integration and Data Fabric
The Semantic Data Fabric is a new solution to data silos that combines two best-of-breed technologies, data catalogs and knowledge graphs, based on Semantic AI. With a semantic data fabric, companies can combine text and documents (unstructured) with data residing in relational databases and data warehouses (structured) to create a comprehensive view of their customers, employees, products, and other vital areas of business.
Taxonomies, Ontologies and Knowledge Graphs That Scale
With release 8.0 of the PoolParty Semantic Suite, users have even more options to conveniently generate, edit, and use knowledge graphs. In addition, the powerful and performant GraphDB by Ontotext has been added as PoolParty’s recommended embedded store and it is shipped as an add-on module. GraphDB is an enterprise-level graph database with state-of-the-art performance, scalability and security. This provides greater robustness to PoolParty and allows you to work with much larger taxonomies effectively.
A big thank you to the Semantic Web Company for presenting use cases for the PoolParty Semantic Suite.
DBpedia Member Features – In the coming weeks, we will give DBpedia members the chance to present special products, tools and applications and share them with the community. We will publish several posts in which DBpedia members provide unique insights. This week TerminusDB will show you how to use TerminusDB’s unique collaborative features to access DBpedia data. Have fun while reading!
by Luke Feeney from TerminusDB
This post introduces TerminusDB as a member of the DBpedia Association – proudly supporting the important work of DBpedia. It will also show you how to use TerminusDB’s unique collaborative features to access DBpedia data.
TerminusDB – an Open Source Knowledge Graph
TerminusDB is an open-source knowledge graph database that provides reliable, private & efficient revision control & collaboration. If you want to collaborate with colleagues or build data-intensive applications, nothing will make you more productive.
TerminusDB provides the full suite of revision control features and TerminusHub allows users to manage access to databases and collaboratively work on shared resources.
- Data storage, sharing, and versioning capabilities for your team or integrated in your app
- Work locally, then sync when you push your changes
- Querying, cleaning, and visualization
- Powerful version control and collaboration for your enterprise
The TerminusDB project originated in Trinity College Dublin in Ireland in 2015. From its earliest origins, TerminusDB worked with DBpedia through the ALIGNED project, which was a research project funded by Horizon 2020 that focused on building quality-centric software for data management.
While working on this project, and especially our work building the architecture behind Seshat: The Global History Databank, we needed a solution that could enable collaboration among a highly distributed team on a shared database whose primary function was the curation of high-quality datasets with a very rich structure. While the scale of the data was not particularly large, the complexity was extremely high. Unfortunately, the linked-data and RDF toolchains were severely lacking – we evaluated several tools in an attempt to architect a solution; however, in the end we were forced to build an end-to-end solution ourselves.
Evolution of TerminusDB
In general, we think that computers are fantastic things because they allow you to leverage much more evidence when making decisions than would otherwise be possible. It is possible to write computer programs that automate the ingestion and analysis of unimaginably large quantities of data.
If the data is well chosen, it is almost always the case that computational analysis reveals new and surprising insights simply because it incorporates more evidence than could possibly be captured by a human brain. And because the universe is chaotic and there are combinatorial explosions of possibilities all over the place, evidence is always better than intuition when seeking insight.
As anybody who has grappled with computers and large quantities of data will know, it’s not as simple as that. Computers should be able to do most of this for us. It makes no sense that we are still writing the same simple and tedious data validation and transformation programs over and over ad infinitum. There must be a better way.
This is the problem that we set out to solve with TerminusDB. We identified two indispensable characteristics that were lacking in data management tools:
- A rich and universally machine-interpretable modelling language. If we want computers to be able to transform data between different representations automatically, they need to be able to describe their data models to one another.
- Revision control. Revision control technologies have been instrumental in turning software production from a craft into an engineering discipline because they make collaboration and coordination between large groups much more fault tolerant. The need for such capabilities is obvious when dealing with data, where the existence of multiple versions of the same underlying dataset is almost ubiquitous, with only the most primitive tool support.
TerminusDB and DBpedia
Team TerminusDB took part in the DBpedia Autumn Hackathon 2020. As you know, DBpedia is an extract of the structured data from Wikipedia.
You can read all about our DBpedia Autumn Hackathon adventures in this blog post.
Unlike many systems in the graph database world, TerminusDB is committed to open source. We believe in the principles of open source, open data and open science. We welcome all data people who want to contribute to the general good of the world. This is very much in alignment with the DBpedia Association and community.
DBpedia on TerminusHub
TerminusHub is the collaboration point between TerminusDB instances. You can push data to your colleagues and collaborators, pull updates (efficiently – just the diffs), and clone databases that are made available on the Hub (by the TerminusDB team or by others). Think of it as GitHub, but for data.
The DBpedia database is available on TerminusHub. You can clone the full DB in a couple of minutes (depending on your internet connection, of course) and get querying. TerminusDB uses succinct data structures to compress everything, which makes sharing large databases feasible – more technical detail for interested parties is available here: https://github.com/terminusdb/terminusdb/blob/dev/docs/whitepaper/terminusdb.pdf
TerminusDB in the DBpedia Association
We will contribute to DBpedia by working to improve the quality of data available, by introducing new datasets that can be integrated with DBpedia, and by participating fully in the community.
We are looking forward to a bright future together.
A big thank you to Luke and TerminusDB for presenting how TerminusDB works and how they would like to work with DBpedia in the future.
DBpedia Member Features – In the coming weeks, we will give DBpedia members the chance to present special products, tools and applications and share them with the community. We will publish several posts in which DBpedia members provide unique insights. This week GNOSS will give an overview of their products and business focus. Have fun while reading!
by Irene Martínez and Susana López from GNOSS
GNOSS (https://www.gnoss.com/) is a Spanish technology manufacturing company that has developed its own platform for the construction and exploitation of knowledge graphs. GNOSS technology operates within the family of technologies that make up semantically interpreted Artificial Intelligence: NLU (Natural Language Understanding); identification, extraction, disambiguation and linking of entities; and the construction of interrogation and knowledge-discovery systems based on inference and on systems that emulate natural reasoning.
What is our business focus
The GNOSS project is positioned in the emerging market for Deep AI (Deep Understanding AI). By Deep AI we mean the convergence of symbolic AI and sub-symbolic AI.
GNOSS is the leading company in Spain in building knowledge ecosystems that are interpretable and queryable (interrogable) by both machines and people. These ecosystems integrate heterogeneous and distributed data, represented with technical vocabularies and ontologies written in machine-interpretable languages (OWL, RDF), and consolidate and exploit that data through knowledge graphs.
The technology developed by GNOSS facilitates the construction, within the framework of the aforementioned ecosystems, of intelligent interrogation and search systems, information enrichment and context generation systems, advanced recommendation systems, predictive Business Intelligence systems based on dynamic visualizations and NLP/NLU systems.
GNOSS works in the cloud and is offered as a service. We have a complex and robust technological infrastructure designed to compute intelligent data in a framework that offers the maximum guarantee of security and best practices in technology services.
Products and Solutions
GNOSS Knowledge Graph Builder is a development platform upon which third parties can deploy their web projects, with a complete suite of components to build Knowledge Graphs and deploy a semantically aware, intelligent web in record time. The platform enables the interrogation of a Knowledge Graph by both machines and people. The main modules of the platform are: 1) Metadata and Knowledge Graph Construction and Management; 2) Discovery, Reasoning and Analysis through Knowledge Graphs; 3) Semantic Content Management. It also includes configurable characteristics and functions for fast, agile and flexible adaptation and evolution of intelligent digital ecosystems.
Thanks to GNOSS Knowledge Graph Builder and GNOSS Sherlock Services, we have developed a suite of transversal solutions and some sectorial solutions based on the creation and exploitation of Knowledge Graphs.
The transversal solutions are: GNOSS Metadata Management Solution (for the integration of heterogeneous and distributed information into a semantic data layer, consolidating information into a knowledge graph); GNOSS Sherlock NLP-NLU Service (intelligent software services that let machines understand us, based on natural language processing, entity recognition and linking, and dynamic graphic visualizations); GNOSS Search Cloud (which includes intelligent information search, interrogation and retrieval systems, inferences, recommendations and the generation of significant contexts); and GNOSS Semantic BI & Analytics (expressive and dynamic Business Intelligence based on Smart Data).
We have developed sectorial solutions in Education and University, Tourism, Culture and Museums, Healthcare, Communication and Marketing, Banking, Public Administration, and catalogs and supply chain support.
What does DBpedia mean to us
We think that the foundations for the construction of the great European Project of Symbolic AI are being created thanks to DBpedia and other Linked Open Data projects, by turning the internet into a Universal Knowledge Base, which works according to the principles and standards of Linked Open Data and the Semantic Web. This knowledge base, as the brain of the internet, would be the basis of the AI of the future. In this context, we consider that DBpedia plays a central role as an open general knowledge base and, therefore, as the core of the European Project of Symbolic AI.
Currently, some projects developed with the GNOSS platform already use DBpedia to access a large amount of structured and ontologically represented information, in order to link entities, enrich information and offer contextual information. Two examples are the ‘Augmented Reading’ of the artwork descriptions at the Museo del Prado, and the graph of related entities in Didactalia.net.
The ‘Augmented Reading’ of the Museo del Prado’s artwork descriptions (see for instance ‘The Family of Carlos IV’, by Francisco de Goya) recognizes and extracts the entities contained in them, thereby providing additional and contextual information, so that anyone can read the descriptions without giving up a deeper understanding of them.
In Didactalia.net, for a given educational resource, its Graph of related entities works as a conceptual map of the resource to support the teacher and the student in the teaching-learning process (see for instance this resource about Descartes).
How do we envision our future work with DBpedia
GNOSS can contribute to DBpedia at different levels, from making suggestions for further development to participating in strategy and commercialization.
We could collaborate with DBpedia by contributing to tests of DBpedia releases and giving our feedback on the use of DBpedia in projects for public and private organizations developed with GNOSS. Based on this, we could make suggestions for future work, considering our experience and customer needs in this context.
We could participate in the strategy and commercialization, in order to gain more presence in sectors in which we work, such as healthcare, education, culture or communication, and to achieve that the private companies can appreciate and benefit from the great value that DBpedia can offer them.
A big thank you to GNOSS for presenting their product and envisioning how they would like to work with DBpedia in the future.
DBpedia Member Features – In the coming weeks we will give DBpedia members the chance to present special products, tools and applications and share them with the community. We will publish several posts in which DBpedia members provide unique insights. Ontotext will start with the GraphDB database. Have fun while reading!
by Milen Yankulov from Ontotext
GraphDB is a family of highly efficient, robust, and scalable RDF databases. It streamlines the loading and use of linked data cloud datasets, as well as your own resources. For easy use and compatibility with industry standards, GraphDB implements the RDF4J framework interfaces and the W3C SPARQL Protocol specification, and supports all RDF serialization formats. The database offers an open-source API and is the preferred choice of both small independent developers and big enterprise organizations because of its community and commercial support, as well as excellent enterprise features such as cluster support and integration with external high-performance search applications: Lucene, Solr, and Elasticsearch. GraphDB is built 100% in Java in order to be OS-independent.
GraphDB is one of the few triplestores that can perform semantic inferencing at scale, allowing users to derive new semantic facts from existing facts. It handles massive loads, queries, and inferencing in real-time.
Workbench is the GraphDB web-based administration tool. The user interface is similar to the RDF4J Workbench Web Application, but with more functionality.
The GraphDB Workbench REST API can be used for managing locations and repositories programmatically, as well as managing a GraphDB cluster. It includes connecting to remote GraphDB instances (locations), activating a location, and different ways for creating a repository.
It also covers connecting workers to masters, connecting masters to each other, and monitoring the state of a cluster.
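As a sketch of how a repository is queried once it exists, GraphDB follows the RDF4J REST protocol: each repository is exposed at /repositories/{id} and accepts SPARQL in the `query` parameter. The repository id and the default local port 7200 below are illustrative.

```python
import urllib.parse

def build_repo_query(base_url: str, repo_id: str, sparql: str) -> str:
    """Build a GET URL querying a GraphDB repository (RDF4J REST protocol).

    Request JSON results by sending the Accept header
    "application/sparql-results+json" with the GET.
    """
    endpoint = f"{base_url}/repositories/{urllib.parse.quote(repo_id)}"
    return endpoint + "?" + urllib.parse.urlencode({"query": sparql})

url = build_repo_query("http://localhost:7200", "my-repo",
                       "SELECT * WHERE { ?s ?p ?o } LIMIT 10")
```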
GraphQL access via Ontotext Platform 3
GraphDB enables Knowledge Graph access and updates via GraphQL. GraphDB is extended to support the efficient processing of GraphQL queries and mutations to avoid the N+1 translation of nested objects to SPARQL queries.
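To illustrate the N+1 problem this avoids: a nested GraphQL query like the one below (the schema and field names are hypothetical, not the actual Ontotext Platform schema) would naively translate into one SPARQL query for the top-level objects plus one per object for the nested field; GraphDB’s GraphQL support processes such queries without that per-object fan-out.

```python
# A hypothetical nested GraphQL query. Under naive translation, resolving
# `articles` would issue one extra SPARQL query per returned author.
NESTED_QUERY = """\
query {
  authors(limit: 10) {
    name
    articles {
      title
      publishedDate
    }
  }
}"""
```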
Ontotext offers three editions of GraphDB: Free, Standard, and Enterprise.
Free – commercial, file-based, sameAs & query optimizations, scales to tens of billions of RDF statements on a single server with a limit of two concurrent queries.
Standard Edition (SE) – commercial, file-based, sameAs & query optimizations, scales to tens of billions of RDF statements on a single server and an unlimited number of concurrent queries.
Enterprise Edition (EE) – high-availability cluster with worker and master database implementation for resilience and high-performance parallel query answering.
Why is GraphDB the preferred choice of many data architects and data ops teams?
1. High Availability Cluster Architecture
GraphDB offers you a high-performance cluster proven to scale in production environments. It supports:

(1) coordinating all read and write operations,
(2) ensuring that all worker nodes are synchronized,
(3) propagating updates (insert and delete tasks) across all workers and checking updates for inconsistencies,
(4) load balancing read requests between all available worker nodes.

Further cluster capabilities include:

- Failover and dynamic configuration
- Improved query bandwidth: a larger cluster means more queries per unit time
- Deployability across multiple data centres
- Elastic scaling in cloud environments
- Integration with search engines
- Cluster management and monitoring, with (1) automatic cluster reconfiguration in the event of failure of one or more worker nodes, and (2) a smart client supporting multiple endpoints
2. Easy Setup
GraphDB is 100% Java-based in order to be platform independent. It is available through native installation packages or via Maven. It also supports Puppet and can be Dockerized. GraphDB is cloud agnostic: it can be deployed on AWS, Azure, Google Cloud, etc.
Depending on the edition you are using, you can rely on community support (Stack Overflow monitoring), or on Ontotext’s dedicated support team, which can assist through customized runbooks, easy Slack communication, and a Jira issue-tracking system.
A big thank you to Ontotext for providing some insights into their product and database.