Knowledge Graphs — What, Why, and How

8 min read · Dec 19, 2022

What is Knowledge?

To understand how a Knowledge Graph is applied, we first need a clear definition of ‘Knowledge’.

A unit of knowledge can be defined as a piece of information that allows users to reach an outcome when confronted with specific questions. Knowledge in the real world can be classified into three high-level categories:

Situational Knowledge: Changes based on events, situations, or circumstances
Layered Knowledge: Spans various layers through associations and relations
Evolving Knowledge: Changes context and meaning based on new information

Different types of knowledge are interchanged across people, processes, and tools. The job of the knowledge graph is to ensure the interchange is manageable at scale, is uncorrupted, and is easily discoverable.

Image Representation: How a Knowledge Graph transforms raw data into Knowledge (Source)

What is a Knowledge Graph?

Now that we have a specific understanding of Knowledge, let’s dive into the nuances of a Knowledge Graph.

A knowledge graph is a semantic web of entities, relationships, and events. More fundamentally, it is a directed graph where every element is populated with rich information regarding itself and its relationships with other elements.

The Knowledge Graph is tasked with surfacing up-to-date, related information drawn from multiple sources, based on each user’s specific requirements.

A knowledge graph is a data model for metadata that allows users to explore relationships and identify top datasets relevant to their current query.
A knowledge graph interweaves multiple data assets, sources, services, targets, and users to enable logical connections that give meaning to the data.
It activates dormant or siloed data by connecting it to the vast network of the data ecosystem, allowing users and machines to start invoking and leveraging vast volumes of data that were previously meaningless due to missing semantics.

🔑 Every data problem is a knowledge transfer problem, and every knowledge transfer problem can be formalized as a graph. Therefore, every data problem can be formalized as a graph. ~ Stephen Bailey

A Knowledge Layer contains information across the three primary layers: Data, Process, and People. This includes information about data lineage, provenance, and governance, but that alone is not enough. The Knowledge Graph is also expected to capture how every data asset interacts with the data ecosystem as a whole.

Glossary

Semantics: Adding semantics means adding context and meaning to arbitrary information.
Entity: A real-world object, unit, or idea that is self-sustaining. An entity can talk to other entities via relationships or associations to execute targeted tasks.
Data Model: A data model is a visual representation of an organization’s or team’s data components and the relations between them.
Dormant data (here): Data that cannot be leveraged by users or data applications because it has lost its contextual connections to the core data model. Dormant data, whether of good or poor quality, ends up consuming the organization’s resources (time and storage).

Why does a Knowledge Graph matter?

The end objective of a Knowledge Graph is to operationalize knowledge and make it available to users when they put specific questions to the graph. The answers to these questions power integration, data recycling, and analytics.

The Knowledge Graph concept was popularized by Google in 2012, when the company publicly attributed its search solution to Knowledge Graphs. Google defined its Knowledge Graph to serve the following objectives:

Discoverability: Make it easy for users to navigate billions of data points to discover specific Knowledge
Knowledge Creation: Offer new or unexpected Knowledge to users through new connections or related results. Users are not looking for it, but it adds value to what they are looking for.
Distinguishability: Intuitive search capability that understands the context in which the user is searching and presents results accordingly. For example, searching ‘Apple’ should present either the company or the fruit, depending on that context.
Speed: Surface relevant information within milliseconds.

📈 88% of CXOs believe knowledge graphs will significantly improve the bottom line ~ Pulse Survey, 2020

Other use cases that the Knowledge Graph ended up serving

Operationalized data

Google built applications on top of their Knowledge Graph to add an additional layer for Insights. For example: When a user searches “restaurants near me,” it doesn’t just surface the specific detail (restaurant names) the user searched for. It also brings up review data, ratings, directions, and a plethora of well-curated insights that the user can instantly process to choose within seconds.

Relevance

If billions of data points are produced for a specific object, and a user searches that object, which data point should surface? This is solved through the Knowledge Graph’s capability to include peer insights (peer here means peer data assets). SEO or Search Engine Optimization is the method to curate and surface information that, on a high level, has:

The right set of keywords (or tags)
Several other data points that point to it as a relevant source of information (backlinks, a form of peer validation)

Combating Silos with Data Unification

A knowledge graph is able to pool data from across siloed data sources. In Google’s case, it would mean data stored across various websites, clouds, servers, and geographies. Due to this capability, Knowledge Graphs can power Data Fabric and Data Mesh use cases.

How does a Knowledge Graph work?

The semantic meaning added by Knowledge Graphs is written formally to eliminate ambiguity, make it digestible for both users and machines, and enable automated reasoning that can infer new facts.

Every description is a whole and a partial description

In a knowledge graph, the description attributed to any entity or relationship also partially encompasses descriptions for related entities, which is how the big picture of a web-like structure develops.

This is, in fact, a key attribute of Knowledge Graphs: descriptions for each component partially describe other components. For example, describing the entity ‘Cat’ as a mammal that hunts rats also partially defines the entities ‘Rat’ and ‘Mammal’: ‘Rat’ is hunted by ‘Cat’, and ‘Mammal’ contains ‘Cat’.
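The Cat example can be sketched in a few lines of Python. This is only an illustration, not how any particular graph store works: the `INVERSE` table and the relationship names (`hunts`, `hunted_by`, `is_a`, `contains`) are assumptions made up for the sketch. The point it shows is that statements written only about ‘Cat’ leave ‘Rat’ and ‘Mammal’ with partial descriptions of their own.

```python
from collections import defaultdict

# Hypothetical inverse name for each relationship (illustrative only).
INVERSE = {"hunts": "hunted_by", "is_a": "contains"}

# The only statements we write down are about 'Cat'.
cat_description = [
    ("Cat", "is_a", "Mammal"),
    ("Cat", "hunts", "Rat"),
]

def describe_all(triples):
    """Collect each entity's description, including the partial
    descriptions implied for the entity at the other end of an edge."""
    descriptions = defaultdict(set)
    for subject, predicate, obj in triples:
        descriptions[subject].add((predicate, obj))
        descriptions[obj].add((INVERSE[predicate], subject))
    return dict(descriptions)

descriptions = describe_all(cat_description)
for entity in sorted(descriptions):
    print(entity, sorted(descriptions[entity]))
```

Nothing was stated about ‘Rat’ or ‘Mammal’ directly, yet both end up with an entry in `descriptions`.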

Ontology: A Contract

Formal semantics is the process of defining meaning and context for objects through formal computational and logical tools. A Knowledge Graph is built on formal semantics, and ontology is the foundation of formal semantics.

Ontology is the classification and explanation of entities and their structure. It ensures both developers and users of the knowledge graph have a shared understanding of data. In other words, ontology serves as the contract that brings a consensus around the meaning of the data between users and creators of the knowledge graph. This objective is achieved through tools such as classes, categories, relationships, or even human-friendly textual descriptions.

Ontology & Taxonomy: The Difference

While taxonomies are a way to define hierarchical structures or relationships, Ontology goes a step further to add richer information to the data. Ontology is a superset of Taxonomy and can define interrelationships between the entities in the taxonomy. Therefore, an ontology can contain multiple taxonomies.
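A rough way to picture the difference, under simplifying assumptions: a taxonomy can be modelled as a child-to-parent map, while an ontology holds one or more such maps plus relationships that cut across them. The terms, the `ontology` dictionary layout, and the helper names below are all hypothetical.

```python
# Two hypothetical taxonomies: each maps a term to its parent (pure hierarchy).
animal_taxonomy = {"Cat": "Mammal", "Rat": "Mammal", "Mammal": "Animal"}
food_taxonomy = {"Cheese": "Dairy", "Dairy": "Food"}

# An ontology can contain both taxonomies plus relationships between their terms.
ontology = {
    "taxonomies": {"animals": animal_taxonomy, "foods": food_taxonomy},
    "relations": [("Cat", "hunts", "Rat"), ("Rat", "eats", "Cheese")],
}

def ancestors(term, taxonomy):
    """Walk up a taxonomy from a term to its root."""
    chain = []
    while term in taxonomy:
        term = taxonomy[term]
        chain.append(term)
    return chain

def related(term, ontology):
    """Non-hierarchical relationships that start at a term."""
    return [(p, o) for s, p, o in ontology["relations"] if s == term]

print(ancestors("Cat", animal_taxonomy))
print(related("Rat", ontology))
```

The taxonomy alone can answer "what kind of thing is a Cat?", but only the ontology's extra relations connect a term in one hierarchy (‘Rat’) to a term in another (‘Cheese’).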

Resource Description Framework (RDF)

RDF is a standard data model for interchanging highly interconnected data, in which every statement is expressed as a subject-predicate-object triple. It enables users to run CRUD operations on a unified view of the data without affecting the underlying physical data. Through RDF, users can integrate data from various sources while leaving the original data untouched, and run queries over the combined global data instead of querying scattered data instances.

RDF enables Knowledge Graphs to combine the attributes of multiple data management models:

Databases: Allows queried search across all data assets pulled in from various sources as if the query is on one global data asset.
Graphs: Allows analysis on a networked structure.
Knowledge Bases: Allows interpretation of context around every data asset and inference of new facts through formal semantics.
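The database and knowledge-base roles listed above can be sketched with plain Python tuples standing in for RDF triples. This is a toy model rather than a real RDF store: a production setup would more likely use an RDF library and the SPARQL query language. The source names, identifiers such as `customer:42`, and the helpers `match` and `infer_types` are assumptions for illustration.

```python
# Triples from two hypothetical sources, each a (subject, predicate, object) tuple.
crm_source = {
    ("customer:42", "name", "Ada"),
    ("customer:42", "placed", "order:7"),
}
billing_source = {
    ("order:7", "total", "120.00"),
    ("order:7", "type", "OnlineOrder"),
    ("OnlineOrder", "subClassOf", "Order"),
}

# Unifying is a set union; the source collections themselves are not modified.
graph = crm_source | billing_source

def match(graph, s=None, p=None, o=None):
    """Return triples matching a pattern; None acts as a wildcard."""
    return sorted(
        t for t in graph
        if (s is None or t[0] == s)
        and (p is None or t[1] == p)
        and (o is None or t[2] == o)
    )

def infer_types(graph):
    """If X has type C and C is a subclass of D, then X also has type D.
    Repeats until no new fact appears."""
    inferred = set(graph)
    changed = True
    while changed:
        changed = False
        for x, _, c in match(inferred, p="type"):
            for _, _, d in match(inferred, s=c, p="subClassOf"):
                if (x, "type", d) not in inferred:
                    inferred.add((x, "type", d))
                    changed = True
    return inferred

# Database-style query that spans both sources.
orders = [o for _, _, o in match(graph, s="customer:42", p="placed")]
totals = {order: match(graph, s=order, p="total")[0][2] for order in orders}
print(totals)

# Knowledge-base-style inference: a fact that neither source stated.
print(("order:7", "type", "Order") in infer_types(graph))
```

The query joins a customer from one source to an order total from another, and the inference step adds a fact (`order:7` is an `Order`) that neither source recorded explicitly.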

Nodes. Edges. Properties.

On a fundamental level, a knowledge graph has three structural elements: Nodes, Edges, and Properties. Nodes are logical representations of real-world entities, Edges are directed logical representations of the connections between these entities, and Properties are logical descriptions or features of the Nodes and Edges.

The real-world entities could be a data asset, a concept, a service, or a user, while relationships could define hierarchical associations (‘subset of’), locations (‘contains’), definitions (‘is a’), etc.

Classes and categories are represented through Nodes, Relationships are represented through Edges, and all of them including textual descriptions can be represented through Properties.
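One minimal way to hold these three elements in code is sketched below. The class and method names (`Node`, `Edge`, `KnowledgeGraph`, `neighbors`) and the sample assets are assumptions chosen for illustration, not the API of any particular graph database.

```python
from dataclasses import dataclass, field

@dataclass
class Node:
    """A real-world entity, described by free-form properties."""
    id: str
    properties: dict = field(default_factory=dict)

@dataclass
class Edge:
    """A directed relationship from source to target, with its own properties."""
    source: str
    relation: str
    target: str
    properties: dict = field(default_factory=dict)

class KnowledgeGraph:
    def __init__(self):
        self.nodes = {}
        self.edges = []

    def add_node(self, node_id, **properties):
        self.nodes[node_id] = Node(node_id, properties)

    def add_edge(self, source, relation, target, **properties):
        # Both endpoints must already exist as nodes.
        if source not in self.nodes or target not in self.nodes:
            raise KeyError("both endpoints must be added as nodes first")
        self.edges.append(Edge(source, relation, target, properties))

    def neighbors(self, node_id, relation=None):
        """Targets reachable from node_id, optionally filtered by relation."""
        return [
            e.target for e in self.edges
            if e.source == node_id and (relation is None or e.relation == relation)
        ]

kg = KnowledgeGraph()
kg.add_node("sales_db", kind="data asset", description="Warehouse for sales data")
kg.add_node("orders_table", kind="data asset")
kg.add_node("Dataset", kind="concept")
kg.add_edge("sales_db", "contains", "orders_table")
kg.add_edge("orders_table", "is a", "Dataset", asserted_by="catalog")

print(kg.neighbors("sales_db", "contains"))
print(kg.nodes["sales_db"].properties["kind"])
```

Because edges are directed, `orders_table` is a neighbor of `sales_db` but not the other way around; a reverse lookup would need its own method or an explicit inverse edge.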

Components of a Knowledge Graph

Datasets: A knowledge graph pulls in data from various sources and these datasets tend to frequently change structures and relationships with other data assets.
Schemas: Schemas are a structural representation or a framework of the Knowledge Graph. Models such as FIBO and Brick, as well as the vocabularies published on schema.org, can serve as great reference structures to get started.
Identities or Tags: Identities define and classify nodes in the Knowledge Graph.
Metadata and Context: Context defines the setting in which the Knowledge exists and is powered through metadata that serves information about and around a data asset.

AI-Augmented Knowledge Graphs

Natural Language Processing or NLP is used to augment Knowledge Graphs for semantic enrichment where tags, descriptions, and context are improved through AI.

Imagine the knowledge graph as a digital brain. Every time a human brain learns something new, a neural connection is developed to retain that information. This connection is triggered when a similar situation arises where that knowledge could be applied.

With AI augmentation, it is possible to develop such connections at scale: every time AI detects a new pattern or a new relationship between entities, the discovery gets wired into the knowledge graph and starts contributing to subsequent queries or knowledge formation.

AI-identified relationships enable:

Identifying and forming relationships between similar data assets
Automated Question-Answering systems
Discovering and adding new information or context around data assets
Graph growth at scale
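The first item in the list, forming relationships between similar data assets, can be illustrated with a deliberately simple stand-in for an AI model. The sketch below uses Jaccard overlap between tag sets instead of NLP; the asset names, tags, `similar_to` relation, and 0.5 threshold are all assumptions for illustration.

```python
from itertools import combinations

# Hypothetical data assets and their tags (in practice, tags might come from an NLP model).
assets = {
    "orders_2022": {"sales", "orders", "revenue"},
    "orders_2023": {"sales", "orders", "revenue", "returns"},
    "hr_roster": {"employees", "payroll"},
}

def jaccard(a, b):
    """Share of tags two assets have in common (0.0 to 1.0)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0

def propose_edges(assets, threshold=0.5):
    """Propose a 'similar_to' edge for every pair whose tag overlap meets the threshold."""
    edges = []
    for left, right in combinations(sorted(assets), 2):
        score = jaccard(assets[left], assets[right])
        if score >= threshold:
            edges.append((left, "similar_to", right, round(score, 2)))
    return edges

proposed = propose_edges(assets)
print(proposed)
```

Each proposed tuple could then be added to the graph as a new edge, with the score stored as an edge property so that weak suggestions can be reviewed or filtered later.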