A brief introduction
The journey of artificial intelligence mirrors humanity's path of knowledge acquisition. Just as humans build understanding through diverse sources - from formal education and books to personal experiences and social interactions - AI systems are fundamentally shaped by their "knowledge diet." While humans naturally integrate information from multiple sources, creating rich mental models that connect disparate pieces of knowledge, AI systems face a more structured challenge in making sense of their data.
Think of a child learning about animals. They don't just memorize facts from a textbook - they combine stories from parents, observations at the zoo, nature documentaries, and direct experiences with pets. This multi-faceted learning creates a deep, interconnected understanding. Similarly, modern AI models require vast amounts of data and thoughtfully structured information that captures relationships and context. This data's quality, organization, and interconnectedness often matter more than sheer quantity.
After EQT published “Knowledge Graph(s) and LLM-based ontologies have a very good shot at unlocking GenAI in production”, I had the opportunity to chat with Julien, one of the authors and a Partner at EQT.
In the article, Julien and his team posed five unresolved questions, which my CTO and co-founder (Y) and I discussed after reading it. Now imagine us, phones in hand, hiking early on a Saturday morning in -1°C weather. This is what came out:
While Walking
1st EQT’s question: “What are the limitations of abstracting the ontology with a GraphRAG-like approach? If the winning approach (assuming there is one) includes humans in the loop for ontology definition, what is the right way to include domain knowledge?”
Precision is what models should aim for; truthfulness is what we aim for.
Abstracting ontologies using a GraphRAG-like approach can be very challenging for three main reasons:
Incomplete data - extraction inevitably misses entities and relationships, leaving gaps in the resulting ontology;
Reasoning difficulties in complex tasks - multi-hop questions expose weak or missing connections;
Maintenance and scalability - the structure must keep pace as the domain and its data evolve.
To tackle this problem in a practical use case, I believe the best approach is to address one issue at a time, from both a scientific and a business perspective.
The problem should be studied against the state-of-the-art literature while incorporating human expertise specific to the domain.
Literature provides a solid foundation of knowledge, but the world is constantly changing. Our actions generate information before the literature can capture it. Whether this information is right or wrong, it represents a human contribution to ontology construction. Including humans in the loop helps build the fundamental reasoning base for models and data.
Focusing on a smaller problem allows for a more structured approach, enabling models to reason more effectively and improving both performance and inference quality.
This also benefits the maintenance and scalability of the entire system.
This approach relies on both theoretical and practical knowledge, where human involvement plays a fundamental role.
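To make the human-in-the-loop idea concrete, here is a minimal Python sketch (the triples and the review rule are invented for illustration, not our actual pipeline): machine-proposed facts wait in a queue, and only what an expert approves is committed to the ontology.

```python
from dataclasses import dataclass

# Minimal human-in-the-loop sketch; names and the review rule are
# illustrative, not an actual pipeline. Machine-proposed triples wait
# in a queue, and only expert-approved facts enter the ontology.

@dataclass
class CandidateTriple:
    subject: str
    predicate: str
    obj: str
    source: str  # e.g. "llm_extraction" or "domain_expert"

review_queue = [
    CandidateTriple("Aspirin", "treats", "Headache", "llm_extraction"),
    CandidateTriple("Aspirin", "treats", "Diabetes", "llm_extraction"),  # wrong
]

def expert_review(triple: CandidateTriple) -> bool:
    # Stand-in for a domain expert's judgment call.
    return not (triple.subject == "Aspirin" and triple.obj == "Diabetes")

ontology = [t for t in review_queue if expert_review(t)]
print([(t.subject, t.predicate, t.obj) for t in ontology])
# [('Aspirin', 'treats', 'Headache')] -- the bad extraction never lands
```

The gate itself is the point: the expert's judgment, not the extractor's confidence, decides what enters the reasoning base.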
Solving one issue at a time helps develop a deeper understanding of the overall system. However, the ontology must remain focused - it should only include knowledge relevant to the system. This means different ontologies may be needed for different problems.
Combining ontologies can be a viable solution, but only if the problems are truly related. Otherwise, we risk losing sight of our goal.
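As a hedged sketch of that merge decision (the graphs and the threshold are invented for illustration), one way is to require genuine entity overlap before composing two ontologies:

```python
import networkx as nx

# Hedged sketch with invented graphs: compose two ontologies only when
# they genuinely share entities; otherwise keep them apart.
clinical = nx.DiGraph([("Ibuprofen", "NSAID"), ("NSAID", "Analgesic")])
retail = nx.DiGraph([("sku_77", "Ibuprofen"), ("sku_77", "store_3")])
music = nx.DiGraph([("track_9", "playlist_1")])

def merge_if_related(g1: nx.DiGraph, g2: nx.DiGraph, min_shared: int = 1):
    shared = set(g1) & set(g2)  # entities appearing in both graphs
    if len(shared) < min_shared:
        return None  # unrelated problems: separate ontologies, separate goals
    return nx.compose(g1, g2)  # union of nodes and edges

print(merge_if_related(clinical, retail) is not None)  # True: share "Ibuprofen"
print(merge_if_related(clinical, music) is not None)   # False: no overlap
```

Anything below the overlap threshold stays separate, keeping each ontology focused on its own problem.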
2nd EQT’s question: “What is the right starting point from a market perspective, horizontal or vertical?”
Well, given the constraints of the model and the product we are developing, the right starting point is vertical. Or, at the very least, we should focus on a specific knowledge domain and then expand vertically within the industry and horizontally across different segments.
For example, if you begin by building an ontology for the healthcare system, you can later extend its knowledge to manufacturers or advertisers. Similarly, if you start with retail, you can first focus on a specific category and then adapt that knowledge for advertisers and manufacturers - those responsible for creating the products sold on shelves.
A vertical approach offers several key advantages:
1. Clear Market Fit – By targeting a specific industry (e.g., healthcare, finance, manufacturing), an ontology can provide immediate and tangible value by structuring domain-specific data.
2. Easier Adoption – Industry-specific ontologies directly address known pain points, making it easier to secure early adopters.
3. Faster Time-to-Value – Enterprises in specialized verticals are more willing to invest in solutions tailored to their domain, leading to quicker ROI.
4. Domain-Specific Standards – Many industries already have established ontological frameworks, such as SNOMED CT for healthcare and FIBO for finance, making it easier to align with existing ecosystems. But are they AI-ready?
Given the interconnected nature of the world we live in, the interesting aspect of this approach lies in how we can connect these knowledge graphs and understand how they interact with each other—maybe to analyze the true meaning behind the “Butterfly Effect.”
3rd EQT’s question: “How will the market actually split across the use cases that work best with RelationalRAG, GraphRAG (without ontology), and multiple ontology-based knowledge graphs?”
The market will be segmented based on the complexity of relationships and the interpretability of the connections. Some approaches work better in specific scenarios:
RelationalRAG is suited for structured and well-defined relationships, like finance or tax compliance, where data follows strict rules. But not everything structured is static - some industries evolve while still keeping their relational logic.
GraphRAG (without ontology) shines in dynamic and less-defined relationships, where the focus is on connections rather than their meaning. Think about cybersecurity, where identifying unusual patterns is more important than deeply understanding why they exist.
Ontology-based knowledge graphs work best in high-stakes, knowledge-intensive fields where structure and reasoning are crucial. This applies to healthcare, law, and industry-specific research, where accuracy and logical consistency are required.
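A toy contrast may help pin down the split (all data, schemas, and the subclass chain below are invented): exact SQL retrieval for RelationalRAG, bare connectivity for GraphRAG, and a typed hierarchy to reason over for the ontology-backed case.

```python
import sqlite3
import networkx as nx

# RelationalRAG-style: strict, rule-bound relationships answered exactly.
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE invoices (id INTEGER, vendor TEXT, amount REAL)")
db.executemany("INSERT INTO invoices VALUES (?, ?, ?)",
               [(1, "Acme", 120.0), (2, "Acme", 80.0)])
total = db.execute("SELECT SUM(amount) FROM invoices WHERE vendor='Acme'"
                   ).fetchone()[0]

# GraphRAG-style (no ontology): untyped edges, pattern over meaning.
g = nx.Graph([("ip_1", "host_a"), ("ip_1", "host_b"), ("host_b", "user_x")])
# "What sits near this suspicious IP?" -- connectivity is the answer.
neighborhood = nx.single_source_shortest_path_length(g, "ip_1", cutoff=2)

# Ontology-backed: a typed subclass chain supports logical inference.
subclass_of = {"Ibuprofen": "NSAID", "NSAID": "Analgesic"}

def is_a(entity, cls):
    """Walk the subclass chain: a toy stand-in for ontology reasoning."""
    while entity is not None:
        if entity == cls:
            return True
        entity = subclass_of.get(entity)
    return False

print(total, sorted(neighborhood), is_a("Ibuprofen", "Analgesic"))
```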
Merging these approaches depends on the problem - sometimes combining ontologies makes sense, but only when the underlying domains are deeply interrelated. Otherwise, forcing structure where it isn’t needed might hurt performance rather than improve it.
It’s not certain that the market will split based purely on technical reasons, but these approaches tend to offer the best accuracy and interpretability in their respective areas.
4th EQT’s question: “Assuming building multiple knowledge graphs is the right approach - how do we know and optimize the links and limits between them?”
This is a tough question and a key challenge. The optimal design of inter-graph connections requires a systematic validation process. I believe the primary approach should be starting with industries that have clear, established relationships, such as payment systems and retail, where data flows and dependencies are already well documented. This foundation allows us to validate a linking methodology with known patterns before expanding to less obvious connections.
The key to optimizing these links lies in three critical metrics: connection density between graphs, real-world usage patterns, and performance analytics. We should measure how frequently different graphs interact, track which connections are most actively used in production, and analyze the performance impact of various linking strategies. When graphs have too many connections, it can lead to unnecessary complexity and maintenance overhead. Conversely, too few connections might miss valuable insights.
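As an illustrative sketch (all counts, links, and thresholds here are hypothetical), the three metrics might be tracked roughly like this:

```python
from collections import Counter

# Illustrative numbers only: two cross-graph links between a retail and
# a payments graph, plus a toy query log and per-link latencies.
cross_links = [("retail:store_1", "payments:merchant_9"),
               ("retail:brand_x", "payments:acquirer_2")]
retail_nodes, payments_nodes = 40, 55

# 1. Connection density: cross-edges relative to the possible boundary.
density = len(cross_links) / (retail_nodes * payments_nodes)

# 2. Usage patterns: how often each link is traversed in production.
query_log = [cross_links[0], cross_links[0], cross_links[1]]
usage = Counter(query_log)

# 3. Performance: flag links whose latency cost outweighs their use.
latency_ms = {cross_links[0]: 4.2, cross_links[1]: 87.0}
prune = [l for l, ms in latency_ms.items() if ms > 50 and usage[l] < 2]

print(f"density={density:.5f}", usage.most_common(1), prune)
```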
To establish effective boundaries, each graph must maintain a clearly defined scope aligned with specific industry domains.
5th EQT’s question: “Some cases require mixing public and private data. Which layer will capture where the ‘magic’ happens?”
The real “magic” happens at the intersection of semantic reasoning and governance. Structured ontologies enable meaningful connections while maintaining privacy, accuracy, and adaptability.
This layer ensures AI does not merely aggregate data but understands its context—distinguishing between relevant and misleading relationships. By embedding logical reasoning mechanisms, AI can infer new insights while balancing automation with human oversight.
At the same time, governance structures must regulate data visibility, ensuring sensitive private information remains protected while still contributing to broader knowledge frameworks. Ethical considerations such as bias mitigation, transparency, and data ownership will be crucial in shaping how these systems evolve.
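A minimal sketch of that governance gate (the labels, triples, and clearance model are assumptions for illustration): each fact carries a visibility label, and the layer filters what a caller can see before any reasoning runs on top.

```python
from dataclasses import dataclass

# Assumed model: every triple carries a visibility label, and the
# governance layer filters what a caller may see before reasoning runs.

@dataclass(frozen=True)
class Triple:
    subject: str
    predicate: str
    obj: str
    visibility: str  # "public" or "private"

graph = [
    Triple("drug_x", "treats", "condition_y", "public"),
    Triple("patient_123", "prescribed", "drug_x", "private"),
]

def visible_triples(triples, clearance: str):
    """Governance gate: private facts reach only cleared callers."""
    allowed = {"public", "private"} if clearance == "private" else {"public"}
    return [t for t in triples if t.visibility in allowed]

print(len(visible_triples(graph, "public")))   # 1: the private edge withheld
print(len(visible_triples(graph, "private")))  # 2: full context for inference
```

The design choice is that governance sits below reasoning: private facts can still power inference for cleared users while never leaking into public answers.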
Mission at One Ring Labs
All this chatting for what? Well, this article has played a crucial role in laying down a shared approach to problems we will face shortly at One Ring Labs, where we are creating data systems that give AI models deeper contextual understanding, similar to how Scale AI improved computer vision through high-quality training data.
Today's AI challenge isn't about quantity of data, but rather its quality and organization. While models have access to vast amounts of information, they lack the structured frameworks needed to truly understand relationships and context. Our solution is to develop specialized knowledge graphs and ontologies that help AI systems reason about information rather than simply retrieve it.
We start by focusing deeply on specific industries where well-structured knowledge can have immediate impact. Once we prove our approach in these focused areas, we expand to connect related domains. This measured expansion combines technical innovation with domain expertise, ensuring our systems remain practical and effective as they grow.
Our core belief is that valuable insights lie in the hidden connections between different pieces of information. By helping AI systems understand and navigate these relationships, we're building tools that can handle increasing real-world complexity. We see the ability to uncover and utilize these hidden connections as crucial to the future of AI, and we're focused on creating the fundamental data systems to make this possible.