While today’s large language models (LLMs) dazzle with fluent prose and clever mimicry, they remain unreliable in domains that demand precise, consistent, and explainable reasoning. Generative AI captivates, but it falters on reliability.
Introduction
It is true that LLMs have mastered the major languages of the developed world, such as English and Spanish, while critically lagging in languages from less developed regions. Undoubtedly, they have revolutionized how machines generate human-like text. Yet limitations in accuracy and reliability hinder their application in high-stakes domains, even when the language of communication is one of those well-supported languages.
Despite their fluency, LLMs often fabricate facts (the phenomenon widely known as hallucination) and produce inconsistent results, largely because they are trained to mimic language patterns rather than to reason over structured knowledge.
This article explores the flaws of current generative models, the principles for building dependable AI, and the transformative potential of knowledge-driven systems, drawing on the research paper 'Getting From Generative AI to Trustworthy AI: What LLMs Might Learn From Cyc'. In this paper, Doug Lenat and Gary Marcus propose a new approach: integrating explicit symbolic knowledge and logic into AI systems to build trustworthy, transparent, and consistent models. Drawing on insights from the Cyc project, a decades-long effort to encode human knowledge in symbolic form, they outline a roadmap for developing AI systems that don't just sound right but are right.
The Limits of Generative AI
Plausibility Over Truth
Current LLMs, such as GPT and Claude, operate by predicting the next most likely word based on vast amounts of text data. This method makes them remarkably fluent but fundamentally unreliable: optimized for plausible continuation rather than truth, they fill gaps with fabricated facts and shaky reasoning. Moreover, their outputs can vary, slightly or significantly, with changes in the user's prompt.
Their reliance on statistical patterns, supposedly their great strength, is also their Achilles' heel: they carry no structured representation of truth, making them prone to errors in complex or ambiguous scenarios.
They do not "understand" the truth content of their outputs. As Lenat and Marcus emphasize, LLMs are not grounded in a consistent model of the world. They cannot distinguish facts from plausible fictions, leading to erratic behaviors that vary by prompt, temperature setting, or even punctuation.
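To see concretely why outputs can shift with the temperature setting, consider a minimal Python sketch of temperature-scaled sampling over a toy next-token distribution. The prompt, the candidate tokens, and the logit values are all invented for illustration; real models sample over vocabularies of tens of thousands of tokens, but the mechanism is the same, and it is indifferent to whether the most plausible continuation is true.

```python
import math
import random

def sample_next_token(logits: dict, temperature: float, seed: int) -> str:
    """Sample one next token from a temperature-scaled softmax (toy illustration)."""
    random.seed(seed)
    scaled = {tok: score / temperature for tok, score in logits.items()}
    max_score = max(scaled.values())
    weights = {tok: math.exp(s - max_score) for tok, s in scaled.items()}  # numerically stable softmax
    threshold = random.random() * sum(weights.values())
    cumulative = 0.0
    for tok, weight in weights.items():
        cumulative += weight
        if threshold <= cumulative:
            return tok
    return tok  # floating-point fallback

# Invented logits for the continuation of "The capital of Australia is ..."
toy_logits = {"Canberra": 2.0, "Sydney": 1.6, "Melbourne": 0.9}

for temperature in (0.2, 1.0, 1.8):
    samples = [sample_next_token(toy_logits, temperature, seed) for seed in range(5)]
    print(temperature, samples)  # low temperature: near-deterministic; high: more varied
```

At low temperature the sampler almost always picks the single most likely token; at higher temperatures, less likely and possibly false continuations surface more often.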
The Trustworthiness Gap
Trustworthy AI must get the content right, which requires qualities such as consistency, transparency, and the ability to reason robustly. The authors of 'Getting From Generative AI to Trustworthy AI: What LLMs Might Learn From Cyc' argue that for AI to be trustworthy, it must meet sixteen essential criteria, ranging from auditability and explainability to commonsense reasoning and ethical alignment. Most LLMs fall short of these attributes: their inner workings are opaque, they lack persistent memory, and their reasoning processes are not easily interpretable or consistent. These gaps pose serious risks when deploying AI in critical areas like medicine, finance, or policy-making.
A Knowledge-Based Alternative
To overcome LLMs’ weaknesses, the authors advocate for systems that use structured knowledge bases—collections of facts and rules about the world. Unlike LLMs’ pattern-based learning, their approach aims to ensure outputs are grounded in verifiable information, enhancing accuracy and consistency.
Why Cyc Still Matters
For nearly forty years, the Cyc project has aimed to encode broad, commonsense knowledge in a formal, logical language. While its symbolic approach has often been overshadowed by data-driven deep learning, Lenat and Marcus argue that Cyc offers essential lessons. Unlike LLMs, Cyc supports traceable inference chains, uses structured representations, and can explicitly reason about cause and effect. These capabilities are vital for AI systems meant to be auditable and logically robust.
Logical Inference Over Pattern Matching
The heart of their proposal lies in combining large-scale symbolic reasoning with language capabilities. Rather than generating output based on surface-level pattern recognition, an AI system grounded in symbolic logic can apply deductive and inductive reasoning, assess conflicting evidence, and adapt to new contexts in a principled way. This allows for far greater consistency, especially in domains that require chains of logic or mathematical rigor.
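As a rough illustration of what reasoning over explicit knowledge means in practice, here is a minimal forward-chaining deduction sketch in Python. It is emphatically not Cyc's inference engine, which works over CycL, a far more expressive logic, with many specialized reasoning modules; the point is only that every derived conclusion follows mechanically and reproducibly from stated facts and rules.

```python
# Minimal forward-chaining deduction over explicit facts and rules.
# Illustrative only: Cyc's real engine reasons over CycL, a much richer logic.

facts = {("Socrates", "is_a", "human")}

# Each rule: if (?x, predicate, object) holds, then conclude the consequent for ?x.
rules = [
    (("?x", "is_a", "human"), ("?x", "is_a", "mortal")),
    (("?x", "is_a", "mortal"), ("?x", "has_property", "eventually_dies")),
]

def forward_chain(facts, rules):
    """Apply rules until no new facts can be derived (a fixpoint)."""
    derived = set(facts)
    changed = True
    while changed:
        changed = False
        for (_, cond_pred, cond_obj), (_, conc_pred, conc_obj) in rules:
            for (subj, pred, obj) in list(derived):
                if pred == cond_pred and obj == cond_obj:   # rule fires with ?x bound to subj
                    new_fact = (subj, conc_pred, conc_obj)
                    if new_fact not in derived:
                        derived.add(new_fact)
                        changed = True
    return derived

for fact in sorted(forward_chain(facts, rules)):
    print(fact)
# ('Socrates', 'has_property', 'eventually_dies')
# ('Socrates', 'is_a', 'human')
# ('Socrates', 'is_a', 'mortal')
```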
Structured Knowledge Representation
A central tenet of the paper is that knowledge should not just be latent within a neural network but explicitly represented. Cyc, for instance, stores facts in logic-based triples (subject-predicate-object) and uses a rich ontology to model relationships between concepts. This structure enables the system to validate inferences, identify inconsistencies, and explain its conclusions—capabilities largely absent from today’s LLMs.
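A toy triple store helps make this concrete. The sketch below is written under the simplifying assumption that facts are bare subject-predicate-object triples (Cyc's actual representation, CycL, is a much richer higher-order logic); it shows the kind of inconsistency check that an explicit representation makes possible and a purely latent one does not.

```python
# Toy triple store with a disjointness check. The ontology constraint and the
# example facts are invented; Cyc's knowledge base uses CycL, not bare triples.

DISJOINT_TYPES = {("Bird", "Fish")}   # hypothetical constraint: nothing is both

class TripleStore:
    def __init__(self):
        self.triples = set()

    def assert_fact(self, subject, predicate, obj):
        self.triples.add((subject, predicate, obj))
        self._check_consistency(subject)

    def types_of(self, subject):
        return {o for (s, p, o) in self.triples if s == subject and p == "is_a"}

    def _check_consistency(self, subject):
        types = self.types_of(subject)
        for a, b in DISJOINT_TYPES:
            if a in types and b in types:
                raise ValueError(f"Inconsistent: {subject} cannot be both {a} and {b}")

kb = TripleStore()
kb.assert_fact("Tweety", "is_a", "Bird")        # accepted
try:
    kb.assert_fact("Tweety", "is_a", "Fish")    # contradicts the ontology
except ValueError as err:
    print(err)   # Inconsistent: Tweety cannot be both Bird and Fish
```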
Building a Path to Trustworthy AI
What Makes AI Trustworthy?
According to the authors, a trustworthy AI must do more than generate coherent text. It should:
- Apply valid, reproducible reasoning.
- Distinguish between fact and conjecture.
- Incorporate long-term memory of facts and context.
- Adapt behavior based on moral and ethical considerations.
- Offer explanations for its outputs that can be traced back to explicit inputs.
These features align closely with the capabilities built into Cyc and are largely absent in neural LLMs.
Bridging the Gap: A Hybrid Model
Lenat and Marcus suggest that the path forward lies in a hybrid system—one that marries the language fluency of LLMs with the structured reasoning of symbolic AI. In this model, an LLM might generate candidate answers, but a symbolic reasoner would vet them against known facts and logical constraints. Alternatively, the symbolic engine might generate queries or infer missing premises that the LLM can then elaborate. This division of labor could offer the best of both worlds: flexibility and reliability.
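The division of labor they describe can be sketched as a simple propose-then-verify loop. In the hypothetical Python sketch below, `llm_propose` and `KnowledgeBase` are invented stand-ins rather than any real API; what matters is the control flow, in which nothing the generator says is passed on until the symbolic side can vouch for it.

```python
# Hypothetical sketch of the hybrid "LLM proposes, symbolic reasoner vets" loop.
# `llm_propose` and `KnowledgeBase` are invented stand-ins, not a real API.

from dataclasses import dataclass

@dataclass
class Verdict:
    answer: str
    accepted: bool
    justification: str

class KnowledgeBase:
    """Toy symbolic checker holding explicitly vetted facts."""
    def __init__(self, facts):
        self.facts = set(facts)

    def verify(self, claim: str) -> bool:
        return claim in self.facts

def llm_propose(question: str) -> list:
    # Stand-in for an LLM call returning fluent candidate answers.
    return ["Canberra is the capital of Australia",
            "Sydney is the capital of Australia"]

def answer(question: str, kb: KnowledgeBase) -> Verdict:
    for candidate in llm_propose(question):
        if kb.verify(candidate):
            return Verdict(candidate, True, "entailed by the knowledge base")
    return Verdict("", False, "no candidate verified; defer to a human")

kb = KnowledgeBase({"Canberra is the capital of Australia"})
print(answer("What is the capital of Australia?", kb))
```

A real system would need far more than string matching, for instance logical entailment over the knowledge base, but the pattern of refusing unverifiable answers is the core of the proposal.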
Toward a New Evaluation Paradigm
The authors also critique the current evaluation standards for AI, which often rely on benchmark scores rather than deeper tests of reasoning and consistency. They call for new metrics that assess:
- Logical soundness
- Factual grounding
- Internal consistency
- Adherence to ethical norms
Such metrics would better reflect the goals of trustworthy AI and help identify systems suitable for critical deployments.
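As one hedged example of what such a metric could look like, the sketch below scores internal consistency by posing paraphrases of the same question and measuring how often the answers agree. The metric definition, the `ask` stub, and the toy model are assumptions made for illustration; the paper calls for metrics of this kind but does not prescribe this particular formula.

```python
# Hedged sketch of one possible internal-consistency metric. The formula,
# the `ask` stub, and the toy model are assumptions for illustration only.

from collections import Counter

def ask(model, question: str) -> str:
    return model(question)   # stand-in for querying whatever system is under test

def internal_consistency(model, paraphrases: list) -> float:
    answers = [ask(model, q) for q in paraphrases]
    most_common_count = Counter(answers).most_common(1)[0][1]
    return most_common_count / len(answers)   # 1.0 = perfectly consistent

# Toy model under test: its answer depends on surface wording.
toy_model = lambda q: "Canberra" if "capital" in q.lower() else "Sydney"
questions = ["What is the capital of Australia?",
             "Which city is Australia's capital?",
             "Australia's seat of government is which city?"]
print(internal_consistency(toy_model, questions))   # about 0.67: wording-sensitive
```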

Challenges and Future Directions
Scaling Symbolic Systems
A frequent criticism of symbolic AI is its limited scalability. Building and maintaining a knowledge base like Cyc is resource-intensive. However, the authors argue that new tools for automated knowledge extraction, combined with collaborative editing, could make this process more scalable. They also propose that LLMs could help identify gaps in knowledge bases by generating counterfactuals or proposing novel inferences.
Integrating Commonsense and Context
LLMs frequently fail at commonsense reasoning because they lack an embedded model of the world. A symbolic system, on the other hand, can encode everyday knowledge explicitly: "if a person drops a glass, it usually breaks." Integrating this kind of knowledge enables systems to make better inferences and avoid absurd mistakes.
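Such a rule can be written down as an explicit, defeasible default, one that holds unless an exception applies. The sketch below is a deliberately simplified illustration; the rule, the exceptions, and their names are invented here, and Cyc would express the same default in CycL rather than in Python.

```python
# Minimal sketch of a defeasible commonsense rule ("a dropped glass usually
# breaks") with explicit exceptions. Rule and exception names are invented.

def glass_breaks_when_dropped(context: dict) -> tuple:
    """Return the default conclusion plus the reason, keeping the inference explainable."""
    if context.get("surface") == "carpet":
        return False, "exception: soft landing surface"
    if context.get("material") == "plastic":
        return False, "exception: the 'glass' is shatterproof plastic"
    return True, "default: dropped glassware usually breaks"

print(glass_breaks_when_dropped({"surface": "tile"}))
print(glass_breaks_when_dropped({"surface": "carpet"}))
```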
Human-in-the-Loop Reasoning
Another important proposal is to keep humans involved in the reasoning process. Trustworthy AI should not be fully autonomous (at least while it is still in its pre-AGI stage); rather, it should support transparent collaboration with human users, offering explanations, asking clarifying questions, and adapting to user feedback.
Challenges
Keeping humans meaningfully engaged in generative AI's reasoning process is difficult: these systems operate at speeds and scales that outpace human oversight, while their opaque decision-making, including hallucinations that cannot be traced or explained, frustrates the transparent collaboration on which trustworthy AI depends.
Conclusion
The paper by Lenat and Marcus presents both a critique and a roadmap. While generative AI dazzles with linguistic fluency, it falters on the very qualities that matter in high-stakes domains: truth, consistency, and accountability. By returning to the principles of symbolic AI, particularly as demonstrated in the Cyc project, the authors offer a compelling vision of what trustworthy AI might look like.
Rather than treating symbolic reasoning and neural networks as incompatible paradigms, their work suggests that the future lies in integration. Trustworthy AI will not be built on probability alone but on systems that can explain, justify, and improve their reasoning over time. In this emerging era, fluency will be necessary—but trust will be earned through logic.
References
Lenat, Doug, and Gary Marcus. “Getting From Generative AI to Trustworthy AI: What LLMs Might Learn From Cyc.” arXiv.org, July 31, 2023. https://arxiv.org/abs/2308.04445.