While today’s large language models (LLMs) dazzle with fluent prose and clever mimicry, they remain unreliable in domains that demand precise, consistent, and explainable reasoning. Generative AI captivates, but it falters on reliability.
Introduction
It is true that LLMs have mastered the major languages of the developed world, such as English and Spanish, while critically lagging in languages from less developed regions. Undoubtedly, they have revolutionized how machines generate human-like text. Yet limitations in accuracy and reliability hinder their application in high-stakes domains, even when the language of communication is one of those well-supported languages.
Despite their fluency, LLMs often fabricate facts (the phenomenon widely known as hallucination) and produce inconsistent results, largely because they are trained to mimic language patterns rather than to reason over structured knowledge.
This article explores the flaws of current generative models, the principles for building dependable AI, and the transformative potential of knowledge-driven systems, drawing on the research paper 'Getting From Generative AI to Trustworthy AI: What LLMs Might Learn From Cyc'. In this paper, Doug Lenat and Gary Marcus propose a new approach: integrating explicit symbolic knowledge and logic into AI systems to build trustworthy, transparent, and consistent models. Drawing on insights from the Cyc project, a decades-long effort to encode human knowledge in symbolic form, they outline a roadmap for developing AI systems that don't just sound right but are right.
The Limits of Generative AI
Plausibility Over Truth
Current LLMs, such as GPT and Claude, operate by predicting the next most likely word based on vast amounts of text data. This method makes them remarkably fluent but fundamentally unreliable: optimized for plausible continuation rather than truth, they fill gaps with fabricated facts and shaky reasoning. Moreover, their outputs can vary, slightly or significantly, with changes in the user's prompt.
Their reliance on statistical patterns, supposedly their great strength, is also their Achilles' heel: they carry no structured representation of truth, making them prone to errors in complex or ambiguous scenarios.
They do not "understand" the truth content of their outputs. As Lenat and Marcus emphasize, LLMs are not grounded in a consistent model of the world. They cannot distinguish facts from plausible fictions, leading to erratic behaviors that vary by prompt, temperature setting, or even punctuation.
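To see concretely why outputs can shift with the temperature setting, consider a minimal Python sketch of temperature-scaled sampling over a toy next-token distribution. The prompt, the candidate tokens, and the logit values are all invented for illustration; real models sample over vocabularies of tens of thousands of tokens, but the mechanism is the same, and it is indifferent to whether the most plausible continuation is true.

```python
import math
import random

def sample_next_token(logits: dict, temperature: float, seed: int) -> str:
    """Sample one next token from a temperature-scaled softmax (toy illustration)."""
    random.seed(seed)
    scaled = {tok: score / temperature for tok, score in logits.items()}
    max_score = max(scaled.values())
    weights = {tok: math.exp(s - max_score) for tok, s in scaled.items()}  # numerically stable softmax
    threshold = random.random() * sum(weights.values())
    cumulative = 0.0
    for tok, weight in weights.items():
        cumulative += weight
        if threshold <= cumulative:
            return tok
    return tok  # floating-point fallback

# Invented logits for the continuation of "The capital of Australia is ..."
toy_logits = {"Canberra": 2.0, "Sydney": 1.6, "Melbourne": 0.9}

for temperature in (0.2, 1.0, 1.8):
    samples = [sample_next_token(toy_logits, temperature, seed) for seed in range(5)]
    print(temperature, samples)  # low temperature: near-deterministic; high: more varied
```

At low temperature the sampler almost always picks the single most likely token; at higher temperatures, less likely and possibly false continuations surface more often.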
The Trustworthiness Gap
Trustworthy AI must get the content right, which requires qualities such as consistency, transparency, and the ability to reason robustly. The authors of 'Getting From Generative AI to Trustworthy AI: What LLMs Might Learn From Cyc' argue that for AI to be trustworthy, it must meet sixteen essential criteria, ranging from auditability and explainability to commonsense reasoning and ethical alignment. Most LLMs fall short of these attributes: their inner workings are opaque, they lack persistent memory, and their reasoning processes are not easily interpretable or consistent. These gaps pose serious risks when deploying AI in critical areas like medicine, finance, or policy-making.
A Knowledge-Based Alternative
To overcome LLMs’ weaknesses, the authors advocate for systems that use structured knowledge bases—collections of facts and rules about the world. Unlike LLMs’ pattern-based learning, their approach aims to ensure outputs are grounded in verifiable information, enhancing accuracy and consistency.
Why Cyc Still Matters
For nearly forty years, the Cyc project has aimed to encode broad, commonsense knowledge in a formal, logical language. While its symbolic approach has often been overshadowed by data-driven deep learning, Lenat and Marcus argue that Cyc offers essential lessons. Unlike LLMs, Cyc supports traceable inference chains, uses structured representations, and can explicitly reason about cause and effect. These capabilities are vital for AI systems meant to be auditable and logically robust.
Logical Inference Over Pattern Matching
The heart of their proposal lies in combining large-scale symbolic reasoning with language capabilities. Rather than generating output based on surface-level pattern recognition, an AI system grounded in symbolic logic can apply deductive and inductive reasoning, assess conflicting evidence, and adapt to new contexts in a principled way. This allows for far greater consistency, especially in domains that require chains of logic or mathematical rigor.
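As a rough illustration of what reasoning over explicit knowledge means in practice, here is a minimal forward-chaining deduction sketch in Python. It is emphatically not Cyc's inference engine, which works over CycL, a far more expressive logic, with many specialized reasoning modules; the point is only that every derived conclusion follows mechanically and reproducibly from stated facts and rules.

```python
# Minimal forward-chaining deduction over explicit facts and rules.
# Illustrative only: Cyc's real engine reasons over CycL, a much richer logic.

facts = {("Socrates", "is_a", "human")}

# Each rule: if (?x, predicate, object) holds, then conclude the consequent for ?x.
rules = [
    (("?x", "is_a", "human"), ("?x", "is_a", "mortal")),
    (("?x", "is_a", "mortal"), ("?x", "has_property", "eventually_dies")),
]

def forward_chain(facts, rules):
    """Apply rules until no new facts can be derived (a fixpoint)."""
    derived = set(facts)
    changed = True
    while changed:
        changed = False
        for (_, cond_pred, cond_obj), (_, conc_pred, conc_obj) in rules:
            for (subj, pred, obj) in list(derived):
                if pred == cond_pred and obj == cond_obj:   # rule fires with ?x bound to subj
                    new_fact = (subj, conc_pred, conc_obj)
                    if new_fact not in derived:
                        derived.add(new_fact)
                        changed = True
    return derived

for fact in sorted(forward_chain(facts, rules)):
    print(fact)
# ('Socrates', 'has_property', 'eventually_dies')
# ('Socrates', 'is_a', 'human')
# ('Socrates', 'is_a', 'mortal')
```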
Structured Knowledge Representation
A central tenet of the paper is that knowledge should not just be latent within a neural network but explicitly represented. Cyc, for instance, stores facts in logic-based triples (subject-predicate-object) and uses a rich ontology to model relationships between concepts. This structure enables the system to validate inferences, identify inconsistencies, and explain its conclusions—capabilities largely absent from today’s LLMs.
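A toy triple store helps make this concrete. The sketch below is written under the simplifying assumption that facts are bare subject-predicate-object triples (Cyc's actual representation, CycL, is a much richer higher-order logic); it shows the kind of inconsistency check that an explicit representation makes possible and a purely latent one does not.

```python
# Toy triple store with a disjointness check. The ontology constraint and the
# example facts are invented; Cyc's knowledge base uses CycL, not bare triples.

DISJOINT_TYPES = {("Bird", "Fish")}   # hypothetical constraint: nothing is both

class TripleStore:
    def __init__(self):
        self.triples = set()

    def assert_fact(self, subject, predicate, obj):
        self.triples.add((subject, predicate, obj))
        self._check_consistency(subject)

    def types_of(self, subject):
        return {o for (s, p, o) in self.triples if s == subject and p == "is_a"}

    def _check_consistency(self, subject):
        types = self.types_of(subject)
        for a, b in DISJOINT_TYPES:
            if a in types and b in types:
                raise ValueError(f"Inconsistent: {subject} cannot be both {a} and {b}")

kb = TripleStore()
kb.assert_fact("Tweety", "is_a", "Bird")        # accepted
try:
    kb.assert_fact("Tweety", "is_a", "Fish")    # contradicts the ontology
except ValueError as err:
    print(err)   # Inconsistent: Tweety cannot be both Bird and Fish
```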
Building a Path to Trustworthy AI
What Makes AI Trustworthy?
According to the authors, a trustworthy AI must do more than generate coherent text. It should:
- Apply valid, reproducible reasoning.
- Distinguish between fact and conjecture.
- Incorporate long-term memory of facts and context.
- Adapt behavior based on moral and ethical considerations.
- Offer explanations for its outputs that can be traced back to explicit inputs.
These features align closely with the capabilities built into Cyc and are largely absent in neural LLMs.
Bridging the Gap: A Hybrid Model
Lenat and Marcus suggest that the path forward lies in a hybrid system—one that marries the language fluency of LLMs with the structured reasoning of symbolic AI. In this model, an LLM might generate candidate answers, but a symbolic reasoner would vet them against known facts and logical constraints. Alternatively, the symbolic engine might generate queries or infer missing premises that the LLM can then elaborate. This division of labor could offer the best of both worlds: flexibility and reliability.
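The division of labor they describe can be sketched as a simple propose-then-verify loop. In the hypothetical Python sketch below, `llm_propose` and `KnowledgeBase` are invented stand-ins rather than any real API; what matters is the control flow, in which nothing the generator says is passed on until the symbolic side can vouch for it.

```python
# Hypothetical sketch of the hybrid "LLM proposes, symbolic reasoner vets" loop.
# `llm_propose` and `KnowledgeBase` are invented stand-ins, not a real API.

from dataclasses import dataclass

@dataclass
class Verdict:
    answer: str
    accepted: bool
    justification: str

class KnowledgeBase:
    """Toy symbolic checker holding explicitly vetted facts."""
    def __init__(self, facts):
        self.facts = set(facts)

    def verify(self, claim: str) -> bool:
        return claim in self.facts

def llm_propose(question: str) -> list:
    # Stand-in for an LLM call returning fluent candidate answers.
    return ["Canberra is the capital of Australia",
            "Sydney is the capital of Australia"]

def answer(question: str, kb: KnowledgeBase) -> Verdict:
    for candidate in llm_propose(question):
        if kb.verify(candidate):
            return Verdict(candidate, True, "entailed by the knowledge base")
    return Verdict("", False, "no candidate verified; defer to a human")

kb = KnowledgeBase({"Canberra is the capital of Australia"})
print(answer("What is the capital of Australia?", kb))
```

A real system would need far more than string matching, for instance logical entailment over the knowledge base, but the pattern of refusing unverifiable answers is the core of the proposal.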
Toward a New Evaluation Paradigm
The authors also critique the current evaluation standards for AI, which often rely on benchmark scores rather than deeper tests of reasoning and consistency. They call for new metrics that assess:
- Logical soundness
- Factual grounding
- Internal consistency
- Adherence to ethical norms
Such metrics would better reflect the goals of trustworthy AI and help identify systems suitable for critical deployments.
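As one hedged example of what such a metric could look like, the sketch below scores internal consistency by posing paraphrases of the same question and measuring how often the answers agree. The metric definition, the `ask` stub, and the toy model are assumptions made for illustration; the paper calls for metrics of this kind but does not prescribe this particular formula.

```python
# Hedged sketch of one possible internal-consistency metric. The formula,
# the `ask` stub, and the toy model are assumptions for illustration only.

from collections import Counter

def ask(model, question: str) -> str:
    return model(question)   # stand-in for querying whatever system is under test

def internal_consistency(model, paraphrases: list) -> float:
    answers = [ask(model, q) for q in paraphrases]
    most_common_count = Counter(answers).most_common(1)[0][1]
    return most_common_count / len(answers)   # 1.0 = perfectly consistent

# Toy model under test: its answer depends on surface wording.
toy_model = lambda q: "Canberra" if "capital" in q.lower() else "Sydney"
questions = ["What is the capital of Australia?",
             "Which city is Australia's capital?",
             "Australia's seat of government is which city?"]
print(internal_consistency(toy_model, questions))   # about 0.67: wording-sensitive
```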

Challenges and Future Directions
Scaling Symbolic Systems
A frequent criticism of symbolic AI is its limited scalability. Building and maintaining a knowledge base like Cyc is resource-intensive. However, the authors argue that new tools for automated knowledge extraction, combined with collaborative editing, could make this process more scalable. They also propose that LLMs could help identify gaps in knowledge bases by generating counterfactuals or proposing novel inferences.
Integrating Commonsense and Context
LLMs frequently fail at commonsense reasoning because they lack an embedded model of the world. A symbolic system, on the other hand, can encode everyday knowledge explicitly: "if a person drops a glass, it usually breaks." Integrating this kind of knowledge enables systems to make better inferences and avoid absurd mistakes.
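Such a rule can be written down as an explicit, defeasible default, one that holds unless an exception applies. The sketch below is a deliberately simplified illustration; the rule, the exceptions, and their names are invented here, and Cyc would express the same default in CycL rather than in Python.

```python
# Minimal sketch of a defeasible commonsense rule ("a dropped glass usually
# breaks") with explicit exceptions. Rule and exception names are invented.

def glass_breaks_when_dropped(context: dict) -> tuple:
    """Return the default conclusion plus the reason, keeping the inference explainable."""
    if context.get("surface") == "carpet":
        return False, "exception: soft landing surface"
    if context.get("material") == "plastic":
        return False, "exception: the 'glass' is shatterproof plastic"
    return True, "default: dropped glassware usually breaks"

print(glass_breaks_when_dropped({"surface": "tile"}))
print(glass_breaks_when_dropped({"surface": "carpet"}))
```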
Human-in-the-Loop Reasoning
Another important proposal is to keep humans involved in the reasoning process. Trustworthy AI should not be fully autonomous (at least while it is still in its pre-AGI stage); rather, it should support transparent collaboration with human users, offering explanations, asking clarifying questions, and adapting to user feedback.
Challenges
Keeping humans meaningfully engaged in generative AI's reasoning process is difficult: these systems operate at speeds and scales that outpace human oversight, while their opaque decision-making, including hallucinations that cannot be traced or explained, frustrates the transparent collaboration on which trustworthy AI depends.
Conclusion
The paper by Lenat and Marcus presents both a critique and a roadmap. While generative AI dazzles with linguistic fluency, it falters on the very qualities that matter in high-stakes domains: truth, consistency, and accountability. By returning to the principles of symbolic AI, particularly as demonstrated in the Cyc project, the authors offer a compelling vision of what trustworthy AI might look like.
Rather than treating symbolic reasoning and neural networks as incompatible paradigms, their work suggests that the future lies in integration. Trustworthy AI will not be built on probability alone but on systems that can explain, justify, and improve their reasoning over time. In this emerging era, fluency will be necessary—but trust will be earned through logic.
References
Lenat, Doug, and Gary Marcus. “Getting From Generative AI to Trustworthy AI: What LLMs Might Learn From Cyc.” arXiv.org, July 31, 2023. https://arxiv.org/abs/2308.04445.