Here’s why ChatGPT falls short. Ben Goertzel discusses the limitations of ChatGPT and what needs to happen to progress from there to an AGI.

Looking at the insanely popular Large Language Models (LLMs) like ChatGPT with an AGI researcher's eye, what I see are some potentially interesting components for future AGI systems.

Economically, I see the seed of a family of technologies that, over the next few years, is very likely to obsolete a majority of white-collar jobs, wreaking incredible disruption and ultimately moving us further toward a world in which people are liberated from the necessity to work for a living.

On the other hand, looking at them from an end user's view, what I see are tools that are already pretty useful for some things – but are marred by a few really major and frustrating flaws.

The first major flaw is a bizarre lack of reality discrimination – LLMs are so full of shit it’s almost unbelievable, given how intelligent they are in some ways.

The second is a mind-numbing boringness and banality – such powerful cliché machines have never previously even been imagined.

I do believe, though, both of these issues can be overcome with a moderate amount of R&D effort. Which is probably going to be put in by various parties during the next couple years.

To fully overcome these weaknesses will require a true breakthrough to Artificial General Intelligence. But I suspect they can be remedied to a significant degree even without AGI, as a parallel track of development.

Generative AI vs. AGI

As an AGI guy, the first thing I need to clarify when talking about LLMs is that they are certainly not AGIs. They achieve a fair degree of generality in their intelligence, but it’s not because they generalize beyond their training and programming – it’s because their training data is so huge it covers almost everything. A tremendous variety of queries can be answered via minor variations and combinations on things in the training data.

A generally intelligent artificial mind will have lots of uses for statistically recognizing patterns in huge datasets, and synthesizing new forms via merging these patterns together in a manner appropriate to some narrow context – the things that LLMs are good at. But I don't think these capabilities can form the core of an AGI mind. That needs much more abstract thinking, bound up with agency, will, self-reflection, modeling of self and other. If AGI systems have these core functions fulfilled (by the OpenCog Hyperon AGI architecture my colleagues and I are working on, or something else), I think LLMs could add a lot of wonderful peripheral functions.

Toward LLMs with Reality Discrimination

LLMs trained to produce text currently have serious problems with truth-telling – they basically can't distinguish truth from falsehood. I've gathered a few funny or instructive examples in a blog post on ChatGPT and AGI. There are many others easy to dig up online. I have a particular liking for the conversation where ChatGPT calls me the founder of OpenAI, the company that created it (in which I have no direct role).

A variety of approaches are being pursued to overcome this problem and make LLMs less full of shit, and I think a bunch of them will probably succeed to some degree. As an example, we at SingularityNET and TrueAGI are trying to address this with a neural-symbolic architecture: LLMs are hybridized with an OpenCog-based symbolic reasoning engine.

This symbolic reasoning engine can extract logical relationships from an LLM, and put the LLM’s knowledge in a more structured form. It then compares the relationships between concepts with each other to identify inconsistencies. Where possible, it also compares these relationships with existing background knowledge – common-sense truths about the world like ‘water is wet’ or ‘Ben Goertzel is not the founder of OpenAI’. Inconsistencies can then be used as negative evidence to fine-tune the model – the model can be explicitly trained to unlearn the lies it contains.

I suspect these hybrid symbolic-LLM structures can greatly increase the generality and truthfulness of things like ChatGPT, even without taking us all the way to AGI.

The Banality of LLMs

But LLMs also have other problems besides their rampant BS generation – like the fact that so much of what they produce is boring clichés. Eric Hoel has called this "the banality of ChatGPT".

As a so-bad-it's-good example of this banality, consider the following excerpt from a recent dialogue I had with the system:

BEN: How would you advise to manage and coordinate the project of creating a replica of the planet Jupiter with 1000x human intelligence?

ChatGPT: For managing and coordinating a project of creating a replica of Jupiter with 1000x human intelligence, I would suggest the following steps:

Define the project scope: Clearly define the goals, objectives, and deliverables of the project.
Assemble the team: Recruit individuals with relevant skills and experience to form a diverse and effective team.
Develop a project plan: Determine the tasks, resources, timelines, and budgets required to complete the project.
Set up project management systems: Establish tools and processes to track project progress, communicate with team members, and manage risks.
Monitor progress and adjust as needed: Continuously assess the project's status and make changes as needed to ensure it stays on track.
Ensure communication and collaboration: Encourage regular communication among team members and stakeholders to ensure everyone is on the same page.
Anticipate and manage risks: Identify and plan for potential risks and obstacles to minimize their impact on the project.
Celebrate success: Recognize and celebrate milestones and achievements along the way to keep team morale high.

Remember, creating a replica of Jupiter with 1000x human intelligence is a complex and challenging project, so it's important to be organized, communicate effectively, and continuously adjust as needed to ensure success.

Not exactly wrong. But, shall we say, not really grappling with the particulars of the question…

Sometimes boring clichés are exactly what you want in life. If you're writing a cover letter for a job application, or an essay for university English class... originality and quirkiness can sometimes be dangerous.

But one worries about a near-future in which boring clichés become even more common than they are now, because they're so quick and cheap to generate using AI models. A boring dystopia.

The Glory and Horror of Music LLMs

Google's recently announced MusicLM neural model provides an intriguing case study of the intersection between "fascinatingly impressive" and "horrifyingly boring."

You can give it a query like –

We can hear a choir, singing a Gregorian chant, and a drum machine, creating a rhythmic beat. The slow, stately sounds of strings provide a calming backdrop for the fast, complex sounds of futuristic electronic music.

– and it will generate music that fits the bill. Amazing stuff.

Except the effect is a bit like having a workaday lounge band improvise a musical passage for you. It's very rarely artistically thrilling.

Given how impressive the functionality is, you might say this is a pretty lame complaint.

However, if such technology was used to generate music for people to listen to, the result would be an even more cliché-ridden and repetitious music sphere than record execs have already inflicted on us! Dentist's office muzak++ forever!

The problem here is that averaging together everybody's art produces art that is itself average. For some commercial purposes – e.g. background music for ads or video games – average, passable, competent music may be fine.

As a lifelong composer and improviser, I've generally been more interested in creating sounds that diverge from the average and the expectation – even if they get a little jarring or discordant in the process.

Of course, current neural models can be jarring and discordant too – but they will tend to do it in a way quite similar to something from their training dataset, or combining surface-level features of a few things in their training datasets.

Music is the domain in which I've thought most about how to overcome the banality of LLM output – because as a musician, I would really love to have an AI musician to improvise with. We already have a robot singer in our Jam Galaxy Band, and some AI-composed riffs, but a real-time AI improviser jamming alongside me is what I dream of. I don't want boring lowest-common-denominator MusicLM-style in my band, not at all...

One approach that can be taken here is to formally introduce a theory of ‘interestingness’ – make a mathematical model of what constitutes interesting music, and then condition a MusicLM-type model to bias it to produce outputs meeting this interestingness criterion. This is not that far off from work I did in the 1990s using genetic algorithms to evolve music maximizing a ‘fitness function’ encoding a theory of musical interestingness. But LLMs allow the evolved music to incorporate patterns of various sorts from human music in a much more refined way than was possible back then.

LLMs vs. Hmmmmmm

Of course, this would still be a very different thing from how an AGI system would approach music.

AGI and music could intersect in a variety of ways, but one way or another, it would involve an AGI system creating and understanding music in the context of its experience of being an agent in the world, like when the AGI in the 2013 film Her says, “I'm trying to write a piece of music that's about what it feels like to be on the beach with you right now.”

Steven Mithen’s book The Singing Neanderthals presents an hypothesis about the origin of language and music. He posits that human communication began with a communication system he refers to as “Hmmmmm” because it had the following characteristics: it was Holistic, manipulative, multi-modal, musical and mimetic. Basically Hmmmmm combined sound and gesture and action and imitation – somewhat like the pre-verbal/semi-verbal communication one sees in one-year-old children, but with more adult-level cognitive sophistication underlying. His proposal is that Hmmmmm came first and then spawned both language and music, which evolved from Hmmmmm in their own different directions.

Cambridge Archeological Journal did a fascinating feature presenting various criticisms on the hypothesis along with Mithen’s responses.

An interesting and fairly difficult challenge would be to coax AI agents living in a virtual world – let's say Minecraft enhanced with plugins, or the in-process Sophiaverse virtual world – to invent language and music along the lines of the Hmmmmm theory. This could be an interesting and valuable thing for AGI researchers to do regardless of how fully accurate Mithen’s theory of evolution is.

We could stock the virtual world with a few easy-to-use musical instruments, let's say –

drums that make rhythms when hit
flutes that they breathe into (modulating volume and timbre with breath) while pushing buttons to make notes
Piano-type instruments that make notes when their keys are hit

One would then ultimately want these virtual-world proto-AGI agents – I like to think of them as "Neoterics" (new people) - to do things like:

Discover that dancing to music is pleasurable and creates a feeling of togetherness which fosters collective action and communication
Discover that drumming enthuses a group to carry out physical tasks together
Discover that listening to melodic music puts the mind in a state conducive to creativity

Given that the Neoterics' emotion models will be similar to ours, yet different in the particulars, it may be that the music they create to express their own emotions and influence each others’ emotions will be significantly different from human music. Perhaps one could then train music LLMs on music made by Neoterics and get a fascinating sort of hybrid – a truly new genre of music!

Whether or not this Neoterics experiment ever gets done in precisely this form, it does highlight the big difference between an LLM approach and an AGI approach – to music or anything else. LLMs are munging and merging data patterns, and with cleverness one can work around the most immediate issues that emerge from this approach, issues such as tendencies to hallucinate or converge on clichés. AGI, however, requires a totally different approach.

Narrow AI systems like LLMs may be useful for feeding patterns into the core cognition of AGI systems, or for helping them speak fluently in the lingo of a given domain. But at core, AGI systems will necessarily be very different from LLMs – they will be what Weaver has called Open-Ended Intelligences – complex self-organizing systems that engage richly with their environments, driven by complementary-and-contradictory drives to individuation and self-transcendence. AGI systems will achieve generalization via abstraction, and the need to balance individuation and self-transcendence, while working with limited resources, will drive them to intelligent abstract understanding. When they generate language or make music, they will build it on this abstract understanding, formed from their experience, and as a result will be imaginative and truth-telling naturally – organically – rather than with an additional trick.

LLMs and other deep neural nets are going to have a big impact on society, disrupting the business models of today's tech giants and potentially eliminating a significant percentage of human jobs. But their biggest contribution ultimately may be waking up the world to the potential of AI and thus attracting more cognitive, financial and cultural resources toward the development of AGI, which will have a far larger direct impact than even the coolest narrow AI systems.

Large Language Models – From Banality to Originality

Generative AI vs. AGI

Toward LLMs with Reality Discrimination

The Banality of LLMs

The Glory and Horror of Music LLMs

LLMs vs. Hmmmmmm

Related Articles

Comments on this article