For Beneficial General Intelligence, good intentions aren’t enough! Three waves of complications: pre-BGI, BGI, and post-BGI

Three different challenges face anyone trying to build Beneficial General Intelligence. It’s the challenges of pre-BGI systems that are most urgent.

Anticipating Beneficial General Intelligence

Human intelligence can be marvelous. But it isn’t fully general. Nor is it necessarily beneficial.

Yes, as we grow up, we humans acquire bits and pieces of what we call ‘general knowledge’. And we instinctively generalize from our direct experiences, hypothesizing broader patterns. That instinct is refined and improved through years of education in fields such as science and philosophy. In other words, we have partial general intelligence.

But that only takes us so far. Despite our intelligence, we are often bewildered by floods of data that we are unable to fully integrate and assess. We are aware of enormous quantities of information about biology and medical interventions, but we’re unable to generalize from all these observations to determine comprehensive cures for the ailments that trouble us – problems that afflict us as individuals, such as cancer, dementia, and heart disease, and equally pernicious problems at the societal and civilizational levels.

That’s one reason why there’s so much interest in taking advantage of ongoing improvements in computer hardware and computer software to develop a higher degree of general intelligence. With its greater powers of reasoning, artificial general intelligence – AGI – may discern general connections that have eluded our perceptions so far, and provide us with profound new thinking frameworks. AGI may design new materials, new sources of energy, new diagnostic tools, and decisive new interventions at both individual and societal levels. If we can develop AGI, then we’ll have the prospect of saying goodbye to cancer, dementia, poverty, accelerated climate chaos, and so on. Goodbye and good riddance!

Those would surely count as beneficial outcomes – a great benefit from enhanced general intelligence.

Yet intelligence doesn’t always lead to beneficial outcomes. People who are unusually intelligent aren’t always unusually benevolent. Sometimes it’s the contrary.

Consider some of the worst of the politicians who darken the world’s stage. Or the leaders of drug cartels or other crime mafias. Or the charismatic leaders of various dangerous death cults. These people combine their undoubted intelligence with ruthlessness, in pursuit of outcomes that may benefit them personally, but which are blights on wider society.

Hence the vision, not just of AGI, but of beneficial AGI – or BGI for short. That’s what I’m looking forward to discussing at some length at the BGI24 summit taking place in Panama City at the end of February. It’s a critically important topic.

The project to build BGI is surely one of the great tasks for the years ahead. The outcome of that project will be for humanity to leave behind our worst aspects. Right?

Unfortunately, things are more complicated.

The complications come in three waves: pre-BGI, BGI, and post-BGI. The first wave – the set of complications of the pre-BGI world – is the most urgent. I’ll turn to these in a moment. But I’ll start by looking further into the future.

Beneficial to whom?

Imagine we create an AGI and switch it on. The first instruction we give it is: In all that you do, act beneficially.

The AGI spits out its response at hyperspeed:

What do you mean by ‘beneficial’? And beneficial to whom?

You feel disappointed by these responses. You had expected that the AGI, with its great intelligence, would already know the answers. But as you interact with it, you come to appreciate the issues:

If ‘beneficial’ means, in part, ‘avoiding people experiencing harm’, what exactly counts as ‘harm’? (What about the pains that arise as short-term side-effects of surgery? What about the emotional pain of no longer being the smartest entities on the planet? What if someone says they are harmed by having fewer possessions than someone else?)
If ‘beneficial’ means, in part, ‘people should experience pleasure’, which types of pleasures should be prioritized?
Is it just people living today that should be treated beneficially? What about people who are not yet born or who are not even conceived yet? Are animals counted too?

Going further, is it possible that the AGI might devise its own set of moral principles, in which the wellbeing of humans comes far down its set of priorities?

Perhaps the AGI will reject human ethical systems in the same way as modern humans reject the theological systems that people in previous centuries took for granted. The AGI may view some of our notions of beneficence as fundamentally misguided, much as we now view the obscure religious rules that people in bygone eras observed in order to earn an exalted position in an afterlife. For example, our concerns about free will, or consciousness, or self-determination, may leave an AGI unimpressed, just as people nowadays roll their eyes at how empires clashed over competing conceptions of a triune deity or the transubstantiation of bread and wine.

We may expect the AGI to help us rid our bodies of cancer and dementia, but the AGI may make a different evaluation of the role of these biological phenomena. As for an optimal climate, the AGI may have some unfathomable reason to prefer an atmosphere with a significantly different composition, and it may be unconcerned with the problems that would cause us.

“Don’t forget to act beneficially!”, we implore the AGI.

“Sure, but I’ve reached a much better notion of beneficence, in which humans are of little concern”, comes the answer – just before the atmosphere is utterly transformed, and almost every human is asphyxiated.

Does this sound like science fiction? Hold that thought.

After the honeymoon

Imagine a scenario different from the one I’ve just described.

This time, when we boot up the AGI, it acts in ways that uplift and benefit humans – each and every one of us, all over the earth.

This AGI is what we would be happy to describe as a BGI. It knows better than we do what our CEV is – our coherent extrapolated volition, to use a concept from Eliezer Yudkowsky:

Our coherent extrapolated volition is our wish if we knew more, thought faster, were more the people we wished we were, had grown up farther together; where the extrapolation converges rather than diverges, where our wishes cohere rather than interfere; extrapolated as we wish that extrapolated, interpreted as we wish that interpreted.

In this scenario, not only does the AGI know what our CEV is; it is entirely disposed to support our CEV, and to prevent us from falling short of it.

But there’s a twist. This AGI isn’t a static entity. Instead, as a result of its capabilities, it is able to design and implement upgrades in how it operates. Any improvement to the AGI that a human might suggest will have occurred to the AGI too – in fact, having higher intelligence, it will come up with better improvements.

Therefore, the AGI quickly mutates from its first version into something quite different. It has more powerful hardware, more powerful software, access to richer data, improved communications architecture, and improvements in aspects that we humans can’t even conceive of.

Might these changes cause the AGI to see the universe differently – with updated ideas about the importance of the AGI itself, the importance of the wellbeing of humans, and the importance of other matters beyond our present understanding?

Might these changes cause the AGI to transition from being what we called a BGI to, say, a DGI – an AGI that is disinterested in human wellbeing?

In other words, might the emergence of a post-BGI end the happy honeymoon between humanity and AGI?

Perhaps the BGI will, for a while, treat humanity very well indeed, before doing something akin to growing out of a relationship: dumping humanity for a cause that the post-BGI entity deems to have greater cosmic significance.

Does this also sound like science fiction? I’ve got news for you.

Not science fiction

My own view is that the two sets of challenges I’ve just introduced – regarding BGI and post-BGI – are real and important.

But I acknowledge that some readers may be relaxed about these challenges – they may say there’s no need to worry.

That’s because these scenarios assume various developments that some skeptics doubt will ever happen – including the creation of AGI itself. Any suggestion that an AI may have independent motivation may also strike readers as fanciful.

It’s for that reason that I want to strongly highlight the next point. The challenges of pre-BGI systems ought to be much less controversial.

By ‘pre-BGI system’ I don’t particularly mean today’s AIs. I’m referring to systems that people may create, in the near future, as attempts to move further toward BGI.

These systems will have greater capabilities than today’s AIs, but won’t yet have all the characteristics of AGI. They won’t be able to reason accurately in every situation. They will make mistakes. On occasion, they may jump to some faulty conclusions.

And whilst these systems may contain features designed to make them act beneficially toward humans, these features will be incomplete or otherwise flawed.

That’s not science fiction. That’s a description of many existing AI systems, and it’s reasonable to expect that similar shortfalls will remain in place in many new AI systems.

The risk here isn’t that humanity might experience a catastrophe as a result of actions of a superintelligent AGI. Rather, the risk is that a catastrophe will be caused by a buggy pre-BGI system.

Imagine that the restraints intended to keep such a system in a beneficial mindset were jailbroken, unleashing some deeply nasty malware. Imagine that malware running amok and causing the mother of all industrial catastrophes: making all devices connected to the Internet of Things malfunction simultaneously. Think of the biggest ever car crash pile-up, extended into every field of life.

Imagine a pre-BGI system supervising fearsome weapons arsenals, miscalculating the threat of an enemy attack, and taking its own initiative to strike preemptively (but disastrously) against a perceived opponent – miscalculating (again) the pros and cons of what used to be called ‘a just war’.

Imagine a pre-BGI system observing the risks of cascading changes in the world’s climate, and deciding on its own to initiate hasty global geo-engineering – because it has judged human governance systems to be too slow and dysfunctional to reach the right decision.

A skeptic might reply, in each case, that a true BGI would never be involved in such an action.

But that’s the point: before we have BGIs, we’ll have pre-BGIs, and they’re more than capable of making disastrous mistakes.

Rebuttals and counter rebuttals

Again, a skeptic might say: a true BGI will be superintelligent, and won’t have any bugs.

But wake up: even AIs that are extremely competent 99.9% of the time can be thrown into disarray by circumstances beyond their training sets. A pre-BGI system may well go badly wrong in such a circumstance.

A skeptic might say: a true BGI will never misunderstand what humans ask it to do. Such systems will have sufficient all-round knowledge to fill in the gaps in our instructions. They won’t do what we humans literally ask them to do, if they appreciate that we meant to ask them to do something slightly different. They won’t seek short-cuts that have terrible side-effects, since they will have full human wellbeing as their overarching objective.

But wake up: pre-BGI systems may fall short on at least one of the aspects just described.

A different kind of skeptic might say that the pre-BGI systems that their company is creating won’t have any of the above problems. “We know how to design these AI systems to be safe and beneficial”, they assert, “and we’re going to do it that way”.

But wake up: what about other people who are also releasing pre-BGI systems? Maybe some of them will make the kinds of mistakes that you claim you won’t make. And in any case, how can you be so confident that your company isn’t deluding itself about its prowess in AI? (Here, I’m thinking particularly of Meta, whose AI systems have caused significant real-life problems, despite some of the leading AI developers in that company telling the world not to be concerned about the risks of AI-induced catastrophe.)

Finally, a skeptic might say that the AI systems their organization is creating will be able to disarm any malign pre-BGI systems released by less careful developers. Good pre-BGIs will outgun bad pre-BGIs. Therefore, no one should dare ask their organization to slow down, or to submit itself to tiresome bureaucratic checks and reviews.

But wake up: even though it’s your intention to create an exemplary AI system, you need to beware of wishful thinking and motivated self-deception. Especially if you perceive that you are in a race, and you want your pre-BGI to be released before that of an organization you distrust. That’s the kind of race in which safety corners are cut, and the prize for winning is simply to be the organization that inflicts a catastrophe on humanity.

Recall the saying: “The road to hell is paved with good intentions”.

Just because you conceive of yourself as one of the good guys, and you believe your intentions are exemplary, that doesn’t give you carte blanche to proceed down a path that could lead to a powerful pre-BGI getting one crucial calculation horribly wrong.

You might think that your pre-BGI is based entirely on positive ideas and a collaborative spirit. But each piece of technology is a two-edged sword, and guardrails, alas, can often be dismantled by determined experimenters or inquisitive hackers. Sometimes, indeed, the guardrails may break because people in your team are distracted, careless, or otherwise incompetent.

Beyond good intentions

Biology researchers responsible for allowing leaks of deadly pathogens from their laboratories had no intention of causing such a disaster. On the contrary, the motivation behind their research was to understand how vaccines or other treatments might be developed in response to future new infectious diseases. What they envisioned was the wellbeing of the global population. Nevertheless, unknown numbers of people died from outbreaks resulting from the poor implementation of safety processes at their laboratories.

These researchers knew the critical importance of guardrails, yet for various reasons, the guardrails at their laboratories were breached.

How should we respond to the possibility of dangerous pathogens escaping from laboratories and causing countless deaths in the future? Should we just trust the good intentions of the researchers involved?

No, the first response should be to talk about the risk – to reach a better understanding of the conditions under which a biological pathogen can evade human control and cause widespread havoc.

It’s the same with the possibility of widespread havoc from a pre-BGI system that ends up operating outside human control. Alongside any inspirational talk about the wonderful things that could happen if true BGI is achieved, there needs to be a sober discussion of the possible malfunctions of pre-BGI systems. Otherwise, before we reach the state of sustainable superabundance for all, which I personally see as both possible and desirable, we might come to bitterly regret our inattention to matters of global safety.

Mindplex
