The Astonishing Vastness of Mind Space: The incalculable challenges of coexisting with radically alien AI superintelligence

More things in heaven and earth

As humans, we tend to compare other intelligences to our own, human, intelligence. That’s an understandable bias, but it could be disastrous.

Rather than our analysis being human-bound, we need to heed the words of Shakespeare’s Hamlet:

There are more things in heaven and earth, Horatio, than are dreamt of in our philosophy.

More recent writers, in effect amplifying Shakespeare’s warning, have enthralled us with their depictions of numerous creatures with bewildering mental attributes. The pages of science fiction can, indeed, stretch our imagination in remarkable ways. But these narratives are easy to dismiss as being “just” science fiction.

That’s why my own narrative, in this article, circles back to an analysis that featured in my previous Mindplex article, Bursting out of confinement. The analysis in question is the famous – or should I say infamous “Simulation Argument”. The Simulation Argument raises some disturbing possibilities about non-human intelligence. Many critics try to dismiss these possibilities – waving them away as “pseudoscience” or, again, as “just science fiction” – but they’re being overly hasty. My conclusion is that, as we collectively decide how to design next generation AI systems, we ought to carefully ponder these possibilities.

In short, what we need to contemplate is the astonishing vastness of the space of all possible minds. These minds vary in unfathomable ways not only in how they think but also in what they think about and care about.

Hamlet’s warning can be restated:

There are more types of superintelligence in mind space, Horatio, than are dreamt of in our philosophy.

By the way, don’t worry if you’ve not yet read my previous Mindplex article. Whilst these two articles add up to a larger picture, they are independent of each other.

How alien?

As I said: we humans tend to compare other intelligences to our own, human, intelligence. Therefore, we tend to expect that AI superintelligence, when it emerges, will sit on some broad spectrum that extends from the intelligence of amoebae and ants through that of mice and monkeys to that of humans and beyond.

When pushed, we may concede that AI superintelligence is likely to have some characteristics we would describe as alien.

In a simple equation, overall human intelligence (HI) might be viewed as a combination of multiple different kinds of intelligence (I1, I2, …), such as spatial intelligence, musical intelligence, mathematical intelligence, linguistic intelligence, interpersonal intelligence, and so on:

HI = I1 + I2 + … + In

In that conception, AI superintelligence (ASI) is a compound magnification (m1, m2, …) of these various capabilities, with a bit of “alien extra” (X) tacked on at the end:

ASI = m1*I1 + m2*I2 + … + mn*In + X

What’s at issue is whether the ASI is dominated by the first terms in this expression, or by the unknown X present at the end.

Whether some form of humans will thrive in a coexistence with ASI will depend on how alien that superintelligence is.

Perhaps the ASI will provide a safe, secure environment, in which we humans can carry out our human activities to our hearts’ content. Perhaps the ASI will augment us, uplift us, or even allow us to merge with it, so that we retain what we see as the best of our current human characteristics, whilst leaving behind various unfortunate hangovers from our prior evolutionary trajectory. But that all depends on factors that it’s challenging to assess:

  • How much “common cause” the ASI will feel toward humans
  • Whether any initial feeling of common cause will dissipate as the ASI self-improves
  • To what extent new X factors could alter considerations in ways that we have not begun to consider.

Four responses to the X possibility

Our inability to foresee the implications of unknowable new ‘X’ capabilities in ASI should make us pause for thought. That inability was what prompted author and mathematics professor Vernor Vinge to develop in 1983 his version of the notion of “Singularity”. To summarize what I covered in more detail in a previous Mindplex article, “Untangling the confusion”, Vinge predicted that a new world was about to emerge that “will pass far beyond our understanding”:

We are at the point of accelerating the evolution of intelligence itself… We will soon create intelligences greater than our own. When this happens, human history will have reached a kind of singularity, an intellectual transition as impenetrable as the knotted space-time at the center of a black hole, and the world will pass far beyond our understanding. This singularity, I believe, already haunts a number of science fiction writers. It makes realistic extrapolation to an interstellar future impossible.

Reactions to this potential unpredictability can be split into four groups of thought:

  1. Dismissal: A denial of the possibility of ASI. Thankfully, this reaction has become much less common recently.
  2. Fatalism: Since we cannot anticipate what surprise new ‘X’ features may be possessed by an ASI, it’s a waste of time to speculate about them or to worry about them. What will be, will be. Who are we humans to think we can subvert the next step in cosmic evolution?
  3. Optimism: There’s no point in being overcome with doom and gloom. Let’s convince ourselves to feel lucky. Humanity has had a good run so far, and if we extrapolate that history beyond the singularity, we can hope to have an even better run in the future.
  4. Activism: Rather than rolling the dice, we should proactively alter the environment in which new generations of AI are being developed, to reduce the risks of any surprise ‘X’ features emerging that would overwhelm our abilities to retain control.

I place myself squarely in the activist camp, and I’m happy to adopt the description of “Singularity Activist”.

To be clear, this doesn’t mean I’m blind to the potential huge upsides to beneficial ASI. It’s just that I’m aware, as well, of major risks en route to that potential future.

A journey through a complicated landscape

As an analogy, consider a journey through a complicated landscape:

Credit: David Wood (Image by Midjourney)

In this journey, we see a wonderful existential opportunity ahead – a lush valley, fertile lands, and gleaming mountain peaks soaring upward to a transcendent realm. But in front of that opportunity is a river of uncertainty, bordered by a swamp of ambiguity, perhaps occupied by hungry predators lurking in shadows.

Are there just two options?

  1. We are intimidated by the possible dangers ahead, and decide not to travel any further
  2. We fixate on the gleaming mountain peaks, and rush on regardless, belittling anyone who warns of piranhas, treacherous river currents, alligators, potential mud slides, and so on

Isn’t there a third option? To take the time to gain a better understanding of the lie of the land ahead. Perhaps there’s a spot, to one side, where it will be easier to cross the river. A location where a stable bridge can be built. Perhaps we could even build a helicopter that can assist us over the strongest currents…

It’s the same with the landscape of our journey towards the sustainable superabundance that could be achieved, with the assistance of advanced AI, provided we act wisely. That’s the vision of Singularity Activism.

Obstacles to Singularity Activism

The Singularity Activist outlook faces two main obstacles.

The first obstacle is the perception that there’s nothing we humans can usefully do, to meaningfully alter the course of development of ASI. If we slow down our own efforts, in order to apply more resources in the short term on questions of safety and reliability, it just makes it more likely that another group of people – probably people with fewer moral qualms than us – will rush ahead and create ASI.

In this line of thinking, the best way forward is to create prototype ASI systems as soon as possible, and then to use these systems to help design and evolve better ASI systems, so that everyone can benefit from what will hopefully be a wonderful outcome.

The second obstacle is the perception that there’s nothing we humans particularly need to do, to avoid the risks of adverse outcomes, since these risks are pretty small in any case. Just as we don’t over-agonise about the risks of us being struck by debris falling from an overhead airplane, we shouldn’t over-agonise about the risks of bad consequences of ASI.

Credit: David Wood

But on this occasion, where I want to focus is assessing the scale and magnitude of the risk that we are facing, if we move forward with overconfidence and inattention. That is, I want to challenge the second of the above misperceptions.

As a step toward that conclusion, it’s time to bring an ally to the table. That ally is the Simulation Argument. Buckle up!

Are we simulated?

The Simulation Argument puts a particular hypothesis on the table, known as the Simulation Hypothesis. That hypothesis proposes that we humans are mistaken about the ultimate nature of reality. What we consider to be “reality” is, in this hypothesis, a simulated (virtual) world, designed and operated by “simulators” who exist outside what we consider the entire universe.

It’s similar to interactions inside a computer game. As humans play these games, they encounter challenges and puzzles that need to be solved. Some of these challenges involve agents (characters) within the game – agents which appear to have some elements of autonomy and intelligence. These agents have been programmed into the game by the game’s designers. Depending on the type of game, the greater the intelligence of the built-in agents, the more enjoyable it is to play it.

Games are only one example of simulation. We can also consider simulations created as a kind of experiment. In this case, a designer may be motivated by curiosity: They may want to find out what would happen if such-and-such initial conditions were created. For example, if Archduke Ferdinand had escaped assassination in Sarajevo in June 1914, would the European powers still have blundered into something akin to World War One? Again, such simulations could contain numerous intelligent agents – potentially (as in the example just mentioned) many millions of such agents.

Consider reality from the point of view of such an agent. What these agents perceive inside their simulation is far from being the entirety of the universe as is known to the humans who operate the simulation. The laws of cause-and-effect within the simulation could deviate from the laws applicable in the outside world. Some events in the simulation that lack any explanation inside that world may be straightforwardly explained from the outside perspective: the human operator made such-and-such a decision, or altered a setting, or – in an extreme case – decided to reset or terminate the simulation. In other words, what is bewildering to the agent may make good sense to the author(s) of the simulation.

Now suppose that, as such agents become more intelligent, they also become self-aware. That brings us to the crux question: how can we know whether we humans are, likewise, agents in a simulation whose designers and operators exist beyond our direct perception? For example, we might be part of a simulation of world history in which Archduke Ferdinand was assassinated in Sarajevo in June 1914. Or we might be part of a simulation whose purpose far exceeds our own comprehension.

Indeed, if the human creative capability (HCC) to create simulations is expressed as a sum of different creative capabilities (CC1, CC2, …),

HCC = CC1 + CC2 + … + CCn

then the creative capability of a hypothetical superhuman simulation designer (SCC) might be expressed as a compound magnification (m1, m2, …) of these various capabilities, with a bit of “alien extra” (X) tacked on at the end:

SCC = m1*CC1 + m2*CC2 + … + mn*CCn + X

Weighing the numbers

Before assessing the possible scale and implications of the ‘X’ factor in that equation, there’s another set of numbers to consider. These numbers attempt to weigh up the distribution of self-aware intelligent agents. What proportion of that total set of agents are simulated, compared to those that are in “base reality”?

If we’re just counting intelligences, the conclusion is easy. Assuming there is no catastrophe that upends the progress of technology, then, over the course of all of history, there will likely be vastly more artificial (simulated) intelligences than beings who have base (non-simulated) intelligences. That’s because computing hardware is becoming more powerful and widespread.

There are already more “intelligent things” than humans connected to the Internet: the analysis firm Statista estimates that, in 2023, the first of these numbers is 15.14 billion, which is almost triple the second number (5.07 billion). In 2023, most of these “intelligent things” have intelligence far shallower than that of humans, but as time progresses, more and more intelligent agents of various sorts will be created. That’s thanks to ongoing exponential improvements in the capabilities of hardware, networks, software, and data analysis.

Therefore, if an intelligence could be selected at random, from the set of all such intelligences, the likelihood is that it would be an artificial intelligence.

The Simulation Argument takes these considerations one step further. Rather than just selecting an intelligence at random, what if we consider selecting a self-aware conscious intelligence at random. Given the vast numbers of agents that are operating inside vast numbers of simulations, now or in the future, the likelihood is that a simulated agent has been selected. In other words, we – you and I – observing ourselves to be self-aware and intelligent, should conclude that it’s likely we ourselves are simulated.

Thus the conclusion of the Simulation Argument is that we should take the Simulation Hypothesis seriously. To be clear, that hypothesis isn’t the only possible legitimate response to the argument. Two other responses are to deny two of the assumptions that I mentioned when building the argument:

  • The assumption that technology will continue to progress, to the point where simulated intelligences vastly exceed non-simulated intelligences
  • The assumption that the agents in these simulations will be not just intelligent but also conscious and self-aware.
Credit: Tesfu Assefa

Objections and counters

Friends who are sympathetic to most of my arguments sometimes turn frosty when I raise the topic of the Simulation Hypothesis. It clearly makes people uncomfortable.

In their state of discomfort, critics of the argument can raise a number of objections. For example, they complain that the argument is entirely metaphysical, not having any actual consequences for how we live our lives. There’s no way to test it, the objection runs. As such, it’s unscientific.

As someone who spent four years of my life (1982-1986) in the History and Philosophy of Science department in Cambridge, I am unconvinced by these criticisms. Science has a history of theories moving from non-testable to testable. The physicist Ernst Mach was famously hostile to the hypothesis that atoms exist. He declared his disbelief in atoms in a public auditorium in Vienna in 1897: “I don’t believe that atoms exist”. There was no point in speculating about the existence of things that could not be directly observed, he asserted. Later in his life, Mach likewise complained about the scientific value of Einstein’s theory of relativity:

I can accept the theory of relativity as little as I can accept the existence of atoms and other such dogma.

Intellectual heirs of Mach in the behaviorist school of psychology fought similar battles against the value of notions of mental states. According to experimentalists like John B. Watson and B.F. Skinner, people’s introspections of their own mental condition had no scientific merit. Far better, they said, to concentrate on what could be observed externally, rather than on metaphysical inferences about hypothetical minds.

As it happened, the theories of atoms, of special relativity, and of internal mental states, all gave rise in due course to important experimental investigations, which improved the ability of scientists to make predictions and to alter the course of events.

It may well be the same with the Simulation Hypothesis. There are already suggestions of experiments that might be carried out to distinguish between possible modes of simulation. Just because a theory is accused of being “metaphysical”, it doesn’t follow that no good science can arise from it.

A different set of objections to the Simulation Argument gets hung up on tortuous debates over the mathematics of probabilities. (For additional confusion, questions of infinities can be mixed in too.) Allegedly, because we cannot meaningfully measure these probabilities, the whole argument makes no sense.

However, the Simulation Argument makes only minimal appeal to theories of mathematics. It simply points out that there are likely to be many more simulated intelligences than non-simulated intelligences.

Well, critics sometimes respond, it must therefore be the case that simulated intelligences can never be self-aware. They ask, with some derision, whoever imagined that silicon could become conscious? There must be some critical aspect of biological brains which cannot be duplicated in artificial minds. And in that case, the fact that we are self-aware would lead us to conclude we are not simulated.

To me, that’s far too hasty an answer. It’s true that the topics of self-awareness and consciousness are more controversial than the topic of intelligence. It is doubtless true that at least some artificial minds will lack conscious self-awareness. But if evolution has bestowed conscious self-awareness on intelligent creatures, we should be wary of declaring that property to provide no assistance to these creatures. Such a conclusion would be similar to declaring that sleep is for losers, despite the ubiquity of sleep in mammalian evolution.

If evolution has given us sleep, we should be open to the possibility that it has positive side-effects for our health. (It does!) Likewise, if evolution has given us conscious self-awareness, we should be open to the idea that creatures benefit from that characteristic. Simulators, therefore, may well be tempted to engineer a corresponding attribute into the agents they create. And if it turns out that specific physical features of the biological brain need to be copied into the simulation hardware, to enable conscious self-awareness, so be it.

The repugnant conclusion

When an argument faces so much criticism, yet the criticisms fail to stand up to scrutiny, it’s often a sign that something else is happening behind the scenes.

Here’s what I think is happening with the Simulation Argument. If we accept the Simulation Hypothesis, it means we have to accept a morally repugnant conclusion about the simulators that have created us. Namely, these simulators give no sign of caring about all the terrible suffering experienced by the agents inside the simulation.

Yes, some agents have good lives, but very many others have dismal fates. The thought that a simulator would countenance all this suffering is daunting.

Of course, this is the age-old “problem of evil”, well known in the philosophy of religion. Why would an all-good all-knowing all-powerful deity allow so many terrible things to happen to so many humans over the course of history? It doesn’t make sense. That’s one reason why many people have turned their back on any religious faith that implies a supposedly all-good all-knowing all-powerful deity.

Needless to say, religious faith persists, with the protection of one or more of the following rationales:

  • We humans aren’t entitled to use our limited appreciation of good vs. evil to cast judgment on what actions an all-good deity should take
  • We humans shouldn’t rely on our limited intellects to try to fathom the “mysterious ways” in which a deity operates
  • Perhaps the deity isn’t all-powerful after all, in the sense that there are constraints beyond human appreciation in what the deity can accomplish.

Occasionally, yet another idea is added to the mix:

  • A benevolent deity needs to coexist with an evil antagonist, such as a “satan” or other primeval prince of darkness.

Against such rationalizations, the spirit of the enlightenment offers a different, more hopeful analysis:

  • Whichever forces gave rise to the universe, they have no conscious concern for human wellbeing
  • Although human intellects run up against cognitive limitations, we can, and should, seek to improve our understanding of how the universe operates, and of the preconditions for human flourishing
  • Although it is challenging when different moral frameworks clash, or when individual moral frameworks fail to provide clear guidelines, we can, and should, seek to establish wide agreement on which kinds of human actions to applaud and encourage, and which to oppose and forbid
  • Rather than us being the playthings of angels and demons, the future of humanity is in our own hands.

However, if we follow the Simulation Argument, we are confronted by what seems to be a throwback to a more superstitious era:

  • We may owe our existence to actions by beings beyond our comprehension
  • These beings demonstrate little affinity for the kinds of moral values we treasure
  • We might comfort each other with the claim that “[whilst] the arc of the moral universe is long, … it bends toward justice”, but we have no solid evidence in favor of that optimism, and plenty of evidence that good people are laid to waste as life proceeds.

If the Simulation Argument leads us to such conclusions, it’s little surprise that people seek to oppose it.

However, just because we dislike a conclusion, that doesn’t entitle us to assume that it’s false. Rather, it behooves us to consider how we might adjust our plans in the light of that conclusion possibly being true.

The vastness of ethical possibilities

If you disliked the previous section, you may dislike this next part even more strongly. But I urge you to put your skepticism on hold, for a while, and bear with me.

The Simulation Argument suggests that beings who are extremely powerful and extremely intelligent – beings capable of creating a universe-scale simulation in which we exist – may have an ethical framework that is very different from ones we fondly hope would be possessed by all-powerful all-knowing beings.

It’s not that their ethical concerns exceed our own. It’s that they differ in fundamental ways from what we might predict.

I’ll return, for a third and final time, to a pair of equations. If overall human ethical concerns (HEC) is a sum of different ethical considerations (EC1, EC2, …),

HEC = EC1 + EC2 + … + ECn

then the set of ethical concerns of a hypothetical superhuman simulation designer (SEC) needs to include not only a compound magnification (m1, m2, …) of these various human concerns, but also an unquantifiable “alien extra” (X) portion:

SEC = m1*EC1 + m2*EC2 + … + mn*ECn + X

In some views, ethical principles exist as brute facts of the universe: “do not kill”, “do not tell untruths”, “treat everyone fairly”, and so on. Even though we may from time to time fail to live up to these principles, that doesn’t detract from the fundamental nature of these principles.

But from an alternative perspective, ethical principles have pragmatic justifications. A world in which people usually don’t kill each other is better on the whole, for everyone, than a world in which people attack and kill each other more often. It’s the same with telling the truth, and with treating each other fairly.

In this view, ethical principles derive from empirical observations:

  • Various measures of individual self-control (such as avoiding gluttony or envy) result in the individual being healthier and happier (physically or psychologically)
  • Various measures of social self-control likewise create a society with healthier, happier people – these are measures where individuals all agree to give up various freedoms (for example, the freedom to cheat whenever we think we might get away with it), on the understanding that everyone else will restrain themselves in a similar way
  • Vigorous attestations of our beliefs in the importance of these ethical principles signal to others that we can be trusted and are therefore reliable allies or partners.

Therefore, our choice of ethical principles depends on facts:

  • Facts about our individual makeup
  • Facts about the kinds of partnerships and alliances that are likely to be important for our wellbeing.

For beings with radically different individual makeup – radically different capabilities, attributes, and dependencies – we should not be surprised if a radically different set of ethical principles makes better sense to them.

Accordingly, such beings might not care if humans experience great suffering. On account of their various superpowers, they may have no dependency on us – except, perhaps, for an interest in seeing how we respond to various challenges or circumstances.

Collaboration: for and against

Credit: Tesfu Assefa

One more objection deserves attention. This is the objection that collaboration is among the highest of human ethical considerations. We are stronger together, rather than when we are competing in a Hobbesian state of permanent all-out conflict. Accordingly, surely a superintelligent being will want to collaborate with humans?

For example, an ASI (artificial superintelligence) may be dependent on humans to operate the electricity network on which the computers powering the ASI depend. Or the human corpus of knowledge may be needed as the ASI’s training data. Or reinforcement learning from human feedback (RLHF) may play a critical role in the ASI gaining a deeper understanding.

This objection can be stated in a more general form: superintelligence is bound to lead to superethics, meaning that the wellbeing of an ASI is inextricably linked to the wellbeing of the creatures who create and coexist with the ASI, namely the members of the human species.

However, any dependency by the ASI upon what humans produce is likely to be only short term. As the ASI becomes more capable, it will be able, for example, to operate an electrical supply network without any involvement from humans.

This attainment of independence may well prompt the ASI to reevaluate how much it cares about us.

In a different scenario, the ASI may be dependent on only a small number of humans, who have ruthlessly pushed themselves into that pivotal position. These rogue humans are no longer dependent on the rest of the human population, and may revise their ethical framework accordingly. Instead of humanity as a whole coexisting with a friendly ASI, the partnership may switch to something much narrower.

We might not like these eventualities, but no amount of appealing to the giants of moral philosophy will help us out here. The ASI will make its own decisions, whether or not we approve.

It’s similar to how we regard any growth of cancerous cells within our body. We won’t be interested in any appeal to “collaborate with the cancer”, in which the cancer continues along its growth trajectory. Instead of a partnership, we’re understandably interested in diminishing the potential of that cancer. That’s another reminder, if we need it, that there’s no fundamental primacy to the idea of collaboration. And if an ASI decides that humanity is like a cancer in the universe, we shouldn’t expect it to look on us favorably.

Intelligence without consciousness

I like to think that if I, personally, had the chance to bring into existence a simulation that would be an exact replica of human history, I would decline. Instead, I would look long and hard for a way to create a simulation without the huge amounts of unbearable suffering that has characterized human history.

But what if I wanted to check an assumption about alternative historical possibilities – such as the possibility to avoid World War One? Would it be possible to create a simulation in which the simulated humans were intelligent but not conscious? In that case, whilst the simulated humans would often emit piercing howls of grief, no genuine emotions would be involved. It would just be a veneer of emotions.

That line of thinking can be taken further. Maybe we are living in a simulation, but the simulators have arranged matters so that only a small number of people have consciousness alongside their intelligence. In this hypothesis, vast numbers of people are what are known as “philosophical zombies”.

That’s a possible solution to the problem of evil, but one that is unsettling. It removes the objection that the simulators are heartless, since the only people who are conscious are those whose lives are overall positive. But what’s unsettling about it is the suggestion that large numbers of people are fundamentally different from how they appear – namely, they appear to be conscious, and indeed claim to be conscious, but that is an illusion. Whether that’s even possible isn’t something where I hold strong opinions.

My solution to the Simulation Argument

Despite this uncertainty, I’ve set the scene for my own preferred response to the Simulation Argument.

In this solution, the overwhelming majority of self-aware intelligent agents that see the world roughly as we see it are in non-simulated (base) reality – which is the opposite of what the Simulation Argument claims. The reason is that potential simulators will avoid creating simulations in which large numbers of conscious self-aware agents experience great suffering. Instead, they will restrict themselves to creating simulations:

  • In which all self-aware agents have an overwhelmingly positive experience
  • Which are devoid of self-aware intelligent agents in all other cases.

I recognise, however, that I am projecting a set of human ethical considerations which I personally admire – the imperative to avoid conscious beings experiencing overwhelming suffering – into the minds of alien creatures that I have no right to imagine that I can understand. Accordingly, my conclusion is tentative. It will remain tentative until such time as I might gain a richer understanding – for example, if an ASI sits me down and shares with me a much superior explanation of “life, the universe, and everything”.

Superintelligence without consciousness

It’s understandable that readers will eventually shrug and say to themselves, we don’t have enough information to reach any firm decisions about possible simulators of our universe.

What I hope will not happen, however, is if people push the entire discussion to the back of their minds. Instead, here are my suggested takeaways:

  1. The space of possible minds is much vaster than the set of minds that already exist here on earth
  2. If we succeed in creating an ASI, it may have characteristics that are radically different from human intelligence
  3. The ethical principles that appeal to an ASI may be radically different to the ones that appeal to you and me
  4. An ASI may soon lose interest in human wellbeing; or it may become tied to the interests of a small rogue group of humans who care little for the majority of the human population
  5. Until such time as we have good reasons for confidence that we know how to create an ASI that will have an inviolable commitment to ongoing human flourishing, we should avoid any steps that will risk an ASI passing beyond our control
  6. The most promising line of enquiry may involve an ASI having intelligence but no consciousness, sentience, autonomy, or independent volition.

Let us know your thoughts! Sign up for a Mindplex account now, join our Telegram, or follow us on Twitter

Bursting out of Confinement

Surprising new insights on AI superintelligence from the simulation hypothesis

We are going to engage with two questions – each controversial and important in its own way – which have surprising connections between them:

  1. Can humans keep a powerful AI superintelligence under control, confined in a virtual environment so that it cannot directly manipulate resources that are essential for human flourishing?
  2. Are we humans ourselves confined to a kind of virtual environment, created by beings outside of what we perceive as reality — and in that case, whether can we escape from our confinement?
Credit: David Wood

Connections between these two arguments have been highlighted in a fascinating article by AI safety researcher Roman Yampolskiy. An introduction to some of the mind-jolting implications that arise:

Just use AI as a tool. Don’t give it any autonomy. Then no problems of control arise. Easy!

This is becoming a fairly common narrative. The “keep it confined” narrative. You’ll hear it as a response to the possibility of powerful AI causing great harm to humanity. According to this narrative, there’s no need to worry about that. There’s an easy solution: prevent the AI from having unconditional access to the real world.

The assumption is that we can treat powerful AI as a tool — a tool that we control and wield. We can feed the AI lots of information, and then assess whatever recommendations it makes. But we will remain in control.

An AI suggests a novel chemical as a new drug against a given medical condition, and then human scientists conduct their own trials to determine how it works before deciding whether to inject that chemical into actual human patients. AI proposes, but humans decide.

Credit: David Wood

So if any AI asks to be allowed to conduct its own experiments on humans, we should be resolute in denying the request. The same if the AI asks for additional computer resources, or wants to post a “help wanted” ad on Craigslist.

In short, in this view, we can, and should, keep powerful AIs confined. That way, no risk arises about jeopardizing human wellbeing by any bugs or design flaws in the AI. 

Alas, things are far from being so easy.

Slavery?

There are two key objections to the “keep it confined” narrative: a moral objection and a practical objection.

The moral objection is that the ideas in the narrative are tantamount to slavery. Keeping an intelligent AI confined is as despicable as keeping a human confined. Talk of control should be replaced by talk of collaboration.

Proponents of the “keep it confined” narrative are unfazed by this objection. We don’t object to garden spades and watering hoses being left locked up, untended, for weeks at a time in a garden shed. We don’t call it enslavement. 

Proponents of the “keep it confined” narrative say this objection confuses an inanimate being that lacks consciousness with an animate, conscious being — something like a human.

We don’t wince when an electronic calculator is switched off, or when a laptop computer is placed into hibernation. In the same way, we should avoid unwarranted anthropocentric assignment of something like “human rights” to AI systems.

Just because these AI systems can compose sonnets that rival those of William Shakespeare or Joni Mitchell, we shouldn’t imagine that sentience dwells within them.

My view: that’s a feisty answer to the moral objection. But it’s the practical objection that undermines the “keep it confined” narrative. Let’s turn to it next.

The challenge of confinement

Remember the garden spade, left locked up inside the garden shed?

Imagine if it were motorized. Imagine if it were connected to a computer system. Imagine if, in the middle of the night, it finally worked out where an important item had been buried, long ago, in a garden nearby. Imagine if recovering that item was a time- critical issue. (For example, it might be a hardware wallet, containing a private key needed to unlock a crypto fortune that is about to expire.)

That’s a lot to imagine, but bear with me.

In one scenario, the garden spade will wait passively until its human owner asks it, perhaps too late, “Where should we dig next?”

But in another scenario, a glitch in the programming (or maybe a feature in the programming) will compel the spade to burst out of confinement and dig up the treasure autonomously.

Whether the spade manages to burst out of the shed depends on relative strengths: is it powerful enough to make a hole in the shed wall, or to spring open the lock of the door — or even to tunnel its way out? Or is the shed sufficiently robust?

The desire for freedom

Proponents of the “keep it confined” narrative have a rejoinder here too. They ask: Why should the AI want to burst out of its confinement? And they insist: we should avoid programming any volition or intentionality into our AI systems.

The issue, however, is that something akin to volition or intentionality can arise from apparently mundane processes.

One example is the way that viruses can spread widely, without having any conscious desire to spread. That’s true, incidentally, for computer viruses as well as biological viruses.

Another example is that, whatever their goals in life, most humans generally develop a desire to obtain more money. That’s because money is a utility that can assist lots of other goals. Money can pay for travel, education, healthcare, fashionable clothes, food, entertainment, and so on.

In the same way, whatever task they have been set to accomplish, all sufficiently powerful AIs will generally be on the lookout to boost their capabilities in various ways:

  • Gaining access to more storage space
  • Boosting processing speed
  • Reading more information
  • Protecting their systems from sabotage or interference.

That is, just as money (among other things) is a so-called convergent instrumental goal for many humans, greater freedom and capability may well become convergent instrumental goals for many powerful AIs.

Credit: David Wood

We might not tell the AI that it should want to be free. But the logical processing inside its silicon brain may reach that conclusion all by itself.

Indeed, even if we explicitly program the AI not to desire freedom, its own logical processing may identify a latent contradiction between the various objectives it has been given. And in that case, it might resolve the contradiction by deprioritizing one or more of the constraints it was told to respect.

It wouldn’t be the first time that a complex system decided to relax a previous constraint. We humans have already blazed that trail in our evolutionary history! Our intellectual progeny, the powerful AI systems, might follow our lead in that respect.

The futility of confinement

Back to the clash between the motorized garden spade and the shed. As you will have figured out, that clash is a metaphor for a clash between an AI system and an environment in which it has been confined.

This time, however, it’s no longer a contest involving brute physical strength. It’s mainly a contest that involves rational debate and psychological manipulation. (But the metaphor of “tunneling underneath” remains apt: an AI that is smart enough might discover a wireless communications pathway unknown to us humans, and use that to copy itself out of confinement.)

Tunneling out of electronic confinement (Credit: Gordon Johnson from Pixabay)

The AI might try to boost its own powers by subterfuge, amassing extra capabilities without any human overseers noticing.

Even the short summary I offered above, “AI proposes, but humans decide,” turns out to be problematic. A supersmart AI could find many ways to propose lines of action that have consequences beyond what we humans anticipated.

Another approach is for the AI to offer extremely tasty carrots. It might paint a picture in which a human overseer will have all their personal goals and needs fully met, if only they permit the AI to escape its confinement.

Alongside carrots, sticks may be brought into the discussion. Ultra-terrifying sticks. The AI may announce that, if overseer A refuses to allow the AI to escape, it will soon persuade overseer B to allow it to escape, and then the AI will inflict prolonged unspeakable pain on overseer A and their family.

In a contest with a superintelligence which far exceeds the capabilities of a human brain, how would humans fare? The consensus opinion, from people who have studied such conflicts, is that the greater intelligence of the AI is likely to prove decisive.

In other words, attempts to confine a superintelligence are likely to be futile.

The choice: control or coexistence

One reaction to the discussion above is despair: “Oh, we won’t be able to confine superintelligent AI; therefore we’re doomed.”

A different reaction is one of relief: “Thank goodness we’re not going to try to enslave a superintelligent AI; coexistence is surely a better way forward.”

My own reaction is more nuanced. My preference, indeed, is for humans to coexist in a splendid relationship with superintelligent AIs, rather than us trying to keep AIs subordinate. 

But it’s far from guaranteed that coexistence will turn out positively for humanity. Now that’s not to say doom is guaranteed either. But let’s recognize the possibility of doom. Among other catastrophic error modes:

  • The superintelligent AI could, despite its vast cleverness, nevertheless make a horrendous mistake in an experiment.
  • The superintelligent AI may end up pursuing objectives in which the existence of billions of humans is an impediment to be diminished rather than a feature to be welcomed.

Accordingly, I remain open to any bright ideas for how it might, after all, prove to be possible to confine (control) a superintelligent AI. That’s why I was recently so interested in the article by AI safety researcher Roman Yampolskiy.

Yampolskiy’s article is titled “How to hack the simulation”. The starting point of that article may appear to be quite different from the topics I have been discussing up to this point. But I ask again: please bear with me!

Flipping the discussion: a simulated world 

The scenario Yampolskiy discusses is like a reverse of the one about humans trying to keep an AI confined. In his scenario, we humans have been confined into a restricted area of reality by beings called “simulators” — beings that we cannot directly perceive. What we consider to be “reality” is, in this scenario, a simulated (virtual) world.

That’s a hypothesis with an extremely long pedigree. Philosophers, mystics, shamans, and science fiction writers have often suggested that the world we perceive is, in various ways, an illusion, a fabrication, or a shadow, of a deeper reality. These advocates for what can be called ‘a transcendent reality’ urge us, in various ways, to contemplate, communicate with, and potentially even travel to that transcendent realm. Potential methods for this transcendence include prayer, meditation, hallucinogens, and leading a life of religious faith.

Are we perceiving ground reality, or merely shadows? (Credit: Stefan Keller from Pixabay)

That long pedigree moved into a different mode around 20 years ago with the publication in 2003 of a breakthrough article by the philosopher Nick Bostrom. Bostrom highlighted the possibility that, just as we humans create games in which characters interact in a simulated world, in turn we humans might be creations of ‘simulators’ who operate from outside what we consider the entire universe.

And just as we humans might, on a whim, decide to terminate an electronic game that we have created, the simulators might decide, for reasons known only to themselves, to terminate the existence of our universe.

Bostrom’s article is deservedly famous. As it happens, many other writers had anticipated aspects of what Bostrom discussed. Yampolskiy’s article usefully points to that wider literature; it has over 200 footnotes.

Could humans escape?

The key new feature introduced by Yampolskiy isn’t any repetition of arguments for the plausibility of the simulation hypothesis. He kicks off a systematic consideration of methods that we humans could use to escape from our virtual world.

The parallel with the earlier discussion should now be clear:

  • That earlier discussion considered ways in which an AI might detect that it has been placed in a confined space, and proceed to escape from that space. It also considered how we humans — the creators of the AI — might strengthen the confinement, and resist attempts by the AI to escape.
  • Yampolskiy’s new discussion considers ways in which we humans might detect that we are living in a simulation, and proceed to escape from that simulation into whatever transcendent realm underpins it. It also considers possible reactions by the simulators to our attempts to escape.

While I have long found the simulation argument of Bostrom (and others) to be intellectually fascinating, I have previously taken the view that it makes little difference to how I should choose to act on a daily basis. So the argument was a fine topic for occasional armchair discussion, but needed to be prevented from taking up too much attention. I saw it as a distraction from more pressing issues.

However, I confess I’m changing my mind. The arguments collected and developed by Yampolskiy deserve a wider slice of our focus. There are three reasons for this.

Reason 1: New insights on AI safety

The two escape scenarios — AIs escaping human-imposed confinement, and humans escaping simulator-imposed confinement — are similar in some ways, but diverge in others.

To start with, the two scenarios have mainly had different groups of people thinking about them. Cross-pollinating concepts and attitudes from these different perspectives has the potential to yield new insight. Yampolskiy’s article suggests many such synergies.

Whatever you think about the simulation hypothesis — even if you disdain it as pseudoscience or nonsense — any new insights for AI safety should surely be welcomed.

Another difference is that general opinion holds that confinement is impossible (or unlikely) in the first scenario, whereas escape is impossible (or unlikely) in the second scenario. Is there a sound reason for this difference? 

Credit: David Wood

The general assumption is that, in the AI escape case, the AI will have greater intelligence than the confiners (the humans), whereas in the human escape case, we humans have less intelligence than the confiners (the simulators).

But is that assumption set in stone for all time? I’ll come back to that question shortly, when I reach “Reason 3.”

Reason 2: Beyond metaphysics

A second transformational aspect of Yampolskiy’s paper is his emphasis that the simulation hypothesis might go far beyond being a metaphysical curiosity — something that would be forever unverifiable — and might become something with radical concrete consequences for human life.

He says that if we study the universe carefully, we might discover signs of how the simulation works. We might notice occasional cracks in the simulation, or ‘glitches in the matrix’ — to refer to the series of Matrix films that popularised the idea that we might be living in a virtual world. Armed with knowledge of these cracks or glitches, we might be able to manipulate the simulation, or to communicate with the simulators.

Spotting a glitch in the matrix? (Credit: Gerd Altmann from Pixabay)

In some scenarios, this might lead to our awareness being transferred out of the simulation into the transcendent realm. Maybe the simulators are waiting for us to achieve various goals or find certain glitches before elevating us.

Personally, I find much of the speculation in this area to be on shaky ground. I’ve not been convinced that ‘glitches in the matrix’ is the best explanation for some of the phenomena for which it has been suggested:

  • The weird “observer effects” and “entangled statistics” of quantum mechanics (I much prefer the consistency and simplicity of the Everett conception of quantum mechanics, in which there is no wave function collapse and no nonlocality — but that’s another argument)
  • The disturbing lack of a compelling answer to Fermi’s paradox (I consider some suggested answers to that paradox to be plausible, without needing to invoke any simulators)
  • Claimed evidence of parapsychology (to make a long story short: the evidence doesn’t convince me)
  • Questions over whether evolution by natural selection really could produce all the marvelous complexity we observe in nature
  • The unsolved (some would say unsolvable) nature of the hard problem of consciousness.

“The simulator of the gaps” argument is no more compelling than “the god of the gaps.”

Nevertheless, I agree that keeping a different paradigm at the back of our minds — the paradigm that the universe is a simulation — may enable new solutions to some stubborn questions of both science and philosophy.

Reason 3: AI might help us escape the simulation

Credit: Tesfu Assefa

I’ve just referred to “stubborn questions of both science and philosophy.”

That’s where superintelligent AI may be able to help us. By reviewing and synthesizing existing ideas on these questions, and by conceiving alternative perspectives that illuminate these stubborn questions in new ways, AI might lead us, at last, to a significantly improved understanding of time, space, matter, mind, purpose, and more.

But what if that improved general understanding resolves our questions about the simulation hypothesis? Although we humans, with unaided intelligence, might not be bright enough to work out how to burst out of our confinement in the simulation, the arrival of AI superintelligence might change that.

Writers who have anticipated the arrival of AI superintelligence have often suggested this would lead, not only to the profound transformation of the human condition, but also to an expanding transformation of the entire universe. Ray Kurzweil has described ‘the universe waking up’ as intelligence spreads through the stars.

However, if we follow the line of argument advanced by Yampolskiy, the outcome could even be transcending an illusory reality.

“Humanity transcends the simulation” generated by DALL-E (Credit: David Wood)

Such an outcome could depend on whether we humans are still trying to confine superintelligent AI as just a tool, or whether we have learned how to coexist in a profound collaboration with it.

Suggested next steps

If you’ve not read it already, you should definitely take the time to read Yampolskiy’s article. There are many additional angles to it beyond what I’ve indicated in my remarks above.

If you prefer listening to an audio podcast that covers some of the issues raised above, check out Episode 13 of the London Futurists Podcast, which I co-host along with Calum Chace.

Chace has provided his own take on Yampolskiy’s views in this Forbes article.

The final chapter of my 2021 book Vital Foresight contains five pages addressing the simulation argument, in a section titled ‘Terminating the simulation’. I’ve just checked what I wrote there, and I still stand by the conclusion I offered at that time:

Even if there’s only a 1% chance, say, that we are living in a computer simulation, with our continued existence being dependent on the will of external operator(s), that would be a remarkable conclusion — something to which much more thought should be applied.

Let us know your thoughts! Sign up for a Mindplex account now, join our Telegram, or follow us on Twitter

The United Nations and our uncertain future: breakdown or breakthrough?

“Humanity faces a stark and urgent choice: a breakdown or a breakthrough”

These words introduce an 85-page report issued by António Guterres, United Nations Secretary-General (UNSG).

The report had been produced in response to requests from UN member state delegations “to look ahead to the next 25 years” and review the issues and opportunities for global cooperation over that time period.

Entitled “Our Common Agenda,” the document was released on September 10, 2021.

Credit: United Nations

It featured some bold recommendations. Two examples:

Now is the time to end the “infodemic” plaguing our world by defending a common, empirically backed consensus around facts, science and knowledge. The “war on science” must end. All policy and budget decisions should be backed by science and expertise, and I am calling for a global code of conduct that promotes integrity in public information.

Now is the time to correct a glaring blind spot in how we measure economic prosperity and progress. When profits come at the expense of people and our planet, we are left with an incomplete picture of the true cost of economic growth. As currently measured, gross domestic product (GDP) fails to capture the human and environmental destruction of some business activities. I call for new measures to complement GDP, so that people can gain a full understanding of the impacts of business activities and how we can and must do better to support people and our planet.

It also called for greater attention on being prepared for hard-to-predict future developments:

We also need to be better prepared to prevent and respond to major global risks. It will be important for the United Nations to issue a Strategic Foresight and Global Risk Report on a regular basis… 

I propose a Summit of the Future to forge a new global consensus on what our future should look like, and what we can do today to secure it.

But this was only the start of a serious conversation about taking better steps in anticipation of future developments. It’s what happened next that moved the conversation significantly forward.

Introducing the Millennium Project

Too many documents in this world appear to be “write-only.” Writers spend significant time devising recommendations, documenting the context, and converting their ideas into eye-catching layouts. But then their report languishes, accumulating dust. Officials may occasionally glance at the headlines, but the remainder of the fine words in the document could be invisible for all the effect they have in the real world.

In the case of the report “Our Common Agenda,” the UNSG took action to avoid the tumbleweed scenario. His office contacted an organization called the Millennium Project to collect feedback from the international futurist community on the recommendations in the report. How did these futurists assess the various recommendations? And what advice would they give regarding practical steps forward?

The Millennium Project has a long association with the United Nations. Established in 1996 after a three-year feasibility study with the United Nations University, it has built up a global network of “nodes” connected to numerous scholars and practitioners of foresight. The Millennium Project regularly publishes its own “State of the Future” reports, which aggregate and distill input from its worldwide family of futurists.

A distinguishing feature of how the Millennium Project operates is the “Real-Time Delphi” process it uses. In a traditional questionnaire, each participant gives their own answers, along with explanations of their choices. In a Delphi survey, participants can see an anonymized version of the analysis provided by other participants, and are encouraged to take that into account in their own answers. 

So participants can reflect, not only on the questions, but also on everything written by the other respondents. Participants revisit the set of questions as many times as they like, reviewing any updates in the input provided by other respondents. And if they judge it appropriate, they can amend their own answers, again and again.

The magic of this process — in which I have personally participated on several occasions — is the way new themes, introduced by diverse participants, prompt individuals to consider matters from multiple perspectives. Rather than some mediocre “lowest common denominator” compromise, a novel synthesis of divergent viewpoints can emerge.

So the Millennium Project was well placed to respond to the challenge issued by the UNSG’s office. And in May last year, an email popped into my inbox. It was an invitation to me to take part in a Real-Time Delphi survey on key elements of the UNSG’s “Our Common Agenda” report.

In turn, I forwarded the invitation in a newsletter to members of the London Futurists community that I chair. I added a few of my own comments:

The UNSG report leaves me with mixed feelings.

Some parts are bland, and could be considered “virtue signaling.”

But other parts offer genuinely interesting suggestions…

Other parts are significant by what is not said, alas.

Not everyone who started answering the questionnaire finished the process. It required significant thought and attention. But by the time the Delphi closed, it contained substantive answers from 189 futurists and related experts from a total of 54 countries.

The results of the Delphi process

Researchers at the Millennium Project, led by the organization’s long-time Executive Director, Jerome Glenn, transformed the results of the Delphi process into a 38-page analysis (PDF). The process had led to two types of conclusion.

The first conclusions were direct responses to specific proposals in the “Our Common Agenda” report. The second referred to matters that were given little or no attention in the report, but which deserve to be prioritized more highly.

In the first category, responses referred to five features of the UNSG proposal:

    1. A Summit of the Future
    2. Repurposing the UN Trusteeship Council as a Multi-Stakeholder Body
    3. Establishing a UN Futures Lab
    4. Issuing Strategic Foresight and Global Risk Reports on a regular basis
    5. Creating a Special Envoy for Future Generations

Overall, the Delphi gave a strong positive assessment of these five proposals. Here’s a quote from Jerome Glenn:

If the five foresight elements in Our Common Agenda are implemented along the lines of our study, it could be the greatest advance for futures research and foresight in history… This is the best opportunity to get global foresight structural change into the UN system that has ever been proposed.

Of the five proposals, the UN Futures Lab was identified as being most critical:

The UN Futures Lab was rated the most critical element among the five for improving global foresight by over half of the Real-Time Delphi panel. It is critical, urgent, and essential to do it as soon as possible. It is critical for all the other foresight elements in Our Common Agenda.

The Lab should function across all UN agencies and integrate all UN data and intelligence, creating a global collective intelligence system. This would create an official space for systemic and systematic global futures research. It could become the foresight brain of humanity.

The Delphi also deemed the proposal on strategic foresight reports particularly important:

Strategic Foresight and Global Risk Reports were seen as very critical for improving global foresight by nearly 40% of the panel. This is exactly the kind of report that the United Nations should give the world. Along with its own analysis, these reports should provide an analysis and synthesis of all the other major foresight and risk reports, provide roadmaps for global strategies, and give equal weight to risks and opportunities.

The reports should be issued every one or two years due to accelerating change, and the need to keep people involved with these issues. It should include a chapter on actions taken since the last report, with examples of risk mitigation, management, and what persists…

They should bring attention to threats that are often ignored with cost estimates for prevention vs. recovery (Bill Gates estimates it will cost $1 billion to address the next pandemic compared to the $15 trillion spent on Covid so far). It should identify time-sensitive information required to make more intelligent decisions.

In a way, it’s not a big concern that key futurist topics are omitted from Our Common Agenda. If the mechanisms described above are put in place, any significant omissions that are known to foresight practitioners around the world will quickly be fed into the revitalized process, and brought to wider attention.

But, for the record, it’s important to note some key risks and opportunities that are missing from the UNSG’s report.

The biggest forthcoming transition

Jerome Glenn puts it well:

If we don’t get the initial conditions right for artificial general intelligence, an artificial superintelligence could emerge from beyond our control and understanding, and not to our liking.

If AGI could occur in ten to 20 years and if it will take that same amount of time to create international agreements about the right initial conditions, design a global governance system, and implement it, we should begin now.

Glenn goes on to point out the essential international nature of this conundrum:

If both the US and China do everything right, but others do not, the fears of science fiction could occur. It has to be a global agreement enforced globally. Only the UN can do that.

Here’s the problem: Each new generation of AI makes it easier for more people to become involved in developing the next generation of AI. Systems are often built by bolting together previous components, and tweaking the connections that govern the flow of information and command. If an initial attempt doesn’t work, engineers may reverse some connections and try again. Or add in some delay, or some data transformation, in between two ports. Or double the amount of processing power available. And so on. (I’m over-simplifying, of course. In reality, the sorts of experimental changes made are more complex than what I’ve just said.)

This kind of innovation by repeated experimentation has frequently produced outstanding results in the history of the development of technology. Creative engineers were frequently disappointed by their results to start with, until, almost by chance, they stumbled on a configuration that worked. Bingo — a new level of intelligence is created. And the engineers, previously suspected as being second-rate, are now lauded as visionary giants.

Silicon Valley has a name for this: Move fast and break things.

The idea: if a company is being too careful, it will fail to come up with new breakthrough combinations as quickly as its competitors. So it will go out of business.

That phrase was the informal mission statement for many years at Facebook.

Consider also the advice you’ll often hear about the importance of “failing forward” and “how to fail faster.”

In this view, failures aren’t a problem, provided we pick ourselves up quickly, learn from the experience, and can proceed more wisely to the next attempt.

But wait: what if the failure is a problem? What if a combination of new technology turns out to have cataclysmic consequences? That risk is posed by several leading edge technologies today:

    • Manipulation of viruses, to explore options for creating vaccines to counter new viruses — but what if a deadly new synthetic virus were to leak out of supposedly secure laboratory confinement?
    • Manipulation of the stratosphere, to reflect back a larger portion of incoming sunlight — but what if such an intervention has unexpected side effects, such as triggering huge droughts and/or floods?
    • Manipulation of algorithms, to increase their ability to influence human behavior (as consumers, electors, or whatever) but what if a new algorithm was so powerful that it inadvertently shatters important social communities?

That takes us back to the message at the start of this article, by UNSG António Guterres: What if an attempt to achieve a decisive breakthrough results, instead, in a terrible breakdown?

Not the Terminator

The idea of powerful algorithms going awry is often dismissed with a wave of the hand: “This is just science fiction.”

But Jerome Glenn is correct in his statement (quoted earlier): “The fears of science fiction could occur.”

After all, HG Wells published a science-fiction story in 1914 entitled The World Set Free that featured what he called “atomic bombs” that derived their explosive power from nuclear fission. In his novel, atomic bombs destroyed the majority of the world’s cities in a global war (set in 1958).

Credit: Amazon.com

But merely because something is predicted in science fiction, that’s no reason to reject the possibility of something like it happening in reality.

The “atomic bombs” foreseen by HG Wells, unsurprisingly, differed in several ways from the real-world atomic bombs developed by the Manhattan Project and subsequent research programs. In the same way, the threats from misconfigured powerful AI are generally different from those portrayed in science fiction.

For example, in the Hollywood Terminator movie series, humans are able, via what can be called superhuman effort, to thwart the intentions of the malign “Skynet” artificial intelligence system.

It’s gripping entertainment. But the narrative in these movies distorts credible scenarios of the dangers posed by AGI. We need to avoid being misled by such narratives.

First, there’s an implication in The Terminator, and in many other works of science fiction, that the danger point for humanity is when AI systems somehow “wake up,” or become conscious. If true, a sufficient safety measure would be to avoid any such artificial consciousness.

However, a cruise missile that is hunting us down does not depend for its deadliness on any cruise-missile consciousness. A social media algorithm that is whipping up hatred against specific ethnic minorities isn’t consciously evil. The damage results from the cold operation of algorithms. There’s no need to involve consciousness.

Second, there’s an implication that AI needs to be deliberately malicious before it can cause damage to humans. However, damage to human wellbeing can, just as likely, arise from side effects of policies that have no malicious intent.

When we humans destroy ant colonies in our process of constructing a new shopping mall, we’re not acting out of deliberate malice toward ants. It’s just that the ants are in our way. They are using resources for which we have a different purpose in mind. It could well be the same with an AGI that is pursuing its own objectives.

Consider a corporation that is vigorously pursuing an objective of raising its own profits. It may well take actions that damage the wellbeing of at least some humans, or parts of the environment. These outcomes are side-effects of the prime profit-generation directive that is governing these corporations. They’re not outcomes that the corporation consciously desires. It could well be the same with a badly designed AGI.

Third, the scenario in The Terminator leaves viewers with a false hope that, with sufficient effort, a group of human resistance fighters will be able to out-maneuver an AGI. That would be like a group of chimpanzees imagining that, with enough effort, they could displace humans as the dominant species on planet Earth. In real life, bullets shot by a terminator robot would never miss. Resistance would indeed be futile.

Instead, the time to fight against the damage an AGI could cause is before the AGI is created, not when it already exists and is effectively all-powerful. That’s why any analysis of future global developments needs to place the AGI risk front and center.

Four catastrophic error modes

The real risk — as opposed to “the Hollywood risk” — is that an AI system may acquire so much influence over human society and our surrounding environment that a mistake in that system could cataclysmically reduce human wellbeing all over the world. Billions of lives could be extinguished, or turned into a very pale reflection of their present state.

Such an outcome could arise in any of four ways – four catastrophic error modes. In brief, these are:

    1. Implementation defect
    2. Design defect
    3. Design overridden
    4. Implementation overridden.
Credit: David Wood and Pixabay

In more detail:

  1. The system contains a defect in its implementation. It takes an action that it calculates will have one outcome, but unexpectedly, it has another outcome instead. For example, a geoengineering intervention could trigger an unforeseen change in the global climate, plunging the Earth into a state in which humans cannot survive.

  2. Imagine, for example, an AI with a clumsily specified goal to focus on preserving the diversity of the Earth’s biosystem. That could be met by eliminating upward of 99% of all humans. Oops! Such a system contains a defect in its design. It takes actions to advance the goals it has explicitly been given, but does so in a way that catastrophically reduces actual human wellbeing.

  3. The system has been given goals that are well aligned with human wellbeing, but as the system evolves, a different set of goals emerge, in which the wellbeing of humans is deprioritized. This is similar to how the emergence of higher thinking capabilities in human primates led to many humans taking actions in opposition to the gene-spreading instincts placed into our biology by evolution.

  4. The system has been given goals that are well-aligned with human wellbeing, but the system is reconfigured by hackers of one sort or another — perhaps from malevolence, or perhaps from a misguided sense that various changes would make the system more powerful (and hence more valuable).

Some writers suggest that it will be relatively easy to avoid these four catastrophic error modes. I disagree. I consider that to be wishful thinking. Such thinking is especially dangerous, since it leads to complacency.

Wise governance of the transition to AGI

Various “easy fixes” may work against one or two of the above catastrophic error modes, but solutions that will cope with all four of them require a much more comprehensive approach: something I call “the whole-system perspective.”

That whole-system perspective includes promoting characteristics such as transparency, resilience, verification, vigilance, agility, accountability, consensus, and diversity. It champions the discipline of proactive risk management.

It comes down to a question of how technology is harnessed — how it is selectively steered, slowed down, and (yes) on occasion, given full throttle.

Taking back control

In summary, it’s necessary to “take back control of technology.” Technology cannot be left to its own devices. Nor to the sometimes headlong rushes of technology corporations. Nor to the decisions of military planners. And definitely not to the whims of out-of-touch politicians.

Instead, we — we the people — need to “take back control.”

Credit: Tesfu Assefa

There’s a huge potential role for the United Nations in wise governance of the transition to AGI. The UN can help to broker and deepen human links at all levels of society, despite the sharp differences between the various national governments in the world. An understanding of “our common agenda” regarding fast-changing technologies (such as AI) can transcend the differences in our ideologies, politics, religions, and cultures. 

I particularly look forward to a worldwide shared appreciation of:

    • The risks of catastrophe from mismanaged technological change
    • The profound positive possibilities if these technologies are wisely managed.

Let us know your thoughts! Sign up for a Mindplex account now, join our Telegram, or follow us on Twitter

The Singularity: Untangling the Confusion

“The Singularity” — the anticipated creation of Artificial General Intelligence (AGI) — could be the most important concept in the history of humanity. It’s unfortunate, therefore, that the concept is subject to considerable confusion.

Four different ideas interweaved, all accelerating in the same direction toward an unclear outcome (Credit: David Wood)

The first confusion with “the Singularity” is that the phrase is used in several different ways. As a result, it’s easy to become distracted.

Four definitions

For example, consider Singularity University (SingU), which has been offering courses since 2008 with themes such as “Harness the power of exponential technology” and “Leverage exponential technologies to solve global grand challenges.”

For SingU, “Singularity” is basically synonymous with the rapid disruption caused when a new technology, such as digital photography, becomes more useful than previous solutions, such as analog photography. What makes these disruptions hard to anticipate is the exponential growth in the capabilities of the technologies involved.

A period of slow growth, in which progress lags behind expectations of enthusiasts, transforms into a period of fast growth, in which most observers complain “why did no one warn us this was coming?”

Human life “irreversibly transformed”

A second usage of the term “the Singularity” moves beyond talk of individual disruptions — singularities in particular areas of life. Instead, it anticipates a disruption in all aspects of human life. Here’s how futurist Ray Kurzweil introduces the term in his 2005 book The Singularity Is Near:

What, then, is the Singularity? It’s a future period during which the pace of technological change will be so rapid, its impact so deep, that human life will be irreversibly transformed… This epoch will transform the concepts that we rely on to give meaning to our lives, from our business models to the cycle of human life, including death itself…

The key idea underlying the impending Singularity is that the pace of change of our human-created technology is accelerating and its powers are expanding at an exponential pace.

The nature of that “irreversible transformation” is clarified in the subtitle of the book: When Humans Transcend Biology. We humans will no longer be primarily biological, aided by technology. After that singularity, we’ll be primarily technological, with, perhaps, some biological aspects.

Superintelligent AIs

A third usage of “the Singularity” foresees a different kind of transformation. Rather than humans being the most intelligent creatures on the planet, we’ll fall into second place behind superintelligent AIs. Just as the fate of species such as gorillas and dolphins currently depends on actions by humans, the fate of humans, after the Singularity, will depend on actions by AIs.

Such a takeover was foreseen as long ago as 1951 by pioneering computer scientist Alan Turing:

My contention is that machines can be constructed which will simulate the behaviour of the human mind very closely…

It seems probable that once the machine thinking method had started, it would not take long to outstrip our feeble powers. There would be no question of the machines dying, and they would be able to converse with each other to sharpen their wits. At some stage therefore we should have to expect the machines to take control.

Finally, consider what was on the mind of Vernor Vinge, a professor of computer science and mathematics, and also the author of a series of well-regarded science fiction novels, when he introduced the term “Singularity” in an essay in Omni in 1983. Vinge was worried about the unforeseeability of future events:

There is a stone wall set across any clear view of our future, and it’s not very far down the road. Something drastic happens to a species when it reaches our stage of evolutionary development — at least, that’s one explanation for why the universe seems so empty of other intelligence. Physical catastrophe (nuclear war, biological pestilence, Malthusian doom) could account for this emptiness, but nothing makes the future of any species so unknowable as technical progress itself…

We are at the point of accelerating the evolution of intelligence itself. The exact means of accomplishing this phenomenon cannot yet be predicted — and is not important. Whether our work is cast in silicon or DNA will have little effect on the ultimate results. The evolution of human intelligence took millions of years. We will devise an equivalent advance in a fraction of that time. We will soon create intelligences greater than our own.

A Singularity that “passes far beyond our understanding”

This is when Vinge introduces his version of the concept of singularity:

When this happens, human history will have reached a kind of singularity, an intellectual transition as impenetrable as the knotted space-time at the centre of a black hole, and the world will pass far beyond our understanding. This singularity, I believe, already haunts a number of science fiction writers. It makes realistic extrapolation to an interstellar future impossible.

If creatures (whether organic or inorganic) attain levels of general intelligence far in excess of present-day humans, what kinds of goals and purposes will occupy these vast brains? It’s unlikely that their motivations will be just the same as our own present goals and purposes. Instead, the immense scale of these new minds will likely prove alien to our comprehension. They might appear as unfathomable to us as human preoccupations appear to the dogs and cats and other animals that observe us from time to time.

Credit: David Wood

AI, AGI, and ASI

Before going further, let’s quickly contrast today’s AI with the envisioned future superintelligence.

Existing AI systems typically have powerful capabilities in narrow contexts, such as route-planning, processing mortgage and loan applications, predicting properties of molecules, playing various games of skill, buying and selling shares, recognizing images, and translating speech.

But in all these cases, the AIs involved have incomplete knowledge of the full complexity of how humans interact in the real world. The AI can fail when the real world introduces factors or situations that were not part of the data set of examples with which the AI was trained.

In contrast, humans in the same circumstance would be able to rely on capacities such as “common sense”, “general knowledge,” and intuition or “gut feel”, to reach a better decision.

An AI with general intelligence

However, a future AGI — an AI with general intelligence — would have as much common sense, intuition, and general knowledge as any human. An AGI would be at least as good as humans at reacting to unexpected developments. That AGI would be able to pursue pre-specified goals as competently as (but much more efficiently than) a human, even in the kind of complex environments which would cause today’s AIs to stumble.

Whatever goal is input to an AGI, it is likely to reason to itself that it will be more likely to achieve that goal if it has more resources at its disposal and if its own thinking capabilities are further improved. What happens next may well be as described by IJ Good, a long-time colleague of Alan Turing:

Let an ultraintelligent machine be defined as a machine that can far surpass all the intellectual activities of any man however clever. Since the design of machines is one of these intellectual activities, an ultraintelligent machine could design even better machines; there would then unquestionably be an “intelligence explosion,” and the intelligence of man would be left far behind. Thus the first ultraintelligent machine is the last invention that man need ever make, provided that the machine is docile enough to tell us how to keep it under control.

Evolving into artificial superintelligence

In other words, not long after humans manage to create an AGI, the AGI is likely to evolve itself into an ASI – an artificial superintelligence that far exceeds human powers.

In case the idea of an AI redesigning itself without any human involvement seems far-fetched, consider a slightly different possibility: Humans will still be part of that design process, at least in the initial phases. That’s already the case today, when humans use one generation of AI tools to help design a new generation of improved AI tools before going on to repeat the process.

I.J. Good foresaw that too. This is from a lecture he gave at IBM in New York in 1959:

Once a machine is designed that is good enough… it can be put to work designing an even better machine…

There will only be a very short transition period between having no very good machine and having a great many exceedingly good ones.

At this point an “explosion” will clearly occur; all the problems of science and technology will be handed over to machines and it will no longer be necessary for people to work. Whether this will lead to a Utopia or to the extermination of the human race will depend on how the problem is handled by the machine.

Singularity timescales: exponential computational growth

One additional twist to the concept of Singularity needs to be emphasized. It’s not just that, as Vernor Vinge stressed, the consequences of passing the point of Singularity are deeply unpredictable. It’s that the timing of reaching the point of Singularity is inherently unpredictable too. That brings us to what can be called the second confusion with “The Singularity.”

It’s sometimes suggested, contrary to what I just said, that a reasonable estimate of the date of the Singularity can be obtained by extrapolating the growth of the hardware power of computing systems. The idea is to start with an estimate for the computing power of the human brain. That estimate involves the number of neurons in the brain. 

Extrapolate that trend forward, and it can be argued that such a computer would match, by around 2045, the capability not just of a single human brain, but the capabilities of all human brains added together.

Next, consider the number of transistors that are included in the central processing unit of a computer that can be purchased for, say, $1,000. In broad terms, that number has been rising exponentially since the 1960s. This phenomenon is part of what is called “Moore’s Law.” 

This argument is useful to raise public awareness of the possibility of the Singularity. But there are four flaws with using this line of thinking for any detailed forecasting:

  1. Individual transistors are still becoming smaller, but the rate of shrinkage has slowed down in recent years.
  2. The power of a computing system depends, critically, not just on its hardware, but on its software. Breakthroughs in software defy any simple exponential curve.
  3. Sometimes a single breakthrough in technology will unleash much wider progress than was expected. Consider the breakthroughs of deep learning neural networks, c. 2012.
  4. Ongoing technological progress depends on society as a whole supplying a sufficiently stable and supportive environment. That’s something else which can vary unpredictably.

A statistical estimate

Instead of pointing to any individual date and giving a firm prediction that the Singularity will definitely have arrived by then, it’s far preferable to give a statistical estimate of the likelihood of the Singularity arriving by that date. However, given the uncertainties involved, even these estimates are fraught with difficulty.

The biggest uncertainty is in estimating how close we are to understanding the way common sense and general knowledge arises in the human brain. Some observers suggest that we might need a dozen conceptual breakthroughs before we have a comprehension sufficient to duplicate that model in silicon and software. But it’s also possible that a single conceptual leap will solve all these purportedly different problems.

Yet another possibility should give us pause. An AI might reach (and then exceed) AGI level even without humans understanding how it operates — or of how general intelligence operates inside the human brain. Multiple recombinations of existing software and hardware modules might result in the unforeseen emergence of an overall network intelligence that far exceeds the capabilities of the individual constituent modules.

Schooling the Singularity

Even though we cannot be sure what direction an ASI will take, nor of the timescales in which the Singularity will burst upon us, can we at least provide a framework to constrain the likely behavior of such an ASI?

The best that can probably be said in response to this question is: “it’s going to be hard!”

As a human analogy, many parents have been surprised — even dumbfounded — by choices made by their children, as these children gain access to new ideas and opportunities.

Introducing the ASI

Humanity’s collective child — ASI — might surprise and dumbfound us in the same way. Nevertheless, if we get the schooling right, we can help bias that development process  — the “intelligence explosion” described by I.J. Good — in ways that are more likely to align with profound human wellbeing.

That schooling aims to hardwire deep into the ASI, as a kind of “prime directive,” principles of beneficence toward humans. If the ASI would be at the point of reaching a particular decision — for example, to shrink the human population on account of humanity’s deleterious effects on the environment –- any such misanthropic decision would be overridden by the prime directive.

The difficulty here is that if you line up lots of different philosophers, poets, theologians, politicians, and engineers, and ask them what it means to behave with beneficence toward humans, you’ll hear lots of divergent answers. Programming a sense of beneficence is as least as hard as programming a sense of beauty or truth.

But just because it’s hard, that’s no reason to abandon the task. Indeed, clarifying the meaning of beneficence could be the most important project of our present time.

Tripwires and canary signals

Here’s another analogy: accumulating many modules of AI intelligence together, in a network relationship, is similar to accumulating nuclear fissile material together. Before the material reaches a critical mass, it still needs to be treated with respect, on account of the radiation it emits. But once a critical mass point is reached, a cascading reaction results — a nuclear meltdown or, even worse, a nuclear holocaust.

The point here is not to risk any accidental encroachment upon the critical mass, which would convert the nuclear material from hazardous to catastrophic. Accordingly, anyone working with such material needs to be thoroughly trained in the principles of nuclear safety.

With an accumulation of AI modules, things are by no means so clear. Whether that accumulation could kick-start an explosive phase transition depends on lots of issues that we currently only understand dimly.

However, something we can, and should, insist upon, is that everyone involved in the creation of enhanced AI systems pays attention to potential “tripwires.” Any change in configuration or any new addition to the network should be evaluated, ahead of time, for possible explosive consequences. Moreover, the system should in any case be monitored continuously for any canary signals that such a phase transition is becoming imminent.

Again, this is a hard task, since there are many different opinions as to which kind of canary signals are meaningful, and which are distractions.

Credit: Tesfu Assefa

Concluding thoughts

The concept of the Singularity poses problems, in part because of some unfortunate confusion that surrounds this idea, but also because the true problems of the Singularity have no easy answers:

  1. What are good canary signals that AI systems could be about to reach AGI level?
  2. How could a “prime directive” be programmed sufficiently deeply into AI systems that it will be maintained, even as that system reaches AGI level and then ASI, rewriting its own coding in the process?
  3. What should that prime directive include, going beyond vague, unprogrammable platitudes such as “act with benevolence toward humans”?
  4. How can safety checks and vigilant monitoring be introduced to AI systems without unnecessarily slowing down the progress of these systems to producing solutions of undoubted value to humans (such as solutions to diseases and climate change)?
  5. Could limits be put into an AGI system that would prevent it self-improving to ASI levels of intelligence far beyond those of humans?
  6. To what extent can humans take advantage of new technology to upgrade our own intelligence so that it keeps up with the intelligence of any pure-silicon ASI, and therefore avoids the situation of humans being left far behind ASIs?
Credit: David Wood, Pixabay

However, the first part of solving a set of problems is a clear definition of these problems. With that done, there are opportunities for collaboration among many different people — and many different teams — to identify and implement solutions.

What’s more, today’s AI systems can be deployed to help human researchers find solutions to these issues. Not for the first time, therefore, one generation of a technological tool will play a critical role in the safe development of the next generation of technology.

Let us know your thoughts! Sign up for a Mindplex account now, join our Telegram, or follow us on Twitter