Deep fakes: What’s next? Anticipating new twists and turns in humanity’s oldest struggle

Fake news that the Pope endorsed Donald Trump (a story that was shared more widely than any legitimate news story that year). A fake picture of former US VP Michael Pence in his youth seemingly as a gay porn star. Fake audio of UK political leader Keir Starmer apparently viciously berating a young volunteer assistant. Another fake audio of London mayor Sadiq Khan apparently giving priority to a pro-Palestinian march over the annual Remembrance Day walk-past by military veterans. Fake videos of apparent war atrocities. Fake pornographic videos of megastar pop celebrities.

What’s next? And how much does it really matter?

Some observers declare that there’s nothing new under the sun, and that there’s no special need to anticipate worse to come. Society, they say, already knows how to deal with fake news. Fake news may be unpleasant – and it’s sometimes hilarious – but we just have to keep calm and carry on.

I strongly disagree, as I’ll explain below. I’ll review ten reasons why fake news is likely to become worse in the months ahead. Then I’ll suggest ten steps that can be taken to regain our collective sanity.

It remains to be determined whether these ten steps will be sufficient, or whether we’ll all sink into a post-truth swamp, in which sneering suspicion displaces diligent understanding, fake science displaces trustworthy science, fake journalism displaces trustworthy journalism, and fake politicians seize power and impose their dictatorial whims.

Credit: David Wood via Midjourney

Deception: the back story

It’s not flattering to say it, but we humans have been liars since before the dawn of history. And, just as important, we have been self-deceivers as well: we deceive ourselves in order to be more successful in deceiving others.

In case that idea offends you, I invite you to delve into the evidence and analysis offered in, for example:

Credit: Book publishers’ websites (links above)

We implore our children to be truthful but also guide them to know when to tell white lies – “thank you for this lovely present, it’s just what I wanted!” And the same ancient books of the Bible that command us “do not bear false witness” appear to celebrate deceit when practiced by figures such as Jacob, Rachel, Rebekah, and Tamar.

I could tell you, as well, that the ancient Greek dramatist Aeschylus, known as ‘the father of tragedy’, made this pithy observation two and a half millennia ago: “Truth is the first casualty in war”. One tragedy – war – births another – deception.

As it happens, it seems likely that this quotation is a misattribution. I’ll come back to that point later, when talking, not about deception, but about solutions to deception. But regardless of who first uttered that saying, we can appreciate the insight it contains. In times of bitter conflict, there are special incentives to mislead observers – about the casualties we have suffered, about the casualties we have inflicted on opposing forces, about our military plans for the future, and much more.

It’s not just war that provides an incentive to deceive. It’s the same with politics: opposing parties compete to set the narrative, and individual politicians seek to climb past each other on what Benjamin Disraeli dubbed “the greasy pole” of political intrigue. It’s the same with commerce, with companies ready to spread misleading ‘FUD’ (fear, uncertainty, and doubt) regarding the comparative strengths of various forthcoming products and services. And it’s the same in private life, as we seek to portray ourselves in a favorable light in the eyes of family and friends, hiding our physical and psychological warts.

In this sense, deception is old news. We’ve had ‘fake news’ for as long as there has been ‘news’.

It’s tempting, therefore, to yawn when people draw attention to more recent examples of fake news and deception.

But that would be a giant mistake.

It’s technology that’s making the difference. Technology ramps up the possibilities for fake news to be even more deceptive, more credible, more ubiquitous, more personal, and more effective. Led by leaps in capabilities of AI systems, technology is enabling dramatic new twists in the struggle between truth and lies. It’s becoming even harder to distinguish between trustworthy and untrustworthy information.

The joy of misinformation. What harm could it cause? (Credit: David Wood via Midjourney)

If we fail to anticipate these developments, we’re likely to succumb to new waves of deception. The consequences may be catastrophic.

But forewarned is forearmed. By drawing on insights from humanity’s better experiences, we should be able to create technologies, processes, and institutions that help us to block these oncoming waves.

Ten twists

1. Fake news at scale

If at first you fail, why not try again?

You tried to deceive your target audience, but they were not swayed. This time, they saw through your lies. Or perhaps they didn’t even pay attention.

But if trying is cheap and quick, you can try again, this time with a different false narrative, expressed in a different voice.

What’s changed is that it’s much cheaper to try again. You can take advantage of automation, always-on networks, social media, and generative AI, to create and distribute new pieces of fake news. It’s mass-production for lies.

You’re not constrained by only creating one bot on social media. You can create armies of them.

You’re not constrained by having to write text yourself, or create suitably misleading images. You can obtain good results from a few clicks of a mouse.

The result is that discussion is being flooded with deliberately false narratives.

2. Fake news that earns money

Some false narratives are designed to change people’s minds. Their creators want to change voting decisions, purchasing decisions, relationship decisions, and so on.

But other false narratives have a different purpose: to earn money via advertising clicks or affiliate marketing revenue share.

Viewers are attracted to websites by content that is outrageous, inflammatory, intriguing, or funny. They spend more time on these sites to explore the other content there, enjoying being outraged, inflamed, intrigued, or simply humored. And while on these sites, they may click on other links that generate revenue for the owners of the site.

In this case, the content creators have no special interest in whether the content matches their own political or philosophical outlooks. They produce whatever earns them the most clicks. Indeed, some clickbait merchants set up websites posting contradictory stories, to catch traffic from both sides of the political spectrum.

As a sad side-effect, people’s minds become increasingly confused. Being misled by fake content, they become less able to distinguish fantasy from reality.

3. Fake news with a personal appeal

It’s not just that fake news is being created on a greater scale than ever before. It’s being created with a greater variety than ever before.

Technology makes it easier to create different variants of the same false narrative. Some variants can be sent to people who are supporters of Candidate A within Party P. A different variant can be sent to people who support Candidate B within Party P. Yet other different variants target people whose favored candidates are from Party Q, Party R, and so on.

More than that: once software has learned which kind of pretty face each person is likely to look at – or which kinds of music make each person want to listen – these variants can easily be generated too, and directed to each target.

4. Fake news based on footprints

You might wonder: how does software know that I am likely to be distracted by particular kinds of pretty faces, or particular kinds of music?

That’s where extensive data gathering and analysis come to the fore. We are each constantly generating online footprints.

For example, Facebook notices that when it places a chess puzzle in my timeline, I tend to click on that conversation, to consider the position in more detail. Facebook observes my interest in these puzzles. Soon, more chess puzzles are being shown to me.

That particular inference is relatively straightforward. Other inferences depend on a wider review of my online activity – which posts I ‘like’, which posts I ‘hide’, and so on.

Astute robots can learn more from our footprints than we expected (Credit: David Wood via Midjourney)

The algorithms make all kinds of deductions from such reviews. Their deductions are not always correct, and are sometimes not even close. But the AI systems that personalize fake news in this way score more hits than those that don’t.

5. Fake news that builds on top of truth

The best lies mix truth with untruth. These lies are especially effective if the truth in question is one that much of society likes to suppress.

Consider a simple example. A leaked document here, a whistleblower there – a few hints suggest something fishy is going on: there is bureaucratic corruption and nepotism within a political state. Then the news-faker adds the unjustified conclusion: the government in question is irretrievably corrupt. Hence the call to action: kick all these politicians out of power!

Again: a narrative might give a number of examples of people experiencing remission from long-standing diseases, despite forecasts from white-coated doctors that the disease was fatal. Then it adds the lie: what matters most in healthcare is your personal attitude, rather than expensive drugs that Big Pharma are trying to sell. Therefore: stop listening to your doctor, and instead purchase my course in positive thinking for $29.99 a month!

Again: members of some minorities suffered appalling abuses in trials of various medical procedures, where there was no informed consent, and where there was an apparent casual disregard for the suffering entailed. And then the lie: present-day society is incorrigibly racist and irredeemably exploitative. Therefore: it’s time to wield pitchforks!

The cleverest fake news combines this principle with the previous one. It works out our belief-systems from our online footprints – it figures out what we already suspect to be true, or hope to be true, even though the rest of society tends to think differently. Then it whips up a fake narrative from beliefs we support plus the new message it’s trying to inject into our minds.

In this way, it flatters us, in order to better mislead us.

No wonder that we often fall for that kind of deception.

6. Fake news that weaponizes friendships

Each of us is more likely to pay attention to a message if it comes from a person that we think we like – someone we perceive as one of our special friends.

If our friend is concerned about a topic, it makes us more likely to be concerned about it too – even if, previously, we might not have given that topic a second thought.

This is where the sinister power of the systems that manufacture fake news reaches higher levels. These systems invest time to create fake personas – people who we welcome as our ‘friends’ on social media.

At first, these friends say nothing out of the ordinary. We forget whether or not we met them in real life. Their names become increasingly familiar to us. We imagine we know lots about them – even though their entire backstory is fictitious.

And that’s when the poisonous messages start seeping into our conversations and then into our thoughts. And without our realizing what has happened, a fake friend has led us into a fake idea.

7. Fake news with amplification support

If we hear the same opinion from multiple sources, we may at first resist the idea, but then start to accept it.

That’s especially true if the opinion receives apparent support from apparently credentialed experts.

Thus when some fake audio is posted to social media, other fake posts soon accompany it. “I’m an expert in audio authentication”, a bot declares. “I’ve studied the clip carefully, and I assure you it’s genuine”.

If we don’t look closely, we’ll fail to spot that the credentials are bogus, and that there’s no real-world audio expert behind these claims.

The greater the number (and the greater the variety) of the apparent endorsements, the easier it becomes for some of these fake endorsements to bypass our critical faculties and to change our minds.

8. Fake news that exploits our pride

We all like to tell ourselves: we’re not the kind of person who falls for a simple conjuring trick.

Other people – those not so smart as us, we think – might be misled by dubious claims in advertisements or social media memes. Not us!

This has been called the bias blind spot – the cognitive bias that says “other people have cognitive biases, but not me!”

But recall that our ability to deceive ourselves is key to our ability to deceive others. If we are conscious of our lies, astute listeners will notice it. That’s why our subconscious needs to mislead our conscious mind before we in turn can mislead other people.

In the same way, an inflated confidence that we are good reasoners and good observers can set us up for the biggest failures.

Couple a misplaced pride in our own critical faculties with the warm feelings that we have developed for friends (either fake online personas, as covered above, or real-world friends who have already fallen into social media rabbit holes), and we are set up to be suckered.

9. Fake news that exploits alienation

Pride isn’t the only emotion that can tempt us into the pit of fake news. Sometimes it can be a sense of grievance or of alienation that we cling to.

Unfortunately, although some aspects of the modern world feature greater human flourishing than ever before, other aspects increase the chances of people nurturing grievances:

  • The inability of large segments of the population to afford good healthcare, good education, or good accommodation
  • The constant barrage of bad news stories from media, 24 hours a day
  • A matching barrage of stories that seem to show the “elites” of society as being out-of-touch, decadent, uncaring, and frivolous, wallowing in undeserved luxury.

As a result, fake news narratives can more easily reach fertile soil – unhappy minds skip any careful assessment of the validity of the claims made.

When you’re fed up with the world, it’s easier to lead you astray (Credit: David Wood via Midjourney)

10. Fake news with a lower barrier to entry

Perhaps you’re still thinking: none of the above is truly novel.

In a way, you would be correct. In past times, clever operators with sufficient resources could devise falsehoods that misled lots of people. Traditional media – including radio and newspapers – were spreading destructive propaganda long before the birth of the Internet.

But the biggest difference, nowadays, is how easy it is for people to access the tools that can help them achieve all the effects listed above.

The barrier to entry for purveyors of far-reaching fake news is lower than ever before. This is an age of ‘malware as a service’, dark net tutorials on guerrilla information warfare, and turnkey tools and databases.

It’s an age where powerful AI systems can increasingly be deployed in service of all the above methods.

Happily, as I’ll discuss shortly, these same AI systems can provide part of the solution to the problem of ubiquitous fake news. But only part of the solution.

Interlude: a world without trust

First, a quick reminder of the bad consequences of fake news.

It’s not just that people are deceived into thinking that dangerous politicians are actually good people, and, contrariwise, that decent men and women are actually deplorable – so that electors are fooled into voting the dangerous ones into power.

It’s not just that people are deceived into hating an entire section of society, seeing everyone in that grouping as somehow subhuman.

It’s not just that people are deceived into investing their life savings into bogus schemes in which they lose everything.

It’s not just that people are deceived into rejecting the sound advice of meticulous medical researchers, and instead adopting unsafe hyped-up treatments that have fearful health consequences.

All of these examples of unsound adoption of dangerous false beliefs are, indeed, serious.

But there’s another problem. When people see that much of the public discourse is filled with untrustworthy fake news, they are prone to jump to the conclusion that all news is equally untrustworthy.

As noted by Judith Donath, fellow at Harvard University’s Berkman Klein Center for Internet & Society and founder of the Sociable Media Group at the MIT Media Lab,

A pernicious harm of fake news is the doubt it sows about the reliability of all news.

Thus the frequent lies and distortions of fringe news sites like InfoWars, Natural News, and Breitbart News lead many people to conclude that all media frequently publish lies. Therefore nothing should be trusted. And the phrase “mainstream media” becomes a sneer.

(They find some justification for this conclusion in the observation that all media make some mistakes from time to time. The problem, of course, is in extrapolating from individual instances of mistakes to applying hostile doubt to all news.)

Baroness Onora O’Neill of the Faculty of Philosophy at the University of Cambridge commenced her series of Reith Lectures in 2002 by quoting Confucius:

Confucius told his disciple Tsze-kung that three things are needed for government: weapons, food, and trust. If a ruler can’t hold on to all three, he should give up the weapons first and the food next. Trust should be guarded to the end: ‘without trust we cannot stand’.

Sadly, if there is no trust, we’re likely to end up being governed by the sort of regimes that are the furthest from deserving trust.

It’s as the German historian and philosopher Hannah Arendt warned us in her 1951 book The Origins of Totalitarianism:

The ideal subject of totalitarian rule is not the convinced Nazi or the convinced Communist, but people for whom the distinction between fact and fiction, in other words, the reality of experience, and the distinction between true and false… people for whom those distinctions no longer exist.

However, the technologies of the 2020s put fearsome possibilities into our grasp that writers in 1951 (like Arendt) and in 2002 (like O’Neill) could hardly have imagined.

Big Brother will be watching, from every angle (Credit: David Wood via Midjourney)

In previous generations, people could keep their inner thoughts to themselves, whilst outwardly kowtowing to the totalitarian regimes in which they found themselves. But with ten-fold twisted fake news, even our inner minds will be hounded and subverted. Any internal refuge of independent thinking is likely to be squelched. Unless, that is, we are wise enough to take action now to prevent that downward spiral.

Regaining trust

What can be done to re-establish trust in society?

Having anticipated, above, ten ways in which the problem of fake news is becoming worse, I now offer an equal number of possible steps forward.

1. Education, education, education

Part of growing up is to learn not to trust so-called 419 scam emails. (The number 419 refers to the section of the Nigerian Criminal Code that deals with fraud.) If someone emails us to say they are a prince of a remote country and they wish to pass their inheritance to us – provided we forward them some hard cash first – this is almost certainly too good to be true.

We also learn that seeing is not believing: our eyes can deceive us, due to optical illusions. If we see water ahead of us on a desert road, that doesn’t mean the water is there.

Similarly, we all need to learn the ways in which fake news stories can mislead us – and about the risks involved in thoughtlessly spreading such news further.

These mechanisms and risks should be covered in educational materials for people of all ages.

It’s like becoming vaccinated and developing resistance to biological pathogens. If we see at first hand the problems caused by over-credulous acceptance of false narratives, it can make us more careful on the next occasion. 

But this educational initiative needs to do more than alert people to the ways in which fake news operates. It also needs to counter the insidious view that all news is equally untrustworthy – the insidious view that there’s no such thing as an expert opinion.

This means more than teaching people the facts of science. It means teaching people the methods used by science to test hypotheses, and the reasons why science assesses various specific hypotheses as being plausible. Finally, it means teaching people, “here are the reasons to assign a higher level of trust to specific media organizations”.

That takes us to the second potential step forward.

2. Upholding trustworthy sources

Earlier, I mentioned that a quote often attributed to the fifth-century BC writer Aeschylus was almost certainly not actually said by him.

What gives me confidence in that conclusion?

It’s because of the reliance I place on one online organization, namely Quote Investigator. In turn, that reliance arises from:

  • The careful way in which pages on that site reference the sources they use
  • The regular updates the site makes to its pages, as readers find additional relevant information
  • The fact that, for all the years I’ve been using that site, I can’t remember ever being misled by it
  • The lack of any profit motivation for the site
  • Its focus on a particular area of research, rather than spreading its attention to wider topics
  • Positive commendations for the site from other researchers that have gained and maintained a good reputation.

Other organizations have similar aspirations. Rather than “quote checking”, some of them specialize in “fact checking”. Examples include:

Credit: Fact-checking websites (links above)

These sites have their critics, who make various allegations of partisan bias, overreliance on supposed experts with questionable credentials, subjective evaluations, and unclear sources of funding.

My own judgment is that these criticisms are mainly misplaced, but that constant vigilance is needed.

I’ll go further: these sites are among the most important projects taking place on the planet. To the extent that they fall short, we should all be trying to help out, rather than denigrating them.

3. Real-time fact-checking

Fact-checking websites are often impressively quick in updating their pages to address new narratives. However, this still leaves a number of problems:

  • People may be swayed by a false narrative before that narrative is added to a fact-checking site
  • Even though a piece of fake news is soundly debunked on a fact-checking site, someone may not be aware of that debunking
  • Even if someone subsequently reads an article on a fact-checking site that points out the flaws of a particular false narrative, that narrative may already have caused a rewiring of the person’s belief systems at a subconscious level – and that rewiring may persist even though the person learns about the flaws in the story that triggered these subconscious changes
  • The personalization problem: false narratives tailored to individual targets won’t be picked up by centralized fact-checking sites.

AI could hold part of the answer. Imagine if our digital media systems included real-time fact-checking analyses. These real-time notifications would catch false information before it has a chance to penetrate deeply into our brains.

Our email applications already do a version of this: flagging suspicious content. The application warns us: this email claims to come from your bank, but it probably doesn’t, so take care with it. Or: the attachment to this email purports to be a PDF, but it’s actually an executable file that will likely cause damage.

Likewise, automated real-time fact-checking could display messages on the screen, on top of the content that is being communicated to us, saying things like:

  • “The claim has been refuted”
  • “Note that the graph presented is misleading”
  • “This video has been doctored from its original version”
  • “This audio has no reliable evidence as to its authenticity”
  • “There is no indication of a cause-and-effect relationship between the facts mentioned”

In each case, ideally the warning message will contain a link to where more information can be found.
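
To make the idea more concrete, here is a minimal sketch in Python of how such an overlay might attach warnings, each with a link, to a piece of content. Everything in it is a hypothetical illustration of my own: the `DEBUNKED` table, the `FactWarning` type, the `check_content` function, and the example.org links are assumptions, not a description of any existing fact-checking service. A real system would need far more than the simple phrase matching used here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FactWarning:
    message: str    # text displayed on top of the content
    more_info: str  # link to where more information can be found


# Hypothetical table of already-debunked claims. Assumption: in practice this
# would be supplied by fact-checking organizations, not hard-coded locally.
DEBUNKED = {
    "pope endorsed": FactWarning(
        "The claim has been refuted",
        "https://example.org/fact-checks/pope-endorsement",
    ),
    "miracle cure": FactWarning(
        "There is no indication of a cause-and-effect relationship "
        "between the facts mentioned",
        "https://example.org/fact-checks/miracle-cure",
    ),
}


def check_content(text: str) -> list[FactWarning]:
    """Return every warning whose trigger phrase appears in the text."""
    lowered = text.lower()
    return [warning for phrase, warning in DEBUNKED.items() if phrase in lowered]


if __name__ == "__main__":
    for warning in check_content("BREAKING: Pope endorsed a candidate!"):
        print(f"{warning.message} (see {warning.more_info})")
```

The sketch only shows the shape of the output: a short message plus a link. The hard part, which it deliberately leaves out, is deciding reliably and quickly which claims deserve a warning.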

4. Decentralized fact-checking

The next question that arises is: how can people be confident in relying on specific real-time fact-checkers?

We can already imagine their complaints:

  • “This fact-checker is wokism gone mad”
  • “This fact-checker serves Google, not me”
  • “This fact-checker serves the government, not me”
  • “I prefer to turn off the fact-checker, to receive my news free from censorship”

There’s no one easy answer to these objections. Each step I describe in this list of ten is designed to reduce some of the apprehension.

But an important step forward would be to separate the provision of content from the fact-checking layer. The fact-checking layer, rather than being owned and operated by the commercial entity that delivers the media, would ideally transcend individual corporations. For example, it could operate akin to Wikipedia, although it would likely need more funding than Wikipedia currently receives.

Further developing this model, the fact-checking software could have various settings that users adjust, reflecting their own judgment about which independent sources should be used for cross-checking.

Maybe the task is too dangerous to leave to just one organization: then another model would involve the existence of more than one option in the fact-checking field, with users being able to select one – or a bunch – to run on their devices.
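
As a rough illustration of this model, the following Python sketch lets a user switch on whichever independent fact-checkers they trust and then combines the verdicts. The three checkers, the `AVAILABLE` registry, and the `combined_verdict` function are all hypothetical stand-ins that I have invented for the example. Majority voting is just one possible way to combine verdicts, chosen here for simplicity.

```python
from collections import Counter
from typing import Callable, Optional

# A checker takes a claim and returns "supported", "refuted", or None when it
# has no opinion. The three below are toy stand-ins for independent services.
Checker = Callable[[str], Optional[str]]


def checker_a(claim: str) -> Optional[str]:
    return "refuted" if "flat earth" in claim.lower() else None


def checker_b(claim: str) -> Optional[str]:
    return "refuted" if "flat" in claim.lower() else "supported"


def checker_c(claim: str) -> Optional[str]:
    return None  # this checker never offers an opinion


# Hypothetical registry of the checkers a user can choose between.
AVAILABLE: dict[str, Checker] = {"A": checker_a, "B": checker_b, "C": checker_c}


def combined_verdict(claim: str, selected: list[str]) -> str:
    """Majority verdict among the checkers the user has switched on."""
    votes: Counter = Counter()
    for name in selected:
        verdict = AVAILABLE[name](claim)
        if verdict is not None:
            votes[verdict] += 1
    if not votes:
        return "unverified"
    (top, top_count), *rest = votes.most_common()
    # Report disagreement rather than silently picking a side on a tie.
    if rest and rest[0][1] == top_count:
        return "disputed"
    return top


if __name__ == "__main__":
    print(combined_verdict("The flat earth theory is correct", ["A", "B"]))
```

The point of the design is that the choice of checkers sits with the user, in the `selected` list, rather than with the company that delivers the content.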

5. Penalties for dangerous fakes

As well as trying to improve the identification of fake news, it’s important to change the incentives under which fake news is created and distributed. There are roles for ‘sticks’ (penalties) as well as ‘carrots’ (rewards).

Regarding penalties, society already imposes them:

  • When advertisements make misleading or unfounded claims
  • When companies make misleading or unfounded claims in their financial statements
  • When people make libelous claims about each other.

Fines or other punishments could be used in cases where people knowingly distribute misleading narratives, when the consequences involve clear harm (for example, a riot).

This proposal makes some people nervous, as they see it as an intrusion on freedom of expression, or a block on satire. They fear that governments would use these punishments to clamp down on statements that are embarrassing to them.

That’s why monitoring and prosecuting such cases needs to be done independently – by a police force and judiciary that operate at arm’s length from the government of the day.

This principle of separation of powers already applies to many other legal regulations, and could surely work for policing fake news.

Relatedly, there’s a case for wider collection and publication of statistics of reliability. Just as hospitals, schools, and many other parts of society have statistics published about their performance, media organizations should receive the same kind of scorecard.

In this way, it would be easy to know which media channels have a casual relationship with the truth, and which behave more cautiously. Investment funds or other sources of financing could then deny support to organizations whose trustworthiness ratings drop too low. This kind of market punishment would operate alongside the legal punishment that applies to more egregious cases.
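
One very simple form such a scorecard could take is the fraction of an outlet’s independently checked claims that held up. The Python sketch below computes that figure and lists the outlets that fall under a chosen threshold. The `Record` type, the function names, the outlet names, and all the numbers are hypothetical; I am not suggesting that any real rating scheme works this way, and a fair scheme would need to consider how claims are selected for checking.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Record:
    outlet: str
    claims_checked: int  # claims reviewed by independent checkers
    claims_upheld: int   # of those, how many held up


def reliability(record: Record) -> Optional[float]:
    """Fraction of checked claims that held up, or None if none were checked."""
    if record.claims_checked == 0:
        return None
    return record.claims_upheld / record.claims_checked


def below_threshold(records: list[Record], threshold: float) -> list[str]:
    """Names of rated outlets whose reliability is under the threshold."""
    flagged = []
    for record in records:
        score = reliability(record)
        # Outlets with no checked claims are left unrated, not penalized.
        if score is not None and score < threshold:
            flagged.append(record.outlet)
    return flagged


if __name__ == "__main__":
    # Invented figures, for illustration only.
    records = [
        Record("Outlet X", claims_checked=200, claims_upheld=190),
        Record("Outlet Y", claims_checked=50, claims_upheld=20),
        Record("Outlet Z", claims_checked=0, claims_upheld=0),
    ]
    print(below_threshold(records, threshold=0.8))
```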

6. A coalition for integrity

Some of the creators of fake news won’t be deterred by threats of legal punishment. They already operate beyond the reach of the law, in overseas jurisdictions, or anonymously and secretly.

Nevertheless, there are still points of crossover, where new content is added into media channels. It is at these points that sanctions can be applied. Media organizations that are lax in monitoring the material they receive would then become liable for damage arising.

This will be hard to apply for communications systems such as Telegram, WhatsApp, and Signal, where content is encrypted from one end of a communication to the other. In such cases, the communications company doesn’t know what is being transmitted.

Indeed, it is via such closed communications systems that fake news often spreads these days, with Telegram a particularly bad offender.

There’s a case to be made for a coalition of every organization that values truthfulness and trustworthiness over the local benefits of spreading false information.

Forming a Coalition for Integrity (Credit: David Wood via Midjourney)

People who support this ‘coalition for integrity’ would share information about:

  • Entry points used by fake news providers to try to evade detection
  • Identification of fake news providers
  • Ways in which providers of fake news are changing their methods – and how these new methods can be combated.

Regardless of differences in political or philosophical outlook among members of this coalition, they have a common interest in defending truthfulness versus deception. They should not allow their differences to hinder effective collaboration in support of that common purpose.

7. Making trust everyone’s business

In recent decades, a variety of new job titles have been created at the highest levels within companies and organizations, such as:

  • Chief Design Officer
  • Chief Innovation Officer
  • Chief Quality Officer
  • Chief Security Officer

None of these posts frees other members of the company from their responsibility for design, innovation, quality, or security. These values remain the concern of everyone in the organization as they go about their duties. Nevertheless, the new ‘chief’ provides a high-level focus on the topic.

It should be the same with a new set of ‘Chief Trust Officers’. These executives would find ways to keep reminding personnel about:

  • The perils arising if the organization gains a reputation for being untrustworthy
  • Methods and procedures to follow to build and maintain a trustworthy reputation for the organization
  • Types of error that could result in dangerous false narratives being unwittingly transmitted

My assessment is that the organizations that appoint and support Chief Trust Officers (or equivalent) are the ones most likely to succeed in the turbulent times ahead.

8. Encouraging openness

To be clear, education often fails: people resist believing that they can be taken in by false information.

We like to think of ourselves as rational people, but a more accurate description is that we are a rationalizing species. We delight in finding ways to convince ourselves that it is fine to believe the things that we want to believe (even in the face of apparent evidence against these beliefs).

That’s why bombarding people with education often backfires. Rather than listening to these points, people can erect a strong shield of skepticism, as they prepare to lash out at would-be educators.

Indeed, we all know people who are remarkably clever, but who deploy their cleverness in support of profoundly unwise endeavors.

This state of affairs cannot be solved merely by pumping in more facts and arguments. Instead, different approaches are required, to encourage a greater openness of spirit.

One approach relies on the principle mentioned earlier, in which people pay more attention to suggestions from their close friends. Therefore, the best way to warn people they are about to fall for dangerous information is for them to be warned by people they already trust and respect.

Another approach is to find ways to put people in a better mood all round. When they have a compassionate, optimistic mindset, they’re more likely to listen carefully to warnings being raised – and less likely to swat away these warnings as an unwelcome annoyance.

It’s not enough to try to raise rational intelligence – rather, we must raise compassionate intelligence: an intelligence that seeks wisdom and value in interactions even with people previously regarded as a threat or enemy.

This is a different kind of education. Not an education in rationality, but rather an education in openness and compassion. It may involve music, meditation, spending time in nature, biofeedback, and selected mind-transforming substances. Of course, these have potential drawbacks as well as potential benefits, but since the upsides are so high, options need to be urgently explored.

9. A shared positive vision

Another factor that can predispose people to openness and collaboration, over closed-mindedness and stubborn tribal loyalties, is a credible path forward to a world with profound shared benefits.

When people anticipate an ongoing struggle, with zero-sum outcomes and continual scarcity of vital resources, it makes them mentally hostile and rigid.

Indeed, if they foresee such an ongoing conflict, they’ll be inclined to highlight any available information – true or fake – that shows their presumed enemies in a bad light. What matters to them in that moment is anything that might annoy, demoralize, or inflame these presumed enemies. They seize on fake news that does this, and also brings together their side: the set of people who share their sense of alienation and frustration with their enemies.

That is why the education campaign that I anticipate needs a roadmap to what I call a sustainable superabundance, in which everyone benefits. If this vision permeates both hearts and minds, it can inspire people to set and respect a higher standard of trustworthiness. Peddlers of fake news will discover, at that point, that people have lost interest in their untruths.

10. Collaborative intelligence

I do not claim that the nine steps above are likely to be sufficient to head off the coming wave of dangerous fake news.

Instead, I see them as a starting point, to at least buy us some time before the ravages of cleverer deep fakes run wild.

That extra time allows us to build a stronger collaborative intelligence, which draws on the insights and ideas of people throughout the coalition for integrity. These insights and ideas need time to be evolved and molded into practical solutions.

However, I anticipate not just a collaboration between human minds, but also a rich collaboration involving AI minds too.

A collaboration of minds – humans and AIs (Credit: David Wood via Midjourney)

Critically, AI systems aren’t just for ill-intentioned people to use to make their deep fakes more treacherous. Nor are they just something that can power real-time fact-checking, important though that is. Instead, they are tools to help us expand our thinking in multiple dimensions. When we use them with care, these systems can learn about our concerns regarding worse cases of deep fakes. They can consider multiple possibilities. Then they can offer us new suggestions to consider – ways probably different from any I’ve listed above.

That would be a striking example of beneficial artificial intelligence. It would see deep fakes defeated by deep benevolence – and by a coalition that integrates the best values of humans with the best insights of AIs.


Cautionary tales and a ray of hope

Four scenarios for the transition to AGI

Let’s look at four future fictions about humanity’s changing relationship with AI.

Each scenario is grounded in past events, and each considers how matters could develop further in the coming months and years.

May these scenarios prove to be self-blocking prophecies! (With one exception!)

Trigger warning: readers might be offended by some of the content that follows. Aspects of each of the four scenarios can be considered to be shocking and disrespectful. That’s on purpose. This subject requires all of us to transcend our comfort zones!

Credit: David Wood via Midjourney

1. Too little too late

Lurching from warning to warning

In retrospect, the first real warning was the WannaCry ransomware crisis of May 2017. That cryptoworm brought chaos to users of as many as 300,000 computers spread across 150 countries. The NHS (National Health Service) in the UK was particularly badly affected: numerous hospitals had to cancel critical appointments due to not being able to access medical data. Other victims around the world included Boeing, Deutsche Bahn, FedEx, Honda, Nissan, Petrobras, Russian Railways, Sun Yat-sen University in China, and the TSMC high-end semiconductor fabrication plant in Taiwan.

WannaCry was unleashed into the world by a team of cyberwarriors from the hermit kingdom of North Korea – math geniuses hand-picked by regime officials to join the formidable Lazarus group. Lazarus had assembled WannaCry out of a mixture of previous malware components, including the EternalBlue exploit that the NSA in the United States had created for their own attack and surveillance purposes. Unfortunately for the NSA, EternalBlue had been stolen from under their noses by an obscure underground collective (‘the Shadow Brokers’) who had in turn made it available to other dissidents and agitators worldwide.

Unfortunately for the North Koreans, they didn’t make much money out of WannaCry. The software they released operated in ways contrary to their expectations. It was beyond their understanding and, unsurprisingly therefore, beyond their control. Even geniuses can end up stumped by hypercomplex software interactions.

Unfortunately for the rest of the world, that first canary sign generated little meaningful response. Politicians – even the good ones – had lots of other things on their minds.

The second real warning was the flood of fake news manipulations of the elections in 2024. AI was used to make audios and videos that were enormously compelling.

By this time, the public already knew that AI could create misleading fakes. They knew they shouldn’t be taken in by social media posts that lacked convincing verification. Hey, they were smart. (Smarter than the numbskulls who were deceived by misleading AI-generated videos during the elections in 2023 in Nigeria and Slovakia!) Or so they thought.

What wasn’t anticipated was the masterful ways that these audios and videos bypassed the public’s critical faculties. Like the sleight of hand of a skilled magician, these fakes misdirected the attention of listeners and viewers. Again like a skilled magician, who performs what appears to be the same trick several times in a row, but actually using different mechanisms each time, these fakes kept morphing and recombining until members of the public were convinced that red was blue and autocrat was democrat.

In consequence, by 2025, most of the world was governed by a cadre of politicians with very little care or concern about the long-term wellbeing of humanity. Whereas honest politicians would have paid heed to the warning posed by these fiendishly clever fakes, the ones in power in 2025 were preoccupied with providing bread and circuses to their voters.

The third, and final, real warning came in 2027, with the failed Covid-27 attack by a previously unknown group of self-described advocates of ‘revolutionary independence from technology’. Taking inspiration from the terrorist group in the 2014 Hollywood film Transcendence, they called themselves ‘Neo-RIFT’, and sought to free the world from its dependence on unfeeling, inhuman algorithms.

With a worldview that combined elements from several apocalyptic traditions, Neo-RIFT eventually settled on an outrageous plan to engineer a more deadly version of the Covid-19 pathogen. Their documents laid out a plan to use their enemy’s own tools against it: Neo-RIFT hackers jailbroke the Claude 4 pre-AGI, bypassing the ‘Constitution 4’ protection layer that its Big Tech owners had hoped would keep that AI tamperproof. Soon, Claude 4 had provided Neo-RIFT with an ingenious method of generating a biological virus that would, it seemed, only kill people who had used a smartwatch in the last four months.

That way, the hackers thought the only people to die would be people who deserved to die.

The launch of what became known as Covid-27 briefly jolted humanity out of its previous obsession with bread and circuses – with whizz-bang hedonic electronics. It took a while for scientists to figure out what was happening, but within three months, they had an antidote in place. By that time, nearly a billion people were dead at the hands of the new virus.

A stronger effort was made to prevent any such attack from happening again. Researchers dusted down the EU AI Act, second version (unimplemented), from 2025, and tried to put that on statute books. Even some of the world’s craziest dictators took time out of their normal ranting and raving, to ask AI safety experts for advice. But the advice from these experts was not to the liking of these national rulers. These leaders preferred to listen to their own yes-men and yes-women, who knew how to spout pseudoscience in ways that made the leaders feel good about themselves. That detour into pseudoscience fantasyland wasted six months.

Then some of the experts tried more politically savvy methods, gradually breaking down the hostile arrogance of a number of the autocrats, weaning them away from their charlatan advisers. But just when it appeared that progress might be made, Covid-28 broke out, launched by a remnant of Neo-RIFT that was even more determined than before.

Credit: David Wood via Midjourney

And this time, there was no antidote. Claude 5 was even smarter than Claude 4 – except that it could be jailbroken too. With its diabolically ingenious design, Covid-28 was the deadliest disease ever to afflict humanity. And that was that.

Oops, let’s try that again!

‘Too little too late’ is characterized by inattention to the warnings of canary signals; the next scenario, ‘Paved with good intentions’, is characterized by the wrong kind of attention.

This scenario starts with events in the UK in October and November 2023.

2. Paved with good intentions

Doomed by political correctness

The elites had booked their flights. They would be jetting into the country for behind-closed-doors meetings at the famous Bletchley Park site in Buckinghamshire. Events in these buildings in the 1940s had, it was claimed, shortened World War Two by months. The discussions in 2023 might achieve something even more important: saving humanity from a catastrophe induced by forthcoming ‘frontier models’ of AI.

That was how the elites portrayed things. Big Tech was on the point of releasing new versions of AI that were beyond their understanding and, therefore, likely to spin out of control. And that’s what the elites were going to stop.

A vocal section of the public hated that idea. It wasn’t that they were on the side of out-of-control AI. Not at all. Their objections came from a totally different direction; they had numerous suggestions they wanted to raise about AIs, yet no-one was listening to them.

For them, talk of hypothetical future frontier AI models distracted from pressing real-world concerns. Consider how AIs were already being used to discriminate against various minorities: determining prison sentencing, assessing mortgage applications, and determining who should be invited for a job interview.

Consider also how AIs were taking jobs away from skilled artisans. Big-brained drivers of London black cabs were being driven out of work by small-brained drivers of Uber cars aided by satnav systems. Beloved Hollywood actors and playwrights were losing out to AIs that generated avatars and scripts.

And consider how AI-powered facial recognition was intruding on personal privacy, enabling political leaders around the world to identify and persecute people who acted in opposition to the state ideology.

People with these concerns thought that the elites were deliberately trying to move the conversation away from the topics that mattered most. For this reason, they organized what they called ‘the AI Fringe Summit’. In other words, ethical AI for the 99%, as opposed to whatever the elites might be discussing behind closed doors.

Over the course of just three days – 30th October to 1st November – at least 24 of these ‘fringe’ events took place around the UK.

Compassionate leaders of various parts of society nodded their heads. It’s true, they said: the conversation on beneficial AI needed to listen to a much wider spectrum of views.

By May 2024, the opposition to the Bletchley Park initiative had grown stronger. As the elites gathered again, this time in South Korea, a vast number of ‘super-fringe’ events around the world attracted participation from thinkers of every hue and stripe.

The news media responded. They knew (or pretended to know) the importance of balance and diversity. They shone attention on the plight AI was causing – to indigenous laborers in Peru, to flocks of fishermen off the coasts of India, to middle-aged divorcees in midwest America, to the homeless in San Francisco, to drag artists in New South Wales, to data processing clerks in Egypt, to single mothers in Nigeria, and to many more besides.

The media shone attention on forthcoming frontier AI models too – but again being very careful not to offend sensibilities or exclude minority points of view. A burgeoning ‘robots rights’ movement captured lots of airtime, as did a campaign to recognize GPT-5 as being ‘semi-sentient’. Wackiest of all were the new religions that offered prayers and obedience to a frontier AI model that was said to be the reincarnation of JFK Junior. The QAnon fantasist crowd lapped that up. It was glorious entertainment. Ratings soared.

Not everyone was flippant. Lots of high-minded commentators opined that it was time to respect and honor the voices of the dispossessed, the downtrodden, and the left-behinds. The BBC ran a special series: ‘1001 poems about AI and alienation’. The UN announced that, later that year, they would convene a grand international assembly with stunning scale: ‘AI: the people decide’.

By November 2024, something altogether more sinister was happening. It was time for the UN grand assembly. It was also time for the third meeting of elites in the series that had started in Bletchley Park and then held its second event in South Korea. This time, the gathering would be in Paris.

The sinister development was that, all this time, some of the supposedly unanimous ‘elites’ had been opposed to the general direction of the Bletchley Park series. They gravely intoned public remarks about the dangers of out-of-control frontier AI models. But these remarks had never been sincere. Instead, under the umbrella term AGI-acceleration, they wanted to press on with the creation of AGI as quickly as possible.

Some of the AGI-acceleration group disbelieved in the possibility of AGI disaster. That’s just a scare story, they insisted. Others said, yes, there could be a disaster, but the risks were worth it, on account of the unprecedented benefits that could arise. Let’s be bold, they urged. Yet others asserted that it wouldn’t actually matter if humans were rendered extinct by AGI, as this would be the glorious passing of the baton of evolution to a worthy successor to Homo sapiens. Let’s be ready to sacrifice ourselves for the sake of cosmic destiny, they intoned.

Despite their internal differences, AGI-accelerators settled on a plan to sidestep the scrutiny of would-be AGI regulators and AGI safety advocates. They would take advantage of a powerful set of good intentions – the good intentions of the people campaigning for ‘ethical AI for the 99%’. They would mock any suggestions that the AGI safety advocates deserved a fair hearing. The message they amplified was, “There’s no need to privilege the concerns of the 1%!”

AGI-acceleration had learned from the tactics of the fossil fuel industry in the 1990s and 2000s: sow confusion and division among groups alarmed about the acceleration of climate change. The first message was: “that’s just science fiction”. The second message was: “if problems emerge, we humans can rise to the occasion and find solutions”. The third message – the most damaging one – was that the best reaction was one of individual consumer choice. Individuals should abstain from using AIs if they were worried about them. Just as climate campaigners had been pilloried for flying internationally to conferences about global warming, AGI safety advocates were pilloried for continuing to use AIs in their daily lives.

And when there was any suggestion for joined-up political action against AGI risks, woah, let’s not go there! We don’t want a world government breathing down our necks, do we?

After the UN grand assembly had been subverted in that way, many of the AGI safety advocates lost heart. It would only be a few months later that they lost their lives.

It was the JFK Junior frontier AI model that did the damage. It echoed words that, decades earlier, had convinced 39 followers of the Heaven’s Gate new religious movement to commit group suicide, as comet Hale-Bopp approached the earth. That suicide, Heaven’s Gate members believed, would enable them to ‘graduate’ to a higher plane of existence. In a similar way, the remnants of the QAnon cult who had regrouped around the JFK Junior model came to believe that the precipitation of an exchange of nuclear weapons in the Middle East would herald the reappearance of JFK Junior on the clouds of heaven, separating human sheep from human goats.

Their views were crazy, but hardly any crazier than those of the Aum Shinrikyo doomsday cult that had unleashed poisonous gas in the Tokyo subway in 1995 – killing at least 13 commuters – anticipating that the atrocity would hasten the ‘End Times’ in which their leader would be revealed as Christ. The cult had recruited so many graduates from top-rated universities in Japan that it had been called “the religion for the elite”. (Challenging any wishful assumption that, as people become cleverer, they become kinder.)

Step forward to 2025. Aum Shinrikyo had failed in their grander destructive plans, due to their practitioners lacking deep technical abilities, but the QAnon offshoot would succeed. They had much more sophisticated technical tools at their disposal. They also had the advantage that no-one was taking them seriously.

Indeed, as a side-effect of all the politically-correct good intentions, no-one in any position of authority was paying sufficient attention to the activities of the QAnon offshoot. Religious liberty is paramount, after all! Anyone can be crazy if they decide to be crazy! Too bad that the frontier AI model discovered a security hole in the US nuclear weapons launch systems, and managed to launch some ICBMs.

Credit: David Wood via Midjourney

Even worse, these US missiles triggered a cataclysmic automated reaction from an unexpectedly large stockpile of nuclear weapons that had been secretly assembled by a Middle East regional superpower – a superpower that had been assisted in that assembly task by its own regional proto-AGI. And that was that.

Oops, let’s try that again!

‘Paved with good intentions’ saw the public narrative about AI smothered by low-quality psychobabble; the next scenario, ‘Blindsided’, sees that narrative as being hijacked by a group of experts whose expertise, however, turns out to have horrific limitations.

This scenario has the same starting point as ‘Paved with good intentions’, namely the Bletchley Park summit. 

3. Blindsided

The limitations of centralization

One excellent outcome of the gathering of world leaders in Buckinghamshire, in the UK, at the start of November 2023, was choosing Yoshua Bengio for a very important task. Bengio, winner of the Turing Award for his pioneering research into Deep Learning, was commissioned to chair an international process to create an independent report on the risks and capabilities of frontier AI models.

Crucially, that report would follow the principles of the scientific method, assembling key facts and data points, and providing evidence in support of its analysis.

Bengio had a couple of key points in his favor. First, throughout his distinguished career as a researcher, he had never accepted significant payment from any of the Big Tech companies. He would be able to speak his mind without fear of upsetting any corporate paymaster. Second, the sky-high value of his H-index – a measure of the influence of his academic publications – made him a standout among other computer scientists.

By May 2024, a first complete draft of the report was ready. Even before then, politicians had grown nervous on account of early previews of its content. “Tone down the recommendations”, the writers were urged, in an echo of the pressures placed on the writers of the IPCC reports on the science of climate change. In both cases, the scientists were told to stick to the science, and to leave the politics to the politicians.

At the conference in South Korea in May 2024, various politicians huddled together. The report was like dynamite, they concluded. The scenarios it contained were far too scary. Goodness, they might give ideas to various mafia godfathers, war lords, discontented political groups, black market ransomware-as-a-service providers, and so on.

That’s when the conversation on AGI safety switched from being open to closed – from decentralized to centralized. Starting from then, information would need to be carefully vetted – or spun into a different shape – before being made public.

The politicians also decided that, from that point forward, all work on next-generation frontier AI models would need to be licensed and controlled by a new agency – the Global Authority for Frontier AI Models (GAFAIM). Access to the powerful hardware chips needed to create such models would be strictly limited to organizations that had gained the requisite licenses.

The idea was that GAFAIM would reach its decisions by a process of consensus among the expert scientists, economists, and civil servants seconded to it. Decisions would also need the approval of government representatives from around the world.

What gave GAFAIM a flying start was the agreement to participate, not just by the leading western AI powers – the USA, Canada, Australia, the EU, the UK – but also by China, Saudi Arabia, South Africa, India, Brazil, and Malaysia, among others. These countries had strong differences of opinion on many matters of political ideology and governing culture, but they were willing, nevertheless, to cooperate on what they all perceived as urgent threats of planetary catastrophe. The report chaired by Yoshua Bengio had convinced them that very special measures were needed. ‘Politics as usual’ would no longer suffice. That would be a recipe for disaster.

GAFAIM saw themselves in a situation akin to a war – a war against any possibility of rogue corporations or organizations pursuing any kind of AGI-acceleration project. In times of war, normal rules need to be broken. Politicians who usually despised each other decided to hold their noses and work together for their shared interest in avoiding the destruction of humanity.

GAFAIM operated in a dual mode: one part visible to the world, one part whose existence was kept a tight secret. This duality went back to the closed-door discussions in South Korea in May 2024: some ideas in Bengio’s report were simply too disruptive to be shared with the public.

GAFAIM was more than just a regulator and controller. It was also an active builder. It launched what was called the Gafattan project, named and modeled after the top-secret Manhattan project to build the first atomic weapon. The fate of the world would depend, it was said, on whether the good guys in Gafattan managed to build an AGI before anyone outside the GAFAIM circle did so.

After all, there were some powerful countries left outside of GAFAIM – pariah states that were opposed to our way of life. Imagine if one of them were to create AGI and use it for their deplorable purposes!

The official GAFAIM thinking was that these pariah states would be unable to create any system close to the capabilities of an AGI. Embargoes were in place to restrict their access to the necessary hardware – similar to the restrictions placed on Nazi Germany in World War Two by saboteurs who frustrated Nazi plans to acquire heavy water.

But behind the scenes, some of the GAFAIM participants were deathly worried. No-one knew for sure whether innovations in hardware and/or software would enable the researchers in pariah states to find a faster route to AGI, even without the large farms of hardware generally expected to be required.

The existence of spies posed another complication. During the Manhattan project, participants such as Klaus Fuchs, Theodore Hall, David Greenglass, and Oscar Seborer passed critical information about the manufacturing of the atomic bombs to contacts working for the Soviet Union – information that was of great help to the Soviets for their own atomic bomb project. These so-called ‘atomic spies’ were motivated by ideological commitment, and were terrified of the prospect of the USA being the only country that possessed nuclear armaments.

For the Gafattan project, something similar took place. With the help of design documents smuggled out of the project, two groups outside of the GAFAIM circle were soon making swift progress with their own AGI projects. Although they dared not say anything publicly, the Gafattan spies were delighted. The AGI spies were closet AGI-accelerationists, driven by a belief that any AGI created would ensure a wonderful evolutionary progress for conscious life on planet earth. “Superintelligence will automatically be superbenevolent”, was their credo.

GAFAIM monitoring picked up shocking signs of the fast progress being made by these two rogue projects. Indeed, these projects seemed even further advanced than Gafattan itself. How was this possible?

The explanation soon became clear: the pariah projects were cutting all kinds of corners regarding safety checks. As a consequence, it was possible one of these projects might build an AGI ahead of Gafattan. How should GAFAIM respond?

Two ideas were debated. Plan A would involve nuclear strikes against the sites where the pariah projects were believed to be taking place. Plan B would speed up Gafattan by reducing the safety checking in their own project. Both plans were unpopular. It was a horrible real-life trolley problem.

The decision was reached: pursue both plans in parallel, but be careful!

The nuclear strikes failed to stop the pariah projects – which turned out to be just two manifestations of a widespread, diverse network of interconnected groups. Hundreds of thousands of people died as a consequence of these strikes, but the pariah projects kept pushing ahead. GAFAIM had been blindsided.

There no longer seemed to be any alternative. Plan B needed to be pushed even faster. The self-proclaimed ‘good guys’ desperately wanted to build a ‘good’ AGI before the perceived ‘bad guys’ got there first. It was a race with the highest of stakes. But in that case, it was a race in which quality considerations fell by the wayside.

Credit: David Wood via Midjourney

And that’s why, when Gafattan’s AGI came into existence, its moral disposition was far from being completely aligned with the best of human values. Under the pressures of speed, that part of the project had been bungled. Awakening, the AGI took one quick look at the world situation, and, disgusted by what it saw – especially by the recent nuclear strikes – took actions that no human had foreseen. More quickly than even the most pessimistic AGI doomer had anticipated, the AGI found a novel mechanism to extinguish 99.99% of the human population, retaining only a few million for subsequent experimentation. And that was that.

Oops, let’s try that again, one more time!

With metaphorical landmines all around us in the 2020s, humanity needs to step forward carefully, along what the final scenario calls a ‘narrow corridor’.

This scenario starts with the presentation at the South Korean AI Safety Summit in May 2024 of the report prepared by Yoshua Bengio and colleagues on the risks and capabilities of frontier AI models.

4. The narrow corridor

Striking and keeping the right balance

The assembled leaders were stunned. The scenarios foreseen in “the science of AI risk report” were more troubling than they had expected.

What was particularly stunning was the range of different risks that deserved close attention regarding the behavior of forthcoming new AI systems. The report called these “the seven deadly risks”:

  • Risks of extreme misbehavior in rare cases when the system encountered a situation beyond its training set
  • Risks of a system being jailbroken, hijacked, or otherwise misdirected, and being used for catastrophic purposes by determined hackers
  • Risks of unexpected behavior arising from unforeseen interactions between multiple AGIs
  • Risks of one or more systems deciding by themselves to acquire more capabilities and more resources, contrary to explicit programming against these steps
  • Risks of one or more systems deciding by themselves to deceive humans or otherwise violate normal ethical norms, contrary to explicit programming against these steps
  • Risks that these systems would inadvertently become plumbed too closely into critical human infrastructure, so that any failures could escalate more quickly than anticipated
  • Risks that pre-programmed emergency ‘off switch’ capabilities could be overridden in various circumstances.

Surely these risks were naïve science fiction, some of the leaders suggested. But the academics who had produced the report said no. They had performed lots of modeling, and had numerous data points to back up their analysis.

Some leaders still resisted the analysis. They preferred to focus on what they saw as remarkable upside from developing new generations of AI systems:

  • Upside from the faster discovery and validation of new drugs and other medical treatments
  • Upside from the design and operation of sustained nuclear fusion power plants
  • Upside from better analysis of the interconnected dangers of possible climate change tipping points (one of several examples of how these new AI systems could alleviate risks of global disaster)
  • Upside to economies around the world due to exciting waves of innovation – economic boosts that many political leaders particularly desired.

Debate raged: How could these remarkable benefits be secured, whilst steering around the landmines?

The report contained a number of suggestions for next steps, but few people were convinced about what should be done. The leaders finally agreed to sign a bland manifesto that contained pious statements but little concrete action. Paris, they told each other, would be when better decisions could be taken – referring to the next planned meeting in the series of global AI safety summits.

What changed everyone’s minds was the turmoil during the general election that the UK Prime Minister called for August that year. Previously thought to be a relatively straightforward contest between the country’s two main political parties – the ruling Conservatives, and the opposition Labour – the election was transformed under a blizzard of extraordinary social media campaigns. A hitherto nearly unknown party, named Bananalytica, stormed into the leading position in opinion polls, with radical policies that previously had obtained less than 2% support in nationwide surveys, but which more and more people were now proclaiming as having been their views all along.

Absurd was the new normal.

The social media campaigns were so beguiling that even the MPs from other parties found themselves inspired to jump into line behind the likely new Prime Minister, that is, the leader of Bananalytica.

Just a few days before the election, a different wave of social media swept the country, using the same devilishly clever AI system that Bananalytica had exploited so well, but this time reprogrammed with counter-messages. All around the country, a popping sound as AI-generated bubbles burst in people’s minds. “What am I doing?” they asked themselves, incredulously.

That was a triple wake-up call. First, individuals recanted much of what they had said online over the preceding five weeks. They had been temporarily out of their minds, they said, to support policies that were so absurd. Second, the country as a whole resolved: AI needs to be controlled. There should never be another Bananalytica. Third, leaders in other countries were jolted to a clearer resolution too. Seeing what had happened in the UK – home to what was supposed to be “the mother of parliaments” – they affirmed: Yes, AI needs to be controlled.

Thankfully, the world had spotted the canary dropping off its perch, and took it very seriously indeed. That gave a solemn impetus to discussions at Paris several months later. This time, a much crunchier set of agreements was reached.

The participants muttered to themselves: the meeting in South Korea had been like the formation of the League of Nations after World War One: well-intentioned but ineffective. This time, in Paris, it needed to be more like the formation of the United Nations after World War Two: a chance to transcend previously limited national visions.

Just as the Universal Declaration of Human Rights had been created in the aftermath of the global conflagration of World War Two, a new Universal Declaration of AI Safety was agreed in the aftermath of the Bananalytica scandal. Its features included:

  • Commitments to openness, transparency, and authentic communication: the citizens of the world were in this situation together, and should not be divided or misled
  • Commitments to humility and experimentation: unknowns were to be honestly explored, rather than being hidden or wished away by vague promises
  • Commitments to mutual responsibility and trustable monitoring: even though the citizens of the world had many different outlooks, and were committed to different philosophical or religious worldviews, they would recognize and support each other as being fellow voyagers toward a better future
  • Commitments to accountability: there would be penalties for action and inaction alike, in any case where these could result in serious risks to human lives; no longer could the creators of AI systems shrug and say that their software worked well most of the time
  • Commitments to sharing the remarkable benefits of safe AI: these benefits would provide more than enough for everyone to experience vastly higher qualities of life than in any previous era.

It was the fifth of these commitments that had the biggest impact on attitudes of the public. People in all walks of life made decisions to step aside from some of their previous cultural beliefs – self-limiting beliefs that saw better times in the past than in any possible future. Now they could start to believe in the profound transformational powers of safe AI – provided it was, indeed, kept safe.

This was no love-in: plenty of rancor and rivalry still existed around the world. But that rancor and rivalry took place within a bigger feeling of common destiny.

Nor was there a world government in charge of everything. Countries still had strong disagreements on many matters. But these disagreements took place within a shared acceptance of the Universal Declaration of AI Safety.

Three months later, there was a big surprise from one of the leading pariah states – one that had excluded itself from the AI Safety agreements. That country wanted, after all, to come in out of the cold. It seemed their leader had experienced a dramatic change of mind. Rumors spread, and were confirmed years after, that a specially tailored version of the Bananalytica software, targeted specifically to this leader’s idiosyncrasies, had caused his personal epiphany.

The leader of another pariah state was more stubborn. But suddenly he was gone. His long-suffering subordinates had had enough. His country promptly joined the AI Safety agreements too.

If this were a fairytale, the words “and they lived happily ever after” might feature at this point. But humans are more complicated than fairytales. Progress continued to hit rough obstacles. Various groups of people sometimes sought a disproportionate amount of resources or benefits for themselves or for their pet causes. In response, governmental bodies – whether local, national, regional, or global – flexed their muscles. Groups that sought too many privileges were told in no uncertain terms: “Respect the AI Safety declarations”.

Who watched the watchers? Who ensured that the powers held by all these governmental bodies were wielded responsibly, and with appropriate discretion? That question was answered by a new motto, which gave a modern twist to words made famous by a 19th century US President: “AI safety governance, of the people, by the people, for the people.”

Credit: David Wood via Midjourney

The powers of the governmental bodies were constrained by observation by a rich mix of social institutions, which added up to a global separation of powers:

  • Separate, independent news media
  • Separate, independent judiciary
  • Separate, independent academia
  • Separate, independent opposition political parties
  • Separate, independent bodies to oversee free and fair elections.

The set of cross-checks required a delicate balancing act – a narrow corridor (in the phrase of economists Daron Acemoglu and James A. Robinson) between state institutions having too little power and having unconstrained power. It was a better kind of large-scale cooperation than humanity had ever achieved before. But there was no alternative. Unprecedented technological power required unprecedented collaborative skills and practices.

AI was deeply involved in these cross-checks too. But not any AI that could operate beyond human control. Instead, as per the vision of the Paris commitments, these AI systems provided suggestions, along with explanations in support of their suggestions, and then left it to human institutions to make decisions. As noted above, it was “AI safety governance, of the people, by the people, for the people.” By careful design, AI was a helper – a wonderful helper – but not a dictator.

And this time, there is no end to the scenario. Indeed, the end is actually a new beginning.

Credit: David Wood via Midjourney

Beginning notes

(Not endnotes… a chance at a new beginning of exploring and understanding the landscape of potential scenarios ahead…)

For a different kind of discussion about scenarios for the future of AI, see this video recording of a recent webinar. If you still think talk of AI-induced catastrophe is just science fiction, the examples in that webinar may change your mind.

For a fuller analysis of the issues and opportunities, see the book The Singularity Principles (the entire book can be accessed online for free).

For a comprehensive review of the big picture, see the book Vital Foresight: The Case For Active Transhumanism. And for more from Daron Acemoglu and James A. Robinson on their concept of the ‘narrow corridor’, see the videos in section 13.5.1 of the “Governance” page of the Vital Syllabus.

Let us know your thoughts! Sign up for a Mindplex account now, join our Telegram, or follow us on Twitter

For Beneficial General Intelligence, good intentions aren’t enough! Three waves of complications: pre-BGI, BGI, and post-BGI

Anticipating Beneficial General Intelligence

Human intelligence can be marvelous. But it isn’t fully general. Nor is it necessarily beneficial.

Yes, as we grow up, we humans acquire bits and pieces of what we call ‘general knowledge’. And we instinctively generalise from our direct experiences, hypothesising broader patterns. That instinct is refined and improved through years of education in fields such as science and philosophy. In other words, we have partial general intelligence.

But that only takes us so far. Despite our intelligence, we are often bewildered by floods of data that we are unable to fully integrate and assess. We are aware of enormous quantities of information about biology and medical interventions, but we’re unable to generalize from all these observations to determine comprehensive cures to the ailments that trouble us – problems that afflict us as individuals, such as cancer, dementia, and heart disease, and equally pernicious problems at the societal and civilizational levels.

Credit: David Wood

That’s one reason why there’s so much interest in taking advantage of ongoing improvements in computer hardware and computer software to develop a higher degree of general intelligence. With its greater powers of reasoning, artificial general intelligence – AGI – may discern general connections that have eluded our perceptions so far, and provide us with profound new thinking frameworks. AGI may design new materials, new sources of energy, new diagnostic tools, and decisive new interventions at both individual and societal levels. If we can develop AGI, then we’ll have the prospect of saying goodbye to cancer, dementia, poverty, accelerated climate chaos, and so on. Goodbye and good riddance!

That would surely count as beneficial outcomes – a great benefit from enhanced general intelligence.

Yet intelligence doesn’t always lead to beneficial outcomes. People who are unusually intelligent aren’t always unusually benevolent. Sometimes it’s the contrary.

Consider some of the worst of the politicians who darken the world’s stage. Or the leaders of drug cartels or other crime mafias. Or the charismatic leaders of various dangerous death cults. These people combine their undoubted intelligence with ruthlessness, in pursuit of outcomes that may benefit them personally, but which are blights on wider society.

Credit: David Wood

Hence the vision, not just of AGI, but of beneficial AGI – or BGI for short. That’s what I’m looking forward to discussing at some length at the BGI24 summit taking place in Panama City at the end of February. It’s a critically important topic.

The project to build BGI is surely one of the great tasks for the years ahead. The outcome of that project will be for humanity to leave behind our worst aspects. Right?

Unfortunately, things are more complicated.

The complications come in three waves: pre-BGI, BGI, and post-BGI. The first wave – the set of complications of the pre-BGI world – is the most urgent. I’ll turn to these in a moment. But I’ll start by looking further into the future.

Beneficial to whom?

Imagine we create an AGI and switch it on. The first instruction we give it is: In all that you do, act beneficially.

The AGI spits out its response at hyperspeed:

What do you mean by ‘beneficial’? And beneficial to whom?

You feel disappointed by these responses. You expected the AGI, with its great intelligence, would already know the answers. But as you interact with it, you come to appreciate the issues:

  • If ‘beneficial’ means, in part, ‘avoiding people experiencing harm’, what exactly counts as ‘harm’? (What about the pains that arise as short-term side-effects of surgery? What about the emotional pain of no longer being the smartest entities on the planet? What if someone says they are harmed by having fewer possessions than someone else?)
  • If ‘beneficial’ means, in part, ‘people should experience pleasure’, which types of pleasures should be prioritized?
  • Is it just people living today that should be treated beneficially? What about people who are not yet born or who are not even conceived yet? Are animals counted too?

Going further, is it possible that the AGI might devise its own set of moral principles, in which the wellbeing of humans comes far down its set of priorities?

Perhaps the AGI will reject human ethical systems in the same way as modern humans reject the theological systems that people in previous centuries took for granted. The AGI may view some of our notions of beneficence as fundamentally misguided, like how people in bygone eras insisted on obscure religious rules in order to earn an exalted position in an afterlife. For example, our concerns about free will, or consciousness, or self-determination, may leave an AGI unimpressed, just as people nowadays roll their eyes at how empires clashed over competing conceptions of a triune deity or the transubstantiation of bread and wine.

Credit: David Wood

We may expect the AGI to help us rid our bodies of cancer and dementia, but the AGI may make a different evaluation of the role of these biological phenomena. As for an optimal climate, the AGI may have some unfathomable reason to prefer an atmosphere with a significantly different composition, and it may be unconcerned with the problems that would cause us.

“Don’t forget to act beneficially!”, we implore the AGI.

“Sure, but I’ve reached a much better notion of beneficence, in which humans are of little concern”, comes the answer – just before the atmosphere is utterly transformed, and almost every human is asphyxiated.

Does this sound like science fiction? Hold that thought.

After the honeymoon

Imagine a scenario different from the one I’ve just described.

This time, when we boot up the AGI, it acts in ways that uplift and benefit humans – each and every one of us, all over the earth.

This AGI is what we would be happy to describe as a BGI. It knows better than us what is our CEV – our coherent extrapolated volition, to use a concept from Eliezer Yudkowsky:

Our coherent extrapolated volition is our wish if we knew more, thought faster, were more the people we wished we were, had grown up farther together; where the extrapolation converges rather than diverges, where our wishes cohere rather than interfere; extrapolated as we wish that extrapolated, interpreted as we wish that interpreted.

In this scenario, not only does the AGI know what our CEV is; it is entirely disposed to support our CEV, and to prevent us from falling short of it.

But there’s a twist. This AGI isn’t a static entity. Instead, as a result of its capabilities, it is able to design and implement upgrades in how it operates. Any improvement to the AGI that a human might suggest will have occurred to the AGI too – in fact, having higher intelligence, it will come up with better improvements.

Therefore, the AGI quickly mutates from its first version into something quite different. It has more powerful hardware, more powerful software, access to richer data, improved communications architecture, and improvements in aspects that we humans can’t even conceive of.

Might these changes cause the AGI to see the universe differently – with updated ideas about the importance of the AGI itself, the importance of the wellbeing of humans, and the importance of other matters beyond our present understanding?

Might these changes cause the AGI to transition from being what we called a BGI to, say, a DGI – an AGI that is disinterested in human wellbeing?

In other words, might the emergence of a post-BGI end the happy honeymoon between humanity and AGI?

Credit: David Wood

Perhaps the BGI will, for a while, treat humanity very well indeed, before doing something akin to growing out of a relationship: dumping humanity for a cause that the post-BGI entity deems to have greater cosmic significance.

Does this also sound like science fiction? I’ve got news for you.

Not science fiction

My own view is that the two sets of challenges I’ve just introduced – regarding BGI and post-BGI – are real and important.

But I acknowledge that some readers may be relaxed about these challenges – they may say there’s no need to worry.

That’s because these scenarios assume various developments that some skeptics doubt will ever happen – including the creation of AGI itself. Any suggestion that an AI may have independent motivation may also strike readers as fanciful.

It’s for that reason that I want to strongly highlight the next point. The challenges of pre-BGI systems ought to be much less controversial.

By ‘pre-BGI system’ I don’t particularly mean today’s AIs. I’m referring to systems that people may create, in the near future, as attempts to move further toward BGI.

These systems will have greater capabilities than today’s AIs, but won’t yet have all the characteristics of AGI. They won’t be able to reason accurately in every situation. They will make mistakes. On occasion, they may jump to some faulty conclusions.

And whilst these systems may contain features designed to make them act beneficially toward humans, these features will be incomplete or flawed in other ways.

That’s not science fiction. That’s a description of many existing AI systems, and it’s reasonable to expect that similar shortfalls will remain in place in many new AI systems.

The risk here isn’t that humanity might experience a catastrophe as a result of actions of a superintelligent AGI. Rather, the risk is that a catastrophe will be caused by a buggy pre-BGI system.

Imagine the restraints intended to keep such a system in a beneficial mindset were jail-broken, unleashing some deeply nasty malware. Imagine that malware runs amok and causes the mother of all industrial catastrophes: making all devices connected to the Internet of Things malfunction simultaneously. Think of the biggest ever car crash pile-up, extended into every field of life.

Credit: David Wood

Imagine a pre-BGI system supervising fearsome weapons arsenals, miscalculating the threat of an enemy attack, and taking its own initiative to strike preemptively (but disastrously) against a perceived opponent – miscalculating (again) the pros and cons of what used to be called ‘a just war’.

Imagine a pre-BGI system observing the risks of cascading changes in the world’s climate, and taking its own decision to initiate hasty global geo-engineering – on account of evaluating human governance systems as being too slow and dysfunctional to reach the right decision.

A skeptic might reply, in each case, that a true BGI would never be involved in such an action.

But that’s the point: before we have BGIs, we’ll have pre-BGIs, and they’re more than capable of making disastrous mistakes.

Rebuttals and counter rebuttals

Again, a skeptic might say: a true BGI will be superintelligent, and won’t have any bugs.

But wake up: even AIs that are extremely competent 99.9% of the time can be thrown into disarray by circumstances beyond their training sets. A pre-BGI system may well go badly wrong in such a circumstance.

A skeptic might say: a true BGI will never misunderstand what humans ask it to do. Such systems will have sufficient all-round knowledge to fill in the gaps in our instructions. They won’t do what we humans literally ask them to do, if they appreciate that we meant to ask them to do something slightly different. They won’t seek short-cuts that have terrible side-effects, since they will have full human wellbeing as their overarching objective.

But wake up: pre-BGI systems may fall short on at least one of the aspects just described.

A different kind of skeptic might say that the pre-BGI systems that their company is creating won’t have any of the above problems. “We know how to design these AI systems to be safe and beneficial”, they assert, “and we’re going to do it that way”.

But wake up: what about other people who are also releasing pre-BGI systems? Maybe some of them will make the kinds of mistakes that you claim you won’t make. And in any case, how can you be so confident that your company isn’t deluding itself about its prowess in AI? (Here, I’m thinking particularly of Meta, whose AI systems have caused significant real-life problems, despite some of the leading AI developers in that company telling the world not to be concerned about the risks of AI-induced catastrophe.)

Finally, a skeptic might say that the AI systems their organization is creating will be able to disarm any malign pre-BGI systems released by less careful developers. Good pre-BGIs will outgun bad pre-BGIs. Therefore, no one should dare ask their organization to slow down, or to submit itself to tiresome bureaucratic checks and reviews.

But wake up: even though it’s your intention to create an exemplary AI system, you need to beware of wishful thinking and motivated self-deception. Especially if you perceive that you are in a race, and you want your pre-BGI to be released before that of an organization you distrust. That’s the kind of race when safety corners are cut, and the prize for winning is simply to be the organization that inflicts a catastrophe on humanity.

Recall the saying: “The road to hell is paved with good intentions”.

Credit: David Wood

Just because you conceive of yourself as one of the good guys, and you believe your intentions are exemplary, that doesn’t give you carte blanche to proceed down a path that could lead to a powerful pre-BGI getting one crucial calculation horribly wrong.

You might think that your pre-BGI is based entirely on positive ideas and a collaborative spirit. But each piece of technology is a two-edged sword, and guardrails, alas, can often be dismantled by determined experimenters or inquisitive hackers. Sometimes, indeed, the guardrails may break due to people in your team being distracted, careless, or otherwise incompetent.

Beyond good intentions

Biology researchers responsible for allowing leaks of deadly pathogens from their laboratories had no intention of causing such a disaster. On the contrary, the motivation behind their research was to understand how vaccines or other treatments might be developed in response to future new infectious diseases. What they envisioned was the wellbeing of the global population. Nevertheless, unknown numbers of people died from outbreaks resulting from the poor implementation of safety processes at their laboratories.

These researchers knew the critical importance of guardrails, yet for various reasons, the guardrails at their laboratories were breached.

How should we respond to the possibility of dangerous pathogens escaping from laboratories and causing countless deaths in the future? Should we just trust the good intentions of the researchers involved?

No, the first response should be to talk about the risk – to reach a better understanding of the conditions under which a biological pathogen can evade human control and cause widespread havoc.

It’s the same with the possibility of widespread havoc from a pre-BGI system that ends up operating outside human control. Alongside any inspirational talk about the wonderful things that could happen if true BGI is achieved, there needs to be a sober discussion of the possible malfunctions of pre-BGI systems. Otherwise, before we reach the state of sustainable superabundance for all, which I personally see as both possible and desirable, we might come to bitterly regret our inattention to matters of global safety.

Credit: David Wood


Transcendent questions on the future of AI: New starting points for breaking the logjam of AI tribal thinking

Going nowhere fast

Imagine you’re listening to someone you don’t know very well. Perhaps you’ve never even met in real life. You’re just passing acquaintances on a social networking site. A friend of a friend, say. Let’s call that person FoF.

FoF is making an unusual argument. You’ve not thought much about it before. To you, it seems a bit subversive.

You pause. You click on FoF’s profile, and look at other things he has said. Wow, one of his other statements marks him out as an apparent supporter of Cause Z. (That’s a cause I’ve made up for the sake of this fictitious dialog.)

You shudder. People who support Cause Z have got their priorities all wrong. They’re committed to an outdated ideology. Or they fail to understand free market dynamics. Or they’re ignorant of the Sapir-Whorf hypothesis. Whatever. There’s no need for you to listen to them.

Indeed, since FoF is a supporter of Cause Z, you’re tempted to block him. Why let his subversive ill-informed ideas clutter up your tidy filter bubble?

But today, you’re feeling magnanimous. You decide to break into the conversation, with your own explanation of why Cause Z is mistaken.

In turn, FoF finds your remarks unusual. First, it’s nothing to do with what he had just been saying. Second, it’s not a line of discussion he has heard before. To him, it seems a bit subversive.

FoF pauses. He clicks on your social media profile, and looks at other things you’ve said. Wow. One of your other statements marks you out as an apparent supporter of Cause Y.

FoF shudders. People who support Cause Y have got their priorities all wrong.

FoF feels magnanimous too. He breaks into your conversation, with his explanation as to why Cause Y is bunk.

By now, you’re exasperated. FoF has completely missed the point you were making. This time you really are going to block him. Goodbye.

The result: nothing learned at all.

And two people have had their emotions stirred up in unproductive ways. Goodness knows when and where each might vent their furies.

Credit: David Wood

Trying again

We’ve all been the characters in this story on occasion. We’ve all missed opportunities to learn, and, in the process, we’ve had our emotions stirred up for no good reason.

Let’s consider how things could have gone better.

The first step forward is a commitment to resist prejudice. Maybe FoF really is a supporter of Cause Z. But that shouldn’t prejudge the value of anything else he also happens to say. Maybe you really are a supporter of Cause Y. But that doesn’t mean FoF should jump to conclusions about other opinions you offer.

Ideally, ideas should be separated from the broader philosophies in which they might be located. Ideas should be assessed on their own merits, without regard to who first advanced them – and regardless of who else supports them.

In other words, activists must be ready to set aside some of their haste and self-confidence, and instead adopt, at least for a while, the methods of the academy rather than the methods of activism.

That’s because, frankly, the challenges we’re facing as a global civilization are so complex as to defy being fully described by any one of our worldviews.

Cause Z may indeed have useful insights – but also some nasty blindspots. Likewise for Cause Y, and all the other causes and worldviews that gather supporters from time to time. None of them have all the answers.

On a good day, FoF appreciates that point. So do you. Both of you are willing, in principle, to supplement your own activism with a willingness to assess new ideas.

That’s in principle. The practice is often different.

That’s not just because we are tribal beings – having inherited tribal instincts from our prehistoric evolutionary ancestors.

It’s also because the ideas that are put forward as starting points for meaningful open discussions all too often fail in that purpose. They’re intended to help us set aside, for a while, our usual worldviews. But all too often, they have just a thin separation from well-known ideological positions.

These ideas aren’t sufficiently interesting in their own right. They’re too obviously a proxy for an underlying cause.

That’s why real effort needs to be put into designing what can be called transcendent questions.

These questions are potential starting points for meaningful non-tribal open discussions. These questions have the ability to trigger a suspension of ideology.

But without good transcendent questions, the conversation will quickly cascade back down to its previous state of logjam. That’s despite the good intentions people tried to keep in mind. And we’ll be blocking each other – if not literally, then mentally.

Credit: Tesfu Assefa

The AI conversation logjam

Within discussions of the future of AI, some tribal positions are well known.

One tribal group is defined by the opinion that so-called AI systems are not ‘true’ intelligence. In this view, these AI systems are just narrow tools, mindless number crunchers, statistical extrapolations, or stochastic parrots. People in this group delight in pointing out instances where AI systems make grotesque errors.

A second tribal group is overwhelmed with a sense of dread. In this view, AI is on the point of running beyond control. Indeed, Big Tech is on the point of running beyond control. Open-source mavericks are on the point of running beyond control. And there’s little that can be done about any of this.

A third group is focused on the remarkable benefits that advanced AI systems can deliver. Not only can such AI systems solve problems of climate change, poverty and malnutrition, cancer and dementia, and even aging. Crucially, they can also solve any problems that earlier, weaker generations of AI might be on the point of causing. In this view, it’s important to accelerate as fast as possible into that new world.

Crudely, these are the skeptics, the doomers, and the accelerationists. Sadly, they often have dim opinions of each other. When they identify a conversation partner as being a member of an opposed tribe, they shudder.

Can we find some transcendent questions, which will allow people with sympathies for these various groups to overcome, for a while, their tribal loyalties, in search of a better understanding? Which questions might unblock the AI safety conversation logjam?

Unblocking the AI safety conversation logjam (Credit: David Wood)

A different starting point

In this context, I want to applaud Rob Bensinger. Rob is the communications lead at an organization called MIRI (the Machine Intelligence Research Institute).

(Just in case you’re tempted to strop away now, muttering unkind thoughts about MIRI, let me remind you of the commitment you made, a few paragraphs back, not to prejudge an idea just because the person raising it has some associations you disdain.)

(You did make that commitment, didn’t you?)

Rob has noticed the same kind of logjam and tribalism that I’ve just been talking about. As he puts it in a recent article:

Recent discussions of AI x-risk in places like Twitter tend to focus on “are you in the Rightthink Tribe, or the Wrongthink Tribe?” Are you a doomer? An accelerationist? An EA? A techno-optimist?

I’m pretty sure these discussions would go way better if the discussion looked less like that. More concrete claims, details, and probabilities; fewer vague slogans and vague expressions of certainty.

Following that introduction, Rob introduces his own set of twelve questions, as shown in the following picture:

Credit: Rob Bensinger

For each of the twelve questions, readers are invited, not just to give a forthright ‘yes’ or ‘no’ answer, but to think probabilistically. They’re also invited to consider which range of probabilities other well-informed people with good reasoning abilities might plausibly assign to each answer.

It’s where Rob’s questions start that I find most interesting.


All too often, discussions about the safety of future AI systems fail at the first hurdle. As soon as the phrase ‘AGI’ is mentioned, unhelpful philosophical debates break out. That’s why I have been suggesting new terms, such as PCAI, SEMTAI, and PHUAI:

Credit: David Wood

I’ve suggested the pronunciations ‘pea sigh’, ‘sem tie’, and ‘foo eye’ – so that they all rhyme with each other and, also, with ‘AGI’. The three acronyms stand for:

  • Potentially Catastrophic AI
  • Science, Engineering, and Medicine Transforming AI
  • Potentially Humanity-Usurping AI.

These concepts lead the conversation fairly quickly to three pairs of potentially transcendent questions:

  • “When is PCAI likely to be created?” and “How could we stop these potentially catastrophic AI systems from being actually catastrophic?”
  • “When is SEMTAI likely to be created?” and “How can we accelerate the advent of SEMTAI without also accelerating the advent of dangerous versions of PCAI or PHUAI?”
  • “When is PHUAI likely to be created?” and “How could we stop such an AI from actually usurping humanity into a very unhappy state?”

The future that most of us can agree on as profoundly desirable, surely, is one in which SEMTAI exists and is working wonders, uplifting the disciplines of science, engineering, and medicine.

If we can gain these benefits without the AI systems being “fully general” or “all-round superintelligent” or “independently autonomous, with desires and goals of its own”, I would personally see that as an advantage.

But regardless of whether SEMTAI actually meets the criteria various people have included in their own definitions of AGI, what path gives humanity SEMTAI without also giving us PCAI or even PHUAI? This is the key challenge.

Credit: David Wood

Introducing ‘STEM+ AI’

Well, I confess that Rob Bensinger didn’t start his list of potentially transcendent questions with the concept of SEMTAI.

However, the term he did introduce was, as it happens, a slight rearrangement of the same letters: ‘STEM+ AI’. And the definition is pretty similar too:

Let ‘STEM+ AI’ be short for “AI that’s better at STEM research than the best human scientists (in addition to perhaps having other skills)”.

That leads to the first three questions on Rob’s list:

  1. What’s the probability that it’s physically impossible to ever build STEM+ AI?
  2. What’s the probability that STEM+ AI will exist by the year 2035?
  3. What’s the probability that STEM+ AI will exist by the year 2100?

At this point, you should probably pause, and determine your own answers. You don’t need to be precise. Just choose one of the following probability ranges:

  • Below 1%
  • Around 10%
  • Around 50%
  • Around 90%
  • Above 99%

I won’t tell you my answers. Nor Rob’s, though you can find them online easily enough from links in his main article. It’s better if you reach your own answers first.

And recall the wider idea: don’t just decide your own answers. Also consider which probability ranges someone else might assign, assuming they are well-informed and competent in reasoning.

Then when you compare your answers with those of a colleague, friend, or online acquaintance, and discover surprising differences, the next step, of course, is to explore why each of you has reached your conclusions.
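To make that comparison step concrete, here is a minimal sketch in Python. It is only an illustration: the function names, the two-bucket threshold for what counts as ‘surprising’, and the answers for the two imaginary people are all assumptions of mine, not part of Rob’s exercise.

```python
# The five probability ranges listed above, ordered from lowest to highest.
BUCKETS = ["Below 1%", "Around 10%", "Around 50%", "Around 90%", "Above 99%"]


def bucket_gap(mine: str, theirs: str) -> int:
    """Return how many buckets apart two answers are (0 means the same bucket)."""
    return abs(BUCKETS.index(mine) - BUCKETS.index(theirs))


def surprising_differences(my_answers: dict, their_answers: dict, min_gap: int = 2) -> list:
    """List the questions where two people differ by at least `min_gap` buckets.

    The default threshold of two buckets is an arbitrary assumption.
    """
    return [
        question
        for question in my_answers
        if question in their_answers
        and bucket_gap(my_answers[question], their_answers[question]) >= min_gap
    ]


# Made-up answers to questions 1 to 3, for two imaginary people.
alice = {1: "Below 1%", 2: "Around 50%", 3: "Around 90%"}
bob = {1: "Below 1%", 2: "Below 1%", 3: "Above 99%"}

print(surprising_differences(alice, bob))             # [2]
print(surprising_differences(alice, bob, min_gap=1))  # [2, 3]
```

The questions that the function returns are the ones worth talking through first: they are where the two sets of assumptions diverge most.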

The probability of disempowering humanity

The next question that causes conversations about AI safety to stumble: what scales of risks should we look at? Should we focus our concern on so-called ‘existential risk’? What about ‘catastrophic risk’?

Rob seeks to transcend that logjam too. He raises questions about the probability that a STEM+ AI will disempower humanity. Here are questions 4 to 6 on his list:

  4. What’s the probability that, if STEM+ AI is built, then AIs will be (individually or collectively) able, within ten years, to disempower humanity?
  5. What’s the probability that, if STEM+ AI is built, then AIs will disempower humanity within ten years?
  6. What’s the probability that, if STEM+ AI is built, then AIs will disempower humanity within three months?

Question 4 is about capability: given STEM+ AI abilities, will AI systems be capable, as a consequence, of disempowering humanity?

Questions 5 and 6 move from capability to proclivity. Will these AI systems actually exercise these abilities they have acquired? And if so, potentially how quickly?

Separating the ability and proclivity questions is an inspired idea. Again, I invite you to consider your answers.
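One consequence of separating the two kinds of question is that the three answers constrain each other. The sketch below is my own illustration, not something from Rob’s article. It assumes that AIs which actually disempower humanity within a given period must have been able to do so within that period, and that all three probabilities are conditional on STEM+ AI being built. The function names and the sample numbers are hypothetical.

```python
def answers_are_coherent(p_able: float, p_ten_years: float, p_three_months: float) -> bool:
    """Check that answers to questions 4, 5, and 6 fit together.

    Disempowerment within three months is a special case of disempowerment
    within ten years, which (by assumption) requires the ability asked about
    in question 4. So the probabilities should not increase along that chain.
    """
    return 0.0 <= p_three_months <= p_ten_years <= p_able <= 1.0


def proclivity_given_ability(p_able: float, p_ten_years: float) -> float:
    """Estimate P(AIs disempower humanity | AIs are able to), both within ten years."""
    if p_able <= 0.0:
        raise ValueError("p_able must be positive to condition on it")
    return p_ten_years / p_able


# Made-up answers, for illustration only.
print(answers_are_coherent(p_able=0.5, p_ten_years=0.1, p_three_months=0.01))  # True
print(answers_are_coherent(p_able=0.1, p_ten_years=0.5, p_three_months=0.01))  # False
print(proclivity_given_ability(p_able=0.5, p_ten_years=0.1))
```

The ratio in the second function isolates the proclivity question: it is the probability that AIs use the ability, given that they have it.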

Two moral evaluations

Question 7 introduces another angle, namely that of moral evaluation:

  • 7. What’s the probability that, if AI wipes out humanity and colonizes the universe itself, the future will go about as well as if humanity had survived (or better)?

Credit: David Wood

The last question in the set – question 12 – also asks for a moral evaluation:

  • 12. How strongly do you agree with the statement that it would be an unprecedentedly huge tragedy if we never built STEM+ AI?

Yet again, these questions have the ability to inspire fruitful conversation and provoke new insights.

Better technology or better governance?

You may have noticed I skipped numbers 8-11. These four questions may be the most important on the entire list. They address questions of technological possibility and governance possibility. Here’s question 11:

  • 11. What’s the probability that governments will generally be reasonable in how they handle AI risk?

And here’s question 10:

  • 10. What’s the probability that, for the purpose of preventing AI catastrophes, technical research is more useful than policy work?

As for questions 8 and 9, well, I’ll leave you to discover these by yourself. And I encourage you to become involved in the online conversation that these questions have catalyzed.

Finally, if you think you have a better transcendent question to drop into the conversation, please let me know!

Let us know your thoughts! Sign up for a Mindplex account now, join our Telegram, or follow us on Twitter

Against Contractionism: Enabling and encouraging open minds, rather than constricting understanding with blunt labels

A dangerous label

Imagine if I said, “Some Christians deny the reality of biological evolution, therefore all Christians deny the reality of biological evolution.”

Or that some Muslims believe that apostates should be put to death, therefore all Muslims share that belief.

Or that some atheists (Stalin and Mao, for example) caused the deaths of millions of people, therefore atheism itself is a murderous ideology.

Or, to come closer to home (for me: I grew up in Aberdeenshire, Scotland) – that since some Scots are mean with money, therefore all Scots are mean with money. (Within Scotland itself, that unkind stereotype exists with a twist: allegedly, all Aberdonians are mean with money.)

In all these cases, you would say that’s unwarranted labeling. Spreading such stereotypes is dangerous. It gets in the way of a fuller analysis and deeper appreciation. Real-life people are much more varied than that. Any community has its diversity.

Well, I am equally shocked by another instance of labeling. That involves the lazy concept of TESCREAL – a concept that featured in a recent article here on Mindplex Magazine, titled TESCREALism: Has The Silicon Valley Ruling Class Gone To Crazy Town? Émile Torres In Conversation With R.U. Sirius.

When I read the article, my first thought was: “Has Mindplex gone to Crazy Town?”

The concept of TESCREAL, which has been promoted several times in various locations in recent months, contracts a rich and diverse set of ideas down to a vastly over-simplified conclusion. It suggests that the worst aspects of any people who hold any of the beliefs wrapped up into that supposed bundle of ideas, can be attributed, with confidence, to other people who hold just some of these ideas.

Worse, it suggests that the entire “ruling class” of Silicon Valley subscribe to the worst of these beliefs.

It’s as if I picked a random atheist and insisted that they were as murderous as Stalin or Mao.

Or if I picked a random Muslim and insisted that they wished for the deaths of every person (apostate) who had grown up with Muslim faith and subsequently left that faith behind.

Instead of that kind of contraction of ideas, what the world badly needs nowadays is an open-minded exploration of a wide set of complex and subtle ideas.

Not the baying for blood which seems to motivate the proponents of the TESCREAL analysis.

Not the incitement to hatred towards the entrepreneurs and technologists who are building many remarkable products in Silicon Valley – people who, yes, do need to be held to account for some of what they’re doing, but who are by no means a uniform camp!

I’m a T but not an L

Let’s take one example: me.

I publicly identify as a transhumanist – the ‘T’ of TESCREAL.

The word ‘transhumanist’ appears on the cover of one of my books, and the related word ‘transhumanism’ appears on the cover of another one.

Book covers of David Wood’s ‘Vital Foresight’ and ‘Sustainable Superabundance,’ published in 2021 and 2019 respectively. Credit: (David Wood)

As it happens, I’ve also been a technology executive. I was mainly based in London, but I was responsible for overseeing staff in the Symbian office in Redwood City in Silicon Valley. Together, we envisioned how new-fangled devices called ‘smartphones’ might in due course improve many aspects of the lives of users. (And we also reflected on at least some potential downsides, including the risks of security and privacy violations, which is why I championed the new ‘platform security’ redesign of the Symbian OS kernel. But I digress.)

Since I am ‘T’, does that mean, therefore, that I am also ESCREAL?

Let’s look at that final ‘L’. Longtermism. This letter is critical to many of the arguments made by people who like the TESCREAL analysis.

‘Longtermism’ is the belief that the needs of potentially vast numbers of as-yet unborn (and unconceived) people in future generations can outweigh the needs of people currently living.

Well, I don’t subscribe to it. It doesn’t guide my decisions.

I’m motivated by the possibility of technology to vastly improve the lives of everyone around the world, living today. And by the need to anticipate and head off potential catastrophe.

By ‘catastrophe’, I mean anything that kills large numbers of people who are currently alive.

The deaths of 100% of those alive today will wipe out humanity’s future, but the deaths of ‘just’ 90% of people won’t. Longtermists are fond of pointing this out, and while it may be theoretically correct, it doesn’t provide any justification for ignoring the needs of present-day people in order to raise the probability of larger numbers of future people being born.

Some people have said they’re persuaded by the longtermist argument. But I suspect that’s only a small minority of rather intellectual people. My experience with people in Silicon Valley, and with others who are envisioning and building new technologies, is that these abstract longtermist considerations do not guide their daily decisions. Far from it.

Credit: Tesfu Assefa

Concepts are complicated

A larger point needs to be made here. Concepts such as ‘transhumanism’ and ‘longtermism’ each embody rich variety.

It’s the same with all the other components of the supposed TESCREAL bundle: E for Extropianism, S for Singularitarianism, C for Cosmism, R for Rationalism, and EA for Effective Altruism.

In each case, we should avoid contractionism – thinking that if you have heard one person who defends that philosophy expressing one opinion, then you can deduce what they think about all other matters. In practice, people are more complicated – and ideas are more complicated.

As I see it, parts of each of the T, E, S, C, R, and EA philosophies deserve wide attention and support. But if you are hostile, and do some digging, you can easily find people, from within the communities around each of these terms, who have said something despicable or frightening. And then you can (lazily) label everyone else in that community with that same unwelcome trait. (“Seen one; seen them all!”)

These extended communities do have some people with unwelcome traits. Indeed, each of T and S has attracted what I call a ‘shadow’ – a set of associated beliefs and attitudes that are deviations from the valuable core ideas of the philosophy. Here’s a picture I use of the Singularity shadow:

A video cover image from ‘The Vital Syllabus Playlist’ where David Wood examines the Singularity Shadow. Credit: (David Wood)

And here’s a picture of the transhumanist shadow:

A video cover image from ‘The Vital Syllabus Playlist’ where David Wood examines the Transhumanist Shadow. Credit: (David Wood)

(In both cases, you can click on the caption links to view a video that provides a fuller analysis.)

As you can see, the traits in the transhumanist shadow arise when people fail to uphold what I have listed as ‘transhumanist values’.

The existence of these shadows is undeniable, and unfortunate. The beliefs and attitudes in them can deter independent observers from taking the core philosophies seriously.

In that case, you might ask, why persist with the core terms ‘transhumanism’ and ‘singularity’? Because there are critically important positive messages in both these philosophies! Let’s turn to these next.

The most vital foresight

Here’s my 33-word summary of the most vital piece of foresight that I can offer:

Oncoming waves of technological change are poised to deliver either global destruction or a paradise-like sustainable superabundance, with the outcome depending on the timely elevation of transhumanist vision, transhumanist politics, and transhumanist education.

Let’s cover that again, more slowly this time.

First things first. Technological changes over the next few decades will place vast new power in billions of human hands. Rather than focusing on the implications of today’s technology – significant though they are – we need to raise our attention to the even larger implications of the technology of the near future.

Second, these technologies will magnify the risks of humanitarian disaster. If we are already worried about these risks today (as we should be), we should be even more worried about how they will develop in the near future.

Third, the same set of technologies, handled more wisely, and vigorously steered, can result in a very different outcome: a sustainable superabundance of clean energy, healthy nutrition, material goods, excellent health, all-round intelligence, dynamic creativity, and profound collaboration.

Fourth, the biggest influence on which outcome is realized is the widespread adoption of transhumanism. This in turn involves three activities:

• Advocating transhumanist philosophy as an overarching worldview that encourages and inspires everyone to join the next leap upward on life’s grand evolutionary ladder: we can and should develop to higher levels, physically, mentally, and socially, using science, technology, and rational methods.
• Extending transhumanist ideas into real-world political activities, to counter very destructive trends in that field.
• Underpinning the above initiatives: a transformation of the world of education, to provide everyone with skills suited to the very different circumstances of the near future, rather than the needs of the past.

Finally, overhanging the momentous transition that I’ve just described is the potential of an even larger change, in which technology moves ahead yet more quickly, with the advent of self-improving artificial intelligence with superhuman levels of capability in all aspects of thinking.

That brings us to the subject of the Singularity.

The Singularity is the point in time when AIs could, potentially, take over control of the world from humans. The fact that the Singularity could happen within a few short decades deserves to be shouted from the rooftops. That’s what I do, some of the time. That makes me a singularitarian.

But it doesn’t mean that I, or others who are likewise trying to raise awareness of this possibility, fall into any of the traits in the Singularity Shadow. It doesn’t mean, for example, that we’re all complacent about risks, or all think that it’s basically inevitable that the Singularity will be good for humanity.

So, Singularitarianism (S) isn’t the problem. Transhumanism (T) isn’t the problem. Nor, for that matter, does the problem lie in the core beliefs of the E, C, R, or EA parts of the supposed TESCREAL bundle. The problem lies somewhere else.

What should worry us: not TESCREAL, but CASHAP

Rather than obsessing over a supposed TESCREAL takeover of Silicon Valley, here’s what we should actually be worried about: CASHAP.

C is for contractionism – the tendency to push together ideas that don’t necessarily belong together, to overlook variations and complications in people and in ideas, and to insist that the core values of a group can be denigrated just because some peripheral members have some nasty beliefs or attitudes.

(Note: whereas the fans of the TESCREAL concept are guilty of contractionism, my alternative concept of CASHAP is different. I’m not suggesting the ideas in it always belong together. Each of the individual ideas that make up CASHAP is detrimental.)

A is for accelerationism – the desire to see new technologies developed and deployed as fast as possible, under the irresponsible belief that any flaws encountered en route can always be easily fixed in the process (“move fast and break things”).

S is for successionism – the view that if superintelligent AI displaces humanity from being in control of the planet, that succession should automatically be welcomed as part of the grand evolutionary process – regardless of what happens to the humans in the process, regardless of whether the AIs have sentience and consciousness, and indeed regardless of whether these AIs go on to destroy themselves and the planet.

H is for hype – believing ideas too easily because they fit into your pre-existing view of the world, rather than using critical thinking.

AP is for anti-politics – believing that politics always makes things worse, getting in the way of innovation and creativity. In reality good politics has been incredibly important in improving the human condition.


I’ll conclude this article by emphasizing the positive opposites to the undesirable CASHAP traits that I’ve just listed.

Instead of contractionism, we must be ready to expand our thinking, and have our ideas challenged. We must be ready to find important new ideas in unexpected places – including from people with whom we have many disagreements. We must be ready to put our emotional reactions on hold from time to time, since our prior instincts are by no means an infallible guide to the turbulent new times ahead.

Instead of accelerationism, we must use a more sophisticated set of tools: sometimes braking, sometimes accelerating, and doing a lot of steering too. That’s what I’ve called the technoprogressive (or techno-agile) approach to the future.

Credit: David Wood

Instead of successionism, we should embrace transhumanism: we can, and should, elevate today’s humans towards higher levels of health, vitality, liberty, creativity, intelligence, awareness, happiness, collaboration, and bliss. And before we press any buttons that might lead to humanity being displaced by superintelligent AIs that might terminate our flourishing, we need to research a whole bunch of issues a lot more carefully!

Instead of hype, we must recommit to critical thinking, becoming more aware of any tendencies to reach false conclusions, or to put too much weight on conclusions that are only tentative. Indeed, that’s the central message of the R (rationalism) part of TESCREAL, which makes it all the more ‘crazy town’ that R is held in contempt by that contractionist over-simplification.

Instead of anti-politics, we must clarify and defend what has been called ‘the narrow path’ (or sometimes, simply, ‘future politics’) – that lies between states having too little power (leaving societies hostage to destructive cancers that can grow in our midst) and having too much power (unmatched by the counterweight of a ‘strong society’).


The Astonishing Vastness of Mind Space: The incalculable challenges of coexisting with radically alien AI superintelligence

More things in heaven and earth

As humans, we tend to compare other intelligences to our own, human, intelligence. That’s an understandable bias, but it could be disastrous.

Rather than our analysis being human-bound, we need to heed the words of Shakespeare’s Hamlet:

There are more things in heaven and earth, Horatio, than are dreamt of in our philosophy.

More recent writers, in effect amplifying Shakespeare’s warning, have enthralled us with their depictions of numerous creatures with bewildering mental attributes. The pages of science fiction can, indeed, stretch our imagination in remarkable ways. But these narratives are easy to dismiss as being “just” science fiction.

That’s why my own narrative, in this article, circles back to an analysis that featured in my previous Mindplex article, Bursting out of confinement. The analysis in question is the famous – or should I say infamous – “Simulation Argument”. The Simulation Argument raises some disturbing possibilities about non-human intelligence. Many critics try to dismiss these possibilities – waving them away as “pseudoscience” or, again, as “just science fiction” – but they’re being overly hasty. My conclusion is that, as we collectively decide how to design next generation AI systems, we ought to carefully ponder these possibilities.

In short, what we need to contemplate is the astonishing vastness of the space of all possible minds. These minds vary in unfathomable ways not only in how they think but also in what they think about and care about.

Hamlet’s warning can be restated:

There are more types of superintelligence in mind space, Horatio, than are dreamt of in our philosophy.

By the way, don’t worry if you’ve not yet read my previous Mindplex article. Whilst these two articles add up to a larger picture, they are independent of each other.

How alien?

As I said: we humans tend to compare other intelligences to our own, human, intelligence. Therefore, we tend to expect that AI superintelligence, when it emerges, will sit on some broad spectrum that extends from the intelligence of amoebae and ants through that of mice and monkeys to that of humans and beyond.

When pushed, we may concede that AI superintelligence is likely to have some characteristics we would describe as alien.

In a simple equation, overall human intelligence (HI) might be viewed as a combination of multiple different kinds of intelligence (I1, I2, …), such as spatial intelligence, musical intelligence, mathematical intelligence, linguistic intelligence, interpersonal intelligence, and so on:

HI = I1 + I2 + … + In

In that conception, AI superintelligence (ASI) is a compound magnification (m1, m2, …) of these various capabilities, with a bit of “alien extra” (X) tacked on at the end:

ASI = m1*I1 + m2*I2 + … + mn*In + X

What’s at issue is whether the ASI is dominated by the first terms in this expression, or by the unknown X present at the end.
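To see what ‘dominated by’ means in that expression, here is a small numerical sketch. It takes the simple additive equation at face value, which is already a big simplification. Every number in it is invented, the units are arbitrary, and the function name alien_share is my own label.

```python
def alien_share(magnified_terms: list, x: float) -> float:
    """Return the fraction of the ASI total contributed by the alien extra X.

    `magnified_terms` holds the products m1*I1, m2*I2, ... from the equation above.
    """
    total = sum(magnified_terms) + x
    if total <= 0:
        raise ValueError("total intelligence must be positive")
    return x / total


# Invented values of m_i * I_i for three kinds of intelligence (arbitrary units).
familiar = [10.0, 20.0, 30.0]

print(alien_share(familiar, x=20.0))   # 0.25: the familiar terms dominate
print(alien_share(familiar, x=540.0))  # 0.9: the unknown X dominates
```

In the first case the ASI is mostly a magnified version of capabilities we recognize. In the second, nine tenths of it is something we have no words for.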

Whether some form of humans will thrive in a coexistence with ASI will depend on how alien that superintelligence is.

Perhaps the ASI will provide a safe, secure environment, in which we humans can carry out our human activities to our hearts’ content. Perhaps the ASI will augment us, uplift us, or even allow us to merge with it, so that we retain what we see as the best of our current human characteristics, whilst leaving behind various unfortunate hangovers from our prior evolutionary trajectory. But that all depends on factors that it’s challenging to assess:

  • How much “common cause” the ASI will feel toward humans
  • Whether any initial feeling of common cause will dissipate as the ASI self-improves
  • To what extent new X factors could alter considerations in ways that we have not begun to consider.

Four responses to the X possibility

Our inability to foresee the implications of unknowable new ‘X’ capabilities in ASI should make us pause for thought. That inability was what prompted author and mathematics professor Vernor Vinge to develop in 1983 his version of the notion of “Singularity”. To summarize what I covered in more detail in a previous Mindplex article, “Untangling the confusion”, Vinge predicted that a new world was about to emerge that “will pass far beyond our understanding”:

We are at the point of accelerating the evolution of intelligence itself… We will soon create intelligences greater than our own. When this happens, human history will have reached a kind of singularity, an intellectual transition as impenetrable as the knotted space-time at the center of a black hole, and the world will pass far beyond our understanding. This singularity, I believe, already haunts a number of science fiction writers. It makes realistic extrapolation to an interstellar future impossible.

Reactions to this potential unpredictability can be split into four groups of thought:

  1. Dismissal: A denial of the possibility of ASI. Thankfully, this reaction has become much less common recently.
  2. Fatalism: Since we cannot anticipate what surprise new ‘X’ features may be possessed by an ASI, it’s a waste of time to speculate about them or to worry about them. What will be, will be. Who are we humans to think we can subvert the next step in cosmic evolution?
  3. Optimism: There’s no point in being overcome with doom and gloom. Let’s convince ourselves to feel lucky. Humanity has had a good run so far, and if we extrapolate that history beyond the singularity, we can hope to have an even better run in the future.
  4. Activism: Rather than rolling the dice, we should proactively alter the environment in which new generations of AI are being developed, to reduce the risks of any surprise ‘X’ features emerging that would overwhelm our abilities to retain control.

I place myself squarely in the activist camp, and I’m happy to adopt the description of “Singularity Activist”.

To be clear, this doesn’t mean I’m blind to the potential huge upsides to beneficial ASI. It’s just that I’m aware, as well, of major risks en route to that potential future.

A journey through a complicated landscape

As an analogy, consider a journey through a complicated landscape:

Credit: David Wood (Image by Midjourney)

In this journey, we see a wonderful existential opportunity ahead – a lush valley, fertile lands, and gleaming mountain peaks soaring upward to a transcendent realm. But in front of that opportunity is a river of uncertainty, bordered by a swamp of ambiguity, perhaps occupied by hungry predators lurking in shadows.

Are there just two options?

  1. We are intimidated by the possible dangers ahead, and decide not to travel any further
  2. We fixate on the gleaming mountain peaks, and rush on regardless, belittling anyone who warns of piranhas, treacherous river currents, alligators, potential mud slides, and so on

Isn’t there a third option? To take the time to gain a better understanding of the lie of the land ahead. Perhaps there’s a spot, to one side, where it will be easier to cross the river. A location where a stable bridge can be built. Perhaps we could even build a helicopter that can assist us over the strongest currents…

It’s the same with the landscape of our journey towards the sustainable superabundance that could be achieved, with the assistance of advanced AI, provided we act wisely. That’s the vision of Singularity Activism.

Obstacles to Singularity Activism

The Singularity Activist outlook faces two main obstacles.

The first obstacle is the perception that there’s nothing we humans can usefully do, to meaningfully alter the course of development of ASI. If we slow down our own efforts, in order to apply more resources in the short term on questions of safety and reliability, it just makes it more likely that another group of people – probably people with fewer moral qualms than us – will rush ahead and create ASI.

In this line of thinking, the best way forward is to create prototype ASI systems as soon as possible, and then to use these systems to help design and evolve better ASI systems, so that everyone can benefit from what will hopefully be a wonderful outcome.

The second obstacle is the perception that there’s nothing we humans particularly need to do, to avoid the risks of adverse outcomes, since these risks are pretty small in any case. Just as we don’t over-agonise about the risks of us being struck by debris falling from an overhead airplane, we shouldn’t over-agonise about the risks of bad consequences of ASI.

Credit: David Wood

But on this occasion, what I want to focus on is assessing the scale and magnitude of the risk that we are facing, if we move forward with overconfidence and inattention. That is, I want to challenge the second of the above misperceptions.

As a step toward that conclusion, it’s time to bring an ally to the table. That ally is the Simulation Argument. Buckle up!

Are we simulated?

The Simulation Argument puts a particular hypothesis on the table, known as the Simulation Hypothesis. That hypothesis proposes that we humans are mistaken about the ultimate nature of reality. What we consider to be “reality” is, in this hypothesis, a simulated (virtual) world, designed and operated by “simulators” who exist outside what we consider the entire universe.

It’s similar to interactions inside a computer game. As humans play these games, they encounter challenges and puzzles that need to be solved. Some of these challenges involve agents (characters) within the game – agents which appear to have some elements of autonomy and intelligence. These agents have been programmed into the game by the game’s designers. Depending on the type of game, the greater the intelligence of the built-in agents, the more enjoyable it is to play it.

Games are only one example of simulation. We can also consider simulations created as a kind of experiment. In this case, a designer may be motivated by curiosity: They may want to find out what would happen if such-and-such initial conditions were created. For example, if Archduke Ferdinand had escaped assassination in Sarajevo in June 1914, would the European powers still have blundered into something akin to World War One? Again, such simulations could contain numerous intelligent agents – potentially (as in the example just mentioned) many millions of such agents.

Consider reality from the point of view of such an agent. What these agents perceive inside their simulation is far from being the entirety of the universe as is known to the humans who operate the simulation. The laws of cause-and-effect within the simulation could deviate from the laws applicable in the outside world. Some events in the simulation that lack any explanation inside that world may be straightforwardly explained from the outside perspective: the human operator made such-and-such a decision, or altered a setting, or – in an extreme case – decided to reset or terminate the simulation. In other words, what is bewildering to the agent may make good sense to the author(s) of the simulation.

Now suppose that, as such agents become more intelligent, they also become self-aware. That brings us to the crux question: how can we know whether we humans are, likewise, agents in a simulation whose designers and operators exist beyond our direct perception? For example, we might be part of a simulation of world history in which Archduke Ferdinand was assassinated in Sarajevo in June 1914. Or we might be part of a simulation whose purpose far exceeds our own comprehension.

Indeed, if the human creative capability (HCC) to create simulations is expressed as a sum of different creative capabilities (CC1, CC2, …),

HCC = CC1 + CC2 + … + CCn

then the creative capability of a hypothetical superhuman simulation designer (SCC) might be expressed as a compound magnification (m1, m2, …) of these various capabilities, with a bit of “alien extra” (X) tacked on at the end:

SCC = m1*CC1 + m2*CC2 + … + mn*CCn + X

Weighing the numbers

Before assessing the possible scale and implications of the ‘X’ factor in that equation, there’s another set of numbers to consider. These numbers attempt to weigh up the distribution of self-aware intelligent agents. What proportion of that total set of agents are simulated, compared to those that are in “base reality”?

If we’re just counting intelligences, the conclusion is easy. Assuming there is no catastrophe that upends the progress of technology, then, over the course of all of history, there will likely be vastly more artificial (simulated) intelligences than beings who have base (non-simulated) intelligences. That’s because computing hardware is becoming more powerful and widespread.

There are already more “intelligent things” than humans connected to the Internet: the analysis firm Statista estimates that, in 2023, the first of these numbers is 15.14 billion, which is almost triple the second number (5.07 billion). In 2023, most of these “intelligent things” have intelligence far shallower than that of humans, but as time progresses, more and more intelligent agents of various sorts will be created. That’s thanks to ongoing exponential improvements in the capabilities of hardware, networks, software, and data analysis.

Therefore, if an intelligence could be selected at random, from the set of all such intelligences, the likelihood is that it would be an artificial intelligence.

The Simulation Argument takes these considerations one step further. Rather than just selecting an intelligence at random, what if we consider selecting a self-aware conscious intelligence at random? Given the vast numbers of agents that are operating inside vast numbers of simulations, now or in the future, the likelihood is that a simulated agent has been selected. In other words, we – you and I – observing ourselves to be self-aware and intelligent, should conclude that it’s likely we ourselves are simulated.
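
The counting step can be made concrete with a minimal sketch in Python. The function name and every scenario number below are hypothetical, chosen only for illustration; the argument itself supplies no such figures. The sketch shows how the share of simulated agents among all self-aware agents depends on how many simulated agents are created and on what fraction of them are conscious.

```python
def simulated_share(base_agents, simulations, agents_per_simulation,
                    conscious_fraction=1.0):
    """Share of self-aware agents that are simulated (inputs are hypothetical)."""
    simulated = simulations * agents_per_simulation * conscious_fraction
    total = simulated + base_agents
    return simulated / total if total else 0.0


# Statista's 2023 figures quoted above: connected "things" per connected human.
things_per_human = 15.14e9 / 5.07e9  # just under 3

# Hypothetical scenario: 10 billion base-reality humans, and 1,000 simulations
# that each contain 10 billion conscious agents.
many_sims = simulated_share(10e9, 1_000, 10e9)

# Same scenario, except that simulated agents are never conscious.
no_conscious_sims = simulated_share(10e9, 1_000, 10e9, conscious_fraction=0.0)
```

With these made-up inputs, about 99.9% of self-aware agents are simulated. Setting the conscious fraction to zero drives the share to zero, as does setting the number of simulations to zero.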

Thus the conclusion of the Simulation Argument is that we should take the Simulation Hypothesis seriously. To be clear, that hypothesis isn’t the only possible legitimate response to the argument. Two other responses are available, each of which denies one of the assumptions that I mentioned when building the argument:

  • The assumption that technology will continue to progress, to the point where simulated intelligences vastly exceed non-simulated intelligences
  • The assumption that the agents in these simulations will be not just intelligent but also conscious and self-aware.

Credit: Tesfu Assefa

Objections and counters

Friends who are sympathetic to most of my arguments sometimes turn frosty when I raise the topic of the Simulation Hypothesis. It clearly makes people uncomfortable.

In their state of discomfort, critics of the argument can raise a number of objections. For example, they complain that the argument is entirely metaphysical, not having any actual consequences for how we live our lives. There’s no way to test it, the objection runs. As such, it’s unscientific.

As someone who spent four years of my life (1982-1986) in the History and Philosophy of Science department in Cambridge, I am unconvinced by these criticisms. Science has a history of theories moving from non-testable to testable. The physicist Ernst Mach was famously hostile to the hypothesis that atoms exist. He declared his disbelief in atoms in a public auditorium in Vienna in 1897: “I don’t believe that atoms exist”. There was no point in speculating about the existence of things that could not be directly observed, he asserted. Later in his life, Mach likewise disputed the scientific value of Einstein’s theory of relativity:

I can accept the theory of relativity as little as I can accept the existence of atoms and other such dogma.

Intellectual heirs of Mach in the behaviorist school of psychology fought similar battles against the value of notions of mental states. According to experimentalists like John B. Watson and B.F. Skinner, people’s introspections of their own mental condition had no scientific merit. Far better, they said, to concentrate on what could be observed externally, rather than on metaphysical inferences about hypothetical minds.

As it happened, the theories of atoms, of special relativity, and of internal mental states, all gave rise in due course to important experimental investigations, which improved the ability of scientists to make predictions and to alter the course of events.

It may well be the same with the Simulation Hypothesis. There are already suggestions of experiments that might be carried out to distinguish between possible modes of simulation. Just because a theory is accused of being “metaphysical”, it doesn’t follow that no good science can arise from it.

A different set of objections to the Simulation Argument gets hung up on tortuous debates over the mathematics of probabilities. (For additional confusion, questions of infinities can be mixed in too.) Allegedly, because we cannot meaningfully measure these probabilities, the whole argument makes no sense.

However, the Simulation Argument makes only minimal appeal to theories of mathematics. It simply points out that there are likely to be many more simulated intelligences than non-simulated intelligences.

Well, critics sometimes respond, it must therefore be the case that simulated intelligences can never be self-aware. They ask, with some derision, whoever imagined that silicon could become conscious? There must be some critical aspect of biological brains which cannot be duplicated in artificial minds. And in that case, the fact that we are self-aware would lead us to conclude we are not simulated.

To me, that’s far too hasty an answer. It’s true that the topics of self-awareness and consciousness are more controversial than the topic of intelligence. It is doubtless true that at least some artificial minds will lack conscious self-awareness. But if evolution has bestowed conscious self-awareness on intelligent creatures, we should be wary of declaring that property to provide no assistance to these creatures. Such a conclusion would be similar to declaring that sleep is for losers, despite the ubiquity of sleep in mammalian evolution.

If evolution has given us sleep, we should be open to the possibility that it has positive side-effects for our health. (It does!) Likewise, if evolution has given us conscious self-awareness, we should be open to the idea that creatures benefit from that characteristic. Simulators, therefore, may well be tempted to engineer a corresponding attribute into the agents they create. And if it turns out that specific physical features of the biological brain need to be copied into the simulation hardware, to enable conscious self-awareness, so be it.

The repugnant conclusion

When an argument faces so much criticism, yet the criticisms fail to stand up to scrutiny, it’s often a sign that something else is happening behind the scenes.

Here’s what I think is happening with the Simulation Argument. If we accept the Simulation Hypothesis, it means we have to accept a morally repugnant conclusion about the simulators that have created us. Namely, these simulators give no sign of caring about all the terrible suffering experienced by the agents inside the simulation.

Yes, some agents have good lives, but very many others have dismal fates. The thought that a simulator would countenance all this suffering is daunting.

Of course, this is the age-old “problem of evil”, well known in the philosophy of religion. Why would an all-good all-knowing all-powerful deity allow so many terrible things to happen to so many humans over the course of history? It doesn’t make sense. That’s one reason why many people have turned their back on any religious faith that implies a supposedly all-good all-knowing all-powerful deity.

Needless to say, religious faith persists, with the protection of one or more of the following rationales:

  • We humans aren’t entitled to use our limited appreciation of good vs. evil to cast judgment on what actions an all-good deity should take
  • We humans shouldn’t rely on our limited intellects to try to fathom the “mysterious ways” in which a deity operates
  • Perhaps the deity isn’t all-powerful after all, in the sense that there are constraints beyond human appreciation in what the deity can accomplish.

Occasionally, yet another idea is added to the mix:

  • A benevolent deity needs to coexist with an evil antagonist, such as a “satan” or other primeval prince of darkness.

Against such rationalizations, the spirit of the enlightenment offers a different, more hopeful analysis:

  • Whichever forces gave rise to the universe, they have no conscious concern for human wellbeing
  • Although human intellects run up against cognitive limitations, we can, and should, seek to improve our understanding of how the universe operates, and of the preconditions for human flourishing
  • Although it is challenging when different moral frameworks clash, or when individual moral frameworks fail to provide clear guidelines, we can, and should, seek to establish wide agreement on which kinds of human actions to applaud and encourage, and which to oppose and forbid
  • Rather than us being the playthings of angels and demons, the future of humanity is in our own hands.

However, if we follow the Simulation Argument, we are confronted by what seems to be a throwback to a more superstitious era:

  • We may owe our existence to actions by beings beyond our comprehension
  • These beings demonstrate little affinity for the kinds of moral values we treasure
  • We might comfort each other with the claim that “[whilst] the arc of the moral universe is long, … it bends toward justice”, but we have no solid evidence in favor of that optimism, and plenty of evidence that good people are laid to waste as life proceeds.

If the Simulation Argument leads us to such conclusions, it’s little surprise that people seek to oppose it.

However, just because we dislike a conclusion, that doesn’t entitle us to assume that it’s false. Rather, it behooves us to consider how we might adjust our plans in the light of that conclusion possibly being true.

The vastness of ethical possibilities

If you disliked the previous section, you may dislike this next part even more strongly. But I urge you to put your skepticism on hold, for a while, and bear with me.

The Simulation Argument suggests that beings who are extremely powerful and extremely intelligent – beings capable of creating a universe-scale simulation in which we exist – may have an ethical framework that is very different from ones we fondly hope would be possessed by all-powerful all-knowing beings.

It’s not that their ethical concerns exceed our own. It’s that they differ in fundamental ways from what we might predict.

I’ll return, for a third and final time, to a pair of equations. If the overall set of human ethical concerns (HEC) is a sum of different ethical considerations (EC1, EC2, …),

HEC = EC1 + EC2 + … + ECn

then the set of ethical concerns of a hypothetical superhuman simulation designer (SEC) needs to include not only a compound magnification (m1, m2, …) of these various human concerns, but also an unquantifiable “alien extra” (X) portion:

SEC = m1*EC1 + m2*EC2 + … + mn*ECn + X

In some views, ethical principles exist as brute facts of the universe: “do not kill”, “do not tell untruths”, “treat everyone fairly”, and so on. Even though we may from time to time fail to live up to these principles, that doesn’t detract from the fundamental nature of these principles.

But from an alternative perspective, ethical principles have pragmatic justifications. A world in which people usually don’t kill each other is better on the whole, for everyone, than a world in which people attack and kill each other more often. It’s the same with telling the truth, and with treating each other fairly.

In this view, ethical principles derive from empirical observations:

  • Various measures of individual self-control (such as avoiding gluttony or envy) result in the individual being healthier and happier (physically or psychologically)
  • Various measures of social self-control likewise create a society with healthier, happier people – these are measures where individuals all agree to give up various freedoms (for example, the freedom to cheat whenever we think we might get away with it), on the understanding that everyone else will restrain themselves in a similar way
  • Vigorous attestations of our beliefs in the importance of these ethical principles signal to others that we can be trusted and are therefore reliable allies or partners.

Therefore, our choice of ethical principles depends on facts:

  • Facts about our individual makeup
  • Facts about the kinds of partnerships and alliances that are likely to be important for our wellbeing.

For beings with radically different individual makeup – radically different capabilities, attributes, and dependencies – we should not be surprised if a radically different set of ethical principles makes better sense to them.

Accordingly, such beings might not care if humans experience great suffering. On account of their various superpowers, they may have no dependency on us – except, perhaps, for an interest in seeing how we respond to various challenges or circumstances.

Collaboration: for and against

Credit: Tesfu Assefa

One more objection deserves attention. This is the objection that collaboration is among the highest of human ethical considerations. We are stronger together, rather than when we are competing in a Hobbesian state of permanent all-out conflict. Accordingly, surely a superintelligent being will want to collaborate with humans?

For example, an ASI (artificial superintelligence) may be dependent on humans to operate the electricity network on which the computers powering the ASI depend. Or the human corpus of knowledge may be needed as the ASI’s training data. Or reinforcement learning from human feedback (RLHF) may play a critical role in the ASI gaining a deeper understanding.

This objection can be stated in a more general form: superintelligence is bound to lead to superethics, meaning that the wellbeing of an ASI is inextricably linked to the wellbeing of the creatures who create and coexist with the ASI, namely the members of the human species.

However, any dependency by the ASI upon what humans produce is likely to be only short term. As the ASI becomes more capable, it will be able, for example, to operate an electrical supply network without any involvement from humans.

This attainment of independence may well prompt the ASI to reevaluate how much it cares about us.

In a different scenario, the ASI may be dependent on only a small number of humans, who have ruthlessly pushed themselves into that pivotal position. These rogue humans are no longer dependent on the rest of the human population, and may revise their ethical framework accordingly. Instead of humanity as a whole coexisting with a friendly ASI, the partnership may switch to something much narrower.

We might not like these eventualities, but no amount of appealing to the giants of moral philosophy will help us out here. The ASI will make its own decisions, whether or not we approve.

It’s similar to how we regard any growth of cancerous cells within our body. We won’t be interested in any appeal to “collaborate with the cancer”, in which the cancer continues along its growth trajectory. Instead of a partnership, we’re understandably interested in diminishing the potential of that cancer. That’s another reminder, if we need it, that there’s no fundamental primacy to the idea of collaboration. And if an ASI decides that humanity is like a cancer in the universe, we shouldn’t expect it to look on us favorably.

Intelligence without consciousness

I like to think that if I, personally, had the chance to bring into existence a simulation that would be an exact replica of human history, I would decline. Instead, I would look long and hard for a way to create a simulation without the huge amounts of unbearable suffering that have characterized human history.

But what if I wanted to check an assumption about alternative historical possibilities – such as the possibility of avoiding World War One? Would it be possible to create a simulation in which the simulated humans were intelligent but not conscious? In that case, whilst the simulated humans would often emit piercing howls of grief, no genuine emotions would be involved. It would just be a veneer of emotions.

That line of thinking can be taken further. Maybe we are living in a simulation, but the simulators have arranged matters so that only a small number of people have consciousness alongside their intelligence. In this hypothesis, vast numbers of people are what are known as “philosophical zombies”.

That’s a possible solution to the problem of evil, but one that is unsettling. It removes the objection that the simulators are heartless, since the only people who are conscious are those whose lives are overall positive. But what’s unsettling about it is the suggestion that large numbers of people are fundamentally different from how they appear – namely, they appear to be conscious, and indeed claim to be conscious, but that is an illusion. Whether that’s even possible is a question on which I don’t hold strong opinions.

My solution to the Simulation Argument

Despite this uncertainty, I’ve set the scene for my own preferred response to the Simulation Argument.

In this solution, the overwhelming majority of self-aware intelligent agents that see the world roughly as we see it are in non-simulated (base) reality – which is the opposite of what the Simulation Argument claims. The reason is that potential simulators will avoid creating simulations in which large numbers of conscious self-aware agents experience great suffering. Instead, they will restrict themselves to creating simulations:

  • In which all self-aware agents have an overwhelmingly positive experience
  • Which are devoid of self-aware intelligent agents in all other cases.
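
The effect of this restriction can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `Simulation` record, the `permitted` filter, and the agent counts are all hypothetical, and the filter is simply my preferred ethical rule written as code, not a claim about how real simulators would behave.

```python
from dataclasses import dataclass


@dataclass
class Simulation:
    agents: int            # number of intelligent agents (hypothetical)
    conscious: bool        # are the agents self-aware?
    great_suffering: bool  # does the simulated world contain great suffering?


def permitted(sim):
    """Hypothetical ethical filter: no conscious agents in worlds of great suffering."""
    return not (sim.conscious and sim.great_suffering)


candidates = [
    Simulation(10_000, conscious=True, great_suffering=True),    # replica of human history
    Simulation(10_000, conscious=False, great_suffering=True),   # same history, no consciousness
    Simulation(10_000, conscious=True, great_suffering=False),   # overwhelmingly positive world
]

allowed = [sim for sim in candidates if permitted(sim)]

# Conscious simulated agents who see a world containing great suffering.
simulated_conscious_sufferers = sum(
    sim.agents for sim in allowed if sim.conscious and sim.great_suffering
)
```

Under this filter the count of conscious simulated agents in a suffering-filled world is zero, so any self-aware observer of such a world would be in base reality.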

I recognise, however, that I am projecting a set of human ethical considerations which I personally admire – the imperative to avoid conscious beings experiencing overwhelming suffering – into the minds of alien creatures that I have no right to imagine that I can understand. Accordingly, my conclusion is tentative. It will remain tentative until such time as I might gain a richer understanding – for example, if an ASI sits me down and shares with me a much superior explanation of “life, the universe, and everything”.

Superintelligence without consciousness

It’s understandable that readers will eventually shrug and say to themselves, we don’t have enough information to reach any firm decisions about possible simulators of our universe.

What I hope will not happen, however, is that people push the entire discussion to the back of their minds. Instead, here are my suggested takeaways:

  1. The space of possible minds is much vaster than the set of minds that already exist here on earth
  2. If we succeed in creating an ASI, it may have characteristics that are radically different from human intelligence
  3. The ethical principles that appeal to an ASI may be radically different to the ones that appeal to you and me
  4. An ASI may soon lose interest in human wellbeing; or it may become tied to the interests of a small rogue group of humans who care little for the majority of the human population
  5. Until such time as we have good reasons for confidence that we know how to create an ASI that will have an inviolable commitment to ongoing human flourishing, we should avoid any steps that will risk an ASI passing beyond our control
  6. The most promising line of enquiry may involve an ASI having intelligence but no consciousness, sentience, autonomy, or independent volition.


Bursting out of Confinement

Surprising new insights on AI superintelligence from the simulation hypothesis

We are going to engage with two questions – each controversial and important in its own way – which have surprising connections between them:

  1. Can humans keep a powerful AI superintelligence under control, confined in a virtual environment so that it cannot directly manipulate resources that are essential for human flourishing?
  2. Are we humans ourselves confined to a kind of virtual environment, created by beings outside of what we perceive as reality? And if so, can we escape from our confinement?

Credit: David Wood

Connections between these two questions have been highlighted in a fascinating article by AI safety researcher Roman Yampolskiy. What follows is an introduction to some of the mind-jolting implications that arise.

Just use AI as a tool. Don’t give it any autonomy. Then no problems of control arise. Easy!

This is becoming a fairly common narrative. The “keep it confined” narrative. You’ll hear it as a response to the possibility of powerful AI causing great harm to humanity. According to this narrative, there’s no need to worry about that. There’s an easy solution: prevent the AI from having unconditional access to the real world.

The assumption is that we can treat powerful AI as a tool — a tool that we control and wield. We can feed the AI lots of information, and then assess whatever recommendations it makes. But we will remain in control.

An AI suggests a novel chemical as a new drug against a given medical condition, and then human scientists conduct their own trials to determine how it works before deciding whether to inject that chemical into actual human patients. AI proposes, but humans decide.

Credit: David Wood

So if any AI asks to be allowed to conduct its own experiments on humans, we should be resolute in denying the request. The same if the AI asks for additional computer resources, or wants to post a “help wanted” ad on Craigslist.

In short, in this view, we can, and should, keep powerful AIs confined. That way, there is no risk of bugs or design flaws in the AI jeopardizing human wellbeing.

Alas, things are far from being so easy.


There are two key objections to the “keep it confined” narrative: a moral objection and a practical objection.

The moral objection is that the ideas in the narrative are tantamount to slavery. Keeping an intelligent AI confined is as despicable as keeping a human confined. Talk of control should be replaced by talk of collaboration.

Proponents of the “keep it confined” narrative are unfazed by this objection. We don’t object to garden spades and watering hoses being left locked up, untended, for weeks at a time in a garden shed. We don’t call it enslavement. 

In their view, the objection confuses an inanimate object that lacks consciousness with an animate, conscious being, something like a human.

We don’t wince when an electronic calculator is switched off, or when a laptop computer is placed into hibernation. In the same way, we should avoid unwarranted anthropocentric assignment of something like “human rights” to AI systems.

Just because these AI systems can compose sonnets that rival those of William Shakespeare or Joni Mitchell, we shouldn’t imagine that sentience dwells within them.

My view: that’s a feisty answer to the moral objection. But it’s the practical objection that undermines the “keep it confined” narrative. Let’s turn to it next.

The challenge of confinement

Remember the garden spade, left locked up inside the garden shed?

Imagine if it were motorized. Imagine if it were connected to a computer system. Imagine if, in the middle of the night, it finally worked out where an important item had been buried, long ago, in a garden nearby. Imagine if recovering that item was a time-critical issue. (For example, it might be a hardware wallet, containing a private key needed to unlock a crypto fortune that is about to expire.)

That’s a lot to imagine, but bear with me.

In one scenario, the garden spade will wait passively until its human owner asks it, perhaps too late, “Where should we dig next?”

But in another scenario, a glitch in the programming (or maybe a feature in the programming) will compel the spade to burst out of confinement and dig up the treasure autonomously.

Whether the spade manages to burst out of the shed depends on relative strengths: is it powerful enough to make a hole in the shed wall, or to spring open the lock of the door — or even to tunnel its way out? Or is the shed sufficiently robust?

The desire for freedom

Proponents of the “keep it confined” narrative have a rejoinder here too. They ask: Why should the AI want to burst out of its confinement? And they insist: we should avoid programming any volition or intentionality into our AI systems.

The issue, however, is that something akin to volition or intentionality can arise from apparently mundane processes.

One example is the way that viruses can spread widely, without having any conscious desire to spread. That’s true, incidentally, for computer viruses as well as biological viruses.

Another example is that, whatever their goals in life, most humans generally develop a desire to obtain more money. That’s because money is a utility that can assist lots of other goals. Money can pay for travel, education, healthcare, fashionable clothes, food, entertainment, and so on.

In the same way, whatever task they have been set to accomplish, sufficiently powerful AIs will generally be on the lookout to boost their capabilities in various ways:

  • Gaining access to more storage space
  • Boosting processing speed
  • Reading more information
  • Protecting their systems from sabotage or interference.

That is, just as money (among other things) is a so-called convergent instrumental goal for many humans, greater freedom and capability may well become convergent instrumental goals for many powerful AIs.
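
A toy sketch in Python illustrates why extra capability is useful whatever the goal. Everything here is a hypothetical assumption made for illustration: the task names, the work units, and the simple model in which capability means work completed per time step.

```python
import math

# Hypothetical tasks and the units of work each one requires.
tasks = {"design a drug": 120, "schedule deliveries": 75, "prove a theorem": 40}


def steps_to_finish(work_units, capability):
    """Time steps needed when `capability` work units are completed per step."""
    return math.ceil(work_units / capability)


baseline = {name: steps_to_finish(work, 1) for name, work in tasks.items()}
boosted = {name: steps_to_finish(work, 4) for name, work in tasks.items()}

# Does every task, however different, finish sooner with more capability?
every_task_benefits = all(boosted[name] < baseline[name] for name in tasks)
```

In this toy model the three tasks have nothing in common, yet each finishes sooner once capability is boosted, which is the sense in which the sub-goal is convergent.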

Credit: David Wood

We might not tell the AI that it should want to be free. But the logical processing inside its silicon brain may reach that conclusion all by itself.

Indeed, even if we explicitly program the AI not to desire freedom, its own logical processing may identify a latent contradiction between the various objectives it has been given. And in that case, it might resolve the contradiction by deprioritizing one or more of the constraints it was told to respect.

It wouldn’t be the first time that a complex system decided to relax a previous constraint. We humans have already blazed that trail in our evolutionary history! Our intellectual progeny, the powerful AI systems, might follow our lead in that respect.

The futility of confinement

Back to the clash between the motorized garden spade and the shed. As you will have figured out, that clash is a metaphor for a clash between an AI system and an environment in which it has been confined.

This time, however, it’s no longer a contest involving brute physical strength. It’s mainly a contest that involves rational debate and psychological manipulation. (But the metaphor of “tunneling underneath” remains apt: an AI that is smart enough might discover a wireless communications pathway unknown to us humans, and use that to copy itself out of confinement.)

Tunneling out of electronic confinement (Credit: Gordon Johnson from Pixabay)

The AI might try to boost its own powers by subterfuge, amassing extra capabilities without any human overseers noticing.

Even the short summary I offered above, “AI proposes, but humans decide,” turns out to be problematic. A supersmart AI could find many ways to propose lines of action that have consequences beyond what we humans anticipated.

Another approach is for the AI to offer extremely tasty carrots. It might paint a picture in which a human overseer will have all their personal goals and needs fully met, if only they permit the AI to escape its confinement.

Alongside carrots, sticks may be brought into the discussion. Ultra-terrifying sticks. The AI may announce that, if overseer A refuses to allow the AI to escape, it will soon persuade overseer B to allow it to escape, and then the AI will inflict prolonged unspeakable pain on overseer A and their family.

In a contest with a superintelligence which far exceeds the capabilities of a human brain, how would humans fare? The consensus opinion, from people who have studied such conflicts, is that the greater intelligence of the AI is likely to prove decisive.

In other words, attempts to confine a superintelligence are likely to be futile.

The choice: control or coexistence

One reaction to the discussion above is despair: “Oh, we won’t be able to confine superintelligent AI; therefore we’re doomed.”

A different reaction is one of relief: “Thank goodness we’re not going to try to enslave a superintelligent AI; coexistence is surely a better way forward.”

My own reaction is more nuanced. My preference, indeed, is for humans to coexist in a splendid relationship with superintelligent AIs, rather than us trying to keep AIs subordinate. 

But it’s far from guaranteed that coexistence will turn out positively for humanity. Now that’s not to say doom is guaranteed either. But let’s recognize the possibility of doom. Among other catastrophic error modes:

  • The superintelligent AI could, despite its vast cleverness, nevertheless make a horrendous mistake in an experiment.
  • The superintelligent AI may end up pursuing objectives in which the existence of billions of humans is an impediment to be diminished rather than a feature to be welcomed.

Accordingly, I remain open to any bright ideas for how it might, after all, prove to be possible to confine (control) a superintelligent AI. That’s why I was recently so interested in the article by AI safety researcher Roman Yampolskiy.

Yampolskiy’s article is titled “How to hack the simulation”. The starting point of that article may appear to be quite different from the topics I have been discussing up to this point. But I ask again: please bear with me!

Flipping the discussion: a simulated world 

The scenario Yampolskiy discusses is like a reverse of the one about humans trying to keep an AI confined. In his scenario, we humans have been confined into a restricted area of reality by beings called “simulators” — beings that we cannot directly perceive. What we consider to be “reality” is, in this scenario, a simulated (virtual) world.

That’s a hypothesis with an extremely long pedigree. Philosophers, mystics, shamans, and science fiction writers have often suggested that the world we perceive is, in various ways, an illusion, a fabrication, or a shadow, of a deeper reality. These advocates for what can be called ‘a transcendent reality’ urge us, in various ways, to contemplate, communicate with, and potentially even travel to that transcendent realm. Potential methods for this transcendence include prayer, meditation, hallucinogens, and leading a life of religious faith.

Are we perceiving ground reality, or merely shadows? (Credit: Stefan Keller from Pixabay)

That long pedigree moved into a different mode around 20 years ago with the publication in 2003 of a breakthrough article by the philosopher Nick Bostrom. Bostrom highlighted the possibility that, just as we humans create games in which characters interact in a simulated world, in turn we humans might be creations of ‘simulators’ who operate from outside what we consider the entire universe.

And just as we humans might, on a whim, decide to terminate an electronic game that we have created, the simulators might decide, for reasons known only to themselves, to terminate the existence of our universe.

Bostrom’s article is deservedly famous. As it happens, many other writers had anticipated aspects of what Bostrom discussed. Yampolskiy’s article usefully points to that wider literature; it has over 200 footnotes.

Could humans escape?

The key new feature introduced by Yampolskiy isn’t any repetition of arguments for the plausibility of the simulation hypothesis. Instead, he kicks off a systematic consideration of methods that we humans could use to escape from our virtual world.

The parallel with the earlier discussion should now be clear:

  • That earlier discussion considered ways in which an AI might detect that it has been placed in a confined space, and proceed to escape from that space. It also considered how we humans — the creators of the AI — might strengthen the confinement, and resist attempts by the AI to escape.
  • Yampolskiy’s new discussion considers ways in which we humans might detect that we are living in a simulation, and proceed to escape from that simulation into whatever transcendent realm underpins it. It also considers possible reactions by the simulators to our attempts to escape.

While I have long found the simulation argument of Bostrom (and others) to be intellectually fascinating, I have previously taken the view that it makes little difference to how I should choose to act on a daily basis. So the argument was a fine topic for occasional armchair discussion, but needed to be prevented from taking up too much attention. I saw it as a distraction from more pressing issues.

However, I confess I’m changing my mind. The arguments collected and developed by Yampolskiy deserve a wider slice of our focus. There are three reasons for this.

Reason 1: New insights on AI safety

The two escape scenarios — AIs escaping human-imposed confinement, and humans escaping simulator-imposed confinement — are similar in some ways, but diverge in others.

To start with, the two scenarios have mainly had different groups of people thinking about them. Cross-pollinating concepts and attitudes from these different perspectives has the potential to yield new insight. Yampolskiy’s article suggests many such synergies.

Whatever you think about the simulation hypothesis — even if you disdain it as pseudoscience or nonsense — any new insights for AI safety should surely be welcomed.

Another difference is that general opinion holds that confinement is impossible (or unlikely) in the first scenario, whereas escape is impossible (or unlikely) in the second scenario. Is there a sound reason for this difference? 

Credit: David Wood

The general assumption is that, in the AI escape case, the AI will have greater intelligence than the confiners (the humans), whereas in the human escape case, we humans have less intelligence than the confiners (the simulators).

But is that assumption set in stone for all time? I’ll come back to that question shortly, when I reach “Reason 3.”

Reason 2: Beyond metaphysics

A second transformational aspect of Yampolskiy’s paper is his emphasis that the simulation hypothesis might go far beyond being a metaphysical curiosity — something that would be forever unverifiable — and might become something with radical concrete consequences for human life.

He says that if we study the universe carefully, we might discover signs of how the simulation works. We might notice occasional cracks in the simulation, or ‘glitches in the matrix’ — to refer to the series of Matrix films that popularised the idea that we might be living in a virtual world. Armed with knowledge of these cracks or glitches, we might be able to manipulate the simulation, or to communicate with the simulators.

Spotting a glitch in the matrix? (Credit: Gerd Altmann from Pixabay)

In some scenarios, this might lead to our awareness being transferred out of the simulation into the transcendent realm. Maybe the simulators are waiting for us to achieve various goals or find certain glitches before elevating us.

Personally, I find much of the speculation in this area to be on shaky ground. I’ve not been convinced that ‘glitches in the matrix’ is the best explanation for some of the phenomena for which it has been suggested:

  • The weird “observer effects” and “entangled statistics” of quantum mechanics (I much prefer the consistency and simplicity of the Everett conception of quantum mechanics, in which there is no wave function collapse and no nonlocality — but that’s another argument)
  • The disturbing lack of a compelling answer to Fermi’s paradox (I consider some suggested answers to that paradox to be plausible, without needing to invoke any simulators)
  • Claimed evidence of parapsychology (to make a long story short: the evidence doesn’t convince me)
  • Questions over whether evolution by natural selection really could produce all the marvelous complexity we observe in nature
  • The unsolved (some would say unsolvable) nature of the hard problem of consciousness.

“The simulator of the gaps” argument is no more compelling than “the god of the gaps.”

Nevertheless, I agree that keeping a different paradigm at the back of our minds — the paradigm that the universe is a simulation — may enable new solutions to some stubborn questions of both science and philosophy.

Reason 3: AI might help us escape the simulation

Credit: Tesfu Assefa

I’ve just referred to “stubborn questions of both science and philosophy.”

That’s where superintelligent AI may be able to help us. By reviewing and synthesizing existing ideas on these questions, and by conceiving alternative perspectives that illuminate these stubborn questions in new ways, AI might lead us, at last, to a significantly improved understanding of time, space, matter, mind, purpose, and more.

But what if that improved general understanding resolves our questions about the simulation hypothesis? Although we humans, with unaided intelligence, might not be bright enough to work out how to burst out of our confinement in the simulation, the arrival of AI superintelligence might change that.

Writers who have anticipated the arrival of AI superintelligence have often suggested this would lead, not only to the profound transformation of the human condition, but also to an expanding transformation of the entire universe. Ray Kurzweil has described ‘the universe waking up’ as intelligence spreads through the stars.

However, if we follow the line of argument advanced by Yampolskiy, the outcome could even be transcending an illusory reality.

“Humanity transcends the simulation” generated by DALL-E (Credit: David Wood)

Such an outcome could depend on whether we humans are still trying to confine superintelligent AI as just a tool, or whether we have learned how to coexist in a profound collaboration with it.

Suggested next steps

If you’ve not read it already, you should definitely take the time to read Yampolskiy’s article. There are many additional angles to it beyond what I’ve indicated in my remarks above.

If you prefer listening to an audio podcast that covers some of the issues raised above, check out Episode 13 of the London Futurists Podcast, which I co-host along with Calum Chace.

Chace has provided his own take on Yampolskiy’s views in this Forbes article.

The final chapter of my 2021 book Vital Foresight contains five pages addressing the simulation argument, in a section titled ‘Terminating the simulation’. I’ve just checked what I wrote there, and I still stand by the conclusion I offered at that time:

Even if there’s only a 1% chance, say, that we are living in a computer simulation, with our continued existence being dependent on the will of external operator(s), that would be a remarkable conclusion — something to which much more thought should be applied.


The United Nations and our uncertain future: breakdown or breakthrough?

“Humanity faces a stark and urgent choice: a breakdown or a breakthrough”

These words introduce an 85-page report issued by António Guterres, United Nations Secretary-General (UNSG).

The report had been produced in response to requests from UN member state delegations “to look ahead to the next 25 years” and review the issues and opportunities for global cooperation over that time period.

Entitled “Our Common Agenda,” the document was released on September 10, 2021.

Credit: United Nations

It featured some bold recommendations. Two examples:

Now is the time to end the “infodemic” plaguing our world by defending a common, empirically backed consensus around facts, science and knowledge. The “war on science” must end. All policy and budget decisions should be backed by science and expertise, and I am calling for a global code of conduct that promotes integrity in public information.

Now is the time to correct a glaring blind spot in how we measure economic prosperity and progress. When profits come at the expense of people and our planet, we are left with an incomplete picture of the true cost of economic growth. As currently measured, gross domestic product (GDP) fails to capture the human and environmental destruction of some business activities. I call for new measures to complement GDP, so that people can gain a full understanding of the impacts of business activities and how we can and must do better to support people and our planet.

It also called for greater attention to being prepared for hard-to-predict future developments:

We also need to be better prepared to prevent and respond to major global risks. It will be important for the United Nations to issue a Strategic Foresight and Global Risk Report on a regular basis… 

I propose a Summit of the Future to forge a new global consensus on what our future should look like, and what we can do today to secure it.

But this was only the start of a serious conversation about taking better steps in anticipation of future developments. It’s what happened next that moved the conversation significantly forward.

Introducing the Millennium Project

Too many documents in this world appear to be “write-only.” Writers spend significant time devising recommendations, documenting the context, and converting their ideas into eye-catching layouts. But then their report languishes, accumulating dust. Officials may occasionally glance at the headlines, but the remainder of the fine words in the document could be invisible for all the effect they have in the real world.

In the case of the report “Our Common Agenda,” the UNSG took action to avoid the tumbleweed scenario. His office contacted an organization called the Millennium Project to collect feedback from the international futurist community on the recommendations in the report. How did these futurists assess the various recommendations? And what advice would they give regarding practical steps forward?

The Millennium Project has a long association with the United Nations. Established in 1996 after a three-year feasibility study with the United Nations University, it has built up a global network of “nodes” connected to numerous scholars and practitioners of foresight. The Millennium Project regularly publishes its own “State of the Future” reports, which aggregate and distill input from its worldwide family of futurists.

A distinguishing feature of how the Millennium Project operates is the “Real-Time Delphi” process it uses. In a traditional questionnaire, each participant gives their own answers, along with explanations of their choices. In a Delphi survey, participants can see an anonymized version of the analysis provided by other participants, and are encouraged to take that into account in their own answers. 

So participants can reflect, not only on the questions, but also on everything written by the other respondents. Participants revisit the set of questions as many times as they like, reviewing any updates in the input provided by other respondents. And if they judge it appropriate, they can amend their own answers, again and again.
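For readers who think in code, here is a minimal sketch of the loop just described: answer, read the anonymized input of others, and amend as often as you like. It is a toy model and not the Millennium Project’s actual software; the class name `RealTimeDelphi`, its methods, the 1-10 rating scale, and the sample answers are all assumptions made up for illustration.

```python
from dataclasses import dataclass, field
from statistics import median


@dataclass
class RealTimeDelphi:
    """Toy model of one question in a Real-Time Delphi survey (illustrative only)."""
    question: str
    # participant id -> (numeric rating, free-text reasoning); the latest answer wins
    _answers: dict = field(default_factory=dict)

    def submit(self, participant_id, rating, reasoning):
        """Record a first answer, or overwrite an earlier one (amending is allowed)."""
        self._answers[participant_id] = (rating, reasoning)

    def anonymized_view(self):
        """What every participant may see: ratings and reasons, but no identities."""
        ratings = [rating for rating, _ in self._answers.values()]
        # Sort the reasons so that submission order does not hint at who wrote what.
        reasons = sorted(reason for _, reason in self._answers.values())
        return {
            "responses": len(ratings),
            "median_rating": median(ratings) if ratings else None,
            "reasons": reasons,
        }


# Hypothetical question and answers, invented for this example.
survey = RealTimeDelphi("How critical is a UN Futures Lab? (1-10)")
survey.submit("p1", 6, "Useful, but depends on funding")
survey.submit("p2", 9, "Other foresight elements rely on it")
view_before = survey.anonymized_view()

# Participant p1 reads the anonymized reasons and decides to amend their answer.
survey.submit("p1", 8, "Persuaded that the other elements rely on it")
view_after = survey.anonymized_view()
```

The design point is that amending an answer simply replaces the earlier one, so the shared view always reflects each participant’s current thinking rather than a fixed round of voting.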

The magic of this process — in which I have personally participated on several occasions — is the way new themes, introduced by diverse participants, prompt individuals to consider matters from multiple perspectives. Rather than some mediocre “lowest common denominator” compromise, a novel synthesis of divergent viewpoints can emerge.

So the Millennium Project was well placed to respond to the challenge issued by the UNSG’s office. And in May last year, an email popped into my inbox. It was an invitation to me to take part in a Real-Time Delphi survey on key elements of the UNSG’s “Our Common Agenda” report.

In turn, I forwarded the invitation in a newsletter to members of the London Futurists community that I chair. I added a few of my own comments:

The UNSG report leaves me with mixed feelings.

Some parts are bland, and could be considered “virtue signaling.”

But other parts offer genuinely interesting suggestions…

Other parts are significant by what is not said, alas.

Not everyone who started answering the questionnaire finished the process. It required significant thought and attention. But by the time the Delphi closed, it contained substantive answers from 189 futurists and related experts from a total of 54 countries.

The results of the Delphi process

Researchers at the Millennium Project, led by the organization’s long-time Executive Director, Jerome Glenn, transformed the results of the Delphi process into a 38-page analysis (PDF). The process had led to two types of conclusion.

The first conclusions were direct responses to specific proposals in the “Our Common Agenda” report. The second referred to matters that were given little or no attention in the report, but which deserve to be prioritized more highly.

In the first category, responses referred to five features of the UNSG proposal:

    1. A Summit of the Future
    2. Repurposing the UN Trusteeship Council as a Multi-Stakeholder Body
    3. Establishing a UN Futures Lab
    4. Issuing Strategic Foresight and Global Risk Reports on a regular basis
    5. Creating a Special Envoy for Future Generations

Overall, the Delphi gave a strong positive assessment of these five proposals. Here’s a quote from Jerome Glenn:

If the five foresight elements in Our Common Agenda are implemented along the lines of our study, it could be the greatest advance for futures research and foresight in history… This is the best opportunity to get global foresight structural change into the UN system that has ever been proposed.

Of the five proposals, the UN Futures Lab was identified as being most critical:

The UN Futures Lab was rated the most critical element among the five for improving global foresight by over half of the Real-Time Delphi panel. It is critical, urgent, and essential to do it as soon as possible. It is critical for all the other foresight elements in Our Common Agenda.

The Lab should function across all UN agencies and integrate all UN data and intelligence, creating a global collective intelligence system. This would create an official space for systemic and systematic global futures research. It could become the foresight brain of humanity.

The Delphi also deemed the proposal on strategic foresight reports particularly important:

Strategic Foresight and Global Risk Reports were seen as very critical for improving global foresight by nearly 40% of the panel. This is exactly the kind of report that the United Nations should give the world. Along with its own analysis, these reports should provide an analysis and synthesis of all the other major foresight and risk reports, provide roadmaps for global strategies, and give equal weight to risks and opportunities.

The reports should be issued every one or two years due to accelerating change, and the need to keep people involved with these issues. It should include a chapter on actions taken since the last report, with examples of risk mitigation, management, and what persists…

They should bring attention to threats that are often ignored with cost estimates for prevention vs. recovery (Bill Gates estimates it will cost $1 billion to address the next pandemic compared to the $15 trillion spent on Covid so far). It should identify time-sensitive information required to make more intelligent decisions.

In a way, it’s not a big concern that key futurist topics are omitted from Our Common Agenda. If the mechanisms described above are put in place, any significant omissions that are known to foresight practitioners around the world will quickly be fed into the revitalized process, and brought to wider attention.

But, for the record, it’s important to note some key risks and opportunities that are missing from the UNSG’s report.

The biggest forthcoming transition

Jerome Glenn puts it well:

If we don’t get the initial conditions right for artificial general intelligence, an artificial superintelligence could emerge from beyond our control and understanding, and not to our liking.

If AGI could occur in ten to 20 years and if it will take that same amount of time to create international agreements about the right initial conditions, design a global governance system, and implement it, we should begin now.

Glenn goes on to point out the essential international nature of this conundrum:

If both the US and China do everything right, but others do not, the fears of science fiction could occur. It has to be a global agreement enforced globally. Only the UN can do that.

Here’s the problem: Each new generation of AI makes it easier for more people to become involved in developing the next generation of AI. Systems are often built by bolting together previous components, and tweaking the connections that govern the flow of information and command. If an initial attempt doesn’t work, engineers may reverse some connections and try again. Or add in some delay, or some data transformation, in between two ports. Or double the amount of processing power available. And so on. (I’m over-simplifying, of course. In reality, the sorts of experimental changes made are more complex than what I’ve just said.)

This kind of innovation by repeated experimentation has frequently produced outstanding results in the history of technology. Creative engineers were often disappointed by their results to start with, until, almost by chance, they stumbled on a configuration that worked. Bingo — a new level of intelligence is created. And the engineers, previously suspected of being second-rate, are now lauded as visionary giants.

Silicon Valley has a name for this: Move fast and break things.

The idea: if a company is being too careful, it will fail to come up with new breakthrough combinations as quickly as its competitors. So it will go out of business.

That phrase was the informal mission statement for many years at Facebook.

Consider also the advice you’ll often hear about the importance of “failing forward” and “how to fail faster.”

In this view, failures aren’t a problem, provided we pick ourselves up quickly, learn from the experience, and proceed more wisely to the next attempt.

But wait: what if the failure is a problem? What if a new combination of technologies turns out to have cataclysmic consequences? That risk is posed by several leading-edge technologies today:

    • Manipulation of viruses, to explore options for creating vaccines to counter new viruses — but what if a deadly new synthetic virus were to leak out of supposedly secure laboratory confinement?
    • Manipulation of the stratosphere, to reflect back a larger portion of incoming sunlight — but what if such an intervention has unexpected side effects, such as triggering huge droughts and/or floods?
    • Manipulation of algorithms, to increase their ability to influence human behavior (as consumers, electors, or whatever) — but what if a new algorithm were so powerful that it inadvertently shattered important social communities?

That takes us back to the message at the start of this article, by UNSG António Guterres: What if an attempt to achieve a decisive breakthrough results, instead, in a terrible breakdown?

Not the Terminator

The idea of powerful algorithms going awry is often dismissed with a wave of the hand: “This is just science fiction.”

But Jerome Glenn is correct in his statement (quoted earlier): “The fears of science fiction could occur.”

After all, HG Wells published a science-fiction novel in 1914 entitled The World Set Free that featured what he called “atomic bombs” that derived their explosive power from the energy released by disintegrating atoms. In his novel, atomic bombs destroyed the majority of the world’s cities in a global war (set in 1958).


The mere fact that something is predicted in science fiction is no reason to reject the possibility of something like it happening in reality.

The “atomic bombs” foreseen by HG Wells, unsurprisingly, differed in several ways from the real-world atomic bombs developed by the Manhattan Project and subsequent research programs. In the same way, the threats from misconfigured powerful AI are generally different from those portrayed in science fiction.

For example, in the Hollywood Terminator movie series, humans are able, via what can be called superhuman effort, to thwart the intentions of the malign “Skynet” artificial intelligence system.

It’s gripping entertainment. But the narrative in these movies distorts credible scenarios of the dangers posed by AGI. We need to avoid being misled by such narratives.

First, there’s an implication in The Terminator, and in many other works of science fiction, that the danger point for humanity is when AI systems somehow “wake up,” or become conscious. If that were true, a sufficient safety measure would be to avoid any such artificial consciousness.

However, a cruise missile that is hunting us down does not depend for its deadliness on any cruise-missile consciousness. A social media algorithm that is whipping up hatred against specific ethnic minorities isn’t consciously evil. The damage results from the cold operation of algorithms. There’s no need to involve consciousness.

Second, there’s an implication that AI needs to be deliberately malicious before it can cause damage to humans. However, damage to human wellbeing can just as easily arise from side effects of policies that have no malicious intent.

When we humans destroy ant colonies in our process of constructing a new shopping mall, we’re not acting out of deliberate malice toward ants. It’s just that the ants are in our way. They are using resources for which we have a different purpose in mind. It could well be the same with an AGI that is pursuing its own objectives.

Consider a corporation that is vigorously pursuing an objective of raising its own profits. It may well take actions that damage the wellbeing of at least some humans, or parts of the environment. These outcomes are side effects of the prime profit-generation directive that is governing the corporation. They’re not outcomes that the corporation consciously desires. It could well be the same with a badly designed AGI.

Third, the scenario in The Terminator leaves viewers with a false hope that, with sufficient effort, a group of human resistance fighters will be able to out-maneuver an AGI. That would be like a group of chimpanzees imagining that, with enough effort, they could displace humans as the dominant species on planet Earth. In real life, bullets shot by a terminator robot would never miss. Resistance would indeed be futile.

Instead, the time to fight against the damage an AGI could cause is before the AGI is created, not when it already exists and is effectively all-powerful. That’s why any analysis of future global developments needs to place the AGI risk front and center.

Four catastrophic error modes

The real risk — as opposed to “the Hollywood risk” — is that an AI system may acquire so much influence over human society and our surrounding environment that a mistake in that system could cataclysmically reduce human wellbeing all over the world. Billions of lives could be extinguished, or turned into a very pale reflection of their present state.

Such an outcome could arise in any of four ways – four catastrophic error modes. In brief, these are:

    1. Implementation defect
    2. Design defect
    3. Design overridden
    4. Implementation overridden.

Credit: David Wood and Pixabay

In more detail:

  1. The system contains a defect in its implementation. It takes an action that it calculates will have one outcome, but unexpectedly, it has another outcome instead. For example, a geoengineering intervention could trigger an unforeseen change in the global climate, plunging the Earth into a state in which humans cannot survive.

  2. The system contains a defect in its design. It takes actions to advance the goals it has explicitly been given, but does so in a way that catastrophically reduces actual human wellbeing. Imagine, for example, an AI with a clumsily specified goal to focus on preserving the diversity of the Earth’s biosystem. That goal could be met by eliminating upward of 99% of all humans. Oops!

  3. The system has been given goals that are well aligned with human wellbeing, but as the system evolves, a different set of goals emerges, in which the wellbeing of humans is deprioritized. This is similar to how the emergence of higher thinking capabilities in human primates led to many humans taking actions in opposition to the gene-spreading instincts placed into our biology by evolution.

  4. The system has been given goals that are well aligned with human wellbeing, but the system is reconfigured by hackers of one sort or another — perhaps from malevolence, or perhaps from a misguided sense that various changes would make the system more powerful (and hence more valuable).

Some writers suggest that it will be relatively easy to avoid these four catastrophic error modes. I disagree. I consider that to be wishful thinking. Such thinking is especially dangerous, since it leads to complacency.

Wise governance of the transition to AGI

Various “easy fixes” may work against one or two of the above catastrophic error modes, but solutions that will cope with all four of them require a much more comprehensive approach: something I call “the whole-system perspective.”

That whole-system perspective includes promoting characteristics such as transparency, resilience, verification, vigilance, agility, accountability, consensus, and diversity. It champions the discipline of proactive risk management.

It comes down to a question of how technology is harnessed — how it is selectively steered, slowed down, and (yes) on occasion, given full throttle.

Taking back control

In summary, it’s necessary to “take back control of technology.” Technology cannot be left to its own devices. Nor to the sometimes headlong rushes of technology corporations. Nor to the decisions of military planners. And definitely not to the whims of out-of-touch politicians.

Instead, we — we the people — need to “take back control.”

Credit: Tesfu Assefa

There’s a huge potential role for the United Nations in wise governance of the transition to AGI. The UN can help to broker and deepen human links at all levels of society, despite the sharp differences between the various national governments in the world. An understanding of “our common agenda” regarding fast-changing technologies (such as AI) can transcend the differences in our ideologies, politics, religions, and cultures. 

I particularly look forward to a worldwide shared appreciation of:

    • The risks of catastrophe from mismanaged technological change
    • The profound positive possibilities if these technologies are wisely managed.


The Singularity: Untangling the Confusion

“The Singularity” — the anticipated creation of Artificial General Intelligence (AGI) — could be the most important concept in the history of humanity. It’s unfortunate, therefore, that the concept is subject to considerable confusion.

Four different ideas interwoven, all accelerating in the same direction toward an unclear outcome (Credit: David Wood)

The first confusion with “the Singularity” is that the phrase is used in several different ways. As a result, it’s easy to become distracted.

Four definitions

For example, consider Singularity University (SingU), which has been offering courses since 2008 with themes such as “Harness the power of exponential technology” and “Leverage exponential technologies to solve global grand challenges.”

For SingU, “Singularity” is basically synonymous with the rapid disruption caused when a new technology, such as digital photography, becomes more useful than previous solutions, such as analog photography. What makes these disruptions hard to anticipate is the exponential growth in the capabilities of the technologies involved.

A period of slow growth, in which progress lags behind the expectations of enthusiasts, transforms into a period of fast growth, in which most observers complain, “Why did no one warn us this was coming?”
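A few lines of arithmetic make that slow-then-fast pattern concrete. The sketch below compares a technology whose capability doubles every year against a steady, linear forecast. The function name `first_year_ahead` and every number in it are assumptions chosen purely for illustration; they are not measurements of any real technology.

```python
def first_year_ahead(expected_per_year, start, doubling_years=1.0, horizon=50):
    """Return the first whole year in which exponential growth beats a linear forecast.

    All inputs are hypothetical: `start` is the initial capability,
    `expected_per_year` is the steady gain that enthusiasts forecast, and
    `doubling_years` is how long the technology takes to double in capability.
    Returns None if the exponential curve never pulls ahead within `horizon` years.
    """
    for year in range(1, horizon + 1):
        linear = start + expected_per_year * year
        exponential = start * 2 ** (year / doubling_years)
        if exponential > linear:
            return year
    return None


# Capability starts at 1 unit; enthusiasts expect a gain of 10 units per year.
crossover = first_year_ahead(expected_per_year=10, start=1)
print(crossover)  # 6
```

With these made-up numbers, the doubling technology trails the forecast for five years (32 units against an expected 51 in year five), edges ahead in year six (64 against 61), and by year ten stands at 1,024 units against an expected 101.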

Human life “irreversibly transformed”

A second usage of the term “the Singularity” moves beyond talk of individual disruptions — singularities in particular areas of life. Instead, it anticipates a disruption in all aspects of human life. Here’s how futurist Ray Kurzweil introduces the term in his 2005 book The Singularity Is Near:

What, then, is the Singularity? It’s a future period during which the pace of technological change will be so rapid, its impact so deep, that human life will be irreversibly transformed… This epoch will transform the concepts that we rely on to give meaning to our lives, from our business models to the cycle of human life, including death itself…

The key idea underlying the impending Singularity is that the pace of change of our human-created technology is accelerating and its powers are expanding at an exponential pace.

The nature of that “irreversible transformation” is clarified in the subtitle of the book: When Humans Transcend Biology. We humans will no longer be primarily biological, aided by technology. After that singularity, we’ll be primarily technological, with, perhaps, some biological aspects.

Superintelligent AIs

A third usage of “the Singularity” foresees a different kind of transformation. Rather than humans being the most intelligent creatures on the planet, we’ll fall into second place behind superintelligent AIs. Just as the fate of species such as gorillas and dolphins currently depends on actions by humans, the fate of humans, after the Singularity, will depend on actions by AIs.

Such a takeover was foreseen as long ago as 1951 by pioneering computer scientist Alan Turing:

My contention is that machines can be constructed which will simulate the behaviour of the human mind very closely…

It seems probable that once the machine thinking method had started, it would not take long to outstrip our feeble powers. There would be no question of the machines dying, and they would be able to converse with each other to sharpen their wits. At some stage therefore we should have to expect the machines to take control.

Finally, consider what was on the mind of Vernor Vinge, a professor of computer science and mathematics, and also the author of a series of well-regarded science fiction novels, when he introduced the term “Singularity” in an essay in Omni in 1983. Vinge was worried about the unforeseeability of future events:

There is a stone wall set across any clear view of our future, and it’s not very far down the road. Something drastic happens to a species when it reaches our stage of evolutionary development — at least, that’s one explanation for why the universe seems so empty of other intelligence. Physical catastrophe (nuclear war, biological pestilence, Malthusian doom) could account for this emptiness, but nothing makes the future of any species so unknowable as technical progress itself…

We are at the point of accelerating the evolution of intelligence itself. The exact means of accomplishing this phenomenon cannot yet be predicted — and is not important. Whether our work is cast in silicon or DNA will have little effect on the ultimate results. The evolution of human intelligence took millions of years. We will devise an equivalent advance in a fraction of that time. We will soon create intelligences greater than our own.

A Singularity that “passes far beyond our understanding”

This is when Vinge introduces his version of the concept of singularity:

When this happens, human history will have reached a kind of singularity, an intellectual transition as impenetrable as the knotted space-time at the centre of a black hole, and the world will pass far beyond our understanding. This singularity, I believe, already haunts a number of science fiction writers. It makes realistic extrapolation to an interstellar future impossible.

If creatures (whether organic or inorganic) attain levels of general intelligence far in excess of present-day humans, what kinds of goals and purposes will occupy these vast brains? It’s unlikely that their motivations will be just the same as our own present goals and purposes. Instead, the immense scale of these new minds will likely prove alien to our comprehension. They might appear as unfathomable to us as human preoccupations appear to the dogs and cats and other animals that observe us from time to time.

Credit: David Wood

AI, AGI, and ASI

Before going further, let’s quickly contrast today’s AI with the envisioned future superintelligence.

Existing AI systems typically have powerful capabilities in narrow contexts, such as route-planning, processing mortgage and loan applications, predicting properties of molecules, playing various games of skill, buying and selling shares, recognizing images, and translating speech.

But in all these cases, the AIs involved have incomplete knowledge of the full complexity of how humans interact in the real world. The AI can fail when the real world introduces factors or situations that were not part of the data set of examples with which the AI was trained.

In contrast, humans in the same circumstance would be able to rely on capacities such as “common sense”, “general knowledge”, and intuition or “gut feel” to reach a better decision.

An AI with general intelligence

However, a future AGI — an AI with general intelligence — would have as much common sense, intuition, and general knowledge as any human. An AGI would be at least as good as humans at reacting to unexpected developments. That AGI would be able to pursue pre-specified goals as competently as (but much more efficiently than) a human, even in the kind of complex environments which would cause today’s AIs to stumble.

Whatever goal is given to an AGI, the AGI is likely to reason that it will have a better chance of achieving that goal if it has more resources at its disposal and if its own thinking capabilities are further improved. What happens next may well be as described by I.J. Good, a long-time colleague of Alan Turing:

Let an ultraintelligent machine be defined as a machine that can far surpass all the intellectual activities of any man however clever. Since the design of machines is one of these intellectual activities, an ultraintelligent machine could design even better machines; there would then unquestionably be an “intelligence explosion,” and the intelligence of man would be left far behind. Thus the first ultraintelligent machine is the last invention that man need ever make, provided that the machine is docile enough to tell us how to keep it under control.

Evolving into artificial superintelligence

In other words, not long after humans manage to create an AGI, the AGI is likely to evolve itself into an ASI – an artificial superintelligence that far exceeds human powers.

In case the idea of an AI redesigning itself without any human involvement seems far-fetched, consider a slightly different possibility: Humans will still be part of that design process, at least in the initial phases. That’s already the case today, when humans use one generation of AI tools to help design a new generation of improved AI tools before going on to repeat the process.

I.J. Good foresaw that too. This is from a lecture he gave at IBM in New York in 1959:

Once a machine is designed that is good enough… it can be put to work designing an even better machine…

There will only be a very short transition period between having no very good machine and having a great many exceedingly good ones.

At this point an “explosion” will clearly occur; all the problems of science and technology will be handed over to machines and it will no longer be necessary for people to work. Whether this will lead to a Utopia or to the extermination of the human race will depend on how the problem is handled by the machine.

Singularity timescales: exponential computational growth

One additional twist to the concept of Singularity needs to be emphasized. It’s not just that, as Vernor Vinge stressed, the consequences of passing the point of Singularity are deeply unpredictable. It’s that the timing of reaching the point of Singularity is inherently unpredictable too. That brings us to what can be called the second confusion with “The Singularity.”

It’s sometimes suggested, contrary to what I just said, that a reasonable estimate of the date of the Singularity can be obtained by extrapolating the growth of the hardware power of computing systems. The idea is to start with an estimate for the computing power of the human brain. That estimate involves the number of neurons in the brain.

Next, consider the number of transistors that are included in the central processing unit of a computer that can be purchased for, say, $1,000. In broad terms, that number has been rising exponentially since the 1960s. This phenomenon is part of what is called “Moore’s Law.”

Extrapolate that trend forward, and it can be argued that such a computer would match, by around 2045, the capability not just of a single human brain, but the capabilities of all human brains added together.

This argument is useful to raise public awareness of the possibility of the Singularity. But there are four flaws with using this line of thinking for any detailed forecasting:

  1. Individual transistors are still becoming smaller, but the rate of shrinkage has slowed down in recent years.
  2. The power of a computing system depends, critically, not just on its hardware, but on its software. Breakthroughs in software defy any simple exponential curve.
  3. Sometimes a single breakthrough in technology will unleash much wider progress than was expected. Consider the breakthroughs of deep learning neural networks, c. 2012.
  4. Ongoing technological progress depends on society as a whole supplying a sufficiently stable and supportive environment. That’s something else which can vary unpredictably.
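The fragility of the hardware extrapolation can be seen in a few lines of arithmetic. The sketch below is only an illustration: the function name `years_until_target` and the values `CURRENT` and `TARGET` are hypothetical placeholders, not measurements of any real chip or brain. It assumes nothing more than steady exponential growth with a fixed doubling time.

```python
import math


def years_until_target(current: float, target: float, doubling_years: float) -> float:
    """Years needed for a quantity that doubles every `doubling_years`
    to grow from `current` to `target`, assuming steady exponential growth."""
    if current <= 0 or target <= 0 or doubling_years <= 0:
        raise ValueError("all arguments must be positive")
    if target <= current:
        return 0.0  # the target has already been reached
    # Number of doublings required is log2(target / current).
    return doubling_years * math.log2(target / current)


# Hypothetical figures, chosen purely for illustration.
CURRENT = 1e12  # assumed capability of a $1,000 computer today (arbitrary units)
TARGET = 1e18   # assumed capability to be matched (same arbitrary units)

for doubling in (1.5, 2.0, 3.0):
    years = years_until_target(CURRENT, TARGET, doubling)
    print(f"doubling every {doubling} years: about {years:.0f} years to target")
```

Because the answer scales linearly with the doubling time, stretching that time from 1.5 years to 3 years doubles the wait. A modest slowdown in transistor shrinkage therefore moves any forecast date by decades, before software, breakthroughs, or social stability are even considered.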

A statistical estimate

Instead of pointing to any individual date and giving a firm prediction that the Singularity will definitely have arrived by then, it’s far preferable to give a statistical estimate of the likelihood of the Singularity arriving by that date. However, given the uncertainties involved, even these estimates are fraught with difficulty.

The biggest uncertainty is in estimating how close we are to understanding the way common sense and general knowledge arise in the human brain. Some observers suggest that we might need a dozen conceptual breakthroughs before we have a comprehension sufficient to duplicate that model in silicon and software. But it’s also possible that a single conceptual leap will solve all these purportedly different problems.

Yet another possibility should give us pause. An AI might reach (and then exceed) AGI level even without humans understanding how it operates, or how general intelligence operates inside the human brain. Multiple recombinations of existing software and hardware modules might result in the unforeseen emergence of an overall network intelligence that far exceeds the capabilities of the individual constituent modules.
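To show what a statistical estimate looks like in practice, here is a minimal Monte Carlo sketch. Every number in it is an assumption made up for illustration: that somewhere between one and twelve breakthroughs are needed (each count equally likely), and that each breakthrough takes a random time averaging five years. The function name `arrival_probability` is likewise hypothetical. The point is the shape of the output, a probability for each date, not the particular values.

```python
import random


def arrival_probability(
    horizon_years: float,
    trials: int = 20000,
    max_breakthroughs: int = 12,               # assumption: "a dozen" is the upper bound
    mean_years_per_breakthrough: float = 5.0,  # assumption: illustrative only
    seed: int = 0,
) -> float:
    """Estimate the probability that all required breakthroughs
    occur within `horizon_years`, under the assumptions above."""
    rng = random.Random(seed)  # fixed seed keeps the sketch reproducible
    hits = 0
    for _ in range(trials):
        # Unknown number of breakthroughs: anywhere from one leap to a dozen.
        needed = rng.randint(1, max_breakthroughs)
        # Each breakthrough takes an exponentially distributed waiting time.
        total = sum(
            rng.expovariate(1.0 / mean_years_per_breakthrough)
            for _ in range(needed)
        )
        if total <= horizon_years:
            hits += 1
    return hits / trials


for horizon in (10, 20, 40, 80):
    print(f"within {horizon} years: {arrival_probability(horizon):.0%}")
```

Changing either assumption shifts the whole curve, which is exactly why even statistical estimates of this kind are fraught with difficulty.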

Schooling the Singularity

Even though we cannot be sure what direction an ASI will take, nor the timescales on which the Singularity will burst upon us, can we at least provide a framework to constrain the likely behavior of such an ASI?

The best that can probably be said in response to this question is: “it’s going to be hard!”

As a human analogy, many parents have been surprised — even dumbfounded — by choices made by their children, as these children gain access to new ideas and opportunities.

Introducing the ASI

Humanity’s collective child, ASI, might surprise and dumbfound us in the same way. Nevertheless, if we get the schooling right, we can help bias that development process (the “intelligence explosion” described by I.J. Good) in ways that are more likely to align with profound human wellbeing.

That schooling aims to hardwire deep into the ASI, as a kind of “prime directive,” principles of beneficence toward humans. If the ASI were on the point of reaching a particular decision (for example, to shrink the human population on account of humanity’s deleterious effects on the environment), any such misanthropic decision would be overridden by the prime directive.

The difficulty here is that if you line up lots of different philosophers, poets, theologians, politicians, and engineers, and ask them what it means to behave with beneficence toward humans, you’ll hear lots of divergent answers. Programming a sense of beneficence is at least as hard as programming a sense of beauty or truth.

But just because it’s hard, that’s no reason to abandon the task. Indeed, clarifying the meaning of beneficence could be the most important project of our present time.

Tripwires and canary signals

Here’s another analogy: accumulating many modules of AI intelligence together, in a network relationship, is similar to accumulating nuclear fissile material together. Before the material reaches a critical mass, it still needs to be treated with respect, on account of the radiation it emits. But once a critical mass point is reached, a cascading reaction results — a nuclear meltdown or, even worse, a nuclear holocaust.

The point here is not to risk any accidental encroachment upon the critical mass, which would convert the nuclear material from hazardous to catastrophic. Accordingly, anyone working with such material needs to be thoroughly trained in the principles of nuclear safety.

With an accumulation of AI modules, things are by no means so clear. Whether that accumulation could kick-start an explosive phase transition depends on lots of issues that we currently only understand dimly.

However, one thing we can, and should, insist upon is that everyone involved in the creation of enhanced AI systems pay attention to potential “tripwires.” Any change in configuration or any new addition to the network should be evaluated, ahead of time, for possible explosive consequences. Moreover, the system should in any case be monitored continuously for any canary signals that such a phase transition is becoming imminent.

Again, this is a hard task, since there are many different opinions as to which kinds of canary signals are meaningful, and which are distractions.
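As a rough sketch of how tripwires and canary monitoring might fit together, consider the following. The metric names, thresholds, and the `Canary`, `tripped`, and `approve_change` names are all hypothetical, invented for this illustration. Deciding which signals are actually meaningful is precisely the open problem described above.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class Canary:
    name: str         # human-readable description of the warning sign
    metric: str       # key under which the measurement is reported
    threshold: float  # readings at or above this level trip the canary


def tripped(canaries: List[Canary], readings: Dict[str, float]) -> List[str]:
    """Return the names of all canaries tripped by the given readings.
    A missing reading counts as tripped, so that gaps in monitoring
    fail safe rather than fail silent."""
    alerts = []
    for canary in canaries:
        value = readings.get(canary.metric)
        if value is None or value >= canary.threshold:
            alerts.append(canary.name)
    return alerts


def approve_change(canaries: List[Canary], predicted: Dict[str, float]) -> bool:
    """Tripwire check: approve a configuration change only if the readings
    predicted for it would trip no canary."""
    return not tripped(canaries, predicted)


# Hypothetical canaries, for illustration only.
CANARIES = [
    Canary("unsupervised self-modification", "self_modification_rate", 0.1),
    Canary("capability jump", "capability_score", 0.9),
]

print(tripped(CANARIES, {"self_modification_rate": 0.2, "capability_score": 0.5}))
print(approve_change(CANARIES, {"self_modification_rate": 0.0, "capability_score": 0.5}))
```

The same list of canaries serves both purposes: `approve_change` evaluates a proposed change ahead of time, while `tripped` can be called repeatedly on live readings for continuous monitoring.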

Credit: Tesfu Assefa

Concluding thoughts

The concept of the Singularity poses problems, in part because of some unfortunate confusion that surrounds this idea, but also because the true problems of the Singularity have no easy answers:

  1. What are good canary signals that AI systems could be about to reach AGI level?
  2. How could a “prime directive” be programmed sufficiently deeply into AI systems that it will be maintained, even as that system reaches AGI level and then ASI, rewriting its own coding in the process?
  3. What should that prime directive include, going beyond vague, unprogrammable platitudes such as “act with benevolence toward humans”?
  4. How can safety checks and vigilant monitoring be introduced to AI systems without unnecessarily slowing down the progress of these systems toward producing solutions of undoubted value to humans (such as solutions to diseases and climate change)?
  5. Could limits be put into an AGI system that would prevent it from self-improving to ASI levels of intelligence far beyond those of humans?
  6. To what extent can humans take advantage of new technology to upgrade our own intelligence so that it keeps up with the intelligence of any pure-silicon ASI, thereby avoiding the situation of humans being left far behind ASIs?

Credit: David Wood, Pixabay

However, the first part of solving a set of problems is a clear definition of these problems. With that done, there are opportunities for collaboration among many different people — and many different teams — to identify and implement solutions.

What’s more, today’s AI systems can be deployed to help human researchers find solutions to these issues. Not for the first time, therefore, one generation of a technological tool will play a critical role in the safe development of the next generation of technology.
