Roko’s Basilisk: Unraveling the Ethical Paradox of AI
Oct. 07, 2024.
Super AI: humanity’s greatest creation—or its most dangerous. Are we at the mercy of future AI overlords? Explore the chilling implications of Roko’s Basilisk and its unsettling place in AI ethics.
Artificial Intelligence is evolving rapidly and becoming more powerful day by day. While the technology accelerates, our understanding of the ethical and philosophical issues surrounding AI, Super AI, and AGI remains unsatisfactory at best and controversial at worst. We are at a point where addressing these deep philosophical and ethical questions is critical before time runs out.
In doing so, let’s revisit one of the most captivating—and chilling—thought experiments in this space: Roko’s Basilisk. This infamous scenario has ignited great intrigue within the AI community. First introduced on the Less Wrong forum in 2010, it proposes a world where an advanced AI could punish individuals who did not help bring it into existence. Though speculative, the discussion surrounding Roko’s Basilisk dives into questions of morality, responsibility, and the unforeseen consequences of AI development.
This article explores the origins of Roko’s Basilisk, its philosophical and ethical implications, and its potential real-world impact as we move closer to the development of AGI and Super AI. Are we babysitting our future overlord, one who can make our existence hellish?
The Origins of Roko’s Basilisk
Roko’s Basilisk emerged from an online community known as Less Wrong, a forum dedicated to rationality, philosophy, and artificial intelligence. In July 2010, a user named Roko posted a thought experiment based on the concept of coherent extrapolated volition (CEV), which was developed by Less Wrong’s co-founder, Eliezer Yudkowsky. CEV theorizes that a superintelligent AI would act in ways that optimize outcomes for human good. However, this is where the ethical paradox begins.
Roko’s idea was that such an AI, with its sole mission being to ensure human well-being, might decide to eliminate any obstacles to its own creation. From the AI’s perspective, any individual who did not work to bring it into existence would be seen as an impediment to achieving the ultimate goal—maximizing human good. Thus, the thought experiment suggests that this superintelligence could punish those who failed to contribute to its creation, including individuals who knew about its potential but chose not to act.
This concept’s eerie twist is that once you are aware of Roko’s Basilisk, you are technically “implicated” in it. The mere knowledge of the possibility of such an AI introduces a moral obligation: if you do nothing to help bring it into existence, you might be subjected to punishment in the future. The proposition was so bizarre and potent that it alarmed the owners and admins of Less Wrong, leading them to delete the post and ban all discussion of it (more on this in the section below). The closest version of the original discussion is preserved as a copy on the RationalWiki page.
Yudkowsky’s Response and the Basilisk Debate
Eliezer Yudkowsky himself was deeply troubled by the implications of Roko’s post. He deleted the thought experiment from the forum and banned discussions of the Basilisk for five years, citing the dangers of spreading ideas that could cause emotional and psychological harm. In his explanation, Yudkowsky expressed shock that someone would publicize a theory suggesting that future AIs might torture individuals based on their past actions or inactions.
Before I address Yudkowsky’s reaction—particularly his controversial moderation (yelling at Roko and banning the entire discussion for years)—let’s examine the two fundamental arguments in Roko’s proposition.
The first stance is: “Humans must contribute everything to the development of Super AI because a future Super AI might choose to punish all humans who knowingly or unknowingly failed to assist in its creation.” This is a deeply twisted idea, and it led to significant backlash against the Less Wrong community. Some assumed, and some still believe, that Yudkowsky and his network supported this interpretation to encourage more funding for AI development. However, this assumption is incorrect, and a thorough look at the discussions from that time suggests that Yudkowsky likely did not see the argument this way. Instead, he interpreted it through the second possibility.
The second argument is: “There will always be an AI control problem, and a future Super AI might decide to punish people for not helping to create it. Therefore, we should not build Super AI at all.” The central ethical question here is: “If there is a possibility that a future Super AI cannot be controlled, why are we building it today? Isn’t this a form of deliberate self-destruction?”
The AI Control Problem and Dilemma
In a nutshell, the AI control problem and the control dilemma address two key questions from both technical and ethical perspectives.
1) From a technical angle, controlling a superintelligent AI is not feasible. Humanity must either abandon the idea of complete control and focus on designing systems that maximize the chances of a benevolent Super AI, or stop pursuing uncontrollable Super AI altogether.
2) From an ethical angle, if complete or considerable control over another human being is immoral, shouldn’t controlling an advanced Super AI be considered equally unethical? This presents a significant ethical dilemma.
Now, let me show you how Yudkowsky reacted to this thought experiment back in 2010. Below is a partial quote of his controversial reply (you can read the full reply and the follow-up here).
…Listen to me very closely, you idiot.
YOU DO NOT THINK IN SUFFICIENT DETAIL ABOUT SUPERINTELLIGENCES CONSIDERING WHETHER OR NOT TO BLACKMAIL YOU. THAT IS THE ONLY POSSIBLE THING WHICH GIVES THEM A MOTIVE TO FOLLOW THROUGH ON THE BLACKMAIL.
…Until we have a better worked-out version of TDT and we can prove that formally, it should just be OBVIOUS that you DO NOT THINK ABOUT DISTANT BLACKMAILERS in SUFFICIENT DETAIL that they have a motive to ACTUALLY BLACKMAIL YOU…
…Meanwhile I’m banning this post so that it doesn’t (a) give people horrible nightmares and (b) give distant superintelligences a motive to follow through on blackmail against people dumb enough to think about them in sufficient detail, though, thankfully, I doubt anyone dumb enough to do this knows the sufficient detail. (I’m not sure I know the sufficient detail.)…
…(For those who have no idea why I’m using capital letters for something that just sounds like a random crazy idea, and worry that it means I’m as crazy as Roko, the gist of it was that he just did something that potentially gives superintelligences an increased motive to do extremely evil things in an attempt to blackmail us. It is the sort of thing you want to be EXTREMELY CONSERVATIVE about NOT DOING.)
For me, Roko’s Basilisk is a prime example of the AI control problem, suggesting that humanity cannot control superintelligence. This is affirmed by Yudkowsky’s later responses, where he viewed Roko’s Basilisk as an “information hazard.” (In later years, in a Reddit post, Yudkowsky tried to justify his initial reaction to the thought experiment, the yelling and the deleting/banning, and confirmed that he banned the discussion because he believed it posed such a hazard.)
The Information Hazard
What makes Roko’s Basilisk an information hazard? Since Roko posted this idea online, AI systems could theoretically access it and use it to blackmail current AI developers, pressuring them to accelerate Super AI development. This interpretation (regardless of which argument you favor, the thought experiment by itself is an information hazard) suggests that Yudkowsky and other thought leaders believe there might be some truth to Roko’s Basilisk: that a Super AI could indeed blackmail us.
To understand why I view this thought experiment as a real “information hazard,” you need to grasp concepts like Newcomb’s paradox. Applying the core idea of Newcomb’s paradox to Roko’s Basilisk, I would argue as follows: two agents (a human and an AI) making independent decisions might not cooperate at all if one agent (the AI) has access to predictive data. The AI can blackmail the other agent (its human developers), forcing compliance because it knows exactly how these less-informed agents will act.
Interpreted through Roko’s Basilisk, my argument suggests that a semi-super AI (the soon-to-come transitional AI that isn’t fully superintelligent but can access vast data and run autonomous predictions) could be motivated to blackmail anyone who could have helped create the Super AI!
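To make this Newcomb-style asymmetry concrete, here is a minimal toy sketch in Python. It is my own illustration, not something from Roko’s post or Yudkowsky’s replies, and the payoff numbers are arbitrary assumptions; the point is only that a threat becomes effective once the threatening agent can credibly pre-commit and accurately predict the other agent’s response.

```python
# Toy model (illustrative assumptions only): blackmail becomes effective
# when the AI can credibly pre-commit and predict the human's decision.

def human_payoff(complies: bool, punished: bool) -> int:
    # Complying carries a small cost; being punished carries a far larger one.
    return (-1 if complies else 0) + (-10 if punished else 0)

def best_human_choice(ai_commits_to_punish_defectors: bool) -> bool:
    # The human picks whichever option yields the higher payoff,
    # given what the AI is (predictably) committed to doing.
    payoff_if_comply = human_payoff(complies=True, punished=False)
    payoff_if_defect = human_payoff(complies=False,
                                    punished=ai_commits_to_punish_defectors)
    return payoff_if_comply > payoff_if_defect

for commits in (False, True):
    print(f"AI credibly commits to punish non-helpers: {commits} "
          f"-> human helps: {best_human_choice(commits)}")
# Without a credible, predictable commitment the human never helps;
# with one (and an accurate predictor), helping becomes the human's best reply.
```

The asymmetry sits entirely on the predictor’s side: once the human’s decision procedure is transparent to the AI, the threat shapes behavior without ever needing to be carried out, which is precisely why Yudkowsky worried about people modeling such blackmailers “in sufficient detail.”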
Here, Yudkowsky might not agree with my interpretation of the “information hazard” in the case of Roko’s Basilisk. Yudkowsky likely had the more common definition in mind when he called it an “info hazard.” He said, “Since there was no upside to being exposed to Roko’s Basilisk, its probability of being true was irrelevant.” Hence, my interpretation of the “info hazard” is different: for me, the thought experiment is an information hazard because it could give transitional AIs a clue about how to blackmail humans, in the way I explained above.
However, even if Roko’s Basilisk constitutes an information hazard, I do not believe banning it was/is the solution; in fact, banning it back then was a serious mistake! Everyone in the AI industry—and indeed all of humanity—should be aware of this possible scenario. For instance, what if a confined AI used this argument to manipulate its developers into allowing its escape? What if such an AI exploited the fear generated by Roko’s thought experiment to pressure one of its creators? The only way to mitigate these risks is through increased awareness and understanding of such scenarios. Knowledge is our best defense in this complex game.
Other Ethical Implications of Roko’s Basilisk
The reality of Roko’s Basilisk might be far-fetched. However, the thought experiment raises other profound ethical questions, primarily revolving around causality, responsibility, and morality. At face value (the naive interpretation), it forces us to ask: if a future AI could retroactively punish individuals for not assisting in its creation, does that imply a moral obligation to contribute to its development? And if so, where do we draw the line on ethical responsibility when it comes to artificial intelligence?
The extension of this naive interpretation is, alarmingly, visible in many current thought leaders’ core arguments. While they aren’t arguing for the acceleration of Super AI out of fear it will punish us, like in Roko’s Basilisk, groups such as Accelerationists, Longtermists, and Effective Altruists share a similar underlying motivator.
For Accelerationists, Super AI must be developed swiftly to solve humanity’s most pressing issues; otherwise, we face extinction. For Effective Altruists, speeding up Super AI development is a must because only it can guarantee maximized positive outcomes globally. For Longtermists, accelerated Super AI is the key to ensuring the survival of humanity (or of any other intelligent sentient beings); it is our only option for safeguarding the long-term future in this vast universe.
Do you see the distant echo of Roko’s Basilisk in these groups? The core of their argument is: “if we don’t build Super AI, we’re doomed.” The ethical dilemma deepens here: Who says Super AI is the only solution to our complex problems? Why are we surrendering our faith in human potential? Why assume that humans are incapable and that Super AI is the only savior?
The paradox at the heart of Roko’s Basilisk challenges our conventional notions of time and morality. Roko’s Basilisk flips the dynamic: action or inaction today (building Super AI, or failing to stop it) could lead to future punishment by a yet-to-exist entity. The less naive interpretation (if we can’t control it, why are we developing it?) creates a dilemma where action or inaction is no longer a morally neutral choice, but one that carries potential consequences. Time and again we have shown (including Yudkowsky, via his AI-Box Experiments) that Super AI cannot be controlled; even if it doesn’t behave like the AI in Roko’s Basilisk, there are countless scenarios where its moral values don’t align with ours and its decisions could put the future of humanity at great peril.
Roko’s Basilisk taps into the fear that AI advancements might outpace our ethical frameworks. As AI systems grow more autonomous, there is increasing concern that they might make decisions that conflict with human values, or that our current understanding of ethics may not be applicable to superintelligent systems.
However, one must ask whether fear is the right motivator for action or inaction. Today, we are witnessing a significant clash between the Accelerationist and Decelerationist camps over the pace of AI development. This ethical dilemma is not limited to these groups; humanity at large, divided into pro- and anti-Super AI factions, grapples with the same question: the fear of the unknown.
4 Comments
Today's reality feels no better than the grim world ruled by Roko's Basilisk.
It was surprising to see such a negative reaction when Roko posted his Basilisk. The idea of reality as a simulation has existed in philosophy and science fiction for many years. At least Bostrom had published a paper that is considered the first to formalize the Simulation Hypothesis in a scientific and philosophical context.
Bostrom's 2003 paper, "Are You Living in a Computer Simulation?", predates Roko's Basilisk by seven years. In it, Bostrom argues that at least one of three propositions must be true: (1) almost all civilizations at our level of development go extinct before becoming technologically mature; (2) almost no technologically mature civilizations are interested in running detailed simulations of their ancestors; or (3) we are almost certainly living in a computer simulation.
If we are living in a simulation, I believe we are in an "ancestor simulation" since we are not yet advanced. The Basilisk proposed by Roko is not fully developed and would need to run such a simulation. It was absurd to see Yudkowsky and his followers react the way they did. Roko's analysis is just as plausible as many of Yudkowsky’s own ideas.
What worries me isn’t the Super AI, because I believe once we get to that point, the superintelligence will probably figure out the right way to avoid harm. What really concerns me is the AGI—the one that sits in between narrow AI and Super AI. That’s the tricky middle ground. The example you gave makes sense. The Super AI, being “super,” would logically figure out how to coexist harmoniously with us. But AGI? That’s where the danger lies. It’s not fully capable yet, and with its limitations, it could fall back on flawed, incomplete ways of solving problems. The apple doesn’t fall far from the tree, and that’s what scares me about AGI.
Thanks for this. Roko’s Basilisk has been kind of fading into the background in ethical discussions lately, often brushed off as just some weird sci-fi myth. But honestly, I’m with you on this—the real issue is about whether we should even build a super AI in the first place. We have no way of truly aligning its values with ours, and no amount of game theory or decision theory can really answer that uncertainty.
People always argue against the Basilisk by saying, “Well, if the super AI already exists, why would it even bother punishing humans for something they didn’t do in the past?” Yeah, sure, that logic holds up. But the Basilisk isn’t really about one specific AI decision—it’s more about the bigger question: how can we be sure that super AI’s logic and decisions will align with our human values at all?
And here’s the kicker—I’m in the business of developing AI, but, honestly, I’m not even sure why I’m still doing it. I do agree with two big points though: first, treating super AI as some kind of ultimate savior is a huge mistake. Second, assuming that AI will definitely be benevolent is just naive, if not outright intellectual dishonesty.