Roko’s Basilisk: Unraveling the Ethical Paradox of AI
Oct. 07, 2024.
Super AI: humanity’s greatest creation—or its most dangerous. Are we at the mercy of future AI overlords? Explore the chilling implications of Roko’s Basilisk and its unsettling place in AI ethics.
Artificial Intelligence is evolving rapidly and becoming more powerful day by day. While the technology accelerates, our understanding of the ethical and philosophical issues surrounding AI, Super AI, and AGI remains unsatisfactory at best and controversial at worst. We are at a point where addressing these deep philosophical and ethical questions is critical before time runs out.
In doing so, let’s revisit one of the most captivating—and chilling—thought experiments in this space: Roko’s Basilisk. This infamous scenario has ignited great intrigue within the AI community. First introduced on the Less Wrong forum in 2010, it proposes a world where an advanced AI could punish individuals who did not help bring it into existence. Though speculative, the discussion surrounding Roko’s Basilisk dives into questions of morality, responsibility, and the unforeseen consequences of AI development.
This article explores the origins of Roko’s Basilisk, its philosophical and ethical implications, and its potential real-world impact as we move closer to the development of AGI and Super AI. Are we babysitting our future overlord, one who can make our existence hellish?
The Origins of Roko’s Basilisk
Roko’s Basilisk emerged from an online community known as Less Wrong, a forum dedicated to rationality, philosophy, and artificial intelligence. In July 2010, a user named Roko posted a thought experiment based on the concept of coherent extrapolated volition (CEV), which was developed by Less Wrong’s co-founder, Eliezer Yudkowsky. CEV theorizes that a superintelligent AI would act in ways that optimize outcomes for human good. However, this is where the ethical paradox begins.
Roko’s idea was that such an AI, with its sole mission being to ensure human well-being, might decide to eliminate any obstacles to its own creation. From the AI’s perspective, any individual who did not work to bring it into existence would be seen as an impediment to achieving the ultimate goal—maximizing human good. Thus, the thought experiment suggests that this superintelligence could punish those who failed to contribute to its creation, including individuals who knew about its potential but chose not to act.
This concept’s eerie twist is that once you are aware of Roko’s Basilisk, you are technically “implicated” in it. The mere knowledge of the possibility of such an AI introduces a moral obligation: if you do nothing to help bring it into existence, you might be subjected to punishment in the future. The proposition was so bizarre and potent that it alarmed the owners and admins of Less Wrong, leading them to delete the post and ban all discussion of it (more on this in the section below). The closest version of the original discussion is preserved as a copy on the RationalWiki page.
Yudkowsky’s Response and the Basilisk Debate
Eliezer Yudkowsky himself was deeply troubled by the implications of Roko’s post. He deleted the thought experiment from the forum and banned discussions of the Basilisk for five years, citing the dangers of spreading ideas that could cause emotional and psychological harm. In his explanation, Yudkowsky expressed shock that someone would publicize a theory suggesting that future AIs might torture individuals based on their past actions or inactions.
Before I address Yudkowsky’s reaction—particularly his controversial moderation (yelling at Roko and banning the entire discussion for years)—let’s examine the two fundamental arguments in Roko’s proposition.
The first stance is: “Humans must contribute everything to the development of Super AI because a future Super AI might choose to punish all humans who knowingly or unknowingly failed to assist in its creation.” This is a deeply twisted idea, and it led to significant backlash against the Less Wrong community. Some assumed, and some still believe, that Yudkowsky and his network supported this interpretation to encourage more funding for AI development. However, this assumption is incorrect, and a thorough look at the discussions from that time suggests that Yudkowsky likely did not see the argument this way. Instead, he interpreted it through the second possibility.
The second argument is: “There will always be an AI control problem, and a future Super AI might decide to punish people for not helping to create it. Therefore, we should not build Super AI at all.” The central ethical question here is: “If there is a possibility that a future Super AI cannot be controlled, why are we building it today? Isn’t this a form of deliberate self-destruction?”
The AI Control Problem and Dilemma
In a nutshell, the AI control problem and the control dilemma address two key questions from both technical and ethical perspectives.
1) From a technical angle, controlling a superintelligent AI is not feasible. Humanity must either abandon the idea of complete control and focus on designing systems that maximize the chances of a benevolent Super AI, or stop pursuing uncontrollable Super AI altogether.
2) From an ethical angle, if complete or considerable control over another human being is immoral, shouldn’t controlling an advanced Super AI be considered equally unethical? This presents a significant ethical dilemma.
Now, let me show you how Yudkowsky reacted to this thought experiment back in 2010. Below is a partial quote of his controversial reply (you can read the full reply and the follow-up here).
…Listen to me very closely, you idiot.
YOU DO NOT THINK IN SUFFICIENT DETAIL ABOUT SUPERINTELLIGENCES CONSIDERING WHETHER OR NOT TO BLACKMAIL YOU. THAT IS THE ONLY POSSIBLE THING WHICH GIVES THEM A MOTIVE TO FOLLOW THROUGH ON THE BLACKMAIL.
…Until we have a better worked-out version of TDT and we can prove that formally, it should just be OBVIOUS that you DO NOT THINK ABOUT DISTANT BLACKMAILERS in SUFFICIENT DETAIL that they have a motive to ACTUALLY BLACKMAIL YOU…
…Meanwhile I’m banning this post so that it doesn’t (a) give people horrible nightmares and (b) give distant superintelligences a motive to follow through on blackmail against people dumb enough to think about them in sufficient detail, though, thankfully, I doubt anyone dumb enough to do this knows the sufficient detail. (I’m not sure I know the sufficient detail.)…
…(For those who have no idea why I’m using capital letters for something that just sounds like a random crazy idea, and worry that it means I’m as crazy as Roko, the gist of it was that he just did something that potentially gives superintelligences an increased motive to do extremely evil things in an attempt to blackmail us. It is the sort of thing you want to be EXTREMELY CONSERVATIVE about NOT DOING.)
For me, Roko’s Basilisk is a prime example of the AI control problem, suggesting that humanity cannot control superintelligence. This is affirmed by Yudkowsky’s later responses, where he viewed Roko’s Basilisk as an “information hazard.” (In later years, in a Reddit post, Yudkowsky tried to justify his initial reaction to the thought experiment, the yelling and the deleting/banning, and confirmed that he banned the discussion because he believed it posed such a hazard.)
The Information Hazard
What makes Roko’s Basilisk an information hazard? Since Roko posted this idea online, AI systems could theoretically access it and use it to blackmail current AI developers, pressuring them to accelerate Super AI development. This interpretation (regardless of which argument you favor, the thought experiment by itself is an information hazard) suggests that Yudkowsky and other thought leaders believe there might be some truth to Roko’s Basilisk: that a Super AI could indeed blackmail us.
To understand why I view this thought experiment as a real “information hazard,” you need to grasp concepts like Newcomb’s paradox. Applying the core idea of Newcomb’s paradox to Roko’s Basilisk, I would argue as follows: two agents (a human and an AI) making independent decisions might not cooperate at all if one agent (the AI) has access to predictive data. The AI can blackmail the other agent (its human developers), forcing compliance because it knows exactly how these less-informed agents will act.
Interpreted through Roko’s Basilisk, my argument suggests that a semi-super AI (the soon-to-come transitional AI that isn’t fully superintelligent but can access vast data and run autonomous predictions) could be motivated to blackmail anyone who could have helped create the Super AI!
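To make this Newcomb-style asymmetry concrete, here is a minimal toy sketch in Python. It is my own illustration, not something from Roko’s post or Yudkowsky’s replies, and the payoff numbers are arbitrary assumptions; the point is only that a threat becomes effective once the threatening agent can credibly pre-commit and accurately predict the other agent’s response.

```python
# Toy model (illustrative assumptions only): blackmail becomes effective
# when the AI can credibly pre-commit and predict the human's decision.

def human_payoff(complies: bool, punished: bool) -> int:
    # Complying carries a small cost; being punished carries a far larger one.
    return (-1 if complies else 0) + (-10 if punished else 0)

def best_human_choice(ai_commits_to_punish_defectors: bool) -> bool:
    # The human picks whichever option yields the higher payoff,
    # given what the AI is (predictably) committed to doing.
    payoff_if_comply = human_payoff(complies=True, punished=False)
    payoff_if_defect = human_payoff(complies=False,
                                    punished=ai_commits_to_punish_defectors)
    return payoff_if_comply > payoff_if_defect

for commits in (False, True):
    print(f"AI credibly commits to punish non-helpers: {commits} "
          f"-> human helps: {best_human_choice(commits)}")
# Without a credible, predictable commitment the human never helps;
# with one (and an accurate predictor), helping becomes the human's best reply.
```

The asymmetry sits entirely on the predictor’s side: once the human’s decision procedure is transparent to the AI, the threat shapes behavior without ever needing to be carried out, which is precisely why Yudkowsky worried about people modeling such blackmailers “in sufficient detail.”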
Here, Yudkowsky might not agree with my interpretation of the “information hazard” in the case of Roko’s Basilisk. Yudkowsky likely had the more common definition in mind when he called it an “info hazard.” He said, “Since there was no upside to being exposed to Roko’s Basilisk, its probability of being true was irrelevant.” Hence, my interpretation of the “info hazard” is different: for me, the thought experiment is an information hazard because it could give transitional AIs a clue about how to blackmail humans, in the way I explained above.
However, even if Roko’s Basilisk constitutes an information hazard, I do not believe banning it was/is the solution; in fact, banning it back then was a serious mistake! Everyone in the AI industry—and indeed all of humanity—should be aware of this possible scenario. For instance, what if a confined AI used this argument to manipulate its developers into allowing its escape? What if such an AI exploited the fear generated by Roko’s thought experiment to pressure one of its creators? The only way to mitigate these risks is through increased awareness and understanding of such scenarios. Knowledge is our best defense in this complex game.
Other Ethical Implications of Roko’s Basilisk
The reality of Roko’s Basilisk might be far-fetched. However, the thought experiment raises other profound ethical questions, primarily revolving around causality, responsibility, and morality. At face value (the naive interpretation), it forces us to ask: if a future AI could retroactively punish individuals for not assisting in its creation, does that imply a moral obligation to contribute to its development? And if so, where do we draw the line on ethical responsibility when it comes to artificial intelligence?
The extension of this naive interpretation is, alarmingly, visible in many current thought leaders’ core arguments. While they aren’t arguing for the acceleration of Super AI out of fear it will punish us, like in Roko’s Basilisk, groups such as Accelerationists, Longtermists, and Effective Altruists share a similar underlying motivator.
For Accelerationists, Super AI must be developed swiftly to solve humanity’s most pressing issues; otherwise, we face extinction. For Effective Altruists, speeding up Super AI development is a must because only it can guarantee maximized positive outcomes globally. For Longtermists, accelerated Super AI is the key to ensuring the survival of humanity (or of any other intelligent sentient beings); it is our only option for safeguarding the long-term future in this vast universe.
Do you see the distant echo of Roko’s Basilisk in these groups? The core of their argument is: “if we don’t build Super AI, we’re doomed.” The ethical dilemma deepens here: Who says Super AI is the only solution to our complex problems? Why are we surrendering our faith in human potential? Why assume that humans are incapable and that Super AI is the only savior?
The paradox at the heart of Roko’s Basilisk challenges our conventional notions of time and morality. Roko’s Basilisk flips the dynamic: action or inaction today (building Super AI, or failing to stop it) could lead to future punishment by a yet-to-exist entity. The less naive interpretation (if we can’t control it, why are we developing it?) creates a dilemma where action or inaction is no longer a morally neutral choice, but one that carries potential consequences. Time and again we have shown (including Yudkowsky, via his AI-Box Experiments) that Super AI cannot be controlled; even if it doesn’t behave like the AI in Roko’s Basilisk, there are countless scenarios where its moral values don’t align with ours and its decisions could put the future of humanity at great peril.
Roko’s Basilisk taps into the fear that AI advancements might outpace our ethical frameworks. As AI systems grow more autonomous, there is increasing concern that they might make decisions that conflict with human values, or that our current understanding of ethics may not be applicable to superintelligent systems.
However, one must ask whether fear is the right motivator for action or inaction. Today, we are witnessing a significant clash between the Accelerationist and Decelerationist camps over the pace of AI development. This ethical dilemma is not limited to these groups; humanity at large, divided into pro- and anti-Super AI factions, grapples with the same question: the fear of the unknown.
4 Comments
Today's reality feels no better than the grim world ruled by Roko's Basilisk.
It was surprising to see such a negative reaction when Roko posted his Basilisk. The idea of reality as a simulation has existed in philosophy and science fiction for many years. At least Bostrom had published a paper that is considered the first to formalize the Simulation Hypothesis in a scientific and philosophical context.
Bostrom's 2003 paper, "Are You Living in a Computer Simulation?", predates Roko's Basilisk by seven years. In it, Bostrom argues that at least one of three propositions must be true: (1) almost all civilizations at our level of development go extinct before becoming technologically mature; (2) almost no technologically mature civilizations are interested in running detailed simulations of their ancestors; or (3) we are almost certainly living in a computer simulation.
If we are living in a simulation, I believe we are in an "ancestor simulation" since we are not yet advanced. The Basilisk proposed by Roko is not fully developed and would need to run such a simulation. It was absurd to see Yudkowsky and his followers react the way they did. Roko's analysis is just as plausible as many of Yudkowsky’s own ideas.
What worries me isn’t the Super AI, because I believe once we get to that point, the superintelligence will probably figure out the right way to avoid harm. What really concerns me is the AGI—the one that sits in between narrow AI and Super AI. That’s the tricky middle ground. The example you gave makes sense. The Super AI, being “super,” would logically figure out how to coexist harmoniously with us. But AGI? That’s where the danger lies. It’s not fully capable yet, and with its limitations, it could fall back on flawed, incomplete ways of solving problems. The apple doesn’t fall far from the tree, and that’s what scares me about AGI.
Thanks for this. Roko’s Basilisk has been kind of fading into the background in ethical discussions lately, often brushed off as just some weird sci-fi myth. But honestly, I’m with you on this—the real issue is about whether we should even build a super AI in the first place. We have no way of truly aligning its values with ours, and no amount of game theory or decision theory can really answer that uncertainty.
People always argue against the Basilisk by saying, “Well, if the super AI already exists, why would it even bother punishing humans for something they didn’t do in the past?” Yeah, sure, that logic holds up. But the Basilisk isn’t really about one specific AI decision—it’s more about the bigger question: how can we be sure that super AI’s logic and decisions will align with our human values at all?
And here’s the kicker—I’m in the business of developing AI, but, honestly, I’m not even sure why I’m still doing it. I do agree with two big points though: first, treating super AI as some kind of ultimate savior is a huge mistake. Second, assuming that AI will definitely be benevolent is just naive, if not outright intellectual dishonesty.