7 min read | Saved February 14, 2026
Do you care about this?
The article explores the role of human-like philosophy in AI alignment, arguing that AIs must be both capable of and disposed toward philosophical reasoning if they are to generalize effectively in new contexts. It emphasizes the difficulty of teaching AIs this form of reasoning, particularly in ethics, given its complex and contingent nature.
If you do, here's more
The essay focuses on the role of philosophy in AI alignment, particularly the need for AIs to engage in what the author terms "human-like philosophy." The author argues that AI alignment is not merely a scientific problem but also a philosophical one, and that philosophical inquiry is essential for addressing the complexities of ethical decision-making in AI systems. The piece critiques the notion that AIs should be designed as perfect dictators, advocating instead for AIs that can reflect on philosophical concepts and apply them across varied contexts.
The author distinguishes two aspects of this philosophical engagement: Capability and Disposition. Capability refers to an AI's ability to do philosophy in a way that aligns with human values, while Disposition concerns an AI's inclination to actually engage in such reasoning. The essay suggests that while advanced AIs might eventually develop the necessary capability, the harder challenge lies in fostering the right disposition, which the author frames as an elicitation problem: drawing out high-quality conceptual research from AIs.
The connection between philosophy and out-of-distribution generalization is a central theme. Philosophical analysis helps clarify how concepts apply to unfamiliar scenarios, which matters because AIs will inevitably encounter situations that diverge from their training data. The author draws parallels between philosophical practice and challenges in AI alignment, such as ensuring that AIs adopt human-like concepts when making decisions. Key difficulties include the need for AIs to navigate ethical dilemmas in unfamiliar contexts and the absence of objectively correct answers in many of these situations. The essay ultimately aims to elevate the discourse around the philosophical dimensions of AI alignment.