AI Alignment – Are we aligning to the wrong thing?

Staring into the Mirror

Our headlong rush towards a new phase of AI, the so-called ‘agentic era’ (see Google’s latest update), should give us pause for thought. What do you associate with an AI-led future? We often play this Rorschach test among ourselves. At one end of the spectrum are those who see AI only as an advanced tool, for good or ill. At the other are those who see it as a new form of life, ruled by emergent behaviour we did not intentionally programme. Yet what cannot be denied is that AI is, in many ways, a reflection of humanity’s thoughts and intentions.

So, what does this mean for business and why should we care?

The Leviathan Within: AI and Collective Humanity

“Covenants without the sword are but words, and of no strength to secure a man at all.” – Thomas Hobbes, Leviathan

Hobbes envisioned society as a great beast – a collective entity of individual wills and ambitions. Agentic AI represents a shift in how we use AI: from reactive, reflective tasks (e.g. writing emails, brainstorming) to proactive extensions of our will (e.g. booking a doctor’s appointment, analysing and evaluating the best product to use). In other words, AI is amplifying our modern Leviathan, powered by collective data, decisions, and desires. It aggregates everything we feed it, reflecting not just intelligence, but intent.

AI alignment – getting models to actually behave as we want them to – has moved from sci-fi quandary to urgent engineering and ethical challenge. So far, companies have leaned heavily on Reinforcement Learning from Human Feedback (RLHF), a process in which humans essentially “mark the homework”: they review and rank model outputs, and that feedback is used to steer the model toward desirable behaviour.

The issue? This method, while effective, is labour-intensive and doesn’t let these companies scale in the way they’d like. Enter techniques like recursive reward modelling (RRM), where human input is generalised, optimised, and expedited. Instead of humans manually correcting every output, models trained on earlier human feedback help evaluate later ones, so each iteration builds on the last and alignment work proceeds faster and more efficiently.

Yet this efficiency introduces risk. If the initial reward signal – the human feedback – is flawed, recursive systems can compound that flaw with every iteration. Imagine biases or incomplete ethics being magnified at the speed of AI progress. Without ethical guardrails today, we risk embedding flawed priorities into systems that become harder to correct tomorrow.
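
To make the compounding concrete, here is a minimal sketch in Python. It is a toy model, not a description of any real training pipeline: the function name simulate_bias_drift, the 2% starting bias, and the amplification factor of 1.5 per round are all assumptions chosen purely for illustration.

```python
def simulate_bias_drift(initial_bias: float, amplification: float, rounds: int) -> list[float]:
    """Toy model of a flawed reward signal passed through recursive evaluation.

    Each round's evaluator inherits the previous round's bias, scaled by a
    fixed (assumed) amplification factor. Bias is capped at 1.0, which stands
    in for "fully misaligned". All numbers are illustrative, not measured.
    """
    if initial_bias < 0 or amplification < 0 or rounds < 0:
        raise ValueError("initial_bias, amplification and rounds must be non-negative")

    history = [initial_bias]
    for _ in range(rounds):
        # The next evaluator learns from the previous one, flaw included.
        history.append(min(1.0, history[-1] * amplification))
    return history


if __name__ == "__main__":
    # Hypothetical scenario: a 2% flaw in the original human feedback,
    # growing by half again on every recursive round.
    for round_number, bias in enumerate(simulate_bias_drift(0.02, 1.5, 10)):
        print(f"round {round_number}: bias = {bias:.3f}")
```

Under these assumed numbers, the 2% starting flaw hits the cap at the tenth round. The same sketch also shows the flip side: with an amplification factor below 1 (for instance, because humans keep checking and correcting the evaluators), the flaw shrinks round by round instead of growing.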

From Reflection to Aspiration: Shaping AI with Our Best Principles

If AI is a mirror, businesses and society have a chance to make it reflect not just who we are, but who we aspire to be. This is the shift companies like Anthropic are pioneering with approaches like Constitutional AI, where a written set of guiding principles acts as an ethical cornerstone for AI systems. Instead of relying solely on reactive oversight, these principles proactively shape the behaviour of models, with the aim of giving them a moral compass beyond raw human data.

But here’s the key: aligning AI must mean more than teaching it to mimic humanity. We don’t just want AI to reflect our decisions; we want it to reflect our best traits—curiosity, fairness, empathy, and integrity.

How? Start with clear, transparent values baked into alignment frameworks. Go beyond efficiency to prioritise ethical considerations and long-term impact.
For businesses: this is an opportunity to lead. By building AI systems that amplify the best of human behaviour, companies can inspire trust, drive innovation, and create tools that truly serve society.

The Shift: Alignment isn’t about perfection – it’s about intention. Businesses must continually ask: What values are we embedding into our models? Are we creating systems that uplift fairness and creativity, or ones that blindly replicate short-term incentives and flaws?

Will your generative AI agent reflect the best version of your principles – or just the convenient one?

Conclusion: The Mirror Can Change

AI – our Leviathan – is not just a tool. It’s a reflection of the systems, values, and intentions we bring to it.

But are we aligning AI to the right things? Are we training it to reflect our biases and short-term thinking, or are we aspiring to something higher?

For businesses, this is both a challenge and an opportunity. Leaders have a choice: will AI amplify our worst tendencies, or will it help us imagine and build something better? Modern approaches, like Constitutional AI and recursive reward modelling, offer tools to guide alignment, but they also highlight a responsibility: to shape systems that don’t just mimic humanity’s behaviour but elevate its best traits – fairness, empathy, and creativity.

As Wilde wrote: “The aim of life is self-development. To realize one’s nature perfectly—that is what each of us is here for.”
If AI reflects us, perhaps its highest purpose is to inspire us to be better—to mirror not who we are, but who we could become.

James