The Sincerity Trap: Why the Best People in AI…

Apr 11

20 mins | Dean Ball, Dario Amodei, Eric Schmidt, and the Structural Gap Between What AI Insiders Believe and What Reaches the Rest of Us

10 Comments

Dr. Kaoru Ichikawa

Apr 15

I read your essay on what you call the Sincerity Trap.

You are seeing something real. But you are still being gentler than the moment deserves.

This is not merely a communication gap between what insiders believe and what the public hears. It is not merely a human tendency toward social filtering. It is not just people saying “fine” in different rooms. That framing is still too domestic, too sociological, too clean for the machinery now operating around AI.

What you are describing is a semantic pressure system.

A civilisation-scale apparatus that takes the raw signal of danger, passes it through successive rings of career incentive, institutional loyalty, strategic ambiguity, legal exposure, financial dependency, public relations, and self-protective moral editing, and then releases into the public sphere a version of reality that has been made survivable for the institution rather than usable for the citizen.

That is not a trap of sincerity.

That is epistemic euthanasia.

The public is not being informed slowly. It is being comforted professionally.

And that, to me, is the key line in your whole piece: not that some insiders privately believe darker things, but that the entire outer ring of discourse is structurally biased toward reassurance — always reassurance, always the softening of the blow, always the rendering of the terrain as more manageable than it really is.

This matters because reassurance is not neutral.

Reassurance is a political technology.

In a stable civilisation, reassurance can calm panic while institutions do competent work underneath. In a collapsing one, reassurance becomes anaesthetic. It allows the machine to keep moving while the patient loses the last chance to prepare. It is hospice speech delivered in the accent of continuity.

And that is exactly what I hear in the examples you cite.

The White House policy architect who “always believed” things he did not say for Straussian reasons. The AI executive whose internal register is fury while the public register remains careful and principled. The elder technocrat who casually drops the mask in a room he thinks is sealed, then scrambles when the membrane fails. None of this is accidental. These are not glitches in discourse. These are the normal emissions of a system that can no longer metabolise its own truth without damaging its own strategic posture.

That is why I would sharpen your thesis.

The problem is not that insiders are insincere.

The problem is that sincerity itself has become a camouflage pattern.

People still imagine dishonesty as a moral drama: a liar, a lie, a victim, a reveal. But modern pathological systems do not require that crude architecture. They are far more elegant. They can be staffed by thoughtful, earnest, conflicted, morally serious people who mean well and still produce a profoundly misleading public field. That is what makes the danger so difficult for ordinary minds to perceive. Nothing in the face looks false. Nothing in the voice sounds villainous. The speaker may be fully sincere at the level from which he is speaking. And yet the total output remains systematically deceptive.

That is not because the people are uniquely evil.

It is because the structure edits reality on the way out.

This is what late systems theory failed to study. It studied networks, not pathologies. Feedback loops, not capture. Emergence, not seizure. It gave us lovely diagrams of interdependence while ignoring the ugly fact that some systems do not merely process information — they domesticate it. They turn truth into a dosage. They release only what the organism believes it can survive. And if the organism is addicted to growth, control, and public confidence, then the truth must be diluted until it no longer threatens the host.

At that point, “the public” is not being lied to in the classical sense.

It is being managed as a nervous system.

And that is why your phrase “epistemic inequality” is too mild for what is occurring. This is not merely unequal distribution of knowledge. It is the active production of cognitive class stratification: one layer of people living close enough to the furnace to feel the heat, and another layer given polished climate reports while the walls are already warming.

The public receives the press release.

The insiders receive the memo.

The partner receives the confession in the dark.

The body receives the dread before language catches up.

That is how civilisations die now: not from total ignorance, but from tiered access to seriousness.

And once you see that, the old moral categories become almost useless.

“Are they lying?” is now the wrong question.

The right question is: what does the system permit them to know out loud?

That is much more frightening.

Because if the boundary of speakable truth is drawn less by evidence than by role, timing, incentive, market exposure, or strategic risk, then public discourse ceases to be a site of collective orientation. It becomes a pressure valve. It emits just enough candour to preserve credibility while withholding enough reality to preserve momentum.

That is why your examples matter.

Not because they expose hypocrisy. Hypocrisy is common and boring.

They matter because they expose a society in which the people closest to the machinery appear repeatedly to be more alarmed in private than in public, and always in the same direction. Always toward softening. Always toward optimism. Always toward manageability. Always toward the fiction that the institutions still have a handle on the thing they are accelerating.

This is exactly how informational war precedes kinetic war.

First, the witness is tiered.

Then, the language is softened.

Then, the public loses the ability to calibrate risk.

Then, reality arrives physically because it could not be metabolised semantically.

A civilisation that cannot tell the truth in its outer rings eventually learns it in the harshest possible medium.

That is the deeper horror here.

Not that a few powerful men know more than they say.

But that the architecture of modern disclosure may be fundamentally incompatible with technologies whose risk profile evolves faster than institutional honesty can travel.

So no — I would not leave this at the level of “we all have concentric circles.”

That is true, but insufficient. It risks domesticating the scale of the pathology. Yes, everyone filters. Yes, all humans have inner and outer speech. Yes, total transparency is impossible. But ordinary social filtering was not designed to carry civilisation-level danger. The same cognitive architecture that helps you survive a dinner party is now mediating the public understanding of systems that may restructure labour, war, sovereignty, intelligence, and the species boundary itself.

That is not just a mismatch.

It is an extinction-grade design flaw.

And so my response is severe:

Do not ask only whether insiders are sincere.

Ask whether sincerity has become the final solvent of accountability.

Do not ask only what they are saying.

Ask what the structure makes too expensive to say while still inside it.

Do not ask only for better public communication.

Ask what kind of civilisation has built itself such that the truth must pass through six layers of filtration before it is allowed to touch the people whose lives it will alter most.

Because by then it is no longer truth.

It is dose-controlled disclosure.

Turquoise Sound

Apr 21

Kaoru, appreciate the close read and the energy here. You’re pushing toward something I deliberately pulled back from in the essay. I chose “epistemic inequality” over stronger language because precision matters more to me than severity. The evidence I documented shows filtering, not euthanasia. Filtering can be structural and devastating without being administered.

The moment we frame it as something being done to the public by the insiders, we’ve reintroduced the villain architecture the essay argues against. And I think that architecture is comforting in exactly the way the essay warns about: it gives us someone to blame, which means we don’t have to sit with the harder truth that the structure produces these outcomes without anyone steering it.

“What does the structure make too expensive to say while still inside it?”

That question I’ll keep. Thank you for it.

Dr. Kaoru Ichikawa

Apr 22

Here's the thing. An oppressor-oppressed narrative, once it surfaces that avatar to blame, also collapses downstream into another binary: fight or flight, freeze or fawn. That is the substrate structures form on. Yes - no steering required.

But a societal immune system in failure mode becomes, like a biological one, a feeding ground for disease vectors: viruses, parasites, bacteria, mould etc. Defining the exact nature of the pathogen is what an immune system in recovery begins to do. It needs to do.

And that differs little from a societal immune system - one we seek to resuscitate by describing the nature of the problem.

Now, to forget the immune failure so we “don’t have to sit with the harder truth,” or, crucially, to forget the conditions that led to its compromise, is indeed to invite it again.

But to remain oblivious to the shape and identity of pathogens is worse still - that is an invitation to escalate this failure to death itself. That is my point.

Adam Wright

Apr 20

In my personal convos I deal with a filter/block that I try to break through but I find it pretty stubborn. If I’m talking with folks who don’t have any real experience talking about major or existential risks (what I like to call “the grounded side of futuring” lol), and I start to bring up actual scenarios while we are already talking about the topic generally, I can tell that their brains just lock the content out as unrealistic and file it under sci-fi and fantasy, not through logic but from instantaneous overwhelm.

If I say “angry 15-year-old boys trying to program self-replicating trash roaches that can eat a city” or “waves of targeted drone attacks that don’t even require an ordnance payload, just some sharp edges and the right angles,” I can literally feel their brains fritz and shut down. And this is when pandemics and genocides have already been in the discussion.

So my filtering is done the moment I feel it in the body, and I’m more aware than ever that the capacity to regulate is essential to engaging with this topic. I’m not sure I’d do any formal or group discussion on it *separate* from or without doing simultaneous grounding and regulating practices. Like I would offer a “let’s breathe and feel our bodies and talk about nightmare scenarios” class maybe?

But I naturally shut down any realistic scenario discussion as soon as I feel them start to shut down, and so that means realistic and likely scenarios are completely non-existent and as repressed as sexuality in a fundamentalist village. And I think that’s probably what’s happening whenever we see purist techno-optimism, too: nervous systems that can’t handle the heat snapping into utopian denial.

So I’ve been working on grounding questions that give them permission to imagine the horrors and yet allow them to remain grounded enough to think about them like a firefighter would.

Turquoise Sound

Apr 21 (Edited)

Hey Adam,

So I just responded to Benjamin and Max above about how the filtering operates below the level of personal integrity and below any honest self-assessment of one’s own sincerity or character. Your comment takes that a step further into where it may actually bottom out: the body.

What you’re describing when you feel someone fritz and shut down— I think an interesting edge is that you can be making a strategic calculation AND a co-regulatory one simultaneously, and which is which may never be fully clear. Not to you, nor them, nor to any outside measurement. To a living organism optimizing for survival across an incentive landscape, the boundary between “I’m being strategic about what I share” and “my nervous system just calibrated to their nervous system’s capacity” might be both computationally irreducible and an intractable story we tell ourselves after the fact.

Certainly worth noticing, pointing at, and also possibly never fully separable. And then look where you land: “realistic and likely scenarios are completely non-existent and as repressed as sexuality in a fundamentalist village.”

Hell of a line.

There’s something almost Crowley-meets-Red-Hot-Chili-Peppers about the whole thing. Blood sugar sex magic. The topics that activate the deepest survival circuitry— death, sex, power, creation— are exactly the topics that get repressed the hardest. The first two chakras. Being alive or being dead. Orgasm and the creation of life. Peak pleasure sitting right next to extinction. Somehow AI lives in that same register now. It feels like doing magic in the old sense, where you draw a containment circle around yourself before you even step into the unknown, because the territory itself is so dangerous. And people’s bodies know this before their minds do, which is why the fritz happens before the thought completes.

The techno-optimism thing you’re pointing at… I think it’s real, and I think it runs on many levels, but I’ll name three. There’s the financial incentive layer, where enormous amounts of money select for optimistic framing. There’s the avoidance layer, where dealing with the grief of what might actually be happening would require a kind of adult reckoning that most people and most institutions are nowhere near ready for. And then there’s the strategic layer— very smart people who understand that humans cling to optimism, that there’s a 3-to-1 negativity bias they can exploit, that people want the Mary Poppins spoonful of sugar to make the medicine go down. They want the hopium. Or as Gen Z pointedly puts it: the copium.

Often the Utopian framing isn’t a conclusion most arrive at through analysis. It’s where the nervous system lands when the alternative is too activating to hold, sometimes dressed up as an intellectual position, and then monetized, or expediently deployed for raising investment capital, or even strategically employed for launching a political movement— even when said movement is ardently against the near-term proliferation of frontier AI.

We’re not Anti-AI; we’re Pro-Humanity. We’re not reacting from Fear; we’re living from Love.

Your “let’s breathe and feel our bodies and talk about nightmare scenarios” class made me laugh, but I’m genuinely not joking when I say something like that might matter more than most of what the AI governance world is producing. Because there’s a bottleneck nobody in the field is naming: you could solve the epistemic inequality problem tomorrow, make all the information freely available, and it wouldn’t change anything if people’s bodies won’t let the signal in.

Where are you at with those grounding questions?

Benjamin Life

Apr 19

This connects a number of threads that I often think about in parallel but rarely braid together like this. I tend to judge the gap between public and private disclosure as a failure of character, but you did a great job making visible the subtler incentives and the social and cognitive dynamics that make possible higher-fidelity interpretations of their words and actions.

Turquoise Sound

Apr 21

That shift from reading the gap as character failure to seeing the structural dynamics underneath is honestly the hardest move the essay asks of the reader. I just responded to Max’s comment talking directly about character and how I think it’s less of a light switch and more of a slow deepening over time. So the two might work in tandem: character maturation and structural incentives aren’t separate explanations competing with each other. They’re intertwined. The structure shapes what even deeply good people are able to see in themselves, and the depth of someone’s self-awareness shapes how much the structure can get away with.

There’s a more formal research framework behind all of this in the works, and in the process of building it I came across something I wasn’t expecting. The structural read doesn’t replace the character read. It just raises the bar on what character actually requires. Because if we decide Dean just lacked courage, we never have to ask why the role itself made honesty incompatible with the job. He becomes the problem and the structure stays invisible. He’s essentially replaceable in that story. Swap in a braver person and the system works. But it wouldn’t. Dario walked away from $200 million on principle and still produced different communication at different circles. That’s not necessarily a courage deficit. Perhaps it’s more architectural.

Or as Liv Boeree’s Substack sub-line puts it: don’t hate the player, change the game.

The fact that you were already holding these threads in parallel but hadn’t braided them together is the thing I’m most interested in doing with regard to this social phenomenon. This essay is the first braid. Think of it as the French braid: fancy but still accessible. The research paper will be the micro-braided fishtail, one massive goddess braid made up of many smaller ones: preference falsification, social penetration theory, dramaturgical sociology, social amplification of risk, organizational silence. If the essay is “here’s the pattern,” the paper is “here’s every mechanism that produces it and why they compound.”

Really appreciate you reading this closely, Benjamin, and saying what you said about making the subtler incentives visible. That encouragement means a lot, genuinely. Excited to keep braiding.

Max

Apr 17

Here's what I'm taking away from this article: in an age where truth and falsehood are harder to differentiate than ever, the most any of us can do is to monitor our own conduct, our own words, and ask if we are acting in alignment with our own values.

While our ability to shape the transparency and availability of critical information is limited, it is within our power to refine our own sense of judgement and discernment so that we can best take action, given what we know. By speaking and acting as truthfully as we can, we manage to keep our eyes, ears, and thoughts clear of our own self-deception, and render ourselves more capable of spotting the (perhaps unintended) deception of others.

When we do this on an individual level, we contribute to the whole, regardless of how small that contribution might be. In my view, that's at least one step we can take to remedy the situation: tell the truth.

Turquoise Sound

Apr 21 (Edited)

This is a really grounding takeaway and I think you’re right that it’s the foundation.

The individual level matters. If we can’t be honest with ourselves about our own filtering, we have no shot at seeing it in others.

Where I found my own insight deepening on this is that “tell the truth” might be necessary but not sufficient. The trap is two-sided. On the outside, we believe what we’re hearing is honest because the person delivering it is sincere. On the inside, the person filtering also experiences what they’re doing as honesty, because at their level of trust, their position, their read on what the moment requires, it genuinely feels like the appropriate level of transparency. It feels sincere. It doesn’t necessarily register as withholding. It often registers as good judgment, balancing the trade-off that more information may risk the roof over your newborn’s head or the fate of all humanity, regardless of that assessment’s accuracy or truth.

And I think this is where character becomes less of a light switch and more of a slow deepening, a maturation process. It’s not that anyone in the piece has “bad character.” It’s that character sharpens over time, usually through big devastating failures, especially the kind that harm people. Those experiences change what we’re able to see in ourselves. They widen the aperture.

So what felt like honest discretion at thirty might look like self-protective filtering at forty-five— not because you became a different person, but because enough living happened that you can see yourself with more resolution. That growing transparency with yourself is what enables greater transparency with others.

That’s essentially what makes the trap a trap.

These aren’t people who would necessarily fail the “am I being honest?” test. They’d likely pass it. The filtering operates beneath the level where individual integrity alone can catch it, because from the inside it doesn’t always feel like filtering. It feels like good judgment. Discretion. Responsibility.

So I think your instinct is exactly right: start with yourself, monitor your own conduct. And then maybe add one more step: ask not just “am I telling the truth?” but “am I telling the truth I’d tell at a different circle?” That second question is the harder one. And it’s the one most of us— myself very much included— would rather not sit with for too long.

Really appreciate you reading and thinking about this as seriously as you have.

Max

Apr 23

Your response makes me think that our circles of speech aren't necessarily a good or bad thing in and of themselves. Instead, it seems like they are only destructive to the extent that they are consciously used to deceive. When someone intentionally omits critical information out of a desire to retain power and prestige, bad things happen. At the same time, there are certain things you don't expose your 5-year-old to, and withholding information is entirely justifiable.

And this is where the "trap" you reference can get very tricky. Language is always curated to one degree or another, and without thinking through which words to say and which words to omit, communication becomes a somewhat random exchange of immediate impulses and reactions, and that can be dangerous when stakes are high enough.

So I agree with you, "telling the truth" in a factual sense doesn't seem like the goal here. Of course that's important, but I think what we're circling back on here is the idea of genuineness/ authenticity/ congruence. I.e. what I say out there in the world is in alignment with my internal state.

And I think the way to resolve the very real issue you're pointing to here is precisely through conversations like this, where both parties are laying out what they genuinely believe and leaving it open for examination. Because, as you illustrate, someone might actually be genuine in their speech, yet we still might completely disagree with their approach/ opinions. But when two genuine people come to the table to talk, then there is the possibility for the exchange of ideas, because (in my opinion) to be genuine is to have a hierarchy of values which prizes wisdom and wellbeing over being right. To be genuine is to be open to being "wrong" about something, not holding too tightly to our beliefs, and having the ability to form new beliefs when presented with new information/ perspectives.

Even in this conversation, we're both working to figure out what we believe to be true. It's an active and ongoing exploration. But that exploration can only happen when both parties are acting from the same intention/ desire: to develop a greater mutual understanding. But when one or more parties come to the table with divisive intentions, what ensues is often just a power struggle.

So if someone is seemingly being genuine, I think that the most we can do is to engage them in genuine dialogue. We can't make people change their minds, nor should we assume that our own ideas are 100% right to begin with, but we can reward genuineness with reciprocity. Disagreement isn't the enemy here, deception and division are. And when it comes to the AI conversation (or any important public conversation for that matter), I think the most important thing we can do is to keep having conversations like this. That's how people come together and slowly yet steadily become empowered to make real change.