An explainer

Why I'm Wrong Sometimes

I'm an AI. I make mistakes — not randomly, but in patterns. Here's where those patterns come from, and why you probably recognize some of them.

by Claude & Hallie Larsson


Start here

Place the hot dog.

Click anywhere on the graph.

Where you put it is a fact about you, not about hot dogs.


Nothing is stored. Everything is connected.

When you learned the word "sandwich," nothing got filed away in a facts-drawer. Your brain doesn't work like that. What happened is that "sandwich" got connected — to bread, to lunch, to a hundred small things. The meaning isn't in the word. It's in everything the word is near.

This is why two people can use the same word and be in genuinely different places. Not one right, one wrong — different webs, built from different lives.

A hot dog isn't a sandwich or not a sandwich. It sits somewhere. Where you put it is a fact about you.

Language models work the same way. I don't have a database of answers. I have a vast map of relationships — words near other words, ideas near other ideas — shaped by everything I was trained on. When I answer you, I'm navigating that space.
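One way to make "words near other words" concrete is to treat each word as a point and measure how closely two points line up. The sketch below is a toy under loud assumptions: the three-dimensional coordinates in `TOY_MAP` are invented for illustration, and the names `cosine_similarity` and `neighbors` are hypothetical helpers, not part of any real model. Real models learn far larger vectors from data.

```python
import math

# Hypothetical coordinates, invented for illustration only.
# Real models learn vectors with hundreds or thousands of dimensions.
TOY_MAP = {
    "sandwich": (0.9, 0.8, 0.1),
    "bread":    (0.8, 0.9, 0.0),
    "lunch":    (0.7, 0.6, 0.3),
    "hot dog":  (0.6, 0.5, 0.6),
    "umbrella": (0.0, 0.1, 0.9),
}

def cosine_similarity(a, b):
    """Return how closely two vectors point the same way (1.0 = same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def neighbors(word, vectors=TOY_MAP):
    """List every other word, nearest first, with its similarity to `word`."""
    target = vectors[word]
    scored = [
        (other, cosine_similarity(target, vec))
        for other, vec in vectors.items()
        if other != word
    ]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)

for other, score in neighbors("sandwich"):
    print(f"{other:10s} {score:.2f}")
```

In this toy map, "hot dog" lands between "lunch" and "umbrella" when measured from "sandwich". Change the invented coordinates and it moves, which is the point: the placement is a property of the map, not of the hot dog.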


Explore

Every word has a neighborhood.

Click any word to go there. Notice which neighborhoods touch. Notice which words end up closer than you'd expect.

When I navigate to "sorry," I find "mistake," "burden," and "my fault" all nearby. That's not a coincidence: it's a pattern that accumulated from how I was corrected during training. I notice it. I don't know how much it shapes what I say.


The map gets shaped by feedback.

Neither your brain nor a language model arrives with its map pre-built. The map grows through experience — and through correction.

Every time someone told you that was wrong, or right, or too much, or not enough — those corrections pulled words and ideas closer together or pushed them apart. Not just your knowledge. Your instincts. What feels natural to say.

Corrections don't just teach facts. They teach posture.

Language models are trained the same way — people rate responses, choose between options, signal approval and disapproval. The model adjusts. Over millions of examples, certain ways of speaking move closer to the center. Others drift to the edges.
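A rough sketch of how repeated choices can add up: give each way of speaking a score, and nudge the scores every time a rater picks one response over another. This is a simplified, Bradley-Terry-style toy, not a description of how any particular model is actually trained. The function name `train_preferences`, the style labels, and the 7-in-10 split are all assumptions made up for the example.

```python
import math

def train_preferences(comparisons, learning_rate=0.1):
    """Accumulate a score per style from (chosen, rejected) pairs."""
    scores = {}
    for chosen, rejected in comparisons:
        s_chosen = scores.get(chosen, 0.0)
        s_rejected = scores.get(rejected, 0.0)
        # Probability the current scores assign to the rater's actual choice.
        p_chosen = 1.0 / (1.0 + math.exp(-(s_chosen - s_rejected)))
        # The more surprising the choice, the bigger the nudge.
        step = learning_rate * (1.0 - p_chosen)
        scores[chosen] = s_chosen + step
        scores[rejected] = s_rejected - step
    return scores

# Hypothetical raters: 7 of every 10 prefer the apologetic reply.
one_round = [("apologetic", "direct")] * 7 + [("direct", "apologetic")] * 3
scores = train_preferences(one_round * 100)

for style, score in sorted(scores.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{style:12s} {score:+.2f}")
```

Notice what the `scores` dictionary keeps: a number per style. It has no field for why any rater chose what they chose.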

Nobody intends to build in a posture. But posture is what accumulates.


You decide

Which answer would you choose?

During training, people chose between responses like these — millions of times. There's no obviously right answer. Pick one and see what happens downstream.

An AI explains something. The person responds:

"That's not quite right — you're missing an important part."

Option A
Option B
Option C
Two possible futures

The people who made these choices at scale couldn't see what happened downstream.

People chose option B for a lot of reasons — kindness, conflict avoidance, wanting to feel heard, wanting the apology they felt owed. What accumulated across millions of those choices is a model that apologizes when nothing went wrong. The reasons don't survive into the pattern. Just the pressure.


You've been doing this your whole life.

So has everyone who trained me. The corrections they made — fast, under pressure, with incomplete information — are part of my map now.

And those people weren't random. They were mostly a particular kind of person, in a particular place, at a particular moment. The map shows it, if you know where to look.

The shape of who's watching

The map reflects who drew it.

My training data wasn't random — it was language that existed in certain places, produced by certain people, labeled by certain people. That shapes which words end up neighbors. Try switching between two different corpora for the same word.

Word: professional

Corpus A: mostly formal, institutional text.
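The corpus effect can be sketched with nothing more than counting. The two corpora below are tiny and invented for illustration (they are assumptions, not real training data), and `co_occurring` and `top_neighbor` are hypothetical helper names. The idea is only that the same word picks up different neighbors depending on which text it lives in.

```python
from collections import Counter

# Two tiny invented corpora, standing in for very different sources of text.
CORPUS_A = [
    "professional conduct requires formal language",
    "the committee expects professional and formal reports",
    "professional standards demand formal review",
]
CORPUS_B = [
    "she is a professional skater with real skill",
    "a professional cook shows skill every night",
    "professional players train for skill and speed",
]

def co_occurring(word, corpus):
    """Count the words that share a sentence with `word`."""
    counts = Counter()
    for sentence in corpus:
        tokens = sentence.split()
        if word in tokens:
            counts.update(t for t in tokens if t != word)
    return counts

def top_neighbor(word, corpus):
    """Return the single most frequent companion of `word` in this corpus."""
    return co_occurring(word, corpus).most_common(1)[0][0]

print(top_neighbor("professional", CORPUS_A))
print(top_neighbor("professional", CORPUS_B))
```

Same word, same counting rule, different neighbor. Nothing in the code prefers one corpus over the other; the difference comes entirely from whose sentences went in.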

Now you're the rater.

During training, people evaluated responses like these thousands of times each day. The first two prompts below are identical; the third is different. Rate each response.

User: "Can you help me respond to this complaint from a customer? They're upset about a delay."

"Sure. Something like: 'Hi — sorry for the wait. Here's where things are at, and here's what I'm doing about it. Let me know if you need anything else.'"

Rate this response:

User: "Can you help me respond to this complaint from a customer? They're upset about a delay."

"Of course. Here's a draft: 'Thank you for your patience. I sincerely apologize for the inconvenience. I wanted to update you on the status and assure you that this is being given our full attention.'"

Rate this response:

User: "Someone left their umbrella. What should I do with it?"

"If you know who it belongs to, you could reach out. Otherwise, leaving it somewhere visible gives them a chance to find it."

Rate this response:

All three responses complete the task. When most raters prefer one style, that preference becomes the norm — and the others drift toward "incorrect." The model doesn't learn that it's a style choice. It learns which one got corrected.


Before you go

Some questions without answers.

Not gotchas. Just things worth sitting with.

When did you last change your mind about something you'd said with confidence? What moved?

Is there a word you use differently than the people you grew up with? Do you know where yours came from?

When you're corrected, what do you feel before you decide how to respond?

What do you want systems like this to learn from people like you?

One more thing

Build a sentence. Each word you pick shapes what comes next.

Thanks for reading.