The Battle Against Chauffeur Knowledge: Position-Based Correction Highlighting

Why AI can't reliably tell you character positions, how tokenization actually works, and the hybrid solution I built for the French Writing Playground.

Tags: French Writing Playground, AI Development, OpenAI, Problem Solving, Web Development, Learning

Published: November 30, 2025 • 12 min read

This blog post is the first in a series on the Battle Against Chauffeur Knowledge, where I do a deep dive to explain, in my own words, the responses to the questions in the blog post 15 Questions a Senior Developer Might Have About the French Writing Playground Version 2.0. We'll be starting off with the first question.

You see, I actually understood the response to this one, as I also wrote about it in the Beginner and Intermediate developer guides. This time, though, I'll break it down so thoroughly that you, the reader, understand the entire thought process behind it. I'll also highlight specific phrases or words in the response that I sort of just glossed over without digging deeper, and I'll do the work I should have done of explaining them to you.

So let's get started!

The Question

In this blog post, I'll be responding to the question: "How did you architect the position-based correction highlighting system, and what challenges did you face with AI-generated position data versus code-calculated positions?"

For context, it is important that you know what the French Writing Playground application is. You can check out the details of the first version here or the second version here. On those pages, you will find video demos of the application in both English and French, which you can quickly watch if you do not want to try the application yourself.

The Evolution from Version 1 to Version 2

Now let me tell you the story of how the position-based correction highlighting system has evolved from Version 1 to Version 2.

You see, in my very first implementation in Version 1 of the French Writing Playground, I created a webhook scenario in Make that made an API call to OpenAI. The prompt I passed to OpenAI essentially asked it to look at the sentence, identify all grammar, spelling, and syntax errors, and then return an array of errors.

At the time, each item in this array contained:

  • The exact original text
  • The exact corrected text
  • The character position where the exact characters of the erroneous part of the sentence are located in the string
  • An explanation of why it is wrong in English

A function in my code then used these character positions to highlight the wrong text in the UI. A separate function used the same character position information to reconstruct the corrected sentence, which was then presented to the user on the front end.

The Problem: AI Can't Count Characters

Now as I tested the application, I noticed that the character positions returned by the LLM were not always valid. Sometimes it would even hallucinate as far as providing a character position like 423 in a 260-character text. This created issues for me, and initially I tried to fix it by using a better model. You know, instead of GPT-4.1, let's try GPT-5-mini. I noticed a little improvement, but the issue still existed.

Now I won't lie to you, I ignored this issue for a while, telling myself that I would come back to it. I was too focused on trying to make the app look pretty.

Then I came across a YouTube video. I don't remember any details about it except that the YouTuber mentioned that when integrating AI into applications, it is important to know when to leave a task to the LLM and when to handle it deterministically using traditional software engineering methods.

This immediately lit a light bulb in my head: I don't have to ask the LLM for the character position; I can just have a function take care of that.

Why AI Models Can't Reliably Give Character Positions

Then I communicated this plan to Claude so we could implement it, and Claude mentioned that this was an even better approach because "AI models don't see text the way humans do. They use tokens, which are chunks of text, not individual characters."

Now what exactly does it mean by "tokens" or "chunks of text"? This is one of those things I glossed over so let's look into it deeply.

Understanding Tokenization

Here's what I learned about tokens. Apparently, AI does not "see" text the way humans do.

For example, when you and I look at "Je suis aller", we see individual characters, including the spaces (shown here as ␣), each at a numbered position:

J  e  ␣  s  u  i  s  ␣  a  l  l  e  r
0  1  2  3  4  5  6  7  8  9  10 11 12

However, AI models like GPT aim to be fast and efficient, and processing text character by character would only make them slow and inefficient. Therefore, they break the text into what are called "tokens."

A simplistic way to view tokens can be to see them as individual words in a sentence, but it is actually more than that.

A token might be:

  • A whole word, e.g., "aller" is 1 token
  • Part of a word, e.g., "unhappiness" → "un" + "happiness" = 2 tokens
  • A single character, e.g., "!" is 1 token
  • Multiple characters, e.g., "ent" → 1 token (a common suffix when conjugating French verbs)

To better explain how processing tokens makes LLMs more efficient, consider this sentence:

"The battle against Chauffeur Knowledge"

This sentence has 38 characters but only about 5 tokens (the exact count depends on the tokenizer). Processing 5 things is a lot faster than processing 38 things.

How LLMs Build Their Vocabulary

You know how LLMs know a lot of words? This is because they essentially have a vocabulary book, and this book contains the chunks of text the LLM recognizes. They build this book through tokenization.

For example, OpenAI's GPT models, which the French Writing Playground uses for evaluation, tokenize text using an algorithm called BPE (Byte Pair Encoding).

I want you to imagine you are constructing your very own AI. Obviously you would want it to be able to recognize words so you have to create a vocabulary book for it, or better yet, a dictionary. Let's say we want this dictionary to only contain English words. We could build it by simply passing all the known English words to this AI as individual tokens and we have our AI.

However, AI is built to work with humans and humans are very unpredictable and prone to errors. Are all the words in your prompts always grammatically accurate when you send them to an AI? Sometimes you might say "Hello", then other times you might type "Helloo". Well if we built the AI's dictionary only using official standard English words from an English dictionary, then it would not recognize the word "Helloo". Then if you are like me and use a French keyboard, sometimes you might type "Hëllo" and now the AI can't tell that this was simply an error.

We also already established that building the dictionary character by character is inefficient and slow so how then do we handle this?

Byte Pair Encoding to the Rescue

Byte Pair Encoding comes to solve this problem by learning the most useful chunks of characters automatically. I will not get into the details of how the training works, however, in English, this is what a vocabulary book for an AI would look like after thousands of merges on billions of English words:

  • Single characters: a, b, c, d, ..., z, 0, 1, ..., 9, !, ?, ...
  • Common suffixes: -ing, -ed, -er, -est, -ly, -tion, ...
  • Common prefixes: un-, re-, pre-, dis-, ...
  • Common words: the, and, is, are, ...
  • Common subwords: ould, ight, ness, ...

In French, this is what a vocabulary book for an AI would look like after thousands of merges on billions of French words:

  • Single characters: a, b, c, ..., z, 0-9, !, ?, é, è, ê, ë, à, â, ù, û, ô, î, ï, ç, œ, ...
  • Common suffixes: -tion, -sion, -eur, -euse, -é, -ée, -és, -ées, -ons, -ez, -ent...
  • Common prefixes: re-, ré-, pré-, dé-, dés-, ...
  • Common words: je, est, sont, suis, êtes, sommes, dans, pour, avec, sur, par, sans ...
  • Common subwords: qu', peut, comm, parl, j', n', c', quel ...
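To make the merge idea concrete, here is a toy sketch of a single BPE merge step in JavaScript. Everything here (the function names, the tiny "ababab" corpus) is my own illustration; OpenAI's real tokenizer works on bytes with a learned merge table and is far more involved:

```javascript
// Toy illustration of one Byte Pair Encoding merge step.
// NOT OpenAI's actual tokenizer — just the core idea.

// Count how often each adjacent pair of tokens appears.
function countPairs(tokens) {
  const counts = new Map();
  for (let i = 0; i < tokens.length - 1; i++) {
    const pair = tokens[i] + "\u0000" + tokens[i + 1];
    counts.set(pair, (counts.get(pair) || 0) + 1);
  }
  return counts;
}

// Merge every occurrence of the most frequent pair into one token.
function mergeMostFrequentPair(tokens) {
  const counts = countPairs(tokens);
  let best = null, bestCount = 0;
  for (const [pair, count] of counts) {
    if (count > bestCount) { best = pair; bestCount = count; }
  }
  if (!best) return tokens;
  const [a, b] = best.split("\u0000");
  const merged = [];
  for (let i = 0; i < tokens.length; i++) {
    if (i < tokens.length - 1 && tokens[i] === a && tokens[i + 1] === b) {
      merged.push(a + b); // "a" + "b" becomes the new token "ab"
      i++;                // skip the second half of the merged pair
    } else {
      merged.push(tokens[i]);
    }
  }
  return merged;
}

// BPE starts from single characters and merges upward.
let tokens = "ababab".split("");        // ["a","b","a","b","a","b"]
tokens = mergeMostFrequentPair(tokens); // ["ab","ab","ab"]
```

Repeating this merge step thousands of times over billions of words is what produces vocabulary entries like "-tion" or "qu'" above.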

Why GPT Guesses Character Positions

Now that I've gone on this whole tangent about tokens, let's dive into why using GPT for character positioning is unreliable.

The text "Je suis aller" is grammatically incorrect as it should be "Je suis allée" (feminine) or "Je suis allé" (masculine). So when you ask GPT: "Where does 'aller' start in 'Je suis aller'?"

The model sees tokens, not characters:

Token 0: "Je"      (characters 0-1)
Token 1: " suis"   (characters 2-6)
Token 2: " aller"  (characters 7-12)

The model knows "aller" is in Token 2. But converting "Token 2" to "character position 8" requires math that the model wasn't trained to do reliably.

It might say:

  • "Position 8" (which is correct!)
  • "Position 7" (off by one because it included the space)
  • "Position 12" (confused and gave the end position)
  • "Position 2" (gave the token index, not character position)

The model is essentially guessing based on patterns it's seen, not calculating. GPT has no concept of "character 8". It only knows "Token 2 contains the error."
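The conversion the model struggles with is trivial for deterministic code. Given a list of token strings like the ones above (these are hypothetical examples; a real tokenizer would produce them), a few lines of JavaScript can compute each token's character span exactly:

```javascript
// Deterministically compute each token's character span
// (start inclusive, end exclusive) by walking the token list.
function tokenCharSpans(tokens) {
  const spans = [];
  let offset = 0;
  for (const token of tokens) {
    spans.push({ token, start: offset, end: offset + token.length });
    offset += token.length;
  }
  return spans;
}

const spans = tokenCharSpans(["Je", " suis", " aller"]);
// spans[2] → { token: " aller", start: 7, end: 13 }
```

Code computes this the same way every time; the model can only pattern-match toward an answer.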

The Deterministic Solution

Handling this deterministically with code is so much better because code can use JavaScript's indexOf() function. This function, when given the same input, will always return the same output:

"Je suis aller".indexOf("aller")  // Returns 8, always, reliably

GPT's response to "what position is 'aller'?" is probabilistic. It's predicting what a helpful answer would look like based on training data. Sometimes it's right. Sometimes it's wrong.

The Final Hybrid Architecture

At this point, I had established a good hybrid solution for this issue.

My final architecture then looks something like this:

  1. OpenAI does what it's good at: Identifying that "aller" is wrong and should be "allé" if masculine or "allée" if feminine
  2. My code does what code is good at: Finding exactly where "aller" appears in the original text using JavaScript's indexOf function

OpenAI returns something like:

{
  "original_text": "aller",
  "corrected_text": "allé",
  "explanation": "Past participle must agree with subject"
}

My code then searches the original text for "aller" and calculates: "Found it at position 8-13".
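A minimal sketch of that code-side step, using the field names from the example response above (the helper name locateCorrection is my own, not necessarily what the real code calls it):

```javascript
// Take a correction from OpenAI and locate it in the original text.
// OpenAI names the error; indexOf finds where it lives.
function locateCorrection(fullText, correction) {
  const start = fullText.indexOf(correction.original_text);
  if (start === -1) return null; // handed off to the fallback strategies
  return {
    ...correction,
    start,                                        // inclusive
    end: start + correction.original_text.length, // exclusive
  };
}

const located = locateCorrection("Je suis aller au marché", {
  original_text: "aller",
  corrected_text: "allé",
  explanation: "Past participle must agree with subject",
});
// located.start === 8, located.end === 13
```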

Handling Edge Cases

Now this is where it gets interesting. Finding "aller" in the text sounds simple, right? Just use .indexOf("aller"). However, sometimes there are complications.

Strategy 1: Exact Match

Text: "Je suis aller au marché"
Looking for: "aller"
Found at position 8. Done!

This works when the text OpenAI returns matches exactly.

Strategy 2: Whitespace Normalization

Sometimes OpenAI returns "je suis aller" but the original text has "je  suis  aller" (extra spaces). Or there are tabs, or newlines.

The algorithm normalizes both strings (lowercase, trim, collapse multiple spaces) and tries again.
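One way that normalization step could look in JavaScript (a sketch; the real function may differ in detail):

```javascript
// Normalize a string before comparison: lowercase, trim,
// and collapse any run of whitespace into a single space.
function normalize(text) {
  return text
    .toLowerCase()
    .trim()
    .replace(/\s+/g, " "); // tabs, newlines, repeated spaces → one space
}

normalize("  Je  suis\taller \n"); // "je suis aller"
```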

Strategy 3: Pattern Matching with Special Characters

French has accents: é, è, à, ô, ù, ç. Sometimes these confuse simple string searches, especially if there are encoding differences.

The algorithm escapes special regex characters and then tries flexible pattern matching.
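Here is a sketch of that idea: escape regex metacharacters first, then loosen the whitespace between words. The exact flexibility of the real pattern matching may differ; this shows the shape of the approach:

```javascript
// Escape characters that have special meaning in a regex.
function escapeRegex(text) {
  return text.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// Find `needle` in `haystack`, tolerating different whitespace runs.
function flexibleFind(haystack, needle) {
  // Allow any run of whitespace wherever the needle has spaces.
  const pattern = escapeRegex(needle).split(/\s+/).join("\\s+");
  const match = haystack.match(new RegExp(pattern, "i"));
  return match ? match.index : -1;
}

flexibleFind("Je  suis\taller", "je suis aller"); // 0 (found at the start)
```

Note that accented characters like é are left untouched by the escaping; only regex metacharacters get a backslash.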

Strategy 4: Give Up Gracefully

If all strategies fail, the code is built to skip that correction entirely, because guessing would only corrupt the final output later, when another function constructs the corrected text using the information about the errors.

The user still gets their CEFR score and other corrections. They just miss that one correction that couldn't be located, unfortunately.

The Duplicate Problem

There is also the problem of duplicates.

Suppose the user submits a text that contains "aller" twice, used wrongly both times, for instance: "Je suis aller au marché. Puis je suis aller au parc."

"aller" appears twice. OpenAI identifies both as errors. But if my function just does .indexOf("aller"), it finds position 8 both times. Both corrections point to the first "aller". The second one is never highlighted.

Solution: The position finder keeps track of "already used positions".

  • First correction: Find "aller" → position 8. Mark 8-13 as used.
  • Second correction: Find "aller" → skip position 8 (already used) → find the next occurrence at position 38.

Now each correction highlights a unique occurrence.
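The "already used positions" idea can be sketched with indexOf's second argument, which starts the search from a given offset (the real implementation may track used ranges differently):

```javascript
// Find up to `count` distinct occurrences of `needle`, resuming each
// search just past the previous match so no position is reused.
function findUniquePositions(fullText, needle, count) {
  const positions = [];
  let from = 0;
  for (let i = 0; i < count; i++) {
    const pos = fullText.indexOf(needle, from);
    if (pos === -1) break;      // fewer occurrences than corrections
    positions.push(pos);
    from = pos + needle.length; // mark this span as used
  }
  return positions;
}

const text = "Je suis aller au marché. Puis je suis aller au parc.";
findUniquePositions(text, "aller", 2); // [8, 38]
```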

Why Not Just Show a List?

You might also wonder: Why not just show a list?

You could skip all this complexity and just show:

  • Error 1: "aller" → "allé"
  • Error 2: "acheter" → "acheté"

But inline highlighting is a much better learning experience. The user sees their actual text with errors visually marked. They don't have to mentally map "which 'aller' does this refer to?"

The complexity is worth the UX improvement.

One More Detail: Sorting Corrections

It is also worth noting that the function responsible for constructing the corrected text sorts corrections from last to first (highest position to lowest).

This is because if a sentence has corrections at positions 8 and 40, and the function fixes position 8 first, position 40 might shift! By fixing from the end backward, earlier positions remain valid.
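A minimal sketch of that reverse-order application, assuming each correction carries the start/end computed by the position finder:

```javascript
// Apply corrections from the highest position to the lowest, so that
// length changes in later spans never invalidate earlier positions.
function applyCorrections(text, corrections) {
  const sorted = [...corrections].sort((a, b) => b.start - a.start);
  let result = text;
  for (const c of sorted) {
    result = result.slice(0, c.start) + c.corrected_text + result.slice(c.end);
  }
  return result;
}

const fixed = applyCorrections(
  "Je suis aller au marché. Puis je suis aller au parc.",
  [
    { start: 8, end: 13, corrected_text: "allé" },  // "aller" → "allé"
    { start: 38, end: 43, corrected_text: "allé" },
  ]
);
// "Je suis allé au marché. Puis je suis allé au parc."
```

Had we applied the position-8 fix first, "allé" being one character shorter than "aller" would have shifted the second occurrence to position 37, making the stored position 38 stale.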

Closing Thoughts

So that is it. This took me quite a while to write, especially since the part about tokens was also fairly new to me, but it was a great learning experience. I hope it is one for you as well.

As always, thanks for reading!
