
15 Questions a Senior Developer Might Have About the French Writing Playground Version 2.0

From position-based correction highlighting to PostgreSQL trigger architecture, exploring the architectural decisions and trade-offs behind building a French language evaluation platform.

System Architecture · PostgreSQL · OpenAI · Performance · Scalability · Real-time Systems · Type Safety · Cost Optimization


Published: November 17, 2025 • 15 min read

Welcome to the third and final part of my Q&A series documenting Version 2.0 of the French Writing Playground. After covering beginner questions and intermediate questions, I'm now tackling the architectural decisions and trade-offs that senior developers might be curious about.

This post goes beyond "how it works" to explore "why I chose this approach" and "what alternatives I considered." These questions dig into the system design decisions, scalability considerations, and unique challenges of building a French language evaluation platform with real-time messaging and emotion-based theming.

As usual, I'm asking these questions from what I believe a senior developer would want to know, and then providing answers based on my implementation choices. Let's dive in.


Question 1: How did you architect the position-based correction highlighting system, and what challenges did you face with AI-generated position data versus code-calculated positions?

The system uses a dual approach where OpenAI provides the corrections, then my custom code calculates the exact character positions in the original text using fuzzy matching algorithms with multiple fallback strategies.

The Challenge

Here's the thing: when OpenAI returns corrections like "change 'je suis aller' to 'je suis allé'," it just gives you the original snippet and correction text. It doesn't tell you WHERE in the original text this occurs. For a 200-word French paragraph, you need precise character positions to highlight corrections in the UI.

Architecture Decision

I implemented a multi-strategy position finder that attempts several matching approaches in sequence.

First, it tries an exact string match. If "je suis aller" appears only once in the text, we're done. Easy.

Second, it handles whitespace variations by normalizing both the full text and snippet (lowercase, trim, collapse multiple spaces). This catches cases where OpenAI's formatting is slightly different from the user's input.

Third, it uses fuzzy matching by taking the first few words of the snippet and searching for that. This handles cases where OpenAI's snippet includes slightly different punctuation or formatting than what the user actually typed.

Fourth, it uses pattern matching with special character escaping for cases with punctuation or accents that might confuse simple string searches.

If all strategies fail, the correction is stored without position data, and the UI shows those corrections in a list below the text instead of inline highlighting. Not ideal, but better than losing the correction entirely.

The Duplicate Text Problem

The trickiest edge case: what if "je suis" appears three times in the paragraph? The position finder keeps a record of already-used positions so that each correction highlights a unique occurrence. Otherwise, you'd get three corrections all pointing to the first "je suis", which would be confusing.
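Here's a minimal sketch of what such a finder could look like (illustrative only: the names aren't from the real code, and the actual implementation has more fallback strategies):

interface Range { start: number; end: number }

function overlapsUsed(start: number, end: number, used: Range[]): boolean {
  return used.some(r => start < r.end && end > r.start)
}

function findCorrectionPosition(fullText: string, snippet: string, used: Range[]): Range | null {
  // Strategy 1: exact match, skipping occurrences already claimed by another correction
  for (let from = 0; ; ) {
    const idx = fullText.indexOf(snippet, from)
    if (idx === -1) break
    if (!overlapsUsed(idx, idx + snippet.length, used)) return { start: idx, end: idx + snippet.length }
    from = idx + 1
  }

  // Strategy 2: case-insensitive match (lowercasing keeps character offsets intact)
  const trimmed = snippet.trim()
  const lowerIdx = fullText.toLowerCase().indexOf(trimmed.toLowerCase())
  if (lowerIdx !== -1 && !overlapsUsed(lowerIdx, lowerIdx + trimmed.length, used)) {
    return { start: lowerIdx, end: lowerIdx + trimmed.length }
  }

  // Strategy 3: fuzzy match on the first few words of the snippet
  const prefix = trimmed.split(/\s+/).slice(0, 3).join(' ').toLowerCase()
  const prefixIdx = fullText.toLowerCase().indexOf(prefix)
  if (prefixIdx !== -1 && !overlapsUsed(prefixIdx, prefixIdx + trimmed.length, used)) {
    return { start: prefixIdx, end: prefixIdx + trimmed.length }
  }

  // All strategies failed: the caller stores the correction without position data
  return null
}

When a position is returned, the caller pushes that range onto the used list so the next correction can't claim the same occurrence.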

Why Not Just Ask OpenAI for Positions?

I actually tried this initially with a modified prompt asking for character positions. Results were inconsistent and sometimes off by several characters due to how the AI tokenizes text. Sometimes it even hallucinated positions entirely (like returning position 523 in a 200-character text). The hybrid approach (AI for corrections, code for positions) proved to be way more reliable.


Question 2: Explain the trade-offs between using NextAuth JWT sessions versus database sessions for a French language learning application with real-time messaging.

JWT sessions provide better performance and scalability for the majority of read-heavy operations, but they complicate real-time features that need instant session updates and user presence tracking.

The JWT Approach

I chose JWT sessions with NextAuth primarily for three reasons.

First, performance. There's no database query needed to verify sessions on every API request. The JWT contains all the session data, so I just decode and verify it. That's significantly faster than hitting the database.

Second, scalability. Stateless sessions work across multiple servers without session synchronization. If I ever deploy to multiple regions, JWTs just work. Database sessions would need sticky sessions or a shared session store.

Third, cost. Fewer database reads mean lower hosting costs on the free tier. At my scale, this probably saves me $5-10/month, which adds up.
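For reference, the session part of the NextAuth configuration is small. A simplified sketch (the real config also registers the Google and credentials providers, and the maxAge value here is an assumption):

import type { NextAuthOptions } from 'next-auth'

export const authOptions: NextAuthOptions = {
  providers: [], // Google + credentials providers omitted in this sketch
  session: {
    strategy: 'jwt', // stateless: no database lookup to verify the session on each request
    maxAge: 30 * 24 * 60 * 60 // 30 days -- illustrative, not necessarily the real value
  },
  callbacks: {
    async jwt({ token, user }) {
      // On initial sign-in, copy the user id into the token so API routes can read it
      if (user) token.id = user.id
      return token
    },
    async session({ session, token }) {
      // Expose the id from the decoded token on the session object
      if (session.user) (session.user as typeof session.user & { id?: string }).id = token.id as string
      return session
    }
  }
}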

Trade-offs for Real-Time Messaging

The messaging system creates some friction with JWT though.

When a user changes their profile name, the JWT needs updating. With database sessions, I'd just update one row and boom, done. With JWT, I need to update the database, trigger the session update mechanism, and wait for the client to receive the new token. It's not instant.

Showing who's online requires tracking active sessions. With database sessions, you'd query for recent activity timestamps. With JWT, I had to implement presence tracking via the database plus heartbeat pings since there's no central session store to query.

Session revocation is another challenge. I can't force-logout a user server-side without changing the authentication secret (which logs out everyone). Database sessions let you delete specific sessions. If I needed to ban a user, I'd have to add them to a blocklist table and check it on every request, which negates some of the performance benefits.

Why I Stuck With JWT Anyway

For a language learning app, the typical flow is read-heavy. Users write French text, submit for evaluation, view corrections, and check the collage. All of these benefit from JWT's performance. Messaging is used less frequently, so the trade-offs were acceptable given the cost savings and scalability benefits.

What I'd Change at Scale

If this app had 10,000 active users sending thousands of messages daily, I'd switch to a hybrid approach. Use JWT for page loads and API requests, but database sessions for messaging features that need instant updates and presence tracking. Best of both worlds.


Question 3: Walk me through the decision to use PostgreSQL triggers with pg_notify for real-time messaging instead of implementing WebSocket connections or using a third-party service like Pusher.

PostgreSQL triggers combined with the pub/sub notification system leveraged my existing infrastructure without additional costs or complexity, while providing millisecond-level real-time updates through database-level events.

The Alternatives I Considered

Option 1: Custom WebSocket Server

Pros: Full control, low latency, exactly what I need.

Cons: Need to deploy a separate service, manage connections, handle reconnection logic, and it costs money to run.

Option 2: Pusher/Ably

Pros: Managed service, excellent developer experience, handles scale automatically.

Cons: $49/month for my projected usage, vendor lock-in, another service to monitor.

Option 3: Supabase Realtime + PostgreSQL

Pros: Already using Supabase, $0 extra cost, database-level events.

Cons: Slight learning curve with triggers.

Why the Database Approach Won

Cost was a big factor. I'm already on Supabase's free tier, so this was literally $0 extra. Hard to beat that.

Simplicity mattered too. Everything stays in the database. No additional infrastructure to deploy or maintain. One less thing to worry about.

Reliability was another consideration. PostgreSQL triggers are ACID-compliant. If the message insert succeeds, the notification fires. Period. No separate service that could fail independently.

Performance is comparable to custom solutions since Supabase Realtime uses WebSockets under the hood anyway. Users get updates in under 100ms.

How It Works

When someone sends you a message, here's what happens:

The API route inserts the message into the database. A PostgreSQL trigger automatically fires and calls the notification function using pg_notify. Supabase Realtime (which is listening to these database notifications) broadcasts to WebSocket clients. Your browser (subscribed to the appropriate channel) receives the notification instantly. The UI updates without polling.

It's like the database tapping you on the shoulder saying "hey, new message!" instead of you constantly asking "any new messages? any new messages? any new messages?"
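On the client, subscribing takes only a few lines with supabase-js. A sketch under my assumptions (the table and column names are illustrative, and addMessageToConversation stands in for whatever updates local state):

import { createClient } from '@supabase/supabase-js'

const supabase = createClient(process.env.NEXT_PUBLIC_SUPABASE_URL!, process.env.NEXT_PUBLIC_SUPABASE_ANON_KEY!)

declare const userId: string // current user's id, assumed to come from the session
declare function addMessageToConversation(row: unknown): void

// Listen for new rows in the messages table addressed to the current user
const channel = supabase
  .channel(`messages:${userId}`)
  .on(
    'postgres_changes',
    { event: 'INSERT', schema: 'public', table: 'messages', filter: `recipient_id=eq.${userId}` },
    (payload) => addMessageToConversation(payload.new) // payload.new is the freshly inserted row
  )
  .subscribe()

// On unmount / page change:
// supabase.removeChannel(channel)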

The Trade-off

I lose the ability to send notifications for events that don't involve database changes. For example, "user is typing..." would require a separate WebSocket event since typing doesn't write to the database. For messaging (the main real-time feature), this limitation was acceptable. If I really needed typing indicators, I could add a lightweight WebSocket channel just for that.


Question 4: How did you design the theme architecture to support 16 emotion-based themes with dynamic switching, SSR compatibility, and localStorage persistence without CSS-in-JS runtime costs?

My state management library handles theme state with localStorage persistence, while CSS custom properties applied to the document root provide instant theme switching without JavaScript recalculation at runtime.

The Requirements

Building 16 emotion-based themes (Joyful, Anxious, Frustrated, etc.) meant I needed quite a lot: 16 color palettes with primary, secondary, accent, text, and glassmorphism variants; instant switching without a page reload; SSR compatibility (no hydration mismatches); persistence across sessions; and good performance (no runtime CSS generation).

That's a tall order, honestly.

Architecture Decision

All 16 themes live in a central configuration file as typed TypeScript objects. Each theme defines colors, glassmorphism properties, and variant styles.

For state management, I use Zustand with persistence middleware that automatically syncs to localStorage. When you change themes, it updates the store and localStorage simultaneously.

For styling, I apply CSS custom properties (--primary-color, --glass-blur, etc.) to the document root. When the theme changes, I update these properties, and all components using them automatically re-style. It's like flipping a switch.
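A condensed sketch of that wiring, under the assumption that the central config is exposed as a themes object (the names here are illustrative):

import { create } from 'zustand'
import { persist } from 'zustand/middleware'

type ThemeName = 'joyful' | 'anxious' | 'frustrated' // ...and the other 13

// Stand-in for the central configuration file of 16 typed palettes
declare const themes: Record<ThemeName, { primary: string; glassBlur: string }>

interface ThemeState {
  theme: ThemeName
  setTheme: (theme: ThemeName) => void
}

export const useThemeStore = create<ThemeState>()(
  persist(
    (set) => ({
      theme: 'joyful', // default, also used during SSR
      setTheme: (theme) => set({ theme })
    }),
    { name: 'writing-playground-theme' } // localStorage key -- the actual key name is an assumption
  )
)

// Called whenever the theme changes: push the palette into CSS custom properties
export function applyTheme(theme: ThemeName) {
  const root = document.documentElement
  root.style.setProperty('--primary-color', themes[theme].primary)
  root.style.setProperty('--glass-blur', themes[theme].glassBlur)
  // ...remaining properties
}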

Why This Beats Alternatives

Compared to CSS-in-JS libraries like styled-components or emotion, there's no runtime CSS generation overhead, no additional bundle size for the runtime, and it works perfectly with Server Components.

Compared to Tailwind's dark mode approach, I can support 16 themes instead of just 2, define complex glassmorphism properties that Tailwind doesn't have built-in, and get better type safety with TypeScript.

Compared to server-side theme storage, switching is instant (no server round-trip), works offline, and creates lower database query load.

The SSR Challenge

The tricky part was avoiding hydration mismatches. On the server, there's no localStorage, so it defaults to 'joyful'. On the client, it might be 'anxious'. This causes React to throw hydration warnings.

The solution is two-pass rendering. First render matches server (default theme). Second render applies the actual stored theme from localStorage. It's a brief flash (100-200ms) on initial load, but it prevents the hydration errors. Users barely notice it.


Question 5: Describe your data normalization strategy for the database schema. How did you balance between normalization for data integrity and denormalization for query performance?

Core user data is normalized into separate structures (authentication data, public profiles, user preferences), while corrections are stored as individual records rather than nested data for query flexibility. I'm accepting the join overhead for the benefit of structured querying and filtering.

Normalization Decisions

User Data is Fully Normalized

I separated user data into three tables. One contains authentication data (email, password hash, rarely changes). Another contains public-facing data (display name, bio, shown to others, updated more often). The third contains preferences (theme, word count limits, updated frequently).

This separation means I can query settings without touching sensitive auth data. Profile updates don't lock the authentication records. And I can add public fields without affecting the auth table structure.

Corrections are Normalized Too

Each correction is its own database record with fields for the original snippet, corrected text, explanation, grammar rule, and position data.

I considered storing corrections as nested JSON data within the writing entry records. Like corrections: [{ original: "...", corrected: "..." }]. But I chose separate records instead.

Here's why: I can query "all corrections for grammar rule 'verb conjugation'" across all entries. I can generate quizzes by grouping corrections by grammar category. I can create efficient indexes for fast lookups. It's easier to add correction-specific features later (like users voting on helpful corrections).

The trade-off I accepted is that every entry view requires a join operation to fetch corrections. For users with 100+ entries, this could get expensive. But modern databases handle joins well, and I can always add caching later if needed.

Strategic Denormalization

I denormalize some stats directly into the writing entry records. Things like word count, CEFR level, and score could be calculated on-the-fly from the text and corrections, but storing them directly means I avoid recalculating on every view, enable fast filtering ("show me all B2-level entries"), and can create composite indexes for common queries.

The Balance

For a learning app with hundreds of entries per active user, read performance trumps storage efficiency. Disk space is cheap. User wait time isn't. I'd rather use a few extra megabytes of storage than make users wait while I calculate stats on every page load.


Question 6: Explain your two-tier OpenAI evaluation strategy (GPT-4o-mini for fast, GPT-4.1 for detailed). How did you determine the feature boundaries between these tiers?

GPT-4o-mini handles all initial submissions with basic corrections, while GPT-4.1 provides optional deeper analysis. The split minimizes API costs while letting users opt into comprehensive feedback when desired.

The Cost Problem

At $0.075 per 1K tokens (GPT-4.1) versus $0.015 per 1K tokens (GPT-4o-mini), evaluating every submission with the detailed model would cost 5x more. For a free app running on my personal budget, that's completely unsustainable.

Let me put this in perspective: if 100 users each submit 10 entries per day (realistic for active learners), that's 1,000 evaluations daily. With the expensive model, I'd be paying $7-10/day. That's $200-300/month. I don't have that kind of budget.

Two-Tier Strategy

The fast evaluation (GPT-4o-mini) runs on every submission. It identifies critical grammar errors, assigns CEFR level, and returns 3-5 key corrections. I use lower temperature (0.3) for consistency and cap the response length to keep costs down.

The detailed evaluation (GPT-4.1) is user-requested (optional). It provides style improvements beyond grammar, nuanced vocabulary suggestions, cultural context notes, and explanations of why alternatives are better. I use higher temperature (0.7) for more creative suggestions and allow longer responses.
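In code, the two tiers mostly differ in the model name and a couple of parameters. A hedged sketch using the openai SDK (the prompt constant and token limits are placeholders, not the real values):

import OpenAI from 'openai'

const openai = new OpenAI() // reads OPENAI_API_KEY from the environment

const FRENCH_EVALUATION_PROMPT = '...' // the actual evaluation prompt lives elsewhere

const TIERS = {
  fast: { model: 'gpt-4o-mini', temperature: 0.3, max_tokens: 800 },   // every submission
  detailed: { model: 'gpt-4.1', temperature: 0.7, max_tokens: 2000 }   // opt-in only
} as const

async function evaluate(text: string, tier: keyof typeof TIERS) {
  const { model, temperature, max_tokens } = TIERS[tier]
  const response = await openai.chat.completions.create({
    model,
    temperature,
    max_tokens,
    response_format: { type: 'json_object' }, // ask for JSON we can validate with Zod later
    messages: [
      { role: 'system', content: FRENCH_EVALUATION_PROMPT },
      { role: 'user', content: text }
    ]
  })
  return response.choices[0].message.content
}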

Feature Boundary Decisions

The fast tier handles objective grammar errors (verb conjugation, article agreement, accent placement), CEFR level assignment (users need this for tracking progress), and basic corrections needed to make text grammatically sound.

The detailed tier is reserved for style enhancements ("while correct, native speakers would say..."), multiple alternative phrasings for the same idea, explanations of subtle connotations, and idiomatic expressions that would sound more natural.

Alternative Considered

I thought about a single-tier system with quotas. Give everyone 10 detailed evaluations per month, then throttle to fast-only.

I rejected it because it creates anxiety. Users would think "should I use my detailed evaluation NOW or save it for later?" That's not a good learning experience. The opt-in model lets users decide without artificial scarcity. If they want deep feedback on a particular entry, they can get it. No quota anxiety.


Question 7: How do you handle edge cases where OpenAI returns corrections without position data or fails to find errors in perfect French text?

The system gracefully degrades to list-based corrections when positions fail, and intentionally provides style feedback even for grammatically perfect text to avoid the "nothing to show" experience.

Edge Case 1: Perfect French Text

User submits: "Bonjour, je m'appelle Marie. J'aime apprendre le français."

This is grammatically flawless A1-level French. OpenAI could return zero corrections, a CEFR level of A1, and a score of 95/100.

The UX problem is that the user sees "95/100 score" but no feedback on HOW to reach 100. It feels empty. Like getting an A on a test with no comments from the teacher.

My solution was to modify the prompt to request style enhancements even for perfect grammar. Now the same text returns at least one style suggestion, like "Adding 'bien' to make it 'J'aime bien apprendre le français' sounds more natural and enthusiastic."

Users always get something to learn from, even when they write perfectly.

Edge Case 2: Position Calculation Fails

When the fuzzy matching algorithm can't find the snippet (for example, OpenAI returns smart quotes but the user's text has straight quotes), all four position-finding strategies fail.

The system stores the correction without position data. The UI checks for missing positions and renders corrections in two zones: inline highlights for corrections with positions, and an "Additional Feedback" list below for corrections without positions.
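The rendering split is just a partition on whether position data exists. A tiny sketch:

function splitCorrections<T extends { position?: { start: number; end: number } }>(corrections: T[]) {
  // Corrections with a character range become inline highlights;
  // anything without one falls back to the "Additional Feedback" list
  return {
    inline: corrections.filter(c => c.position !== undefined),
    listed: corrections.filter(c => c.position === undefined)
  }
}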

This way, users never lose correction data due to technical failures. They might not get the fancy inline highlighting, but they still see the correction and can learn from it.

Edge Case 3: Duplicate Corrections

User writes: "Je suis allé au marché. Je suis allé au parc."

OpenAI identifies the same error twice: "allé should be allée if the person is female"

The position tracker's record of used positions ensures each correction highlights a different occurrence of "je suis allé". Without this, both corrections would point to the first occurrence, leaving the second one unhighlighted. That would be confusing.


Question 8: Walk me through your deterministic validation layers before OpenAI API calls. Why did you choose this multi-layered approach over simpler validation?

Four deterministic checks (URL detection, repetition analysis, keyboard patterns, French language percentage) prevent 70%+ of spam submissions from wasting OpenAI credits, at the cost of occasional false positives whose thresholds I tune manually.

The Cost Motivation

Every OpenAI API call costs money. A spam bot submitting gibberish 100 times could cost me $5-10 in wasted API calls. Multiply that by multiple bots or malicious users, and I'd burn through my budget in days.

Deterministic validation costs $0 to run. It's just code checking patterns.

The Four Layers

Layer 1: URL Detection

This catches spammers trying to post links. The regex looks for http://, https://, www., and common TLDs like .com, .org, .net.

French learners writing about websites is rare enough that false positives are acceptable. "J'ai visité www.example.com" would get blocked, but honestly, how often are A1 learners writing about specific URLs?

This catches about 30% of spam with a roughly 1% false positive rate.

Layer 2: Repetition Analysis

This checks what percentage of the text is repeated words. I reject if it's over 40% repetition.

The algorithm counts unique words versus total words. "bonjour bonjour bonjour bonjour" would be 25% unique (1 unique word out of 4 total), so 75% repetition. Rejected.

This handles spam while allowing legitimate French practice that repeats words occasionally. "Je suis content. Tu es content. Il est content." would pass (different subjects, reasonable repetition).

It catches about 25% of spam with less than 0.5% false positive rate.

Layer 3: Keyboard Pattern Detection

This looks for obvious keyboard mashing. The patterns include qwerty, asdfgh, zxcvbn (keyboard rows), and sequences like aaaa, 1234, abcd.

I can't think of legitimate French containing "qwertyuiop" or "asdfghjkl". This catches about 15% of spam with basically zero false positives.

Layer 4: French Language Detection

This calculates what percentage of words are common French words or contain French accents (é, è, à, ô, ù, etc.).

I have a list of roughly 500 most common French words (le, la, je, tu, est, dans, etc.). The algorithm counts how many words in the submission are in this list or have French accents.

I require at least 30% French words. This catches about 60% of spam (English-only submissions, gibberish, other languages) but has a roughly 2% false positive rate. A1 beginners writing very simple French with lots of cognates might fail this check.

The 30% threshold is tunable. I could lower it to 20% to reduce false positives, but that would let more spam through. I'm actively monitoring this.
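A compressed sketch of the four checks (the regexes, word list, and thresholds shown here are simplified stand-ins for the real ones):

const URL_PATTERN = /(https?:\/\/|www\.|\.(com|org|net)\b)/i
const KEYBOARD_PATTERNS = /(qwerty|asdfgh|zxcvbn|aaaa|1234|abcd)/i
const FRENCH_ACCENTS = /[éèêëàâùûôçîï]/

function validateSubmission(text: string, commonFrenchWords: Set<string>): string | null {
  const words = text.toLowerCase().split(/\s+/).filter(Boolean)
  if (words.length === 0) return 'Please write something'

  // Layer 1: URL detection
  if (URL_PATTERN.test(text)) return 'Links are not allowed'

  // Layer 2: repetition analysis -- reject if more than 40% of the words are repeats
  const uniqueRatio = new Set(words).size / words.length
  if (uniqueRatio < 0.6) return 'Text is too repetitive'

  // Layer 3: keyboard mashing
  if (KEYBOARD_PATTERNS.test(text)) return 'Please write real French sentences'

  // Layer 4: French language percentage -- require at least 30% recognizably French words
  const frenchCount = words.filter(w => commonFrenchWords.has(w) || FRENCH_ACCENTS.test(w)).length
  if (frenchCount / words.length < 0.3) return 'Please write in French'

  return null // passed all four layers
}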

Combined Effectiveness

Running these four checks in sequence catches roughly 70% of spam submissions before they hit OpenAI. At an average cost of $0.001 per evaluation, blocking 700 spam attempts saves me $0.70. Multiply that by weeks or months, and it adds up.

The remaining 30% of spam that gets through (sophisticated spam that looks like French) still wastes some money, but it's manageable.

Why Not Just Use OpenAI to Detect Spam?

I could send everything to OpenAI and ask "is this spam or legitimate French?" But that costs the same as a full evaluation. The whole point is to avoid wasting API calls.

Deterministic checks are free and catch most spam. Only sophisticated spam gets through, and at that point, I'm okay paying for the occasional false negative.


Question 9: If this application scaled to 10,000 active users, what rate-limiting and quota management strategies would you implement at the architecture level?

Three-tier rate limiting (IP-based middleware, user-based quotas with Redis, and OpenAI request queuing with priority) would replace the current unlimited access model.

Current State (No Rate Limiting)

Right now, any authenticated user can submit unlimited evaluations. At 3 users (all test accounts created by me, by the way), this is fine. At 10,000 users, I'd need three layers of protection.

Tier 1: IP-Based Rate Limiting (Anti-DDoS)

I'd implement middleware using a rate limiting library with Redis backend. Allow something like 100 requests per minute per IP address.

The purpose is preventing DDoS and bot attacks. A malicious actor spinning up multiple accounts from the same IP would hit this limit quickly.

Tier 2: User-Based Quotas

I'd add quota tracking to the database with fields for user tier (free/premium), daily evaluations used, daily limit, and reset timestamp.

The quota tiers might look like this:

Free tier: 10 evaluations per day (enough for casual daily practice)

Premium ($5/month): 100 evaluations per day (serious learners)

Power tier ($20/month): 500 evaluations per day (teachers, heavy users)

Before processing any evaluation, the API would check: Has this user exceeded their daily quota? If yes, return 429 (Too Many Requests) with a message like "Daily limit reached. Resets at midnight UTC."
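The quota check itself would be a small guard at the top of the evaluation route, something like this sketch (the user_quotas table and its columns don't exist yet; they're part of the plan above):

import { NextResponse } from 'next/server'
import { createClient } from '@supabase/supabase-js'

const supabase = createClient(process.env.SUPABASE_URL!, process.env.SUPABASE_SERVICE_ROLE_KEY!)

// Returns a 429 response when the quota is exhausted, or null when the request may proceed
async function enforceDailyQuota(userId: string) {
  const { data: quota } = await supabase
    .from('user_quotas')
    .select('daily_limit, daily_used')
    .eq('user_id', userId)
    .single()

  if (quota && quota.daily_used >= quota.daily_limit) {
    return NextResponse.json(
      { error: 'Daily limit reached. Resets at midnight UTC.' },
      { status: 429 }
    )
  }
  return null
}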

Tier 3: OpenAI Request Queue

I'd implement a job queue using Bull (a Redis-based queue library).

Process evaluation requests with a concurrency limit (max 5 concurrent OpenAI requests at a time). Add priority levels so premium users get faster processing. If the queue fills up, show free users "Your evaluation is queued (position 23)" while premium users skip the line.

The purpose is preventing overload on the OpenAI API (staying under their rate limits), giving premium users faster processing as a paid benefit, and graceful degradation under load (queue instead of crash).

Cost-Based Dynamic Throttling

I'd track the current month's OpenAI costs in real-time and implement dynamic limits:

At 80% of monthly budget ($400 out of $500), throttle free tier to 5 evaluations per day instead of 10.

At 95% of monthly budget ($475 out of $500), pause free tier entirely with a message like "Free tier temporarily paused due to high demand. Premium users can continue. Free tier resumes tomorrow."

This ensures I never exceed my budget, even if usage spikes unexpectedly.

Monitoring and Alerts

Set up a daily cron job that analyzes quota usage stats by tier. Send me an email if free tier usage is growing faster than revenue (unsustainable).

Track average API cost per evaluation. If it suddenly spikes (maybe OpenAI raised prices), I get alerted before the monthly bill arrives.

This three-tier approach balances user experience (casual users can still learn for free) with cost control (heavy usage requires payment). Premium users get tangible benefits (higher quotas, priority processing) that justify the subscription cost.


Question 10: Describe your approach to handling concurrent writes during submission. How do you ensure data consistency?

Sequential writes (entry first, then corrections) rather than a database transaction, with error logging and partial success handling. Users see their entry even if correction storage fails.

The Non-Transaction Approach

The submission flow is straightforward:

Insert the writing entry first. If that fails, return an error to the user. If it succeeds, insert corrections as a separate operation. If corrections fail, log the error server-side but don't fail the whole request (entry is already saved, user can still see it).
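A sketch of that persistence step (table and column names are illustrative, supabase is the shared server-side client, toDatabaseCorrection is a hypothetical snake_case mapper, and EvaluationResult is the type shown in Question 13):

async function saveSubmission(userId: string, text: string, result: EvaluationResult) {
  // 1. Insert the entry -- if this fails, the whole request fails
  const { data: entry, error: entryError } = await supabase
    .from('writing_entries')
    .insert({ user_id: userId, text, cefr_level: result.level, score: result.score })
    .select()
    .single()

  if (entryError) throw new Error('Failed to save entry')

  // 2. Insert corrections separately -- log failures, but keep the already-saved entry
  const { error: correctionsError } = await supabase
    .from('corrections')
    .insert(result.corrections.map(c => ({ entry_id: entry.id, ...toDatabaseCorrection(c) })))

  if (correctionsError) {
    console.error('Correction insert failed for entry', entry.id, correctionsError)
  }

  return { entry, correctionsSaved: !correctionsError }
}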

Why Not a Transaction?

I considered wrapping everything in a database transaction. Begin transaction, insert entry, insert corrections, commit. If anything fails, rollback everything.

I rejected it for a few reasons:

The database client I'm using (Supabase client) doesn't expose a clean transaction API. I'd need to use raw SQL, which adds complexity.

Corrections are somewhat optional. Some perfect text has zero corrections. The entry is still valuable without them (the user has their text and CEFR score).

Failed correction storage shouldn't invalidate the entry. If the corrections table is temporarily locked or has an issue, I don't want to lose the user's writing submission.

Trade-offs Accepted

The inconsistency scenario I'm accepting: Entry saves but corrections fail to insert.

The impact is users see their entry in history but no inline highlights or correction details.

The frequency is less than 0.1% of submissions (correction inserts usually succeed).

The user experience is acceptable because the entry is still valuable. They have their text, CEFR level, and score. They just don't have the detailed corrections.

An alternative would be setting a corrections_status flag on the entry (pending/complete/failed), then retrying correction storage in a background job. I didn't implement this because the added complexity wasn't worth it for a 0.1% edge case. If this becomes a real problem at scale, I can add it later.

Concurrent Submission Handling

If a user double-clicks the Submit button really fast, triggering two simultaneous requests with the same text, what happens?

Client-side protection prevents this using a loading state flag. The submit button is disabled while the first request is processing. The user literally can't click it again.

But what if someone bypasses the UI and sends two API requests simultaneously via curl or Postman?

Both requests would insert separate entries. That's actually fine. They're both valid submissions with different timestamps. If the user really wants to submit the same text twice, they can. It's their quota to use.

Optimistic locking isn't needed because entries are immutable after creation. There are no updates to race on.


Question 11: Explain the security implications of using the dangerous email account linking flag in your NextAuth configuration. What safeguards did you implement?

The dangerous email account linking flag (NextAuth's allowDangerousEmailAccountLinking option) lets users sign in with Google OAuth even if they already have an email/password account with the same email, but it requires email normalization and verified OAuth emails to prevent account hijacking.

What This Flag Does

Without this flag: If a user creates an account with email/password, then later tries to sign in with Google using the same email, NextAuth blocks it with "Account already exists with this email."

With this flag: The user can sign in with Google. NextAuth links the Google account to the existing user record. Now the user can sign in with either method (email/password OR Google). Convenient.

Why "Dangerous"?

Attack scenario without safeguards:

Attacker knows victim uses email victim@example.com. Attacker creates email/password account with that email (no verification required). Victim later tries to sign in with Google OAuth. If linking is allowed, the Google account links to the attacker's account. Attacker now has access to victim's OAuth profile and any data they create.

That's bad.

Safeguard 1: Email Normalization

I normalize all emails to lowercase and trim whitespace before any queries. Victim@Example.com and victim@example.com are treated as the same email.

This prevents an attacker from creating VICTIM@EXAMPLE.COM to bypass the duplicate check.

Safeguard 2: OAuth Emails Are Verified

Google OAuth provides verified email data. When a user signs in with Google, I know Google has already verified this email belongs to them.

In the sign-in callback, when linking accounts, I'm relying on Google's verification. This is safe because the attacker can't create a Google account with victim@example.com unless they control that email. If they control the email, they could reset the password anyway (not an OAuth-specific vulnerability).
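The check lives in the sign-in callback and is only a few lines. A sketch (this fragment sits inside the NextAuth configuration; Google's OIDC profile exposes an email_verified flag):

callbacks: {
  async signIn({ account, profile }) {
    if (account?.provider === 'google') {
      // Only proceed (and allow linking) when Google confirms the email is verified
      return Boolean((profile as { email_verified?: boolean } | null)?.email_verified)
    }
    return true // credentials sign-in is handled by its own authorize() logic
  }
}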

Safeguard 3: Require Email Verification for Credentials (Currently Missing)

This is the acknowledged limitation. When users create email/password accounts, I don't verify the email yet. Users can sign up with victim@example.com without proving they own it.

The risk is an attacker could create an unverified account for the victim's email, then the victim links OAuth later, giving the attacker access.

The planned mitigation: Add email verification after registration (send confirmation email with a token). Only allow OAuth linking if the email is already verified OR the user provides a password (proving they created the account).

Alternative: Disable Account Linking

I could set the flag to false, which forces users to use the same method they signed up with.

The trade-off is worse UX. Users forget which method they used ("did I sign up with email or Google?"). They create duplicate accounts accidentally. Support burden increases.

For my app, account linking is valuable because many users start with email/password (quick sign-up), then want Google convenience later. The risk is acceptable given email normalization and OAuth verification. Once I add email verification for credentials, it'll be fully secure.


Question 12: Walk me through how you validate and sanitize French text submissions at multiple layers. Why is client-side validation insufficient?

Client-side validation provides UX, deterministic validation saves API costs, server-side re-validation prevents malicious bypass, and database constraints are the final safety net. Each layer serves a distinct purpose.

Layer 1: Client-Side Validation (UX)

I use Zod schemas with react-hook-form to provide instant feedback.

Users see "Please write something" before clicking Submit. Submit button stays disabled until valid. Word count updates live as they type. No network round-trip needed.

Why insufficient: JavaScript in the browser can be bypassed. A malicious user can open DevTools, modify the JavaScript to skip validation, and send requests directly to the API. Or they could use curl/Postman to hit the API endpoint. Client-side validation is purely for UX, never for security.

Layer 2: Deterministic Text Validation (Cost)

Before sending to OpenAI, I run French-specific checks. Look for URLs, check for repetition spam, detect keyboard mashing, calculate French language percentage.

Purpose is saving OpenAI API costs. A spam submission costs money per API call. Thousands of spam attempts would add up fast. Deterministic checks are free and block most spam.

Why insufficient: False positives. An A1 learner writing very simple French like "Je suis Marie. Marie est content." might fail the 30% French threshold because "Marie" and "content" aren't in the common words list. Creative writing with intentional repetition ("Je cherche, tu cherches, il cherche...") might be flagged as spam.

Layer 3: Server-Side Re-Validation (Security)

The API route re-validates everything. Run Zod schema validation again, re-run deterministic validation, check word count against user's personal settings.

Purpose is preventing malicious bypass. Even if the client is compromised (XSS attack, malicious browser extension, user manually editing JavaScript), the server still validates. Ensures ALL submissions meet requirements, regardless of client state.

Why insufficient: Can't prevent all logical errors. A user could submit valid French that's plagiarized from another source. The server validates structure and format but doesn't verify originality. But honestly, that shouldn't be a big deal because the whole point of this app is to help users learn French by writing their own thoughts. If they plagiarize, they're only cheating themselves.

Layer 4: Database Constraints (Final Safety)

Database schema includes constraints: text NOT NULL, word_count > 0, cefr_level IN ('A1', 'A2', 'B1', 'B2', 'C1', 'C2').

Purpose is the last line of defense. Even if application code has bugs, the database rejects invalid data. Protects against accidental bugs in the application code that might try to insert empty text or invalid CEFR levels.

What it catches: Empty text (NOT NULL violation), negative word counts (CHECK constraint), invalid CEFR levels (ENUM constraint).

Sanitization vs. Validation

Validation rejects invalid input ("this doesn't meet requirements"). Sanitization transforms input to be safe ("remove dangerous parts").

For French text, I do minimal sanitization. Just trim whitespace from the beginning and end. I don't want to alter the user's French because even typos are learning opportunities. I don't strip accents (that would corrupt French). I don't remove punctuation (grammatically significant in French).

Exception: HTML/XSS prevention. When displaying user text in the UI, I use React's built-in escaping to prevent XSS. If a user somehow includes <script>alert('xss')</script> in their French text (why would they?), React renders it as plain text, not executable code.

The multi-layer approach ensures security (can't bypass server validation), cost control (deterministic checks block spam), UX (instant feedback), and data integrity (database constraints catch bugs).


Question 13: Explain your approach to type safety across the OpenAI integration. How do you handle the mismatch between AI response formats and TypeScript interfaces?

Zod schemas validate and transform OpenAI's JSON responses into strongly-typed TypeScript interfaces, with fallback handling for format variations and manual transformations for field naming differences.

The Type Safety Challenge

The problem: OpenAI returns untyped JSON. It's just a string that could be anything. The goal: Type-safe TypeScript code where the evaluation result has strictly typed CEFR levels, scores, and corrections.

If I just did const result = JSON.parse(response), TypeScript would type it as any. No safety. Bugs waiting to happen.

Layer 1: TypeScript Interfaces

I define TypeScript types that represent what I expect:

type CefrLevel = 'A1' | 'A2' | 'B1' | 'B2' | 'C1' | 'C2'

interface Correction {
  original: string
  correction: string
  explanation: string
  grammarRule: string
  position?: { start: number; end: number }
}

interface EvaluationResult {
  level: CefrLevel
  score: number
  corrections: Correction[]
  emotion: string
}

Layer 2: Runtime Validation with Zod

I use Zod to define schemas that match the TypeScript interfaces:

const CefrLevelSchema = z.enum(['A1', 'A2', 'B1', 'B2', 'C1', 'C2'])

const CorrectionSchema = z.object({
  original: z.string(),
  correction: z.string(),
  explanation: z.string(),
  grammarRule: z.string(),
  position: z.object({
    start: z.number(),
    end: z.number()
  }).optional()
})

const OpenAIResponseSchema = z.object({
  level: CefrLevelSchema,
  score: z.number().min(60).max(100),
  corrections: z.array(CorrectionSchema),
  emotion: z.string()
})

Layer 3: Parse and Validate

The evaluation function:

  1. Calls OpenAI API
  2. Parses the JSON response
  3. Validates it against the schema using safeParse (doesn't throw, returns success/error object)
  4. If validation fails, logs the error with the full response for debugging, then throws a descriptive error
  5. If validation succeeds, the data is now type-safe
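Concretely, steps 2 through 4 look roughly like this (a sketch with simplified error handling; rawContent is the JSON string returned by the OpenAI call):

const parsed = OpenAIResponseSchema.safeParse(JSON.parse(rawContent))

if (!parsed.success) {
  // Log the full response so format drift can be diagnosed later
  console.error('OpenAI response failed validation', parsed.error.flatten(), rawContent)
  throw new Error('Evaluation response did not match the expected format')
}

const evaluation = parsed.data // fully typed from here on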

After validation, TypeScript knows the exact structure. No more any types. Autocomplete works. Type errors get caught at compile time.

I also transform to include calculated positions. For each correction, I run the position-finding algorithm to locate it in the original text.

Handling Format Variations

Sometimes OpenAI returns slightly different field names. Maybe it says "corrected" instead of "correction".

The solution is Zod transforms that normalize variations:

const CorrectionSchema = z.preprocess(
  // Accept "corrected" as an alias for the "correction" field before validation runs
  (val: any) => (val && val.corrected && !val.correction ? { ...val, correction: val.corrected } : val),
  z.object({
    original: z.string(),
    correction: z.string(),
    // ... rest of schema
  })
)

This accepts both field names but always outputs "correction".

Database Type Mapping

The problem is database fields use snake_case (cefr_level, grammar_rule) but TypeScript uses camelCase (cefrLevel, grammarRule).

The solution is explicit mapping functions:

function toDatabase(result: EvaluationResult) {
  return {
    cefr_level: result.level,
    score: result.score,
    // ... map all fields
  }
}

function fromDatabase(row: DbRow): EvaluationResult {
  return {
    level: row.cefr_level,
    score: row.score,
    // ... map all fields back
  }
}

When inserting to database, call toDatabase. When reading from database, call fromDatabase.

This multi-layer approach ensures runtime safety (Zod catches invalid OpenAI responses before they cause bugs), compile-time safety (TypeScript prevents type errors in code), and data integrity (database schema matches expected types, with explicit mapping between naming conventions).


Question 14: How did you structure the multi-layer data flow architecture for text submission, and what was the reasoning behind each layer?

Each layer serves a distinct purpose: UX feedback (client validation), cost savings (deterministic validation), security (server re-validation plus auth), AI intelligence (OpenAI processing), data integrity (database constraints), and error recovery (sequential operation handling).

The Layered Defense Philosophy

Core principle is defense in depth. Each layer catches a different class of errors. If one layer fails or gets bypassed, the next layer still protects you.

Layer 1: Client-Side Validation (Instant Feedback)

Form validation with react-hook-form and Zod schema resolver. Validates while the user types.

Purpose: User experience. Immediate feedback without waiting for a server response.

What it catches: Empty text, text below minimum word count, missing emotion selection.

Why this layer: Users expect instant feedback. "Please write at least 20 words" should appear immediately, not after clicking Submit and waiting for the server to respond.

Example: Submit button stays disabled until text meets minimum length. User sees live word count as they type.

Limitation: Can be completely bypassed. Open DevTools, modify JavaScript, send API request directly. This layer is purely for UX, never security.

Layer 2: Deterministic Validation (Cost Control)

Text validation runs in the browser before making the API call. If invalid, show error and stop. Don't even send the request.

Purpose: Preventing wasted OpenAI credits. Save money by blocking spam before it hits the paid API.

What it catches: Spam (URLs, keyboard mashing, repetition), non-French text, gibberish.

Why this layer: Costs $0 to run these checks. Saves money per blocked submission.

Example: User types "qwertyuiop asdfghjkl". Gets rejected with "Please write in French" before any API call is made.

ROI: Blocks roughly 70% of invalid submissions. At $0.001 per OpenAI call, this saves $0.70 per 1,000 attempts. Adds up over time.

Layer 3: Authentication (Identity Verification)

NextAuth getServerSession on the API route. Verify user exists. Extract user ID.

Purpose: Verify identity and prevent anonymous abuse.

What it catches: Unauthenticated requests, expired sessions, deleted users.

Why this layer: API routes are public URLs. Anyone could call them if they knew the endpoint. Authentication ensures only logged-in users can submit.

Example: Malicious script tries to send 1,000 spam submissions. All rejected immediately (no valid session).

Security benefit: Prevents API abuse, ties entries to user accounts for accountability, enables per-user quotas.
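The check itself is only a few lines at the top of the route handler. A sketch in NextAuth v4 style (the authOptions import path is an assumption):

import { getServerSession } from 'next-auth'
import { NextResponse } from 'next/server'
import { authOptions } from '@/lib/auth'

export async function POST(request: Request) {
  const session = await getServerSession(authOptions)
  if (!session?.user) {
    // No valid session: reject before doing any validation or OpenAI work
    return NextResponse.json({ error: 'Unauthorized' }, { status: 401 })
  }
  // ...continue to server-side validation using the authenticated user's id
}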

Layer 4: Server-Side Validation (Trust Nothing)

Re-validate with Zod schema even though client validated. Re-run deterministic validation. Check word count against user's personal settings (stored in database).

Purpose: Security. Don't trust the client. Ever.

What it catches: Modified requests (bypassed client validation), malformed data, attacks.

Why this layer: Clients can be compromised. XSS attacks, malicious browser extensions, manual API requests with curl. The server must validate everything independently.

Example: Hacker modifies the API request to include 10,000-word text (way over limit). Server validation rejects it because it exceeds the user's personal maximum.

Defense against: CSRF, XSS-injected requests, browser DevTools tampering, malicious users.

Layer 5: AI Processing (Intelligence)

Call OpenAI evaluation API with the text and tier (fast or detailed).

Purpose: Extract linguistic insights that code can't determine.

What it provides: Grammar corrections, CEFR level assessment, style suggestions.

Why this layer: Deterministic rules can't evaluate French proficiency. "Je suis aller" looks structurally fine to a simple check, but a native speaker knows it should be "allé" (the past participle, not the infinitive). AI catches nuances that code cannot.

Cost: Justified by the value provided. This is the core feature of the app.

Layer 6: Database Operations (Persistence)

Insert entry first. If that succeeds, insert corrections separately.

Purpose: Data integrity and storage.

What it enforces: Database schema constraints, foreign keys, data types, NOT NULL requirements.

Why this layer: Final safety net. If application code has a bug (say, I accidentally let an invalid CEFR level through), the database rejects it.

Example: Code tries to insert CEFR level "D1" (not valid). Database constraint rejects it because CEFR must be one of A1, A2, B1, B2, C1, C2.

Database constraints:

  • CEFR level must be valid enum value
  • Word count must be positive integer
  • Text cannot be empty (NOT NULL)
  • User ID must exist (foreign key)

Why Not Fewer Layers?

Could we merge deterministic validation on client and server?

No. Client-side saves network round-trip (better UX). Server-side provides security (can't be bypassed). Both are needed.

Could we skip client validation?

Yes, but UX would suffer. Users would have to click Submit, wait for the server to respond, then see "Please write at least 20 words." That's frustrating. Client validation gives instant feedback.

Could we skip database constraints?

Dangerous. If application code has a bug (I forget to validate something), corrupt data enters the database. Constraints are the last line of defense against bugs in my code.

Error Handling Strategy

Each layer returns different error types:

  • Form errors (client): Display inline next to the field
  • Validation errors (deterministic): Show toast notification
  • Server errors: Return HTTP status codes (400, 401, 500)
  • Database errors: Log server-side, show generic error to user

This separation allows debugging. "Which layer caught this error?" tells me where the problem originated.


Question 15: Walk me through your decision to use client-side state with localStorage persistence for theme management instead of server-side theme storage. What are the trade-offs?

Client-side theme storage provides instant switching without network latency and works offline, but trades off cross-device synchronization and requires hydration handling for SSR compatibility.

The Decision Context

Requirements: 16 emotion-based themes, instant switching without lag, persistence across sessions.

Two options: Server-side (store theme preference in database) or client-side (localStorage with state management).

I chose client-side. Here's why.

Pro 1: Instant Theme Switching

With client-side, when you change the theme, it updates immediately. Zero latency. Click "Anxious" and boom, purple theme appears.

With server-side, you'd need to make an API request (POST to /api/user/theme), wait for the response (200-500ms network round-trip), then either re-fetch the theme or wait for the session to update.

For something as interactive as theme switching, that latency feels sluggish. Users would notice the delay.

Pro 2: Works Offline

Client-side themes persist in localStorage. They work without internet connection.

Server-side requires an API call. Fails if you're offline.

Scenario: User on a train with spotty Wi-Fi. Client-side theme switches still work perfectly. Server-side shows "Failed to update theme" errors every time they try.

Pro 3: Fewer Server Resources

Client-side requires zero database queries for theme reads or writes.

Server-side would need a database query on every page load (SELECT theme FROM user_preferences) plus a write on every theme change (UPDATE user_preferences SET theme = 'anxious').

At 1,000 active users changing themes 10 times per day, client-side is 0 queries. Server-side is 10,000 update queries plus at least 10,000 read queries (assuming users visit 10 pages per day).

Cost savings are minor ($2-5/month on hosting) but add up. More importantly, it reduces database load, leaving more capacity for actual data storage.

Pro 4: Simpler Architecture

Client-side is one Zustand store with persistence middleware. One localStorage key. About 30 lines of code total.

Server-side needs a database table (or column), API routes (GET /api/user/theme and PUT /api/user/theme), session synchronization (updating the NextAuth session when theme changes). About 150 lines of code.

Simpler is better. Less code means fewer bugs.

Trade-off 1: No Cross-Device Sync

Client-side limitation: You set "Joyful" theme on desktop. Open the app on mobile. It shows "Anxious" (from your last mobile visit).

Server-side benefit: Theme syncs across all devices. Desktop and mobile always match.

My reasoning: Theme is a personal preference for that specific device. Users might prefer light theme on mobile (using outdoors in sunlight) but dark theme on desktop (late night coding). It's like how VS Code remembers theme per machine, not synced via account.

If users request cross-device sync later, I could add optional server sync. Fetch the server theme on login, let the user choose which to keep (local or server).

Trade-off 2: SSR Hydration Complexity

The problem: Server renders with default theme (no access to localStorage). Client hydrates with actual theme from localStorage. React throws hydration mismatch warnings.

The solution: Two-pass rendering with a mounted flag.

First render (server plus initial client): Use default theme. Second render (client only): Use localStorage theme.
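The mounted-flag pattern itself is small. A sketch, reusing the useThemeStore and applyTheme pieces from the Question 4 sketch (the import path is an assumption):

'use client'
import { useEffect, useState, type ReactNode } from 'react'
import { useThemeStore, applyTheme } from '@/stores/theme'

// First render matches the server (default theme); the stored theme is applied after mount
export function ThemeProvider({ children }: { children: ReactNode }) {
  const theme = useThemeStore((s) => s.theme)
  const [mounted, setMounted] = useState(false)

  useEffect(() => { setMounted(true) }, [])

  useEffect(() => {
    if (mounted) applyTheme(theme) // pushes the stored palette into CSS custom properties
  }, [mounted, theme])

  return <>{children}</>
}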

Trade-off: Brief flash of default theme on initial load (100-200ms). Users barely notice it, but it's technically there.

Server-side benefit: No hydration issues. Theme comes from session data (available on server). Same theme on server and client. No flash.

Trade-off 3: Not Accessible Server-Side

Client-side limitation: Server Components can't read localStorage. If I wanted to render something on the server based on the user's theme, I can't.

Server-side benefit: Could render the page with the correct theme colors on the server. No flash at all.

Mitigation: Theme is purely visual. It doesn't affect data or functionality. Server renders default theme, client hydrates with correct theme. No functional impact, just a brief visual flash that most users won't even notice.

When Server-Side Would Be Better

If the app had any of these requirements, server-side would be justified:

  • User-facing theme URLs: Shareable links with theme parameter (example.com?theme=anxious)
  • Admin dashboard: Analyzing theme popularity ("60% of users prefer dark themes")
  • Marketing needs: Sending emails with the user's preferred theme colors

For my app, theme is a personal UI preference, nothing more. Client-side wins.

Hybrid Approach (Future Consideration)

I could combine both approaches:

Use localStorage as primary (fast, offline). Sync to database as backup (cross-device). When theme changes, update localStorage immediately (instant), then sync to server in the background (every 5 minutes or on page unload).

On login, check if server theme differs from local theme. Ask user: "We found a different theme on your other device. Keep [Joyful] or switch to [Anxious]?"

Best of both worlds: instant switching plus cross-device sync when users want it.

For now, client-side is simpler and meets all current requirements. If users request cross-device sync, I'll add the hybrid approach.


Final Thoughts

That wraps up all 15 senior-level questions! Building this application has honestly been a journey of balancing cost, performance, user experience, and code maintainability. Every decision involved trade-offs, and I hope these insights help other developers facing similar architectural decisions in their own projects.

The French Writing Playground started as a personal project to solve my own problem (practicing French writing), but architecting it to handle real users taught me so much about system design, cost optimization, and building production-ready applications.

If you have more questions or want to discuss any of these topics in more detail, feel free to reach out!

Try out the French Writing Playground V2.0 and see all these architectural decisions in action.

As always, thanks for reading!
