
777-1: Seven Projects, Seven Subagents, Seven Case Studies, One Goal

129 code reviews, 7 recurring issues, 7 custom subagents. Here's my ambitious plan to build an algorithm that predicts prompt failures before they happen.

Tags: 777-1, AI, Prompt Engineering, Subagents, Case Studies, Building in Public, Outlier, Portfolio Development

Published: November 23, 2025 • 8 min read

I've been cooking these past few days. Not just yellow potatoes, but ideas.

You see, I have a confession to make.

The Project I Put on Pause

In this blog post, I mentioned that I was going to work on something I nicknamed the "Project of Projects." That project is the AI Prompt Engineering Toolkit, and I even highlighted a plan for it. It's meant to be the project that demonstrates the intuition I have built over a lifetime of working extensively with AI.

However, since the 23rd of October, this project has been sitting at 30% completion.

And the reason is that, for a long time, I wasn't clear on exactly how I wanted this application to work or come together. So far, I have made good use of the application, even at 30% completion, when I performed this case study to see how well Claude Code could redesign the application, shifting its design language from Neobrutalism to Glassmorphism. However, that is not where I want the application to end.

30 Days of Clarity

Over the past 30 days (wow, it's actually quite shocking to me that I'm writing this exactly 30 days since I put that project on pause), this app has been sitting on the back burner of my brain. Given the work I have done since, I think I finally have some clarity on how I am going to use this application to demonstrate the intuition I have built while working with AI.

Here is my plan.

The Foundation: 129 Code Reviews

When I worked at Outlier reviewing applications built by other attempters, with the goal of generating gold-standard applications, I stored every review that I sent. The reason is that, as I described in the review workflow, the same task could come back to me multiple times for review, and keeping track of the reviews I had previously sent for a specific task allowed me to confirm that the attempter had fixed all the issues highlighted in them.

Here are 3 adapted examples that capture the types of feedback I typically gave:

Review Example 1

The visual design here is really solid, so nice work on that! Unfortunately, I need to flag several functionality gaps. A key requirement is that all interactive elements need to actually work, not just look clickable. Right now, the notification bell doesn't do anything, and the filter dropdowns are purely decorative. This type of application really needs user accounts, so please add sign-in and registration flows. I also noticed the sidebar collapses when you resize the window, which is great, but the collapsed version doesn't show any icons; it's just a blank strip. The "Add to Favorites" heart icon changes color on click but doesn't persist anywhere. Where do favorites go? Consider adding a favorites section or, at minimum, keeping them saved in local storage. For any features you can't fully implement, add a notification letting users know they're coming soon rather than leaving buttons that do nothing.

Review Example 2

This is a decent starting point, but there are a number of things that need attention. The app clearly requires users to have their own accounts, so please add login and signup functionality. In the profile settings, I can change my username, but the avatar still displays my old initials, which makes the update feel incomplete. The footer has some layout issues on wider screens, where there's noticeable empty space on both sides that looks unintentional; you want the content to fill the footer area without leaving gaps. Between roughly 800px and 1200px, the navigation bar doesn't adapt well and the menu items start to crowd each other. The footer actually handles this range better by stacking its columns, but then one column ends up much wider than the others, which looks unbalanced. When I click "Start Session" and then pause it, I would expect the button text to change to "Resume" and the original Start button to become disabled; right now nothing changes visually, which makes it confusing. The "View All" link in the recommendations section doesn't navigate anywhere. If the destination page isn't built yet, consider showing a toast message that says the feature is coming soon. The search bar accepts input, but pressing enter or clicking the search icon does nothing with it. Please address these issues and keep an eye out for any similar ones as you work through the fixes.

Review Example 3

Good start to this task! I found a small code issue, though. There's a template literal where you're trying to interpolate JSX directly, which won't render correctly. Check the section around the timer display; the markup needs to be restructured outside the string. On the UX side, I'm a bit confused by the "Save" and "Submit" buttons at the bottom of the form. What's the difference? Does Save keep it as a draft? Does Submit publish it? The icons don't clarify this either. If Save creates a draft, maybe label it "Save Draft" and add a visual indicator showing items are in draft state. The current setup made me hesitate before clicking because I wasn't sure what would happen. These are quick fixes, so good luck!
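
(Side note for readers who haven't hit that template-literal issue before: the sketch below reconstructs the general shape of the mistake and its fix. It is not the attempter's actual code, and the TimerProps type and seconds prop are names I made up for illustration.)

```tsx
// Reconstructed sketch of the template-literal issue from Review Example 3.
// Not the attempter's actual code; names are made up for illustration.

type TimerProps = { seconds: number };

// Broken: a JSX element interpolated into a template literal is coerced to a
// plain string ("[object Object]"), so the <strong> markup never renders.
function BrokenTimer({ seconds }: TimerProps) {
  return <p>{`Time remaining: ${(<strong>{seconds}s</strong>)}`}</p>;
}

// Fixed: keep the markup outside the string and interpolate plain values only.
function FixedTimer({ seconds }: TimerProps) {
  return (
    <p>
      Time remaining: <strong>{seconds}s</strong>
    </p>
  );
}
```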

The Data

Now here is the thing. Sitting on my computer right now are exactly 129 reviews similar to the ones above.

Now you may be wondering: "But your website shows 65+ applications built." Well, that is a rough estimate of the number of applications I built, either from scratch or when I completed review tasks that required modifying the work of a previous attempter. The + in 65+ signifies that there were more applications that I simply reviewed.

Check out this blog post for more details on what that workflow looked like.

I have always wondered what to do with all these reviews sitting on my computer. Well, after learning about subagents, a light bulb went off in my head.

The Extraction: 7 Recurring Issues, 7 Custom Subagents

So two days ago, I had Claude read every single review I had on my computer and come up with a comprehensive document detailing the top 7 issues I pointed out consistently across reviews.

Then, using those issues, I manually built 7 subagents in Claude Code. When I say manually, I mean that I did not use the /agents functionality in Claude Code to define these agents, but rather created the markdown files with each agent's specifications myself.
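
For context, a Claude Code subagent is just a markdown file with YAML frontmatter (a name, a description of when to use it, and optionally a tool list) followed by the agent's system prompt. The sketch below is a hypothetical example shaped around one of the recurring issues from my reviews (interactive elements that do nothing); it is not one of my actual 7 agent files, and the name and instructions are illustrative only.

```markdown
---
name: interactive-elements-auditor
description: Reviews a built feature for interactive elements that look clickable but do nothing (dead buttons, decorative dropdowns, links with no destination).
tools: Read, Grep, Glob
---

You are a reviewer focused on one recurring issue: interactive elements that
do not actually work. For every button, link, icon, and form control in the
code you are given:

1. Confirm it has a real handler or destination.
2. If the target feature is out of scope, confirm there is a "coming soon"
   notification instead of a silent no-op.
3. Report each violation with the file, the element, and a suggested fix.
```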

The Plan: 7 Projects, 7 Subagents, 7 Case Studies

Now here is my plan.

I am going to build 7 projects, inspired by a range of projects I worked on at Outlier. I'll use the general-purpose subagent to build each project and then apply each of the 7 subagents I created to the general-purpose subagent's output.

I will document extensively the entire process and even create multiple GitHub branches to track changes made by each subagent. I will share my documentation for each project in a case study.

So you see now why this post is titled "Seven Projects, Seven Subagents, Seven Case Studies, One Goal."

But what is that One Goal?

The Goal: An Algorithm for Predicting Prompt Failures

The Goal is to define an algorithm with a scoring mechanism for writing good prompts. This algorithm will be used to create a key part of the Prompt Engineering Toolkit: the playground.

If you look at the version of the Prompt Engineering Toolkit used in the meta-prompting case study, the algorithm in the playground was defined by Claude itself. Now, I want to create mine.

You see, I want people to be able to provide and/or construct their prompts in my Prompt Engineering Toolkit's playground, and I want to be able to predict what issues might arise if the user goes ahead and executes that prompt.

These predictions will be backed by these 7 projects and their case studies documentation. The goal is to show that I've built intuition for working with LLMs.

The Prompt Engineering Toolkit will look at your prompt, predict multiple areas where the model might fail in execution, reference one or more case studies that demonstrate each failure, and then provide testing suggestions to confirm whether those failures actually occur, along with prompt improvement strategies.
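
To make that concrete, here's a rough TypeScript sketch of the shape one of these predictions might take. This is purely illustrative: the interface names, fields, function name, and scoring scale are placeholders I made up, not the final algorithm.

```ts
// Hypothetical shape for a playground prediction. The names, fields, and
// 0-100 scoring scale are placeholders, not the final design.

interface PromptPrediction {
  /** Overall score for the prompt, e.g. 0 (likely to fail) to 100 (solid). */
  score: number;
  /** Specific areas where the model is likely to fall short. */
  predictedFailures: PredictedFailure[];
}

interface PredictedFailure {
  /** One of the recurring issues extracted from the 129 reviews. */
  issue: string; // e.g. "interactive elements that look clickable but do nothing"
  /** How likely this failure is, given the prompt as written. */
  likelihood: "low" | "medium" | "high";
  /** 777-1 case studies that document this failure happening. */
  caseStudies: string[];
  /** Checks to run on the generated app to confirm whether the failure occurred. */
  testingSuggestions: string[];
  /** Prompt changes that reduced this failure in the case studies. */
  promptImprovements: string[];
}

// The playground's entry point might eventually look something like this
// (declared only; the scoring logic is what the 7 case studies will inform).
declare function predictPromptIssues(prompt: string): PromptPrediction;
```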

The Project That'll Never End

Now, I have to acknowledge that there are real limitations to the work I am about to do.

For starters, I am only using 7 projects. That is never going to be enough to define a truly comprehensive algorithm.

Also, all the applications will be built with Next.js and TypeScript, so how would this be useful to other developers working with a different tech stack?

These and many more questions are probably running through your head as they are running through mine, and that is why this project, the Prompt Engineering Toolkit, now gets a second nickname: "The Project That'll Never End".

I will continue to iterate and add more and more projects to the methodology I am implementing, but to avoid making this blog post any longer, I'll save those details for the future.

For now, just know that every blog post I write about this project will have "777-1" in its title and as a tag so that you can easily search for them.

The Emotions

I'm scared and excited for the outcomes of this task. Ever felt those two emotions at the same time before?

Anyways, as always, thanks for reading!
