Building an LLM-based developer
A look at the system that now designs, codes, and refines
Here’s what happens when you actually try to build software with LLMs instead of just talking about it.
Last year, I introduced “prompt-to-PR” at Deriv—essentially vibe coding, that fast-and-loose approach everyone pretends works. When we decided to build a new trading platform, we naturally started there. Three weeks later, we had a working POC with authentication, market data, and pricing.
Then our CEO asked the killer question: “What exactly did the LLM develop?”
That’s the trap of vibe coding. You end up with a massive codebase and no clue what’s inside. Our traditional documentation was worthless against the generated code. That moment of reckoning led me to Spec-to-PR. The idea was to define the project module by module with full specifications, and let the model implement them.
The challenges were real. The model needed to read specs like a human, judge changes, and, crucially, avoid regenerating code from scratch every time. Sonnet 3.7 turned out to be the answer.[1]
We demoed the next POC to our CEO. He wisely asked: “Where’s the product specification?”
None of our documents (architecture, service specs, module specs) was even remotely an easily readable one-pager. He called us a few names, then sat down and wrote a one-pager entitled Trading Platform Product Description. He told us to convert it into a full trading platform (back-end, front-end, everything) using only LLMs.
Interesting moment. I looked for an open window. Our building is a modern office with plenty of glass but no actual windows.[2] The only option was to find a way to get a short “product idea” and output a fully developed solution.
The challenge: No human intervention
We started mid-May with a strict rule: no human intervention. No vibe coding. Every step generated by LLMs must be documented. We attempted to follow early software development methodologies—a Waterfall approach starting from a product idea, through domain modelling and API definitions, down to code.
Our goal was to convert a short product description into a fully fledged trading platform. That was a mistake. Aiming for 700k+ lines of code from 100 lines of instruction was impossible; the scope was simply too large. We spent months testing, but could only implement narrow services before the process broke down.
By June, we realised we needed two major breakthroughs to make this work.
1. Defining achievement levels
A trading platform is too large. We needed intermediate goals. So I defined levels of achievement for our LLM-based developer.
L1: A simple project. A single service with no authentication and no user management. Any solution at this level must comply fully with its specifications and produce a meaningful, correct result.
L2: An LLM-based developer that can build three to five services and coordinate between them. The LLM designs each service and decides on the calls between them, designing them all “in one mind, one eye”. It’s a complex task. We expected L2 to handle user management, login, sign-up, and some kind of admin, and to be good enough that a security review doesn’t find too many mistakes. Just imagine a good mid-level developer.
L3: More complex platforms with seven or eight services, real-time communication such as WebSocket or gRPC, and services that scale acceptably well.
L4: Our final target: a full-fledged trading platform, where complexity explodes with risk management and the many external constraints the platform needs to handle.
2. Merging vibe-coding with waterfall
The second breakthrough was a hybrid process. We needed the agility of vibe-coding (quick fixes, fast iteration) combined with the reliability of Waterfall (traceability, systematic progression).
We redesigned the process to include:
An orchestrator to route requests.
Complexity assessment to adapt to project size.
Structured phases with dedicated workers and verification steps.
A modification router that intelligently classifies changes—bugs go straight to code, while features cascade through the specs.
We validated this with an L1 project (a PDF-to-Markdown converter), which worked perfectly on the first try. We then expanded to L2 (a secret sharing solution with encryption), where security found almost no issues. We are currently working on L3.
Our LLM-driven software development process
We have now open-sourced this process at https://github.com/deriv-com/specai-process. Let me walk you through how it all works.
The orchestrator (start.md)
Everything begins with start.md, our LLM-based orchestrator that manages the lifecycle. It handles three scenarios:
New projects: Follows the waterfall phases step by step
Interrupted work: Checks the current state and resumes execution
Completed projects: Accepts modifications and routes them to the correct phase
It never executes directly; it always delegates to specialised workers.
It also performs a complexity assessment, scoring the project (0-10) to select the right templates—from minimal scripts to full microservices architectures.
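As a rough illustration of the routing and scoring described above, here is a minimal Python sketch. It is not the logic in start.md, which is a prompt rather than code. The ProjectState record, the score thresholds, and the template names are all assumptions made up for the example; only the three scenarios and the 0-10 score range come from the process itself.

```python
from dataclasses import dataclass
from enum import Enum


class Scenario(Enum):
    NEW = "new project"
    RESUME = "interrupted work"
    MODIFY = "completed project"


@dataclass
class ProjectState:
    # Hypothetical state record; the real orchestrator inspects workspace files.
    phases_total: int
    phases_done: int


def detect_scenario(state: ProjectState) -> Scenario:
    """Pick one of the three scenarios the orchestrator handles."""
    if state.phases_done == 0:
        return Scenario.NEW
    if state.phases_done < state.phases_total:
        return Scenario.RESUME
    return Scenario.MODIFY


def select_template(score: int) -> str:
    """Map a 0-10 complexity score to a template tier (thresholds are assumed)."""
    if not 0 <= score <= 10:
        raise ValueError("complexity score must be between 0 and 10")
    if score <= 3:
        return "minimal-script"
    if score <= 7:
        return "single-service"
    return "microservices"


# A project with two of six phases finished counts as interrupted work.
print(detect_scenario(ProjectState(phases_total=6, phases_done=2)).value)
print(select_template(9))
```

In the real process the decision is then delegated to a specialised worker; the sketch stops at the point where the orchestrator knows which scenario and template apply.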
The pipeline structure
To make the process manageable, every phase (Requirements, Architecture, API, etc.) follows a consistent file structure:
Phase structure:
├── prompt.md # Main orchestration logic
├── guideline.md # Standards and best practices
├── verify.md # Verifies requirements are followed (no more, no less)
└── {worker}.md # Phase-specific implementation files
Each phase operates in three modes: New (fresh creation), Update (gap analysis only), and Review (modifying existing decisions).
Requirement traceability
Our process generates unique IDs for every artefact, creating a traceable path from requirements to implementation:
ID format examples:
REQ-01-001 → Requirement: User Authentication
ENT-02-003 → Entity: UserAccount
US-03-005 → User Story: Admin manages users
SVC-04-002 → Service: Authentication Service
API-05-007 → API Endpoint: POST /auth/login
TC-06-009 → Test Case: Login validation
Each ID flows through the system:
REQ-01-001 (User Authentication)
↓
ENT-02-003 (UserAccount entity in Domain)
↓
US-03-005 (User login story)
↓
SVC-04-002 (Auth service in Architecture)
↓
API-05-007 (Login endpoint in API spec)
↓
Implementation with all references preserved
This enables:
Impact analysis: Change REQ-01-001? Know exactly what’s affected.
Coverage verification: Every requirement has an implementation.
Debugging: Trace issues back to requirements.
Documentation: Self-documenting system.
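To make impact analysis and coverage verification concrete, here is a small Python sketch over the example chain above. It assumes the IDs follow a PREFIX-phase-sequence shape (inferred from the examples, not a documented format) and that links between artefacts are available as simple pairs. The function names and the extra requirement REQ-01-002 are hypothetical.

```python
import re
from collections import deque

# Assumed ID shape, inferred from the examples: PREFIX-phase-sequence.
ID_PATTERN = re.compile(r"^([A-Z]+)-(\d{2})-(\d{3})$")

# Upstream -> downstream links, copied from the chain shown above.
LINKS = [
    ("REQ-01-001", "ENT-02-003"),
    ("ENT-02-003", "US-03-005"),
    ("US-03-005", "SVC-04-002"),
    ("SVC-04-002", "API-05-007"),
]


def parse_id(artefact_id):
    """Split an ID such as 'REQ-01-001' into (prefix, phase, sequence)."""
    match = ID_PATTERN.match(artefact_id)
    if match is None:
        raise ValueError(f"not a valid artefact ID: {artefact_id!r}")
    prefix, phase, sequence = match.groups()
    return prefix, int(phase), int(sequence)


def build_graph(links):
    """Index the links so each artefact maps to its direct downstream artefacts."""
    graph = {}
    for upstream, downstream in links:
        graph.setdefault(upstream, []).append(downstream)
    return graph


def impacted(graph, start):
    """Impact analysis: everything reachable downstream of `start`."""
    seen, queue, order = {start}, deque([start]), []
    while queue:
        for child in graph.get(queue.popleft(), []):
            if child not in seen:
                seen.add(child)
                order.append(child)
                queue.append(child)
    return order


def uncovered(requirements, graph, target_prefix="API"):
    """Coverage check: requirements with no downstream artefact of `target_prefix`."""
    return [
        req for req in requirements
        if not any(parse_id(a)[0] == target_prefix for a in impacted(graph, req))
    ]


graph = build_graph(LINKS)
print(impacted(graph, "REQ-01-001"))
# REQ-01-002 is a made-up requirement with no links, so it is reported as uncovered.
print(uncovered(["REQ-01-001", "REQ-01-002"], graph))
```

Debugging runs the same graph in the other direction: reverse the pairs and walk upstream from a failing endpoint to the requirement that produced it.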
Preference management: The memory system
The system needs to remember user decisions without repeatedly asking the same questions. It does this with preferences.md files, which record user decisions, directives, and clarifications throughout the development process.
The system uses a two-tier hierarchy:
Workspace preferences (workspace/preferences.md) - Default settings that persist across all projects
Phase preferences (workspace/output/{phase}/preferences.md) - Project-specific decisions for each phase
Both work the same way—they store the current state of decisions. The only difference is scope: workspace preferences apply globally across all projects, whilst phase preferences are specific to the current project and phase.
workspace/
├── preferences.md              # Global defaults (persistent)
└── output/
    ├── requirements/
    │   └── preferences.md      # Requirements phase decisions
    ├── domain/
    │   └── preferences.md      # Domain modelling decisions
    ├── api/
    │   └── preferences.md      # API design decisions
    └── ... (one for each phase)
Example preference entry:
### Technology Stack
- **Primary Language**: Go
- **Database**: PostgreSQL
- **API Style**: REST
Key principles:
Current state only: Records what the system should be NOW, not change history
Override hierarchy: Phase preferences always win over workspace preferences
Resolution order: Phase preferences → Workspace preferences → Ask user
This hierarchy eliminates repetitive questions and maintains consistent state tracking across the development process.
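The resolution order can be sketched in a few lines of Python. This is an illustration only: in the real process the LLM reads the preferences.md files directly. The parser assumes entries look like the example above (a bold key followed by a colon), and the gRPC override, the "Cache" key, and the ask_user callback are invented for the demonstration.

```python
import re

# Matches entries such as "- **Database**: PostgreSQL" (format taken from the example).
ENTRY = re.compile(r"^- \*\*(.+?)\*\*:\s*(.+)$")


def parse_preferences(markdown):
    """Collect key/value pairs from a preferences.md body, ignoring headings."""
    prefs = {}
    for line in markdown.splitlines():
        match = ENTRY.match(line.strip())
        if match:
            prefs[match.group(1)] = match.group(2).strip()
    return prefs


def resolve(key, phase_prefs, workspace_prefs, ask_user):
    """Phase preferences win, then workspace preferences, then ask the user."""
    if key in phase_prefs:
        return phase_prefs[key]
    if key in workspace_prefs:
        return workspace_prefs[key]
    return ask_user(key)


workspace = parse_preferences(
    "### Technology Stack\n"
    "- **Primary Language**: Go\n"
    "- **Database**: PostgreSQL\n"
    "- **API Style**: REST\n"
)
# Hypothetical phase-level override for the API phase of one project.
api_phase = parse_preferences("- **API Style**: gRPC\n")


def ask(key):
    # Stand-in for a real prompt to the user.
    return f"<ask user about {key}>"


print(resolve("API Style", api_phase, workspace, ask))  # phase value wins
print(resolve("Database", api_phase, workspace, ask))   # falls back to workspace
print(resolve("Cache", api_phase, workspace, ask))      # nothing recorded, so ask
```

Because each file stores only the current state, a changed decision overwrites the old entry rather than appending to it, which keeps the lookup a plain dictionary read.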
Modification management
The prompts/modify/ directory is where the hybrid approach shines. When you request a change after development, the system classifies it:
Bug fixes go straight to the development code.
New features cascade from requirements through the entire pipeline.
Architecture changes update from the architecture phase down.
This maintains consistency while allowing flexibility—you get vibe-coding speed for bugs, but proper cascading updates for features.
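Once a change has been classified, the cascade itself is simple to picture. The Python sketch below takes the classification as given (in the real process the LLM makes that call) and returns the phases that need updating. The phase list and its order are assumptions, loosely inferred from the ID numbering used for traceability; the actual pipeline may contain more phases.

```python
from enum import Enum

# Assumed phase order, ending in the development code itself.
PIPELINE = ["requirements", "domain", "user-stories", "architecture", "api", "code"]


class Change(Enum):
    BUG_FIX = "bug fix"
    FEATURE = "new feature"
    ARCHITECTURE = "architecture change"


# Where each kind of change enters the pipeline, as described above.
ENTRY_PHASE = {
    Change.BUG_FIX: "code",
    Change.FEATURE: "requirements",
    Change.ARCHITECTURE: "architecture",
}


def phases_to_update(change):
    """Return the entry phase for a change plus every phase downstream of it."""
    start = PIPELINE.index(ENTRY_PHASE[change])
    return PIPELINE[start:]


for change in Change:
    print(f"{change.value}: {' -> '.join(phases_to_update(change))}")
```

A bug fix touches one phase and a feature touches all of them, which is where the difference in turnaround time between the two comes from.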
Does history shine a light on the future?
The turning point came when my “process” outgrew me. The prompts exploded from 3k lines to over 13k lines. I realised I wasn’t just writing apps anymore; I was building tools to manage the prompts that build the apps.
This journey awakened three obsessions:
1. Tools to write roles: I need tools to author roles efficiently. Managing thousands of lines of definitions for an L2 Software Engineer or a Domain Architect requires meta-tooling, similar to an IDE for code.
2. The roles themselves: Each role is a universe of optimisation. How do you craft a Backend Engineer role that balances cost, speed, and accuracy? Teaching an L2 engineer to coordinate services requires hundreds of lines of careful instruction.
3. Virtual offices: We are moving toward multi-role teams—virtual offices where an Architect hands off to Engineers, who sync with QA. They share context and review each other’s output.
The future
We are entering a role definition era. Just as computing evolved from opcodes to compilers, LLM systems are evolving from prompts to roles. I don’t debug code anymore; I debug roles. I don’t optimise algorithms; I optimise agent interactions.
Will developers continue to write code? Unlikely. Will they compose offices of roles that collaborate to build things we can’t imagine? That window is wide open.
The process is open source at github.com/deriv-com/specai-process.
Kaveh Mousavi Zamani is a Vice President of Engineering at Deriv.
[1] Claude Sonnet 3.7 was the model that finally had the capabilities we needed. The timing mattered: earlier models weren’t quite there.
[2] Modern office design is terrible. All glass, no escape routes.









