How we replaced a 20-year-old back office — twice
From legacy to low-code to an AI-powered system
The starting point
Those who have been working on Deriv’s internal tooling long enough remember when our back-office system was a known quantity — maybe not great, but understood. By 2024, that understanding had curdled into something else: a 20-year-old system that ops teams quietly worked around rather than with.
The requirements were clear. Our ops teams needed one place to handle payments, review customers, manage compliance workflows, investigate issues, and maintain a full audit trail of every decision. The legacy system couldn’t do this cleanly. It was ageing, rigid, and increasingly painful to maintain.
The question was never whether to replace it. The question was how to do it without taking years.
Why we chose low-code
We evaluated several low-code platforms in late 2024, and development kicked off in January 2025. The appeal was real. From idea to working screen in days. Built-in tables, forms, and validation out of the box. No CI/CD pipeline to configure before you could ship. Predictable UI patterns that ops folks could navigate without hand-holding.
Most importantly, we needed something we could ship in weeks, not quarters. The low-code platform let us stand up real screens while we were still figuring out the workflows, which, at that stage, was exactly what we needed.
By the end of September 2025 — nine months after development started — we’d delivered a working back office to the ops team. A genuine milestone. It was live. It was in daily use.
Where low-code broke down
October looked good. Ops teams were fully on it. More modules, more users, more edge cases. The platform had become mission-critical.
November did not look good.
The team of five was burned out. The outcome didn’t match the effort we’d put in. Here’s an honest accounting of what was wrong:
The collaboration story was the most painful part. Multiple developers working on the same app caused constant conflicts. There was no real version control, no meaningful code review — just metadata exports and hope. The UX was rigid in ways our users felt every day. App size limits became a hard ceiling. API timeouts were hitting 10.5 seconds against a 10-second limit. Ghost files. Deployment conflicts. Performance timeouts. War-room mode became the norm.
In the same month, security risks emerged: a critical vulnerability in the form fields, plus auth-bypass concerns. That changed something in the team, and people started voicing their doubts:
“We’re spending more energy defending the platform than improving the product.”
“We can’t keep raising timeouts and calling it stability. We need a foundation we control.”
“The low-code tool is simply not working for us. Not for the scale we need, not for the collaboration our team requires, and not for the user experience our ops teams deserve.”
Finally, we stopped asking how to work around this and started asking whether we should be on this platform at all.
The night everything changed
Leadership gave the green light to explore a different path and made sure the right tools were in place: AI-assisted development platforms, access, and support. One evening in November, at home after dinner, when the kids were sleeping, I started building a new back office from scratch. The conviction was collective; the first keyboard strokes just happened to be mine.
Day 2: I showed a working draft to two VPs. They were astonished. A functional back office, in two days, that had taken months on the low-code platform.
Day 3: First formal demo to the Head of Engineering. The reaction confirmed we were onto something real.
Day 5: Presented to the CEO and management team. Everyone was convinced — not just by the speed, but by the user experience. Clean interfaces. Sub-second load times. Smooth interactions. Responsiveness that the low-code platform couldn’t touch.
But the honest question came up immediately: “It’s AI-generated; how do we trust it?”
That question became our north star for everything that followed.
The AI engineering system we built
Moving to code-first didn’t mean abandoning speed. The key insight is that AI-assisted development at this scale requires structure — not to slow things down, but to make speed sustainable and auditable.
The system rests on three pillars:
Blueprints are machine-readable docs that the AI loads before every task.
Workflows formalise the task lifecycle.
Traceability means every AI-generated change is marked and dual-reviewed.
Blueprints
The blueprint structure is straightforward to navigate once you understand the intent:
blueprints/
├── architecture/          # How the app is built
│   ├── overview.md
│   ├── authentication.md
│   └── data-flow.md
├── coding-standards/      # How code must be written
│   ├── file-structure.md
│   └── dos-and-donts.md
├── design-system/         # How the UI must look
│   ├── colors.md
│   ├── typography.md
│   ├── spacing.md
│   └── components.md
├── ai-integration/        # AI-specific guidance
│   ├── onboarding.md
│   └── secure-code.md
└── testing/               # How tests must be written
    ├── TESTING_STANDARDS.md
    └── templates/

Every AI task begins by loading the relevant blueprints. This is the answer to the trust question: if AI is going to write code, it follows our rules.
The task lifecycle
Every feature goes through the same path:
The human writes the requirements. The AI generates a prompt from those requirements. A human approves that prompt before anything is implemented. The AI implements. Then dual review — both AI and human — before merge. This isn’t bureaucracy; it’s the audit trail that makes AI-generated code trustworthy in a production environment handling PII and financial data.
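The lifecycle above can be sketched as a guarded state machine. This is an illustrative sketch, not our actual tooling: the stage names, task fields, and gate checks are hypothetical, but the shape matches the path described — human gates cannot be skipped, and merge requires both reviews.

```javascript
// Hypothetical stages of the task lifecycle, in order.
const LIFECYCLE = [
  'requirements_written', // human
  'prompt_generated',     // AI
  'prompt_approved',      // human gate
  'implemented',          // AI
  'dual_reviewed',        // AI + human
  'merged',
];

// Advance a task one stage, enforcing the human gates.
function advance(task) {
  const idx = LIFECYCLE.indexOf(task.stage);
  if (idx === -1 || idx === LIFECYCLE.length - 1) {
    throw new Error(`Cannot advance from stage: ${task.stage}`);
  }
  const next = LIFECYCLE[idx + 1];
  // A prompt only becomes "approved" if a human actually signed off.
  if (next === 'prompt_approved' && !task.approvedBy) {
    throw new Error('Prompt needs human approval before implementation');
  }
  // Merge requires dual review: both AI and human.
  if (next === 'merged' && !(task.aiReviewed && task.humanReviewed)) {
    throw new Error('Dual review (AI + human) required before merge');
  }
  return { ...task, stage: next };
}
```

The point of encoding it this way is that skipping a gate is a thrown error, not a convention.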
The migration strategy
We didn’t migrate because “AI is cool.” We migrated because we wanted the fundamentals back: standard JS/HTML/CSS, proper source control, unlimited extensibility, and full autonomy. Every change now lives in Git. PR reviews became normal again. Deployments became something we owned.
The approach was deliberate. We rebuilt core screens first, reused existing business logic wherever possible, bridged gaps temporarily to keep ops running, then moved to full cutover once parity was reached. On November 26, 2025, the low-code back office was taken offline. The AI back office became the new standard.
By December 2025, we’d reached feature parity. Seven applications are now in production on the same template foundation.
Five non-negotiable rules
These rules emerged from the first few weeks and haven’t changed:
1. Check before you create. Search for existing utilities before creating new ones. This eliminated our biggest source of code duplication — an issue that gets expensive fast when AI is generating code at volume.
2. Clean up after yourself. No unused imports, variables, commented-out code, or dead functions. Leave the codebase cleaner than you found it. This one sounds obvious, but requires explicit enforcement when AI is writing code — it won’t clean up unless you tell it to.
3. No logging without permission. Zero console.log unless explicitly requested. Keeps production clean and prevents accidental sensitive data exposure. In a fintech back office, this is not optional.
4. Mark your work. Every AI-generated block gets [AI] / [/AI] markers. Always know what AI wrote versus what humans wrote. This is the traceability pillar in practice.
5. Ask, don’t assume. When blueprints don’t cover a scenario, present options with trade-offs. Never silently make architectural decisions. This rule alone has prevented several incidents where an AI would otherwise have picked a reasonable-seeming but wrong approach.
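Rule 4 lends itself to mechanical enforcement. A minimal sketch of a marker lint, assuming the [AI] / [/AI] syntax from the article (the checker itself is hypothetical, not our actual CI step):

```javascript
// Verify that every [AI] marker in a source file has a matching [/AI],
// with no nesting, so reviewers can always tell AI code from human code.
function checkAiMarkers(source) {
  const tokens = source.match(/\[\/?AI\]/g) || [];
  let depth = 0;
  for (const tok of tokens) {
    if (tok === '[AI]') {
      depth += 1;
      if (depth > 1) return { ok: false, reason: 'nested [AI] marker' };
    } else {
      depth -= 1;
      if (depth < 0) return { ok: false, reason: '[/AI] without opening [AI]' };
    }
  }
  return depth === 0 ? { ok: true } : { ok: false, reason: 'unclosed [AI] marker' };
}
```

Run against every changed file in a pre-merge hook, this turns the traceability rule from a habit into a hard check.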
Security and testing
In a back office handling PII and financial data, security is baked into every template — not bolted on after the fact.
Mandatory patterns include input validation, XSS prevention via DOMPurify, CSRF token handling, and sensitive data cleanup on component unmount.
Forbidden patterns: localStorage for tokens, unsanitised innerHTML, client-only authorisation, eval(). These are in the blueprints. The AI knows about them before it writes a single line.
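To make the "unsanitised innerHTML" rule concrete, here is a minimal escaper for untrusted text. This is a simplified stand-in for illustration only; the blueprints mandate DOMPurify for anything that genuinely needs rich HTML:

```javascript
// Escape untrusted input so it renders as text, never as markup.
// Covers the five characters that matter for HTML injection.
function escapeHtml(untrusted) {
  return String(untrusted)
    .replace(/&/g, '&amp;')
    .replace(/</g, '&lt;')
    .replace(/>/g, '&gt;')
    .replace(/"/g, '&quot;')
    .replace(/'/g, '&#39;');
}

// element.innerHTML = escapeHtml(userInput); // safe: rendered as text
// element.innerHTML = userInput;             // forbidden pattern
```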
Testing follows TDD: the AI writes tests first, then implements the feature to make them pass. Coverage targets are enforced and verified by both AI and human reviewers before merge:
90% for API functions
85% for hooks
70% for components
95% for utilities
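Targets like these can be enforced with Jest's coverageThreshold option, which fails the run when any threshold is missed. The directory paths below are illustrative assumptions about layout, not the project's actual structure:

```javascript
// jest.config.js sketch: per-directory line-coverage thresholds
// matching the targets above. Paths are hypothetical.
const jestConfig = {
  collectCoverage: true,
  coverageThreshold: {
    './src/api/': { lines: 90 },        // API functions
    './src/hooks/': { lines: 85 },      // hooks
    './src/components/': { lines: 70 }, // components
    './src/utils/': { lines: 95 },      // utilities
  },
};

module.exports = jestConfig;
```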
What we learned
Version control is mandatory. Without real PR reviews, you’re not building software — you’re accumulating risk. This sounds obvious until you’ve spent months in a system where it wasn’t possible.
Humans own requirements. AI accelerates implementation. The moment AI starts defining what to build, you’ve lost control of your own system.
AI needs constraints. Unconstrained AI produces creative chaos. Constrained AI — given blueprints, rules, and a task lifecycle — produces auditable, pattern-conforming code.
Since the cutover on 26 November 2025, the difference has been plain. What the low-code platform gave us was initial momentum. What the AI-powered code-first approach gives us is momentum and ownership. We're not just shipping faster; we're shipping with confidence, control, and a foundation that scales. That's the difference between building features and running the business safely.
Shafi Trumboo is a Back-end Tech Lead at Deriv.
Follow our official LinkedIn page for company updates and upcoming events.
Join our team to work on projects like this.