Building Ottly: Our Tech Stack Choices


Ottly Team

When we started building Ottly, we had a clear set of requirements: it had to be fast, reliable, and capable of running AI workloads at scale. Here is how we chose our stack.

Backend: Go with Chi Router

We chose Go for the API server. It gives us predictable performance, easy concurrency, and a tiny memory footprint. The Chi router keeps things simple without the overhead of a full framework.

Key backend components:

  • PostgreSQL (via Supabase) for persistent data
  • Redis for caching and real-time pub/sub
  • gRPC for communication with the worker service

Frontend: Next.js with App Router

Every Ottly product is a Next.js application using the App Router. This gives us server-side rendering for SEO, React Server Components for performance, and a clean file-based routing structure.

We use Tailwind CSS for styling and shadcn/ui as our component foundation. This lets us move fast while keeping a consistent design language across products.

AI Layer: Multi-Model Architecture

Ottly is not locked to a single AI provider. Our LLM abstraction layer supports OpenAI, Google Gemini, and Anthropic Claude. Users can switch models on the fly, since different tasks benefit from different models.

The routing layer handles:

  • Model selection and fallback
  • Token counting and cost tracking
  • Streaming responses via Server-Sent Events

Worker: Python Sandbox

For workflow execution, we use a Python-based worker service connected via gRPC. Each execution runs in an isolated sandbox with resource limits, timeout handling, and automatic error recovery.

What We Learned

The biggest lesson: keep the architecture boring where it does not matter, and invest complexity only where it creates real value. Our AI routing and execution sandboxing are complex because they have to be. Everything else is deliberately simple.