Stack técnico compartido¶

Componentes que usan todas las implementations: FastAPI + LiteLLM (inferencia), Qdrant (RAG), Langfuse (observabilidad), Postgres (app data).

Quick start¶

# 1. Levanta servicios auxiliares (Qdrant, Langfuse, Postgres)
docker compose up -d

# 2. Verifica
docker compose ps
# qdrant     → http://localhost:6333
# langfuse   → http://localhost:3000
# postgres   → localhost:5432

# 3. Backend
cd backend
uv sync
cp ../.env.example .env  # rellena los valores
uv run uvicorn app.main:app --reload --port 8000

# 4. Health check
curl http://localhost:8000/health
# {"status": "ok", "ollama": "reachable", "qdrant": "reachable"}

Arquitectura¶

┌────────────┐
│  Frontend  │  Next.js / Tauri / RN
└──────┬─────┘
       │ HTTP
┌──────▼─────────────────────────────────────────┐
│  Backend FastAPI (este folder: stack/backend/) │
│  ┌──────────────────────────────────────────┐  │
│  │ Routes: chat, documents, embeddings      │  │
│  │ Services: business logic                 │  │
│  │ Inference: LiteLLM wrapper               │  │
│  │ RAG: Qdrant + EmbeddingGemma             │  │
│  └──────────────────────────────────────────┘  │
└────┬─────────┬────────────┬─────────┬──────────┘
     │         │            │         │
     ▼         ▼            ▼         ▼
  Ollama   Qdrant      Postgres   Langfuse
  (Gemma   (vectors)   (app data) (traces)
   4 E4B)

Componentes¶

Backend FastAPI¶

Ver backend/README.md.

Endpoints incluidos por defecto: - GET /health — health check - POST /chat — chat con Gemma 4 (proxy LiteLLM) - POST /embeddings — embeddings con EmbeddingGemma - POST /documents — ingesta de docs a Qdrant - POST /search — búsqueda semántica - POST /completions — text completion (legacy)

Cada implementation añade sus propios endpoints en app/routes/.

docker-compose.yml¶

Servicios: - qdrant (puerto 6333) — vector DB. - postgres-langfuse (puerto 5433) — DB para Langfuse. - langfuse-server (puerto 3000) — UI de trazas. - postgres-app (puerto 5432) — DB para tu app.

Variables de entorno requeridas¶

Ver .env.example. Críticas:

OLLAMA_BASE_URL=http://localhost:11434
QDRANT_URL=http://localhost:6333
LANGFUSE_HOST=http://localhost:3000
LANGFUSE_PUBLIC_KEY=  # Get from Langfuse UI after first signup
LANGFUSE_SECRET_KEY=

Customización por implementation¶

Cada implementation añade:

Routes en backend/app/routes/<impl>.py.
Models Pydantic en backend/app/models/<impl>.py.
Services business logic en backend/app/services/<impl>.py.
System prompts referenciados desde implementations/<impl>/prompts/.

El backend principal no se duplica por implementation — todas comparten el mismo runtime.

Deploy¶

Ambiente	Stack
Local dev	Mac M4 Pro + Docker Desktop
Staging	Hetzner CPX31 (€8/mes) + Mac Mini como GPU server vía Tailscale
Production	Cloud Run + Vertex AI (managed) o on-prem 2× H100

Más detalle en ../docs/06-tech-stack.md.