Architecture – What You Actually Own

// The full pipeline

FIVE LAYERS, NO BLACK BOXES.

LAYER 01 · INGEST

Document ingest

Carrier invoices, BOLs, customs docs, rate sheets, EDI 210/214, email → OCR → structured rows + line items

↓

LAYER 02 · MODELS

Local LLM layer

Llama 3.1 70B (general), Qwen 2.5 14B (fast), domain fine-tunes (e.g. invoice_extractor)

↓

LAYER 03 · MEMORY

Embeddings + vector store

BAAI/bge-m3 embeddings → ChromaDB or Qdrant → Postgres for structured shipment + rate metadata

↓

LAYER 04 · APP

Application layer

FastAPI services for rate quoting, invoice audit, carrier scoring, customs compliance checks

↓

LAYER 05 · UI

Web UI

Next.js + Tailwind. Hosted on your domain or behind your VPN. SSO via your existing identity provider.

// Layer breakdown

EVERY COMPONENT, NAMED.

MODELS

Open-weight LLMs

· Llama 3.1 70B Instruct
· Qwen 2.5 14B Instruct (fast lane)
· Mistral 7B (legacy support)
· LoRA fine-tunes per workflow

Weights live on your hardware. Inference via vLLM or llama.cpp.

VECTOR STORE

Embeddings + retrieval

· ChromaDB (default) or Qdrant
· BAAI/bge-m3 embeddings
· Hybrid search (BM25 + dense)
· Per-shipper namespace isolation

All vectors stay on your disk. No external embedding APIs.

API LAYER

FastAPI services

· FastAPI + Pydantic
· Celery + Redis for async jobs
· OAuth2 / SSO ready
· OpenAPI spec generated

Stateless services. Easy to scale or replace one module without touching the others.

UI

Web interface

· Next.js 14 (app router)
· Tailwind CSS
· React Query for state
· Mobile-friendly for ops desks

Yours to brand. Yours to extend. Source ships in the handover.

INFRA

Runtime

· Docker Compose (single-host)
· Postgres 16 for metadata
· MinIO or local FS for blobs
· Caddy or nginx as reverse proxy

No Kubernetes unless you ask. Boring tech is the point.

OBSERVABILITY

What broke and why

· Structured logs (JSON)
· Prometheus metrics
· Grafana dashboards (optional)
· Per-request audit trail

Your IT team can answer "what happened" without calling us.

// Where it runs

THREE DEPLOYMENT OPTIONS.

OPTION 01

Workstation

A single MacBook Pro M3 Max or a workstation with a 24GB GPU. Good for one ops manager running invoice audit and rate quoting.

~$5k hardware · 1 user · No IT lift.

OPTION 02 · COMMON

On-prem server

A single tower or 2U rack server in your office, dual-RTX or A6000-class GPU. Runs the full stack for a 10-50 person brokerage or 3PL.

~$15-25k hardware · Whole office · One IT day to install.

OPTION 03

Your private cloud

Your AWS / Azure / GCP tenant, your VPC, your IAM. We deploy via Terraform. Documents never leave your cloud account. EU-region option for GDPR.

Hourly GPU billing · Multi-site · Your existing cloud governance applies.

// What you get on day 14

THE HANDOVER FOLDER.

A real, opinionated folder structure ships with every build. Everything is versioned, documented, and reproducible. Here's what lands in your repo:

your-marapone-build/
├── README.md
├── LICENSE
├── docker-compose.yml
├── .env.example
├── models/
│   ├── llama-3.1-70b-instruct.gguf
│   ├── qwen2.5-14b-instruct.gguf
│   └── lora/
│       ├── invoice_extractor/
│       └── carrier_scoring/
├── data/
│   ├── ingested/
│   │   ├── invoices/
│   │   └── rate_sheets/
│   ├── embeddings/
│   │   └── carrier_rates.parquet
│   └── chroma/
├── src/
│   ├── api/
│   │   ├── main.py
│   │   ├── routers/
│   │   │   ├── rate_quote.py
│   │   │   ├── invoice_audit.py
│   │   │   └── carrier_score.py
│   │   └── services/
│   ├── ingest/
│   │   ├── ocr_pipeline.py
│   │   ├── edi_parser.py
│   │   └── chunkers/
│   └── ui/                      # Next.js app
├── runbooks/
│   ├── 01_install.md
│   ├── 02_ingest_new_carrier.md
│   ├── 03_retrain_extractor.md
│   ├── 04_backup_restore.md
│   └── 05_incident_playbook.md
├── tests/
└── infra/
    ├── terraform/               # if cloud deploy
    └── scripts/

Every file is yours under your license. Want to change the model? Edit one config. Want to fork the UI? It's already a clean Next.js app. Want to swap ChromaDB for Qdrant? One docker-compose line.

// Maintenance & upgrades

YOU OWN IT. WE'RE STILL THERE IF YOU WANT US.

Self-serve by default

Every runbook is in plain Markdown. Backups, model updates, retraining the invoice extractor on a new carrier's format — all documented step by step. A competent IT generalist can run it.

Optional retainer

If you want us on call, we offer a flat monthly retainer for upgrades, model swaps as new open-weight releases drop, and incident support. No required subscription.

Open-weight upgrades

When Llama 4 or the next Qwen ships and outperforms what you have, you can swap it. Your fine-tunes and pipelines are model-agnostic on purpose.

No phone-home

The system never calls back to us. No telemetry, no usage pings. If your network is air-gapped, the system runs anyway.

// Want to read the actual code?

WE'LL WALK YOU THROUGH
A REAL HANDOVER REPO.

Book a 30-minute call and we'll screen-share an anonymized client repo: the API, the runbooks, the model configs. No NDA needed for the walkthrough.

Request the Walkthrough → See Security Posture

UNDER THE HOOD —WHAT YOU ACTUALLY OWN.