
// Architecture

UNDER THE HOOD: WHAT YOU ACTUALLY OWN.

Most "AI for logistics" pages stop at the marketing layer. This one doesn't. Here is the exact stack we ship: the models, the vector store, the API, the UI, and the infra. All of it open-source where it matters. All of it handed over on day 14 with the source code, weights, and runbooks.

// The full pipeline

FIVE LAYERS, NO BLACK BOXES.

LAYER 01 · INGEST

Document ingest

Carrier invoices, BOLs, customs docs, rate sheets, EDI 210/214, email → OCR → structured rows + line items
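
The EDI leg of this layer needs no OCR, because X12 is already delimited text. Below is a minimal sketch of that first parsing step. It assumes the common `~` segment terminator and `*` element separator (real interchanges declare their delimiters in the ISA header). The function names and the sample are hypothetical and are not taken from the shipped `edi_parser.py`.

```python
from typing import Dict, List


def split_segments(raw: str, seg_term: str = "~", elem_sep: str = "*") -> List[List[str]]:
    """Split a raw X12 string into segments, each a list of elements."""
    segments = []
    for chunk in raw.split(seg_term):
        chunk = chunk.strip()          # tolerate newlines between segments
        if chunk:
            segments.append(chunk.split(elem_sep))
    return segments


def index_by_id(segments: List[List[str]]) -> Dict[str, List[List[str]]]:
    """Group segments by their ID (the first element, e.g. 'ST' or 'B3')."""
    grouped: Dict[str, List[List[str]]] = {}
    for seg in segments:
        grouped.setdefault(seg[0], []).append(seg[1:])
    return grouped


# Made-up fragment shaped like a 210 freight invoice; values are illustrative only.
SAMPLE = "ST*210*0001~B3**INV-1001*PRO-778*PP**20240115*125000~SE*3*0001~"

grouped = index_by_id(split_segments(SAMPLE))
transaction_set = grouped["ST"][0][0]   # "210"
```

Mapping element positions to named fields (invoice number, amounts, status codes) would sit on top of this and should follow the trading partner's implementation guide.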

LAYER 02 · MODELS

Local LLM layer

Llama 3.1 70B (general), Qwen 2.5 14B (fast), domain fine-tunes (e.g. invoice_extractor)

LAYER 03 · MEMORY

Embeddings + vector store

BAAI/bge-m3 embeddings → ChromaDB or Qdrant → Postgres for structured shipment + rate metadata

LAYER 04 · APP

Application layer

FastAPI services for rate quoting, invoice audit, carrier scoring, customs compliance checks

LAYER 05 · UI

Web UI

Next.js + Tailwind. Hosted on your domain or behind your VPN. SSO via your existing identity provider.

// Layer breakdown

EVERY COMPONENT, NAMED.

MODELS

Open-weight LLMs

  • Llama 3.1 70B Instruct
  • Qwen 2.5 14B Instruct (fast lane)
  • Mistral 7B (legacy support)
  • LoRA fine-tunes per workflow

Weights live on your hardware. Inference via vLLM or llama.cpp.

VECTOR STORE

Embeddings + retrieval

  • ChromaDB (default) or Qdrant
  • BAAI/bge-m3 embeddings
  • Hybrid search (BM25 + dense)
  • Per-shipper namespace isolation

All vectors stay on your disk. No external embedding APIs.
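
Hybrid search means two ranked lists, one from BM25 and one from dense vectors, have to be merged into one. This page does not say how the stack fuses them. The sketch below uses reciprocal rank fusion (RRF), a common choice, purely as an assumption. The document IDs are made up, and `k=60` is the conventional default rather than a tuned value.

```python
from collections import defaultdict
from typing import Dict, List, Sequence


def reciprocal_rank_fusion(rankings: Sequence[Sequence[str]], k: int = 60) -> List[str]:
    """Fuse several ranked lists of document IDs into one.

    Each document scores sum(1 / (k + rank)) over the lists it appears in,
    with rank starting at 1. A higher fused score ranks first.
    """
    scores: Dict[str, float] = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Sort by score descending; break ties by ID so the output is deterministic.
    return sorted(scores, key=lambda d: (-scores[d], d))


bm25_hits = ["rate_sheet_12", "invoice_881", "bol_204"]        # keyword ranking
dense_hits = ["invoice_881", "rate_sheet_07", "rate_sheet_12"]  # embedding ranking
fused = reciprocal_rank_fusion([bm25_hits, dense_hits])
# fused == ["invoice_881", "rate_sheet_12", "rate_sheet_07", "bol_204"]
```

RRF uses only rank positions, so BM25 scores and cosine similarities never need to be put on a common scale.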

API LAYER

FastAPI services

  • FastAPI + Pydantic
  • Celery + Redis for async jobs
  • OAuth2 / SSO ready
  • OpenAPI spec generated automatically

Stateless services. Easy to scale or replace one module without touching the others.
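
Stateless here means each request carries everything the service needs, so the core of a module can be a plain function. Below is a minimal sketch of what an invoice-audit check might look like. It uses standard-library dataclasses to stay dependency-free; in the shipped stack the equivalent types would presumably be Pydantic models behind a FastAPI router. Every name here (`LineItem`, `Finding`, `audit_invoice`, the charge codes, the tolerance) is hypothetical.

```python
from dataclasses import dataclass
from decimal import Decimal
from typing import List


@dataclass(frozen=True)
class LineItem:
    code: str            # charge code, e.g. "FUEL" (hypothetical)
    billed: Decimal      # amount on the carrier invoice
    contracted: Decimal  # amount implied by the rate sheet


@dataclass(frozen=True)
class Finding:
    code: str
    overcharge: Decimal


def audit_invoice(items: List[LineItem], tolerance: Decimal = Decimal("0.50")) -> List[Finding]:
    """Return one finding per line billed above contract by more than `tolerance`."""
    findings = []
    for item in items:
        delta = item.billed - item.contracted
        if delta > tolerance:
            findings.append(Finding(code=item.code, overcharge=delta))
    return findings


items = [
    LineItem("LINEHAUL", Decimal("1250.00"), Decimal("1250.00")),
    LineItem("FUEL", Decimal("187.50"), Decimal("162.50")),
    LineItem("LIFTGATE", Decimal("75.25"), Decimal("75.00")),
]
findings = audit_invoice(items)
# Only FUEL is flagged: 25.00 over contract. LIFTGATE is within tolerance.
```

Because the function reads no global state, the same logic can run inside a web request or a background job without changes.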

UI

Web interface

  • Next.js 14 (App Router)
  • Tailwind CSS
  • React Query for server state
  • Mobile-friendly for ops desks

Yours to brand. Yours to extend. Source ships in the handover.

INFRA

Runtime

  • Docker Compose (single-host)
  • Postgres 16 for metadata
  • MinIO or local FS for blobs
  • Caddy or nginx as reverse proxy

No Kubernetes unless you ask. Boring tech is the point.

OBSERVABILITY

What broke and why

  • Structured logs (JSON)
  • Prometheus metrics
  • Grafana dashboards (optional)
  • Per-request audit trail

Your IT team can answer "what happened" without calling us.
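
Structured logs and a per-request trail come down to two things: one JSON object per log line, and a request ID on every line. Below is a minimal sketch using only Python's standard `logging` module. The field names and the `request_id` convention are assumptions, not the shipped log schema.

```python
import json
import logging
import uuid


class JsonFormatter(logging.Formatter):
    """Render each log record as one JSON object per line."""

    def format(self, record: logging.LogRecord) -> str:
        payload = {
            "ts": self.formatTime(record),
            "level": record.levelname,
            "logger": record.name,
            "message": record.getMessage(),
            # request_id is attached by the caller through `extra=`.
            "request_id": getattr(record, "request_id", None),
        }
        return json.dumps(payload)


def make_logger(name: str = "audit") -> logging.Logger:
    logger = logging.getLogger(name)
    if not logger.handlers:            # avoid duplicate handlers on re-import
        handler = logging.StreamHandler()
        handler.setFormatter(JsonFormatter())
        logger.addHandler(handler)
    logger.setLevel(logging.INFO)
    return logger


log = make_logger()
request_id = str(uuid.uuid4())         # one ID per incoming request
log.info("invoice_audit started", extra={"request_id": request_id})
log.info("invoice_audit finished", extra={"request_id": request_id})
```

Filtering the log stream on a single `request_id` then reconstructs everything that happened for that request.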

// Where it runs

THREE DEPLOYMENT OPTIONS.

OPTION 01

Workstation

A single MacBook Pro M3 Max or a workstation with a 24GB GPU. Good for one ops manager running invoice audit and rate quoting.

~$5k hardware · 1 user · No IT lift.
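
To sanity-check which model fits which box, a weights-only estimate is a useful start: parameters times bits per weight, divided by eight. The sketch below uses nominal parameter counts and nominal bit widths, both of which are simplifying assumptions. Real quantized files run somewhat larger, and the KV cache and activations need memory on top.

```python
def weight_memory_gb(params_billion: float, bits_per_weight: float) -> float:
    """Approximate memory for model weights alone, in decimal gigabytes."""
    bytes_total = params_billion * 1e9 * bits_per_weight / 8
    return bytes_total / 1e9


# Nominal sizes only; actual parameter counts differ slightly.
for name, params in [("Llama 3.1 70B", 70), ("Qwen 2.5 14B", 14)]:
    for bits in (16, 4):
        print(f"{name} @ {bits}-bit: ~{weight_memory_gb(params, bits):.0f} GB")
```

By this estimate a 4-bit 70B model needs about 35 GB for weights alone, more than a single 24GB card holds. On that class of hardware the 14B fast-lane model (about 7 GB at 4-bit) is the natural fit, unless layers are split between GPU and system memory.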

OPTION 02 · COMMON

On-prem server

A single tower or 2U rack server in your office, dual-RTX or A6000-class GPU. Runs the full stack for a 10-50 person brokerage or 3PL.

~$15-25k hardware · Whole office · One IT day to install.

OPTION 03

Your private cloud

Your AWS / Azure / GCP tenant, your VPC, your IAM. We deploy via Terraform. Documents never leave your cloud account. EU-region option for GDPR.

Hourly GPU billing · Multi-site · Your existing cloud governance applies.

// What you get on day 14

THE HANDOVER FOLDER.

A real, opinionated folder structure ships with every build. Everything is versioned, documented, and reproducible. Here's what lands in your repo:

your-marapone-build/
├── README.md
├── LICENSE
├── docker-compose.yml
├── .env.example
├── models/
│   ├── llama-3.1-70b-instruct.gguf
│   ├── qwen2.5-14b-instruct.gguf
│   └── lora/
│       ├── invoice_extractor/
│       └── carrier_scoring/
├── data/
│   ├── ingested/
│   │   ├── invoices/
│   │   └── rate_sheets/
│   ├── embeddings/
│   │   └── carrier_rates.parquet
│   └── chroma/
├── src/
│   ├── api/
│   │   ├── main.py
│   │   ├── routers/
│   │   │   ├── rate_quote.py
│   │   │   ├── invoice_audit.py
│   │   │   └── carrier_score.py
│   │   └── services/
│   ├── ingest/
│   │   ├── ocr_pipeline.py
│   │   ├── edi_parser.py
│   │   └── chunkers/
│   └── ui/                      # Next.js app
├── runbooks/
│   ├── 01_install.md
│   ├── 02_ingest_new_carrier.md
│   ├── 03_retrain_extractor.md
│   ├── 04_backup_restore.md
│   └── 05_incident_playbook.md
├── tests/
└── infra/
    ├── terraform/               # if cloud deploy
    └── scripts/

Every file is yours under your license. Want to change the model? Edit one config. Want to fork the UI? It's already a clean Next.js app. Want to swap ChromaDB for Qdrant? One docker-compose line.
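
"Edit one config" works when every service reads its model and vector-store choice from a single place. Below is a minimal sketch of that pattern using environment variables, as a `.env` file would supply them. The variable names `MODEL_PATH` and `VECTOR_BACKEND` and the defaults are hypothetical and are not taken from the shipped `.env.example`.

```python
import os
from dataclasses import dataclass
from typing import Mapping


@dataclass(frozen=True)
class ModelSettings:
    model_path: str
    vector_backend: str   # "chroma" or "qdrant"


def load_settings(env: Mapping[str, str] = os.environ) -> ModelSettings:
    """Read settings from environment variables, falling back to defaults.

    MODEL_PATH and VECTOR_BACKEND are hypothetical variable names.
    """
    backend = env.get("VECTOR_BACKEND", "chroma").lower()
    if backend not in {"chroma", "qdrant"}:
        raise ValueError(f"unsupported vector backend: {backend!r}")
    return ModelSettings(
        model_path=env.get("MODEL_PATH", "models/qwen2.5-14b-instruct.gguf"),
        vector_backend=backend,
    )


# Passing a dict instead of os.environ keeps the example deterministic.
settings = load_settings({
    "MODEL_PATH": "models/llama-3.1-70b-instruct.gguf",
    "VECTOR_BACKEND": "qdrant",
})
```

Failing fast on an unknown backend means a typo in the config surfaces at startup rather than on the first query.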

// Maintenance & upgrades

YOU OWN IT. WE'RE STILL THERE IF YOU WANT US.

Self-serve by default

Every runbook is in plain Markdown. Backups, model updates, retraining the invoice extractor on a new carrier's format — all documented step by step. A competent IT generalist can run it.

Optional retainer

If you want us on call, we offer a flat monthly retainer for upgrades, model swaps as new open-weight releases drop, and incident support. No required subscription.

Open-weight upgrades

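When Llama 4 or the next Qwen ships and outperforms what you have, you can swap it. Your pipelines are model-agnostic on purpose, and the fine-tuning data and recipes carry over; LoRA adapters themselves are tied to their base model, so they are retrained against the new one using the retraining runbook.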
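
One way to keep pipelines independent of any particular model is to have them depend on a small interface rather than a concrete backend. Below is a minimal sketch of that idea. `TextGenerator`, `EchoBackend`, and `extract_invoice_fields` are hypothetical names, and the echo backend is a stand-in for a real vLLM or llama.cpp client.

```python
from typing import Protocol


class TextGenerator(Protocol):
    """Anything with a generate(prompt) -> str method satisfies this."""

    def generate(self, prompt: str) -> str: ...


class EchoBackend:
    """Stand-in backend for illustration; a real one would call vLLM or llama.cpp."""

    def __init__(self, model_name: str) -> None:
        self.model_name = model_name

    def generate(self, prompt: str) -> str:
        return f"[{self.model_name}] {prompt}"


def extract_invoice_fields(llm: TextGenerator, invoice_text: str) -> str:
    """Pipeline step that depends only on the TextGenerator interface."""
    prompt = f"Extract carrier, invoice number and total from:\n{invoice_text}"
    return llm.generate(prompt)


# Swapping the model is a change at the call site, not inside the pipeline.
old = extract_invoice_fields(EchoBackend("llama-3.1-70b"), "ACME Freight INV-1001")
new = extract_invoice_fields(EchoBackend("next-model"), "ACME Freight INV-1001")
```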

No phone-home

The system never calls back to us. No telemetry, no usage pings. If your network is air-gapped, the system runs anyway.

// Security defaults shipped on day one

ENCRYPTION

At-rest (LUKS/AES-256), in-transit (TLS 1.3)

AUTH

SSO-ready, role-based access, audit log

NETWORK

Bind-localhost default, VPN/private subnet for remote
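
A bind-localhost default can be enforced with a startup check that refuses any listen address outside loopback or a private range. Below is a minimal sketch using Python's standard `ipaddress` module. The function name `is_safe_bind` and the policy it encodes are assumptions for illustration, not the shipped check.

```python
import ipaddress


def is_safe_bind(address: str) -> bool:
    """True if `address` is loopback or in a private range.

    Raises ValueError if `address` is not a valid IP address.
    """
    ip = ipaddress.ip_address(address)
    if ip.is_unspecified:        # 0.0.0.0 or :: listens on every interface
        return False
    return ip.is_loopback or ip.is_private


for candidate in ("127.0.0.1", "10.0.0.5", "0.0.0.0", "8.8.8.8"):
    print(candidate, is_safe_bind(candidate))
```

The explicit `is_unspecified` test matters because `0.0.0.0` would otherwise pass the private-range check while actually exposing the service on all interfaces.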

EGRESS

No outbound API calls. Air-gap supported.

// Want to read the actual code?

WE'LL WALK YOU THROUGH A REAL HANDOVER REPO.

Book a 30-minute call and we'll screen-share an anonymized client repo: the API, the runbooks, the model configs. No NDA needed for the walkthrough.