MODELS
Open-weight LLMs
- · Llama 3.1 70B Instruct
- · Qwen 2.5 14B Instruct (fast lane)
- · Mistral 7B (legacy support)
- · LoRA fine-tunes per workflow
Weights live on your hardware. Inference via vLLM or llama.cpp.
// Architecture
Most "AI for logistics" pages stop at the marketing layer. This one doesn't. Here is the exact stack we ship — the models, the vector store, the API, the UI, the infra. All of it open-source where it matters. All of it handed over on day 14 with the source code, weights, and runbooks.
// The full pipeline
LAYER 01 · INGEST
Document ingest
Carrier invoices, BOLs, customs docs, rate sheets, EDI 210/214, email → OCR → structured rows + line items
LAYER 02 · MODELS
Local LLM layer
Llama 3.1 70B (general), Qwen 2.5 14B (fast), domain fine-tunes (e.g. invoice_extractor)
LAYER 03 · MEMORY
Embeddings + vector store
BAAI/bge-m3 embeddings → ChromaDB or Qdrant → Postgres for structured shipment + rate metadata
LAYER 04 · APP
Application layer
FastAPI services for rate quoting, invoice audit, carrier scoring, customs compliance checks
LAYER 05 · UI
Web UI
Next.js + Tailwind. Hosted on your domain or behind your VPN. SSO via your existing identity provider.
// Where it runs
OPTION 01
A single MacBook Pro M3 Max or a workstation with a 24GB GPU. Good for one ops manager running invoice audit and rate quoting.
~$5k hardware · 1 user · No IT lift.
OPTION 02 · COMMON
A single tower or 2U rack server in your office, dual-RTX or A6000-class GPU. Runs the full stack for a 10-50 person brokerage or 3PL.
~$15-25k hardware · Whole office · One IT day to install.
OPTION 03
Your AWS / Azure / GCP tenant, your VPC, your IAM. We deploy via Terraform. Documents never leave your cloud account. EU-region option for GDPR.
Hourly GPU billing · Multi-site · Your existing cloud governance applies.
// Maintenance & upgrades
Self-serve by default
Every runbook is in plain Markdown. Backups, model updates, retraining the invoice extractor on a new carrier's format — all documented step by step. A competent IT generalist can run it.
Optional retainer
If you want us on call, we offer a flat monthly retainer for upgrades, model swaps as new open-weight releases drop, and incident support. No required subscription.
Open-weight upgrades
When Llama 4 or the next Qwen ships and outperforms what you have, you can swap it. Your fine-tunes and pipelines are model-agnostic on purpose.
No phone-home
The system never calls back to us. No telemetry, no usage pings. If your network is air-gapped, the system runs anyway.
// Want to read the actual code?
Book a 30-minute call and we'll screen-share an anonymized client repo: the API, the runbooks, the model configs. No NDA needed for the walkthrough.