Go + Python · OpenAI Compatible · Apache 2.0

Inference Architecture

A Go API server drives Python inference workers over JSON-RPC. MLX serves Apple Silicon; llama.cpp covers CPU and NVIDIA GPUs.

Go Layer (inferred/)
- API Server: chi router · OpenAI compat
- Worker Pool: LRU eviction · memory budget
- Worker Manager: Python subprocess lifecycle
- Model Registry: download + cache
- Prometheus: inference metrics
- JSON-RPC 2.0: stdin/stdout IPC
- CLI (cmd/latticeinference): serve / pull / models

Python Layer (python/)
- worker.py: JSON-RPC server
- Model Loader: format detection
- MLX Backend: Apple Silicon (Metal)
- llama.cpp: GGUF · CPU/NVIDIA
- HF Pipeline: transformers
- MLX Distributed: multi-GPU
- detect_format(): .gguf / .safetensors
- is_apple_silicon(): platform routing

Cluster (Phase 3)
- mDNS Discovery: zero-config LAN discovery
- RDMA: GPU interconnect
- Leader Election: cluster coordination
- Topology: node awareness
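The Model Loader's routing combines the two helpers named above: detect_format() inspects the file extension, and is_apple_silicon() decides whether safetensors models go to MLX or to the transformers pipeline. A hedged sketch, assuming extension-based detection and a pick_backend() helper that is illustrative rather than part of the actual codebase:

```python
import platform
from pathlib import Path

def detect_format(model_path):
    # Route by file extension: GGUF targets llama.cpp,
    # safetensors targets MLX or the HF pipeline.
    suffix = Path(model_path).suffix.lower()
    if suffix == ".gguf":
        return "gguf"
    if suffix == ".safetensors":
        return "safetensors"
    raise ValueError(f"unsupported model format: {suffix}")

def is_apple_silicon():
    # True on macOS with an arm64 CPU, where Metal is available for MLX.
    return platform.system() == "Darwin" and platform.machine() == "arm64"

def pick_backend(model_path):
    # GGUF always uses llama.cpp; safetensors prefers MLX on Apple
    # Silicon and falls back to the transformers pipeline elsewhere.
    if detect_format(model_path) == "gguf":
        return "llama.cpp"
    return "mlx" if is_apple_silicon() else "hf"
```

Routing on extension alone keeps the decision cheap; a real loader might additionally sniff magic bytes, since GGUF files begin with a fixed header.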
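The Worker Pool's "LRU eviction · memory budget" policy means a model stays resident until loading another would exceed the budget, at which point the least recently used workers are evicted first. A conceptual sketch (in Python for consistency with the other examples, though the real pool lives in the Go layer; the class and sizes are illustrative):

```python
from collections import OrderedDict

class WorkerPool:
    def __init__(self, budget_bytes):
        self.budget = budget_bytes
        self.used = 0
        # model id -> estimated memory, ordered least- to most-recently used
        self.workers = OrderedDict()

    def acquire(self, model_id, size_bytes):
        # A hit refreshes recency; a miss loads the model, evicting
        # least-recently-used entries until it fits under the budget.
        if model_id in self.workers:
            self.workers.move_to_end(model_id)
            return
        while self.workers and self.used + size_bytes > self.budget:
            _, freed = self.workers.popitem(last=False)  # evict LRU entry
            self.used -= freed
        self.workers[model_id] = size_bytes
        self.used += size_bytes
```

Budgeting by estimated model size rather than worker count matters here because model footprints vary by an order of magnitude, so a fixed worker cap could still exhaust memory.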