Feature Registry
Canonical record of feature completeness across the optalocal stack.
150 of 185 features implemented across all apps
Opta LMX
MLX Inference
Opta LMX Features
Inference Server
- MLX-native inference on Apple Silicon
- OpenAI-compatible /v1/chat/completions endpoint (see the request sketch after this list)
- Streaming SSE responses
- GGUF model loading (llama.cpp fallback)
- Automatic quantization selection
- Model hot-swap without restart
- Concurrent request handling
- KV cache management
- Context length enforcement
- vLLM backend for parallel batching
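A minimal request sketch against the /v1/chat/completions endpoint, assuming a local server; the host, port, and model id are placeholders, and the SSE framing follows the standard OpenAI-compatible convention of "data: {...}" lines terminated by "data: [DONE]".

```python
# Streaming chat completion sketch. Host/port and model id are assumptions.
import json
import requests

BASE_URL = "http://localhost:8080"  # assumed local server address

payload = {
    "model": "mlx-community/Meta-Llama-3-8B-Instruct-4bit",  # placeholder model id
    "messages": [{"role": "user", "content": "Say hello in one sentence."}],
    "stream": True,
}

with requests.post(f"{BASE_URL}/v1/chat/completions", json=payload, stream=True) as resp:
    resp.raise_for_status()
    for line in resp.iter_lines(decode_unicode=True):
        # SSE frames look like "data: {json chunk}" and end with "data: [DONE]"
        if not line or not line.startswith("data: "):
            continue
        data = line[len("data: "):]
        if data == "[DONE]":
            break
        chunk = json.loads(data)
        print(chunk["choices"][0]["delta"].get("content") or "", end="", flush=True)
print()
```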
Model Management
- Model inventory API (/admin/models); see the sketch after this list
- Dynamic load/unload API
- Memory headroom enforcement (never crash on OOM)
- Model health monitoring
- HuggingFace model download integration
- GGUF format support
- LoRA adapter loading
- Model benchmarking suite
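A sketch of the model management flow under stated assumptions: the /admin/models path comes from this registry, but the response shape and the load sub-path used below are hypothetical, not the documented API.

```python
# Model inventory + dynamic load sketch. Only /admin/models is documented above;
# the response keys and the /admin/models/load sub-path are assumptions.
import requests

BASE_URL = "http://localhost:8080"  # assumed local server address

# List the current model inventory.
inventory = requests.get(f"{BASE_URL}/admin/models").json()
for model in inventory.get("models", []):  # "models" key is an assumed schema
    print(model)

# Request a load; per the headroom-enforcement feature above, the server is
# expected to refuse (rather than crash) if the model would not fit in memory.
resp = requests.post(
    f"{BASE_URL}/admin/models/load",  # hypothetical load/unload sub-path
    json={"model": "mlx-community/Meta-Llama-3-8B-Instruct-4bit"},  # placeholder id
)
print(resp.status_code, resp.text)
```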
API Compatibility
- OpenAI /v1/chat/completions
- OpenAI /v1/models
- Health endpoint /healthz
- Admin events SSE /admin/events
- Rerank endpoint /v1/rerank
- Skills API /v1/skills
- Agents API /v1/agents
- Embeddings endpoint /v1/embeddings
- Function calling (tool_use); see the client sketch after this list
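Because the surface is OpenAI-compatible, the official OpenAI Python client can be pointed at the local server. The base URL, API key, model ids, and the example tool below are placeholders for illustration, not part of the LMX API.

```python
# Function calling (tool_use) and embeddings through the OpenAI Python client.
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8080/v1", api_key="not-needed")  # assumed address

tools = [{
    "type": "function",
    "function": {
        "name": "get_weather",  # illustrative tool, not provided by LMX
        "description": "Get the current weather for a city.",
        "parameters": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"],
        },
    },
}]

resp = client.chat.completions.create(
    model="mlx-community/Meta-Llama-3-8B-Instruct-4bit",  # placeholder model id
    messages=[{"role": "user", "content": "What's the weather in Lisbon?"}],
    tools=tools,
)

msg = resp.choices[0].message
if msg.tool_calls:  # model chose to call the tool
    call = msg.tool_calls[0]
    print(call.function.name, call.function.arguments)
else:
    print(msg.content)

# The same client covers /v1/embeddings.
emb = client.embeddings.create(
    model="mlx-community/bge-small-en-v1.5-mlx",  # placeholder embedding model id
    input=["hello world"],
)
print(len(emb.data[0].embedding))
```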
Performance
- ANE (Apple Neural Engine) utilization
- Batch request coalescing
- Throughput metrics (tokens/sec); see the measurement sketch after this list
- Active request tracking
- Auto-tune quantization per model size
- Thermal throttle detection
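A rough client-side estimate of tokens/sec from the streaming endpoint, treating one content chunk as roughly one token; server-side metrics, where exposed, are the authoritative source. Host, port, and model id are placeholders.

```python
# Approximate tokens/sec by counting streamed content chunks over wall-clock time.
import json
import time
import requests

BASE_URL = "http://localhost:8080"  # assumed local server address

payload = {
    "model": "mlx-community/Meta-Llama-3-8B-Instruct-4bit",  # placeholder model id
    "messages": [{"role": "user", "content": "Write a 200-word summary of Apple MLX."}],
    "stream": True,
    "max_tokens": 256,
}

chunks = 0
start = time.monotonic()
with requests.post(f"{BASE_URL}/v1/chat/completions", json=payload, stream=True) as resp:
    resp.raise_for_status()
    for line in resp.iter_lines(decode_unicode=True):
        if not line or not line.startswith("data: "):
            continue
        data = line[len("data: "):]
        if data == "[DONE]":
            break
        if json.loads(data)["choices"][0]["delta"].get("content"):
            chunks += 1
elapsed = time.monotonic() - start
print(f"~{chunks / elapsed:.1f} tokens/sec (chunk-count approximation)")
```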