The voice pipeline currently runs on MLX, on any Apple silicon chip from M1 through M5. I used Whisper-Turbo for STT, Qwen3.5-9B-4bit for the LLM, and Qwen3-TTS-0.6B-4bit for TTS.
The repo also includes a WebSocket transport for bringing these voices to ESP32-powered devices over secure WebSockets (WSS).
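
To make the STT, LLM, and TTS hand-off concrete, here is a minimal sketch of how three such stages could be chained. It is not the repo's actual code: `VoicePipeline`, `fake_stt`, `fake_llm`, and `fake_tts` are hypothetical names, and the stand-in functions only pass bytes and strings around where the real stages would call Whisper-Turbo, Qwen3.5-9B-4bit, and Qwen3-TTS-0.6B-4bit.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class VoicePipeline:
    """Hypothetical container that chains the three stages in order."""

    stt: Callable[[bytes], str]   # audio in  -> transcript
    llm: Callable[[str], str]     # transcript -> reply text
    tts: Callable[[str], bytes]   # reply text -> audio out

    def run(self, audio_in: bytes) -> bytes:
        transcript = self.stt(audio_in)
        reply = self.llm(transcript)
        return self.tts(reply)


# Stand-in stages (assumptions for illustration only). A real setup would
# wrap the speech-to-text, language, and text-to-speech models here.
def fake_stt(audio: bytes) -> str:
    return audio.decode("utf-8")


def fake_llm(prompt: str) -> str:
    return f"You said: {prompt}"


def fake_tts(text: str) -> bytes:
    return text.encode("utf-8")


pipeline = VoicePipeline(stt=fake_stt, llm=fake_llm, tts=fake_tts)
audio_out = pipeline.run(b"hello")
```

Keeping each stage behind a plain callable means a model can be swapped without touching the others, and a transport layer only needs to feed bytes into `run` and send the returned bytes back to the device.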