idyl.inference provides distributed AI inference across heterogeneous GPUs. Run any model on any hardware in the idyl network, with automatic hardware matching and zero configuration.
It supports any model format — GGUF, SafeTensors, ONNX — and handles sharding across multiple GPUs for models that don’t fit on a single card. You get an OpenAI-compatible API endpoint, so switching from centralised providers takes one line of code.
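Because the endpoint is OpenAI-compatible, existing client code only needs its base URL changed. A minimal sketch of the request shape, using only the Python standard library — the base URL and model name here are placeholders, not real idyl.inference values (with the official OpenAI SDK, the equivalent change is the `base_url` argument to the client constructor):

```python
import json
import urllib.request

# Hypothetical endpoint; substitute your actual idyl.inference URL.
# Switching from a centralised provider means changing only this line.
BASE_URL = "https://api.idyl.example/v1"

def build_chat_request(model: str, messages: list) -> urllib.request.Request:
    """Build an OpenAI-compatible /chat/completions request (not yet sent)."""
    return urllib.request.Request(
        f"{BASE_URL}/chat/completions",
        data=json.dumps({"model": model, "messages": messages}).encode(),
        headers={
            "Content-Type": "application/json",
            "Authorization": "Bearer YOUR_API_KEY",  # placeholder credential
        },
        method="POST",
    )

req = build_chat_request(
    "example-model",  # placeholder; use any model served on the network
    [{"role": "user", "content": "Hello"}],
)
print(req.full_url)
```

Sending the request with `urllib.request.urlopen(req)` (or any OpenAI-compatible client) then works unchanged against either backend.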