Model Overview

warning

🚧 Cortex.cpp is currently under development. Our documentation outlines the intended behavior of Cortex, which may not yet be fully implemented in the codebase.

When Cortex.cpp starts, it automatically launches an API server; this design is inspired by the Docker CLI. The server manages several model endpoints, which facilitate the following (a short sketch follows the list):

  • Model Operations: Run and stop models.
  • Model Management: Manage your local models.
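
To make this concrete, here is a minimal Python sketch that drives the server over HTTP. The port (39281) and the /v1/models/start and /v1/models/stop endpoint paths are assumptions for illustration; consult the API reference for the actual routes.

```python
# Minimal sketch of model operations against the Cortex.cpp API server.
# ASSUMPTIONS: the default port (39281) and the /v1/models/start and
# /v1/models/stop endpoint paths are illustrative, not confirmed here.
import requests

BASE_URL = "http://127.0.0.1:39281"

def start_model(model_id: str) -> None:
    # Ask the server to load a model into memory (assumed endpoint).
    requests.post(f"{BASE_URL}/v1/models/start", json={"model": model_id}).raise_for_status()

def stop_model(model_id: str) -> None:
    # Ask the server to unload the model (assumed endpoint).
    requests.post(f"{BASE_URL}/v1/models/stop", json={"model": model_id}).raise_for_status()

start_model("llama3.1")  # hypothetical model id
stop_model("llama3.1")
```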
info

Models are loaded into and unloaded from the API server automatically when you call the /chat/completions endpoint.
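
For example, the first /chat/completions request for a given model triggers the server to load it. Below is a minimal sketch, assuming the default port and an OpenAI-style request body (both assumptions for illustration):

```python
# The first /chat/completions call for a model loads it automatically.
# ASSUMPTIONS: port 39281 and the OpenAI-style payload shape are illustrative.
import requests

resp = requests.post(
    "http://127.0.0.1:39281/v1/chat/completions",
    json={
        "model": "llama3.1",  # hypothetical model id; loaded on first use
        "messages": [{"role": "user", "content": "Hello!"}],
    },
)
resp.raise_for_status()
print(resp.json()["choices"][0]["message"]["content"])
```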

Model Formats

Cortex.cpp supports three model formats:

  • GGUF
  • ONNX
  • TensorRT-LLM
info

For details on each format, see the Model Formats page.

Built-in Models

Cortex.cpp offers a range of built-in models, including popular open-source options. These models are hosted on HuggingFace as Cortex Model Repositories and pre-compiled for different engines, so each model can ship multiple branches in various formats.

Built-in Model Variants

Built-in models are made available across the following variants:

  • By format: gguf, onnx, and tensorrt-llm
  • By size: 7b, 13b, and more
  • By quantization: q4, q8, and more
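
As a rough illustration of how these axes combine, the sketch below composes a variant identifier from a format, size, and quantization. The `format-size-quant` naming pattern is an assumption for illustration, not the repositories' documented branch convention.

```python
# Illustrative only: composes a variant name from the three axes above.
# ASSUMPTION: the actual branch-naming convention used by the Cortex
# Model Repositories on HuggingFace may differ.
FORMATS = {"gguf", "onnx", "tensorrt-llm"}

def branch_name(fmt: str, size: str, quant: str) -> str:
    """Build a hypothetical variant branch name such as 'gguf-7b-q4'."""
    if fmt not in FORMATS:
        raise ValueError(f"unsupported format: {fmt}")
    return f"{fmt}-{size}-{quant}"

print(branch_name("gguf", "7b", "q4"))  # -> gguf-7b-q4
```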
info

You can see our full list of Built-in Models here.

Next steps

  • Cortex requires a model.yaml file to run a model; a hypothetical sketch follows this list. Find out more here.
  • Cortex supports multiple model hubs hosting built-in models. See details here.
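
For a feel of what such a file contains, here is a minimal sketch that parses a hypothetical model.yaml with PyYAML. Every field shown (model, engine, prompt_template) is an assumption for illustration; the model.yaml page linked above is authoritative.

```python
# Parse a hypothetical model.yaml. ASSUMPTION: the field names below are
# illustrative and may not match Cortex's actual model.yaml schema.
import yaml  # pip install pyyaml

EXAMPLE_MODEL_YAML = """
model: llama3.1        # hypothetical model id
engine: llama-cpp      # hypothetical engine name
prompt_template: "{system_message} {prompt}"  # hypothetical field
"""

config = yaml.safe_load(EXAMPLE_MODEL_YAML)
print(config["model"], config["engine"])
```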