— Sovereign AI Platform Flagship

Run frontier AI entirely
on your own hardware.

Cube AI is Ultraviolet's flagship platform for private LLM deployment — inference, retrieval, guardrails, governance, and a production workspace, all inside your perimeter. Nothing leaves your network.

Read the docs
Apache 2.0 ·Air-gapped ready ·Model agnostic ·TEE
cube.ultraviolet.rs/dashboard
Cube AI dashboard showing model status, domains, and audit logs
3 models online · 0 bytes egressed
What is Cube AI

The complete operating environment for private, sovereign AI.

Cube AI gives regulated and sovereign organizations everything they need to build and run AI without sending a single byte to a third party. Open models serve from your own GPUs, retrieval runs over your own knowledge bases, and every prompt is policed by your guardrails and recorded in your audit trail.

It is the flagship of the Ultraviolet ecosystem — the product most teams start with, and the one the rest of the stack is built to support.

Private
Every prompt, embedding, and response stays inside your perimeter. Zero outbound by default.
Sovereign
Data residency, compliance, and control governed by your policies and jurisdiction.
Open
Apache 2.0 core. Inspect the source, fork the platform, extend it for your environment.
Portable
One platform across on-prem, air-gapped, sovereign cloud, private cloud, and edge.
How it works

One platform, top to bottom of the stack.

Cube AI bundles everything an organization needs to operate AI privately: model serving, retrieval, policy enforcement, and governance — all running on the same confidential-computing substrate as the rest of Ultraviolet.

vLLM Ollama NeMo Guardrails TEE OpenAI-compatible API RBAC + ABAC
Cube AI
Sovereign AI platform
Inference — vLLM · Ollama
Retrieval — RAG on your data
Guardrails — policy on every call
Governance — audit · RBAC · usage
Hardware TEE — AMD SEV-SNP · Intel TDX
Runs on Cocos AI — TEEs, remote attestation, hardware isolation
Capabilities

Everything you need to
operate AI in production.

Cube AI is not a model — it is the full platform around your models: serving, retrieval, safety, governance, and the interfaces your teams actually use.

Private inference

vLLM and Ollama runtimes serve open models on your own GPUs — no API keys, no egress, no rate limits but your own.

RAG on your data

Generate embeddings inside the environment and connect internal knowledge bases with retrieval that never copies a byte off-premises.

Guardrails & PII redaction

NeMo guardrails, prompt-injection defense, output sanitization, and automatic PII redaction via Microsoft Presidio on every call.

Governance & audit

Full audit trail, role-based access, and per-domain usage accounting built in.

Model management

Register, version, and route across open models — Llama, Mistral, Qwen, DeepSeek, Phi, Gemma — from one control plane.

Hardware TEE

Run the whole stack inside AMD SEV-SNP or Intel TDX enclaves — weights and tensors stay encrypted in use, even from the host.

Multi-tenancy & domains

SuperMQ-backed domain isolation gives every team or tenant a strictly separated workspace with its own identity and policies.

Secure chat

An end-to-end encrypted chat workspace backed by verifiable hardware attestation for everyday private AI.

OpenAI-compatible API

OpenAI-compatible endpoints and SDKs drop Cube AI into the apps and agents you already run.

The platform

A production workspace,
not just an endpoint.

Cube AI ships with the interfaces your teams actually use — operations, model management, safety, audit, and chat. Pick a surface to see it.

cube.ultraviolet.rs/dashboard
Unified dashboard
cube.ultraviolet.rs/models
Model management
cube.ultraviolet.rs/routes
AI gateway & routes
cube.ultraviolet.rs/guardrails
Guardrails & policy
cube.ultraviolet.rs/audit-logs
Audit logs
cube.ultraviolet.rs/chat
Secure chat
Models & infrastructure

The models you choose,
on the hardware you own.

Cube AI is model-agnostic and infrastructure-agnostic. Serve open-weight models through the runtime that fits your workload, on everything from a single GPU to an air-gapped confidential cluster.

Models & runtimes
Open models
LlamaMistralQwenDeepSeekPhiGemma
Embeddings
Nomic EmbedBGECustom
Runtimes
vLLMOllamaHugging Face
Formats
GGUFsafetensorsFine-tunes
Infrastructure
GPUs
NVIDIA A100H100Blackwell
Compute
GPUCPU inferenceEdge
Confidential HW
AMD SEV-SNPIntel TDX
Deploy
On-premAir-gappedSovereign cloudPrivate VPC
Deploy anywhere

Wherever your data has to live,
your AI can run.

On-premises

Run on your own servers and GPUs, behind your firewall, under your change control.

Same platform · same governance · same audit trail
FAQ

Questions teams ask
before they deploy.

What is Cube AI?

Cube AI is a self-hosted platform for running large language models privately. It bundles inference, retrieval-augmented generation, guardrails, governance, audit, and a production UI into one stack that runs entirely on infrastructure you control.

How is it different from a hosted API like OpenAI or Anthropic?

With a hosted API, your prompts and data leave your organization and are processed in someone else's cloud and jurisdiction. Cube AI runs inside your own perimeter — on-prem, air-gapped, or in a sovereign cloud — so data never leaves, and you own the models, the policies, and the audit trail.

Which models can I run?

Cube AI is model-agnostic. It serves open-weight models such as Llama, Mistral, Qwen, DeepSeek, Phi, and Gemma — plus custom fine-tunes in GGUF or safetensors format — through vLLM, Ollama, and Hugging Face, all from a single control plane.

Where can I deploy it?

Anywhere your data has to live: on-premises behind your firewall, fully air-gapped for classified environments, in an EU or national sovereign cloud, in your own private VPC, or at the edge — with the same governance everywhere.

What are guardrails?

Guardrails are policy rules that sit between your users and the model — inspecting every prompt and response before it is processed or returned. They can block harmful inputs, detect prompt-injection attempts, filter profanity or off-topic requests, and enforce domain-specific rules your organization defines.

How do guardrails work in Cube AI?

Cube AI uses NVIDIA NeMo Guardrails as its policy engine, augmented with Microsoft Presidio for automatic PII detection and redaction. You author guardrail configurations as YAML-based Colang rules through the Cube dashboard — no code required. Rules hot-reload with zero downtime, and every guardrail decision is recorded in the audit trail so you can see exactly what was allowed, blocked, or redacted on every call.

What is a Trusted Execution Environment (TEE)?

A TEE is a hardware-isolated region inside a CPU — such as an AMD SEV-SNP confidential VM or an Intel TDX Trust Domain — where code and data are encrypted in memory at all times. Even the hypervisor and the host OS cannot read the contents. Remote attestation lets you cryptographically verify that the correct, unmodified software is running inside the enclave before trusting it with sensitive data.

How do I enable TEE support in Cube AI?

Deploy Cube AI on a host with AMD SEV-SNP or Intel TDX hardware — available on several cloud providers and on-prem servers. Cube AI runs on top of Cocos AI, Ultraviolet's open-source confidential-computing layer, which handles enclave provisioning, remote attestation, and key management automatically. Once running inside a TEE, every model weight, prompt, and response is encrypted in use, and clients can verify the enclave before sending any data.

Is Cube AI open source?

Yes. The core is Apache 2.0. You can inspect the source, fork it, and run it indefinitely with no vendor lock-in. Commercial support and enterprise features are available from Ultraviolet.

One ecosystem

Part of the Ultraviolet
sovereign AI stack.

Three products, designed to work as one. Each runs on the same confidential-computing foundation, shares the same governance model, and deploys anywhere your data must live.

— Get started

Bring frontier AI inside
your perimeter.

Talk to the team about pilots, deployment architectures, and regulated-industry rollouts — on your hardware, on your terms.

Apache 2.0 · Deploy anywhere · No vendor lock-in