CLI Reference¶
Complete reference for the kfabrik command-line interface.
Synopsis¶
kfabrik [GLOBAL OPTIONS] command [COMMAND OPTIONS]
Global Options¶
| Option | Description |
|---|---|
-h, --help |
Show help message and exit |
--kubeconfig FILE |
Path to kubeconfig file (default: $KUBECONFIG or ~/.kube/config) |
-n, --namespace NAMESPACE |
Kubernetes namespace for model deployments (default: model-serving) |
Commands¶
cluster¶
Manage the minikube cluster lifecycle with GPU or CPU-only support.
cluster start¶
Start a new minikube cluster configured for ML workloads.
kfabrik cluster start [FLAGS]
Flags:
| Flag | Description |
|---|---|
--cpu-only |
Force CPU-only mode, even if GPU is available |
--driver NAME |
Minikube driver to use (default: auto-detect) |
--memory MB |
Memory to allocate in megabytes (default: 32768) |
--cpus N |
Number of CPUs to allocate (default: 8) |
--skip-model |
Skip deploying the default model |
--model NAME |
Model to deploy (default: qwen-small) |
Examples:
# Start with GPU (auto-detected)
kfabrik cluster start
# Start in CPU-only mode
kfabrik cluster start --cpu-only
# Start with custom resources
kfabrik cluster start --memory 16384 --cpus 4
# Start without deploying a model
kfabrik cluster start --skip-model
cluster stop¶
Stop and delete the minikube cluster, cleaning up all resources.
kfabrik cluster stop
deploy¶
Deploy one or more LLM models to Kubernetes using KServe InferenceServices.
kfabrik deploy [FLAGS]
Flags:
| Flag | Description |
|---|---|
--models MODEL[,MODEL...] |
Comma-separated list of models to deploy |
--all |
Deploy all configured models |
--wait |
Wait for models to become ready before returning |
Examples:
# Deploy a single model
kfabrik deploy --models qwen-small
# Deploy multiple models
kfabrik deploy --models qwen-small,tinyllama
# Deploy and wait for readiness
kfabrik deploy --models qwen-medium --wait
delete¶
Delete one or more deployed models from Kubernetes.
kfabrik delete [FLAGS]
Flags:
| Flag | Description |
|---|---|
--model NAME |
Name of the model to delete |
--all |
Delete all deployed models |
Examples:
# Delete a specific model
kfabrik delete --model qwen-small
# Delete all models
kfabrik delete --all
list¶
List models available for deployment or currently deployed.
kfabrik list [FLAGS]
Flags:
| Flag | Description |
|---|---|
--available |
List available models from configuration |
--deployed |
List currently deployed models with their status |
Without flags, lists both available and deployed models.
Example Output:
Available Models (all fit in 6GB VRAM):
NAME PARAMS VRAM DOWNLOAD DISPLAY NAME
qwen-small 0.5B 1GB 1GB Qwen 2.5 0.5B Instruct
qwen-medium 1.5B 3GB 3GB Qwen 2.5 1.5B Instruct
tinyllama 1.1B 2.5GB 2.2GB TinyLlama 1.1B Chat
smollm2 1.7B 3.5GB 3.4GB SmolLM2 1.7B Instruct
phi2 2.7B 5.5GB 5.5GB Phi-2 (2.7B)
Deployed Models:
NAME READY URL
qwen-small True http://qwen-small-model-serving.example.com
status¶
Check the status of deployed models.
kfabrik status [FLAGS]
Flags:
| Flag | Description |
|---|---|
--model NAME |
Name of the model to check |
--all |
Check status of all deployed models |
-o, --output FORMAT |
Output format: table (default) or json |
Examples:
# Check specific model
kfabrik status --model qwen-small
# Check all models as JSON
kfabrik status --all --output json
query¶
Send an inference query to a deployed model using the OpenAI-compatible chat completions API.
kfabrik query [FLAGS]
Flags:
| Flag | Description |
|---|---|
--model NAME |
Name of the model to query (required) |
--prompt TEXT |
Prompt to send to the model (required) |
--temperature FLOAT |
Sampling temperature 0.0-2.0 (default: 0.7) |
--max-tokens INT |
Maximum tokens to generate (default: 256) |
--top-p FLOAT |
Top-p sampling parameter 0.0-1.0 (default: 0.9) |
--timeout SECONDS |
Request timeout in seconds (default: 120) |
The command automatically sets up port-forwarding to the model's predictor service.
Examples:
# Basic query
kfabrik query --model qwen-small --prompt "What is Kubernetes?"
# Query with custom parameters
kfabrik query --model qwen-medium \
--prompt "Explain machine learning in simple terms" \
--temperature 0.5 \
--max-tokens 500
Example Output:
Model: qwen-small
Response:
Kubernetes is an open-source container orchestration platform...
[Tokens: prompt=34, completion=156, total=190]
logs¶
View logs for a deployed model's pods.
kfabrik logs [FLAGS]
Flags:
| Flag | Description |
|---|---|
--model NAME |
Name of the model (required) |
-f, --follow |
Stream log output (like tail -f) |
--lines INT |
Number of lines to show (default: 50) |
-c, --container NAME |
Container name (default: first container) |
Examples:
# View recent logs
kfabrik logs --model qwen-small
# Follow logs in real-time
kfabrik logs --model qwen-small --follow
# Show last 100 lines
kfabrik logs --model qwen-small --lines 100
version¶
Print the version of kfabrik.
kfabrik version
Available Models¶
The following models are pre-configured and optimized for consumer GPUs with 6GB VRAM or less:
| Name | Parameters | VRAM | Download | Description |
|---|---|---|---|---|
| qwen-small | 0.5B | ~1GB | ~1GB | Qwen 2.5 0.5B Instruct |
| qwen-medium | 1.5B | ~3GB | ~3GB | Qwen 2.5 1.5B Instruct |
| tinyllama | 1.1B | ~2.5GB | ~2.2GB | TinyLlama 1.1B Chat |
| smollm2 | 1.7B | ~3.5GB | ~3.4GB | SmolLM2 1.7B Instruct |
| phi2 | 2.7B | ~5.5GB | ~5.5GB | Microsoft Phi-2 |
Models are deployed one at a time; you don't need VRAM for all models simultaneously. System RAM requirements are approximately 2-3x the model size for inference server overhead.
Environment Variables¶
| Variable | Default | Description |
|---|---|---|
| KUBECONFIG | ~/.kube/config | Kubernetes configuration file |
| KFABRIK_CPU_ONLY | false | Force CPU-only mode |
| KFABRIK_MINIKUBE_BINARY | minikube | Path to minikube binary |
| KFABRIK_KUBECTL_BINARY | kubectl | Path to kubectl binary |
Exit Status¶
| Code | Description |
|---|---|
| 0 | Success |
| 1 | General error (invalid arguments, connection failed, etc.) |
See Also¶
- Getting Started - Installation and setup
- Addons - Addon configuration
- Architecture - How KFabrik works