CLI Reference¶

Complete reference for the kfabrik command-line interface.

Synopsis¶

kfabrik [GLOBAL OPTIONS] command [COMMAND OPTIONS]

Global Options¶

Option	Description
`-h, --help`	Show help message and exit
`--kubeconfig FILE`	Path to kubeconfig file (default: $KUBECONFIG or ~/.kube/config)
`-n, --namespace NAMESPACE`	Kubernetes namespace for model deployments (default: model-serving)

Commands¶

cluster¶

Manage the minikube cluster lifecycle with GPU or CPU-only support.

cluster start¶

Start a new minikube cluster configured for ML workloads.

kfabrik cluster start [FLAGS]

Flags:

Flag	Description
`--cpu-only`	Force CPU-only mode, even if GPU is available
`--driver NAME`	Minikube driver to use (default: auto-detect)
`--memory MB`	Memory to allocate in megabytes (default: 32768)
`--cpus N`	Number of CPUs to allocate (default: 8)
`--skip-model`	Skip deploying the default model
`--model NAME`	Model to deploy (default: qwen-small)

Examples:

# Start with GPU (auto-detected)
kfabrik cluster start

# Start in CPU-only mode
kfabrik cluster start --cpu-only

# Start with custom resources
kfabrik cluster start --memory 16384 --cpus 4

# Start without deploying a model
kfabrik cluster start --skip-model

cluster stop¶

Stop and delete the minikube cluster, cleaning up all resources.

kfabrik cluster stop

deploy¶

Deploy one or more LLM models to Kubernetes using KServe InferenceServices.

kfabrik deploy [FLAGS]

Flags:

Flag	Description
`--models MODEL[,MODEL...]`	Comma-separated list of models to deploy
`--all`	Deploy all configured models
`--wait`	Wait for models to become ready before returning

Examples:

# Deploy a single model
kfabrik deploy --models qwen-small

# Deploy multiple models
kfabrik deploy --models qwen-small,tinyllama

# Deploy and wait for readiness
kfabrik deploy --models qwen-medium --wait

delete¶

Delete one or more deployed models from Kubernetes.

kfabrik delete [FLAGS]

Flags:

Flag	Description
`--model NAME`	Name of the model to delete
`--all`	Delete all deployed models

Examples:

# Delete a specific model
kfabrik delete --model qwen-small

# Delete all models
kfabrik delete --all

list¶

List models available for deployment or currently deployed.

kfabrik list [FLAGS]

Flags:

Flag	Description
`--available`	List available models from configuration
`--deployed`	List currently deployed models with their status

Without flags, lists both available and deployed models.

Example Output:

Available Models (all fit in 6GB VRAM):
  NAME         PARAMS  VRAM   DOWNLOAD  DISPLAY NAME
  qwen-small   0.5B    1GB    1GB       Qwen 2.5 0.5B Instruct
  qwen-medium  1.5B    3GB    3GB       Qwen 2.5 1.5B Instruct
  tinyllama    1.1B    2.5GB  2.2GB     TinyLlama 1.1B Chat
  smollm2      1.7B    3.5GB  3.4GB     SmolLM2 1.7B Instruct
  phi2         2.7B    5.5GB  5.5GB     Phi-2 (2.7B)

Deployed Models:
  NAME        READY  URL
  qwen-small  True   http://qwen-small-model-serving.example.com

status¶

Check the status of deployed models.

kfabrik status [FLAGS]

Flags:

Flag	Description
`--model NAME`	Name of the model to check
`--all`	Check status of all deployed models
`-o, --output FORMAT`	Output format: table (default) or json

Examples:

# Check specific model
kfabrik status --model qwen-small

# Check all models as JSON
kfabrik status --all --output json

query¶

Send an inference query to a deployed model using the OpenAI-compatible chat completions API.

kfabrik query [FLAGS]

Flags:

Flag	Description
`--model NAME`	Name of the model to query (required)
`--prompt TEXT`	Prompt to send to the model (required)
`--temperature FLOAT`	Sampling temperature 0.0-2.0 (default: 0.7)
`--max-tokens INT`	Maximum tokens to generate (default: 256)
`--top-p FLOAT`	Top-p sampling parameter 0.0-1.0 (default: 0.9)
`--timeout SECONDS`	Request timeout in seconds (default: 120)

The command automatically sets up port-forwarding to the model's predictor service.

Examples:

# Basic query
kfabrik query --model qwen-small --prompt "What is Kubernetes?"

# Query with custom parameters
kfabrik query --model qwen-medium \
  --prompt "Explain machine learning in simple terms" \
  --temperature 0.5 \
  --max-tokens 500

Example Output:

Model: qwen-small
Response:
Kubernetes is an open-source container orchestration platform...

[Tokens: prompt=34, completion=156, total=190]

logs¶

View logs for a deployed model's pods.

kfabrik logs [FLAGS]

Flags:

Flag	Description
`--model NAME`	Name of the model (required)
`-f, --follow`	Stream log output (like tail -f)
`--lines INT`	Number of lines to show (default: 50)
`-c, --container NAME`	Container name (default: first container)

Examples:

# View recent logs
kfabrik logs --model qwen-small

# Follow logs in real-time
kfabrik logs --model qwen-small --follow

# Show last 100 lines
kfabrik logs --model qwen-small --lines 100

version¶

Print the version of kfabrik.

kfabrik version

Available Models¶

The following models are pre-configured and optimized for consumer GPUs with 6GB VRAM or less:

Name	Parameters	VRAM	Download	Description
qwen-small	0.5B	~1GB	~1GB	Qwen 2.5 0.5B Instruct
qwen-medium	1.5B	~3GB	~3GB	Qwen 2.5 1.5B Instruct
tinyllama	1.1B	~2.5GB	~2.2GB	TinyLlama 1.1B Chat
smollm2	1.7B	~3.5GB	~3.4GB	SmolLM2 1.7B Instruct
phi2	2.7B	~5.5GB	~5.5GB	Microsoft Phi-2

Models are deployed one at a time; you don't need VRAM for all models simultaneously. System RAM requirements are approximately 2-3x the model size for inference server overhead.

Environment Variables¶

Variable	Default	Description
KUBECONFIG	~/.kube/config	Kubernetes configuration file
KFABRIK_CPU_ONLY	false	Force CPU-only mode
KFABRIK_MINIKUBE_BINARY	minikube	Path to minikube binary
KFABRIK_KUBECTL_BINARY	kubectl	Path to kubectl binary

Exit Status¶

Code	Description
0	Success
1	General error (invalid arguments, connection failed, etc.)

CLI Reference¶

Synopsis¶

Global Options¶

Commands¶

cluster¶

cluster start¶

cluster stop¶

deploy¶

delete¶

list¶

status¶

query¶

logs¶

version¶

Available Models¶

Environment Variables¶

Exit Status¶

See Also¶