Getting Started¶
This guide walks you through setting up KFabrik and deploying your first LLM.
Prerequisites¶
Required Software¶
- Go 1.21 or later (for building)
- Make (for building)
- Docker with NVIDIA Container Toolkit (for GPU support)
- kubectl (installed automatically with minikube)
For GPU Support (Linux only)¶
- NVIDIA GPU with drivers installed on host
- NVIDIA Container Toolkit configured for Docker
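The prerequisites above can be checked from a shell before building. The following is a minimal sketch, not part of KFabrik: the helper names missing_tools and go_version_ok are hypothetical, and the version check assumes the usual "go version go1.21.5 linux/amd64" output format.

```shell
# Print each named tool that is not on PATH; return 1 if any is missing.
missing_tools() {
    status=0
    for tool in "$@"; do
        if ! command -v "$tool" >/dev/null 2>&1; then
            echo "missing: $tool"
            status=1
        fi
    done
    return "$status"
}

# Succeed if the given `go version` output reports Go 1.21 or later.
go_version_ok() {
    ver=$(printf '%s\n' "$1" |
        sed -n 's/^go version go\([0-9][0-9]*\)\.\([0-9][0-9]*\).*/\1 \2/p')
    [ -n "$ver" ] || return 1
    set -- $ver                      # $1 = major, $2 = minor
    [ "$1" -gt 1 ] || { [ "$1" -eq 1 ] && [ "$2" -ge 21 ]; }
}

# Usage:
#   missing_tools go make docker       # build and runtime tools
#   missing_tools nvidia-smi           # GPU hosts only
#   go_version_ok "$(go version)" || echo "Go 1.21 or later is required"
```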
Platform Support¶
| Platform | GPU Support | Notes |
|---|---|---|
| Linux with NVIDIA GPU | Full | Default configuration |
| Linux without GPU | CPU-only | Auto-detected |
| macOS | Coming soon | |
| Windows | Coming soon | |
System Requirements¶
Minimum Requirements¶
| Resource | Minimum |
|---|---|
| CPU | 4 cores |
| Memory | 8GB RAM |
| Disk | 40GB |
Recommended for GPU Workloads¶
| Resource | Recommended |
|---|---|
| CPU | 8+ cores |
| Memory | 16-32GB RAM |
| Disk | 50GB+ |
| GPU VRAM | 6GB+ |
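To compare a Linux host against the minimum requirements, a sketch along these lines may help. It is an illustration only: the helpers report and check_host are hypothetical, and it assumes the 40GB disk figure refers to free space on the filesystem passed as an argument (the current directory by default).

```shell
# Minimums from the table above.
MIN_CPUS=4
MIN_MEM_GB=8
MIN_DISK_GB=40

# report NAME ACTUAL MINIMUM: print one line and return 1 if ACTUAL is too low.
report() {
    if [ "$2" -ge "$3" ]; then
        echo "$1: $2 (ok, minimum $3)"
    else
        echo "$1: $2 (below minimum $3)"
        return 1
    fi
}

# Check CPU cores, total memory, and free disk space on a Linux host.
check_host() {
    rc=0
    cpus=$(nproc)
    # MemTotal is reported in kB; round to the nearest GB.
    mem_gb=$(awk '/^MemTotal:/ {print int($2 / 1048576 + 0.5)}' /proc/meminfo)
    # Column 4 of `df -Pk` is the available space in 1 kB blocks.
    disk_gb=$(df -Pk "${1:-.}" | awk 'NR == 2 {print int($4 / 1048576)}')
    report "CPU cores" "$cpus" "$MIN_CPUS" || rc=1
    report "Memory GB" "$mem_gb" "$MIN_MEM_GB" || rc=1
    report "Free disk GB" "$disk_gb" "$MIN_DISK_GB" || rc=1
    return "$rc"
}

# Usage:
#   check_host            # checks the current directory's filesystem
#   check_host /var/lib   # checks another filesystem
```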
Installation¶
KFabrik is distributed as a custom build of minikube. Clone the repository, build, and install.
Build Requirements¶
- Go 1.21 or later
- Make
- Docker
Clone and Build¶
# Clone the kfabrik minikube repository
git clone https://github.com/kfabrik/minikube.git
cd minikube
# Build minikube with kfabrik addons
make build
# Install kfabrik CLI and minikube
./scripts/install.sh
The install script places both minikube and kfabrik binaries in /usr/local/bin.
Quick Start¶
1. Start the Cluster¶
# Start with GPU support (auto-detected)
kfabrik cluster start
# Or start in CPU-only mode
kfabrik cluster start --cpu-only
# Custom resource allocation
kfabrik cluster start --memory 16384 --cpus 4
The cluster start command performs the following steps:
- Starts minikube with the Docker driver and, when a GPU is detected, GPU passthrough
- Enables kfabrik-bootstrap addon (installs KServe, Istio, Cert-Manager)
- Enables kfabrik-model addon (creates model configurations)
- Enables kfabrik-monitoring addon (deploys Prometheus, Grafana)
- Deploys the default model (qwen-small)
2. List Available Models¶
kfabrik list
Output:
Available Models (all fit in 6GB VRAM):
NAME         PARAMS  VRAM   DOWNLOAD  DISPLAY NAME
qwen-small   0.5B    1GB    1GB       Qwen 2.5 0.5B Instruct
qwen-medium  1.5B    3GB    3GB       Qwen 2.5 1.5B Instruct
tinyllama    1.1B    2.5GB  2.2GB     TinyLlama 1.1B Chat
smollm2      1.7B    3.5GB  3.4GB     SmolLM2 1.7B Instruct
phi2         2.7B    5.5GB  5.5GB     Phi-2 (2.7B)
Deployed Models:
NAME        READY  URL
qwen-small  True   http://qwen-small-model-serving.example.com
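When scripting against this listing, it can be useful to select only the models that fit a given VRAM budget. The sketch below assumes the output keeps the whitespace-separated layout shown above, with the VRAM value (for example 2.5GB) in the third column; models_within_vram is a hypothetical helper, not a kfabrik command.

```shell
# Print the names of models whose VRAM column is at most LIMIT gigabytes.
# Reads `kfabrik list` output on stdin.
models_within_vram() {
    awk -v limit="$1" '
        /^Deployed Models:/ { exit }      # stop before the deployed section
        $3 ~ /^[0-9.]+GB$/ {              # model rows carry a size in column 3
            vram = $3
            sub(/GB$/, "", vram)
            if (vram + 0 <= limit + 0) print $1
        }
    '
}

# Usage:
#   kfabrik list | models_within_vram 3
```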
3. Deploy a Model¶
# Deploy a single model
kfabrik deploy --models qwen-small
# Deploy and wait for readiness
kfabrik deploy --models qwen-medium --wait
# Deploy multiple models
kfabrik deploy --models qwen-small,tinyllama
4. Check Model Status¶
kfabrik status --model qwen-small
Output:
NAME        READY  REASON  URL
qwen-small  True   Ready   http://qwen-small-model-serving.example.com
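For a model that is already deployed, a script can poll this status output until the READY column becomes True. The following is a sketch under assumptions: wait_ready is a hypothetical helper, the 5-second polling interval and 300-second default timeout are arbitrary choices, and the parsing relies on the column layout shown above.

```shell
# Poll `kfabrik status` until MODEL reports READY=True or TIMEOUT seconds pass.
# Returns 0 when the model is ready, 1 on timeout.
wait_ready() {
    model=$1
    remaining=${2:-300}
    while [ "$remaining" -gt 0 ]; do
        # Column 2 of the row whose first column is the model name.
        ready=$(kfabrik status --model "$model" |
            awk -v m="$model" '$1 == m {print $2}')
        if [ "$ready" = "True" ]; then
            return 0
        fi
        sleep 5
        remaining=$((remaining - 5))
    done
    return 1
}

# Usage:
#   wait_ready qwen-small 120 || echo "qwen-small is not ready"
```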
5. Query a Model¶
kfabrik query --model qwen-small --prompt "What is Kubernetes?"
Output:
Model: qwen-small
Response:
Kubernetes is an open-source container orchestration platform...
[Tokens: prompt=34, completion=156, total=190]
Query with custom parameters:
kfabrik query --model qwen-medium \
--prompt "Explain machine learning in simple terms" \
--temperature 0.5 \
--max-tokens 500
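The token summary printed at the end of a query can be extracted for logging or cost tracking. This sketch assumes the summary line keeps the exact "[Tokens: prompt=..., completion=..., total=...]" form shown in the output above; token_counts is a hypothetical helper.

```shell
# Print "PROMPT COMPLETION TOTAL" token counts from `kfabrik query` output
# read on stdin. Prints nothing if no token summary line is present.
token_counts() {
    sed -n 's/^\[Tokens: prompt=\([0-9][0-9]*\), completion=\([0-9][0-9]*\), total=\([0-9][0-9]*\)]$/\1 \2 \3/p'
}

# Usage:
#   kfabrik query --model qwen-small --prompt "What is Kubernetes?" | token_counts
```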
6. View Logs¶
# View recent logs
kfabrik logs --model qwen-small
# Follow logs in real-time
kfabrik logs --model qwen-small --follow
# Show last 100 lines
kfabrik logs --model qwen-small --lines 100
7. Delete Models¶
# Delete a specific model
kfabrik delete --model qwen-small
# Delete all models
kfabrik delete --all
8. Stop the Cluster¶
kfabrik cluster stop
Accessing Dashboards¶
Grafana¶
kubectl port-forward -n monitoring svc/grafana 3000:3000
Prometheus¶
kubectl port-forward -n monitoring svc/prometheus 9090:9090
Manual Addon Management¶
To manage the addons individually instead of using kfabrik cluster start, run the following after building and installing:
# Start minikube with GPU support
minikube start --driver=docker --memory=32768 --cpus=8 --gpus=all
# Enable core infrastructure
minikube addons enable kfabrik-bootstrap
# Wait for installation to complete
kubectl wait --for=condition=complete job/kfabrik-installer \
-n kserve --timeout=600s
# Enable model configurations
minikube addons enable kfabrik-model
# Enable monitoring (optional)
minikube addons enable kfabrik-monitoring
# Deploy a model
kfabrik deploy --models qwen-small --wait
Environment Variables¶
| Variable | Default | Description |
|---|---|---|
| KUBECONFIG | ~/.kube/config | Kubernetes configuration file |
| KFABRIK_CPU_ONLY | false | Force CPU-only mode |
| KFABRIK_MINIKUBE_BINARY | minikube | Path to minikube binary |
| KFABRIK_KUBECTL_BINARY | kubectl | Path to kubectl binary |
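The sketch below shows how the documented defaults apply when a variable is unset. It is illustrative only: kfabrik_env is a hypothetical helper, and setting KFABRIK_CPU_ONLY to true is assumed to be the value that enables CPU-only mode, inferred from the default of false.

```shell
# Print the effective value of each variable from the table above,
# falling back to the documented default when the variable is unset or empty.
kfabrik_env() {
    echo "KUBECONFIG=${KUBECONFIG:-$HOME/.kube/config}"
    echo "KFABRIK_CPU_ONLY=${KFABRIK_CPU_ONLY:-false}"
    echo "KFABRIK_MINIKUBE_BINARY=${KFABRIK_MINIKUBE_BINARY:-minikube}"
    echo "KFABRIK_KUBECTL_BINARY=${KFABRIK_KUBECTL_BINARY:-kubectl}"
}

# Usage:
#   kfabrik_env                                  # show current settings
#   KFABRIK_CPU_ONLY=true kfabrik cluster start  # assumed value; one-off override
```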
Troubleshooting¶
GPU Not Detected¶
# Verify NVIDIA driver on host
nvidia-smi
# Check device plugin logs
kubectl logs -n kube-system -l name=nvidia-device-plugin-ds
# Verify GPU is advertised
kubectl describe node minikube | grep nvidia.com/gpu
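The grep above matches both the Capacity and Allocatable sections of the node description. To read only the allocatable GPU count in a script, something like the following sketch could be used. It assumes the usual kubectl describe layout, where section headings start at column 0 and resource entries are indented; allocatable_gpus is a hypothetical helper.

```shell
# Print the allocatable nvidia.com/gpu count from `kubectl describe node`
# output read on stdin. Prints 0 if the resource is not advertised.
allocatable_gpus() {
    awk '
        /^Allocatable:/ { in_alloc = 1; next }
        /^[^ ]/         { in_alloc = 0 }   # any other top-level heading ends the section
        in_alloc && $1 == "nvidia.com/gpu:" { n = $2 }
        END { print n + 0 }
    '
}

# Usage:
#   kubectl describe node minikube | allocatable_gpus
```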
Model Fails to Become Ready¶
# Check InferenceService status
kfabrik status --model <name>
# Check predictor pod logs
kfabrik logs --model <name>
# Check KServe controller logs
kubectl logs -n kserve -l control-plane=kserve-controller-manager
Out of Memory Errors¶
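Increase the minikube memory allocation:
minikube stop
minikube config set memory 32768
minikube start --gpus=all
Stock minikube applies a changed memory setting only to newly created clusters. If the new value does not take effect after the restart, delete the cluster with minikube delete and recreate it with kfabrik cluster start --memory 32768. Deleting the cluster also removes any deployed models.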
Next Steps¶
- CLI Reference - Complete command documentation
- Architecture - Understand how KFabrik works
- Addons - Addon configuration and customization