Addons

The KFabrik addon suite provides a complete machine learning inference platform for minikube, enabling local development and testing of Large Language Models (LLMs) with GPU support.

Overview

The suite consists of three addons that work together:

flowchart LR
    bootstrap["kfabrik-bootstrap<br/>(required first)"]
    model["kfabrik-model<br/>(optional)"]
    monitoring["kfabrik-monitoring<br/>(optional)"]

    bootstrap --> model
    bootstrap --> monitoring

The kfabrik-bootstrap addon must be enabled first as it provides the core infrastructure (KServe, Istio) that other addons depend on.
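
A typical first run therefore enables the addons in dependency order (a minimal sketch; the readiness check reuses the istiod wait shown in the kfabrik-model installation steps below):

# Enable the foundation first
minikube addons enable kfabrik-bootstrap

# Wait for the core infrastructure to come up
kubectl wait --for=condition=Ready pod -l app=istiod \
  -n istio-system --timeout=300s

# Then layer on the optional addons
minikube addons enable kfabrik-model
minikube addons enable kfabrik-monitoring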

kfabrik-bootstrap

Purpose

The foundation addon that installs all core infrastructure components required for ML inference workloads.

Components

| Component | Purpose |
|---|---|
| Cert Manager | TLS certificate lifecycle management |
| Istio | Service mesh for traffic and security |
| KServe | Kubernetes-native ML model serving |
| NVIDIA Device Plugin | GPU resource discovery and allocation |

Installation

# Enable the addon
minikube addons enable kfabrik-bootstrap

# Monitor installation progress
kubectl get jobs -n kserve -w

# Verify components are running
kubectl get pods -n cert-manager
kubectl get pods -n istio-system
kubectl get pods -n kserve
kubectl get pods -n kube-system | grep nvidia

Namespaces Created

| Namespace | Contents |
|---|---|
| cert-manager | Certificate management components and webhooks |
| istio-system | Istio control plane (istiod) |
| kserve | KServe controller and installer job |

Configuration

Configuration is stored in a ConfigMap in the kserve namespace:

apiVersion: v1
kind: ConfigMap
metadata:
  name: kfabrik-bootstrap-config
  namespace: kserve
data:
  INSTALL_CERT_MANAGER: "true"
  INSTALL_ISTIO: "true"
  INSTALL_KSERVE: "true"
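
These flags can be adjusted before the installer job runs, for example to skip a component that already exists in the cluster (a sketch; it assumes the installer reads the flags at job startup):

# Skip the cert-manager step if cert-manager is already installed
kubectl patch configmap kfabrik-bootstrap-config -n kserve \
  --type merge -p '{"data":{"INSTALL_CERT_MANAGER":"false"}}'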

Design Decisions

Helm-Based Installation: Uses a Job-based installer that runs Helm charts rather than static YAML manifests. This provides version pinning, configuration management, and upgrade capabilities.

RawDeployment Mode: KServe is configured for RawDeployment mode (standard Kubernetes Deployments) rather than Serverless mode (Knative). This simplifies setup, debugging, and operation for local development. Knative support is planned for a future release.
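
You can confirm the active mode after installation. Upstream KServe records it in the inferenceservice-config ConfigMap (ConfigMap and key names come from upstream KServe, not from this addon):

# Print KServe's deployment settings, including the default mode
kubectl get configmap inferenceservice-config -n kserve \
  -o jsonpath='{.data.deploy}'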

Job-Based Installer: A Kubernetes Job orchestrates sequential installation of components with proper dependency ordering. The job self-cleans after 5 minutes (TTL).
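
While the job still exists, the TTL is visible on the Job object itself (assuming the job is named kfabrik-installer, as the troubleshooting commands below suggest):

# The Job is garbage-collected 300 seconds after it finishes
kubectl get job kfabrik-installer -n kserve \
  -o jsonpath='{.spec.ttlSecondsAfterFinished}'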

Troubleshooting

Check installer job:

kubectl get jobs -n kserve
kubectl logs -n kserve -l job-name=kfabrik-installer

Verify GPU detection:

kubectl get nodes -o json | jq '.items[].status.capacity'
kubectl logs -n kube-system -l name=nvidia-device-plugin-ds


kfabrik-model

Purpose

Provides model deployment configurations and the model-serving namespace for running LLM inference workloads.

Pre-Configured Models

All models are sized to fit consumer GPUs with 6GB of VRAM or less:

| Name | Parameters | VRAM | RAM | Download |
|---|---|---|---|---|
| qwen-small | 0.5B | ~1GB | ~6GB | ~1GB |
| qwen-medium | 1.5B | ~3GB | ~8GB | ~3GB |
| tinyllama | 1.1B | ~2.5GB | ~6GB | ~2.2GB |
| smollm2 | 1.7B | ~3.5GB | ~8GB | ~3.4GB |
| phi2 | 2.7B | ~5.5GB | ~12GB | ~5.5GB |

RAM requirements run roughly 2-3x the model download size, with a practical floor of about 6GB, due to inference server overhead. Models are deployed one at a time.
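
Before choosing a model, it can help to compare the table against what the node can actually schedule; this mirrors the capacity check from the bootstrap section:

# Show allocatable CPU, memory, and GPU on the node
kubectl get nodes -o json | jq '.items[].status.allocatable'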

Installation

# Requires kfabrik-bootstrap to be enabled first
minikube addons enable kfabrik-bootstrap

# Wait for bootstrap to complete
kubectl wait --for=condition=Ready pod -l app=istiod \
  -n istio-system --timeout=300s

# Enable kfabrik-model
minikube addons enable kfabrik-model

Namespace Created

| Namespace | Contents |
|---|---|
| model-serving | Deployed InferenceServices and model configuration ConfigMap |

Model Configuration

Model definitions are stored in a ConfigMap:

kubectl get configmap model-config -n model-serving -o yaml

Example model configuration:

models:
  qwen-small:
    name: qwen-small
    displayName: "Qwen 2.5 0.5B Instruct"
    modelFormat: huggingface
    storageUri: "hf://Qwen/Qwen2.5-0.5B-Instruct"
    vram: "1GB"
    ram: "6GB"
    downloadSize: "1GB"
    parameters: "0.5B"
    replicas: 1
    timeout: 300
    resources:
      requests:
        cpu: "1"
        memory: "4Gi"
      limits:
        cpu: "4"
        memory: "8Gi"
    env:
      HF_MODEL_ID: "Qwen/Qwen2.5-0.5B-Instruct"
      MAX_MODEL_LEN: "512"
      GPU_MEMORY_UTILIZATION: "0.8"

Using the kfabrik CLI

# List available models
kfabrik list

# Deploy a model
kfabrik deploy --models qwen-small --wait

# Query the model
kfabrik query --model qwen-small --prompt "What is AI?"

# Delete the model
kfabrik delete --model qwen-small
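
The CLI wraps standard KServe resources, so a deployed model can also be queried directly through the Istio ingress. A hedged sketch: the gateway service name assumes a default Istio install, the Host header must be taken from the InferenceService status URL, and the OpenAI-style path assumes KServe's huggingface serving runtime:

# Look up the model's hostname
kubectl get inferenceservice qwen-small -n model-serving \
  -o jsonpath='{.status.url}'

# Port-forward the ingress gateway
kubectl port-forward -n istio-system svc/istio-ingressgateway 8080:80 &

# Query with the hostname from the status URL
curl -s http://localhost:8080/openai/v1/completions \
  -H "Host: <hostname-from-status-url>" \
  -H "Content-Type: application/json" \
  -d '{"model": "qwen-small", "prompt": "What is AI?", "max_tokens": 64}'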

Custom Models

To add custom models, create a custom configuration file:

# my-models.yaml
namespace: model-serving
models:
  my-model:
    name: my-model
    displayName: "My Custom Model"
    modelFormat: huggingface
    storageUri: "hf://organization/model-name"
    resources:
      requests:
        cpu: "2"
        memory: "8Gi"
      limits:
        cpu: "4"
        memory: "16Gi"

Then deploy with:

kfabrik deploy --config my-models.yaml --models my-model
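
Either way, the result is a standard InferenceService, so readiness can be checked with plain kubectl (names taken from the example configuration above):

# Wait for the custom model to report Ready
kubectl wait --for=condition=Ready inferenceservice/my-model \
  -n model-serving --timeout=600s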


kfabrik-monitoring

Purpose

Provides a complete observability stack for monitoring ML inference workloads, including GPU metrics.

Components

| Component | Purpose |
|---|---|
| Prometheus | Metrics collection and storage |
| Grafana | Visualization and dashboards |
| DCGM Exporter | NVIDIA GPU metrics exporter |

Installation

# Enable after kfabrik-bootstrap
minikube addons enable kfabrik-monitoring

# Verify components
kubectl get pods -n monitoring

Namespace Created

| Namespace | Contents |
|---|---|
| monitoring | Prometheus, Grafana, and DCGM Exporter |

Configuration

apiVersion: v1
kind: ConfigMap
metadata:
  name: kfabrik-monitoring-config
  namespace: monitoring
data:
  PROMETHEUS_RETENTION: "15d"
  PROMETHEUS_STORAGE_SIZE: "10Gi"
  GRAFANA_ADMIN_USER: "admin"
  GRAFANA_ADMIN_PASSWORD: "admin"
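
Values can be changed in place; a sketch, assuming the components only pick up changes on restart and that the Deployment name matches the component name:

# Extend metric retention to 30 days
kubectl patch configmap kfabrik-monitoring-config -n monitoring \
  --type merge -p '{"data":{"PROMETHEUS_RETENTION":"30d"}}'

# Restart Prometheus so the new value takes effect (Deployment name assumed)
kubectl rollout restart deployment/prometheus -n monitoring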

Accessing Dashboards

Grafana:

kubectl port-forward -n monitoring svc/grafana 3000:3000
# Open http://localhost:3000
# Login: admin / admin

Prometheus:

kubectl port-forward -n monitoring svc/prometheus 9090:9090
# Open http://localhost:9090

GPU Metrics

The DCGM Exporter provides detailed GPU metrics:

| Metric | Description |
|---|---|
| DCGM_FI_DEV_GPU_UTIL | GPU utilization percentage |
| DCGM_FI_DEV_MEM_COPY_UTIL | Memory copy utilization |
| DCGM_FI_DEV_FB_USED | Framebuffer memory used (bytes) |
| DCGM_FI_DEV_FB_FREE | Framebuffer memory free (bytes) |
| DCGM_FI_DEV_GPU_TEMP | GPU temperature (Celsius) |
| DCGM_FI_DEV_POWER_USAGE | Power consumption (Watts) |
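
With the Prometheus port-forward from the previous section active, any of these series can be spot-checked from the command line via the Prometheus HTTP API:

# Current GPU utilization per device
curl -s 'http://localhost:9090/api/v1/query?query=DCGM_FI_DEV_GPU_UTIL' \
  | jq '.data.result'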

Resource Requirements

Minimum Requirements

| Resource | Minimum |
|---|---|
| CPU | 4 cores |
| Memory | 8GB RAM |
| Disk | 40GB |

Recommended Requirements

| Resource | Recommended |
|---|---|
| CPU | 8+ cores |
| Memory | 16-32GB RAM |
| Disk | 50GB+ |
| GPU VRAM | 6GB+ |

Starting minikube with Appropriate Resources

# Set persistent defaults
minikube config set cpus 8
minikube config set memory 32768
minikube config set disk-size 50g

# Start with GPU support
minikube start --driver=docker --gpus=all
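
To confirm the GPU actually made it into the node before enabling the addons (a quick sanity check; the docker driver requires the NVIDIA Container Toolkit on the host):

# Run nvidia-smi inside the minikube node
minikube ssh -- nvidia-smi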

Disabling Addons

# Disable in reverse order of dependencies
minikube addons disable kfabrik-monitoring
minikube addons disable kfabrik-model
minikube addons disable kfabrik-bootstrap

For complete cleanup including CRDs:

# Delete Helm releases
helm uninstall kserve -n kserve
helm uninstall istiod -n istio-system
helm uninstall istio-base -n istio-system
helm uninstall cert-manager -n cert-manager

# Delete namespaces
kubectl delete namespace model-serving monitoring \
  kserve istio-system cert-manager

# Delete CRDs
kubectl get crd | grep -E 'kserve|istio|cert-manager' | \
  awk '{print $1}' | xargs kubectl delete crd

Troubleshooting

Addon Enable Fails

# Check addon status
minikube addons list | grep kfabrik

# View installer logs
kubectl logs -n kserve -l job-name=kfabrik-installer

# Check events
kubectl get events -n kserve --sort-by='.lastTimestamp'

Models Not Deploying

# Check InferenceService status
kubectl get inferenceservice -n model-serving

# Describe for detailed conditions
kubectl describe inferenceservice <name> -n model-serving

# Check pod status
kubectl get pods -n model-serving
kubectl logs -n model-serving <pod-name>

GPU Not Detected

# Verify device plugin is running
kubectl get pods -n kube-system | grep nvidia

# Check plugin logs
kubectl logs -n kube-system -l name=nvidia-device-plugin-ds

# Verify GPU is advertised
kubectl describe node minikube | grep nvidia.com/gpu

Out of Memory Errors

Increase minikube memory allocation:

minikube stop
minikube config set memory 32768
minikube start --gpus=all



See Also