Show HN: Open-source GPU cost analysis tool
# gpuaudit

Scan your cloud for GPU waste and get actionable recommendations to cut your spend.

```
$ gpuaudit scan --skip-eks
Found 38 GPU nodes across 47 nodes in gpu-cluster

gpuaudit — GPU Cost Audit for AWS
Account: 123456789012 | Regions: us-east-1 | Duration: 4.2s

┌──────────────────────────────────────────────┐
│ GPU Fleet Summary                            │
├──────────────────────────────────────────────┤
│ Total GPU instances:                   38    │
│ Total monthly GPU spend:          $127450    │
│ Estimated monthly waste:     $18200 (14%)    │
└──────────────────────────────────────────────┘

CRITICAL — 3 instance(s), $15400/mo potential savings

Instance                      Type                    Monthly  Signal  Recommendation
────────────────────────────  ──────────────────────  ───────  ──────  ──────────────────────────────────────────
gpu-cluster/ip-10-15-255-248  g6e.16xlarge (1× L40S)  $ 6752   idle    Node up 13 days with 0 GPU pods scheduled.
gpu-cluster/ip-10-22-250-15   g6e.16xlarge (1× L40S)  $ 6752   idle    Node up 1 day with 0 GPU pods scheduled.
...
```

## What it scans

- **EC2** — GPU instances (g4dn, g5, g6, g6e, p4d, p4de, p5, inf2, trn1) with CloudWatch metrics
- **SageMaker** — Endpoints with GPU utilization and invocation metrics
- **EKS** — Managed GPU node groups via the AWS EKS API
- **Kubernetes** — GPU nodes and pod allocation via the Kubernetes API (Karpenter, self-managed, any CNI)

## What it detects

- **Idle GPU instances** — running but doing nothing (low CPU + near-zero network for 24+ hours)
- **Oversized GPU** — multi-GPU instances where utilization suggests a single GPU would suffice
- **Pricing mismatch** — on-demand instances running 30+ days that should be Reserved Instances
- **Stale instances** — non-production instances running 90+ days
- **SageMaker low utilization** — endpoints with <10% GPU utilization
- **SageMaker oversized** — endpoints using <30% GPU memory on multi-GPU instances
- **K8s unallocated GPUs** — nodes with GPU capacity but no pods requesting GPUs

## Install

```sh
go install github.com/gpuaudit/cli/cmd/gpuaudit@latest
```

Or build from source:

```sh
git clone https://github.com/gpuaudit/cli.git
cd cli
go build -o gpuaudit ./cmd/gpuaudit
```

## Quick start

```sh
# Uses default AWS credentials (~/.aws/credentials or environment variables)
gpuaudit scan

# Specific profile and region
gpuaudit scan --profile production --region us-east-1

# Kubernetes cluster scan (uses KUBECONFIG or ~/.kube/config)
gpuaudit scan --skip-eks

# Specific kubeconfig and context
gpuaudit scan --kubeconfig ~/.kube/config --kube-context gpu-cluster

# JSON output for automation
gpuaudit scan --format json -o report.json

# Compare two scans to see what changed
gpuaudit diff old-report.json new-report.json

# Slack Block Kit payload (pipe to webhook)
gpuaudit scan --format slack -o - | \
  curl -X POST -H 'Content-Type: application/json' -d @- $SLACK_WEBHOOK

# Skip specific scanners
gpuaudit scan --skip-metrics     # faster, less accurate
gpuaudit scan --skip-sagemaker
gpuaudit scan --skip-eks         # skip the AWS EKS API (use --skip-k8s for the Kubernetes API)
gpuaudit scan --skip-k8s
```

## Comparing scans

Save scan results as JSON, then diff them later:

```sh
gpuaudit scan --format json -o scan-apr-08.json
# ... time passes, changes happen ...
gpuaudit scan --format json -o scan-apr-15.json
gpuaudit diff scan-apr-08.json scan-apr-15.json
```

```
gpuaudit diff — 2026-04-08 12:00 UTC → 2026-04-15 12:00 UTC

┌──────────────────────────────────────────────┐
│ Cost Delta                                   │
├──────────────────────────────────────────────┤
│ Monthly spend:   $142000 → $127450 (-$14550) │
│ Estimated waste: …
```
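Because scans serialize to JSON and `diff` compares any two reports, a scheduled drift check falls out naturally. Here is a minimal sketch of a weekly cron or CI job; it sticks to the flags documented above, and the report naming scheme and `SLACK_WEBHOOK` variable are placeholders rather than gpuaudit conventions.

```sh
#!/usr/bin/env sh
# Sketch of a weekly cron/CI job: scan, diff against the previous
# report, and post the Slack summary. File names and SLACK_WEBHOOK
# are placeholders, not gpuaudit conventions.
set -eu

today=$(date +%F)
gpuaudit scan --format json -o "scan-${today}.json"

# Diff against the most recent earlier report, if one exists.
prev=$(ls scan-*.json 2>/dev/null | grep -v "$today" | sort | tail -n 1)
if [ -n "$prev" ]; then
  gpuaudit diff "$prev" "scan-${today}.json"
fi

# Second scan emits the Slack Block Kit payload shown in Quick start.
gpuaudit scan --format slack -o - | \
  curl -sS -X POST -H 'Content-Type: application/json' -d @- "$SLACK_WEBHOOK"
```

The scan runs twice (once for the JSON report, once for the Slack payload) because the sketch only uses documented flags rather than assuming one output format can be converted to another.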
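You could also gate a pipeline on the numbers in the JSON report with `jq`. The report schema isn't documented in this README, so the field path below is an illustrative guess; inspect a real `report.json` and substitute the actual key before relying on it.

```sh
# Fail a CI job when estimated monthly waste crosses a budget gate.
# ASSUMPTION: '.summary.estimated_monthly_waste' is a hypothetical key,
# not taken from gpuaudit's docs -- check a real report.json first.
gpuaudit scan --format json -o report.json
waste=$(jq -r '.summary.estimated_monthly_waste // 0' report.json)
if [ "$(printf '%.0f' "$waste")" -gt 10000 ]; then
  echo "Estimated GPU waste \$${waste}/mo exceeds the \$10000 budget" >&2
  exit 1
fi
```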