
Hybrid on-device inference on Android: llama.cpp + LiteRT + NPU/GPU routing


Hi everyone, I’m the maintainer of Box, a fork of Google’s AI Edge Gallery that I’ve been extending into a fully offline AI assistant for Android. Full disclosure: I built this project. It runs entirely on-device (no cloud, no accounts, no external inference) and combines multiple local inference backends in a single app.

What I’ve been experimenting with

The goal was to see how far a fully offline mobile AI stack could be pushed using:

- llama.cpp (GGUF LLM inference)
- whisper.cpp (on-device STT)
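For the NPU/GPU routing part of the title, the usual pattern on Android is to attach a LiteRT (TensorFlow Lite) delegate based on what the device supports and fall back to CPU threads otherwise. Below is a minimal sketch of that idea; the `pickInterpreter` helper, the `preferNpu` flag, and the thread count are illustrative assumptions, not code from Box. The llama.cpp and whisper.cpp backends run through their own native paths and are not covered here.

```kotlin
import org.tensorflow.lite.Interpreter
import org.tensorflow.lite.gpu.CompatibilityList
import org.tensorflow.lite.gpu.GpuDelegate
import org.tensorflow.lite.nnapi.NnApiDelegate
import java.io.File

// Hypothetical helper: build a LiteRT Interpreter, preferring the NPU
// (via the NNAPI delegate), then the GPU delegate, then plain CPU threads.
fun pickInterpreter(modelFile: File, preferNpu: Boolean = true): Interpreter {
    val options = Interpreter.Options()

    if (preferNpu) {
        // NNAPI hands execution to whatever accelerator the vendor driver
        // exposes (NPU/DSP/GPU); it may still fall back to CPU internally.
        options.addDelegate(NnApiDelegate())
    } else {
        val compat = CompatibilityList()
        if (compat.isDelegateSupportedOnThisDevice) {
            // GPU delegate with the device-specific recommended options.
            options.addDelegate(GpuDelegate(compat.bestOptionsForThisDevice))
        } else {
            // CPU fallback: rely on multi-threaded kernels.
            options.setNumThreads(4)
        }
    }

    return Interpreter(modelFile, options)
}
```

In practice the routing policy (which models go to NNAPI vs. GPU vs. CPU) would be per-model and per-device rather than a single flag, but the delegate mechanism above is the hook LiteRT gives you for it.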

