TensorSharp: Open-Source Local LLM Inference Engine

Jun 4, 2026 · 12:29 AM UTC ·42 min read · 0 reactions · 0 comments · 43 views

#technology #software #open-source #TensorSharp #Ollama #OpenAI #Gemma #Qwen #Mistral

TensorSharp: Open-Source Local LLM Inference Engine

TL;DR · WeSearch summary

TensorSharp is an open-source C# inference engine designed for running large language models locally. It supports various model architectures and provides multiple interfaces for programmatic access. The engine features optimized backends for CPU and GPU, enabling efficient multimodal inference.

Key facts

▪TensorSharp allows users to run large language models locally using GGUF model files.
▪It offers a console application, a web-based chatbot interface, and APIs compatible with Ollama and OpenAI.
▪The engine supports multiple model architectures and provides optimized backends for both CPU and GPU.

Original article

GitHub

Read full at GitHub →

Opening excerpt (first ~120 words) tap to expand

TensorSharp English | 中文 A C# inference engine for running large language models (LLMs) locally using GGUF model files. TensorSharp provides a console application, a web-based chatbot interface, and Ollama/OpenAI-compatible HTTP APIs for programmatic access. Documentation Map Start here Use this when you want to... Quick build and usage Build the solution, compile the native GGML bridge, and run the CLI or server Supported model architectures Check which GGUF architecture keys, modalities, thinking mode, and tool calling paths are implemented Compute backends Choose between pure C# CPU, direct CUDA/cuBLAS, MLX Metal, GGML CPU, GGML Metal, and GGML CUDA HTTP APIs Use the Ollama-compatible, OpenAI-compatible, or Web UI SSE endpoints Per-model architecture cards Read end-to-end documentation…

Excerpt limited to ~120 words for fair-use compliance. The full article is at GitHub.

Anonymous · no account needed

Discussion

0 comments

TensorSharp: Open-Source Local LLM Inference Engine

Discussion

More from GitHub