TensorSharp: Open-Source Local LLM Inference Engine
TensorSharp is an open-source C# inference engine designed for running large language models locally. It supports various model architectures and provides multiple interfaces for programmatic access. The engine features optimized backends for CPU and GPU, enabling efficient multimodal inference.
- ▪TensorSharp allows users to run large language models locally using GGUF model files.
- ▪It offers a console application, a web-based chatbot interface, and APIs compatible with Ollama and OpenAI.
- ▪The engine supports multiple model architectures and provides optimized backends for both CPU and GPU.
Opening excerpt (first ~120 words) tap to expand
TensorSharp English | 中文 A C# inference engine for running large language models (LLMs) locally using GGUF model files. TensorSharp provides a console application, a web-based chatbot interface, and Ollama/OpenAI-compatible HTTP APIs for programmatic access. Documentation Map Start here Use this when you want to... Quick build and usage Build the solution, compile the native GGML bridge, and run the CLI or server Supported model architectures Check which GGUF architecture keys, modalities, thinking mode, and tool calling paths are implemented Compute backends Choose between pure C# CPU, direct CUDA/cuBLAS, MLX Metal, GGML CPU, GGML Metal, and GGML CUDA HTTP APIs Use the Ollama-compatible, OpenAI-compatible, or Web UI SSE endpoints Per-model architecture cards Read end-to-end documentation…
Excerpt limited to ~120 words for fair-use compliance. The full article is at GitHub.