Tiny-LLM is a lightweight CUDA C++ inference engine for experimenting with W8A16 quantization, incremental decoding with a KV cache, and modular Transformer inference.
Current status: the core runtime, cache flow, and test scaffolding are implemented, but the repository is still experimental. The default demo binary currently reports CUDA readiness rather than providing a polished end-to-end CLI, and runtime GGUF loading is not supported yet.
- W8A16 quantized inference with INT8 weights and FP16 activations
- CUDA kernels for matmul, attention, RMSNorm, and elementwise ops
- Host-side modules for model loading, transformer execution, generation, and cache management
- Dedicated docs site for quick start, API reference, changelog, and contribution notes
```shell
mkdir build && cd build
cmake .. -DCMAKE_BUILD_TYPE=Release
make -j$(nproc)
ctest --output-on-failure
./tiny_llm_demo
```

Notes:
- A working CUDA toolkit with `nvcc` is required to configure and build this project.
- The demo currently validates CUDA availability and prints runtime capability information.
- `InferenceEngine::load()` currently supports the project test binary path via `ModelLoader::loadBin()`; `.gguf` runtime loading is not wired up yet.
MIT License.