Llama-3.2 WebGPU

A private and powerful AI chatbot
that runs locally in your browser.

You are about to load Llama-3.2-1B-Instruct, a 1.24-billion-parameter LLM that is optimized for inference on the web. Once downloaded, the model (1.15 GB) will be cached and reused when you revisit the page.

Everything runs directly in your browser using 🤗 Transformers.js and ONNX Runtime Web, meaning your conversations aren't sent to a server. You can even disconnect from the internet after the model has loaded!

Disclaimer: Generated content may be inaccurate or false.
