TOOL

llm-valet

Cross-platform LLM lifecycle manager

Auto-pause/resume Ollama based on system pressure and gaming detection

What it is

Running a local LLM is expensive in RAM. A 7B-parameter model in 4-bit quantization takes ~4.5 GB; an 8B model at q4 takes ~5.5 GB. On a 16 GB machine, keeping Ollama warm leaves ~10 GB for everything else, and once the system starts swapping, the model gets evicted unpredictably and has to reload on the next request (a ~30 s wait).
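The arithmetic behind those figures can be sketched in a few lines. This is only an illustration: the helper names are hypothetical, and the gap between the raw 4-bit weight size and the ~4.5 GB quoted above is presumably runtime overhead such as quantization metadata, context cache, and buffers.

```python
def weights_gb(params_billion: float, bits_per_weight: float) -> float:
    """Size of the raw weights alone, in GB (1 GB = 1e9 bytes)."""
    return params_billion * bits_per_weight / 8


def headroom_gb(total_ram_gb: float, resident_model_gb: float) -> float:
    """RAM left for the OS and every other application while the model is loaded."""
    return total_ram_gb - resident_model_gb


print(weights_gb(7, 4))       # 3.5  (weights only, before runtime overhead)
print(headroom_gb(16, 4.5))   # 11.5 (7B model resident)
print(headroom_gb(16, 5.5))   # 10.5 (8B model resident)
```

The operating system and background services take their own share of that headroom, which is how a 16 GB machine ends up with roughly 10 GB for everything else.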

llm-valet sits in the middle. It watches system pressure and gaming activity, and pauses Ollama when either calls for it.

When the pressure subsides, llm-valet warms the model back up automatically.
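A minimal sketch of that pause/resume decision, not llm-valet's actual implementation. The thresholds, `next_state`, `ollama_payload`, and `send` are hypothetical names chosen for illustration. The only external behavior assumed is Ollama's documented `/api/generate` endpoint, where a request with no prompt loads a model and `keep_alive: 0` unloads it.

```python
import json
import urllib.request
from dataclasses import dataclass

OLLAMA_URL = "http://localhost:11434/api/generate"  # Ollama's default local address


@dataclass(frozen=True)
class Thresholds:
    pause_above: float = 85.0   # hypothetical: pause once RAM use reaches this percent
    resume_below: float = 70.0  # hypothetical: resume only after use drops to this or lower


def next_state(paused: bool, ram_percent: float, gaming: bool,
               t: Thresholds = Thresholds()) -> bool:
    """Return True if the model should be paused after this reading."""
    if gaming or ram_percent >= t.pause_above:
        return True
    if paused and ram_percent > t.resume_below:
        return True  # inside the hysteresis band: stay paused to avoid flapping
    return False


def ollama_payload(model: str, load: bool) -> bytes:
    # A generate request without a prompt loads the model; keep_alive=0 unloads it.
    body = {"model": model} if load else {"model": model, "keep_alive": 0}
    return json.dumps(body).encode()


def send(model: str, load: bool) -> None:
    """POST the load/unload request to a locally running Ollama server."""
    req = urllib.request.Request(
        OLLAMA_URL,
        data=ollama_payload(model, load),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req, timeout=60) as resp:
        resp.read()
```

A polling loop would call `next_state` with each new reading and call `send` only when the result differs from the current state. The two separate thresholds keep the model from being unloaded and reloaded repeatedly when memory use hovers near a single cutoff. The RAM reading itself could come from a library such as psutil (`psutil.virtual_memory().percent`); gaming detection is left as a plain boolean input here.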

When to use it

When not to use it

Status

Active. Public. Cross-platform (macOS, Linux, Windows) but tested most heavily on Apple Silicon.

See the GitHub repo for installation and the latest release.