This repository contains a Python script designed to benchmark the performance of different large language models (LLMs) served by an Ollama instance. It measures key metrics such as latency, token ...
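As a rough illustration of what such a benchmark might measure, here is a minimal sketch. It is not the repository's actual script. It assumes an Ollama server at the default local address and uses the non-streaming `/api/generate` endpoint, whose reply reports `eval_count` (generated tokens) and `eval_duration` (nanoseconds). The names `summarize` and `benchmark` are hypothetical helpers introduced for this example.

```python
import json
import time
import urllib.request

# Assumption: Ollama is listening on its default local port.
OLLAMA_URL = "http://localhost:11434/api/generate"


def summarize(result: dict, wall_seconds: float) -> dict:
    """Derive latency and throughput from one non-streaming /api/generate reply."""
    eval_count = result.get("eval_count", 0)
    eval_ns = result.get("eval_duration", 0)  # Ollama reports durations in nanoseconds
    tokens_per_second = eval_count / (eval_ns / 1e9) if eval_ns else 0.0
    return {
        "wall_seconds": wall_seconds,            # end-to-end latency seen by the client
        "output_tokens": eval_count,             # tokens generated by the model
        "tokens_per_second": tokens_per_second,  # generation throughput
    }


def benchmark(model: str, prompt: str, url: str = OLLAMA_URL) -> dict:
    """Send one prompt to the server and return the derived metrics."""
    payload = json.dumps({"model": model, "prompt": prompt, "stream": False}).encode()
    request = urllib.request.Request(
        url, data=payload, headers={"Content-Type": "application/json"}
    )
    start = time.perf_counter()
    with urllib.request.urlopen(request) as response:
        result = json.load(response)
    return summarize(result, time.perf_counter() - start)
```

Calling `benchmark("<model-name>", "Why is the sky blue?")` for each model of interest, with a model name that is already pulled on the server, would give one row of metrics per model. Keeping `summarize` separate from the HTTP call makes the arithmetic easy to check without a running server.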
(I cloned the GitHub main branch on 9/21/2025 and used the Nemo:25.07.gpt_oss container image.) To use its recommended best config, I am trying to replicate its environment. I ...