Show HN: Made a batching LLM API for a project. Mistral 200 tk/s on RTX 3090
2 by muttled | 0 comments on Hacker News.
I ran into a vLLM bug that affected multi-GPU setups and needed a stand-in while it was being fixed: something that used the same API format but had better performance than the API in text-generation-webui. It's very rough; I'm not a coder by trade. But it's very fast once you have many simultaneous connections.
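To illustrate the "many simultaneous connections" point, here is a minimal sketch of a client firing a batch of concurrent requests at an OpenAI-style completions endpoint. The URL, route, model name, and payload fields are assumptions for illustration, not the project's actual interface; adjust them to whatever the server exposes.

```python
import asyncio
import aiohttp

# Hypothetical local endpoint; the real host, port, and route depend on the server.
API_URL = "http://localhost:8000/v1/completions"

PROMPTS = [f"Write a one-line summary of topic {i}." for i in range(32)]

async def complete(session: aiohttp.ClientSession, prompt: str) -> str:
    # OpenAI-style completion payload (assumed format; model name is a placeholder).
    payload = {"model": "mistral-7b", "prompt": prompt, "max_tokens": 64}
    async with session.post(API_URL, json=payload) as resp:
        data = await resp.json()
        return data["choices"][0]["text"]

async def main() -> None:
    async with aiohttp.ClientSession() as session:
        # Send all requests at once so the server can batch them together.
        results = await asyncio.gather(*(complete(session, p) for p in PROMPTS))
    for prompt, text in zip(PROMPTS, results):
        print(prompt, "->", text.strip())

if __name__ == "__main__":
    asyncio.run(main())
```

The idea is that a batching server only hits its best throughput when requests overlap, so the client keeps many requests in flight rather than sending them one at a time.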