Selfhosted LLM (ChatGPT)

autopilot@lemmy.world · edited 1 year ago

CeeBee@lemmy.world · edited 1 year ago

The best/easiest way to get started with a self-hosted LLM is to check out this repo:

https://github.com/oobabooga/text-generation-webui

Its goal is to be the Automatic1111 of text generators, and it does a fair job at it.

A good model that’s said to rival GPT-3.5 is the new Falcon model. The full-sized version is too big to run on a single GPU, but the 7B version “only” needs about 16GB of memory.

https://huggingface.co/tiiuae/falcon-7b

The Wizard-Vicuna uncensored model is also popular.

https://huggingface.co/ehartford/Wizard-Vicuna-13B-Uncensored

There are a ton of models out there, with new ones popping up every day. You just need to search around. The oobabooga repo also has a few models linked in its README.

Edit: there’s also h2oGPT, which seems really promising. I’m going to try it out in the next couple of days.

https://github.com/h2oai/h2ogpt

pe1uca@lemmy.pe1uca.dev · 1 year ago

How do you know how much ram the model needs?

redcalcium@c.calciumlabs.com · edited 1 year ago

The model creator usually mentions it in the README. For Falcon-7B, for example:

You will need at least 16GB of memory to swiftly run inference with Falcon-7B.

Usually the models support CPU inference. Tremendously slow but works in a pinch.
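
If the README doesn't say, a back-of-the-envelope estimate is parameter count times bytes per parameter, plus some headroom. Here is a minimal sketch of that arithmetic; the function name and the 20% overhead margin are my own assumptions for illustration, not measured figures, and real usage also depends on context length and the runtime you use.

```python
# Rough rule of thumb: weights take (parameter count) x (bytes per parameter).
# Activations, KV cache, and framework overhead come on top; the 20% margin
# used by default is an assumption for illustration, not a measured figure.

BYTES_PER_PARAM = {
    "fp32": 4.0,
    "fp16": 2.0,  # bf16 uses the same 2 bytes per parameter
    "int8": 1.0,
    "int4": 0.5,
}


def estimate_memory_gb(n_params: float, dtype: str = "fp16", overhead: float = 0.2) -> float:
    """Return an approximate memory footprint in GB (1 GB = 1e9 bytes)."""
    weights_bytes = n_params * BYTES_PER_PARAM[dtype]
    return weights_bytes * (1.0 + overhead) / 1e9


if __name__ == "__main__":
    # Print the estimate for a 7B-parameter model at each precision.
    for dtype in BYTES_PER_PARAM:
        print(f"7B params @ {dtype}: ~{estimate_memory_gb(7e9, dtype):.1f} GB")
```

For a 7B model at 16-bit precision the weights alone come to about 14 GB, which lines up with the roughly 16GB quoted for Falcon-7B. The same arithmetic shows why quantized 4-bit versions fit on much smaller cards.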

dorkian_gray@lemmy.world · 1 year ago

If you want extremely low code, I recommend GPT4All. The prebuilt binaries/exes run locally on CPU and give you a choice of model to use so you can try out a couple to see which you like the best. It’s remarkably quick on my Ryzen 7 3700X, and it doesn’t take long to get a little web server running with Langchain if you want to put in a bit more effort, too.
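
If you'd rather script it than use the desktop app, GPT4All also ships Python bindings (`pip install gpt4all`). A rough sketch, assuming that package is installed; the model file name is only an example and may not match what the catalogue currently offers, and `load_model` and `ask` are hypothetical helper names of mine.

```python
# Sketch only: the model file name below is an example and may have changed.


def load_model(model_name: str = "orca-mini-3b-gguf2-q4_0.gguf"):
    """Load a local model through the gpt4all bindings."""
    # Imported lazily so the ask() helper below can be used without the package.
    from gpt4all import GPT4All

    # The bindings download the model file on first use if it is not present.
    return GPT4All(model_name)


def ask(model, question: str, max_tokens: int = 200) -> str:
    """Send one prompt to any object exposing generate(prompt, max_tokens=...)."""
    return model.generate(question.strip(), max_tokens=max_tokens)
```

Usage would be something like `ask(load_model(), "Explain what a self-hosted LLM is.")`. Keeping `ask` independent of the loader makes it easy to drop the same function behind a small web endpoint later.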

Throwaway4669332255@lemmy.world · 1 year ago

If you don’t have a good GPU, then just use GPT4All.

https://gpt4all.io/index.html