The first GPT-4-class AI model anyone can download has arrived: Llama 405B

Wilshire@lemmy.world · 4 months ago

The first GPT-4-class AI model anyone can download has arrived: Llama 405B

raldone01@lemmy.world · 4 months ago

I regularly run llama3 70b unquantized on two P40s and the CPU at around 7 tokens/s. It’s usable but not very fast.

sunzu@kbin.run · 4 months ago

So there is no way a 24 GB GPU and 64 GB of RAM can run this thing?

raldone01@lemmy.world · edit-2 4 months ago

My specs because you asked:

CPU: Intel(R) Xeon(R) E5-2699 v3 (72) @ 3.60 GHz
GPU 1: NVIDIA Tesla P40 [Discrete]
GPU 2: NVIDIA Tesla P40 [Discrete]
GPU 3: Matrox Electronics Systems Ltd. MGA G200EH
Memory: 66.75 GiB / 251.75 GiB (27%)
Swap: 75.50 MiB / 40.00 GiB (0%)

sunzu@kbin.run · 4 months ago

OK, this is a server. 48 GB across the cards and 67 GB of RAM? For the model alone?

raldone01@lemmy.world · 4 months ago

Each card has 24 GB, so 48 GB of VRAM total. I use ollama; it fills whatever VRAM is available on both cards and runs the rest on the CPU cores.
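For a rough idea of how that split works out, here is a back-of-the-envelope sketch. It is not ollama's actual placement logic (that works layer by layer and reserves VRAM for the context and buffers); it only counts weight bytes and fills the GPUs first. The function names and the bytes-per-parameter figures (2 for fp16, 0.5 for 4-bit) are assumptions for illustration.

```python
def weight_gb(n_params: float, bytes_per_param: float) -> float:
    """Approximate size of the model weights in GB (1 GB = 1e9 bytes)."""
    return n_params * bytes_per_param / 1e9


def split_weights(total_gb: float, vram_gb_per_gpu: list[float]):
    """Fill each GPU in order, then leave the remainder for host RAM.

    Simplification: ignores KV cache, context buffers and runtime overhead,
    so real usable VRAM is smaller than the card's nominal capacity.
    """
    remaining = total_gb
    placed = []
    for vram in vram_gb_per_gpu:
        take = min(vram, remaining)
        placed.append(take)
        remaining -= take
    return placed, remaining


if __name__ == "__main__":
    fp16 = weight_gb(70e9, 2)    # 70B parameters at 16 bits each
    q4 = weight_gb(70e9, 0.5)    # same model at roughly 4 bits per weight
    for label, size in (("fp16", fp16), ("4-bit", q4)):
        on_gpu, on_cpu = split_weights(size, [24.0, 24.0])
        print(f"{label}: {size:.0f} GB total, GPUs {on_gpu}, host RAM {on_cpu:.0f} GB")
```

Swap in `[24.0]` for the GPU list to see what a single 24 GB card leaves for host RAM.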

raldone01@lemmy.world · 4 months ago

What are you asking exactly?

What do you want to run? I assume you have a 24GB GPU and 64GB host RAM?

sunzu@kbin.run · 4 months ago

Correct. And how does RAM speed factor into this, tbh?

raldone01@lemmy.world · 4 months ago

My memory sticks are all 32 GB DDR4 at 2133 MT/s.
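On why RAM speed matters: a common rule of thumb is that generating a token has to stream the CPU-resident weights out of memory once, so memory bandwidth puts a ceiling on tokens/s for that part of the model. A minimal sketch of the arithmetic, assuming a 64-bit (8-byte) bus per DDR4 channel; the channel count and the amount of weights left in host RAM below are made-up example values, not measurements from my box.

```python
def ddr_bandwidth_gb_s(mt_per_s: float, channels: int, bus_bytes: int = 8) -> float:
    """Theoretical peak bandwidth: transfers/s * bytes per transfer * channels."""
    return mt_per_s * 1e6 * bus_bytes * channels / 1e9


def max_tokens_per_s(bandwidth_gb_s: float, gb_read_per_token: float) -> float:
    """Upper bound if every token streams gb_read_per_token from RAM once.

    Real throughput is lower: peak bandwidth is never fully reached and
    compute, NUMA effects and GPU/CPU transfers add their own cost.
    """
    return bandwidth_gb_s / gb_read_per_token


if __name__ == "__main__":
    channels = 4            # assumption: number of populated memory channels
    cpu_weights_gb = 20.0   # assumption: weights that did not fit in VRAM
    bw = ddr_bandwidth_gb_s(2133, channels)
    print(f"peak bandwidth: {bw:.1f} GB/s")
    print(f"ceiling: {max_tokens_per_s(bw, cpu_weights_gb):.1f} tokens/s")
```

Faster RAM or more populated channels raises the ceiling; pushing more of the model into VRAM (for example by quantizing) lowers how much has to be read from RAM per token.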