We recently released Smaug-72B-v0.1, which has taken first place on the Open LLM Leaderboard by HuggingFace. It is the first open-source model to have an average score above 80.
Every billion parameters needs about 2 GB of VRAM when using the bfloat16 representation: 16 bits per parameter at 8 bits per byte is 2 bytes per parameter.
1 billion parameters ~ 2 billion bytes ~ 2 GB.
From the name, this model has 72 billion parameters, so ~144 GB of VRAM for the weights.
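For anyone who wants to redo that arithmetic for other sizes, here is a rough sketch. It only counts the weights (activations and the context cache come on top), it takes 1 GB as 1e9 bytes, and the function name and dtype table are made up for illustration.

```python
# Bytes needed per parameter for a few common dtypes.
BYTES_PER_PARAM = {"float32": 4, "bfloat16": 2, "float16": 2, "int8": 1}

def weight_memory_gb(params_billions, dtype="bfloat16"):
    """Approximate memory for the weights alone, in GB (1 GB = 1e9 bytes)."""
    total_bytes = params_billions * 1e9 * BYTES_PER_PARAM[dtype]
    return total_bytes / 1e9

print(weight_memory_gb(1))   # about 2 GB per billion parameters in bfloat16
print(weight_memory_gb(72))  # about 144 GB for a 72B model in bfloat16
```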
Ok but will this run on my TI-83? It’s a + model.
Only if it’s silver.
Dang. So close.
My 83 was ganked by some kid I knew so my folks bought me a silver. He denied it. I learned that day to write my name in secret spots.
That kid you knew was a dick. At least he taught you a valuable lesson, I guess.
He absolutely was a dick. I stopped being mates with him after that. My school was like “yeah the cameras didn’t work that day actually”
Leads me to believe that the cameras never actually worked.
I believe that. Or they just didn’t want to be responsible for dealing with theft. Both ways make perfect sense to me.
no. but put this clustering software i wrote in ti-basic on 40 million of them? still no
It’s been discovered that you can reduce the bits per parameter down to 4 or 5 and still get good results. I just saw a paper this morning describing a technique to get down to 2.5 bits per parameter, even, and apparently it’s fine. We’ll see if that works out in practice, I guess.
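To put rough numbers on those bit widths for a 72B model, a naive estimate is just parameters times bits divided by 8. This is only a sketch with a made-up function name; real quantized files tend to come out somewhat larger because the formats also store scaling data and keep some tensors at higher precision.

```python
def quantized_weight_gb(params_billions, bits_per_param):
    """Naive weight size in GB (1 GB = 1e9 bytes) at an average bit width."""
    total_bytes = params_billions * 1e9 * bits_per_param / 8
    return total_bytes / 1e9

# Compare a few average bit widths for a 72B-parameter model.
for bits in (16, 8, 5, 4, 2.5):
    print(f"{bits} bits -> ~{quantized_weight_gb(72, bits):.1f} GB")
```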
Any idea what the Q8 requirements would be? Or Q4 or Q5?
https://huggingface.co/senseable/Smaug-72B-v0.1-gguf/tree/main
About 44 GB and 50 GB for the Q4 and Q5. You’d need quite a bit extra to fully use the 32k context length.
Llama 2 70B with 8-bit quantization takes around 80 GB of VRAM, if I remember correctly. I tested it a while ago.