GPU's rival? What is a Language Processing Unit (LPU)?

ooli@lemmy.world · 7 months ago

Scott@sh.itjust.works · 7 months ago

It’s not about their frontend; they are running custom LPUs that can process LLM tokens at 500/sec, which is insanely impressive.

For reference, with a max size of 2k tokens, my dual Xeon Silver 4114 procs take 2-3 minutes.
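
To put those two numbers side by side, here is a back-of-the-envelope sketch. It assumes "2k tokens" means 2048 and that the whole budget is processed in the quoted 2-3 minutes; both are assumptions for illustration, not measurements.

```python
# Rough throughput comparison using the figures quoted in this thread.
# Assumption: "2k tokens" is taken as 2048 and the full budget is processed
# in the quoted 2-3 minutes.
TOKENS = 2048
LPU_TOKENS_PER_SEC = 500  # figure quoted above for the LPU


def tokens_per_sec(tokens: int, seconds: float) -> float:
    """Average throughput over a run."""
    return tokens / seconds


cpu_fast = tokens_per_sec(TOKENS, 2 * 60)  # best case: 2 minutes
cpu_slow = tokens_per_sec(TOKENS, 3 * 60)  # worst case: 3 minutes

speedup_low = LPU_TOKENS_PER_SEC / cpu_fast
speedup_high = LPU_TOKENS_PER_SEC / cpu_slow

print(f"CPU: {cpu_slow:.1f} to {cpu_fast:.1f} tokens/sec")
print(f"LPU speedup: roughly {speedup_low:.0f}x to {speedup_high:.0f}x")
```

Under those assumptions the CPU lands somewhere in the low-to-mid teens of tokens per second, so the quoted 500/sec would be a speedup of a few dozen times.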

Lojcs@lemm.ee · 7 months ago

No, I got what you meant, but that site is weird if it’s not doing anything on its own.

Finadil@lemmy.world · 7 months ago

Is that with an fp16 model? Don’t be scared to try even a 4-bit quantization; you’d be surprised at how little is lost and how much quicker it is.
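
For anyone wondering what 4-bit quantization does to the weights, here is a minimal sketch of a simple symmetric scheme. The function names, the sample weights, and the 7B-parameter model size are all hypothetical and chosen for illustration; real quantization formats are more elaborate (per-block scales, packed storage), so treat this as the idea rather than any library's actual method.

```python
def quantize_4bit(weights):
    """Symmetric quantization of a block of floats to integers in [-7, 7].

    Illustrative only: one scale per block, no bit packing.
    """
    scale = max(abs(w) for w in weights) / 7 or 1.0  # avoid a zero scale
    q = [max(-7, min(7, round(w / scale))) for w in weights]
    return q, scale


def dequantize(q, scale):
    """Map the 4-bit integers back to approximate float weights."""
    return [v * scale for v in q]


def model_bytes(n_params: int, bits: int) -> float:
    """Raw weight storage, ignoring scales and other metadata."""
    return n_params * bits / 8


# Hypothetical sample weights.
weights = [0.12, -0.53, 0.98, -0.07, 0.31, -0.88, 0.45, 0.02]
q, scale = quantize_4bit(weights)
restored = dequantize(q, scale)
max_err = max(abs(a - b) for a, b in zip(weights, restored))

print("quantized:", q)
print(f"max round-trip error: {max_err:.3f} (bound: {scale / 2:.3f})")

# Hypothetical 7B-parameter model: fp16 vs 4-bit weight storage.
n = 7_000_000_000
print(f"fp16: {model_bytes(n, 16) / 1e9:.1f} GB, 4-bit: {model_bytes(n, 4) / 1e9:.1f} GB")
```

The round-trip error per weight is bounded by half the scale, and the weights take a quarter of the fp16 storage, which is a large part of why quantized models tend to run faster on memory-bound hardware.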