There's a huge difference between even the beefiest CPU and a decent GPU. My RTX 3080 does 7-8 tokens per second on the models I’ve tried, while my Ryzen 9 5950X gets 1-2.
I’m not an expert with them, nor have I done any optimization or benchmarking to find the real answer, but the gap is significant enough that I’d go for a GPU if I were building something dedicated for it.
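
For anyone who wants a rougher-than-benchmark but repeatable number, here is a minimal sketch of how tokens per second could be timed. It assumes you can wrap your model in a function that takes a prompt and a token limit and returns the generated tokens as a list; `generate` and `fake_generate` are hypothetical stand-ins, not any real library's API.

```python
import time


def tokens_per_second(generate, prompt, max_tokens=64):
    """Time one call to generate() and return tokens produced per second.

    `generate` is a hypothetical callable: generate(prompt, max_tokens) -> list of tokens.
    """
    start = time.perf_counter()
    tokens = generate(prompt, max_tokens)
    elapsed = time.perf_counter() - start
    if elapsed <= 0:
        return float("inf")
    return len(tokens) / elapsed


def fake_generate(prompt, max_tokens):
    """Stand-in for a real model call: sleeps briefly per token (assumption for the demo)."""
    tokens = []
    for i in range(max_tokens):
        time.sleep(0.001)  # pretend each token takes about 1 ms
        tokens.append(f"tok{i}")
    return tokens


if __name__ == "__main__":
    rate = tokens_per_second(fake_generate, "hello", max_tokens=50)
    print(f"{rate:.1f} tokens/s")
```

Swap `fake_generate` for a wrapper around whatever you actually run, use the same prompt and token limit on both the CPU and the GPU, and average a few runs, since a single run can be noisy.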