Well, just running on a 6C/12T Coffee Lake CPU (I'm looking at these speeds in LM Studio as I type this), I got about 2 tokens a second with DeepSeek R1 14B, 3.4 with Qwen 7B, and 4.4 with Llama 8B, although of those last two I found Qwen 7B's answer to be a bit better. (My GTX 1650 has 4GB VRAM, so offloading 1/4 of the layers is pretty ineffective: GPU utilization went up to 10% and I gained maybe 1 token a second, LOL.)
So it'd take a minute or two to type out one of those answers, where it's got about 4 or 5 beefy paragraphs of thought and a decent-sized paragraph for its answer. I'll put it this way: I can type 120 WPM, and it puts out text a bit faster than I could write it.
Input's a LOT faster, though. I was asking these models to analyze a document, so my input was around 2,200 tokens, and they all did well over 100 tokens a second on input.
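If you want to sanity-check the "a bit faster than I can type" claim, here's a quick back-of-the-envelope sketch. It assumes roughly 1.3 tokens per English word (a common rule of thumb, not something I measured) and a ~500-token answer, which is my guess for a few paragraphs of thought plus a reply:

```python
# Rough comparison of human typing speed vs. local LLM generation speed.
# The tokens-per-word ratio and answer length are assumptions, not measurements.

TOKENS_PER_WORD = 1.3  # rule-of-thumb for English text

def wpm_to_tps(wpm):
    """Convert a words-per-minute typing speed to approximate tokens/second."""
    return wpm * TOKENS_PER_WORD / 60

def gen_time_seconds(n_tokens, tokens_per_second):
    """Seconds to generate n_tokens at a given tokens/second rate."""
    return n_tokens / tokens_per_second

# A 120 WPM typist "emits" about 2.6 tokens/s:
print(round(wpm_to_tps(120), 1))

# A ~500-token answer at Qwen 7B's ~3.4 tokens/s takes a couple of minutes:
print(round(gen_time_seconds(500, 3.4)))
```

So 3.4 tokens/s really is just a bit above a fast typist's ~2.6 tokens/s, and a multi-paragraph answer landing in the one-to-two-minute range lines up with what I saw.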
Nope. I read The Register (UK-based), and they've covered scandals involving celebrities having their confidential SMS messages leaked, SMS spoofing, and I think even SIM cloning going on every now and then in the UK and some European countries. (Since The Register is a tech site, my recollection is that some carriers took technical measures to prevent these issues while quite a few didn't.)
I don't think it happens all that often in the UK etc.; but then, it doesn't happen that frequently in the US either. It's just a thing that can potentially happen.