  • Jokes aside (and this whole AI search results thing is a joke), this seems like an artifact of sampling and tokenization.

    I wouldn’t be surprised if the Gemini tokens for XTX are “XT” and “X” or something like that, so it’s got quite a chance of mixing them up after it writes out XT. Add in sampling (literally randomizing the token outputs a little), and I’m surprised it gets any of it right.
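    For what it’s worth, you can poke at both effects with any open tokenizer. Quick sketch below; GPT-2’s BPE is just a stand-in (Gemini’s tokenizer isn’t public), and the logits are made up:

    ```python
    # GPT-2's BPE as a stand-in for Gemini's (non-public) tokenizer.
    from transformers import AutoTokenizer
    import numpy as np

    tok = AutoTokenizer.from_pretrained("gpt2")
    print(tok.tokenize("RX 7900 XTX"))
    # Comes out roughly as ['RX', 'Ġ79', '00', 'ĠXT', 'X']:
    # "XTX" doesn't survive as one token, it's "XT" plus a stray "X".

    def sample(logits, temperature=0.8, rng=None):
        """Plain temperature sampling: scale logits, softmax, draw an index."""
        rng = rng or np.random.default_rng()
        z = logits / temperature
        p = np.exp(z - z.max())
        return int(rng.choice(len(logits), p=p / p.sum()))

    # Made-up logits for the token right after the model has written "XT".
    # Even when "X" is the clear favorite, sampling sometimes picks
    # something else, and the "XTX" quietly becomes an "XT".
    logits = np.array([2.0, 1.1, 0.4])  # pretend candidates: "X", ",", " GPU"
    print([sample(logits, rng=np.random.default_rng(i)) for i in range(10)])
    ```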

  • > but what am I realistically looking at being able to run locally that won’t go above like 60-75% usage so I can still eventually get a couple game servers, network storage, and Jellyfin working?

    Honestly, not much. Llama 8B, but very slowly, or maybe DeepSeek-V2 chat, prompt-processed on the 270 with Vulkan but mostly running on the CPU. And I guess just limit it to 6 threads? I’d host it with kobold.cpp’s Vulkan backend, or maybe the llama.cpp server if there will be multiple users; rough sketch below.
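    If you do go the llama.cpp route, the Python bindings make the thread cap easy. A minimal sketch, assuming llama-cpp-python built with Vulkan support; the model path and layer count here are placeholders:

    ```python
    from llama_cpp import Llama  # pip install llama-cpp-python

    llm = Llama(
        model_path="models/llama-8b-instruct.Q4_K_M.gguf",  # hypothetical file
        n_threads=6,      # cap CPU usage so the game servers/Jellyfin survive
        n_gpu_layers=8,   # partial offload; the remaining layers run on CPU
        n_ctx=4096,
    )
    out = llm("Q: What does Jellyfin do?\nA:", max_tokens=64)
    print(out["choices"][0]["text"])
    ```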

    You can try them to see if they feel OK, but LLMs are just not something that likes old hardware. An RTX 3060 (or a Mac, or a 12GB+ AMD GPU) is considered the bare minimum in the community, and a 3090 or 7900 XTX the standard.