I mean it in the sense that I can upload a low-quality phone photo of a page from a Chinese cookbook and it will OCR it, translate it into English, and give me a summary of the ingredients.

I’ve been looking into vision models, but they seem daunting to set up, and the specs say things like “384x384 image resolution,” so it doesn’t seem like they’d be able to do what I’m looking for. Am I even searching in the right direction?

  • hendrik@palaver.p3x.de · 6 days ago

    Most frontends should have you covered and scale the image down appropriately (and automatically), though I’m not entirely sure every one does. Running at a resolution higher than the encoder supports should either not work at all or degrade performance, so somewhere in the pipeline the image has to be resized to what the model needs. You can also crop the image yourself if you like, which sheds some pixels, or split it into tiles and feed them in one after the other, if that makes sense for your use case. But I’d bet that with most software you can just upload a random image and it’ll do whatever scaling is required for you.
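
    If you want to see (or do) the downscaling yourself, it’s a few lines with Pillow. A minimal sketch, assuming a 384px encoder input and placeholder filenames; check your model’s card for the real value:

    ```python
    # Shrink the photo so its longest side fits the encoder's input size,
    # preserving aspect ratio. thumbnail() modifies in place and only
    # ever downscales, so smaller images are left untouched.
    from PIL import Image

    TARGET = 384  # assumed encoder resolution; varies per model

    img = Image.open("cookbook_page.jpg")
    img.thumbnail((TARGET, TARGET))
    img.save("cookbook_page_small.jpg")
    ```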

    • afansfw@lemmynsfw.com (OP) · 5 days ago

      Managed to run it with llama.cpp. It was a great suggestion, thank you! MiniCPM-o-2_6 at IQ4 quantization managed to read text from a picture of a shirt that Gemma could not get right.
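
      For anyone else who wants to try: the invocation looks roughly like the one below with a recent llama.cpp build. The binary and flag names have changed across versions, and the file names here are placeholders; you need the mmproj file that ships alongside the quantized model.

      ```
      # Placeholder file names; grab both GGUF files from the model's repo.
      ./llama-mtmd-cli -m MiniCPM-o-2_6-IQ4_XS.gguf \
          --mmproj mmproj-MiniCPM-o-2_6-f16.gguf \
          --image shirt.jpg -p "What does the text on this shirt say?"
      ```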