I absolutely agree, but I have a sneaking but unfounded suspicion that many decision makers don’t want to prove out this theory.
WFH during the pandemic already triggered a panic from those whose income depends on the status quo of urban commute. To them, demonstrating we don’t need offices OR personal automobiles is a dangerous experiment to conduct in one of the largest metro areas in the world.
My god, what if it works? What would we do with all this pavement and gasoline?!
You would need to run the LLM on the system that has the GPU (your main PC). The front-end (typically a WebUI) could run in a docker container and make API calls to your LLM system. Unfortunately that requires the model to always be loaded in the VRAM on your main PC, severely reducing what you can do with that computer, GPU-wise.