  • I dunno, I guess I should try it just to see what the buzz is all about, but I am rather opposed to the plagiarism-and-river-boiling combination, and paying them money is like having Peter Thiel do 10x donation matching for donations to a Captain Planet villain.

    I personally want a model that does not store much specific code in its weights, uses RAG on compatibly licensed open source, and cites what it RAG’d. E.g. I want to set an app icon on Linux; it’s fine if the model looks into GLFW and just borrows code with attribution that I will make sure to preserve. I don’t need it gaslighting me that it wrote the code from reading the docs. And this isn’t literature: there’s nothing to be gained from trying to dilute copyright by mixing together a hundred different pieces of code doing the same thing.

    I also don’t particularly get the need to hop onto the bandwagon right away.

    It has all the feel of boiling a lake to do for(int i=0; i<strlen(s); ++i). LLMs are so energy-intensive in large part because of quadratic scaling, but we know the problem is not intrinsically quadratic, otherwise we wouldn’t be able to write, read, or even compile the code.
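
    (In case anyone hasn’t been bitten by that one: strlen() rescans the whole string on every iteration, so the loop is O(N²) even though the job is obviously O(N). Rough illustration, not from any particular codebase:)

    ```c
    #include <ctype.h>
    #include <stddef.h>
    #include <string.h>

    /* Quadratic: strlen() walks the entire string on every loop iteration. */
    void upcase_slow(char *s) {
        for (size_t i = 0; i < strlen(s); ++i)
            s[i] = (char)toupper((unsigned char)s[i]);
    }

    /* Linear: measure the string once; same result, a tiny fraction of the work. */
    void upcase_fast(char *s) {
        size_t n = strlen(s);
        for (size_t i = 0; i < n; ++i)
            s[i] = (char)toupper((unsigned char)s[i]);
    }
    ```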

    Each token has the potential to relate to any other token, but in practice only relates to a few.

    I’d give the bastards some time to figure this out. I wouldn’t use an O(N^2) compiler I can’t run locally either; there is also a strategic disadvantage in any dependence on proprietary garbage.

    Edit: also, I have a very strong suspicion that someone will figure out a way to make most matrix multiplications in an LLM sparse, doing mostly the same shit in a different basis. An answer to a specific query does not intrinsically use every piece of information the LLM has memorized.
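
    For illustration, the boring classical version of that idea is a compressed-sparse-row matrix-vector product, where you only pay for the nonzero entries instead of the full rows×cols worth of multiply-adds. A toy sketch of why sparsity saves work, not a claim about how any particular LLM runtime does it:

    ```c
    #include <stddef.h>

    /* y = A*x for a matrix stored in compressed sparse row (CSR) form.
     * row_ptr has nrows+1 entries; col_idx and val hold only the nonzeros,
     * so the cost is proportional to nnz, not nrows*ncols. */
    void csr_matvec(size_t nrows, const size_t *row_ptr, const size_t *col_idx,
                    const double *val, const double *x, double *y) {
        for (size_t r = 0; r < nrows; ++r) {
            double acc = 0.0;
            for (size_t k = row_ptr[r]; k < row_ptr[r + 1]; ++k)
                acc += val[k] * x[col_idx[k]];
            y[r] = acc;
        }
    }
    ```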





  • Film photography is my hobby, and I think there isn’t anything that would prevent you from exposing a displayed image onto a piece of film, except for the cost.

    Glass plates it is, then. Good luck matching the resolution.

    In all seriousness, though, I think your normal setup would be detectable even on normal 35mm film due to 1: insufficient resolution (even at 4k, probably even at 8k), and 2: insufficient dynamic range. There would probably also be some effects of spectral response mismatch - reds that are cut off by the film’s spectral response would be converted into film-visible reds by a display.

    Detection of forgery may require use of a microscope and maybe some statistical techniques. Even if the pixels are smaller than film grains, pixels are on a regular grid and film grains are not.
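
    One cheap statistical tell, for illustration only: the autocorrelation of an intensity scanline taken under the microscope should show strong, evenly spaced peaks at the pixel pitch if the image came from a display, and no such periodicity from film grain alone. Toy sketch; the scan[] array and its sampling are assumed to come from whatever your digitizing setup is:

    ```c
    #include <stddef.h>

    /* Mean-removed autocorrelation of a 1D intensity scanline.
     * A photographed display should produce evenly spaced peaks at the
     * pixel pitch; random film grain should not. Writes lags 0..max_lag-1. */
    void autocorr(const double *scan, size_t n, double *out, size_t max_lag) {
        double mean = 0.0;
        for (size_t i = 0; i < n; ++i)
            mean += scan[i];
        mean /= (double)n;

        for (size_t lag = 0; lag < max_lag && lag < n; ++lag) {
            double acc = 0.0;
            for (size_t i = 0; i + lag < n; ++i)
                acc += (scan[i] - mean) * (scan[i + lag] - mean);
            out[lag] = acc / (double)(n - lag);
        }
    }
    ```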

    Edit: trained eyeballing may also work fine if you are familiar with the look of that specific film.


  • Oh wow, it is precisely the problem I “predicted” before: there are surprisingly few production-grade implementations to plagiarize from.

    Even for seemingly simple stuff. You might think parsing floating point numbers from strings would have a gazillion examples. But it is quite tricky to do correctly (a correct implementation lets you convert a floating point number to a string with enough digits, and back, and always obtain precisely the same number you started with). So even for such an omnipresent example, which has probably been implemented well over 10 000 times by various students, if you start pestering your bot with requests to make it better, and have the bot write tests and pass them, you could end up plagiarizing something identifiable.
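
    For the curious, “correctly” here means round-tripping: print a double with enough digits (17 significant digits is always enough for IEEE 754 binary64), parse it back, and you must get bit-for-bit the same value. A minimal check, assuming the platform’s printf/strtod are themselves correctly rounded:

    ```c
    #include <assert.h>
    #include <stdio.h>
    #include <stdlib.h>

    /* Print with 17 significant digits, parse back, require the exact
     * same value. (Deliberately fails for NaN, where x == x is false.) */
    static int round_trips(double x) {
        char buf[64];
        snprintf(buf, sizeof buf, "%.17g", x);
        return strtod(buf, NULL) == x;
    }

    int main(void) {
        assert(round_trips(0.1));
        assert(round_trips(1.0 / 3.0));
        assert(round_trips(1e308));
        return 0;
    }
    ```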

    Edit: and even supposing there were 2, or 3, or 5 exFAT implementations, they would be too different to “blur” together. The deniable plagiarism that they are trying to sell - “it learns the answer in general from many implementations, then writes original code” - is bullshit.





  • I think more low tier output would be a disaster.

    Even pre-AI, I had to deal with a project where they shoved testing and compliance at juniors for a long time. What a fucking mess it was. I had to go through every commit mentioning Coverity because they had a junior fixing Coverity-flagged “issues”. I spent at least 2 days debugging a memory corruption crash caused by one such “fix”, and then I had to spend who knows how long reviewing every such “fix”.

    And don’t get me started on tests. 200+ tests, and none of them caught several regressions in the handling of parameters that are shown early in the frigging how-to. Not some obscure corner case - the stuff you immediately run into if you just follow the documentation.

    With AI, all the numbers would be much larger - more commits “fixing Coverity issues” (and, worse yet, fixing “issues” that the LLM sees in the code), more so-called “tests” that don’t actually flag any real regressions, etc.





  • When they tested on bugs not in SWE-Bench, the success rate dropped to 57‑71% on random items, and 50‑68% on fresh issues created after the benchmark snapshot. I’m surprised they did that well.

    After the benchmark snapshot. That could still be before the LLM training data cut-off, or available via RAG.

    Edit: for a fair test you have to use git issues that have not yet been resolved by a human.

    This is how these fuckers talk, all of the time. Also see Sam Altman’s not-quite-denials of training on Scarlett Johansson’s voice: they just asserted that they had hired a voice actor, but didn’t deny training on Scarlett Johansson’s actual voice. Edit: because anyone with half a brain knows that not only did they train on her actual voice, they probably gave it and their other pirated movie soundtracks a massively higher weighting, just as they did for books and NYT articles.

    Anyhow, I fully expect that by now they just use everything they can to cheat benchmarks, up to and including RAG from solutions past the training dataset cut-off date. With two of the paper authors being from Microsoft itself, expect their “fresh issues” to be gamed too.





    I was writing some math code, and, not being an idiot, I used an open-source math library for something called “QR decomposition”; it’s efficient, it supports sparse matrices (matrices where many entries are 0), etc.

    Just out of curiosity, I checked where some idiot vibecoder would end up. The AI simply plagiarizes from shit sample snippets that exist purely to teach people what QR decomposition is. The result is actually unusable, due to being numerically unstable.
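
    For anyone wondering what “numerically unstable” means here: those tutorial snippets are basically classical Gram-Schmidt, which loses orthogonality in floating point precisely when the matrix is close to rank-deficient, i.e. exactly when you care. A sketch of what that textbook version looks like (don’t use it; real libraries factor with Householder reflections or Givens rotations instead):

    ```c
    #include <math.h>
    #include <stddef.h>

    /* Classical Gram-Schmidt QR of an m x n, column-major matrix A (m >= n).
     * In exact arithmetic Q has orthonormal columns and A = Q*R, but in
     * floating point the columns of Q drift away from orthogonality as A
     * becomes ill-conditioned - which is why production libraries use
     * Householder reflections instead. Only the upper triangle of R is
     * written; there is no rank-deficiency handling. */
    void cgs_qr(size_t m, size_t n, const double *A, double *Q, double *R) {
        for (size_t j = 0; j < n; ++j) {
            for (size_t i = 0; i < m; ++i)
                Q[j * m + i] = A[j * m + i];           /* start from column a_j */
            for (size_t k = 0; k < j; ++k) {
                double r = 0.0;                        /* r_kj = q_k . a_j */
                for (size_t i = 0; i < m; ++i)
                    r += Q[k * m + i] * A[j * m + i];
                R[j * n + k] = r;
                for (size_t i = 0; i < m; ++i)
                    Q[j * m + i] -= r * Q[k * m + i];  /* subtract projection */
            }
            double norm = 0.0;
            for (size_t i = 0; i < m; ++i)
                norm += Q[j * m + i] * Q[j * m + i];
            norm = sqrt(norm);
            R[j * n + j] = norm;
            for (size_t i = 0; i < m; ++i)
                Q[j * m + i] /= norm;                  /* no guard for norm ~ 0 */
        }
    }
    ```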

    Who in the fuck even needs this shit to be plagiarized, anyway?

    It can’t plagiarize a production-quality implementation, because you can count those on the fingers of one hand; they’re complex as fuck, and you can’t just blend a few together to pretend you didn’t plagiarize.

    The answer is: people who are peddling the AI. They are the ones who ordered plagiarism with extra plagiarism on top. These are not coding tools, these are demos to convince investors to buy the actual product, which is the company’s stock. There’s a little bit of tool functionality (you can ask them to refactor the code), but it’s just you misusing a demo to try to get some value out of it.

    And to that end, the demos take every opportunity to plagiarize something, and to talk about how the “AI” wrote the code from scratch based on its supposed understanding of fairly advanced math.

    And in coding, it is counterproductive to plagiarize. Many open-source libraries can be used in commercial projects. You get upstream fixes for free. You don’t end up with bugs, or worse yet security exploits, that may already have been fixed upstream since the training cut-off date.

    No one in their right fucking mind would willingly want their product to contain copy-pasted snippets from stale open-source libraries, passed through some sort of variable-renaming copyright-laundering machine.

    Except of course the business idiots who are in charge of software at major companies, who don’t understand software. Who just failed upwards.

    They look at plagiarized lines and count them as improved productivity.




    If it were a basement dweller with a chatbot that could be mistaken for a criminal co-conspirator, he would’ve gotten arrested and had his computer seized as evidence, and then it would be a crapshoot whether he’d even be able to convince a jury that it was an accident. Especially if he was getting paid for his chatbot. Now, I’m not saying that this is right, just stating how it is for normal human beings.

    It may not be explicitly illegal for a computer to do something, but you are liable for what your shit does. You can’t just make a robot lawnmower and run over a neighbor’s kid. If you are using random numbers to steer your lawnmower… yeah.

    But because it’s OpenAI, with its 300 billion dollar “valuation”, absolutely nothing can happen whatsoever.