Yeah, that’s about it. I’ve thrown buggy code at it and told it to check it, and it says it’ll work just fine… scripts as well. You really can’t trust anything that thing outputs if it’s more than 1 or 2 lines long (hello world examples excluded, they work just fine in most cases).
Have you looked at the project that spins up multiple LLM “identities”? They are “told” the issue to solve, one is asked to generate code for it, and the others “critique” it. It generates new code based on the feedback and can then run it automatically; if it fails, it gets the error message so it can fix the issues. Only once it has generated code that works and is “accepted” by the other identities is it given back to you.
It sounds a bit silly, but it apparently works quite well. Critiquing code is apparently easier than generating it, and iterating on code based on critiques and runtime feedback is much easier than producing correct code in one go.
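For anyone who wants to see the shape of that loop, here is a rough sketch in Python. It is not the actual code of any particular project: `solve`, `run_code`, `generate` and the critics are all names made up for illustration, and the LLM calls are left as plain callables you would have to plug in yourself. The round cap is an assumption too, there so the thing can’t bounce back and forth forever.

```python
import os
import subprocess
import sys
import tempfile
from typing import Callable, List, Optional, Tuple

# Hypothetical interfaces: in a real setup each of these would wrap an LLM call.
Generator = Callable[[str, List[str]], str]   # (task, feedback) -> source code
Critic = Callable[[str, str], Optional[str]]  # (task, code) -> complaint, or None if accepted


def run_code(code: str, timeout: float = 10.0) -> Tuple[bool, str]:
    """Run the candidate in a separate Python process; return (succeeded, stderr)."""
    with tempfile.NamedTemporaryFile("w", suffix=".py", delete=False) as f:
        f.write(code)
        path = f.name
    try:
        proc = subprocess.run(
            [sys.executable, path],
            capture_output=True,
            text=True,
            timeout=timeout,
        )
        return proc.returncode == 0, proc.stderr
    except subprocess.TimeoutExpired:
        return False, "timed out"
    finally:
        os.unlink(path)


def solve(
    task: str,
    generate: Generator,
    critics: List[Critic],
    max_rounds: int = 5,
) -> Optional[str]:
    """Generate, critique, run, repeat. Returns accepted code, or None if we give up."""
    feedback: List[str] = []
    for _ in range(max_rounds):
        code = generate(task, feedback)

        # Every critic that returns a complaint sends us back to the generator.
        complaints = [c for c in (critic(task, code) for critic in critics) if c]
        if complaints:
            feedback = complaints
            continue

        # Critics are happy, so actually run it and feed any error back in.
        ok, err = run_code(code)
        if ok:
            return code
        feedback = [f"runtime error: {err}"]
    return None
```

To try it without any model, pass a fake `generate` that returns broken code on the first call and a fixed version once `feedback` is non-empty; `solve` should hand back the fixed version on the second round. Keep in mind that `run_code` executes whatever it is given, so anything generated by a model belongs in a sandbox.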
The software that implements this multi-agent setup is called ChatDev, and it’s significantly more capable than one agent working alone. The ability to critique and fix bugs in the code in an iterative process gives a massive step up to the AI’s ability to program.
Granted, it might still get stuck in a loop between the programming and testing departments, but it’s a solid step in the right direction.
I was thinking of AutoGPT, but it’s nice to see there are multiple projects taking a crack at this approach.
Hm… that sounds interesting… a link to this AI?
Here ya go: https://github.com/Significant-Gravitas/AutoGPT
Thanks 👍, on my watch list.
There is a (non-meme) reason why Prompt Engineer is a real title these days. It takes a measure of skill to get the model to focus on and attempt to answer the right question. This becomes even more apparent if you try to generate a product description: a newb will get something filled with superlative lies, while a pro will get something better than most human writers in the field can muster, for a much lower cost per text (lower than professional writers, that is; it’s often on par with or more expensive than content farms). AI is a great tool, but it’s neither the only tool (don’t hammer in screws) nor is it perfect. The best approach is to let the AI do the easy boilerplate 80%, then add the human touch to the hard 20%, where at most you have the AI prepare the structure / stubs.
To be honest, I just gave up on it regarding code. Now I use it mostly for getting info into one place when I know it’s scattered all over the web.