

Not really. Here’s the chain-of-word-vomit that led to the answers:
Note that in its “it’s impossible” answer it correctly echoes that you can take one other item with you, and it does not bring the duck back (while the old, overfitted GPT-4 obsessively brought items back). In the duck + 3 vegetables variant, the correct answer does appear somewhere in the word vomit, but, not being an AI enthusiast, it can’t actually pick that answer out (a problem it shares with the monkeys on typewriters).
I’d say it clearly isn’t ignoring the prompt or the differences from the original river-crossing puzzles. It just can’t actually reason, and the problem requires a modicum of reasoning, much as unloading groceries from a car does.
Yeah, it really is fascinating. It follows some sort of recipe to try to solve the problem, as if it were trained to work a bit like a computer algebra system.
I think they employed a lot of people to write generators for variants of common logic puzzles (e.g. river crossings with varying boat capacities and constraints), generating both the puzzle and the corresponding step-by-step solution, complete with “reasoning” and a reprint of the state of all the items after every step, and all that.
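Purely my guess at what such a generator might look like (I obviously have no insight into their actual data pipeline, and every name here is made up): it stamps out the puzzle text plus a matching step-by-step trace that reprints the state after each move, which is exactly the kind of template the model seems to parrot.

```python
import random

ITEMS = ["wolf", "goat", "cabbage", "duck", "corn", "chicken"]

def make_variant(rng):
    """Generate one river-crossing variant and its paired step-by-step text."""
    items = rng.sample(ITEMS, rng.randint(3, 4))
    capacity = rng.randint(1, 2)
    puzzle = (f"A farmer must ferry {', '.join(items)} across a river. "
              f"The boat holds the farmer and at most {capacity} item(s).")
    # Trivial shuttle "solution" with no eating constraints, just to show the
    # shape of the paired trace that re-prints the state after every step.
    left, steps = list(items), []
    while left:
        cargo, left = left[:capacity], left[capacity:]
        steps.append(f"Take {', '.join(cargo)} across. "
                     f"Left bank now holds: {', '.join(left) or 'nothing'}.")
        if left:
            steps.append(f"Row back alone. Left bank still holds: {', '.join(left)}.")
    return puzzle, "\n".join(steps)

puzzle, solution = make_variant(random.Random(0))
print(puzzle)
print(solution)
```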
It seems to me that their thinking is that successive parroting can amount to reasoning, if the parroting is good enough. I don’t think it can. They have this one-path approach, where the model just executes steps and re-prints the state, always trying the same thing.
What they need for this problem is a different kind of step, reduction: the duck cannot be left unsupervised -> the duck must be taken with me on every trip -> rewrite the problem without the duck and with the boat capacity reduced by 1 -> solve that -> rewrite the solution with “take the duck with you” on every trip.
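To make that reduction concrete, here’s a toy sketch of the idea in code. This is just my own illustration, not anything the model or its training does; the names and the constraint-free puzzle are made up for the example.

```python
from collections import deque
from itertools import combinations

def solve(items, capacity):
    """Brute-force BFS: a state is (items still on the left bank, boat side)."""
    start, goal = (frozenset(items), 0), (frozenset(), 1)   # side 0 = left bank
    seen, queue = {start}, deque([(start, [])])
    while queue:
        (left, side), path = queue.popleft()
        if (left, side) == goal:
            return path
        here = left if side == 0 else frozenset(items) - left
        for n in range(capacity + 1):            # farmer rows alone or with cargo
            for cargo in combinations(here, n):
                new_left = left - set(cargo) if side == 0 else left | set(cargo)
                state = (new_left, 1 - side)
                if state not in seen:
                    seen.add(state)
                    queue.append((state, path + [set(cargo)]))
    return None

def solve_with_chaperoned_item(items, capacity, chaperoned):
    # Reduction: the chaperoned item must ride on every trip, so drop it from
    # the puzzle and give up one seat in the boat...
    plan = solve([x for x in items if x != chaperoned], capacity - 1)
    # ...then rewrite the solution with "take the duck with you" on every trip.
    return None if plan is None else [cargo | {chaperoned} for cargo in plan]

print(solve_with_chaperoned_item(
    ["duck", "potato", "carrot", "cabbage"], capacity=2, chaperoned="duck"))
```

(A real solver would also have to carry the eating/unsupervised constraints through the reduction; this only shows the rewrite-the-problem, solve, rewrite-the-solution mechanics.)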
But if they add this, then there are two possible kinds of step it can take at every point, and this thing is far too slow to brute-force its way to the right one. They may get it to solve my duck variant, but at the expense of making it fail a lot of other variants.
The other problem is that even the most elementary-seeming reasoning involves a great many applications of basic axioms, so the search over which step to take next blows up combinatorially. This is what doomed symbol-manipulation “AI” in the past, and it is what is dooming it now.