The manifold hypothesis and text
ChatGPT (and most GPT-like services) makes stuff up all the time – that is not surprising, and should not be that hard to fix. What is surprising is that language can apparently be abstracted onto some smooth low-dimensional manifold, and that enough data reveals the right one. In a sense, quantity has a quality all of its own: larger models do so well because they merely (!?) need to interpolate between training examples.
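A toy sketch of that intuition (my own construction, not from any paper): take a 1-D curve embedded in 3-D as a stand-in for the "language manifold", and check that linear interpolation between neighbouring training samples stays close to the manifold once sampling is dense enough.

```python
import numpy as np

def helix(t):
    """Map a 1-D parameter onto a curve embedded in 3-D space.
    The curve plays the role of a low-dimensional data manifold."""
    return np.stack([np.cos(t), np.sin(t), 0.1 * t], axis=-1)

def dist_to_manifold(p, n_grid=50_000):
    """Approximate distance from point p to the curve via a dense grid search."""
    grid = helix(np.linspace(0, 2 * np.pi, n_grid))
    return np.min(np.linalg.norm(grid - p, axis=1))

def max_interp_error(samples):
    """Interpolate midway between consecutive training samples and
    measure how far those interpolants stray from the manifold."""
    mids = 0.5 * (samples[:-1] + samples[1:])
    return max(dist_to_manifold(m) for m in mids)

# "Training data": samples along the manifold, sparse vs. abundant.
sparse = helix(np.linspace(0, 2 * np.pi, 8))
dense = helix(np.linspace(0, 2 * np.pi, 200))

# With tons of data, naive interpolation barely leaves the manifold.
print(max_interp_error(sparse))
print(max_interp_error(dense))
```

With 8 samples the midpoints visibly cut inside the curve; with 200 the interpolation error all but vanishes, which is the "quantity becomes quality" effect in miniature.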
I wonder if RL will meet the same fate – there are some hints here and there: https://openreview.net/pdf?id=4-k7kUavAj, but no conclusive answer yet. If this works out, it will basically mean that tons of data and big machines are all you need, a somewhat anti-climactic end to the AI saga.