The best argument that bigger models matter is that the biggest models of all are proprietary, and there may be still-larger models about which nothing has been disclosed at all. We also don't know whether so-called "smaller" models are compact because they were optimized from larger models that have not been disclosed. Model size is clearly not the only thing that matters (BLOOM is fairly awful, for example), but until we get our hands on a 1-trillion-plus-parameter model to see the difference it makes, speculating in the other direction sounds like a poor path to achieving AGI. Those racing toward AGI may also mislead others about their approach: if they spent a billion learning what works and what doesn't, I'd be very surprised if they shared that hard-earned information.