But a few websites set their updated date to the current date, which was annoying. Maybe to rank better in Google? And some people (including me) only mention the update time in the page text content.
I've used GPT to parse human-formatted dates in another project too. It's quite reliable if you validate the output timestamp, and relatively cheap if you only pass in the first part of the page text.
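Roughly what that validation step could look like, as a minimal sketch rather than what any particular project does. `ask_llm` is a hypothetical stand-in for whatever GPT client call you use (prompt string in, reply text out). `MAX_CHARS`, `MIN_YEAR`, the prompt wording and the "reject today's date" rule are all assumptions for illustration:

```python
from datetime import datetime, timezone
from typing import Callable, Optional

# Assumptions, not from any real project: the update date usually sits near
# the top of the page, and nothing before MIN_YEAR is a plausible "last updated".
MAX_CHARS = 2000
MIN_YEAR = 1995

PROMPT = (
    "Reply with only the date this page was last updated, as an ISO 8601 "
    "date (YYYY-MM-DD), or NONE if the page does not say.\n\n"
)


def validate_timestamp(
    reply: str, now: datetime, reject_today: bool = True
) -> Optional[datetime]:
    """Return the parsed UTC date, or None if the model's reply looks wrong."""
    try:
        parsed = datetime.strptime(reply.strip(), "%Y-%m-%d").replace(
            tzinfo=timezone.utc
        )
    except ValueError:
        return None  # "NONE", extra chatter, or any other non-date reply
    if parsed.year < MIN_YEAR or parsed > now:
        return None  # epoch-zero placeholders (1969/1970) or future dates
    if reject_today and parsed.date() == now.date():
        return None  # some sites always stamp the current date
    return parsed


def extract_updated_date(
    page_text: str,
    ask_llm: Callable[[str], str],  # hypothetical LLM call
    now: Optional[datetime] = None,
) -> Optional[datetime]:
    now = now or datetime.now(timezone.utc)
    # Only send the start of the page to keep token costs down.
    reply = ask_llm(PROMPT + page_text[:MAX_CHARS])
    return validate_timestamp(reply, now)
```

Usage with a fake model: `extract_updated_date("Updated Feb 1st 2024 ...", lambda p: "2024-02-01")` returns a 2024-02-01 UTC datetime, while a reply like `1969-12-31` or `NONE` yields `None`. Whether to reject today's date is a judgment call, since it also drops pages that genuinely were updated today.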
I can see how it's a tricky problem. I wish HTML had more structure here (and that people followed it, a whole other problem...). FWIW, my site's now page has a "last updated" date, but it comes up as 1969 in aboutideasnow.