
You can check against the API with just the first five characters of your hashed password (SHA-1 or NTLM), for example: https://api.pwnedpasswords.com/range/21BD1, or you can download the entire dataset.
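
To make the prefix idea concrete, here is a minimal Python sketch of the SHA-1 case. It assumes the prefix is the first five hex characters of the uppercase digest, as in the example URL above. The names `split_sha1` and `fetch_range` and the User-Agent string are made up for illustration, and nothing touches the network unless you call `fetch_range` yourself.

```python
import hashlib
import urllib.request

API_ROOT = "https://api.pwnedpasswords.com/range/"


def split_sha1(password: str) -> tuple[str, str]:
    """Return (prefix, suffix): the first 5 hex chars and the remaining 35."""
    digest = hashlib.sha1(password.encode("utf-8")).hexdigest().upper()
    return digest[:5], digest[5:]


def fetch_range(prefix: str) -> str:
    """Download the range file for one prefix (assumption: a plain GET is enough)."""
    request = urllib.request.Request(
        API_ROOT + prefix,
        headers={"User-Agent": "range-check-sketch"},  # placeholder value
    )
    with urllib.request.urlopen(request, timeout=30) as response:
        return response.read().decode("utf-8")


# Only the prefix is ever sent; the suffix is compared locally.
prefix, suffix = split_sha1("password")
print(API_ROOT + prefix)
```

Only `prefix` ends up in the URL, so the server never sees the full hash.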


How can you download the entire dataset?


You can download the entire dataset using curl (will be 40+ GB)

    curl -s --retry 10 --retry-all-errors --remote-name-all --parallel --parallel-max 150 "https://api.pwnedpasswords.com/range/{0,1,2,3,4,5,6,7,8,9,A,B,C,D,E,F}{0,1,2,3,4,5,6,7,8,9,A,B,C,D,E,F}{0,1,2,3,4,5,6,7,8,9,A,B,C,D,E,F}{0,1,2,3,4,5,6,7,8,9,A,B,C,D,E,F}{0,1,2,3,4,5,6,7,8,9,A,B,C,D,E,F}"
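
For anyone wondering what those five brace groups expand to: every five-character hex prefix, i.e. 16^5 separate range files. A small Python sketch of the same enumeration (the name `all_prefixes` is just illustrative, and this only generates the prefixes, it does not download anything):

```python
from itertools import product

HEX_DIGITS = "0123456789ABCDEF"


def all_prefixes():
    """Yield every 5-character uppercase hex prefix, from 00000 to FFFFF."""
    for digits in product(HEX_DIGITS, repeat=5):
        yield "".join(digits)


prefixes = list(all_prefixes())
print(len(prefixes), prefixes[0], prefixes[-1])
```

That is a little over a million requests, which is why the curl command runs them in parallel and retries on errors.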


It's not that I couldn't have written that oneliner, it's that I assumed you'd get blocked very quickly.


It is officially recommended by Troy Hunt: https://github.com/HaveIBeenPwned/PwnedPasswordsDownloader/i...


That speaks to a certain confidence in one's servers' ability to hold up under load, doesn't it?

"Oh you want your own copy? Sure, just thrash seven shades of shit out of the database. Here's how."


It's not a database, it's just files. And they are hosted by Cloudflare so they can cope with a lot of downloads.

I think he should make the files smaller by removing the second half of each hash, i.e. reduce it from 40 hex digits to 20. This increases the chance of a false positive (i.e. I enter my password, it says it was compromised but it wasn't, it just has the same truncated hash as one that was) from about 1 in 10^48 to about 1 in 10^24 per stored hash, but that's still a huge number. (There are fewer than 10^10 people in the world, and they only have a few passwords each.) This would approximately halve the download, maybe more, because the first half of each hash is more compressible when sorted, while the second half is totally random.
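
A quick sanity check of those orders of magnitude, as a rough Python sketch. It treats hex digits as uniformly random, and `hash_space` is just an illustrative helper name:

```python
import math


def hash_space(hex_digits: int) -> int:
    """Number of distinct values a hash of this many hex digits can take."""
    return 16 ** hex_digits


full = hash_space(40)       # full SHA-1: 160 bits
truncated = hash_space(20)  # proposed truncation: 80 bits

# Chance that one given password matches one given stored hash by accident
# is 1 in hash_space(n); print the exponents to compare with 10^48 and 10^24.
print(round(math.log10(full), 2), round(math.log10(truncated), 2))
```

So the figures are roughly 1 in 10^48.2 and 1 in 10^24.1. With many hashes stored, the odds for a single lookup shrink in proportion to the number of entries, which still leaves a very large margin.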


> It's not a database, it's just files. And they are hosted by Cloudflare so they can cope with a lot of downloads.

Database: a usually large collection of data organized especially for rapid search and retrieval (as by a computer) [1]

It is a database. Stop nitpicking.

[1] https://www.merriam-webster.com/dictionary/database


Confidence in Cloudflare, for sure.


That's crazy, thank you.


You are being purposefully obtuse here. HIBP is a very, very well established site with a long history of operating in good faith.


> > It's not that I couldn't have written that oneliner, it's that I assumed you'd get blocked very quickly.

> junon https://news.ycombinator.com/user?id=junon

> You are being purposefully obtuse here. HIBP is a very, very well established site with a long history of operating in good faith.

Allowing people to query the API is one thing; someone downloading the entire dataset is normally considered abuse, so being blocked is the expectation here. You're so dense you're bending light around you.


Several open source tools can be found on GitHub, but here’s the “official” one https://github.com/HaveIBeenPwned/PwnedPasswordsDownloader


On the second line I already notice:

> 000F6468C6E4D09C0C239A4C2769501B3DD:5894

... Does the 5894 mean what I think it does?


I remember when I was searching the file for some passwords my friends and family use, it took me a while to work out that number too. There are some passwords that many people seem to independently come up with and think must be reasonably secure. I suppose they are to the most basic of attacks.


5894 means that the password appeared 5894 times in the dataset.

5894 is not the password associated with the hash.
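
In code terms, each line is a hash suffix, a colon, and a count. A minimal Python sketch of reading that format, assuming every line looks like the one quoted above (the function names are made up, and the second sample line is invented filler, not real data):

```python
def parse_range_line(line: str) -> tuple[str, int]:
    """Split 'SUFFIX:COUNT' into the hash suffix and its occurrence count."""
    suffix, _, count = line.strip().partition(":")
    return suffix, int(count)


def lookup_count(body: str, suffix: str) -> int:
    """Return how often a suffix appears in a range file, or 0 if absent."""
    wanted = suffix.upper()
    for line in body.splitlines():
        if not line.strip():
            continue  # skip blank lines
        found, count = parse_range_line(line)
        if found == wanted:
            return count
    return 0


sample = (
    "000F6468C6E4D09C0C239A4C2769501B3DD:5894\n"
    "00000000000000000000000000000000000:1\n"  # invented line for the demo
)
print(lookup_count(sample, "000F6468C6E4D09C0C239A4C2769501B3DD"))
```

A result of 0 only means the hash is not in that file; it says nothing about how strong the password is.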


Yes, it did mean what I thought, then.

But I guess some passwords appear far more often than that in the dataset.


Some passwords are far more commonly used than others; that isn't surprising.



