Awesome. I'd love some hints or links, as I'm always looking to refactor.


In general, if you're going the mechanize route, .retrieve() is the function you're looking for.

e.g.

  import mechanize
  br = mechanize.Browser()
  # retrieve() returns a (filename, headers) tuple; [0] is the local file path
  br.retrieve("https://www.google.com/images/srpr/logo3w.png", "google_logo.png")[0]

Mechanize doesn't really have proper documentation, but just about everything you'd need can be figured out from the very lengthy examples page on their site.
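
If you end up grabbing more than one file, here is a rough sketch of wrapping .retrieve() in a loop. The helper names (filename_from_url, download_all), the "download.bin" fallback, and the dest_dir parameter are made up for illustration and are not part of mechanize; the only mechanize call assumed is retrieve(url, filename) returning a (filename, headers) tuple.

```python
import os
from urllib.parse import urlsplit


def filename_from_url(url, default="download.bin"):
    """Derive a local filename from the last path segment of a URL.

    Hypothetical helper; falls back to `default` when the path ends in "/".
    """
    name = os.path.basename(urlsplit(url).path)
    return name or default


def download_all(browser, urls, dest_dir="."):
    """Save each URL via browser.retrieve() and return the local paths.

    `browser` is assumed to be a mechanize.Browser (or anything exposing a
    compatible retrieve(url, filename) method).
    """
    saved = []
    for url in urls:
        target = os.path.join(dest_dir, filename_from_url(url))
        # retrieve() hands back (filename, headers); only the path is kept
        path, _headers = browser.retrieve(url, target)
        saved.append(path)
    return saved


# Usage (needs the third-party mechanize package and network access):
#   import mechanize
#   br = mechanize.Browser()
#   download_all(br, ["https://www.google.com/images/srpr/logo3w.png"])
```

Passing the browser in as an argument keeps any cookies or login state you have already set up on it, which is usually the reason to use mechanize over plain urllib in the first place.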


Playing with it now, and while it seems to cover my download needs, I can't get it to play nice with sites that are JavaScript-dependent. Am I missing something, or is there a way to plug in an underlying WebKit engine?


PhantomJS is capable of downloading binary content from JavaScript-dependent sites, but it is a journey to get it working, as it is not an out-of-the-box feature. Instead, use CasperJS to drive Phantom and get a ton of snazzy features, including simple binary downloads. Happy scraping!



