
So in essence they do a lot of pre-processing on their data. I wonder how long the pre-processing takes compared to the speed gains it produces for them.


Doesn't take that long, actually: it takes about 1 hour per day of data, and every day we process about 10TB of uncompressed log files. The result can be stored and reused as many times as you need.


Interesting, so you're processing ~500GB (10TB / 24 hours) of uncompressed log lines in 1 hour? Is the setup the same as in the presentation?
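For concreteness, here's the back-of-the-envelope arithmetic behind the two possible readings of "about 1 hour per day of data" (a quick illustrative sketch; the 10TB/day figure comes from the comment above, everything else is just unit conversion):

    # Illustrative arithmetic only; figures are from the thread above.
    daily_volume_bytes = 10e12        # ~10 TB of uncompressed logs per day

    # Reading 1: a full day's data (10 TB) is preprocessed in one hour.
    full_day_gb_per_s = daily_volume_bytes / 3600 / 1e9
    print(f"whole day in an hour: {full_day_gb_per_s:.1f} GB/s")  # ~2.8 GB/s sustained

    # Reading 2 (the question above): one hour's worth of logs,
    # 10 TB / 24, is preprocessed in one hour.
    hourly_slice_gb = daily_volume_bytes / 24 / 1e9
    print(f"one hour of logs: ~{hourly_slice_gb:.0f} GB")         # ~417 GB, i.e. the ~500 GB figure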


How does this compare to something like Spark/Shark?



