Cleo: the open source technology behind LinkedIn's typeahead search (linkedin.com)
114 points by jamesjyu on March 2, 2012 | 12 comments


Nice! This looks to fill a role that's quite challenging at scale using MySQL, Solr, Redis, etc. I've forced Sphinx to fill this role, but the limitations of real-time updates (no infix/suffix support) make deployment awkward.

My two immediate questions are ballpark memory usage and whether Cleo supports any form of persistence. I can't find mention of either in the docs, and only spammers have found the Google Group so far. Time to fire up a test instance.

Update: figured out persistence. The tutorials give a good idea of what's going on: http://sna-projects.com/cleo/tutorial_MyFriendsTypeahead.php


This is very cool! I have to look into it more closely, but it does not seem to be error tolerant. Here is a related approach that makes this kind of typeahead search error tolerant:

Live demo: http://ipubmed.ics.uci.edu/

Paper: "Efficient interactive fuzzy keyword search" by Shengyue Ji, Guoliang Li, Chen Li, Jianhua Feng (UC Irvine & Tsinghua University)

scholar link: http://scholar.google.com/scholar?q=Efficient+interactive+fu...
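
To make "error tolerant" concrete, here is a minimal brute-force sketch. It is not the algorithm from the paper and not anything Cleo provides: it just scores each candidate word by the smallest edit distance between the typed query and any prefix of that word. The function names and the `max_errors` parameter are made up for illustration.

```python
def prefix_edit_distance(query, word):
    """Smallest edit distance between query and any prefix of word."""
    # Row 0: turning the empty query into word[:j] costs j insertions.
    prev = list(range(len(word) + 1))
    for i, qc in enumerate(query, 1):
        cur = [i]  # turning query[:i] into the empty prefix costs i deletions
        for j, wc in enumerate(word, 1):
            cur.append(min(
                prev[j] + 1,                # drop qc from the query
                cur[j - 1] + 1,             # insert wc into the query
                prev[j - 1] + (qc != wc),   # match or substitute
            ))
        prev = cur
    # The last row holds the distance from the full query to every prefix.
    return min(prev)


def fuzzy_complete(query, words, max_errors=1):
    """Return words that have a prefix within max_errors edits of query."""
    q = query.lower()
    return [w for w in words if prefix_edit_distance(q, w.lower()) <= max_errors]
```

With this, `fuzzy_complete("cancr", ["cancer", "candle", "tumor"])` keeps only "cancer". Scanning every word on every keystroke will not scale; it is only meant to show what is being computed.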


This is a timely release - one of the question titles for the upcoming Codesprint Quora challenge[1] is "Quora Typeahead Search."

[1] http://www.quora.com/blog/Codesprint-Quora-1


For someone who is well versed in Solr usage but not in typeahead implementations: why would I use Cleo rather than something built on top of Solr?

Have to get something similar up and running for a client.


You could run a full search engine with nothing but Cleo. If you're indexing entities and the choices are narrow enough after a relatively small number of keystrokes, you never need to hit "search" and go to a Lucene index.
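
As a rough illustration of how the candidate set narrows with each keystroke, here is a toy in-memory prefix index. This is a sketch only: the `PrefixIndex` class and the sample names are hypothetical and have nothing to do with Cleo's actual API or data structures.

```python
import bisect


class PrefixIndex:
    """Toy typeahead index: every word of an entity name is a searchable term."""

    def __init__(self):
        self._terms = []   # sorted list of (term, entity_id) pairs
        self._names = {}   # entity_id -> display name

    def add(self, entity_id, name):
        self._names[entity_id] = name
        for term in name.lower().split():
            bisect.insort(self._terms, (term, entity_id))

    def search(self, query, limit=5):
        prefixes = query.lower().split()
        if not prefixes:
            return []
        result = None
        for p in prefixes:
            # (p,) sorts before every (p..., id) pair, so this finds the
            # first term that could start with p.
            lo = bisect.bisect_left(self._terms, (p,))
            ids = set()
            for term, eid in self._terms[lo:]:
                if not term.startswith(p):
                    break
                ids.add(eid)
            # Every typed prefix must match some word of the entity.
            result = ids if result is None else result & ids
        return [self._names[e] for e in sorted(result)][:limit]


index = PrefixIndex()
index.add(1, "Ann Lee")
index.add(2, "Anna Lopez")
index.add(3, "Andy Park")
```

Here `index.search("an")` matches all three names, `index.search("ann")` drops "Andy Park", and `index.search("ann lo")` leaves only "Anna Lopez".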


The question is why. Look, Solr has enough traction and mindshare that it is easy to find consultants, employees, resources, etc. I would be loath to move my main search index out of Lucene, simply because of all the boosting and tuning for my specific business case.

Essentially, my question is: is Cleo a replacement for Solr (with a focus on instantaneous indexing) or a killer application for typeahead? If it's the former, I'm not really interested. If the latter, how does it compare with Solr-based typeahead solutions?


I'm trying to get this assessed right now for my startup. We used Solr, but it didn't play nice with Ruby and MongoDB, so we had to hack up a solution to get it to do what we wanted. Trying to figure out if Cleo will do it out of the box and reduce our server load.


Really interested in your use case. Were the problems due to Ruby (RSolr? Sunspot?) or due to MongoDB? If it's the latter, I'm curious whether it had something to do with keeping MongoDB and Solr in sync.

Essentially, did you face a problem with the real-time aspect of Solr?


See other comment. Sunspot issue.


I'm not the biggest fan in the world of Solr (I prefer ElasticSearch), but "play nice"?

Did you have problems shifting content from MongoDB to the index?


Well, it was actually field collapsing support in Sunspot, to be specific. :) Which it seems they fixed in Sunspot 2.0. We should look at that.


Ah good to know, thanks for elaborating!



