In my journey to learn Rust I was looking for someone to critique my Rust skills, so I searched for a 'good first issue' on GitHub. By writing a PR in Rust I could fulfil two dreams of mine:
1. Learn Rust.
2. Have code that I wrote running on a large number of devices across the world.
Eventually I ended up on the MeiliSearch repo and fixed an interesting bug, and I must say the maintainers were super nice throughout the whole process. A couple of months after my contribution they sent handwritten letters and a bunch of stickers to all of the project's contributors, one of the nicest interactions I've ever had on the internet. (Ironically, the first PR I wrote that got accepted involved one line of CSS, a field I'm already proficient in.)
I have wanted to play around with Rust literally for years and just haven't, since I've been intimidated by how strict the compiler is (this after 20 years of programming). I just need to sit down and get on with it. It's going to be an incredible addition to the toolbox of skills, especially alongside Elixir, my usual go-to these days.
The compiler (or in reality the borrow checker) is strict, but once the program does compile, it is usually more defect-free (even functionally) than one written in other languages. After a few months of writing Rust, I recently had to go back and use JavaScript for another project (I know, drastically different worlds) and my own code gave me micro panic attacks thinking about all the ways things can go wrong in it (I guess I'll probably end up using TypeScript instead in future).
Overall, well worth the time spent learning Rust, as I feel it makes me a better programmer by enforcing thinking about lifetimes, return values, shared data and thread safety.
The compiler isn't so much strict in a pedantic sense. It more nudges you to avoid bugs, in a "you may want to rethink this" way. It's actually quite nice to have someone point out potential bugs in your code.
I think people make the Rust compiler out to be much more scary than it is. Sure, it's much more strict than say Python, but compared to strongly typed languages like C++ or C# it's not that different in 99% of your code. People just talk about that other 1% a lot.
Their team is fairly responsive to bugs, but I had one negative experience when trying to help them fix their instantsearch lib. They were grabbing as many pages as you had set for max pages at once and would re-query it on pagination - a huge waste of data transfer. They refused to see the problem, so I just made a private fork to get it working, but as far as I know that's still a bug.
I need to upgrade the engine itself, but it looks like they added the ability to upgrade without losing all the data. That was frustrating but understandable.
Hi dawnerd! Sorry to hear that, do you have the issue link so that we can take a look at it?
Based on the information I can read here, I think it comes from the fact that the engine is not able to give an exhaustive, finite count of records matching the query, for response-time reasons. A finite pagination style (with a number of pages) on the client side is for now a pure workaround.
From what I understand, some of our users try to use MeiliSearch as a primary datastore or expect a classic finite pagination coming from a SQL database env, when we are here to solve search relevancy problems.
Ideally the search results should be relevant enough that end-users don't have to click on another page selector button; that's why we advocate integrating pagination without page-number selection: infinite-scroll style or prev/next.
Yeah, having a cap on the number of results is fine. The problem is when it queries for every item at once. I've tested on large datasets and my patched version of instantsearch has no performance problems over 100 pages with 30 items per page. Every time you clicked next page it would request maxPages * perPage results but start from index 0.
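For illustration, here is a minimal sketch (TypeScript, Node 18+ `fetch`) of per-page querying against MeiliSearch's search endpoint, fetching only the page the user asked for rather than maxPages * perPage results from index 0. The `products` index name and the default local host/port are assumptions for the example.

```ts
// Minimal sketch: fetch only the requested page, not every page at once.
// "products" and the local host/port are illustrative assumptions.
const HOST = "http://localhost:7700";

async function searchPage(query: string, page: number, perPage: number) {
  const res = await fetch(`${HOST}/indexes/products/search`, {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({
      q: query,
      offset: page * perPage, // start of the requested page only
      limit: perPage,         // one page worth of hits, not every page at once
    }),
  });
  return (await res.json()).hits;
}

// e.g. page 3 at 30 per page transfers 30 hits, not maxPages * 30.
searchPage("keyboard", 3, 30).then((hits) => console.log(hits.length));
```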
As with a previous post about MeiliSearch, after reading the extensive comparison in [0], I'm sorry but I'm not convinced by it yet, as it is extremely limited and immature.
The only argument being made here is that it is 'written in Rust'.
Just use something production-ready like Typesense. [0]
Hi, rvz! (Product team member of MeiliSearch here). This comparative table is not accurate and contains wrong pieces of information about MeiliSearch.
However, today TypeSense indeed has more features than MeiliSearch. After a long time of refactoring the engine's source code, we now have a solid base to welcome new features/improvements, and we hope to evolve quickly to solve many more search use-cases.
For Q3, we plan to add two new features: sort by and geo-search. The geo-search will come out as a first iteration, allowing you to sort documents around a geographical point and filter documents within a circle. We will also further improve the indexing speed (yes, again, because we can do better) and provide two new formats for data indexing (CSV and NDJSON).
For Q4, we plan to add high availability and solve the multi-tenancy use-case.
That is just a preview of the upcoming features we are already working on.
The end of the year will be rich in evolution for MeiliSearch. We are looking forward to seeing you enjoy using MeiliSearch one day!
Hey Meili team, I work on Typesense and I’m the one who put that comparison matrix together. My intention was to provide as much factual information as possible based on my reading of each search engine’s documentation. This is one reason I stuck to a feature by feature comparison rather than an opinion based comparison for this particular page.
So if you see anything that’s wrong in the matrix, I apologize. Please do let me know which items are wrong and I would love to correct them.
Hi jabo! Thank you very much. That is very kind of you. It's always difficult to put together this kind of matrix when you can't necessarily know all the technical details and features at the level of detail you wanted to convey. I'll get back to you with the list of points we think were misunderstood in a couple of days! Which medium do you prefer?
Your plans seem spot on for our needs, especially regarding geo search! HA and multi tenancy are a must-have for our use case, though. Would it be possible to have a single large highly available index with tenants of wildly different sizes?
I will surely keep a close eye on your developments. Thank you!
"Would it be possible to have a single large highly available index with tenants of wildly different sizes?"
Yes, we can imagine API keys allowing each consumer to access a certain subset of documents in an index. When querying the index, the internal filters that select the documents accessible to this API key are automatically inferred and added to the consumer's initial query. In short, it's a bit like a WHERE clause that is fixed on each query.
I'm not sure if that answers your question, so don't hesitate to reach out!
I'm available to discuss your use cases on our community Slack or by email :)
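To make the idea concrete, here is a rough sketch of what that could look like on the application side today (TypeScript, Node 18+ `fetch`): a server-side wrapper pins a per-tenant filter onto every query, like a WHERE clause fixed per consumer. This is not a shipped MeiliSearch feature as described above; the `records` index and `tenant_id` field are hypothetical.

```ts
// Rough sketch: the server, not the consumer, pins the tenant filter to each query.
// "records" and "tenant_id" are hypothetical; tenant_id would need to be filterable.
const HOST = "http://localhost:7700";

async function tenantSearch(tenantId: string, q: string) {
  const res = await fetch(`${HOST}/indexes/records/search`, {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({
      q,
      // The tenant restriction is added server-side, never by the consumer.
      filter: `tenant_id = ${JSON.stringify(tenantId)}`,
    }),
  });
  return (await res.json()).hits;
}
```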
If you're in the market for lightweight but fast search engines, I would recommend you take a look at Typesense [1] instead, or even Sonic [2] if it fits your use case. MeiliSearch does not give you anything on top of them (i.e. it is neither as feature-complete as [1], nor as fast as [2]).
And I personally stopped using them after a really bad experience I had with their "developers". They don't really care about you and it shows; they were also kind of rude when I reported some bugs to them.
I moved to Typesense and it's a whole different world: its creators truly enjoy that you're using their product. Same thing with Sonic; Valerian is the kind of hacker you'd want as a friend, super talented, super easy-going; you could ask a completely dumb question on their GitHub and he takes the time to explain things to you at length. I know it's open source, I know I didn't pay a dime, but for me, that kind of attitude makes or breaks it. Plus, you actually get a superior product.
Thank you for your kind words. Made my day. Typesense is 100% bootstrapped, and a labor of love[0]. We will certainly do our best to keep making it better.
One of my favorite parts of working on Typesense is the opportunity to interact with so many developers from around the world, getting to know about the product and domain they are working on, their tech stacks and how Typesense fits into their world. I find these interactions helpful in enriching my own world view, and they help me build valuable context as we design new features. I've sometimes been blown away by how the foundational construct of a fast and distributed search engine is being used for use cases I could not have even imagined!
We mainly haven’t invested time in this because we’ve heard that some folks were able to get the Linux binary working on Windows using WSL. Does that work?
Hello, as CEO of MeiliSearch, on behalf of the whole team I'm really sorry if we did not satisfy you when solving one of your bugs. I don't know which bug exactly you are referring to, but in any case, we try to answer our users and contributors with the maximum of transparency and love.
Moreover, on some features we are admittedly a little bit late, a delay that we will more than make up for before the end of the year. Our priority until now has been to offer a robust search engine accessible to all. For us, the developer experience is really important, whether it is in the use of the API or in communication with the community.
We will continue to try to do our best for the community. If you want to help us to improve, I would be happy to take your feedback.
You chose to respond with kindness and humility here when you could've been really hostile and defensive. Really nice to see. You've made me like MeiliSearch even more!
All CEOs are contrite when they respond to being publicly shamed on Hacker News. It’s somewhat of a Hacker News trope at this point. It’s amusing, but it doesn’t mean anything.
I’ve commented on both Stripe and DigitalOcean’s terrible support, under different accounts, and had some XO give the “we’re so sorry, email me directly and I’ll personally look into it, our customers are really important to us” tripe.
I've never used or looked at Typesense (I've been perfectly happy as a Meilisearch user), but your characterisation of interacting with Meilisearch is so alien it makes me wonder if we've been looking at the same project.
Across the assortment of Meilisearch repositories, I've raised two PRs (one accepted, one rejected), five issues, one feature request and pinged one issue for an update.
Every single time the Meilisearch team has been responsive, communicative and generally a delight to interact with - there are very few projects I would consider better.
I can back this completely. I'm working on a search engine for government transparency records with an NGO and Typesense really solved most of our problems with MeiliSearch, and our experiments with Sonic have been pretty good too.
How well do any of these alternatives work for doing an automatic "More like this" list? I implemented this in Elasticsearch for a client (although it's been so long that I don't recall the specifics of how it works) and as much as I'd like to move away from Java stuff if possible, it'd be a non-starter if I can't replicate that in the new system.
Debug info size depends on the language and the compiler. Binary packages installed via package managers are also typically stripped. There are too many confounding variables; simply running `strip foo` before comparing evens the playing field.
It is great for putting a concept in place. For more advanced use (mainly index and search features) I was also evaluating Typesense, which didn't win me over as a product. I have not tried Algolia because of the perception that it is heavier and paid from the get-go.
Can you really use the number of lines of code in a comparison of MeiliSearch and ElasticSearch, or Sphinx-Search?
Admittedly, I'm not the biggest fan of ElasticSearch; it's way too complex to manage and interact with if you just need to add search to a product. However, ElasticSearch is also much more than just a search engine. I would never use Bleve or Sphinx as a primary data store, but ElasticSearch is a perfectly good document database.
I would think it’s just a rough indicator of complexity. Elasticsearch has a lot of features, probably many more than MeiliSearch. This can be good or bad, depending on what you’re looking for.
If comparing the same language, maybe. But given the different amounts of boilerplate when comparing C/C++ to Rust to Java, it doesn't work. Even then, some teams might prefer to use more dependencies and others less.
In my experience, it would be possible to use it as a document database, and I suppose it would be good in the interest of reducing duplication issues if you were initially storing the documents in a traditional DBMS or file system.
However, that's not really what it was made for. Especially early on when you're planning out your schema and such, dropping and re-indexing your documents is a really simple task. If the index itself is your primary document store, what are you indexing from? Would you have a DBMS or file system as your secondary store in that case? That just seems so awkward and backwards.
Keep the square pegs in the square holes and use Elastic (and the alternatives discussed in this thread) as a search index.
It's ridiculously easy to use and has faceted search for my needs. However, there are some limitations so I have to use it in combination with redis, but the developers have a roadmap to fix these problems.
Synchronising with MeiliSearch is a bit of an effort because of the following limitations:
* When filtering by facet, it doesn't provide counts for disjunctive facets
* No sort by
* No where clause (e.g. less than 50)
To overcome these problems, I rebuild some parts of the database in redis, use code for filtering and query MeiliSearch multiple times for different facet counts.
Both redis and MeiliSearch are ridiculously fast so the performance loss is negligible, but it makes my code quite complex. As soon as the developers add these missing features, I want to simplify my code and only use redis for query caching. Typesense had some of these limitations too, but I'm not sure if that's still the case.
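For anyone curious, a rough sketch of the multi-query workaround for disjunctive facet counts described above might look like this (TypeScript, Node 18+ `fetch`). Parameter names (`filter`, `facetsDistribution`) follow the v0.21-era API and were renamed in later versions; the `products` index and the facet attributes are hypothetical, and each attribute is assumed to be declared as filterable.

```ts
// Rough sketch: for each facet attribute, re-run the search with every *other* selected
// facet applied, so that attribute keeps its full (disjunctive) counts.
// "products", "color", "brand" are hypothetical; v0.21-era parameter names.
const HOST = "http://localhost:7700";

type Selections = Record<string, string[]>; // e.g. { color: ["red"], brand: [] }

async function disjunctiveFacetCounts(q: string, selections: Selections) {
  const attributes = Object.keys(selections);
  const counts: Record<string, Record<string, number>> = {};

  for (const attr of attributes) {
    // Filter on everything except the attribute whose counts we want.
    const otherFilters = attributes
      .filter((a) => a !== attr && selections[a].length > 0)
      .map((a) =>
        "(" + selections[a].map((v) => `${a} = ${JSON.stringify(v)}`).join(" OR ") + ")"
      );

    const res = await fetch(`${HOST}/indexes/products/search`, {
      method: "POST",
      headers: { "Content-Type": "application/json" },
      body: JSON.stringify({
        q,
        limit: 0, // we only need the counts, not the hits
        filter: otherFilters.length ? otherFilters.join(" AND ") : undefined,
        facetsDistribution: [attr],
      }),
    });
    counts[attr] = (await res.json()).facetsDistribution?.[attr] ?? {};
  }
  return counts;
}

// Usage: disjunctiveFacetCounts("shirt", { color: ["red"], brand: [] })
```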
Hi! MeiliSearch product team here! It's super cool to see your feedback!
Concerning the disjunctive counts of the facets, we are thinking about it. It is feasible on the client side by making several requests, but we are aware that it is not ideal at all from a developer-experience point of view. We are still thinking about the best way to solve that case in one of our future iterations!
The sort feature is coming in v0.22 (string and numeric fields); you will be able to easily configure the balance between exhaustivity and relevancy at the index level through the positioning of the ranking rules.
I'm not sure I understand the where clause point so I'd love to hear more details!
Thanks for using us and giving us this kind of feedback :)
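To sketch roughly how this ended up looking in released versions (v0.22+), with the caveat that the RC may have differed: the `products` index and `price` field are assumptions, `price` must be declared as a sortable attribute, and the position of the `"sort"` rule in the index's ranking rules controls the relevancy/exhaustivity balance mentioned above.

```ts
// Loose sketch of sorted search as it shipped in later versions (TypeScript, Node 18+ fetch).
// Assumes "price" is in sortableAttributes and the "sort" ranking rule is positioned
// in the index's ranking rules to taste. "products" is a hypothetical index.
const HOST = "http://localhost:7700";

async function cheapestFirst(q: string) {
  const res = await fetch(`${HOST}/indexes/products/search`, {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({
      q,
      sort: ["price:asc"], // numeric (or string) field, ascending
    }),
  });
  return (await res.json()).hits;
}
```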
Hello, MeiliSearch team here :)
Please, do not hesitate to leave an issue on the Golang repository so we can improve it!
Also, indexing time will be much better with v0.21, planned to be released in a couple of days/weeks. You can test the RC in the meantime.
I'm also using it for a plant species search on hedira.io, it's been great for the past 6 months or so, even for a more complex faceted search setup. I switched from Algolia (which was easy due to instantsearch integration) and have no regrets.
Yes. Small private project. It's quite fast. Its query interface is REST+JSON and now has an OpenAPIv3 spec; that said, some of the query syntax is embedded in strings, so there you are still on your own.
I found the default order of results a bit off. Near-matches were positioned over exact matches.
I'm looking for a fulltext typo-tolerant search tool that integrates well with Hasura+PG.
Also interested in the Hasura+PG options. Have you found anything interesting so far? At the moment I’m stringing together a few like clauses, which mostly does for my needs.
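In case it helps, a rough sketch of that stringing-together approach with Hasura's generated `_or`/`_ilike` operators; the `articles` table and its columns are hypothetical names for the example.

```ts
// Rough sketch of the Hasura + Postgres "like clauses" approach: a GraphQL query using
// Hasura's generated _or / _ilike boolean operators. "articles", "title" and "body" are
// hypothetical table/column names.
const SEARCH_ARTICLES = /* GraphQL */ `
  query SearchArticles($pattern: String!) {
    articles(
      where: { _or: [{ title: { _ilike: $pattern } }, { body: { _ilike: $pattern } }] }
      limit: 20
    ) {
      id
      title
    }
  }
`;

// e.g. variables: { pattern: "%standing desk%" } (simple substring matching, no typo tolerance).
```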
I've run into the same "strip tags" issue. Having used ES before, which does sanitizing and stripping for you, at first I was disappointed.
However, after thinking about it more, I wrote up this issue[0] with some ideas and thoughts so I could implement it as a PR or work around it.
I ended up working around it, because that makes the most sense in terms of separation of concerns: MeiliSearch should indeed not get involved in stripping or fixing HTML, as that i) ties Meili to HTML, ii) requires configuration and complexity to allow control, and iii) adds features that become security-critical.
Indeed, my solution is to sanitize, clean and strip HTML before sending documents into the index.
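For the record, a minimal sketch of that pre-indexing cleanup step (TypeScript, Node 18+ `fetch`); the crude regex stripper is only for illustration (a real pipeline should use a proper HTML sanitizer), and the `posts` index name is hypothetical.

```ts
// Minimal sketch of stripping/sanitizing HTML before indexing.
// The regex stripper is deliberately crude; use a proper HTML sanitizer in practice.
const HOST = "http://localhost:7700";

function stripHtml(html: string): string {
  return html
    .replace(/<[^>]*>/g, " ")  // drop tags
    .replace(/&nbsp;/g, " ")   // handle a couple of common entities
    .replace(/&amp;/g, "&")
    .replace(/\s+/g, " ")
    .trim();
}

async function indexPost(post: { id: number; title: string; bodyHtml: string }) {
  // Index clean text only; the original HTML stays in the primary data store.
  const document = { id: post.id, title: post.title, body: stripHtml(post.bodyHtml) };
  await fetch(`${HOST}/indexes/posts/documents`, {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify([document]),
  });
}
```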
I couldn't find it from a quick search... Do you know if this tool supports non-English languages (specifically Greek)? Also, any idea if it supports stemming for these? I.e. I would like to search for σκύλος (=dog) and get documents containing σκύλων (=dogs).
Not sure, but I think they are talking about the recent licensing changes of the Elasticsearch server [0] and restricting Elasticsearch clients to only be compatible with Elasticsearch, not any forks [1].
The workaround they recommend is to duplicate your index with all those characters removed and then strip out those characters from your search queries :/
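As I understand that workaround, it amounts to something like the sketch below, where the same normalization is applied both at indexing time and at query time; the character set here is purely hypothetical, since it depends on the issue in question.

```ts
// Sketch of the duplicate-index workaround as I understand it: normalize documents when
// writing them into a second index, and normalize queries the same way before searching.
const STRIP = /[«»“”]/g; // hypothetical set of problematic characters

function normalize(text: string): string {
  return text.replace(STRIP, "");
}

// Indexing: write normalize(doc.title), normalize(doc.body), ... into the duplicate index.
// Searching: query the duplicate index with normalize(userQuery).
```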
I haven't tried MeiliSearch, but I spent a little bit of time this morning looking at the code. Maybe off topic, but Rust really is a nice language to read. I wanted to learn another non-Lisp language, and after a few evenings of playing with Rust, I settled on Swift for a few small side projects. I slightly regret that decision, but both languages fill the same application space for me.
As we don't use RocksDB but LMDB, we use a lot less real memory than key-value stores that use a user-side cache system. LMDB is memory-mapped and therefore lets the OS manage memory for it. Typesense uses RocksDB, and ElasticSearch a custom key-value store used by Lucene internally.
The real advantage of LMDB is that it is a B-tree: key-value pairs are ordered and do not need any computation when retrieved, which is not the case for an LSM-tree key-value store like RocksDB, which needs to merge/compact pages of key-value pairs before being able to return them to you. That wastes CPU when the search engine needs its CPU to do unions/intersections…
Another advantage of LMDB is that it returns a view of the entries into the DB itself; RocksDB can't, as it must perform operations on the entries before returning them to the library user, for example decompressing or compacting the values.
LMDB and RocksDB are both great projects, but how much performance you can get out of them depends on the larger architecture of the system they are being integrated into. Both projects provide dozens of flags to adjust read/write amplification to customize them for a particular use case. You will often see public benchmarks of these systems being updated with suggestions from both sides on less-than-optimal configurations being used!
In the case of Typesense, RocksDB is not even a top-10 contributor to the overall latency involved in serving the result. In any case, it would be good to clarify a few things:
> As we don't use RocksDB but LMDB, we use a lot less real memory than key-value stores that use a user-side cache system.
Typesense stores only the raw-data in RocksDB. All indexing data structures for filtering, faceting etc. are compact in-memory data structures stored outside. The only fixed memory cost from RocksDB is an in-memory table that is used to buffer writes (see the next point) before being flushed to disk. In practice, this is a trivial percentage of memory used when compared to other data structures.
> LSM-tree key-value store like RocksDB, which needs to merge/compact pages of key-value pairs before being able to return them to you
This happens in-memory and is flushed to the disk in batches. Merging of on-disk SST files happens in the background with no real impact on reads. The advantage of this approach though is that it gets you really good batched write throughput [0] (the above caveat on the difficulty of benchmarking applies).
In summary, like all systems, choosing a storage system involves many trade-offs and what really matters is what works best for your architecture.
A previous version of MeiliSearch used RocksDB and we had a lot of trouble with it: a lot of setup to make sure we were not killed by the OS due to OOM, and even fixing a lot of strange segfaults by patching the RocksDB library itself...
RocksDB doesn't support transactions, only views into the database, which means that if you are indexing, writing into the database, and some event makes your program stop unexpectedly, you can't just restart your program and use the data as-is, since it could be corrupted.
This is why at MeiliSearch we prefer using LMDB: even in the case of an unexpected crash, a restart is instant and valid; you just need to restart the indexing you were previously doing, and you can serve user requests from the previous version of the database.
Also, as you can see in [0], the benchmarks between LMDB and RocksDB are very clear. I understand that reading the database is perhaps not what takes time on your side, but it is on ours, combined with the set operations between sets of internal document ids.
I see. Perhaps because we're using the native C++ client, we faced no such crash issues with RocksDB. We have also handled transactions at a higher layer.
MeiliSearch looks fantastic! I haven't tried it but at least it is written in Rust so that should be a good reason to try it out for a project of mine.