In my journey to learn Rust I was looking for someone to critique my Rust skills, so I searched for a 'good first issue' on GitHub. By writing a PR in Rust I could fulfil two dreams of mine:
1. Learn Rust.
2. Have code that I wrote running on a large number of devices across the world.
Eventually I ended up on the MeiliSearch repo and fixed an interesting bug, and I must say the maintainers were super nice throughout the whole process. A couple of months after my contribution they sent handwritten letters and a bunch of stickers to all of the project's contributors, one of the nicest interactions I've ever had on the internet. (Ironically, the first PR I wrote that got accepted involved one line of CSS, a field I'm already proficient in.)
I have wanted to play around with Rust literally for years and just haven't, since I've been intimidated by how strict the compiler is (this after 20 years of programming). I just need to sit down and get on with it. It's going to be an incredible addition to the toolbox of skills, especially alongside Elixir, my usual go-to these days.
The compiler (or in reality the borrow checker) is strict, but once the program does compile, it is usually more defect-free (even functionally) than one written in other languages. After a few months of writing Rust, I recently had to go back and use JavaScript for another project (I know, drastically different worlds) and my own code gave me micro panic attacks thinking about all the ways things can go wrong in it (I guess I'll probably end up using TypeScript instead in future).
Overall, well worth the time spent learning Rust, as I feel it makes me a better programmer by enforcing thinking about lifetimes, return values, shared data and thread safety.
The compiler isn't so much strict in a pedantic sense. It more nudges you to avoid bugs, in a "you may want to rethink this" way. It's actually quite nice to have someone point out potential bugs in your code.
I think people make the Rust compiler out to be much more scary than it is. Sure, it's much more strict than say Python, but compared to strongly typed languages like C++ or C# it's not that different in 99% of your code. People just talk about that other 1% a lot.
Their team is fairly responsive to bugs, but I had one negative experience when trying to help them fix their instantsearch lib. They were grabbing as many pages as you had set for max pages at once and would re-query it on pagination - a huge waste of data transfer. They refused to see the problem, so I just made a private fork to get it working, but as far as I know that's still a bug.
I need to upgrade the engine itself, but it looks like they added the ability to upgrade without losing all the data. That was frustrating but understandable.
Hi dawnerd! Sorry to hear that, do you have the issue link so that we can take a look at it?
Based on the information I can read here, I think it comes from the fact that the engine is not able to give an exhaustive, finite count of records matching the query, for response-time reasons. A finite pagination style (with a number of pages) on the client side is for now a pure workaround.
From what I understand, some of our users try to use MeiliSearch as a primary datastore or expect a classic finite pagination coming from a SQL database env, when we are here to solve search relevancy problems.
Ideally the search results should be relevant enough that end-users don't have to click on another page selector button; that's why we advocate integrating pagination without page-number selection: infinite-scroll style or prev/next.
Yeah, having a cap on the number of results is fine. The problem is when it queries for every item at once. I've tested on large datasets and my patched version of instantsearch has no performance problems over 100 pages with 30 items per page. Every time you clicked next page it would request maxPages * perPage results but start from index 0.
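For illustration, here is a minimal sketch (TypeScript, Node 18+ `fetch`) of per-page querying against MeiliSearch's search endpoint, fetching only the page the user asked for rather than maxPages * perPage results from index 0. The `products` index name and the default local host/port are assumptions for the example.

```ts
// Minimal sketch: fetch only the requested page, not every page at once.
// "products" and the local host/port are illustrative assumptions.
const HOST = "http://localhost:7700";

async function searchPage(query: string, page: number, perPage: number) {
  const res = await fetch(`${HOST}/indexes/products/search`, {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({
      q: query,
      offset: page * perPage, // start of the requested page only
      limit: perPage,         // one page worth of hits, not every page at once
    }),
  });
  return (await res.json()).hits;
}

// e.g. page 3 at 30 per page transfers 30 hits, not maxPages * 30.
searchPage("keyboard", 3, 30).then((hits) => console.log(hits.length));
```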
As with a previous post about MeiliSearch, after reading the extensive comparison in [0], I'm sorry but I'm not convinced by it yet, as it is extremely limited and immature.
The only argument being made here is that it is 'written in Rust'.
Just use something production-ready like Typesense. [0]
Hi, rvz! (Product team member of MeiliSearch here). This comparative table is not accurate and contains wrong pieces of information about MeiliSearch.
However, today TypeSense indeed has more features than MeiliSearch. After a long time of refactoring the engine's source code, we now have a solid base to welcome new features/improvements, and we hope to evolve quickly to solve many more search use-cases.
For Q3, we plan to add two new features: sort by and geo-search. The geo-search will come out as a first iteration, allowing you to sort documents around a geographical point and filter documents within a circle. We will also further improve the indexing speed (yes, again, because we can do better) and provide two new formats for data indexing (CSV and NDJSON).
For Q4, we plan to add high availability and solve the multi-tenancy use-case.
That is just a preview of the upcoming features we are already working on.
The end of the year will be rich in evolution for MeiliSearch. We are looking forward to seeing you enjoy using MeiliSearch one day!
Hey Meili team, I work on Typesense and I’m the one who put that comparison matrix together. My intention was to provide as much factual information as possible based on my reading of each search engine’s documentation. This is one reason I stuck to a feature by feature comparison rather than an opinion based comparison for this particular page.
So if you see anything that’s wrong in the matrix, I apologize. Please do let me know which items are wrong and I would love to correct them.
Hi jabo! Thank you very much. That is very kind of you. It's always difficult to put together this kind of matrix when you can't necessarily know all the technical details and features at the level of detail you wanted to convey. I'll get back to you with the list of points we think were misunderstood in a couple of days! Which medium do you prefer?
Your plans seem spot on for our needs, especially regarding geo search! HA and multi tenancy are a must-have for our use case, though. Would it be possible to have a single large highly available index with tenants of wildly different sizes?
I will surely keep a close eye on your developments. Thank you!
"Would it be possible to have a single large highly available index with tenants of wildly different sizes?"
Yes, we can imagine API keys allowing each consumer to access a certain subset of documents in an index. When querying the index, the internal filters that select the documents accessible to this API key are automatically inferred and added to the consumer's initial query. In short, it's a bit like a WHERE clause that is fixed on each query.
I'm not sure if that answers your question, so don't hesitate to reach out!
I'm available to discuss your use cases on our community Slack or by email :)
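To make the idea concrete, here is a rough sketch of what that could look like on the application side today (TypeScript, Node 18+ `fetch`): a server-side wrapper pins a per-tenant filter onto every query, like a WHERE clause fixed per consumer. This is not a shipped MeiliSearch feature as described above; the `records` index and `tenant_id` field are hypothetical.

```ts
// Rough sketch: the server, not the consumer, pins the tenant filter to each query.
// "records" and "tenant_id" are hypothetical; tenant_id would need to be filterable.
const HOST = "http://localhost:7700";

async function tenantSearch(tenantId: string, q: string) {
  const res = await fetch(`${HOST}/indexes/records/search`, {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({
      q,
      // The tenant restriction is added server-side, never by the consumer.
      filter: `tenant_id = ${JSON.stringify(tenantId)}`,
    }),
  });
  return (await res.json()).hits;
}
```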
If you're in the market for lightweight but fast search engines, I would recommend you take a look at Typesense [1] instead, or even Sonic [2] if it fits your use case. MeiliSearch does not give you anything on top of them (i.e. it is neither as feature-complete as [1], nor as fast as [2]).
And I personally stopped using them after a really bad experience I had with their "developers". They don't really care about you and it shows; they were also kind of rude when I reported some bugs to them.
I moved to Typesense and it's a whole different world: its creators truly enjoy that you're using their product. Same thing with Sonic; Valerian is the kind of hacker you'd want as a friend, super talented, super easy-going; you could ask a completely dumb question on their GitHub and he takes the time to explain things to you at length. I know it's open source, I know I didn't pay a dime, but for me, that kind of attitude makes or breaks it. Plus, you actually get a superior product.
Thank you for your kind words. Made my day. Typesense is 100% bootstrapped, and a labor of love[0]. We will certainly do our best to keep making it better.
One of my favorite parts of working on Typesense is the opportunity to interact with so many developers from around the world, getting to know about the product and domain they are working on, their tech stacks and how Typesense fits into their world. I find these interactions helpful in enriching my own world view, and they help me build valuable context as we design new features. I've sometimes been blown away by how the foundational construct of a fast and distributed search engine is being used for use cases I could not have even imagined!
We mainly haven’t invested time in this because we’ve heard that some folks were able to get the Linux binary working on Windows using WSL. Does that work?
Hello, as CEO of MeiliSearch, on behalf of the whole team I'm really sorry if we did not satisfy you when solving one of your bugs. I don't know which bug exactly you are referring to, but in any case, we try to answer our users and contributors with the maximum of transparency and love.
Moreover, on some features we are admittedly a little bit late, a delay that we will more than make up for before the end of the year. Our priority until now has been to offer a robust search engine accessible to all. For us, the developer experience is really important, whether it is in the use of the API or in communication with the community.
We will continue to try to do our best for the community. If you want to help us to improve, I would be happy to take your feedback.
You chose to respond with kindness and humility here when you could've been really hostile and defensive. Really nice to see. You've made me like MeiliSearch even more!
All CEOs are contrite when they respond to being publicly shamed on Hacker News. It’s somewhat of a Hacker News trope at this point. It’s amusing, but it doesn’t mean anything.
I’ve commented on both Stripe and DigitalOcean’s terrible support, under different accounts, and had some XO give the “we’re so sorry, email me directly and I’ll personally look into it, our customers are really important to us” tripe.
I've never used or looked at Typesense (I've been perfectly happy as a Meilisearch user), but your characterisation of interacting with Meilisearch is so alien it makes me wonder if we've been looking at the same project.
Across the assortment of Meilisearch repositories, I've raised two PRs (one accepted, one rejected), five issues, one feature request and pinged one issue for an update.
Every single time the Meilisearch team has been responsive, communicative and generally a delight to interact with - there are very few projects I would consider better.
I can back this completely. I'm working on a search engine for government transparency records with an NGO and Typesense really solved most of our problems with MeiliSearch, and our experiments with Sonic have been pretty good too.
How well do any of these alternatives work for doing an automatic "More like this" list? I implemented this in Elasticsearch for a client (although it's been so long that I don't recall the specifics of how it works) and as much as I'd like to move away from Java stuff if possible, it'd be a non-starter if I can't replicate that in the new system.
Debug info size depends on the language and the compiler. Binary packages installed via package managers are also typically stripped. There are too many confounding variables; simply running `strip foo` before comparing evens the playing field.
It is great for putting a concept in place. For more advanced use (mainly index and search features) I was also evaluating Typesense, which didn't win me over as a product. I have not tried Algolia because of the perception that it is heavier and paid from the get-go.
Can you really use the number of lines of code in a comparison of MeiliSearch and ElasticSearch, or Sphinx-Search?
Admittedly, I'm not the biggest fan of ElasticSearch; it's way too complex to manage and interact with if you just need to add search to a product. However, ElasticSearch is also much more than just a search engine. I would never use Bleve or Sphinx as a primary data store, but ElasticSearch is a perfectly good document database.
I would think it’s just a rough indicator of complexity. Elasticsearch has a lot of features, probably many more than MeiliSearch. This can be good or bad, depending on what you’re looking for.
If comparing the same language, maybe. But given the different amounts of boilerplate when comparing C/C++ to Rust to Java, it doesn't work. Even then, some teams might prefer to use more dependencies and others less.
In my experience, it would be possible to use it as a document database, and I suppose it would be good in the interest of reducing duplication issues if you were initially storing the documents in a traditional DBMS or file system.
However, that's not really what it was made for. Especially early on when you're planning out your schema and such, dropping and re-indexing your documents is a really simple task. If the index itself is your primary document store, what are you indexing from? Would you have a DBMS or file system as your secondary store in that case? That just seems so awkward and backwards.
Keep the square pegs in the square holes and use Elastic (and the alternatives discussed in this thread) as a search index.
It's ridiculously easy to use and has faceted search for my needs. However, there are some limitations so I have to use it in combination with redis, but the developers have a roadmap to fix these problems.
Synchronising with MeiliSearch is a bit of an effort because of the following limitations:
* When filtering by facet, it doesn't provide counts for disjunctive facets
* No sort by
* No where clause (e.g. less than 50)
To overcome these problems, I rebuild some parts of the database in redis, use code for filtering and query MeiliSearch multiple times for different facet counts.
Both redis and MeiliSearch are ridiculously fast so the performance loss is negligible, but it makes my code quite complex. As soon as the developers add these missing features, I want to simplify my code and only use redis for query caching. Typesense had some of these limitations too, but I'm not sure if that's still the case.
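For anyone curious, a rough sketch of the multi-query workaround for disjunctive facet counts described above might look like this (TypeScript, Node 18+ `fetch`). Parameter names (`filter`, `facetsDistribution`) follow the v0.21-era API and were renamed in later versions; the `products` index and the facet attributes are hypothetical, and each attribute is assumed to be declared as filterable.

```ts
// Rough sketch: for each facet attribute, re-run the search with every *other* selected
// facet applied, so that attribute keeps its full (disjunctive) counts.
// "products", "color", "brand" are hypothetical; v0.21-era parameter names.
const HOST = "http://localhost:7700";

type Selections = Record<string, string[]>; // e.g. { color: ["red"], brand: [] }

async function disjunctiveFacetCounts(q: string, selections: Selections) {
  const attributes = Object.keys(selections);
  const counts: Record<string, Record<string, number>> = {};

  for (const attr of attributes) {
    // Filter on everything except the attribute whose counts we want.
    const otherFilters = attributes
      .filter((a) => a !== attr && selections[a].length > 0)
      .map((a) =>
        "(" + selections[a].map((v) => `${a} = ${JSON.stringify(v)}`).join(" OR ") + ")"
      );

    const res = await fetch(`${HOST}/indexes/products/search`, {
      method: "POST",
      headers: { "Content-Type": "application/json" },
      body: JSON.stringify({
        q,
        limit: 0, // we only need the counts, not the hits
        filter: otherFilters.length ? otherFilters.join(" AND ") : undefined,
        facetsDistribution: [attr],
      }),
    });
    counts[attr] = (await res.json()).facetsDistribution?.[attr] ?? {};
  }
  return counts;
}

// Usage: disjunctiveFacetCounts("shirt", { color: ["red"], brand: [] })
```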
Hi! MeiliSearch product team here! It's super cool to see your feedback!
Concerning the disjunctive counts of the facets, we are thinking about it. It is feasible on the client side by making several requests, but we are aware that it is not ideal at all from a developer-experience point of view. We are still thinking about the best way to solve that case in one of our future iterations!
The sort feature is coming in v0.22 (string and numeric fields); you will be able to easily configure the balance between exhaustivity and relevancy at the index level through the positioning of the ranking rules.
I'm not sure I understand the where clause point so I'd love to hear more details!
Thanks for using us and giving us this kind of feedback :)
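To sketch roughly how this ended up looking in released versions (v0.22+), with the caveat that the RC may have differed: the `products` index and `price` field are assumptions, `price` must be declared as a sortable attribute, and the position of the `"sort"` rule in the index's ranking rules controls the relevancy/exhaustivity balance mentioned above.

```ts
// Loose sketch of sorted search as it shipped in later versions (TypeScript, Node 18+ fetch).
// Assumes "price" is in sortableAttributes and the "sort" ranking rule is positioned
// in the index's ranking rules to taste. "products" is a hypothetical index.
const HOST = "http://localhost:7700";

async function cheapestFirst(q: string) {
  const res = await fetch(`${HOST}/indexes/products/search`, {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({
      q,
      sort: ["price:asc"], // numeric (or string) field, ascending
    }),
  });
  return (await res.json()).hits;
}
```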
Hello, MeiliSearch team here :)
Please, do not hesitate to leave an issue on the Golang repository so we can improve it!
Also, indexing time will be much better with v0.21, planned to be released in a couple of days/weeks. You can test the RC in the meantime.
I'm also using it for a plant species search on hedira.io, it's been great for the past 6 months or so, even for a more complex faceted search setup. I switched from Algolia (which was easy due to instantsearch integration) and have no regrets.
Yes. Small private project. It's quite fast. Its query interface is REST+JSON and now has an OpenAPIv3 spec; that said, some of the query syntax is embedded in strings, so there you are still on your own.
I found the default order of results a bit off. Near-matches were positioned over exact matches.
I'm looking for a fulltext typo-tolerant search tool that integrates well with Hasura+PG.
Also interested in the Hasura+PG options. Have you found anything interesting so far? At the moment I’m stringing together a few like clauses, which mostly does for my needs.
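In case it helps, a rough sketch of that stringing-together approach with Hasura's generated `_or`/`_ilike` operators; the `articles` table and its columns are hypothetical names for the example.

```ts
// Rough sketch of the Hasura + Postgres "like clauses" approach: a GraphQL query using
// Hasura's generated _or / _ilike boolean operators. "articles", "title" and "body" are
// hypothetical table/column names.
const SEARCH_ARTICLES = /* GraphQL */ `
  query SearchArticles($pattern: String!) {
    articles(
      where: { _or: [{ title: { _ilike: $pattern } }, { body: { _ilike: $pattern } }] }
      limit: 20
    ) {
      id
      title
    }
  }
`;

// e.g. variables: { pattern: "%standing desk%" } (simple substring matching, no typo tolerance).
```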
I've run into the same "strip tags" issue. Having used ES before, which does sanitizing and stripping for you, at first I was disappointed.
However, after thinking about it more, I wrote up this issue[0] with some ideas and thoughts so I could implement it as a PR or work around it.
I ended up working around it, because that makes the most sense in terms of separation of concerns: MeiliSearch should indeed not get involved in stripping or fixing HTML, as that i) ties Meili to HTML, ii) requires configuration and complexity to allow control, and iii) adds features that become security-critical.
Indeed, my solution is to sanitize, clean and strip HTML before sending documents into the index.
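For the record, a minimal sketch of that pre-indexing cleanup step (TypeScript, Node 18+ `fetch`); the crude regex stripper is only for illustration (a real pipeline should use a proper HTML sanitizer), and the `posts` index name is hypothetical.

```ts
// Minimal sketch of stripping/sanitizing HTML before indexing.
// The regex stripper is deliberately crude; use a proper HTML sanitizer in practice.
const HOST = "http://localhost:7700";

function stripHtml(html: string): string {
  return html
    .replace(/<[^>]*>/g, " ")  // drop tags
    .replace(/&nbsp;/g, " ")   // handle a couple of common entities
    .replace(/&amp;/g, "&")
    .replace(/\s+/g, " ")
    .trim();
}

async function indexPost(post: { id: number; title: string; bodyHtml: string }) {
  // Index clean text only; the original HTML stays in the primary data store.
  const document = { id: post.id, title: post.title, body: stripHtml(post.bodyHtml) };
  await fetch(`${HOST}/indexes/posts/documents`, {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify([document]),
  });
}
```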
I couldn't find it from a quick search... Do you know if this tool supports non-English languages (specifically Greek)? Also, any idea if it supports stemming for these? I.e. I would like to search for σκύλος (=dog) and get documents containing σκύλων (=dogs).
Not sure, but I think they are talking about the recent licensing changes of the Elasticsearch server [0] and restricting Elasticsearch clients to only be compatible with Elasticsearch, not any forks [1].
The workaround they recommend is to duplicate your index with all those characters removed and then strip out those characters from your search queries :/
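As I understand that workaround, it amounts to something like the sketch below, where the same normalization is applied both at indexing time and at query time; the character set here is purely hypothetical, since it depends on the issue in question.

```ts
// Sketch of the duplicate-index workaround as I understand it: normalize documents when
// writing them into a second index, and normalize queries the same way before searching.
const STRIP = /[«»“”]/g; // hypothetical set of problematic characters

function normalize(text: string): string {
  return text.replace(STRIP, "");
}

// Indexing: write normalize(doc.title), normalize(doc.body), ... into the duplicate index.
// Searching: query the duplicate index with normalize(userQuery).
```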
I haven't tried MeiliSearch, but I spent a little bit of time this morning looking at the code. Maybe off topic, but Rust really is a nice language to read. I wanted to learn another non-Lisp language, and after a few evenings of playing with Rust, I settled on Swift for a few small side projects. I slightly regret that decision, but both languages fill the same application space for me.
As we don't use RocksDB but LMDB, we use a lot less real memory than key-value stores that use a user-side cache system. LMDB is memory-mapped and therefore lets the OS manage memory for it. Typesense uses RocksDB, and ElasticSearch a custom key-value store used by Lucene internally.
The real advantage of LMDB is that it is a B-tree: key-value pairs are ordered and do not need any computation when retrieved, which is not the case for an LSM-tree key-value store like RocksDB, which needs to merge/compact pages of key-value pairs before being able to return them to you. That wastes CPU when the search engine needs its CPU to do unions/intersections…
Another advantage of LMDB is that it returns a view of the entries into the DB itself; RocksDB can't, as it must perform operations on the entries before returning them to the library user, for example decompressing or compacting the values.
LMDB and RocksDB are both great projects, but how much performance you can get out of them depends on the larger architecture of the system they are being integrated into. Both projects provide dozens of flags to adjust read/write amplification to customize them for a particular use case. You will often see public benchmarks of these systems being updated with suggestions from both sides on less-than-optimal configurations being used!
In the case of Typesense, RocksDB is not even a top-10 contributor to the overall latency involved in serving the result. In any case, it would be good to clarify a few things:
> As we don't use RocksDB but LMDB, we use a lot less real memory than key-value stores that use a user-side cache system.
Typesense stores only the raw-data in RocksDB. All indexing data structures for filtering, faceting etc. are compact in-memory data structures stored outside. The only fixed memory cost from RocksDB is an in-memory table that is used to buffer writes (see the next point) before being flushed to disk. In practice, this is a trivial percentage of memory used when compared to other data structures.
> LSM-tree key-value store like RocksDB, which needs to merge/compact pages of key-value pairs before being able to return them to you
This happens in-memory and is flushed to the disk in batches. Merging of on-disk SST files happens in the background with no real impact on reads. The advantage of this approach though is that it gets you really good batched write throughput [0] (the above caveat on the difficulty of benchmarking applies).
In summary, like all systems, choosing a storage system involves many trade-offs and what really matters is what works best for your architecture.
A previous version of MeiliSearch used RocksDB and we had a lot of trouble with it: a lot of setup to make sure we were not killed by the OS due to OOM, and even fixing a lot of strange segfaults by patching the RocksDB library itself...
RocksDB doesn't support transactions, only views into the database, which means that if you are indexing, writing into the database, and some event makes your program stop unexpectedly, you can't just restart your program and use the data as-is, since it could be corrupted.
This is why at MeiliSearch we prefer using LMDB: even in the case of an unexpected crash, a restart is instant and valid; you just need to restart the indexing you were previously doing, and you can serve user requests from the previous version of the database.
Also, as you can see in [0], the benchmarks between LMDB and RocksDB are very clear. I understand that reading the database is perhaps not what takes time on your side, but it is on ours, combined with the set operations between sets of internal document ids.
I see. Perhaps because we're using the native C++ client, we faced no such crash issues with RocksDB. We have also handled transactions at a higher layer.
MeiliSearch looks fantastic! I haven't tried it but at least it is written in Rust so that should be a good reason to try it out for a project of mine.