Fixing dub search
aberba
December 28, 2020
The current dub registry search is inaccurate because it relies on MongoDB's built-in search, which isn't designed for accurate search.

To fix this, a real search engine is needed. ElasticSearch is overkill for what we need: basic, accurate string search.

Solution: use MeiliSearch. It's very lightweight and fast (a 1 GB VPS is more than enough). It's also very easy to use... just a REST API call. I already have a package for MeiliSearch on dub.


What's needed: a hosted, running instance of MeiliSearch for use in dub search. Since only the search functionality needs to be fixed, MeiliSearch will index a copy of all packages and re-index them when they change. The MeiliSearch index will handle search queries whilst MongoDB continues to handle everything else.
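
As a rough illustration of the split described above (MeiliSearch answers search queries, MongoDB keeps everything else), here is a minimal Python sketch that only builds the two REST requests involved and does not send them. The base URL, the `packages` index name and the document fields are assumptions made for this example, not details of the dub registry. Authentication is left out: when an instance is started with a master key, each request also needs an API key header, whose name depends on the MeiliSearch version.

```python
import json
from urllib.request import Request

MEILI_URL = "http://localhost:7700"  # assumption: a local MeiliSearch instance
INDEX_UID = "packages"               # hypothetical index name


def _post_json(path, payload):
    """Build (but do not send) a JSON POST request to MeiliSearch."""
    body = json.dumps(payload).encode("utf-8")
    return Request(
        MEILI_URL + path,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def build_index_request(packages):
    """Request that adds or replaces package documents in the index.

    Each document is assumed to carry an "id" field as its primary key.
    """
    return _post_json(f"/indexes/{INDEX_UID}/documents", packages)


def build_search_request(query):
    """Request that runs a search query against the index."""
    return _post_json(f"/indexes/{INDEX_UID}/search", {"q": query})


if __name__ == "__main__":
    # Hypothetical package record copied from the registry's main database.
    docs = [{"id": "example-json", "description": "Illustrative JSON parser"}]
    for req in (build_index_request(docs), build_search_request("json")):
        print(req.get_method(), req.full_url, req.data.decode("utf-8"))
```

Passing either request to `urllib.request.urlopen` would send it. Re-indexing a changed package would then amount to posting its updated document again.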


I can make a PR for the MeiliSearch integration, but I need to know whether the foundation is willing to host a MeiliSearch instance for that.
December 28, 2020
On Monday, 28 December 2020 at 10:49:58 UTC, aberba wrote:
> Current dub registry search is inaccurate because it uses the built-in MongoDB search which isn't designed for accurate search.
>
> [...]

https://github.com/dlang/dub-registry/pull/481
December 28, 2020
On Monday, 28 December 2020 at 18:55:09 UTC, Imperatorn wrote:
> On Monday, 28 December 2020 at 10:49:58 UTC, aberba wrote:
>> Current dub registry search is inaccurate because it uses the built-in MongoDB search which isn't designed for accurate search.
>>
>> [...]
>
> https://github.com/dlang/dub-registry/pull/481

I've sent him an email about using MeiliSearch instead of a hack.
December 28, 2020
On Monday, 28 December 2020 at 10:49:58 UTC, aberba wrote:
> To fix this, a real search engine is needed. ElasticSearch is an overkill for what we need... basic accurate string search.
>
> Solution: use MeiliSearch. It's very lightweight and fast (1GB vps is more than enough). Very easy to use... just a REST API call. I already have a package for meilisearch on dub.

ElasticSearch also has a simple REST API and would do this job on whatever hardware we'd realistically use.  I'm not a huge ES fan, personally, but do you have more reasons to dismiss it as overkill and recommend MeiliSearch instead?

The best place for discussion is here, though:
https://github.com/dlang/dub-registry/issues/93

But I have to say something again: please, please, please, I beg, consider using an embedded search tool before adding an external server (or, worse, an external SaaS) as a runtime dependency to the dub registry.  There are only a few thousand packages, and they don't update much.  Even grepping the whole dataset on every request would be fast enough (just not featureful enough).
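
To make the "grep the whole dataset" idea concrete, here is a minimal Python sketch of a brute-force, in-process search. The package records are made-up placeholders, and the matching rule (every query word must appear as a substring of the name or description) is an assumption for illustration only. It has no ranking, stemming or typo tolerance.

```python
def search_packages(packages, query):
    """Return names of packages whose name or description contains every query word."""
    terms = query.lower().split()
    if not terms:
        return []
    hits = []
    for pkg in packages:
        # Scan the full text of every package on each request.
        haystack = f"{pkg['name']} {pkg['description']}".lower()
        if all(term in haystack for term in terms):
            hits.append(pkg["name"])
    return hits


# Hypothetical sample data, not real registry entries.
PACKAGES = [
    {"name": "example-http", "description": "Illustrative HTTP server toolkit"},
    {"name": "example-json", "description": "Illustrative JSON parser"},
    {"name": "example-cli", "description": "Illustrative command-line argument parser"},
]

if __name__ == "__main__":
    print(search_packages(PACKAGES, "parser"))
    print(search_packages(PACKAGES, "json parser"))
```

With a few thousand records, a linear scan like this stays cheap. What it lacks is the "featureful" part.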
December 29, 2020
On Monday, 28 December 2020 at 20:49:12 UTC, sarn wrote:
> On Monday, 28 December 2020 at 10:49:58 UTC, aberba wrote:
>> To fix this, a real search engine is needed. ElasticSearch is an overkill for what we need... basic accurate string search.
>>
>> Solution: use MeiliSearch. It's very lightweight and fast (1GB vps is more than enough). Very easy to use... just a REST API call. I already have a package for meilisearch on dub.
>
> ElasticSearch also has a simple REST API and would do this job on whatever hardware we'd realistically use.  I'm not a huge ES fan, personally, but do you have more reasons to dismiss it as overkill and recommend MeiliSearch instead?
>
> The best place for discussion is here, though:
> https://github.com/dlang/dub-registry/issues/93
>
> But I have to say something again: please, please, please, I beg, consider using an embedded search tool before adding an external server (or, worse, an external SaaS) as a runtime dependency to the dub registry.  There are only a few thousand packages, and they don't update much.  Even grepping the whole dataset every request would be fast enough (just not featureful enough).

If you'd looked at the very discussion you referenced, you'd realize they went around in circles and still came back to using MongoDB for search.

Not only is ElasticSearch built in Java, hence bloatware, it's also designed to do more than just search... hence more bloat and overkill for just basic search. You can compare the size of ElasticSearch with MeiliSearch, which is just a small binary. MeiliSearch can run on very little RAM...

The accuracy you'd get from a search engine just isn't possible with brute force and hacks. Search is a complex problem involving stemming, plurals, synonyms, stop words, ranking, etc. You'd want to use a real search engine.

And between ElasticSearch and MeiliSearch, MeiliSearch is simpler, more lightweight and easier to use.

December 29, 2020
On Monday, 28 December 2020 at 19:15:33 UTC, aberba wrote:
> On Monday, 28 December 2020 at 18:55:09 UTC, Imperatorn wrote:
>> On Monday, 28 December 2020 at 10:49:58 UTC, aberba wrote:
>>> Current dub registry search is inaccurate because it uses the built-in MongoDB search which isn't designed for accurate search.
>>>
>>> [...]
>>
>> https://github.com/dlang/dub-registry/pull/481
>
> I've sent him an email about using MeiliSearch instead of a hack

👍
December 29, 2020
On Monday, 28 December 2020 at 10:49:58 UTC, aberba wrote:
> I can make a PR for the MeiliSearch integration but I need to know foundation is willing to host a MeiliSearch instance for that.

It is written in Rust...

But seriously, in-memory-search is easy to implement, so it would look better if it is done in D.

An alternative is to use an existing online indexing service, probably cheaper and more scalable than setting up a dedicated service yourself.

December 29, 2020
On Tuesday, 29 December 2020 at 11:47:11 UTC, Ola Fosheim Grøstad wrote:
> But seriously, in-memory-search is easy to implement, so it would look better if it is done in D.

You could just use a trie for the tokens and implement Damerau-Levenshtein fuzzy matching on that. That is a fun exercise to do. The next fun exercise is to abstract it in a way that fits into Phobos!

(Fun fact: I've just read a bunch of suggestions for how to do this as I am spending my holiday grading exams in text search... :-P Ok, not so fun...)
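
The trie-plus-fuzzy-matching idea above can be sketched roughly as follows. This is an illustration in Python rather than D, and all names in it are hypothetical. It uses the restricted Damerau-Levenshtein distance (optimal string alignment, where adjacent transpositions count as one edit), computing one row of the distance table per trie node and abandoning a branch once every entry in its row exceeds the allowed distance.

```python
class TrieNode:
    def __init__(self):
        self.children = {}
        self.word = None  # set on nodes that end an indexed token


def insert(root, word):
    """Add a token to the trie."""
    node = root
    for ch in word:
        node = node.children.setdefault(ch, TrieNode())
    node.word = word


def fuzzy_search(root, query, max_dist):
    """Return {token: distance} for indexed tokens within max_dist edits of query."""
    results = {}
    first_row = list(range(len(query) + 1))
    for ch, child in root.children.items():
        _walk(child, ch, None, query, first_row, None, results, max_dist)
    return results


def _walk(node, ch, prev_ch, query, prev_row, prev_prev_row, results, max_dist):
    # Build the distance-table row for the trie prefix ending in `ch`.
    row = [prev_row[0] + 1]
    for j in range(1, len(query) + 1):
        cost = 0 if query[j - 1] == ch else 1
        best = min(
            row[j - 1] + 1,          # skip a query character
            prev_row[j] + 1,         # skip a trie character
            prev_row[j - 1] + cost,  # match or substitute
        )
        # Adjacent transposition, e.g. "bi" in the query versus "ib" in the trie.
        if (prev_prev_row is not None and j > 1
                and ch == query[j - 2] and prev_ch == query[j - 1]):
            best = min(best, prev_prev_row[j - 2] + 1)
        row.append(best)

    if node.word is not None and row[-1] <= max_dist:
        results[node.word] = row[-1]

    # Descend only while some alignment is still within budget.
    if min(row) <= max_dist:
        for nxt, child in node.children.items():
            _walk(child, nxt, ch, query, row, prev_row, results, max_dist)


if __name__ == "__main__":
    root = TrieNode()
    for token in ["vibe", "vibe-d", "mir", "meilisearch", "dub"]:
        insert(root, token)
    print(fuzzy_search(root, "vbie", 1))
```

Because tokens that share a prefix share the rows computed for that prefix, the work grows with the size of the trie rather than with the total length of all tokens.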

December 29, 2020
On Tuesday, 29 December 2020 at 11:47:11 UTC, Ola Fosheim Grøstad wrote:
> On Monday, 28 December 2020 at 10:49:58 UTC, aberba wrote:
>> I can make a PR for the MeiliSearch integration but I need to know foundation is willing to host a MeiliSearch instance for that.
>
> It is written in Rust...
>
> But seriously, in-memory-search is easy to implement, so it would look better if it is done in D.
>
> An alternative is to use an existing online indexing service, probably cheaper and more scalable than setting up a dedicated service yourself.

Read the previous GitHub discussion. They've already gone down that route.

Any PaaS costs more than IaaS. If cost isn't an issue, then we can go with that too.

But since the registry is already hosted, it's quite straightforward to run ./meilisearch --master-key PRIVATE_KEY and be done with it.


December 29, 2020
On Tuesday, 29 December 2020 at 11:47:11 UTC, Ola Fosheim Grøstad wrote:

> It is written in Rust...

If anyone has one written in D too, we can use that as well. I just want to have the embarrassing search fixed.


