guix-devel

Re: Inverted index to accelerate guix package search


From: Arun Isaac
Subject: Re: Inverted index to accelerate guix package search
Date: Fri, 17 Jan 2020 00:36:37 +0530

Pierre Neidhardt <address@hidden> writes:

> By the way, what about using Xapian in Guix?

I looked up Xapian's features at https://xapian.org/features and they
are quite impressive. I was introduced to Xapian through notmuch, which
does not use Xapian to the fullest, so I ended up underestimating its
value. The following features might be of particular importance.

- Relevance feedback - given one or more documents, Xapian can suggest
  the most relevant index terms to expand a query, suggest related
  documents, categorise documents, etc.
- Phrase and proximity searching - users can search for words occurring
  in an exact phrase or within a specified number of words, either in a
  specified order, or in any order.
- Supports stemming of search terms (e.g. a search for "football" would
  match documents which mention "footballs" or "footballer")

I think these features would really help in Pierre's work trying to
improve search and discoverability on Guix. If we are planning to have a
"Software Center" like interface at some point in the future, xapian's
search could come in handy.

Not directly related to Guix, but I also wonder if info manuals would be
a lot more useful if they had good full text search using xapian.

For the time being, since we don't have Xapian bindings, I think we
should settle for SQLite's full text search capabilities (FTS5).

https://www.sqlite.org/fts5.html
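
To make the FTS5 idea concrete, here is a minimal sketch in Python using
the standard sqlite3 module. It is not the attached Scheme script. The
table name, column names, and sample records are assumptions made up for
illustration, and it assumes the SQLite library linked into Python was
built with FTS5 enabled. The porter tokenizer gives a basic form of the
stemming mentioned above.

```python
import sqlite3

# Hypothetical sample records (name, synopsis, description); a real
# index would be populated from the Guix package set instead.
PACKAGES = [
    ("hello", "Hello, GNU world: an example GNU package",
     "GNU Hello prints a greeting. It serves as an example of "
     "standard GNU coding practices."),
    ("notmuch", "Thread-based email index, search, and tagging",
     "Notmuch is a command-line tool for indexing and searching "
     "mail, built on Xapian."),
    ("xapian", "Search engine library",
     "Xapian is a probabilistic information retrieval library "
     "for full text search."),
]


def build_index(packages):
    """Create an in-memory FTS5 table and load the package records."""
    conn = sqlite3.connect(":memory:")
    # 'porter unicode61' wraps the default tokenizer with Porter
    # stemming, so "searching" and "search" index to the same term.
    conn.execute(
        "CREATE VIRTUAL TABLE packages USING fts5("
        "name, synopsis, description, tokenize = 'porter unicode61')"
    )
    conn.executemany("INSERT INTO packages VALUES (?, ?, ?)", packages)
    return conn


def search(conn, query):
    """Return matching package names, best match first."""
    rows = conn.execute(
        "SELECT name FROM packages WHERE packages MATCH ? ORDER BY rank",
        (query,),
    )
    return [name for (name,) in rows]


conn = build_index(PACKAGES)
print(search(conn, "searching"))
print(search(conn, "greeting"))
```

ORDER BY rank sorts by FTS5's built-in bm25 relevance score, so no
ranking code is needed on the caller's side. User input passed to MATCH
is parsed as an FTS5 query, so punctuation in a search string may need
quoting before it is handed to the query.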

I have attached a short proof of concept script for an SQLite based
search. The speedup over brute force search is around 200x, and
populating the database only takes around 2.5 seconds. Here is a sample
run.

Sqlite database populated in 2.5516340732574463 seconds
Brute force search took 0.11850595474243164 seconds
Sqlite search took 5.459785461425781e-4 seconds
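
A comparison of this kind can be sketched as follows. This is a rough
Python illustration, not the attached script: the corpus is synthetic,
the table and function names are hypothetical, and the timings it prints
will differ from the numbers above. It again assumes an SQLite build
with FTS5.

```python
import sqlite3
import time


def make_corpus(n=20000):
    """Synthetic stand-in for package descriptions.

    Every 1000th entry mentions the word "compiler".
    """
    return [
        (f"pkg{i}",
         "a compiler toolchain" if i % 1000 == 0
         else "a small utility library")
        for i in range(n)
    ]


def brute_force(corpus, term):
    """Scan every description and test each one for the term."""
    return [name for name, desc in corpus if term in desc.split()]


def fts_search(conn, term):
    """Look the term up in the FTS5 inverted index."""
    rows = conn.execute(
        "SELECT name FROM docs WHERE docs MATCH ?", (term,)
    )
    return [name for (name,) in rows]


def timed(fn, *args):
    """Run fn(*args) and return (result, elapsed seconds)."""
    start = time.perf_counter()
    result = fn(*args)
    return result, time.perf_counter() - start


corpus = make_corpus()
conn = sqlite3.connect(":memory:")
conn.execute("CREATE VIRTUAL TABLE docs USING fts5(name, description)")

_, populate_time = timed(
    conn.executemany, "INSERT INTO docs VALUES (?, ?)", corpus
)
brute_hits, brute_time = timed(brute_force, corpus, "compiler")
fts_hits, fts_time = timed(fts_search, conn, "compiler")

print(f"Sqlite database populated in {populate_time} seconds")
print(f"Brute force search took {brute_time} seconds")
print(f"Sqlite search took {fts_time} seconds")
```

The population cost is paid once, while every later query only touches
the index entries for the search term rather than every description.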

Attachment: sqlite-search.scm
Description: Text document

Attachment: signature.asc
Description: PGP signature

