guix-devel
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [Cuirass] Missing database indexes?


From: Danny Milosavljevic
Subject: Re: [Cuirass] Missing database indexes?
Date: Tue, 13 Nov 2018 00:27:15 +0100

Hi Björn,

On Mon, 12 Nov 2018 19:50:44 +0100
Björn Höfling <address@hidden> wrote:

> Hm. This code smells ... It looks too complicated.

I was trying to cut down the number of prepared statements in use and prevent a
combinatorial explosion while keeping the kinds of queries we can do open.

Either the value of a parameter is specified, in which case the associated 
column
is filtered for it; or it isn't, then it's not.

> So, even when we have a constant part(23=23) in the OR clause, this
> leads to a full table scan. I think the optimizer cannot detect the
> fact that it is a constant boolean value. 

Sounds like an easy fix in sqlite.  Could you report this upstream?

> I double-checked with Postgresql and it is also performing a full table
> scan in the "boolean-constant OR :id=id" case. I could not find any
> references on the net about it.

Something easy to try would be to use the row values in sqlite instead.

See also https://www.sqlite.org/rowvalue.html
See also https://lists.gnu.org/archive/html/guix-devel/2018-07/msg00101.html

> When this would be Java/JPA I would suggest to dynamically create the
> query. Can we do something in Scheme-DB too? I.e. pseudo-code

(1) Combinatorial explosion of the number of prepared statements
(2) Slow to parse all those SQL statements

But if we don't use all combinations in practise then it's not so bad
and we could generate those statements after all.  It's still a
workaround if we have to do that.

Then we'd have to make sure that the user can't specify arbitrary
combinations and/or limit the number of prepared statements that
exist at the same time.  Cuirass is a web-facing service so these
are not theoretical problems - someone *will* break it (maybe on
purpose) if it's possible.

All in all, generating these SQL statements is like generating a
program source code and recompiling it every time you want to change
the filter used.  It might be the only good solution in this case
(maybe) - but in general it's an antipattern.

> We should add a column:
> 
> id INTEGER PRIMARY KEY AUTOINCREMENT
> 
> Problem is that this concept of AUTOINCREMENT does only work for
> Primary Keys in Sqlite. So we need to degrade "derivation" to a
> secondary key, i.e. make it non-null and unique:
> 
> derivation    TEXT NOT NULL UNIQUE,
> 
> Is there anything speaking against that?

Sounds good.  Note that when you use autoincrement, you eventually
have to handle the case when the value overflows.  The window of
used IDs slowly creeps up over the months.

> Lastly, the query has a limit and an order-by. The question is: Will
> the result be first ordered and then the limit taken? The answer (I know
> only for Postgresql and MySql, but I think it is the same for Sqlite,

I don't think the documentation at https://www.sqlite.org/lang_select.html
specifies, but it's the only thing that makes sense.  Otherwise LIMIT would
be useless as a pager.  Still would be good to have official confirmation
by the sqlite authors.

> I haven't found any reference): The order is always executed first, but
> it has to be stable. In this case it is, because we order by
> Builds.rowid, which is a key.

>Did this happen intentionally

Of course.  Otherwise LIMIT would be useless as a pager.

Attachment: pgpBOONXDpbUR.pgp
Description: OpenPGP digital signature


reply via email to

[Prev in Thread] Current Thread [Next in Thread]