bug-gnu-emacs

From: Eric Abrahamsen
Subject: bug#33653: 27.0.50; Change Gnus obarrays-as-hash-tables into real hash tables
Date: Mon, 25 Mar 2019 10:35:32 -0700
User-agent: Gnus/5.13 (Gnus v5.13) Emacs/27.0.50 (gnu/linux)

Andy Moreton <andrewjmoreton@gmail.com> writes:

> On Sun 24 Mar 2019, Eric Abrahamsen wrote:
>
>> Katsumi Yamaoka <yamaoka@jpl.org> writes:
>>
>>> Hi,
>>>
>>> Gnus no longer works for groups whose names contain non-ASCII
>>> letters.  For instance, I got this error when trying
>>> to update the "nnml:テスト" group using `M-g'[1]:
>>>
>>> nnml:\343\203\206\343\202\271\343\203\210 error: No such group: テスト
>>>
>>> When trying to enter the group using `0 RET'[2] I got:
>>>
>>> Group nnml:\343\203\206\343\202\271\343\203\210 couldn't be activated
>>>
>>> Those raw bytes are utf-8 encoded "テスト", which is also how the
>>> name appears in the group entry in gnus-newsrc-alist saved in the
>>> ~/.newsrc.eld file, as follows:
>>>
>>> ("nnml:\343\203\206\343\202\271\343\203\210" 1 nil ((unexist) (seen (1
>>> . 5))) "nnml:" ((timestamp 23704 11958)))
>>
>> Yes, this is something I screwed up in c1b63af445. Gnus has always
>> stored group names as raw bytes in .newsrc.eld (at least I believe it
>> has, you probably know better than I do, it does in my experiments with
>> Emacs 26, anyway), and only decodes them for display. But obviously I've
>> messed something up between file persistence and display, and I'm
>> working on sorting it out.
>
> Perhaps it would be better to revert and reintroduce your changes after
> further testing ? Taking time over this is better than causing data loss
> for gnus users.

That's the conclusion I'm coming to, yes.

> Other notes from reading the code:
>
> 1) In `gnus-gnus-to-quick-newsrc-format' you ignore the contents of
>    `gnus-newsrc-alist' when saving "newsrc.eld", and replace it with the
>    details from `gnus-newsrc-hashtb'. Why ? The rest of the gnus code
>    appears to treat `gnus-newsrc-alist' as the single source of truth,
>    with the hash tables being used only for faster access to it.

Eventually I would like to reduce the number of data structures so that
groups are held in `gnus-newsrc-hashtb', and ordering is kept in
`gnus-group-list', and that's it. `gnus-newsrc-alist' would only be used
when persisting to disk. My next proposed change (once I've recovered my
confidence) is to turn groups into actual objects, in which case the
alist would really just be a kind of serialization format.

The hash table ought to be in sync with the rest of the data structure
-- if it isn't, that's another bug.
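
Very roughly, and only as a sketch of where I'd like to end up (the
`my-' names below are made up for illustration, they are not Gnus
functions or variables), persisting would amount to walking the ordered
list of group names and pulling each group out of the hash table:

```elisp
;; Hypothetical helper: GROUP-LIST plays the role of `gnus-group-list',
;; HASHTB the role of `gnus-newsrc-hashtb'.
(defun my-groups-to-alist (group-list hashtb)
  "Return the groups named in GROUP-LIST, in order, looked up in HASHTB."
  (mapcar (lambda (name) (gethash name hashtb)) group-list))

(let ((my-hashtb (make-hash-table :test #'equal)))
  ;; Insertion order into the table does not matter...
  (puthash "nnml:bar" '("nnml:bar" 2 nil) my-hashtb)
  (puthash "nnml:foo" '("nnml:foo" 1 nil) my-hashtb)
  ;; ...because the ordering comes from the list of names.
  (my-groups-to-alist '("nnml:foo" "nnml:bar") my-hashtb))
;; => (("nnml:foo" 1 nil) ("nnml:bar" 2 nil))
```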

> 2) In `gnus-gnus-to-quick-newsrc-format' you dropped the code to remove
>    the dummy group from `gnus-newsrc-alist'. Why ? This internal dummy
>    group is now saved in "newsrc.eld", which is not needed.

This was an error. (Though in my case, I've had the dummy group in my
newsrc.eld for months, and it hasn't done any harm. I don't know why
it's necessary.)

> 3) The format of the entries in `gnus-newsrc-hashtb' has changed,
>    removing the second element. Why ?

Because the old `gnus-gethash' call returned a slice of
`gnus-newsrc-alist', where the second element was actually the group
*before* the group you wanted, and the third element was the cdr of
`gnus-newsrc-alist', starting with the group you wanted. This was
undocumented, and took a bit to figure out. Now, the gethash call just
gives you the group. Ideally, in the next set of changes, it will give
you an object.
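
To illustrate with toy data (a sketch only: the `my-' variables are
hypothetical, and the leading 5 merely stands in for whatever the first
element of the old entry was):

```elisp
;; Toy stand-in for `gnus-newsrc-alist', dummy group first.
(defvar my-alist
  (list (list "dummy.group" 0 nil)
        (list "nnml:foo" 1 nil)
        (list "nnml:bar" 2 nil)))

;; Old style: the entry for "nnml:foo" is a value consed onto a tail
;; of the alist, beginning one group *before* the wanted one.
(defvar my-old-entry (cons 5 my-alist))

(nth 1 my-old-entry)   ; => ("dummy.group" 0 nil), the group before
(nth 2 my-old-entry)   ; => ("nnml:foo" 1 nil), the group you wanted
;; The rest of the entry shares structure with the alist itself.
(eq (nthcdr 2 my-old-entry) (cdr my-alist))   ; => t

;; New style: a real hash table keyed on the group name.
(defvar my-new-hashtb (make-hash-table :test #'equal))
(dolist (group (cdr my-alist))
  (puthash (car group) group my-new-hashtb))

(gethash "nnml:foo" my-new-hashtb)   ; => ("nnml:foo" 1 nil)
```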

> 4) You changed several hash table sizes from 4096 to 4000, and 1024 to
>    1000. Why ?

My understanding is that using a prime number is significant when it
comes to vector access, but that the hash table implementation is
higher-level, where a prime number is no longer significant. If that's
incorrect I would like to know!
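
For what it's worth, my reading of `make-hash-table' is that :size is
only a hint about the initial allocation and the table grows as needed,
so the exact number shouldn't affect correctness. A small sketch (the
function name is made up for this example):

```elisp
;; Hypothetical helper: create a table with a small :size hint, then
;; store N entries in it, possibly more than the hint.
(defun my-fill-table (size n)
  "Return a hash table created with :size SIZE holding N entries."
  (let ((table (make-hash-table :test #'equal :size size)))
    (dotimes (i n)
      (puthash (format "group.%d" i) i table))
    table))

;; The table accepts ten times more entries than the hint suggested.
(hash-table-count (my-fill-table 10 100))   ; => 100
```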

> Your patch contains several logical changes that would be easier to
> understand (and bisect) as a series of patches with one logical change
> in each patch:
>  - code layout changes
>  - add missing doc strings and code comments
>  - change hash table implementation
>  - change format of `gnus-newsrc-hashtb' entries
>  - change usage of `gnus-group-change-level'
>  - change coding of group names
> While it can take extra work to split things up, the end result is much
> easier to understand.

In principle I agree with this completely. In practice I found it
extraordinarily difficult to touch one part of Gnus without running into
knock-on repercussions.

The ultimate goal of the changes I have in mind for Gnus is to address
exactly this: to make it more modular, to improve isolation of code
paths, and to reduce the number of semi-redundant data structures. But
the process is evidently even messier than I thought. I held back
another commit to group name encoding in an attempt to keep things
simple, but that seems to have made things even worse.

But yes, if I end up backing this change out, I'll try to break it up
into smaller commits.

> Thanks for working on gnus,

Thanks for the code review! I wish I'd gotten this to begin with.

Eric




