From: Duncan
Subject: Re: [Pan-users] request (if this is the way to do it) for pan to be better at downloading ALL the headers.
Date: Wed, 23 Oct 2024 06:38:44 -0000 (UTC)
User-agent: Pan/0.161 (Chasiv Yar; b869a5e4abb23a59086db723904ea0f8e72e8c09)
Pedro via Pan-users posted on Tue, 22 Oct 2024 11:43:00 +1000 as
excerpted:
> I would like an option to show how many headers they are, and to
> download a range.
>
> the current (and always) options were all headers, or last 100k
>
> I have tried last 1000000000000 headers.
>
> it always locks up. loses what it had downloaded.
Are you tracking memory usage? Pan's likely just... running out of
memory! See the (much longer!) discussion below.
> so an option would be on the fly saving what it has got
>
> and not to try again for those ones.
>
> we have the drive space now for gigabytes of headers.
>
> any other tips or workarounds appreciated
pan keeps (and has always kept) headers in memory, building a threading
model as it goes, and actually rebuilds it (tho with some optimizations on
reload) every time it restarts, so keep enough history around and
particularly on spinning rust, you *WILL* notice pan taking some time to
load up.
(I effectively archive, unexpiring-cache, a number of text groups
including this mailing list via gmane.io's list2news service. Back before
I switched to ssd, pan was taking 10 minutes to start, so I actually
scripted a system-service to cat the entire pan-text-instance cache to
/dev/null, thus caching it in RAM. Then I had pan set to start with my
kde login and would only shut it down momentarily between restarts, thus
keeping stuff cached and the pan restart time to normally a few seconds.
With ssds I've not really seen the problem with the handful of text groups
I archive, tho I suspect I probably would again, either with cache-loading
speed or just the sheer memory-scaling issue under discussion here, if
I was working with multiple millions of headers as is typical on high-
retention binary groups.)
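The cache-warming trick in that parenthetical can be sketched in a few
lines. This is only a minimal Python illustration of the idea, not the
actual system service described above; the function name warm_cache and
the 1 MiB chunk size are made up for the example, and the directory to
read is whatever you pass in.

```python
import os

def warm_cache(root, chunk_size=1 << 20):
    """Read every regular file under root and discard the data.

    Returns the number of bytes read. The point is the side effect:
    the kernel keeps recently read file data in its page cache, so a
    later reader (pan, in this scenario) hits RAM instead of the disk.
    """
    total = 0
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            if not os.path.isfile(path):
                continue  # skip sockets, FIFOs, dangling symlinks
            try:
                with open(path, "rb") as f:
                    while chunk := f.read(chunk_size):
                        total += len(chunk)
            except OSError:
                continue  # unreadable or vanished file: skip it
    return total
```

Calling warm_cache() on the directory holding the cached articles has
the same effect as cat'ing the files to /dev/null. It only helps if
there is enough free RAM for the page cache to actually hold the data.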
Over the years pan's memory handling has triggered issues, the earliest
for 32-bit, where memory address space is limited to 4 GB and depending on
kernel build options individual application memory is often limited to 2
GB (50/50 kernel/userspace split), tho at some efficiency loss when
switching between user/kernelspace it's possible to do 4G/4G with
userspace and kernelspace each having 4GB addressable and switching
between them.
I remember back then, when 32-bit was still king and people were running
into issues with 2 GB RAM, the complaint was pan couldn't do much over
100K headers (I'm not sure if that has anything to do with the 100K
default, but it could).
Some optimizations later (combining string segments so frequent poster
strings are stored once and referenced, with big series where much of the
subject line is duplicated similarly handled, for instance), the cap was a
bit north of 200K for most people. But by then 64-bit amd64/x86_64 was
becoming more common, along with 8 GB memory systems, and the complaints
mostly disappeared for a while.
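The "stored once and referenced" optimization is usually called string
interning. Here's a minimal Python sketch of the general idea, not pan's
actual code; StringPool and build_headers are hypothetical names
invented for the illustration.

```python
class StringPool:
    """Keep one shared copy of each distinct string."""

    def __init__(self):
        self._pool = {}

    def intern(self, text):
        # setdefault returns the copy already stored when an equal
        # string has been seen before; otherwise it stores this one.
        return self._pool.setdefault(text, text)

    def __len__(self):
        return len(self._pool)


def build_headers(raw_headers, pool):
    """Turn (subject, author) pairs into headers that share author strings."""
    return [(subject, pool.intern(author)) for subject, author in raw_headers]


pool = StringPool()
# Each author string is built at run time, so these start out as three
# separate objects with identical contents.
raw = [(f"Part {n}", "".join(["Poster ", "<poster@example.invalid>"]))
       for n in range(3)]
headers = build_headers(raw, pool)
```

After build_headers() all three headers reference the same author
object, so a prolific poster costs one string rather than one string
per article. The saving grows with the number of headers but does not
remove the per-header cost, which is why the cap moved up instead of
disappearing.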
64-bit doesn't have that address-space limitation, but even with those
optimizations pan does still run into scaling issues due to memory usage.
Where the ceiling falls depends on how much memory you actually have:
generally somewhere above a million headers, but for most systems (I
believe) near 200 million, /maybe/ half a billion if you're lucky enough
to have 32 or 64 GB of RAM.
If I count the zeros correctly you're trying a trillion headers...
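To put rough numbers on that, here's a back-of-the-envelope Python
sketch. The per-header cost is a purely hypothetical placeholder (300
bytes), not something measured from pan, so treat the output as an
order-of-magnitude illustration and substitute a figure measured on
your own system if you have one.

```python
# Hypothetical average in-memory cost of one header. NOT measured
# from pan; replace it with your own observation.
ASSUMED_BYTES_PER_HEADER = 300

def estimated_gib(header_count, bytes_per_header=ASSUMED_BYTES_PER_HEADER):
    """Estimated memory in GiB for header_count headers."""
    return header_count * bytes_per_header / 2**30

REQUESTED = 10**12  # the 1000000000000 quoted in the request above

for count in (100_000, 1_000_000, 200_000_000, REQUESTED):
    print(f"{count:>17,d} headers: roughly {estimated_gib(count):,.1f} GiB")
```

Whatever per-header figure you plug in, 10**12 headers lands far beyond
the RAM of any desktop system, which is the point: asking for "last
1000000000000" is effectively asking for everything the server has.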
So just how much memory do you have? Do you have swap enabled, and if so,
how much, and how fast are the (hopefully) SSDs it's on? Because you're
very likely hitting your system's memory limits, and the "lockup" is the
memory-thrashing "live-lock" that sets in either as you get GiB into swap
or, without swap, before the OOM-killer (out-of-memory killer) is
activated.
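One way to check is to watch pan's memory while it downloads headers.
The following is a minimal, Linux-only Python sketch that reads the
standard /proc/<pid>/status and /proc/meminfo files; the function names
and the five-second interval are arbitrary choices for the example.

```python
import re
import time

_KIB_LINE = re.compile(r"^(\w+):\s+(\d+)\s+kB$")

def parse_kib_fields(text):
    """Parse 'Name:   12345 kB' lines into a dict of name -> KiB."""
    fields = {}
    for line in text.splitlines():
        match = _KIB_LINE.match(line.strip())
        if match:
            fields[match.group(1)] = int(match.group(2))
    return fields

def read_kib_fields(path):
    with open(path) as f:
        return parse_kib_fields(f.read())

def watch(pid, interval=5.0):
    """Print process and system memory figures until the process exits."""
    while True:
        try:
            proc = read_kib_fields(f"/proc/{pid}/status")
        except (FileNotFoundError, ProcessLookupError):
            print("process is gone")
            return
        system = read_kib_fields("/proc/meminfo")
        print(f"RSS {proc.get('VmRSS', 0) // 1024} MiB, "
              f"swapped {proc.get('VmSwap', 0) // 1024} MiB, "
              f"available {system.get('MemAvailable', 0) // 1024} MiB, "
              f"swap free {system.get('SwapFree', 0) // 1024} MiB")
        time.sleep(interval)
```

Call watch() with pan's process ID from another terminal while the
header download runs. If RSS climbs steadily while available memory and
free swap head toward zero just before the "lockup", memory exhaustion
is the likely culprit.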
Unless of course you have ulimits set and pan's simply hitting its
application memory limit before the system itself runs into problems.
That won't help pan, but it should confine the damage to pan instead of
locking up the system and potentially crashing other things on it,
depending on what the OOM-killer picks to kill.
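If you don't have such a limit set and want one, here's a minimal
Python sketch (Unix only) that starts a program with a capped address
space, the same RLIMIT_AS limit a shell's "ulimit -v" sets. The helper
names and the 8 GiB figure in the usage note are arbitrary assumptions
for illustration; pick a cap that suits your system.

```python
import resource
import subprocess

def gib(n):
    """Convert GiB to bytes."""
    return n * 1024**3

def make_limiter(max_bytes):
    """Return a function that caps the calling process's address space."""
    def apply_limit():
        resource.setrlimit(resource.RLIMIT_AS, (max_bytes, max_bytes))
    return apply_limit

def run_capped(argv, max_bytes):
    """Run argv with the cap applied in the child only; return its exit code."""
    # preexec_fn runs in the child just before exec, so the limit
    # applies to the launched program and not to this script.
    return subprocess.run(argv, preexec_fn=make_limiter(max_bytes)).returncode
```

For example, run_capped(["pan"], gib(8)) would start pan with an 8 GiB
ceiling. When pan hits the cap its allocations fail and it will most
likely abort, but the rest of the system stays responsive.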
Meanwhile, for years (well over a decade, must be getting close to two,
making "decades" possibly accurate...) now, there has been discussion of
switching pan's header handling to some sort of database format, allowing
the database to handle it "on-disk", with only a working-set in actual
RAM. Charles Kerr mentioned it a few times back when he was still
primary/lead pan dev, but my personal suspicion is that he was a C and C++
dev but didn't consider himself a database dev and simply wasn't
comfortable doing it without someone more familiar with the pitfalls of
that area. (And believe me as I've seen it in other areas including
email, where I switched clients over the problem, it takes a *very* good
coder, or often several, generally several years of stabilizing, before
most database app-implementations are stable enough to *not* regularly
lose data due to corrupted database, etc.) Regardless of his reason,
though, to my knowledge no effort at it was ever made public.
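To make the "on-disk, with only a working set in RAM" idea concrete,
here's a minimal Python sketch using the standard sqlite3 module. The
table layout and function names are hypothetical, invented for this
illustration; they are not pan's design and not taken from any pan
branch.

```python
import sqlite3

def open_store(path):
    """Open (or create) an on-disk header store."""
    db = sqlite3.connect(path)
    db.execute("""
        CREATE TABLE IF NOT EXISTS header (
            article_number INTEGER PRIMARY KEY,
            message_id     TEXT NOT NULL,
            subject        TEXT NOT NULL,
            author         TEXT NOT NULL
        )""")
    return db

def save_batch(db, rows):
    """Persist one batch of downloaded headers.

    Each batch is committed as it arrives, so an interrupted download
    keeps what it already has. OR IGNORE skips article numbers that are
    already stored, so nothing is saved twice.
    """
    with db:  # commits on success, rolls back on error
        db.executemany(
            "INSERT OR IGNORE INTO header VALUES (?, ?, ?, ?)", rows)

def load_window(db, first, last):
    """Load only the headers in [first, last] into memory."""
    return db.execute(
        "SELECT article_number, subject, author FROM header "
        "WHERE article_number BETWEEN ? AND ? ORDER BY article_number",
        (first, last)).fetchall()
```

With a layout like this, the full set of headers lives on disk and the
application holds only the window it's currently displaying. Saving
batch by batch is also what would allow the "save on the fly and don't
fetch those again" behavior asked for above.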
Several lead devs and, as I said, what must be nearing two decades later,
the suggestion continues to appear from time to time. But now there's
actually some development. Continue reading. =:^)
Recently, Dominique Dumont (whom I usually shorten to DD) stepped up from
Debian pan maintainer to upstream pan lead dev (we were without one for a
few years). His first priority of course was updating pan code to work
with current versions of the libraries it depends on, etc, given it was
behind from several years of neglect. The worst of that is now done and
he seems well into dealing with the second priority, porting still working
but deprecated library usage before it stops working too. Redoing the
icon-handling code was part of that. Now, pan is more stable and on a
better track in terms of its future than it has been for many years. =:^)
Now that the critical and nearing-critical stuff is done, DD's expanding
into some of the deeper projects. One of the first was porting/
modernizing the build system from gnu-auto* to cmake. That is now done
and seems to be stable after a few initial hiccoughs. =:^) Another was
rewriting some rather legacy color handling (I believe the old code was
using recently deprecated calls so it falls under that too). As someone
who /needs/ a non-default light-on-dark color-scheme for medical reasons
and who builds and runs live-git pan direct from the git repos, I was
personally involved in reporting and getting the hiccoughs fixed there,
and I'm happy to say the new color code that was broken in 0.159 was fixed
for 0.160 (with 0.161 current). =:^)
But potentially more challenging, and certainly more apropos to the
current topic, DD has recently started (announced in June) a sqlite
database-porting effort. The announcement says it's available for testing
as the sqlite branch in the git repo, and back then, only the news server
information was ported, with sqlite storage for the group information next
on his list.
He said it would take a few months, potentially 1-2 years for all pan
data, so don't hold your breath. And while I'm not building that branch
yet (at the time I was stuck on the auto* builds and wasn't even doing
cmake yet; I'm on cmake now but haven't tried switching to that branch and
building with sqlite yet), as I've said, my experience is that it can take
quite some time to stabilize database code, so even if it's "working" I
could easily see it not really /stable/ for some time after that. We'll
see...
So there's an effort underway altho the branch is experimental and hasn't
been merged to main/master yet. If you're into building from sources you
may wish to try it. I'm building from sources (on gentoo, using an ebuild
for the purpose), but haven't tried that branch yet, and I've not seen
anything more about it on-list, so I don't know current status. But I
believe it's still there (I've not actually checked recently when I do my
git pulls) to try if you want...
Of course as the announcement suggested and as I'll second, be sure to
back up your ~/.pan dir (or whatever you have PAN_HOME pointed at if
different) before you try it, and I'll add, based on experience with other
projects, do expect some stability issues and potentially restoring or
rebuilding the database at times, because that does tend to happen with
new database code.
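A backup along those lines could be sketched as below. This is a
minimal Python example, assuming only what's stated above (the data
lives in ~/.pan unless PAN_HOME points elsewhere); the function names
and the ".backup-<timestamp>" naming are made up for the illustration.

```python
import os
import shutil
import time
from pathlib import Path

def pan_home(environ=os.environ):
    """Return PAN_HOME if it is set, otherwise ~/.pan."""
    override = environ.get("PAN_HOME")
    return Path(override) if override else Path.home() / ".pan"

def backup_pan_home(source=None, stamp=None):
    """Copy the pan data dir to a timestamped sibling; return the new path."""
    source = Path(source) if source else pan_home()
    stamp = stamp or time.strftime("%Y%m%d-%H%M%S")
    destination = source.with_name(f"{source.name}.backup-{stamp}")
    # copytree refuses to overwrite an existing destination, so an
    # older backup can't be clobbered by accident.
    shutil.copytree(source, destination, symlinks=True)
    return destination
```

Calling backup_pan_home() with no arguments copies the whole directory
next to the original. It's probably safest to do this while pan isn't
running, so the copy isn't taken mid-write.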
The announcement can be found on-list as (including gmane.io newsgroup
info, DD's email address deleted as gmane munges those for spam control,
spaces added around the @ in the message-id hoping to keep it from trying
to munge that further too):
From: Dominique Dumont
Newsgroups: gmane.comp.gnome.apps.pan.devel,gmane.comp.gnome.apps.pan.user
Subject: Experiment on Sqlite storage
Date: Sat, 22 Jun 2024 17:08:29 +0200
Message-ID: <2942917.e9J7NaK4W3 @ ylum>
Xref: news.gmane.io gmane.comp.gnome.apps.pan.devel:1714
gmane.comp.gnome.apps.pan.user:16222
--
Duncan - List replies preferred. No HTML msgs.
"Every nonfree program has a lord, a master --
and if you use the program, he is your master." Richard Stallman