bug-coreutils
Re: stat() order performance issues


From: Jim Meyering
Subject: Re: stat() order performance issues
Date: Fri, 26 Jan 2007 18:18:22 +0100

Phillip Susi <address@hidden> wrote:
> Jim Meyering wrote:
>> Which ls option(s) are you using?
>
> I used ls -Ui to list the inode numbers without sorting.  I expected this
> to simply return the contents from getdents, but I see a stat64 call on
> each file, I believe in the order getdents returns them,
> which causes a massive seek storm.
>
>> Which file system?  As you probably know, it really matters.
>
> In my case, reiserfs, but this should apply equally as well to ext2/3.

That's good, but the libc version matters too, and so does the kernel
version.  Here, I have linux-2.6.18 and Debian/unstable's libc-2.3.6.

>> If it's just "ls -U", then ls may not have to perform a single "stat" call.
>> If it's "ls -l", then the stat per file is inevitable.
>> But if it's "ls --inode" or "ls --file-type", with the right file system,
>> ls gets all it needs via readdir, and can skip all stat calls.  But with
>> some other file system types, it still has to stat every file.
>
> It seems that ls -U does not stat, but ls -Ui does.  It seems it
> shouldn't because the name and inode number are returned by readdir
> aren't they?

Yes.

Make sure you're using the latest version of coreutils.
If necessary, use a debugger to see whether readdir provides
valid inode information on your system.  It should.
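
If a debugger is inconvenient, a rough check can be scripted.  The
following is only a sketch in Python, not anything from coreutils: it
assumes a POSIX system where os.scandir() fills DirEntry.inode() from
the directory entry without a stat call, and the function names
inodes_from_readdir and inode_mismatches are made up for illustration.
It compares the inode number from the directory entry with the one
lstat reports.

```python
import os


def inodes_from_readdir(path="."):
    """Return (name, inode) pairs taken from the directory entries alone."""
    with os.scandir(path) as entries:
        # On POSIX, DirEntry.inode() comes from the directory entry itself,
        # so this loop should not need one stat call per file.
        return [(entry.name, entry.inode()) for entry in entries]


def inode_mismatches(path="."):
    """Return (name, dirent_inode, lstat_inode) where the two disagree."""
    bad = []
    for name, dirent_ino in inodes_from_readdir(path):
        st = os.lstat(os.path.join(path, name))  # one stat per entry, on purpose
        if st.st_ino != dirent_ino:
            bad.append((name, dirent_ino, st.st_ino))
    return bad
```

An empty result from inode_mismatches("/some/maildir/cur") suggests the
directory entries carry usable inode numbers there.  A mismatch is not
necessarily a bug: an entry that is a mount point can legitimately show
a different number in the directory entry than lstat does.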

>> For example, when I run "ls --file-type" on three maildirs containing
>> over 160K entries, it's nearly instantaneous.  There are only 3 stat calls:
>>     $ strace -c ls -1 a b c|wc -l
>
> Are a, b and c files or directories?  If they are files, then of course

They're directories (of course), containing a total of 160K+ entries.

> it would only stat 3 times, because you have only asked ls to look up 3
> files.  Try just ls -Ui without the a b c parameters.
>
>>> du in a Maildir with many thousands of small files takes ages to
>>> complete.  I have investigated and believe this is due to the order in
>> Yep.  du has to perform the stat calls.
>> "ages"?  Give us numbers.  Is NFS involved?  A slow disk?
>> I've just run "du -s" on a directory containing almost 70,000 entries,
>> and it didn't take *too* long with a cold cache: 21 seconds.
>
> Modest disk, no NFS, 114k entries, and it takes 10-15 minutes with cold
> cache.  When I sorted the directory listing by inode number and ran stat
> on each in that order with cold caches, it only took something like 1
> minute.

10-15 minutes is very bad.
Something needs an upgrade.

I presume you used xargs -- you wouldn't run stat 114K times...
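
For anyone who wants to repeat the inode-order experiment without
spawning a process per file, here is a minimal sketch in Python.  It is
an illustration of the approach described above, not what du or ls do
internally; the function name lstat_in_inode_order is hypothetical, and
it assumes a POSIX system where os.scandir() supplies the inode number
from the directory entry.

```python
import os


def lstat_in_inode_order(path="."):
    """lstat every entry of `path`, visiting entries in inode-number order.

    Returns a list of (name, os.stat_result) tuples in the order visited.
    """
    with os.scandir(path) as entries:
        # Collect (inode, name) from the directory entries only, then sort
        # so the stat calls below walk the inodes in ascending order.
        by_inode = sorted((entry.inode(), entry.name) for entry in entries)
    return [(name, os.lstat(os.path.join(path, name))) for _ino, name in by_inode]
```

Timing sum(st.st_size for _, st in lstat_in_inode_order(".")) against
the same loop in plain readdir order, each with a cold cache, would
give numbers comparable to the ones quoted above.  Whether the sorted
order helps depends on the file system's inode layout.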



