grub-devel

Re: [PATCH] Optimise memset on i386


From: Vladimir 'φ-coder/phcoder' Serbinenko
Subject: Re: [PATCH] Optimise memset on i386
Date: Fri, 25 Jun 2010 20:04:41 +0200
User-agent: Mozilla/5.0 (X11; U; Linux x86_64; en-US; rv:1.9.1.9) Gecko/20100515 Icedove/3.0.4

On 06/23/2010 11:38 PM, Colin Watson wrote:
> With this approach, one of the most noticeable time sinks is that
> setting a graphical video mode (I'm using the VBE backend) takes ages:
> 1.6 seconds, which is a substantial percentage of this project's total
> boot time.  It turns out that most of this is spent initialising
> double-buffering: doublebuf_pageflipping_init calls
> grub_video_fb_create_render_target_from_pointer twice, and each call
> takes a little over 600 milliseconds.  Now,
> grub_video_fb_create_render_target_from_pointer is basically just a big
> grub_memset to clear framebuffer memory, so this equates to under two
> frames per second.  What's going on?
>
> It turns out that write caching is disabled on video memory when GRUB is
> running, so we take a cache stall on every single write, and it's
> apparently hard to enable caching without implementing MTRRs.  People
> who know more about this than I do tell me that this can get
> unpleasantly CPU-specific at times, although I still hold out some hope
> that it's possible in GRUB.
>
>   
On non-device memory GRUB should take advantage of the cache. On MIPS,
caching is enabled or disabled by using a different address, so all the
infrastructure necessary for differentiating cacheable from non-cacheable
memory is already present. Enabling the cache on video memory is, however,
more trouble. One of the reasons is that cache mishandling produces
bugs that are difficult to track down.
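
To illustrate the MIPS point, here is a minimal sketch, assuming the
standard MIPS32 layout in which kseg0 (0x80000000) is a cached window and
kseg1 (0xa0000000) is an uncached window onto the same low 512 MiB of
physical memory. The macro and function names are hypothetical and are not
taken from GRUB.

```c
#include <stdint.h>

/* Assumed MIPS32 segment bases; names are illustrative only. */
#define SKETCH_KSEG0_BASE 0x80000000u   /* cached, unmapped */
#define SKETCH_KSEG1_BASE 0xa0000000u   /* uncached, unmapped */
#define SKETCH_PHYS_MASK  0x1fffffffu   /* low 512 MiB of physical memory */

/* Return the uncached (kseg1) alias of a kseg0/kseg1 address. */
static inline uint32_t
sketch_to_uncached (uint32_t addr)
{
  return (addr & SKETCH_PHYS_MASK) | SKETCH_KSEG1_BASE;
}

/* Return the cached (kseg0) alias of a kseg0/kseg1 address. */
static inline uint32_t
sketch_to_cached (uint32_t addr)
{
  return (addr & SKETCH_PHYS_MASK) | SKETCH_KSEG0_BASE;
}
```

Both aliases refer to the same physical bytes, so choosing between cached
and uncached access is only a matter of which address is dereferenced.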
> However, there's a way to substantially speed things up without that.
> The naïve implementation of grub_memset writes a byte at a time, and for
> that matter on i386 it compiles to a poorly-optimised loop rather than
> using REP STOS or similar.  grub_memset is an inner loop practically by
> definition, and it's worth optimising.  We can fix both of these
> weaknesses by importing the optimised memset from GNU libc: since it
> writes four bytes at a time except (sometimes) at the start and end, it
> should take about a quarter the number of cache stalls.  And, indeed,
> measurement bears this out: instead of taking over 600 milliseconds per
> call to grub_video_fb_create_render_target_from_pointer (I think it was
> actually 630 or so, though I neglected to write that down), GRUB now
> takes about 160 milliseconds per call.  Much better!
>
> The optimised memset is LGPLv2.1 or later, and I've preserved that
> notice, but as far as I know this should be fine for use in GRUB; it can
> be upgraded to LGPLv3, and that's just GPLv3 with some additional
> permissions.  It's already assigned to the FSF due to being in glibc.
>
>   
It's OK to use this code, but be sure to mention its origin. It's also OK
to keep its license unless big divergence is to be expected.

Did you test it on x86_64?
> +void *
> +grub_memset (void *s, int c, grub_size_t n)
> +{
> +  unsigned char *p = (unsigned char *) s;
> +
> +  while (n--)
> +    *p++ = (unsigned char) c;
> +
> +  return s;
> +}
>   
This can be optimised the same way as the i386 part: just replace stos with
a loop that stores through a pointer aligned on its own size.
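
A rough sketch of such a loop follows, assuming unsigned long is the
natural word size of the target. The name sketch_memset is hypothetical;
this is neither the glibc code nor the code from the patch.

```c
#include <stddef.h>
#include <stdint.h>

/* Portable word-at-a-time memset sketch (hypothetical name). */
void *
sketch_memset (void *s, int c, size_t n)
{
  unsigned char *p = (unsigned char *) s;
  unsigned char byte = (unsigned char) c;
  unsigned long pattern;
  unsigned long *w;

  /* Head: store single bytes until p is aligned on the word size. */
  while (n > 0 && ((uintptr_t) p % sizeof (unsigned long)) != 0)
    {
      *p++ = byte;
      n--;
    }

  /* ~0UL / 0xff is 0x0101...01, so this replicates the byte into
     every byte of the word. */
  pattern = (~0UL / 0xff) * byte;

  /* Body: one aligned word-sized store per iteration. */
  w = (unsigned long *) p;
  while (n >= sizeof (unsigned long))
    {
      *w++ = pattern;
      n -= sizeof (unsigned long);
    }

  /* Tail: fewer than one word's worth of bytes remain. */
  p = (unsigned char *) w;
  while (n > 0)
    {
      *p++ = byte;
      n--;
    }

  return s;
}
```

With 8-byte words the body issues one store where the byte loop issues
eight, which is the same effect the i386 version gets from stos.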
> Thanks,
>
>   


-- 
Regards
Vladimir 'φ-coder/phcoder' Serbinenko

