From: Weddington, Eric
Subject: RE: AVR Benchmark Test Suite [was: RE: [avr-gcc-list] GCC-AVR Register optimisations]
Date: Sun, 13 Jan 2008 20:53:56 -0700

 

> -----Original Message-----
> From: Dave N6NZ [mailto:address@hidden]
> Sent: Sunday, January 13, 2008 4:19 PM
> To: Weddington, Eric
> Cc: John Regehr; address@hidden
> Subject: Re: AVR Benchmark Test Suite [was: RE: [avr-gcc-list] GCC-AVR Register optimisations]
> 
> 
> 
> Weddington, Eric wrote:
> >
> 
> It's worth drawing a distinction between benchmarks and regression
> tests.  They need to be written differently.  A regression test needs
> to sensitize a particular condition, and needs to be small enough to
> be debuggable. A benchmark needs to be "realistic", which often makes
> them harder to debug. I say we need both.  The performance regression
> tests can easily roll into release criteria.  A suite of performance
> benchmarks is more useful as a confirmatory "measure of goodness" --
> but actual mysteries in the aggregate score will most likely be
> chased with smaller tests.

Ok. Regression tests should really fit within the GCC regression test
framework; I would rather not duplicate the work that they already have
there. So I'm really looking for benchmark tests, under your definition.
That's not to say I want to ignore the regression tests. I just want to
fill a gap that currently exists for the AVR.
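
Just to sketch the kind of targeted test I have in mind (a hypothetical
example made up for illustration, not pulled from any existing suite),
here is a tiny kernel that leans on 16-bit pointer register pairs and
displacement addressing; the -mmcu value is only a placeholder:

/* size_regress_ptr16.c -- hypothetical targeted size-regression kernel.
 * Compile:  avr-gcc -mmcu=atmega128 -Os -c size_regress_ptr16.c
 * Measure:  avr-size size_regress_ptr16.o
 * Track the text size per compiler release; a jump is a regression.
 */
#include <stdint.h>

/* Byte copy through 16-bit pointers; exercises the X/Y/Z pointer
 * register pairs and post-increment addressing. */
void copy_block(uint8_t *dst, const uint8_t *src, uint8_t len)
{
    while (len--)
        *dst++ = *src++;
}

/* Sum a small table reached through a struct pointer; exercises
 * pointer + displacement addressing and 16-bit arithmetic. */
struct table {
    uint16_t base;
    uint16_t data[8];
};

uint16_t sum_table(const struct table *t)
{
    uint16_t acc = t->base;
    uint8_t i;
    for (i = 0; i < 8; i++)
        acc += t->data[i];
    return acc;
}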

 
> A semi-related question is how many of these tests can be pushed
> upstream?  If we could get a handful of uCtlr-oriented code size
> regression tests packaged up so that the developers of the generic
> optimizer could run them as release criteria, it would, I would
> think, improve the overall quality of gcc for all uCtlr targets.

Nothing can be pushed upstream right now. As I mentioned in another post
in this thread, the AVR target is not that important in the eyes of most
members of the GCC project. I'm working diligently to change that. But
it's one of those cases of "if we want something done, we have to do it
ourselves".

 
> > 
> > There is also an interest in comparing AVR compilers, such as how
> > GCC compares to IAR, Codevision or ImageCraft compilers.
> 
> Who is interested? gcc developers, as a means to keep gcc
> competitive?  Or potential users?  The former is benchmarking, the
> latter is moving towards bench-marketing. Not that marketing is bad,
> but that sort of thing can be a distraction.  In any case, the tests
> that are meaningful here are the benchmark "overall goodness" test
> suite, not the targeted test suite.

As a gcc developer, I am interested in some kind of metric to keep gcc
competitive with other AVR compilers. Honestly, it seems to be an "urban
myth" that IAR optimizes better than GCC. Is that really true? For what
applications? For what compiler switches? Eventually I'd like to have
something definitive to combat any FUD.

I don't want to get into bench-marketing. I would really like to have
something meaningful and of real value, and not have to tweak numbers to
arrive at good results to show off. If AVR GCC sucks in an area, I don't
want to paper over it. I want to show it so we know what needs
improvement.

 
> > 
> > And sometimes there is an interest in comparing AVR against other
> > microcontrollers, notably Microchip's PIC and TI's MSP430.
> 
> Different processor with same compiler?  Different processor with
> best compiler? -- Now this is beginning to sound like SPEC.

Well, lofty goals for sure. I don't want to get outside of the 8-bit
microcontroller realm. I certainly want to do first things first. But I
think it might be interesting, at some point in the future, if some of
those things could be achieved.
 
> > 
> > If we are going to put together a benchmark test suite, like other
> > benchmarks for GCC (for larger processors), then I would think that
> > it would be better to model it somewhat after those other
> > benchmarks. I see that they tend to use publicly available code,
> > and a variety of different types of applications.
> 
> For benchmarking, and bench-marketing, that's a good approach.  I'll
> be redundant and say those are probably not what you want to be
> debugging.  It would make sense for what I'll call an "avr-gcc
> dashboard".  I see a web page with a bunch of bar graphs on it.  A
> summary bar at the top that is the weighted sum of individual test
> bars.  As an avr-gcc user, that kind of summary page would be very
> useful from one release to the next for setting expectations
> regarding performance on your own application. As an avr-gcc release
> master, it's a good dashboard for tracking progress and release
> worthiness.

That's definitely the idea.
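
To make the "summary bar" concrete, here is a rough sketch (my own
invention, nothing more) of how the weighted sum could be computed. Each
test reports a ratio against a baseline release, so size and speed
results can feed the same score; the test names, ratios, and weights
below are placeholders:

/* dashboard_score.c -- hypothetical aggregate-score calculation.
 * Each test reports baseline_metric / current_metric, so a value
 * above 1.0 means the current compiler is better (smaller or faster).
 * The summary bar is the weighted average of those ratios.
 */
#include <stdio.h>

struct result {
    const char *name;
    double ratio;   /* baseline / current */
    double weight;  /* relative importance in the summary bar */
};

static double summary_score(const struct result *r, int n)
{
    double total = 0.0, wsum = 0.0;
    int i;
    for (i = 0; i < n; i++) {
        total += r[i].ratio * r[i].weight;
        wsum  += r[i].weight;
    }
    return wsum > 0.0 ? total / wsum : 0.0;
}

int main(void)
{
    /* Placeholder numbers, for illustration only. */
    struct result results[] = {
        { "copy_block  (size)",  1.05, 2.0 },
        { "sum_table   (size)",  0.97, 2.0 },
        { "crc16       (speed)", 1.10, 1.0 },
    };
    int n = sizeof results / sizeof results[0];

    printf("summary: %.3f\n", summary_score(results, n));
    return 0;
}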
 
> 
> > the Atmel 802.15.4 MAC,
> Need to check license on that one -- but a good choice otherwise

:-)
 
> > and the GCC version of the Butterfly firmware. I also have a copy
> > of the "TI Competitive Benchmark", which they, and other
> > semiconductor companies, have used to do comparisons between
> > processors.
> Not familiar with it.  Also, check the license.  Processor
> manufacturers (like, oh, for instance, *all* the several I have
> worked for) are very touchy about benchmarks and benchmark
> publications.  My sea charts have a notation: "Here be lawyers".

Well, it certainly helps that I work for Atmel. ;-)
 
 
> 
> > 
> > Thoughts?
> 
> Test categories:
> 1. float v. scalar
> 2. targeted test v. benchmark v. published dashboard metric
> 3. member of quick v. extended v. full test list
> 4. size v. speed
> 
> That unrolls into 36 test lists, but the same test may appear
> multiple times (in both quick and extended, perhaps both size and
> speed).
> 
> As to priorities, IMO the top two priorities are:
> 1. targeted scalar size
> 2. targeted scalar speed
> 
> Why?  To get tests that target specific optimization regressions.  A
> size regression is more painful to an embedded developer than a speed
> regression. Floating point math is largely in a library so less at
> risk for a compiler optimization regression.

Good point.
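
For the targeted scalar speed tests, my rough thought (a sketch only,
assuming an ATmega-class part with a 16-bit Timer1; the crc16_update
kernel and iteration count are stand-ins) is to count cycles on-target,
with the timer clocked from the CPU clock and the inputs and result
routed through volatiles so the compiler cannot fold the kernel away:

/* speed_regress.c -- hypothetical cycle-count harness for one kernel.
 * Compile with something like: avr-gcc -mmcu=atmega128 -Os speed_regress.c
 */
#include <avr/io.h>
#include <stdint.h>

/* Stand-in kernel: bitwise CRC-16 with the 0xA001 polynomial. */
static uint16_t crc16_update(uint16_t crc, uint8_t data)
{
    uint8_t i;
    crc ^= data;
    for (i = 0; i < 8; i++)
        crc = (crc & 1) ? (crc >> 1) ^ 0xA001 : (crc >> 1);
    return crc;
}

volatile uint16_t crc_init = 0xFFFF; /* volatile input: defeats constant folding */
volatile uint16_t cycle_count;       /* volatile output: defeats dead-code removal */

int main(void)
{
    uint16_t crc = crc_init;
    uint8_t b;

    TCCR1A = 0;
    TCCR1B = _BV(CS10);        /* Timer1 counts at the CPU clock rate */

    TCNT1 = 0;                 /* start of measured region */
    for (b = 0; b < 64; b++)
        crc = crc16_update(crc, b);
    cycle_count = TCNT1;       /* end of measured region (includes loop overhead) */

    crc_init = crc;            /* keep the result live */

    for (;;)
        ;                      /* park; read cycle_count with a debugger */
}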
 
> I'm not saying other things are not important, that's just my take on 
> what to tackle first (after infrastructure, of course.)
> 
> -dave
> 
> BTW -- having a defined place to put a performance regression test is
> a good start.  Any performance regression that pops up should have a
> test written for it and cataloged in the framework.
> 

Thanks for such a thoughtful response! :-)

Eric Weddington
Product Manager
Atmel



