avr-gcc-list

Re: [avr-gcc-list] Handling __flash1 and .trampolines [Was: .trampolines


From: Georg-Johann Lay
Subject: Re: [avr-gcc-list] Handling __flash1 and .trampolines [Was: .trampolines location.]
Date: Thu, 13 Dec 2012 12:24:12 +0100
User-agent: Thunderbird 2.0.0.24 (Windows/20100228)

Erik Christiansen wrote:
Warning:  Alternative solutions are offered for some of the sub-problems.
          Choices are clearer at the end.
          The best choice depends mostly on whether one default linker
          script should handle all the use cases mooted here.
          The reply is a bit long. (Grab a coffee ;-)

On 11.12.12 17:47, Georg-Johann Lay wrote:

[Beautifully explanatory example of AVR memory paging snipped]

This puts x into input section .progmem1.data and expects that this section is
located appropriately, in particular that  &x div 2^16  =  1

It allows 16-bit addresses to be used to address 0x10000...0x1ffff;
similarly for the other __flashN.

However, .progmem1.data is not handled by the default linker script, i.e. will
match .progmem*

The easiest part is to stop gobbling any .progmemN.data into the .text
output section. (__flash IIUC). Just change the line:

*(.progmem*)

to:

*(.progmem.data)     /* We only want page 0 stuff here. */

This also needs *(.progmem.data*), or at least *(.progmem.data.*), because from avr-gcc 4.7 on progmem is sensitive to -fdata-sections. I found no easy way to have something like -mprogmem-sections that is selective and affects only progmem, in a way similar to -fdata-sections.

The GCC code, in particular varasm.c, is horrible. It has hooks but as soon as you want one bit more than other targets need you are stuck...
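
A minimal sketch of how the page-0 patterns might look once -fdata-sections is taken into account. It assumes, as the remark above implies, that the compiler appends the object name to the section name (names of the form .progmem.data.<name>), and that the text region comes from the MEMORY block of the script in use. The rest of the .text output section is elided, so this is a fragment to merge into a script, not a complete one.

```ld
SECTIONS
{
  .text :
  {
    /* ... vectors and other early input sections elided ... */
    *(.progmem.data)      /* page-0 (__flash / progmem) data          */
    *(.progmem.data.*)    /* the same, split up by -fdata-sections    */
    /* .progmem1.data, .progmem2.data, ... do not match either
       pattern, so they stay out of this output section.              */
    /* ... remaining input sections elided ... */
  } > text
}
```

Neither pattern matches .progmem1.data or .progmem1.data.<name>, which is the point of replacing *(.progmem*).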

There are several ways to place the .progmemN.data input sections where
we want them, one way is to just grow the "text" memory segment in the
linker MEMORY model:

MEMORY
{
  text   (rx)   : ORIGIN = 0, LENGTH = 190K        /* 3 x 64k pages */
  boot   (rx)   : ORIGIN = 62K, LENGTH = 2K
  data   (rw!x) : ORIGIN = 0x800100, LENGTH = 4K
  eeprom (rw!x) : ORIGIN = 0x810000, LENGTH = 2K
}

Now we tweak the end of the .text output section with:

   *(.progmem1.data)    /* Page 1 */
   *(.progmem2.data)    /* Page 2 */
   *(.progmem3.data)    /* Page 3 */
   _etext = . ;
   __code_end = . ;
}  > text

(Or put the higher pages before the destructors?)

There are also restrictions to ctors / dtors that come from the startup code bits from libgcc.

If that is done, then we have to engineer all page overflow erroring
ourselves, perhaps as described later. But we are cheating ourselves of
ld's help. If we let ld in on the secret of the memory model, then it
can detect page overflow without any effort from us. (Skip this option
if the __memx use case has to be handled as well, by the one linker script.)

We just need a separate memory segment for each physical page,
e.g. text, flash1, flash2:

MEMORY {
  text   (rx)   : ORIGIN = 0, LENGTH = 62K

I don't think that is helpful, unless we want a linker script for each scenario, which basically means that the user has to write her own script and juggle .text, .progmem, .progmemN, .lowtext, .trampolines, ctors, dtors, bootloader, whatever.

The AVR memory model is complicated thanks to its Harvard architecture, but users still expect that -mmcu=mydevice produces a perfect executable no matter how much program code or progmem data they stuff into their sources.

If you propose to accommodate for a specific setup by means of a custom linker script, you will immediately get the response (e.g. on avrfreaks) "linker script is too complicated for the user, don't propose that, everything must run out of the box."

I understand that out-of-the-box solutions are convenient, but as the number of kludges increases towards infinity and the number of supporters decreases towards 0, it will lead to frustration sooner or later.

For example, we have around 200 -mmcu= variants in the compiler and in the binutils and in the libc, and nobody ever dared to work out an alternative solution to the insane-number-of-mmcu scheme. End of rant ;-)

  boot   (rx)   : ORIGIN = 62K, LENGTH = 2K
  flash1 (rx)   : ORIGIN = 64K, LENGTH = 64K
  flash2 (rx)   : ORIGIN = 128K, LENGTH = 64K
  data   (rw!x) : ORIGIN = 0x800100, LENGTH = 4K
  eeprom (rw!x) : ORIGIN = 0x810000, LENGTH = 2K
}

(I would though, name these physical-world entities page0, page1, page2

NACK, we will see similar things for RAM, i.e. it is paged, too.

for user readability. There is no good reason why the .text output
section name has to infect the memory model. ;-) Then we would have:

MEMORY {
  page0  (rx)   : ORIGIN = 0, LENGTH = 62K
  boot   (rx)   : ORIGIN = 62K, LENGTH = 2K

What about different bootloader layouts? Bootloader at end of flash? No bootloader at all?

  page1  (rx)   : ORIGIN = 64K, LENGTH = 64K
  page2  (rx)   : ORIGIN = 128K, LENGTH = 64K
  data   (rw!x) : ORIGIN = 0x800100, LENGTH = 4K
  eeprom (rw!x) : ORIGIN = 0x810000, LENGTH = 2K
}

Now, instead of the tweak at the end of the .text output section, we
add new output sections after the end of that section:

.flash1 :
{  *(.progmem1.data)    /* Page 1 */
} > page1

.flash2 :
{  *(.progmem2.data)    /* Page 2 */
} > page2

Now ld will automatically detect page overflows. (But as we discover
later, there's a contrary use case which doesn't want that.)

The preferred behavior is:

- Locate .progmem1.data at 0x10000

A third way that can be done is by setting the VMA in an output section
used with the original memory model, e.g.:

.flash1 0x10000 :
{  *(.progmem1.data)    /* Page 1 */
} > text

Basically anything works where:

o  .progmemN.data is empty    --> don't care for overlaps
o  .progmemN.data is nonempty --> must be subset of [0xN0000..0xNffff]

For example, .text may overlap .progmemN.data provided it still satisfies the subset constraint. These are just the requirements, independent of whether it is possible to describe them in the script.

And again, notice the effect of -fdata-sections on __flashN section names.
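
The requirement list above could be sketched as follows, under several assumptions: the output section name .flash1 and the symbol name __flash1_check are illustrative, the text region is one large enough to span the pages, and the second pattern reflects the -fdata-sections naming just mentioned. Because the section starts at 0x10000, checking its size is enough to check the subset constraint; an empty section has size 0 and passes trivially.

```ld
SECTIONS
{
  /* ... .text and the other output sections of the script in use ... */

  .flash1 0x10000 :
  {
    *(.progmem1.data)      /* __flash1 objects                     */
    *(.progmem1.data.*)    /* the same, with -fdata-sections       */
  } > text

  /* SIZEOF takes an output section name.  Start 0x10000 plus at most
     0x10000 bytes keeps the data inside 0x10000..0x1ffff.          */
  __flash1_check = ASSERT (SIZEOF (.flash1) <= 0x10000,
                           "__flash1 data does not fit into 0x10000..0x1ffff");
}
```

Whether ld accepts .text running across an empty .flash1 in every version is something to verify by experiment; this sketch does not guarantee it.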

It is the __memx use case which constrains our choice of method for
solving this easy matter.

Currently, .progmem is at the low end and .text atop because it is easy to run code at the high addresses thanks to the linker stubs.

The only thing that is needed is that the stubs are in the first text segment of 16-bit words, i.e. byte address 0x0..0x1ffff, i.e. EIND = 0. It works with .trampolines in other word segments, but then that location must be made explicit in the script and EIND set appropriately in the startup code. AFAIR what doesn't work is shifting .trampolines out of place because .progmem is too big.

- Complain if data exceeds .progmem1.data and enters .progmem2.data

If the __memx use case does blithely overflow page boundaries without
error, so that we can't use separate memory segments and automatic
overflow detection, then the simplest linker script syntax (that I know
of) which achieves that is this assertion placed at the end of the
flash1 output section:

__memx was designed that way (after strong protest from Jan when I objected that __memx would produce too bloaty code). If __memx is needed, very likely code size is no issue. I cannot tell to what degree speed is an issue. Notice __memx can also be used to access RAM.

flash1_test = ASSERT( ABSOLUTE(.) <= 0x20000, "2nd 64k page (__flash1) overflow!");

Even if nothing else is changed to support .progmemN, such assertions are strongly indicated and we should have them, IMHO.

Currently, nobody uses __flashN. *If* someone used __flashN, he would get broken code (users typically don't read the docs and won't write their ld scripts). Because nobody ever complained, the conclusion is nobody ever used __flashN ;-)

Alternatively, the following should do the same, is more explicit, and
can be placed anywhere after the section:

   flash1_size_test = ASSERT( SIZEOF(.flash1) <= 0x10000,
                      "2nd 64k page (__flash1) overflow!");

We just need to suppress these errors if __memx is used, AIUI.

The assertion to .progmemN still applies even if __memx is used.

I am not sure if all usage scenarios can be supported without raising unresolvable conflicts. The compiler just drops data where the user tells it to; one reason is my limited knowledge of ld script capabilities.

Maybe it's even the case that we have to make __flashN and __memx mutually exclusive. Maybe we can have something like -mmodel=foo to get proper checks and code for a specific layout in the case where there is no one-fits-all script and the models are reasonably common.

Problem is that the tool dependency turn-around cycle is slow and will take 1 year or even more...

- Similar for .text

Ah, yes: .text , "1st 64k page ..."

No no.  .text should not be limited.

- Print diagnostics that are comprehensible, i.e. mention the input
sections, that they overlap, and the symbol and object file that
trigger the overlap.

By the time ld is locating, the input sections are not even a dim
memory, IIRC. We would be told of the overlap, and can expect the memory
segment to be named in automatic overlap detection.

Sounds reasonable and helpful. Ditto for trampolines and maybe also ctors / dtors.

In my experience the symbol and file can usually only be inferred from
examining the memory map for an input section which has suddenly grown.
The guff which falls off the end is usually the victim of shoving from
behind. Who but the designer can say which lump of the program is taking
too much room? I look at the map file in these cases, to find what has
suddenly grown obese.

- If .text spans from 0x0 to, say, 0x2aaaa, that's fine provided
.progmem1.data and .progmem2.data are empty.

Oh. That's quite a contrary use case, compared to what precedes. It
constrains our freedom a bit, if we want it all in one default script.
We have to _not_ use a memory segment per page, just expand "text", in
order to stop ld from automatically complaining about page overflow.
Then we can perhaps formulate such a test as:

with_the_lot = ASSERT( SIZEOF(.text) <= 0x30000 &&
                  SIZEOF(.flash1) == 0 &&
                  SIZEOF(.flash2) == 0,
                  "text segment (__memx) overflow!");

If there is such logic, we can express anything we want :-)

First step is to barf with comprehensible diagnostic if any constraint is violated.

Second step is to reduce barfs to a minimum.

That is compatible with separate size assertions for .progmemN.data, but
how to choose whether to allow .progmem.data to silently grow?
(__flash, __flash1 vs __memx) Use an avr-gcc commandline define (-D) to
manually select? Or use non-zero __flash1 to decide?

The __flashN spaces were introduced with __flash and because they are not much
more work than __flash.  I don't know if anybody is using that.

If there were no use case allowing several physical pages to be treated
as one, then all __flashN could be handled as easily as __flash1. It is
only the __memx case, which constrains our choice of linker script
design, IIUC.

Alternative is __memx which implements 24-bit pointers.  Notice that
address-arithmetic is still 16-bit arithmetic.  Using __memx with the code
above, e.g.

... [example elided]

Notice x is in .progmem.data now and an access function is used because hlo8(x)
is not known at compile time.

This means __memx can be regarded as extension of __flash.

Yes, that's the use case which militates against expressing the flash
pages as separate memory segments. With 24-bit pointers, it is legal to
overflow physical pages? If so, the examples above which use a common

Yes. Goal is to have objects that overlap page boundaries and the compiler generates code that can read across a boundary. I.e. you can have a float located at 0xffff..0x10002 and reading shall work out of the box.

text memory segment, but separate flashN output sections, look more
attractive.

Again, it is reasonable to locate .trampolines early, because there
is no limitation for .progmem.data except that it must not exceed
0x7fffff; higher addresses are taken as RAM or I/O locations.

So long as early trampolines are reachable by all their users, such
placement can avoid later surprises when code grows. (Having them move
hither and yon isn't my cup of tea.)

As said, it's already helpful to have a comprehensible diagnostic if .trampolines is pushed across a 17-bit boundary.
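
Such a diagnostic might be sketched with an assertion like the one below. It assumes that the script defines a symbol marking the end of the .trampolines input sections, called __trampolines_end here; if the script in use has no such symbol, treat the name as hypothetical and add one. The symbol name __trampolines_check is likewise illustrative. The bound 0x20000 is the first byte address beyond the 64K-word segment reachable with EIND = 0.

```ld
SECTIONS
{
  /* ... output sections as in the script in use; .text is assumed to
     set __trampolines_end right after *(.trampolines) ...           */

  /* ABSOLUTE() makes the comparison use the final address rather
     than a section-relative offset.                                 */
  __trampolines_check = ASSERT (ABSOLUTE (__trampolines_end) <= 0x20000,
                                ".trampolines pushed beyond byte address 0x1ffff");
}
```

The message text is free-form, so it can name the affected section even though ld itself will not name the object file that caused the growth.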

I would enjoy formulating a linker script to handle the various use
cases. I have not paused tonight to update my avr-gcc, but will do so.
Then I can tweak the latest default script, and test any offering before
inflicting it on the unsuspecting. Some iteration is to be expected, and
correction of any misapprehension on my part would be gratefully
received.

You intend to contribute the script to binutils?

Johann





