[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]
Re: [PATCH v2] Add io space MMAP for memory mapping devices and files
From: |
Andreas Klinger |
Subject: |
Re: [PATCH v2] Add io space MMAP for memory mapping devices and files |
Date: |
Tue, 9 Jan 2024 14:37:54 +0100 |
Hi Jose,
"Jose E. Marchesi" <jemarch@gnu.org> schrieb am Mo, 08. Jan 19:55:
>
> > Hi Jose,
> >
> > "Jose E. Marchesi" <jemarch@gnu.org> schrieb am Mo, 08. Jan 18:06:
> >>
> >> > Hi Jose,
> >> >
> >> > "Jose E. Marchesi" <jemarch@gnu.org> schrieb am Mo, 08. Jan 15:08:
> >> >>
> >> >> > Hi Jose,
> >> >> >
> >> >> > "Jose E. Marchesi" <jemarch@gnu.org> schrieb am Mo, 08. Jan 00:03:
> >> >> >>
> >> >> >> > Hi again,
> >> >> >> >
> >> >> >> > Andreas Klinger <ak@it-klinger.de> schrieb am Sa, 06. Jan 20:42:
> >> >> >> >> Hi Jose,
> >> >> >> >>
> >> >> >> >> "Jose E. Marchesi" <jemarch@gnu.org> schrieb am Sa, 06. Jan 20:14:
> >> >> >> >> > > I think that the approach of patch v2 could be optimized by a
> >> >> >> >> > > .set alignment
> >> >> >> >> > > variable for those guys who are aware of such hardware
> >> >> >> >> > > constraints on their
> >> >> >> >> > > architecture and want to optimize this. But I would set the
> >> >> >> >> > > default to a value
> >> >> >> >> > > which works for all, which is the size of the address bus.
> >> >> >> >> >
> >> >> >> >> > I think it is perfectly ok to have IOD specific configuration
> >> >> >> >> > parameters, sure.
> >> >> >> >> >
> >> >> >> >> > Let's make it a Poke variable that the user can customize, plus
> >> >> >> >> > a
> >> >> >> >> > setting in the registry so the .set dot-command works on it. I
> >> >> >> >> > suppose
> >> >> >> >> > it shall have the name mmap in it, like pk_mmap_alignment.
> >> >> >> >> > WDYT?
> >> >> >> >>
> >> >> >> >> great. It makes sense to name it mmap as it only applies to mmap
> >> >> >> >> io spaces.
> >> >> >> > [...]
> >> >> >> >
> >> >> >> > After thinking about the problem again I came to a different
> >> >> >> > solution which is
> >> >> >> > much simpler, doesn't need an additional .set variable and avoids
> >> >> >> > the signal
> >> >> >> > handler as well as it is copying with the address bus size if
> >> >> >> > possible, thus
> >> >> >> > avoiding performance bottlenecks. The read function is as follows:
> >> >> >> >
> >> >> >> > static int
> >> >> >> > ios_dev_mmap_pread (void *iod, void *buf, size_t count,
> >> >> >> > ios_dev_off offset)
> >> >> >> > {
> >> >> >> > struct ios_dev_mmap *dev_map = iod;
> >> >> >> > int align = sizeof(void*);
> >> >> >> > char *m = buf;
> >> >> >> >
> >> >> >> > if (offset > dev_map->size || count > dev_map->size - offset)
> >> >> >> > return IOD_EOF;
> >> >> >> >
> >> >> >> > /* copy unaligned bytes */
> >> >> >> > while (count && offset % align)
> >> >> >> > {
> >> >> >> > memcpy (m, dev_map->addr + offset, 1);
> >> >> >> > count--;
> >> >> >> > offset++;
> >> >> >> > m++;
> >> >> >> > }
> >> >> >> >
> >> >> >> > /* copy with the address bus size */
> >> >> >> > while (count >= align)
> >> >> >> > {
> >> >> >> > memcpy (m, dev_map->addr + offset, align);
> >> >> >>
> >> >> >> This is still copying a byte at a time, isn't it?
> >> >> >
> >> >> > There is no general answer to this question. It depends on what the C
> >> >> > library
> >> >> > and the compiler are making out of it. Thus it's also dependent on the
> >> >> > architecture and the value of align.
> >> >>
> >> >> Actually, memcpy is guaranteed to work regardless of any alignment
> >> >> requirement restrictions imposed by the architecture targetted by the
> >> >> compiler, so the general answer is that you can always assume that
> >> >> memcpy is equivalent to copying bytes by bytes.
> >> >>
> >> >> The compiler will only optimize a memcpy (foo, bar, 8) into a long word
> >> >> move if it knows that foo and bar are properly aligned according to the
> >> >> rules of the target.
> >> >
> >> > You're right.
> >> >
> >> > But I'm still wondering why I get the SIGBUS (on zynqmp) when doing a
> >> > memcpy of
> >> > 2 bytes from an odd address which is mmaped, like:
> >> >
> >> > memcpy(some-variable, odd-address, 2);
> >>
> >> I have no idea. I guess that's up to the driver providing the mmapped
> >> contents? Is the driver documenting the rules on how to access the
> >> contents?
> >
> > I wrote the driver and I'm doing a usual memory mapping.
>
> Ok then it would be very strange to get SIGBUS when accessing a single
> byte.
>
> >
> > I wrote a function which demonstrates the difference if memcpy is called
> > with a
> > variable a (for alignment) or with a fixed value:
> >
> > #include <string.h>
> >
> > int mycopy(void)
> > {
> > int a = 4;
> > unsigned int x;
> > unsigned char y[4];
> >
> > memcpy(&x, &y[0], 4);
> >
> > memcpy(&x, &y[0], a);
> >
> > return 0;
> > }
> >
> > In the objdump we can see that when called with the fixed value of 4 bytes
> > to be
> > copied there are just 2 instructions and the function memcpy isn't called at
> > all. So the compiler resolves the memcpy without the function call.
>
> That is to be expected, since the compiler knows the addresses of both x
> and y automatics, and therefore their alignment.
see disassemble below.
> > I think this could really be optimized in the code snipped I provided
> > earlier to
> > avoid byte by byte copying.
>
> In the code snippet earlier the address of the buffer where the stuff is
> copied is passed as an argument to th function, so I don't see how the
> compiler could optimize it that way.
>
> Have you tried to disassemble the pread function in the mmap IOD to see
> what the calls to memcpy are compiled to?
Here is the objdump of libpoke created for bbb:
0009c170 <ios_dev_mmap_pread>:
{
9c170: e92d4030 push {r4, r5, lr}
9c174: e59d300c ldr r3, [sp, #12]
9c178: e59de010 ldr lr, [sp, #16]
if (offset > dev_map->size || count > dev_map->size - offset)
9c17c: e5904028 ldr r4, [r0, #40] ; 0x28
9c180: e590c02c ldr ip, [r0, #44] ; 0x2c
9c184: e1540003 cmp r4, r3
9c188: e0dc500e sbcs r5, ip, lr
9c18c: 3a000024 bcc 9c224 <ios_dev_mmap_pread+0xb4>
9c190: e0544003 subs r4, r4, r3
9c194: e0ccc00e sbc ip, ip, lr
9c198: e1540002 cmp r4, r2
9c19c: e2dcc000 sbcs ip, ip, #0
9c1a0: 2a000007 bcs 9c1c4 <ios_dev_mmap_pread+0x54>
return IOD_EOF;
9c1a4: e3e00004 mvn r0, #4
}
9c1a8: e8bd8030 pop {r4, r5, pc}
memcpy (m, dev_map->addr + offset, 1);
9c1ac: e590c030 ldr ip, [r0, #48] ; 0x30
9c1b0: e7dcc003 ldrb ip, [ip, r3]
9c1b4: e4c1c001 strb ip, [r1], #1
count--;
9c1b8: e2422001 sub r2, r2, #1
offset++;
9c1bc: e2933001 adds r3, r3, #1
9c1c0: e2aee000 adc lr, lr, #0
while (count && offset % align)
9c1c4: e3520000 cmp r2, #0
9c1c8: 0a000008 beq 9c1f0 <ios_dev_mmap_pread+0x80>
9c1cc: e3130003 tst r3, #3
9c1d0: 1afffff5 bne 9c1ac <ios_dev_mmap_pread+0x3c>
9c1d4: ea000005 b 9c1f0 <ios_dev_mmap_pread+0x80>
memcpy (m, dev_map->addr + offset, align);
9c1d8: e590c030 ldr ip, [r0, #48] ; 0x30
9c1dc: e79cc003 ldr ip, [ip, r3]
9c1e0: e481c004 str ip, [r1], #4
count -= align;
9c1e4: e2422004 sub r2, r2, #4
offset += align;
9c1e8: e2933004 adds r3, r3, #4
9c1ec: e2aee000 adc lr, lr, #0
while (count >= align)
9c1f0: e3520003 cmp r2, #3
9c1f4: 8afffff7 bhi 9c1d8 <ios_dev_mmap_pread+0x68>
9c1f8: ea000005 b 9c214 <ios_dev_mmap_pread+0xa4>
memcpy (m, dev_map->addr + offset, 1);
9c1fc: e590c030 ldr ip, [r0, #48] ; 0x30
9c200: e7dcc003 ldrb ip, [ip, r3]
9c204: e4c1c001 strb ip, [r1], #1
count--;
9c208: e2422001 sub r2, r2, #1
offset++;
9c20c: e2933001 adds r3, r3, #1
9c210: e2aee000 adc lr, lr, #0
while (count)
9c214: e3520000 cmp r2, #0
9c218: 1afffff7 bne 9c1fc <ios_dev_mmap_pread+0x8c>
return IOD_OK;
9c21c: e3a00000 mov r0, #0
9c220: e8bd8030 pop {r4, r5, pc}
return IOD_EOF;
9c224: e3e00004 mvn r0, #4
9c228: e8bd8030 pop {r4, r5, pc}
The glibc function memcpy is never called but the compiler builtin is used. This
makes it clear where the SIGBUS comes from.
> >
> > Here is the objdump:
> >
> > mycopy.o: file format elf32-littlearm
> >
> > Disassembly of section .text:
> >
> > 00000000 <mycopy>:
> > #include <string.h>
> >
> > int mycopy(void)
> > {
> > 0: e92d4800 push {fp, lr}
> > 4: e28db004 add fp, sp, #4
> > 8: e24dd010 sub sp, sp, #16
> > c: e59f2070 ldr r2, [pc, #112] ; 84 <mycopy+0x84>
> > 10: e08f2002 add r2, pc, r2
> > 14: e59f306c ldr r3, [pc, #108] ; 88 <mycopy+0x88>
> > 18: e7923003 ldr r3, [r2, r3]
> > 1c: e5933000 ldr r3, [r3]
> > 20: e50b3008 str r3, [fp, #-8]
> > 24: e3a03000 mov r3, #0
> > int a = 4;
> > 28: e3a03004 mov r3, #4
> > 2c: e50b3010 str r3, [fp, #-16]
> > 30: e51b300c ldr r3, [fp, #-12]
> > unsigned int x;
> > unsigned char y[4];
> >
> > memcpy(&x, &y[0], 4);
> > 34: e50b3014 str r3, [fp, #-20] ; 0xffffffec
> >
> > memcpy(&x, &y[0], a);
> > 38: e51b2010 ldr r2, [fp, #-16]
> > 3c: e24b100c sub r1, fp, #12
> > 40: e24b3014 sub r3, fp, #20
> > 44: e1a00003 mov r0, r3
> > 48: ebfffffe bl 0 <memcpy>
> >
> > return 0;
> > 4c: e3a03000 mov r3, #0
> > [...]
> >
> >> >> > But yes I need to confess that when looking into the glibc for
> >> >> > example this code
> >> >> > snippet could be optimized to allow larger amounts of data to be
> >> >> > passed to the
> >> >> > memcpy function so that it's up to the C library and compiler to
> >> >> > optimize it the
> >> >> > best.
> >> >> >
> >> >> >
> >> >> >>
> >> >> >> > count -= align;
> >> >> >> > offset += align;
> >> >> >> > m += align;
> >> >> >> > }
> >> >> >> >
> >> >> >> > /* copy remaining unaligned bytes */
> >> >> >> > while (count)
> >> >> >> > {
> >> >> >> > memcpy (m, dev_map->addr + offset, 1);
> >> >> >> > count--;
> >> >> >> > offset++;
> >> >> >> > m++;
> >> >> >> > }
> >> >> >> >
> >> >> >> > return IOD_OK;
> >> >> >> > }
> >> >> >> >
> >> >> >> > The write function is quite similar implemented in the same manner.
> >> >> >> >
> >> >> >> > WDYT?
> >> >> >> >
> >> >> >> > Andreas
> >> >> >
> >> >> > Best regards,
> >> >> >
> >> >> > Andreas
signature.asc
Description: PGP signature
- Re: [PATCH v2] Add io space MMAP for memory mapping devices and files, (continued)
- Re: [PATCH v2] Add io space MMAP for memory mapping devices and files, Jose E. Marchesi, 2024/01/06
- Re: [PATCH v2] Add io space MMAP for memory mapping devices and files, Andreas Klinger, 2024/01/06
- Re: [PATCH v2] Add io space MMAP for memory mapping devices and files, Andreas Klinger, 2024/01/06
- Re: [PATCH v2] Add io space MMAP for memory mapping devices and files, Jose E. Marchesi, 2024/01/07
- Re: [PATCH v2] Add io space MMAP for memory mapping devices and files, Andreas Klinger, 2024/01/08
- Re: [PATCH v2] Add io space MMAP for memory mapping devices and files, Jose E. Marchesi, 2024/01/08
- Re: [PATCH v2] Add io space MMAP for memory mapping devices and files, Andreas Klinger, 2024/01/08
- Re: [PATCH v2] Add io space MMAP for memory mapping devices and files, Jose E. Marchesi, 2024/01/08
- Re: [PATCH v2] Add io space MMAP for memory mapping devices and files, Andreas Klinger, 2024/01/08
- Re: [PATCH v2] Add io space MMAP for memory mapping devices and files, Jose E. Marchesi, 2024/01/08
- Re: [PATCH v2] Add io space MMAP for memory mapping devices and files,
Andreas Klinger <=
- Re: [PATCH v2] Add io space MMAP for memory mapping devices and files, Jose E. Marchesi, 2024/01/09
- Re: [PATCH v2] Add io space MMAP for memory mapping devices and files, Andreas Klinger, 2024/01/09
- Re: [PATCH v2] Add io space MMAP for memory mapping devices and files, Jose E. Marchesi, 2024/01/09
- Re: [PATCH v2] Add io space MMAP for memory mapping devices and files, Jose E. Marchesi, 2024/01/09
- Re: [PATCH v2] Add io space MMAP for memory mapping devices and files, Andreas Klinger, 2024/01/09
- Re: [PATCH v2] Add io space MMAP for memory mapping devices and files, Jose E. Marchesi, 2024/01/09
- Re: [PATCH v2] Add io space MMAP for memory mapping devices and files, Jose E. Marchesi, 2024/01/09
- Re: Re: [PATCH v2] Add io space MMAP for memory mapping devices and files, Mohammad-Reza Nabipoor, 2024/01/07
- Re: Re: [PATCH v2] Add io space MMAP for memory mapping devices and files, Andreas Klinger, 2024/01/08
Re: [PATCH v2] Add io space MMAP for memory mapping devices and files, Jose E. Marchesi, 2024/01/06