Migrate the ARM entry point to `global_asm!`. #383

jrvanwhy · 2022-02-15T22:13:27Z

Building libtock_runtime no longer requires an external toolchain.

One instruction assembles differently after this PR: `mov r5, r0` is now encoded as `mov r5, r0` (`4605`) rather than `adds r5, r0, #0` (`1c05`). I don't anticipate any issues from this change, but I have no way to test it, so help testing it would be appreciated.


jrvanwhy · 2022-02-15T22:35:56Z

D'oh, I should've looked at my own comparison a bit harder before asking for help! When I compared the assembly using vimdiff, it was hard to see the difference, but now that I've lined them up side-by-side the issue is much more apparent.

The source code uses the `add`/`mov`/`sub` mnemonics. Each of these mnemonics can represent multiple instructions of differing widths. The external toolchain (GCC) always picked the smallest instruction for each mnemonic, whereas the Rust toolchain (LLVM) always picked a 4-byte instruction. Manually switching those instructions to `adds`/`movs`/`subs` generates almost identical code to what GCC generates.
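To make the width difference concrete, here is a minimal sketch that classifies Thumb encodings by their first halfword. It relies on the Thumb-2 encoding rule that a first halfword whose top five bits are `0b11101`, `0b11110`, or `0b11111` starts a 32-bit instruction, and anything else is a 16-bit instruction. The `thumb_width` helper is a hypothetical name used for illustration only and is not part of `libtock_runtime`; the sample halfwords are copied from the disassembly comparison in this PR.

```rust
/// Returns the width in bytes of a Thumb instruction, given its first halfword.
/// Hypothetical helper for illustration; not part of libtock_runtime.
fn thumb_width(first_halfword: u16) -> usize {
    // Bits [15:11] of the first halfword select between 16-bit and 32-bit encodings.
    match first_halfword >> 11 {
        0b11101 | 0b11110 | 0b11111 => 4,
        _ => 2,
    }
}

fn main() {
    // (disassembly text, first halfword) pairs taken from the comparison in this PR.
    let samples = [
        ("adds  r0, #3", 0x3003u16),
        ("add.w r0, r0, #3", 0xf100),
        ("movs  r0, #8", 0x2008),
        ("mov.w r0, #8", 0xf04f),
        ("subs  r0, #4", 0x3804),
        ("sub.w r0, r0, #4", 0xf1a0),
    ];
    for (text, halfword) in samples {
        println!("{:<18} {} bytes", text, thumb_width(halfword));
    }
}
```

The flag-setting forms (`adds`/`movs`/`subs`) are reported as 2 bytes each, while the `.w` forms that LLVM selected are reported as 4 bytes each.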

I'll push a new commit containing the fix, and remove the request for help from the PR description. I think the request for help should be archived for future reference, so I've copied it into this comment:

Archived request for help

This generates significantly different code for the entry point than the old
implementation. I don't have an ARM board to test with, or enough knowledge to
understand why the difference exists, so I need help testing/debugging this.

Disassembly comparison

Here is a comparison of the disassembly of the entry point before and after this PR:

30068 <start>:
+-------------------------------------------+ +-------------------------------------------+
| External                                  | | global_asm!                               |
+-------------------------------------------+ +-------------------------------------------+
30068: 467c      mov   r4, pc                 30068: 467c      mov   r4, pc
3006a: 1c05      adds  r5, r0, #0             3006a: 4605      mov   r5, r0
3006c: 6828      ldr   r0, [r5, #0]           3006c: 6828      ldr   r0, [r5, #0]
3006e: 3003      adds  r0, #3                 3006e: f100 0003 add.w r0, r0, #3
30070: 42a0      cmp   r0, r4                 30072: 42a0      cmp   r0, r4
30072: d005      beq.n 30080 <start+0x18>     30074: d009      beq.n 3008a <start+0x22>
30074: 2008      movs  r0, #8                 30076: f04f 0008 mov.w r0, #8
30076: 2101      movs  r1, #1                 3007a: f04f 0101 mov.w r1, #1
30078: 2202      movs  r2, #2                 3007e: f04f 0202 mov.w r2, #2
3007a: df02      svc   2                      30082: df02      svc   2
3007c: 2000      movs  r0, #0                 30084: f04f 0000 mov.w r0, #0
3007e: df06      svc   6                      30088: df06      svc   6
30080: 2000      movs  r0, #0                 3008a: f04f 0000 mov.w r0, #0
30082: 6869      ldr   r1, [r5, #4]           3008e: 6869      ldr   r1, [r5, #4]
30084: df05      svc   5                      30090: df05      svc   5
30086: 68a8      ldr   r0, [r5, #8]           30092: 68a8      ldr   r0, [r5, #8]
30088: 4685      mov   sp, r0                 30094: 4685      mov   sp, r0
3008a: 68e8      ldr   r0, [r5, #12]          30096: 68e8      ldr   r0, [r5, #12]
3008c: b140      cbz   r0, 300a0 <start+0x38> 30098: b158      cbz   r0, 300b2 <start+0x4a>
3008e: 6929      ldr   r1, [r5, #16]          3009a: 6929      ldr   r1, [r5, #16]
30090: 696a      ldr   r2, [r5, #20]          3009c: 696a      ldr   r2, [r5, #20]
30092: 680b      ldr   r3, [r1, #0]           3009e: 680b      ldr   r3, [r1, #0]
30094: 6013      str   r3, [r2, #0]           300a0: 6013      str   r3, [r2, #0]
30096: 3804      subs  r0, #4                 300a2: f1a0 0004 sub.w r0, r0, #4
30098: 3104      adds  r1, #4                 300a6: f101 0104 add.w r1, r1, #4
3009a: 3204      adds  r2, #4                 300aa: f102 0204 add.w r2, r2, #4
3009c: 2800      cmp   r0, #0                 300ae: 2800      cmp   r0, #0
3009e: d1f8      bne.n 30092 <start+0x2a>     300b0: d1f5      bne.n 3009e <start+0x36>
300a0: 69a8      ldr   r0, [r5, #24]          300b2: 69a8      ldr   r0, [r5, #24]
300a2: b130      cbz   r0, 300b2 <start+0x4a> 300b4: b148      cbz   r0, 300ca <start+0x62>
300a4: 69e9      ldr   r1, [r5, #28]          300b6: 69e9      ldr   r1, [r5, #28]
300a6: 2200      movs  r2, #0                 300b8: f04f 0200 mov.w r2, #0
300a8: 700a      strb  r2, [r1, #0]           300bc: 700a      strb  r2, [r1, #0]
300aa: 3801      subs  r0, #1                 300be: f1a0 0001 sub.w r0, r0, #1
300ac: 3101      adds  r1, #1                 300c2: f101 0101 add.w r1, r1, #1
300ae: 2800      cmp   r0, #0                 300c6: 2800      cmp   r0, #0
300b0: d1fa      bne.n 300a8 <start+0x40>     300c8: d1f8      bne.n 300bc <start+0x54>
300b2: f000 f814 bl    300de <rust_start>     300ca: f000 f814 bl    300f6 <rust_start>

LLVM doesn't automatically pick the smallest instructions for the `add`/`mov`/`sub` mnemonics like GCC did, so moving to `global_asm!` made `start` larger. This change explicitly names the smaller instructions (`adds`/`movs`/`subs`) and generates code of identical size to the GCC toolchain. One difference remains between the external assembly and this `global_asm!` implementation: LLVM translates `mov r5, r0` into `mov r5, r0`, whereas GCC translated it into `adds r5, r0, #0`.
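As a rough check of how much larger `start` became before the fix, the sizes can be worked out from the addresses in the disassembly comparison. The sketch below assumes the listing covers the region from the `start` label through the final `bl` to `rust_start`, which is a 4-byte instruction in both columns; `region_size` is a hypothetical helper introduced only for this calculation.

```rust
/// Size in bytes of a code region, given the address of its first instruction
/// and the address and width of its last instruction.
/// Hypothetical helper for illustration only.
fn region_size(first_addr: u32, last_addr: u32, last_width: u32) -> u32 {
    last_addr + last_width - first_addr
}

fn main() {
    // Addresses copied from the disassembly comparison in this PR.
    let external = region_size(0x30068, 0x300b2, 4);
    let global_asm = region_size(0x30068, 0x300ca, 4);

    println!("external toolchain:        {} bytes", external);
    println!("global_asm! (before fix):  {} bytes", global_asm);
    println!("growth:                    {} bytes", global_asm - external);
}
```

Under these assumptions the listed region is 78 bytes with the external toolchain and 102 bytes with the unfixed `global_asm!` version. The 24-byte growth matches the twelve `add.w`/`mov.w`/`sub.w` instructions in the right-hand column, each 2 bytes wider than its counterpart on the left.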

hudson-ayers · 2022-02-16T00:02:27Z

I have not tested this code yet, but the changes look good

jrvanwhy · 2022-02-16T21:38:55Z

bors r+

This PR and #381 will conflict because #381 will need to modify the startup assembly. Merging this now so the assembly can be fixed in #381 without conflicts.

bors · 2022-02-16T21:52:51Z

Build succeeded:

ci
size-diff

jrvanwhy added the help wanted and significant labels, and later removed the help wanted label Feb 15, 2022

jrvanwhy marked this pull request as ready for review February 15, 2022 22:46

hudson-ayers approved these changes Feb 15, 2022


alistair23 approved these changes Feb 16, 2022


alexandruradovici mentioned this pull request Feb 16, 2022

Add ARM v6 syscalls and Raspberry Pi Pico / Nano Rp2040 Connect #381

Merged

bors bot merged commit 5d4c4f9 into tock:master Feb 16, 2022

jrvanwhy deleted the arm-global_asm branch June 1, 2022 22:09
