
Migrate the ARM entry point to global_asm!. #383

Merged: 2 commits, Feb 16, 2022


@jrvanwhy (Collaborator) commented Feb 15, 2022

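Building `libtock_runtime` no longer requires an external toolchain.

There is one instruction that assembles differently after this PR: `mov r5, r0` now assembles into `mov r5, r0` rather than `adds r5, r0, #0`. Both are 2-byte instructions. The only behavioral difference is that `adds` updates the condition flags, and in `start` those flags are overwritten by the following `adds r0, #3` and `cmp r0, r4` before any branch reads them. I don't anticipate any issues from this change, but I don't have any way to test it, so help testing it would be appreciated.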

@jrvanwhy added the "help wanted" and "significant" labels and removed the "help wanted" label on Feb 15, 2022
@jrvanwhy (Collaborator, Author) commented Feb 15, 2022

D'oh, I should've looked at my own comparison a bit harder before asking for help! When I compared the assembly using vimdiff, the difference was hard to see, but now that I've lined the two versions up side by side, the issue is much more apparent.

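The source code uses the `add`/`mov`/`sub` mnemonics with immediate operands. These mnemonics can represent multiple instructions of differing widths. In Thumb-2 unified syntax, the 2-byte encodings of these immediate forms always update the condition flags (outside an IT block they are `adds`/`movs`/`subs`). As a result, the non-flag-setting `add`/`mov`/`sub` can only be encoded as 4-byte instructions (`add.w`/`mov.w`/`sub.w`).

The external toolchain (GCC) emitted the 2-byte forms for those mnemonics, whereas the Rust toolchain (LLVM) emitted the 4-byte forms. GCC's behavior is presumably because GNU `as` was assembling the file in its default divided syntax, where these mnemonics denote the flag-setting Thumb-1 instructions. Manually switching those instructions to `adds`/`movs`/`subs` generates almost identical code to what GCC generates.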
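The width rule can be captured in a minimal Rust model, assuming code outside IT blocks and looking only at the low-register immediate forms used in `start`. The names `Op`, `imm_insn_size`, `WIDENED` and `bytes_saved` are made up for this sketch.

Applied to the twelve instructions that appear as `add.w`/`mov.w`/`sub.w` in the `global_asm!` column of the disassembly comparison, it accounts for the 24-byte shift of the final `bl` (0x300b2 vs. 0x300ca).

```rust
/// Mnemonic families used with immediates in the entry point's source.
#[derive(Clone, Copy, Debug)]
enum Op {
    Add,
    Sub,
    Mov,
}

/// Simplified size model (in bytes) for `add`/`sub`/`mov` with an immediate
/// operand in Thumb-2 unified syntax, outside an IT block. Only low registers
/// (r0-r7) are considered for 16-bit forms; SP-relative forms and IT blocks
/// are deliberately not modeled.
fn imm_insn_size(op: Op, sets_flags: bool, rd: u8, rn: u8, imm: u32) -> u32 {
    let low = |r: u8| r < 8;
    // Outside an IT block every 16-bit immediate form of these instructions
    // sets the flags, so a non-flag-setting mnemonic needs a 32-bit encoding.
    if !sets_flags {
        return 4;
    }
    let has_16_bit_form = match op {
        // MOVS Rd, #imm8
        Op::Mov => low(rd) && imm <= 0xff,
        // ADDS/SUBS Rdn, #imm8 (Rd == Rn) or ADDS/SUBS Rd, Rn, #imm3
        Op::Add | Op::Sub => low(rd) && low(rn) && ((rd == rn && imm <= 0xff) || imm <= 7),
    };
    if has_16_bit_form {
        2
    } else {
        4
    }
}

/// Immediate instructions in `start` that LLVM widened, as (op, rd, rn, imm),
/// read off the disassembly comparison (rn = rd where the source omits it).
const WIDENED: [(Op, u8, u8, u32); 12] = [
    (Op::Add, 0, 0, 3),
    (Op::Mov, 0, 0, 8),
    (Op::Mov, 1, 1, 1),
    (Op::Mov, 2, 2, 2),
    (Op::Mov, 0, 0, 0),
    (Op::Mov, 0, 0, 0),
    (Op::Sub, 0, 0, 4),
    (Op::Add, 1, 1, 4),
    (Op::Add, 2, 2, 4),
    (Op::Mov, 2, 2, 0),
    (Op::Sub, 0, 0, 1),
    (Op::Add, 1, 1, 1),
];

/// Bytes saved by writing `adds`/`movs`/`subs` instead of `add`/`mov`/`sub`.
fn bytes_saved() -> u32 {
    WIDENED
        .iter()
        .map(|&(op, rd, rn, imm)| {
            imm_insn_size(op, false, rd, rn, imm) - imm_insn_size(op, true, rd, rn, imm)
        })
        .sum()
}

fn main() {
    println!("bytes saved by the flag-setting forms: {}", bytes_saved());
}
```

The model also shows why the fix cannot be applied blindly: it only works because `start` never relies on these instructions leaving the flags untouched.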

I'll push a new commit containing the fix, and remove the request for help from the PR description. I think the request for help should be archived for future reference, so I've copied it into this comment:

Archived request for help

This generates significantly different code for the entry point than the old
implementation. I don't have an ARM board to test with, or enough knowledge to
understand why the difference exists, so I need help testing/debugging this.

Disassembly comparison

Here is a comparison of the disassembly of the entry point before this PR ("External", assembled by GCC) and after it ("global_asm!", assembled by LLVM):

```
30068 <start>:
+-------------------------------------------+ +-------------------------------------------+
| External                                  | | global_asm!                               |
+-------------------------------------------+ +-------------------------------------------+
30068: 467c      mov   r4, pc                 30068: 467c      mov   r4, pc
3006a: 1c05      adds  r5, r0, #0             3006a: 4605      mov   r5, r0
3006c: 6828      ldr   r0, [r5, #0]           3006c: 6828      ldr   r0, [r5, #0]
3006e: 3003      adds  r0, #3                 3006e: f100 0003 add.w r0, r0, #3
30070: 42a0      cmp   r0, r4                 30072: 42a0      cmp   r0, r4
30072: d005      beq.n 30080 <start+0x18>     30074: d009      beq.n 3008a <start+0x22>
30074: 2008      movs  r0, #8                 30076: f04f 0008 mov.w r0, #8
30076: 2101      movs  r1, #1                 3007a: f04f 0101 mov.w r1, #1
30078: 2202      movs  r2, #2                 3007e: f04f 0202 mov.w r2, #2
3007a: df02      svc   2                      30082: df02      svc   2
3007c: 2000      movs  r0, #0                 30084: f04f 0000 mov.w r0, #0
3007e: df06      svc   6                      30088: df06      svc   6
30080: 2000      movs  r0, #0                 3008a: f04f 0000 mov.w r0, #0
30082: 6869      ldr   r1, [r5, #4]           3008e: 6869      ldr   r1, [r5, #4]
30084: df05      svc   5                      30090: df05      svc   5
30086: 68a8      ldr   r0, [r5, #8]           30092: 68a8      ldr   r0, [r5, #8]
30088: 4685      mov   sp, r0                 30094: 4685      mov   sp, r0
3008a: 68e8      ldr   r0, [r5, #12]          30096: 68e8      ldr   r0, [r5, #12]
3008c: b140      cbz   r0, 300a0 <start+0x38> 30098: b158      cbz   r0, 300b2 <start+0x4a>
3008e: 6929      ldr   r1, [r5, #16]          3009a: 6929      ldr   r1, [r5, #16]
30090: 696a      ldr   r2, [r5, #20]          3009c: 696a      ldr   r2, [r5, #20]
30092: 680b      ldr   r3, [r1, #0]           3009e: 680b      ldr   r3, [r1, #0]
30094: 6013      str   r3, [r2, #0]           300a0: 6013      str   r3, [r2, #0]
30096: 3804      subs  r0, #4                 300a2: f1a0 0004 sub.w r0, r0, #4
30098: 3104      adds  r1, #4                 300a6: f101 0104 add.w r1, r1, #4
3009a: 3204      adds  r2, #4                 300aa: f102 0204 add.w r2, r2, #4
3009c: 2800      cmp   r0, #0                 300ae: 2800      cmp   r0, #0
3009e: d1f8      bne.n 30092 <start+0x2a>     300b0: d1f5      bne.n 3009e <start+0x36>
300a0: 69a8      ldr   r0, [r5, #24]          300b2: 69a8      ldr   r0, [r5, #24]
300a2: b130      cbz   r0, 300b2 <start+0x4a> 300b4: b148      cbz   r0, 300ca <start+0x62>
300a4: 69e9      ldr   r1, [r5, #28]          300b6: 69e9      ldr   r1, [r5, #28]
300a6: 2200      movs  r2, #0                 300b8: f04f 0200 mov.w r2, #0
300a8: 700a      strb  r2, [r1, #0]           300bc: 700a      strb  r2, [r1, #0]
300aa: 3801      subs  r0, #1                 300be: f1a0 0001 sub.w r0, r0, #1
300ac: 3101      adds  r1, #1                 300c2: f101 0101 add.w r1, r1, #1
300ae: 2800      cmp   r0, #0                 300c6: 2800      cmp   r0, #0
300b0: d1fa      bne.n 300a8 <start+0x40>     300c8: d1f8      bne.n 300bc <start+0x54>
300b2: f000 f814 bl    300de <rust_start>     300ca: f000 f814 bl    300f6 <rust_start>
```
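To make the two columns easier to follow, here is a host-side Rust model of what `start` does, reconstructed from the disassembly alone. The `RtHeader` field names and the `Memory`/`start_model` helpers are hypothetical; only the field offsets come from the `ldr rX, [r5, #offset]` instructions.

The location check assumes that the first header word holds `start`'s address with the Thumb bit set. On that reading, 0x30068 | 1 = 0x30069, and adding 3 gives 0x3006c, which is what `mov r4, pc` reads at 0x30068 (the instruction's address plus 4). System calls are only noted in comments (`svc 2`, `svc 5` and `svc 6` are Command, Memop and Exit in Tock's system call ABI).

```rust
/// Hypothetical mirror of the header `start` receives in r0: one field per
/// `ldr rX, [r5, #offset]` in the disassembly. Names are illustrative only.
#[allow(dead_code)] // initial_break is documented but not used by this model
struct RtHeader {
    start: u32,            // [r5, #0]: compared against pc in the location check
    initial_break: u32,    // [r5, #4]: passed to `svc 5` (memop) with r0 = 0
    stack_top: u32,        // [r5, #8]: moved into sp
    data_size: u32,        // [r5, #12]: bytes to copy, one word per iteration
    data_flash_start: u32, // [r5, #16]: copy source
    data_ram_start: u32,   // [r5, #20]: copy destination
    bss_size: u32,         // [r5, #24]: bytes to zero, one byte per iteration
    bss_start: u32,        // [r5, #28]: first byte to zero
}

/// Flat little-endian memory; an "address" is an index into `bytes`.
struct Memory {
    bytes: Vec<u8>,
}

impl Memory {
    fn load_word(&self, addr: u32) -> u32 {
        let a = addr as usize;
        let b = &self.bytes;
        u32::from_le_bytes([b[a], b[a + 1], b[a + 2], b[a + 3]])
    }

    fn store_word(&mut self, addr: u32, value: u32) {
        let a = addr as usize;
        self.bytes[a..a + 4].copy_from_slice(&value.to_le_bytes());
    }
}

/// Why the model stopped before reaching `rust_start`.
#[derive(Debug, PartialEq)]
enum StartError {
    /// The header's `start` field disagrees with where the code is running.
    WrongLocation,
}

/// Models the body of `start`. `pc_read` is the value `mov r4, pc` yields.
/// On success, returns the stack pointer `rust_start` would be entered with.
fn start_model(hdr: &RtHeader, pc_read: u32, mem: &mut Memory) -> Result<u32, StartError> {
    // ldr r0, [r5, #0]; adds r0, #3; cmp r0, r4; beq
    if hdr.start.wrapping_add(3) != pc_read {
        // The real code makes `svc 2` (Command) and `svc 6` (Exit) calls here.
        return Err(StartError::WrongLocation);
    }
    // The `svc 5` memop (r0 = 0, r1 = initial_break) is not modeled.
    let sp = hdr.stack_top; // ldr r0, [r5, #8]; mov sp, r0

    // .data: `cbz` skips an empty section, then a do-while copies one word
    // per iteration. Like the assembly, this assumes a multiple of 4 bytes.
    let (mut n, mut src, mut dst) = (hdr.data_size, hdr.data_flash_start, hdr.data_ram_start);
    while n != 0 {
        let word = mem.load_word(src);
        mem.store_word(dst, word);
        n -= 4;
        src += 4;
        dst += 4;
    }

    // .bss: same loop shape, zeroing one byte per iteration.
    let (mut n, mut p) = (hdr.bss_size, hdr.bss_start);
    while n != 0 {
        mem.bytes[p as usize] = 0;
        n -= 1;
        p += 1;
    }
    Ok(sp) // followed by `bl rust_start`
}

fn main() {
    let mut mem = Memory { bytes: vec![0xaa; 64] };
    // Pretend the initial .data contents live in "flash" at offset 0.
    mem.bytes[..8].copy_from_slice(&[1, 2, 3, 4, 5, 6, 7, 8]);
    let hdr = RtHeader {
        start: 0x30069, // 0x30068 with the Thumb bit set
        initial_break: 0,
        stack_top: 0x2000_1000,
        data_size: 8,
        data_flash_start: 0,
        data_ram_start: 16,
        bss_size: 5,
        bss_start: 32,
    };
    // `mov r4, pc` at 0x30068 reads 0x30068 + 4.
    let sp = start_model(&hdr, 0x3006c, &mut mem).expect("location check failed");
    println!("sp = {sp:#x}, .data = {:?}, .bss = {:?}", &mem.bytes[16..24], &mem.bytes[32..37]);
}
```

Modeling each `cbz` + `bne` pair as a `while n != 0` loop is equivalent because `cbz` rejects a zero count before the first iteration.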

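LLVM doesn't use the 2-byte encodings for `add`/`mov`/`sub` with immediate operands like GCC did. In unified syntax those 2-byte encodings are the flag-setting `adds`/`movs`/`subs`, so LLVM emits the 4-byte `add.w`/`mov.w`/`sub.w` instead. This means that moving to `global_asm!` made `start` larger. This change explicitly names the smaller, flag-setting instructions, and generates code of identical size to what the GCC toolchain produced.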
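As a cross-check that naming `adds`/`movs`/`subs` selects the same 2-byte encodings GCC produced, here is a sketch of encoders for just the 16-bit Thumb forms that appear in `start`. The bit layouts come from the Thumb encoding tables (MOVS T1, ADDS/SUBS T1/T2, MOV register T1). The function names are made up for this sketch; the expected values are the GCC-column opcodes from the disassembly comparison.

```rust
// Encoders for the 16-bit Thumb instructions used in `start`. Register
// arguments are register numbers (r0 = 0, sp = 13, pc = 15).

/// MOVS Rd, #imm8 (T1): 00100 ddd iiiiiiii
fn movs_imm8(rd: u16, imm8: u16) -> u16 {
    assert!(rd < 8 && imm8 <= 0xff);
    0x2000 | (rd << 8) | imm8
}

/// ADDS Rdn, #imm8 (T2): 00110 ddd iiiiiiii
fn adds_imm8(rdn: u16, imm8: u16) -> u16 {
    assert!(rdn < 8 && imm8 <= 0xff);
    0x3000 | (rdn << 8) | imm8
}

/// SUBS Rdn, #imm8 (T2): 00111 ddd iiiiiiii
fn subs_imm8(rdn: u16, imm8: u16) -> u16 {
    assert!(rdn < 8 && imm8 <= 0xff);
    0x3800 | (rdn << 8) | imm8
}

/// ADDS Rd, Rn, #imm3 (T1): 0001110 iii nnn ddd
fn adds_imm3(rd: u16, rn: u16, imm3: u16) -> u16 {
    assert!(rd < 8 && rn < 8 && imm3 <= 7);
    0x1c00 | (imm3 << 6) | (rn << 3) | rd
}

/// MOV Rd, Rm (T1, does not set flags): 01000110 D mmmm ddd,
/// where D is bit 3 of Rd so that high registers such as sp can be named.
fn mov_reg(rd: u16, rm: u16) -> u16 {
    assert!(rd < 16 && rm < 16);
    0x4600 | ((rd & 8) << 4) | (rm << 3) | (rd & 7)
}

fn main() {
    // A few GCC-column opcodes, reproduced by the encoders above.
    for (text, word) in [
        ("movs r0, #8", movs_imm8(0, 8)),
        ("adds r0, #3", adds_imm8(0, 3)),
        ("subs r0, #4", subs_imm8(0, 4)),
        ("adds r5, r0, #0", adds_imm3(5, 0, 0)),
        ("mov r5, r0", mov_reg(5, 0)),
    ] {
        println!("{text:<16} {word:04x}");
    }
}
```

The last two entries show the one remaining difference: GCC's `adds r5, r0, #0` (0x1c05) and LLVM's `mov r5, r0` (0x4605) are different instructions of the same 2-byte size.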

There is one remaining difference between the external assembly and this `global_asm!` implementation: LLVM translates `mov r5, r0` into `mov r5, r0`, whereas GCC translated it into `adds r5, r0, #0`.
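To check that this remaining difference is harmless, the Rust sketch below tracks just the registers and NZCV flags through the location check at the top of `start`. The `Cpu`/`Flags` types and `location_check` are made up for this sketch, and `ldr` is modeled by writing the loaded value directly.

`adds r5, r0, #0` updates the flags while `mov r5, r0` does not. However, both `adds r0, #3` and `cmp r0, r4` overwrite all four flags before `beq` reads them, so the two translations should reach the branch in the same state.

```rust
/// The four condition flags.
#[derive(Clone, Copy, Debug, PartialEq)]
struct Flags {
    n: bool,
    z: bool,
    c: bool,
    v: bool,
}

/// Just enough CPU state for the location check: registers and flags.
struct Cpu {
    r: [u32; 16],
    flags: Flags,
}

impl Cpu {
    /// ADDS Rd, Rn, #imm: writes Rd and all four flags.
    fn adds_imm(&mut self, rd: usize, rn: usize, imm: u32) {
        let a = self.r[rn];
        let (res, carry) = a.overflowing_add(imm);
        // Signed overflow: both inputs share a sign that the result lacks.
        let v = ((a ^ res) & (imm ^ res)) >> 31 == 1;
        self.r[rd] = res;
        self.flags = Flags { n: res >> 31 == 1, z: res == 0, c: carry, v };
    }

    /// MOV Rd, Rm (16-bit register form): writes Rd, leaves the flags alone.
    fn mov_reg(&mut self, rd: usize, rm: usize) {
        self.r[rd] = self.r[rm];
    }

    /// CMP Rn, Rm: sets all four flags from Rn - Rm (C means "no borrow").
    fn cmp_reg(&mut self, rn: usize, rm: usize) {
        let (a, b) = (self.r[rn], self.r[rm]);
        let (res, borrow) = a.overflowing_sub(b);
        let v = ((a ^ b) & (a ^ res)) >> 31 == 1;
        self.flags = Flags { n: res >> 31 == 1, z: res == 0, c: !borrow, v };
    }
}

/// Runs `mov r4, pc` through `cmp r0, r4` and returns (r5, flags seen by
/// `beq`). `gcc` selects `adds r5, r0, #0` instead of `mov r5, r0`.
fn location_check(gcc: bool, initial: Flags, r0: u32, header_word0: u32, pc_read: u32) -> (u32, Flags) {
    let mut cpu = Cpu { r: [0; 16], flags: initial };
    cpu.r[0] = r0; // r0 on entry, used as the header pointer
    cpu.r[4] = pc_read; // mov r4, pc (no flags)
    if gcc {
        cpu.adds_imm(5, 0, 0); // adds r5, r0, #0
    } else {
        cpu.mov_reg(5, 0); // mov r5, r0
    }
    cpu.r[0] = header_word0; // ldr r0, [r5, #0] (no flags)
    cpu.adds_imm(0, 0, 3); // adds r0, #3
    cpu.cmp_reg(0, 4); // cmp r0, r4
    (cpu.r[5], cpu.flags)
}

fn main() {
    let all_set = Flags { n: true, z: true, c: true, v: true };
    let gcc = location_check(true, all_set, 0x1000, 0x30069, 0x3006c);
    let llvm = location_check(false, all_set, 0x1000, 0x30069, 0x3006c);
    println!("gcc:  {gcc:?}\nllvm: {llvm:?}\nsame: {}", gcc == llvm);
}
```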
@jrvanwhy marked this pull request as ready for review on February 15, 2022, 22:46
@hudson-ayers (Contributor)

I have not tested this code yet, but the changes look good.

@jrvanwhy (Collaborator, Author)

bors r+

This PR and #381 will conflict because #381 will need to modify the startup assembly. I'm merging this now so that #381 can make its assembly changes on top of it without conflicts.

@bors (Contributor, bot) commented Feb 16, 2022

Build succeeded.

@bors merged commit 5d4c4f9 into tock:master on Feb 16, 2022
@jrvanwhy deleted the arm-global_asm branch on June 1, 2022, 22:09
Labels: significant (indicates a PR is significant as defined by the code review policy)