[ASM] A (short) Treatise On Longcalling

We’ve all needed more space than a routine has room for, which means branching out to free space. From what I’ve heard from @CT075, the usual approach is something like this:

Offset - Code

00 (code)
02 ldr r1, constant
04 bx r1
06 nop
08 cons-
0A -tant 

Then at the end of our custom code, just

bx lr

But this is inflexible: what if we just needed to replace part of a routine? Do a calculation, then carry on with the rest. I ended up doing something like this:

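(code)
push {r1-r2}    @save the registers we are about to clobber
ldr r1, constant    @r1 = address of our custom code
mov r2, pc    @r2 = address of this instruction + 4
add r2, #0x5    @r2 = address of the pop below, with the Thumb bit set
mov lr, r2    @so the custom code's bx lr comes back to the pop
mov pc, r1 @or bx r1
pop {r1-r2}    @custom code returns here
b stuff_after    @skip over the constant
.align
.long constant
stuff_after: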

Which gets the job done but is sooo horribly inelegant.

Then I realized the answer was under our noses the entire time – look at the way the game handles dynamic functions.

080386B4 9909     ldr     r1,[sp,#0x24]
080386B6 F087FACB bl      #0x80BFC50

What is at 0x80BFC50? Why, it’s very simple…

080BFC4C 4700     bx      r0
080BFC4E 46C0     nop
080BFC50 4708     bx      r1    @this is the line we branched to
080BFC52 46C0     nop
080BFC54 4710     bx      r2
080BFC56 46C0     nop
080BFC58 4718     bx      r3
080BFC5A 46C0     nop

That’s right: the game uses bl to set the link register, then bx to jump wherever it needs to. It combines the returnability of bl with the longcalling of bx. It also allows for dynamic calling, since the target is simply whatever is in the register at runtime.
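To make the mechanics concrete, here is the same call traced step by step. The addresses come from the disassembly above; the routine that r1 points to, and the bx lr at its end, are assumptions for illustration, since the disassembly doesn't show where r1 ends up pointing.

```asm
@ Before the call: r1 = address of the target routine (assumed Thumb, so bit 0 is set)
080386B6  bl  #0x80BFC50   @ lr = 0x080386BB: next instruction is at 0x080386BA, plus the Thumb bit
080BFC50  bx  r1           @ jump to the address in r1; lr is left untouched
@ ... target routine runs (hypothetical) ...
          bx  lr           @ target returns to 0x080386BA, right after the bl
```

The bl is 4 bytes long, which is why the return address is 0x080386BA rather than 0x080386B8.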

So to implement this in our own code, we might change code that’s like

(code)
(where we want our calcs to take place)
(more code)

into something like

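(code)
ldr r1, constant    @r1 = address of our custom code
bl goto_r1    @lr = address of the b below (with the Thumb bit set)
b more_code    @custom code returns here with bx lr; skip over the data
goto_r1:
bx r1    @This may possibly go after the constant depending on word-alignment
.long constant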
more_code:
(more code)

And this is the real way to insert more calculations in the middle of a routine. Even more elegantly, if the routine you’re inserting into ends in a bx r0 or bx r1, you can set a label there and bl to that.
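Spelled out with explicit labels, the pattern might look like the sketch below, in GNU as syntax. The label custom_routine_ptr and the address 0x08B00000 are hypothetical; substitute wherever your code actually lives. The sketch also assumes the host routine has already saved lr (bl overwrites it) and that r1 is free to clobber at this point.

```asm
    .text
    .thumb
    @ (code)
    ldr   r1, custom_routine_ptr   @ r1 = address of the custom routine
    bl    goto_r1                  @ lr = address of the b below, Thumb bit set
    b     more_code                @ custom routine returns here; skip stub and data
goto_r1:
    bx    r1                       @ long jump; lr still points just after the bl
    .align 2                       @ pad to a 4-byte boundary for the pc-relative ldr
custom_routine_ptr:
    .long 0x08B00001               @ hypothetical routine at 0x08B00000, +1 for Thumb
more_code:
    @ (more code)
```

The custom routine itself just does its calculations and ends with bx lr. The hook costs 10 bytes of code plus the 4-byte pointer, with up to 2 bytes of padding depending on where it lands.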

if you’re longcalling from inside the code section of the ROM you may as well just use the built-in function

the thing you cited me as the source for only applies if you’re replacing the entire subroutine, in which case it’s more efficient cycle-wise to hack that function to instead longcall you

for that matter, if the initial subroutine loads a big constant at a convenient spot, you can replace that constant with a pointer to your routine, hack in a bx to that, then bx back to the (fixed) location of the rest of the routine (which is the correct solution if you only need to slightly modify a value that is only known at runtime)

there isn’t a “this is always the correct solution” implementation, it’s all situational anyway

your initial solution isn’t as bad as you think it is; it can actually be compressed and then macro’d, which is what hextator does

Right, if you’re replacing the whole thing there’s no reason not to use the first, but it does end up being a bit inflexible, and if you want to replace even most of it, say 2/3, you’d usually end up copying the whole routine because of jumps and such.

But I think the second solution shouldn’t be used: it’s hard to understand at first glance, and, well, I am going to preach about readable assembly code. It takes up about the same amount of space as the bl solution, which is more robust anyway, so I do see it as worse.

I’ve done this before, and again my complaint is robustness. It’s really difficult to build on if someone else wants to make an expansion to your code, or to the original, that is compatible with yours, as opposed to if you change only the relevant calculations.

Yeah, agreed. But I get too lazy to look up where those are sometimes :P. Also unfortunately not viable if you’re longcalling from inside your custom routine outside the code section, but there, space is less of a concern.

it hadn’t occurred to me to use a call to adjust the LR (I’m assuming it uses significantly fewer bytes since there’s no pushing/popping/adr pc to target the return location) but I do make my code call itself when I want to get the current PC in some x86 code so

I guess

it’s a shame I didn’t run into that problem with x86 until after I had considered this similar issue with Thumb “solved”

I do not recommend moving stuff into pc though, use bx

e: I looked it over and it saves 4 bytes, not bad