ASM Hacking: “Conventions”
This is for people who have a basic understanding of ARM/thumb ASM and its concepts, but want to know more about how to integrate it with the game’s code or other people’s code.
This is also for people who know all of the important stuff but are curious about the details.
This is also for people who want to hack in C (or something similar) and need to know what kind of code to expect from a C compiler when it has to interface with ASM.
This “guide” is here to help you understand the standard ARM ASM conventions.
Sure, this is all kind of common knowledge among hackers, and it doesn’t take more than 10 minutes to explain the broad lines to a newcomer, but I think there is value in going through all of this “again”, especially when it comes to the sometimes more obscure details.
“Calling”?
When I say “calling” or a “call”, I refer to the act of jumping to a (sub-)routine from another (sub-)routine, with the expectation that that subroutine will jump back to where it was jumped from.
For example: in the GetUnitLuck routine (FE8U:08019298; this will be our main example in this document), there’s a “call” to GetUnitEquippedWeapon (FE8U:08016B28). This is done through the bl opcode, which stores the address of the opcode following the bl in lr, so that the routine we’re calling (GetUnitEquippedWeapon) knows how to come back (“return”) to GetUnitLuck (by looking at lr).
If you ever coded in C or another C-like programming language (or any programming language really), you’ll be familiar with the concept of calling a function. This is exactly the same thing: we’re calling a routine, which can be considered as a C function no problem (that’s what part of conventions are all about actually, so we’ll definitely get to that eventually).
The Registers
The first thing we have to understand is the role of each register in assembly code. As you may know, there are 16 of them (r0-r15)[*1], and only 8 of them are accessible most of the time in thumb (r0-r7). Here’s a simple description of what is used for what:
- r0-r3: Those are the “scratch registers”. Between routine calls, you can do whatever you want with them and be fine. They also have special meaning when it comes to calls, which is what this document is all about explaining. Stay tuned.
- r4-r11: Those are the “variable registers”. If you want to use them, you have to make sure you revert them to their original state when you are done with them (usually done through push & pop). This also means that you can count on them being effectively unchanged after a routine call.
- r12/ip: This one is kinda weird: in 99.9% of cases, you can consider it scratch (same as r0-r3). This is good but also kinda meh, as it is the only scratch register you can’t easily access in thumb. The remaining 0.1% of the time, it is the “Intra-Procedure-call scratch register”, which is kinda complicated and I’ll get to it later.
- r13/sp: The “stack pointer”. Points to the top of the stack. It is modified when you use push & pop, and you rarely have to mess with it directly. You have to preserve its value across your routine, which translates to the rule “whatever you push must be popped back, with the same total number of registers”.
- r14/lr: The “link register”. This is where the return location is stored when you use bl.
- r15/pc: The “program counter”. This is where the current[*2] opcode address is stored. Keep in mind: when you write to it, you effectively jump/branch to the address you just wrote.
[*1] Those are actually only the core registers, aka the set of registers visible by most of the instruction set. There’s a few others that are special or are alternative versions of some core ones in other modes.
[*2] It’s not exactly the current opcode, but the one currently being fetched by the cpu, which is 2 opcodes ahead. When you read from pc, you’ll get the address 2 opcodes ahead of the opcode doing the reading (also keep in mind: one bl is 2 opcodes).
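To put numbers on that, here is a minimal sketch in C (not game code; both helper names are made up for this guide) of what reading pc and executing bl give you in thumb, assuming 2-byte opcodes and a 4-byte bl:

```c
#include <assert.h>
#include <stdint.h>

/* Value seen by a thumb opcode located at `addr` when it reads pc:
   two 2-byte opcodes ahead of itself. */
uint32_t ThumbPcRead(uint32_t addr)
{
    assert((addr & 1u) == 0); /* thumb opcodes sit on even addresses */
    return addr + 4;
}

/* Value left in lr by a thumb bl located at `addr`: the address of the
   opcode right after the bl (which is 4 bytes long), with bit 0 set to
   mark the return location as thumb code. */
uint32_t ThumbBlLr(uint32_t addr)
{
    assert((addr & 1u) == 0);
    return (addr + 4) | 1u;
}
```

Going by the GetUnitLuck listing used in this guide (routine at 0x08019298, so its first bl would sit two opcodes in, at 0x0801929C), ThumbBlLr(0x0801929C) gives 0x080192A1.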
The Stack
See also the wikipedia article.
The stack is a part of memory that is used by routines to store local data (data that will only be used during the one routine call). The stack pointer (register r13/sp) contains the address of the top of that part of memory.
The stack on the ARM Architecture is “full-descending”, which means that the stack grows downwards and that sp points at the last value pushed. When you execute push {r4}, the top of the stack (aka sp) goes down (4 bytes), and the value of r4 is stored at that new location. It’s descending.
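If it helps, here is a toy model of that behaviour in C. This is only a sketch: the ToyStack type and its functions are invented for illustration, and sp is a word index into an array rather than a real address (a lower index stands for a lower address).

```c
#include <assert.h>
#include <stdint.h>

enum { STACK_WORDS = 16 };

struct ToyStack {
    uint32_t mem[STACK_WORDS];
    int sp; /* index of the last pushed word ("full" stack) */
};

/* An empty stack: sp sits just past the highest word. */
void ToyInit(struct ToyStack *s)
{
    s->sp = STACK_WORDS;
}

/* push {rX}: move sp down one word first, then store there. */
void ToyPush(struct ToyStack *s, uint32_t value)
{
    assert(s->sp > 0);
    s->sp -= 1;
    s->mem[s->sp] = value;
}

/* pop {rX}: load from where sp points, then move sp back up one word. */
uint32_t ToyPop(struct ToyStack *s)
{
    uint32_t value;
    assert(s->sp < STACK_WORDS);
    value = s->mem[s->sp];
    s->sp += 1;
    return value;
}
```

Pushing one value and popping it back leaves sp exactly where it started, which is the property every routine has to respect.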
Stack Frames
A Stack Frame is the term used for a single subpart of the stack that holds the local data of a single routine. The top of the stack is also the top of the Stack Frame for the currently executing routine (which is “below” the rest of the stack address-wise, since it’s descending, but it’s still the top… still there?).
When a routine is called, the first thing this routine should do (in theory) is allocate its stack frame, which is usually done through one or more push opcodes. When it’s done, it deallocates it (which in the process should revert the stack pointer to its original state), usually with pops.
An example: GetUnitLuck
To illustrate the concept of stack frames, we’ll look at a fairly simple routine (located at address 0x08019298 in FE8U), whose assembly looks like this:
ASM
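GetUnitLuck:
    push {r4, lr}       @ save r4 and the return address
    mov r4, r0          @ keep the argument safe in r4 across the calls
    bl 0x08016B28       @ GetUnitEquippedWeapon
    lsl r0, #0x10
    lsr r0, #0x10       @ keep only the low 16 bits of the result
    bl 0x08016510       @ GetItemLckBonus
    mov r1, r0          @ r1 = bonus
    mov r0, #0x19       @ UnitStruct.lck
    ldsb r0, [r4, r0]   @ r0 = signed byte at unit+0x19
    add r0, r1          @ r0 = lck + bonus
    pop {r4}            @ restore r4
    pop {r1}            @ r1 = saved return address
    bx r1               @ return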
I’m sure you won’t have many issues trying to understand what this routine does (especially with me laying it out cleanly like that), but that isn’t the point. Here we will simply try to understand it in terms of stack frames.
Let’s consider the initial stack pointer, and name that value sp0. This is the bottom of this routine’s stack frame. The first opcode of this routine is push {r4, lr}, which pushes lr and r4 on the stack (in that order). Since a register is 4 bytes, and we’re pushing 2 registers, we’re offsetting the stack pointer by 2×4=8 bytes, so here sp = sp0-8 (the stack is descending so we subtract).
Since there isn’t any other push instruction (nor any direct reference to sp), we can safely assume that the current sp value is the top of this routine’s stack frame.
Now let’s jump forward a bit towards the end of the routine: we now have pop {r4} & pop {r1}. We’re popping one register at a time, twice, which means 2 registers: r4 & r1 (in that order). Since we’re popping 2 registers aka 8 bytes, we have sp=(sp0-8)+8=sp0. The stack pointer has been restored and we now just have to return to the calling location (that’s what the bx r1 is for).
From this, we can deduce the size & content of the stack frame: it’s 2 words (8 bytes) big. The word at sp+0 contains the saved value of r4 (allowing us to use the register and restore its content at the end of our routine), and the word at sp+4 contains the return address of this routine (the value of lr on call).
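The same frame can be written down as a C struct. This is just a sketch to visualize the layout: the struct, its field names and the helper function are made up for this guide, and nothing like it exists in the game’s code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* The frame as seen from sp right after push {r4, lr}. */
struct GetUnitLuckFrame {
    uint32_t saved_r4; /* [sp+0]: the caller's r4, restored by pop {r4} */
    uint32_t saved_lr; /* [sp+4]: the return address, popped into r1 */
};

/* Given sp0 (the value of sp on entry), returns the value sp has for the
   rest of the routine body: one whole frame lower. */
uint32_t SpAfterPush(uint32_t sp0)
{
    assert(sp0 >= sizeof(struct GetUnitLuckFrame));
    return sp0 - (uint32_t)sizeof(struct GetUnitLuckFrame);
}
```

With an arbitrary example value of sp0 = 0x03007E00, the routine body would run with sp = 0x03007DF8.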
Routines as Functions: Arguments & Return Values
Ok so what’s a (C) Function? It’s a piece of reusable code that does stuff (just like a routine, which is totally surprising). What it introduces over regular routines is the concept of arguments & return values. Arguments are a bunch of values that the calling code passes to the function for it to mess with, and the return value is a value that the function returns to the calling code.
For example, a “square” function could take a number as sole argument, and return the value of the square of that number.
Calling Conventions are rules that allow subroutines to behave like, and treat each other as, C-style functions.
Arguments in Conventional ARM ASM
So this is where the “special” role of the r0-r3 registers comes into play: the convention says that the first 4 argument values are mapped to those registers (from r0 to r3, in order). That’s the fairly easy part.
In the case where you have more than 4 arguments, it becomes a bit more complicated. The stack is used to pass arguments 5+. The fifth argument is stored at word sp+00, the sixth one at sp+04 and so on (sp being the stack pointer at the moment of the call). This does mean that the calling code has to allocate enough memory on its stack frame to store these arguments. As for the called function, the extra space allocated by the caller for extra arguments can be considered as an extension to this function’s stack frame (so part of the stack frames of each function becomes shared).
In the case where you have arguments whose values are larger than 4 bytes (this mostly applies to higher level languages where structs are a language construct), the value is cut into multiple 4-byte parts and each of those parts is treated as a separate argument. This isn’t really relevant, as most larger structures are referenced through 4-byte pointers and not by value. I only ever found one function that has an argument obviously split in two because it represents a structure passed by copy (FE8U:0801106C, takes a pointer to a popup definition in r0, and a copy of a text handle structure in r1-r2).
TL;DR: Arguments are stored into (in order): [r0, r1, r2, r3, [sp+0], [sp+4], [sp+8], …], and large single arguments (>4 bytes) are split into multiple.
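Here is that mapping as a small C sketch. The types and the LocateWordArg function are hypothetical helpers written for this guide, and the sketch assumes every argument is one word (a larger argument simply counts as several consecutive words).

```c
#include <assert.h>

enum ArgPlace { ARG_IN_REGISTER, ARG_ON_STACK };

struct ArgLocation {
    enum ArgPlace place;
    int where; /* register number (0-3), or byte offset from sp on entry */
};

/* Where the called function finds its n-th argument word (counting from 0). */
struct ArgLocation LocateWordArg(int n)
{
    struct ArgLocation loc;
    assert(n >= 0);
    if (n < 4) {
        loc.place = ARG_IN_REGISTER;
        loc.where = n;           /* r0, r1, r2, r3 */
    } else {
        loc.place = ARG_ON_STACK;
        loc.where = (n - 4) * 4; /* [sp+0], [sp+4], [sp+8], ... */
    }
    return loc;
}
```

For example, the seventh argument (n = 6) of a function comes out as a stack argument at offset 8, i.e. [sp+8].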
Return values in Conventional ARM ASM
We’re back to r0-r3, even if it’s really only r0. The return value (if any) is stored in r0.
In the case where you’re not returning anything, then nobody cares and this isn’t relevant (read: r0 can be whatever at the end of the function).
In the case where you’re returning a single value that’s more than one word big, but no more than two (this isn’t common; in fact the only case I think this would happen in our situation is when in C you’re returning a value of type long long), the value is cut in two and returned in r0-r1.
In the case where you’re returning a composite value that’s more than a word big (such as a structure by copy), then it gets complicated: the function gets an additional hidden argument (which becomes the first one), containing the address the routine has to write the return value to.
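The last two cases are easier to see in code. Below is a sketch in C under a few assumptions: all the names (RegPair, SplitLongLong, Big, MakeBig, MakeBig_Lowered) are invented for this guide, the low word is assumed to go in r0 (little-endian, as on the GBA), and the “lowered” function is only an approximation of what a compiler would produce.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* A 64-bit return value as the caller finds it in registers. */
struct RegPair {
    uint32_t r0; /* low word */
    uint32_t r1; /* high word */
};

struct RegPair SplitLongLong(long long value)
{
    struct RegPair regs;
    regs.r0 = (uint32_t)((uint64_t)value & 0xFFFFFFFFu);
    regs.r1 = (uint32_t)((uint64_t)value >> 32);
    return regs;
}

/* What you write in C: a structure returned by copy. */
struct Big { int a; int b; int c; };

struct Big MakeBig(int value)
{
    struct Big result = { value, value * 2, value * 3 };
    return result;
}

/* Roughly what the convention turns it into: the output address becomes
   a hidden first argument (r0) and the real argument moves to r1. */
void MakeBig_Lowered(struct Big *out, int value)
{
    assert(out != NULL);
    out->a = value;
    out->b = value * 2;
    out->c = value * 3;
}
```

When you see a routine that starts by writing a bunch of words through r0 and whose callers pass the address of some stack space in r0, a structure returned by copy is one possible explanation.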
Examples
GetUnitLuck
If you go back to the section on the Stack, you’ll get that routine’s code. We can deduce whether a function takes arguments by checking whether it reads from something that can be an argument.
And indeed we see that the first thing the routine does after doing stack frame shenanigans is copying r0 (!) into r4 (mov r4, r0), which implies that there is a value expected in r0 (to know what it is, you will have to do some extra research, for example by looking at what that function is doing with it, or through getting value samples by debugging & setting breakpoints. Spoiler: this value is a pointer to a unit struct (the Teq Doq is good for info on those)).
Since there doesn’t seem to be any reference to r1 or any later argument register before the first subroutine call (after which we can’t expect r0-r3 to be conserved anymore), we’ll assume that the function takes one argument.
Next we’ll be looking towards the end of the routine to see if it seems to return anything (you probably already deduced from the name I gave that function that it probably does, and it’s probably one Unit’s Luck stat, but let’s assume we don’t know that).
One of the last opcodes in this function/routine/whatever is add r0, r1, which modifies r0. Since nothing touches r0 afterwards in this routine, we can safely deduce that r0 is being used as the return value holder. We have a return value.
Another way we can deduce the existence of a return value is by looking at the very end of the routine (the part where the “return” is done) and checking which register is used for returning. If the routine doesn’t return a value, it will probably use r0 to get the return address and return to the caller. But in this case the register used is r1, which implies that r0 is already used for something else (like, say, returning the value). This method is more useful when the routine ends on a subroutine call (this subroutine then probably returns a value too, since the same r0 would be passed down two+ calls).
So to conclude, this routine has one argument (a pointer to a Unit struct), and one return value (a Unit’s luck stat value). With that information, we can write a C-Style Function Signature:
int GetUnitLuck(void* unit);
(int denotes an integer type, and void* a generic pointer type). If you don’t know how to read a C-Style function signature, here’s the general format:
<return type> <name>(<argument type> <argument name>, ...);
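Going one step further, here is my guess at what the whole routine could look like in C. Treat it as a sketch under assumptions: the only thing the struct takes from the listing is the lck byte at offset 0x19, and the two called functions are placeholder stand-ins with made-up behaviour so that the example works on its own.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical partial layout: only the offset of lck (0x19) is known. */
struct Unit {
    uint8_t unknown[0x19];
    int8_t lck;
};

/* Placeholder: pretends every unit has item 0x0001 equipped, with junk in
   the upper 16 bits to show why the routine masks them off. */
int GetUnitEquippedWeapon(const struct Unit *unit)
{
    (void)unit;
    return 0x00030001;
}

/* Placeholder: pretends item 0x0001 gives +5 luck and anything else +0. */
int GetItemLckBonus(int item)
{
    return (item == 0x0001) ? 5 : 0;
}

int GetUnitLuck(const struct Unit *unit)
{
    uint16_t item;
    assert(unit != NULL);
    item = (uint16_t)GetUnitEquippedWeapon(unit); /* the lsl/lsr #0x10 pair */
    return unit->lck + GetItemLckBonus(item);     /* the ldsb and the add */
}
```

The argument sits in r0 on entry, gets parked in r4 for the duration of both calls, and the sum ends up in r0 on return, exactly as deduced above.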
Some C-Style Pseudo-Functions
int Add(int left, int right);
Arguments:
- r0 = left
- r1 = right
Returns:
- r0 = probably the sum of left & right
struct vector2 { int x; int y; };
void SetUnitPosition(void* unit, struct vector2 position);
Arguments:
- r0 = unit (pointer)
- r1 = position.x
- r2 = position.y
(Doesn’t return a value)
struct SomeVeryBigStruct MakeSomeVeryBigStruct(int aValue);
Arguments:
- r0 = pointer to enough space to hold a “SomeVeryBigStruct” structure
- r1 = aValue
(Doesn’t return a value in registers: the result is written to the space r0 pointed to)
Ok whatever but what’s r12/ip for?
I already told you in the register section that it was the “Intra-Procedure-call scratch register”, which means it’s the scratch register during “Intra-Procedure-call”.
“Intra-Procedure-call” refers to the possible overhead that can arise when calling a routine that is defined externally. To put it simply: even if in your C or ASM you put in an explicit call to SomeRoutine, the linker (such as binutils ld, binutils gold or lyn) may need to put some extra steps into the call (“Intra-Procedure-call”) for various reasons (the called function is out of reach (remember that thumb bl has a ±4MB range), or one routine is thumb while the other is arm (mode exchange required), etc). In fact, lyn does that sometimes.
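To give an idea of when the “out of reach” case kicks in, here is a rough C sketch of the range check. ThumbBlCanReach is a name I made up, and the exact bounds are my reading of the thumb bl encoding (an even offset of at most about 4MB either way, counted from the address of the bl plus 4), so take it as an approximation rather than what any particular linker does.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Can a thumb bl located at `bl_addr` branch directly to `target`? */
bool ThumbBlCanReach(uint32_t bl_addr, uint32_t target)
{
    int64_t offset;
    assert((bl_addr & 1u) == 0); /* thumb opcodes sit on even addresses */
    /* The offset is relative to what pc reads as (bl_addr + 4); bit 0 of
       the target is only a thumb marker, so it is dropped. */
    offset = (int64_t)(target & ~1u) - ((int64_t)bl_addr + 4);
    return offset >= -0x400000 && offset <= 0x3FFFFE;
}
```

The bl to GetUnitEquippedWeapon (0x08016B28) from inside GetUnitLuck is well within range, while a bl from that same spot to a routine placed at, say, 0x09000000 (an arbitrary example address) would not be, and that is where the extra steps come in.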
r0-r3 can’t be modified because they need to hold arguments (remember, we’re in the middle of a routine call), and r4-r11 can’t either by convention, so said convention reserves ip/r12 as the sole scratch register available during “Intra-Procedure-call”.
Example of usage (from lyn):
ASM
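.thumb
SomeRoutine:
    bx pc                         @ switch to arm mode, 4 bytes ahead
.align                            @ pad so the arm code is word-aligned
.arm
    ldr ip, _SomeRoutineAddress   @ ip = address of the real routine
    bx ip                         @ jump there, arguments untouched
_SomeRoutineAddress:
    .word SomeRoutineAddress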
Note: It is my opinion that the macro helpers in EA Extentions/Hack Installation.txt should be using that register too.
The End
This was probably all not worth it but whatever, I already wrote it. In the unlikely event that it was, and that you even have a question and/or suggestion, feel free to ask.
Good Day! - Stan