[DOC][ASM] "Conventions"

ASM Hacking: “Conventions”

This is for people who have a basic understanding of ARM/thumb ASM and its concepts, but want to know more about how to integrate it with the game’s code/other people’s code.

This is also for people that know all of the important stuff but are curious about the details.

This is also for people who want to hack in C (or something similar) and need to know what kind of code to expect from a C compiler when it has to interface with ASM.

This “guide” is here to help you understand the standard ARM ASM conventions.

Sure this is all kind of common knowledge between hackers and it doesn’t take more than 10 min to explain/understand the big lines to/as a newcomer, but I think that there is value going through all of this “again”, especially when it comes to the sometimes more obscure details.

“Calling”?

When I say “calling” or a “call”, I refer to the act of jumping to a (sub-)routine from another (sub-)routine, with the expectation that that subroutine will jump back to where it was jumped from.

For example: in the GetUnitLuck routine (FE8U:08019298; this will be our main example in this document), there’s a “call” to GetUnitEquippedWeapon (FE8U:08016B28). This is done through the use of the bl opcode, which stores the address of the opcode right after the bl (the return address) in lr, so that the routine we’re calling (GetUnitEquippedWeapon) knows how to come back (“return”) to GetUnitLuck (by looking at lr).
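To put a number on what ends up in lr, here is a tiny C sketch. It isn't game code: the function name and the address used in the note below are made up for illustration. It relies on two facts about thumb: a bl is 4 bytes long, and the value written to lr has its lowest bit set to remember that the caller is thumb code.

```c
#include <stdint.h>

/* Value left in lr by a thumb `bl` located at `bl_address`.
 * The bl takes 4 bytes (two 16-bit halves), so the return address is
 * bl_address + 4; bit 0 is set to mark the caller as thumb code. */
uint32_t thumb_bl_link_value(uint32_t bl_address)
{
    return (bl_address + 4u) | 1u;
}
```

So a bl sitting at (say) 0x08000100 leaves 0x08000105 in lr, which is exactly the kind of value you can hand to bx to get back to thumb code.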

If you ever coded in C or another C-like programming language (or any programming language really), you’ll be familiar with the concept of calling a function. This is exactly the same thing: we’re calling a routine, which can be considered as a C function no problem (that’s what part of conventions are all about actually, so we’ll definitely get to that eventually).

The Registers

The first thing we have to understand is the role of each register in assembly code. As you may know, there are 16 of them (r0-r15)[*1], and only 8 of them are accessible most of the time in thumb (r0-r7). Here’s a simple description of what is used for what:

  • r0-r3: Those are the “scratch registers”. Between routine calls, you can do whatever you want with them and be fine. They also have special meaning when it comes to calls, which is what this document is all about explaining. Stay tuned.
  • r4-r11: Those are the “variable registers”. If you want to use them, you have to make sure you revert them to their original state when you are done with them (usually done through push & pop). This also means that those should be guaranteed to be effectively unchanged even after a routine call.
  • r12/ip: This one is kinda weird: in 99.9% of cases, you can consider it scratch (same as r0-r3). This is good but also kinda meh, as it is the only scratch register you can’t easily access in thumb. The remaining 0.1% of the time, it is the “Intra-Procedure-call scratch register”, which is kinda complicated and I’ll get to it later.
  • r13/sp: The “stack pointer”. Points to the top of the stack. It is modified when you use push & pop, and you rarely have to mess with it directly. You have to preserve its value across your routine (it must be the same on return as it was on entry), which translates to the rule of “each push must correspond to a pop with the same amount of registers”.
  • r14/lr: The “link register”. This is where the return location is stored to when you use bl.
  • r15/pc: The “program counter”. This is where the current[*2] opcode address is stored. Keep in mind: when you write there, you effectively jump/branch to the address you just wrote.

[*1] Those are actually only the core registers, aka the set of registers visible by most of the instruction set. There’s a few others that are special or are alternative versions of some core ones in other modes.

[*2] It’s not exactly the current opcode, but the one currently being read by the cpu, which is 2 opcodes ahead. When you read from it, you’ll get the address 2 opcodes ahead from where the reading opcode is (also keep in mind: one bl is 2 opcodes).
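The “2 opcodes ahead” rule from [*2] can be written down as a one-liner. This is only a sketch with made-up names (cpu_state, pc_read_value); it just encodes that an opcode is 2 bytes in thumb and 4 bytes in arm.

```c
#include <stdint.h>

enum cpu_state { STATE_ARM, STATE_THUMB };

/* Value seen by an opcode located at `address` when it reads pc:
 * the address two opcodes ahead of itself.
 * (Thumb pc-relative loads additionally round this down to a multiple of 4.) */
uint32_t pc_read_value(uint32_t address, enum cpu_state state)
{
    uint32_t opcode_size = (state == STATE_THUMB) ? 2u : 4u;
    return address + 2u * opcode_size;
}
```

In other words: reading pc gives you address+4 in thumb and address+8 in arm.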

The Stack

See also the wikipedia article.

The stack is a part of memory that is used by routines to store local data (data that will only be used during the one routine call). The stack pointer (register r13/sp) contains an address to the top of that part of memory.

The stack on the ARM Architecture is “full-descending”. “Descending” means that the stack grows downwards: when you use push {r4}, the top of the stack (aka sp) goes down (4 bytes), and the value of r4 is stored at that new location. “Full” means that sp always points at the last value that was pushed (and not at the next free slot).
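If you prefer seeing it as code, here is a toy model of a full-descending stack in C. Everything in it (toy_stack, toy_push, toy_pop, the 16-word size) is invented for the example, there is no bounds checking, and sp is an array index counted in words rather than a real address.

```c
#include <stdint.h>

#define STACK_WORDS 16

/* Toy model: `sp` is an index into `memory`, counted in words.
 * An empty stack has sp one past the end of the array. */
struct toy_stack {
    uint32_t memory[STACK_WORDS];
    int sp;
};

void toy_init(struct toy_stack *s)
{
    s->sp = STACK_WORDS;
}

/* push: move sp down first, then store at the new top. */
void toy_push(struct toy_stack *s, uint32_t value)
{
    s->sp -= 1;
    s->memory[s->sp] = value;
}

/* pop: load from the top, then move sp back up. */
uint32_t toy_pop(struct toy_stack *s)
{
    uint32_t value = s->memory[s->sp];
    s->sp += 1;
    return value;
}
```

Pushing makes sp smaller, popping makes it bigger, and after as many pops as pushes sp is back where it started.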

Stack Frames

A Stack Frame is the term used for a single subpart of the stack that holds the local data of a single routine call. The top of the stack is also the top of the Stack Frame for the currently executing routine (which is “below” the rest of the stack address-wise, since it’s descending, but it’s still the top… still there?).

When a routine is called, the first thing it should do (in theory) is allocate its stack frame, which is usually done through one or more push opcodes. When it’s done, it deallocates the frame (which in the process should revert the stack pointer to its original state), usually with pops.

An example: GetUnitLuck

To illustrate the concept of stack frames, we’ll look at a fairly simple routine (located at address 0x08019298 in FE8U), whose assembly looks like this:

ASM
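GetUnitLuck:
	push {r4, lr}            @ save r4 (used below) and the return address
	
	mov r4, r0               @ r4 = r0, kept safe across the calls below
	
	bl 0x08016B28 @ GetUnitEquippedWeapon
	
	lsl r0, #0x10            @ keep only the low 16 bits of the result
	lsr r0, #0x10
	
	bl 0x08016510 @ GetItemLckBonus
	
	mov r1, r0               @ r1 = luck bonus from the item
	
	mov  r0, #0x19 @ UnitStruct.lck
	ldsb r0, [r4, r0]        @ r0 = signed byte at r4+0x19
	
	add r0, r1               @ r0 = luck + bonus
	
	pop {r4}                 @ restore r4
	pop {r1}                 @ r1 = saved lr (return address)
	bx r1                    @ return to the caller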

I’m sure you won’t have many issues trying to understand what this routine does (especially with me laying it out cleanly like that), but that isn’t the point. Here we will simply try to understand it in terms of stack frames.

Let’s consider the initial stack pointer, and name that value sp0. This is the bottom of this routine’s stack frame. The first opcode of this routine is push {r4, lr}, which pushes lr and r4 on the stack (in that order). Since a register is 4 bytes, and we’re pushing 2 registers, we’re offsetting the stack pointer by 2×4=8 bytes. So here, sp = sp0-8 (the stack is descending so we subtract).

Since there isn’t any other push instruction (nor any direct references to sp), we can safely assume that the current sp value is the top of this routine’s stack frame.

Now let’s jump forward a bit towards the end of the routine: we now have pop {r4} & pop {r1}. We’re popping one register at a time, twice, which means 2 registers: r4 & r1 (in that order; r1 receives the saved lr). Since we’re popping 2 registers aka 8 bytes, we have sp=(sp0-8)+8=sp0. The stack pointer has been restored and we now just have to return to the calling location (that’s what the bx r1 is for).

From this, we can deduce the size & content of the stack frame: it’s 2 words (8 bytes) big. The word at sp+0 contains the saved value of r4 (allowing us to use the register and restore its content at the end of our routine), and the word at sp+4 contains the return address of this routine (the value of lr on call).
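One way to picture that frame is as a C struct laid over memory starting at sp. This is purely a mental model (the struct name is made up, and the game obviously doesn't declare anything like it):

```c
#include <stddef.h>
#include <stdint.h>

/* GetUnitLuck's stack frame after `push {r4, lr}`, lowest address first. */
struct get_unit_luck_frame {
    uint32_t saved_r4;       /* word at sp+0 */
    uint32_t return_address; /* word at sp+4: the value of lr on entry */
};
```

The struct is 8 bytes big, with saved_r4 at offset 0 and return_address at offset 4, which matches the sp = sp0-8 we computed above.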

Routines as Functions: Arguments & Return Values

Ok so what’s a (C) Function? It’s a piece of reusable code that does stuff (just like a routine, which is totally surprising). What it introduces over regular routines is the concept of arguments & return values. Arguments are a bunch of values that the calling code passes to the function for it to mess with, and the return value is a value that the function returns to the calling code.

For example, a “square” function could take a number as sole argument, and return the value of the square of that number.

Calling Conventions are rules that allow subroutines to behave like, and treat each other as, C-style functions.

Arguments in Conventional ARM ASM

So this is where the “special” role of the r0-r3 registers comes into play: the convention says that the first 4 argument values are mapped to those registers (from r0 to r3, in order). That’s the fairly easy part.

In the case where you have more than 4 arguments, it becomes a bit more complicated. The stack is used to pass arguments 5+. The fifth argument is stored at word sp+00, the sixth one at sp+04 and so on (where sp is the stack pointer at the moment of the call, before the called function pushes anything). This does mean that the calling code has to allocate enough memory on its stack frame to store these arguments. As for the called function, the extra space allocated by the caller for extra arguments can be considered as an extension to this function’s stack frame (so part of the stack frames of each function becomes shared).

In the case where you have arguments whose values are larger than 4 bytes (this mostly applies to higher level languages where structs are a language construct that exist), then the value is cut into multiple 4-byte parts and each of those parts are considered as separate arguments. This isn’t really relevant, as most larger structures are referenced through 4 bytes pointers and not by value. I only ever found one function that has an argument obviously split in two because it represents a structure passed by copy (FE8U:0801106C, takes a pointer to popup definition in r0, and a copy of a text handle structure in r1-r2).

TL;DR: Arguments are stored into (in order): [r0, r1, r2, r3, [sp+0], [sp+4], [sp+8], …], and large single arguments (>4 bytes) are split into multiple.
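One practical consequence: once the called routine has pushed registers of its own, the stack arguments are no longer at sp+0, they sit above whatever was pushed. Here is a small C sketch of that arithmetic. The function name is made up, and it assumes every argument is exactly one word and that the routine only moved sp through pushes (no extra space reserved for locals).

```c
/* Where does argument number `index` (0-based) live?
 * Returns -1 if it is in a register (r0-r3), otherwise its byte offset
 * from the callee's current sp, given that the callee has pushed
 * `pushed_words` registers since it was entered. */
int stack_arg_offset(int index, int pushed_words)
{
    if (index < 4)
        return -1;
    return 4 * pushed_words + 4 * (index - 4);
}
```

For example, a routine that starts with push {r4, lr} (2 words) will find its fifth argument at [sp+8] and its sixth at [sp+12].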

Return values in Conventional ARM ASM

We’re back to r0-r3, even if it’s really only r0. The return value (if any) is stored in r0.

In the case where you’re not returning anything, then nobody cares and this isn’t relevant (read: r0 can be whatever at the end of the function).

In the case where you’re returning a single value that’s more than one word big, but no more than two (this isn’t common; in fact the only case I think this would happen in our situation is when in C you’re returning a value of type long long), the value is cut in two and returned in r0-r1.

In the case where you’re returning a composite value that’s more than a word big (such as a structure by copy), then it gets complicated: it adds an additional hidden argument to the function (and it becomes the first), which contains the output address for the routine to write the return value to.
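For the two-word case, here is a C sketch of how the value gets cut. The helper name is made up, and it assumes the usual little-endian setup (which is what the GBA uses): the low word goes in r0 and the high word in r1.

```c
#include <stdint.h>

/* Splits a 64-bit return value into the two words a caller would find
 * in r0 (low word) and r1 (high word). */
void split_return_value(int64_t value, uint32_t *r0, uint32_t *r1)
{
    uint64_t bits = (uint64_t)value;
    *r0 = (uint32_t)(bits & 0xFFFFFFFFu);
    *r1 = (uint32_t)(bits >> 32);
}
```

So a function returning 0x0000000100000002 comes back with r0 = 2 and r1 = 1.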

Examples

GetUnitLuck

If you go back to the section on the Stack, you’ll get that routine’s code. We can deduce if a function takes arguments by looking if it’s reading from something that can be an argument.

And indeed we see that the first thing the routine does after doing stack frame shenanigans is copying r0 (!) into r4 (mov r4, r0), which implies that there is a value expected in r0 (to know what it is, you will have to do some extra research, for example by looking at what that function is doing with it, or through getting value samples by debugging & setting breakpoints. Spoiler: this value is a pointer to a unit struct (the Teq Doq is good for info on those)).

Since there doesn’t seem to be any reference to r1-r3 before the first subroutine call (after which we can’t expect r0-r3 to be preserved anymore), we’ll assume that the function takes one argument.

Next we’ll be looking towards the end of the routine to see if it seems to return anything (you probably already deduced from the name I gave that function that it probably does, and it’s probably one Unit’s Luck stat, but let’s assume we don’t know that).

One of the last opcodes in this function/routine/whatever is add r0, r1, which modifies r0. Since nothing messes with r0 after that point in this routine, we can safely deduce that r0 is being used as the return value holder. We have a return value.

Another way we can deduce the existence of a return value is by looking at the very end of the routine (the part where the “return” is done) and checking which register is used for returning. If it doesn’t return a value, it will probably use r0 to get the return address and return to the caller. But in this case the register used is r1, which implies that r0 is already used for something else (like say, returning the value). This method is more useful when the routine ends on a subroutine call (this subroutine then probably returns a value too, since the same r0 would be passed down two+ calls).

So to conclude, this routine has one argument (a pointer to a Unit struct), and one return value (a Unit’s luck stat value). With this information, we can write a C-Style Function Signature:

int GetUnitLuck(void* unit);

(int denotes an integer type, and void* a generic pointer type). If you don’t know how to read a C-Style function signature, here’s the general format:

<return type> <name>(<argument type> <argument name>, ...);
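To tie the signature back to the assembly, here is roughly what the routine would look like written in C. This is my reading of the assembly, not the game’s actual source: the two called routines are replaced by hypothetical stand-ins that always return 0, and the types are guesses.

```c
#include <stdint.h>

/* Hypothetical stand-ins for the two routines called through bl.
 * The real ones live in the ROM; these just return 0. */
int GetUnitEquippedWeapon(const void *unit) { (void)unit; return 0; }
int GetItemLckBonus(int item) { (void)item; return 0; }

int GetUnitLuck(const void *unit)
{
    /* lsl/lsr by 0x10: keep only the low 16 bits of the item value. */
    int item = GetUnitEquippedWeapon(unit) & 0xFFFF;
    int bonus = GetItemLckBonus(item);

    /* ldsb at offset 0x19: the luck stat is read as a signed byte. */
    int lck = *((const int8_t *)unit + 0x19);

    return lck + bonus;
}
```

With the stand-ins above, calling GetUnitLuck on a buffer whose byte at offset 0x19 is 7 gives back 7.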

Some C-Style Pseudo-Functions

int Add(int left, int right);

Arguments:

  • r0 = left
  • r1 = right

Returns:

  • r0 = probably the sum of left & right

struct vector2 { int x; int y; };
void SetUnitPosition(void* unit, struct vector2 position);

Arguments:

  • r0 = unit (pointer)
  • r1 = position.x
  • r2 = position.y

(Returns nothing)

struct SomeVeryBigStruct MakeSomeVeryBigStruct(int aValue);

Arguments:

  • r0 = pointer to enough space to hold a “SomeVeryBigStruct” structure
  • r1 = aValue

(Returns nothing in registers: the result is written to the space that r0 pointed to)
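The hidden argument is easier to see when both forms are written in C side by side. This is only a sketch: the contents of SomeVeryBigStruct and the _lowered name are made up, and the second function is a hand-written picture of what the first one amounts to at the register level.

```c
struct SomeVeryBigStruct { int values[8]; }; /* hypothetical contents */

/* What the C source declares. */
struct SomeVeryBigStruct MakeSomeVeryBigStruct(int aValue)
{
    struct SomeVeryBigStruct result = { { 0 } };
    result.values[0] = aValue;
    return result;
}

/* Roughly what the routine looks like from the ASM side:
 * r0 = out (the hidden first argument), r1 = aValue. */
void MakeSomeVeryBigStruct_lowered(struct SomeVeryBigStruct *out, int aValue)
{
    *out = MakeSomeVeryBigStruct(aValue);
}
```

If you call such a routine from ASM, you are the one who has to reserve the space (on your stack frame, typically) and pass its address in r0.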

Ok whatever but what’s r12/ip for

I already told you in the register section that it was the “Intra-Procedure-call scratch register”. Which means it’s the scratch register during an “Intra-Procedure-call”.

“Intra-Procedure-call” refers to the possible overhead that can arise when calling a routine that is defined externally. To put it simply: even if in your C or ASM you put in an explicit call to SomeRoutine, the linker (such as binutils ld, binutils gold or lyn) may need to put in some extra steps into the call (“Intra-Procedure-call”) for various reasons (called function is out of reach (remember that thumb bl has a ±4MB range), or one routine is thumb while the other is arm (mode exchange required), etc). In fact, lyn does that sometimes.

Since r0-r3 can’t be modified since they need to hold arguments (remember, we’re in the middle of a routine call), and r4-r11 can’t either by convention, said convention reserves ip/r12 as the sole scratch register available during “Intra-Procedure-call”.

Example of usage (from lyn):

ASM
.thumb
SomeRoutine:
	bx pc
.align
.arm
	ldr ip, _SomeRoutineAddress
	bx ip
_SomeRoutineAddress:
	.word SomeRoutineAddress

Note: It is my opinion that the macro helpers in EA Extentions/Hack Installation.txt should be using that register too.

The End

This was probably all too not worth it but whatever I already wrote it. In the unlikely event in which it would have been, and in which you even have a question and/or suggestion, feel free to ask :slight_smile:

Good Day! - Stan

Reference


Stannnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnn

Come back ;;

register alias list:

r0 a1
r1 a2
r2 a3
r3 a4
r4 v1
r5 v2
r6 v3
r7 v4
r8 v5
r9 sb, v6
r10 sl, v7
r11 fp, v8
r12 ip
r13 sp
r14 lr
r15 pc

They are often asked about in the discord channel, so I added them here.


I’d like to add how to share this knowledge with your IDA.
IDA supports these conventions:
__cdecl: all arguments are passed on the stack, and the caller adjusts sp (default for many C compilers on x86)
__stdcall: all arguments are passed on the stack, and the callee adjusts sp (common in Microsoft DLLs)
__fastcall: some arguments are passed in registers, others on the stack (most functions should be set to this)
__thiscall: a C++ non-static class member function has a hidden this argument as its first argument
__usercall: you can define the “convention” by yourself
Read the IDA Pro Book for more details: 6.2 Stack Frame -> 6.2.1 Procedure Call Standard.
I will take an example here:



Further expansion from this instance:
Now that those functions are marked as library functions, we can google them and even read their source code.
Here is the relevant part, copied from the libgcc source:

#ifdef L_call_via_rX

/* These labels & instructions are used by the Arm/Thumb interworking code. 
   The address of function to be called is loaded into a register and then 
   one of these labels is called via a BL instruction.  This puts the 
   return address into the link register with the bottom bit set, and the 
   code here switches to the correct mode before executing the function.  */
	
	.text
	.align 2, 0

.macro call_via register
	.globl	SYM (_call_via_\register)
	TYPE	(_call_via_\register)
	.thumb_func
SYM (_call_via_\register):
	bx	\register
	nop
	
	SIZE	(_call_via_\register)
.endm

	call_via r0
	call_via r1
	call_via r2
	call_via r3
	call_via r4
	call_via r5
	call_via r6
	call_via r7
	call_via r8
	call_via r9
	call_via sl
	call_via fp
	call_via ip
	call_via sp
	call_via lr

#endif /* L_call_via_rX */

This document is really great! I personally find it difficult to understand how a callee function loads arguments from the stack if it has pushed some variable registers :thinking:
Can you give me an example of this in GBAFE? Thank you.