GBAFE Assembly for Dummies, by Dummies

Tequila · March 30, 2018, 1:08pm

PLEASE DO NOT POST. THIS IS NOT FINISHED.

Greetings, fleshlings!

You may have heard that assembly hacking, or “asm”, is incredibly difficult and is only pursued by the bravest of souls after many years of study under an old man on a mountain. This is completely wrong. For starters, do you have any idea how difficult it is to get wifi on a mountaintop? Not to mention the wind and snow wreak havoc on our creaking bones.

It is true that learning and applying it takes time, but that goes for just about any programming language out there (not to mention just about everything except eating donuts). So forget any preconceived notions. It’s really not as bad as it sounds.

This guide focuses strictly on Fire Emblem 8: Sacred Stones (English version). The methods and theory are applicable to other GBA Fire Emblem roms, but the offsets will almost certainly not match. The theory is applicable to all GBA games. While there are assembly hacking tutorials out there for other games (Pokemon, Sonic, etc), I wanted to make one explicitly for FE.

Why should I bother learning this, you ask? Well, if you’ve ever worked on a romhacking project, you may have come up with a really cool idea (or at least, thought it was cool at the time). For me, that idea was having different colored text and text backgrounds. After asking around, you may have been told that the only way to implement your idea is via asm hacking. Or perhaps you’re simply curious as to what you can do within the constraints of GBA hardware (quite a bit, as it turns out). Whatever your motivation, this is a good place to begin.

What is assembly, you ask?
If you’ve done any romhacking at all at this point, even if it’s just Nightmare edits, you have most likely only edited assets of the game. Now we’re going to learn about the code that uses those assets to make the game run, akin to the engine of a car. If you’ve edited or created your own chapter events, assembly hacking is similar, just on a much smaller scale.

As a “language” (technically it’s an architecture, not a language, but that’s irrelevant at this point), assembly is extremely low-level. This has advantages and disadvantages; what you could do in 1 line in, say, Python, might require 10 or more (or way more) lines in assembly, and if you try to do something stupid, there’s little to no error-catching. However, executing the code is almost certainly faster, which is a good thing if you’re working with a system with limited processing power like the GBA.

That being said, the game’s code is not originally written in assembly. Sure, you could do it. You could also technically sweep your driveway with a toothbrush, but you wouldn’t actually do that (…I hope). It’s written in C and compiled into assembly. Can we do the same thing? Yes. However, that’s beyond the scope of this guide, and for the smaller examples I will be demonstrating later, it’d actually make things more difficult. Think trying to perform open-heart surgery with a sledgehammer.

Things you will need:
Windows (it is possible the rest of the required items will run just fine on another OS, but I have absolutely no experience in that department)
A text editor (I use Notepad++; Atom, Sublime, or even regular old Notepad will work just fine. Don’t use Microsoft Word or anything like that.)
A hex editor (I use HxD)
An FE8U rom (you can use another game, but you won’t be able to follow along precisely)
The ability to think logically (not really sure where to download this, sorry)

The following can be downloaded here:
No$GBA 2.8 (if you have your own debugger, feel free to use it)
DevkitARM
Event Assembler 10.1 or higher (just get the most recent version)
Assemble ARM.bat

Things that will be helpful to have, but aren’t essential:
A project (small would be preferable, but having something that you want to do will ensure that you’re less likely to give up)
Nightmare modules (can be handy for reference. We won’t actually be using Nightmare)

Notice what you don’t need: any prior programming experience. While it certainly can’t hurt, it’s not essential. Aside from a semester of Intro to Python, assembly was my first foray into programming.

Tequila · March 30, 2018, 3:23pm

Reference

Definitions

The following things may already be known to you if you’ve read the Ultimate Tutorial or done any romhacking in the past.

Basic Definitions

Bit: The most basic unit of information used in computing. Can only have 1 of 2 values, 0 or 1. Binary numbers are followed with a “b”, and will be split up into groups of 4 digits to make it easier to read.
Byte: 8 bits. The smallest addressable unit of memory in most computer architectures (ie, the smallest value that you can manipulate directly)
Short (or halfword): 16 bits, or 2 bytes.
Word: 32 bits, or 4 bytes. Note, however, that in other systems, a word can be 16 bits, and a double word, or dword, is the 32 bit version. For the purposes of this guide, however, a word is 32 bits.
Nibble (or nybble): 4 bits. There are 2 nybbles in a byte (get it?).
Bit field/Bitfield: A cluster of bits (usually a byte, short, or word) where each bit is a flag. An example is character and class abilities (see your nightmare modules), which is a 32-bit bitfield, with each bit corresponding to a particular ability. The first (zero’th) bit is “Mounted” and has value 2^0 = 1. The second bit is “Canto (Move Again)”, and has value 2^1 = 2. To denote that a unit has both the Mounted and Canto abilities, we sum up the bits: 1 + 2 = 3. If you then want to add Dance, which is bit 4 (the fifth bit, 2^4 = 0x10), then the new value of the bitfield is 0x1 + 0x2 + 0x10 = 0x13.
Binary: A system of representing numbers using only 2 symbols: 0 and 1. Also called base 2 notation. Binary numbers will have a ‘b’ at the end, such as 10010101b.
Hexadecimal: Base 16 notation, where we represent numbers using 0-9 and A-F. Denoted with a preceding 0x or $, or a trailing ‘h’. Example: 0x9A, $4, 3445h.
Little endian: A method of arranging bytes such that the least significant byte is first (reading from the left). Example: the number 0x12345678, in little endian, would be represented by breaking it up into bytes ($12 $34 $56 $78) and reversing their order ($78 $56 $34 $12).
Pointer: An object whose value is an address of another piece of information. Example: FE8’s item table begins at 0x809B10. The value 0x8809B10 is “pointing” to the item table, and is thus a pointer. See memory map for an explanation of that extra 8 in the beginning.
Dereference: To dereference a pointer means going to the place where the pointer is, uh, pointing to, and retrieving the data there. How much data depends on what kind of command is doing the dereferencing (usually either a byte, short, or word).
Offset: In our case, this is basically analogous to an address; ie, the location in memory where something is written.
ROM: Read-Only Memory. As the name suggests, this is memory that can only be read by a program, not written to. May or may not be capitalized, so don’t be alarmed if I use it without ALL CAPS. A .gba file is a ROM, so the program running it (your emulator) can only read from it, not write to it.
RAM: Random Access Memory. This is memory that can be both read and written to. A character’s stats, for instance, will be stored in ram, since they can change as the unit levels up.
Alignment: Sometimes, an address has to be 2- or 4-aligned. This means the address must be divisible by 2 or 4, which can be determined by looking at the last hex digit of the address: any even digit means it’s 2-aligned, and 0, 4, 8, or C means it’s 4-aligned.
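A few of these definitions are easier to see with numbers in front of you. The following is a rough Python sketch (not GBA code) that reuses the values from the Bitfield, Little endian, and Alignment entries. The constant and function names are made up for illustration.

```python
# Flag values taken from the Bitfield example above; the names are illustrative.
MOUNTED = 1 << 0  # 2^0 = 0x01
CANTO = 1 << 1    # 2^1 = 0x02
DANCE = 1 << 4    # 2^4 = 0x10

# Combining flags: OR gives the same result as summing, as long as no flag repeats.
abilities = MOUNTED | CANTO | DANCE

# Checking a single flag: AND with it and see whether anything is left.
has_canto = (abilities & CANTO) != 0

# Little endian: the least significant byte comes first.
little = (0x12345678).to_bytes(4, "little")
restored = int.from_bytes(little, "little")


def is_aligned(address, alignment):
    """Return True if address is divisible by alignment (2 or 4)."""
    return address % alignment == 0


print(hex(abilities))                        # combined bitfield
print(" ".join(f"{b:02X}" for b in little))  # bytes in the order they sit in memory
print(is_aligned(0x809B10, 4))               # the item table offset from the Pointer entry
```

If you open a rom in a hex editor and a pointer looks backwards, this byte order is why.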

Decimal/Binary/Hexadecimal Conversion

If you’re not used to hexadecimal (or hex, as it’s colloquially called), it can be confusing to use. It’s important that you learn how to manipulate it, because unless specified otherwise, all numbers used in this guide are in base 16. (Decimal numbers will have a ‘d’ at the end, like 99d). Why do we bother using it? As you may know, computers work with binary: strings of 0s and 1s. The human brain, however, cannot easily parse binary. If you ask me what 11010101011110101011010101 was in decimal, it would take me a minute or so to answer. Decimal, or base 10, doesn’t easily convert to base 2, because 10 is not a power of 2. 16, however, is. In fact, it’s a power of a power of 2 (2^2^2). This means that hex is very easy to convert to binary, and vice versa. Hex also has enough unique symbols (16 of them) that our brains can parse them relatively easily. So hexadecimal is a compromise between usability for the machine and readability for the human.
When you actually need to convert from decimal to hex, it’s easiest to use the calculator that comes with most computers. However, it can be useful to know how to convert from binary to hex quickly by hand. Let’s use the number I wrote earlier: 11010101011110101011010101.
First, break the number up into groups of 4, starting from the right. If the leftmost group has fewer than 4 digits, add 0s to its left until it has 4.
0011 0101 0101 1110 1010 1101 0101
Next, find the hex representation of each group of 4, which will be a digit from 0x0-0xF:
3 5 5 E A D 5
Finally, concatenate the string:
0x355EAD5
Voila!
Use the inverse to convert from hex to binary.
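If you’d like to check your work, the same steps can be written out in Python. This is only a sketch of the by-hand method described above; the function names are my own invention.

```python
def binary_to_hex(bits):
    """Convert a string of 0s and 1s to hex by grouping digits in fours."""
    bits = bits.replace(" ", "")
    # Pad on the left so the length is a multiple of 4.
    padding = (-len(bits)) % 4
    bits = "0" * padding + bits
    # Split into groups of 4 and translate each group to one hex digit.
    groups = [bits[i:i + 4] for i in range(0, len(bits), 4)]
    digits = [format(int(group, 2), "X") for group in groups]
    return "0x" + "".join(digits)


def hex_to_binary(text):
    """The inverse: expand each hex digit into a group of 4 bits."""
    digits = text[2:] if text.lower().startswith("0x") else text
    return " ".join(format(int(digit, 16), "04b") for digit in digits)


print(binary_to_hex("11010101011110101011010101"))
print(hex_to_binary("0x355EAD5"))
```

Python’s built-in `hex(int(bits, 2))` does the same job in one go; the long version is here only because it mirrors the manual steps.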

Signed vs Unsigned Numbers

A byte can take a value from 0x00 to 0xFF, which is 0 to 255d. But say you want to have negative numbers. How would you do that? In decimal, we use the “-” sign to denote that a number is negative, but that’s not an option here because we don’t have room for another symbol in our alphabet. Instead, we use the topmost bit (furthest to the left) as a sign. If it is set (ie, 1), the number is negative; if not, the number is positive.
If your byte is between 0x00 and 0x7F (0111 1111b), then there’s no difference between signed and unsigned, since the top bit isn’t set. The decimal values range from 0 to 127.
Between 0x80 and 0xFF, however, the top bit is set, and a signed byte would have a negative value from -128 to -1.
To sum up:
Unsigned: 0 - 255
Signed: -128 - 127
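One detail worth spelling out: a signed byte with the top bit set isn’t just “the same number with a minus sign”. The GBA uses two’s complement, so the signed value is the unsigned value minus 0x100 (256d). A small Python sketch of that rule, with function names that are purely illustrative:

```python
def as_signed_byte(value):
    """Interpret the low 8 bits of value as a signed (two's complement) byte."""
    value &= 0xFF
    # Top bit set: subtract 0x100 to get the negative value.
    return value - 0x100 if value & 0x80 else value


def as_unsigned_byte(value):
    """Interpret the low 8 bits of value as an unsigned byte."""
    return value & 0xFF


for raw in (0x00, 0x7F, 0x80, 0xFF):
    print(hex(raw), as_unsigned_byte(raw), as_signed_byte(raw))
```

The bits never change; only the way you read them does.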

About assembly

The ARM, the thumb, and the ugly

The processor that the GBA works with, the ARM7TDMI, uses an instruction set called, appropriately enough, ARM. ARM stands for Advanced RISC Machine, and RISC stands for Reduced Instruction Set Computing. As the name suggests, we have a smaller set of instructions that we can actually perform, but this is offset by being able to perform those instructions faster. The processor is a 32-bit one, and the ARM instructions (also called opcodes) are each 32 bits (4 bytes, or 1 word) long. There is also a subset of ARM, called THUMB (doesn’t stand for anything, it’s just a cute name), which uses 16-bit (2 bytes, or a short) instructions. As you might imagine, that limits the number of opcodes we can work with. Why bother? Because speed is the name of the game here, and the ROM port (the connection between the memory containing the game’s code and the part of the processor that executes said code) is only 16 bits wide, so a 16-bit opcode can be fetched in one go. We could run ARM assembly from the rom (and in fact, Fire Emblem does so sometimes), but it would be slower because each 32-bit opcode has to be loaded in two separate pieces and put back together before it can be executed. Therefore, the majority of the game’s code is written in thumb assembly, which is actually a good thing, since it means it’s easy enough to memorize all the opcodes you can use. The hard part about assembly hacking isn’t so much learning the language, but rather applying it efficiently to whatever you’re trying to do.

That’s not to say that ARM code is too slow to use, ever. The chip does feature a section of memory with a 32-bit bus called IWRAM, or Internal Work RAM (usually abbreviated to IRAM). When you boot up the game, functions that are to be executed in ARM mode are copied from the ROM to IRAM, where they will stay until the game is turned off. Things such as sprite manipulation, text decompression, and pathfinding algorithms all reside in IRAM. You will probably never need to mess with these unless you’re attempting something extremely ambitious, and if you’re in a position to do so, then you probably don’t need this guide.

Ingredients

Alright, so we have our instruction set. How do we use it?
Glad you asked. The GBA has 16 registers (technically, it has more, but we don’t really care about the rest), labeled r0 to r15. Each register contains a 32-bit value, which you manipulate using the opcodes.
Oftentimes, it is necessary to save a value in a register so that you can use it later. Fortunately, we have a stack that we can copy the contents of a register to in order to save said contents. For more detail, see the stack explanation in the no$gba overview section.
Not all registers are equal. r0-r7 are called “low” registers, and r8-r15 are “high” registers. In THUMB mode, we’re limited in what we can do with the high registers. In addition, r13-r15 are special registers that are each reserved for a specific purpose: r13 is the stack pointer (sp), r14 is the link register (lr), and r15 is the program counter (pc).

Conventions

There are a number of conventions, or rules, that ought to be followed to make your (and everyone else’s) life easier when working with assembly. Don’t worry if you don’t quite understand these yet; I’ll explain things in more detail in other sections.

r0-r3, and r12, are called scratch registers. This means that the values in them aren’t important and can be overwritten (most of the time). It also means that they’re not expected to be saved during a function call.
By contrast, r4-r11 are preserved registers. If you want to use them, you must save their values by copying to the stack. Failure to do so will almost certainly result in Very Bad Things occurring.
To pass parameters to a function, use r0-r3, starting with r0. If there are more than 4 parameters, the remaining arguments are expected to be passed in via the stack.
When a function is complete, if it returns a value, that value is expected to be in r0. Functions don’t return more than 1 value at a time.
After returning from a function, always assume that the scratch registers have been overwritten, even if they haven’t actually been touched.

Memory blocks

The GBA’s memory is divided into several “blocks”, indicated by the first byte of the address (for instance, 0x8000000 is a rom address). Each block is reserved for a specific task.

00 - BIOS: Basic Input/Output System. Called during start-up and software interrupts. Basically in charge of getting everything up and running.
02 - EWRAM/WRAM/ram: External Work Random Access Memory. This is where most of the mutable information is stored, like a character’s stats, for instance.
03 - IWRAM/IRAM: Internal Work Random Access Memory. As previously mentioned, this is where the majority of ARM code is stored and executed due to its 32-bit bus. Also used to store some data, like event ids, but not nearly as much as the 02 block.
04 - I/O: In/out. This is basically the interface between the hardware and the software. Button presses, sound, etc, are located here.
05 - PAL RAM: Palette ram. Stores palettes. That’s basically it.
06 - VRAM: Video ram. Stores graphics data.
07 - OAM: Object Attribute Memory. Controls sprites.
08 - ROM: Read-only memory. In a physical GBA, this would be the game cartridge; in an emulator, it’s the .gba file.
0E - Cart RAM. This is where your save data is actually stored (when you save after completing a chapter, or suspend in the middle of one, data gets copied here).

In this guide, you will be editing ROM (08) the most, with an occasional foray into wram and iram (02 and 03). The rest will probably only be touched on briefly, if at all; they’re a bit beyond the scope here.
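Since the block is just the first byte of the address, you can work out where any address lives with a shift. Here’s a hedged Python sketch that uses only the blocks listed above; `region_of` and `rom_offset` are names I made up, and anything not in the list is reported as "unknown".

```python
# Block names copied from the list above, keyed by the first byte of the address.
REGIONS = {
    0x00: "BIOS",
    0x02: "EWRAM",
    0x03: "IWRAM",
    0x04: "I/O",
    0x05: "PAL RAM",
    0x06: "VRAM",
    0x07: "OAM",
    0x08: "ROM",
    0x0E: "Cart RAM",
}

ROM_BASE = 0x08000000


def region_of(address):
    """Look up the memory block from the top byte of a 32-bit address."""
    return REGIONS.get((address >> 24) & 0xFF, "unknown")


def rom_offset(pointer):
    """Turn a ROM pointer into the offset you would see in a hex editor."""
    if region_of(pointer) != "ROM":
        raise ValueError("not a ROM pointer")
    return pointer - ROM_BASE


print(region_of(0x8809B10))        # the item table pointer from the definitions
print(hex(rom_offset(0x8809B10)))  # the extra 8 is gone
```

This is where the “extra 8” on pointers comes from: the offset in the .gba file plus the start of the 08 block.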

Documentation

Before starting a project, it is worth looking at the documentation that already exists and checking a) whether what you’re trying to do has already been done, and b) whether there’s information that can help you solve the problem. There’s no sense in reinventing the wheel, after all (actually, you can try and solve an already-solved problem yourself and then compare to the finished version if you want the practice, but I digress).
In the Unified Dropbox, each person with a folder usually has some doc somewhere in it. The most complete is Stan’s doc. For the purposes of this guide, I will be referring to my own doc, found here (mostly the Teq Doq).

Tequila · March 30, 2018, 4:03pm

Opcode Glossary

The following section is meant to be skimmed once and then used as a reference. You don’t need to memorize everything in here right off the bat.
You’ll probably want to bookmark this handy-dandy reference: THUMBREF. It shows all the opcodes that you’ll have access to in thumb mode, along with how to use them, albeit in a shorthand that may seem intimidating to a beginner. That’s where this section comes in. I’ll explain things in more detail and include a few tips and tricks.
Rather than being organized in alphabetical order, the opcodes are grouped by similarity.

Stack manipulation: push, pop

Writing a function, any function, almost always begins with a push and ends with a pop/bx combo. Push copies the contents of the register(s) onto the stack, in descending order, and decrements the stack pointer by 4 (since each register is 4 bytes in size) as it goes.
Example: push {r4-r7,r14} copies r14 to the stack, then r7, r6, r5, and r4, in that order, and subtracts 4*5 = 0x14 from sp (the stack grows downwards, see the Stack Viewer in the no$gba overview section for an explanation).
Pop is basically the opposite. It copies the topmost value from the stack to the register in the argument, and increments sp by 4.
Example: pop {r4-r7} copies the first value on the stack to r4, the second value to r5, third to r6, and fourth to r7. Notice how that’s the reverse of the order we ‘pushed’ in earlier: the last value pushed (r4) is the first one popped.
Why didn’t we pop into r14, you ask? Because we’re not allowed to. You can pop into r15, but there are a couple of reasons we don’t:

It doesn’t work if you’re returning to ARM code. While this will basically never be an issue, it’s still good to keep in mind.
It doesn’t allow you to tell at a glance whether a function returned something or not. See the “Branches” section for more info.
The arguments for both are surrounded by curly braces. You can use a dash to indicate a range, or separate registers with commas, or both, as in {r1,r3-r4,r6,r7}. The order doesn’t matter; {r1,r2} is the same as {r2,r1}, although using the former is recommended for readability.
Notes:

Always, always, ALWAYS pop as much as you push. If you don’t, you’ll probably mis-align the stack, and that will almost certainly make your game come to a screeching halt. Literally, in some cases.
A common misconception is that pop {r4}, for instance, puts the value that used to be r4, back into r4. This is not true. Pop just takes the top value from the stack. There’s no way for the code to “remember” where a value came from; that’s on you to ensure everything goes back in the right place.
If you’re writing a function, and you want to use r4-r11, you’ll have to save their values via push/pop. Don’t assume just because a register’s value is 0, that means it’s not being used.
Continuing that thought, you can’t push or pop the higher registers directly (push r14 and pop r15 being the exceptions). So if you want to save r8-r11, for instance, you’ll have to do something like this:

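push	{r4-r7,r14}	@ save r4-r7 and the return address
mov	r4,r8		@ copy r8-r11 into the low registers we just saved
mov	r5,r9
mov	r6,r10
mov	r7,r11
push	{r4-r7}		@ save the original r8-r11
<rest of code here>
pop	{r4-r7}		@ get the original r8-r11 back
mov	r11,r7
mov	r10,r6
mov	r9,r5
mov	r8,r4
pop	{r4-r7}		@ restore the original r4-r7
pop	{r1}		@ the return address we pushed as r14
bx	r1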

There’s a few opcodes that you haven’t been introduced to yet. ‘mov’ copies the contents of one register to another, and bx…well, see the Branches section.
So what does this do? Well, first we save r4-r7 and r14 with the first push. Next, we copy the higher registers’ contents into r4-r7, which we’re allowed to do because we already preserved their original values, and we push those. When it comes time to end the function, we do the same in reverse: pop what was in r8-r11 in a lower register, copy them to the higher register, and finally pop the original values.
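To make the bookkeeping concrete, here’s a toy Python model of push and pop. It is not how the hardware is implemented, just a sketch of the behavior described in this section; the `Stack` class, the starting sp, and the register values are all hypothetical.

```python
def reg_number(name):
    """'r14' -> 14, so registers can be sorted numerically."""
    return int(name[1:])


class Stack:
    def __init__(self, sp):
        self.sp = sp        # stack pointer (r13)
        self.memory = {}    # address -> 32-bit value

    def push(self, regs, names):
        # Highest register goes first, so the lowest one ends up on top.
        for name in sorted(names, key=reg_number, reverse=True):
            self.sp -= 4
            self.memory[self.sp] = regs[name]

    def pop(self, regs, names):
        # Lowest register takes the top value; sp moves back up by 4 each time.
        for name in sorted(names, key=reg_number):
            regs[name] = self.memory.pop(self.sp)
            self.sp += 4


START_SP = 0x03007E00  # hypothetical starting value, chosen for the example
regs = {"r1": 0, "r4": 0x44, "r5": 0x55, "r6": 0x66, "r7": 0x77, "r14": 0x08001235}
stack = Stack(START_SP)

stack.push(regs, ["r4", "r5", "r6", "r7", "r14"])
sp_after_push = stack.sp
top_of_stack = stack.memory[stack.sp]

# Pretend the function body overwrote r4-r7 with its own values.
regs.update({"r4": 0, "r5": 0, "r6": 0, "r7": 0})

stack.pop(regs, ["r4", "r5", "r6", "r7"])
stack.pop(regs, ["r1"])  # the value pushed from r14 lands in r1

print(hex(START_SP - sp_after_push), hex(top_of_stack), hex(regs["r1"]))
```

Note that nothing in `pop` knows which register a value came from. The old r14 ends up in r1 purely because it was the next value on the stack.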

Comparisons and branches: cmp, cmn, b, bl, bx

There are 3 different kinds of branches available to us, each with their advantages and disadvantages. The first is b, which is an unconditional jump, or goto. Pretty simple to use and understand. There are also conditional branches, such as beq, bne, bge, etc. These are equivalent to if/then statements.

B
A conditional branch is usually preceded by a cmp, or compare. Cmp can either compare a register to a byte-sized constant (cmp r0,#0xFF), or compare 2 registers (cmp r0,r1). With a constant, the register must be a low one; with 2 registers, any mix of low and high registers works.
Cmp works by subtracting the right argument from the left one, so cmp r0,#0xFF will do r0 - 0xFF, throw away the result, and set the 4 status flags next to the register list (negative, zero, carry, and overflow). (Technically c and v are both overflow flags; c is for unsigned overflow and v is for signed overflow, but you don’t really need to understand how the flags work to use cmp).
Cmn is compare negative, and works almost the same, except it only takes registers in thumb, and cmn r0,r1 would calculate r0+r1 and set the flags accordingly. I have never used it, never seen it used, and completely forgot about its existence until I wrote this guide, but I’m mentioning it for the sake of completeness.
Once the flags are set, you can use them to do the following conditional branches:

It should be noted that while conditional branches are the main opcodes that use the flags (adc and sbc also read the carry flag), several opcodes other than cmp and cmn can set them.

The last column has x’s indicating which flags an opcode can set. For this reason, it’s recommended to keep your compares and conditional branches close together. There’s no reason not to, and it makes your code easier to read. And while you can use opcodes other than cmp to set the flags, I’d say you shouldn’t unless you absolutely really positively need the space that a cmp would take, just for the sake of readability.
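If you’re curious how the flags turn into branch decisions, here’s a Python sketch of cmp followed by a conditional branch. The condition names and flag tests are the standard ARM ones (beq, bne, bcs, bcc, bmi, bpl, bvs, bvc, bhi, bls, bge, blt, bgt, ble); the function names and the way I’ve modeled things are my own, so treat it as an illustration rather than a description of the hardware.

```python
MASK = 0xFFFFFFFF  # registers are 32 bits wide


def cmp_flags(a, b):
    """Return (n, z, c, v) as set by 'cmp a,b', which computes a - b."""
    a &= MASK
    b &= MASK
    result = (a - b) & MASK
    n = result >> 31                        # negative: top bit of the result
    z = int(result == 0)                    # zero: the operands were equal
    c = int(a >= b)                         # carry: set when there is no unsigned borrow
    v = ((a ^ b) & (a ^ result)) >> 31      # overflow: signed result does not fit
    return n, z, c, v


# Each conditional branch is just a test on the flags.
CONDITIONS = {
    "beq": lambda n, z, c, v: z == 1,             # equal
    "bne": lambda n, z, c, v: z == 0,             # not equal
    "bcs": lambda n, z, c, v: c == 1,             # unsigned higher or same
    "bcc": lambda n, z, c, v: c == 0,             # unsigned lower
    "bmi": lambda n, z, c, v: n == 1,             # negative
    "bpl": lambda n, z, c, v: n == 0,             # positive or zero
    "bvs": lambda n, z, c, v: v == 1,             # overflow
    "bvc": lambda n, z, c, v: v == 0,             # no overflow
    "bhi": lambda n, z, c, v: c == 1 and z == 0,  # unsigned higher
    "bls": lambda n, z, c, v: c == 0 or z == 1,   # unsigned lower or same
    "bge": lambda n, z, c, v: n == v,             # signed greater or equal
    "blt": lambda n, z, c, v: n != v,             # signed less than
    "bgt": lambda n, z, c, v: z == 0 and n == v,  # signed greater than
    "ble": lambda n, z, c, v: z == 1 or n != v,   # signed less or equal
}


def branch_taken(condition, a, b):
    """Would 'cmp a,b' followed by the given branch actually jump?"""
    return CONDITIONS[condition](*cmp_flags(a, b))


# 0xFFFFFFFF is a huge number if unsigned, but -1 if signed.
print(branch_taken("bhi", 0xFFFFFFFF, 1))
print(branch_taken("blt", 0xFFFFFFFF, 1))
```

The last two lines are the practical takeaway: the same cmp can mean different things depending on whether you follow it with an unsigned branch (bhi, bls, bcs, bcc) or a signed one (bgt, ble, bge, blt).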

BL
bl, in contrast to every other thumb opcode, is 32 bits. This is because it’s basically 2 opcodes in 1, a branch and a link. bl is the equivalent of a function call. The code that’s currently being executed is put on hold, we jump to another location, execute code there, and return back when done.
In order to return, we need to know the address that we jumped from. This address (+4/5, see bx) is stored in r14, which is also called lr, or the link register. To ensure the address is saved, the first thing the function being called will do is push r14. Otherwise, if the function being called has its own function calls, the new address to return to will overwrite the previous one, and bad things will almost certainly happen. The exception to this rule is if the function is very short and is guaranteed to not have any function calls; if that’s the case, you can leave r14 alone (although there’s nothing wrong with always pushing it).
The syntax is quite simple: bl 0x8019430, for example.

BX
bx is also an opcode that does 2 things at once, but unlike bl, it’s a standard 2-byte opcode. bx stands for branch and exchange. The branch part is easy. You put the address you want to go to in a register, say, r0, then bx to it: bx r0.
But wait. What are we exchanging, you ask? Well, remember way back in the beginning, I explained that the game contains both thumb and arm code? Sometimes, we need to switch from one to the other. That’s where the exchange part comes in. How does the processor know when to switch? If the lowest bit (bit 0) of the address to jump to is set (ie, address is odd), the code will be in thumb mode. If it’s not set, the code will be ARM. Pretty simple. If you’ve done any eventing and had to use ASMC, and had it drilled into your head that the address of the asmc has to have 1 added to it, this is why. If you don’t make it odd, the code will be executed in ARM mode, and, well, almost certainly won’t work.
While b and bl have constraints as to how far they can go (see comparisons), the fact that bx loads the address in a register means it has no such constraint. You can go anywhere with it.
If you’ll recall from the push/pop section, I pushed and popped a bunch of registers. Amongst those, we pushed lr in the beginning, and then the last two lines were pop {r1}, bx r1. lr, you’ll recall, contains the address we need to return to after this function is finished. We can’t pop it directly into r14, and we don’t like to pop into r15, so we put it in a lower register, and then bx to the address. bl adds 4 to the address to skip the bl command itself (we don’t want to execute the bl again, otherwise that would result in an infinite loop), and then adds 1 if the code to return to is in thumb mode, for a total of 5.
Convention dictates that if the function returns a value, you use r1 to bx back, and if it doesn’t, use r0. This allows you to tell at a glance whether a function returns something or not. It’s recommended that you stick to this convention, although you don’t have to.

Comparisons
b: The unconditional branch is 2 bytes long, and has a max range of about +/- 0x7FF bytes (the opcode holds an 11-bit signed offset, counted in 2-byte steps).
b <cond>: The conditional branch is also 2 bytes long, but only has a max range of +/- 0xFF bytes, due to needing space for the conditional type.
bl: branch and link is 4 bytes long, and has a max range of +/- 0x3FFFFF bytes.
bx: branch and exchange is technically 2 bytes long, although it needs the target address to be loaded in a register, and has no limits as to how far it can go.
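If you want to check whether a given branch can reach its target before the assembler yells at you, the arithmetic is simple enough to sketch in Python. The limits below are worked out from the opcode encodings (signed offsets of 11, 8, and 22 bits, counted in 2-byte steps and measured from the branch’s own address + 4), so they are exact rather than rounded. The function names and the example addresses other than 0x8019430 are made up.

```python
# (lowest, highest) byte offset each branch type can encode.
RANGES = {
    "b": (-0x800, 0x7FE),          # 11-bit signed offset, in halfwords
    "b<cond>": (-0x100, 0xFE),     # 8-bit signed offset, in halfwords
    "bl": (-0x400000, 0x3FFFFE),   # 22-bit signed offset, in halfwords
}


def branch_offset(source, target):
    """Offset as the opcode sees it: pc reads as the branch's address + 4."""
    return target - (source + 4)


def can_reach(kind, source, target):
    """True if a branch of this kind at 'source' can jump to 'target'."""
    low, high = RANGES[kind]
    offset = branch_offset(source, target)
    # Thumb opcodes are 2 bytes each, so the offset must be even.
    return offset % 2 == 0 and low <= offset <= high


print(can_reach("b<cond>", 0x08000000, 0x08000100))  # close enough for a beq
print(can_reach("b<cond>", 0x08000000, 0x08000200))  # too far for a beq...
print(can_reach("b", 0x08000000, 0x08000200))        # ...but fine for a plain b
print(can_reach("bl", 0x08000000, 0x08019430))       # well within bl range
print(can_reach("bl", 0x08000000, 0x08800000))       # out of bl range: time for bx
```

When a conditional branch is out of range, the usual workaround is to invert the condition and hop over an unconditional b.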

That’s nice to know and all, but when should you use what?
b and its variants should be used only within a function.
bl is used for function calls. There is one exception: if the function is so long that even b won’t work, you can use bl as a branch and just ignore the ‘save the link register’ bit. I believe there’s only 2 functions in FE8 that are long enough to require this.
bx is used for returning from a function call, for jumping out of bl range (this will be used a lot), and for exchanges between thumb and arm code.

Unary and binary bitwise operators: mvn, neg, and, orr, eor, bic, tst

The following instructions are easier to visualize with numbers in a binary representation, so all numbers are in base 2. They’re also separated into groups of 4 digits so you can read them easier. Remember that a byte is 8 bits.
A unary operation is an operation that only takes 1 input. Our instruction set has 2: mvn (move negative) and neg (negate). Despite sounding similar, they’re not quite the same.
mvn is more commonly known as “not”, as in “not A” or “!A”, or as taking the “one’s complement”. Given a number, you toggle each bit. If the bit was set, unset it (1 becomes 0); if it was unset, set it (0 becomes 1).

@r0 = 1010 1101
mvn r1,r0
@r1 = 0101 0010

Notice that if you sum r0 and r1, you get 1111 1111, or -1 (if the number is signed).

neg is also known as taking the “two’s complement”. As the name suggests, you’re just negating the number. 1 becomes -1, 3 becomes -3, -20 becomes 20. The easiest way to do this is by taking the one’s complement and adding 1.

@r0 = 1010 1101
neg r1,r0
@r1 = 0101 0011

As you might expect, summing r0 and r1 will equal 0.

Binary operations, as the name might suggest, take 2 operands, or inputs. For the following, r0 = 1010 1101 and r1 = 1100 0101.
The operation looks at bits in the same position in the two inputs and creates a new number with the results, which replaces one of the operands: and r0,r1, for instance puts the result in r0.

and: Also represented with &. A bit is set (1) only if it is set in both inputs.

1010 1101
1100 0101 &
1000 0101

orr: More commonly known as or, or |. A bit is set (1) if it is set in one or both inputs.

1010 1101
1100 0101 |
1110 1101

eor: More commonly known as xor, “exclusive or”, or ^. A bit is set (1) if it is set in one input or the other, but not both.

1010 1101
1100 0101 ^
0110 1000

bic, or bit clear, is a combination of mvn + and.
bic r0,r1
is almost equivalent to
mvn r1,r1
and r0,r1
In other words, it clears the bits that r0 has in common with r1. The difference between those two examples is that bic would not change the value in r1.

1010 1101
1100 0101 bic
0010 1000

Note that order matters! 1 bic 0 is 1, but 0 bic 1 is 0. If this weren’t the case, bic would be identical to eor.
I’ve never seen this used in vanilla FE8 code. The compiler seems to prefer a mvn/and combination.

tst, or test, is a combination of and + cmp (register), #0x0.
tst r0,r1
is almost equivalent to
and r0,r1
cmp r0,#0x0
The difference is that the value in r0 is untouched. It’s useful if you need to know whether a bit is set, but don’t care about the value obtained when using and (which is most of the time).
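All of the examples in this section can be double-checked in Python, which has the same bitwise operators. The sketch below sticks to 8 bits like the examples do (a real register is 32 bits wide), and the helper names simply mirror the opcodes.

```python
MASK = 0xFF  # the examples above use 8 bits; a real register would use 0xFFFFFFFF

r0 = 0b10101101
r1 = 0b11000101


def mvn(a):
    """One's complement: toggle every bit."""
    return ~a & MASK


def neg(a):
    """Two's complement: one's complement plus 1."""
    return -a & MASK


def bic(a, b):
    """Clear in a every bit that is set in b (a AND NOT b)."""
    return a & ~b & MASK


def tst(a, b):
    """Return the zero flag that 'tst a,b' would set; a and b are untouched."""
    return (a & b) == 0


def show(value):
    """Format 8 bits as two groups of 4, like the examples."""
    text = format(value, "08b")
    return text[:4] + " " + text[4:]


print("mvn", show(mvn(r0)))
print("neg", show(neg(r0)))
print("and", show(r0 & r1))
print("orr", show(r0 | r1))
print("eor", show(r0 ^ r1))
print("bic", show(bic(r0, r1)))
print("tst sets the zero flag:", tst(r0, r1))
```

Swapping the arguments to `bic` gives a different answer, which is the “order matters” point made above.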

Loads + mov: mov, ldr, ldrb, ldrh, ldsb, ldsh, ldmia

These opcodes are for putting a value into a register, or dereferencing memory (getting data at an address).

mov stands for “move”. There’s 2 ways to use it.
mov r0,#0xFF puts the value 0xFF into register r0. The argument must be 1 byte.
mov r0,r1 copies the value in r1 into r0. Nothing happens to r1, so it’s not exactly moving here, but whatever.
That’s it. Pretty straightforward, no?

Now let’s look at the dereferencing stuff.
ldr stands for load register. With what, you ask? That depends on the following letter:
b = byte: ldrb
sb = signed byte: ldrsb, which is shortened to ldsb
h = halfword: ldrh
sh = signed halfword: ldrsh, which is shortened to ldsh
if there’s no letter, then load a word

Ok, now where are we loading from? That’s defined in the arguments.
Example: ldr r0,[r1]
The brackets indicate dereferencing: “Go to the address in r1, retrieve the word at that location, and put that value in r0.”
But we can also get a bit more complex by throwing in a non-negative constant:
ldr r0,[r1,#0x4]
Same as the above example, except we’re retrieving the word at address [r1+0x4].
We could also put the constant into another register:
mov r0,#0x4
ldr r0,[r1,r0]
and this will yield the same result as example 2.
Now, there’s a couple of things you should know:

If you’re using the second method, there’s 5 bits set aside for the constant. That means in ldrb r0,[r1,#X], X must be less than 32 (0x20). ldrh and ldr multiply this value by 2 and 4, respectively, meaning you can load stuff further away (up to 0x3E and 0x7C), but it also means you cannot use offsets that aren’t 2- or 4-aligned. ldrh r0,[r1,#0x2] will work fine; ldrh r0,[r1,#0x3] will not.
ldsb and ldsh will not work with method 2 at all. If you want to use that constant, you must load it into another register first. Even if the constant is 0. ldsb r0,[r1] will not work.
That second point might make you ask, what exactly is the point of using ldsb over ldrb? That’s best illustrated through an example. Let’s let the value at the address in r1 be 0x7F. Then ldrb r0,[r1] (which is equivalent to ldrb r0,[r1,#0x0]) will put 0x0000007F into r0, which is the same as 0x7F.
We can’t ldsb directly; we have to put the constant into another register first.
mov r0,#0
ldsb r0,[r1,r0]
This also puts 0x0000007F into r0, because 0x7F doesn’t have the top bit set, so it’s not a negative number. In this instance, the two opcodes have the same result.
Now instead, let’s put 0x80 at that location.
ldrb r0,[r1] will still yield r0 = 0x80. Nothing new there.
ldsb, using the same code as above, will see that the byte loaded does have the top bit set, indicating that it’s a negative number. But that’s a byte, and the register we’re loading into is a word (4 bytes) long. Whatever will we do? Extend the sign by setting all the bits in front of (to the left of) that byte. So now 0x80, as a byte, becomes 0xFFFFFF80, as a word.
Ldrh vs ldsh is the same. Let the example value be 0x8000; then ldrh will yield r0 = 0x00008000, and ldsh will yield 0xFFFF8000.
You might notice that there’s no “load register with signed word”. Why? I’ll leave that as an exercise for the reader.
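Here’s the ldrb/ldsb and ldrh/ldsh comparison again as a Python sketch, using a small made-up chunk of “memory”. The byte values are the ones from the examples above; the addresses, the `load` helper, and the layout are my own assumptions, and the addresses used are already aligned.

```python
# Pretend memory, in the order the bytes would appear in a hex editor.
# 0: 0x7F | 1: 0x80 | 2-3: 0x8000 (little endian) | 4-7: 0x12345678 (little endian)
memory = bytes([0x7F, 0x80, 0x00, 0x80, 0x78, 0x56, 0x34, 0x12])


def load(address, size, signed):
    """Read 'size' bytes, little endian, and widen the result to a 32-bit register."""
    raw = memory[address:address + size]
    value = int.from_bytes(raw, "little", signed=signed)
    # A negative Python int becomes its 32-bit two's complement form here,
    # which is exactly the sign extension described above.
    return value & 0xFFFFFFFF


def ldrb(address):
    return load(address, 1, signed=False)


def ldsb(address):
    return load(address, 1, signed=True)


def ldrh(address):
    return load(address, 2, signed=False)


def ldsh(address):
    return load(address, 2, signed=True)


def ldr(address):
    return load(address, 4, signed=False)


for name, func, address in (("ldrb", ldrb, 0), ("ldsb", ldsb, 0),
                            ("ldrb", ldrb, 1), ("ldsb", ldsb, 1),
                            ("ldrh", ldrh, 2), ("ldsh", ldsh, 2),
                            ("ldr", ldr, 4)):
    print(name, address, format(func(address), "08X"))
```

Trying to write an `ldsw` in this model is a decent way to work out the answer to the exercise.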

Remember how mov can only put a value that’s a byte long into a register? If you want to put something larger, and you will, you’ll also need to use ldr as follows:
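ldr r0,Deadbeef
@ ...the rest of your code...
.align 4
Deadbeef:
.long 0xDEADBEEF
You put the value in a constant pool or literal pool, and then use ldr to dereference [pc + constant] and retrieve the value. Or in other words, since we can’t fit the argument (0xDEADBEEF, 4 bytes) into a 2-byte opcode, we tell the game how far ahead it needs to look to find the argument. The .align is there because your literals must be word-aligned. (In the GNU assembler that devkitARM uses, the argument is a power of 2 on ARM, so .align 4 pads to a 16-byte boundary; that works, but .align 2 is already enough for word alignment.)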
Notes:

- You must load a word. You can’t do ldrh r0,Beef or something like that. At least, not directly. You could do ldr r0, Beef, where Beef is defined as .long 0x0000BEEF.
- The literal must be after the ldr opcode.
- You can dereference the stack using r13, but must load a word; eg ldr r0,[sp,#0x4]. If you only wanted to load a byte, you’d have to do something like

```
mov r0,r13
ldrb r0,[r0]
```
- Remember how the constant in, say, the above example has to be non-negative? This is why having the stack grow downwards is useful. The top value has the smallest address, so you add a non-negative constant to retrieve earlier values.

Finally, there's `ldmia`, or **l**oa**d** **m**ultiple, **i**ncrement **a**fter. It's not an opcode that you'll use very often (probably), but the syntax is a bit weird, so here's an example:
`ldmia r1!, {r0, r2-r3}`
r1 contains an address which we are loading from, and r0, r2, and r3 are the registers we are loading to (that's the multiple part). We load the first word into r0, then add 4 to r1 (increment after), load the second word into r2 and increment r1 by 4 again, and finally load the third word into r3 and increment by 4. The command is usually followed by `stmia`, which will be covered in the next section. You usually see it used to copy a table from the ROM to the stack (unnecessarily, in my opinion, but that's neither here nor there).
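The load-then-increment behaviour described above can be modelled in a few lines of Python. Here `memory` is just a bytes object standing in for the ROM or RAM, and the three words in it are made-up sample data.

```python
import struct


def ldmia(memory, address, count):
    """Load `count` little-endian words starting at `address`.

    Returns (list_of_words, updated_address), mimicking how ldmia
    adds 4 to the base register after every word it loads.
    """
    words = []
    for _ in range(count):
        (word,) = struct.unpack_from("<I", memory, address)
        words.append(word)
        address += 4  # "increment after"
    return words, address


# Three sample words (hypothetical data), starting at offset 0.
memory = struct.pack("<3I", 0x11111111, 0x22222222, 0x33333333)

# Equivalent in spirit to: ldmia r1!, {r0, r2-r3} with r1 = 0
(r0, r2, r3), r1 = ldmia(memory, 0, 3)
```

After the call, r1 has moved forward by 12 bytes (3 words), which is why a following stmia can pick up right where the data ends.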

Stores: str, strb, strh, stmia

As you might expect, stores are the complement to loads. Rather than reading from memory, you write to it.

str r0,[r1] stores the entirety of r0 into the address in r1
strh r0,[r1,#0x4] stores the bottom 2 bytes of r0 into the address in r1+0x4
strb r0,[r1,r2] stores the bottom byte of…well, you get the point.

Example: if r0 = 0xDEADBEEF, strb r0,[r1] will write 0xEF to the address in r1. r0 isn’t affected in the process.
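Here is a small Python sketch of that example, with a bytearray standing in for memory. The `store` helper is hypothetical; its `size` argument is 1 for strb, 2 for strh and 4 for str.

```python
def store(memory, address, value, size):
    """Write the bottom `size` bytes of a register value, little-endian."""
    mask = (1 << (8 * size)) - 1
    memory[address:address + size] = (value & mask).to_bytes(size, "little")


memory = bytearray(8)   # 8 bytes of zeroed "RAM"
r0 = 0xDEADBEEF

store(memory, 0, r0, 1)  # strb r0,[r1]       with r1 = 0
store(memory, 4, r0, 2)  # strh r0,[r1,#0x4]  with r1 = 0

print(memory.hex(" "))
```

Only the bytes that were targeted change, and r0 itself is left alone.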

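stmia is identical to ldmia, aside from the obvious: store multiple registers, increment after. If you think about it, pop is essentially ldmia exclusively for r13. push is the mirror image: it stores multiple registers too, but it decrements r13 before storing instead of incrementing after, because the stack grows downwards.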

Integer operations and shifts: add, sub, mul, lsl, lsr, asr, ror, adc, sbc, adr

Add stands for addition.
add r0,#0xFF should be fairly obvious
add r0,r0,r1 stands for r0 = r0 + r1. If the destination register is omitted, it’s assumed the first register is the destination. So we could instead write this as add r0,r1.
add sp,#-0x4 This is how you allocate space on the stack. Notice the minus sign, because the stack grows downwards! To deallocate the space at the end of the function, add it back without the minus sign.

Sub stands for subtraction.
sub r0,#0xFF Again, obvious.
sub r0,r0,r1 Same as addition. Note that while the register order doesn’t matter in addition (add r0,r0,r1 and add r0,r1,r0 are the same), it does matter when subtracting.
While thumbref does say that there’s a sub sp,#constant, I haven’t managed to make it work. So just stick with adding a negative number to r13 for your allocation needs.

Mul is the first one that isn’t as obvious. You can only multiply 2 registers together; you cannot multiply a register and a constant. mul r0,r1 will multiply r0 by r1 and put the result in r0.

lsl = logical shift left
lsr = logical shift right
asr = arithmetic shift right
These are all shifts (no way, really?). In decimal mode, it’s very easy to multiply or divide by powers of 10, right? You just shift the decimal point over to the left or right and add zeroes as necessary. In binary, it’s very easy to multiply (lsl) or divide (lsr) by powers of 2.

@r0=0010 1110b
lsl r0,r0,#0x1 @shift to the left by 1, akin to multiplying by 2^1 = 2
@now r0 = 0101 1100b
lsr r0,r0,#0x2 @shift to the right by 2, akin to dividing by 2^2 = 4
@now r0 = 0001 0111b

Similar to add and sub, if the destination register is the same as the argument, you can omit it: lsl r0,#0x1 is the same as lsl r0,r0,#0x1

Notice that whatever is shifted outside the register is ‘chopped off’, or truncated. This can be useful:

@r0 = 0xDEADBEEF. We would like to isolate the BEE part.
lsl r0,#0x10 @shift 16 places to the left
@r0 = 0xBEEF0000
lsr r0,#0x14 @shift 20 places to the right
@r0 = 0x00000BEE

asr is similar to lsr, but with a key difference: if the top bit is set prior to the shift (ie, it’s a negative number), then the sign is extended.
Let r0 = 0x80000000
lsr r0,#0x18 would yield r0 = 0x00000080
asr r0,#0x18 would yield r0 = 0xFFFFFF80
You’ll often find code that looks something like:
ldrb r0,[r1]
lsl r0,#0x18
asr r0,#0x18
This is called casting an unsigned byte into a signed one. In this example, you could achieve the same result using ldsb (see loads), but that requires at least 2 free registers; 1 to hold the address and the other to hold the constant. If you only have 1 free register, this is an appropriate substitution.
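To double-check the shift examples in this section, here is a rough Python model of the three shifts on a 32-bit register. It only covers shift amounts from 1 to 31, and the helper names just mirror the opcodes.

```python
MASK = 0xFFFFFFFF  # registers are 32 bits wide


def lsl(value, amount):
    """Logical shift left: bits pushed past bit 31 are chopped off."""
    return (value << amount) & MASK


def lsr(value, amount):
    """Logical shift right: zeroes come in from the left."""
    return (value & MASK) >> amount


def asr(value, amount):
    """Arithmetic shift right: copies of the top bit come in from the left."""
    value &= MASK
    if value & 0x80000000:
        value -= 1 << 32  # reinterpret as negative so >> keeps the sign
    return (value >> amount) & MASK


# Isolating the BEE in 0xDEADBEEF
bee = lsr(lsl(0xDEADBEEF, 0x10), 0x14)

# Casting the unsigned byte 0x80 into a signed one (lsl #0x18, asr #0x18)
signed = asr(lsl(0x80, 0x18), 0x18)

print(hex(bee), hex(signed))
```

The last two lines reproduce the lsl/lsr trick and the lsl/asr cast from the text.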

Something you’ll encounter fairly often is that FE’s compiler doesn’t particularly like using mul when it doesn’t absolutely have to. An example is loading item table data. Each entry is 0x24 (36d) bytes long. Given an item id, you can retrieve its data by multiplying the item id by 0x24 and adding that to the table pointer, eg

@r0 = item id
mov r1,#0x24
mul r0,r1
ldr r1,ItemTable
add r0,r1

Which puts the beginning of that item’s data in r0. Here’s how the compiler chooses to implement it:

@r0 = item id
lsl r1,r0,#0x3 @shift by 3, which is the same as multiplying by 2^3 = 8
add r0,r1 @add the shifted value to the original: id*8 + id = id*9
lsl r0,#2 @shift by 2, or multiply by 2^2 = 4, 9 * 4 = 36 = 0x24
ldr r1,ItemTable
add r0,r1

Instead of using mul, we use a combination of shifts and adds. This actually does end up being faster, at the expense of 1 more opcode, but space is cheap. On the other hand, the time saved is fairly minuscule in the grand scheme of things, so don’t feel bad about using mul. Just watch out for overflows if you’re multiplying big numbers.
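If you want to convince yourself that the two versions really compute the same offset, a quick Python check does the job. The function names are made up for this sketch, and the pointer add at the end of each snippet is left out since it is identical in both.

```python
MASK = 0xFFFFFFFF


def item_offset_mul(item_id):
    """mov r1,#0x24 / mul r0,r1"""
    return (item_id * 0x24) & MASK


def item_offset_shifts(item_id):
    """The shift-and-add sequence the compiler emits instead."""
    r0 = item_id
    r1 = (r0 << 3) & MASK  # lsl r1,r0,#0x3 -> id * 8
    r0 = (r0 + r1) & MASK  # add r0,r1      -> id * 9
    r0 = (r0 << 2) & MASK  # lsl r0,#2      -> id * 36
    return r0


# Item ids fit in a byte, so checking 0-255 covers every case.
same = all(item_offset_mul(i) == item_offset_shifts(i) for i in range(256))
print(same)
```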

ror, or “rotate right”, is similar to shift. Rather than chop off the bits that are outside the register, however, ror inserts them at the other end.

@r0 = 0xDEADBEEF
@r1 = 8
ror r0,r1
@r0 = 0xEFDEADBE

I’ve never seen this used and cannot fathom when you would need to, but, hey, now you know that it exists. Also, it’s another fun one to say. Let me hear you ROR! No? Just me? Ok then. Moving on!
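For the curious, the rotate can be modelled in Python like this (a sketch for a 32-bit register; the helper name mirrors the opcode):

```python
def ror(value, amount):
    """Rotate right: bits that fall off the bottom come back in at the top."""
    amount %= 32
    value &= 0xFFFFFFFF
    return ((value >> amount) | (value << (32 - amount))) & 0xFFFFFFFF


print(hex(ror(0xDEADBEEF, 8)))
```

Rotating by 8 four times brings you back to where you started, since nothing is ever thrown away.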

adc (add with carry) and sbc (subtract with carry) are similar to their normal counterparts add and sub, except they bring the carry flag into the mix. I’m not entirely sure how these work, and have never seen them used.
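For what it’s worth, the usual textbook description is that adc computes a + b + carry flag, which lets you chain 32-bit adds into bigger ones. Below is a minimal Python sketch of adding two 64-bit numbers held as low/high register pairs; the helper names and the pair layout are assumptions made for illustration. sbc is the same idea for subtraction, with the carry flag acting as a borrow.

```python
MASK = 0xFFFFFFFF


def add(a, b):
    """Plain add: returns (32-bit result, carry flag)."""
    total = a + b
    return total & MASK, total >> 32


def adc(a, b, carry):
    """Add with carry: the incoming carry flag is added in as well."""
    total = a + b + carry
    return total & MASK, total >> 32


def add64(a_lo, a_hi, b_lo, b_hi):
    """Add two 64-bit values stored as (low word, high word) pairs."""
    lo, carry = add(a_lo, b_lo)     # add the low words, remember the carry
    hi, _ = adc(a_hi, b_hi, carry)  # adc feeds that carry into the high words
    return lo, hi


# 0x00000000_FFFFFFFF + 1 = 0x00000001_00000000
print(add64(0xFFFFFFFF, 0, 1, 0))
```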

adr, you might notice, doesn’t actually appear in thumbref. That’s because it’s a special form of add; namely, add reg, pc, #. It’s kind of like using ldr to load a literal, except without actually dereferencing the address. It just loads the address into the register. I’ve used it exactly once as of this writing.

Now, you might be wondering why I talked about add, sub, and mul, but there’s no div. That’s because division is actually kind of hard if you’re not dividing by a power of 2 (if you are, just use lsr). You have 3 options:

1. Meander over to the Software Interrupts section, which does have a div function
2. Call the function (FE8: D18FC), which is basically a thumb version of #1 (according to Zahlman)
3. Approximate by finding a fraction approximately equal to 1 over what you want to divide by, with the denominator being a power of 2
Example: Let’s say I want to divide by 5. Well, 1/5 is equal to 13/65, which is a teensy bit smaller than 13/64. So you could multiply by 13 and right shift by 6 (same as dividing by 2^6 = 64). Is this actually faster than option 1? I haven’t the faintest idea. Again, see SWI.
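Since 13/64 is only approximately 1/5, it is worth checking how far the trick can be trusted. This Python sketch compares the multiply-and-shift version against real integer division and finds the first input where they disagree; `approx_div5` is a made-up name for the example.

```python
def approx_div5(x):
    """Multiply by 13, then shift right by 6 (divide by 64)."""
    return (x * 13) >> 6


# First non-negative input where the approximation is off.
first_mismatch = next(x for x in range(1000) if approx_div5(x) != x // 5)
print(first_mismatch)
```

The approximation is exact for inputs 0 through 63 and first goes wrong at 64, so it is fine for small values like most stats but not for anything bigger. A larger power of 2 in the denominator buys you a wider safe range.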

Software interrupts: swi

There’s only one opcode in this section, and it’s in the title: swi. Think of it as a function call, like bl, except it calls a function that resides in BIOS (the 00 memory block), and is part of the GBA hardware, rather than contained in the cart data. Which function is called depends on the parameter: swi 0x6, for instance, calls Div (divide). You can find out more about the different functions here.

Spacers: nop

nop, or no op(code), is used when you just want to wipe out some code and are too lazy to use a branch to skip it. It does nothing. It’s actually mov r8,r8, which you may also recognize as doing nothing. Another appropriate spacer has the hex code 00 00, which is lsl r0,r0,#0, and this, again, does nothing. Although nop is fun to say.

ARM opcodes

Arm opcodes are, for the most part, nearly identical to their thumb versions, albeit with loosened restrictions (you can use the high registers whenever you’d like) and some extra features (can incorporate shifts and flag-setting in the same opcode for free).

I know the -s ending means “set flags”, but other than that I never had to mess with ARM code. And hopefully, you don’t either.

Tequila · April 6, 2018, 4:15pm

Introduction to No$GBA (aka GET TO DA DEBUGGAH)

The game’s assembly code resides in the beginning of the rom after the header, approximately the first 0xE0000 bytes. If you looked at it in your hex editor, however, it wouldn’t really tell you anything useful:

We need something to translate that string of hex into something that a human can parse. Enter… the debugger. For this guide, I will be using no$gba 2.8. The debugger is basically an emulator with extra tools that allow us to play God with our game, and understanding how to use it is key to being able to do assembly hacking. So let’s get right into that. Open the no$gba.exe executable.

You should get a window that looks like this:

Go to File -> Cartridge Menu (FileName), navigate to your .gba file, and open it.

Voila!

Now I’m going to explain what each part is for.

1) Main Window

This window shows the assembly code being executed. From the left:

The first column is the address of the opcode. The first byte of the address (from the left, so 08 here) shows which memory block the code resides in (see Memory Blocks in the reference section). The 08 means rom. 99% of the code you will edit is in the rom. If you want to look at it in your hex editor, don’t forget to subtract 0x8000000 from the address!
See the little rectangle next to the address 80D16DE? That indicates that this opcode is next in line to be executed.
The second column is the hexadecimal representation of the opcode. If you go to D16CC in your hex editor, you’ll see this:

Notice that the bytes are reversed (4770 shows up as 70 47). That’s because of little endianness.
The third and fourth columns combine to form the opcode and its arguments.
b and its variants have an up or down arrow to the right of the argument to show what direction the branch goes:

Similarly, bl shows a right-facing arrow (although that doesn’t tell you which direction the call is):

You can use the right arrow key to be taken to the location that is branched to, and the left arrow key to return. Pretty handy, no?
Clock cycle comments are currently not displayed, but can be turned on by going to Options -> Debug -> Clock Cycle Comments. They indicate how long each opcode takes to execute. It’s not something I’ve ever had to worry about, since it’s aimed more at people building games from scratch.
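The two conversions mentioned in the list above (rom address to hex editor offset, and opcode to the byte order you see in the file) are easy to get backwards, so here they are as a tiny Python sketch. The helper names are made up for this example.

```python
ROM_BASE = 0x08000000  # rom addresses start with 08


def rom_file_offset(address):
    """Turn a rom address from the debugger into a hex editor offset."""
    return address - ROM_BASE


def opcode_bytes(opcode):
    """Show a 2-byte thumb opcode in the order it is stored in the file."""
    return opcode.to_bytes(2, "little")


print(hex(rom_file_offset(0x080D16CC)))  # where to look in the hex editor
print(opcode_bytes(0x4770).hex(" "))     # how 4770 appears in the file
```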

You can edit an opcode by clicking the line and starting to type, which will bring up this box:

Once done, hit Enter. If you wrote a valid opcode, good for you! It should immediately change. If you didn’t, you’ll be faced with this box

and may be forced to reevaluate your life choices. Or at least, stop making typos.

You can press ctrl + g to bring up the go-to window, enter the address (don’t forget the 8 in the beginning if it’s a rom address!), and hit Enter to be taken to that address.

Finally, you can select a line and press Enter once to center it, twice to put it at the top of the window.

2) Register list

This is, well, the register list. It shows the values of each 32-bit register. You can edit a register by clicking on it, typing in the value, and hitting Enter. Note that if the value begins with a letter (ie, A-F), it must be preceded with 0x, otherwise you’ll see this:

r13-r15 are reserved for specific tasks:

r13, or sp, is the stack pointer, which is covered in more detail in section 4 (Stack).
r14, or lr, is the link register, which is explained in the Branches section of the Opcode Glossary.
r15, or pc, is the program counter. It points to the address of the next opcode to be executed.

The rest are yours to do with as you will, more or less; refer back to the Conventions section.
There are 2 more visible registers, which you should never have to touch: cpsr (current program status register), and spsr (saved program status register). cpsr contains the flags for condition codes (see 3) Flags), processor mode, endianness state, interrupts, and other things that I have no idea about. spsr is merely used to hold cpsr's value during an exception. I mention these only for the sake of completeness; you will almost certainly never need to use these. In fact, just pretend they don’t exist.

3) Flags

In order from top to bottom:

n: Negative condition code flag
z: Zero condition code flag
c: Carry condition code flag
v: Overflow condition code flag
i: IRQ (Interrupt Request) mask bit.
f: FIQ (Fast Interrupt Request) mask bit.
t: Thumb execution state bit.
q: Cumulative saturation bit, also known as the sticky overflow bit

The first four are used for conditional branches (see the Comparisons and branches section in the opcode Glossary).
I haven’t the faintest idea what i and f do; they seem to be related to another set of registers for interrupts.
The thumb flag is pretty obvious; it’s set if the processor is in thumb mode, which it currently is.
Lastly, q is similar to v, but once set, it’s not unset until you specifically unset it. I don’t think this is actually used, but can’t swear to it.

You can select or unselect any of the flags by clicking in the check box (and see the corresponding change in the value of cpsr). The only ones I recommend messing with are the condition branch flags, since sometimes you’ll want to see what happens when a branch is taken (or not). Definitely don’t touch the thumb flag, because that will bring everything to a screeching halt.
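If you are wondering how the check boxes relate to the value of cpsr, each flag is a single bit of that register. The sketch below decodes a cpsr value using the standard ARM bit positions; the sample value 0x6000003F is made up for the example, not taken from the game.

```python
# Bit position of each flag inside cpsr (standard ARM layout).
FLAG_BITS = {"n": 31, "z": 30, "c": 29, "v": 28, "q": 27, "i": 7, "f": 6, "t": 5}


def decode_cpsr(cpsr):
    """Return a dict saying which of the flags are set in a cpsr value."""
    return {name: bool((cpsr >> bit) & 1) for name, bit in FLAG_BITS.items()}


flags = decode_cpsr(0x6000003F)  # hypothetical sample value
print([name for name, is_set in flags.items() if is_set])
```

For this sample value the z, c and t boxes would be ticked.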

4) Stack viewer

The stack is a block of memory that is used to hold register values that we need to save. Think of it as a bunch of books piled one on top of another. Normally, stacks are LIFO, or last in, first out, structures; the last book to be placed on the stack is the first to be removed, and if you want to access a book in the middle of the stack, you must first remove all the ones on top of it.
In our example, each book is a register’s contents, which is 32 bits. We add a ‘book’ to the top of the stack with the push opcode, and remove a book by popping the book into a register.
Now, let’s say that you want to access a specific value in the middle of the stack. With a normal stack, you’d have to remove everything above said value. However, we have a stack pointer, which usually points to the topmost value on the stack. And since it’s a value like any other, we can manipulate it to retrieve values in the middle of the stack.
Notice I said that the stack pointer usually points to the top of the stack? That’s because sometimes it’s useful to reserve space on the stack to temporarily store values. To do this, we merely subtract a constant from r13, and voila! Note that the constant has to be divisible by 4.
Hold up, you say. Subtract a constant? Why would we subtract? Well, if you look at the leftmost column in the image, you’ll notice the addresses are ordered in decreasing order. In other words, the stack grows downwards. Why? Because the concept of the stack was invented in 1973 by an Australian person, and they can never do anything the proper way (MOD EDIT: This is totally untrue and I will be suing for defamation of character - Circles).
Ok, the actual reason is because the stack and another structure called a heap would occupy the same block of memory in an operating system. To maximize space efficiency, the heap starts at the bottom (smallest memory address) and grows upwards, and the stack starts at the top (largest memory address) and grows downwards. This is better than allocating half the space for each, since sometimes the heap needs a lot of memory, but the stack doesn’t, or vice versa. Of course, if both require a lot of space at the same time, then you get trouble in the form of a stack-heap collision. Note that you don’t have to worry about this here, since we don’t have a heap to worry about.
That being said, it is technically possible to overwrite other things. If you look at the addresses, you’ll notice they have the 03 prefix (the stack begins at 3007DFC), which is IRAM. There’s most definitely other data in there. If you raise the stack enough, you will eventually run into this data. Should you be concerned about this ever happening? Not really. If you are actually getting overflow errors, that probably means something else has gone horribly wrong.
The middle column, as you might guess, contains the value at the address.
The right column describes what action was taken when copying the value. Return from means that r14 (the link register) was pushed, while any other register being saved merely says Pushed {reg}. If space was allocated, then it says, well, Allocated.
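To tie the pieces of this section together, here is a toy Python model of a downward-growing stack. The class and method names are invented for this sketch, the pushed values are arbitrary, and the starting address is just the one quoted above used as an example.

```python
class Stack:
    """Toy model of a stack that grows towards smaller addresses."""

    def __init__(self, start=0x03007DFC):
        self.sp = start   # r13, the stack pointer
        self.memory = {}  # address -> 32-bit value

    def push(self, value):
        self.sp -= 4      # move down first, then store
        self.memory[self.sp] = value & 0xFFFFFFFF

    def pop(self):
        value = self.memory.pop(self.sp)
        self.sp += 4      # move back up after loading
        return value

    def allocate(self, size):
        assert size % 4 == 0, "the constant has to be divisible by 4"
        self.sp -= size   # like add sp,#-size

    def load(self, offset):
        """Like ldr rX,[sp,#offset]: peek without popping."""
        return self.memory[self.sp + offset]


stack = Stack()
stack.push(0x11)  # pushed first, ends up at the higher address
stack.push(0x22)  # pushed last, sits at the top (lowest address)
```

Because the newest value has the smallest address, older values are reached with a non-negative offset: `stack.load(4)` peeks at the first value pushed without disturbing anything.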

5) Memory/Breakpoint Viewer

The image currently shows the memory viewer, so we’ll cover that first. It’s essentially identical to your hex editor, except you can edit any section of memory. Click in the area, press ctrl + g to bring up the Go-to window, enter the address, and press enter. You can even write the hex representation of your assembly code, if you really want to (I don’t recommend it unless it’s very short).

If you press Tab, you’ll switch to the breakpoint window, which currently looks like this:

Breakpoints are important enough to warrant their own section, so go look at that. Press Tab to toggle between the memory and breakpoint viewers.

6) Game Window + buttons

The game window is pretty obvious. You can have a second window pop up when the game is active by going to Options - Emulation Setup - Execute Games In - Separate Game Window if you prefer. To make the game active, click in the window; to pause, click outside the window or press Escape.

Of all the buttons under the game window, the two most important ones are Trace and Run Next. Trace executes the next opcode and halts. Run Next does the same, except when it encounters a bl, or function call (see Comparisons and branches in the Opcode Glossary). Trace will take you into the function after executing the bl command, while Run Next executes the entire function and halts at the opcode after the bl. In the bottom left corner, there will also appear a cycle count, which indicates how long it took to execute the function.

Note: Trace won’t execute swi, or software interrupts (such as that VBlankIntrWait we’re currently paused on), but Run Next will.

Reload, uh, reloads the game. Useful if you’ve royally screwed up and everything is on fire. Which will happen far more often than you’d like. Alternately, you’ve made changes to the rom and need to reload for them to take effect.
Note: You will accidentally click Reload instead of Trace. I’ve lost track of how many times this has happened to me, and yes, it is incredibly infuriating. You may wish to use the keyboard shortcuts instead: F7 for Trace, F3 for Run Next.
Also, savestates are your friend.

GBA Specs and CPU Specs bring up some handy-dandy no$gba doc. It’s basically a shorter and much more complicated version of this guide.

I’ve not used the remaining 3 buttons ever. Screenshot is probably fairly obvious, but I use ShareX for that. Edit Files allows you to edit your .asm files (which we’ll be writing later) with a built-in text editor, which is cool, but it’s not as nice as Notepad++ or Sublime. As to Upload, I have no idea what it does, since clicking it gives me this error:

Make of that what you will.

7) Menu bar

I won’t cover all the menu options here, only the ones I deem important.

Roms that you’ve recently used will appear under the File menu, although you may need to close and reopen the rom for it to appear on the menu.
File - Write snapshot (or Ctrl + W) will create a savestate with the .sg1 file ending. As far as I can tell, you can’t make a savestate while the game is active, which is somewhat annoying.
File - Load snapshot (or Ctrl + L) will load a savestate. As with creating a savestate, the game must be paused. Give your savestates good names so that you’ll know what they’re for later.
Strictly speaking, not a menu option, but still relevant: no$gba creates save files (.sav) in its Battery folder, not in the rom’s directory like VBA does. You can copy a VBA .sav and put it in the Battery folder; just make sure the rom isn’t running at the moment, otherwise the save file will be overwritten due to FE’s autosaving.
In Window, the first 6 options all end up opening the same window, but on different tabs:

This is extremely useful for graphics-related assembly hacks, although those may be a bit challenging for a brand new hacker. Graphics alone deserve a guide of their own; fortunately, one that has already been written by someone far more knowledgeable than I, and can be found here (along with a lot of other useful stuff).
Utility - Disassemble to File will bring up a box to select the number of bytes to disassemble. It begins with the address at the top of the main window, not at the highlighted line! The disassembled file is a .dis, which can be opened in a text editor.
Options: Play around with these if you’re not happy with the current layout. You can also disable layers here if you need to, which is something VBA can do with a keyboard shortcut. Under Debug, I personally prefer setting the Disassembler Syntax to Nocash Z80/X86 style, since that’s what I’m used to (I find it easier to read), but will use Native ARM/C64-style for this guide since that is the syntax that devkitARM expects. Less confusion down the line and all.
You can check out the other menu options at your leisure; as far as I know, none of them should break everything horribly if you don’t know what you’re doing. And if the game does break, just reload the emulator.

NOTE: If you make changes to the rom, either via memory viewer, or by writing code into the main window, the changes will NOT be saved if you hit Reload or close the emulator. This can be a good thing; if you wrote the wrong thing somewhere and the game locks up, reload and everything’s back to normal. On the other hand, if you made a change that you actually want to keep, you’ll need to change it in the .gba file yourself.

Breakpoints

When the game runs, there are hundreds of thousands of opcodes executed every frame, meaning the odds of manually pausing the game when it executes a certain command are basically zero. This is where breakpoints come in. When you set a breakpoint, the game will halt when the breakpoint’s conditions are met.

There are a few different kinds of breakpoints that we can use. The trick is figuring out which one to use to get the information that you need.

Break on execution
A break on execution is set on an opcode at an address. To set one, you can either press Ctrl + B, write the address (don’t forget the 8 if it’s a rom address), and hit enter, or navigate to the line in question and click it twice (once if it’s already selected) or press F2. The line should change color and say BRK on the right side:

That’s it! If an opcode has a breakpoint on it, the game will halt on that line, ie, before execution.
If you want to set a break on the currently highlighted line with the breakpoint window, you can use $ instead of the address (useful for conditional execution breaks, see below).

Break on read/write
You can break on a read from or a write to a place in memory. Reads are done by ldr and its variants, and writes by str and its variants. To set a break, press Ctrl + B, and write the address with brackets around it, followed by the appropriate ending:
? is a break on read
!! is a break on write
! is a break on write if the new value is different from the old
!? or !!? is a break on both read and write (write adhering to the same rules as above). The order of ! and ? doesn’t matter.
So if I want to set a break on read to 202BCFE, I would write [202BCFE]? in the Set Breakpoint box.
You can also set a break on read/write to a range of memory. Simply put the two addresses in brackets, separated with 2 periods:
[202BCFE..202BD0D]? will break upon reading any value between those two offsets, including the endpoints.
Note that the game will pause after executing the command that did the read/write.

Other/conditional breakpoints
You can set conditions to a break. For instance, you might want to break on the current line only if r2 is 0. Simply press Ctrl + B, enter your address (you can use $ for the current line), and separate the conditions with a comma, like $, r2 = 0.
You can also use the conditions by themselves: a break when r2 = 0 would be written as r2 = 0. Of course, that will almost certainly bring up a ton of false breaks if you’re looking for a specific instance of r2 = 0, but that’s part of the fun! Right? …Right? Ok, I guess it’s not fun, but you’ll have to put up with it anyway. Also, please note you can only have one break on execution per address, and only one condition per breakpoint.

All breakpoints will appear in the breakpoint viewer:

You can remove one by selecting it and pressing Delete, or clear them all by pressing F10. You can have a maximum of 10 breakpoints at any one time. If a breakpoint causes a halt, the responsible one will be highlighted.

Please note that if you close the emulator, or it crashes, your breakpoints will not be saved. It may be useful to write down important ones in case such an incident happens.

Tequila · April 6, 2018, 9:32pm

Project 1: Changing Mounted Aid

Alright. You’ve had a crash course in the assembly opcodes, and a basic idea of what each part of no$gba does. Next thing we’re going to do is take that knowledge and apply it by making a small hack.
The idea behind this guide was to not only explain the basics of assembly hacking, but also to show how I would tackle a problem. Whether that works for you remains to be determined, but hopefully this part will help you develop a plan of your own.

Project 1

Currently, mounted aid is as follows:
Male: 25 - Con
Female: 20 - Con
I never liked this. I have decided to change it to the following:
Unpromoted mounted: 20 - Con
Promoted mounted: 25 - Con

To do this, we need to find where the game checks whether a unit is female or not, and modify it so it checks if the unit is promoted instead. To do that, we need to figure out how a unit is determined to be female. Fortunately, we know that: if you look at your Class and Character Editors folder in your Nightmare modules (I told you they’re a good reference!), you’ll see a file called Ability 2.txt. And if you open that, you’ll notice line 66 says 0x40 Female. This means the 6th bit in this byte-length bitfield (2^6 = 0x40, we start counting at 0) is set if the unit is female.
Given this knowledge, a logical course of action is to set a break on read on a particular unit’s character or class abilities (it doesn’t matter which, and you’ll see why later) and then see where it checks for this particular bit. However, we’re not going to do that, and again, you’ll see why later.

Another possibility is to set a break on read to a particular unit’s Con bonus, because we know that Aid depends on Con, and thus Con will have to be calculated at some point. In order to find out where that’s located, we shall refer to my notes, affectionately dubbed the Teq Doq. Under Character and Battle Struct, you should find this:

This means that given a unit’s data struct pointer, the 0x1A’th byte will be the constitution bonus. So how do we find a unit’s data struct pointer? There’s a pointer to the current unit’s data struct at 3004E50. Remember this address.

Given this knowledge, we’re going to select Seth with the A button, and then go to that offset:

If you reverse those bytes (remember, little endian), you get 202BE4C. That is where Seth’s unit data currently resides, and from before, we know that 202BE4C + 1A should be his constitution bonus. Let’s confirm that’s actually the case:

which should take you here:

Seems the constitution bonus is 0, which is as expected, since Seth hasn’t had an opportunity to use any Body Rings yet. But we can also verify by changing the value, and checking whether it changes in the game. If I change the byte to 01…

…we verify that we have the correct byte.
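The pointer arithmetic from the last few steps, written out as a Python sketch. The raw bytes are inferred from the reversed value quoted above (so treat them as an assumption about what the memory viewer shows at 3004E50), and the names are invented for the example.

```python
CON_BONUS_OFFSET = 0x1A  # from the Character and Battle Struct notes


def read_pointer(raw_bytes):
    """Interpret 4 bytes from the memory viewer as a little-endian pointer."""
    return int.from_bytes(raw_bytes, "little")


# Bytes as they would appear, left to right, at 3004E50 (assumed).
raw = bytes([0x4C, 0xBE, 0x02, 0x02])

unit = read_pointer(raw)
con_bonus_address = unit + CON_BONUS_OFFSET

print(hex(unit), hex(con_bonus_address))
```

The second value is the address of the constitution bonus byte for whichever unit is currently selected.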
Now that we’ve done that, let’s set a break on read to that byte. Press ctrl + B and enter the following:

Notice that we can let no$gba do the math for us! Very handy. Hit Enter or press OK, and the breakpoint should show up in the breakpoint viewer. Now we have to perform an action that makes the game read the con bonus byte, so go ahead and pull up the stat screen (if it’s already up, exit out/change characters and return), and you should have a pause here:

Recall that a break on read will pause after the command that did the read, which means the ldsb r0,[r1,r0] at 87332 is what caused the halt.
Now we have to figure out what exactly is going on.
Given that r0 = 0x1A from 87330, we can deduce that r1 has the unit data pointer, since [r0 + r1] = con bonus. A glance at the register list proves that r1 = 202BE4C.
87334 adds r0 and r3. Given that r0 is the con bonus, I’m going to assume that r3 has either the character constitution value, the class constitution value, or the sum of the two. If you look up a bit, you’ll find that the last option is the correct one. So r0 will have Seth’s total constitution.
Now I’m going to keep an eye on what happens to r0. At some point, I’m expecting to have the value subtracted from 0x19 (25d) to calculate Aid. 87336 shows that r0 gets copied to the stack (str r0,[sp]), and then r0 is immediately overwritten.
At 87348, we have a function call: bl 80870BC. So far, nothing has been done with the constitution value still sitting on the stack. It’s possible that the Aid calculation will be performed in this new function, but I get the feeling that’s unlikely.
What I’m going to do is make a note somewhere that we had a break at 87334, and then continue seeing if there are any more break on reads to 202BE4C+1A. If there are not, then we know that this is the code we’re looking for, somehow, and we’ll look further into the function at 870BC.

To resume play, we’re going to click back in the game window. And whatdya know, there’s another break, this time at 189FA:

Well, this looks promising. The code looks similar to the previous break, which makes sense, since we’re calculating constitution again, although the registers are switched around. More importantly, the next line is mov r0, #0x19, which is the number we’re interested in! Let’s follow that unconditional branch at 189FE. Simply select it, and then press the right arrow key on your keyboard. You should be taken to 18A12:

r0 = r0 - r1, where r0 will be 0x19, and r1 is the con value. Hooray! We’ve found the relevant section of code! (You can verify this by seeing if there are any more breaks, which there shouldn’t be).

Next step is to find the check for whether a unit is female. We don’t know exactly what the check looks like, but we do know that there’s going to be a conditional branch. So we’re going to scroll upwards until we find either a branch of some sort (keep an eye out for those arrows), or we find a push indicating the beginning of the function.
Well, we don’t have to look very hard. 189EA has a conditional branch to 18A00, which is immediately after the unconditional branch that we take to the sub r0,r0,r1 opcode. And the code in those two sections actually looks awfully similar…

In fact, the only difference is the last line in each block: mov r0,#0x19 vs mov r0,#0x14. Or, in decimal, 25 vs 20, which is the difference between male and female mounted aid. Given that, we can conclude that the conditional branch at 189EA is taken if the unit is female, and not taken if the unit is not female.

Now let’s take a look at the code that makes up the comparison/branch combination:

That top branch is unconditional, so it’s not interesting, except to show that we want to look at what comes immediately afterward.
The next two lines put the value 0x4000 in r0.
Finally, we check if that bit is set in the value in r1, and branch if it is set (if r0 & r1 != 0). We’re not quite sure what r1 is, although you might be able to hazard a guess. Let’s scroll up and find out where r1 is set.
First, I’m going to look for a conditional branch. I assume there is one, otherwise we’d have taken the unconditional branch at 189E0. r1 will have had to be set before that.

Found another branch at 189CC, which takes us to 189E2, as expected. Then I check each opcode argument one by one, going backwards, from that point, looking for r1 on the left side of the arguments column, since that means r1 is being modified. We see 189C4 has orr r1,r0, and that seems to be the most recent time r1 was edited. So we scroll up a bit further to see what the values being orr’d are.

Well, we actually found the beginning of the function, since there’s a push. Let’s analyze what happens.
First, we push r4 and r14. r14 is usually pushed, unless the function is very short, but r4 being pushed implies that it will be used.
mov r4,r0 is saving the parameter that was passed in, implying that it will be used multiple times. Currently, r4 is 202BE4C, which is a specific unit’s data pointer. We can infer that this function takes a unit data pointer, and given what we’ve seen happens later, returns the unit’s Aid. That makes this function a “getter”, because it, uh, gets data. If this is a getter function, that makes our job much easier, because every time the game needs to calculate Aid, it’ll call this function. Any changes that we make to Aid won’t have to be repeated elsewhere (for instance, when calculating whether a unit can rescue another unit).
ldr r3,[r4] gets the first word in the unit data struct. If you refer to the Teq Doq, it says “pointer to character data”. That refers to the Character Editor table, where you can edit things like name, description, portrait, initial weapon ranks, etc.
Similarly, ldr r2,[r4,#0x4] loads the second pointer in the unit struct, which points to class data.
The next two are basically the same. We load the word at +0x28 of the character and class structs, respectively. If you want to find out what those are, open up the class and/or character editor nightmare module (.nmm) files in a text editor, and find out what’s at offset 0x28 (40d, since nightmare modules are indexed in decimal):

Notice, however, that we are using ldr, not ldrb, meaning that we load all the character abilities at once. Remember how I said that we shouldn’t set a break on read on class abilities back when we were trying to set breaks? There were 2 reasons for that.

1) We would have set a break on read on the second byte, logically, since that has the bit that represents the ‘female’ ability. So if class abilities start at +0x28, we would set a break on read to +29. Problem is, since all the abilities are loaded at once, as a word, setting a break on read to the second byte would never break.
2) Ok, you say. We’ll just set a break on read to the first byte. And technically, this would work. However, you would almost certainly get a ton of breaks that aren’t related to checking the Female ability, due to checking other bits. I opted to not use this method because it was inefficient, not because it wouldn’t work.

Now that we have the class abilities in r0 and the character abilities in r1, we orr them together to get the unit’s combined abilities in r1 (a bit set in either word ends up set in the result). This is the value we wanted to know the origin of. Given this, we can figure out what the conditional branch at 189CC is for; it’s checking if bit 0 is set in the abilities, which is the ‘Mounted’ ability. Or, in pseudocode, the function ends up looking like:

if unit is not mounted:
   calculate constitution
   r0 = con - 1
   goto End
else:
   if unit is not female:
       calculate con
       r0 = 25
       goto CalculateMountedAid
   else:
       calculate con
       r0 = 20
   CalculateMountedAid:
   r0 = r0 - con
End:
return Aid in r0
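
If a higher-level language is easier on your eyes, here is roughly the same logic as a small Python sketch. The bit values are the ones from the walkthrough (0x1 for Mounted, 0x4000 for Female); the function and parameter names are invented for illustration and don't correspond to anything in the ROM.

```python
MOUNTED = 0x0001  # bit 0 of the ability word
FEMALE = 0x4000   # byte 2, 0x40 of the ability word


def get_aid(char_abilities, class_abilities, con):
    """Model of the vanilla Aid getter. `con` is the unit's total constitution."""
    abilities = char_abilities | class_abilities  # the orr step
    if not (abilities & MOUNTED):
        return con - 1
    base = 20 if (abilities & FEMALE) else 25
    return base - con


# A mounted, non-female unit with 12 total con (Seth in this walkthrough)
print(get_aid(0, MOUNTED, 12))
```

Note that the constitution is worked out in every path; only the constant it gets subtracted from changes.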

We would like to change the if unit is not female to if unit is promoted. First, we have to find out how to determine if a unit is promoted. That’s actually easy: it’s another ability. Specifically, byte 2, 0x1, or bit 0x100 in the ability word. Female was byte 2, 0x40, or 0x4000 in the ability word. We generated that value with
mov r0,#0x80
lsl r0,r0,#0x7
How many times do we need to shift 0x80 to get 0x100 instead of 0x4000? Pretty easy. Once to the left. So instead of lsl r0,r0,#0x7, we need lsl r0,r0,#0x1.
Are we done? Not quite. Previously, we branched if the bit was set, because the result of and r1,r0 would be non-zero. Now, we want to branch if the bit is not set, ie, the result of and r1,r0 is zero. So we just need to change bne 0x8018A00 to beq 0x8018A00. Do this by clicking on the line, writing the appropriate command, and hitting Enter.
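
If you want to sanity-check the two edits before poking at the debugger, a few lines of Python will do. This is only a model of the patched comparison with made-up names; the shift amounts and the 20/25 constants are the ones worked out above.

```python
FEMALE = 0x80 << 7    # mov r0,#0x80 ; lsl r0,r0,#0x7
PROMOTED = 0x80 << 1  # mov r0,#0x80 ; lsl r0,r0,#0x1


def mounted_aid_base(abilities):
    """Patched check: the beq is taken (20 path) when the promoted bit is clear."""
    if (abilities & PROMOTED) == 0:
        return 20
    return 25


print(hex(FEMALE), hex(PROMOTED))
```

The only things that changed from vanilla are the mask and the sense of the branch, which is why the patch is so small.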

Now to test. We expect nothing to change, because Seth is a promoted mounted unit, so he should still use the 25-con formula. And if you check…

…indeed he does. Now, how to check whether unpromoted mount returns the right result? We don’t have one on the map. Well, that’s ok. We’ll just pretend that Seth is unpromoted.
First, remove your original breakpoint (the break on read to Seth’s con bonus), and set a break at 189EA, right on the beq.

Then close the stat screen and reopen it. The game should pause.

See the ‘false’? That’s expected, because the bit was set, so the result is non-zero. But we don’t need to change the value. Remember that conditional branches depend on the flags that are set/unset? Well, beq will branch if the z (zero) flag is set. It currently isn’t, but if we were to check it manually…

…suddenly it’s true! The game is convinced that Seth is unpromoted. Let’s see if that claim is backed up by the stat screen.

It is! 20 - (11 + 1) = 8, as we expected. We have successfully accomplished what we set out to do.

…Or have we? We haven’t verified that this same function is used to calculate Aid when, say, seeing whether to display the Rescue option on the unit menu. So let’s do that. Set a break somewhere in the function, pull up the unit, and see whether you get a break. You did? Ok, good. See why having a getter is handy? And why not having one would be a pain in the ass if you need to track down every single instance? Like, say, for constitution and move?
No, I’m not bitter at all. Why do you ask? Anyway, MOVING ON!

If you close no$ or reload the rom, these changes will vanish, and that just won’t do. You could implement the changes manually in the rom; it’s literally 4 bytes. But instead, we’re going to make this into a file that can be inserted with Event Assembler, either by itself, or as part of a bigger project. Search for ‘buildfile’ on the forum for more details.

I use the following template for my hacks:

//FE8 Something or other
//By Tequila

#ifndef _FE8_
    ERROR "You're not assembling to an FE8 ROM!"
#endif

#include EAStdlib.event
#include "Extensions/Hack Installation.txt"

#ifndef FreeSpace
    #define FreeSpace 0xB2A610
    ORG FreeSpace
#endif

PUSH

POP

Let’s go through this line by line:

First, anything that begins with // is a comment. You can use /* */ around something to comment that something out; it also works over multiple lines. So the first two lines are a description of what this hack is for. In this case, I would call this something like “FE8 Mounted Aid Rework” or something.

#ifndef _FE8_ is for the radio buttons on Event Assembler’s main screen. If you accidentally select the wrong rom to assemble to, EA will throw an error, since you probably don’t want to put an FE8 hack in an FE7 rom, for instance.

I #include a couple of files that come with EA. EAStdlib (EA Standard Library) contains definitions for characters, classes, and items, along with a few other things. Hack Installation.txt has macros for, well, installing hacks, which will be covered a bit more in projects 2 and 3.

The FreeSpace bit isn’t actually necessary for this particular hack. I’ll explain when it’s useful in the PUSH/POP part.

ORG stands for origin; it basically means “start writing at this offset”.

PUSH and POP are for the ‘cursor’ position (ie, where things are getting written to). Why is this useful? Let’s illustrate with an example. Assume we have 3 chunks of data, 2 of which need to go in free space (called F1 and F2) and 1 which needs to go at offset 0x1000 (called X). We can make sure everything goes in its proper place as follows:

ORG 0x1000
<write X here>

ORG FreeSpace (whatever that might be)
<write F1 here>
<write F2 here>

First, the cursor is set to 0x1000, and X is written beginning there. Once that’s done, the cursor value is changed to FreeSpace. First, F1 is written, beginning at FreeSpace. Then F2 is written immediately following F1. No issues whatsoever. Now let’s change the order:

ORG FreeSpace
<write F1 here>

ORG 0x1000
<write X here>

ORG ????
<write F2 here>

We write F1 and X without any issues, but then F2 is a problem. If you don’t have an ORG, EA will keep writing after the last thing it wrote, which was X at 0x1000, and you definitely don’t want to keep writing there. If you know how long F1 is, you could replace the ???? with value (FreeSpace + length_of_F1). But what if F1 changes at a later point? Or maybe you want to insert something in between? We don’t want to have to keep updating the ORG; that sounds suspiciously like work.
Instead, we’re going to let EA take care of space management for us by using PUSH to save the current cursor position, and POP to retrieve it, as follows:

ORG FreeSpace
<write F1 here>

PUSH

ORG 0x1000
<write X here>

POP

<write F2 here>

Once F1 has been written, we save the current cursor position with PUSH, go to 0x1000 to write X, then retrieve the cursor with POP and continue writing F2 after F1. Handy, no?
Basically, you write your inline stuff (anything that has an ORG) after the PUSH, POP once you’re done, then write the stuff in free space.
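
If it helps, you can think of the cursor as a number with a stack attached. The toy model below is my own sketch of that idea in Python, not how EA is actually implemented; the class, its method names, and the chunk lengths are all invented for the example.

```python
class Cursor:
    """Toy model of a write cursor with ORG, PUSH and POP."""

    def __init__(self):
        self.offset = 0
        self.stack = []
        self.writes = []  # (offset, name, length) for each chunk written

    def org(self, offset):
        self.offset = offset

    def push(self):
        self.stack.append(self.offset)

    def pop(self):
        self.offset = self.stack.pop()

    def write(self, name, length):
        self.writes.append((self.offset, name, length))
        self.offset += length  # the cursor advances past what was written


FREE_SPACE = 0xB2A610
c = Cursor()
c.org(FREE_SPACE)
c.write("F1", 0x20)  # arbitrary length
c.push()
c.org(0x1000)
c.write("X", 4)
c.pop()
c.write("F2", 0x10)

for offset, name, length in c.writes:
    print(f"{name} at 0x{offset:X}")
```

F2 lands right after F1 no matter how long F1 is, which is the whole point.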

For this mounted aid rewrite hack, all our changes were inline, so everything will be after the PUSH. First, we changed line 189E4, so begin with
ORG $189E4
You can use $ or 0x to indicate hex; they’re interchangeable, but if you don’t have either one, EA will assume the value is decimal, and that may lead to issues.
At this address, we’re going to write the hex of the opcode we changed. Since opcodes are halfwords/shorts, we’re going to use SHORT:

ORG $189E4
SHORT 0x0040

We can do the same with the other line we modified:

ORG $189EA
SHORT 0xD009
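
Where do those two numbers come from? You can read them straight out of the debugger, but they can also be derived from the standard THUMB encodings. Here's a minimal Python sketch; the helper names are mine, and it only covers the two opcode formats used in this hack.

```python
def thumb_lsl_imm(rd, rs, shift):
    """THUMB 'lsl rd,rs,#shift' (move shifted register format)."""
    return (shift << 6) | (rs << 3) | rd


def thumb_cond_branch(cond, addr, target):
    """THUMB conditional branch at `addr`. cond 0 = beq, 1 = bne."""
    offset = (target - (addr + 4)) >> 1  # pc reads as addr + 4
    return 0xD000 | (cond << 8) | (offset & 0xFF)


print(hex(thumb_lsl_imm(0, 0, 1)))
print(hex(thumb_cond_branch(0, 0x189EA, 0x18A00)))
```

Feeding in the vanilla opcodes (shift of 7, cond of 1) should match what the debugger showed before the edit.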

Lastly, I recommend adding comments to explain briefly what each thing is for.

//FE8 Mounted Aid Rework
//By Tequila

#ifndef _FE8_
    ERROR "You're not assembling to an FE8 ROM!"
#endif

#include EAStdlib.event
#include "Extensions/Hack Installation.txt"

#ifndef FreeSpace
    #define FreeSpace 0xB2A610
    ORG FreeSpace
#endif

PUSH

ORG $189E4
SHORT 0x0040 //lsl r0,1; checking if unit is promoted rather than female
ORG $189EA
SHORT 0xD009 //changed bne to beq

POP

And that’s it! Save this, either as a .txt or .event (EA looks for .event files first, but it really doesn’t matter), then assemble it with Event Assembler, or #include the file as part of a larger project.

Congratulations, you made a hack. A short one, to be sure, but that’s fine. Gotta start somewhere, after all.

Tequila · April 27, 2018, 2:22am

Project 2: Modify Staff Exp

Our next project involves something that many people have asked about over the years: changing the experience gained from using staves.

This time, I’m going to try and let you, the reader, follow along, by putting the answers in spoilers. Use them to double-check your work. If you get something wrong, don’t worry. I know this can be rather overwhelming if you’re doing assembly hacking for the first time.

Project 2

I’m proceeding under the assumption that we don’t know how staff experience is currently calculated. It’s possible that you do know, and if you did, you could solve this problem in a more efficient way. But let’s say we don’t.

First order of business is always to consult the documentation. Let’s refer to the Teq Doq and see if we can find something useful in the battle struct documentation.

Years go by. Civilizations rise and fall. The heat death of the universe is just around the corner…

Hmm.

0x6E Byte Experience gained in this battle

This looks relevant.

It should be noted that ‘battle’ doesn’t necessarily mean 2 units fighting each other. It’s more of an…interaction struct. So a unit healing another unit would use this same struct. So here’s what we’re going to do. We’re going to set a break on write to that offset, and then have a unit use a staff on another unit.

Well, the first staff user in vanilla FE8 is Moulder, and we can’t use him until chapter 2. Do you really want to waste precious time playing through the prologue and chapter 1? Even with Seth and the speed-up button, that’s about 1-2 minutes you’ll never get back!

I have a better idea. Let’s make Eirika a staff user.

Wait, what?

Oh, we’re not going to turn her into a cleric or anything. There’s two things that prevent her from using a staff: not having a staff, and not having a staff rank. Both of those are easy to rectify.

First, select her with the A button. Then go to the offset containing a pointer to the current character. What was it again? Oh, right. 3004E50. Her character data pointer should be at 202BE94, so let’s head on over.
I think a Heal staff (item 0x4B) with 5 uses should suffice nicely for testing purposes. Heal is an E rank staff, so Eirika needs a minimum of 1 staff exp to be able to use that. Using the Teq Doq, go ahead and try to give her the staff and rank.

Answer

Staff rank is at +0x2C, so write a 01 at 202BE94+2C.
Items begin at +0x1E, but Eirika already has a rapier and a vulnerary. You can replace an existing item with the heal staff, or add it at the end of the existing items. Doesn’t make a difference. Write the uses immediately after the item id, so 4B 05.
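
If hex arithmetic in your head isn't your thing, here's the address math as a short Python sketch. The offsets are the ones given above; the helper name is made up, and the 2-bytes-per-slot layout follows from writing the uses right after the item id.

```python
UNIT = 0x202BE94   # Eirika's unit struct, per the walkthrough
ITEMS = 0x1E       # first inventory slot
STAFF_RANK = 0x2C


def item_slot_address(unit, slot):
    """Each slot is 2 bytes: item id, then uses."""
    return unit + ITEMS + 2 * slot


print(hex(UNIT + STAFF_RANK))           # write 01 here
print(hex(item_slot_address(UNIT, 2)))  # third slot: write 4B 05 here
```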

Now that that’s done, let’s make a savestate, so that if we mess up, we don’t have to redo this part. Press Ctrl + W when the game is paused, and give your savestate a good name, like fe8_staff_exp.

Now we can set our break on write to the “exp gained during battle” byte in the battle struct, which we will do by pressing Ctrl + B and writing

Answer

[203A4EC+6E]!

As soon as you select a target for healing (not that you have much choice in the prologue), the game should break.

(If you set the break earlier, you might get a break that writes 0 to this byte to clear the battle struct. Just keep going.)

The most recently executed opcode is the one that caused the break, so the culprit is strb r0,[r1]. We can tell that r1 is r4 + 0x6E, which implies r4 is the attacker battle struct at this time, and r0 has 0xB (11d), which is the exp given from a heal staff. r0 was obtained from the function call to 2C638, so our next order of business is to trace through that function and figure out how the staff exp is calculated. Set a break at 2C5EA (you can get rid of the break on write one, since we now know where the exp is written), then reload your savestate with Ctrl + L.

Hit Trace to be taken into the function. Remember that Trace will execute 1 opcode at a time, while Run Next will also execute 1 opcode at a time, unless it’s a bl, where it will execute the function, return, and halt.

First thing we notice is that the attacker struct has been passed in. Which makes sense, since we need some of the data in it (not sure what yet, but item id is probably a given) to calculate exp.

After saving the battle struct, 2B9F4 is called. Before looking at it, I’m going to see what happens with the results of the call, and then I might be able to guess what the function does.
First, notice that it returns something, since r0 is immediately checked. The lsl r0,r0,#0x18 is a cast to byte, just in case the function returned something that wasn’t a byte, so I’m going to ignore it. We check if the returned value is 0, and if it is, we put 0 in r0 and go to 2C69A. If you follow that, you’ll see it’s the end of the function.
So if this function returns 0, the exp gained in this battle will also be 0. Given this knowledge, what do you think the function at 2B9F4 is for?

Answer

The function checks whether the unit can gain experience, and returns False (0) if it cannot.

If you want to verify this for yourself, you can read the function. First, you’ll notice the battle struct was passed in. The first part of the function checks whether bit 0x40 is set in [202BCB0 + 4], and returns 1 (True) if it is. Not entirely sure what that is, but whatever. Next, we check if the unit’s current exp (battle struct +0x9) is equal to 0xFF (which shows up as – on the stat screen), and return False (0) if it is. Finally, we check if the deployment id (battle struct + 0xB) has either of the top 2 bits set, indicating the unit is either an enemy or npc, and return False if those bits are set. We know that units that are capped (have – exp) and enemies/npcs can’t gain exp, so I can confidently say that the function checks whether the unit can gain exp, even if I’m still not entirely sure what’s at 202BCB4.

As you practice, you’ll get better at seeing a piece of a function and being able to determine fairly quickly what it’s supposed to do.

If you use Run Next, you’ll find it returns 1, so we jump over the mov r0,#0; b End section. Now we are confronted with another odd thing. If bit 0x2 is set in [203A5EC], return 1 and jump to the End. Hmm…under what circumstances does using a staff only give 1 exp?

Answer

When an offensive staff (berserk, sleep, silence) misses

Either way, looks like that wasn’t true, so we’re not returning 1. Instead, we load the halfword at [battle struct + 0x48], which is… equipped item and uses after battle. The item in question would be the heal staff. Next, we call 0x17754. This isn’t a difficult function to dissect, so give it a try. Hint: the item table is at 0x809B10.

Answer

The function loads the halfword at +0x1A of the item’s entry. If you look at the item editor nightmare module, you’ll see that’s the item’s cost per use.
Note: If you have an item editor module that says the item table begins at 809B34 instead of 809B10, it’s off by 1, because whoever made the module decided not to include the (blank) 0th entry. Don’t worry too much about that.
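
For the curious, the lookup the getter performs can be sketched in Python like so. The entry size isn't stated anywhere above; I'm taking it from the gap between the two table addresses (entry 0 vs entry 1), and the function names are invented. The rom argument is assumed to be the raw ROM file contents, so offsets have no 0x08000000 added.

```python
ITEM_TABLE = 0x809B10
ENTRY_SIZE = 0x809B34 - 0x809B10  # distance from entry 0 to entry 1
COST_PER_USE = 0x1A               # halfword within each entry


def item_entry(item_id):
    """ROM offset of an item's table entry."""
    return ITEM_TABLE + item_id * ENTRY_SIZE


def cost_per_use(rom, item_id):
    """Read the little-endian halfword at +0x1A of the item's entry."""
    at = item_entry(item_id) + COST_PER_USE
    return int.from_bytes(rom[at:at + 2], "little")


print(hex(item_entry(0x4B)))  # Heal staff entry
```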

Looks like the number returned will be used immediately in the function call to D18FC, which also takes a parameter in r1. This function is significantly harder to understand. Fortunately, I have another document of useful functions that I frequently use, appropriately called Useful Functions.txt. Let’s search for D18FC and see if that’s in the list.

So we divided the cost per use by 0x14 (20d). Next, we move the result to r2 and add 0xA (10d). So our formula for staff exp, so far, looks like

(cost/20) + 10

A bit of a strange formula, but it kind of works: more expensive staffs give more exp.

From Project 1, you should be able to tell what 2C678 - 2C68A do.

Answer

Load the unit’s class and character abilities, orr them together, then check if 0x100 is set. You might recall this is the “is promoted” bit. In other words, check if the unit is promoted.

If the result is non-zero, we divide the exp by 2. (The lsr r0,r2,#0x1F; add r0,r2,r0 bit would only change the result if the original number is negative, which shouldn’t happen, so the only opcode that’s important is asr r2,r0,#0x1). If you guessed correctly (or read the spoiler), experience is halved if the unit is promoted. If it isn’t, we go straight to a check to make sure experience gained is less than or equal to 100, and finally, we return the value.

In conclusion: If unit can gain exp, and the “attack” didn’t miss, exp = (cost per use/20) + 10, halved if unit is promoted.
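
Put together, the whole thing fits in a few lines. This is a Python sketch of the formula as traced above, with parameter names of my own choosing; the order of the checks mirrors the order we hit them in the debugger.

```python
def staff_exp(cost_per_use, promoted, can_gain_exp=True, missed=False):
    """Model of the vanilla staff exp calculation."""
    if not can_gain_exp:
        return 0
    if missed:
        return 1
    exp = cost_per_use // 20 + 10
    if promoted:
        exp //= 2  # exp is never negative here, so this matches the asr
    return min(exp, 100)


print(staff_exp(20, promoted=False))  # Heal, unpromoted user
```

Plugging in 20 for the cost gives the 0xB we saw in r0 earlier.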

Now let’s say we’re not happy with this arrangement. You want Heal to give, say, 17 exp per use, not 11 as it currently does, but you don’t want to bump up the price to 140 gold per use.
We need to associate the experience gained with the staff id in some way. Say, a byte in the item data. My idea is to use the staff’s Might byte to store exp. Why might? Because it’s not used for anything else. We could also use hit or crit; it really doesn’t matter. I’ll be using might, though.

First off, we need to find out which byte it is in the item struct. Refer back to your handy-dandy nightmare module.

Answer

It’s byte 0x15 (21d). Might be called Power instead of Might, though.

Now, we could write some code to get that byte. But you know what? We found a getter for an item’s cost per use. It strikes me that there very well could already be a getter for item might. Perhaps we should go and have a look around in the area…

If you look up, you’ll see a function that looks identical, except that it loads byte 0x1E rather than short 0x1A. So let’s see if there’s one that loads byte 0x15. Hint: They won’t necessarily be in order, ie, 1E coming before 1A.

Answer

It’s at 175DC

We can now change the bl 8017754 to go to the function we found, instead. Once that’s done, the value is in r0, but it has to end up in r2, since that’s where it’s expected to be. We don’t need to divide by 20 or add 10, so you can either put an unconditional branch to where the code resumes, or replace the redundant opcodes with nop or 0.

Final result

Now, if you were to try this as is, Eirika won’t gain any experience, since all staffs have 0 might. You’ll have to change that externally.

Now, it’s time to make an EA buildfile. Like project 1, it’s going to be fairly short, and everything is inline, ie, everything is between the PUSH and POP in the template (see Project 1 for the template).
If you followed the previous project, you might be tempted to put:

ORG $2C66A
SHORT 0xF7EA 0xFFB7 0x1C02 0xE002

This works fine. Absolutely nothing wrong with it. But representing bl like that is kind of ugly; you can’t tell what it’s calling at a glance. And you know, we’ve had that #include "Extensions/Hack Installation.txt" and not actually used it yet. Well, here’s a good opportunity to use one of the macros in it:

#define BLRange(pointer) "((pointer - CURRENTOFFSET - 4)>>1)"
#define BL(pointer) "SHORT (((BLRange(pointer)>>11)&0x7ff)|0xf000) ((BLRange(pointer)&0x7ff)|0xf800);"

Looks scary, I know. But all you need to know is that we can replace that SHORT 0xF7EA 0xFFB7 (the bl) with the much nicer BL(0x175DC).

ORG $2C66A
BL(0x175DC) //gets item might
SHORT 0x1C02 0xE002 //mov r2,r0; b 2C678
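
If the macro still looks like black magic, here's the same arithmetic as a Python sketch. The function name is mine; addr stands in for CURRENTOFFSET, and the masks and shifts are copied from the macro.

```python
def encode_bl(addr, target):
    """Return the two halfwords of a THUMB bl located at `addr`."""
    offset = (target - addr - 4) >> 1  # same as BLRange
    high = ((offset >> 11) & 0x7FF) | 0xF000
    low = (offset & 0x7FF) | 0xF800
    return high, low


print([hex(h) for h in encode_bl(0x2C66A, 0x175DC)])
```

Running it for our case gives back the same two halfwords we wrote by hand, so BL(0x175DC) and SHORT 0xF7EA 0xFFB7 are interchangeable at this offset.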

Isn’t that much prettier? If you’re wondering what the other macros like callHack_r# and jumpToHack are for, those will be explained in project 3, so keep reading!

Incidentally, in case you’ve never done events before and are unfamiliar with EAStdlib.event (which we’re also #include’ing), it’s basically a list of definitions and macros for the different games. For instance, if you have a list of items for a particular hack, instead of BYTE 0x1 0x2 0x3, you could write BYTE IronSword SteelSword SilverSword instead. You won’t always use it, but there’s no harm in #include’ing it multiple times, so you might as well.

Tequila · April 30, 2018, 2:31pm

Project 3: Project With a Vengeance

I didn’t have a better title. Sue me.
Since I couldn’t find a better project that was appropriate for a beginner, we’re going to be changing some battle calculations. This project will have you writing assembly code in a text file and assembling it, rather than writing it directly into the debugger.

Project 3

Today, I woke up and decided that thieves and rogues don’t do enough damage.
But Tequila, thieves aren’t supposed to be damage dealers, you say.
Says who, says I.
Says I, says you.
Well, I’m the boss, and I decided that thieves should deal more damage, I say.

WIZARD’S CREED: Do first, ask whether it was a good idea later.

First, let’s see what the calculations currently look like. For this, Serenes Forest is a good resource. Go to Sacred Stones -> Calculations, and you should find this:

I think that, for thieves and rogues, instead of using Strength, we’re going to change the calculation to (Strength + Skill)/2. Seems reasonably fair, no?
If it doesn’t, see the Wizard’s Creed again.

Now that that’s settled, it seems that we have 3 different things to find and modify, right? Well, actually, #1 and #3 are basically the same, merely replacing strength with magic. Since there’s no str/mag split in vanilla FE8, those are effectively the same. So there’s 2 things we need to find and change.

Let’s start with #1, and think about where and how to set a break. Since this is a battle calculation, it stands to reason the battle struct will be involved. And since the numbers need to be calculated for the pre-battle display…

…we can set a break on read on the weapon selection screen.

Ok, that takes care of when to set the break, but we still don’t know on what to break. Well, we want to modify the part where it gets strength. Seems to me that we should break on read to the strength byte in the battle struct, which we do by entering…

Answer

[203A4EC + 0x14]?

So let’s go ahead and set that break. Here’s where I ended up.

Here’s my thought process when I see this:

First, set a break on the line that we stopped on. This is for 3 reasons:
1) If I accidentally click in the game window and lose my place, I can easily get back.
2) If you set a break on execute, you can double click on the break in the breakpoint viewer to be taken to that line.
3) If you’re scrolling up and down to take a look at the surrounding code, the BRK on the side acts as a bookmark to draw your attention.

Next, I’m going to write down my train of thought as I go through this.

First, I look at registers. I see that r6 is used. This tells me that at the bare minimum, r4-r6 were pushed. Also, looking at r6, it’s the attacker struct (203A4EC).
It’s probable that if the attacker struct is saved, as it is here, the defender struct is also present. I see it at r8 and r12. r12 I immediately disregard, since that’s treated as a scratch register. If r8 was pushed, then I can use the value in r8; otherwise, I should ignore it. If I scroll down a bit, I see that we’re near the end of the function because there’s a pop at 2AB6A, and r8 is in fact restored, so I know I can refer to it if I want to.

Uh...why do we care?

You might be wondering why I made a big fuss of seeing whether r8 was being used in this function or not. Well, here’s a story that might explain why.

One of my first hacks was creating an item such that if it was in a unit’s inventory, that unit would have the Acrobat skill, ie, all traversable terrain costs 1 mov. I did this by finding the function that copies the class’s movement costs from ROM to IRAM, which looked something like this:

For each movement cost value:
    store cost in iram

Pretty easy, right? I changed it to:

set flag (a register) to 0
if unit has acrobat:
    set flag to 1
For each movement cost value:
    if cost != 0xFF (indicating that terrain is not traversable):
        if flag is set:
            cost = 1
    store cost to iram

The problem was that the function was not passed in the unit data struct, only a pointer to the class’s movement cost table entry. However, I happened to notice that r4 had the unit data struct from an earlier function, so I just went ahead and used that. Everything worked fine…until I went outside the highlighted squares, came back in, and the game crashed, because r4 was no longer what I expected it to be since the function had been called from a different place. I later found there were about 14 calls to this function, and Circles taught me of the wonder that is 3004E50.
Moral of the story is, don’t use an important register in a function unless it’s been pushed.

Do I actually need the defender struct for this particular calculation? No, actually. But it might be useful to know for the future. All we actually need is the attacker struct. Which we have. Hooray.

However, there’s an issue: space. We don’t have room to insert our code here (at this point, if you don’t believe me, just take my word for it). So what we have to do is jump to free space, insert our code there, and then return once we’re done. There’s a couple of ways to do this, both involving macros found in EA’s Extensions/Hack Installation.txt, jumpToHack and callHack.

jumpToHack:

jumpToHack is pretty simple. It looks like this:

ldr r3,PlaceToJumpTo
bx r3
PlaceToJumpTo:
@literal here

As the name suggests, it’s a simple jump. Here’s an example of using it.

PUSH
ORG $10000
jumpToHack(MyHack)
POP
ALIGN 4
MyHack:
#incbin "MyHack.dmp"

You might recall that bx requires the address to have its lowest bit set (ie, the address is odd) if you want the function to be executed in THUMB mode. The macro takes care of that for us, so you don’t need to use jumpToHack(MyHack+1) or anything.
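
To make that concrete, here's a Python sketch of the 8 bytes such a jump would occupy. The two opcode values are the standard THUMB encodings for ldr r3,[pc,#0] and bx r3; that the macro emits exactly these bytes is my assumption, so check Hack Installation.txt if it matters. The helper name and the hack offset are placeholders.

```python
ROM_BASE = 0x08000000  # where the GBA maps the cartridge ROM


def jump_to_hack_bytes(hack_offset):
    """Bytes for: ldr r3,[pc,#0] ; bx r3 ; literal."""
    ldr_r3 = 0x4B00  # loads the word right after the bx (site must be word-aligned)
    bx_r3 = 0x4718
    literal = (ROM_BASE + hack_offset) | 1  # odd address = THUMB mode
    return (ldr_r3.to_bytes(2, "little")
            + bx_r3.to_bytes(2, "little")
            + literal.to_bytes(4, "little"))


# Pretend MyHack ended up at the template's FreeSpace value
print(jump_to_hack_bytes(0xB2A610).hex(" "))
```

The pc-relative load is also why the jump itself has to sit at a word-aligned offset: the literal is expected exactly 4 bytes after the ldr.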

callHack

There’s a variant for each low register: callHack_r0, callHack_r1, etc. I’ll use callHack_r0 as an example:

ldr r0,FunctionToCall
bl bx_r0
b End
FunctionToCall:
(literal is here)
End:

Slightly more complicated, yes, but not too bad. As with jumpToHack, we first load the address we want to go to. However, instead of jumping directly to it with bx, we bl to a bx (if you look at the macro, the address of bx_r0 is 0xD18C0, which is, as you might expect, a bx r0). We do this so we can return when we’re done executing code, but aren’t restricted by bl's range limits. Some instruction sets have a blx opcode which does the same thing, but we don’t.
Once we return from the bl, we need to jump over the literal that we loaded, since we don’t want to execute that as code, hence the b End. After that’s done, business as usual.
Using it looks virtually identical to jumpToHack, aside from having to make sure you use the register to jump with:

PUSH
ORG $10000
callHack_r0(MyHack)
POP
ALIGN 4
MyHack:
#incbin "MyHack.dmp"

That’s great, but what do we use?

First thing we must note: both of these macros need to be placed at word-aligned offsets.
The main difference is that jumpToHack doesn’t return, and callHack does. If you’re replacing an entire function, or you want to expand an existing one and add a jump to the end of it, jumpToHack is perfect.
The second difference is that jumpToHack takes 0x8 bytes (2 opcodes and a literal), while callHack takes 0xC bytes (3 opcodes, one of which is bl, and a literal). There may be occasions where you need to insert a jump but simply don’t have 0xC bytes free where you need them. In such a scenario, I would use jumpToHack, and then jump back after executing the new code. I don’t particularly like doing this because it requires hardcoding the return address, but sometimes you don’t have a choice.

Example

In my FE7 str/mag split, I had a bug brought to my attention where arena fights wouldn’t work properly if the arena’s equipped weapon was magical and the unit’s actual equipped weapon wasn’t, or vice versa. The issue turned out to be because IntSys didn’t expect units to have both physical and magical weapon ranks, so they never thought the currently equipped weapon and arena weapon would be different types. I

ANECDOTE TO BE FINISHED UPON ACCESS TO OTHER COMPUTER

Ok, back to our situation. I’m going to use callHack, because there is still code to execute later. Now we have to figure out 2 things: 1) where do we insert it, and 2) which register do we use to jump.
To answer 1), we need to find a word-aligned offset at or slightly before the code we want to modify (which is around 0x2AB50), count off 0xC bytes, and make sure that area isn’t complicated. What do I mean by complicated? If there’s a branch in the middle of the code you’re replacing, that’s a bit annoying (since you’re probably jumping to free space, you’ll need to replace b with bx and it just makes things uglier). If there’s a branch from elsewhere to your code, then you have a big issue; trying to jump into the middle of another jump will probably crash/hang the game, or at the very least not do what you want. So you have to have an idea of what the code surrounding the block you’re using to jump looks like!
If you scroll up a bit, you’ll notice 0x2AB3C has a jump to 0x2AB48, so our jump needs to be later than that. Either 0x2AB4C or 0x2AB50 would work. I’m going to use 0x2AB4C.
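The placement rules can be written down as a small check. This is a rough Python sketch run on your PC, not anything that goes in the ROM. The function names are hypothetical, and the only branch target it knows about is the one mentioned above (0x2AB48); a real check would need every branch in the function. Like the text, it is conservative and also rejects a hook that starts exactly on a branch target.

```python
CALL_HACK_SIZE = 0xC  # bytes overwritten by callHack


def hook_span(start, size=CALL_HACK_SIZE):
    """Return (first, last) byte offsets overwritten by a hook at start."""
    return start, start + size - 1


def hook_is_safe(start, branch_targets, size=CALL_HACK_SIZE):
    """True if start is word-aligned and no known branch lands on the hook."""
    if start % 4 != 0:
        return False
    first, last = hook_span(start, size)
    return not any(first <= target <= last for target in branch_targets)


known_targets = [0x2AB48]  # from the jump at 0x2AB3C
for candidate in (0x2AB44, 0x2AB48, 0x2AB4C, 0x2AB50):
    print(hex(candidate), hook_is_safe(candidate, known_targets))
```

Only 0x2AB4C and 0x2AB50 pass, which matches the two options above. hook_span also tells you which bytes get clobbered, i.e. which original opcodes you have to account for in your own code.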

Using breakpoints as markers. Everything between them, including the endpoints, will have to be copied over. That’s why I prefer to determine where the jump goes before writing my assembly code.

Now that I know where to insert the jump, I need to figure out which register to jump with. To do that, I check what registers are used during the jump itself, and what will be required later, in case I need to preserve those values. In the jump block, none of the scratch registers (r0-r3) are used. After we return, the only register used is r0, and that’s filled in after returning. So that means we can use any scratch register to jump with. By default, I like using r3, so we’ll use that.

We can now start filling in our buildfile, after picking a suitable label name:

//FE8 Make Thieves Mediocre Again
//By Tequila

#ifndef _FE8_
    ERROR "You're not assembling to an FE8 ROM!"
#endif

#include EAStdlib.event
#include "Extensions/Hack Installation.txt"

#ifndef FreeSpace
    #define FreeSpace 0xB2A610
    ORG FreeSpace
#endif

PUSH

ORG $2AB4C
callHack_r3(MyThirdHack)

POP

ALIGN 4
MyThirdHack:

You’ll notice that I made a new buildfile for this minor project, but you don’t have to. You can add it to the one you made for projects 1 and 2 if you’d like.

Well, we have our jump, or hook. Next, we’re gonna write some assembly. Are you ready? ARE YOU READY??

crickets

Good enough.

First, let’s open a new file in your text editor of choice.

Perfect. Now, you’re gonna write .thumb at the very top. This is a directive, or command, for the assembler. If you don’t put .thumb, the assembler automatically assumes your code is in ARM mode, and it won’t work.

Next thing we’re gonna do is push stuff. If you wanted to use the non-scratch registers (r4-r11), you’d do that here. We’re not going to, however, so the only register we’ll push is r14.

QUICK NOTE: Whitespace (spaces, tabs, extra line breaks, etc.) is ignored by the assembler. Use that to your advantage to make your code look neat; it’ll make debugging easier.
ANOTHER QUICK NOTE: To insert a comment (which you should use liberally), use @. If you’re new to this, it may be useful to comment every line to understand what it does and how it works.

Actually, I’m going to add a note saying where our function got called from; if I need to know that, I don’t have to refer back to our buildfile. And while I’m at it, I’m going to note what registers have things that are important to my code:

Next, we look at the code we replaced with a jump, and see what parts of it need to be included in our code.

add r4,#0x5A definitely needs to be added.
strh r5,[r4], at first glance, should also be. However, we’re going to immediately overwrite that value when we add strength (or whatever unholy bastardization of stats we come up with) to the weapon might, as at 0x2AB56. So this is completely redundant.
The rest is loading strength, adding it to might, and storing it. We’ll definitely be doing that later.

Conclusion: The only opcode we need to add right now is add r4,#0x5A. So let’s go ahead and do that.

Next, I’m going to insert my check for whether the unit is a thief. Hmmm. How do we do that? By consulting our Nightmare modules. Byte 4 of the class data is the class id, and we can get the class data from the battle struct.

Answer

Now, I want to compare that value to 0xD, and branch if not a thief. The nice thing about using an assembler like DevkitARM is the ability to use labels for branches. Rather than saying “bne 0xC bytes”, or something of that ilk, we can use “bne LabelName”, define LabelName somewhere, and then let the assembler take care of the “how far away do we jump” issue.
How do you define a label? Simply put the label name on a new line, followed by a colon.
NOTE: If you have a jump to a label, but don’t define the label, the branch will jump to itself, leading to an infinite loop. You can’t count on the assembler giving you an error when that happens! To avoid this happening, I usually define the label further downward so that I don’t forget about it.

So let’s make that branch, shall we?

Answer

Notice how I already defined the label.

Now we’re going to write code for each outcome. If the class is not a thief, we want to load strength. If the class is a thief, then we want to load strength and skill and average them. Once we have our value, then it’ll get added to weapon might and stored in the appropriate location.
That last sentence applies to both outcomes, right? There’s no point in writing the same thing twice. Therefore, a sensible thing to do would be to have your strength (or strength/skill average) result end up in the same register either way, and then use an unconditional branch so both paths meet in the same place. Go ahead and try that.

Answer

Next, we add our value to weapon might and store it in the right location. If you picked your registers correctly, it should be identical to the code replaced with the hook. If you didn’t, well, that’s not a big deal.

Answer

Finally, we just need to return to where we came from.

Answer

NOTE: We could have skipped pushing r14 in the beginning and used bx r14 at the end, since we didn’t have any function calls (no bl's of our own). But it’s only two extra opcodes, and if we did insert a function call, we wouldn’t have to remember to add a push {r14} and pop/bx combo.

Voila! We’re done. I think. Here’s my complete file:

Answer

If your registers are a bit different, that’s not a huge deal. I try to start with r0 and work upwards as necessary, unless I know I need a particular value in a specific register. In our example, if you used r0-r3 as your scratch registers, that should be fine.

Now we’re going to save it. I save my assembly files as .asm, out of sheer habit. Notepad++ offers some syntax highlighting, which for me isn’t super useful, but it’s nice to see pretty colors.

Pretty colors

You could just leave it as a plain text file, if you’d like. Some people use a .s format, which I think allows you to define your own syntax highlighting, but I don’t actually know anything about that. Whatever you pick, the important part is saving the file.
NOTE: Organization is super duper important. I tend to make a new folder for each hack I work on:

https://i.imgur.com/HJo0IoM.png

Next, we’re going to turn your assembly code into a hex dump, using DevkitARM and Assemble ARM.bat. First, open up Assemble ARM.bat in your text editor and make sure the directory for DevkitARM\bin is correct.

The default is expecting the DevkitARM folder to be in the same folder as Assemble ARM.bat. I have a copy of the .bat in each project folder; for obvious reasons, I’m not going to have a copy of DevkitARM in each project folder as well. Thus, I put my DevkitARM in my C drive, and just copy and paste the .bat as needed.

(In case you’re wondering, devkitARM is part of a collection of tools for assembling different things, called DevkitPRO. It’s a pretty big file, and we only use the ARM assembler, so only that folder is hosted in the Unified Hacking Dropbox.)

Once you’ve gotten the start directory sorted out, take your .asm or .s file, and drag and drop it onto Assemble ARM.bat. You should get a command-line window that looks like this:

If you get an error saying “The system cannot find the path specified.”, double check the start directory path.
If you got a warning saying it added a newline, that’s fine; the file still assembles. To avoid seeing that message in the future, have a newline at the very end of the file. Don’t ask me why this is, because I don’t know.

If it did work, you should also get a .dmp file.

You can open a .dmp file in your hex editor. It may be useful to do this the first time to make sure it’s not empty.
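If you'd rather not fire up a hex editor, a few lines of Python on your PC can do the same sanity check. This is just a sketch; the function names are made up, and MyThirdHack.dmp is simply the file name used in this project.

```python
def summarize_dump(data):
    """Return (size in bytes, hex string of the first 16 bytes)."""
    return len(data), data[:16].hex(" ")


def summarize_file(path):
    """Read a .dmp file from disk and summarize it."""
    with open(path, "rb") as f:
        return summarize_dump(f.read())


# Example with raw bytes: 00 b5 is push {r14} and 5a 34 is add r4,#0x5A,
# each stored low byte first.
print(summarize_dump(bytes([0x00, 0xB5, 0x5A, 0x34])))
```

For the real thing you would call summarize_file("MyThirdHack.dmp"). A size of 0 means something went wrong during assembly.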

Looks like it’s not. Last thing to do is add it to the buildfile. We’re going to use #incbin, which stands for “include binary (file)”, as follows:

MyThirdHack:
#incbin "MyThirdHack.dmp"

Time to put this thing in a ROM. If you have a makefile, you can use that. If you have no idea what a makefile is, double click Event Assembler.exe and select your rom (FE8), the buildfile (under Text), and the rom to assemble to (under ROM). Then hit Assemble. You should get a cheery message saying to please continue being awesome.
(Batch files/makefiles will be touched upon in a later section, since they can be quite useful.)

Now, we could check if the changes manifested in-game, but there’s one teensy issue: we don’t have a thief. We could change either Eirika or Seth into a thief with ramhacking, but there’s an easier way, in my opinion.

Open up the .asm file and change the class check from 0xD (thief) to 0x2 (Eirika lord).
Reassemble by dragging and dropping the .asm onto the .bat file.
Reassemble the buildfile using EA.

Eirika has 4 strength and 8 skill at base, and the rapier has 7 might. I’m expecting her attack power to be 7+(4+8)/2 = 13.

Sure enough, it works. Let’s just double check Seth, who should be unaffected since he isn’t the class we’re checking for. He has 14 strength and 13 skill, and the steel sword has 8 might. The average of strength and skill would be 13 (27/2, rounded down), which would give 21, so if things work out properly, he should still have 14+8=22 might.
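Here's the same arithmetic as a tiny Python model, in case you want to predict numbers for other units before booting the game. It is a sketch of what the hack is supposed to compute, not the assembly itself. It assumes the average is taken by adding the two stats and shifting right by one (which rounds down); NOT_CHECKED is a placeholder for any class id other than the one being checked.

```python
THIEF = 0xD         # class id used in the walkthrough
EIRIKA_LORD = 0x2   # class id swapped in for the test run
NOT_CHECKED = 0xFF  # placeholder: any class id other than the checked one


def attack(class_id, strength, skill, might, checked_class=THIEF):
    """Might plus strength, or might plus the str/skl average for the checked class."""
    if class_id == checked_class:
        stat = (strength + skill) >> 1  # shifting right by 1 rounds down
    else:
        stat = strength
    return might + stat


# Test build: the class check is temporarily Eirika's lord class.
print(attack(EIRIKA_LORD, 4, 8, 7, checked_class=EIRIKA_LORD))   # Eirika, rapier
print(attack(NOT_CHECKED, 14, 13, 8, checked_class=EIRIKA_LORD))  # Seth, steel sword
```

Leaving checked_class at its default models the real hack, where only thieves get the averaged stat.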

Hooray! Looks like our little hack works as planned. Or does it?

Nope. Recall in the beginning, we said there were two calculations that needed to be changed. This one, and one involving magic swords.

Oh, bugger.

Well, time to find this calculation. Any ideas, dear reader?

Answer

It’s possible that there’s some documentation on the relevant routine for FE8; I know there is in FE7. This would normally be the first thing to look for, but I personally believe I can find the routine faster than I can find pertinent documentation.
My method is to ramhack a magic sword (say, a Light Brand) onto Seth (since he already has the appropriate weapon rank, which saves me some time). Next, I would set a break on write to the attack short in the battle struct (the thing we just edited). Finally, attack an enemy from 2-range, and voila!

You could also suggest setting a break on read to the item abilities word at +0x4C, which would yield a similar result.

If you thought about setting a break on read to the item abilities in the item table, you won’t find it, alas. First, because you’ll have to navigate a ton of false positives, and second because that’s not even checked in the routine. Good try, though.

Setting another break on read to the strength byte in the battle struct would work as well.

Using whatever methods necessary, you should have found the relevant function in the vicinity of

Answer

0x2AE06

UNFINISHED
