[FE7] Miscellaneous notes about ASM contents

zahlman · January 20, 2015, 2:13am

00-BF = standard ROM header (see GBATEK)
C0-FB = ARM code infinite loop; sets up control registers and maintains a pointer at (IRAM) 7FFC to the user interrupt handler (this is standard boot-up stuff for GBA) and branches to a50 in THUMB mode (game code entry point, I guess)
FC-227 = ARM code for handling interrupts. It scans to find the highest priority interrupt request, looks up the corresponding handler in the interrupt vector table (stored at (IRAM) 28e0), calls it, and clears the request flags. If a certain interrupt flag is set (which presumably is never meant to happen), it goes into a tight infinite loop (an instruction that branches to itself); this check appears twice in the code.
228-9fb = Miscellaneous ARM code that gets copied to IRAM early on and is normally run from IRAM. The copying is done because the data bus to the ROM is only 16 bits wide, which makes it slow to run ARM code directly from the ROM. Of course, there is limited IRAM space, so only a few key routines get copied this way.
9fc = first THUMB code in the ROM
a50 = THUMB entry point
11b0 = copies values from IRAM (around (IRAM) 2870-28d8) into LCD I/O registers
1348 = returns one of (IRAM) 287c, (IRAM) 2880, (IRAM) 2884, (IRAM) 2888 - mapped to the LCD I/O registers for BG Control (see GBATEK).
1398 = return ‘character base block’ for the specified BG layer, in bytes
13cc = (r1 - 1398(r0)) / 32?
1400 = return ‘screen base block’ for the specified BG layer, in bytes
1434 = set ‘character base block’ for the specified BG layer, specified in bytes
It seems to go on like this for a while… these routines are terribly unoptimized btw, playing silly games with the stack pointer for no reason.
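For reference, here is a rough Python model of what the 1398/13cc/1400/1434 family seems to compute. The bit positions and block sizes are the BGxCNT layout from GBATEK (character base block in bits 2-3, 16 KB units; screen base block in bits 8-12, 2 KB units). The function names are made up, and unlike the real routines (which take a BG layer index) these take the register value directly. The 32 in the last function is the size of one 4bpp tile, which is only my guess at what 13cc is dividing by.

```python
CHAR_BLOCK_BYTES = 0x4000    # 16 KB per character base block (GBATEK)
SCREEN_BLOCK_BYTES = 0x800   # 2 KB per screen base block (GBATEK)
TILE_BYTES_4BPP = 32         # 8x8 pixels at 4 bits each


def char_base_bytes(bgcnt):
    """Character base block of a BGxCNT value, as a VRAM byte offset."""
    return ((bgcnt >> 2) & 0x3) * CHAR_BLOCK_BYTES


def screen_base_bytes(bgcnt):
    """Screen base block of a BGxCNT value, as a VRAM byte offset."""
    return ((bgcnt >> 8) & 0x1F) * SCREEN_BLOCK_BYTES


def set_char_base_bytes(bgcnt, offset):
    """Return bgcnt with its character base block set from a byte offset."""
    block = (offset // CHAR_BLOCK_BYTES) & 0x3
    return (bgcnt & 0xFFF3) | (block << 2)


def tile_index(bgcnt, vram_offset):
    """Guess at 13cc: tile number of a VRAM offset relative to the char base."""
    return (vram_offset - char_base_bytes(bgcnt)) // TILE_BYTES_4BPP
```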
…
427c = Copies the 228-9fb code block to IRAM, at (IRAM) 2f40-3713. Also sets up a few pointers in IRAM to some of those routines:

(IRAM) 2f30 points to the routine copied from 564;
(IRAM) 3940 points to the routine copied from 6c0 (text Huffman decompression);
(IRAM) 2920 points to the routine copied from 494;
(IRAM) 3944 points to the routine copied from 534;
(IRAM) 4150 points to the routine copied from 760;
(IRAM) 2918 points to the routine copied from 850
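
Since the block is copied as one piece, the IRAM address of any routine in it follows from the two ranges given above (ROM 228-9fb lands at (IRAM) 2f40-3713). A small Python sketch of that arithmetic; the function name is made up, and the addresses are offsets in the same notation as the rest of these notes, not full bus addresses.

```python
ROM_BLOCK_START = 0x228
ROM_BLOCK_END = 0x9FB      # inclusive
IRAM_BLOCK_START = 0x2F40


def rom_to_iram(rom_offset):
    """Map a ROM offset inside the copied block to its IRAM offset."""
    if not ROM_BLOCK_START <= rom_offset <= ROM_BLOCK_END:
        raise ValueError("offset is outside the copied 228-9fb block")
    return IRAM_BLOCK_START + (rom_offset - ROM_BLOCK_START)


# ROM offsets of the six routines that get a pointer in IRAM
for rom in (0x564, 0x6C0, 0x494, 0x534, 0x760, 0x850):
    print(f"{rom:x} -> (IRAM) {rom_to_iram(rom):x}")
```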

Next come some (debugging?) wrappers for those routines. Each of these saves the input parameters on the stack, dereferences the appropriate IRAM pointer set up above in order to find the IRAM copy of the code, longcalls to the routine using the “bx ladder” (which also switches to ARM mode), then cleans up upon return. The stack values are never actually used for anything.

4338 = Wrapper for 564
4364 = Wrapper for 6c0
4388 = Wrapper for 494
43b4 = Wrapper for 534
43e0 = Wrapper for 760
4408 = Wrapper for 850

At bfa04 - bfa73 are a handful of routines that wrap SWI instructions (BIOS calls):

bfa04 - SWI 0xA (ArcTan2)
bfa08 - SWI 0xE (BgAffineSet)
bfa0c - SWI 0xC (CpuFastSet)
bfa10 - SWI 0xB (CpuSet)
bfa14 - SWI 0x6 (Div)
bfa18 - SWI 0x6 (Mod) *
bfa20 - SWI 0x13 (HuffUnComp)
bfa24 - SWI 0x12 (LZ77UnCompVram)
bfa28 - SWI 0x11 (LZ77UnCompWram)
bfa2c - SWI 0x25 (Multiboot; transfer mode hard-coded to 1) (I guess Link Arena might use this?)
bfa34 - SWI 0xF (ObjAffineSet)
bfa38 - SWI 0x15 (RLUnCompVram)
bfa3c - SWI 0x14 (RLUnCompWram)
bfa40 - Reset. Disable all interrupts (clear IME I/O reg); reset stack pointer to (IRAM) 7f00; SWI 0x1 (RegisterRamReset); SWI 0x0 (SoftReset). The passed-in r0 will be forwarded to RegisterRamReset (see documentation). Note that (see GBATEK) a flag can be set ahead of time at (IRAM) 7ffa that causes code execution to start at the beginning of WRAM instead of the beginning of the ROM o_O
bfa58 - SWI 0x19 (adjust SOUNDBIAS gradually down to 0)
bfa60 - SWI 0x19 (adjust SOUNDBIAS gradually up to 0x200)
bfa68 - SWI 0x8 (Sqrt)
bfa6c - SWI 0x5 (VBlankIntrWait) - sets r2 to 0 first, but the SWI doesn’t seem to care?

Note: The Div SWI normally returns r0 / r1 in r0, and r0 % r1 in r1. However, the compiler is going to expect that only r0 contains meaningful information, so a separate ‘mod’ wrapper is provided that calls the SWI and then copies r1 to r0. (Div also puts abs(r0 / r1) in r3…)
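
To make the Div/Mod relationship concrete, here is a small Python model of the SWI's three outputs and of the two wrappers. It assumes the division truncates toward zero (so the remainder takes the sign of the numerator), which is how I read GBATEK. The function names are made up, and the divide-by-zero case is not modelled faithfully; the sketch just raises.

```python
def bios_div(num, den):
    """Model of SWI 0x6: returns (r0, r1, r3) = (quotient, remainder, |quotient|)."""
    if den == 0:
        raise ZeroDivisionError("behaviour of the real SWI is not modelled here")
    quotient = abs(num) // abs(den)
    if (num < 0) != (den < 0):
        quotient = -quotient          # truncate toward zero, then restore the sign
    remainder = num - quotient * den
    return quotient, remainder, abs(quotient)


def div_wrapper(num, den):
    """bfa14: call the SWI; the caller only looks at r0."""
    return bios_div(num, den)[0]


def mod_wrapper(num, den):
    """bfa18: call the SWI, then move r1 into r0."""
    return bios_div(num, den)[1]
```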

The SOUNDBIAS thing… you kinda have to read GBATEK, and also understand a thing or two about how audio waveforms work.

bfa74 appears to be for dealing with save data; it configures a control register (so that the wait state for SRAM access is the maximum 8 cycles), then copies a specified (r2) number of bytes from r0 to r1. bfab4 is identical; I can only assume that one is intended for loading and the other for saving, or something. bfaf4 compares values instead; if all the bytes match it returns 0, otherwise a pointer to the first mismatch.
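
In Python terms, the copy and compare routines behave roughly like the sketch below, working one byte at a time over a flat memory model. The names are made up, and the wait-state setup is left out. One more assumption: I haven't said above which buffer the returned mismatch pointer belongs to, so the sketch arbitrarily returns the address on the r1 side.

```python
def copy_bytes(mem, src, dst, count):
    """Like bfa74 / bfab4: copy `count` bytes from src (r0) to dst (r1), byte by byte."""
    for i in range(count):
        mem[dst + i] = mem[src + i]


def verify_bytes(mem, src, dst, count):
    """Like bfaf4: return 0 if all bytes match, else the address of the first mismatch."""
    for i in range(count):
        if mem[src + i] != mem[dst + i]:
            return dst + i      # assumption: pointer into the r1 buffer
    return 0


mem = bytearray(64)
mem[16:20] = b"FE7!"
copy_bytes(mem, 16, 32, 4)
print(verify_bytes(mem, 16, 32, 4))
```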

At bfc4c is the “bx ladder” used for longcalls. It’s just a sequence of bx r0; nop; bx r1; nop; etc. for each register. I assume all y’all hackers know this already, but: the idea is that you load the target address of the code you’re calling into a register, and then BL to the corresponding instruction, in order to simulate the BLX opcode that’s missing on GBA. This allows for calling into code that’s currently in the IRAM, for example, or between code at the start of the ROM and code inserted near the end (too far for BL to reach on its own).

I don’t know why the NOPs are there; these instructions don’t need to be word-aligned, as far as I can tell - that’s only needed for BX PC (since ARM addresses need to be word-aligned), and there isn’t a BX PC in the ladder. I’m guessing they were just mindlessly inserted by an assembler/linker/compiler somewhere along the way to be safe.
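
For anyone who wants to poke at the ladder in a hex editor, here is a Python sketch of its layout. The `bx rN` encoding (0x4700 with the register number in bits 3-6) is the standard THUMB one. Assumptions: the NOP is the usual `mov r8, r8` halfword (0x46C0), the ladder starts with r0 at bfc4c, and every entry is 4 bytes. I haven't listed how many registers the ladder covers, so the count is a parameter, and the function names are made up.

```python
LADDER_BASE = 0xBFC4C
THUMB_NOP = 0x46C0        # mov r8, r8; assumed to be the NOP used


def bx_halfword(reg):
    """THUMB encoding of `bx rN` for N in 0..15."""
    if not 0 <= reg <= 15:
        raise ValueError("register number out of range")
    return 0x4700 | (reg << 3)


def ladder_entry(reg):
    """Address to BL to in order to call through register `reg`."""
    return LADDER_BASE + 4 * reg


def build_ladder(count):
    """Halfwords of a ladder covering r0 .. r(count-1): bx, nop, bx, nop, ..."""
    halfwords = []
    for reg in range(count):
        halfwords += [bx_halfword(reg), THUMB_NOP]
    return halfwords
```

So a longcall through r1 would be "load the target into r1, then BL to `ladder_entry(1)`": BL sets up LR as usual, and the BX does the actual jump (and the mode switch, if the target address says so).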

The code seems to end at c57db. At c57ac, the last part before the end, we see a repeating (THUMB) bx pc; nop; (ARM) b pattern; i.e. bl’ing to those addresses switches to ARM mode and routes to those ARM routines instead. The strange part is that they branch to the ROM versions of routines from the block that gets copied to IRAM! The mappings are

c57ac -> 304
c57b4 -> 43c
c57bc -> 3a8
c57c4 -> 234
c57cc -> 3e0
c57d4 -> 360
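
Here is a Python sketch of how one of these 8-byte stubs is put together, in case you want to check the mappings yourself. `bx pc` in THUMB reads PC as the instruction address plus 4, so it lands on the ARM word right after the NOP. An ARM `b` stores a signed 24-bit word offset relative to its own address plus 8. The NOP encoding (0x46C0) is an assumption and the function names are made up.

```python
THUMB_BX_PC = 0x4778
THUMB_NOP = 0x46C0        # mov r8, r8; assumed

VENEERS = {0xC57AC: 0x304, 0xC57B4: 0x43C, 0xC57BC: 0x3A8,
           0xC57C4: 0x234, 0xC57CC: 0x3E0, 0xC57D4: 0x360}


def encode_arm_b(addr, target):
    """ARM `b target` (condition AL) placed at `addr`."""
    offset = target - (addr + 8)
    if offset % 4:
        raise ValueError("ARM branch targets must be word-aligned")
    return 0xEA000000 | ((offset >> 2) & 0xFFFFFF)


def decode_arm_b(addr, word):
    """Target address of an ARM `b` word located at `addr`."""
    imm24 = word & 0xFFFFFF
    if imm24 & 0x800000:          # sign-extend the 24-bit field
        imm24 -= 0x1000000
    return addr + 8 + (imm24 << 2)


def veneer(thumb_addr, target):
    """(two THUMB halfwords, one ARM word) for the stub at thumb_addr."""
    return [THUMB_BX_PC, THUMB_NOP], encode_arm_b(thumb_addr + 4, target)


for thumb_addr, target in VENEERS.items():
    _, word = veneer(thumb_addr, target)
    print(f"{thumb_addr:x}: ARM word {word:08x} -> {decode_arm_b(thumb_addr + 4, word):x}")
```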

After this point there seems to be only data; there are a lot of apparent pointer values, but they don’t seem to be LDR’d anywhere. There’s also this weird patterned data (ascending, then descending halfword values) at c5a48, which has a lot of references to it in the code.

zahlman · April 19, 2016, 5:19pm

Just a random thing I stumbled upon today, while tracing through all the text rendering code.

That routine at 564 (which I haven’t actually examined yet, but the context is strong evidence) is used to expand 2bpp font data to 4bpp. It makes use of some kind of pre-computed mapping data that accounts for the current “text colour” value and chooses the appropriate palette indices. So this is all super-optimized stuff (at least by the standards of this ROM).
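
To show the general idea of table-driven 2bpp to 4bpp expansion, here is a Python sketch. This is not a reconstruction of 564 (which, as said, I haven't examined); it is just one plausible shape for such a routine. Assumptions: the table is indexed by a whole byte (four 2bpp pixels), the leftmost pixel sits in the lowest bits on both sides (as in GBA 4bpp tiles), and the "text colour" boils down to four palette indices. All names are made up.

```python
def build_expand_table(colours):
    """Precompute byte -> 16-bit expansion for four palette indices (each 0..15)."""
    if len(colours) != 4 or any(not 0 <= c <= 15 for c in colours):
        raise ValueError("need four palette indices in 0..15")
    table = []
    for byte in range(256):
        out = 0
        for px in range(4):
            value = (byte >> (2 * px)) & 0x3       # one 2bpp pixel
            out |= colours[value] << (4 * px)      # becomes one 4bpp pixel
        table.append(out)
    return table


def expand_row(row_bytes, table):
    """Expand a row of 2bpp bytes to 16-bit units of 4bpp pixels via the table."""
    return [table[b] for b in row_bytes]


# Hypothetical colour set: background 0, then palette indices 5, 6, 7
table = build_expand_table([0, 5, 6, 7])
print([f"{h:04x}" for h in expand_row(b"\x01\xff", table)])
```

The point is that the per-pixel work is done once per colour change when the table is built, so drawing a glyph is just one lookup per source byte.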

Edit: also, BFD20 appears to be a software MOD routine (with some interesting tricks for optimization) that sometimes gets used instead of the SWI wrapper (BFA18). I guess different parts of the code got compiled with different settings (although most of it looks very much like a debug build)…