IntSys pls | Vanilla ASM Goof Thread


#1

If you look through a game’s routines long enough you’re bound to find something wrong. Maybe a register was clobbered. Maybe something was written twice when it didn’t need to be. Maybe something goes horribly wrong and just happens to work out, disaster narrowly avoided. Your average player (and/or your average hacker) will probably never see these goofs, so I figured it’s time to have a thread for them.

Post whatever bugs, compiler goofs, etc. here. If anything this thread will keep me from pasting random snippets of ASM into the discord server, and hopefully this thread’ll be a good read.


Let’s get into it. I like to play around with FE5, so these examples will be 65816 assembly. I’d like to explain each of these so that someone with little to no ASM experience could hopefully follow along.

#Block Transfer To/From Anywhere

65816 has two block memory transfer opcodes, MVP and MVN, which are similar to THUMB’s ldmia+stmia for transferring data. The opcodes include the bank (the upper 8 bits of a pointer on the 65816) for both the source and the destination, with the number of bytes to transfer and the lower 16 bits of the source and destination in CPU registers. This poses a bit of a problem, as you can only copy data to/from locations known at compile time (you need to know the banks to write the opcode). To overcome this, FE5 has routines that build a block transfer routine in RAM. When you need to copy data, you fill out the banks of the opcode in RAM and hop to the routine. It’s quite clever in my opinion.
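If the idea is easier to follow in something runnable, here's a rough Python model of it. This is only a sketch and not FE5's code: the flat 16 MiB `memory` array and the helper names `patch_stub`, `run_stub` and `block_copy` are made up for illustration. The only things taken from the real hardware are the opcode bytes ($54 for MVN, $60 for RTS), the fact that MVN is encoded as opcode, destination bank, source bank, and that the accumulator holds the byte count minus one.

```python
# Toy model of the "build the block move in RAM" trick. Not FE5's code:
# the flat address space and every helper name here are illustrative assumptions.

MVN_OPCODE = 0x54   # 65816 MVN, encoded as: opcode, destination bank, source bank
RTS_OPCODE = 0x60
STUB_ADDR = 0x04AE  # RAM address of the MVN stub, as given in the post

memory = bytearray(1 << 24)  # flat 24-bit address space, no mirroring or MMIO


def patch_stub(src_bank, dst_bank):
    """Fill in the banks of the 4-byte stub (mvn + rts) sitting in RAM."""
    memory[STUB_ADDR:STUB_ADDR + 4] = bytes(
        [MVN_OPCODE, dst_bank, src_bank, RTS_OPCODE]
    )


def run_stub(src_offset, dst_offset, count_minus_one):
    """Interpret the stub like MVN does: X = source, Y = dest, A = count - 1."""
    opcode, dst_bank, src_bank, ret = memory[STUB_ADDR:STUB_ADDR + 4]
    assert opcode == MVN_OPCODE and ret == RTS_OPCODE
    x, y, a = src_offset, dst_offset, count_minus_one
    while True:
        memory[(dst_bank << 16) | y] = memory[(src_bank << 16) | x]
        x = (x + 1) & 0xFFFF
        y = (y + 1) & 0xFFFF
        a = (a - 1) & 0xFFFF
        if a == 0xFFFF:      # MVN stops once A wraps below zero
            break


def block_copy(src, dst, size):
    """Copy `size` bytes between two 24-bit addresses chosen at run time."""
    patch_stub(src >> 16, dst >> 16)
    run_stub(src & 0xFFFF, dst & 0xFFFF, size - 1)


memory[0x7E2000:0x7E2004] = b"FE5!"
block_copy(0x7E2000, 0x7F3000, 4)
print(bytes(memory[0x7F3000:0x7F3004]))
```

The point of the model is `patch_stub`: the banks aren't known until run time, so they get written into the instruction itself right before it's executed.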

The MVN and MVP instructions (with placeholder banks) are written to RAM on startup, each followed by an opcode to return from the routine:

Routine (written for the 64tass assembler)
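blockcopy_copier
	phb 
	php 
	phk 
	plb                ; data bank = program bank
	sep #$20           ; 8-bit accumulator
	ldx #size(mvn_routine) - 1

-
	lda mvn_routine,x  ; copy mvn_routine to $04AE, last byte first
	sta $04AE,x
	dex 
	bpl -
	ldx #size(mvp_routine)

-
	lda mvp_routine,x  ; copy mvp_routine to $04B2, last byte first
	sta $04B2,x
	dex 
	bpl -
	plp 
	plb 
	rts 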

mvn_routine
	mvn #$00,#$00
	rts 

mvp_routine
	mvp #$00,#$00
	rts 

unknown_routine
	phb 

If you don’t know 65816, this might be mumbo jumbo to you, so let’s break it down. We’re copying two routines, mvn_routine and mvp_routine, to RAM addresses $0004AE and $0004B2 respectively. We copy them end first, using a loop counter in the X register. We need this counter to be one byte less than the size of the routine because we’re looping with a BPL opcode (0 is considered positive). After each byte, we decrement the loop counter.

MVN Transfer breakdown
X    Byte Part
0003 60   rts
0002 00   source bank
0001 00   destination bank
0000 54   mvn
FFFF      end loop

Here’s the issue: when copying the MVP routine, the size wasn’t reduced by one, so the loop runs one extra time and the first byte of the next routine (a phb opcode) gets copied into RAM at $0004B6, accidentally overwriting whatever was there. Man, that’s a huge explanation for such a tiny thing, right? So, what was originally at $0004B6? $0004B6 is used exactly once when setting up the sound system, and probably didn’t even need to be used. Lucky us, nothing of value was lost. Even better, the only known routines that use this block memory copier look like this:

MVN Routine user (64tass syntax)
    phb 
    php 

    ; program bank -> data bank

    phk 
    plb 
    phx 
    phy 

    ; get the source, dest banks and
    ; build the mvn opcode

    sep #$20
    lda $04AB ; dest bank
    sta $04AF
    lda $04A8 ; source bank
    sta $04B0
    lda #$54 ; mvn opcode
    sta $04AE
    lda #$60 ; rts opcode
    sta $04B1
    rep #$20

    ; get params

    ldx $04A6 ; source
    ldy $04A9 ; dest
    lda $04AC ; size
    dec a

    ; cool trick, can rts
    ; because $0000-$2000
    ; of RAM mirrored to
    ; every bank

    jsr $04AE
    ply 
    plx 
    plp 
    plb 
    rtl 

These rewrite the entire MVN/MVP routines anyway! $0004B6 was clobbered needlessly!
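To see the off-by-one concretely, here's a small Python model of the startup copy loop. It's a sketch under some assumptions: the `rom` layout is simplified to just the bytes that matter, and `copy_end_first` is a made-up name for the lda/sta/dex/bpl loop. The byte values are the real 65816 encodings (MVN = $54, MVP = $44, RTS = $60, PHB = $8B).

```python
# Model of the startup copy loop. Opcode bytes are real 65816 encodings;
# the ROM layout and the function name are simplifications for illustration.

rom = bytes([
    0x54, 0x00, 0x00, 0x60,  # mvn_routine: mvn #$00,#$00 / rts
    0x44, 0x00, 0x00, 0x60,  # mvp_routine: mvp #$00,#$00 / rts
    0x8B,                    # unknown_routine: phb (first byte only)
])
MVN_ROUTINE, MVP_ROUTINE, ROUTINE_SIZE = 0, 4, 4


def copy_end_first(ram, src, dst, start_x):
    """lda src,x / sta dst,x / dex / bpl: runs for X = start_x down to 0."""
    written = []
    x = start_x
    while x >= 0:            # bpl keeps looping while X is not negative
        ram[dst + x] = rom[src + x]
        written.append(dst + x)
        x -= 1
    return written


ram = {}
mvn_writes = copy_end_first(ram, MVN_ROUTINE, 0x04AE, ROUTINE_SIZE - 1)  # as intended
mvp_writes = copy_end_first(ram, MVP_ROUTINE, 0x04B2, ROUTINE_SIZE)      # the goof

print([hex(a) for a in mvp_writes])  # ['0x4b6', '0x4b5', '0x4b4', '0x4b3', '0x4b2']
print(hex(ram[0x04B6]))              # 0x8b, the stray phb
```

Starting X at the full size means five bytes get written instead of four, and the fifth one lands one past the end of the MVP stub.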

There are some other interesting things to consider: the way the routine user loads the parts of the routine as literals and writes them to fixed points in RAM is faster than the startup routine. The startup routine would probably be faster if it actually used MVN/MVP to copy the MVN/MVP routines. And, finally, none of these seem to be called.

Same thing happens in FE4 in the same place, too.



#2

This is a really weird way to structure a loop:

[details=ASM dump]
080249AC B570 PUSH {r4,r5,r6,lr} //AttackCommandUsability
080249AE 4806 LDR r0, [PC, #0x18] # pointer:080249C8 -> 03004E50 (Pointer to the work memory of the operation character )
080249B0 6801 LDR r1, [r0, #0x0] # pointer:03004E50 (Pointer to the work memory of the operation character ) r0=Unit
080249B2 68CA LDR r2, [r1, #0xC] r1=Unit
080249B4 2040 MOV r0, #0x40
080249B6 4010 AND r0 ,r2
080249B8 2800 CMP r0, #0x0 //moved this turn?
080249BA D12F BNE #0x8024A1C
080249BC 2080 MOV r0, #0x80
080249BE 0100 LSL r0 ,r0 ,#0x4 //800 - ballista
080249C0 4002 AND r2 ,r0
080249C2 2A00 CMP r2, #0x0
080249C4 D004 BEQ #0x80249D0 //if 0 then proceed
080249C6 E029 B 0x8024A1C //else exit; ballista attack is separate command
080249C8 4E50 0300 //LDRDATA
080249CC 2001 MOV r0, #0x1
080249CE E026 B 0x8024A1E
080249D0 2600 MOV r6, #0x0
080249D2 8BCC LDRH r4, [r1, #0x1E] //r1=Unit 0x1E=1st item
080249D4 2C00 CMP r4, #0x0
080249D6 D021 BEQ #0x8024A1C
080249D8 1C20 MOV r0 ,r4
080249DA F7F2 FDC7 BL 0x0801756C //GetItemAttributes
080249DE 2101 MOV r1, #0x1
080249E0 4001 AND r1 ,r0
080249E2 2900 CMP r1, #0x0
080249E4 D00F BEQ #0x8024A06
080249E6 4D0F LDR r5, [PC, #0x3C] # pointer:08024A24 -> 03004E50 (Pointer to the work memory of the operation character )
080249E8 6828 LDR r0, [r5, #0x0] # pointer:03004E50 (Pointer to the work memory of the operation character ) r5=Unit
080249EA 1C21 MOV r1 ,r4
080249EC F7F1 FEB0 BL 0x08016750 //CanUnitUseWeapon
080249F0 0600 LSL r0 ,r0 ,#0x18
080249F2 2800 CMP r0, #0x0
080249F4 D007 BEQ #0x8024A06
080249F6 6828 LDR r0, [r5, #0x0] # pointer:03004E50 (Pointer to the work memory of the operation character ) r5=Unit
080249F8 1C21 MOV r1 ,r4
080249FA F000 FBDB BL 0x080251B4 //MakeTargetListForWeapon
080249FE F02B F993 BL 0x0804FD28 //GetTargetListSize Gets list size (used to check for empty lists in usability routines) Number of entries in the list
08024A02 2800 CMP r0, #0x0
08024A04 D1E2 BNE #0x80249CC
08024A06 3601 ADD r6, #0x1
08024A08 2E04 CMP r6, #0x4
08024A0A DC07 BGT #0x8024A1C
08024A0C 4805 LDR r0, [PC, #0x14] # pointer:08024A24 -> 03004E50 (Pointer to the work memory of the operation character )
08024A0E 6800 LDR r0, [r0, #0x0] # pointer:03004E50 (Pointer to the work memory of the operation character ) r0=Unit r0=Unit
08024A10 0071 LSL r1 ,r6 ,#0x1
08024A12 301E ADD r0, #0x1E
08024A14 1840 ADD r0 ,r0, R1
08024A16 8804 LDRH r4, [r0, #0x0] r0=Unit
08024A18 2C00 CMP r4, #0x0
08024A1A D1DD BNE #0x80249D8
08024A1C 2003 MOV r0, #0x3
08024A1E BC70 POP {r4,r5,r6}
08024A20 BC02 POP {r1}
08024A22 4708 BX r1
08024A24 4E50 0300 //LDRDATA
[/details]

That “unreachable” bit at 249CC gets jumped to as a loop break. It’s shoved in right after the ballista check and just sets the return value to true and jumps right back down to the end of the function. I initially thought it was completely unreachable, so it looks stupider than it is, but still, branching to set one register and then branching back…
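For anyone who doesn't want to trace the branches by hand, here's my rough high-level reading of the dump as Python. It is not decompiled source. The `Unit` class and the three callables are hypothetical stand-ins named after the dump's annotations (GetItemAttributes, CanUnitUseWeapon, MakeTargetListForWeapon), and returning a list from `targets_for` is a simplification of the separate GetTargetListSize call. The return values 1 and 3 are the ones the dump loads into r0.

```python
# Rough high-level reading of the dump, NOT decompiled source. The Unit class
# and the callables passed in are hypothetical stand-ins for the game's routines.

USABLE, UNUSABLE = 1, 3  # the two values the dump puts in r0 before returning


class Unit:
    def __init__(self, state, items):
        self.state = state  # word read from unit + 0x0C
        self.items = items  # halfwords starting at unit + 0x1E


def attack_command_usability(unit, get_item_attributes, can_use_weapon, targets_for):
    if unit.state & 0x40:        # first check ("moved this turn?" in the dump)
        return UNUSABLE
    if unit.state & 0x800:       # ballista: attack is a separate command
        return UNUSABLE
    for item in unit.items[:5]:  # slots 0..4, matching CMP r6, #0x4 / BGT
        if item == 0:            # an empty slot ends the scan
            break
        if not (get_item_attributes(item) & 1):
            continue
        if not can_use_weapon(unit, item):
            continue
        if len(targets_for(unit, item)) != 0:
            return USABLE        # the out-of-line block at 080249CC
    return UNUSABLE


unit = Unit(0, [0x0101, 0x0202, 0, 0, 0])
print(attack_command_usability(
    unit,
    lambda item: 1,                                       # every item has bit 0 set
    lambda u, item: True,                                 # every item is usable
    lambda u, item: ["enemy"] if item == 0x0202 else [],  # only item 2 has targets
))
```

Written this way, the early `return USABLE` inside the loop is the part that ended up as the little two-instruction island after the ballista check.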


#3

#65816 Quirks

The SNES has a 16-bit processor with an interesting property: software can decide whether the CPU’s three general-purpose registers are 8 or 16 bits wide, and it can change these sizes on the fly with the rep and sep opcodes. Much like the stack, the register sizes have to be restored after any action that changes them.

There’s a routine in FE5 that forgets this rule and manages to avoid crashing the game, if only by chance.

Code Snippet

...
	beq _A5A6
...
	bra _A5B6

_A5A6
	sep #$20
	lda #$FE
	sta $51D4
	sta $51D6
	sta $51D8
	sta $51DA

_A5B6
	plx 
	lda #$03
	sta $E2
	lda #$FF
	sta $06BD,x
	plb 
	plp 
	rtl 

I’ve trimmed out the parts of this snippet that aren’t needed to demonstrate what’s happening. On one path this routine can take, it encounters a sep #$20 opcode, which sets the accumulator, A, to 8 bits. It continues executing through _A5B6 as intended. Now, the fun part of being able to change your register sizes is that certain opcodes, such as ones that load literals, also change size to match the register size. With an 8-bit A, the lda #$03 here loads the byte-sized value $03 into A. The other route skips the sep, so it reaches _A5B6 with A still 16 bits wide; the lda then takes a two-byte literal, swallowing the sta opcode that follows, and everything after it gets decoded out of step. Here’s a snippet of what the code looks like from that route:

Bad Intsys

_A5B6
	plx
	lda #$8503
	sep #$A9
	sbc $06BD9D,x
	plb
	plp
	rtl

Luckily, all of the pops (plx, plb, plp) are still there, so there aren’t any stack issues, and it returns fine. The end result is that the delay between button reads when exiting an item selection menu is slightly different if the unit has a weapon.

The plp opcode at the end pops the processor’s state back to what it was at the start of the routine, so it returns with the right sizes.
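If you want to check the two decodings yourself, here's a tiny Python disassembler that only knows the handful of opcodes in this snippet. It's a sketch, not a real 65816 disassembler: the `OPCODES` table and the `disassemble` function are made up for this post, and the only CPU state it tracks is whether A is 8 or 16 bits. The opcode bytes themselves are the standard encodings.

```python
# Tiny decoder for just the opcodes in this snippet. It only tracks the
# accumulator width; everything else about the CPU is ignored.

CODE = bytes([
    0xFA,              # plx
    0xA9, 0x03,        # lda #$03
    0x85, 0xE2,        # sta $E2
    0xA9, 0xFF,        # lda #$FF
    0x9D, 0xBD, 0x06,  # sta $06BD,x
    0xAB, 0x28, 0x6B,  # plb / plp / rtl
])

# opcode -> (template, operand bytes); None means "depends on the width of A"
OPCODES = {
    0xFA: ("plx", 0), 0xAB: ("plb", 0), 0x28: ("plp", 0), 0x6B: ("rtl", 0),
    0xA9: ("lda #{}", None), 0x85: ("sta {}", 1), 0x9D: ("sta {},x", 2),
    0xE2: ("sep #{}", 1), 0xFF: ("sbc {},x", 3),
}


def disassemble(code, a_is_8bit):
    out, pc = [], 0
    while pc < len(code):
        opcode = code[pc]
        template, size = OPCODES[opcode]
        if size is None:                      # immediate load follows A's width
            size = 1 if a_is_8bit else 2
        if size == 0:
            out.append(template)
        else:
            operand = int.from_bytes(code[pc + 1:pc + 1 + size], "little")
            out.append(template.format(f"${operand:0{size * 2}X}"))
            if opcode == 0xE2 and operand & 0x20:
                a_is_8bit = True              # sep with bit 5 set makes A 8-bit
        pc += 1 + size
    return out


print(disassemble(CODE, a_is_8bit=True))
print(disassemble(CODE, a_is_8bit=False))
```

With `a_is_8bit=False` the output lines up with the listing above: the accidental sep #$A9 has bit 5 set, so A ends up 8-bit partway through anyway, and the stream falls back into step at plb.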