SNESFE Assembly for Dummies, by a Skeleton

Please refrain from replying to this thread.

This thread’s purpose is to mimic Tequila’s GBAFE Assembly For Dummies, By Dummies thread. Key differences being that I’m worse at explaining things to people, that I’m not Tequila, that Tequila is much funnier than I am, and also that this thread is focused on the SNES.

Disclaimer

This guide is written from the perspective of a Fire Emblem: Thracia 776 hacker. Most of the information is relevant to SNES hacking as a whole, but you’re likely to see wildly different things in different games.

Development for the GBA was very organized compared to the SNES. We know the compilers and tools used to create GBA games. The GBA is a much neater, easy-to-use machine. The SNES is wild and disorganized, and so are the practices used when developing for it.

You can get a compiler that targets the SNES’s CPU from the designers of the chip (here, if you’re interested), with their manual even mentioning SNES development. There’s a problem, though. Intelligent Systems didn’t use this compiler for their games (evidenced by how the games handle function parameters).

I’d assume some games did use this compiler. I’d also assume that other games use other compilers. Some games might be written in assembly. What I’m really trying to get at here is that you’ll likely see differing styles of assembly between different games.

What is Assembly? Why learn it?

Assembly, or ASM, is essentially the human-readable form of a processor’s machine code. Each instruction, called an opcode or op, typically represents a single action for the CPU to perform. Unlike higher-level languages like C or Python, each line of code you write will typically contain one thing for the CPU to do.

There are many advantages to learning ASM, especially as a SNES hacker. Unlike GBAFE, support for SNES hacking is rather sparse. By knowing ASM you can write your own modifications, accomplishing things more complicated than simple graphical, text, or event edits.

You’ll also be able to do your own research on your game, because let’s face it: nobody’s going to do the research for you. The information and tools to work with most SNES games just don’t exist. By being able to understand and modify ASM, you’ll have the potential to make things happen.

Table of Contents

  • Requirements
  • Reference
  • BSNES-Plus Overview
  • Tips and Tricks

Requirements

  • An operating system that doesn’t suck
  • A text editor (think programmer, not writer) (I suggest Notepad++ or Sublime Text)
  • A hex editor, like HxD
  • A ROM image of a game (I can’t help you here)
  • A debugger (I’m focusing on using BSNES-Plus for this guide)
  • A 65816 assembler (I use 64tass and all ASM in this guide will be written how 64tass expects it.)
  • fullsnes (Read the doc)
  • 65816 opcodes listing
  • A calculator with a programmer mode (Windows and GNOME both come with good ones in my opinion)
  • The ability to think, read, and solve problems

Things you don’t need

  • Past experience programming (although it might help you a lot)

Reference

Basics
  • Bit - The smallest unit of data. A bit can either be set (represented by 1) or unset (represented by 0). When talking about numbers represented using bits (called binary numbers), we append the suffix b to the number so we know we’re talking about a binary number. For example: 010b.
  • Byte - Typically the smallest group of data we express files in. A byte is made of 8 bits.
  • Word - A group of two bytes. The SNES typically accesses data in units of a word.
  • Long - A group of three bytes. The SNES can’t read a whole long at a time, but longs are necessary to refer to the range of memory the SNES can access.
  • Nybble - Half a byte (4 bits). You get two nybbles in a byte. It’s a joke. Anyway, sometimes two pieces of information are packed into a byte and nybbles are how we do that. Sometimes it’s important to refer to one half of a byte.
  • Bitfield - A collection of data where each bit is used as a flag, rather than representing a number or some other piece of data. We typically count the first bit as the zero’th and may refer to bits within the bitfield by their number (the last, highest bit in a byte could be called the 7th bit) or what their hexadecimal value is (the last, highest bit in a byte could be called bit $80)
  • Binary - Expressing a number as a string of bits, suffixed with b. The decimal number 15 is 1111b.
  • Decimal - Expressing a number in base 10. Counting up, 9 -> 10, 19 ->20, etc. Most people know this. Most of the time, we don’t use a suffix for decimal numbers. If we’re also using other ways to express numbers it might be useful to use the suffix d after a decimal number.
  • Hexadecimal - Expressing a number in base 16. Instead of just having 0-9 to use for a digit, we also use A-F. Think of how you count hours in the day, going 10 -> 11 -> 12 before wrapping back (or you continue up to 23 -> 0 when thinking with a 24 hour system) but instead of 12 or 24 we go 0 … 9 -> A … F -> 10. Hexadecimal numbers are prefixed with 0x or $ or are suffixed with h to differentiate them from other numbers. For the purposes of this guide I’ll be using $ because it’s what 64tass expects.
  • Little Endian - When storing a number as data, a system has to store the bytes a specific way. The SNES uses little endian, where the lowest (or least-significant) byte is stored first, continuing until the highest (or most-significant) byte is stored last. Say, the number $C0FFEE would be stored as $EE $FF $C0.
  • Pointer - A pointer is a number that represents the address of something. On the SNES, pointers are either words or longs, but we’ll get to that later.
  • Offset - The address of something. For the sake of clarity you can have a pointer to the offset of something.
  • ROM - Read-only memory. The CPU can’t change this. Technically, the thing you edit as a hacker is a ROM image–a sort of snapshot of the contents of a ROM chip. You’re free to mess with this using a hex editor, assembler, memory editor of a debugger, etc. but your emulator or debugger cannot (It should see it as unwritable.)
  • RAM - Random-access memory. Your CPU reads and writes to this, storing variables, graphics, and other things necessary to run the game.
  • Signed and Unsigned numbers - Some numbers are meant to be read as values that can be positive or negative, while for others the sign doesn’t matter. A signed number is the first case, and the typical implementation for a signed number is that any number with its uppermost bit set is negative. For a byte, the highest positive value is 127d, $7F or 01111111b, counting down to 0 before becoming negative. A negative number has its upper bit set, starting with -1d being $FF or 11111111b, counting down to the lowest negative number -128d, $80 or 10000000b. An unsigned byte could be anything $00 to $FF, which is 0d to 255d. Calling something signed or unsigned only makes sense if the data is treated as a number. It wouldn’t make sense to call a byte that’s part of some graphics signed just because its upper bit is set.
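
If you'd like to check a couple of these definitions on a PC, here is a minimal Python sketch of little endian storage and signed interpretation. The helper names (to_little_endian, as_signed) are made up for illustration and aren't part of any SNES tool.

```python
def to_little_endian(value: int, size: int) -> bytes:
    """Return `value` as `size` bytes, least-significant byte first."""
    return value.to_bytes(size, "little")


def as_signed(value: int, bits: int) -> int:
    """Read an unsigned `bits`-wide value as a signed (two's complement) number."""
    sign_bit = 1 << (bits - 1)
    # If the uppermost bit is set, the value wraps around into the negatives.
    return value - (1 << bits) if value & sign_bit else value


print(to_little_endian(0xC0FFEE, 3).hex())   # eeffc0
print(as_signed(0xFF, 8), as_signed(0x80, 8), as_signed(0x7F, 8))  # -1 -128 127
```

The same byte can be printed either way; whether $FF means 255 or -1 depends only on how the code using it treats it.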
The SNES and You
A bit about the SNES

The SNES is an old 16-bit console (Older than me, actually) with a plethora of fantastic games that many people have grown up with (I’m sure quite a few people who didn’t grow up with it have still experienced child-like wonder playing its games.)

I’m not going to give you a history lesson, though. Let’s talk technical info relevant to us:

  • Ricoh 5A22 Processor based on the WDC 65816
  • 24-bit addressing space
  • Inability to access things other than the cartridge and WRAM easily
  • Special memory transferring for accessing things like VRAM
  • Tiny register set
  • Lots of memory mapped I/O
  • Complex memory mapping
  • Sound is handled by another chip that I won’t be talking about
  • Cartridges can also have their own CPUs, called coprocessors. I won’t talk about them either.
Memory Map or How to Get Lost

The SNES can access memory within a 24-bit range, which is to say $000000-$FFFFFF.

Memory is split up into $10000-byte sections called banks. The contents of these banks are determined by which layout a game uses. There are two main layouts: LoROM and HiROM. FE5 is a LoROM game, so I’ll be talking about that. If you want to understand the HiROM memory map or a different map mode, check out fullsnes.

LoROM games have their banks split into $8000-byte segments: a system area and a ROM area.

Bank  Offset    Contents
00-3F 0000-7FFF System Area
00-3F 8000-FFFF Slow LoROM
70-7D 0000-7FFF SRAM
7E-7F 0000-FFFF WRAM
80-BF 0000-7FFF System Area (Mirror)
80-BF 8000-FFFF Fast LoROM (Mirror)

Here are some quick explanations of each:

System Area - This section is for memory mapped I/O, i.e. communication with other chips in the SNES or cartridge.
LoROM - A chunk of the actual ROM data.
SRAM - Save RAM on cartridge. Used for game saves.
WRAM - General work RAM.
Mirror - Areas marked as mirrors have copies (mirror images) of their normal versions. Reading/writing to a mirror has the change reflected across all mirrors.

Memory mapping is what makes working with the SNES much harder than working with the GBA. Say you have a routine as the first thing in the ROM, at $000000 in a hex editor. In a LoROM game, your pointer to this routine would be $008000, because the ROM section of the bank starts at $8000. If you had a routine at $008000 in your hex editor, the pointer to it would be $018000.

You might have noticed that these examples are in the areas marked as Slow. Some cartridges have faster ROM chips, and have a special bit set in their cartridge header to specify this. These cartridges can read from the fast mirrors faster than the normal addresses. We call these FastROM carts. Typically all pointers in these games will be to the faster memory. That is, a routine at $000000 in a hex editor would have a pointer of $808000 in a LoROM FastROM game. A routine at $008000 would be $818000 for LoROM FastROM.
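
The conversions described above can be sketched in Python. This is only an illustration under a few assumptions: the ROM image has no copier header, the offset fits in the banks shown in the table, and the function names are hypothetical.

```python
def lorom_address(file_offset: int, fast: bool = False) -> int:
    """Convert a hex-editor offset into a LoROM pointer."""
    bank = file_offset // 0x8000                # each bank holds $8000 bytes of ROM
    offset = 0x8000 + (file_offset % 0x8000)    # ROM sits in the upper half of a bank
    if fast:
        bank |= 0x80                            # FastROM pointers use the 80+ mirrors
    return (bank << 16) | offset


def lorom_file_offset(address: int) -> int:
    """Convert a LoROM pointer (slow or fast mirror) back to a hex-editor offset."""
    bank = (address >> 16) & 0x7F               # drop the FastROM mirror bit
    return bank * 0x8000 + (address & 0x7FFF)


print(hex(lorom_address(0x000000)))             # 0x8000, i.e. $008000
print(hex(lorom_address(0x008000)))             # 0x18000, i.e. $018000
print(hex(lorom_address(0x008000, fast=True)))  # 0x818000
```

Going the other way is handy when a debugger shows you a pointer and you want to find the same bytes in a hex editor.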

The System Area has the format:

Offset    Content
0000-1FFF Mirror of 7E0000-7E1FFF
2100-21FF I/O
4000-41FF I/O
4200-5FFF I/O
6000-7FFF Expansion

See fullsnes for an explanation of I/O ports.

Unlike the GBA, the SNES cannot access VRAM, OAM, Sound RAM, or Sound ROM directly, and must use I/O ports.

Registers

What’s a register?

Think of registers as boxes of memory that the CPU works with directly. They aren’t a part of RAM or ROM, but are an internal feature of the chip. The CPU can perform operations such as addition or subtraction on the contents of these registers. It can store the values in them to RAM, or it can load values into them from RAM or ROM.

SNES Registers

The SNES has one general-purpose register and two “index” registers. These three registers are (abbreviations in parentheses):

  • Accumulator (A) - This register interacts the most with memory and operations.
  • Index Register X (X) - Sometimes used for loop counters or indexed addressing.
  • Index Register Y (Y) - Same as X.

These registers can hold either 8 or 16 bits at a time, depending on the state of the processor. More on that in a moment.

The Accumulator is sometimes thought of in terms of its individual bytes. The lower byte of the accumulator is called A, while the upper byte of the accumulator is called B. Together, these two form the 16-bit C accumulator.

As mentioned before, the SNES can access mapped memory from $000000 to $FFFFFF. These addresses contain two parts: the bank and the offset. Understanding these parts will help us in a moment.

The offset of the instruction that the CPU is about to run is stored in the Program Counter. The Program Counter is automatically incremented after running an opcode, and is changed whenever the CPU branches or jumps. The Program Counter is actually stored in two registers: PC and K.

  • Program Counter (PC) - Holds the 16 bit offset
  • Program Counter Bank (K) - Holds the 8 bit bank

There’s something important to note here: Although it’s alright to think of the Program Counter as one thing, these registers are independent. When the PC register is $FFFF and is incremented, it will wrap back to $0000, but K will not be incremented.

When information needs to be preserved across function calls or operations, it is saved on the stack. The stack is a place in RAM where preserved data, like return addresses, is stored. Adding something to the stack is called pushing, and removing something from the stack is called popping or pulling. Operations that affect the stack increment or decrement the Stack Pointer. It points to the next free byte on the stack.

  • Stack Pointer (SP) - 8/16 bit offset within bank 00.

Usually, you’d set the SP to be $1FFF on startup (rather than $2000, see the last sentence of the paragraph above), and the value would decrease as you pushed data onto it. FE5 leaves almost $200 bytes free for the stack, but it’s unlikely to ever use half of it. It should be noted that pulling data from the stack does not clear that memory, it only fetches the data and adjusts SP.
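
To make the push/pull behaviour concrete, here is a small Python model of the stack. It's a sketch, not an emulator: the Stack class is hypothetical, and it only assumes what's described above (SP points at the next free byte, pushing moves SP down, pulling moves it up and leaves memory alone).

```python
class Stack:
    def __init__(self, sp: int = 0x1FFF):
        self.ram = bytearray(0x2000)   # low RAM as seen through bank 00
        self.sp = sp

    def push_byte(self, value: int) -> None:
        self.ram[self.sp] = value & 0xFF      # write to the free byte...
        self.sp = (self.sp - 1) & 0xFFFF      # ...then move SP down

    def pull_byte(self) -> int:
        self.sp = (self.sp + 1) & 0xFFFF      # move SP back up...
        return self.ram[self.sp]              # ...and read; nothing is erased

    def push_word(self, value: int) -> None:
        self.push_byte(value >> 8)            # high byte first,
        self.push_byte(value)                 # so the word sits little endian in RAM

    def pull_word(self) -> int:
        low = self.pull_byte()
        return low | (self.pull_byte() << 8)


stack = Stack()
stack.push_word(0x1234)
print(hex(stack.sp))            # 0x1ffd
print(hex(stack.pull_word()))   # 0x1234
print(hex(stack.ram[0x1FFE]))   # 0x34, still there after pulling
```

That leftover data is why you can sometimes find old return addresses below SP when poking around in a debugger.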

Up next is a register I don’t really have much of a use for: the Zeropage Offset register (Also called the direct page register). It’s used for opcodes that use Direct Page addressing, which we’ll talk about later.

  • Zeropage Register (D) - 16 bit offset that gets added to direct page instructions.

The Data Bank register acts as the bank for opcodes that only specify a 16 bit offset.

  • Data Bank (DB) - 8 bit bank for data operations.

When you work with data, many opcodes have different ways to access that data. Say:

lda $NN     ; load from 00:00NN (plus D)
lda $NNNN   ; load from DB:NNNN
lda $NNNNNN ; load from NN:NNNN

where : separates the bank from the offset.

And, finally, you have the Processor Status register.

  • Processor Status (P) - 8 (9 actually) bit bitfield.

It’s made up of 9 bits:

Bit Name Explanation
0   c    Carry
1   z    Zero
2   i    IRQ Disable
3   d    Decimal Mode
4   x/b  Index register size/break flag
5   m/u  Accumulator size/unused
6   v    Overflow
7   n    Negative
-   e    Emulation flag

When e is set, registers A, X, Y, and SP are all 8 bit. The CPU acts almost like it’s a 6502. The SNES boots up in this mode.

When x is set, registers X and Y are 8 bit.

When m is set, register A is 8 bit.

b and u exist when e is set, otherwise the bits are used for x and m.

See either of the documents listed in Requirements for more info.
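
Since P is just a bitfield, it can be picked apart with a few lines of Python. This is a rough sketch for native mode (e clear); the function names are hypothetical, and bit 6 is the overflow flag v that shows up in the nvmxdizc flag headers of the opcode listing.

```python
FLAG_NAMES = "czidxmvn"   # bit 0 through bit 7 in native mode


def describe_status(p: int) -> str:
    """Upper-case letter = flag set, lower-case = flag clear, highest bit first."""
    return "".join(
        name.upper() if p & (1 << bit) else name
        for bit, name in reversed(list(enumerate(FLAG_NAMES)))
    )


def register_sizes(p: int) -> tuple:
    """Return (accumulator bits, index register bits) for a native-mode P value."""
    accumulator_bits = 8 if p & 0x20 else 16   # m flag, bit $20
    index_bits = 8 if p & 0x10 else 16         # x flag, bit $10
    return accumulator_bits, index_bits


print(describe_status(0x30))   # nvMXdizc
print(register_sizes(0x20))    # (8, 16)
```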

Register Recap

Registers:

  • A - Accumulator
  • X - Index Register X
  • Y - Index Register Y
  • D - Zeropage Register
  • P - Processor status
  • K - Program Counter Bank
  • DB - Data Bank Register
  • PC - Program Counter
  • SP - Stack Pointer

Accumulator stuff:

  • A - A accumulator (lower byte)
  • B - B accumulator (upper byte)
  • C - C accumulator (both A and B)

Processor status flags:

  • c - Carry flag
  • z - Zero flag
  • i - IRQ disable flag
  • d - decimal mode flag
  • x - Index register size flag
  • m - Accumulator register size flag
  • v - Overflow flag
  • n - Negative flag
Addressing Modes

Before we talk about opcodes, writing ASM, and all that fun stuff, we need to talk about addressing modes. Opcodes can be written in different ways to access memory in different ways. We call these different ways of writing them addressing modes. For example:

lda #$0000 ; load the value $0000 into A
lda $000000  ; load the word at $000000 into A

The actual official names for some of these modes are seriously confusing and sometimes contradictory, so I’m only going to describe things the way a typical assembler expects them to be. You can go look up the 65816 spec if you want official names.

I’m going to try to be brief with each of these, so check out the 65816 opcodes sheet for more info.

Each of these will use N, S, or D to represent some digit in the opcode.

$NNNN

For jump opcodes jmp and jsr, $NNNN is the offset of where to jump to, and K is the bank.

If the program bank register is $80 and the CPU runs jmp $8000, the CPU will jump to $808000.

For all other opcodes, the data being accessed is DB:NNNN where the Databank register has the bank of the data and NNNN is the offset within that bank.

If the Databank register is $7E and the CPU runs lda $0000, the Accumulator will be loaded with the data at $7E0000.

$NNNN,x and $NNNN,y

These opcodes access data at DB:NNNN plus the contents of X or Y.

If the Databank register is $A0 and X contains $0010, lda $B0B0,x will load the data at $A0B0C0 into the Accumulator.

Some special jump stuff because details tabs hate brackets

The real name of this segment is “($NNNN) and [$NNNN]”

($NNNN) is only used for jmp ($NNNN). It jumps to the offset given by the word at 00:NNNN (the pointer is always read from bank $00), within the bank given by K, so:

If K is $90 and $NNNN is $0000, and the word at $000000 is $8000, jmp ($0000) will jump to $908000.

[$NNNN] is only used for jmp [$NNNN]. It jumps to the address given by the long at 00:NNNN, so:

If K is $C0 and $NNNN is $1234, and the long at $001234 is $D05678, jmp [$1234] will jump to $D05678.

($NNNN,x)

Only used for jmp and jsr opcodes.

They jump to the offset given by the word at K:NNNN plus the contents of X, within the bank given by K, so:

If K is $00 and $NNNN is $0000, the contents of X are $0002, and the word at $000002 is $8000, jmp ($0000,x) jumps to $008000.

a or x or y

These opcodes affect the contents of the Accumulator, X, or Y, rather than memory.

$NN

These opcodes access data at 00:00NN plus the contents of D.

For example, if D contains $0000 (which, for FE5 it nearly always should) then lda $05 will load the data at $000005 into the Accumulator.

$NN,x and $NN,y

These opcodes access data at 00:00NN plus the contents of D, plus the contents of X or Y.

For example, if D contains $0000 and X contains $1234 then lda $00,x will load the data at $001234 into the Accumulator.

($NN)

These opcodes access data at DB:word, where the word is read from 00:00NN plus the contents of D (the pointer itself always lives in bank $00), so:

If DB is $94, $NN is $11, D contains $0000, and the word at $000011 is $1E00, lda ($11) will load data at $941E00 into the Accumulator.

Another case of details not liking brackets

The actual title of this segment is “[$NN]”.

These opcodes access data at the address given by the long at 00:00NN plus the contents of D. DB is not used, so:

If $NN is $6D, D contains $0000, and the long at $00006D is $828000, lda [$6D] will load data at $828000 into the Accumulator.

($NN,x)

These opcodes access data at DB:word, where the word is read from 00:00NN plus the contents of D, plus the contents of X. X moves where the pointer is read from, not where it points, so:

If DB is $11, $NN is $20, D contains $0000, X contains $0001, and the word at $000021 is $3333, lda ($20,x) will load data at $113333 into the Accumulator.

($NN),y

These opcodes access the data at DB:word plus the contents of Y, where the word is read from 00:00NN plus the contents of D, so:

If DB is $50, $NN is $00, D contains $0000, Y contains $0100, and the word at $000000 is $0200, lda ($00),y will load data at $500300.

More brackets

The actual title of this segment is “[$NN],y”.

These opcodes access the data at long plus the contents of Y, where the long is read from 00:00NN plus the contents of D. DB is not used, so:

If $NN is $00, D contains $0000, Y contains $0100, and the long at $000000 is $7E0200, lda [$00],y will load data at $7E0300.
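
The indirect modes are easier to follow when the address math is spelled out. Below is a Python sketch of how the final address is formed for ($NN),y and [$NN],y, assuming the direct page pointer is read from bank $00 at D + $NN. Memory is faked with a dict and the function names are made up for illustration; page and bank wrapping of the pointer bytes is ignored.

```python
def read_word(mem: dict, address: int) -> int:
    return mem[address] | (mem[address + 1] << 8)


def read_long(mem: dict, address: int) -> int:
    return read_word(mem, address) | (mem[address + 2] << 16)


def dp_indirect_y(mem: dict, nn: int, d: int, db: int, y: int) -> int:
    """($NN),y: 16-bit pointer at 00:D+NN, bank taken from DB, then Y is added."""
    pointer = read_word(mem, (d + nn) & 0xFFFF)
    return (((db << 16) | pointer) + y) & 0xFFFFFF


def dp_indirect_long_y(mem: dict, nn: int, d: int, y: int) -> int:
    """[$NN],y: 24-bit pointer at 00:D+NN, then Y is added. DB plays no part."""
    pointer = read_long(mem, (d + nn) & 0xFFFF)
    return (pointer + y) & 0xFFFFFF


# The bytes $00 $02 $7E stored at $000010 (little endian).
mem = {0x000010: 0x00, 0x000011: 0x02, 0x000012: 0x7E}
print(hex(dp_indirect_y(mem, 0x10, 0x0000, 0x50, 0x0100)))   # 0x500300
print(hex(dp_indirect_long_y(mem, 0x10, 0x0000, 0x0100)))    # 0x7e0300
```

The same three bytes give two different targets: the word form borrows its bank from DB, while the long form carries its own bank.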

#$NN and #$NNNN

Instead of accessing data, these opcodes use the number specified. For example, lda #$0000 loads $0000 into the Accumulator.

Warning about immediate size

Note that whether an 8- or 16-bit value is used is dependent on the size of the register using the data. Not being aware of this can lead to some mishaps. For example, if the size of the Accumulator is 8-bit and your ASM isn’t written to reflect this, lda #$0001 will load $01 into the lower byte of the Accumulator and then hang because the $00 after is treated like ASM (and $00 happens to be the brk opcode).

I’ll cover how to avoid this in the tips and tricks section, and there’s a good chance that 64tass (or another assembler) will automatically avoid this.

Weird title, I know. These opcodes have their parameters implied. They don’t need anything to specify what they do. For example, clc clears the carry flag and nothing else. It only has one mode.

$NNNNNN

These opcodes access data at $NNNNNN. For example, lda $808080 loads the data at $808080 into the Accumulator.

$NNNNNN,x

These opcodes access data at $NNNNNN plus the contents of X.

For example, if X contains $1337 and $NNNNNN is $690000, lda $690000,x loads the data at $691337 into the Accumulator.

Relative stuff

These opcodes actually target addresses relative to them. Rather than explaining the formula (which the 65816 opcodes sheet does), it’s better to explain how they’re used.

In ASM you write, you will typically use a label instead of an address for these. When viewing a disassembly (say, in your debugger or dumped by a disassembler) the opcode will be formatted with a calculated value.

Here are some examples:


Label
    bra Label

    bpl $8000
    bra $800A

Both forms are much more readable than the displacement value.

#$SS,#$DD

There are two opcodes that use this format: mvp and mvn.

They both take quite a few things:

$SS - bank of source data
X - offset of source data
$DD - bank of destination
Y - offset of destination
A - number of bytes to copy, minus 1

For example, if:
$SS is $70,
X is $0015,
$DD is $7E,
Y is $4E18,
A is $004F

Then mvn #$70, #$7E will copy $4F+1 bytes from $700015 to $7E4E18
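
Here is a rough Python model of what mvn does with those inputs. It's a sketch under the assumptions listed above (X and Y count upward, A holds the byte count minus one); the function name and the dict standing in for memory are only for illustration.

```python
def mvn(mem: dict, src_bank: int, dst_bank: int, x: int, y: int, a: int) -> tuple:
    """Copy A+1 bytes from src_bank:X to dst_bank:Y, lowest address first."""
    count = (a & 0xFFFF) + 1
    for _ in range(count):
        mem[(dst_bank << 16) | y] = mem.get((src_bank << 16) | x, 0)
        x = (x + 1) & 0xFFFF   # offsets wrap inside their own bank
        y = (y + 1) & 0xFFFF
    # After the move A has counted down past zero to $FFFF,
    # and the CPU leaves DB set to the destination bank.
    return x, y, 0xFFFF


mem = {0x700015 + i: i for i in range(0x50)}
x, y, a = mvn(mem, 0x70, 0x7E, 0x0015, 0x4E18, 0x004F)
print(hex(x), hex(y), hex(a))   # 0x65 0x4e68 0xffff
```

mvp works the same way except that it starts at the end of each block and counts X and Y downward, which matters when the source and destination overlap.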

#$NN,s

These opcodes access data from SP+$NN. For example:

If SP is $1FE0 and $NN is $10, lda #$10,s will load data from $001FF0 into the Accumulator.

(#$NN,s),y

These opcodes access the data at DB:word plus the contents of Y, where the word is read from 00:SP plus $NN, so:

If DB is $50, $NN is $05, SP is $1F50, Y contains $0100, and the word at $001F55 is $0200, lda (#$05,s),y will load data at $500300.


Opcode Listing

I’ll be listing opcodes in the same order and using the same flags notation as the 65816 opcodes sheet I linked above. There’s going to be a lot of copy/pasting.

A refresher on flags
Bit Name Explanation
0   c    Carry
1   z    Zero
2   i    IRQ Disable
3   d    Decimal Mode
4   x/b  Index register size/break flag
5   m/u  Accumulator size/unused
6   v    Overflow
7   n    Negative
-   e    Emulation flag

Flags format:

. - Flag is unaffected
0 - Flag is cleared
1 - Flag is set
* - Flag is affected by result
m - Flag is affected by 16-(8*m) bit result
x - Flag is affected by 16-(8*x) bit result

For m or x, the size of the result depends on that flag. The n flag, for example, mirrors the uppermost bit of the result, which is bit 7 when the flag is set (the register is 8-bit) and bit 15 when it is clear (the register is 16-bit).

ADC

Add with Carry

nvmxdizc e
mm....mm . adc ($10,x)
mm....mm . adc #$32,s
mm....mm . adc $10
mm....mm . adc [$10]
mm....mm . adc #$54
mm....mm . adc $9876
mm....mm . adc $FEDBCA
mm....mm . adc ($10),y
mm....mm . adc ($10)
mm....mm . adc (#$32,s),y
mm....mm . adc $10,x
mm....mm . adc [$10],y
mm....mm . adc $9876,y
mm....mm . adc $9876,x
mm....mm . adc $FEDCBA,x

ADC adds to the accumulator. It follows the formula A = A + Data + c. When m is 0 (The accumulator is 16-bit) ADC is a 16-bit operation, and is an 8-bit operation when m is 1. For example:

When the accumulator contains $10F0, the value at $00 is $10, c is 0, and m is 0:

adc $00 results in A containing $1100.

If the m flag was 1 (meaning that A is 8-bit):

adc $00 results in A containing $1000.

In the latter example, the lower 8 bits of A, $F0, plus $10 = $100, which is larger than a byte. The ninth bit is discarded and the carry flag is set. The upper byte of the accumulator is not affected.

ADC, along with SBC, can use the d flag to perform BCD arithmetic.

Essentially, you can treat the accumulator’s value as a decimal number, rather than a hexadecimal one. For example:

If d = 0, m = 0, c = 0, A contains $0099:

adc #$0001 results in A = $009A.

but if d = 1:

adc #$0001 results in A = $0100.

Here’s a rundown of how ADC sets flags:

When d = 0:

  • n = uppermost bit of result (bit 15 when m = 0, bit 7 when m = 1).
  • v is set if there is a signed arithmetic overflow, else it is unset.
  • If m = 0, overflow occurs if the result is outside of -32768d to 32767d ($8000 <- $FFFF to $0000 -> $7FFF).
  • If m = 1, overflow occurs if the result is outside of -128 to 127 ($80 <- $FF to $00 -> $7F).
  • z is set if the result is 0, else it is unset.
  • c is set when there is an unsigned carry, else it is unset.
  • If m = 0, carry occurs when the result is greater than $FFFF.
  • If m = 1, carry occurs when the result is greater than $FF.

When d = 1:

  • Carry instead occurs if the result is greater than 9999 (when m = 0) or 99 (when m = 1).
  • Overflow should be ignored because BCD arithmetic is considered to be unsigned.
  • Other flags remain the same.
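
To tie the formula and the flag rules together, here is a Python sketch of ADC. It is a model for experimenting with numbers, not a cycle-accurate emulator: adc covers binary mode (d = 0), and adc_decimal is a simplified 16-bit decimal version that assumes both inputs are valid BCD. Both function names are hypothetical.

```python
def adc(a: int, data: int, carry: int, m: int) -> tuple:
    """Binary-mode ADC. Returns (new 16-bit accumulator, flags)."""
    bits = 8 if m else 16
    mask = (1 << bits) - 1
    sign = 1 << (bits - 1)
    lhs, rhs = a & mask, data & mask
    total = lhs + rhs + carry              # A + Data + c
    result = total & mask
    flags = {
        "n": int(bool(result & sign)),
        # Signed overflow: both inputs share a sign that the result lacks.
        "v": int(bool((lhs ^ result) & (rhs ^ result) & sign)),
        "z": int(result == 0),
        "c": int(total > mask),
    }
    # In 8-bit mode the upper byte of the accumulator is left untouched.
    new_a = (a & 0xFFFF & ~mask) | result
    return new_a, flags


def adc_decimal(a: int, data: int, carry: int) -> tuple:
    """16-bit decimal-mode ADC for valid BCD inputs. Returns (new A, carry)."""
    total = int(f"{a:04X}") + int(f"{data:04X}") + carry   # read hex digits as decimal
    return int(str(total % 10000), 16), int(total > 9999)


print(hex(adc(0x10F0, 0x0010, 0, 0)[0]))     # 0x1100
print(hex(adc(0x10F0, 0x10, 0, 1)[0]))       # 0x1000
print(hex(adc_decimal(0x0099, 0x0001, 0)[0]))  # 0x100
```

Note how the 8-bit case reports z = 1 and c = 1: the flags describe the 8-bit result $00, not the whole 16-bit register.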
SBC

Subtract with Carry

nvmxdizc e
mm....mm . sbc ($10,x)
mm....mm . sbc #$32,s
mm....mm . sbc $10
mm....mm . sbc [$10]
mm....mm . sbc #$54
mm....mm . sbc $9876
mm....mm . sbc $FEDBCA
mm....mm . sbc ($10),y
mm....mm . sbc ($10)
mm....mm . sbc (#$32,s),y
mm....mm . sbc $10,x
mm....mm . sbc [$10],y
mm....mm . sbc $9876,y
mm....mm . sbc $9876,x
mm....mm . sbc $FEDCBA,x

SBC subtracts from the accumulator. It follows the formula A = A - Data - 1 + c, so c needs to be 1 for a plain subtraction. When m is 0 (The accumulator is 16-bit) SBC is a 16-bit operation, and is an 8-bit operation when m is 1. For example:

When the accumulator contains $1000, the value at $00 is $10, c is 1, and m is 0:

sbc $00 results in A containing $0FF0.

If the m flag was 1 (meaning that A is 8-bit):

sbc $00 results in A containing $10F0.

In the latter example, the lower 8 bits of A, $00, minus $10 = -$10 or $F0. Instead of ‘borrowing’ from the upper byte, the upper byte is unaffected and the carry flag is cleared to signal the borrow.

SBC, along with ADC, can use the d flag to perform BCD arithmetic.

Essentially, you can treat the accumulator’s value as a decimal number, rather than a hexadecimal one. For example:

If d = 0, m = 0, c = 1, A contains $0100:

sbc #$0001 results in A = $00FF.

but if d = 1:

sbc #$0001 results in A = $0099.

Here’s a rundown of how SBC sets flags:

When d = 0:

  • n = uppermost bit of result (bit 15 when m = 0, bit 7 when m = 1).
  • v is set if there is a signed arithmetic overflow, else it is unset.
  • If m = 0, overflow occurs if the result is outside of -32768d to 32767d ($8000 <- $FFFF to $0000 -> $7FFF).
  • If m = 1, overflow occurs if the result is outside of -128 to 127 ($80 <- $FF to $00 -> $7F).
  • z is set if the result is 0, else it is unset.
  • c is set when the accumulator is greater than or equal to the data, else it is unset.

When d = 1:

  • Carry is instead cleared if the decimal result would fall below 0 (a borrow), else it is set.
  • Overflow should be ignored because BCD arithmetic is considered to be unsigned.
  • Other flags remain the same.
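
One way to see why SBC wants the carry set is to model it as an addition of the inverted data, which is how the 6502 family is usually described as doing it. The Python sketch below covers binary mode only (d = 0); the function name is hypothetical.

```python
def sbc(a: int, data: int, carry: int, m: int) -> tuple:
    """Binary-mode SBC. Returns (new 16-bit accumulator, flags)."""
    bits = 8 if m else 16
    mask = (1 << bits) - 1
    sign = 1 << (bits - 1)
    lhs = a & mask
    rhs = (data & mask) ^ mask             # invert every bit of the data
    total = lhs + rhs + carry              # same sum as ADC: A - Data - 1 + c
    result = total & mask
    flags = {
        "n": int(bool(result & sign)),
        "v": int(bool((lhs ^ result) & (rhs ^ result) & sign)),
        "z": int(result == 0),
        "c": int(total > mask),            # set when no borrow was needed
    }
    new_a = (a & 0xFFFF & ~mask) | result  # upper byte untouched in 8-bit mode
    return new_a, flags


print(hex(sbc(0x1000, 0x0010, 1, 0)[0]))   # 0xff0
print(hex(sbc(0x1000, 0x10, 1, 1)[0]))     # 0x10f0
print(hex(sbc(0x0100, 0x0001, 0, 0)[0]))   # 0xfe, one too low because c was 0
```

The last line shows what a forgotten carry does: with c = 0 the result comes out one lower than expected.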
CMP

Compare to Accumulator

nvmxdizc e
m.....mm . cmp ($10,x)
m.....mm . cmp #$32,s
m.....mm . cmp $10
m.....mm . cmp [$10]
m.....mm . cmp #$54
m.....mm . cmp $9876
m.....mm . cmp $FEDBCA
m.....mm . cmp ($10),y
m.....mm . cmp ($10)
m.....mm . cmp (#$32,s),y
m.....mm . cmp $10,x
m.....mm . cmp [$10],y
m.....mm . cmp $9876,y
m.....mm . cmp $9876,x
m.....mm . cmp $FEDCBA,x

CMP compares the accumulator to the data. It follows the formula A - Data and discards the result. A is not affected. When m is 0 (The accumulator is 16-bit) CMP is a 16-bit operation, and is an 8-bit operation when m is 1. For example:

If m = 0 and A contains $0001,

cmp #$0002 doesn’t change A but sets the n flag because the (voided) result was $FFFF.

Here’s a rundown of how CMP sets flags:

  • n = the uppermost bit of the result (bit 15 when m = 0, bit 7 when m = 1).
  • z is set if the result is 0, else it is unset.
  • c is set when the accumulator is greater than or equal to the data, else it is unset.
CPX

Compare to X register

nvmxdizc e
x.....xx . cpx #$54
x.....xx . cpx $10
x.....xx . cpx $9876

CPX compares X to the data. It follows the formula X - Data and discards the result. X is not affected. When x is 0 (The X register is 16-bit) CPX is a 16-bit operation, and is an 8-bit operation when x is 1. For example:

If x = 0 and X contains $0005,

cpx #$0000 doesn’t change X but sets the c flag because X was greater than the data.

Here’s a rundown of how CPX sets flags:

  • n = the uppermost bit of the result (bit 15 when x = 0, bit 7 when x = 1).
  • z is set if the result is 0, else it is unset.
  • c is set when the X register is greater than or equal to the data, else it is unset.
CPY

Compare to Y register

nvmxdizc e
x.....xx . cpy #$54
x.....xx . cpy $10
x.....xx . cpy $9876

CPY compares Y to the data. It follows the formula Y - Data and discards the result. Y is not affected. When x is 0 (The Y register is 16-bit) CPY is a 16-bit operation, and is an 8-bit operation when x is 1. For example:

If x = 0 and Y contains $0000,

cpy #$0000 doesn’t change Y but sets the c flag because Y was equal to the data. It also sets the z flag because the (voided) result was $0000.

Here’s a rundown of how CPY sets flags:

  • n = the uppermost bit of the result (bit 15 when x = 0, bit 7 when x = 1).
  • z is set if the result is 0, else it is unset.
  • c is set when the Y register is greater than or equal to the data, else it is unset.
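
CMP, CPX, and CPY all set flags the same way, so one small Python sketch covers the three of them. The function name is made up; pass eight_bit according to the m flag for CMP or the x flag for CPX and CPY.

```python
def compare_flags(register: int, data: int, eight_bit: bool) -> dict:
    """Flags left behind by cmp/cpx/cpy. The register itself is not changed."""
    bits = 8 if eight_bit else 16
    mask = (1 << bits) - 1
    lhs, rhs = register & mask, data & mask
    result = (lhs - rhs) & mask            # computed, then thrown away
    return {
        "n": result >> (bits - 1),         # uppermost bit of the result
        "z": int(lhs == rhs),
        "c": int(lhs >= rhs),              # an unsigned comparison
    }


print(compare_flags(0x0001, 0x0002, False))   # {'n': 1, 'z': 0, 'c': 0}
print(compare_flags(0x0005, 0x0000, False))   # {'n': 0, 'z': 0, 'c': 1}
print(compare_flags(0x0000, 0x0000, False))   # {'n': 0, 'z': 1, 'c': 1}
```

In practice, z answers "are they equal?" and c answers "is the register at least as big as the data?" when both are treated as unsigned.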
DEC

DECrement

nvmxdizc e
m.....m. . dec a
m.....m. . dec $10
m.....m. . dec $9876
m.....m. . dec $10,x
m.....m. . dec $9876,x
x.....x. . dec x
x.....x. . dec y

DEC subtracts one from the register or data. dec x and dec y are 16-bit operations when x is 0, and are 8-bit operations when x is 1. All other dec variants are 16-bit operations when m is 0, and are 8-bit operations when m is 1. For example:

If m = 0 and A contains $0100,

dec a results in A containing $00FF. (If m were 1, only the lower byte would change and A would contain $01FF.)

Here’s a rundown of how DEC sets flags:

  • n = the uppermost bit of the result.
  • For dec x and dec y (bit 15 when x = 0, bit 7 when x = 1).
  • Otherwise (bit 15 when m = 0, bit 7 when m = 1).
  • z is set when the result is 0, else it is unset.
INC

INCrement

nvmxdizc e
m.....m. . inc a
m.....m. . inc $10
m.....m. . inc $9876
m.....m. . inc $10,x
m.....m. . inc $9876,x
x.....x. . inc x
x.....x. . inc y

INC adds one to the register or data. inc x and inc y are 16-bit operations when x is 0, and are 8-bit operations when x is 1. All other inc variants are 16-bit operations when m is 0, and are 8-bit operations when m is 1. For example:

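If m = 0 and A contains $00FF,

inc a results in A containing $0100. (If m were 1, only the lower byte would change: A would contain $0000 and the z flag would be set.)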

Here’s a rundown of how INC sets flags:

  • n = the uppermost bit of the result.
  • For inc x and inc y (bit 15 when x = 0, bit 7 when x = 1).
  • Otherwise (bit 15 when m = 0, bit 7 when m = 1).
  • z is set when the result is 0, else it is unset.
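
The difference between the 8-bit and 16-bit cases is easy to get wrong, so here is a Python sketch of inc a and dec a. It models the accumulator only (where the upper byte survives an 8-bit operation), and the function name is hypothetical.

```python
def step_accumulator(a: int, delta: int, m: int) -> tuple:
    """Apply inc a (delta=+1) or dec a (delta=-1). Returns (new A, flags)."""
    bits = 8 if m else 16
    mask = (1 << bits) - 1
    result = ((a & mask) + delta) & mask       # wraps around at the register size
    new_a = (a & 0xFFFF & ~mask) | result      # upper byte kept when m = 1
    flags = {"n": result >> (bits - 1), "z": int(result == 0)}
    return new_a, flags


print(hex(step_accumulator(0x00FF, +1, 0)[0]))   # 0x100
print(hex(step_accumulator(0x00FF, +1, 1)[0]))   # 0x0
print(hex(step_accumulator(0x0100, -1, 0)[0]))   # 0xff
print(hex(step_accumulator(0x0100, -1, 1)[0]))   # 0x1ff
```

Neither opcode touches the carry flag, so a wrap from $FF to $00 can only be detected through z.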
AND

bitwise AND

nvmxdizc e
m.....m. . and ($10,x)
m.....m. . and #$32,s
m.....m. . and $10
m.....m. . and [$10]
m.....m. . and #$54
m.....m. . and $9876
m.....m. . and $FEDBCA
m.....m. . and ($10),y
m.....m. . and ($10)
m.....m. . and (#$32,s),y
m.....m. . and $10,x
m.....m. . and [$10],y
m.....m. . and $9876,y
m.....m. . and $9876,x
m.....m. . and $FEDCBA,x

AND bitwise ANDs the accumulator with the data, with the result stored in the accumulator.

That is to say that if bit 0 of the accumulator and bit 0 of the data are both 1, bit 0 of the result is 1. If either of them is 0 then the result bit is 0. Continue this for all bits in the accumulator/data. It’s easier to understand when you write out values in binary:

    00111100b | $3C
AND 00001111b | $0F
--------------+----
    00001100b | $0C

AND is a 16-bit operation when m = 0, and is an 8-bit operation when m = 1.

For example, if m = 1 (the accumulator is 8-bit) and A = $0087,

the result of and #$C0 is found by

    10000111b | $87
AND 11000000b | $C0
--------------+----
    10000000b | $80

and the result is $0080 in A.

Here’s a rundown of how AND sets flags:

  • n = the uppermost bit of the result (bit 15 when m = 0, bit 7 when m = 1).
  • z is set if the result is 0, else it is unset.
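
A common use for AND is masking, where you keep only the bits you care about and clear the rest. Here’s a minimal sketch, assuming m = 0 and an assembler that already knows the accumulator is 16-bit. The address $7E1000 is made up for illustration and isn’t a real Thracia 776 variable.

```asm
; Fragment, not a full program. Assumes m = 0 (16-bit accumulator).
; $7E1000 is a hypothetical address used only for this example.
lda $7E1000     ; A = the 16-bit value stored there
and #$00FF      ; keep the low byte, clear the high byte
                ; z is set here if the low byte was 0
```

The mask has a 1 in every bit position you want to keep and a 0 everywhere else.
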
EOR

bitwise Exclusive OR

nvmxdizc e
m.....m. . eor ($10,x)
m.....m. . eor #$32,s
m.....m. . eor $10
m.....m. . eor [$10]
m.....m. . eor #$54
m.....m. . eor $9876
m.....m. . eor $FEDCBA
m.....m. . eor ($10),y
m.....m. . eor ($10)
m.....m. . eor (#$32,s),y
m.....m. . eor $10,x
m.....m. . eor [$10],y
m.....m. . eor $9876,y
m.....m. . eor $9876,x
m.....m. . eor $FEDCBA,x

EOR bitwise exclusive ORs the accumulator with the data, with the result stored in the accumulator.

That is to say that if bit 0 of the accumulator matches bit 0 of the data, bit 0 of the result is 0. Otherwise bit 0 of the result is 1. Continue this for all bits in the accumulator/data. It’s easier to understand when you write out values in binary:

    11110000b | $F0
EOR 00011110b | $1E
--------------+----
    11101110b | $EE

EOR is a 16-bit operation when m = 0, and is an 8-bit operation when m = 1.

For example, if m = 1 (the accumulator is 8-bit) and A = $00DE,

the result of eor #$33 is found by

    11011110b | $DE
EOR 00110011b | $33
--------------+----
    11101101b | $ED

and the result is $00ED in A.

Here’s a rundown of how EOR sets flags:

  • n = the uppermost bit of the result (bit 15 when m = 0, bit 7 when m = 1).
  • z is set if the result is 0, else it is unset.
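
EOR with every bit set in the operand flips every bit in the accumulator, which is half of negating a number. A minimal sketch, assuming m = 0 (the value $0005 is just an example):

```asm
; Fragment. Assumes m = 0 (16-bit accumulator).
lda #$0005      ; A = $0005
eor #$FFFF      ; flip every bit: A = $FFFA
inc a           ; add one: A = $FFFB, which is -5 in two's complement
```
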
ORA

bitwise inclusive OR (Accumulator)

nvmxdizc e
m.....m. . ora ($10,x)
m.....m. . ora #$32,s
m.....m. . ora $10
m.....m. . ora [$10]
m.....m. . ora #$54
m.....m. . ora $9876
m.....m. . ora $FEDCBA
m.....m. . ora ($10),y
m.....m. . ora ($10)
m.....m. . ora (#$32,s),y
m.....m. . ora $10,x
m.....m. . ora [$10],y
m.....m. . ora $9876,y
m.....m. . ora $9876,x
m.....m. . ora $FEDCBA,x

ORA bitwise inclusive ORs the accumulator with the data, with the result stored in the accumulator.

That is to say that if bit 0 of the accumulator and bit 0 of the data are both 0, bit 0 of the result is 0. Otherwise bit 0 of the result is 1. Continue this for all bits in the accumulator/data. It’s easier to understand when you write out values in binary:

    01010001b | $51
ORA 10110110b | $B6
--------------+----
    11110111b | $F7

ORA is a 16-bit operation when m = 0, and is an 8-bit operation when m = 1.

For example, if m = 0 and A = $0125,

the result of ora #$1232 is found by

    0000000100100101b | $0125
ORA 0001001000110010b | $1232
----------------------+------
    0001001100110111b | $1337

and the result is $1337 in A.

Here’s a rundown of how ORA sets flags:

  • n = the uppermost bit of the result (bit 15 when m = 0, bit 7 when m = 1).
  • z is set if the result is 0, else it is unset.
BIT

test bits

nvmxdizc e
mm....m. . bit $10
mm....m. . bit $9876
mm....m. . bit $10,x
mm....m. . bit $9876,x
......m. . bit #$54

BIT is pretty much just AND but it doesn’t keep the result. BIT is a 16-bit operation when m = 0, and is an 8-bit operation when m = 1.

BIT does some weird things with flags.

For bit #$NNNN, only the z flag is affected. It is set if the result of the AND is 0, otherwise it is unset.

For all other forms of BIT:

  • n = highest bit of the data, not the result.
  • v = second-highest bit of the data, not the result.
  • z is set when the result is 0 and is unset otherwise.
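
Because BIT copies the top two bits of the data straight into n and v, you can test them without loading anything into the accumulator. A minimal sketch, assuming m = 1 and a hypothetical flag byte at direct page $10 (the labels are also made up):

```asm
; Fragment. Assumes m = 1 (8-bit accumulator).
; $10 is a hypothetical direct page variable holding flag bits.
bit $10         ; n = bit 7 of the data, v = bit 6 of the data
bmi Bit7Set     ; branch if bit 7 of the data is 1
bvs Bit6Set     ; branch if bit 6 of the data is 1

lda #$04        ; any other bit needs a mask in A
bit $10         ; z = 1 if bit 2 of the data is 0
beq Bit2Clear   ; A still contains $04 either way

Bit7Set
Bit6Set
Bit2Clear
rts             ; placeholder, every label just returns
```
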
TRB

Test and reset bits

nvmxdizc e
......m. . trb $10
......m. . trb $9876

TRB does two things: it ANDs the data with the accumulator (discarding the result like BIT) and then clears the bits in the data that are set in the accumulator. The accumulator is unchanged. TRB is a 16-bit operation when m = 0, and is an 8-bit operation when m = 1.

For example, if A contains $0043, m = 1, DB is $12, and the data at $12ABCD is $FF:

trb $ABCD results in $12ABCD containing $BC.

TRB follows the formula Data AND NOT A.

The only flag affected by TRB is z, which is set if the result of the AND is zero, otherwise it is unset.

TSB

Test and set bits

nvmxdizc e
......m. . tsb $10
......m. . tsb $9876

TSB does two things: it ANDs the data with the accumulator (discarding the result like BIT) and then sets the bits in the data that are set in the accumulator. The accumulator is unchanged. TSB is a 16-bit operation when m = 0, and is an 8-bit operation when m = 1.

For example, if A contains $0043, m = 1, DB is $12, and the data at $12ABCD is $9C:

tsb $ABCD results in $12ABCD containing $DF.

TSB follows the formula Data OR A.

The only flag affected by TSB is z, which is set if the result of the AND is zero, otherwise it is unset.
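
TSB and TRB are handy for setting and clearing flag bits in memory without a separate load, OR/AND, and store. A minimal sketch, assuming m = 1 and a hypothetical flag byte at direct page $10:

```asm
; Fragment. Assumes m = 1 (8-bit accumulator).
; $10 is a hypothetical direct page variable.
lda #$43        ; the bits we want to change
tsb $10         ; data = data OR $43
                ; z = 1 only if none of those bits were set beforehand
trb $10         ; data = data AND NOT $43
                ; z = 0 here, because tsb just set all of those bits
```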

ASL

Arithmetic shift left

nvmxdizc e
m.....mm . asl $10
m.....mm . asl a
m.....mm . asl $9876
m.....mm . asl $10,x
m.....mm . asl $9876,x

ASL shifts the data or the accumulator left by one bit. The uppermost bit is transferred into the c flag and a 0 fills the lowest bit. ASL is a 16-bit operation when m = 0, and is an 8-bit operation when m = 1. It’s easiest to explain with binary numbers, so:

If m = 1, A contains $45:

Here, c represents the carry flag, the n segment represents A's contents, and 0 is the 0 we’re going to replace the lowest bit with.

c nnnnnnnn 0

? 01000101 0

When we asl a, we shift all of the bits over by one, so

c nnnnnnnn 0

? 01000101 0

becomes

c nnnnnnnn 0

0 10001010 0

And the value in A is thus $008A. Notice how the lowest bit became that 0.

Here’s a rundown of how ASL sets flags:

  • n = uppermost bit of result (bit 15 when m = 0, bit 7 when m = 1).
  • z is set if the result is 0, otherwise it is unset.
  • c = the bit that was shifted into it.
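
Shifting left by one bit doubles a value, which makes ASL useful for turning an entry number into a byte offset. A minimal sketch, assuming m = 0 and x = 0, with $9876 standing in for the address of a hypothetical table of 16-bit entries in the current data bank:

```asm
; Fragment. Assumes m = 0 and x = 0 (16-bit A and X).
; On entry, A = the entry number. $9876 is a made-up table address.
asl a           ; A = entry number * 2, since each entry is two bytes
tax             ; X = byte offset into the table
lda $9876,x     ; A = the selected entry
```
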
LSR

Logical shift right

nvmxdizc e
0.....m* . lsr $10
0.....m* . lsr a
0.....m* . lsr $9876
0.....m* . lsr $10,x
0.....m* . lsr $9876,x

LSR shifts the data or the accumulator right by one bit. A 0 fills the uppermost bit and the lowest bit is shifted into c. LSR is a 16-bit operation when m = 0, and is an 8-bit operation when m = 1. It’s easiest to explain with binary numbers, so:

If m = 1, A contains $45:

Here, c represents the carry flag, the n segment represents A's contents, and 0 is the 0 we’re going to replace the highest bit with.

0 nnnnnnnn c

0 01000101 ?

When we lsr a, we shift all of the bits over by one, so

0 nnnnnnnn c

0 01000101 ?

becomes

0 nnnnnnnn c

0 00100010 1

And the value in A is thus $0022. Notice how the highest bit became that 0 and c has the lowest bit of the original $45.

Here’s a rundown of how LSR sets flags:

  • n is always unset because LSR fills the uppermost bit with 0.
  • z is set if the result is 0, otherwise it is unset.
  • c = the bit that was shifted into it.
ROL

Rotate left

nvmxdizc e
m.....mm . rol $10
m.....mm . rol a
m.....mm . rol $9876
m.....mm . rol $10,x
m.....mm . rol $9876,x

ROL shifts the data or the accumulator left by one bit. ROL is a 16-bit operation when m = 0, and is an 8-bit operation when m = 1.

ROL is different from ASL because it uses the c flag to fill the lowest bit instead of always filling it with 0.

It’s easiest to explain with binary numbers, so:

If m = 1, c = 1, A contains $45:

s represents the carry flag before we rol and e represents the carry flag after. The n segment represents A's contents.

e nnnnnnnn s

? 01000101 1

After a rol a we get:

e nnnnnnnn s

0 10001011 1

A contains $008B and c is unset.

Here’s a rundown of how ROL sets flags:

  • n = the uppermost bit of the result (bit 15 when m = 0, bit 7 when m = 1).
  • z is set if the result is 0, otherwise it is unset.
  • c = the bit that was shifted into it.
ROR

Rotate right

nvmxdizc e
m.....mm . ror $10
m.....mm . ror a
m.....mm . ror $9876
m.....mm . ror $10,x
m.....mm . ror $9876,x

ROR shifts the data or the accumulator right by one bit. ROR is a 16-bit operation when m = 0, and is an 8-bit operation when m = 1.

ROR is different from LSR because it uses the c flag to fill the highest bit instead of always filling it with 0.

It’s easiest to explain with binary numbers, so:

If m = 1, c = 1, A contains $45:

s represents the carry flag before we ror and e represents the carry flag after. The n segment represents A's contents.

s nnnnnnnn e

1 01000101 ?

After a ror a we get:

s nnnnnnnn e

1 10100010 1

A contains $00A2 and c is set.

Here’s a rundown of how ROR sets flags:

  • n = the uppermost bit of the result (bit 15 when m = 0, bit 7 when m = 1).
  • z is set if the result is 0, otherwise it is unset.
  • c = the bit that was shifted into it.
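
One common use for the rotates is chaining shifts across more than one byte or word, with c carrying the bit from one piece to the next. A minimal sketch, assuming m = 0 and a hypothetical 32-bit value with its low word at direct page $10 and its high word at $12:

```asm
; Fragment. Assumes m = 0 (16-bit accumulator and memory operations).
; $10 = low word, $12 = high word of a hypothetical 32-bit value.

; Shift the whole 32-bit value left by one bit.
asl $10         ; low word shifts left, its old bit 15 goes into c
rol $12         ; high word shifts left, c fills its bit 0

; Shift the whole 32-bit value right by one bit.
lsr $12         ; high word shifts right, its old bit 0 goes into c
ror $10         ; low word shifts right, c fills its bit 15
```
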
B{cond}

Going to lump all of these together because the only difference between them is the flags they check.

Branch if carry clear
Branch if carry set
Branch if equal
Branch if minus
Branch if not equal
Branch if plus
Branch always
Branch if overflow clear
Branch if overflow set

nvmxdizc e
........ . bcc LABEL
........ . bcs LABEL
........ . beq LABEL
........ . bmi LABEL
........ . bne LABEL
........ . bpl LABEL
........ . bra LABEL
........ . bvc LABEL
........ . bvs LABEL

All of these instructions branch to the specified place if their condition is true. Branch instructions have a limited range of -128d to 127d bytes from the instruction after the branch.

BCC branches if the c flag is 0
BCS branches if the c flag is 1
BEQ branches if the z flag is 1
BMI branches if the n flag is 1
BNE branches if the z flag is 0
BPL branches if the n flag is 0
BVC branches if the v flag is 0
BVS branches if the v flag is 1

BRA always branches.

In some cases it might be useful to refer to BCS as BGE (Branch if greater than or equal) and BCC as BLT (Branch if less than). After a comparison using CMP, CPX, or CPY, the c flag is set if the register was greater than or equal to the data (treating both as unsigned), and is unset otherwise.

Branches do not affect any flags.
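
Here’s a minimal sketch of a compare followed by a branch, assuming m = 0. It caps an unsigned value in A at $0063. The label name and the $10 variable are made up.

```asm
; Fragment. Assumes m = 0 (16-bit accumulator).
; On entry, A = some unsigned value.
cmp #$0064      ; c = 1 if A >= $0064, c = 0 if A < $0064
bcc Done        ; A is below $0064, leave it alone (BCC acting as BLT)
lda #$0063      ; A was too big, replace it with the cap
Done
sta $10         ; $10 is a hypothetical direct page variable
```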

BRL

Branch long

nvmxdizc e
........ . brl LABEL

BRL is BRA but it can jump to anywhere in the same bank (-32768d to 32767d bytes from the instruction after the BRL).

BRL is used for relocatable code because its target is relative to the instruction itself, whereas JMP always jumps to a fixed address.

JMP, JML

Technically, JML is an alias for JMP and isn’t a distinct thing, but using JML for clarity is important.

Jump
Jump long

nvmxdizc e
........ . jmp $1234
........ . jmp ($1234)
........ . jmp ($1234,X)
........ . jmp [$1234]
........ . jml $FEDCBA

JMP and JML jump to an address. The first three addressing modes above

jmp $1234
jmp ($1234)
jmp ($1234,X)

jump within the current bank. The last two

jmp [$1234]
jml $FEDCBA

can jump to anywhere.

JSR, JSL

Jump to subroutine
Jump to subroutine long

nvmxdizc e
........ . jsl $123456
........ . jsr $1234
........ . jsr ($1234,X)

The difference between [JSR and JSL] and [JMP and JML] is that JSR and JSL push a return address onto the stack. JSL pushes K then PC+3, JSR pushes PC+2.

JSR is used in conjunction with RTS to call a subroutine within the same bank as the JSR. A routine called by a JSR must return using RTS.

JSL is used to call a subroutine from anywhere and is used in conjunction with RTL. A routine called by a JSL must return using RTL.

RTL, RTS

Return from subroutine long
Return from subroutine

nvmxdizc e
........ . rtl
........ . rts

RTL and RTS are used with JSL and JSR respectively to return from a subroutine. RTS returns from a subroutine within the same bank as the JSR that called it. RTL can return from anywhere.

RTS pulls two bytes from the stack into PC and then increments PC.

RTL pulls two bytes from the stack into PC, increments PC, and then pulls one byte from the stack into K.
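
Putting the call and return instructions together, here’s a minimal sketch assuming m = 0 and d = 0. Caller and AddTen are hypothetical routines that live in the same bank, and $10 is a made-up direct page variable.

```asm
; Fragment. Assumes m = 0 (16-bit accumulator) and d = 0.
Caller
lda #$0005
jsr AddTen      ; pushes the return address, then jumps to AddTen
sta $10         ; execution resumes here with A = $000F
rts             ; Caller is assumed to have been called with JSR too

AddTen
clc
adc #$000A      ; A = A + $000A
rts             ; called with JSR, so it returns with RTS
```

If AddTen needed to be callable from other banks, the call would become jsl AddTen and the routine would end with rtl instead.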

BRK, COP

Breakpoint
Coprocessor

nvmxdizc e
....01.. . brk
....01.. . cop #$12

Oh boy. These two aren’t useful for what we’re doing here. I’m not going to cover their uses here. Check out fullsnes and the 65816 opcodes sheet for more information. Or don’t, I’m not a COP.

The real reason they’re listed here is because you’re likely to see them in disassemblies that have had register size mismatches, which I’ll talk about in the tips and tricks section.

RTI

Return from interrupt

nvmxdizc e
******** . rti

Just like BRK and COP, I’m not going to cover RTI. If you’re editing interrupts then you probably don’t need this guide. Check out both fullsnes and the 65816 opcodes sheet for more info.

CL{flag}, SE{flag}

Clear carry
Clear decimal mode
Clear interrupt disable
Clear overflow
Set carry
Set decimal mode
Set interrupt disable

nvmxdizc e
.......0 . clc
....0... . cld
.....0.. . cli
.0...... . clv
.......1 . sec
....1... . sed
.....1.. . sei

These instructions set or clear their respective flags.

REP, SEP

Reset processor status bits
Set processor status bits

nvmxdizc e
******** . rep #$12
******** . sep #$12

These instructions set or reset flags. You may see them written as rep #$20 or rep #%00100000, where the second format makes it very clear which bit is being reset.

Neither REP nor SEP affects bits that are 0 in its operand, so sep #$00 would not set or unset any bits.

They’re most often used to set the m and x bits in order to select register sizes. The other flags can be accessed more efficiently using the SE{flag} or CL{flag} instructions, unless multiple processor bits need to be set at once.
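
For reference, m is bit 5 ($20) and x is bit 4 ($10) of the processor status register, matching their positions in nvmxdizc. A quick sketch of the register size changes you’ll see most often:

```asm
rep #$20        ; reset m, the accumulator is now 16-bit
rep #$10        ; reset x, X and Y are now 16-bit
rep #$30        ; resets both m and x in one instruction
sep #$20        ; set m, the accumulator is now 8-bit
sep #%00110000  ; set both m and x, so A, X, and Y are all 8-bit
```

Most assemblers also need to be told about the size change so that they emit immediates of the right length. How that’s done depends on the assembler.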

LDA

Load accumulator

nvmxdizc e
m.....m. . lda ($10,x)
m.....m. . lda #$32,s
m.....m. . lda $10
m.....m. . lda [$10]
m.....m. . lda #$54
m.....m. . lda $9876
m.....m. . lda $FEDCBA
m.....m. . lda ($10),y
m.....m. . lda ($10)
m.....m. . lda (#$32,s),y
m.....m. . lda $10,x
m.....m. . lda [$10],y
m.....m. . lda $9876,y
m.....m. . lda $9876,x
m.....m. . lda $FEDCBA,x

LDA loads data into the accumulator. LDA is a 16-bit operation when m = 0, and is an 8-bit operation when m = 1.

LDA affects two flags:

  • n = uppermost bit of the data (bit 15 when m = 0, bit 7 when m = 1).
  • z is set if the data is 0, and is unset otherwise.
LDX, LDY

Load X register
Load Y register

nvmxdizc e
x.....x. . ldx #$54
x.....x. . ldx $10
x.....x. . ldx $9876
x.....x. . ldx $10,y
x.....x. . ldx $9876,y
x.....x. . ldy #$54
x.....x. . ldy $10
x.....x. . ldy $9876
x.....x. . ldy $10,x
x.....x. . ldy $9876,x

LDX and LDY load data into X or Y, respectively. They are 16-bit operations when x = 0, and are 8-bit operations when x = 1.

Like LDA, they affect two flags:

  • n = uppermost bit of the data (bit 15 when x = 0, bit 7 when x = 1).
  • z is set if the data is 0, and is unset otherwise.
STA

Store accumulator

nvmxdizc e
........ . sta ($10,x)
........ . sta #$32,s
........ . sta $10
........ . sta [$10]
........ . sta $9876
........ . sta $FEDCBA
........ . sta ($10),y
........ . sta ($10)
........ . sta (#$32,s),y
........ . sta $10,x
........ . sta [$10],y
........ . sta $9876,y
........ . sta $9876,x
........ . sta $FEDCBA,x

STA stores the contents of the accumulator to the specified address. STA is a 16-bit operation when m = 0, and is an 8-bit operation when m = 1.

STA does not affect any flags.

STX, STY

Store X register
Store Y register

nvmxdizc e
........ . stx $10
........ . stx $9876
........ . stx $10,y
........ . sty $10
........ . sty $9876
........ . sty $10,x

STX and STY store the contents of their respective registers to the specified address. They are 16-bit operations when x = 0, and are 8-bit operations when x = 1.

They do not affect any flags.

STZ

Store zero

nvmxdizc e
........ . stz $10
........ . stz $10,x
........ . stz $9876
........ . stz $9876,x

STZ stores zero to the specified address. STZ is a 16-bit operation when m = 0, and is an 8-bit operation when m = 1.

STZ does not affect any registers and does not affect any flags.

MVN, MVP

Move memory negative
Move memory positive

nvmxdizc e
........ . mvn #$12,#$34
........ . mvp #$12,#$34

MVN and MVP move data from one address to another.

MVN starts at the beginning of the data and writes until the end, while MVP starts at the end of the data and writes until the beginning.

For both, the accumulator contains the number of bytes to transfer minus one. The first part of the operand is the bank of the source data, and the second part is the bank of the destination.

For MVN:

  • X contains the offset of the first byte of the source data
  • Y contains the offset of the first byte of the destination

For MVP:

  • X contains the offset of the last byte of the source data
  • Y contains the offset of the last byte of the destination

It’s important to note that DB will be the bank of the destination after using MVN or MVP. There are other quirks, too, but they shouldn’t be important for normal use. See the 65816 opcode sheet for more info.

Neither of these instructions affects any flags.
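
Here’s a minimal sketch of a forward block copy with MVN, assuming m = 0 and x = 0. The addresses and length are made up, and the operand follows the source bank, destination bank order described above.

```asm
; Fragment. Assumes m = 0 and x = 0 (16-bit A, X, and Y).
; Copies $0100 bytes from $7E2000 to $7F3000 (hypothetical addresses).
phb             ; MVN changes DB, so save it first
lda #$00FF      ; number of bytes to copy, minus one
ldx #$2000      ; offset of the first byte of the source
ldy #$3000      ; offset of the first byte of the destination
mvn #$7E,#$7F   ; source bank, destination bank
plb             ; restore the original DB
```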

NOP, WDM

No operation
William D. Mensch, Jr.

nvmxdizc e
........ . nop
........ . wdm

These instructions have no operation attached to them. They both take up space and execution time, which may be useful.

You will see NOP used either where some piece of code was removed during development (the NOP takes the place of some other meaningful action) or where there must be a specific delay between I/O port actions, such as waiting for a division or multiplication to finish.

WDM is a reserved instruction that’s actually two bytes long. I don’t think 64tass implements it (not that it’d really be useful). The second byte is read but ignored. BSNES-Plus has an option for breaking on execution of WDM, because you really shouldn’t encounter it in normal code. The 65816 opcode sheet has a sneaky use for it, but I’d consider it to be hacky.

Neither affects any flags.

PEA, PEI, PER

Push effective address
Push effective indirect address
Push effective relative address

nvmxdizc e
........ . pea #$1234
........ . pei $12
........ . per LABEL

All three of these instructions push a 16-bit value onto the stack.

PEA pushes the specified value onto the stack, so pea #$1337 puts $1337 onto the stack.

PEI pushes the data at the specified address onto the stack. If the data at $000069 is $1337, pei $69 pushes $1337 onto the stack.

PER pushes a relative address to the stack, much in the way that BRA jumps to a relative address. It’s useful for relocatable code.

PH{register}, PL{register} (general purpose)

Push accumulator
Push X register
Push Y register
Pull accumulator
Pull X register
Pull Y register

nvmxdizc e
........ . pha
........ . phx
........ . phy
m.....m. . pla
x.....x. . plx
x.....x. . ply

PHA, PHX, and PHY all push the contents of their respective registers onto the stack. PHA is a 16-bit operation when m = 0, and is an 8-bit operation when m = 1. PHX and PHY are 16-bit operations when x = 0, and are 8-bit operations when x = 1.

PLA, PLX, and PLY all pull data from the stack into their respective registers. PLA is a 16-bit operation when m = 0, and is an 8-bit operation when m = 1. PLX and PLY are 16-bit operations when x = 0, and are 8-bit operations when x = 1.

The pushes do not affect flags, but pulls affect flags:

PLA:

  • n = uppermost bit of the data (bit 15 when m = 0, bit 7 when m = 1).
  • z is set if the data is 0, and is unset otherwise.

PLX and PLY:

  • n = uppermost bit of the data (bit 15 when x = 0, bit 7 when x = 1).
  • z is set if the data is 0, and is unset otherwise.
PH{register}, PL{register} (special purpose)

Push data bank register
Push direct page register
Push K register
Push processor status register
Pull data bank register
Pull direct page register
Pull processor status register

nvmxdizc e
........ . phb
........ . phd
........ . phk
........ . php
*.....*. . plb
*.....*. . pld
******** . plp

These instructions push and pull their respective registers to and from the stack.

PHD/PLD always push/pull two bytes. All other instructions push/pull one.

PLB is the main way to change DB. Usually this involves pushing the target bank and then PLB.

PLB and PLD set flags as follows:

  • n = uppermost bit of result (bit 7 for PLB, bit 15 for PLD).
  • z is set if the data is 0, else it is unset.

PLP affects all bits.
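
A minimal sketch of the push-then-PLB pattern, here setting DB to the bank the code is running from. This is an illustration of the idea and not taken from any particular game.

```asm
phb             ; save the current DB
phk             ; push K, the bank this code is running from
plb             ; pull it into DB, so DB = K
; ...code that uses absolute addressing to read from this bank...
plb             ; restore the original DB
```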

STP, WAI

Stop the clock
Wait for interrupt

nvmxdizc e
........ . stp
........ . wai

STP stops the CPU’s clock, shutting it down until a hardware reset occurs. Not really useful for us.

WAI stops until an interrupt occurs. There’s a chance you’ll see this, but at that point you probably won’t need this guide.

T{register}{register} (general purpose)

Transfer accumulator to X register
Transfer accumulator to Y register
Transfer stack pointer to X register
Transfer X register to accumulator
Transfer X register to stack pointer
Transfer X register to Y register
Transfer Y register to accumulator
Transfer Y register to X register

nvmxdizc e
x.....x. . tax
x.....x. . tay
x.....x. . tsx
m.....m. . txa
........ . txs
x.....x. . txy
m.....m. . tya
x.....x. . tyx

These instructions move data between registers. The size of the transfer is determined by the size of the destination register. The stack pointer is always considered to be 16 bits wide.

All instructions but TXS set flags:

  • n = uppermost bit of result.
  • z is set if the data is 0, else it is unset.
T{register}{register} (special purpose)

Transfer C accumulator to Direct page register
Transfer C accumulator to stack pointer
Transfer Direct page register to C accumulator
Transfer stack pointer to C accumulator

nvmxdizc e
*.....*. . tcd
........ . tcs
*.....*. . tdc
*.....*. . tsc

C, or the C accumulator, is a special title given to the full 16-bit accumulator. Normally operations involving the accumulator have different sizes depending on the m flag, but C is always 16 bits.

For all instructions but TCS:

  • n = uppermost bit of result.
  • z is set if the data is 0, else it is unset.
XBA

Exchange B and A accumulator

nvmxdizc e
*.....*. . xba

In the previous set of instructions I mentioned that the full 16 bits that make up the accumulator are called C. The two 8-bit parts of C are called A, the lower byte of the accumulator, and B, the upper byte of the accumulator.

For example, if the accumulator contains $3713, after xba the accumulator contains $1337.

XBA sets flags based on the result in A (the lower byte).

  • n = the uppermost bit of A (bit 7 of the accumulator)
  • z is set if A contains 0 (regardless of what the upper 8 bits contain), and is unset otherwise.
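
One use for XBA is getting at the upper byte of a 16-bit value while the accumulator is 8-bit. A minimal sketch, with $10 as a hypothetical direct page variable:

```asm
; Fragment. Assumes m = 0 on entry.
lda #$1337      ; C = $1337, so B = $13 and A = $37
sep #$20        ; m = 1, the accumulator is now 8-bit
xba             ; swap the halves: B = $37 and A = $13
sta $10         ; stores the single byte $13
```
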
XCE

Exchange carry and emulation flags

nvmxdizc e
.......* * xce

XCE swaps the carry and emulation flags. It’s the only instruction that can alter e. I’m not covering emulation mode in this guide, and you’re only likely to see it during startup code, as the SNES starts in emulation mode.

The flags altered by XCE are c and e. For example, during startup you might find

clc
xce

which takes the SNES out of emulation mode.

BSNES-Plus Overview

bsnes’ toolbar has many things that’re common to most emulators. The important thing for us is Tools>Debugger.

Here I’ve highlighted a few important sections:

  • The Toolbar (Red)
  • Step Options (Pink)
  • Run Options (Blue)
  • CPU Tabs (Yellow)
  • CPU Registers (Purple)

There are a few other things on this screen that will be covered separately.

Registers

Each of these will be filled while the emulator is running. You’re free to click the toggles and edit the numbers in the boxes, likely breaking things if you’re not careful.

Step Options

There are four buttons:

  • Break - Pauses the CPU. Allows you to use the other three buttons.
  • Step - Runs one instruction, prints it to the Instruction Window. If the instruction is a subroutine call or branch, it jumps to it.
  • Over - Runs one instruction, prints it to the Instruction Window. If the instruction is a subroutine call, it executes the subroutine silently and ends up at the instruction after the call.
  • Out - Runs until the CPU hits a return instruction.

To get a feel for these, try pressing them yourself while a game is running.

Run Options

These run the SNES until the next occurrence of the selected event. You should read up about these things in fullsnes.

CPU Tabs

We’re not worried about these, but clicking on them will show the registers and instruction window for these chips. The SA-1 and SuperFX chips are only found in certain games.

Main Window

If we load a cartridge and hit break/step, we get something like

(Note, what is shown here for me is probably not what’ll be shown for you.)

This is the main purpose of the debugger: To show executed code so that you can learn/debug.

On the left side of the window are the addresses of the instructions. To the right, under the Disassemble section, are the instructions themselves. The mnemonic for the instruction is marked in blue, memory addresses are marked in red, and literals are marked in green. On the rightmost edge of this section is a prediction of the address the instruction will access based on the processor’s current status. Predicted values might change based on the contents of RAM, different registers, processor flags, etc. Finally, the section marked Comments is for, you guessed it, comments. bsnes-plus automatically marks lines occupied by return instructions with Return, although you’re free to overwrite that.

I’ll be calling this part of the debugger window the disassembler. Left-clicking on an address on the left will highlight it red and set a breakpoint at that address. More on breakpoints later. Left-clicking in the comment section will allow you to set a comment for that line. Right-clicking on an address or instruction will bring up a pane with 5 options:

  • Toggle breakpoint
  • Set comment
  • Set symbol
  • Jump to PC
  • Jump to address

Toggle breakpoint does the same thing that left-clicking on the address does.
Set comment sets a comment in the section on the right.
Set symbol will give that address a name so that you can refer to it in the breakpoint window. It’ll also cause instructions that refer to that address to show the symbol instead of the address.
Jump to PC will navigate the disassembler back to where the CPU is running code, in case you scrolled away or jumped to an address.
Jump to address will navigate the disassembler to the specified address, allowing you to view other code.

Below the disassembler is a small memory viewer window that will show you data currently being accessed.

Below that is the disassembler’s history console, showing past code that has been traced through.

Toolbar

The debugger toolbar has three dropdowns:

  • Tools
    • Breakpoint Editor …
    • Memory Editor …
    • Properties Viewer …
  • S-PPU
    • Tile Viewer …
    • Tilemap Viewer …
    • Sprite Viewer …
    • Palette Viewer …
  • Misc
    • Clear Console
    • Cache memory usage table to disk
    • Save breakpoints to disk between sessions
    • Save symbols to disk between sessions
    • Show H-position in clocks instead of dots

We’ll get into each of the Tools and S-PPU options in a moment. First, the Misc stuff:

Clear Console clears the disassembler history console.
Cache memory usage table to disk While executing code, reading data, and just generally running the game, bsnes-plus keeps track of all of the things that it has encountered and uses this information to display instructions correctly among other things. The Memory Viewer window section will cover those other things. Really, I don’t keep this option checked while debugging/hacking, because I don’t want bad jumps, changes I’ve made temporarily, or anything else being saved and interpreted as vanilla behavior in another session.
Save breakpoints to disk between sessions Does what it says on the tin.
Save symbols to disk between sessions Same as above.
Show H-position in clocks instead of dots Not useful for us, but this changes the progress toward HBLANK in the disassembler console to count clock cycles instead of pixels.

The Trace checkbox and the Enable trace mask option are for logging executed code to a file. Enable trace mask will ensure that code that is run more than once doesn’t appear again in the trace file.

The Symbols button brings up a menu that shows all defined symbols.

The Step checkbox tells bsnes-plus that we want to watch the current CPU step to the next instruction when Step, Over, or Out are pressed. You can toggle this checkbox for each CPU in the CPU tab, along with the Trace checkbox.

Tools

Breakpoint Editor

To get the CPU to stop at a specific place or during a specific event, such as a read or write to data, you use a breakpoint.

The Add and Del buttons do what you’d expect. When making a new breakpoint, the Start field is for the address of where you want to watch. This could be the address of an instruction, some data, anything. When you begin typing a number or word into this field, bsnes-plus will try to suggest known symbols and their addresses. You can use either interchangeably.

The End field is optional, as it says, and is useful for breaking on access to things like tables. It’ll draw from the list of symbols, too.

The three checkboxes are Read, Write, and Execute. It’s common to refer to breakpoints by these types, i.e. break on read, break on write, or break on execute.

The two Data fields are for breaks on read/write. The first will select an operator and the second selects a value to wait for. If you want to break on all reads/writes, leave these two empty.

The Source field selects what part of the SNES you’re checking. Because things like VRAM and OAM have their own memory, this allows you to set breaks on reads/writes to them.

The last two checkboxes on the bottom do what they say. WDM is an unused opcode and BRK typically means code execution has entered data full of 00s (The first byte of BRK is 00), so it’s useful to be able to break when they’re encountered.

Memory Editor

This is one of the most useful parts of bsnes-plus.

The left column shows the offset of the start of the line. The center shows the data as a series of hexadecimal bytes, with $10 bytes per line. On the right, there’s a readout of whatever the bytes would be in ASCII.

At the bottom of the screen there is a label that will tell you the address of the last selected byte. Clicking on a byte will move the cursor to it.

You can copy and paste bytes into this data window. It’s useful for changing RAM values or temporarily changing data or ASM. Changes made to the ROM from this window are not saved.

On the right side there’s a dropdown for selecting what type of memory to view. This is a very useful feature. When looking for the ROM address of something written to RAM, for example, you could copy the bytes in the S-CPU view and search for them in the Cartridge ROM view.

The box below the dropdown is for typing an address to go to. The Auto update toggle will cause the bytes in the view to update while the emulator is running. This is useful for seeing things written to RAM, and generally should be always on. You can hit the Refresh button to manually update the screen.

As you may have noticed, the data in the screenshot is blue. bsnes-plus will automatically color memory in four colors:

  • Black - Unknown data, hasn’t been read, written to, or executed
  • Blue - Data has been read or written to here
  • Red - Data here has been executed as ASM
  • Purple - Data has been read/written and also executed here

This feature helps to show what routines and data are being used. Games often clear RAM on startup, so don’t be surprised if it’s all blue. Mirrored RAM/ROM won’t be highlighted unless it’s used within that bank, so reads to $000000 won’t highlight $7E0000.

The red, blue, and black arrows just below the Refresh button will jump to the next block of that data type.

The binoculars and the arrows next to them are for searching. You can click the binoculars or press Ctrl+F while in the window to search for data. It’s easiest to copy a block of bytes from the window and paste into the box.

The two buttons at the bottom will import or export a set of files that are records of all RAM at that moment. This includes data from the APU, CGRAM, OAM, SRAM, VRAM, and WRAM.

Right-clicking in the memory viewer will bring up a pane with a few options for the selected data, including the ability to set breakpoints at the address of the cursor.

Properties Viewer

No screenshot this time. The properties viewer shows the status of various I/O ports. Check out fullsnes for descriptions of each.

S-PPU

Tile Viewer

The dropdown at the top left controls the magnification applied to the view on the right. Show Grid and Auto update should seem pretty obvious.

Export saves the tiles to an image. Refresh updates the tiles.

Source allows you to look around different parts of memory for tiles. It can be useful for looking for uncompressed images.

Address offsets the view, and is mainly useful for looking through Cartridge ROM.

Bit Depth changes the bit depth displayed. Lower depths allow more tiles but fewer colors per tile. Best to leave an explanation of bit depth and tiles to something like fullsnes.
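
For a rough feel of the tradeoff, here is a small Python sketch of the arithmetic. It assumes the usual 8x8 pixel SNES tile and the 2, 4, and 8 bits-per-pixel depths; the function names are made up for illustration.

```python
TILE_PIXELS = 8 * 8  # an SNES tile is 8x8 pixels


def bytes_per_tile(bit_depth):
    """Each pixel takes bit_depth bits, so a tile takes 8 * bit_depth bytes."""
    return TILE_PIXELS * bit_depth // 8


def colors_per_tile(bit_depth):
    """Number of distinct palette indices a pixel can hold."""
    return 2 ** bit_depth


def tiles_that_fit(memory_size, bit_depth):
    """How many whole tiles fit in memory_size bytes."""
    return memory_size // bytes_per_tile(bit_depth)


for depth in (2, 4, 8):
    print(depth, bytes_per_tile(depth), colors_per_tile(depth),
          tiles_that_fit(0x10000, depth))
```

Halving the bit depth doubles the number of tiles that fit in the same memory, at the cost of far fewer colors per tile.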

Width changes how wide the view is.

Override Background Color and the dropdown below it will alter the background (transparent) color in the view.

The Tile Viewer begins with the view as a grayscale image, but clicking Use CGRAM or in the palette box below it will display tiles using the selected palette. You can toggle Use CGRAM off at any time to return to grayscale.

The Base Tile Addresses boxes show where each BG or OAM sheet is pulling tiles from. Clicking goto will scroll the view to the start of the sheet.

Just like the memory viewer, the tile viewer shows the address of the cursor. Clicking in the tile view will update this.

Tilemap Viewer

No screenshot for this, as many of the buttons are similar to the tile viewer. The BG buttons will show you how each BG is built. For an explanation of BG modes, see fullsnes. You can press the Custom Screen Mode and Override Tilemap toggles to see how the BGs would look with edited settings.

Sometimes the things you see in the viewer won’t match up with the screen. A good example of this is during FE5 battles, where BG2 uses some window shenanigans and the stat windows can be seen if you set the map size to 64x64 and the tile address to 0xE000.

The other windows aren’t complicated and can be figured out by messing around in them.

Sprite Viewer

bsnes-plus’ sprite viewer was recently (at the time of writing) updated, so I figured I’d explain it.

The main window shows a rendering of the sprites. The lower left shows each sprite’s information and a checkbox to toggle them from being rendered in the main window.

Clicking on a sprite in the main window or from the list will show it on the right. The buttons on the right should be self-explanatory.

Palette Viewer

No screenshot for this one. It’s an enhanced version of the palette viewer found in the tile viewer. Clicking on a color gives you the precise color values. Take a look at fullsnes for a detailed description of the color format.

Tips and Tricks

(Register) size matters

The 816 in 65816 is meant to highlight the architecture’s ability to switch between 8- and 16-bit registers at will.

The way a snippet of ASM is interpreted is radically different depending on the size of the 65816’s registers.

Let’s take a look at an example. Here’s a snippet that we intend to be used while the accumulator and index registers are 16-bit:

lda #$0000  ; A90000
sta $0B     ; 850B
lda #$1337  ; A93713
ldx #$0001  ; A20100
sta $AAAA,x ; 9DAAAA

On the right are the bytes that make up the opcodes.

Say this snippet was encountered while the accumulator was 8-bit. Let’s have a look at it:

lda #$00       ; A900
brk            ; 00
sta $0B        ; 850B
lda #$37       ; A937
ora ($A2,s),y  ; 13A2
ora ($00,x)    ; 0100
sta $AAAA,x    ; 9DAAAA

Whoa, that’s totally not what we want. The CPU runs into a brk as its second instruction, which will usually crash the game. Any opcode that works with a 16-bit immediate (lda #$0000, for instance) that has the upper byte of the immediate as 00 will cause a crash like this if encountered with the wrong register size.
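
The mis-decode above can be reproduced with a small Python model. This is only a sketch: the OPCODES table is a hand-written subset covering just the opcodes that appear in the two listings, and decode is a made-up helper, not part of any real disassembler.

```python
# Operand sizes for the handful of opcodes used in this example only.
# "m" means the operand is 1 byte when A is 8-bit and 2 bytes when A is
# 16-bit; "x" means the same thing for the index registers.
OPCODES = {
    0xA9: ("lda #imm", "m"),
    0xA2: ("ldx #imm", "x"),
    0x85: ("sta dp", 1),
    0x9D: ("sta abs,x", 2),
    0x00: ("brk", 0),  # shown as a single byte, matching the listing above
    0x13: ("ora (sr,s),y", 1),
    0x01: ("ora (dp,x)", 1),
}


def decode(data, a16, x16):
    """Walk data and return (mnemonic, raw operand bytes as hex) pairs."""
    out = []
    pos = 0
    while pos < len(data):
        name, size = OPCODES[data[pos]]
        if size == "m":
            size = 2 if a16 else 1
        elif size == "x":
            size = 2 if x16 else 1
        operand = data[pos + 1:pos + 1 + size]
        out.append((name, operand.hex().upper()))
        pos += 1 + size
    return out


# The same bytes as the first listing, in ROM order (little-endian operands).
SNIPPET = bytes.fromhex("A90000850BA93713A201009DAAAA")

for name, operand in decode(SNIPPET, a16=True, x16=True):
    print("16-bit A:", name, operand)
for name, operand in decode(SNIPPET, a16=False, x16=True):
    print(" 8-bit A:", name, operand)
```

With a16=True the bytes come out as the five intended instructions. With a16=False the very same bytes come out as the seven instructions in the second listing, brk included.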

Executing with the wrong sizes isn’t the only thing you need to worry about, though. An assembler that’s assembling your ASM will do one of two things if sizes aren’t correct:

  • It’ll try to reduce immediates to 8-bit values (like #$0001 to #$01)
  • It’ll error if it can’t figure out how to fix sizes

You might write a routine that the CPU expects to be used with a different size, you might forget to set sizes yourself, the assembler might get ambiguous sizes wrong, etc.

Let’s take a look at some cases where something goes wrong and how to fix them.

First, let’s take a look at a case where the CPU enters your routine with all registers as 8-bit but you need to work with 16-bit values. Two instructions, REP and SEP, can change the size of registers:

rep #$30   ; $20 for the m flag | $10 for the x flag
lda #$1337 ; we're good
ldx #$0000

Many routines expect things like register sizes to be preserved across subroutine calls, but we’ll save that for another tip.
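
If the masks feel abstract, here is a tiny Python model of what REP and SEP do to the status register. It’s a sketch that only covers the m and x bits in native mode; the function names are made up for illustration.

```python
M_FLAG = 0x20  # accumulator size: set = 8-bit, clear = 16-bit
X_FLAG = 0x10  # index register size: set = 8-bit, clear = 16-bit


def rep(status, mask):
    """REP: clear every status bit that is set in mask."""
    return status & ~mask & 0xFF


def sep(status, mask):
    """SEP: set every status bit that is set in mask."""
    return (status | mask) & 0xFF


def sizes(status):
    """Return (accumulator bits, index register bits) for a status byte."""
    return (8 if status & M_FLAG else 16, 8 if status & X_FLAG else 16)


status = M_FLAG | X_FLAG      # enter with everything 8-bit
status = rep(status, 0x30)    # rep #$30
print(sizes(status))          # accumulator and index are now both 16-bit
status = sep(status, 0x20)    # sep #$20
print(sizes(status))          # accumulator back to 8-bit, index still 16-bit
```

REP only ever clears bits and SEP only ever sets them, so rep #$30 makes both register sets 16-bit no matter what they were before.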

Second, let’s take a look at how branching can muck up expected register sizes.

Assuming that all registers enter this as 16-bit:

lda $0B ; get widget
beq Foo ; special case, go do Foo

; else we need to Bar

sep #$20 ; small accumulator
lda #$00
sta $0D
bra End ; wrap up

Foo ; here's Foo

lda #$1337 ; frobulate it man
sta $0F
sep #$20 ; small accumulator
lda #$01
sta $0D

; fall through to End

End ; here's End

rep #$20 ; restore large accumulator
rtl

There’s a huge problem here. The assembler gets to the label Foo thinking that the accumulator is still 8-bit (because of the previous sep #$20) despite the CPU only being able to arrive there from the earlier beq Foo, when the accumulator was still 16-bit.

Luckily, most assemblers come with commands that set the assumed size of the registers. For 64tass, these are:

  • .as - sets the assumed accumulator size to 8-bit

  • .xs - sets the assumed index register size to 8-bit

  • .al - sets the assumed accumulator size to 16-bit

  • .xl - sets the assumed index register size to 16-bit

  • .autsiz - lets the assembler continue to assume sizes automatically

  • .mansiz - stops the assembler from assuming register sizes

Now that we know these, let’s fix that routine.

lda $0B ; get widget
beq Foo ; special case, go do Foo

; else we need to Bar

sep #$20 ; small accumulator
lda #$00
sta $0D
bra End ; wrap up

.al
.xl
.autsiz

Foo ; here's Foo

lda #$1337 ; frobulate it man
sta $0F
sep #$20 ; small accumulator
lda #$01
sta $0D

; fall through to End

End ; here's End

rep #$20 ; restore large accumulator
rtl

I don’t know if the .autsiz is really necessary, but it certainly doesn’t hurt.

An additional tip I’ll give here: Begin your routines by setting the assumed register sizes that you’re using! Just as the assembler can’t predict sizes across branches perfectly, it can’t predict across subroutines.

Conventions(?)

There’s an official set of conventions offered by the designers of the 65816 chip, but it’s unlikely you’ll see them followed. FE5 in particular does things very differently than what is suggested.

There are a few things that are always consistent, however.

First, routines are usually expected to finish with the same register sizes that they started with. This means utilizing REP/SEP when the start and end sizes are always known, or using PHP/PLP to restore the old sizes if they were unknown.

Second, it is up to routines to avoid overwriting memory and registers that are not used for the effects of the routine or its return values. This usually involves pushing registers and data at the start of a routine and pulling them at the end.

Third, return values generally go in the accumulator before returning from a routine. If the routine returns true or false, this is typically represented by the carry flag, with c set being true and c unset being false.

Fourth, the usage of a routine dictates if it is called via JSR or JSL. A routine that can be called from anywhere must use JSL and must typically return via RTL. Routines like this must often make use of the conventions above to preserve registers and data between calls. Subroutines, local routines, whatever you’d like to call them, use JSR and must return with RTS. They’re only called from within the same bank and typically know the circumstances under which they’re called. Routines like this typically preserve less because they usually serve one purpose.

The official way to provide a routine with inputs is to push them onto the stack if they can’t fit in the CPU registers. FE5 does things a bit differently.

FE5 uses a chunk of memory near the start of RAM for routine parameters (along with CPU registers), starting at $00000B and ending before $00007B. They mimic CPU registers not unlike the set that the GBA uses. The first few are 16-bit and are used like scratch registers.

Anyway, let’s take a look at what all of this looks like:

	php
	phb
	phk
	plb
	sep #$20
	lda $7B
	sta $2100
	plb
	plp
	rtl

Let’s spruce this up with some more info about this routine. We’ll add a label for it, a comment about where it’s at, some stuff for register sizes, and some comments.

rlCopyINIDISPBuf ; 80/81A8

	; Assembler info

	.autsiz
	.databank ?

	; Routine

	php ; push flags
	phb ; push data bank

	phk ; push program bank
	plb ; new DB = K

	.databank `*

	sep #$20 ; small accumulator

	; Copy a byte from the I/O buffer to an I/O port
	; this one in particular handles screen brightness
	; and forced blanking. Check fullsnes for a description
	; on I/O

	lda BUF_INIDISP
	sta INIDISP,b

	; restore our pushed registers

	plb ; pull data bank
	plp ; pull flags

	rtl ; return

Much nicer, right?

An important thing to note here is that we push and pull an equal amount of data. Once you push a register you can basically do whatever you need to do with it, as long as you pull it afterwards.

Take a look at

	phk ; push program bank
	plb ; new DB = K

See that the data we pulled is not from the same register? Pushes and pulls don’t “remember” where they came from. plb simply pulls a byte from the stack and sticks it in DB. It doesn’t care what that byte on the stack is.

It’s for this reason that it’s also very important to keep track of the order of pushes and pulls.

For example:

pha
phx

pla
plx

This swaps the contents of A and X (as long as both registers are the same size). If you aren’t careful, you’ll mix everything up. Remember, the last register pushed should be the first register pulled.
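
Here is a quick Python model of that swap. It’s a sketch that treats the stack as a plain list and ignores register widths entirely; run is a made-up helper that only understands ph? and pl? mnemonics.

```python
def run(ops, registers):
    """Apply push/pull mnemonics such as 'pha' or 'plx' to a register dict."""
    registers = dict(registers)
    stack = []
    for op in ops:
        kind, reg = op[:2], op[2]
        if kind == "ph":
            stack.append(registers[reg])  # push copies, the register keeps its value
        elif kind == "pl":
            registers[reg] = stack.pop()  # pull takes whatever is on top
        else:
            raise ValueError("unsupported op: " + op)
    return registers, stack


start = {"a": 0x1111, "x": 0x2222}

swapped, leftover = run(["pha", "phx", "pla", "plx"], start)
print(swapped, leftover)

restored, leftover = run(["pha", "phx", "plx", "pla"], start)
print(restored, leftover)
```

The first order hands X’s old value to A and A’s old value to X. The second order, last pushed and first pulled, puts everything back where it came from.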

There’s some advantage to a smart programmer, though: You can use the ability to pull into different registers to save time. Consider:

lda $0B
pha

; contents of A on the stack and
; also still in A

jsl $808888 ; call routine with A as an input

; our next call needs something else in A

lda #$0000

; but it also needs the previous thing in X

plx

jsl $909999 ; call routine with A, X as inputs

It would’ve been perfectly valid to do

pla
tax
lda #$0000

instead of

lda #$0000
plx

but I’m sure you can see which is smaller and faster.

Memory mapping, assembly offsets

The hardest part of working with the SNES is memory.

If you’ve worked with the GBA, you’re essentially free to throw everything at the end of the ROM and never worry about changing the assembly offset after the first time.

In a game like FE5, you’re likely to get multiple small patches of free space strewn throughout the ROM. Even if you’ve got a huge amount of free space at the end of your ROM to throw everything in you’ll still need to be mindful of bank boundaries.

To put it simply, under most circumstances you can’t have code and data blindly crossing a bank boundary. Most assemblers get funky about doing this, too, often wrapping their assembly offsets without incrementing the bank.
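
A quick way to sanity-check a patch before assembling it is to do the boundary math yourself. Here is a Python sketch, assuming a LoROM game where each bank holds $8000 bytes of ROM and where offset is an offset into the ROM file; both function names are made up.

```python
LOROM_BANK_SIZE = 0x8000  # bytes of ROM per bank in a LoROM game


def crosses_bank(offset, length, bank_size=LOROM_BANK_SIZE):
    """True if length bytes starting at ROM offset spill into the next bank."""
    first_bank = offset // bank_size
    last_bank = (offset + length - 1) // bank_size
    return first_bank != last_bank


def room_left_in_bank(offset, bank_size=LOROM_BANK_SIZE):
    """Number of bytes from offset up to the end of its bank."""
    return bank_size - (offset % bank_size)


print(crosses_bank(0x7FF0, 0x10))   # fits exactly in the first bank
print(crosses_bank(0x7FF0, 0x11))   # one byte too many
print(hex(room_left_in_bank(0x7FF0)))
```

If crosses_bank says yes, split the code or data so that each piece sits entirely within one bank.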

Most assemblers have tools for changing the assembly offset and how memory is handled.

Let’s go over the common tools that 64tass has for doing this:

The symbol * represents the current assembly offset. You can treat it as a value for things, e.g. lda * will load the data at the current offset (the start of the line) into A (this’ll be the opcode byte of that lda itself, such as $AD for absolute addressing, followed by the low byte of the address). You can set it like a variable, too:


; By default we're at $000000

* = $000010

; Now we're at $000010

We don’t always want to change the assembly offset, though. If you look back to the section about SNES memory, you’ll notice that in a LoROM game the start of the ROM is mapped to $008000 rather than $000000 which is the actual start of the ROM. We don’t want to pad our stuff until we get to $008000, though, so we use another assembler directive.

; By default we're at $000000

.logical $018000

; We're still at $000000, but the assembler will think we're at $018000
; So something like

.long *

; will assemble to 00 80 01 rather than 00 00 00

.here ; logical has to be followed by this at some point
      ; to return to the previous location the assembler thinks it's at

.logical and .here don’t actually change where things are assembled, only where the assembler thinks they are.

It might be best to take a look at 64tass’ manual.

We use * and .logical together in order to change where things go.


* = $000000
.logical $808000

; This is the start of the first ROM bank in a FastROM LoROM game.

.here

* = $008000
.logical $818000

; This is the start of the second bank

.here
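
If you’d rather compute the pairs of * and .logical values than look them up, the relationship can be sketched in Python. This assumes the plain LoROM layout used above ($8000 bytes of ROM per bank, mapped to $8000-$FFFF of each bank, with the fast mirror starting at bank $80) and ignores anything game-specific; both function names are made up.

```python
def offset_to_lorom(offset, fast=True):
    """Convert a ROM file offset to a LoROM mapped address."""
    bank, within = divmod(offset, 0x8000)
    if bank > 0x7F or (not fast and bank > 0x7D):
        raise ValueError("offset does not fit in this mirror")
    if fast:
        bank |= 0x80  # the FastROM mirror lives in banks $80 and up
    return (bank << 16) | 0x8000 | within


def lorom_to_offset(address):
    """Convert a LoROM mapped address back to a ROM file offset."""
    bank = address >> 16
    within = address & 0xFFFF
    if bank in (0x7E, 0x7F):
        raise ValueError("banks $7E and $7F are RAM, not ROM")
    if within < 0x8000:
        raise ValueError("not in the ROM half of the bank")
    return (bank & 0x7F) * 0x8000 + (within - 0x8000)


for offset in (0x000000, 0x008000):
    print(f"* = ${offset:06X}  ->  .logical ${offset_to_lorom(offset):06X}")
```

Both mirrors of an address land on the same file offset, so lorom_to_offset gives the same answer for the slow and fast versions of a pointer.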

Applying changes to a ROM

If you’re coming from GBAFE hacking you might be familiar with using the Event Assembler to apply changes to your ROM.

The typical method involves having a batch file to copy a clean ROM, rename it, and then run EA to write changes to this ROM.

We can skip a few steps here using 64tass and a master buildfile from which you will apply all of your changes.


.cpu "65816"

* = $000000

.binary "FE5.sfc"

; Place your inclusions that go in fixed locations here

.include "ThingInstaller.asm"
.include "OtherThingInstaller.asm"


; Place your inclusions that go wherever here

* = $1FB704 ; Freespace example
.logical $BFB704

.include "FreespaceThing.asm"

.here

Continue adding things as you see fit. The benefit of having .binary "FE5.sfc" is that the ROM’s contents are included first with edits written on top of it.

Next, have a batch file or similar thing to run 64tass:

64tass -f -o OutputROMName.sfc BuildfileName.asm

Sub out the names you use, possibly wrapping the text in quotes if there are spaces in your filenames (Stan is screaming off in the distance, I’m sure).

Organization is really up to you. I haven’t found a decent way to keep my stuff organized at all, so me trying to suggest how you do things is a bit weird.

The important thing here is that you include the clean ROM here to avoid having to copy and rename it before using something else to apply your changes. It’s entirely possible to write your ASM separately, assemble them into .dmps, and include them using something like EA, which is how I originally did things.

Finding the ROM offset of a mapped thing or the other way around

Quickly switching between a mapped offset and the actual ROM offset of a thing is easy and doesn’t require any math. Using bsnes-plus’s memory viewer, switch the mode in the dropdown to the format for the value you have. For example, if you have some mapped pointer and you’re looking for its ROM offset, set the mode to S-CPU bus. Switch it to Cartridge ROM if you have the ROM offset and need the mapped offset.

Type your offset into the address box below the dropdown and copy a bunch of bytes from there. Make sure the bytes you’re copying seem like a unique sequence (copying a large block of FF FF FF … is probably a bad idea). Switch the mode to the opposite setting (S-CPU bus to Cartridge ROM or vice versa) and search for the bytes you copied. You should arrive at the right place, with a few exceptions.

This works when searching for things within the ROM. If the game you’re searching in is a FastROM game and you’re looking for a mapped offset you’ll probably get two search hits. One will be in the slow mirror, and another will be in the fast mirror. See the reference section on memory mapping if you’re not sure which is which.

Style

Consistency is key. Keeping a consistent style will help people, including yourself, read your ASM and understand what’s going on. Let’s talk about a few ways I like to style my ASM.

Spacing

Picking a spacing style and sticking with it instantly makes your ASM much more readable. I see a lot of ASM that are conglomerates of vanilla code and custom stuff, sometimes written at different times by different people, resulting in frankencode.

Here’s a general style guide that I follow for spacing:

  1. Labels are left-aligned unless they’re in a child scope, where they’re tabulated to match the scope.
  2. Instructions are preceded by a tab unless they’re in a child scope, where they’re tabulated to match the scope + 1.
  3. Instructions with parameters get a single space between the parameters and the instruction.
  4. There’s a blank line between labels, code, assembler directives, and comments.
  5. There’s always whitespace between things and the comment symbol and the comment symbol always has a space between it and the comment.
  6. Local labels are tabulated to match instructions, allowing slick text editors to cleanly fold the entire routine.
  7. There’s a blank line between sections of code accomplishing different tasks and comments describing different things.

Made-up example:

rlFrobulator

	.autsiz
	.databank ?

	; Given an index in A,
	; return a pointer to the
	; corresponding widget in
	; lWidgetPointer

	; Inputs:
	; A: Index byte

	; Outputs:
	; lWidgetPointer: long pointer to widget

	php
	phb

	; Ensure that the full registers
	; get saved regardless of how we entered

	rep #$30

	phx
	pha

	sep #$20
	lda #`aFooTable
	pha
	rep #$20
	plb

	.databank `aFooTable

	pla
	and #$00FF
	sta lWidgetPointer ; Temp storage so we can do ID * 3
	asl a
	clc
	adc lWidgetPointer

	; Grab a pointer from aFooTable
	; and store it to lWidgetPointer

	tax
	lda aFooTable,x
	sta lWidgetPointer
	lda aFooTable+1,x
	sta lWidgetPointer+1

	plx
	plb
	plp
	rtl

Names

Next on the chopping block: naming conventions.

Names are impossible. Here are a couple of tricks to make them a bit less impossible:

  1. Have names describe what type of thing a thing is. I like to follow a few basic rules:

    • Routine names begin with r and are followed by l, s, or i depending on their return opcode.
    • Variables that are a single thing begin with b, w, or l, for byte, word, and long respectively.
    • Things that have more than one entry are prefixed with a for array (although array isn’t always the best way to describe them)
    • Other items get longer descriptive prefixes like menutext or event or what have you.
  2. Start routine names with a broad name for the group of code they belong to. Writing ASM for an inventory screen? Probably best to name your code rlInventoryWhatever or the like.

  3. Pick a case style and stick to it. You can see how I like to name things by looking at the last point.

  4. Pick a case style and name style for your files and folders, too. I prefer to keep folder names in all caps to delineate them from files. I also group files into groups of like code. Example: rlInventoryWhatever might be a routine in the file /ASM/Inventory.asm

  5. Keep consistent file extensions for easy grouping, syntax highlighting, searching, etc. I mostly go with .asm for most things, .event for anything to do with events, .inc for definitions, and .*bpp for graphics, where * is the bit depth. Remember, file extensions are just hints and don’t actually dictate the contents of a file.

  6. Use local labels where you can to break up long names. If a table is only used once by a single routine, it might be worth giving it a short local name. Example: Say I’ve got a table aInventoryWhateverThingamajigTable used by the previously-mentioned rlInventoryWhatever. I could probably get away with just calling it _aThingamajigTable. If I ever needed it outside of rlInventoryWhatever then I’d just have rlInventoryWhatever._aThingamajigTable.

Doc

Read it. When writing ASM it’s best to write yourself notes and comments in your code, so that you can come back to it at a later time and not be totally lost. It also helps other people read your code.

At the start of a routine it might be helpful to write out the inputs and outputs of the routine, along with things that it affects. The spacing example above shows how I like to do it.

When writing comments I prefer to keep them on their own lines, typing them before the thing that they describe. Generally, writing some notes on what a block of code is doing is much more useful than documenting each line. I think most people will tell you not to label every cat a cat, but while learning that’s perfectly acceptable. After all, there are so many opcodes and addressing modes that things can get confusing.

I can’t really tell you how to write doc, but writing something is better than nothing.

Final style notes

I think the best advice I can give you is to be consistent. The second-best advice is to go steal style ideas from better wizards. Go look at Stan’s stuff or the Unified FE Hacking Dropbox, or even the fe8u decomp.

Symbols

Back in the section about working with bsnes-plus I mentioned how you can define symbols and comments when debugging.

When we write ASM, we’re able to generate a list of symbols that includes anything we’ve defined in our ASM, such as routine locations, variables, and more.

When assembling ASM with 64tass, specify --vice-labels -l YourSymbolFileName.cpu.sym as commandline options, obviously replacing the name of the output file with whatever the name of the ROM you’re building is. Then, when loading that ROM with bsnes-plus, it’ll automatically use that file as your symbol list.

Note: The current (at the time of writing) version of bsnes-plus doesn’t play nice with the VICE format output by 64tass. In the symbol output, you’ll need to pad any symbols whose address is fewer than 6 digits with zeroes, e.g. 2e needs to become 00002e, or the name of the symbol will be cut off in bsnes-plus.
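
Padding by hand gets old fast, so here is a Python sketch that does it. Big assumption: it expects each symbol line to look like al 2e .SomeLabel (the letters al, a hex address, then the name), so check it against the file your version of 64tass actually writes. The function name is made up, and lines that don’t match that shape are left alone.

```python
import re

# Assumed line shape: "al <hex address> .<label name>"
LABEL_ADDRESS = re.compile(r"^(al[ \t]+)([0-9A-Fa-f]+)(?=\s)", re.MULTILINE)


def pad_symbol_addresses(text, digits=6):
    """Left-pad every matching label address with zeroes to the given width."""
    def pad(match):
        return match.group(1) + match.group(2).zfill(digits)
    return LABEL_ADDRESS.sub(pad, text)


example = "al 2e .bExample\nal 808000 .rlExample\n"
print(pad_symbol_addresses(example), end="")
```

To use it on a real file you’d read the symbol file into a string, pass it through pad_symbol_addresses, and write the result back out under the same name.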

Here are some small projects to give you some tips and tricks on how to think like an ASM hacker.

Project 1: Night chapter battles

One of FE5’s most notable features is its pitch-black fog. Accompanying this fog is a change to the backgrounds shown during battles. It’s easy to change which chapters have fog by changing a byte in the chapter’s chapter data. Where do we go to edit which chapters have matching night backgrounds, though?

Let’s start by making a list of chapters that use these backgrounds:

Chapter 2x
Chapter 4x
Chapter 8x
Chapter 11x
Chapter 12
Chapter 12x
Chapter 14x
Chapter 24x

We should probably convert these into their chapter IDs. I’ve got a set of NMMs for FE5 floating around that includes chapter data stuff.

$02 - Chapter 2x
$05 - Chapter 4x
$0A - Chapter 8x
$0E - Chapter 11x
$0F - Chapter 12
$10 - Chapter 12x
$13 - Chapter 14x
$21 - Chapter 24x

If we’re lucky, there’ll be a table in the game that contains this string of bytes. Let’s try looking in bsnes-plus’s memory viewer or a hex editor for 02 05 0A 0E 0F 10 13 21.

No dice. Let’s try again but with each chapter ID taking up a word. Sometimes it’s more convenient to load a whole word even if the data is only a byte. When you load a byte while your register is 16-bit, you need to AND the register with $00FF to strip the extra byte. Alternatively, you can set the register to 8-bit, load the data, and then switch back to 16-bit, but this is slow.

So, we search for 02 00 05 00 0A 00 0E 00 0F 00 10 00 13 00 21 00.
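
If you’d rather script these searches than type them into a search box, the two patterns can be built in Python. This is a sketch with made-up helper names; loading the ROM is left out, and the haystack below is a stand-in rather than real ROM data.

```python
NIGHT_CHAPTER_IDS = [0x02, 0x05, 0x0A, 0x0E, 0x0F, 0x10, 0x13, 0x21]


def as_byte_table(ids):
    """One byte per chapter ID."""
    return bytes(ids)


def as_word_table(ids):
    """One little-endian 16-bit word per chapter ID."""
    return b"".join(value.to_bytes(2, "little") for value in ids)


def find_all(haystack, needle):
    """Return every offset at which needle appears in haystack."""
    hits = []
    position = haystack.find(needle)
    while position != -1:
        hits.append(position)
        position = haystack.find(needle, position + 1)
    return hits


print(as_byte_table(NIGHT_CHAPTER_IDS).hex(" ").upper())
print(as_word_table(NIGHT_CHAPTER_IDS).hex(" ").upper())

# Stand-in data: the byte table planted at offset 4 of some filler.
fake_rom = b"\xFF" * 4 + as_byte_table(NIGHT_CHAPTER_IDS) + b"\xFF" * 4
print(find_all(fake_rom, as_byte_table(NIGHT_CHAPTER_IDS)))
```

An empty list from find_all is the scripted version of “no dice”.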

No luck, either. We’re going to need a different way to find how the game checks for these. We were searching before for a list of chapter IDs because it’d be likely that the game checks for specific chapters when determining if the chapter uses a night background. Why don’t we check for reads to the current chapter’s ID when initiating a battle?

Let’s boot up FE5. Conveniently FE5’s first chapter starts with a character in range so it’s super easy to set this up. Move Eyvel over to attack the enemy soldier, go through with attacking until you get to the battle forecast. Open up the debugger, start the battle, and click the Break button as the screen is transitioning to black.

We’re going to have to cheat on the learning experience a bit here. Rather than have you find where in RAM the current chapter ID is stored, I’m going to just tell you: $000E11. Part of being able to understand what is written/read in RAM is being able to understand the routines that read/write there. We’re trying to learn ASM here, which’ll help us learn how to interpret these things that read/write to RAM, which’ll help us write ASM, etc. etc. It’s a big cycle.

Next, in the main emulator window, set a savestate (Tools>Save Quick State and pick a slot). We’ll use this in case we don’t find what we’re looking for the first time, if we accidentally skip past it, or if we want a chance to trace through things again.

Back in the debugger window, hit Run.

Let the game run through the battle without doing anything. We want bsnes-plus to run through all the involved code first so that it can do its highlighting stuff. Afterward, hit Break again and load your savestate (in the main emulator window, Tools>Load Quick State and select the slot you used before.)

Now we’re going to do it for real. Open up the breakpoint editor and type E11 into the first address box. You can omit leading zeros in addresses. Click the R checkbox to break on reads to E11.

Hit Run in the debugger and let’s see what happens.

The first break you get should look something like:

...
83c4cf lda #$0031
83c4d2 sta $2f
83c4d4 lda $0e11
83c4d7 jsl $848933
83c4db lda $2f
83c4dd beq $c54a
83c4df lda $7e4fc4
83c4e3 bne $c541
83c4e5 lda $0345
83c4e8 and #$00ff
83c4eb bne $c54a
83c4ed php
83c4ee phb
83c4ef sep #$20
83c4f1 lda #$487e

It should have stopped at 83c4d4 lda $0e11. I’ve got no idea of what any of this is. Let’s see what that jsl $848933 does.

Hitting Step twice in the debugger should stop in the routine. You can step through the routine to see what it’s doing. The effect of this routine is that it writes a pointer at $2F. Step to the end of the routine (the rtl) and pull up the memory editor. At $2F there should be the bytes 31 88 84 (or something similar). If we reverse these bytes and put them together we get $848831. Let’s go there in the memory editor by typing 848831 into the address box.
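
Reversing bytes by hand is just little-endian decoding. If you ever need to do it in a script, a Python sketch looks like this (read_long is a made-up name):

```python
def read_long(data, offset=0):
    """Read a 3-byte little-endian pointer from data starting at offset."""
    return int.from_bytes(data[offset:offset + 3], "little")


pointer_bytes = bytes([0x31, 0x88, 0x84])  # the bytes as they sit in memory
print(f"${read_long(pointer_bytes):06X}")
```

The lowest byte comes first in memory, so 31 88 84 reads back as $848831.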

Well, the data here doesn’t look like what we’re looking for. The data doesn’t seem like a set of palettes (The first word here is 00 80 or $8000 and colors on the SNES don’t use the upper bit.), an image (it’s much too small, even for a compressed image, which would end with $FF), or a table for graphics.
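
The palette check in particular is easy to automate. SNES colors are 15-bit words with five bits each of red, green, and blue (red in the lowest bits) and the top bit unused, so a Python sketch of the test might look like this. The function names are made up, and passing the test only means the data could be a palette, not that it is one.

```python
def decode_color(word):
    """Split a 16-bit SNES color word into its 5-bit channels."""
    return {
        "red": word & 0x1F,
        "green": (word >> 5) & 0x1F,
        "blue": (word >> 10) & 0x1F,
        "unused_bit": (word >> 15) & 1,
    }


def could_be_palette(data):
    """True if every little-endian word in data has its upper bit clear."""
    if len(data) % 2:
        return False
    words = (int.from_bytes(data[i:i + 2], "little")
             for i in range(0, len(data), 2))
    return all((word & 0x8000) == 0 for word in words)


print(decode_color(0x8000))
print(could_be_palette(bytes([0x00, 0x80])))  # the first word found at $848831
```

The word $8000 has nothing in any color channel and only the unused bit set, which is why it doesn’t look like a color.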

I don’t think this is what we’re looking for. Before we spend a bunch of time here, make a note somewhere of the start of the routine we broke on ($83c4cf). Let’s click Run again.

Chances are you’ve ended up at the same place again. As it turns out, this routine seems like it’s run every frame, so it really isn’t what we wanted. If you set a break on read to E11 and watch what happens when normally playing a chapter you’ll get hits all the time. I saw this coming and that’s why I had you set your savestate during the fade out before battles. This way, you’ll hit this point as few times as possible.

Anyway, meta knowledge aside, we just need to keep hitting Run every time it breaks until we hit a routine we haven’t seen before.

It might take a while, but you should end up at a routine that looks like this

968189 pha
96818a lda #$0000
96818d lda $0e11
968190 cmp #$0002
968193 beq $81bb
968195 cmp #$0005
968198 beq $81bb
96819a cmp #$000a
96819d beq $81bb
96819f cmp #$000e
9681a2 beq $81bb
9681a4 cmp #$000f
9681a7 beq $81bb
9681a9 cmp #$0010
9681ac beq $81bb

with the cursor on the 96818d lda $0e11. This routine is a big list of comparisons before returning with carry set or clear. Here’s the full routine:

968189 pha
96818a lda #$0000
96818d lda $0e11
968190 cmp #$0002
968193 beq $81bb
968195 cmp #$0005
968198 beq $81bb
96819a cmp #$000a
96819d beq $81bb
96819f cmp #$000e
9681a2 beq $81bb
9681a4 cmp #$000f
9681a7 beq $81bb
9681a9 cmp #$0010
9681ac beq $81bb
9681ae cmp #$0013
9681b1 beq $81bb
9681b3 cmp #$0021
9681b6 beq $81bb
9681b8 clc
9681b9 bra $81bc
9681bb sec
9681bc pla
9681bd rtl

Your disassembler window probably doesn’t have the 9681bb sec because it’s never been encountered yet.

Let’s see if this is what we’re looking for. Click Step until you reach the first beq. While the disassembler has this instruction highlighted green, click the checkbox for the Z flag in the debugger. Click step and you should arrive at 9681bb. Hit Run and watch what happens.

There we go! The night background appears behind our battlers. If you were paying attention to the routine, you might’ve noticed that it has a comparison for each chapter that’s supposed to be a night chapter. This big block of comparisons is ugly and large, but ultimately is a very fast solution if there are only a handful of things to check. I’m going to break down this routine into some easily digestible pieces.

prlNightChapterCheck ; 96/8189

	; Assembler info

	.al
	.autsiz

	; Routine

	; Save A because we're using it

	pha

	; This isn't needed and is probably
	; a compiler derp

	lda #$0000

	; Get current chapter and compare
	; to our night chapters

	lda $0E11

	cmp #$0002 ; 2x
	beq _Found

	cmp #$0005 ; 4x
	beq _Found

	cmp #$000A ; 8x
	beq _Found

	cmp #$000E ; 11x
	beq _Found

	cmp #$000F ; 12
	beq _Found

	cmp #$0010 ; 12x
	beq _Found

	cmp #$0013 ; 14x
	beq _Found

	cmp #$0021 ; 24x
	beq _Found

	; If the CPU reaches here, then the
	; current chapter isn't a night chapter,
	; so we return with carry clear

	clc
	bra _End

	; If the CPU reaches here, then the
	; current chapter is a night chapter,
	; so we return with carry set

	_Found

	sec

	_End

	pla
	rtl

Hardcoding a bunch of chapter IDs to be night chapters is very inconvenient. Let’s change this to read from a table of chapter IDs so that it’s easy to edit which chapter is a night chapter. We know that the routine that calls this night chapter check expects that it return with carry clear if it’s not a night chapter and carry set if it is a night chapter.

Let’s write some pseudocode:

x = 0
while true:
	get table_entry from table+x
	if table_entry == table_end:
		return false
	elif table_entry == current_chapter:
		return true
	x += 1

There are some important things for us to think about before we get to writing our routine. First, is this routine called via JSL or JSR? If it was called with JSR then we’d have to do one of three things:

  • We could try fitting our replacement routine in the space used by the original routine.
  • We could use the space used by the original routine to jump out into free space.
  • We could replace all calls to this routine with calls to our new routine, which we’d be putting in free space in the same bank.

The first option is hard to pull off with small original routines. The third option is impossible if there isn’t enough free space within the same bank. Luckily for us, the routine we’re replacing is called via JSL, giving us slightly better options:

  • We could try fitting our replacement routine in the space used by the original routine.
  • We could use the space used by the original routine to jump out into free space.
  • We could replace all calls to this routine with calls to our new routine, which we could put in free space anywhere in the ROM.

Replacing all calls to a routine can be incredibly difficult. Not every call is easy to find. Pointers to routines might be stored in pointer tables rather than baked into ASM, the pointers might be calculated by adding an offset to some base, etc.

Before we get too distracted, let’s move on to the second thing to consider: What will this table look like?

What will each entry look like? How will it end?

We’ve got an idea of what each entry should be like: Each should probably be the chapter ID of a night chapter. We know that chapter IDs aren’t ever going to be more than a byte, so we can save a considerable amount of space by making each entry in our table only a byte.

The ending of a table is important because we need some way to tell if we’ve hit the end of the table. One way is to have an entry that signifies the end. Another is to compare our offset in the table to the length of the table. We’re going to go with the end entry because it’s simpler and faster.

When we load an entry, the lda opcode sets two flags depending on the data: n, the negative flag, and z, the zero flag. These give us two special types of values we can use. We can have the end entry be $00 and use beq to hop to our return false part. We can also use a byte with its upper bit set, like $80 or $FF, along with bmi to get to our return false part. (Remember, n doesn’t mean that a number is a signed negative number, only that the upper bit is set.)

There are many different ways to do this. These are just the best options I can think of for how I want to do it. There are an infinite number of solutions to most ASM problems.

Anyway, let’s think about these two options. For the first, $00 makes a bad ending entry because the first chapter of the game has the ID $00. Our new system would exclude chapter 1 from being a night chapter. If this doesn’t bother you, you’re free to use it.

The other option, a negative number, is much better in my opinion. This leaves us with $00-$7F available as chapters. That’s more than we could ever hope to use. I’m going to go with this solution.
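
To make the two candidate terminators concrete, here’s a small Python sketch of which branch each kind of byte would trigger after an 8-bit lda. The function name is made up; it only models the n and z flags, nothing else about the CPU.

```python
def lda_flags_8bit(value):
    """Return (n, z) as an 8-bit lda would set them for the loaded byte."""
    value &= 0xFF
    n = (value & 0x80) != 0  # n mirrors bit 7 of the loaded byte
    z = value == 0           # z is set only when the byte is zero
    return n, z


for byte in (0x00, 0x21, 0x7F, 0x80, 0xFF):
    n, z = lda_flags_8bit(byte)
    taken = "bmi" if n else ("beq" if z else "neither")
    print(f"${byte:02X}: n={int(n)} z={int(z)} -> {taken}")
```

Everything from $00 to $7F leaves n clear, which is why those values stay usable as chapter IDs when the terminator is detected with bmi.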

Let’s do this. Create a new file in your text editor. We need to give our table a descriptive name so we can recognize its purpose at a glance. I’ve chosen aNightChapterTable for mine. The a at the beginning is a reminder that this thing is an array. It’s good to give yourself hints as to what the purpose, usage, and layout of something is.

After coming up with a name we should probably include some comments on what the table is, its format, and how it ends.

Here’s my table file so far:


aNightChapterTable

	; This table is for determining if a
	; chapter uses a darker (nighttime)
	; battle background. If the chapter's
	; ID is in the table, battles will use
	; the night backgrounds. The table is ended
	; with a byte with its uppermost bit set.

Next, let’s build the body of the table. I’m going to keep the vanilla chapters in it for now.

	; These are the vanilla chapters that
	; use the night battle backgrounds

	.byte  2 ; 2x
	.byte  5 ; 4x
	.byte 10 ; 8x
	.byte 14 ; 11x
	.byte 15 ; 12
	.byte 16 ; 12x
	.byte 19 ; 14x
	.byte 33 ; 24x

I wrote out the chapter IDs in decimal because I was lazy and happened to be reading them from a text file. You can specify them in hexadecimal, binary, or even not at all if you don’t want the vanilla chapters in the table.

Then, I added an entry for testing. Rather than playing to one of the chapters above, I’d rather be able to immediately see the results of my work, so I added:

	; Let's have an entry for chapter 1

	.byte  0

I ended the table with the ending byte:

	.char -1 ; This is the terminator

Here’s the whole thing:


aNightChapterTable

	; This table is for determining if a
	; chapter uses a darker (nighttime)
	; battle background. If the chapter's
	; ID is in the table, battles will use
	; the night backgrounds. The table is ended
	; with a byte with its uppermost bit set.

	; These are the vanilla chapters that
	; use the night battle backgrounds

	.byte  2 ; 2x
	.byte  5 ; 4x
	.byte 10 ; 8x
	.byte 14 ; 11x
	.byte 15 ; 12
	.byte 16 ; 12x
	.byte 19 ; 14x
	.byte 33 ; 24x

	; Let's have an entry for chapter 1

	.byte  0

	.char -1 ; This is the terminator
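
If you’re curious what bytes this table turns into, here’s a rough Python sketch that packs the same values. It assumes .byte emits unsigned bytes and .char emits signed bytes, which is how the table above uses them; the variable names are my own.

```python
import struct

vanilla_ids = [2, 5, 10, 14, 15, 16, 19, 33]
test_ids = [0]  # the extra chapter 1 entry added for testing

entries = vanilla_ids + test_ids

# "B" packs unsigned bytes (like .byte), "b" packs a signed byte (like .char)
table = struct.pack(f"{len(entries)}B", *entries)
table += struct.pack("b", -1)

print(table.hex(" "))  # 02 05 0a 0e 0f 10 13 21 00 ff
```

The signed value -1 comes out as $FF, which has its uppermost bit set, so it works as the terminator described in the comments.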

Alright, cool. We’ve got a table, now let’s build the thing that reads from it.

Once again we start with a name. I’m going to name this the same name I gave to the vanilla routine, as it serves the same purpose. Next, we need to figure out the register sizes this routine starts with. From breaking earlier we have a printout in our debugger window that shows the flags at the time of the break:

96818d lda $0e11     [800e11] A:0000 X:0012 Y:0000 S:1fdc D:0000 DB:80 nvmxdiZc V: 21 H:125 F: 4

Notice the part that says nvmxdiZc? This is a representation of the flags, with a lowercase letter meaning that it’s unset. A capital letter means that the flag is set. We see that this routine entered with all registers as 16-bit registers, so our routine might start something like:

prlNightChapterCheck

	; Assembler info

	.al
	.xl
	.autsiz

	; Routine
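
The flag string from the debugger line can be decoded mechanically. Here’s a minimal Python sketch, assuming the string always lists the eight native-mode flags in the order nvmxdizc like the printout above does. The function name is hypothetical.

```python
def register_sizes(flag_string):
    """Decode the m and x flags from a debugger string such as 'nvmxdiZc'."""
    names = "nvmxdizc"
    if len(flag_string) != 8 or flag_string.lower() != names:
        raise ValueError("expected the eight flags nvmxdizc in that order")
    # A capital letter means the flag is set
    flags = {name: char.isupper() for name, char in zip(names, flag_string)}
    return {
        "A": 8 if flags["m"] else 16,    # m set means 8-bit accumulator
        "X/Y": 8 if flags["x"] else 16,  # x set means 8-bit index registers
    }


print(register_sizes("nvmxdiZc"))  # {'A': 16, 'X/Y': 16}
```

With m and x both lowercase the result is 16-bit for everything, which is what .al and .xl tell the assembler.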

After this comes the pushing of registers we need to preserve.

Although we know that this routine is entered with 16-bit registers, that might not always be the case, so we have to set things up to handle that. We start by pushing processor flags because we need to leave with sizes being the same as how we entered.

You can always push 8-bit registers like DB or K without knowing the size of A or X/Y. Typically we push these first, and then set register sizes before pushing A and/or X/Y. Here we do just that, setting all registers to 8-bit and then pushing them. We set our sizes to 8-bit here because our entries are 8-bit and we won’t have more entries than there are chapters, so X/Y could be 8-bit too. One caveat: switching X/Y to 8-bit zeroes their upper bytes, so only the lower byte of X is actually preserved here.

So far, we have:

prlNightChapterCheck

	; Assembler info

	.al
	.xl
	.autsiz

	; Routine

	; We generally push p first
	; because subsequent opcodes will
	; affect flags

	php

	; We're going to change DB to the
	; bank our table is in.

	phb

	; Chapter numbers are bytes
	; There really isn't a chance for
	; there to be more than 255 chapters

	sep #$30
	pha
	phx

Next up, we should set DB to the bank our table is going to be in. We don’t know where that’ll be, specifically, but the assembler will figure it out.

Remember that the best way to set DB is by using plb. We can load the bank into A, push it, and then pull it into DB:

	; Set DB to our table's bank

	lda #(aNightChapterTable>>16)
	pha
	plb

64tass is supposed to have a special thing for extracting the bank from an address, but I can never get it to work as expected, so we shift our (24-bit) pointer by 16 to end up with an 8-bit bank byte.

This is a good time to mention that 64tass is really good about doing math on things like this.
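
Here’s the same shift spelled out in Python, in case the math isn’t obvious. The table address is an assumption purely for illustration (pretend the table landed at $BFB704), and the helper names are made up.

```python
def bank_byte(address):
    """Top 8 bits of a 24-bit address, i.e. address >> 16."""
    return (address >> 16) & 0xFF


def low_word(address):
    """Remaining lower 16 bits of a 24-bit address."""
    return address & 0xFFFF


table_address = 0xBFB704  # hypothetical location of the table

print(f"bank: ${bank_byte(table_address):02X}")    # bank: $BF
print(f"offset: ${low_word(table_address):04X}")   # offset: $B704
```

The bank byte is what ends up in DB, and the lower 16 bits are all an absolute-addressed opcode needs once DB is set.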

Up next, we prepare a loop counter to use:

	; We use X as a loop counter and we start at 0

	ldx #$00

Now we build the loop. Recall the pseudocode from earlier:

x = 0
while true:
	get table_entry from table+x
	if table_entry == table_end:
		return false
	elif table_entry == current_chapter:
		return true
	x += 1

Since we’re at the start of the loop, we should put a label here so that we can come back here.

Then, we grab an entry from the table. I mentioned that lda sets flags when it loads data, so we can immediately use a conditional branch afterward to jump if we’ve hit the end of the table.

Next, we need to compare the value we loaded to the current chapter’s ID. If they match, we need to jump to the part that returns true.

After that we need to set up what happens if neither of the above things happened. We increment our loop counter by the size of an entry and then we jump back to the start of the loop.

Lucky for us, the size of our entries is a single byte, and the 65816 has a convenient opcode for adding one: inc. (Written as inc x below, this is the same instruction you’ll see listed elsewhere as inx.) Here’s a look at how I’m doing this:

	; An underscore at the start of a label means that the label
	; is local and can reuse label names.

	_Loop

	; We grab an entry from the table and check if it's the end

	lda <>aNightChapterTable,x
	bmi _NotFound

	; Compare to our target chapter number

	cmp $0E11
	beq _Found

	; If the CPU reaches here, the table entry didn't match
	; our target chapter number, so we increment the loop
	; counter and try again

	inc x
	bra _Loop

Here the <> before aNightChapterTable,x means get the lower 16 bits of this and keeps the assembler from making this a longer opcode. We can do this because we set DB earlier.

You can see that the conditional branches refer to two labels, _NotFound and _Found. Let’s define them:

	_Found

	; If the CPU reaches here then the table entry matched
	; the target chapter, and we need to wrap up and return
	; with the carry flag set.

	; Pull our registers in the reverse of the order they were pushed

	plx
	pla
	plb
	plp

	; We return with carry set

	sec
	rtl

	_NotFound

	; If the CPU reaches here then the target chapter was not found
	; in the table. We need to wrap up and return with the carry flag clear.

	; Pull our registers in the reverse of the order they were pushed

	plx
	pla
	plb
	plp

	; We return with carry clear

	clc
	rtl
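
The stack is last-in, first-out, so whatever was pushed last has to be pulled first. Here’s a tiny Python sketch of the pushes and pulls this routine does, using a plain list as a stand-in for the stack. The names are just labels, not real register contents.

```python
stack = []


def push(name):
    stack.append(name)


def pull():
    return stack.pop()


# Entry: php, phb, pha, phx
for register in ("P", "DB", "A", "X"):
    push(register)

# Setting DB: pha then plb is a balanced pair and leaves the stack as it was
push("table bank")
pull()

# Exit: plx, pla, plb, plp
pulled = [pull() for _ in range(4)]
print(pulled)  # ['X', 'A', 'DB', 'P']
```

If the pulls came out in any other order, each register would get some other register’s value and rtl would likely return to garbage.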

And then, all together:

prlNightChapterCheck

	; Assembler info

	.al
	.xl
	.autsiz

	; Routine

	; We generally push p first
	; because subsequent opcodes will
	; affect flags

	php

	; We're going to change DB to the
	; bank our table is in.

	phb

	; Chapter numbers are bytes
	; There really isn't a chance for
	; there to be more than 255 chapters

	sep #$30
	pha
	phx

	; Set DB to our table's bank

	lda #(aNightChapterTable>>16)
	pha
	plb

	; We use X as a loop counter and we start at 0

	ldx #$00

	; An underscore at the start of a label means that the label
	; is local and can reuse label names.

	_Loop

	; We grab an entry from the table and check if it's the end

	lda <>aNightChapterTable,x
	bmi _NotFound

	; Compare to our target chapter number

	cmp $0E11
	beq _Found

	; If the CPU reaches here, the table entry didn't match
	; our target chapter number, so we increment the loop
	; counter and try again

	inc x
	bra _Loop

	_Found

	; If the CPU reaches here then the table entry matched
	; the target chapter, and we need to wrap up and return
	; with the carry flag set.

	; Pull our registers in the reverse of the order they were pushed

	plx
	pla
	plb
	plp

	; We return with carry set

	sec
	rtl

	_NotFound

	; If the CPU reaches here then the target chapter was not found
	; in the table. We need to wrap up and return with the carry flag clear.

	; Pull our registers in the reverse of the order they were pushed

	plx
	pla
	plb
	plp

	; We return with carry clear

	clc
	rtl

You could optimize this routine a bit by switching up how things are pushed and then pulled at the end, but we’re not going to get into that.

Now we need some way to install our routine and table. Back in the tips and tricks section I showed an example buildfile. Let’s grab something to use as a base.


.cpu "65816"

* = $000000

.binary "fe5.sfc"

; Place your inclusions that go in fixed locations here

; Place your inclusions that go wherever here

* = $1FB704
.logical $BFB704

.here

$1FB704 is a really large block of free space that’s useful to remember, similar to FE8’s $B2A610. Let’s put our things into this free space.

; Place your inclusions that go wherever here

* = $1FB704
.logical $BFB704

.include "NightChapterCheck.asm"
.include "NightChapterTable.asm"

.here

Next, we need to have the game call our new routine from the old one.

We could have a separate file that puts our jump in a specific place. We call this a hook.


* = $0B0189
.logical $968189

	; Replacing the vanilla night chapter check

	jsl prlNightChapterCheck
	rtl

.here

This puts our hook over the start of the old routine. Now, when the game jumps to the old night check it’ll then jump to our replacement hack.

Save this as its own file and add it to your buildfile:


.cpu "65816"

* = $000000

.binary "fe5.sfc"

; Place your inclusions that go in fixed locations here

.include "NightChapterCheckHook.asm"

; Place your inclusions that go wherever here

* = $1FB704
.logical $BFB704

.include "NightChapterCheck.asm"
.include "NightChapterTable.asm"

.here
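
In the buildfile, each * = line is an offset into the ROM file and each .logical line is the address the CPU sees for that same spot. Here’s a rough Python sketch of how the two relate, assuming a headerless ROM with plain LoROM mapping. That’s my assumption, but it matches both address pairs used in this guide. The function name is made up.

```python
def lorom_to_file_offset(address):
    """Convert a 24-bit LoROM address to an offset in a headerless ROM file."""
    bank = (address >> 16) & 0x7F  # banks $80+ mirror banks $00-$7F
    offset = address & 0xFFFF
    if offset < 0x8000:
        raise ValueError("ROM only appears at $8000-$FFFF in this simple model")
    # Each bank contributes $8000 bytes of ROM
    return (bank << 15) | (offset & 0x7FFF)


print(f"${lorom_to_file_offset(0xBFB704):06X}")  # $1FB704
print(f"${lorom_to_file_offset(0x968189):06X}")  # $0B0189
```

This is handy for double checking that a * = and .logical pair actually point at the same place before assembling.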

Whether you assemble your stuff with a batch file, a makefile, or via the command line, I don’t really care. The command to assemble this would look something like:

64tass -f -o example1.sfc Example1Buildfile.asm

It should give you a warning about program counter overflows, which you can ignore. It’s because we included the entire base ROM and that crossed literally every bank boundary.

Anyway, assuming you added chapter 1’s ID to the table, you can boot up the assembled ROM and start a fight to see if it worked.

Now, for bonus points we can check to see if our new routine will fit over the original. Going back and looking at the original routine, it takes up $968189-$9681BE which is $35 bytes.

Opening the ROM in bsnes-plus and using the memory editor to go to $BFB704, we see that our routine takes up $BFB704-$BFB729 or $25 bytes. We’re quite fortunate that our replacement is smaller than the original, because we don’t need to use a hook and can simply overwrite the original routine.
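
The size comparison is just subtraction. Here’s a quick Python sketch of it, treating each range the same way the numbers above do (end minus start). The helper name is made up.

```python
def span_size(start, end):
    """Size of a range, computed as end minus start."""
    return end - start


original_size = span_size(0x968189, 0x9681BE)
replacement_size = span_size(0xBFB704, 0xBFB729)

print(f"original: ${original_size:02X}")        # original: $35
print(f"replacement: ${replacement_size:02X}")  # replacement: $25
print(replacement_size <= original_size)        # True
```

The replacement comes in $10 bytes under the original, so it fits with room to spare.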

We ditch the hook entirely, move the check routine out of free space and wrap its contents in the

* = $0B0189
.logical $968189

.here

bit.

My buildfile now looks like


.cpu "65816"

* = $000000

.binary "fe5.sfc"

; Place your inclusions that go in fixed locations here

.include "NightChapterCheck.asm"

; Place your inclusions that go wherever here

* = $1FB704
.logical $BFB704

.include "NightChapterTable.asm"

.here

and my NightChapterCheck.asm file looks like


* = $0B0189
.logical $968189

prlNightChapterCheck

	; Assembler info

	.al
	.xl
	.autsiz

	; Routine

	; We generally push p first
	; because subsequent opcodes will
	; affect flags

	php

	; We're going to change DB to the
	; bank our table is in.

	phb

	; Chapter numbers are bytes
	; There really isn't a chance for
	; there to be more than 255 chapters

	sep #$30
	pha
	phx

	; Set DB to our table's bank

	lda #(aNightChapterTable>>16)
	pha
	plb

	; We use X as a loop counter and we start at 0

	ldx #$00

	; An underscore at the start of a label means that the label
	; is local and can reuse label names.

	_Loop

	; We grab an entry from the table and check if it's the end

	lda <>aNightChapterTable,x
	bmi _NotFound

	; Compare to our target chapter number

	cmp $0E11
	beq _Found

	; If the CPU reaches here, the table entry didn't match
	; our target chapter number, so we increment the loop
	; counter and try again

	inc x
	bra _Loop

	_Found

	; If the CPU reaches here then the table entry matched
	; the target chapter, and we need to wrap up and return
	; with the carry flag set.

	; Pull our registers in the reverse of the order they were pushed

	plx
	pla
	plb
	plp

	; We return with carry set

	sec
	rtl

	_NotFound

	; If the CPU reaches here then the target chapter was not found
	; in the table. We need to wrap up and return with the carry flag clear.

	; Pull our registers in the reverse of the order they were pushed

	plx
	pla
	plb
	plp

	; We return with carry clear

	clc
	rtl

.here
