It has occurred to me that the forum has a dearth of resources pertaining to music hacking (largely my fault). So I’m copying and pasting this explanation which I gave to another user on Serenes. I’ve come to understand that music hacking is poorly documented amongst FE communities; this should hopefully clear up at least some things with regard to Sappy. I’m hoping for this to become part of a larger overarching tutorial which will map out music insertion & hacking from start to finish.
This is only to briefly explain what you’re looking at when you open Sappy. Much of this information is actually available in the long-ass doc included with the program, but heck, I’ll walk you through it anyway.
Table, Header, Voice, Speed
These numbers will become apparent if you hit the arrow off to the right, which you already know how to do. I’m going to explain each of these, one by one.
“Table” is the location within the game’s song table of the pointer to the song. Songs and sounds in FE are arranged into a giant table of pointers, each of which points to a “header”.
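To make the table idea a bit more concrete, here is a rough Python sketch of following one table entry to its header. It assumes the commonly described layout of 8 bytes per entry (a 4-byte little-endian pointer followed by 4 bytes of player info) and that a ROM address is a file offset plus 0x08000000. The function name and the toy ROM are made up for illustration, so treat this as a sketch rather than a description of what Sappy itself does.

```python
import struct

ROM_BASE = 0x08000000  # GBA ROM addresses start here; file offset = address - ROM_BASE
ENTRY_SIZE = 8         # assumed: 4-byte header pointer + 4 bytes of player info


def header_offset_for_song(rom: bytes, table_offset: int, song_index: int) -> int:
    """Return the file offset of the header that a song table entry points to."""
    entry = table_offset + song_index * ENTRY_SIZE
    (pointer,) = struct.unpack_from("<I", rom, entry)
    return pointer - ROM_BASE


# Toy "ROM": a two-entry table at offset 0 whose second entry points to 0x08000020.
toy_rom = bytes.fromhex("1000000800000000" "2000000801000100") + bytes(32)
print(hex(header_offset_for_song(toy_rom, 0, 1)))  # 0x20
```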
“Header” is… well, it’s a header. It’s difficult to explain what this is because it varies from one use to the next, but in the case of track data, it’s a set of data that tells the game how many tracks the song has, and where to find those tracks.
Specifically, a song header starts with 4 bytes: the first tells the game how many tracks the song uses, the next two are unused (the .s file labels them NumBlks and Priority), and the last holds a “reverb” value, which has by all accounts been proven to be actually unused. After that come 4 bytes for the pointer to the voicetable/instrument table, then one 4-byte pointer for each track. The end result looks like this:
02 00 00 80 40 50 60 08 11 11 11 08 22 22 22 08
Broken down, this would be:
02 00 00 80 // number of tracks (2); nothing; nothing; "reverb value" (unused)
40 50 60 08 // a pointer to the voicetable/instrument table
11 11 11 08 // pointer to track 1
22 22 22 08 // pointer to track 2
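If you’d rather see that breakdown done by a program, here is a minimal Python sketch that splits the example bytes the same way. It assumes the pointers are little-endian (which is why 40 50 60 08 reads as 0x08605040) and that subtracting 0x08000000 gives the offset inside the ROM file; parse_song_header is a made-up name for illustration, not something Sappy provides.

```python
import struct

ROM_BASE = 0x08000000  # subtract this from a pointer to get an offset into the ROM file


def parse_song_header(data: bytes) -> dict:
    """Split a song header into track count, reverb byte, and pointers."""
    # First 4 bytes: track count, two bytes treated as unused here, reverb value.
    num_tracks, _byte1, _byte2, reverb = struct.unpack_from("<4B", data, 0)
    # Next 4 bytes: pointer to the voice (instrument) table.
    (voice_ptr,) = struct.unpack_from("<I", data, 4)
    # Then one 4-byte pointer per track.
    track_ptrs = struct.unpack_from(f"<{num_tracks}I", data, 8)
    return {
        "tracks": num_tracks,
        "reverb": reverb,
        "voices": voice_ptr,
        "track_pointers": list(track_ptrs),
    }


example = bytes.fromhex("02 00 00 80 40 50 60 08 11 11 11 08 22 22 22 08")
header = parse_song_header(example)
print(header["tracks"])                            # 2
print(hex(header["voices"]))                       # 0x8605040
print([hex(p) for p in header["track_pointers"]])  # ['0x8111111', '0x8222222']
print(hex(header["voices"] - ROM_BASE))            # 0x605040
```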
In your .s file the header is written out as text, and it gets converted to hex when Sappy assembles the file. It might look something like this:
daysoftrainingA:
.byte 16 @ NumTrks
.byte 0 @ NumBlks
.byte daysoftrainingA_pri @ Priority
.byte daysoftrainingA_rev @ Reverb.
.word daysoftrainingA_grp
.word daysoftrainingA_1
.word daysoftrainingA_2
.word daysoftrainingA_3
.word daysoftrainingA_4
.word daysoftrainingA_5
.word daysoftrainingA_6
.word daysoftrainingA_7
.word daysoftrainingA_8
.word daysoftrainingA_9
.word daysoftrainingA_10
.word daysoftrainingA_11
.word daysoftrainingA_12
.word daysoftrainingA_13
.word daysoftrainingA_14
.word daysoftrainingA_15
.word daysoftrainingA_16
Throughout the .s file you’ll see “labels”, which act like little bookmarks and look like this:
daysoftrainingA_1:
So the header is basically the game’s way of saying “so this is a song with 16 tracks, and here’s where to find all those tracks”. Sappy automatically updates the pointers to the tracks every time you insert the song, so there’s no manual writing to a hex editor needed.
“Voices” is the location of another table which is composed of, yep, you guessed it, voices. In FE7 each song has its own voice table for reasons unknown to me; this is uncommon practice and most games would use a unified voice table. In FE7, each voice table ONLY contains the samples each song uses, and everything else is by default set to a Square wave, which I’ll explain in a little bit.
Manipulation of a voicetable is absolutely crucial to good music hacking, and perhaps one day I’ll teach you how and why, but we’ll focus on the task at hand for now.
“Speed” is the tempo of the song in beats per minute. This is determined in your .s file by the TEMPO command, which looks a bit like this:
.byte TEMPO , 126*birthholyknight_tbs/2
As you can see, TEMPO has been divided by two. Why is this? Well, here’s the deal: the game only uses a maximum of 1 byte for a TEMPO parameter; therefore, the maximum tempo for any song would normally be 255. The Sappy engine gets around this by doubling the TEMPO parameter to get the final tempo. Meaning, writing 0x64, or 100, as your TEMPO would actually cause the song to play at 200 beats per minute.
This does occasionally manifest as a mild loss of detail, as the game cannot play odd-numbered tempos (e.g. 125, 155, etc.). It is particularly noticeable in very slow songs, and when a song is meant to step through tempos such as 88, 87, 86, 83 in short succession. This is simply a limitation of the system, unfortunately.
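The halving and doubling is easy to check with a few lines of Python. This is only a sketch of the arithmetic described above: tempo_byte and playback_bpm are made-up names, and rounding down on odd tempos is an assumption based on the integer division in the .s expression.

```python
def tempo_byte(bpm: int) -> int:
    """Byte stored for the TEMPO command: the requested BPM halved, rounded down."""
    if not 0 <= bpm <= 510:
        raise ValueError("halved tempo must fit in one byte (0 to 255)")
    return bpm // 2


def playback_bpm(stored: int) -> int:
    """Tempo the engine ends up playing: the stored byte doubled."""
    return stored * 2


print(playback_bpm(0x64))  # 200

# A gradual slowdown loses its odd-numbered steps.
for bpm in (88, 87, 86, 83):
    print(bpm, "->", playback_bpm(tempo_byte(bpm)))
# 88 -> 88
# 87 -> 86
# 86 -> 86
# 83 -> 82
```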
Track data
Then there’s the actual track data. “Location” is fairly self-explanatory; it’s the location of the track data, i.e. where the game is reading the track data from. This usually isn’t very useful information.
Note parameters
Then there are these other coloured numbers: the parameters. Any time a note plays (and even when it doesn’t) it’s going to have a set of parameters as determined by the track data in the .s file. The first number, the red number, determines which voice the track is currently using. In your own .s file you should see something resembling this:
.byte VOICE , 5
The second number (orange) is the current note velocity, which you can interpret as the overall volume of the note. This is determined by “vxxx” next to each note in the .s file, such as “v028”.
The next is a green number, which you may or may not recognise as the master volume, and is determined by this command:
.byte VOL , 110*daysoftrainingA_mvl/mxv
The blue number is “modulation”, which if you don’t know what that is, don’t ask me, because I don’t either. It’s a musical production term so it’s beyond my scope, but you can think of it like “vibrato”.
Then the purple number is “pitch bend”, which is determined by the BEND command in the .s file, and looks like this:
.byte BEND , c_v+0
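For the curious, the “c_v+0” part is just a centre value plus an offset. The Python sketch below assumes c_v is 64 (0x40), as it is usually defined in the macro file the .s file includes, and that the usable offsets run from -64 to +63; bend_byte is a made-up helper name.

```python
C_V = 0x40  # assumed centre value (64), mirroring the c_v constant used in the .s file


def bend_byte(offset: int) -> int:
    """Byte written for BEND: the centre value plus a signed offset."""
    if not -64 <= offset <= 63:
        raise ValueError("offset must be between -64 and +63")
    return C_V + offset


print(bend_byte(0))    # 64, no bend
print(bend_byte(-64))  # 0, bent fully down
print(bend_byte(63))   # 127, bent fully up
```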
Finally, off to the side, you’ll see a bit of text saying things like “Noise”, or “Square1”, or “Direct”. These are telling you what type of instrument is being played.
"Direct" sounds are samples being played. Imagine recording yourself humming a middle C and then digitally altering the pitch to make a song out of it; that’s what a Direct sound is.
"Drum" is a set of “Direct” sounds in a separate table. Imagine instead that you record yourself clapping, then clicking, then stomping your foot, and make a rhythm part out of those recordings; that’s what a “Drum” is. You’d tell the game to play the “stomp foot” sound every time a C comes up, a “clap” every time it’s a D, and so on and so forth. Then you’d make the game play C, D, C, D and it would output stomp, clap, stomp, clap. “Multi” is something very similar, but we won’t go into that here.
“Square1”, “Square2”, “Wave” and “Noise” are what we call generated waveforms. You remember the NES and its beeping and booping, right? This is the exact same sort of thing. Generated waveforms are mostly outdated now, but they have their uses. For example, the game uses them for sound effects. FE7 on its default setting can only play 8 Direct sounds at once (which can be increased to 12). If it then tries to play another Direct sound effect on top of that, one of the music tracks cuts out. You can often hear this happening when you switch between stat screens and the “wop” sound plays, because the “wop” sound is a Direct sound, not a generated waveform. By using generated waveforms for sound effects, the game doesn’t have to cut music tracks as often.
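Going back to the “Drum” idea for a moment, the note-to-sample lookup can be sketched in a few lines of Python. The kit contents and the function name are purely illustrative; a real drum table points at sample data rather than holding words like “stomp”.

```python
# Hypothetical drum kit: each note name picks a different recorded sound.
DRUM_KIT = {"C": "stomp", "D": "clap"}


def render_drum_part(notes, kit):
    """Look up which sample each note in the rhythm part would trigger."""
    return [kit[note] for note in notes]


print(render_drum_part(["C", "D", "C", "D"], DRUM_KIT))
# ['stomp', 'clap', 'stomp', 'clap']
```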
Hopefully this clears things up for some people. Please let me know if there’s something you don’t understand so I can rewrite the tutorial to make things clearer! Feel free to ask questions, as well.