Sound Sample Format Explained

Full credit goes to Bregalad for this information - he’s the one who wrote the Sappy doc. Which you should read, because it basically tells you everything I’m about to tell you.

Introduction


In the last tutorial when I explained the voice table format I often spoke of a pointer to a sample. I’m now going to teach you what that sample is.

When sounds are inserted into the game, they are inserted as samples, regardless of whether or not they are instruments, sound effects, percussion instruments, etc. Musical instruments only happen to be perfectly tuned at C (usually) so that modifying them at certain frequencies creates music.

Samples are typically inserted as uncompressed WAVs, which tend to be very big in size. The lower the sampling frequency of the WAV, the less space it consumes, and the less processing power it takes to play. GBA samples usually use fairly low sampling frequencies so as to keep both of these to a minimum.

If you find pointers in a voice table and follow them to an actual address, as shown by Address1 in the picture above, you’ll find data that looks like this (EXAMPLE ONLY) in a hex editor:

00 00 00 00 00 4A 0B 17 01 00 00 00 00 27 95 00

Or this:

00 00 00 40 00 18 A2 01 4C 13 00 00 7A 1A 00 00

Both of these will be followed by what appears to be junk data. It isn’t.

Sample Format


You’ve just encountered a sample header. This is a 16-byte long bit of data that gives the game some information about the sound it’s about to play. It can be broken down into 4 words. A word is a set of 4 bytes stored in little endian, meaning the least significant byte comes first, so in a hex editor the bytes look backwards compared to how you’d write the number. Uh… Don’t ask me why, I don’t know. Read the Wikipedia article!
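To make the "backwards" part concrete, here is a minimal Python sketch. The helper name `read_word` is just something I made up for illustration; it isn't part of Sappy or any tool.

```python
def read_word(raw: bytes) -> int:
    """Interpret 4 bytes as one little-endian word (least significant byte first)."""
    if len(raw) != 4:
        raise ValueError("a word is exactly 4 bytes")
    return int.from_bytes(raw, byteorder="little")

# Bytes exactly as they appear in the hex editor.
word = read_word(bytes.fromhex("00 00 00 40"))
print(hex(word))  # 0x40000000
```

So the last byte you see in the hex editor is the most significant one.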


The first word is either going to be 0x00000000 or 0x40000000 (shown in little endian as 00 00 00 40). This denotes whether or not the sample is looped. 00 means unlooped, and 40 means looped. Looping is exactly what it sounds like; it’s whether or not a sample will continue to repeat itself after one cycle.


The second word is the frequency adjustment value (FAV). The game essentially adjusts the sample to this frequency and then plays everything relative to that afterwards. It’s… a bit difficult to explain because I don’t know exactly how it works, but you can be sure that manipulating this will change the frequency of your sample.

For example, if your FAV is this:

00 18 A2 01

That’s 0x1A21800, or 27400192 (which works out to 26758 Hz, since 27400192 / 1024 = 26758). If you double this, the pitch will increase by an octave, and if you halve it, it’ll decrease by an octave. This can be useful if you’re trying to manipulate the pitch of a bass guitar for the “Square wave” sample found in FE7 or FE8.
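If you want to work out the bytes to type back into the hex editor after doubling or halving, something like this should do it. It's only a sketch: `shift_octaves` is a name I invented, and it assumes the shifted value still fits in 4 bytes.

```python
def shift_octaves(fav_bytes: bytes, octaves: int) -> bytes:
    """Return the 4 header bytes for the FAV moved up (+) or down (-) by whole octaves."""
    fav = int.from_bytes(fav_bytes, "little")
    # Doubling the value raises the pitch one octave, halving lowers it one octave.
    shifted = fav << octaves if octaves >= 0 else fav >> -octaves
    return shifted.to_bytes(4, "little")

def as_hex(raw: bytes) -> str:
    """Format bytes the way a hex editor shows them."""
    return " ".join(f"{b:02X}" for b in raw)

original = bytes.fromhex("00 18 A2 01")
print(as_hex(shift_octaves(original, 1)))   # 00 30 44 03
print(as_hex(shift_octaves(original, -1)))  # 00 0C D1 00
```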


The third word is the loop point, which will be zero if the sample doesn’t loop. This is not something you can work out just by looking at the hex; you’d have to know it from making the sample yourself. Once again, it’s in little endian, so if you had this:

4C 13 00 00

The loop point would be 0x134C, or 4940.


The fourth word is the length of the sample. As you might have guessed, it’s also in little endian, so

7A 1A 00 00

is 0x1A7A, or 6778 bytes.


After this, you get the actual sample data. If you count all these bytes individually, you’d get the exact length as noted by the header! Fascinating stuff.
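Putting the four words together, a rough Python sketch for reading a whole header might look like this. The class and field names (`SampleHeader`, `looped`, `fav`, `loop_start`, `length`) are my own labels for the four words described above, not official names.

```python
import struct
from typing import NamedTuple

class SampleHeader(NamedTuple):
    looped: bool     # first word: 0x40000000 means looped, 0x00000000 unlooped
    fav: int         # second word: frequency adjustment value
    loop_start: int  # third word: loop point
    length: int      # fourth word: length of the sample

def parse_header(raw: bytes) -> SampleHeader:
    """Split the first 16 bytes into four little-endian 32-bit words."""
    flags, fav, loop_start, length = struct.unpack("<4I", raw[:16])
    return SampleHeader(flags == 0x40000000, fav, loop_start, length)

# The second example header from earlier in this post.
header = parse_header(bytes.fromhex("00 00 00 40 00 18 A2 01 4C 13 00 00 7A 1A 00 00"))
print(header)
```

The sample data itself starts right after those 16 bytes.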


All of this is handled by Sappy, so most of this you’ll never really have to deal with. But I think it’s important for you to have a full understanding of the information that you’re using.

Heh, I found my old notes on this stuff.

The sample data is actually 1 sample longer than the length indicated in the header. The… what’s being called FAV here is supposed to indicate the “intended” playback frequency for the sample, i.e. the sampling frequency that was used to record it. The value is in Hz, left-shifted by 10. That gives 26758Hz for the example, which is one of the “standard” values {5734, 7884, 10512, 13379, 15768, 18157, 21024, 26758, 31536, 36314, 40137, 42048}. (I have no idea how those values were picked; they don’t line up with any existing audio standards I know of.)
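For anyone who wants to check that, here's a quick Python sketch of the arithmetic. The function names are just mine; the rate list is the one above.

```python
STANDARD_RATES_HZ = {5734, 7884, 10512, 13379, 15768, 18157,
                     21024, 26758, 31536, 36314, 40137, 42048}

def fav_to_hz(fav: int) -> int:
    """The header value is the rate in Hz left-shifted by 10, so shift it back."""
    return fav >> 10

def data_size(header_length: int) -> int:
    """Sample data is one sample longer than the header's length field says."""
    return header_length + 1

rate = fav_to_hz(0x01A21800)
print(rate, rate in STANDARD_RATES_HZ)  # 26758 True
print(data_size(0x1A7A))                # 6779
```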

I’ll update the tutorials with this information later. I actually got some of the frequencies wrong (that’s what happens when you do this stuff from memory).