A question on LZ77[In Progress?]

So, some important concepts here. Compression schemes are, generally, one (original) -to-many (compressed). The point of the compression algorithm - usually specified in the standard, and it is for LZ77 - is to pick the best encoding, or at least to pick a good encoding consistently without taking a huge amount of time considering other possibilities. Both of those things are operating on a conceptual level, though; there are still details of the compression encoding, like header bytes, how the data is arranged, etc.

The standard implementation of LZ77 on the GBA uses a 4-byte header (1 byte saying it’s LZ77, and 3 indicating the uncompressed length, IIRC). It uses 16-bit “codes” for the parts it can compress, and it uses ‘flag’ bytes to indicate what parts are the codes and what parts are verbatim data. The code is divided into 12 bits for the position of some data to copy, and 4 bits for the length, that are offset by 3 (if it can’t find more than 2 bytes to copy, then it just uses verbatim data), so the length values range from 3 to 18. The distance value fixes the maximum “sliding window” size of 4096. Of course, there are tons of other ways to specify this, that are still “LZ77” with different encodings.

The LZ77 algorithm is essentially the same regardless of the encoding. But you can still produce valid “LZ77 compressed” data without following the algorithm - it’ll just be different data (probably longer). This is why you may see old tutorials telling you to put a 00 byte every 9th byte in “compressed” palettes - it’s a shortcut to getting valid data if you’re doing it by hand in a hex editor, since palettes don’t compress very well anyway. You can “compress” any data this way, it just doesn’t actually become shorter.


Anyway. Cam, that’s… not it. I checked Nintenlord’s code and it uses the maximum window size/length values supported by the encoding. It appears to be picking the value out of the window in the correct way, too, but I’m not sure - NL wrote rather complicated code for it. That said, it’s entirely possible that IntSys’s compressor had a flaw :smile: