A question on LZ77[In Progress?]

Yesterday I decompressed a image from gba rom and then compressed it into rom and noticed a strange question. I compared the compressed file by NL Compressor to the original compressed file in the original rom and they were different! However, when I scanned them by GBAGE, they were the same image.

the compressed image dump in original rom

the data umcompressed

the compressed image dump by NL Compressor

What’s more, I believe the compression of image is different from that of other data. I used the common LZ77 compression of other data to compress the image raw dump, then it messed up.
Another compressed image dump which cannot display normally

Of course the question is not really essential to my work, but I am only curious. Why the same image’s compression can be different and both can be recognized normally? And the compression of image other than other data is special?

Some info in GBATEK

BTW a way to move compressed data only by a hex editor:

The next 2 bytes behind 10 detemines the size of the uncompressed data, so we can directly move it without any decompression or compression, because I think the compressed data cannot be larger than the uncompressed one. I had used the method for a long time and no bug found. Of course, the disadvantage is more space needed.

If you look at the data in GBAGE and have “compressed graphics” ticked, it won’t show any visible difference (which is the only reason I can think of that they would be identical)

Especially LZ77-compressed graphics should look substantially different

Why the outcomes are different though they are the compression of the same image?

Well for one, image dumping from GBAGE generally decompresses automatically unless you tell it to.

LZ77 is also called “sliding window” compression; as the name might imply, the algorithm relies on a “sliding window” (or rather, a length for which the algorithm is allowed to look backwards)

If you compress two things with LZ77 but use different window sizes, the outcomes will be different. If you use a window bigger than GBA’s window, it would screw up GBAGE as well.

So, some important concepts here. Compression schemes are, generally, one (original) -to-many (compressed). The point of the compression algorithm - usually specified in the standard, and it is for LZ77 - is to pick the best encoding, or at least to pick a good encoding consistently without taking a huge amount of time considering other possibilities. Both of those things are operating on a conceptual level, though; there are still details of the compression encoding, like header bytes, how the data is arranged, etc.

The standard implementation of LZ77 on the GBA uses a 4-byte header (1 byte saying it’s LZ77, and 3 indicating the uncompressed length, IIRC). It uses 16-bit “codes” for the parts it can compress, and it uses ‘flag’ bytes to indicate what parts are the codes and what parts are verbatim data. The code is divided into 12 bits for the position of some data to copy, and 4 bits for the length, that are offset by 3 (if it can’t find more than 2 bytes to copy, then it just uses verbatim data), so the length values range from 3 to 18. The distance value fixes the maximum “sliding window” size of 4096. Of course, there are tons of other ways to specify this, that are still “LZ77” with different encodings.

The LZ77 algorithm is essentially the same regardless of the encoding. But you can still produce valid “LZ77 compressed” data without following the algorithm - it’ll just be different data (probably longer). This is why you may see old tutorials telling you to put a 00 byte every 9th byte in “compressed” palettes - it’s a shortcut to getting valid data if you’re doing it by hand in a hex editor, since palettes don’t compress very well anyway. You can “compress” any data this way, it just doesn’t actually become shorter.


Anyway. Cam, that’s… not it. I checked Nintenlord’s code and it uses the maximum window size/length values supported by the encoding. It appears to be picking the value out of the window in the correct way, too, but I’m not sure - NL wrote rather complicated code for it. That said, it’s entirely possible that IntSys’s compressor had a flaw :smile:

by the encoding

is it possible that the “standard LZ77 compressor” used to produce the “garbage” image just happened to encode it in a way that GBAGE can’t handle

The first two clearly work (the one compressed by GBAGE and NL’s compressor); the only reason I can think of for a “common” LZ77 encoding scheme to break would be that GBAGE just doesn’t support it (i may ahve been wrong about the specific part of the encoding that doesn’t work)

It shouldn’t; decompression of lz77 is pretty straightforward. I guess there could be a bug though, related to the fact that lz77 allows the length of a match to exceed the distance - if you have spamspamspamspamspam as data, it can encode the first spam literally and then write a single code saying “start 4 characters back and repeat 16 characters”. But it’s not really as much of an edge case as it looks, so I would think that if there were a problem that people would constantly be reporting bugs with it, idk.

Misaka, what’s the offset for this image data? I could investigate a little…

what “common” lz77 compression scheme did you apply

as mentioned, the dumps from the compressed and uncompressed (and NL compressed) are correct (GBAGE doesn’t dump “compressed” data without decompressing it first without some doing)

0x1205A50 in FE7if ver20140217

One of images in Miria’s animation.

(BTW I am quite amazed that FEA cannot recognize that animation. Of course I had modified the src of FEA and recompile it, it worked well with FE7if except several animations, so I extracted it manually. I guess it is because that animation is not generated by FEA but only a direct modification to the images in rom.)

Obviously the image is compressed by the author of FE7if instead of IS, and I compressed it by NL Compressor. Although the compressed data is different, but both can work well.

I often directly edited the compressed palette directly by ignoring the "00"s to modify colors in the game before. :sweat_smile:

Color Definitions
Each color occupies two bytes (same as for 32768 color BG modes):

Bit Expl.
0-4 Red Intensity (0-31)
5-9 Green Intensity (0-31)
10-14 Blue Intensity (0-31)
15 Not used

( And I know that there are many tools for palette hacking now :stuck_out_tongue: )

I directly copied the compressed image from FE7if by a hex editor to compare it. (Maybe the length I extracted is not long enough?)

The LZ77 compression scheme I think which is common is a tool for sinicization.(In fact, I don’t know whether it is called “sinicization” in English. It is a nonprofit folk organization to translate a game from Japanese to Chinese like you translate it to English. I am more accustomed to the tool they made, because it is really convenient for me x_X)

It can decompress LZ77 data correctly but the outcome of LZ77 compression is different from that compressed by NL Compressor, and it cannot be recognized by GBAGE.

I did a simple test here:

The original test binary file:

12 34 56 78 12 34 56 78 9A BC DE 9A BC DE 12 34 56 78 9A BC DE 00 12 34 56 78 9A BC DE FF FF FF

Compressed by NL Compressor:

10 20 00 00 08 12 34 56 78 10 03 9A BC DE D0 00 02 40 09 00 40 07 FF FF FF 00 00 00

Outcomes generated by the tool I am used to:

10 20 00 00 08 12 34 56 78 10 03 9A BC DE D0 00 02 40 09 00 40 11 FF FF FF

Then I decompressed the above compressed data (shorter) by NL Compressor:

12 34 56 78 12 34 56 78 9A BC DE 9A BC DE 12 34 56 78 9A BC DE 00 12 34 56 78 9A BC DE FF FF FF

It is the same as the original data!

Then I used my tool to decompressed the data compressed by NL Compressor:

12 34 56 78 12 34 56 78 9A BC DE 9A BC DE 12 34 56 78 9A BC DE 00 12 34 56 78 9A BC DE FF FF FF

Also the same.

To conclude, different LZ77 tools compress the data differently but they decompress the data in the same way.

okay that’s a sign that the algorithm works i suppose

I wonder if the difference in compression methods is why some of those animations fail to appear in FEditor Adv? I know there are a few of the FE7if animations that it can “see” but there are quite a few that it fails to display.

I think FEGirls has a similar issue displaying some animations in FEditor Adv as well?

Strongly doubt it. Images are found by pointer-chasing and/or checking the header, and once you’ve found lz77 data, decompressing it is straightforward and p much foolproof.

For real? That makes sense, it’s not like it should be difficult to do, but for some animations FEditor tends to not render them correctly, even though the compressed data is exactly where it should be.

It’s probably an unrelated issue that keeps those animations from displaying then. At least you can still rip animations the old-fashioned way of yesteryear in those cases :stuck_out_tongue_winking_eye: