(Immediately after this was posted, Wan and Cam asked zahl for more explanation. What follows is said explanation.)
so I have this super-optimized encoding for the huffman tree, and among other nice things, it supports 'supernodes'
where it grabs several bits at once and uses them to short-circuit multiple levels down the tree
in order to make that work, conceptually I make three types of nodes: "twigs" (that just have 2^n leaves selected directly with n bits), "branches" (that select from 2^n internal nodes in the same way), and "flowers" (that have a leaf as the left child and an internal node as the right child)
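something like this conceptually, in C instead of the actual bit-packed encoding (the names and fields here are just for illustration, the real thing is asm):

```c
/* Hypothetical layout of the three supernode kinds -- illustrative only. */
enum node_kind {
    NODE_TWIG,   /* 2^n leaves, selected directly by n bits           */
    NODE_BRANCH, /* 2^n internal nodes, selected the same way         */
    NODE_FLOWER  /* a leaf as the left child, internal node as right  */
};

struct node {
    enum node_kind kind;
    unsigned       nbits;          /* n = bits grabbed at this node (1 for a flower) */
    union {
        unsigned short *leaves;    /* twig: 2^nbits symbol indices     */
        struct node   **children;  /* branch: 2^nbits child pointers   */
        struct {                   /* flower: one leaf + one subtree   */
            unsigned short leaf;
            struct node   *subtree;
        } flower;
    } u;
};
```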
I can hand-wavingly prove that I can always build the tree out of these nodes, with the algorithm that the code uses
anyway, so I had to write totally new asm to do the actual decoding, yeah
the other part of the story is that the nodes have this haxxed encoding that uses 4 bits to store both the node type and the node 'size': 0 = flower, 1-10 = twig (the value is its size), 11-15 = branch (size + 10)
(size is number of bits to use)
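in C terms that header decode is basically this (hypothetical function, but the value ranges are the real ones):

```c
/* Decode the 4-bit node header: 0 = flower, 1-10 = twig (value is the size),
   11-15 = branch (stored as size + 10). Names here are illustrative. */
struct node_header {
    enum { FLOWER, TWIG, BRANCH } kind;
    unsigned nbits;              /* bits to grab from the huffman stream  */
};

static struct node_header decode_header(unsigned nibble /* 0..15 */)
{
    struct node_header h;
    if (nibble == 0) {
        h.kind  = FLOWER;
        h.nbits = 1;             /* the 1 bit that picks leaf vs. subtree
                                    (the bit the bug below never grabbed) */
    } else if (nibble <= 10) {
        h.kind  = TWIG;
        h.nbits = nibble;        /* twig size stored directly             */
    } else {
        h.kind  = BRANCH;
        h.nbits = nibble - 10;   /* branch size stored with a +10 offset  */
    }
    return h;
}
```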
so what went wrong was, I correctly set it up to subtract 10 from the branch sizes and use the twig sizes directly, but for flowers, it still had the 0 in the "number of bits to grab from the huffman stream" register (which is also used for multiple other things at various points)
which is wrong, because it needs 1 bit from the stream to decide whether to use the leaf or the internal-node child
it was always using the leaf.
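in C-ish terms (reusing the hypothetical node layout from above, and a get_bits() like the one sketched further down; I'm assuming a 0 bit selects the left/leaf child), the flower case has to be:

```c
/* Correct flower handling: consume exactly 1 bit to choose between the
   leaf and the subtree. The buggy version effectively asked for 0 bits,
   so the choice was always 0 and the leaf always won. */
static int decode_flower(struct bitreader *br, const struct node **np,
                         unsigned short *symbol_out)
{
    unsigned bit = get_bits(br, 1);          /* 1 bit, not 0                  */
    if (bit == 0) {                          /* assumed: 0 = left = the leaf  */
        *symbol_out = (*np)->u.flower.leaf;  /* done, emit this symbol        */
        return 1;
    }
    *np = (*np)->u.flower.subtree;           /* 1 = right = keep walking      */
    return 0;
}
```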
previously detected, much less entertaining bugs (both sketched in C after the list):
- failure, in the code that unpacks bits from the stream, to correctly keep track of the number of bits actually used
- after getting to a leaf, an error in looking up the corresponding symbol's offset in the symbol table (I had an index into an array of shorts, forgot to multiply it by 2, basically)
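a rough C version of what both of those fixes amount to (the bitreader struct and the names are made up, the real code is asm and the bit order may differ):

```c
#include <stdint.h>
#include <stddef.h>

/* Hypothetical bit reader for the huffman stream. */
struct bitreader {
    const uint8_t *data;
    size_t         bitpos;    /* bits consumed so far -- the counter the
                                 first bug failed to keep in sync        */
};

/* Pull n bits (LSB-first, just for this sketch) and advance bitpos by
   exactly n, so the bookkeeping can't drift. */
static unsigned get_bits(struct bitreader *br, unsigned n)
{
    unsigned out = 0;
    for (unsigned i = 0; i < n; i++) {
        size_t   pos = br->bitpos + i;
        unsigned bit = (br->data[pos >> 3] >> (pos & 7)) & 1u;
        out |= bit << i;
    }
    br->bitpos += n;          /* advance by the number of bits actually used */
    return out;
}

/* Second bug: the symbol table is an array of shorts, so a byte offset
   has to scale the index by sizeof(uint16_t) -- the missing "multiply
   by 2". Indexing a typed array does that scaling implicitly. */
static uint16_t symbol_offset(const uint16_t *table, size_t index)
{
    return table[index];  /* same as *(const uint16_t *)((const char *)table + index * 2) */
}
```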