It’s that time again! I’m starting the thread over because old thread is old - and while the OP provides useful information, it no longer makes a good OP.
What `dsa` is and how it works
`dsa` - a data structure assembler - is designed as a fully general-purpose tool/library for presenting data from a binary source in a readable form and then re-creating binary data that can be written back into a binary. For FEGBA hacking, that means an “everything assembler” that potentially replaces pretty much every command-line tool you’ve ever used (but especially EA and Nightmare).
Overview of what it looks like and how it works
The disassembler produces a file that describes data as a series of chunks, and which can follow pointers to determine the start of new chunks. The chunk data’s format can be described in various ways, using so-called interpreters. Most commonly, a chunk will be described as a series of data structures, described by a structgroup (a kind of interpreter) that allows for specifying various data types.
Example of disassembly results
With the built-in data descriptions, you can get results that look like:
!@main 0x0 hex
HEXD 00 01 02 03 04 05 06 07 08 09 0A 0B 0C 0D 0E 0F
HEXD 10 11 12 13 14 15 16 17 18 19 1A 1B 1C 1D 1E 1F
!# 0x20
Here you can see each struct is identified by its name, `HEXD` (this is analogous to an event code in EA). Normally, hexadecimal values would need a `0x` prefix; but a custom data type is used by the `HEXD` struct in the `hex` structgroup to avoid this.
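To make the idea concrete, here is a minimal Python sketch of how lines like the ones above could be turned back into bytes. It is purely illustrative: `assemble_hexd` is a hypothetical helper, not part of `dsa`, and it assumes that only the `HEXD` struct appears and that every line starting with `!` can simply be skipped.

```python
def assemble_hexd(lines):
    """Turn `HEXD xx xx ...` lines into bytes (illustrative, not dsa's real parser)."""
    out = bytearray()
    for line in lines:
        # Assumption: `#` starts a comment, as in the disassembly examples.
        line = line.split('#', 1)[0].strip()
        # Skip blank lines and chunk header/footer lines such as `!@main 0x0 hex`.
        if not line or line.startswith('!'):
            continue
        name, *tokens = line.split()
        if name != 'HEXD':
            raise ValueError(f'unknown struct name: {name}')
        # Each token is a bare two-digit hex value with no 0x prefix.
        out.extend(int(token, 16) for token in tokens)
    return bytes(out)


chunk = [
    '!@main 0x0 hex',
    'HEXD 00 01 02 03 04 05 06 07 08 09 0A 0B 0C 0D 0E 0F',
    'HEXD 10 11 12 13 14 15 16 17 18 19 1A 1B 1C 1D 1E 1F',
    '!# 0x20',
]
data = assemble_hexd(chunk)
```

With the two `HEXD` lines from the example, `data` comes out as the 32 bytes `00` through `1F`.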
There is also a system that allows the disassembler to label each struct in the output and, once I get it properly integrated/tested/fixed, a way to use those labels either to create pointers to structs within a chunk or to get their index.
Interpreters can also be written as Python plugins, and the behaviour can be further customized using filters that transform the data extracted from the binary before the interpreter translates it into a text description.
Interpreter and filter plugins
The built-in `string` interpreter extracts a null-terminated string (it actually handles mixed binary and text data), producing fancy results like:
!@main 0x0 [string, utf-8, basic]
'This is a [Open]test[Close].[NL]'
'日本語、[0xc0][0xc1]かわいい!!'
!# 0x33
As a side note, if we were writing the disassembly by hand (to be assembled into the target), we could equivalently have written:
!@main 0x0 string:utf-8:basic # Everything after a `#` on a line is a comment.
# Notice the alternate syntax for the "multi-part token".
'This is a [Open]test[Close].[NL]'
'日本語、[0xc0][0xc1]かわいい!!'
# The `0x33` in the previous example was a comment inserted by the disassembler
# and is not necessary. The line starting with `!` ends the block.
!
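For a feel of what "mixed binary and text data" means here, the following Python sketch reads up to the first null byte and shows any bytes that are not valid UTF-8 as `[0x..]` tokens. It is only an approximation under my own assumptions: `read_string` is a hypothetical name, and named codes such as `[Open]` (which presumably come from the codec) are not reproduced.

```python
def read_string(data: bytes, offset: int = 0) -> str:
    """Decode a null-terminated string, escaping undecodable bytes as [0xNN]."""
    end = data.index(b'\x00', offset)  # raises ValueError if there is no terminator
    raw = data[offset:end]
    parts = []
    i = 0
    while i < len(raw):
        # A UTF-8 character is 1 to 4 bytes long; try each width in turn.
        for width in (1, 2, 3, 4):
            try:
                parts.append(raw[i:i + width].decode('utf-8'))
                i += width
                break
            except UnicodeDecodeError:
                continue
        else:
            # No width worked, so emit the byte as a raw hex token.
            parts.append(f'[0x{raw[i]:02x}]')
            i += 1
    return ''.join(parts)


blob = '日本語、'.encode('utf-8') + b'\xc0\xc1' + 'かわいい!!'.encode('utf-8') + b'\x00'
text = read_string(blob)
```

The bytes `0xc0` and `0xc1` can never appear in valid UTF-8, which is why they show up as hex tokens in `text`.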
Because those plugins are written in Python, there is no need to shell out for them. They are dynamically loaded as part of the language set up when you start the assembler or disassembler.
When filters are used, the resulting disassembly chunks get tagged like so:
!size 32
!@main 0x0 hex
HEXD 00 01 02 03 04 05 06 07 08 09 0A 0B 0C 0D 0E 0F
HEXD 10 11 12 13 14 15 16 17 18 19 1A 1B 1C 1D 1E 1F
!# 0x20
`size` is the only built-in filter; it basically enforces the size of the underlying binary data. When disassembling, the value is determined by how much data was disassembled; when assembling, it will truncate the data if it’s too long, and zero-pad it if it’s too short. However, the filter mechanism allows `dsa-extras` to handle compression methods like lz77, rearrange graphics from 8x8 tiles into what they should actually look like, etc.
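The truncate-or-pad behaviour of `size` is easy to picture in Python. This is a sketch of the described behaviour only; the function names are hypothetical and this is not how the filter plugin interface actually looks.

```python
def size_when_disassembling(data: bytes) -> int:
    """The value written after `!size`: however much data was disassembled."""
    return len(data)


def size_when_assembling(data: bytes, size: int) -> bytes:
    """Force assembled data to exactly `size` bytes."""
    # Slicing truncates data that is too long; ljust zero-pads data that is too short.
    return data[:size].ljust(size, b'\x00')


too_long = size_when_assembling(b'abcdef', 4)
too_short = size_when_assembling(b'ab', 4)
```

Here `too_long` is cut down to four bytes and `too_short` gains two trailing zero bytes.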
Types and structgroups are described using plain-text files.
Type and structgroup examples
Structgroup files look like:
align:1 # there is always a single header line,
# followed by one or more sections describing possible structs.
EXAMPLE
my_type x
my_type y
The types are in separate files, and look like:
# 32-bit value that directly gives the location of another chunk in the file.
pointer example_pointer 32
size # this filter will be applied to the pointed-at chunk.
# A simple integer type.
type Byte
8 value
# A type with multiple fields, restricted value ranges and custom value names.
type my_type
8 first values:my_enum
8 second values:my_enum
enum my_enum
0 fee
1 fie
2 foe
3 fum
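As a rough illustration of what these descriptions mean for the actual bytes, here is a Python sketch that decodes one `EXAMPLE` struct by hand. It rests on assumptions that the files above do not spell out: that fields are stored in declaration order, one byte each, and that a value outside `my_enum` is rejected. The function names are hypothetical.

```python
import struct

# Mirrors the `enum my_enum` description.
MY_ENUM = {0: 'fee', 1: 'fie', 2: 'foe', 3: 'fum'}


def decode_my_type(raw: bytes) -> dict:
    """Decode the two 8-bit fields of `my_type` (assumed to be in declaration order)."""
    first, second = struct.unpack('BB', raw)
    for value in (first, second):
        if value not in MY_ENUM:
            raise ValueError(f'{value} is not a valid my_enum value')
    return {'first': MY_ENUM[first], 'second': MY_ENUM[second]}


def decode_example(raw: bytes) -> dict:
    """Decode an EXAMPLE struct: two consecutive `my_type` members, x then y."""
    return {'x': decode_my_type(raw[0:2]), 'y': decode_my_type(raw[2:4])}


decoded = decode_example(bytes([0, 1, 2, 3]))
```

Under those assumptions, the four bytes `00 01 02 03` decode to `x` = fee/fie and `y` = foe/fum.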
Lastly, interpreters (TODO: and filters) can be configured using codecs that read data from a similarly-formatted text file and interpret that data with more Python code.
“But can I make hack with it?”
The built-in stuff is, by design, pretty basic - because I want to keep the GBAFE-specific stuff separate for a variety of reasons. I will be providing all of that in a separate package called `dsa-extras`.
Using `dsa` and `dsa-extras`
Four command-line programs are provided by `dsa`:
- `dsa` - assembly mode.
- `dsd` - disassembly mode. There is an option to test the results by immediately re-assembling them into the original binary (without writing to disk) to see if any corruption results.
- `dsa-use` - adds a specified “library” folder path to the places DSA will look for structgroups, types, filters, interpreters and codecs.
- `dsa-drop` - removes a path from the above list.
Each of these has its own command-line help, powered by another library I wrote called `epmanager`. However, the code is also designed to be imported as a package and used from other Python programs. One day I hope to make a hex editor that leverages `dsa` to describe the data as you scroll over it.
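The round-trip test that `dsd` offers boils down to "disassemble, re-assemble in memory, compare". The Python sketch below shows that idea using a toy `HEXD`-only format; none of these functions are the real package API, they are hypothetical stand-ins so the example is self-contained.

```python
def disassemble_hex(data: bytes, width: int = 16) -> list:
    """Describe data as HEXD lines, `width` bytes per line (toy stand-in)."""
    lines = []
    for start in range(0, len(data), width):
        row = data[start:start + width]
        lines.append('HEXD ' + ' '.join(f'{byte:02X}' for byte in row))
    return lines


def assemble_hex(lines) -> bytes:
    """Rebuild bytes from HEXD lines (toy stand-in)."""
    out = bytearray()
    for line in lines:
        name, *tokens = line.split()
        if name != 'HEXD':
            raise ValueError(f'unknown struct name: {name}')
        out.extend(int(token, 16) for token in tokens)
    return bytes(out)


def round_trips(data: bytes) -> bool:
    """True if re-assembling the disassembly reproduces the original bytes."""
    # Everything stays in memory; nothing is written to disk.
    return assemble_hex(disassemble_hex(data)) == data


ok = round_trips(bytes(range(32)))
```

If `round_trips` ever returned `False` for some input, that would point at a data description that loses information somewhere.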
The `dsa-extras` package includes:
- a ton of structgroups and types to describe event codes, a bunch of NMMs and maybe a few other things, for FE6/7/8
- some helper scripts that I used to produce the above (don’t expect to be able to use them out of box; talk to me if you think they might be useful for you)
- codecs, filters and interpreters to deal with Huffman-compressed text and GBAFE text codes, LZ77-compressed images, and possibly more
- a post-install script (basically it calls `dsa-use` for you so that you don’t have to figure out the path to where `dsa-extras` was installed)
I might also make a master install script so that you don’t have to understand Python packaging stuff, although I’ll be explaining all of that in a separate thread in the near future.
To migrate your EA files and other such content, the recommended approach is:
- Build your ROM as before, with the old tools.
- Use DSA to disassemble the built ROM.
- Use the resulting disassembled files going forward.
If you need help with this, let me know and I’ll see what I can do for you.
Project status etc.
Please see the next post for live updates.
I’ll be doing some kind of promotional video for `dsa` and `dsa-extras` myself for FEE3 (it’ll probably be pretty basic). I intend to spend a good chunk of the next two weeks polishing things up because it’s definitely not ready yet.
Boring stuff about licensing
DSA now has a proper, open-source license: the Open Software License 3.0. It’s similar to LGPL, except it’s a lot shorter, it’s based more on the principles of contract law rather than just on copyright, and it includes a “network use is distribution” clause (with the GPL system you would need the Affero variant to get this, which is hideously complex and also otherwise more restrictive than LGPL). Unlike many other licenses it also includes one tiny bit of “warranty” (basically, it represents that I am not blatantly plagiarizing the code).
The documentation for DSA (as little as there is at the moment) is separately licensed as Creative Commons BY-NC-SA 4.0.
`dsa-extras` content is made as free as possible, via the Unlicense, because it’s not the part of this project that I’m really interested in promoting myself with.
Also, I completely re-thought the versioning and realized that the project isn’t stable enough to be on a 1.x version number, let alone a 2.x one. In particular, I started taking semantic versioning principles more seriously, along with migrating to a more modern build/packaging setup (with automated tests and everything!).
How to help
If you make or maintain a graphical tool for FE hacking, it would be amazing of you to design it to output data for DSA to assemble (or at least add that as an option), rather than editing the ROM directly. Also, you can contribute content for `dsa-extras` (as long as you’re ok with the licensing).
It would also be amazing if someone could provide syntax-colouring support for this thing, or even point me in the right direction for implementing that.
(I’ve already talked to @Lexou a bit about tool integration, and I’m hoping to collaborate with @Pikmin1211 to make sure that Japanese text handling works smoothly.)