LZS format
LZS appears to be a compressed archive format: a compression layer wrapped around a simple archive container. Both the compression and the underlying archive format are described here.
It appears in at least Disgaea 2 PC.
Magic
The file starts with a 4-byte magic, char magic[4] = "dat\0", which is not part of the compressed data.
Compression
The compression is LZ77-based: the stream consists of literal bytes and backreferences, where a backreference points at a run of bytes that has already been output and occurs again.
struct header {
    u32 compressed_size;   // including this 12-byte header
    u32 decompressed_size;
    u32 marker;            // byte value that introduces a backref in the compressed data; always in 0x00-0xFF
};
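As a rough illustration, the magic check and header read could be sketched as below. This is not taken from any game code: the names lzs_header, read_u32_le and lzs_parse_header are hypothetical, and the sketch assumes that the fields are little-endian and that the header sits immediately after the 4-byte magic, neither of which is stated explicitly in these notes.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Host-order copy of the 12-byte compression header. */
struct lzs_header {
    uint32_t compressed_size;   /* includes the 12-byte header itself */
    uint32_t decompressed_size;
    uint32_t marker;            /* expected to fit in one byte */
};

/* ASSUMPTION: fields are stored little-endian. */
static uint32_t read_u32_le(const uint8_t *p) {
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/* ASSUMPTION: the header starts at offset 4, right after the magic.
 * Returns 0 on success, -1 if the buffer is too short, the magic is
 * wrong, or the header fields are implausible. */
static int lzs_parse_header(const uint8_t *buf, size_t len,
                            struct lzs_header *out) {
    if (len < 16 || memcmp(buf, "dat\0", 4) != 0) return -1;
    out->compressed_size   = read_u32_le(buf + 4);
    out->decompressed_size = read_u32_le(buf + 8);
    out->marker            = read_u32_le(buf + 12);
    if (out->compressed_size < 12 || out->marker > 0xFF) return -1;
    return 0;
}
```

Rejecting a marker above 0xFF is a sanity check based on the documented value range, not a rule confirmed by the format itself.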
This header is followed by the compressed data, which consists of literal bytes (copied straight to output) or backrefs. Backrefs are encoded as
struct backref { u8 marker, dist, count; };
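representing that count bytes should be copied from dist bytes back in the output history. If dist > marker, then 1 should be subtracted from it before use.
Why this extra complication? Because two marker bytes in a row denote an escaped literal marker byte, so the dist byte of a backref can never be equal to the marker. Distances at or above the marker value are therefore stored incremented by one. Thus, a decompression loop looks something like this: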
while (read_ctr < header.compressed_size - sizeof(struct header)) {
    u8 v = READ();
    if (v == header.marker) {
        u8 dist = READ();
        if (dist == header.marker) {
            WRITE(header.marker); // marker repeated twice = an escaped marker byte
        } else {
            u8 count = READ();
            if (dist > header.marker) dist--;
            for (size_t i = 0; i < count; i++) WRITE(HISTORY(dist));
        }
    } else {
        WRITE(v);
    }
}
After this step, header.decompressed_size bytes should have been written.
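For reference, the loop above can be turned into a self-contained function along the following lines. This is only a sketch under stated assumptions: lzs_decompress is a hypothetical name, and HISTORY(dist) is assumed to mean "the byte dist positions behind the current write position", which these notes do not pin down exactly. The bounds checks are defensive additions, not part of the format.

```c
#include <stddef.h>
#include <stdint.h>

/* Decompresses in_len bytes of payload (the data after the 12-byte header)
 * into out. Returns the number of bytes written, or -1 on malformed input
 * or when out_cap would be exceeded.
 *
 * ASSUMPTION: a backref copies from dist bytes behind the write position. */
static long lzs_decompress(const uint8_t *in, size_t in_len, uint8_t marker,
                           uint8_t *out, size_t out_cap) {
    size_t ip = 0, op = 0;
    while (ip < in_len) {
        uint8_t v = in[ip++];
        if (v != marker) {                  /* plain literal byte */
            if (op >= out_cap) return -1;
            out[op++] = v;
            continue;
        }
        if (ip >= in_len) return -1;        /* truncated after the marker */
        uint8_t dist = in[ip++];
        if (dist == marker) {               /* doubled marker = literal marker */
            if (op >= out_cap) return -1;
            out[op++] = marker;
            continue;
        }
        if (ip >= in_len) return -1;        /* truncated backref */
        uint8_t count = in[ip++];
        if (dist > marker) dist--;          /* undo the shift around the marker */
        if (dist == 0 || dist > op || count > out_cap - op) return -1;
        /* Copy byte by byte so overlapping runs (count > dist) repeat data. */
        for (size_t i = 0; i < count; i++, op++) out[op] = out[op - dist];
    }
    return (long)op;
}
```

A caller would pass header.decompressed_size as out_cap and compare the return value against it to confirm that the stream decoded to the expected length.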
Archive format
The underlying archive consists of a header with the number of files, a table of offsets and filenames, and file data; nothing weird here.
struct header {
    u32 count; // number of files in the archive
    u32 unk1;  // 0
    u32 unk2;  // 0
    u32 unk3;  // 0
};
The header is followed by `header.count` file entries.
struct file_entry {
    u32 end_offset;
    char filename[0x1C];
};
After this comes the file data. The first file runs from this point for `entries[0].end_offset` bytes. That is, all of the end_offsets are counted starting from after the file metadata table, so each later file presumably runs from the previous entry's end_offset up to its own.
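The offset arithmetic can be sketched as follows. The function name archive_file_range and the two size constants are hypothetical, and the sketch assumes little-endian fields, no padding between entries, and that file i starts where file i-1 ends; only the first file's extent is stated outright in the notes above.

```c
#include <stddef.h>
#include <stdint.h>

#define HEADER_SIZE 0x10u  /* u32 count + three unknown u32 fields */
#define ENTRY_SIZE  0x20u  /* u32 end_offset + char filename[0x1C] */

/* ASSUMPTION: fields are stored little-endian. */
static uint32_t rd32(const uint8_t *p) {
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/* Computes the absolute [start, end) byte range of file i inside the
 * decompressed archive buf. Returns 0 on success, -1 if the index or any
 * offset falls outside the buffer. */
static int archive_file_range(const uint8_t *buf, size_t len, uint32_t i,
                              size_t *start, size_t *end) {
    if (len < HEADER_SIZE) return -1;
    uint32_t count = rd32(buf);
    if (i >= count || count > (len - HEADER_SIZE) / ENTRY_SIZE) return -1;

    /* File data begins right after the metadata table. */
    size_t data_base = HEADER_SIZE + (size_t)count * ENTRY_SIZE;
    const uint8_t *table = buf + HEADER_SIZE;

    /* ASSUMPTION: file i starts at the previous entry's end_offset. */
    size_t rel_start = (i == 0) ? 0 : rd32(table + (size_t)(i - 1) * ENTRY_SIZE);
    size_t rel_end = rd32(table + (size_t)i * ENTRY_SIZE);
    if (rel_end < rel_start || rel_end > len - data_base) return -1;

    *start = data_base + rel_start;
    *end = data_base + rel_end;
    return 0;
}
```

The filename of entry i would sit at table + i * ENTRY_SIZE + 4; whether it is always NUL-terminated within its 0x1C bytes is not covered by these notes.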