LZS format
The LZS format appears to be a compressed archive format: an archive wrapped in a compression layer. Both the compression and the underlying archive format are described here.
It appears in at least Disgaea 2 PC.
Magic
The file starts with a `char magic[4]` containing "dat\0"; the magic is not part of the compressed data.
Compression
LZ77-based compression: the stream mixes literal data with backreferences, each of which refers back to a portion of data that has already been output and occurs again.
    struct header {
        u32 compressed_size;   // including this 12-byte header
        u32 decompressed_size;
        u32 marker;            // marker used to denote backrefs in the compressed data. 0x00-0xFF.
    };
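As a minimal sketch, the magic check and header read could look like the following. The names `lzs_header`, `read_u32le` and `lzs_parse_header` are hypothetical, and two things are assumed rather than confirmed by the description above: that the fields are little-endian, and that the header starts immediately after the 4-byte magic.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Field names follow the header struct above; the struct name is made up. */
struct lzs_header {
    uint32_t compressed_size;   /* includes the 12-byte header, not the magic */
    uint32_t decompressed_size;
    uint32_t marker;            /* expected to be 0x00-0xFF */
};

/* Assumption: multi-byte fields are stored little-endian. */
static uint32_t read_u32le(const uint8_t *p)
{
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/* Returns 0 on success, -1 if the buffer is too short, the magic does not
 * match, or the header fields are implausible. */
static int lzs_parse_header(const uint8_t *buf, size_t len,
                            struct lzs_header *out)
{
    static const uint8_t magic[4] = { 'd', 'a', 't', 0 };

    if (len < 16 || memcmp(buf, magic, 4) != 0)
        return -1;

    /* Assumption: the header directly follows the magic, at offset 4. */
    out->compressed_size   = read_u32le(buf + 4);
    out->decompressed_size = read_u32le(buf + 8);
    out->marker            = read_u32le(buf + 12);

    if (out->compressed_size < 12 || out->marker > 0xFF)
        return -1;
    return 0;
}
```

Reading the fields byte by byte avoids depending on host endianness or struct padding.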
This header is followed by the compressed data, which consists of literal bytes (copied straight to output) or backrefs. Backrefs are encoded as
struct backref { u8 marker, dist, count; };
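meaning that count bytes should be copied from dist bytes back in the output history. If dist > marker, then 1 should be subtracted from it.
Why this extra complication? Because two marker bytes in a row denote an escaped literal marker byte, so a backref can never store a dist byte equal to the marker. The encoding skips that value, and every distance above it is stored incremented by one. Thus, a decompression loop looks something like this: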
    while (read_ctr < header.compressed_size - sizeof(struct header)) {
        u8 v = READ();
        if (v == header.marker) {
            u8 dist = READ();
            if (dist == header.marker) {
                WRITE(header.marker); // marker repeated twice = an escaped marker byte
            } else {
                u8 count = READ();
                if (dist > header.marker)
                    dist--;
                for (size_t i = 0; i < count; i++)
                    WRITE(HISTORY(dist));
            }
        } else {
            WRITE(v);
        }
    }
After this step, header.decompressed_size bytes should have been written.
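For reference, the loop above can be turned into a bounds-checked function roughly as follows. This is a sketch under stated assumptions, not a verified implementation: `lzs_decompress` is a hypothetical name, and `dist` is assumed to be measured back from the current write position, so that a `dist` of 1 names the most recently written byte. The source does not pin down what `HISTORY(dist)` means exactly, so that base may be off by one.

```c
#include <stddef.h>
#include <stdint.h>

/* Decompresses `in_len` payload bytes (the data after the 12-byte header)
 * into `out`. Returns the number of bytes written, or -1 on truncated or
 * inconsistent input, or if `out_cap` is too small. */
static long lzs_decompress(const uint8_t *in, size_t in_len, uint8_t marker,
                           uint8_t *out, size_t out_cap)
{
    size_t ip = 0, op = 0;

    while (ip < in_len) {
        uint8_t v = in[ip++];

        if (v != marker) {                  /* plain literal byte */
            if (op >= out_cap) return -1;
            out[op++] = v;
            continue;
        }

        if (ip >= in_len) return -1;        /* input ends right after a marker */
        uint8_t dist = in[ip++];

        if (dist == marker) {               /* doubled marker = literal marker */
            if (op >= out_cap) return -1;
            out[op++] = marker;
            continue;
        }

        if (ip >= in_len) return -1;        /* backref is missing its count */
        uint8_t count = in[ip++];

        if (dist > marker)
            dist--;                         /* undo the skip over the marker value */

        /* Assumption: dist counts back from the write position, so 0 is invalid. */
        if (dist == 0 || dist > op || count > out_cap - op)
            return -1;

        /* Copy one byte at a time so overlapping runs (count > dist) repeat. */
        for (size_t i = 0; i < count; i++, op++)
            out[op] = out[op - dist];
    }
    return (long)op;
}
```

A caller would pass the `header.compressed_size - 12` bytes that follow the header, an output buffer of `header.decompressed_size` bytes, and then check that the return value equals `header.decompressed_size`.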
Archive format
The underlying archive consists of a header with the number of files, a table of offsets and filenames, and file data; nothing weird here.
    struct header {
        u32 count; // #files in the archive
        u32 unk1;  // 0
        u32 unk2;  // 0
        u32 unk3;  // 0
    };
The header is followed by `header.count` file entries.
    struct file_entry {
        u32 end_offset;
        char filename[0x1C];
    };
After this comes the file data. The first file runs from this point for `entries[0].end_offset` bytes. That is, all of the end_offsets are counted starting from after the file metadata table.
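The offset arithmetic can be sketched as below. The names `arc_u32le` and `arc_locate` are hypothetical. Little-endian fields are assumed, and so is the natural reading of `end_offset`: that file i starts where file i-1 ends. The text above only states this explicitly for the first file.

```c
#include <stddef.h>
#include <stdint.h>

#define ARC_HEADER_SIZE 16u   /* count plus three unknown u32 fields */
#define ARC_ENTRY_SIZE  0x20u /* u32 end_offset + char filename[0x1C] */

/* Assumption: multi-byte fields are stored little-endian. */
static uint32_t arc_u32le(const uint8_t *p)
{
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/* Finds file `index` inside a decompressed archive of `len` bytes.
 * On success stores a pointer to the file data and its size, returns 0.
 * Returns -1 if the index or any offset falls outside the buffer. */
static int arc_locate(const uint8_t *arc, size_t len, uint32_t index,
                      const uint8_t **data, size_t *size)
{
    if (len < ARC_HEADER_SIZE)
        return -1;

    uint32_t count = arc_u32le(arc);
    if (index >= count || count > (len - ARC_HEADER_SIZE) / ARC_ENTRY_SIZE)
        return -1;

    const uint8_t *table = arc + ARC_HEADER_SIZE;
    size_t data_base = ARC_HEADER_SIZE + (size_t)count * ARC_ENTRY_SIZE;

    /* Assumption: each file begins at the previous entry's end_offset. */
    size_t start = index
        ? arc_u32le(table + (size_t)(index - 1) * ARC_ENTRY_SIZE)
        : 0;
    size_t end = arc_u32le(table + (size_t)index * ARC_ENTRY_SIZE);

    /* Offsets are relative to the first byte after the entry table. */
    if (end < start || end > len - data_base)
        return -1;

    *data = arc + data_base + start;
    *size = end - start;
    return 0;
}
```

The filename for an entry sits 4 bytes into that entry, at `table + index * ARC_ENTRY_SIZE + 4`; the sketch leaves it untouched because the padding and termination of the 0x1C-byte field are not described here.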