LZS format

From Netherworld Research
Revision as of 10:45, 26 February 2017 by FireFly (Talk | contribs) (Document LZS archives as found in D2PC)


The LZS format appears to be a compressed archive format: an archive wrapped in a compression layer. Both the compression and the underlying archive format are described here.

It appears in at least Disgaea 2 PC.

Magic

The file starts with a 4-byte magic, char magic[4] = "dat\0", which is not part of the compressed data.

Compression

The compression is LZ77-based: the stream consists of literal data and backreferences, where a backreference points at a portion of data that has already been output but occurs again.

struct header {
  u32 compressed_size;    // including this 12-byte header
  u32 decompressed_size;
  u32 marker;             // byte value (0x00-0xFF) that introduces a backref in the compressed data
};
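As a rough illustration of the layout so far, the magic and the header could be read from a buffer as sketched below. The names lzs_header, read_u32_le and lzs_parse_header are made up for this example, and little-endian byte order is an assumption (the page does not state it, but it is the natural guess for a PC release).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical in-memory form of the 12-byte header described above. */
struct lzs_header {
    uint32_t compressed_size;   /* includes the 12-byte header itself */
    uint32_t decompressed_size;
    uint32_t marker;            /* expected to fit in one byte */
};

/* Read a u32, ASSUMING little-endian byte order (not stated in the text). */
static uint32_t read_u32_le(const uint8_t *p) {
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/* Returns 0 on success, -1 if the buffer is too short, the magic does not
 * match, or the header fields are implausible. */
static int lzs_parse_header(const uint8_t *buf, size_t len,
                            struct lzs_header *out) {
    assert(buf != NULL && out != NULL);
    /* 4 bytes of magic plus the 12-byte header. */
    if (len < 16 || memcmp(buf, "dat\0", 4) != 0) return -1;
    out->compressed_size   = read_u32_le(buf + 4);
    out->decompressed_size = read_u32_le(buf + 8);
    out->marker            = read_u32_le(buf + 12);
    /* compressed_size counts the header; marker must be a single byte value. */
    if (out->compressed_size < 12 || out->marker > 0xFF) return -1;
    return 0;
}
```

Under these assumptions the compressed payload starts at offset 16 of the file and is compressed_size - 12 bytes long.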

This header is followed by the compressed data, which is a sequence of literal bytes (copied straight to output) and backrefs. Backrefs are encoded as:

struct backref { u8 marker, dist, count; };

meaning that count bytes should be copied from dist bytes back in the output history. If dist > marker, then 1 should be subtracted from it first.

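Why this extra complication? Because two marker bytes in a row denote an escaped literal marker byte, so the dist byte of a backref can never be equal to the marker; distances at or above the marker value are therefore stored incremented by one. Thus, a decompression loop looks something like this: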

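// READ() consumes one compressed byte and advances read_ctr, WRITE() appends one
// byte to the output, and HISTORY(n) fetches the output byte n positions back.
while (read_ctr < header.compressed_size - sizeof(struct header)) {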
  u8 v = READ();
  if (v == header.marker) {
    u8 dist = READ();
    if (dist == header.marker) {
      WRITE(header.marker);  // marker repeated twice = an escaped marker byte
    } else {
      u8 count = READ();
      if (dist > header.marker) dist--;
      for (size_t i = 0; i < count; i++) WRITE(HISTORY(dist));
    }
  } else {
    WRITE(v);
  }
}

After this step, header.decompressed_size bytes should have been written.
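For reference, the loop above can be turned into a self-contained function along the following lines. This is only a sketch: lzs_decompress is a name invented here, and it assumes that dist counts backwards from the current write position (so dist 1 is the byte just written, and a dist of 0 is treated as malformed). Neither point is confirmed by the description above.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Decompress in_len bytes of payload (the data after the 12-byte header).
 * Returns the number of bytes written, or (size_t)-1 on truncated input,
 * an out-of-range backref, or output overflow.
 * ASSUMPTION: dist counts backwards from the current write position. */
static size_t lzs_decompress(const uint8_t *in, size_t in_len, uint8_t marker,
                             uint8_t *out, size_t out_cap) {
    assert(in != NULL && out != NULL);
    size_t ip = 0, op = 0;
    while (ip < in_len) {
        uint8_t v = in[ip++];
        if (v != marker) {                  /* plain literal */
            if (op >= out_cap) return (size_t)-1;
            out[op++] = v;
            continue;
        }
        if (ip >= in_len) return (size_t)-1;
        uint8_t dist = in[ip++];
        if (dist == marker) {               /* marker twice = escaped marker */
            if (op >= out_cap) return (size_t)-1;
            out[op++] = marker;
            continue;
        }
        if (ip >= in_len) return (size_t)-1;
        uint8_t count = in[ip++];
        if (dist > marker) dist--;          /* undo the skip over the marker value */
        if (dist == 0 || dist > op || count > out_cap - op) return (size_t)-1;
        /* Copy byte by byte so overlapping runs (count > dist) repeat data. */
        for (size_t i = 0; i < count; i++, op++) out[op] = out[op - dist];
    }
    return op;
}
```

A caller would pass the payload that follows the header, the marker cast to a byte, and an output buffer of decompressed_size bytes, then compare the return value against decompressed_size.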

Archive format

The underlying archive consists of a header with the number of files, a table of offsets and filenames, and file data; nothing weird here.

struct header {
  u32 count;  // #files in the archive
  u32 unk1;   // 0
  u32 unk2;   // 0
  u32 unk3;   // 0
};

The header is followed by `header.count` file entries.

struct file_entry {
  u32 end_offset;
  char filename[0x1C];
};

After this comes the file data. The first file runs from this point for `entries[0].end_offset` bytes. That is, all of the end_offsets are counted starting from after the file metadata table, so each later file should occupy the bytes between the previous entry's end_offset and its own.
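The offset arithmetic can be sketched as follows, working on the decompressed archive in memory. The helper names (arc_u32_le, arc_locate) and the size constants are invented for this example; it assumes little-endian fields and that each file begins where the previous one ends, which is inferred from the end_offset naming rather than verified.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

enum {
    ARC_HEADER_SIZE = 16,        /* count + three unknown u32 fields */
    ARC_ENTRY_SIZE  = 4 + 0x1C   /* end_offset + filename */
};

/* Read a u32, ASSUMING little-endian byte order. */
static uint32_t arc_u32_le(const uint8_t *p) {
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/* Locate file number `index` in a decompressed archive. On success returns 0
 * and stores the absolute offset and size of its data; returns -1 on bad input. */
static int arc_locate(const uint8_t *arc, size_t len, uint32_t index,
                      size_t *offset, size_t *size) {
    assert(arc != NULL && offset != NULL && size != NULL);
    if (len < ARC_HEADER_SIZE) return -1;
    uint32_t count = arc_u32_le(arc);
    if (index >= count) return -1;
    if (count > (len - ARC_HEADER_SIZE) / ARC_ENTRY_SIZE) return -1;
    /* File data begins right after the entry table. */
    size_t data_start = ARC_HEADER_SIZE + (size_t)count * ARC_ENTRY_SIZE;
    const uint8_t *table = arc + ARC_HEADER_SIZE;
    /* ASSUMPTION: file i begins where file i-1 ends; file 0 begins at data_start. */
    uint32_t begin = index ? arc_u32_le(table + (size_t)(index - 1) * ARC_ENTRY_SIZE) : 0;
    uint32_t end = arc_u32_le(table + (size_t)index * ARC_ENTRY_SIZE);
    if (end < begin || end > len - data_start) return -1;
    *offset = data_start + begin;
    *size = end - begin;
    return 0;
}
```

The filename of entry i would sit at table + i * ARC_ENTRY_SIZE + 4 and span 0x1C bytes; whether it is always NUL-terminated is not known from the description above, so a reader should bound any string handling to that length.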