LZS format

From Netherworld Research

The LZS format appears to be a compressed archive format. This page describes both the compression layer and the underlying archive format.

Somewhat confusingly, despite the name it does not appear to use either LZS or LZSS compression.

It appears in at least Disgaea 2 PC.

Magic

The file starts with a 4-byte magic, char magic[4] = "dat\0", which is not part of the compressed data.

Compression

The compression is LZ77-based: the stream consists of literal data and backreferences, where a backreference points back at a portion of data that has already been output and occurs again.

struct header {
  u32 compressed_size;    // including this 12-byte header
  u32 decompressed_size;
  u32 marker;             // marker used to denote backrefs in the compressed data. 0x00-0xFF.
};
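As a rough illustration, the magic and this header could be read as sketched below. Two things here are assumptions rather than confirmed facts about the format: that the fields are little-endian, and that the header begins at offset 4, immediately after the magic. The names lzs_header, lzs_read_u32le and lzs_parse_header are made up for this example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Mirrors the 12-byte header described above. */
struct lzs_header {
    uint32_t compressed_size;   /* includes the 12-byte header itself */
    uint32_t decompressed_size;
    uint32_t marker;            /* 0x00-0xFF */
};

/* Assumption: fields are stored little-endian. */
static uint32_t lzs_read_u32le(const uint8_t *p) {
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/* Returns 0 on success, -1 if the buffer does not look like an LZS file.
 * Assumption: the header starts at offset 4, right after the magic. */
int lzs_parse_header(const uint8_t *buf, size_t len, struct lzs_header *out) {
    assert(buf != NULL && out != NULL);
    if (len < 4 + 12) return -1;                   /* magic + header */
    if (memcmp(buf, "dat\0", 4) != 0) return -1;   /* wrong magic */
    out->compressed_size   = lzs_read_u32le(buf + 4);
    out->decompressed_size = lzs_read_u32le(buf + 8);
    out->marker            = lzs_read_u32le(buf + 12);
    if (out->compressed_size < 12) return -1;      /* must cover its own header */
    if (out->marker > 0xFF) return -1;             /* marker is a single byte value */
    return 0;
}
```

The two sanity checks at the end follow directly from the field comments: compressed_size includes the header, and the marker must fit in one byte.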

This header is followed by the compressed data, which consists of literal bytes (copied straight to the output) and backrefs. A backref is encoded as

struct backref { u8 marker, dist, count; };

meaning that count bytes should be copied from dist bytes back in the output history. If the stored dist is greater than header.marker, then 1 must be subtracted from it to get the real distance.

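Why this extra complication? Because two marker bytes in a row denote an escaped literal marker byte, so the dist byte of a backref can never be equal to the marker; distances at or above the marker value are therefore stored incremented by 1. Thus, a decompression loop looks something like this: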

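// READ() consumes one compressed byte (advancing read_ctr), WRITE() appends one
// byte to the output, and HISTORY(dist) fetches the byte dist back in the output.
while (read_ctr < header.compressed_size - sizeof(struct header)) {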
  u8 v = READ();
  if (v == header.marker) {
    u8 dist = READ();
    if (dist == header.marker) {
      WRITE(header.marker);  // marker repeated twice = an escaped marker byte
    } else {
      u8 count = READ();
      if (dist > header.marker) dist--;
      for (size_t i = 0; i < count; i++) WRITE(HISTORY(dist));
    }
  } else {
    WRITE(v);
  }
}

After this step, header.decompressed_size bytes should have been written.
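For reference, the loop above can be turned into a bounds-checked function along the following lines. This is only a sketch: the function name and signature are invented here, and it assumes that a distance of d refers to the byte at out[op - d] (so a distance of 0 is rejected). If the real format counts distances from zero, the index would need to shift by one.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Decompresses in_len bytes of compressed payload (the data after the
 * 12-byte header) into out, which has room for out_cap bytes.
 * Returns the number of bytes written, or -1 on malformed input.
 * Assumption: a distance of d refers to the byte at out[op - d]. */
long lzs_decompress(const uint8_t *in, size_t in_len, uint8_t marker,
                    uint8_t *out, size_t out_cap) {
    size_t ip = 0, op = 0;
    assert(in != NULL && out != NULL);
    while (ip < in_len) {
        uint8_t v = in[ip++];
        if (v != marker) {                  /* plain literal */
            if (op >= out_cap) return -1;
            out[op++] = v;
            continue;
        }
        if (ip >= in_len) return -1;        /* truncated after marker */
        uint8_t dist = in[ip++];
        if (dist == marker) {               /* marker twice = escaped literal */
            if (op >= out_cap) return -1;
            out[op++] = marker;
            continue;
        }
        if (ip >= in_len) return -1;        /* truncated backref */
        uint8_t count = in[ip++];
        if (dist > marker) dist--;          /* undo the skip over the marker value */
        if (dist == 0 || dist > op) return -1;   /* outside the history (assumed) */
        if (count > out_cap - op) return -1;     /* would overflow the output */
        /* Copy byte by byte so overlapping copies (count > dist) repeat data. */
        for (size_t i = 0; i < count; i++) {
            out[op] = out[op - dist];
            op++;
        }
    }
    return (long)op;
}
```

A caller would pass header.compressed_size - 12 as in_len and header.decompressed_size as out_cap, then check that the return value equals header.decompressed_size.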

Archive format

The underlying archive consists of a header with the number of files, a table of offsets and filenames, and file data; nothing weird here.

struct header {
  u32 count;  // #files in the archive
  u32 unk1;   // 0
  u32 unk2;   // 0
  u32 unk3;   // 0
};

The header is followed by header.count file entries.

struct file_entry {
  u32 end_offset;       // end of this file's data, counted from the end of the entry table
  char filename[0x1C];
};

After this comes the file data. The first file runs from this point for entries[0].end_offset bytes. That is, all of the end_offsets are counted from the end of the file metadata table, so each later file presumably runs from the previous entry's end_offset up to its own.
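The layout described above could be walked roughly as follows to find each file inside the decompressed buffer. This is a sketch under stated assumptions: fields are little-endian, filenames are NUL-padded, and file i starts where file i-1 ends. The names arc_file, arc_read_u32le and arc_list_files are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define ARC_HEADER_SIZE 16u     /* count + three unknown u32s */
#define ARC_ENTRY_SIZE  0x20u   /* u32 end_offset + char filename[0x1C] */

struct arc_file {
    char   name[0x1C + 1];  /* filename with a guaranteed terminator */
    size_t start;           /* absolute offset within the decompressed buffer */
    size_t size;            /* length of the file data in bytes */
};

/* Assumption: fields are stored little-endian. */
static uint32_t arc_read_u32le(const uint8_t *p) {
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/* Fills files[] (capacity max_files) and returns the file count, or -1 if
 * the archive is malformed or has more files than max_files.
 * Assumption: file i starts where file i-1 ends. */
long arc_list_files(const uint8_t *buf, size_t len,
                    struct arc_file *files, size_t max_files) {
    assert(buf != NULL && files != NULL);
    if (len < ARC_HEADER_SIZE) return -1;
    uint32_t count = arc_read_u32le(buf);
    if (count > max_files) return -1;
    if (count > (len - ARC_HEADER_SIZE) / ARC_ENTRY_SIZE) return -1;
    /* File data begins right after the entry table. */
    size_t data_base = ARC_HEADER_SIZE + (size_t)count * ARC_ENTRY_SIZE;
    size_t prev_end = 0;
    for (uint32_t i = 0; i < count; i++) {
        const uint8_t *e = buf + ARC_HEADER_SIZE + (size_t)i * ARC_ENTRY_SIZE;
        size_t end = arc_read_u32le(e);
        if (end < prev_end || end > len - data_base) return -1;
        memcpy(files[i].name, e + 4, 0x1C);
        files[i].name[0x1C] = '\0';
        files[i].start = data_base + prev_end;
        files[i].size  = end - prev_end;
        prev_end = end;
    }
    return (long)count;
}
```

The function only computes offsets and sizes; extracting a file is then a matter of copying size bytes starting at buf + start.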