LumpClasses

Jared Ketterer edited this page Apr 24, 2022 · 3 revisions

This page mostly exists to explain branches/base.py,
as well as the tools it lays out as the foundation of branch scripts

Basic Definition

Lump Classes are translators between the bytes of raw lumps & python objects
The LumpClasses for a given engine branch are laid out in a branch script, where they are assigned their lumps

python struct module

bsp_tool uses the built-in struct module at its core
Every LumpClass has a _format attribute, which directly defines a format string for struct.unpack

The struct module uses the following characters to identify C types (very abstractly)
A count can be placed before any character to communicate that it occurs multiple times in sequence (for s, the count is the length of the string in bytes)
e.g. struct.unpack("2B16s", b"\x01\x02Hello World\0\0\0\0\0") ~> (1, 2, b"Hello World\0\0\0\0\0")

char  C type                python type    size (bytes)
c     char                  bytes (len 1)  1
b     signed char           int            1
B     unsigned char         int            1
?     _Bool                 bool           1
h     short                 int            2
H     unsigned short        int            2
i     int                   int            4
I     unsigned int          int            4
l     long                  int            4
L     unsigned long         int            4
q     long long             int            8
Q     unsigned long long    int            8
e     float16 / half float  float          2
f     float                 float          4
d     double                float          8
s     char[]                bytes          1 per char (e.g. b"\x01\x02\x03ABC")
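
As a quick check of the table, the sketch below unpacks a small buffer with a count, a string and a signed type. The buffer contents and the "<" prefix (little-endian, standard sizes) are choices made for this example, not something bsp_tool is known to require.

```python
import struct

# "<" = little-endian, standard sizes, no padding
# 2B = two unsigned chars, 16s = one 16 byte string, h = one short
fmt = "<2B16sh"
raw = b"\x01\x02" + b"Hello World".ljust(16, b"\0") + b"\xFE\xFF"

a, b, name, short = struct.unpack(fmt, raw)
size = struct.calcsize(fmt)  # bytes consumed by one struct

# iter_unpack walks a buffer holding many structs of the same format
shorts = [s for (s,) in struct.iter_unpack("<h", b"\x01\x00\xFF\xFF")]
```

Note that the string keeps its trailing null bytes; stripping them is left to the caller.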

Basic LumpClasses

The most basic Lump Class maps a single C object to a python object

class UnsignedShorts(int):
    _format = "H"

bsp_tool.branches.shared holds a few helpful BasicLumpClasses
When a Bsp is created, it will look for BasicLumpClasses in branch_script.BASIC_LUMP_CLASSES

class Shorts(int):
    _format = "h"

# {"LUMP_NAME": LumpClass}
# quake based .bsp DO NOT have versioned lumps
BASIC_LUMP_CLASSES = {"SURFEDGES": Shorts}

For ValveBsp / RespawnBsp:

# branches.valve.orange_box
from . import shared

# {"LUMP_NAME": {version: LumpClass}}
# source based .bsp DO have versioned lumps
BASIC_LUMP_CLASSES = {"DISPLACEMENT_TRIS":         {0: shared.UnsignedShorts},
                      "LEAF_FACES":                {0: shared.UnsignedShorts},
                      "SURFEDGES":                 {0: shared.Ints},
                      "TEXTURE_DATA_STRING_TABLE": {0: shared.UnsignedShorts}}

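The two dictionary shapes above could be consumed roughly as follows. This is only a sketch: decode_basic_lump is a hypothetical helper written for this page, the Shorts & Ints classes are redefined locally, and the raw bytes are made up rather than read from a real .bsp.

```python
import struct


class Shorts(int):
    _format = "h"


class Ints(int):
    _format = "i"


# quake style: {"LUMP_NAME": LumpClass}
QUAKE_BASIC = {"SURFEDGES": Shorts}
# source style: {"LUMP_NAME": {version: LumpClass}}
SOURCE_BASIC = {"SURFEDGES": {0: Ints}}


def decode_basic_lump(lump_class, raw_lump: bytes) -> list:
    """hypothetical helper: one LumpClass instance per struct in the lump"""
    return [lump_class(value)
            for (value,) in struct.iter_unpack(lump_class._format, raw_lump)]


raw = struct.pack("4h", 1, -2, 3, -4)
quake_surfedges = decode_basic_lump(QUAKE_BASIC["SURFEDGES"], raw)

lump_version = 0  # assumed to come from the lump header in a source based .bsp
raw = struct.pack("2i", 5, -6)
source_surfedges = decode_basic_lump(SOURCE_BASIC["SURFEDGES"][lump_version], raw)
```
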
LumpClasses

Most lump classes are more complex than this, and are made up of multiple types
These are commonly mapped with subclasses of either base.MappedArray or base.Struct

LumpClasses can use any base class, but bsp_tool expects them to provide a few methods

# taken from branches.id_software.quake
from typing import List


class Edge(list):  # LUMP 12
    _format = "2H"  # List[int]

    def flat(self) -> List[int]:
        """return contents as an iterable for `struct.pack(self._format, *self.flat())`"""
        return self

    @classmethod
    def from_tuple(cls, _tuple) -> List[int]:
        """alternate __init__; takes `_tuple` from `struct.iter_unpack(self._format, lump_bytes)`"""
        return cls(_tuple)
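
To show why these two methods are enough, here is a minimal round trip using only the struct module. The lump bytes are invented for the example, and the reading & writing loops are an assumption about how such a class might be driven, not a copy of bsp_tool's internals.

```python
import struct
from typing import List


class Edge(list):
    _format = "2H"

    def flat(self) -> List[int]:
        return self

    @classmethod
    def from_tuple(cls, _tuple):
        return cls(_tuple)


raw_lump = struct.pack("4H", 0, 1, 1, 2)  # two edges, back to back

# reading: one tuple per struct in the lump
edges = [Edge.from_tuple(t) for t in struct.iter_unpack(Edge._format, raw_lump)]

# writing: flatten each object and pack it again
rebuilt = b"".join(struct.pack(Edge._format, *e.flat()) for e in edges)
```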

MappedArray

# bsp_tool/branches/developer/branch_script.py
from .. import base


class Vertex(base.MappedArray):
    x: float  # python type hints
    y: float
    z: float
    _format = "3f"  # 3 * "f" == C float[3]
    _mapping = [*"xyz"]  # * unpacks iterables ([*"xyz"] == ["x", "y", "z"])

MappedArray takes the tuple from struct.unpack(MappedArray._format, raw_lump_bytes) and maps each entry to a name in _mapping
It also has a method named .flat(), which returns its contents to tuple form for writing
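
The idea can be sketched in a few lines. TinyMappedArray below is an illustrative stand-in written for this page; it is NOT bsp_tool's base.MappedArray, and only handles the flat list form of _mapping.

```python
import struct


class TinyMappedArray:
    """illustrative stand-in; NOT bsp_tool's base.MappedArray"""
    _format = ""
    _mapping = []

    def __init__(self, *values):
        # pair each value from the tuple with a name from _mapping
        for name, value in zip(self._mapping, values):
            setattr(self, name, value)

    @classmethod
    def from_bytes(cls, raw: bytes):
        return cls(*struct.unpack(cls._format, raw))

    def flat(self) -> tuple:
        # back to tuple form, in _mapping order, ready for struct.pack
        return tuple(getattr(self, name) for name in self._mapping)


class Vertex(TinyMappedArray):
    _format = "3f"
    _mapping = [*"xyz"]


vertex = Vertex.from_bytes(struct.pack("3f", 1.0, 2.0, 0.5))
```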

Struct

Here is the most complex base.Struct LumpClass in bsp_tool:

# bsp_tool/branches/id_software/quake3.py
from typing import List

from .. import base

class Face(base.Struct):  # LUMP 13
    texture: int                          # index into Texture lump
    effect: int                           # index into Effect lump; -1 for no effect
    type: int                             # polygon, patch, mesh, billboard (env_sprite)
    first_vertex: int                     # index into Vertex lump
    num_vertices: int                     # number of Vertices after first_vertex in this face
    first_mesh_vertex: int                # index into MeshVertex lump
    num_mesh_vertices: int                # number of MeshVertices after first_mesh_vertex in this face
    # lightmap.index: int                 # which lightmap texture to use
    # lightmap.top_left: List[int]        # approximate top-left corner of visible lightmap segment
    # lightmap.size: List[int]            # size of visible lightmap segment
    # lightmap.origin: List[float]        # world space lightmap origin
    # lightmap.vector: List[List[float]]  # lightmap texture projection vectors
    normal: List[float]                   # surface normal
    size: List[int]                       # texture patch dimensions

    __slots__ = ["texture", "effect", "type", "first_vertex", "num_vertices",
                 "first_mesh_vertex", "num_mesh_vertices", "lightmap", "normal", "size"]

    _format = "12i12f2i"

    _arrays = {"lightmap": {"index": None, "top_left": [*"xy"], "size": ["width", "height"],
                            "origin": [*"xyz"], "vector": {"s": [*"xyz"], "t": [*"xyz"]}},
               "normal": [*"xyz"], "size": ["width", "height"]}

As you can see, this definition is far more complex, so let's break it down

Type Hints

At the top, we have a block of type hints
These are entirely optional, but make it much easier to decipher the _format string
Light documentation for each attribute is also given here by the comments

Note the type hints for the lightmap are commented out, but still present
Python only allows type hints for top level attributes in a class body; there is no syntax for hinting an attribute of an attribute (e.g. lightmap.index)

__slots__

base.Struct uses __slots__ to define top level attributes
A more basic base.Struct subclass could use __slots__ like the _mapping attribute of a base.MappedArray

_format

As with base.MappedArray, the _format attribute holds a format string for the Python struct module

_arrays

The _arrays attribute is what really separates a Struct from a MappedArray

When a new base.Struct is created, a tuple is assembled from _format, and is sorted out into __slots__
If an attribute in __slots__ has a corresponding entry in _arrays, a MappedArray is created from the "sub-mapping"

Sub-mappings come in four varieties:

  1. Integer
    {"attr": 2} gives 2 entries from the tuple to attr as a list
  2. List
    {"attr": [*"xy"]} or {"attr": ["x", "y"]} creates attr.x & attr.y
  3. Dict
    {"attr": {"a": 2, "b": 2}} goes a layer deeper, creating another mapping inside itself
  4. None
    {"attr": None} gives a single entry from the tuple to attr; only useful inside a Dict sub-mapping

To explain the fourth kind of sub-mapping, let's take a closer look at quake3.Face._arrays["lightmap"]:

"lightmap": {"index": None,                             # int                # which lightmap texture to use
             "top_left": [*"xy"],                       # List[int]          # approximate top-left corner of visible lightmap segment
             "size": ["width", "height"],               # List[int]          # size of visible lightmap segment
             "origin": [*"xyz"],                        # List[float]        # world space lightmap origin
             "vector": {"s": [*"xyz"], "t": [*"xyz"]}}  # List[List[float]]  # lightmap texture projection vectors

"index": None
    Since _arrays["lightmap"] is a dict, none of the first 3 kinds of sub-mapping can give a single value to lightmap.index
    ("index": 1 would create a list, so the value would end up stored at lightmap.index[0], which isn't intuitive)
    By mapping "index" to None, base.MappedArray.__init__ acts as if lightmap.index has no deeper mapping,
    and hands it a single value from the tuple

This fourth way of mapping allows for creating mappings where an attribute like lightmap.index can exist on the same layer as more complex mappings
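
The four kinds of sub-mapping can be sketched as one small recursive function. take is a hypothetical helper written for this page, and it builds plain dicts & lists where bsp_tool would build MappedArrays; the flat tuple is invented sample data.

```python
from typing import Any, Iterator


def take(values: Iterator, mapping) -> Any:
    """hypothetical helper: consume entries of a flat tuple to fill one sub-mapping"""
    if mapping is None:  # None: a single value
        return next(values)
    if isinstance(mapping, int):  # Integer: a list of that many values
        return [next(values) for _ in range(mapping)]
    if isinstance(mapping, list):  # List: one value per name
        return {name: next(values) for name in mapping}
    if isinstance(mapping, dict):  # Dict: go a layer deeper
        return {name: take(values, child) for name, child in mapping.items()}
    raise TypeError(f"unsupported sub-mapping: {mapping!r}")


lightmap_mapping = {"index": None, "top_left": [*"xy"], "size": ["width", "height"],
                    "origin": [*"xyz"], "vector": {"s": [*"xyz"], "t": [*"xyz"]}}

# index, top_left, size, origin, vector.s, vector.t
flat = (7, 0, 0, 16, 16, 0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 1.0, 0.0)
lightmap = take(iter(flat), lightmap_mapping)
```

Because every branch pulls from the same iterator, the order of keys in the mapping decides which entries of the tuple each attribute receives.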

Additional Features

  • MappedArrays can be initialised with any _mapping & _format, without defining a subclass
    • e.g. MappedArray(0, [1, 2], 3.456, _mapping={"a": None, "b": 2, "c": None}, _format="i2Hf")
    • note how "a" & "c" must still be named in this form, to express the order & names of all attrs in the mapping
  • Both MappedArray & Struct can be initialised in 3 ways
    • standard init, e.g. SomeStruct(0, [1, 2], 3.456) or Vertex(z=1)
    • .from_bytes(_bytes) uses struct.unpack(self._format, _bytes) to initialise from binary
    • .from_tuple(_tuple) takes a tuple like struct.iter_unpack(self._format, RAW_LUMP) might generate
      • NOTE: this is more complicated than Struct(*_tuple), as __init__'s args reflect the top-level of the mapping
      • Vertex(*_tuple) would work, as it's essentially flat (the same goes for any MappedArray where _mapping is a list)
  • .from_bytes(...) & .from_tuple(...) have inverses in .as_bytes() & .flat()
  • PLANNED: .as_cpp() class methods for easier converting to C++ struct definitions (easier to read, though long)

Special LumpClasses (TODO)

SpecialLumpClasses take all of the bytes in a lump and convert them into an object that abstracts the lump into something more approachable.
They should only be used in cases where the lump is too complex for BASIC_LUMP_CLASSES or a LumpClass to translate to a python object.

All SpecialLumpClasses should have a to_bytes method to recreate themselves in bytes for saving changes
Most current SpecialLumpClasses don't have nice init methods separate from their .from_bytes methods
This means importing is the priority, over creating a new SpecialLump object
However future versions should move towards allowing users to populate their own lumps

NOTE: SpecialLumpClasses do not have _format attrs, due to their complexity
Instead, most use the built-in io module to treat their bytes as a file, and seek around to internal offsets
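
A minimal sketch of that approach follows. StringTable and its byte layout (a count, a table of offsets, then null-terminated strings) are invented for this example and do not describe any real lump; only the io.BytesIO seek-and-read pattern is the point.

```python
import io
import struct


class StringTable(list):
    """hypothetical SpecialLumpClass; the layout below is invented for this example"""
    # layout: uint32 count, uint32 offsets[count], then null-terminated strings

    @classmethod
    def from_bytes(cls, raw_lump: bytes):
        stream = io.BytesIO(raw_lump)
        (count,) = struct.unpack("<I", stream.read(4))
        offsets = struct.unpack(f"<{count}I", stream.read(4 * count))
        strings = []
        for offset in offsets:
            stream.seek(offset)  # jump to an internal offset, like a file
            chars = bytearray()
            char = stream.read(1)
            while char not in (b"\0", b""):
                chars += char
                char = stream.read(1)
            strings.append(chars.decode("ascii"))
        return cls(strings)

    def to_bytes(self) -> bytes:
        header_size = 4 + 4 * len(self)
        offsets, body = [], b""
        for string in self:
            offsets.append(header_size + len(body))
            body += string.encode("ascii") + b"\0"
        return struct.pack(f"<{len(self) + 1}I", len(self), *offsets) + body


table = StringTable(["worldspawn", "info_player_start"])
raw = table.to_bytes()
```

Subclassing list gives this sketch a usable init for free, which is one way a SpecialLumpClass could let users populate their own lumps.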

GameLumps (TODO)

GameLumps originated with Source, but carry on to many of its "children"
The Titanfall Engine is one of the more widely known children of Source (though the apple fell far from the tree, closer to CoD)

GameLumps contain smaller sub-lumps, making them a sort of .bsp within a .bsp
Some major classes handle the GameLump structure, so lumps within the GameLump can be mapped somewhat directly
All sub-lumps encountered so far have been SpecialLumpClasses, so if you're mapping a sub-lump, make sure you use a SpecialLumpClass

The most common sub-lump is sprp (all game lumps have 4 character names)
sprp holds static props, often in a series of sub-lumps

// pseudocode: Source Engine SPRP
struct SPRP {
    int    num_model_names;
    char   model_names[num_model_names][128];  // num_model_names strings, 128 bytes each
    int    num_leaves;
    short  leaves[num_leaves];
    int    num_props;
    // then one of the following, based on the version in the GameLump header for 'sprp':
    StaticPropv4  props[num_props];
    StaticPropv5  props[num_props];
    StaticPropv6  props[num_props];
};
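
Read with Python's io & struct modules, the variable-length parts of that pseudocode might be walked like this. read_sprp_header is a hypothetical helper, the sample bytes are invented, little-endian is assumed, and the props are left undecoded because their struct depends on the sprp version.

```python
import io
import struct


def read_sprp_header(raw_sprp: bytes) -> dict:
    """hypothetical helper following the pseudocode above; props are left as raw bytes"""
    stream = io.BytesIO(raw_sprp)

    def read(fmt: str) -> tuple:
        return struct.unpack(fmt, stream.read(struct.calcsize(fmt)))

    (num_model_names,) = read("<i")
    # each name is a fixed 128 byte field, padded with null bytes
    model_names = [stream.read(128).partition(b"\0")[0].decode("ascii")
                   for _ in range(num_model_names)]
    (num_leaves,) = read("<i")
    leaves = list(read(f"<{num_leaves}h"))
    (num_props,) = read("<i")
    # the StaticProp struct depends on the sprp version, so it is not decoded here
    return {"model_names": model_names, "leaves": leaves,
            "num_props": num_props, "props": stream.read()}


raw = b"".join([struct.pack("<i", 1), b"models/props/crate.mdl".ljust(128, b"\0"),
                struct.pack("<i2h", 2, 4, 5),
                struct.pack("<i", 0)])
sprp = read_sprp_header(raw)
```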

Closing Notes

While these mappings can be very complex, we hope you can see how useful the systems behind them are
These base classes are designed to keep the amount of code needed to define a LumpClass to a minimum
(a C implementation would easily take up as much space as the type hints!)

However, these LumpClasses also serve as documentation! If you believe you have a new solution to mapping LumpClasses that is more intuitive, don't hesitate to point it out in a GitHub Issue!