My problem in a nutshell

I have an application which uses the DataTable. It works with tables of at most a few thousand rows. I need to be able to add rows to the middle of the table while keeping the "labels" static (they are integer numbers). Bonus points if moving rows up and down is fast.

My problem is that moving rows in the DataTable takes roughly 1 second per 1000 rows. I wonder if this can be made faster?

What I use currently

- DataTable (subclassed as KeySequenceTable)
- Labels created from current_length + 1
- Moving a row to the middle of the table by swapping the cell contents N/2 times (N = number of rows in the table)
Benchmarks and code for reproducing the problem
MWE App
This is a Minimal Working Example (MWE) app that can be used to demonstrate the behavior.
```python
from rich.text import Text

from textual.app import App, ComposeResult
from textual.coordinate import Coordinate
from textual.widgets import DataTable
from textual.widgets.data_table import CellDoesNotExist

COLUMNS = ("lane", "swimmer", "country", "time")
ROWS = [
    (1, "Joseph Schooling", "Singapore", 50.39),
    (2, "Michael Phelps", "United States", 51.14),
    (3, "Chad le Clos", "South Africa", 51.14),
    (4, "László Cseh", "Hungary", 51.14),
    (5, "Li Zhuhao", "China", 51.26),
    (6, "Mehdy Metella", "France", 51.58),
    (7, "Tom Shields", "United States", 51.73),
    (8, "Aleksandr Sadovnikov", "Russia", 51.84),
    (9, "Darren Burns", "Scotland", 51.84),
] * 30  # or: 100


class MyTable(DataTable):
    def __init__(self, **kwargs) -> None:
        super().__init__(**kwargs)
        self._col_keys = []
        for column in COLUMNS:
            self._col_keys.append(self.add_column(column))

    def add_row_with_autolabel(self, *contents) -> None:
        # Prevent polluting the logs with potentially hundreds or thousands of
        # instances of the same RowHighlighted message.
        with self.prevent(self.RowHighlighted):
            return self._add_row_with_autolabel(*contents)

    def _add_row_with_autolabel(self, *contents) -> None:
        length_before = self.row_count
        # Rows are _always_ added first to the end of the table.
        lane, swimmer, country, time, *other_cols = contents
        lane = Text(str(lane))
        lane.stylize("italic bright_black")
        swimmer, country = Text(swimmer), Text(country)
        swimmer.stylize("sky_blue1 bold")
        country.stylize("light_pink1 bold")
        time = Text(str(time))
        time.stylize("bold blue")
        label = str(length_before + 1)
        self.add_row(
            lane,
            swimmer,
            country,
            time,
            *other_cols,
            label=label,
            key=label,
        )
        length_new = self.row_count
        self.cursor_coordinate = Coordinate(length_new - 1, 0)
        middle = length_new // 2
        self.move_current_row_up_to(middle)

    def move_current_row_up_to(self, new_row_index: int) -> None:
        current_row_index, _ = self.cursor_coordinate
        if current_row_index == new_row_index:
            return
        for _ in range(current_row_index - new_row_index):
            self._swap_current_row_with(self.cursor_coordinate.up())

    def _swap_current_row_with(self, new_coordinate: Coordinate) -> None:
        cellkey_current = self.coordinate_to_cell_key(self.cursor_coordinate)
        try:
            cellkey_new = self.coordinate_to_cell_key(new_coordinate)
        except CellDoesNotExist:
            # Cannot move to a cell that does not exist.
            return
        cells_new = self.get_row(cellkey_new.row_key)
        cells_current = self.get_row(cellkey_current.row_key)
        for content_new, content_current, col_key in zip(
            cells_new, cells_current, self._col_keys
        ):
            self.update_cell(cellkey_current.row_key, col_key, content_new)
            self.update_cell(cellkey_new.row_key, col_key, content_current)
        self.cursor_coordinate = new_coordinate


class TableApp(App):
    def compose(self) -> ComposeResult:
        yield MyTable(cursor_type="row")

    def on_mount(self) -> None:
        table = self.query_one(DataTable)
        with self.batch_update():
            for row in ROWS:
                table.add_row_with_autolabel(*row)


if __name__ == "__main__":
    app = TableApp()
    app.run()
```
How the timing estimate was created
I ran the MWE (code above) and measured times with a stopwatch, using 270-row (multiplier 30) and 900-row (multiplier 100) tables.

If the table size (N) is 270, building the table this way, where each new row is moved from the bottom to the middle (first across 1 row, then 2 rows, then 3 rows, ... up to 135 rows), takes about 8910 row swap operations and 6.1 seconds, i.e. roughly 0.68 ms per row swap.

If the table size is 900, the same process (1 row, 2 rows, ... up to 450 rows) takes about 100575 row swap operations, which I measured at roughly 105 seconds, i.e. about 1.04 ms per row swap.
The equation for calculating the number of row swaps needed to build a table with N rows (each newly added row is first appended to the bottom of the table and then moved to the middle):
```python
def n_operations(table_size):
    h = table_size / 2
    tri = h * (h + 1) / 2  # the triangular number
    out = tri - table_size  # one move (the first) per addition is not required?
    return out
```
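As a sanity check, plugging the two benchmark table sizes into this function reproduces the swap counts quoted above:

```python
>>> n_operations(270)
8910.0
>>> n_operations(900)
100575.0
```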
It's unclear to me whether the table size also makes each individual row swap slower, but the bottom line is that moving a row in a 4000-row table from the bottom (where DataTable.add_row puts it) to the middle (= 2000 row swaps) takes about 2 seconds.
profiling output
I also did some profiling of the MWE using pyinstrument.

[screenshot: pyinstrument profiler output, with a zoom-in]
I wrote a PoC which utilizes the _row_locations attribute of the DataTable. This is a private two-way dictionary which records which row (identified by its RowKey) sits at which index (an integer from 0 to N-1). The code is roughly 100x faster. The exact speed cannot be determined with the same stopwatch arrangement, but it is more than fast enough: moving a row across 1000 rows takes on the order of 20 ms instead of 2000+ ms. There is one gotcha: since I'm messing around with the row order, the labels also get messed up. I'm checking if there's a solution for keeping the labels in order.
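For context, here is a minimal sketch of how that two-way mapping behaves. It assumes Textual's internal TwoWayDict API (get for key-to-value, get_key for value-to-key); since this is a private module, the API may change between releases:

```python
from textual._two_way_dict import TwoWayDict
from textual.widgets.data_table import RowKey

# Map row keys to row indices; the reverse lookup (index -> key) is what
# the DataTable itself uses to find the row rendered at a given position.
row_a, row_b = RowKey("a"), RowKey("b")
locations: TwoWayDict[RowKey, int] = TwoWayDict({})
locations[row_a] = 0
locations[row_b] = 1

assert locations.get(row_b) == 1      # key -> index
assert locations.get_key(0) is row_a  # index -> key
```

The point is that reordering rows only remaps entries in this dictionary, instead of rewriting every cell of every row along the way with update_cell.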
PoC MWE utilizing DataTable._row_locations
```python
from rich.text import Text

from textual._two_way_dict import TwoWayDict
from textual.app import App, ComposeResult
from textual.coordinate import Coordinate
from textual.widgets import DataTable
from textual.widgets.data_table import RowKey

COLUMNS = ("lane", "swimmer", "country", "time")
ROWS = [
    (1, "Joseph Schooling", "Singapore", 50.39),
    (2, "Michael Phelps", "United States", 51.14),
    (3, "Chad le Clos", "South Africa", 51.14),
    (4, "László Cseh", "Hungary", 51.14),
    (5, "Li Zhuhao", "China", 51.26),
    (6, "Mehdy Metella", "France", 51.58),
    (7, "Tom Shields", "United States", 51.73),
    (8, "Aleksandr Sadovnikov", "Russia", 51.84),
    (9, "Darren Burns", "Scotland", 51.84),
] * 100  # or: 100


def change_twowaydct_value(
    dct: TwoWayDict[RowKey, int], oldvalue: int, newvalue: int
) -> None:
    key = dct.get_key(oldvalue)
    if key is None:
        raise ValueError(f"Value {oldvalue} not found in the TwoWayDict")
    if dct.get_key(newvalue) is not None:
        raise ValueError(f"Value {newvalue} is already in the TwoWayDict.")
    # The deletion is required as otherwise one of the two internal dicts
    # would still have the old value floating around.
    del dct[key]
    dct[key] = newvalue


class MyTable(DataTable):
    def __init__(self, **kwargs) -> None:
        super().__init__(**kwargs)
        self._col_keys = []
        for column in COLUMNS:
            self._col_keys.append(self.add_column(column))

    def add_row_with_autolabel(self, *contents) -> None:
        # Prevent polluting the logs with potentially hundreds or thousands of
        # instances of the same RowHighlighted message.
        with self.prevent(self.RowHighlighted):
            return self._add_row_with_autolabel(*contents)

    def _add_row_with_autolabel(self, *contents) -> None:
        length_before = self.row_count
        # Rows are _always_ added first to the end of the table.
        lane, swimmer, country, time, *other_cols = contents
        lane = Text(str(lane))
        lane.stylize("italic bright_black")
        swimmer, country = Text(swimmer), Text(country)
        swimmer.stylize("sky_blue1 bold")
        country.stylize("light_pink1 bold")
        time = Text(str(time))
        time.stylize("bold blue")
        label = str(length_before + 1)
        self.add_row(
            lane,
            swimmer,
            country,
            time,
            *other_cols,
            label=label,
            key=label,
        )
        length_new = self.row_count
        self.cursor_coordinate = Coordinate(length_new - 1, 0)
        middle = length_new // 2
        self.bubble_move(self.row_count - 1, middle)

    def bubble_move(self, old: int, new: int) -> None:
        """Moves a row to a new location using the adjacent row swapping
        method used in the bubble sort algorithm.

        Parameters
        ----------
        old : int
            The index of the row to move.
        new : int
            The index to move the row to.
        """
        if old == new:
            return
        old_row_key = self._row_locations.get_key(old)
        del self._row_locations[old_row_key]
        if new < old:  # upwards
            for dest_idx in range(old, new, -1):
                src_idx = dest_idx - 1
                change_twowaydct_value(self._row_locations, src_idx, dest_idx)
        self._row_locations.__setitem__(key=old_row_key, value=new)


class TableApp(App):
    def compose(self) -> ComposeResult:
        yield MyTable(cursor_type="row")

    def on_mount(self) -> None:
        table = self.query_one(DataTable)
        with self.batch_update():
            for row in ROWS:
                table.add_row_with_autolabel(*row)


if __name__ == "__main__":
    app = TableApp()
    app.run()
```
Picture of the messed-up labels (each label is stored on its Row object, so it travels with the row when only the index mapping is changed):
Update
Here's a PoC with additional fixes for the labels. This assumes the labels are integers running from 1 to N. It's still pretty fast: updating 1000 labels takes about 1.54 ms (measured with time.time(); some of the cost may come from the UI updates). A rough sketch of such a measurement is included after the code below.
PoC with fixed labels
```python
from rich.text import Text

from textual._two_way_dict import TwoWayDict
from textual.app import App, ComposeResult
from textual.coordinate import Coordinate
from textual.widgets import DataTable
from textual.widgets.data_table import RowKey

COLUMNS = ("lane", "swimmer", "country", "time")
ROWS = [
    (1, "Joseph Schooling", "Singapore", 50.39),
    (2, "Michael Phelps", "United States", 51.14),
    (3, "Chad le Clos", "South Africa", 51.14),
    (4, "László Cseh", "Hungary", 51.14),
    (5, "Li Zhuhao", "China", 51.26),
    (6, "Mehdy Metella", "France", 51.58),
    (7, "Tom Shields", "United States", 51.73),
    (8, "Aleksandr Sadovnikov", "Russia", 51.84),
    (9, "Darren Burns", "Scotland", 51.84),
] * 100  # or: 100


def change_twowaydct_value(
    dct: TwoWayDict[RowKey, int], oldvalue: int, newvalue: int
) -> None:
    key = dct.get_key(oldvalue)
    if key is None:
        raise ValueError(f"Value {oldvalue} not found in the TwoWayDict")
    change_twowaydct_value_for_key(dct, key, newvalue)


def change_twowaydct_value_for_key(
    dct: TwoWayDict[RowKey, int], key: RowKey, newvalue: int
) -> None:
    if dct.get_key(newvalue) is not None:
        raise ValueError(f"Value {newvalue} is already in the TwoWayDict.")
    # The deletion is required as otherwise one of the two internal dicts
    # would still have the old value floating around.
    del dct[key]
    dct[key] = newvalue


class MyTable(DataTable):
    def __init__(self, **kwargs) -> None:
        super().__init__(**kwargs)
        self._col_keys = []
        for column in COLUMNS:
            self._col_keys.append(self.add_column(column))

    def add_row_with_autolabel(self, *contents) -> None:
        # Prevent polluting the logs with potentially hundreds or thousands of
        # instances of the same RowHighlighted message.
        with self.prevent(self.RowHighlighted):
            return self._add_row_with_autolabel(*contents)

    def _add_row_with_autolabel(self, *contents) -> None:
        length_before = self.row_count
        # Rows are _always_ added first to the end of the table.
        lane, swimmer, country, time, *other_cols = contents
        lane = Text(str(lane))
        lane.stylize("italic bright_black")
        swimmer, country = Text(swimmer), Text(country)
        swimmer.stylize("sky_blue1 bold")
        country.stylize("light_pink1 bold")
        time = Text(str(time))
        time.stylize("bold blue")
        label = str(length_before + 1)
        self.add_row(
            lane,
            swimmer,
            country,
            time,
            *other_cols,
            label=label,
            key=label,
        )
        length_new = self.row_count
        self.cursor_coordinate = Coordinate(length_new - 1, 0)
        middle = length_new // 2
        self.bubble_move(self.row_count - 1, middle)

    def bubble_move(self, old: int, new: int) -> None:
        """Moves a row to a new location using the adjacent row swapping
        method used in the bubble sort algorithm.

        Parameters
        ----------
        old : int
            The index of the row to move.
        new : int
            The index to move the row to.
        """
        if old == new:
            return
        old_row_key = self._row_locations.get_key(old)
        del self._row_locations[old_row_key]
        if new < old:  # upwards
            for dest_idx in range(old, new, -1):
                src_idx = dest_idx - 1
                key = self._row_locations.get_key(src_idx)
                change_twowaydct_value_for_key(self._row_locations, key, dest_idx)
                # Keep the visible label in sync with the row's new index.
                row = self.rows[key]
                row.label = Text(str(dest_idx + 1))
        self._row_locations.__setitem__(key=old_row_key, value=new)
        row = self.rows[old_row_key]
        row.label = Text(str(new + 1))


class TableApp(App):
    def compose(self) -> ComposeResult:
        yield MyTable(cursor_type="row")

    def on_mount(self) -> None:
        table = self.query_one(DataTable)
        with self.batch_update():
            for row in ROWS:
                table.add_row_with_autolabel(*row)


if __name__ == "__main__":
    app = TableApp()
    app.run()
```
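For reference, here is a rough sketch of how the label update timing could be taken. This is an assumption about the measurement setup rather than the exact code used for the 1.54 ms figure; it uses time.perf_counter (a higher-resolution alternative to time.time) and Textual's built-in widget logger:

```python
import time

# Hypothetical measurement, e.g. inside TableApp.on_mount after the rows
# have been added: move the last row up to the middle and time it.
start = time.perf_counter()
table.bubble_move(table.row_count - 1, table.row_count // 2)
elapsed_ms = (time.perf_counter() - start) * 1000
table.log(f"bubble_move across {table.row_count // 2} rows: {elapsed_ms:.2f} ms")
```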