My problem in a nutshell

I have an application which uses the DataTable. It works with tables of at most a few thousand rows. I need to be able to add rows to the middle of the table while keeping the "labels" static (they are integer numbers). Bonus points if moving rows up and down is fast.

My problem is that moving rows in the DataTable takes roughly 1 second per 1000 rows. I wonder if this can be made faster?

What I use currently

- DataTable (subclassed as KeySequenceTable)
- Labels created from current_length + 1
- Moving a row to the middle of the table by swapping the cell contents N/2 times (N = number of rows in the table)
Benchmarks and code for reproducing the problem
MWE App
This is a Minimal Working Example (MWE) app that can be used to demonstrate the behavior.
```python
from rich.text import Text

from textual.app import App, ComposeResult
from textual.coordinate import Coordinate
from textual.widgets import DataTable
from textual.widgets.data_table import CellDoesNotExist

COLUMNS = ("lane", "swimmer", "country", "time")
ROWS = [
    (1, "Joseph Schooling", "Singapore", 50.39),
    (2, "Michael Phelps", "United States", 51.14),
    (3, "Chad le Clos", "South Africa", 51.14),
    (4, "László Cseh", "Hungary", 51.14),
    (5, "Li Zhuhao", "China", 51.26),
    (6, "Mehdy Metella", "France", 51.58),
    (7, "Tom Shields", "United States", 51.73),
    (8, "Aleksandr Sadovnikov", "Russia", 51.84),
    (9, "Darren Burns", "Scotland", 51.84),
] * 30  # or: 100


class MyTable(DataTable):
    def __init__(self, **kwargs) -> None:
        super().__init__(**kwargs)
        self._col_keys = []
        for column in COLUMNS:
            self._col_keys.append(self.add_column(column))

    def add_row_with_autolabel(self, *contents) -> None:
        # Prevent polluting the logs with potentially hundreds or thousands of
        # instances of the same RowHighlighted message.
        with self.prevent(self.RowHighlighted):
            return self._add_row_with_autolabel(*contents)

    def _add_row_with_autolabel(self, *contents) -> None:
        length_before = self.row_count
        # Rows are _always_ added first to the end of the table.
        lane, swimmer, country, time, *other_cols = contents
        lane = Text(str(lane))
        lane.stylize("italic bright_black")
        swimmer, country = Text(swimmer), Text(country)
        swimmer.stylize("sky_blue1 bold")
        country.stylize("light_pink1 bold")
        time = Text(str(time))
        time.stylize("bold blue")
        label = str(length_before + 1)
        self.add_row(
            lane,
            swimmer,
            country,
            time,
            *other_cols,
            label=label,
            key=label,
        )
        length_new = self.row_count
        self.cursor_coordinate = Coordinate(length_new - 1, 0)
        middle = length_new // 2
        self.move_current_row_up_to(middle)

    def move_current_row_up_to(self, new_row_index: int) -> None:
        current_row_index, _ = self.cursor_coordinate
        if current_row_index == new_row_index:
            return
        for _ in range(current_row_index - new_row_index):
            self._swap_current_row_with(self.cursor_coordinate.up())

    def _swap_current_row_with(self, new_coordinate: Coordinate) -> None:
        cellkey_current = self.coordinate_to_cell_key(self.cursor_coordinate)
        try:
            cellkey_new = self.coordinate_to_cell_key(new_coordinate)
        except CellDoesNotExist:
            # Cannot move to a cell that does not exist.
            return
        cells_new = self.get_row(cellkey_new.row_key)
        cells_current = self.get_row(cellkey_current.row_key)
        for content_new, content_current, col_key in zip(
            cells_new, cells_current, self._col_keys
        ):
            self.update_cell(cellkey_current.row_key, col_key, content_new)
            self.update_cell(cellkey_new.row_key, col_key, content_current)
        self.cursor_coordinate = new_coordinate


class TableApp(App):
    def compose(self) -> ComposeResult:
        yield MyTable(cursor_type="row")

    def on_mount(self) -> None:
        table = self.query_one(DataTable)
        with self.batch_update():
            for row in ROWS:
                table.add_row_with_autolabel(*row)


if __name__ == "__main__":
    app = TableApp()
    app.run()
```
How the timing estimate was created
I ran the MWE (code above) and measured times with a stopwatch, using 270-row (multiplier 30) and 900-row (multiplier 100) tables.

If the table size (N) is 270, building the table this way, where each new row is moved from the bottom to the middle (first across 1 row, then 2 rows, then 3 rows, ... up to 135 rows), takes about 8910 row swap operations and 6.1 seconds, i.e. roughly 0.68 ms per row swap.

If the table size is 900, the same process (1 row, 2 rows, ... up to 450 rows) takes about 100575 row swap operations, which I measured at roughly 105 seconds, i.e. about 1.04 ms per row swap.
The equation for calculating the number of row swaps needed to build a table with N rows (each newly added row is first appended to the bottom of the table and then moved to the middle):
```python
def n_operations(table_size):
    h = table_size / 2
    tri = h * (h + 1) / 2  # the triangular number
    out = tri - table_size  # one move (the first) per addition is not required?
    return out
```
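As a sanity check, plugging the two benchmark table sizes into this function reproduces the swap counts quoted above:

```python
>>> n_operations(270)
8910.0
>>> n_operations(900)
100575.0
```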
It's unclear to me whether the table size also makes each individual row swap slower, but the bottom line is that moving a row in a 4000-row table from the bottom (where DataTable.add_row puts it) to the middle (= 2000 row swaps) takes about 2 seconds.
profiling output
I also did some profiling of the MWE using pyinstrument.

[screenshot: pyinstrument profiler output, with a zoom-in]
I wrote a PoC which utilizes the _row_locations attribute of the DataTable. This is a private two-way dictionary which records which row (identified by its RowKey) sits at which index (an integer from 0 to N-1). The code is roughly 100x faster. The exact speed cannot be determined with the same stopwatch arrangement, but it is more than fast enough: moving a row across 1000 rows takes on the order of 20 ms instead of 2000+ ms. There is one gotcha: since I'm messing around with the row order, the labels also get messed up. I'm checking if there's a solution for keeping the labels in order.
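For context, here is a minimal sketch of how that two-way mapping behaves. It assumes Textual's internal TwoWayDict API (get for key-to-value, get_key for value-to-key); since this is a private module, the API may change between releases:

```python
from textual._two_way_dict import TwoWayDict
from textual.widgets.data_table import RowKey

# Map row keys to row indices; the reverse lookup (index -> key) is what
# the DataTable itself uses to find the row rendered at a given position.
row_a, row_b = RowKey("a"), RowKey("b")
locations: TwoWayDict[RowKey, int] = TwoWayDict({})
locations[row_a] = 0
locations[row_b] = 1

assert locations.get(row_b) == 1      # key -> index
assert locations.get_key(0) is row_a  # index -> key
```

The point is that reordering rows only remaps entries in this dictionary, instead of rewriting every cell of every row along the way with update_cell.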
PoC MWE utilizing DataTable._row_locations
```python
from rich.text import Text

from textual._two_way_dict import TwoWayDict
from textual.app import App, ComposeResult
from textual.coordinate import Coordinate
from textual.widgets import DataTable
from textual.widgets.data_table import RowKey

COLUMNS = ("lane", "swimmer", "country", "time")
ROWS = [
    (1, "Joseph Schooling", "Singapore", 50.39),
    (2, "Michael Phelps", "United States", 51.14),
    (3, "Chad le Clos", "South Africa", 51.14),
    (4, "László Cseh", "Hungary", 51.14),
    (5, "Li Zhuhao", "China", 51.26),
    (6, "Mehdy Metella", "France", 51.58),
    (7, "Tom Shields", "United States", 51.73),
    (8, "Aleksandr Sadovnikov", "Russia", 51.84),
    (9, "Darren Burns", "Scotland", 51.84),
] * 100  # or: 100


def change_twowaydct_value(
    dct: TwoWayDict[RowKey, int], oldvalue: int, newvalue: int
) -> None:
    key = dct.get_key(oldvalue)
    if key is None:
        raise ValueError(f"Value {oldvalue} not found in the TwoWayDict")
    if dct.get_key(newvalue) is not None:
        raise ValueError(f"Value {newvalue} is already in the TwoWayDict.")
    # The deletion is required as otherwise one of the two internal dicts
    # would still have the old value floating around.
    del dct[key]
    dct[key] = newvalue


class MyTable(DataTable):
    def __init__(self, **kwargs) -> None:
        super().__init__(**kwargs)
        self._col_keys = []
        for column in COLUMNS:
            self._col_keys.append(self.add_column(column))

    def add_row_with_autolabel(self, *contents) -> None:
        # Prevent polluting the logs with potentially hundreds or thousands of
        # instances of the same RowHighlighted message.
        with self.prevent(self.RowHighlighted):
            return self._add_row_with_autolabel(*contents)

    def _add_row_with_autolabel(self, *contents) -> None:
        length_before = self.row_count
        # Rows are _always_ added first to the end of the table.
        lane, swimmer, country, time, *other_cols = contents
        lane = Text(str(lane))
        lane.stylize("italic bright_black")
        swimmer, country = Text(swimmer), Text(country)
        swimmer.stylize("sky_blue1 bold")
        country.stylize("light_pink1 bold")
        time = Text(str(time))
        time.stylize("bold blue")
        label = str(length_before + 1)
        self.add_row(
            lane,
            swimmer,
            country,
            time,
            *other_cols,
            label=label,
            key=label,
        )
        length_new = self.row_count
        self.cursor_coordinate = Coordinate(length_new - 1, 0)
        middle = length_new // 2
        self.bubble_move(self.row_count - 1, middle)

    def bubble_move(self, old: int, new: int) -> None:
        """Moves a row to a new location using the adjacent row swapping
        method used in the bubble sort algorithm.

        Parameters
        ----------
        old : int
            The index of the row to move.
        new : int
            The index to move the row to.
        """
        if old == new:
            return
        old_row_key = self._row_locations.get_key(old)
        del self._row_locations[old_row_key]
        if new < old:  # upwards
            for dest_idx in range(old, new, -1):
                src_idx = dest_idx - 1
                change_twowaydct_value(self._row_locations, src_idx, dest_idx)
        self._row_locations.__setitem__(key=old_row_key, value=new)


class TableApp(App):
    def compose(self) -> ComposeResult:
        yield MyTable(cursor_type="row")

    def on_mount(self) -> None:
        table = self.query_one(DataTable)
        with self.batch_update():
            for row in ROWS:
                table.add_row_with_autolabel(*row)


if __name__ == "__main__":
    app = TableApp()
    app.run()
```
Picture of the messed-up labels (each label is stored on its Row object, so it travels with the row when only the index mapping is changed):
Update
Here's a PoC with additional fixes for the labels. This assumes the labels are integers running from 1 to N. It's still pretty fast: updating 1000 labels takes about 1.54 ms (measured with time.time(); some of the cost may come from the UI updates). A rough sketch of such a measurement is included after the code below.
PoC with fixed labels
```python
from rich.text import Text

from textual._two_way_dict import TwoWayDict
from textual.app import App, ComposeResult
from textual.coordinate import Coordinate
from textual.widgets import DataTable
from textual.widgets.data_table import RowKey

COLUMNS = ("lane", "swimmer", "country", "time")
ROWS = [
    (1, "Joseph Schooling", "Singapore", 50.39),
    (2, "Michael Phelps", "United States", 51.14),
    (3, "Chad le Clos", "South Africa", 51.14),
    (4, "László Cseh", "Hungary", 51.14),
    (5, "Li Zhuhao", "China", 51.26),
    (6, "Mehdy Metella", "France", 51.58),
    (7, "Tom Shields", "United States", 51.73),
    (8, "Aleksandr Sadovnikov", "Russia", 51.84),
    (9, "Darren Burns", "Scotland", 51.84),
] * 100  # or: 100


def change_twowaydct_value(
    dct: TwoWayDict[RowKey, int], oldvalue: int, newvalue: int
) -> None:
    key = dct.get_key(oldvalue)
    if key is None:
        raise ValueError(f"Value {oldvalue} not found in the TwoWayDict")
    change_twowaydct_value_for_key(dct, key, newvalue)


def change_twowaydct_value_for_key(
    dct: TwoWayDict[RowKey, int], key: RowKey, newvalue: int
) -> None:
    if dct.get_key(newvalue) is not None:
        raise ValueError(f"Value {newvalue} is already in the TwoWayDict.")
    # The deletion is required as otherwise one of the two internal dicts
    # would still have the old value floating around.
    del dct[key]
    dct[key] = newvalue


class MyTable(DataTable):
    def __init__(self, **kwargs) -> None:
        super().__init__(**kwargs)
        self._col_keys = []
        for column in COLUMNS:
            self._col_keys.append(self.add_column(column))

    def add_row_with_autolabel(self, *contents) -> None:
        # Prevent polluting the logs with potentially hundreds or thousands of
        # instances of the same RowHighlighted message.
        with self.prevent(self.RowHighlighted):
            return self._add_row_with_autolabel(*contents)

    def _add_row_with_autolabel(self, *contents) -> None:
        length_before = self.row_count
        # Rows are _always_ added first to the end of the table.
        lane, swimmer, country, time, *other_cols = contents
        lane = Text(str(lane))
        lane.stylize("italic bright_black")
        swimmer, country = Text(swimmer), Text(country)
        swimmer.stylize("sky_blue1 bold")
        country.stylize("light_pink1 bold")
        time = Text(str(time))
        time.stylize("bold blue")
        label = str(length_before + 1)
        self.add_row(
            lane,
            swimmer,
            country,
            time,
            *other_cols,
            label=label,
            key=label,
        )
        length_new = self.row_count
        self.cursor_coordinate = Coordinate(length_new - 1, 0)
        middle = length_new // 2
        self.bubble_move(self.row_count - 1, middle)

    def bubble_move(self, old: int, new: int) -> None:
        """Moves a row to a new location using the adjacent row swapping
        method used in the bubble sort algorithm.

        Parameters
        ----------
        old : int
            The index of the row to move.
        new : int
            The index to move the row to.
        """
        if old == new:
            return
        old_row_key = self._row_locations.get_key(old)
        del self._row_locations[old_row_key]
        if new < old:  # upwards
            for dest_idx in range(old, new, -1):
                src_idx = dest_idx - 1
                key = self._row_locations.get_key(src_idx)
                change_twowaydct_value_for_key(self._row_locations, key, dest_idx)
                # Keep the visible label in sync with the row's new index.
                row = self.rows[key]
                row.label = Text(str(dest_idx + 1))
        self._row_locations.__setitem__(key=old_row_key, value=new)
        row = self.rows[old_row_key]
        row.label = Text(str(new + 1))


class TableApp(App):
    def compose(self) -> ComposeResult:
        yield MyTable(cursor_type="row")

    def on_mount(self) -> None:
        table = self.query_one(DataTable)
        with self.batch_update():
            for row in ROWS:
                table.add_row_with_autolabel(*row)


if __name__ == "__main__":
    app = TableApp()
    app.run()
```
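For reference, here is a rough sketch of how the label update timing could be taken. This is an assumption about the measurement setup rather than the exact code used for the 1.54 ms figure; it uses time.perf_counter (a higher-resolution alternative to time.time) and Textual's built-in widget logger:

```python
import time

# Hypothetical measurement, e.g. inside TableApp.on_mount after the rows
# have been added: move the last row up to the middle and time it.
start = time.perf_counter()
table.bubble_move(table.row_count - 1, table.row_count // 2)
elapsed_ms = (time.perf_counter() - start) * 1000
table.log(f"bubble_move across {table.row_count // 2} rows: {elapsed_ms:.2f} ms")
```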