-
-
Notifications
You must be signed in to change notification settings - Fork 1.2k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Entity adapter re-sorts all ids (O(n*log(n)), instead of just inserting the new id into the "right" place (O(log(n))) #4252
Comments
No specific reason other than it was the simplest thing and/or that's what the NgRx implementation was doing already. My Replay teammate Brian Vaughn has written some utils for inserting items into a proper index here: maybe we could swipe some of that |
It shouldn't matter much, tbh. Re-sorting something that's already sorted except for one item should stay around |
Copying is by orders of magnitude cheaper than whatever sortFn does, which is up to the user anyhow. |
And I'm not sure where you got the impression that resorting an almost sorted array issues less than n*log(n) calls. Try logging how many times sortFn is called when triggered on an already sorted array. |
Sorting a sorted array will have exactly that amount of comparisons. {let i=0; new Array(100).fill(0).map((_,i) => i).sort((a,b) => { console.log(a,b,++i); return a-b })} logs 99 times. Now we add {let i=0; let arr = new Array(100).fill(0).map((_,i) => i); arr.push(0); arr.sort((a,b) => { console.log(a,b,++i); return a-b })} logs 199 times. (Edit: I forgot the return earlier and said "100" here) Of course, it depends on which algorithm is used in the engine (this was in Firefox), but here it's Since both solutions have to copy an array, we look at a comparison between Of course it's not equal, and an insertion at the right place might be a tad faster, but it's the same order of magnitude.
Edit2: {let i=0; let arr = new Array(9999).fill(0).map((_,i) => i); arr.push(0); arr.sort((a,b) => { console.log(a,b,++i); return a-b }); arr} logs 20003 times. So I stay with the original answer ;) PS: seems like most browsers nowadays use TimSort, with has a best case complexity of |
Oh wow, you're right! When dealing with 1000s of items + using immer (RTK) this does get a lot slower. Not sure if because of the equality function reading "through" immer or because of the big state change at the end + immer (which is unrelated to this "issue") |
With about 10k items this sort takes about ~3s anytime an entity is added/removed. sortComparer: (a, b) => {
// Keep the list of IDs sorted by `startedAt` DESC to avoid having to sort the list on each re-render of the Runs screen
if (a === b) {
return 0;
} else if (!a.startedAt) {
return 1;
} else if (!b.startedAt) {
return -1;
}
return b > a ? -1 : 1;
// return new Date(b.startedAt).getTime() - new Date(a.startedAt).getTime();
} (Date line commented out to see if this was the issue, it was not). |
This is a significant amount of time, and I'm running into this problem on my application too. I've been considering refactoring my implementation so it doesn't rely on |
For the record I'm very open to PRs to improve this, and I'd like to see it improved. I just haven't had much time for Redux maintenance work in the last couple months, and I have other priorities for the few spare minutes when I do get time to work on Redux stuff. Note that I did link some potentially-useful insertion code up earlier in the thread, if someone would like to use that as a starting point for an implementation here. (It would also be useful if we had some kind of perf benchmarks / tests added to the codebase as well, to help measure improvements and ensure we don't regress any improvements.) |
Took a few minutes to re-familiarize myself with some of this code. There's a few major issues involved:
Could someone do me a favor and provide a repro that shows this behaving really slowly, so I can do some checks on the perf? I will say that this logic does seem somewhat inefficient - we're always creating an array, sorting it, and then checking to see if it seems different, and the array we create will be initially sorted based on insertion order of the items in the lookup table (and thus probably always be in badly sorted order): function resortEntities(state: R) {
const allEntities = Object.values(state.entities) as T[]
allEntities.sort(sort)
const newSortedIds = allEntities.map(selectId)
const { ids } = state
if (!areArraysEqual(ids, newSortedIds)) {
state.ids = newSortedIds
}
} |
Really appreciate you taking a look @markerikson. Here's an example repo - https://github.com/G2Jose/redux-toolkit-performance-repro. It contains
With this setup I'm seeing:
In the real world I have an app that tracks workouts + workouts of friends. Over a long time period the list of sets can grow quite large. At which point inserting or updating any set becomes noticeably slow. I can of course refactor things so data that is likely to be updated (eg: a 'current workout') is stored in a different slice, but I was hoping for a simpler solution. |
Looking at this again briefly, it's going to be tricky to pull this off. I think that if we could map over the already-sorted IDs to get the list of items to sort, there'd be significantly fewer comparisons. But, right now we use that In order to safely map over the IDs array, we would need to already have it updated with all the current IDs if any of those changed (added/updated/removed). I'll try to think my way through this later. |
Hey @markerikson I ended up creating a new slice that I did the sorting in and took advantage of both the fact that the existing array of IDs was sorted and the incoming array of ids was also sorted (from the API in my case). But after researching it looks like merging of two sorted arrays is usually rather performant. In the scenario I gave above for the entity adapter, it was really slow when I had ~10k records and I was adding 1k more via I imagine if you pre-sorted the IDs of the 1k in this scenario then performed a merge of two sorted arrays it might be most performant in this case. I created my own modified merge algorithm based on the knowledge that most of the time I am essentially just appending the ids I am receiving to the end of the existing list. I have shared this below, it's a bit messy and besides my own real-world analysis I do not have robust or theoretical proof that it's more performant on average than the standard merger of sorted arrays algorithms you can find in papers. Feel free to use this if it's helpful though, also open to feedback if you identify something bad/broken. Note I did borrow some code that you linked to above for finding the insert index. My Merger of two sorted arrays algorithm/**
* The goal of this function is to insert the new IDs into the sorted list of IDs in the most performant way as possible.
* Because the existing list is already sorted and the new IDs are also sorted (from the API),
* we can take advantage of this to avoid having to re-sort the entire list.
* Additionally, we want to minimize the number of insertions
* (calls to `.slice(...)`) into the list to avoid unnecessary overhead.
*/
function mergeTwoSortedArrays(
existingSortedIds: string[],
preSortedNewIds: string[],
comparator: ComparisonFunction<string>
) {
const firstInstanceId = preSortedNewIds[0];
const lastInstanceId = preSortedNewIds[preSortedNewIds.length - 1];
const startIndex = findInsertIndex(
existingSortedIds,
firstInstanceId,
comparator
);
const endIndex = findInsertIndex(
existingSortedIds,
lastInstanceId,
comparator,
startIndex
);
const overlappingExistingIds = existingSortedIds.slice(startIndex, endIndex);
let newIdIndexOfLastInsert = 0;
let lastRelativeInsertIndex = 0;
for (let i = 1; i < preSortedNewIds.length; i++) {
const relativeInsertIndex = findInsertIndex(
overlappingExistingIds,
preSortedNewIds[i],
comparator,
lastRelativeInsertIndex
);
if (lastRelativeInsertIndex !== relativeInsertIndex) {
const insertIndex =
startIndex + newIdIndexOfLastInsert + lastRelativeInsertIndex;
const arrayToInsert = preSortedNewIds.slice(newIdIndexOfLastInsert, i);
existingSortedIds.splice(insertIndex, 0, ...arrayToInsert);
newIdIndexOfLastInsert = i;
lastRelativeInsertIndex = relativeInsertIndex;
}
}
existingSortedIds.splice(
startIndex + newIdIndexOfLastInsert + lastRelativeInsertIndex,
0,
...preSortedNewIds.slice(newIdIndexOfLastInsert)
);
}
const startedAtDescendingComparator =
(state: QuestInstanceViewsState) => (_a: string, _b: string) => {
const a = state.runListViewsById[_a]?.startedAt;
const b = state.runListViewsById[_b]?.startedAt;
if (a === b) {
// Sort by id if `startedAt` is the same. Mimics the behavior of the API.
return state.runListViewsById[_a]?.id > state.runListViewsById[_b]?.id
? -1
: 1;
} else if (!a) {
return 1;
} else if (!b) {
return -1;
}
return a > b ? -1 : 1;
// date comparison should not be needed since `startedAt` is a UTC ISO formatted string
// return new Date(b.startedAt).getTime() - new Date(a.startedAt).getTime();
};
/**
* Fast insert of a new item into a sorted array.
* Taken from: https://github.com/replayio/devtools/blob/ce97fc7abfd5ae03f767216efa44173209c27cdc/packages/replay-next/src/utils/array.ts#L131
*/
type ComparisonFunction<T> = (a: T, b: T) => number;
export function findInsertIndex<T>(
sortedItems: T[],
item: T,
comparisonFunction: ComparisonFunction<T>,
lowIndexOverride?: number
): number {
let lowIndex = lowIndexOverride ?? 0;
let highIndex = sortedItems.length;
while (lowIndex < highIndex) {
const middleIndex = (lowIndex + highIndex) >>> 1;
const currentItem = sortedItems[middleIndex];
if (comparisonFunction(item, currentItem) > 0) {
lowIndex = middleIndex + 1;
} else {
highIndex = middleIndex;
}
}
return lowIndex;
} Edit:
|
@G2Jose @Jackman3005 I've updated the PR with several WIP variations. Currently it's using one that does some binary searches for insertions. Could you try that out and see how it performs? |
I think I also found that some of the existing entity adapter logic relied heavily on reads into Immer-proxied state, and when done in bulk, that was slowing things down (such as iterating That seems to knock another 20%-ish off some of the times I'm seeing. |
Okay, this is out in https://github.com/reduxjs/redux-toolkit/releases/tag/v2.2.4 ! Going to close this, but I'm very interested in hearing feedback on how much this improves behavior in real-world apps. |
Hey 👋
Why is it that when adding (or removing/updating/etc...) items inside an entity-adapter's state, entity-adapter will first push into the array, then run
.sort(sortFn)
on the entire array?Wouldn't it make more sense to find the index where the new item should be at (O(log(n)), and
splice
theids
array?The text was updated successfully, but these errors were encountered: