This repository has been archived by the owner on Apr 18, 2024. It is now read-only.

feat: modify pool tests #153

Merged
willscott merged 44 commits into feat/adaptive from aa/test-simulator on Oct 10, 2023

Conversation

AmeanAsad
Contributor

No description provided.

node_heap.go Outdated
    for i := 0; i < n && i < len(nh.Nodes); i++ {
        node := nh.Nodes[i]
        m = append(m, node)
    for i := 0; i < n; i++ {
Contributor Author

@willscott I noticed a slight issue with the use of heaps here.

The heap keeps itself ordered on insertion and removal of nodes. However, the existing nodes in the heap have dynamically changing values, and the heap does not re-sort itself every time one of those values changes.

The strategy I've implemented here is to select the top N nodes by calling .Pop() n times to retrieve them, then adding them back using .Push(). This is a similar flow to the heap's .Fix() method, which repositions an item when its value changes. Not the most efficient, but we are dealing with fairly small heap sizes here.
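
Roughly, the selection I have in mind looks like this; a minimal sketch where Node, NodeHeap, and the priority field are simplified stand-ins for the real types, and locking is left out:

    package main

    import (
        "container/heap"
        "fmt"
    )

    type Node struct {
        priority float64
    }

    // NodeHeap is a max-heap of nodes by priority (highest priority on top).
    type NodeHeap struct {
        Nodes []*Node
    }

    func (nh *NodeHeap) Len() int           { return len(nh.Nodes) }
    func (nh *NodeHeap) Less(i, j int) bool { return nh.Nodes[i].priority > nh.Nodes[j].priority }
    func (nh *NodeHeap) Swap(i, j int)      { nh.Nodes[i], nh.Nodes[j] = nh.Nodes[j], nh.Nodes[i] }
    func (nh *NodeHeap) Push(x any)         { nh.Nodes = append(nh.Nodes, x.(*Node)) }
    func (nh *NodeHeap) Pop() any {
        old := nh.Nodes
        last := old[len(old)-1]
        nh.Nodes = old[:len(old)-1]
        return last
    }

    // TopN pops the n best nodes and then pushes them back, so the heap still
    // holds the full set. It assumes the heap invariant currently holds.
    func (nh *NodeHeap) TopN(n int) []*Node {
        m := make([]*Node, 0, n)
        for i := 0; i < n && nh.Len() > 0; i++ {
            m = append(m, heap.Pop(nh).(*Node))
        }
        for _, node := range m {
            heap.Push(nh, node)
        }
        return m
    }

    func main() {
        nh := &NodeHeap{Nodes: []*Node{{priority: 0.2}, {priority: 0.9}, {priority: 0.5}}}
        heap.Init(nh)
        top := nh.TopN(2)
        fmt.Println(top[0].priority, top[1].priority) // 0.9 0.5
    }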

Also wondering if we still think that a priority queue is the best data structure to store the AllNodes tier? My understanding of the code/implementation is that ActiveNodes is always a subset of AllNodes, and they are not mutually exclusive sets like the previous "Main" and "Unknown" tier design.

Contributor Author

@willscott One issue I noticed with my changes is that, since the nodes are temporarily removed from the heap, any stats being written to them concurrently will fail.

Contributor

What do you mean by 'fail'? A temporarily dropped metric isn't the end of the world, potentially.

If not a priority queue for all nodes, how would you imagine keeping track of the whole set?

Contributor Author

What do you mean by 'fail'? A temporarily dropped metric isn't the end of the world, potentially.

What I mean by "fail" is that any concurrent operation that accesses the heap while the topN selection is occurring might get an incorrect list. I do agree with you though, it doesn't seem that damaging, especially since the "AllNodes" set is just used for testing nodes.

If not a priority queue for all nodes, how would you imagine keeping track of the whole set?

Wondering if a regular array structure would work better here. It's simpler and gives us the functionality we need (top n nodes, random nodes, node removal). We're not really using the properties of a priority queue because the values of each node in the queue are changing constantly and we have to basically re-sort the nodes every time we need to make a selection.
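
Something like the following is what I'm picturing; a rough sketch where NodeList, Node, and the priority field are placeholders rather than the actual pool types:

    package main

    import (
        "fmt"
        "math/rand"
        "sort"
    )

    type Node struct {
        URL      string
        priority float64
    }

    type NodeList struct {
        nodes []*Node
    }

    func (nl *NodeList) Add(n *Node) {
        nl.nodes = append(nl.nodes, n)
    }

    func (nl *NodeList) Remove(url string) {
        for i, n := range nl.nodes {
            if n.URL == url {
                nl.nodes = append(nl.nodes[:i], nl.nodes[i+1:]...)
                return
            }
        }
    }

    // TopN sorts on demand; with small pools the O(n log n) cost is negligible,
    // and we never rely on the list staying sorted between calls.
    func (nl *NodeList) TopN(n int) []*Node {
        sort.Slice(nl.nodes, func(i, j int) bool {
            return nl.nodes[i].priority > nl.nodes[j].priority
        })
        if n > len(nl.nodes) {
            n = len(nl.nodes)
        }
        return nl.nodes[:n]
    }

    // RandomN returns up to n nodes in random order, e.g. for probing.
    func (nl *NodeList) RandomN(n int) []*Node {
        idx := rand.Perm(len(nl.nodes))
        if n > len(idx) {
            n = len(idx)
        }
        out := make([]*Node, 0, n)
        for _, i := range idx[:n] {
            out = append(out, nl.nodes[i])
        }
        return out
    }

    func main() {
        nl := &NodeList{}
        nl.Add(&Node{URL: "a", priority: 0.3})
        nl.Add(&Node{URL: "b", priority: 0.8})
        nl.Remove("a")
        fmt.Println(nl.TopN(1)[0].URL, len(nl.RandomN(1))) // b 1
    }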

Contributor

  • We can add add/remove methods that are a simple enough API.
  • I think keeping the nodes in a roughly sorted order for consideration isn't a bad idea, as we'll want to access the better ones.
  • If we run into performance issues with the push/pop semantics on updates, we can get lazy about it and use heap.Fix when we need to get the top n.

Contributor

One note on Go semantics here: you have the nh.lk.RLock() above - that indicates you're taking a "read-only lock" over the data structure - but the pop and push you're proposing in your change will do a read-write operation.
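
For illustration, a small sketch of the distinction; NodeHeap, Best, and PopBest here are simplified stand-ins, not the actual node_heap.go code:

    package main

    import (
        "fmt"
        "sync"
    )

    type Node struct {
        priority float64
    }

    type NodeHeap struct {
        lk    sync.RWMutex
        Nodes []*Node
    }

    // Best only reads the structure, so the shared read lock is sufficient.
    func (nh *NodeHeap) Best() *Node {
        nh.lk.RLock()
        defer nh.lk.RUnlock()
        if len(nh.Nodes) == 0 {
            return nil
        }
        return nh.Nodes[0]
    }

    // PopBest mutates Nodes, much as heap.Pop / heap.Push would, so it needs
    // the exclusive write lock; holding only RLock here would be a data race.
    func (nh *NodeHeap) PopBest() *Node {
        nh.lk.Lock()
        defer nh.lk.Unlock()
        if len(nh.Nodes) == 0 {
            return nil
        }
        best := nh.Nodes[0]
        nh.Nodes = nh.Nodes[1:]
        return best
    }

    func main() {
        nh := &NodeHeap{Nodes: []*Node{{priority: 0.9}, {priority: 0.4}}}
        fmt.Println(nh.Best().priority)    // 0.9
        fmt.Println(nh.PopBest().priority) // 0.9
        fmt.Println(len(nh.Nodes))         // 1
    }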

Contributor Author

Thanks for the tip here Will, always good to learn :)

@AmeanAsad
Contributor Author

AmeanAsad commented Sep 1, 2023

Hey @willscott, I'm noticing that the ConsiderUpdateWeightedNode function in the hashring lib is not working entirely as intended. It's returning negative deltas for new nodes that are just getting added to the pool.

@willscott
Contributor

Hey @willscott, I'm noticing that the ConsiderUpdateWeightedNode function in the hashring lib is not working entirely as intended. It's returning negative deltas for new nodes that are just getting added to the pool.

You believe this is a bug in ConsiderUpdateWeightedNode? Do you have a failing test against it that you can point me to, so that I can help debug it?

@AmeanAsad
Contributor Author

@willscott

You believe this is a bug in ConsiderUpdateWeightedNode? Do you have a failing test against it that you can point me to, so that I can help debug it?

I think it would be helpful to just get an idea of the expected behavior here, to understand what I'm testing against. The example I'm going off of: let's say I have a new node that is not currently in the hash ring and I run ConsiderUpdateWeightedNode("nodeUrl", 1), what do I expect to receive?

@willscott
Contributor

I would expect ConsiderUpdateWeightedNode for a new node to return a negative value - you're taking part of the hash ring that would currently send its requests to a node with known latency/performance characteristics, and proposing giving that space instead to a node with unknown characteristics. The expected outcome of that is worse.

One way that confidence gets better is if mirroring has already sent some test requests to the new node, so we have some sense that it could be good.

The other question is how to handle initial startup - we probably need to enforce some sort of minimum reasonable pool size, where we're okay with negative updates if we fall below it.
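
Roughly what I have in mind, as a sketch; minPoolSize and shouldAdmit are made-up names, not the actual pool config:

    package main

    import "fmt"

    // minPoolSize is a hypothetical floor below which candidates are accepted
    // even when the projected delta is negative, so the pool can bootstrap.
    const minPoolSize = 20

    func shouldAdmit(delta float64, poolSize int) bool {
        if poolSize < minPoolSize {
            return true
        }
        return delta > 0
    }

    func main() {
        fmt.Println(shouldAdmit(-0.3, 5))  // true: still below the minimum size
        fmt.Println(shouldAdmit(-0.3, 50)) // false: healthy pool, negative delta
        fmt.Println(shouldAdmit(0.2, 50))  // true: positive delta
    }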

@AmeanAsad
Contributor Author

AmeanAsad commented Sep 4, 2023

@willscott the reason the tests are passing is that PoolConsiderationCount is 30, so all the nodes in the test automatically get added to the pool. I had temporarily changed that value earlier for testing with smaller groups of nodes, to make debugging easier.

@AmeanAsad
Contributor Author

@willscott a few things I found:

  • The MaybeSubstituteOrAdd uses the neighbor's Rate() as part of the delta formula:
		neighbor = nr.Nodes[n]
		neighborVolume := neighbor.Rate()

		// how much worse is candidate?
		diff := candidate.Priority() - neighbor.Priority()
		delta += diff * neighborVolume * float64(v)

The Rate() for nodes with no stats is always zero, so the first test always fails: the delta is always zero when comparing a node that has stats against one that has none (see the sketch after this list).

  • The ConsiderUpdateWeightedNode call in the first test is returning negative values when comparing a node in the hashring that has no stats against a new node that has recorded a bunch of successes. Based on your description above, this seems like unexpected behavior.
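
To make the first point concrete, here is the same arithmetic with made-up numbers standing in for the real stats:

    package main

    import "fmt"

    func main() {
        // Hypothetical values: the neighbor already in the ring has no stats,
        // so its Rate() is zero, while the candidate has recorded successes.
        candidatePriority := 0.9
        neighborPriority := 0.0
        neighborVolume := 0.0 // neighbor.Rate() for a node with no stats
        v := 1                // weight under consideration

        diff := candidatePriority - neighborPriority
        delta := diff * neighborVolume * float64(v)
        fmt.Println(delta) // 0: the candidate's advantage is wiped out
    }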

node_heap.go Outdated
    }
    for _, n := range m {
        heap.Push(nh, n)
    for i := 0; i < n && i < len(nh.Nodes); i++ {
Contributor Author

I thought this returns an unsorted list of candidates?

Contributor

I think if we always use heap.Push when adding new elements, then nh.Nodes remains in sorted order?

Contributor Author

My train of thought is that if the nodes' stats change after they have been pushed, then nh.Nodes doesn't remain sorted, because heap.Push only sorts on insertion.

Also, if the stats of already ordered nodes change, then I think heap.Push would probably never keep a sorted list, because heap.Push only works properly if the heap is correctly ordered when the insertion occurs.

Not sure if I'm missing something here?

Contributor

Not sure how much churn we expect - hopefully not too much in the steady state.

The answer to node priorities changing and leaving the heap unsorted is to use heap.Init at the beginning of the call to TopN - or to use heap.Fix whenever a specific node's priority changes.

Since we only call TopN in the periodic pool refresh, i.e. once every 5 minutes, it's probably no problem to re-establish the heap ordering each time.

@AmeanAsad
Contributor Author

@aarshkshah1992 test flakiness addressed.

cc @willscott I was able to reproduce the flakiness locally and made changes to the topN selection. I believe just using heap.Init only guarantees the ordering of the first element.
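
The revised selection looks roughly like this, building on the NodeHeap sketch from the earlier comment (illustrative only, not the exact change):

    // TopN, revised to tolerate priorities drifting after insertion.
    func (nh *NodeHeap) TopN(n int) []*Node {
        // Re-establish the heap invariant over the whole slice first, since
        // priorities may have changed since the nodes were pushed.
        heap.Init(nh)

        // heap.Init only guarantees that Nodes[0] is the best element, so pop
        // n times to get the true top n rather than reading Nodes[:n] directly.
        m := make([]*Node, 0, n)
        for i := 0; i < n && nh.Len() > 0; i++ {
            m = append(m, heap.Pop(nh).(*Node))
        }
        for _, node := range m {
            heap.Push(nh, node) // put them back so the heap keeps the full set
        }
        return m
    }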

@AmeanAsad
Contributor Author

@willscott I added a test that targets cache affinity. The relevant commit is here: 1975f49

The test tries to ensure that if we have a set of well-performing nodes that we have previously made cid requests to, we still hit those nodes for those cids even if other similarly performing nodes join the pool. Currently that test is failing, and we are seeing new nodes that join the pool show up as first-candidate nodes for those cids instead.

@@ -15,8 +17,9 @@ import (
 )

 const (
-	nodesSize = 200
+	nodesSize = 6
 )
Contributor

aarshkshah1992 commented Sep 19, 2023

@AmeanAsad @willscott

Why do we need to do this?

Contributor Author

This is used to change the desired pool size for testing. I reduced the pool size for the cache-affinity test to make it easier to debug and get the tests passing. Once we get that passing, we can set it to a high / realistic number again.

Contributor

Okay, yeah that makes sense.

Contributor

This has already been structured so that the lower pool target is only used in the test context. A larger pool size target is used when running normally.

@willscott
Contributor

I'm going to merge this into adaptive so we have one PR tracking all of this and we can get to the final merge.

willscott merged commit 33db988 into feat/adaptive Oct 10, 2023
13 of 16 checks passed
willscott deleted the aa/test-simulator branch October 10, 2023 09:18