should we update has_gpu field to be num_gpu? #756

xmonader · 2023-06-20T23:32:21Z

instead of boolean we can have it as a number, which can save unneeded queries against nodes to find the total number of gpus available?

related to threefoldtech/tfgrid-sdk-go#210

renauter · 2023-06-21T03:44:58Z

I see the point
Since storage size will be the same (bool <=> u8) why not taking advantage to store more significant information than just if there is gpu or not?
From TFChain side it would require some rework and probably a migration (since storage is already on devnet) or a new storage creation (killing the deprecated one), if we chose this way.

robvanmieghem · 2023-06-21T06:05:44Z

GPU farms ( at least the ones for mining) all had multiple gpu's per physical machine. When you mainly use GPU computing power, the CPU is not very much under load normally so it's much more efficient to put multiple GPU's in a machine.

muhamadazmy · 2023-06-21T07:00:19Z

I think this is a good idea, but this issue then need to be moved to tfchain

xmonader · 2023-06-21T07:20:48Z

With such positive feedback on the idea, @renauter how long would it take to implement ? :)

robvanmieghem · 2023-06-21T07:22:54Z

I see the point
Since storage size will be the same (bool <=> u8) why not taking advantage to store more significant information than just if there is gpu or not?
From TFChain side it would require some rework and probably a migration (since storage is already on devnet) or a new storage creation (killing the deprecated one), if we chose this way.

I would not bother writing a migration, it's devnet

DylanVerstraete · 2023-06-21T11:25:13Z

The reasoning behind adding a number of GPU's bothers me. Why should this information be stored on tfchain in the first place? To make development easy? I rather suggest we implement the offchain indexer idea so that clients can query hardware details from any node NOT using tfchain.

xmonader · 2023-06-21T11:35:24Z

The reasoning behind adding a number of GPU's bothers me. Why should this information be stored on tfchain in the first place? To make development easy? I rather suggest we implement the offchain indexer idea so that clients can query hardware details from any node NOT using tfchain.

Because of the following imo
1- It's equal in data size
2- It communicates more information
3- It saves calls needed to to calculate simple information
4- It keeps that info updated

We are also proceeding with the indexer indeed as agreed before, to track the devices for future filtration

muhamadazmy · 2023-06-21T11:37:29Z

I think making development easier in itself is an enough reason! I don't get why you make this sound bad if it's exactly the same data size. this can be a u8 in rust.

Even if it takes more size on the chain but it makes performance and development easier, no more reasons need to be given.

renauter · 2023-06-21T12:52:11Z

With such positive feedback on the idea, @renauter how long would it take to implement ? :)

Today and tomorrow probably
There is also graphql to update

DylanVerstraete · 2023-06-21T19:28:58Z

I understand that it makes development easier for clients, but now 2 days of development (runtime/graphql) has to be spent to change the initially requested features. It sounds to me like we did not correctly spec out the story/feature enough and now development is slowed down. Imo we should have expressed gpu as a resource and not like this. I don't really think it's beneficial for a user "how" many gpu's there are, rather the user would like to know how powerful they are.

Also not having a migration for this on devnet can lead to corrupted state, which is another reason for me to not blindly incorporate this change.

DylanVerstraete · 2023-06-21T19:33:24Z

Actually the combined information of how many gpu's + resources of each gpu would be the most optimal information

brandonpille · 2023-06-22T07:53:03Z

I understand that it makes development easier for clients, but now 2 days of development (runtime/graphql) has to be spent to change the initially requested features. It sounds to me like we did not correctly spec out the story/feature enough and now development is slowed down. Imo we should have expressed gpu as a resource and not like this. I don't really think it's beneficial for a user "how" many gpu's there are, rather the user would like to know how powerful they are.

Also not having a migration for this on devnet can lead to corrupted state, which is another reason for me to not blindly incorporate this change.

I agree here. Plus I don't think there is a use case where a user needs X amount of GPUs but doesn't care about which GPUs. So in that case he will need to ask the zos nodes. So in the end we will always have to ask the ZOS nodes. Why have that on chain if we will need to contact the zos nodes?

LeeSmet · 2023-06-22T08:27:22Z

Alright so lets be very clear: we decided that there will be no tracking of GPU's on chain in the final release, and the current implementation was only done because the off chain indexing would "take a long time". Therefore as a compromise, on devnet we would temporarily maintain a map of nodes which have gpus, and this will be removed.

In that light, it absolutely makes zero sense to me that we are now adding feature dependencies on top of this to be "more efficient/performant".

muhamadazmy · 2023-06-22T09:15:09Z

@LeeSmet we never agreed that we have any temporary map of node gpus. We only agreed that the node can have a flag (on/off) if it has gpu or not.

We now think (which is fine since the entire feature is still in development sage) that this same flag can be instead be the number of gpus available on the node (can be a u8). This will make finding a node that matches the user requirements is easier and more efficient.

This has 0 cost compared to having just a boolean.

The proxy will index the full gpu information for further filtering

LeeSmet · 2023-07-19T10:07:18Z

Closing since we are removing gpu related info from chain as previously discussed: #761

xmonader added the type_question label Jun 20, 2023

xmonader assigned muhamadazmy and renauter Jun 20, 2023

muhamadazmy transferred this issue from threefoldtech/zos Jun 21, 2023

xmonader changed the title ~~should we update has_gpu field to be num_gpus?~~ should we update has_gpu field to be num_gpu? Jun 21, 2023

xmonader mentioned this issue Jun 21, 2023

replace has gpu with num gpu threefoldtech/tfgrid-sdk-go#211

Merged

4 tasks

renauter mentioned this issue Jun 21, 2023

feat: add node gpu number #758

Closed

xmonader added this to the 2.5.0 milestone Jun 21, 2023

LeeSmet closed this as not planned Won't fix, can't repro, duplicate, stale Jul 19, 2023

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

should we update has_gpu field to be num_gpu? #756

should we update has_gpu field to be num_gpu? #756

xmonader commented Jun 20, 2023

renauter commented Jun 21, 2023

robvanmieghem commented Jun 21, 2023

muhamadazmy commented Jun 21, 2023

xmonader commented Jun 21, 2023

robvanmieghem commented Jun 21, 2023

DylanVerstraete commented Jun 21, 2023

xmonader commented Jun 21, 2023

muhamadazmy commented Jun 21, 2023

renauter commented Jun 21, 2023

DylanVerstraete commented Jun 21, 2023

DylanVerstraete commented Jun 21, 2023

brandonpille commented Jun 22, 2023

LeeSmet commented Jun 22, 2023

muhamadazmy commented Jun 22, 2023

LeeSmet commented Jul 19, 2023

should we update has_gpu field to be num_gpu? #756

should we update has_gpu field to be num_gpu? #756

Comments

xmonader commented Jun 20, 2023

renauter commented Jun 21, 2023

robvanmieghem commented Jun 21, 2023

muhamadazmy commented Jun 21, 2023

xmonader commented Jun 21, 2023

robvanmieghem commented Jun 21, 2023

DylanVerstraete commented Jun 21, 2023

xmonader commented Jun 21, 2023

muhamadazmy commented Jun 21, 2023

renauter commented Jun 21, 2023

DylanVerstraete commented Jun 21, 2023

DylanVerstraete commented Jun 21, 2023

brandonpille commented Jun 22, 2023

LeeSmet commented Jun 22, 2023

muhamadazmy commented Jun 22, 2023

LeeSmet commented Jul 19, 2023