Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[batch] fix misnamed resources #13539

Merged
merged 1 commit into from
Sep 1, 2023
Merged

Conversation

danking
Copy link
Contributor

@danking danking commented Sep 1, 2023

I am having trouble determining the effect of this mistake, but it seems like we would be substantially undercharging for the serivce fee if it was really being charged by worker_fraction_in_1024ths instead of core-hours.

I am having trouble determining the effect of this mistake, but it seems like we would be substantially undercharging for the serivce fee if it was really being charged by worker_fraction_in_1024ths instead of core-hours.
@jigold
Copy link
Contributor

jigold commented Sep 1, 2023

I think we're okay in terms of not having errors currently. This was a backwards compatibility code path. I will double check again, but I think we first create the config on the driver GCPSlimInstanceConfig.create(). This config is sent to the worker which deserializes it, but it's on the last part of that if/else and runs resources = [gcp_resource_from_dict(data) for data in resources].

@jigold
Copy link
Contributor

jigold commented Sep 1, 2023

I think as long as these are correct, then we're okay:

class GCPServiceFeeResource(ServiceFeeResourceMixin, GCPResource):
    FORMAT_VERSION = 1
    TYPE = 'gcp_service_fee'

    @staticmethod
    def product_name() -> str:
        return 'service-fee'

class ServiceFeeResourceMixin(Resource, abc.ABC):
    def to_quantified_resource(
        self, cpu_in_mcpu: int, memory_in_bytes: int, worker_fraction_in_1024ths: int, external_storage_in_gib: int
    ) -> Optional[QuantifiedResource]:  # pylint: disable=unused-argument
        del memory_in_bytes, worker_fraction_in_1024ths, external_storage_in_gib
        return {'name': self.name, 'quantity': cpu_in_mcpu}
class GCPIPFeeResource(IPFeeResourceMixin, GCPResource):
    FORMAT_VERSION = 1
    TYPE = 'gcp_ip_fee'

    @staticmethod
    def product_name(base: int) -> str:
        return f'ip-fee/{base}'

class IPFeeResourceMixin(Resource, abc.ABC):
    def to_quantified_resource(
        self, cpu_in_mcpu: int, memory_in_bytes: int, worker_fraction_in_1024ths: int, external_storage_in_gib: int
    ) -> Optional[QuantifiedResource]:  # pylint: disable=unused-argument
        del cpu_in_mcpu, memory_in_bytes, external_storage_in_gib
        return {'name': self.name, 'quantity': worker_fraction_in_1024ths}

@jigold
Copy link
Contributor

jigold commented Sep 1, 2023

Also, I did a GPU experiment with Sophie where she ran a preemptible and nonpreemptible job for an hour:

mysql> SELECT batch_id, job_id, resource, COALESCE(CAST(COALESCE(SUM(`usage`), 0) AS SIGNED) * rate, 0) AS cost
    -> FROM aggregated_job_resources_v2
    -> LEFT JOIN resources ON aggregated_job_resources_v2.resource_id = resources.resource_id
    -> WHERE aggregated_job_resources_v2.batch_id = 5 AND aggregated_job_resources_v2.job_id = 1
    -> GROUP BY batch_id, job_id, resource;
+----------+--------+---------------------------------------------------------+----------------------+
| batch_id | job_id | resource                                                | cost                 |
+----------+--------+---------------------------------------------------------+----------------------+
|        5 |      1 | ip-fee/1024/1                                           | 0.004081841111111111 |
|        5 |      1 | service-fee/1                                           |  0.04081841111111112 |
|        5 |      1 | disk/pd-ssd/us-central1/1693067778861                   | 0.028497453342459508 |
|        5 |      1 | compute/g2-nonpreemptible/us-central1/1693067778861     |  0.10199791103476001 |
|        5 |      1 | memory/g2-nonpreemptible/us-central1/1693067778861      | 0.044810165988899996 |
|        5 |      1 | accelerator/l4-nonpreemptible/us-central1/1693110803273 |   0.5714988178566729 |
+----------+--------+---------------------------------------------------------+----------------------+
6 rows in set (0.00 sec)

mysql> SELECT batch_id, job_id, resource, COALESCE(CAST(COALESCE(SUM(`usage`), 0) AS SIGNED) * rate, 0) AS cost
    -> FROM aggregated_job_resources_v2
    -> LEFT JOIN resources ON aggregated_job_resources_v2.resource_id = resources.resource_id
    -> WHERE aggregated_job_resources_v2.batch_id = 2 AND aggregated_job_resources_v2.job_id = 1
    -> GROUP BY batch_id, job_id, resource;
+----------+--------+------------------------------------------------------+----------------------+
| batch_id | job_id | resource                                             | cost                 |
+----------+--------+------------------------------------------------------+----------------------+
|        2 |      1 | ip-fee/1024/1                                        | 0.004279891111111111 |
|        2 |      1 | service-fee/1                                        | 0.042798911111111115 |
|        2 |      1 | compute/g2-preemptible/us-central1/1693067778861     | 0.042798911111111115 |
|        2 |      1 | disk/pd-ssd/us-central1/1693067778861                | 0.029880143280857863 |
|        2 |      1 | memory/g2-preemptible/us-central1/1693067778861      |  0.01879407184166667 |
|        2 |      1 | accelerator/l4-preemptible/us-central1/1693110803273 |   0.1797683423080672 |
+----------+--------+------------------------------------------------------+----------------------+
6 rows in set (0.00 sec)

@danking danking merged commit f063e96 into hail-is:main Sep 1, 2023
5 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants