Releases: llmariner/job-manager
v1.8.0
What's Changed
Features
- feat(proto): add namespaced_name field to GpuPod by @Ladicle in #384
- feat(dispatcher): ingest a namespaced name to GpuPod by @Ladicle in #385
- feat(server): cache cluster state and reserved scheduled resources by @Ladicle in #386
- feat: implement a scheduling scoring algorithm by @kkaneda in #388
- feat(syncer): update local job status when applying a job fails by @Ladicle in #392
- feat: validate the cluster registration key by @kkaneda in #393
Bug Fixes
- fix(syncer): explicitly specify the deletion propagation policy by @Ladicle in #389
- fix: bump rbac-manager by @kkaneda in #394
- fix: tweak the requirements.txt for fine-tuning docker image by @kkaneda in #395
Full Changelog: v1.7.0...v1.8.0
v1.7.0
What's Changed
Features
- feat(server,dispatcher): handle notebook requeue by @guangrui-cloudnatix in #357
- feat(syncer): be able to enable TLS by @kkaneda in #379
- feat: make the scheduler take into account allocated GPUs by @kkaneda in #381
Bug Fixes
Full Changelog: v1.6.0...v1.7.0
v1.6.0
What's Changed
Features
- feat: allow non-gpu workloads to schedule to a no-gpu cluster by @kkaneda in #354
- feat(syncer): add a syncer to reflect job status in local resources by @Ladicle in #360
- feat: expose ListClusters and add more fields by @kkaneda in #364
- feat: populate the cluster name in ListClusters response by @kkaneda in #370
- feat: populate the gpu capacity in ListClusters response by @kkaneda in #371
- feat: track pods that use GPUs in cluster status by @kkaneda in #372
- feat: populate gpu_allocated and gpu_pod_count by @kkaneda in #373
- feat(server/syncer): add ListClusterIDs service by @Ladicle in #375
- feat(server/syncer): set up auth interceptor by @Ladicle in #374
- feat(syncer): support authentication between syncer and control-plane by @Ladicle in #376
- feat: bump rbac-manager dep by @kkaneda in #377
Bug Fixes
Full Changelog: v1.5.0...v1.6.0
v1.5.0
What's Changed
Features
- feat: persist schedulable envs to database by @kkaneda in #311
- feat(api): add an API for sending the cluster status by @kkaneda in #313
- feat: implement UpdateClusterStatus by @kkaneda in #314
- feat(api): add the "gpu_nodes" field in ClusterStatus by @kkaneda in #317
- feat(api): add JobService and ListClusters by @kkaneda in #318
- feat(dispatcher): send cluster info to server by @kkaneda in #319
- feat(server): implement the scheduler by @kkaneda in #320
- feat(server): Ignore stale clusters from scheduling candidates by @kkaneda in #321
- feat(engine): ignore cordoned GPU nodes from cluster status by @kkaneda in #322
- feat(dispatcher): be able to configure cluster status update interval by @kkaneda in #323
- feat(dispatcher): pull queued workloads only from an assigned cluster by @kkaneda in #325
- feat(server): add index to the cluster_id column of clusters by @kkaneda in #327
- feat(api): add SyncerService for put and delete k8s objects by @Ladicle in #332
- feat(server): add syncer service server by @Ladicle in #333
- feat(chart): add logLevel fields by @Ladicle in #335
- feat(server): implement syncer service APIs by @Ladicle in #336
- feat(syncer): add empty syncer component by @Ladicle in #339
- feat(chart): add syncer chart by @Ladicle in #340
- feat(notebooks): set env var for an org ID and a project ID by @kkaneda in #346
- feat(syncer): add job controller by @Ladicle in #347
- feat(proto): add state and action for rescheduling jobs by @guangrui-cloudnatix in #348
- feat(server): add fields in notebook table by @guangrui-cloudnatix in #349
- feat(dispatcher): set env vars for org and project when creating a no… by @kkaneda in #350
- feat: add org/project title and cluster name to proto by @kkaneda in #351
Bug Fixes
- fix(server): add ingress path for `/llmariner.jobs.server.v1.JobWorkerService` by @kkaneda in #324
- fix(server): sort the cluster list in ListClusters by @kkaneda in #326
- fix(chart): fix line handling for server ingress resources by @Ladicle in #334
Other Changes
Full Changelog: v1.4.1...v1.5.0
v1.4.1
v1.4.0
What's Changed
Features
- feat(dispatcher): report component status to cluster manager by @guangrui-cloudnatix in #304
Full Changelog: v1.3.0...v1.4.0
v1.3.0
What's Changed
Features
- feat(fine-tuning): bump the transformer to the latest version by @kkaneda in #297
- feat(fine-tuning): install autoawq by @kkaneda in #298
- feat(fine-tuning): make BitsAndBytesQuantization optional by @kkaneda in #300
Bug Fixes
- fix(dispatcher): do not pull models that have the same prefix by @kkaneda in #299
- fix(chart): unset the default value of `enable` by @Ladicle in #301
Full Changelog: v1.2.0...v1.3.0