[Fix] Change the model zip file name from the Hugging Face org ID to a custom prefix when upload_prefix is provided #413

Merged on Aug 10, 2024 (6 commits)
3 changes: 2 additions & 1 deletion .ci/run-repository.sh
@@ -72,6 +72,7 @@ elif [[ "$TASK_TYPE" == "SentenceTransformerTrace" || "$TASK_TYPE" == "SparseTra
echo -e "\033[34;1mINFO:\033[0m TRACING_FORMAT: ${TRACING_FORMAT}\033[0m"
echo -e "\033[34;1mINFO:\033[0m EMBEDDING_DIMENSION: ${EMBEDDING_DIMENSION:-N/A}\033[0m"
echo -e "\033[34;1mINFO:\033[0m POOLING_MODE: ${POOLING_MODE:-N/A}\033[0m"
+echo -e "\033[34;1mINFO:\033[0m UPLOAD_PREFIX: ${UPLOAD_PREFIX:-N/A}\033[0m"
echo -e "\033[34;1mINFO:\033[0m MODEL_DESCRIPTION: ${MODEL_DESCRIPTION:-N/A}\033[0m"

if [[ "$TASK_TYPE" == "SentenceTransformerTrace" ]]; then
@@ -95,7 +96,7 @@ elif [[ "$TASK_TYPE" == "SentenceTransformerTrace" || "$TASK_TYPE" == "SparseTra
--env "TEST_TYPE=server" \
--name opensearch-py-ml-trace-runner \
opensearch-project/opensearch-py-ml \
-nox -s "${NOX_TRACE_TYPE}-${PYTHON_VERSION}" -- ${MODEL_ID} ${MODEL_VERSION} ${TRACING_FORMAT} ${EXTRA_ARGS} -md ${MODEL_DESCRIPTION:+"$MODEL_DESCRIPTION"}
+nox -s "${NOX_TRACE_TYPE}-${PYTHON_VERSION}" -- ${MODEL_ID} ${MODEL_VERSION} ${TRACING_FORMAT} ${EXTRA_ARGS} -up ${UPLOAD_PREFIX} -md ${MODEL_DESCRIPTION:+"$MODEL_DESCRIPTION"}

# To upload a model, we need the model artifact, description, license files into local path
# trace_output should include description and license file.
3 changes: 2 additions & 1 deletion .github/workflows/model_uploader.yml
@@ -206,7 +206,8 @@ jobs:
echo "MODEL_VERSION=${{ github.event.inputs.model_version }}" >> $GITHUB_ENV
echo "TRACING_FORMAT=${{ github.event.inputs.tracing_format }}" >> $GITHUB_ENV
echo "EMBEDDING_DIMENSION=${{ github.event.inputs.embedding_dimension }}" >> $GITHUB_ENV
-echo "POOLING_MODE=${{ github.event.inputs.pooling_mode }}" >> $GITHUB_ENV
+echo "POOLING_MODE=${{ github.event.inputs.pooling_mode }}" >> $GITHUB_ENV
+echo "UPLOAD_PREFIX=${{ github.event.inputs.upload_prefix }}" >> $GITHUB_ENV
echo "MODEL_DESCRIPTION=${{ github.event.inputs.model_description }}" >> $GITHUB_ENV
- name: Autotracing ${{ matrix.cluster }} secured=${{ matrix.secured }} version=${{matrix.entry.opensearch_version}}
run: "./.ci/run-tests ${{ matrix.cluster }} ${{ matrix.secured }} ${{ matrix.entry.opensearch_version }} ${{github.event.inputs.model_type}}Trace"
1 change: 1 addition & 0 deletions CHANGELOG.md
@@ -46,6 +46,7 @@ Inspired from [Keep a Changelog](https://keepachangelog.com/en/1.0.0/)
- updating listing file with three v2 sparse model - by @dhrubo-os ([#412](https://github.com/opensearch-project/opensearch-py-ml/pull/412))

### Fixed
+- Fix the wrong final zip file name in the model_uploader workflow; the file is now named with the upload_prefix when one is provided ([#413](https://github.com/opensearch-project/opensearch-py-ml/pull/413/files))
- Fix the wrong input parameter for model_uploader's base_download_path in the Jenkins trigger ([#402](https://github.com/opensearch-project/opensearch-py-ml/pull/402))
- Enable make_model_config_json to add model description to model config file by @thanawan-atc in ([#203](https://github.com/opensearch-project/opensearch-py-ml/pull/203))
- Correct demo_ml_commons_integration.ipynb by @thanawan-atc in ([#208](https://github.com/opensearch-project/opensearch-py-ml/pull/208))
4 changes: 2 additions & 2 deletions opensearch_py_ml/ml_models/sparse_encoding_model.py
@@ -81,8 +81,8 @@ def save_as_pt(
add_apache_license: bool = True,
) -> str:
"""
-Download sentence transformer model directly from huggingface, convert model to torch script format,
-zip the model file and its tokenizer.json file to prepare to upload to the Open Search cluster
+Download sparse encoding model directly from huggingface, convert model to torch script format,
+zip the model file and its tokenizer.json file to prepare to upload to the OpenSearch cluster

:param sentences:
Required, for example sentences = ['today is sunny']
7 changes: 6 additions & 1 deletion utils/model_uploader/autotracing_utils.py
@@ -235,6 +235,7 @@ def prepare_files_for_uploading(
model_format: str,
src_model_path: str,
src_model_config_path: str,
+upload_prefix: str = None,
) -> tuple[str, str]:
"""
Prepare files for uploading by storing them in UPLOAD_FOLDER_PATH
@@ -253,7 +254,11 @@ def prepare_files_for_uploading(
(path to model config json file) in the UPLOAD_FOLDER_PATH
:rtype: Tuple[str, str]
"""
-model_type, model_name = model_id.split("/")
+model_type, model_name = (
+model_id.split("/")
+if upload_prefix is None
+else (upload_prefix, model_id.split("/")[-1])
+)
model_format = model_format.lower()
folder_to_delete = (
TORCHSCRIPT_FOLDER_PATH if model_format == "torch_script" else ONNX_FOLDER_PATH
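Isolated from the surrounding function, the new name-resolution branch in `prepare_files_for_uploading` behaves as sketched below. This is a minimal sketch that mirrors the diff's expression; the helper name `resolve_upload_names` and the example prefix `my-org` are hypothetical, and like the original code it assumes `model_id` has the Hugging Face `org/name` form (the no-prefix branch raises `ValueError` otherwise).

```python
from typing import Optional, Tuple


def resolve_upload_names(
    model_id: str, upload_prefix: Optional[str] = None
) -> Tuple[str, str]:
    """Hypothetical helper reproducing how the patched code derives
    (model_type, model_name) from a Hugging Face model ID."""
    if upload_prefix is None:
        # Original behavior: "org/name" -> ("org", "name").
        model_type, model_name = model_id.split("/")
    else:
        # New behavior: keep the last path segment, swap in the custom prefix.
        model_type, model_name = upload_prefix, model_id.split("/")[-1]
    return model_type, model_name


print(resolve_upload_names("sentence-transformers/all-MiniLM-L6-v2"))
# prints ('sentence-transformers', 'all-MiniLM-L6-v2')
print(resolve_upload_names("sentence-transformers/all-MiniLM-L6-v2", "my-org"))
# prints ('my-org', 'all-MiniLM-L6-v2')
```

The check is `is None` rather than a truthiness test, so an explicit empty string would yield an empty `model_type`. In the CI flow this should not arise, because the script passes `${UPLOAD_PREFIX}` unquoted and an empty value leaves a bare `-up` flag, which argparse parses as `None`.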
11 changes: 11 additions & 0 deletions utils/model_uploader/model_autotracing.py
@@ -281,6 +281,7 @@ def main(
embedding_dimension: Optional[int] = None,
pooling_mode: Optional[str] = None,
model_description: Optional[str] = None,
+upload_prefix: Optional[str] = None,
) -> None:
"""
Perform model auto-tracing and prepare files for uploading to OpenSearch model hub
@@ -363,6 +364,7 @@ def main(
TORCH_SCRIPT_FORMAT,
torchscript_model_path,
torchscript_model_config_path,
+upload_prefix,
)

config_path_for_checking_description = torchscript_dst_model_config_path
@@ -425,6 +427,14 @@ def main(
choices=["BOTH", "TORCH_SCRIPT", "ONNX"],
help="Model format for auto-tracing",
)
+parser.add_argument(
+"-up",
+"--upload_prefix",
+type=str,
+nargs="?",
+default=None,
+help="Model customize path prefix for upload",
+)
parser.add_argument(
"-ed",
"--embedding_dimension",
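Because the CI script passes `-up ${UPLOAD_PREFIX}` unquoted, an empty `UPLOAD_PREFIX` leaves a bare `-up` on the command line. With `nargs="?"` and `default=None`, argparse then stores the option's `const`, which defaults to `None`, so `prepare_files_for_uploading` falls back to the Hugging Face org. The sketch below checks that behavior under stated assumptions: `build_parser` and `parse_cli` are illustrative names, the `-md` definition is assumed to also use `nargs="?"` (its real definition is not shown in this diff), and the positional model arguments are omitted.

```python
import argparse
import shlex


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser()
    # Mirrors the "-up/--upload_prefix" definition added in this PR.
    parser.add_argument(
        "-up",
        "--upload_prefix",
        type=str,
        nargs="?",
        default=None,
        help="Model customize path prefix for upload",
    )
    # Assumption: "-md" is also optional-valued so it can be passed empty.
    parser.add_argument("-md", "--model_description", type=str, nargs="?", default=None)
    return parser


def parse_cli(upload_prefix: str, description: str = "") -> argparse.Namespace:
    # Emulate the shell: unquoted ${UPLOAD_PREFIX} vanishes when empty, and
    # ${MODEL_DESCRIPTION:+"$MODEL_DESCRIPTION"} expands to nothing when empty.
    md_value = shlex.quote(description) if description else ""
    command = f"-up {upload_prefix} -md {md_value}"
    return build_parser().parse_args(shlex.split(command))


print(parse_cli("").upload_prefix)        # empty env var -> bare "-up"
print(parse_cli("my-org").upload_prefix)  # custom prefix is passed through
```

Using `nargs="?"` instead of a plain single-value option is what keeps an empty workflow input from aborting the run with an "expected one argument" error.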
@@ -462,4 +472,5 @@ def main(
args.embedding_dimension,
args.pooling_mode,
args.model_description,
+args.upload_prefix,
)
25 changes: 22 additions & 3 deletions utils/model_uploader/sparse_model_autotracing.py
@@ -186,6 +186,7 @@ def main(
model_version: str,
tracing_format: str,
model_description: Optional[str] = None,
+upload_prefix: Optional[str] = None,
) -> None:
"""
Perform model auto-tracing and prepare files for uploading to OpenSearch model hub
@@ -235,7 +236,10 @@ def main(
torchscript_model_path,
torchscript_model_config_path,
) = trace_sparse_encoding_model(
-model_id, model_version, TORCH_SCRIPT_FORMAT, model_description=None
+model_id,
+model_version,
+TORCH_SCRIPT_FORMAT,
+model_description=model_description,
)

torchscript_encoding_datas = register_and_deploy_sparse_encoding_model(
@@ -262,6 +266,7 @@ def main(
TORCH_SCRIPT_FORMAT,
torchscript_model_path,
torchscript_model_config_path,
+upload_prefix,
)

config_path_for_checking_description = torchscript_dst_model_config_path
@@ -273,7 +278,7 @@ def main(
onnx_model_path,
onnx_model_config_path,
) = trace_sparse_encoding_model(
-model_id, model_version, ONNX_FORMAT, model_description=None
+model_id, model_version, ONNX_FORMAT, model_description=model_description
)

onnx_embedding_datas = register_and_deploy_sparse_encoding_model(
@@ -325,6 +330,14 @@ def main(
choices=["BOTH", "TORCH_SCRIPT", "ONNX"],
help="Model format for auto-tracing",
)
+parser.add_argument(
+"-up",
+"--upload_prefix",
+type=str,
+nargs="?",
+default=None,
+help="Model customize path prefix for upload",
+)
parser.add_argument(
"-md",
"--model_description",
@@ -336,4 +349,10 @@ def main(
)
args = parser.parse_args()

-main(args.model_id, args.model_version, args.tracing_format, args.model_description)
+main(
+args.model_id,
+args.model_version,
+args.tracing_format,
+args.model_description,
+args.upload_prefix,
+)
3 changes: 0 additions & 3 deletions utils/model_uploader/upload_history/MODEL_UPLOAD_HISTORY.md
@@ -21,6 +21,3 @@ The following table shows sentence transformer model upload history.
|2023-09-13 18:03:32|@dhrubo-os|`sentence-transformers/distiluse-base-multilingual-cased-v1`|1.0.1|TORCH_SCRIPT|N/A|N/A|6178024517|
|2023-10-18 18:06:15|@dhrubo-os|`sentence-transformers/paraphrase-mpnet-base-v2`|1.0.0|ONNX|N/A|N/A|6568285400|
|2023-10-18 18:06:15|@dhrubo-os|`sentence-transformers/paraphrase-mpnet-base-v2`|1.0.0|TORCH_SCRIPT|N/A|N/A|6568285400|
-|2024-08-07 18:01:26|@dhrubo-os|`opensearch-project/opensearch-neural-sparse-encoding-doc-v2-distill`|1.0.0|TORCH_SCRIPT|N/A|N/A|10293890748|
-|2024-08-07 18:23:41|@dhrubo-os|`opensearch-project/opensearch-neural-sparse-encoding-doc-v2-mini`|1.0.0|TORCH_SCRIPT|N/A|N/A|10294048787|
-|2024-08-08 09:40:44|@dhrubo-os|`opensearch-project/opensearch-neural-sparse-encoding-v2-distill`|1.0.0|TORCH_SCRIPT|N/A|N/A|10295327692|
30 changes: 0 additions & 30 deletions utils/model_uploader/upload_history/supported_models.json
@@ -48,35 +48,5 @@
"Embedding Dimension": "N/A",
"Pooling Mode": "N/A",
"Workflow Run ID": "6568285400"
-},
-{
-"Model Uploader": "@dhrubo-os",
-"Upload Time": "2024-08-07 18:01:26",
-"Model ID": "opensearch-project/opensearch-neural-sparse-encoding-doc-v2-distill",
-"Model Version": "1.0.0",
-"Model Format": "TORCH_SCRIPT",
-"Embedding Dimension": "N/A",
-"Pooling Mode": "N/A",
-"Workflow Run ID": "10293890748"
-},
-{
-"Model Uploader": "@dhrubo-os",
-"Upload Time": "2024-08-07 18:23:41",
-"Model ID": "opensearch-project/opensearch-neural-sparse-encoding-doc-v2-mini",
-"Model Version": "1.0.0",
-"Model Format": "TORCH_SCRIPT",
-"Embedding Dimension": "N/A",
-"Pooling Mode": "N/A",
-"Workflow Run ID": "10294048787"
-},
-{
-"Model Uploader": "@dhrubo-os",
-"Upload Time": "2024-08-08 09:40:44",
-"Model ID": "opensearch-project/opensearch-neural-sparse-encoding-v2-distill",
-"Model Version": "1.0.0",
-"Model Format": "TORCH_SCRIPT",
-"Embedding Dimension": "N/A",
-"Pooling Mode": "N/A",
-"Workflow Run ID": "10295327692"
}
]