
Qualcomm AI Engine Direct - support embedding op #2057

Closed · wants to merge 4 commits

Conversation


@haowhsu-quic haowhsu-quic commented Feb 23, 2024

summary:

  • support embedding op with int32 index input
  • make mobilebert / llama2 be fully delegated
  • add requantize passes for mixed precision
  • bug fixes
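As a quick illustration of the first bullet, here is a minimal sketch of an embedding lookup driven by int32 indices rather than the default int64 (assuming a recent PyTorch build, which accepts IntTensor indices; the vocabulary and embedding sizes are arbitrary, not from the PR):

```python
import torch

# Embedding table: 100 rows, 8-dim vectors (sizes are illustrative)
emb = torch.nn.Embedding(num_embeddings=100, embedding_dim=8)

# Index tensor in int32 instead of the usual int64
idx = torch.tensor([[1, 2, 3]], dtype=torch.int32)

# Lookup works the same; output shape is idx.shape + (embedding_dim,)
out = emb(idx)
print(tuple(out.shape))
```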


pytorch-bot bot commented Feb 23, 2024

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/executorch/2057

Note: Links to docs will display an error until the docs builds have been completed.

✅ No Failures

As of commit f223b65 with merge base 81b3232:
💚 Looks good so far! There are no failures yet. 💚

This comment was automatically generated by Dr. CI and updates every 15 minutes.

@facebook-github-bot facebook-github-bot added the CLA Signed This label is managed by the Facebook bot. Authors need to sign the CLA before a PR can be reviewed. label Feb 23, 2024
@facebook-github-bot (Contributor)

@cccclai has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.


cccclai commented Feb 29, 2024

Two main comments:

  • Is there a way to repro the error on our side? Maybe it's an edge case we'd need to fix.
  • Can we leave a TODO (maybe an expected-failing unit test)?


haowhsu-quic commented Feb 29, 2024

> Two main comments:
>
> • Is there a way to repro the error on our side? Maybe it's an edge case we'd need to fix.
> • Can we leave a TODO (maybe an expected-failing unit test)?

Thanks for reviewing! I've added TODO items for the next steps.

It can be reproduced with the following patch:

```diff
[examples/qualcomm/scripts/mobilebert_fine_tune.py]
@@ -58,10 +58,18 @@ def accuracy_per_class(preds, goldens, labels):
 def get_dataset(data_val):
     # prepare input data
     inputs, input_list = [], ""
-    # max_position_embeddings defaults to 512
-    position_ids = torch.arange(512).expand((1, -1)).to(torch.int32)
     for index, data in enumerate(data_val):
         data = [d.to(torch.int32) for d in data]
-        # input_ids, attention_mask, token_type_ids, position_ids
-        inputs.append(
-            (
-                *data[:2],
-                torch.zeros(data[0].size(), dtype=torch.int32),
-                position_ids[:, : data[0].shape[1]],
-            )
-        )
+        # input_ids, attention_mask, token_type_ids
+        inputs.append((*data[:2], torch.zeros(data[0].size(), dtype=torch.int32)))
         input_text = " ".join(
             [f"input_{index}_{i}.raw" for i in range(len(inputs[-1]))]
         )
@@ -204,9 +212,6 @@ def get_fine_tuned_mobilebert(artifacts_dir, pretrained_weight, batch_size):
             map_location=torch.device("cpu"),
         ),
     )
+    # hack for changing dtype of "position_ids" from int64 to int32
+    sub_module = model.mobilebert.embeddings
+    sub_module.position_ids = sub_module.position_ids.to(torch.int32)
 
     return model.eval(), dataloader_val, labels
```

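The buffer-casting hack in the second hunk can be tried in isolation. A minimal sketch, assuming a recent PyTorch build; the `Toy` module and the 512-entry buffer are illustrative stand-ins for mobilebert's embeddings submodule:

```python
import torch

class Toy(torch.nn.Module):
    def __init__(self):
        super().__init__()
        # mirrors mobilebert's position_ids buffer, which defaults to int64
        self.register_buffer("position_ids", torch.arange(512).expand(1, -1))

model = Toy()
assert model.position_ids.dtype == torch.int64

# The hack: assigning a tensor to a registered buffer name replaces
# that buffer, so downstream code sees an int32 position_ids.
model.position_ids = model.position_ids.to(torch.int32)
print(model.position_ids.dtype)
```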

cccclai commented Feb 29, 2024

Thanks for the update. Seems like some llama-related CI jobs started failing. Those changes look legit, but is it okay to remove the changes in examples/models/llama2/model.py to get CI green? We can do a separate PR for this.

@haowhsu-quic (Collaborator Author) commented:

Thank you, I moved the datatype casting from examples/models/llama2/model.py into our own script, as the mobilebert example does.

@facebook-github-bot (Contributor)

@cccclai has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.

@facebook-github-bot (Contributor)

@cccclai has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.

@facebook-github-bot (Contributor)

@cccclai merged this pull request in 57e192b.
