
Speed up Keras profiling #863

Merged: 4 commits merged into fastmachinelearning:main on Oct 7, 2023

Conversation

@AdrianAlan (Contributor) commented on Sep 8, 2023

Description

Changes to get_ymodel_keras to speed up (Q)Keras profiling. Instead of compiling a new model for every layer, the intermediate outputs can be collected as arrays from a single multi-output model, as suggested in the Keras FAQ.
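For context, the Keras FAQ pattern referred to above can be sketched as follows: build one auxiliary model whose outputs are the outputs of every layer, so all intermediate activations come from a single forward pass. This is a minimal illustration of the technique, not the actual hls4ml implementation; the toy model and layer names are invented for the example.

```python
import numpy as np
import keras
from keras import layers

# Toy functional model standing in for the profiled network (hypothetical)
inputs = keras.Input(shape=(8,))
x = layers.Dense(4, activation='relu', name='hidden')(inputs)
outputs = layers.Dense(2, activation='softmax', name='out')(x)
model = keras.Model(inputs, outputs)

# Keras-FAQ pattern: one model that returns every layer's output,
# instead of compiling a separate single-output model per layer.
trace_model = keras.Model(
    inputs=model.inputs,
    outputs=[layer.output for layer in model.layers[1:]],  # skip the InputLayer
)

X = np.random.random((16, 8)).astype('float32')
traces = trace_model.predict(X, verbose=0)  # one forward pass, all activations
```

Here `traces` is a list with one array per layer, so the per-layer model compilation in the old code path is avoided entirely.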

Type of change

  • Other: (non-breaking enhancement)

Tests

  • I added new tests in test_trace.
  • I tried it with ResNet and I didn't find any issues.
  • I have tried it on a simple example of LeNet on T4 on lxplus705:
```python
import time

import numpy as np
import keras
from keras import layers

import hls4ml.model.profiling


def get_model():
    model = keras.Sequential()
    model.add(layers.Conv2D(filters=6, kernel_size=(3, 3), activation='relu', input_shape=(32, 32, 1)))
    model.add(layers.AveragePooling2D())
    model.add(layers.Conv2D(filters=16, kernel_size=(3, 3), activation='relu'))
    model.add(layers.AveragePooling2D())
    model.add(layers.Flatten())
    model.add(layers.Dense(units=120))
    model.add(layers.Dense(units=84, activation='linear'))
    model.add(layers.Dense(units=10, activation='softmax'))
    model.compile()
    return model


# Warm-up runs so first-call graph construction does not skew the timings
warmup = np.random.random((1000, 32, 32, 1))
model = get_model()
for _ in range(10):
    _ = hls4ml.model.profiling.get_ymodel_keras_old(model, warmup)

new, old = [], []
for _ in range(10):
    X = np.random.random((1000, 32, 32, 1))

    start = time.time()
    _trace_new = hls4ml.model.profiling.get_ymodel_keras(model, X)
    end = time.time()
    new.append(end - start)

    start = time.time()
    _trace_old = hls4ml.model.profiling.get_ymodel_keras_old(model, X)
    end = time.time()
    old.append(end - start)

    # The two implementations must trace the same layers with identical outputs
    assert _trace_old.keys() == _trace_new.keys()
    for layer_name in _trace_old.keys():
        assert np.all(_trace_old[layer_name] == _trace_new[layer_name])

print("New implementation: {}".format(np.mean(new)))
print("Old implementation: {}".format(np.mean(old)))
```

and I got:

New implementation: 0.7009074449539184
Old implementation: 1.334557008743286

(mean wall-clock time in seconds over 10 runs, i.e. roughly a 1.9x speedup)

Checklist

  • I have read the guidelines for contributing.
  • I have commented my code, particularly in hard-to-understand areas.
  • I have made corresponding changes to the documentation.
  • My changes generate no new warnings.
  • I have installed and run pre-commit on the files I edited or added.
  • I have added tests that prove my fix is effective or that my feature works.

@jmduarte added the "please test" (Trigger testing by creating local PR branch) label on Oct 6, 2023
@jmduarte self-requested a review on Oct 6, 2023
@jmduarte removed and re-applied the "please test" label on Oct 7, 2023
@jmduarte (Member) left a comment:
Looks good thanks @AdrianAlan!

@jmduarte jmduarte merged commit d36e226 into fastmachinelearning:main Oct 7, 2023
5 of 7 checks passed
calad0i added a commit to calad0i/hls4ml that referenced this pull request Nov 7, 2023
🎉 Add proxy_model support

format

Fix broken import

Add parallelization_factor propagation

Add overriding warning

Purge linear layers for proxy model configured layers

Fix repack_stream optimizer to inherit original precision

format not-my-code

add handler, fix type enforcement

Speed up Keras profiling (fastmachinelearning#863)

* Speed up Keras profiling

* update function name

---------

Co-authored-by: Javier Duarte <jduarte@ucsd.edu>

[pre-commit.ci] pre-commit autoupdate

updates:
- [github.com/pre-commit/pre-commit-hooks: v4.4.0 → v4.5.0](pre-commit/pre-commit-hooks@v4.4.0...v4.5.0)
- [github.com/asottile/pyupgrade: v3.14.0 → v3.15.0](asottile/pyupgrade@v3.14.0...v3.15.0)

Fix profiling SeparableConv1D and SeparableConv2D (fastmachinelearning#891)

* Profiling: Fix suffixes for SeparableConv1D&2D

* Profiling: transform list to dict where dict is expected

---------

Co-authored-by: Quentin Berthet <quentin.berthet@cern.ch>

Add support for filt_height==1 for streaming quartus conv2d (fastmachinelearning#886)

Fix config structure name in pragma for SeparableConv1D (fastmachinelearning#884)

* Raise exception if Vivado command fail

* Duplicate sepconv2d test for sepconv1d

* Test that csynth is working for sepconv1d

* Define multiplier_limit in nnet::conv1d_config (for sepconv1d)

* Revert build test

---------

Co-authored-by: Quentin Berthet <quentin.berthet@cern.ch>
Co-authored-by: Vladimir Loncar <vloncar@users.noreply.github.com>

[pre-commit.ci] pre-commit autoupdate

updates:
- [github.com/psf/black: 23.9.1 → 23.10.0](psf/black@23.9.1...23.10.0)

rename, minor bug fix

fix multi clones w/ diff outs in stream io

fix test

Fix quartus writer with io_stream and multi-output

fix weight fp write length

add test

Fix: update weight writer digits

Fix clone precision inheriting

Add relu6 stream support

Fix semi-heterogeneous mask generation

Add test and fixed_point_quantizer

format

Revert "Add relu6 stream support"

This reverts commit d05cbaa.