
Speed up Keras profiling #863

Merged: 4 commits merged into fastmachinelearning:main on Oct 7, 2023

Conversation

@AdrianAlan (Contributor) commented on Sep 8, 2023

Description

Changes to get_ymodel_keras to speed up (Q)Keras profiling. Instead of compiling a new model for every layer, the intermediate outputs can be collected as arrays from a single multi-output model, as suggested in the Keras FAQ.
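For context, the Keras FAQ pattern referred to above can be sketched as follows: build one auxiliary model whose outputs are the outputs of every layer, so all intermediate activations come from a single forward pass. This is a minimal illustration of the technique, not the actual hls4ml implementation; the toy model and layer names are invented for the example.

```python
import numpy as np
import keras
from keras import layers

# Toy functional model standing in for the profiled network (hypothetical)
inputs = keras.Input(shape=(8,))
x = layers.Dense(4, activation='relu', name='hidden')(inputs)
outputs = layers.Dense(2, activation='softmax', name='out')(x)
model = keras.Model(inputs, outputs)

# Keras-FAQ pattern: one model that returns every layer's output,
# instead of compiling a separate single-output model per layer.
trace_model = keras.Model(
    inputs=model.inputs,
    outputs=[layer.output for layer in model.layers[1:]],  # skip the InputLayer
)

X = np.random.random((16, 8)).astype('float32')
traces = trace_model.predict(X, verbose=0)  # one forward pass, all activations
```

Here `traces` is a list with one array per layer, so the per-layer model compilation in the old code path is avoided entirely.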

Type of change

  • Other: (non-breaking enhancement)

Tests

  • I added new tests in test_trace.
  • I tried it with ResNet and I didn't find any issues.
  • I have tried it on a simple example of LeNet on T4 on lxplus705:
```python
import time

import numpy as np
import keras
from keras import layers

import hls4ml.model.profiling


def get_model():
    model = keras.Sequential()
    model.add(layers.Conv2D(filters=6, kernel_size=(3, 3), activation='relu', input_shape=(32, 32, 1)))
    model.add(layers.AveragePooling2D())
    model.add(layers.Conv2D(filters=16, kernel_size=(3, 3), activation='relu'))
    model.add(layers.AveragePooling2D())
    model.add(layers.Flatten())
    model.add(layers.Dense(units=120))
    model.add(layers.Dense(units=84, activation='linear'))
    model.add(layers.Dense(units=10, activation='softmax'))
    model.compile()
    return model


# Warm-up runs so first-call graph construction does not skew the timings
warmup = np.random.random((1000, 32, 32, 1))
model = get_model()
for _ in range(10):
    _ = hls4ml.model.profiling.get_ymodel_keras_old(model, warmup)

new, old = [], []
for _ in range(10):
    X = np.random.random((1000, 32, 32, 1))

    start = time.time()
    _trace_new = hls4ml.model.profiling.get_ymodel_keras(model, X)
    end = time.time()
    new.append(end - start)

    start = time.time()
    _trace_old = hls4ml.model.profiling.get_ymodel_keras_old(model, X)
    end = time.time()
    old.append(end - start)

    # The two implementations must trace the same layers with identical outputs
    assert _trace_old.keys() == _trace_new.keys()
    for layer_name in _trace_old.keys():
        assert np.all(_trace_old[layer_name] == _trace_new[layer_name])

print("New implementation: {}".format(np.mean(new)))
print("Old implementation: {}".format(np.mean(old)))
```

and I got:

New implementation: 0.7009074449539184
Old implementation: 1.334557008743286

(mean wall-clock time in seconds over 10 runs, i.e. roughly a 1.9x speedup)

Checklist

  • I have read the guidelines for contributing.
  • I have commented my code, particularly in hard-to-understand areas.
  • I have made corresponding changes to the documentation.
  • My changes generate no new warnings.
  • I have installed and run pre-commit on the files I edited or added.
  • I have added tests that prove my fix is effective or that my feature works.

@jmduarte added the "please test" (Trigger testing by creating local PR branch) label on Oct 6, 2023
@jmduarte self-requested a review on Oct 6, 2023
@jmduarte removed and re-applied the "please test" label on Oct 7, 2023
@jmduarte (Member) left a comment:
Looks good thanks @AdrianAlan!

@jmduarte jmduarte merged commit d36e226 into fastmachinelearning:main Oct 7, 2023
5 of 7 checks passed
calad0i added a commit to calad0i/hls4ml that referenced this pull request Nov 7, 2023
🎉 Add proxy_model support

format

Fix broken import

Add parallelization_factor propagation

Add overriding warning

Purge linear layers for proxy model configured layers

Fix repack_stream optimizer to inherit original precision

format not-my-code

add handler, fix type enforcement

Speed up Keras profiling (fastmachinelearning#863)

* Speed up Keras profiling

* update function name

---------

Co-authored-by: Javier Duarte <jduarte@ucsd.edu>

[pre-commit.ci] pre-commit autoupdate

updates:
- [github.com/pre-commit/pre-commit-hooks: v4.4.0 → v4.5.0](pre-commit/pre-commit-hooks@v4.4.0...v4.5.0)
- [github.com/asottile/pyupgrade: v3.14.0 → v3.15.0](asottile/pyupgrade@v3.14.0...v3.15.0)

Fix profiling SeparableConv1D and SeparableConv2D (fastmachinelearning#891)

* Profiling: Fix suffixes for SeparableConv1D&2D

* Profiling: transform list to dict where dict is expected

---------

Co-authored-by: Quentin Berthet <quentin.berthet@cern.ch>

Add support for filt_height==1 for streaming quartus conv2d (fastmachinelearning#886)

Fix config structure name in pragma for SeparableConv1D (fastmachinelearning#884)

* Raise exception if Vivado command fail

* Duplicate sepconv2d test for sepconv1d

* Test that csynth is working for sepconv1d

* Define multiplier_limit in nnet::conv1d_config (for sepconv1d)

* Revert build test

---------

Co-authored-by: Quentin Berthet <quentin.berthet@cern.ch>
Co-authored-by: Vladimir Loncar <vloncar@users.noreply.github.com>

[pre-commit.ci] pre-commit autoupdate

updates:
- [github.com/psf/black: 23.9.1 → 23.10.0](psf/black@23.9.1...23.10.0)

rename, minor bug fix

fix multi clones w/ diff outs in stream io

fix test

Fix quartus writer with io_stream and multi-output

fix weight fp write length

add test

Fix: update weight writer digits

Fix clone precision inheriting

Add relu6 stream support

Fix semi-heterogeneous mask generation

Add test and fixed_point_quantizer

format

Revert "Add relu6 stream support"

This reverts commit d05cbaa.