
Add FPGA support to SonicTriton #8

Open
kpedro88 opened this issue Sep 14, 2020 · 11 comments

Comments

@kpedro88
Collaborator

The FaaST FPGA server uses Triton calls in order to be interoperable with the existing SonicTriton client. An explicit conversion from floating point to fixed point may be needed (as opposed to the direct reinterpret_cast currently used to handle data for the Triton GPU server).
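To illustrate the distinction, here is a minimal sketch (not FaaST code): the GPU path just reinterprets the float buffer as bytes, while the FPGA path needs a real numeric conversion. An `int16_t` with 10 fractional bits stands in for a hypothetical `ap_fixed` type.

```cpp
#include <cmath>
#include <cstdint>
#include <vector>

// GPU path: the client hands over the float buffer as raw bytes, no conversion.
const uint8_t* asRawBytes(const std::vector<float>& data) {
  return reinterpret_cast<const uint8_t*>(data.data());
}

// FPGA path: each float is explicitly converted to fixed point. Here a
// 16-bit integer with 10 fractional bits stands in for e.g. ap_fixed<16,6>.
int16_t toFixed(float x) {
  return static_cast<int16_t>(std::lround(x * (1 << 10)));
}

float fromFixed(int16_t x) {
  return static_cast<float>(x) / (1 << 10);
}
```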

Assigned to: @drankincms

@kpedro88
Collaborator Author

Summary of latest discussion: try to revamp the FaaST server to do the floating point -> fixed point conversion on the server side. FaaST should also be updated to send dimension and data type information*, where the data type is what the client should provide (e.g. FP32) rather than the fixed point type needed by the FPGA.

If/when this works, a Docker or Singularity image for the FaaST server can be provided as another example in CMSSW. (A CPU emulation of the FPGA could allow people without FPGA access to run the FaaST example, but this may be slow and clunky.)

* Triton provides these types: https://github.com/triton-inference-server/server/blob/010334ac4b1aa35e7ca4f19680b3436d203284f1/src/core/model_config.cc#L39-L71

@kpedro88
Collaborator Author

Updated plan: the ability to do the conversion on the server is nice as an extra, but it can reduce throughput (especially with multiple clients pinging one server).

We may consider adding a "conversion factory" to allow runtime decisions about how to convert floating point to fixed point, and easy addition of new kinds of conversions. An example of the kinds of conversions required is here: https://github.com/hls-fpga-machine-learning/SonicCMS/blob/97ae4c4fa5a791d501f76e1bce7072cca8d8a791/TensorRT/plugins/HcalProducerFPGA.cc#L37-L42

An example of CMSSW factories:

(The best location and usage of the proposed conversion factory is to be determined.)
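As a sketch of what such a conversion factory could look like: all class names below are hypothetical, and a plain string-keyed registry stands in for CMSSW's `edmplugin::PluginFactory`, which follows a similar register-by-name pattern.

```cpp
#include <cstdint>
#include <functional>
#include <map>
#include <memory>
#include <string>
#include <vector>

// Hypothetical base interface: bytes out, so no converter-specific
// types appear in the signature.
struct ConverterBase {
  virtual ~ConverterBase() = default;
  virtual std::vector<uint8_t> convert(const std::vector<float>& in) const = 0;
};

// Minimal runtime factory: converters register a creator under a string
// key, and callers pick the conversion by name at runtime.
class ConverterFactory {
public:
  using Creator = std::function<std::unique_ptr<ConverterBase>()>;
  static ConverterFactory& instance() {
    static ConverterFactory f;
    return f;
  }
  void registerConverter(const std::string& name, Creator c) {
    creators_[name] = std::move(c);
  }
  std::unique_ptr<ConverterBase> create(const std::string& name) const {
    return creators_.at(name)();
  }
private:
  std::map<std::string, Creator> creators_;
};

// Default converter: byte-for-byte copy, equivalent to the
// reinterpret_cast currently used for the GPU server.
struct PassThroughConverter : ConverterBase {
  std::vector<uint8_t> convert(const std::vector<float>& in) const override {
    const auto* p = reinterpret_cast<const uint8_t*>(in.data());
    return std::vector<uint8_t>(p, p + in.size() * sizeof(float));
  }
};
```

New conversions (e.g. for ap_fixed types) would then be added by writing another `ConverterBase` subclass and registering it under a new name, with no changes to the calling code.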

@kpedro88
Collaborator Author

More notes:

@drankincms

I have been trying to construct a conversion factory, but I'm not quite sure how to handle the conversion generically enough. My understanding is that the factory base should have a virtual function that we call inside toServer, around here: https://github.com/cms-sw/cmssw/blob/8ca7de64fccbe0da0572424fac930e179a32f3b1/HeterogeneousCore/SonicTriton/src/TritonData.cc#L125 (and potentially in fromServer as well). There would then be a dummy converter class, inheriting from the base, that does nothing (to match the current behavior), plus converter classes for specific types like ap_fixed. However, I can't figure out how to make the convert function work generically without using templates as we do already, and templates don't work with virtual functions as I understand it. Is there some std::any-type magic that I am missing? Or perhaps I don't understand the flexibility of the factory.

@kpedro88
Collaborator Author

With the factory approach, for this particular problem, the goal is to have a conversion class where the specific ap_fixed type only exists inside the class. The signatures of all public-facing class methods must be identical (corresponding to the base class interface). Therefore, the conversion function should ultimately return uint8_t*. In this setup, the default conversion operation would just be reinterpret_cast, as we use for GPU.
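A minimal sketch of that interface (hypothetical names, and an `int16_t` stand-in for ap_fixed, which only exists with the HLS headers): the base class is type-erased, bytes in and bytes out, and only the derived class knows the fixed-point type.

```cpp
#include <cmath>
#include <cstddef>
#include <cstdint>
#include <vector>

// Type-erased interface: every converter has the same signature,
// returning raw bytes, so the fixed-point type never leaks out.
struct TritonConverterBase {
  virtual ~TritonConverterBase() = default;
  virtual const uint8_t* convert(const float* in, std::size_t n) = 0;
};

// Default: reinterpret_cast pass-through, as currently done for GPU.
struct DefaultConverter : TritonConverterBase {
  const uint8_t* convert(const float* in, std::size_t) override {
    return reinterpret_cast<const uint8_t*>(in);
  }
};

// Fixed-point converter: the Fixed type exists only inside this class.
// It converts into an internal buffer and returns a pointer to its bytes.
struct FixedConverter : TritonConverterBase {
  using Fixed = int16_t;             // stand-in for e.g. ap_fixed<16,6>
  static constexpr int kFracBits = 10;
  const uint8_t* convert(const float* in, std::size_t n) override {
    buf_.resize(n);
    for (std::size_t i = 0; i < n; ++i)
      buf_[i] = static_cast<Fixed>(std::lround(in[i] * (1 << kFracBits)));
    return reinterpret_cast<const uint8_t*>(buf_.data());
  }
private:
  std::vector<Fixed> buf_;
};
```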

@drankincms

Ok, to document things a bit: I have a first pass at this here: https://github.com/drankincms/cmssw/tree/triton_converter_v1
(it's built off Jeff's Facile work, so a useful diff is here: https://github.com/jeffkrupa/cmssw/compare/hcalreco-facile...drankincms:triton_converter_v1?expand=1)
Right now it contains only the basic converters for float and int64_t, which just call reinterpret_cast. Assuming a couple of tests work, I can then start adding converters for ap_fixed types in the same way.

@drankincms

Ok, things are now fully working on the same branch I used above (https://github.com/drankincms/cmssw/tree/triton_converter_v1). Right now this is still just the basic converters. Should I submit a PR here and then we can proceed to CMSSW after that? Or just go straight to CMSSW?

@kpedro88
Collaborator Author

@drankincms fully working including ap_fixed conversions?

I would propose the following:

  1. rebase your branch to CMSSW_11_2_0_pre8 (removing Jeff's commits)
  2. run scram b code-checks and scram b code-format
  3. make a PR to https://github.com/hls-fpga-machine-learning/cmssw so I can review it for code rules etc.

(I may not be able to review it right away, and will also be considering the fully templated TritonData redesign we had discussed.)

@drankincms

@kpedro88 Now it includes an example ap_fixed conversion, which took a bit longer than expected.

I have run scram b code-checks and scram b code-format, and rebased to 11_2_0_pre9, but attempting to make a PR to https://github.com/hls-fpga-machine-learning/cmssw against master or CMSSW_11_2_X shows a lot of differences, so I assume there's a better branch against which to make the PR?

@kpedro88
Collaborator Author

@drankincms I have to update the master branch in the fastml CMSSW fork manually before you make the PR. I've done that now, so you should be able to make the PR.

@kpedro88
Collaborator Author

The internal review PR is: fastmachinelearning/cmssw#2
