Propagate to plane / Kalman on plane / Matriplex with support for scalar operations and VDT #148

osschar · 2024-05-15T16:26:37Z

PR description:

Preliminary PR to discuss the physics and computational performance of:

- propagateToPlane / kalmanUpdateOnPlane
- Modifications to Matriplex: support for scalar operations, usage of VDT.
- Use of the Matriplex modifications to re-express vectorized code for propToPlane (mostly in PropagationMPlex.icc).

Computational performance notes

Before the Matriplex / VDT changes, track finding ran about 4 times slower. The Matriplex and VDT changes brought this down by about 20% -- so we still have an enormous slowdown that is not understood at all.

Further, the Matriplex vectorization of components by use of Matriplex scalars needs to be re-checked for the new operators and for VDT. Matevz examined the assembler code for the MiniPropagators which used the prototype implementation of this concept (also including some VDT functions ... but not embedded in Matriplex).

Things to check / look at

- Compare vtune/igProf profiles for propToPlane vs. the standard implementation. Figure out where time is actually spent. Giuseppe is still working on the Kalman update on plane / in local coordinates.
- Matriplex vectorization should be reviewed / consistently tested, along with VDT (there is a special pre-processor define that causes fallback from vdt::fast_xyzz() to std::xyzz() functions).
  - Not quite sure how to go about this on an "industrial scale" -- for all functions / operators and for all usages in specific CMSSW functions.
  - I have not added simd pragmas ... they were not necessary in the MiniPropagators test. But maybe they can help by triggering some vectorization reports in case they fail.

Known problems / issues

- Mario reported a CMSSW problem with the static_assert that ensures a float/double Mplex type for VDT - I only ran standalone builds with gcc-13 on Fedora 39.
- The unary minus operator is giving me some trouble (a member function without an argument didn't work, I had to define it outside of the class) -- maybe I did something silly.
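
For reference on the unary minus item: standard C++ accepts unary minus both as a const member function without arguments and as a free function outside the class. The sketch below uses two hypothetical stand-in types (`PlexMember` and `PlexFree`, not Matriplex code) to show both forms side by side. It does not reproduce or explain the problem seen in Matriplex; it may only help to narrow down where the two declarations differ.

```cpp
#include <cassert>

// Hypothetical stand-ins for a Matriplex: a flat array of N values.

// Variant 1: unary minus as a const member function taking no arguments.
template <typename T, int N>
struct PlexMember {
  T fArray[N];

  PlexMember operator-() const {
    PlexMember r;
    for (int i = 0; i < N; ++i)
      r.fArray[i] = -fArray[i];
    return r;
  }
};

// Variant 2: unary minus as a free function defined outside the class.
template <typename T, int N>
struct PlexFree {
  T fArray[N];
};

template <typename T, int N>
PlexFree<T, N> operator-(const PlexFree<T, N>& a) {
  PlexFree<T, N> r;
  for (int i = 0; i < N; ++i)
    r.fArray[i] = -a.fArray[i];
  return r;
}

// Both variants negate every element and leave the operand untouched.
inline void demo_unary_minus() {
  PlexMember<float, 2> a{{1.f, -2.f}};
  PlexFree<float, 2> b{{1.f, -2.f}};
  PlexMember<float, 2> na = -a;
  PlexFree<float, 2> nb = -b;
  assert(na.fArray[0] == -1.f && na.fArray[1] == 2.f);
  assert(nb.fArray[0] == -1.f && nb.fArray[1] == 2.f);
  assert(a.fArray[0] == 1.f && b.fArray[1] == -2.f);
}
```

Calling `demo_unary_minus()` exercises both forms; the member form needs the `const` qualifier to be usable on const operands.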

…ify the new propagateToPlane code.
* Add VDT support to Matriplex, mostly to work on Matriplex scalar types.
  - Functions are prefixed as fast_xyzz(), same as in VDT.
  - Controlled by define MPLEX_VDT. Additionally, if MPLEX_VDT_USE_STD is also defined, the fast_xyzz() functions fall back to using std:: variants. This is useful for performance comparisons.
  - Add reduction operator and an assigner class / method to extract scalars (one i,j element) or to assign to it.
* Massage propagateToPlane low level implementation
  - Use the new Matriplex functionality to simplify code.
  - Remove the nmin, nmax indices initially introduced to support GPU code (this was actually a wrong way ... Matriplex does support MatriplexVectors that should have been used instead -- and extended as needed).
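
The MPLEX_VDT / MPLEX_VDT_USE_STD switch described in this commit message could look roughly like the sketch below. Only the two macro names and the fast_ prefix come from the text above; the `sketch` namespace, the single wrapper `fast_sin`, and the `static_assert` wording are assumptions made for illustration, and the actual Matriplex code may be organized differently. The VDT branch assumes that `vdt/vdtMath.h` is on the include path and uses `vdt::fast_sinf` / `vdt::fast_sin`; as written, with MPLEX_VDT_USE_STD defined, only the std:: branch is compiled. The `if constexpr` in the VDT branch requires C++17.

```cpp
#include <cassert>
#include <cmath>
#include <type_traits>

#define MPLEX_VDT          // enable the fast_xyzz() wrappers
#define MPLEX_VDT_USE_STD  // route them to std:: (remove to call VDT itself)

#if defined(MPLEX_VDT) && !defined(MPLEX_VDT_USE_STD)
#include "vdt/vdtMath.h"  // assumption: VDT is installed and on the include path
#endif

namespace sketch {  // hypothetical namespace, not the Matriplex one

#ifdef MPLEX_VDT
  template <typename T>
  inline T fast_sin(T x) {
    // Reject anything but float/double at compile time.
    static_assert(std::is_same<T, float>::value || std::is_same<T, double>::value,
                  "fast_sin is only provided for float and double");
#ifdef MPLEX_VDT_USE_STD
    return std::sin(x);  // fallback used for performance comparisons
#else
    if constexpr (std::is_same<T, float>::value)
      return vdt::fast_sinf(x);  // single-precision VDT variant
    else
      return vdt::fast_sin(x);   // double-precision VDT variant
#endif
  }
#endif

}  // namespace sketch

// In the std:: fallback configuration the wrapper must agree with std::sin.
inline void demo_fast_sin() {
  const float xf = 0.5f;
  const double xd = 0.5;
  assert(std::fabs(sketch::fast_sin(xf) - std::sin(xf)) < 1e-6f);
  assert(std::fabs(sketch::fast_sin(xd) - std::sin(xd)) < 1e-12);
}
```

Keeping the precision dispatch inside one template means callers never pick the `f`-suffixed name by hand, so the same call site compiles for float and double plex types.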

slava77 · 2024-05-17T15:17:41Z

RecoTracker/MkFitCore/src/KalmanUtilsMPlex.cc

+      // fixme? //(pf.use_param_b_field ? 0.01f * Const::sol * Config::bFieldFromZR(psPar(n, 2, 0), hipo(psPar(n, 0, 0), psPar(n, 1, 0))) : 0.01f * Const::sol * Config::Bfield);
+      const float bF = 0.01f * Const::sol * Config::Bfield;
+      const float qh2 = bF * lp(n,0,0);
+      const float t1r = std::sqrt(1. + lp(n,0,1)*lp(n,0,1) + lp(n,0,2)*lp(n,0,2))*pzSign(n,0,0);


Suggested change

-      const float t1r = std::sqrt(1. + lp(n,0,1)*lp(n,0,1) + lp(n,0,2)*lp(n,0,2))*pzSign(n,0,0);
+      const float t1r = std::sqrt(1.f + lp(n,0,1)*lp(n,0,1) + lp(n,0,2)*lp(n,0,2))*pzSign(n,0,0);

?

slava77 · 2024-05-17T15:20:11Z

RecoTracker/MkFitCore/src/KalmanUtilsMPlex.cc

@@ -1009,15 +1137,17 @@ namespace mkfit {

  void kalmanUpdatePlane(const MPlexLS& psErr,
                         const MPlexLV& psPar,
+			                   const MPlexQI& Chg,


Suggested change

-                         const MPlexQI& Chg,
+                         const MPlexQI& chg,

naming style

slava77 · 2024-05-17T15:22:51Z

RecoTracker/MkFitCore/src/KalmanUtilsMPlex.cc

+	// fixme? //(pf.use_param_b_field ? 0.01f * Const::sol * Config::bFieldFromZR(psPar(n, 2, 0), hipo(psPar(n, 0, 0), psPar(n, 1, 0))) : 0.01f * Const::sol * Config::Bfield);
+	const float bF = 0.01f * Const::sol * Config::Bfield;//fixme: cache?
+	const float qh2 = bF * lp_upd(n,0,0);
+	const float cosl1 = 1./vn(n,0,2);


Suggested change

-	const float cosl1 = 1./vn(n,0,2);
+	const float cosl1 = 1.f/vn(n,0,2);

slava77 · 2024-05-17T15:23:24Z

RecoTracker/MkFitCore/src/KalmanUtilsMPlex.cc

+	const float vj = vn(n,0,0)*rot(n,0,0) + vn(n,0,1)*rot(n,0,1) + vn(n,0,2)*rot(n,0,2);
+	const float vk = vn(n,0,0)*rot(n,1,0) + vn(n,0,1)*rot(n,1,1) + vn(n,0,2)*rot(n,1,2);
+	const float cosz = vn(n,0,2)*qh2;
+	jacLoc2Curv(n,0,0) = 1.;


Suggested change

-	jacLoc2Curv(n,0,0) = 1.;
+	jacLoc2Curv(n,0,0) = 1.f;

although assignments should be OK
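
The reasoning behind the `1.` vs. `1.f` suggestions can be checked at compile time. In the sketch below (helper names `t1r_float` and `t1r_mixed` are hypothetical, the values are arbitrary), a double literal promotes the whole float expression to double, so `std::sqrt` and the division run in double precision before the result is narrowed back to float. A plain assignment of `1.` to a float only converts a constant, which is why it is harmless.

```cpp
#include <cassert>
#include <cmath>
#include <type_traits>

// A double literal in a float expression promotes the whole expression to double.
static_assert(std::is_same<decltype(1. + 2.0f * 2.0f), double>::value, "mixed sum is double");
static_assert(std::is_same<decltype(1.f + 2.0f * 2.0f), float>::value, "float sum stays float");
static_assert(std::is_same<decltype(1. / 2.0f), double>::value, "mixed division is double");
static_assert(std::is_same<decltype(1.f / 2.0f), float>::value, "float division stays float");

// std::sqrt picks its overload from the argument type.
static_assert(std::is_same<decltype(std::sqrt(1.f)), float>::value, "float overload");
static_assert(std::is_same<decltype(std::sqrt(1.)), double>::value, "double overload");

// All-float version: std::sqrt(float) is selected.
inline float t1r_float(float a, float b, float sign) {
  return std::sqrt(1.f + a * a + b * b) * sign;
}

// Mixed version: sum and sqrt are evaluated in double, then narrowed to float.
inline float t1r_mixed(float a, float b, float sign) {
  return std::sqrt(1. + a * a + b * b) * sign;
}

// The two agree to float precision; the difference is in the generated code.
inline void demo_literals() {
  assert(std::fabs(t1r_float(0.3f, 0.4f, -1.f) - t1r_mixed(0.3f, 0.4f, -1.f)) < 1e-6f);
  const float one = 1.;  // assignment: only a constant conversion
  assert(one == 1.f);
}
```

The results match to float precision, so the concern here is the extra float-to-double conversions in vectorized loops rather than the numerical outcome.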

cmssw/RecoTracker/MkFitCore/src/Matriplex/Matriplex.h

Lines 78 to 85 in 6f9a138

    template<typename TT>
    Matriplex& negate_if_ltz(const Matriplex<TT, D1, D2, N> &sign) {
      for (idx_t i = 0; i < kTotSize; ++i) {
        if (sign.fArray[i] < 0)
          fArray[i] = -fArray[i];
      }
      return *this;
    }

Brute forcing the matrix to positive...
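
To make the element-wise behavior of the quoted member function concrete, here is a minimal sketch on a hypothetical stand-in type (`TinyPlex`, not the real Matriplex class). The loop body is the same as in the snippet above; everything else is assumed for illustration. Each element is flipped exactly where the corresponding `sign` element is negative, whatever the element's own sign is.

```cpp
#include <cassert>

// Hypothetical stand-in for Matriplex<T, D1, D2, N>: one flat array.
template <typename T, int D1, int D2, int N>
struct TinyPlex {
  static constexpr int kTotSize = D1 * D2 * N;
  T fArray[kTotSize];

  // Same loop as the quoted Matriplex member: flip where sign < 0.
  template <typename TT>
  TinyPlex& negate_if_ltz(const TinyPlex<TT, D1, D2, N>& sign) {
    for (int i = 0; i < kTotSize; ++i) {
      if (sign.fArray[i] < 0)
        fArray[i] = -fArray[i];
    }
    return *this;
  }
};

inline void demo_negate_if_ltz() {
  TinyPlex<float, 1, 1, 4> v{{1.f, -2.f, 3.f, -4.f}};
  TinyPlex<int, 1, 1, 4> q{{+1, -1, -1, +1}};  // e.g. per-track charges
  v.negate_if_ltz(q);
  // Elements 1 and 2 are flipped, elements 0 and 3 are unchanged.
  assert(v.fArray[0] == 1.f);
  assert(v.fArray[1] == 2.f);
  assert(v.fArray[2] == -3.f);
  assert(v.fArray[3] == -4.f);
}
```

For a `sign` plex holding only +1 and -1, this is equivalent to an element-wise multiplication by `sign`, without the multiply.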

- Collect unique ModuleShapes in MkFitGeometryESProducer and store them in a vector for each LayerInfo.
- Add shape_id member to ModuleInfo (and remove half-length that is now available from the ModuleShape).
- Add LayerInfo::m_has_charge to the printout.
- List ModuleShapes when detailed TrackerInfo is requested.
- TrackerInfo::print_tracker() now takes additional argument 'precision' that determines number of decimal places to use for printing of module/shape parameters.

slava77 · 2024-09-05T21:35:02Z

RecoTracker/MkFitCore/src/PropagationMPlex.icc

-  }
+  MPlex56 jacCCS2Curv(0.0f);
+  jacCCS2Curv.aij(0, 3) = mpt::negate_if_ltz(sinT, inChg);
+  jacCCS2Curv.aij(0, 5) = mpt::negate_if_ltz(cosT, inChg);


source in TrackState::jacobianCCSToCurvilinear has jac(0, 5) = charge * cosT * invpt;

@cerati which one do you think is a bug?

I am not familiar with mpt::negate_if_ltz... maybe @osschar ?

I think my version before this change was the same as TrackState::jacobianCCSToCurvilinear

my point is that the 1/pT is missing

it was not missing in my version (you can see it if you look at the code that was removed)

oh rats, it's my fault again :)
or maybe it's better because of it ;)
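
The point about the missing 1/pT can be illustrated with scalars. The sketch below assumes that `mpt::negate_if_ltz(value, sign)` returns `value` with its sign flipped when `sign` is negative (inferred from the member function quoted earlier in this PR) and that the charge is +1 or -1; the helper names are hypothetical. Under these assumptions the diff reproduces `charge * cosT` but not the `invpt` factor of the reference expression.

```cpp
#include <cassert>

// Hypothetical scalar stand-in for mpt::negate_if_ltz(value, sign).
inline float negate_if_ltz(float value, int sign) {
  return sign < 0 ? -value : value;
}

// Element as quoted from TrackState::jacobianCCSToCurvilinear.
inline float jac05_reference(int charge, float cosT, float invpt) {
  return charge * cosT * invpt;
}

// Element as written in the diff: sign flip only, no 1/pT factor.
inline float jac05_diff(int charge, float cosT) {
  return negate_if_ltz(cosT, charge);
}

inline void demo_jac05() {
  const int charge = -1;
  const float cosT = 0.6f;
  const float invpt = 0.5f;  // pT = 2, arbitrary illustration value
  // The two expressions differ by exactly the factor invpt ...
  assert(jac05_diff(charge, cosT) * invpt == jac05_reference(charge, cosT, invpt));
  assert(jac05_diff(charge, cosT) != jac05_reference(charge, cosT, invpt));
  // ... and coincide only when invpt == 1.
  assert(jac05_diff(charge, cosT) == jac05_reference(charge, cosT, 1.f));
}
```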

osschar · 2024-09-25T18:17:03Z

Changes added to #151, this is now obsolete, closing.

cerati and others added 7 commits April 25, 2024 04:51

implement plan local kalman update

69c7600

fix vectorization bug, and start cleaning up

d9676b9

more cleanup

888bc92

fix sign of dP

6a3360a

Enable prop2plane by default in Config.h.

505d48d

Move global dead-vector declaration to ConfigStandalone (out of mkFit…

9670ea2

….cc), move structs BeamSpot and DeadRegion out of Hit.h.

slava77 reviewed May 17, 2024

osschar and others added 8 commits May 23, 2024 15:09

From Dan: proper way to call float/double versons of VDT functions fr…

4685974

…om Matriplex.

first round of vectorization of kalman operations on plane

5b884a0

fixes

d719468

step2 of optimizations

2da0b4f

step3 of optimizations

fd065ea

add auto generated include files

17addd4

Merge branch 'prop-plane-rb1' of github.com:trackreco/cmssw into prop…

bd372cb

…-plane-rb1

osschar force-pushed the prop-plane-rb1 branch from bd372cb to 17addd4 May 31, 2024 15:52

osschar and others added 8 commits May 31, 2024 09:13

Prototype version of REve from Shell.

6fb5161

Fix with/without root handling of Shell.

8b12c6b

Replace cpp define NO_ROOT with inverse WITH_ROOT. Introduce addition…

0ce7920

…al WITH_REVE to enable REve through Shell.

Fix Shell/TRint construction/destructions; extend dictionary for shell.

a9d57f5

Use first-order prop-to-plane in selectHitIndices() for tilted modules.

c609972

attempt at removing non-optimized functions from prop to plane

aab435b

Merge branch 'prop-plane-rb1' of github.com:trackreco/cmssw into prop…

6f9a138

…-plane-rb1

mmasciov mentioned this pull request Aug 27, 2024

Phase-2 mkFit: propagation to plane / Kalman operations on plane / Matriplex with support for scalar operations and VDT (superseding #148) #151

Merged

slava77 reviewed Sep 5, 2024

osschar closed this Sep 25, 2024
