
FW synch 12_0_0_pre4 #220

Merged
aehart merged 77 commits into master from fw_sync_12_0_0_pre4 on Mar 4, 2022

Conversation

aehart
Contributor

@aehart aehart commented Jan 7, 2022

This PR updates the tarballs downloaded in emData/download.sh to those generated with the ~~fw_synch_220106~~ fw_synch_220119 tag.

@meisonlikesicecream
Contributor

Hahah I guess you also found this, but seems like Vivado is also experiencing an existential crisis over it already being 2022: https://support.xilinx.com/s/question/0D52E00006uxy49SAA/vivado-fails-to-export-ips-with-the-error-message-bad-lexical-cast-source-type-value-could-not-be-interpreted-as-target?language=en_US

@aehart
Contributor Author

aehart commented Jan 7, 2022


This feels very on-brand for these times. I noticed the failure locally, but thought it was just some problem with our lab computer.

Using faketime seems to be the only halfway decent solution for now. I just added it to the CI tests, and lo and behold, it's already installed on the Gitlab runner, and it seems to be working.
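
The workaround can be sketched as a small shell wrapper (a sketch only; the actual CI invocation is not shown here, and the `vivado` command line below is a hypothetical example):

```shell
#!/bin/sh
# Sketch: run a command with the clock shifted back one year via faketime,
# so Vivado parses dates as 2021. Falls back to the real clock if faketime
# is not installed.
run_in_2021() {
  if command -v faketime >/dev/null 2>&1; then
    faketime -f "-1y" "$@"
  else
    "$@"
  fi
}

# Hypothetical usage (the project's actual Vivado arguments may differ):
# run_in_2021 vivado -mode batch -source run_hls.tcl
```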

@aperloff

All versions of the Xilinx software at CU should be patched now. Please remove the faketime commands and re-test.

@aehart aehart mentioned this pull request Jan 18, 2022
@meisonlikesicecream
Contributor

My VMR maxstep emulation PR (cms-L1TK/cmssw#119) has been merged now, so if we could include that in this PR that'd be nice!

@aehart
Contributor Author

aehart commented Jan 19, 2022


Thanks! I will remake the test vectors and rebase the branch now.

@meisonlikesicecream
Contributor

Seems like we yet again have the issue with Vivado returning exit code 0 when the script actually crashed... https://gitlab.cern.ch/cms-l1tk/firmware-hls/-/jobs/18907686

Last time we solved it by running the scripts in batch mode, but that doesn't seem to work anymore!? (Maybe it's time for the awk solution again... hehe)

@meisonlikesicecream
Contributor

meisonlikesicecream commented Jan 20, 2022


And it seems the reason we get the ERROR in the first place is that the DTC Link memory addresses now start with a "0x", which makes the VHDL FileReaderFIFO unhappy.

@meisonlikesicecream
Contributor


I fixed the "0x" issues, and added the awk command as I don't trust Vivado exit codes anymore... feel free to remove/change it if you want.
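
That kind of fix can be sketched with sed (a sketch only; the real script and the DTC link memory file names are assumptions here):

```shell
#!/bin/sh
# Sketch: strip the "0x" prefix from hex values in a memory file so a VHDL
# reader that expects bare hex digits can parse them. Hex values never
# contain the letter 'x', so a global substitution is safe for this data.
# Prints the cleaned file to stdout.
strip_hex_prefix() {
  sed 's/0x//g' "$1"
}

# Hypothetical usage:
# strip_hex_prefix DTCLink_mem.dat > DTCLink_mem_clean.dat
```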

@aehart
Contributor Author

aehart commented Jan 21, 2022


Thanks, @meisonlikesicecream. I tried to think of a cleverer way to get Vivado to return an error exit code, but I think your awk solution might be best.

I just made one tweak because it was catching messages like "… 0 errors…" during synthesis. So now "error" is required to be at the beginning of the line, since that's how Vivado usually reports them.
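
The resulting check can be sketched like this (illustrative, not the repository's exact script; the log file name is an assumption):

```shell
#!/bin/sh
# Sketch: Vivado can return exit code 0 even when its script crashed, so
# derive the status from the log instead. Only lines that *begin* with
# "ERROR" count, avoiding false positives on summary messages such as
# "Finished ... with 0 errors".
check_vivado_log() {
  awk 'BEGIN { bad = 0 } /^ERROR/ { bad = 1 } END { exit bad }' "$1"
}

# Hypothetical usage:
# vivado -mode batch -source run.tcl; check_vivado_log vivado.log
```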

@tomalin
Contributor

tomalin commented Jan 25, 2022

@bryates, the CI is failing because the MC does not pass CSIM. See https://gitlab.cern.ch/cms-l1tk/firmware-hls/-/pipelines/3470487. Could you please suggest a fix?

@bryates
Contributor

bryates commented Jan 25, 2022


This is strange, since the MC works fine on the master branch. @aryd, or anyone else, was something changed in the MC emulation before generating these test vectors?

There seems to be just one missing event:

Event: 43
FullMatch 1:
index	reference	computed
0	0x620922b00301a	0x620922b00301a
1	0x622129c5f91c6	0x622129c5f91c6
2	0x64090f15f31b1	0x64090f15f31b1
3	0x640d0f15f3186	0x640d0f15f3186
4	0x641d2d39fb031	0x641d2d39fb031
5	0x64212d39fc1cc	0x0	<=== missing

@bryates
Contributor

bryates commented Jan 25, 2022


I just found that adding 2 more clock cycles in the HLS fixes this. This of course isn't a viable solution, but we already do have the MC in the emulation set to 105 instead of 108. I'll try dropping it to 103. I only just got the emulation installed and running the other day (usually @aryd takes care of it).

@aryd
Contributor

aryd commented Jan 25, 2022


I'm not aware of any code changes. Could be that we did not process the same events and that this triggered some discrepancy we had not noted earlier.

@bryates
Contributor

bryates commented Jan 25, 2022


After changing the MC to 105 iterations in the emulation, I get vastly different test vectors. Maybe I have something incorrectly configured in my copy of the emulation.

@bryates
Contributor

bryates commented Jan 25, 2022


Just realized I still had the combined modules flag set to true. I'll try the emulation again on my end.

@bryates
Contributor

bryates commented Jan 26, 2022

I've opened a PR on the CMSSW cms-L1TK:L1TK-dev-12_0_0_pre4 branch: cms-L1TK/cmssw#123
I found I only had to reduce the emulation MC to 104 iterations (instead of 103 as I reported previously), and I see full agreement between the HLS and emulation MC implementations.

@bryates
Contributor

bryates commented Jan 26, 2022


@aehart this emulation PR was just merged, so we'll need to generate new test vectors.

@tomalin
Contributor

tomalin commented Feb 10, 2022


Although @aehart updated the test vectors 3 days ago, the CI tests are still failing because the MC fails CSIM. @bryates any idea why? Perhaps the test vectors were not updated correctly?

@bryates
Contributor

bryates commented Feb 10, 2022


I'm running the emulation on my end now to see if the test vectors are out of date.

@aehart aehart mentioned this pull request Feb 21, 2022
Contributor

@pwittich pwittich left a comment


Approve based on @aehart's feedback

@aehart aehart merged commit c5e8a74 into master Mar 4, 2022
@aehart aehart deleted the fw_sync_12_0_0_pre4 branch March 4, 2022 20:59
7 participants