Invalid JSON records generated by CMSSW #35142

Closed
vkuznet opened this issue Sep 3, 2021 · 3 comments · Fixed by #35362

@vkuznet
Contributor

vkuznet commented Sep 3, 2021

Hi,
CMSSW releases generate invalid popularity JSON records. These records are sent to our UDP popularity service, which in turn forwards them to CMS popularity in the CERN MONIT infrastructure.

I finally got time to debug our UDP proxy server, to which many CMS GRID jobs send their popularity info (as JSON). Here is a problematic record:

{"site_name":"T1_UK_RAL", "user_dn":"/DC=ch/DC=cern/OU=Organic Units/OU=Users/CN=amaltaro/CN=718748/CN=Alan Malta Rodrigues/CN=749219743/CN=1706326272/CN=1940159068/CN=3843846833", "client_host":"ttcms032-1133805",
"client_domain":"0-lcg2562.gridpp.rl.ac.uk", "server_host":"xrootd", "server_domain":"echo.stfc.ac.uk",
"unique_id":"f854a17c-a7fc-4b23-943a-2126f0650cc4-0", "file_lfn":"../cmsRun1/RAWSIMoutput.root", "file_size":5784900358,
"read_single_sigma":-nan, "read_single_average":765326, "read_bytes":151428981315, "read_bytes_at_close":151428981315,
"read_single_operations":197862, "read_single_bytes":151428981315, "read_vector_operations":0, "read_vector_bytes":0,
"start_time":1630666360, "end_time":1630672564}

To make the problem easier to spot: the issue is the part "read_single_sigma":-nan, which is INVALID in the JSON data format, i.e. -nan is not a valid JSON value.

The code in question is here, and, in particular, here is how it assigns this value:

os << "\"read_single_sigma\":"
   << sqrt((static_cast<double>(read_single_square - m_read_single_square) -
            single_average * single_average * single_op_count) /
           static_cast<double>(single_op_count))
   << ", ";
os << "\"read_single_average\":" << single_average << ", ";

As far as I can tell, there is no check for whether single_op_count is zero, which may cause the -nan value. I truly hope that someone can take care of this bug and fix it in ALL CMSSW releases.

Because of it, we lose around 7K records per day in CMS popularity metrics. I can't judge whether that is too much or negligible, but it seems to me sufficient to justify a fix. So far our server ignores these invalid JSON records, but it would be nice to finally patch CMSSW, get rid of this problem, and get better data for monitoring purposes.
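
For illustration, a guarded version of the sigma computation might look roughly like the sketch below. This is not the actual CMSSW fix: `safe_sigma` and its parameter names are hypothetical, and treating a slightly negative variance (from floating-point rounding) as zero is an assumption about the desired behavior.

```cpp
#include <cmath>
#include <cstdint>
#include <optional>

// Hypothetical helper, not part of CMSSW. Returns the standard deviation of
// `count` samples from their sum of squares and mean, or std::nullopt when
// no finite value can be computed.
std::optional<double> safe_sigma(double sum_of_squares, double average, std::uint64_t count) {
  if (count == 0) {
    return std::nullopt;  // dividing 0.0 by 0.0 would yield NaN
  }
  const double n = static_cast<double>(count);
  double variance = (sum_of_squares - average * average * n) / n;
  if (!std::isfinite(variance)) {
    return std::nullopt;  // inputs overflowed or were already NaN
  }
  if (variance < 0.0) {
    variance = 0.0;  // assumption: a negative value here is rounding noise; sqrt of it would be NaN
  }
  return std::sqrt(variance);
}
```

A caller could then omit the "read_single_sigma" field, or write a fallback value, whenever the function returns no value.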

Best,
Valentin.

@cmsbuild
Contributor

cmsbuild commented Sep 3, 2021

A new Issue was created by @vkuznet Valentin Kuznetsov.

@Dr15Jones, @perrotta, @dpiparo, @makortel, @smuzaffar, @qliphy can you please review it and eventually sign/assign? Thanks.

cms-bot commands are listed here

@vkuznet
Contributor Author

vkuznet commented Sep 3, 2021

I also suggest inspecting the entire codebase to verify that everything written to the os stream is a valid numerical or string value. I propose adding proper JSON validation to the code.
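
One possible shape for such a check, as a minimal sketch: route every floating-point value through a small helper that refuses to print non-finite numbers. The names `write_json_number` and `sigma_field` are hypothetical, and writing `null` for NaN or infinity is an assumption; it only helps if the downstream consumers accept `null` for that field.

```cpp
#include <cmath>
#include <ostream>
#include <sstream>
#include <string>

// Hypothetical helper: writes `value` as a JSON number, or the JSON literal
// `null` when the value is NaN or infinite (neither is a valid JSON number).
void write_json_number(std::ostream& os, double value) {
  if (std::isfinite(value)) {
    os << value;
  } else {
    os << "null";
  }
}

// Example use: builds the "read_single_sigma" field as a string.
std::string sigma_field(double sigma) {
  std::ostringstream os;
  os << "\"read_single_sigma\":";
  write_json_number(os, sigma);
  return os.str();
}
```

With this in place, a NaN sigma would be serialized as "read_single_sigma":null instead of "read_single_sigma":-nan, so the record as a whole stays parseable.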

@makortel
Contributor

makortel commented Sep 3, 2021

This is a duplicate of #29412. I suggest we move the discussion into the earlier issue.
