-
Notifications
You must be signed in to change notification settings - Fork 4.4k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add new data catalogs from Rucio storage description (RucioCatalog) and use it by default instead of trivial data catalogs (TrivialCatalog) #37278
Conversation
-code-checks Logs: https://cmssdt.cern.ch/SDT/code-checks/cms-sw-PR-37278/28905
Code check has found code style and quality issues which could be resolved by applying following patch(s)
|
+code-checks Logs: https://cmssdt.cern.ch/SDT/code-checks/cms-sw-PR-37278/28906
|
A new Pull Request was created by @nhduongvn for master. It involves the following packages:
@cmsbuild, @smuzaffar, @Dr15Jones, @makortel can you please review it and eventually sign? Thanks. cms-bot commands are listed here |
What is the motivation for this change? |
Hi Chris, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Digging through old e-mails I reminded myself with these references
https://twiki.cern.ch/twiki/bin/view/CMS/StorageDescription
https://indico.cern.ch/event/919965/#11-a-new-description-for-stora
@nhduongvn Could you add these to the PR description (for future reference)?
Here are my comments from first review pass. My feeling is that some details may need further iteration. Could you also extend the tests of FileLocator
and SiteLocalConfigService
(here
and here
) to cover the new functionality?
Is there any need to backport this to earlier release cycles?
If I understood correctly this PR only adds the code, but it does not get used yet (because the default in InputFileCatalog
constructor preserves current behavior). How do you foresee the new code to be enabled? Would this be done e.g. by switching the default in InputFileCatalog
constructor? Or would every user of InputFileCatalog
need to decide case-by-case which to use? Or something else?
How long does the reading of storage.xml
need to be supported? E.g. do all the sites already provide storage.json
?
#include <boost/property_tree/ptree.hpp> | ||
#include <boost/foreach.hpp> | ||
|
||
namespace pt = boost::property_tree; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could you move this namespace alias to FileLocator.cc
? There is only one use of that in this header (so the long name is not that big of a deal), but the shorthand feels too generic to be spread into other source files including this header.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
#include <boost/property_tree/json_parser.hpp> | ||
#include <boost/property_tree/ptree.hpp> | ||
#include <boost/foreach.hpp> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Only boost::property_tree
type appears to be used in this header. Would it be sufficient to leave the #include
of its header here and move the rest to the FileLocator.cc
?
#include <boost/property_tree/json_parser.hpp> | |
#include <boost/property_tree/ptree.hpp> | |
#include <boost/foreach.hpp> | |
#include <boost/property_tree/ptree.hpp> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok the #include are rearranged
|
||
namespace edm { | ||
|
||
class FileLocator { | ||
public: | ||
explicit FileLocator(std::string const& catUrl, unsigned iCatalog = 0); | ||
//catType: 0 (1) to construct using data catagory from event-data (data-access) | ||
explicit FileLocator(std::string const& catUrl, unsigned iCatalog = 0, unsigned catType = 0); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could you change the catType
to an enum class
with descriptive names? That would make the code more self-documenting.
With enum class
you could also add a new constructor that avoids spelling the default value of iCatalog
in calling code.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hi @makortel ,
I do not understand your comment "With enum class you could also add a new constructor that avoids spelling the default value of iCatalog in calling code."
I updated the constructor:
explicit FileLocator(std::string const& catUrl, unsigned iCatalog = 0, unsigned catType = 0);
--> enum CatalogType {TrivialCatalog,RucioCatalog};
explicit FileLocator(std::string const& catUrl, unsigned iCatalog = 0, CatalogType catType = TrivialCatalog);
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I meant you could add the following overload for the constructor
explicit FileLocator(std::string const& catUrl, CatalogType catType = TrivialCatalog);
Then the default value of iCatalog
does not have to be repeated in places that need to set non-default catType
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
~FileLocator(); | ||
|
||
std::string pfn(std::string const& ilfn) const; | ||
//ruleType: 0 (1) to use rule from storage.xml (storage.json) | ||
std::string pfn(std::string const& ilfn, unsigned ruleType = 0) const; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could you change the ruleType
to an enum class
as well?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
FWCore/Catalog/src/FileLocator.cc
Outdated
rule.result = result; | ||
rule.chain = ""; | ||
rules[protocol].emplace_back(std::move(rule)); | ||
} catch (cms::Exception const& e) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What code in the try
block would throw cms::Exception
? Maybe it should catch something else?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I removed the try block. Please see further comment below
pfns.push_back(pfn); | ||
} | ||
} | ||
if (catType == 1) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
if (catType == 1) { | |
else if (catType == 1) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
return m_dataCatalogs; | ||
if (catType == 0) | ||
return m_dataCatalogs; | ||
if (catType == 1) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
if (catType == 1) | |
else if (catType == 1) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
@@ -30,7 +30,7 @@ namespace edm { | |||
SiteLocalConfig() {} | |||
virtual ~SiteLocalConfig() {} | |||
|
|||
virtual std::vector<std::string> const& dataCatalogs(void) const = 0; | |||
virtual std::vector<std::string> const& dataCatalogs(unsigned catType = 0) const = 0; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It seems to me that all callers of this function know at compile time whether catType
is 0
or 1
(e.g. because of if
statements outside). The returned data is also different for the two cases
0
contains theurl
attributes of thecatalog
elements insideevent-data
elements1
contains comma-separated list of- site
- sub-site if exists, otherwise site
site
attribute ofcatalog
element if exists, otherwise site- volume
- protocol
So the calling code must handle these two cases differently. How about creating
- a struct to store the aforementioned fields (to avoid creating and parsing comma-separated strings)
- a new accessor function that returns
std::vector
of those structs
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I tried this option without success. The code crashed with complain about "bad allocation". I do not think that 0 and 1 case the returned data is different. They can be both strings. It is just the matter of how trivial catalogs and new (Rucio) catalogs are defined in site_local_config.xml. In trivial catalog (case 0) everything is combined in an url and the codes still need to split it to know the path to storage definition (storage.xml) and the protocol:
"trivialcatalog_file:/cvmfs/cms.cern.ch/SITECONF/T1_US_FNAL/PhEDEx/storage.xml?protocol=xrd"
In the new (Rucio) catalog (case 1), things are more structured in "site", "volume" etc..
Therefore, we can make a string to include all information and the code will split it later as in case 0.
I do not see the benefit of having more complexity of a struct.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The struct would be easier to understand and maintain as the meaning different fields would be self-documented than serializing the data into strings. Could you share the code you tried?
I'd actually argue that also in the "case 0" the dataCatalogs()
returning the full URL (directly from site_local_config.xml
) is not very good, and it would have been better to do the URL parsing in SiteLocalConfigService
instead of FileLocator
. But that is not really worth of improving at this point.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
FWCore/Catalog/src/FileLocator.cc
Outdated
if (m_catAttr.empty()) { | ||
if (!localconfservice.isAvailable()) | ||
throw cms::Exception("FileCatalog", "edm::SiteLocalConfigService is not available"); | ||
if (iCatalog >= localconfservice->dataCatalogs().size()) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should this be calling
if (iCatalog >= localconfservice->dataCatalogs().size()) | |
if (iCatalog >= localconfservice->dataCatalogs(1).size()) |
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
@@ -50,11 +62,14 @@ namespace edm { | |||
ProtocolRules m_directRules; | |||
/** Inverse rules are used to do the mapping from PFN to LFN*/ | |||
ProtocolRules m_inverseRules; | |||
/** Direct rules are used to do the mapping from LFN to PFN taken from storage.json*/ | |||
ProtocolRules m_directRules_da; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What about the inverse rules, for PFN->LFN mapping?
On the other hand, a quick git grep
indicates tha tthe FileLocator::lfn()
would not be called except in the unit tests. Should we consider removing it?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I removed it, not sure why it is here in the first place
Hallo Matti,
yes, it would be desirable to port this to the latest versions of all CMSSW_X releases so
All sites have a storage.json file. The site support team is adding the two new site-local-config.xml sections to the remaining sites this week. Thanks,
|
Thanks Stephan. That sounds to me like there would be no need to keep the |
Hallo Matti,
Yes, this is an intentional change: A site's storage.json and site-local-config.xml might be at different locations for different worker nodes/subsites. This way only the SITECONF directory structure is "fixed" with one entry point, SITECONFIG_PATH. Thanks,
|
I have received your comments and will work them out one by one. |
I have gone slowly to modify codes from Mati suggestions and done test along, so far no "segmentation fault" (I might did something wrong in the past when modifying all suggestions at once and did the test at the end). I will continue tomorrow. |
Hi @makortel, ====================================== |
Thanks @nhduongvn. The compilation error comes from bool empty() const {return site==""&&subSite==""&&storageSite==""&&volume==""&&protocol=="";} (plus formatting). I'd actually prefer to use the bool empty() const {return site.empty() and subSite.empty() and storageSite.empty() and volume.empty() and protocol.empty();} The I would also make the |
Thank you. The errors were fixed. |
-code-checks Logs: https://cmssdt.cern.ch/SDT/code-checks/cms-sw-PR-37278/29278
Code check has found code style and quality issues which could be resolved by applying following patch(s)
|
I kind of suspect this PR caused issues in the DQM unit tests in the CMSSW_12_6_X_2022-08-30-2300 IB https://cmssdt.cern.ch/SDT/cgi-bin/logreader/el8_amd64_gcc10/CMSSW_12_6_X_2022-08-30-2300/unitTestLogs/DQM/Integration#/ |
I think since new data access is the default choice now, this |
another fallout is failure from CRAB jobs https://cms-talk.web.cern.ch/t/crab-test-cmssw-12-6-x-invalid-site-local-config/15423 |
The data taking has ended, and no further problems have surfaced. @nhduongvn @stlammel Maybe now it would be good time to prepare the backports? I updated my comment #37278 (comment) that lists the fixes that need to backported too. |
Hallo Matti,
|
Hi @stlammel @nhduongvn just reminding the catalog change should still be backported. |
I am sorry I misunderstood that @makortel will do the backported. I will try to do this but actually I never done this before. Are there instructions somewhere? |
Ah, so we didn't really agree on who does the backports (as I assumed the opposite). There are instructions in https://cms-sw.github.io/tutorial-resolve-conflicts.html#backporting-a-pr . Alternatively, a short recipe would be to create a developer area of the latest release of a given release cycle, and cherry-pick the commits from this PR and the further fixes listed in #37278 (comment) (and fix possible conflicts on the way), test quickly, and make a PR. |
Hi @makortel , I also tried git cms-rebase-topic --old-base CMSSW_12_6_X_2023-01-25-2300 nhduongvn:new-data-access |
By quick look your new-data-access_126X_125X_1 branch looks correct (three commits on top of CMSSW_12_5_X_2023-01-24-2300). Alternative would be to just cherry-pick the commits of all three PRs, i.e. something along
|
Backport codes for adding new data catalogs from Rucio storage description (RucioCatalog) and use it by default instead of trivial data catalogs (TrivialCatalog) #37278, and related bug fixes
PR description:
New data access using catalogs (defined in ) and storage.json
PR validation:
if this PR is a backport please specify the original PR and why you need to backport that PR:
Before submitting your pull requests, make sure you followed this checklist: