-
Notifications
You must be signed in to change notification settings - Fork 706
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Thin DS: Command Line Interface #11549
Conversation
Skipping CI for Draft Pull Request. |
cd4904d
to
f2d31c0
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is a great thing!
I tried to run the test scenarios for rule audit_rules_privileged_commands and using the full data stream it took 5:12 but using the thin data stream built from this PR it took only 3:18. That's a big saving.
In general, I suggest focusing now on getting this merge-able instead of focusing on further reduction of the built data stream. Further reduction of the built data stream can be done in following separate PRs.
cmake/SSGCommon.cmake
Outdated
@@ -118,7 +118,7 @@ macro(ssg_build_compiled_artifacts PRODUCT) | |||
add_custom_command( | |||
OUTPUT "${CMAKE_CURRENT_BINARY_DIR}/ssg_build_compile_all-${PRODUCT}" | |||
COMMAND ${CMAKE_COMMAND} -E make_directory "${CMAKE_CURRENT_BINARY_DIR}/profiles" | |||
COMMAND env "PYTHONPATH=$ENV{PYTHONPATH}" "${PYTHON_EXECUTABLE}" "${SSG_BUILD_SCRIPTS}/compile_all.py" --resolved-base "${CMAKE_CURRENT_BINARY_DIR}" --project-root "${CMAKE_SOURCE_DIR}" --build-config-yaml "${CMAKE_BINARY_DIR}/build_config.yml" --product-yaml "${CMAKE_CURRENT_BINARY_DIR}/product.yml" --sce-metadata "${CMAKE_CURRENT_BINARY_DIR}/checks/sce/metadata.json" | |||
COMMAND env "PYTHONPATH=$ENV{PYTHONPATH}" "${PYTHON_EXECUTABLE}" "${SSG_BUILD_SCRIPTS}/compile_all.py" --resolved-base "${CMAKE_CURRENT_BINARY_DIR}" --project-root "${CMAKE_SOURCE_DIR}" --build-config-yaml "${CMAKE_BINARY_DIR}/build_config.yml" --product-yaml "${CMAKE_CURRENT_BINARY_DIR}/product.yml" --sce-metadata "${CMAKE_CURRENT_BINARY_DIR}/checks/sce/metadata.json" "${SSG_THIN_DS_ARG}" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I know that it's sometimes used here but I don't like this idea of including the option into the variable. Instead I prefer to write the option explicitly, because that would be consistent with the other options here.
COMMAND env "PYTHONPATH=$ENV{PYTHONPATH}" "${PYTHON_EXECUTABLE}" "${SSG_BUILD_SCRIPTS}/compile_all.py" --resolved-base "${CMAKE_CURRENT_BINARY_DIR}" --project-root "${CMAKE_SOURCE_DIR}" --build-config-yaml "${CMAKE_BINARY_DIR}/build_config.yml" --product-yaml "${CMAKE_CURRENT_BINARY_DIR}/product.yml" --sce-metadata "${CMAKE_CURRENT_BINARY_DIR}/checks/sce/metadata.json" "${SSG_THIN_DS_ARG}" | |
COMMAND env "PYTHONPATH=$ENV{PYTHONPATH}" "${PYTHON_EXECUTABLE}" "${SSG_BUILD_SCRIPTS}/compile_all.py" --resolved-base "${CMAKE_CURRENT_BINARY_DIR}" --project-root "${CMAKE_SOURCE_DIR}" --build-config-yaml "${CMAKE_BINARY_DIR}/build_config.yml" --product-yaml "${CMAKE_CURRENT_BINARY_DIR}/product.yml" --sce-metadata "${CMAKE_CURRENT_BINARY_DIR}/checks/sce/metadata.json" --rule-id "${SSG_THIN_DS_RULE_ID}" |
efd1aa4
to
e5fdf78
Compare
Seems to work for me - at least The speed up in calling
(Some random rule I saw in the list.) |
I tried build the
|
build-scripts/compile_all.py
Outdated
if rule_id not in rules and "ALL_RULES" not in rule_id: | ||
raise Exception("Rule ID: {} not found!".format(rule_id)) | ||
if "ALL_RULES" not in rule_id: | ||
rules = set(rule_id) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is wrong. It will create a set of letters instead of a set containing a single item.
jcerny@fedora:~/work/git/scap-security-guide (pr/11549)$ python3
Python 3.12.1 (main, Dec 18 2023, 00:00:00) [GCC 13.2.1 20231205 (Red Hat 13.2.1-6)] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> set("ahoj")
{'o', 'j', 'a', 'h'}
>>>
def get_profiles_per_rule_by_id(rule_id, project_root_abspath, env_yaml, product_yaml): | ||
relevant_benchmarks = get_relevant_benchmarks(env_yaml, product_yaml) | ||
rules = find_existing_rules(project_root_abspath, relevant_benchmarks) | ||
if rule_id not in rules and "ALL_RULES" not in rule_id: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This special value needs to be mentioned in the help text of the --rule-id
option.
@@ -158,8 +205,14 @@ def main(): | |||
controls_manager.remove_selections_not_known(loader.all_rules) | |||
controls_manager.add_references(loader.all_rules) | |||
|
|||
profiles_by_id = get_all_resolved_profiles_by_id( | |||
env_yaml, product_yaml, loader, product_cpes, controls_manager, controls_dir) | |||
if args.rule_id is None or args.rule_id == "off": |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The special value needs to be described in the help text of the --rule-id
option.
for rule in rules: | ||
data = { | ||
'documentation_complete': True, | ||
'variables': variables, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
'variables': variables, | |
'variables': variables, | |
'hidden': True, |
I suggest creating a hidden profile so that this fake profile won't appear in the built SCAP data stream.
build-scripts/build_xccdf.py
Outdated
) | ||
parser.add_argument( | ||
"--per-profile-dir", | ||
help="Directory to store XCCDF, OVAL, OCIL, per profile. (off: to disable)" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Per profile? I'm confused. It seems like they're per rule. Can you clarify?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The whole process of creating a thin datastream is that a profile is generated for each rule. (In case only one rule is specified is created one profile) This is done in the compile_all.py
file. When the --thin
flag is set, XCCDF, OVAL and OCIL components need to be generated for each rule. Since each rule has its own profile the generation is done by profiles. (In case the one rule is specified --per-profile-dir
flag is disabled). Also, this can be used sometime in the future when we want to build a datastream with only one profile.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OK, so in fact the build system creates fake profiles for each rules and then build_xccdf.py
simply creates content driven by each of the fake profiles.
You're right that naming this a "--per-profile-dir" is technically correct because that's the way how things work. However, I'm afraid that it might create more confusion given the way how it's used. If someone takes a look at what is generated into this directory they might not realize that it's internally implemented using the fake profiles.
I suggest renaming this to --thin-ds-components-dir
or similar. This name would better describe the actual usage of this option. Moreover, it will still be correct if you consider that a data stream with just a single profile is also a thin data stream.
I suggest renaming also the SSG_PER_PROFILE_COMPONENT_DIR CMake variable in a similar way.
@@ -93,6 +94,8 @@ print_help() | |||
printf '\t%s\n' "--derivatives, --no-derivatives: Also build derivatives of products if applicable (off by default)" | |||
printf '\t%s\n' "--ansible-playbooks, --no-ansible-playbooks: Build Ansible Playbooks for every profile (on by default)" | |||
printf '\t%s\n' "--bash-scripts, --no-bash-scripts: Build Bash remediation scripts for every profile (on by default)" | |||
printf '\t%s\n' "-t, --thin, --no-thin: Build thin data streams for each rule. Do not build any of the guides, tables, etc (off by default)" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This takes a horrible amount of time. Do you have any idea why it could take so long?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Because it generates about 1800 thin ds.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Try some products with smaller benchmark. (For example firefox)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Because it generates about 1800 thin ds.
Obviously.
But I was more interested if you know in which part it spends the most time and what slows this down the most. Is it composing the data streams from individual components? Processing compiled rules? Something else?
In theory it should be possible to make the difference between building a single large data stream or many small data streams negligible because these outputs differ only structuring of the output data.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Generating the XCCDF benchmark on a profile and linking it with other components causes the biggest slowdown. I'm working on speeding it up right now. Should I incorporate the changes into this PR or create another one?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I propose doing that in a separate PR. This PR is already large enough.
cmake/SSGCommon.cmake
Outdated
@@ -751,6 +751,10 @@ macro(ssg_build_product PRODUCT) | |||
set(PRODUCT_${_LANGUAGE}_REMEDIATION_ENABLED TRUE) | |||
endforeach() | |||
|
|||
set(SSG_PER_PROFILE_COMPONET_DIR "off") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
COMPONET
build-scripts/build_xccdf.py
Outdated
p = Paths_( | ||
xccdf=get_path(args.per_profile_dir, "xccdf_{}.xml".format(id_)), | ||
oval=get_path(args.per_profile_dir, "oval_{}.xml".format(id_)), | ||
ocil=get_path(args.per_profile_dir, "ocil_{}.xml".format(id_)), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we not include OCIL at all?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If the rule does not contain OCIL. OCIL does not appear in the final ds.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So I'd keep it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
But if the rule contains OCIL, it will appear in the final DS 😄
My point is that the OCIL is useless for the use cases of the thin data stream, so I wouldn't include it at all.
(Side note: our OCIL is useless for all purposes but we need to keep it in the normal DS because of compatibility)
Would it be easy or difficult to not add OCILs to thin DSs?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, it's simple it's just a few lines of code.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@Mab879 The problem you found should be fixed. |
Confirmed, both the Thanks! |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The outputs look good. We can solve further content size reduction in separate PRs.
Please see my comments.
CMakeLists.txt
Outdated
@@ -275,6 +279,7 @@ message(STATUS "Separate SCAP files: ${SSG_SEPARATE_SCAP_FILES_ENABLED}") | |||
message(STATUS "Ansible Playbooks: ${SSG_ANSIBLE_PLAYBOOKS_ENABLED}") | |||
message(STATUS "Ansible Playbooks Per Rule: ${SSG_ANSIBLE_PLAYBOOKS_PER_RULE_ENABLED}") | |||
message(STATUS "Bash scripts: ${SSG_BASH_SCRIPTS_ENABLED}") | |||
message(STATUS "Thin Datastreams: ${SSG_THIN_DS}") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
message(STATUS "Thin Datastreams: ${SSG_THIN_DS}") | |
message(STATUS "Thin data streams: ${SSG_THIN_DS}") |
build-scripts/compile_all.py
Outdated
"--rule-id", | ||
type=str, | ||
help="Creates a profile with the specified rule and does not use other profiles." | ||
"It is named in the RULE_ID format. If you want to process " |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I suggest something like: The profile ID is identical to the rule ID of the selected rule.
ssg/build_yaml.py
Outdated
if self.benchmark is None: | ||
raise Exception( | ||
"Before generating benchmarks for each profile, you need to load " | ||
"the initial benchmark. Using the load_benchmark method." |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
"the initial benchmark. Using the load_benchmark method." | |
"the initial benchmark using the load_benchmark method." |
🤖 A k8s content image for this PR is available at: Click here to see how to deploy itIf you alread have Compliance Operator deployed: Otherwise deploy the content and operator together by checking out ComplianceAsCode/compliance-operator and: |
Code Climate has analyzed commit 6b4b49f and detected 0 issues on this pull request. The test coverage on the diff in this pull request is 13.7% (50% is the threshold). This pull request will bring the total coverage in the repository to 58.1% (-0.1% change). View more on Code Climate. |
/packit retest-failed |
1 similar comment
/packit retest-failed |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have generated thin DSs for some of my favourite rules. Then I tried to use them for a scan. It worked as expected.
Also, I have tried to generate thin DSs for all rules, this also worked.
At this moment the PR is good enough so I will merge it. We will continue in separate PRs to avoid bloating this PR and complicating reviews.
I suggest focusing on the following aspects in future PRs:
- Improve awful performance of the
-t
option. If the performance will be improved, the feature will suddenly start to be useful for testing in downstream and CI. - Removing excess XCCDF Groups and Profiles that aren't essential
- Integration test
Great job!
/packit retest-failed |
1 similar comment
/packit retest-failed |
Description:
This PR introduces a new flag for generating thin data streams. The
--rule-id
flag generates a thin data stream for a single rule. The thin data stream for a single rule generated by the--rule-id
flag is reduced from 19 MB to approximately 890 KB. Also, the--thin
flag is created to generate thin data streams for each rule in the rules in benchmark relevant to theproduct
. The rules use default variables.Further reduction of CPE, XCCDF, etc. will be done in future PRs. Also, the documentation of the changes will be created in another PR.
A side effect of this PR is automatic XML formatting using Python3+. So in another PR, the XML formatting step can be removed from cmake.
--rule-id
RULE_IDThis flag builds a single Datastream as in the case of thick Datastreams but as a thin Datastream with only one rule.
--thin
This flag builds thin Datastreams for each rule relevant to the benchmark. The components of each rule, such as XCCDF, OVAL, and OCIL, are stored in the build directory in
./build/<PRODUCT>/thin_ds_components
as files with prefixes for specific component types and the rule id. The final Thin Datastream is stored in the build directory under./build/thin_ds/
. In this case, you disabled the Datastream build version 1. 2.Review Hints:
To test the
--rule-id
flag, you can run this command:To test the
--thin
flag, you can run this script:The script generates a thin Datastream for each rule and then performs a scan using
oscap
.This test takes more than an hour to run because there are approximately 1830 rules to process and some rules are very memory intensive.