Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Buildsystem optimization #8588

Merged
merged 5 commits into from
Jun 16, 2022

Conversation

j-ode
Copy link
Collaborator

@j-ode j-ode commented Apr 21, 2022

Description:

  • Speed up the build system by using C implementations of yaml Loader and Dumper in ssg/yaml.py used by build-scripts/collect_remediations.py
  • Fix build profiler bug where targets with multiple outputs all added duplicate build time
  • What is the difference in build time after these changes (RHEL8 build):
    • before: ~3 minutes, collect_remediations taking up ~17% of build time and ~30 seconds itself
    • after: ~2:45 minutes, collect_remediations taking up ~12% of build time and ~20 seconds itself
    • these times were measured using the profiling tool from Implemented buildsystem profiling #7832, they vary a lot and are averages from multiple builds

Issues:

@j-ode j-ode requested a review from jan-cerny April 21, 2022 16:34
@pep8speaks
Copy link

pep8speaks commented Apr 21, 2022

Hello @j-ode! Thanks for updating this PR. We checked the lines you've touched for PEP 8 issues, and found:

There are currently no PEP 8 issues detected in this Pull Request. Cheers! 🍻

Comment last updated at 2022-06-15 15:06:48 UTC

@github-actions
Copy link

Start a new ephemeral environment with changes proposed in this pull request:

Open in Gitpod

@jan-cerny
Copy link
Collaborator

  • it seems to be a great idea!
  • in theory the 2 implementation should be equivalent
  • the changes in CMake should definitely be a separate commit, preferably they would be a separate pull request
  • we have some tests failing in the gating runs so please investigate them

@jan-cerny jan-cerny self-assigned this Apr 28, 2022
@j-ode j-ode force-pushed the buildsystem_optimization branch 3 times, most recently from f42c54d to 9dc2100 Compare May 6, 2022 13:04
@jan-cerny
Copy link
Collaborator

Unfortunately, I have found that there are situations in which this causes incorrect form of data in the built data stream. For example, in ssg-rhel8-ds.xml the Ansible remediation for rule accounts_password_pam_minlen contains create: 'yes'. But it should contain create: yes, because in Ansible lineinfile module the create key must be a boolean, the yes without quotes is a boolean but 'yes' in quotes is a string. Fortunately, Ansible understands it because it attempts to perform an implicit typecast and if you pass 'yes' Ansible converts it to yes internally. So it probably doesn't break the content. But I don't know if there are rules in which this subtle change will matter and I also better wouldn't rely on this undocumented typecast behavior of Ansible. We will need to find out why the quotes around yes got added.

ssg/yaml.py Outdated
@@ -121,7 +125,7 @@ def open_raw(yaml_file):
return yaml_contents


def ordered_load(stream, Loader=yaml.Loader, object_pairs_hook=OrderedDict):
def ordered_load(stream, Loader=yaml_SafeLoader, object_pairs_hook=OrderedDict):
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So there is the "suspicious" line 33 which modifies yaml_SafeLoader:

yaml_SafeLoader.add_constructor(u'tag:yaml.org,2002:bool', _bool_constructor)

But before your patch the yaml_SafeLoader was used only in the _open_yaml function. So this special modfication from line 33 affected only the _open_yaml function. But, you start to use the yaml_SafeLoader also here in ordered_load, so now this special modification affects also the ordered_load function.

If we would like to use yaml_SafeLoader in both functions but keep the previous behavior of ordered_load, we would have to rework it so that the special modification from line 33 would again affect only _open_yaml. We can do that using a trick like this:

diff --git a/ssg/yaml.py b/ssg/yaml.py
index 71f7e3adee..601584a21d 100644
--- a/ssg/yaml.py
+++ b/ssg/yaml.py
@@ -29,8 +29,6 @@ def _unicode_constructor(self, node):
     return str(string_like)
 
 
-# Don't follow python bool case
-yaml_SafeLoader.add_constructor(u'tag:yaml.org,2002:bool', _bool_constructor)
 # Python2-relevant - become able to resolve "unicode strings"
 yaml_SafeLoader.add_constructor(u'tag:yaml.org,2002:python/unicode', _unicode_constructor)
 
@@ -52,8 +50,13 @@ def _open_yaml(stream, original_file=None, substitutions_dict={}):
 
     Return None if it contains "documentation_complete" key set to "false".
     """
+    class OurCustomLoader(yaml_SafeLoader):
+        pass
+
+    # Don't follow python bool case
+    OurCustomLoader.add_constructor(u'tag:yaml.org,2002:bool', _bool_constructor)
     try:
-        yaml_contents = yaml.load(stream, Loader=yaml_SafeLoader)
+        yaml_contents = yaml.load(stream, Loader=OurCustomLoader)
 
         if yaml_contents.pop("documentation_complete", "true") == "false" and \
                 substitutions_dict.get("cmake_build_type") != "Debug":

I admit that I don't like that we have some special things like this yaml_SafeLoader.add_constructor(u'tag:yaml.org,2002:bool', _bool_constructor), it would be ideal to remove it, but that would be very very difficult because a lot of code depend on this.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for noticing this. From my testing, it seems this conversion was caused by using CSafeLoader, when using CLoader the issue is gone.

@jan-cerny
Copy link
Collaborator

should contain create: yes

or it should contain create: true which is the case that was there before this PR

…erted

CDumper inserts three dots at the end of some ansible YAML remediations,
which causes the utils/ansible_playbook_to_role.py script to fail when
collecting the remediations from a playbook

Apparently one YAML file should contain only one document, and when
multiple documents denoted by '---' start of document or '...' end of
document strings are inside one YAML file, yaml.load_all should be used
instead of yaml.load function.
@codeclimate
Copy link

codeclimate bot commented Jun 15, 2022

Code Climate has analyzed commit 74d2270 and detected 2 issues on this pull request.

Here's the issue category breakdown:

Category Count
Style 1
Complexity 1

The test coverage on the diff in this pull request is 63.6% (50% is the threshold).

This pull request will bring the total coverage in the repository to 42.8% (0.0% change).

View more on Code Climate.

Copy link
Collaborator

@jan-cerny jan-cerny left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I experience a smaller speedup, on my machine ~ 6 seconds, but that's a great achievement! Kudos

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants