Experiment report for 2021-08-23-cloning stuck for days on 15 min #1237

Closed

pietroborrello opened this issue Aug 31, 2021 · 9 comments

@pietroborrello
Contributor

It seems that the report for the experiment 2021-08-23-cloning has been stuck at the 15-minute mark for about 3 days.
We have two hypotheses about what could have gone wrong:

  1. Upgrade Ubuntu version oss-fuzz#6180 seems to have upgraded the Ubuntu version of the gcr.io/oss-fuzz-base/base-builder images, and our PR was merged just before Pin images to specific base-builder and base-clang build #1225, which would have prevented the upgrade from affecting us. This race may have introduced inconsistencies or broken our builds (a pinning sketch follows this comment).
     We have now merged the fix into our fork; should we resubmit the PR?

  2. The high memory requirements of our experiment somehow broke or froze the containers that should run our fuzzers. In that case, we would resubmit the PR with lower memory requirements.

Let us know how we should proceed, and thank you for all your work.
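
As a minimal sketch of the idea behind #1225 (not FuzzBench's actual code), pulling base-builder by an immutable digest instead of the mutable tag avoids racing an upstream base-image upgrade; the digest argument here is a hypothetical placeholder:

```python
import subprocess

def pull_pinned_base_builder(digest):
    """Pull base-builder by immutable digest rather than the mutable tag.

    `digest` is a placeholder for whatever sha256 the experiment was
    actually built against; this is illustrative only.
    """
    image = f'gcr.io/oss-fuzz-base/base-builder@sha256:{digest}'
    subprocess.run(['docker', 'pull', image], check=True)
    return image
```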

@inferno-chromium
Collaborator

@jonathanmetzman thoughts?

@inferno-chromium
Collaborator

I see a weird error:

I 2021-08-31T02:23:53.840187735Z startup-script: Traceback (most recent call last):
I 2021-08-31T02:23:53.840305144Z startup-script:   File "/work/src/experiment/dispatcher.py", line 222, in <module>
I 2021-08-31T02:23:53.840345893Z startup-script:     sys.exit(main())
I 2021-08-31T02:23:53.840368071Z startup-script:   File "/work/src/experiment/dispatcher.py", line 214, in main
I 2021-08-31T02:23:53.840387351Z startup-script:     if stop_experiment.stop_experiment(experiment_utils.get_experiment_name(),
I 2021-08-31T02:23:53.840405365Z startup-script:   File "/work/src/experiment/stop_experiment.py", line 30, in stop_experiment
I 2021-08-31T02:23:53.840422724Z startup-script:     experiment_config = yaml_utils.read(experiment_config_filename)
I 2021-08-31T02:23:53.840456130Z startup-script:   File "/work/src/common/yaml_utils.py", line 22, in read
I 2021-08-31T02:23:53.840474077Z startup-script:     raise Exception('Yaml file %s does not exist.' % yaml_filename)
I 2021-08-31T02:23:53.840493021Z startup-script: Exception: Yaml file /work/config/experiment.yaml/experiment.yaml does not exist.
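
One way a doubled path like /work/config/experiment.yaml/experiment.yaml can arise (purely illustrative, not necessarily what dispatcher.py does) is joining the filename onto a value that already points at the file itself:

```python
import os

# Hypothetical reproduction: if the "config directory" variable already
# names the YAML file, joining the filename again yields the doubled
# path seen in the log above.
config_dir = '/work/config/experiment.yaml'
experiment_config_filename = os.path.join(config_dir, 'experiment.yaml')
print(experiment_config_filename)  # /work/config/experiment.yaml/experiment.yaml
```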

@inferno-chromium
Collaborator

requesting new experiment in 2021-08-31-cloning

@pietroborrello
Contributor Author

thank you

@inferno-chromium
Collaborator

> thank you

FYI, that experiment is still building and has some fuzzer build failures, so it is retrying; I expect the experiment to start sometime later in the day. FuzzBench is not broken, as another experiment from @vanhauser-thc is running successfully - https://www.fuzzbench.com/reports/experimental/2021-09-01-aflpp/index.html. Let's keep an eye on it and file an issue if needed in the next 2 days. The only thing that might break it is the large size of the experiment.

Building using (<function build_fuzzer_benchmark at 0x7f5f9d7e9dc0>): [('aflplusplus_classic_ctx', 'wireshark_fuzzshark_ip'), ('cfctx_bottom', 'poppler_pdf_fuzzer'), ('cfctx_dataflow_seadsa', 'poppler_pdf_fuzzer'), ('cfctx_dataflow_seadsa', 'proj4_standard_fuzzer'), ('cfctx_dataflow_seadsa', 'wireshark_fuzzshark_ip'), ('cfctx_dataflow_svf', 'poppler_pdf_fuzzer'), ('cfctx_dataflow_svf', 'proj4_standard_fuzzer'), ('cfctx_dataflow_svf', 'systemd_fuzz-varlink'), ('cfctx_dataflow_svf', 'wireshark_fuzzshark_ip'), ('cfctx_randomic', 'poppler_pdf_fuzzer'), ('cfctx_params', 'grok_grk_decompress_fuzzer'), ('cfctx_params', 'poppler_pdf_fuzzer'), ('cfctx_plain', 'poppler_pdf_fuzzer'), ('cfctx_dataflow_svf_llc', 'libgit2_objects_fuzzer'), ('cfctx_dataflow_svf_llc', 'libxml2_libxml2_xml_reader_for_file_fuzzer'), ('cfctx_dataflow_svf_llc', 'matio_matio_fuzzer'), ('cfctx_dataflow_svf_llc', 'njs_njs_process_script_fuzzer'), ('cfctx_dataflow_svf_llc', 'poppler_pdf_fuzzer'), ('cfctx_dataflow_svf_llc', 'proj4_standard_fuzzer'), ('cfctx_dataflow_svf_llc', 'usrsctp_fuzzer_connect'), ('cfctx_dataflow_svf_llc', 'wireshark_fuzzshark_ip'), ('cfctx_dataflow_seadsa_llc', 'libxml2_libxml2_xml_reader_for_file_fuzzer'), ('cfctx_dataflow_seadsa_llc', 'matio_matio_fuzzer'), ('cfctx_dataflow_seadsa_llc', 'njs_njs_process_script_fuzzer'), ('cfctx_dataflow_seadsa_llc', 'poppler_pdf_fuzzer'), ('cfctx_dataflow_seadsa_llc', 'proj4_standard_fuzzer'), ('cfctx_dataflow_seadsa_llc', 'usrsctp_fuzzer_connect'), ('cfctx_dataflow_seadsa_llc', 'wireshark_fuzzshark_ip'), ('cfctx_params_llc', 'libxml2_libxml2_xml_reader_for_file_fuzzer'), ('cfctx_params_llc', 'matio_matio_fuzzer'), ('cfctx_params_llc', 'njs_njs_process_script_fuzzer'), ('cfctx_params_llc', 'poppler_pdf_fuzzer'), ('cfctx_params_llc', 'proj4_standard_fuzzer'), ('cfctx_params_llc', 'usrsctp_fuzzer_connect'), ('cfctx_params_llc', 'wireshark_fuzzshark_ip'), ('cfctx_params_512kb', 'poppler_pdf_fuzzer'), ('cfctx_params_1mb', 'poppler_pdf_fuzzer'), ('cfctx_params_2mb', 'poppler_pdf_fuzzer'), ('cfctx_params_4mb', 'poppler_pdf_fuzzer')]

@pietroborrello
Contributor Author

Depending on the number of concurrent builds and how they are scheduled, it may take some hours to build, especially for the _llc fuzzers, which require a lot of memory to link, so this should be normal.
Regarding the build failures, they may have hit OOM. We could avoid them if we knew a hard memory limit to target in the fuzzer builds (see the sketch below).
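
As a minimal sketch of what a hard limit could look like on the build side, assuming a hypothetical 30 GB budget: capping the link step's address space with resource.setrlimit makes an over-budget link fail fast with a visible error instead of the container being OOM-killed or freezing.

```python
import resource
import subprocess

HYPOTHETICAL_LIMIT_GB = 30  # assumed budget; the real builder limit is the open question

def run_link_with_memory_cap(cmd, limit_gb=HYPOTHETICAL_LIMIT_GB):
    """Run a link command under an address-space cap so an over-budget
    link fails with a clear error instead of being OOM-killed."""
    limit_bytes = limit_gb * 1024 ** 3

    def _set_limit():
        # Applied in the child process just before exec.
        resource.setrlimit(resource.RLIMIT_AS, (limit_bytes, limit_bytes))

    return subprocess.run(cmd, preexec_fn=_set_limit, check=True)

# Example: run_link_with_memory_cap(['clang++', '-o', 'fuzzer', 'a.o', 'b.o'])
```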

Thank you for your help.

@pietroborrello
Contributor Author

It seems that the experiment 2021-08-31-cloning is also still stuck at 15 min, and has been for about 2 days. Since #1225 should be merged by now, my guess is that, unless something in the infrastructure has broken, our high memory consumption is the problem.

For example, the _llc fuzzers produce ~300-400 MB binaries that take about 30 GB of RAM to link. They are optimized for CPUs whose last-level cache (LLC) is assumed to be at least 8 MB (a quick check is sketched below). We can request a new experiment with reduced memory consumption if this is a problem, since this benchmark is intended to test the limits and trade-offs of compile-time context sensitivity.
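
A quick, Linux-only sketch (reading sysfs, not part of the benchmark itself) for checking whether a machine meets the assumed >= 8 MB last-level cache; the highest cache index under cpu0 is typically the LLC:

```python
from pathlib import Path

def llc_size_bytes():
    """Return the size of cpu0's last-level cache, read from Linux sysfs."""
    caches = sorted(Path('/sys/devices/system/cpu/cpu0/cache').glob('index*'))
    size = (caches[-1] / 'size').read_text().strip()  # e.g. '12288K'
    units = {'K': 1024, 'M': 1024 ** 2}
    return int(size[:-1]) * units[size[-1]]

print(llc_size_bytes() >= 8 * 1024 ** 2)  # True if the LLC is at least 8 MB
```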

@jonathanmetzman
Contributor

I think this experiment was run in the end.

@pietroborrello
Contributor Author

Yes, thank you for closing the issue.
