Experiment report for 2021-08-23-cloning stuck for days on 15 min #1237

Closed

pietroborrello opened this issue Aug 31, 2021 · 9 comments

@pietroborrello
Contributor

It seems that the report for the experiment 2021-08-23-cloning has been stuck at the 15-minute mark for about 3 days.
We have two hypotheses about what could have gone wrong:

  1. Upgrade Ubuntu version oss-fuzz#6180 seems to have upgraded the Ubuntu version of the gcr.io/oss-fuzz-base/base-builder images, and our PR was merged just before Pin images to specific base-builder and base-clang build #1225, which would have prevented the upgrade from affecting us. This race may have introduced inconsistencies or broken our builds (a pinning sketch follows this comment).
     We have now merged the fix into our fork; should we resubmit the PR?

  2. The high memory requirements of our experiment somehow broke or froze the containers that should run our fuzzers. In that case, we would resubmit the PR with lower memory requirements.

Let us know how we should proceed, and thank you for all your work.
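
As a minimal sketch of the idea behind #1225 (not FuzzBench's actual code), pulling base-builder by an immutable digest instead of the mutable tag avoids racing an upstream base-image upgrade; the digest argument here is a hypothetical placeholder:

```python
import subprocess

def pull_pinned_base_builder(digest):
    """Pull base-builder by immutable digest rather than the mutable tag.

    `digest` is a placeholder for whatever sha256 the experiment was
    actually built against; this is illustrative only.
    """
    image = f'gcr.io/oss-fuzz-base/base-builder@sha256:{digest}'
    subprocess.run(['docker', 'pull', image], check=True)
    return image
```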

@inferno-chromium
Collaborator

@jonathanmetzman thoughts?

@inferno-chromium
Collaborator

I see a weird error:

I 2021-08-31T02:23:53.840187735Z startup-script: Traceback (most recent call last):
I 2021-08-31T02:23:53.840305144Z startup-script:   File "/work/src/experiment/dispatcher.py", line 222, in <module>
I 2021-08-31T02:23:53.840345893Z startup-script:     sys.exit(main())
I 2021-08-31T02:23:53.840368071Z startup-script:   File "/work/src/experiment/dispatcher.py", line 214, in main
I 2021-08-31T02:23:53.840387351Z startup-script:     if stop_experiment.stop_experiment(experiment_utils.get_experiment_name(),
I 2021-08-31T02:23:53.840405365Z startup-script:   File "/work/src/experiment/stop_experiment.py", line 30, in stop_experiment
I 2021-08-31T02:23:53.840422724Z startup-script:     experiment_config = yaml_utils.read(experiment_config_filename)
I 2021-08-31T02:23:53.840456130Z startup-script:   File "/work/src/common/yaml_utils.py", line 22, in read
I 2021-08-31T02:23:53.840474077Z startup-script:     raise Exception('Yaml file %s does not exist.' % yaml_filename)
I 2021-08-31T02:23:53.840493021Z startup-script: Exception: Yaml file /work/config/experiment.yaml/experiment.yaml does not exist.
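
One way a doubled path like /work/config/experiment.yaml/experiment.yaml can arise (purely illustrative, not necessarily what dispatcher.py does) is joining the filename onto a value that already points at the file itself:

```python
import os

# Hypothetical reproduction: if the "config directory" variable already
# names the YAML file, joining the filename again yields the doubled
# path seen in the log above.
config_dir = '/work/config/experiment.yaml'
experiment_config_filename = os.path.join(config_dir, 'experiment.yaml')
print(experiment_config_filename)  # /work/config/experiment.yaml/experiment.yaml
```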

@inferno-chromium
Collaborator

requesting new experiment in 2021-08-31-cloning

@pietroborrello
Contributor Author

thank you

@inferno-chromium
Collaborator

> thank you

FYI, that experiment is still building and has some fuzzer build failures, so it is retrying; I expect the experiment to start sometime later in the day. FuzzBench is not broken, as another experiment from @vanhauser-thc is running successfully - https://www.fuzzbench.com/reports/experimental/2021-09-01-aflpp/index.html. Let's keep an eye on it and file an issue if needed in the next 2 days. The only thing that might break it is the large size of the experiment.

Building using (<function build_fuzzer_benchmark at 0x7f5f9d7e9dc0>): [('aflplusplus_classic_ctx', 'wireshark_fuzzshark_ip'), ('cfctx_bottom', 'poppler_pdf_fuzzer'), ('cfctx_dataflow_seadsa', 'poppler_pdf_fuzzer'), ('cfctx_dataflow_seadsa', 'proj4_standard_fuzzer'), ('cfctx_dataflow_seadsa', 'wireshark_fuzzshark_ip'), ('cfctx_dataflow_svf', 'poppler_pdf_fuzzer'), ('cfctx_dataflow_svf', 'proj4_standard_fuzzer'), ('cfctx_dataflow_svf', 'systemd_fuzz-varlink'), ('cfctx_dataflow_svf', 'wireshark_fuzzshark_ip'), ('cfctx_randomic', 'poppler_pdf_fuzzer'), ('cfctx_params', 'grok_grk_decompress_fuzzer'), ('cfctx_params', 'poppler_pdf_fuzzer'), ('cfctx_plain', 'poppler_pdf_fuzzer'), ('cfctx_dataflow_svf_llc', 'libgit2_objects_fuzzer'), ('cfctx_dataflow_svf_llc', 'libxml2_libxml2_xml_reader_for_file_fuzzer'), ('cfctx_dataflow_svf_llc', 'matio_matio_fuzzer'), ('cfctx_dataflow_svf_llc', 'njs_njs_process_script_fuzzer'), ('cfctx_dataflow_svf_llc', 'poppler_pdf_fuzzer'), ('cfctx_dataflow_svf_llc', 'proj4_standard_fuzzer'), ('cfctx_dataflow_svf_llc', 'usrsctp_fuzzer_connect'), ('cfctx_dataflow_svf_llc', 'wireshark_fuzzshark_ip'), ('cfctx_dataflow_seadsa_llc', 'libxml2_libxml2_xml_reader_for_file_fuzzer'), ('cfctx_dataflow_seadsa_llc', 'matio_matio_fuzzer'), ('cfctx_dataflow_seadsa_llc', 'njs_njs_process_script_fuzzer'), ('cfctx_dataflow_seadsa_llc', 'poppler_pdf_fuzzer'), ('cfctx_dataflow_seadsa_llc', 'proj4_standard_fuzzer'), ('cfctx_dataflow_seadsa_llc', 'usrsctp_fuzzer_connect'), ('cfctx_dataflow_seadsa_llc', 'wireshark_fuzzshark_ip'), ('cfctx_params_llc', 'libxml2_libxml2_xml_reader_for_file_fuzzer'), ('cfctx_params_llc', 'matio_matio_fuzzer'), ('cfctx_params_llc', 'njs_njs_process_script_fuzzer'), ('cfctx_params_llc', 'poppler_pdf_fuzzer'), ('cfctx_params_llc', 'proj4_standard_fuzzer'), ('cfctx_params_llc', 'usrsctp_fuzzer_connect'), ('cfctx_params_llc', 'wireshark_fuzzshark_ip'), ('cfctx_params_512kb', 'poppler_pdf_fuzzer'), ('cfctx_params_1mb', 'poppler_pdf_fuzzer'), ('cfctx_params_2mb', 'poppler_pdf_fuzzer'), ('cfctx_params_4mb', 'poppler_pdf_fuzzer')]

@pietroborrello
Contributor Author

Depending on the number of concurrent builds and how they are scheduled, it may take some hours to build, especially for the _llc fuzzers, which require a lot of memory to link, so this should be normal.
Regarding the build failures, they may have hit OOM. We could avoid them if we knew a hard memory limit to target in the fuzzer builds (see the sketch below).
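
As a minimal sketch of what a hard limit could look like on the build side, assuming a hypothetical 30 GB budget: capping the link step's address space with resource.setrlimit makes an over-budget link fail fast with a visible error instead of the container being OOM-killed or freezing.

```python
import resource
import subprocess

HYPOTHETICAL_LIMIT_GB = 30  # assumed budget; the real builder limit is the open question

def run_link_with_memory_cap(cmd, limit_gb=HYPOTHETICAL_LIMIT_GB):
    """Run a link command under an address-space cap so an over-budget
    link fails with a clear error instead of being OOM-killed."""
    limit_bytes = limit_gb * 1024 ** 3

    def _set_limit():
        # Applied in the child process just before exec.
        resource.setrlimit(resource.RLIMIT_AS, (limit_bytes, limit_bytes))

    return subprocess.run(cmd, preexec_fn=_set_limit, check=True)

# Example: run_link_with_memory_cap(['clang++', '-o', 'fuzzer', 'a.o', 'b.o'])
```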

Thank you for your help.

@pietroborrello
Contributor Author

It seems that the experiment 2021-08-31-cloning is also still stuck at 15 min, and has been for about 2 days. Since #1225 should be merged by now, my guess is that, unless something in the infrastructure has broken, our high memory consumption is the problem.

For example, the _llc fuzzers produce ~300-400 MB binaries that take about 30 GB of RAM to link. They are optimized for CPUs whose last-level cache (LLC) is assumed to be at least 8 MB (a quick check is sketched below). We can request a new experiment with reduced memory consumption if this is a problem, since this benchmark is intended to test the limits and trade-offs of compile-time context sensitivity.
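
A quick, Linux-only sketch (reading sysfs, not part of the benchmark itself) for checking whether a machine meets the assumed >= 8 MB last-level cache; the highest cache index under cpu0 is typically the LLC:

```python
from pathlib import Path

def llc_size_bytes():
    """Return the size of cpu0's last-level cache, read from Linux sysfs."""
    caches = sorted(Path('/sys/devices/system/cpu/cpu0/cache').glob('index*'))
    size = (caches[-1] / 'size').read_text().strip()  # e.g. '12288K'
    units = {'K': 1024, 'M': 1024 ** 2}
    return int(size[:-1]) * units[size[-1]]

print(llc_size_bytes() >= 8 * 1024 ** 2)  # True if the LLC is at least 8 MB
```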

@jonathanmetzman
Contributor

I think this experiment was run in the end.

@pietroborrello
Contributor Author

Yes, thank you for closing the issue.
