[Oss-Fuzz] Adding fuzzer-stl_operations #2160

tanuj208 · 2020-06-03T19:35:05Z

I went through the codebase and added a new fuzz target, fuzzer-stl_operations, to improve the coverage of the existing fuzz targets.

This fuzz target tests the following functionality:

- conversion from various STL containers (vector, deque, list, forward_list, sets and maps) to json
- various looping methods in json vectors and json maps
- get() and at() methods for value access
- push_back() and emplace_back() methods for json vectors
- key() and value() methods for accessing elements inside json maps
- parsing directly from vectors

By doing so, coverage goes up by:
- Line coverage: 284 lines
- Function coverage: 31 functions
- Region coverage: 89 regions

Coverage report

To get this coverage report, I did the following:

1. Forked this repo, added the new fuzz target, and updated the Makefiles.
2. Cloned Google's oss-fuzz repo, changed the json repository URL in its Dockerfile to point to my fork, and used its helper.py script to build the fuzzer.
3. Downloaded the OSS-Fuzz test corpus for the existing fuzz targets.
4. Ran the new fuzz target for approximately 30 to 40 minutes, then ran the same helper.py script to get coverage stats.

Let me know if there is any incorrect usage of the API.

coveralls · 2020-06-03T22:22:08Z

Coverage remained the same at 100.0% when pulling 0afebec on tanuj208:develop into 7444c7f on nlohmann:develop.

nlohmann · 2020-06-04T08:21:32Z

I may not understand the issue completely, but this change does not really increase the coverage of the parsers; it only adds some more constructors. In the end, any JSON input will be processed by the same parser, regardless of whether it was read from a file, a string, a vector, etc.

How did you come up with the coverage numbers? Do you have insights into fuzzing? I never dug too deep into this.

FrancoisChabot · 2020-06-04T15:21:04Z

I do think that these targets bring a tiny bit of value because the parser is technically a different block of code in each of them. @nlohmann is correct that they will never catch programmer errors within the library's codebase that will not be caught by the existing fuzz tests. However, it's theoretically possible to run into a compiler bug that would exhibit itself through the interaction of a complex-ish iterator implementation and the parser.

Whether this is something worth throwing compute at is debatable considering how unlikely such a scenario is.

Regardless of all that, these coverage numbers are "slightly" misleading, since the tests cause additional template instantiations that are never-ever going to be seen in the wild. The ordered containers (vector, list, deque, forward_list) I can buy into, but the set ones are just nonsensical.

Edit: wait, I'm suddenly confused. From the title of this PR, I thought this was fuzzing the parser, but that makes no sense, since parsing from non-contiguous containers is not merged into the develop branch yet (#2145). Attempting to fuzz the parser on any of these except std::vector<> should cause a compile error.

This is fuzzing the adl_serializer<>, which is a different thing from the parser. I don't think you are fuzzing what you think you are fuzzing. Either that, or you need to rename things.

nlohmann · 2020-06-05T06:19:12Z

So far, all the value from the fuzzers came from detecting issues during parsing. I do not see value in extending the fuzzing to the input adapters, because they can be unit tested 100%. Technically, we would just be wasting Google's resources here; I'd rather improve the coverage of the existing fuzz targets if it is not 100% yet.

tanuj208 · 2020-06-07T11:28:30Z

@nlohmann and @FrancoisChabot thank you very much for your comments. I apologize for the wrong and confusing name of the fuzz target and the lack of explanation. I have updated this pull request with more explanation of the functions that this fuzz target tests, and I have also described how I came up with the coverage stats.

I am an intern at Google, and writing/improving fuzz targets for this repo is part of my internship project. I learned about fuzzing only a few days ago.

I got the coverage stats for each of the functions in this fuzz target and found that the improvement in coverage from parsing or direct conversion from STL containers was very small, as you mentioned. The main improvement came from loops over JSON maps and JSON vectors, and from methods like push_back and get.

Could you please go through the pull request again and decide whether it is of value?

nlohmann · 2020-06-08T07:39:02Z

@tanuj208 That is cool to hear! I really appreciate the effort Google already put in checking this project with OSS-Fuzz.

I have a question: How exactly is the project integrated into OSS Fuzz? Do you just execute make fuzzers and then execute the generated binaries? I'm always afraid to break anything, hence I never touched these lines. It would be great to know more about this so I can document this properly for the future.

FrancoisChabot · 2020-06-08T18:57:42Z

I'm going to reiterate (and hopefully better reword) my concern that I seriously suspect this new fuzzer does not fuzz what you think it fuzzes:

If I compile and run the following:

```cpp
#include "nlohmann/json.hpp"
#include <cstdint>
#include <iostream>
#include <string>
#include <vector>

int main() {
    std::string data = "[1, 3, 4]";
    std::vector<std::uint8_t> vec(data.begin(), data.end());

    // parsing from STL containers
    nlohmann::json j_vector(vec);

    std::cout << j_vector << "\n";
}
```

The result is:

[91,49,44,32,51,44,32,52,93]

That's because initializing a json object with a vector of characters does not parse these characters; it creates a JSON array with one number entry per character. The values in the data are not interpreted, just stored as-is inside the json object.

Changing the contents of the vector (and the same goes for any STL container passed this way) cannot change which code paths are executed, and that makes it a bad candidate for fuzzing.

Now, if you did `auto j_vector = nlohmann::json::parse(vec);`, then that would be a whole different matter, and fuzzing would make a lot of sense, since the data in the vector is actually interpreted.

nlohmann · 2020-06-08T20:13:06Z

I agree with @FrancoisChabot - however, exactly this check (parsing from a vector of bytes from OSS Fuzz) is already performed by parse_afl_fuzzer.

nlohmann

The fuzzing makes no sense like this.

nlohmann · 2020-06-09T11:22:05Z

test/src/fuzzer-stl_operations.cpp

```diff
+    std::unordered_multiset<uint8_t> umultiset(data, data + size);
+
+    // parsing from STL containers
+    json j_vector(vec);
```


This is a constructor call, not a parse call. Instead of interpreting the bytes in the containers to create a JSON value, each container is converted into a JSON array. As such, no real library code is tested, just the constructors that convert STL containers.

tanuj208 · 2020-06-09T16:01:40Z

@nlohmann and @FrancoisChabot, I understand what you were trying to explain and why the fuzz target I wrote cannot change code paths with different inputs. I will close this pull request for now. I am very thankful for all the quick replies.

@nlohmann regarding your question about OSS Fuzz, @oliverchang has more context, I will let him respond.

nlohmann · 2020-06-09T16:05:36Z

If there is any way to improve the current fuzzing, e.g. with a grammar or so, please let us know!

FrancoisChabot · 2020-06-09T16:54:49Z

A huge +1 to adding a grammar to the fuzzer as per https://github.com/google/AFL#9-fuzzer-dictionaries, especially considering AFL appears to come pre-packaged with a JSON dictionary.

That being said, this is a property of how the fuzzer is invoked rather than how it is built, which seems to be external to this repo.

oliverchang · 2020-06-10T01:00:24Z

> @tanuj208 That is cool to hear! I really appreciate the effort Google already put in checking this project with OSS-Fuzz.
>
> I have a question: How exactly is the project integrated into OSS Fuzz? Do you just execute make fuzzers and then execute the generated binaries? I'm always afraid to break anything, hence I never touched these lines. It would be great to know more about this so I can document this properly for the future.

Json is integrated into OSS-Fuzz here. We do essentially just execute make fuzzers.

tanuj208 · 2020-06-10T15:05:32Z

After adding the JSON dictionary, coverage did not increase: the fuzzer had already been running for a long time and, after trying millions of random inputs, had picked up the JSON format by itself. But since it is good to have a dictionary, I added it to the oss-fuzz repository. It should get merged soon.

tanuj208 requested a review from nlohmann as a code owner June 3, 2020 19:35

tanuj208 marked this pull request as draft June 5, 2020 15:38

Tanuj Garg and others added 24 commits June 7, 2020 14:46

[Experiment] added a test fuzzer

f829e72

[Experiment] edited makefiles to include test fuzzer

cd36c9b

[Experiment] corrected a typo in makefile

a90ff95

[Experiment] changed name of test fuzzer

b95ca37

[Experiment] renamed again for confirmation

651fb03

[Experiment] temp fuzzer removed

def05d0

fuzzer, testing conversion from vector & deque to json, added

35ce214

added conversion from other stl sequence containers to json added

5d501bf

Added conversion from various maps to json

74a0470

removed namespace bug

c66806b

added parsing from vector and deque

57c8851

added looping in vector & map, and tested get(), push_back() & emplace_back() methods

5868e7f

changed datatype of key in maps from uint8_t to std::string

9388bb2

initialized json with empty array to test push_back & emplace_back methods

5c51ebb

resolved all warnings

35ac931

removed deque parsing because deque is not a contiguous byte sequence when it has large data

2efdbfa

[Checking stats] Keeping only conversion from stl containers

6e81c78

[Checking stats] Keeping only iterating a json vector container

c06ee97

[Checking stats] Keeping only stl like operations on json vector

02c901e

[Checking stats] resolved compile error

2cd4352

[Checking stats] Keeping only map coversions to json

f3c7022

[Checking stats] keeping only stl like operations

f1f38cf

[Checking stats] Keeping json map iteration

a94e1af

[Checking stats] Keeping only parsing vector

88194fd

tanuj208 added 2 commits June 7, 2020 14:51

[Checking stats] resolved compiler error

d402b6a

Added complete data and renamed files and fuzzer

0afebec

tanuj208 changed the title from [Oss-Fuzz] Adding parse_stl_fuzzer to [Oss-Fuzz] Adding fuzzer-stl_operations on Jun 7, 2020

tanuj208 marked this pull request as ready for review June 7, 2020 11:28

nlohmann requested changes Jun 9, 2020


tanuj208 closed this Jun 10, 2020
