Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

config: replace dom parser with sax parser #972

Merged
merged 18 commits into from
May 17, 2017
Merged

Conversation

danielhochman
Copy link
Contributor

@danielhochman danielhochman commented May 16, 2017

The main benefit of using the SAX parser is that we get line numbers in error messages by using a custom stream during parsing.

New error:

JSON at lines 4-6 does not conform to schema.
 Schema violation: pattern
 Offending document key: #/name

The diff is not fun to look through given the changes to some widely used variable names. Pretty much all meaningful code changes are here: https://github.com/lyft/envoy/pull/972/files#diff-33ef08aee21cc1f90b7a1ff26051aa6b

TODO:

  • fix remaining tests (failing tests are asserting on the old error message format, want to make sure everyone is happy with it first)
  • added more tests on exception message format

#include <string>

#include "envoy/json/json_object.h"

// Do not let RapidJson leak outside json_loader.
Copy link
Contributor

@ccaraman ccaraman May 16, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I guess this can't be moved into a header/cc file. Sorry I didn't see the rapidjson::Document in your member variables earlier.

Please move this back into json_loader.cc.

#include "spdlog/spdlog.h"

// Do not let RapidJson leak outside of this file.
// Do not let RapidJson leak outside of json_loader.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The old comment is correct, but to make it more clear, you can change it to json_loader.cc

bool boolean_value_;
double double_value_;
int64_t integer_value_;
std::map<const std::string, FieldPtr> object_value_;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why not std::unordered_map?

* Internal representation of Object.
*/
class Field;
typedef std::shared_ptr<Field> FieldPtr;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

FieldSharedPtr

class Field;
typedef std::shared_ptr<Field> FieldPtr;

class Field : public Object, public std::enable_shared_from_this<Field> {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

General preference for not having large nested classes (and series of nested classes). It becomes hard to track the outer class. Consider using a namespace (anonymous if inside .cc).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ah cool was not aware of this feature. new to the language. will make that change.

bool isType(Type type) const { return type == type_; }
void checkType(Type type) const {
if (!isType(type)) {
throw Exception("Field access in JSON with correct type.");
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: incorrect type

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also, is there a test for this? Can we add the type name to the error string?

Copy link
Member

@htuch htuch left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the improved error messages!

}
return ret;
}
uint64_t getLineNumber() { return line_number_; }
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: const

bool isType(Type type) const { return type == type_; }
void checkType(Type type) const {
if (!isType(type)) {
throw Exception("Field access in JSON with correct type.");
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also, is there a test for this? Can we add the type name to the error string?

} else {
return default_value;
}
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These getters are all kind of boiler platish, consider a template function implementation.

std::vector<std::string> string_array;
string_array.reserve(array.size());
for (const auto& element : array) {
if (!element > isType(Type::String)) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: this expression is hard to read, there's negation, inequality and on the RHS a predicate. Can this be made clearer?

array->setLineNumberStart(stream_.getLineNumber());

switch (state_) {
case expectValueOrStartObjectArray:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can some of this be refactored with StartObject?

@@ -12,7 +12,7 @@
namespace Json {
class Object;

typedef std::unique_ptr<Object> ObjectPtr;
typedef std::shared_ptr<Object> ObjectPtr;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: ObjectSharedPtr

throw Exception("Field access in JSON with correct type.");
}
}
// value rtype funcs
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

rtype = return type?

// Ch is typdef in parent class to handle character encoding.
public:
LineCountingStringStream(const Ch* src) : rapidjson::StringStream(src), line_number_(1) {}
Ch Take() {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is that an override of method from rapidjson::StringStream? (if so Ch Take() override)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

only virtual member functions can be marked 'override'

Copy link
Member

@RomanDzhabarov RomanDzhabarov May 16, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yeah, that's why i asked if Take is defined on the rapidjson::StringStream as virtual.
if no, nvm

}

rapidjson::Document asRapidJsonDocument() const;
static void build(const Field& field, rapidjson::Value& value,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you add a comment here on what this does?

@danielhochman
Copy link
Contributor Author

@htutch

Can some of [StartArray] be refactored with StartObject?

Not seeing a way to do this without making the code less clear than it is now. The differences are there enough to make it worth keeping the impl separate. Open to suggestions though.

These getters are all kind of boiler platish, consider a template function implementation.

I based the getter functions on bson_impl which @mattklein123 wrote which favors repetition over templates.

@@ -102,7 +102,7 @@ bool RouterCheckTool::compareEntriesInJson(const std::string& expected_route_jso
}

if (validate->hasObject("header_fields")) {
for (const Json::ObjectPtr& header_field : validate->getObjectArray("header_fields")) {
for (const Json::ObjectSharedPtr& header_field : validate->getObjectArray("header_fields")) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

per offline convo, I would global find/replace Json::ObjectSharedPtr& with Json::ObjectSharedPtr for safety.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

}
default:
throw Exception("Json has a non-object or non-array at the root.");
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

EnvoyException instead of Exception

Copy link
Contributor Author

@danielhochman danielhochman May 16, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is a Json::Exception

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nvm I forgot about the Json::Exception.

Copy link
Member

@mattklein123 mattklein123 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks great. Per offline convo, please make sure there is a test for a partial but valid object. Also, did we fix the case where the JSON is just an array? I think that was broken in old code.

throw Exception(fmt::format("'{}' is not an array", name_));
}
// Container factories for handler.
static FieldSharedPtr createObject() { return FieldSharedPtr{new Field(Type::Object)}; }
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

let's use std::make_shared here and everywhere else we are making new shared_ptr inline.

Value value_;
};

/*
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: /**

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

same elsewhere


uint64_t line_number_start_;
uint64_t line_number_end_;
Type type_;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can this be const?

return std::hash<std::string>{}(buffer.GetString());
}

bool Field::getBoolean(const std::string& name) const {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we add TODO here to add line number info to all these errors (or just do now)

state_ = expectKeyOrEndObject;
return true;
default:
return false;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

per offline convo, lets do NOT_REACHED here since this should never happen and is a programming error (same elsewhere). I would add comment also on why.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same elsewhere if this applies.

@mattklein123
Copy link
Member

Per offline convo there are more error messages that can have line numbers so those will get added also.

@@ -521,7 +523,7 @@ bool ObjectHandler::StartObject() {
state_ = expectKeyOrEndObject;
return true;
default:
Copy link
Member

@RomanDzhabarov RomanDzhabarov May 17, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we should delete default completely from the switch() as it must never happen.

the compiler will enforce that all types of state_ are present in the case statements. (could place NOT_REACHED at the very end after switch)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

there are some other entries like that

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

not all states are in the case statement. the NOT_REACHED will only be hit if something unexpected happens in the state machine (e.g. due to unexpected sequence of calls from rapidjson).

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the idea of NOT_REACHED is to make the compiler happy in cases we know something will not happen rather than detect cases like that.

In this case, it feels more like InvalidStateException or something like that (i do not think we really have a variety of exception types in our code base), so throwing Json::Exception might make sense, and specify state_ as part of the exception message.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This case must crash the process, so throwing a Json::Exception, or any exception, is not appropriate. Especially if it cannot be tested without modifying the rapidjson code. Using NOT_REACHED is fine in this case.

Copy link
Member

@mattklein123 mattklein123 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. We can merge tomorrow in case anyone has any final comments. This is great!

@RomanDzhabarov
Copy link
Member

+1

@mattklein123 mattklein123 merged commit 3511758 into master May 17, 2017
@mattklein123 mattklein123 deleted the rapidjson-sax branch May 17, 2017 17:23
rshriram pushed a commit to rshriram/envoy that referenced this pull request Oct 30, 2018
Automatic merge from submit-queue.

[DO NOT MERGE] Auto PR to update dependencies of proxy

This PR will be merged automatically once checks are successful.
```release-note
none
```
jpsim pushed a commit that referenced this pull request Nov 28, 2022
Signed-off-by: Jose Nino <jnino@lyft.com>
Signed-off-by: JP Simard <jp@jpsim.com>
jpsim pushed a commit that referenced this pull request Nov 29, 2022
Signed-off-by: Jose Nino <jnino@lyft.com>
Signed-off-by: JP Simard <jp@jpsim.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants