fix: docs Autodetect for schemas #563

Merged
merged 3 commits into from
Nov 20, 2023

Conversation

IbrahimBagalwa
Contributor

(Fixes #560)

Description

I noticed a discrepancy in the documentation, specifically in the example for the Autodetect class. Here are the details of the issue I fixed:

In the documentation example, the variable value_type was not defined before being referenced in the following lines of code:

value_type = registry[value_type]   # <-- value_type is not defined
return loads(value_type, message.key, serializer=serializer)  # <-- maybe message.value?

This looks like a typo: the registry lookup should use value_type_name instead of value_type.

Additionally, I noticed a comment in the code that seemed to be a victim of a copy-paste error. It read: try to get key_type and serializer from Kafka headers, which I believe should have been: try to get value_type and serializer from Kafka headers.

I made both changes. I am also unsure whether passing message.key in the last line is intended, or whether it should be message.value instead.
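The corrected lookup can be sketched in isolation as follows. This is a minimal, self-contained illustration rather than the Faust documentation example itself: `Message`, `Order`, `registry`, and `decode` are hypothetical stand-ins, the header names `ValueType` and `ValueSerializer` are assumptions, and the sketch assumes `message.value` is what was intended in the last line.

```python
import json
from dataclasses import dataclass, field
from typing import Any, Callable, Dict, List, Optional, Tuple


@dataclass
class Message:
    """Hypothetical stand-in for a Kafka message (not Faust's Message type)."""
    key: bytes
    value: bytes
    headers: List[Tuple[str, str]] = field(default_factory=list)


@dataclass
class Order:
    """Hypothetical value type used only for this illustration."""
    id: int


# Hypothetical registry mapping type names found in headers to classes.
registry: Dict[str, type] = {"Order": Order}


def decode(typ: type, data: bytes, serializer: Optional[str] = None) -> Any:
    """Hypothetical loader standing in for the `loads` callable in the example."""
    if serializer not in (None, "json"):
        raise ValueError(f"unsupported serializer: {serializer!r}")
    return typ(**json.loads(data))


def loads_value(message: Message, *, loads: Callable = decode,
                serializer: Optional[str] = None) -> Any:
    # try to get value_type and serializer from Kafka headers
    headers = dict(message.headers)
    value_type_name = headers.get("ValueType")  # assumed header name
    serializer = serializer or headers.get("ValueSerializer")  # assumed header name
    if value_type_name:
        # Look up by the *name* from the headers, not an undefined value_type.
        value_type = registry[value_type_name]
        # Assumption: the value payload is decoded here, not message.key.
        return loads(value_type, message.value, serializer=serializer)
    # No type header: fall back to plain JSON decoding in this sketch.
    return json.loads(message.value)


msg = Message(key=b"k1", value=b'{"id": 7}', headers=[("ValueType", "Order")])
order = loads_value(msg)
untyped = loads_value(Message(key=b"k2", value=b'{"id": 1}'))
```

With a `ValueType` header the payload is decoded into the registered class; without one, the sketch falls back to a plain dictionary.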

wbarnha added the documentation label Nov 7, 2023
Member

wbarnha left a comment

Thank you for catching this! I'm surprised such an error has gone unnoticed for so long.

codecov bot commented Nov 7, 2023

Codecov Report

All modified and coverable lines are covered by tests ✅

Comparison is base (ccc062f) 93.71% compared to head (5dab715) 93.71%.

Additional details and impacted files
@@           Coverage Diff           @@
##           master     #563   +/-   ##
=======================================
  Coverage   93.71%   93.71%           
=======================================
  Files         102      102           
  Lines       11156    11156           
  Branches     1534     1534           
=======================================
  Hits        10455    10455           
  Misses        613      613           
  Partials       88       88           

Member

wbarnha commented Nov 7, 2023

It seems the loads function is left undefined. Faust also defines its own loads function, which behaves differently when orjson is installed and differs from the standard json.loads documented at https://docs.python.org/3/library/json.html.

faust/faust/utils/json.py

Lines 166 to 203 in 3fb3180

if orjson is not None:  # pragma: no cover

    def dumps(
        obj: Any,
        json_dumps: Callable = orjson.dumps,
        cls: Type[JSONEncoder] = JSONEncoder,
        **kwargs: Any,
    ) -> str:
        """Serialize to json."""
        return json_dumps(
            obj,
            default=on_default,
            option=orjson.OPT_NON_STR_KEYS | orjson.OPT_UTC_Z,
        )

    def loads(s: str, json_loads: Callable = orjson.loads, **kwargs: Any) -> Any:
        """Deserialize json string."""
        return json_loads(s)

else:

    def dumps(
        obj: Any,
        json_dumps: Callable = json.dumps,
        cls: Type[JSONEncoder] = JSONEncoder,
        **kwargs: Any,
    ) -> str:
        """Serialize to json. See :func:`json.dumps`."""
        return json_dumps(
            obj,
            cls=cls,
            **dict(_JSON_DEFAULT_KWARGS, **kwargs),
            separators=(",", ":"),
        )

    def loads(s: str, json_loads: Callable = json.loads, **kwargs: Any) -> Any:
        """Deserialize json string. See :func:`json.loads`."""
        return json_loads(s, **kwargs)

Looking at both of these, it was not immediately apparent to me which implementation Ask Solem intended to use in the example above, since no import is declared for it. 😦 After reviewing the original Python documentation, it's definitely not json.loads.
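One visible difference between the two loads branches quoted above is that the orjson branch accepts keyword arguments but does not pass them on, while the fallback branch forwards them to json.loads. The sketch below imitates both shapes using only the standard library so it runs without orjson installed; the function names `loads_forwarding` and `loads_dropping` are hypothetical, and the second one only mimics the call pattern of the orjson branch, not orjson itself.

```python
import json
from decimal import Decimal
from typing import Any, Callable


def loads_forwarding(s: str, json_loads: Callable = json.loads, **kwargs: Any) -> Any:
    """Same shape as the fallback branch: kwargs are forwarded to the loader."""
    return json_loads(s, **kwargs)


def loads_dropping(s: str, json_loads: Callable = json.loads, **kwargs: Any) -> Any:
    """Same shape as the orjson branch: kwargs are accepted but ignored."""
    return json_loads(s)


payload = '{"price": 1.10}'

# parse_float is a real json.loads option; it only takes effect when forwarded.
forwarded = loads_forwarding(payload, parse_float=Decimal)
dropped = loads_dropping(payload, parse_float=Decimal)

print(type(forwarded["price"]).__name__)
print(type(dropped["price"]).__name__)
```

Code that relies on json.loads options such as parse_float could therefore behave differently depending on which branch is active.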

To be honest, I should be testing this myself to make sure it works. I've never used autodetection before because I've always believed it to be best practice to use one schema per Kafka topic: non-Faust applications may also need to consume from a particular topic. My experiences with https://marcosschroh.github.io/dataclasses-avroschema/faust_records/ also show me that, despite how powerful these Records are, they can make your life quite difficult.

I'm still approving it because the implementation in the documentation is not properly documented anyway, and the changes here make it clearer.

@wbarnha wbarnha merged commit e31974f into faust-streaming:master Nov 20, 2023
19 of 22 checks passed