Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add support for field processors. #85

Merged
merged 8 commits into from
Oct 15, 2022
Merged

Add support for field processors. #85

merged 8 commits into from
Oct 15, 2022

Conversation

wRAR
Copy link
Member

@wRAR wRAR commented Oct 14, 2022

This adds support for an out= argument to @field, as proposed in zytedata/zyte-common-items#18 (option 3). It only supports a list of callables, though it should be easy to also add support for a single callable. It only supports synchronous processors as async processors don't seem too useful and would change the signature of the field from sync to async which I find problematic.

I'm not sure how idiomatic (wrt both Python and web-poet) is the implementation but it's simple enough to be able to adjust it.

@wRAR wRAR requested review from kmike and BurnzZ October 14, 2022 11:11
@codecov
Copy link

codecov bot commented Oct 14, 2022

Codecov Report

Merging #85 (800fc13) into master (34eb0ec) will increase coverage by 0.00%.
The diff coverage is 100.00%.

❗ Current head 800fc13 differs from pull request most recent head 5ed2faf. Consider uploading reports for the commit 5ed2faf to get more accurate results

Additional details and impacted files
@@           Coverage Diff           @@
##           master      #85   +/-   ##
=======================================
  Coverage   99.81%   99.81%           
=======================================
  Files          17       17           
  Lines         536      553   +17     
=======================================
+ Hits          535      552   +17     
  Misses          1        1           
Impacted Files Coverage Δ
web_poet/fields.py 98.70% <100.00%> (+0.36%) ⬆️

web_poet/fields.py Outdated Show resolved Hide resolved
web_poet/fields.py Outdated Show resolved Hide resolved
@kmike
Copy link
Member

kmike commented Oct 14, 2022

The implementation and tests makes sense to me, thanks @wRAR!

Could you please add docs for this feature? it should be a section in https://github.com/scrapinghub/web-poet/blob/master/docs/advanced/fields.rst

@kmike
Copy link
Member

kmike commented Oct 14, 2022

There is also a note in the current docs, which says

Item classes can also be used to hold common attribute pre-processing and validation logic.

I think we should remove it, and actually explain that having preprocessing logic in items may be problematic. This new docs section would be a good place for it.

@wRAR
Copy link
Member Author

wRAR commented Oct 14, 2022

I've added a docs section.

Item classes can also be used to hold common attribute pre-processing and validation logic.

I think we should remove it, and actually explain that having preprocessing logic in items may be problematic. This new docs section would be a good place for it.

Why would it be problematic?

docs/advanced/fields.rst Outdated Show resolved Hide resolved
@kmike
Copy link
Member

kmike commented Oct 14, 2022

Why would it be problematic?

It breaks the following (from the docs):

The resulting page object becomes more flexible and reusable: if not all data extracted in the to_item() method is needed, user can use properties for individual attributes. It’s more efficient than running to_item() and only using some of the result.

If field processing happens in the item, it means you can't use page_object.field instead of item.field, the result would be wrong (i.e. without cleaning).

web_poet/fields.py Outdated Show resolved Hide resolved
Copy link
Member

@kmike kmike left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good @wRAR!

Copy link
Member

@Gallaecio Gallaecio left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice!

docs/advanced/fields.rst Outdated Show resolved Hide resolved
Co-authored-by: Adrián Chaves <adrian@chaves.io>
@kmike kmike merged commit 7341c36 into master Oct 15, 2022
@kmike kmike deleted the field-processors-support branch October 15, 2022 18:34
@kmike
Copy link
Member

kmike commented Oct 15, 2022

After re-reading zytedata/zyte-common-items#18 I realized that typing could be an issue. But it turns out proper typing is not implemented for @field at all. So, we should treat is as a separate issue (#87, #88).

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants