
Add more documentation on Diffy, Sampler usage #65

Merged: idreeskhan merged 5 commits into master from idrees/big-docs on Apr 2, 2018.


@idreeskhan (Contributor) commented Mar 30, 2018

@codecov-io commented Mar 30, 2018

Codecov Report

Merging #65 into master will decrease coverage by 0.01%.
The diff coverage is 20%.


```diff
@@            Coverage Diff             @@
##           master      #65      +/-   ##
==========================================
- Coverage   70.88%   70.86%   -0.02%
==========================================
  Files          22       22
  Lines         941      968      +27
  Branches      123      122       -1
==========================================
+ Hits          667      686      +19
- Misses        274      282       +8
```

| Impacted Files | Coverage Δ |
|---|---|
| `...in/scala/com/spotify/ratatool/diffy/BigDiffy.scala` | `46.21% <ø> (ø)` ⬆️ |
| `...ala/com/spotify/ratatool/samplers/BigSampler.scala` | `64.07% <20%> (-1.79%)` ⬇️ |
| `...om/spotify/ratatool/scalacheck/AvroGenerator.scala` | `86.02% <0%> (-1.48%)` ⬇️ |


Legend: `Δ = absolute <relative> (impact)`, `ø = not affected`, `? = missing data`

Powered by Codecov. Last update 09697a9...2d5c221.
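The header's "decrease coverage by 0.01%" and the table's `-0.02%` can look contradictory. Both follow from the counts in the coverage diff if coverage is taken as hits / lines (here hits + misses equals lines in both columns, so there are no partial lines). The sketch below recomputes them in Scala; that the table's percentages match truncation to two decimals, rather than rounding, is an inference from these numbers, not documented Codecov behavior.

```scala
// Recompute the Codecov figures for PR #65 from the hits/lines counts above.
// Assumption: coverage = hits / lines (hits + misses == lines, so no partials).
object CoverageCheck {
  /** Coverage in hundredths of a percent, truncated toward zero (7088 == 70.88%). */
  def coverageBp(hits: Int, lines: Int): Long = hits.toLong * 10000L / lines

  /** Unrounded coverage percentage. */
  def coveragePct(hits: Int, lines: Int): Double = hits.toDouble * 100.0 / lines

  def main(args: Array[String]): Unit = {
    val master = coveragePct(667, 941)
    val pr65   = coveragePct(686, 968)
    // The raw delta is about -0.014%, which matches "decrease coverage by 0.01%".
    println(f"raw: $master%.4f%% -> $pr65%.4f%% (delta ${pr65 - master}%.4f%%)")
    // Truncated to two decimals: 70.88% -> 70.86%, which matches the table's -0.02%.
    println(s"truncated (hundredths of a percent): ${coverageBp(667, 941)} -> ${coverageBp(686, 968)}")
  }
}
```

Rounding 686 / 968 to two decimals would give 70.87% instead, so the table's 70.86% is what suggests truncation.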


````markdown
## Usage

```
````
Contributor


Do we want to discuss the output and how to read it as well?

Contributor Author


It's discussed a bit in the earlier section. Do you think it needs to be more detailed?

Contributor


Yeah, maybe give an example of how to read the output, since I know we mentioned it's potentially hard to parse.


```markdown
# BigSampler

BigSampler will run a [Scio](https://github.com/spotify/scio) pipeline sampling either Avro or BigQuery data.
```
Contributor


Can we note that it supports `gs://` paths as well?

Contributor Author


Added
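The README excerpt says BigSampler samples Avro or BigQuery data, and the thread adds that `gs://` paths work too. As a hypothetical illustration only (this is not ratatool's code, and the helper `InputKind.classify` and its table-spec pattern are assumptions), a tool taking a single input argument could route it roughly like this:

```scala
// Hypothetical helper, not part of ratatool: guesses whether an input value
// names Avro files (local or gs://) or a BigQuery table.
object InputKind {
  sealed trait Kind
  case object AvroPath extends Kind
  case object BigQueryTable extends Kind

  // Assumed table spec shape: "project:dataset.table" or "dataset.table".
  private val TableSpec = """^(?:[\w.-]+:)?\w+\.\w+$""".r

  def classify(input: String): Option[Kind] =
    if (input.startsWith("gs://") || input.endsWith(".avro")) Some(AvroPath)
    else if (TableSpec.pattern.matcher(input).matches()) Some(BigQueryTable)
    else None // ambiguous: neither a gs:// / .avro path nor a table spec
}
```

Inputs that fit neither rule return `None` here; a real tool would probably need a more explicit rule than suffix and prefix checks.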

```scala
    (pct, args("input"), args("output"), args.list("fields"),
      args.optional("seed"))
  } catch {
    case e: Throwable =>
```
Contributor


Do we want to catch only `NonFatal` here instead of everything?

http://www.scala-lang.org/api/current/scala/util/control/NonFatal$.html

Contributor Author


Mostly I think it will be `NoSuchElementException`; if we want to be more granular, we can just catch that, I think. Not sure what `args.list` will throw.
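On the `NonFatal` question, here is a minimal sketch of the trade-off. It uses a plain `Map` as a hypothetical stand-in for Scio's `Args`, and the flag names mirror the variables in the snippet above; both are assumptions. A missing flag surfaces as `NoSuchElementException`, but a malformed number raises `NumberFormatException`, which a catch limited to `NoSuchElementException` would let escape. `NonFatal` covers both while still letting fatal errors such as `OutOfMemoryError` propagate, unlike `case e: Throwable`.

```scala
import scala.util.control.NonFatal

// Hypothetical stand-in for Scio's Args: flag name -> values. Not ratatool code.
object ArgsParsing {
  final case class SamplerArgs(pct: Float, input: String, output: String,
                               fields: List[String], seed: Option[String])

  // Map.apply throws NoSuchElementException for a missing key; .head does the
  // same for a flag that is present but has no values.
  private def required(m: Map[String, List[String]], key: String): String =
    m(key).head

  def parse(m: Map[String, List[String]]): Either[String, SamplerArgs] =
    try {
      val pct = required(m, "pct").toFloat // NumberFormatException if not a number
      Right(SamplerArgs(pct, required(m, "input"), required(m, "output"),
        m.getOrElse("fields", Nil), m.get("seed").flatMap(_.headOption)))
    } catch {
      // Matches both exception types above; fatal errors such as
      // OutOfMemoryError or InterruptedException are not matched and propagate.
      case NonFatal(e) => Left(s"Invalid arguments: ${e.getMessage}")
    }
}
```

For example, `ArgsParsing.parse(Map("input" -> List("gs://bucket/in")))` returns a `Left` because `pct` is missing.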

@jbigred1 (Contributor) left a comment


Made a few high-level comments; looks good otherwise.

@idreeskhan idreeskhan merged commit 4b1fcd7 into master Apr 2, 2018
@idreeskhan idreeskhan deleted the idrees/big-docs branch April 2, 2018 17:47

3 participants