Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Move Avro sampler to FileSystems API #140

Merged
merged 2 commits into from
Nov 8, 2018
Merged

Conversation

idreeskhan
Copy link
Contributor

#139 for Avro only

@idreeskhan
Copy link
Contributor Author

@martinbomio

Copy link
Contributor

@martinbomio martinbomio left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm, left some comments

extends Sampler[GenericRecord] {

private val logger: Logger = LoggerFactory.getLogger(classOf[AvroSampler])

private def getFileContext: FileContext = FileContext.getFileContext(GcsConfiguration.get())
// private def getFileContext: FileContext = FileContext.getFileContext(GcsConfiguration.get())
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

remove comment?

val fs = FileSystem.get(path.toUri, GcsConfiguration.get())
if (fs.isFile(path)) {
new AvroFileSampler(getFileContext, path, seed).sample(n, head)
// val fs = FileSystem.get(path.toUri, GcsConfiguration.get())
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

remove comment

@@ -92,51 +98,44 @@ class AvroSampler(path: Path, protected val seed: Option[Long] = None)

}

private class AvroFileSampler(fc: FileContext, path: Path, protected val seed: Option[Long] = None)
private class AvroFileSampler(r: ResourceId, protected val seed: Option[Long] = None)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

wouldn't it be more generic to expect the file path instead of the resource id?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I guess it doesn't matter that much since the class is private

@@ -18,7 +18,7 @@
package com.spotify.ratatool.samplers

import java.io.File
import java.nio.file.Files
import java.nio.file.{Files, Paths}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

unnecessary import?

@@ -82,15 +82,15 @@ public static Configuration get() {

conf.setIfUnset(
HadoopCredentialConfiguration.BASE_KEY_PREFIX +
HadoopCredentialConfiguration.ENABLE_SERVICE_ACCOUNTS_SUFFIX,
HadoopCredentialConfiguration.ENABLE_SERVICE_ACCOUNTS_SUFFIX,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

wondering if we can get rid of this class.
Seems like it is only being used in ParquetIO (besides AvroSampler and an example)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yea, Parquet cleanup is not a priority for me at the moment but I think we can get rid of it once that's done.

@codecov
Copy link

codecov bot commented Nov 7, 2018

Codecov Report

❗ No coverage uploaded for pull request base (master@1937910). Click here to learn what that means.
The diff coverage is 93.75%.

Impacted file tree graph

@@            Coverage Diff            @@
##             master     #140   +/-   ##
=========================================
  Coverage          ?   67.58%           
=========================================
  Files             ?       36           
  Lines             ?     1447           
  Branches          ?      169           
=========================================
  Hits              ?      978           
  Misses            ?      469           
  Partials          ?        0
Flag Coverage Δ
#ratatoolCli 4.37% <0%> (?)
#ratatoolDiffy 26.86% <0%> (?)
#ratatoolExamples 18.58% <0%> (?)
#ratatoolSampling 59.05% <100%> (?)
#ratatoolScalacheck 83.48% <ø> (?)
#ratatoolShapeless 5.28% <0%> (?)
Impacted Files Coverage Δ
...in/scala/com/spotify/ratatool/diffy/BigDiffy.scala 45.94% <0%> (ø)
...ain/scala/com/spotify/ratatool/tool/Ratatool.scala 0% <0%> (ø)
...la/com/spotify/ratatool/samplers/AvroSampler.scala 100% <100%> (ø)

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 1937910...facbcf1. Read the comment docs.

Copy link
Contributor

@martinbomio martinbomio left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

tested it locally and it is working

@idreeskhan
Copy link
Contributor Author

Cool, will merge this and solve parquet separately after

@idreeskhan idreeskhan merged commit fe92be7 into master Nov 8, 2018
@idreeskhan idreeskhan deleted the idrees/filesystemsapi branch November 8, 2018 14:14
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants