Move Avro sampler to FileSystems API #140
lgtm, left some comments
extends Sampler[GenericRecord] {

private val logger: Logger = LoggerFactory.getLogger(classOf[AvroSampler])

private def getFileContext: FileContext = FileContext.getFileContext(GcsConfiguration.get())
// private def getFileContext: FileContext = FileContext.getFileContext(GcsConfiguration.get())
remove comment?
val fs = FileSystem.get(path.toUri, GcsConfiguration.get())
if (fs.isFile(path)) {
new AvroFileSampler(getFileContext, path, seed).sample(n, head)
// val fs = FileSystem.get(path.toUri, GcsConfiguration.get())
remove comment
@@ -92,51 +98,44 @@ class AvroSampler(path: Path, protected val seed: Option[Long] = None)

}

private class AvroFileSampler(fc: FileContext, path: Path, protected val seed: Option[Long] = None)
private class AvroFileSampler(r: ResourceId, protected val seed: Option[Long] = None)
wouldn't it be more generic to expect the file path instead of the resource id?
I guess it doesn't matter that much since the class is private
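For context on the path-versus-ResourceId question, here is a minimal sketch of how a path string could be resolved to a `ResourceId` and read through the FileSystems API. It assumes the Apache Beam Java SDK and Avro are on the classpath; `AvroHeadSketch`, `toResourceId`, and `head` are hypothetical names for illustration and are not taken from this PR.

```scala
import java.nio.channels.Channels

import org.apache.avro.file.DataFileStream
import org.apache.avro.generic.{GenericDatumReader, GenericRecord}
import org.apache.beam.sdk.io.FileSystems
import org.apache.beam.sdk.io.fs.ResourceId
import org.apache.beam.sdk.options.PipelineOptionsFactory

import scala.collection.mutable.ListBuffer

// Hypothetical sketch, not the code from this PR.
object AvroHeadSketch {
  // Register the file systems found on the classpath (e.g. gs://).
  // Local files work with the default registration as well.
  FileSystems.setDefaultPipelineOptions(PipelineOptionsFactory.create())

  // Resolve a path string to a ResourceId, so callers can keep passing paths.
  def toResourceId(path: String): ResourceId =
    FileSystems.matchNewResource(path, false) // false: not a directory

  // Read at most the first n records of a single Avro file.
  def head(r: ResourceId, n: Int): List[GenericRecord] = {
    val in = Channels.newInputStream(FileSystems.open(r))
    val stream =
      new DataFileStream[GenericRecord](in, new GenericDatumReader[GenericRecord]())
    try {
      val buf = ListBuffer.empty[GenericRecord]
      while (buf.size < n && stream.hasNext) buf += stream.next()
      buf.toList
    } finally {
      stream.close() // also closes the underlying input stream
    }
  }
}
```

With a helper like `toResourceId`, either constructor signature is reachable from the other, which fits the observation that the choice matters little for a private class.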
@@ -18,7 +18,7 @@
package com.spotify.ratatool.samplers

import java.io.File
import java.nio.file.Files
import java.nio.file.{Files, Paths}
unnecessary import?
@@ -82,15 +82,15 @@ public static Configuration get() {

conf.setIfUnset(
HadoopCredentialConfiguration.BASE_KEY_PREFIX +
HadoopCredentialConfiguration.ENABLE_SERVICE_ACCOUNTS_SUFFIX,
HadoopCredentialConfiguration.ENABLE_SERVICE_ACCOUNTS_SUFFIX,
Wondering if we can get rid of this class. It seems to be used only in ParquetIO (besides AvroSampler and an example).
Yea, Parquet cleanup is not a priority for me at the moment but I think we can get rid of it once that's done.
Codecov Report
@@ Coverage Diff @@
## master #140 +/- ##
=========================================
Coverage ? 67.58%
=========================================
Files ? 36
Lines ? 1447
Branches ? 169
=========================================
Hits ? 978
Misses ? 469
Partials ? 0
Continue to review full report at Codecov.
tested it locally and it is working
Cool, will merge this and solve Parquet separately after.
#139 for Avro only