-
Notifications
You must be signed in to change notification settings - Fork 169
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
ingest to parquet rewrite #231
Conversation
dd78329
to
f3e720a
Compare
data-processing-lib/python/src/data_processing/data_access/data_access_s3.py
Show resolved
Hide resolved
@@ -21,7 +21,7 @@ | |||
|
|||
class AbstractTransformLauncherTest(AbstractTest): | |||
""" | |||
The launcher test class for all/most AbstractTransformLauncher implementations. | |||
The Ray-based test class for all/most AbstractTransform implementations. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You seem to be dropping things from dev. This comment was more correct and the new ray-specific one is not.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
fixed
): | ||
""" | ||
Test the given transform and its runtime using the given CLI arguments, input directory of data files and expected output directory. | ||
Data is processed into a temporary output directory which is then compared with the directory of expected output. | ||
:param launcher: launcher configured to run the transform being tested | ||
:param cli_params: a map of the simulated CLI arguments (w/o --). This includes both the transform-specific CLI parameters and the launching args. | ||
:param cli_params: a map of the simulated CLI arguments (w/o --). This includes both the transform-specific CLI parameters and the Ray launching args. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
more comments lost
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
fixed
@@ -293,7 +293,7 @@ conventions: .transforms.check_required_macros | |||
.transforms.minio-load:: | |||
@# Help: Install the test-data/input files into minio for $(TRANSFORM_NAME) | |||
dir=$$(find $(REPOROOT)/transforms -type d -name $(TRANSFORM_NAME)); \ | |||
$(MAKE) MINIO_SRC=$$dir/test-data/input MINIO_DEST=$(TRANSFORM_NAME)/ .defaults.minio.load-test-data | |||
$(MAKE) MINIO_SRC=$$dir/ray/test-data/input MINIO_DEST=$(TRANSFORM_NAME)/ .defaults.minio.load-test-data |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ray
should probably be changed to $(TRANSFORM_RUNTIME)
to be runtime-independent. The value of this macro is the name of the directory. one of python, ray or spark.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just made it work. Moving forward, yes
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
unclear why you won't make this change like you did for the comments? The time is now :)
# the name of the job script | ||
EXEC_SCRIPT_NAME: str = "ingest_2_parquet_transform.py" | ||
|
||
task_image = "quay.io/dataprep1/data-prep-kit/ingest_2_parquet:0.3.0" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
think this version will eventually also need .dev6 suffix
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Again, for the first version its good enough
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
but now is the time, do you think someone is going to fix this later?
return {ingest_supported_languages: lang_refs} | self.params | ||
|
||
|
||
class IngestToParquetTransformConfiguration(TransformConfiguration): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Think we should be consistent on the naming of classes and files like the other transforms. Pick ingesttoparquet
or ingest2parquet
or ingest_2_parquet
or ingest_to_parquet
, but lets be consistent.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
File names and class names do not have to be the same. File names are dependent on directory name in the make files
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
consistency in conventions will help everyone in the end.
Also, currently this is for code, but i think the goal is to have this be more generic. As such, it belongs in the universal tree not the code tree. Yes, in anticipation of future extensions and not having to move it later.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
too late, but do we really need a 27mb file?!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
THats what she had
Just copied it
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@shivdeep-singh-ibm Please verify that the functionality is exactly the same as the module that was written by Saptha. |
Why are these changes needed?
Related issue number (if any).