Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ingest to parquet rewrite #231

Merged
merged 6 commits into from
Jun 6, 2024
Merged

ingest to parquet rewrite #231

merged 6 commits into from
Jun 6, 2024

Conversation

blublinsky
Copy link
Collaborator

Why are these changes needed?

Related issue number (if any).

@@ -21,7 +21,7 @@

class AbstractTransformLauncherTest(AbstractTest):
"""
The launcher test class for all/most AbstractTransformLauncher implementations.
The Ray-based test class for all/most AbstractTransform implementations.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You seem to be dropping things from dev. This comment was more correct and the new ray-specific one is not.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fixed

):
"""
Test the given transform and its runtime using the given CLI arguments, input directory of data files and expected output directory.
Data is processed into a temporary output directory which is then compared with the directory of expected output.
:param launcher: launcher configured to run the transform being tested
:param cli_params: a map of the simulated CLI arguments (w/o --). This includes both the transform-specific CLI parameters and the launching args.
:param cli_params: a map of the simulated CLI arguments (w/o --). This includes both the transform-specific CLI parameters and the Ray launching args.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

more comments lost

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fixed

@@ -293,7 +293,7 @@ conventions: .transforms.check_required_macros
.transforms.minio-load::
@# Help: Install the test-data/input files into minio for $(TRANSFORM_NAME)
dir=$$(find $(REPOROOT)/transforms -type d -name $(TRANSFORM_NAME)); \
$(MAKE) MINIO_SRC=$$dir/test-data/input MINIO_DEST=$(TRANSFORM_NAME)/ .defaults.minio.load-test-data
$(MAKE) MINIO_SRC=$$dir/ray/test-data/input MINIO_DEST=$(TRANSFORM_NAME)/ .defaults.minio.load-test-data
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ray should probably be changed to $(TRANSFORM_RUNTIME) to be runtime-independent. The value of this macro is the name of the directory. one of python, ray or spark.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just made it work. Moving forward, yes

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

unclear why you won't make this change like you did for the comments? The time is now :)

# the name of the job script
EXEC_SCRIPT_NAME: str = "ingest_2_parquet_transform.py"

task_image = "quay.io/dataprep1/data-prep-kit/ingest_2_parquet:0.3.0"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

think this version will eventually also need .dev6 suffix

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Again, for the first version its good enough

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

but now is the time, do you think someone is going to fix this later?

return {ingest_supported_languages: lang_refs} | self.params


class IngestToParquetTransformConfiguration(TransformConfiguration):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Think we should be consistent on the naming of classes and files like the other transforms. Pick ingesttoparquet or ingest2parquet or ingest_2_parquet or ingest_to_parquet, but lets be consistent.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

File names and class names do not have to be the same. File names are dependent on directory name in the make files

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

consistency in conventions will help everyone in the end.

Also, currently this is for code, but i think the goal is to have this be more generic. As such, it belongs in the universal tree not the code tree. Yes, in anticipation of future extensions and not having to move it later.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

too late, but do we really need a 27mb file?!

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

THats what she had
Just copied it

Copy link
Member

@roytman roytman left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, but need an approvement from @daw3rd or @cmadam

@Bytes-Explorer
Copy link
Collaborator

@shivdeep-singh-ibm Please verify that the functionality is exactly the same as the module that was written by Saptha.

@cmadam cmadam self-requested a review June 6, 2024 13:08
@blublinsky blublinsky merged commit dc44844 into dev Jun 6, 2024
15 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants