-
Notifications
You must be signed in to change notification settings - Fork 264
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
chore: reducing coverage loss #619
Conversation
rbatch | ||
), f"Left and Right batch does not have equal elements: left: {len(lbatch)} right: {len(rbatch)}" | ||
|
||
assert self.etype in [ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Won't keeping it in the else clause easier to maintain? Similar to the switch clause in cpp. Future changes won't need to worry about modifying assert. What do you think?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good idea. But, it will be harder to hit the else
without a synthetic test case (from a coverage standpoint).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
But, if the assert is not updated and a new expression type is added, then they will have to update the assert before passing the test case. So, they will be forced to update the assert
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How about we move the assert condition to the test case? Any update to the logic will fail the test case and will require updating it.
It is a one-time effort, either we do it in the code using assert or in the test. I prefer test from readability perspective. What do you think?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am not entirely sure if I understand. Can you take care of this change?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Like eva/catalog/catalog_manager.py
, shall we also replace logger.execption and raise BinderError
with assert for consistency?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, done
eva/executor/delete_executor.py
Outdated
except Exception as e: | ||
logger.error(e) | ||
raise ExecutorError(e) | ||
if table_catalog.table_type == TableType.STRUCTURED_DATA: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We do not need this if check, the previous assert
ensures that table_catalog.table_type
is TableType.STRUCTURED_DATA
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, resolved.
raise ExecutorError(e) | ||
|
||
def __call__(self, *args, **kwargs) -> Generator[Batch, None, None]: | ||
yield from self.exec(*args, **kwargs) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is ray supporting lateral join now? I think the __call__
is needed for ray.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not sure about this \ cc @kaushikravichandran @jiashenC @gaurav274
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Even if it were the case, this code was not being used in any of the test cases and repeated in a few operators.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ideally, I think it could be triggered by Ray if there is a LateralJoin under Projection. Maybe we can move this to the AbstractExecutor to improve the coverage? I think it will anyway have the same function implementation for any executor.
raise ExecutorError( | ||
f"StorageEngine {storage_engine} create call failed" | ||
) | ||
storage_engine.create(table_obj) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Here do we still need to check the success from create
? We probably don't want the next write if create failed.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Changing this logic breaks these two test cases -- test_should_fail_to_load_videos_with_same_path
and test_should_fail_to_load_images_with_same_path
storage_engine = StorageEngine.factory(table_obj)
if do_create:
create_status = storage_engine.create(table_obj)
if create_status:
storage_engine.write(
table_obj,
Batch(pd.DataFrame({"file_path": valid_files})),
)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
\cc @gaurav274
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It is fine. create
will raise error
setattr(result, k, deepcopy(v, memo)) | ||
return result | ||
|
||
def copy(self): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is copy
not needed anymore?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It was not being used
eva/models/storage/batch.py
Outdated
for column in by: | ||
if column not in self._frames.columns: | ||
raise KeyError( | ||
"Can not orderby non-projected column: {}".format(column) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we also remove this raise?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done.
raise BinderError("Index input needs to be float32.") | ||
if not len(output.array_dimensions) == 2: | ||
raise BinderError("Index input needs to be 2 dimensional.") | ||
assert IndexType.is_faiss_index_type( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just curious, if we support index type other than Faiss later, we still need to do it with if else right (assuming we have some index specific checks)?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yes
This PR affects |
Moving to a different pattern for handling exceptions (at the query executor instead of doing it in each operator) -- https://www.youtube.com/watch?v=8kTlzR4HhWo&list=PLqapmkczup-VyUEZW8adgVWX088cToWlX&index=4
Ref: https://www.reddit.com/r/learnpython/comments/yxcqib/exception_handling_best_practices/
Abstract tests:
https://github.com/georgia-tech-db/eva/blob/1d826440368682ce589735dba3cb87365d98d2b2/test/udfs/test_abstract_udf.py#L26-L40