-
Notifications
You must be signed in to change notification settings - Fork 49
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat[next]: Optimisations for icon4py #1536
base: main
Are you sure you want to change the base?
feat[next]: Optimisations for icon4py #1536
Conversation
…' into optimisations-for-icon4py
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Forwarding the review to @egparedes for improving Python patterns in performance critical code parts.
@@ -1000,8 +999,6 @@ def _shift_field_indices( | |||
def np_as_located_field( | |||
*axes: common.Dimension, origin: Optional[dict[common.Dimension, int]] = None | |||
) -> Callable[[np.ndarray], common.Field]: | |||
warnings.warn("`np_as_located_field()` is deprecated, use `gtx.as_field()`", DeprecationWarning) # noqa: B028 [no-explicit-stacklevel] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why this change? Can you undo?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I see it was intentional. Still should be undone and fixed on the icon4py side.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So basically emitting the warning was slowing things down on the icon4py side when creating the gt4py fields when calling the granule. In what way could this be fixed on the icon4py side?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The reason we currently still use np_as_located_field
is because it gives you back a numpy array view of the pointer we pass from fortran. If we switch to as_field
it does not use the same memory location any longer (I guess it makes a copy?) which should be fixed on the gt4py side.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I propose to just wrap the warning in an if __debug__
for now, so that it can be disabled in -O
mode.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok, I see, then it probably makes sense to use (temporarily) the common._field()
function to wrap an array into a field.
# If we don't pass them as in the case of a CachedProgram extract connectivities here. | ||
if conn_args is None: | ||
conn_args = extract_connectivity_args(offset_provider, device) | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
feels like code should be refactored such that we always pass conn_args here. Probably makes sense to do this change in the context of a coarser refactoring. @DropD might have ideas.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Once I get size args extraction moved to this stage, I will look into adding that.
return arr, origin | ||
|
||
|
||
type_handlers_convert_args = { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
would functools.singledispatch work?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it could work, however whilst nicer I have the feeling that it would be slower than a simple dictionary lookup.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it would be preferable to sacrifice readability only on the basis of hard evidence.
But in general, could make sense to discuss this in the context of going towards frozen programs. |
def handle_connectivity( | ||
conn: NeighborTableOffsetProvider, zero_tuple: tuple[int, ...], device: core_defs.DeviceType | ||
) -> ConnectivityArg: | ||
return (_ensure_is_on_device(conn.table, device), zero_tuple) | ||
|
||
|
||
def handle_other_type(*args: Any, **kwargs: Any) -> None: | ||
return None | ||
|
||
|
||
type_handlers_connectivity_args = { | ||
NeighborTableOffsetProvider: handle_connectivity, | ||
common.Dimension: handle_other_type, | ||
} | ||
|
||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If this pattern proves to be a significant optimization over singledispatch, it still needs to be made more readable.
I propose to encode the pattern in a class. That class then needs a docstring explaining when to use it over standard approaches and why, for example, subclasses of the relevant types won't work with it unless they are added explicitly to the dict.
Sketch:
class FastDispatch:
"""
Optimized version of functools.singledispatch, does not take into account inheritance or protocol membership.
This leads to a speed-up of XXX, as documented in ADR YYY.
Usage:
>>> @Fastdispatch.fastdispatch(Any)
... def extract_connectivity_args(connectivity, *args, **kwargs):
... return None
...
... @extract_connectivity_args(NeighborTableOffsetProvider):
... def extract_connectivity_args_from_nbtable(connectivity, device, *args, **kwargs):
... return (_ensure_is_on_device(connectivity.table, device), zero_tuple)
"""
_registry: dict[type: str]
def __call__(self, dispatch_arg, *args, **kwargs):
return getattr(self, self._registry[type(dispatch_arg)])(dispatch_arg, *args, **kwargs)
def register(self, type):
def decorator(function):
self._registry[type] = function
return function
return decorator
@classmethod
def fastdispatch(cls, default_type):
return decorator(function):
dispatcher = cls()
dispatcher.register(default_type)(function)
return dispatcher
return decorator
Description
Changes to speedup diffusion granule execution from ICON.
isinstance
checks fromextract_connectivity_args
andconvert_args
.warning
from field_maker
.Further (future) optimisations:
ensure_is_on_device
is only necessary now in order to be able to run certain gt4py tests. This should be removed in the future (and done in the gt4py tests itself) as it adds an overhead here that is not necessary.convert_args
still takes up the bulk of time, can we get rid of it or improve it?