Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat[next]: Optimisations for icon4py #1536

Open
wants to merge 15 commits into
base: main
Choose a base branch
from

Conversation

samkellerhals
Copy link
Contributor

@samkellerhals samkellerhals commented Apr 22, 2024

Description

Changes to speedup diffusion granule execution from ICON.

  • Removing isinstance checks from extract_connectivity_args and convert_args.
  • Removing warning from field _maker.
  • Allows connectivities to be passed (effectively allowing caching of connectivities on the caller side)

Further (future) optimisations:

  • All connectivities are cached on the icon4py side, so the use of ensure_is_on_device is only necessary now in order to be able to run certain gt4py tests. This should be removed in the future (and done in the gt4py tests itself) as it adds an overhead here that is not necessary.
  • convert_args still takes up the bulk of time, can we get rid of it or improve it?
  • We have to pass array sizes for each dimension to gt4py stencils if no explicit domain bounds are defined, can this be standardised?

@samkellerhals samkellerhals changed the title Optimisations for icon4py feat[next]: Optimisations for icon4py May 6, 2024
Copy link
Contributor

@havogt havogt left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Forwarding the review to @egparedes for improving Python patterns in performance critical code parts.

@@ -1000,8 +999,6 @@ def _shift_field_indices(
def np_as_located_field(
*axes: common.Dimension, origin: Optional[dict[common.Dimension, int]] = None
) -> Callable[[np.ndarray], common.Field]:
warnings.warn("`np_as_located_field()` is deprecated, use `gtx.as_field()`", DeprecationWarning) # noqa: B028 [no-explicit-stacklevel]
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why this change? Can you undo?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see it was intentional. Still should be undone and fixed on the icon4py side.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So basically emitting the warning was slowing things down on the icon4py side when creating the gt4py fields when calling the granule. In what way could this be fixed on the icon4py side?

Copy link
Contributor Author

@samkellerhals samkellerhals May 7, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The reason we currently still use np_as_located_field is because it gives you back a numpy array view of the pointer we pass from fortran. If we switch to as_field it does not use the same memory location any longer (I guess it makes a copy?) which should be fixed on the gt4py side.

Copy link
Contributor Author

@samkellerhals samkellerhals May 7, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I propose to just wrap the warning in an if __debug__ for now, so that it can be disabled in -O mode.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, I see, then it probably makes sense to use (temporarily) the common._field() function to wrap an array into a field.

Comment on lines +82 to +85
# If we don't pass them as in the case of a CachedProgram extract connectivities here.
if conn_args is None:
conn_args = extract_connectivity_args(offset_provider, device)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

feels like code should be refactored such that we always pass conn_args here. Probably makes sense to do this change in the context of a coarser refactoring. @DropD might have ideas.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Once I get size args extraction moved to this stage, I will look into adding that.

return arr, origin


type_handlers_convert_args = {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Contributor Author

@samkellerhals samkellerhals May 7, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it could work, however whilst nicer I have the feeling that it would be slower than a simple dictionary lookup.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it would be preferable to sacrifice readability only on the basis of hard evidence.

@havogt
Copy link
Contributor

havogt commented May 6, 2024

But in general, could make sense to discuss this in the context of going towards frozen programs.

@samkellerhals samkellerhals requested a review from DropD May 15, 2024 13:12
Comment on lines +95 to +110
def handle_connectivity(
conn: NeighborTableOffsetProvider, zero_tuple: tuple[int, ...], device: core_defs.DeviceType
) -> ConnectivityArg:
return (_ensure_is_on_device(conn.table, device), zero_tuple)


def handle_other_type(*args: Any, **kwargs: Any) -> None:
return None


type_handlers_connectivity_args = {
NeighborTableOffsetProvider: handle_connectivity,
common.Dimension: handle_other_type,
}


Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If this pattern proves to be a significant optimization over singledispatch, it still needs to be made more readable.

I propose to encode the pattern in a class. That class then needs a docstring explaining when to use it over standard approaches and why, for example, subclasses of the relevant types won't work with it unless they are added explicitly to the dict.

Sketch:

class FastDispatch:
    """
    Optimized version of functools.singledispatch, does not take into account inheritance or protocol membership.
    
    This leads to a speed-up of XXX, as documented in ADR YYY.
    
    Usage:
    >>> @Fastdispatch.fastdispatch(Any)
    ... def extract_connectivity_args(connectivity, *args, **kwargs):
    ...     return None
    ...
    ... @extract_connectivity_args(NeighborTableOffsetProvider):
    ... def extract_connectivity_args_from_nbtable(connectivity, device, *args, **kwargs):
    ...     return (_ensure_is_on_device(connectivity.table, device), zero_tuple)
    """
    _registry: dict[type: str]
    
    def __call__(self, dispatch_arg, *args, **kwargs):
        return getattr(self, self._registry[type(dispatch_arg)])(dispatch_arg, *args, **kwargs)
        
    def register(self, type):
        def decorator(function):
            self._registry[type] = function
            return function
        return decorator
        
    @classmethod
    def fastdispatch(cls, default_type):
        return decorator(function):
            dispatcher = cls()
            dispatcher.register(default_type)(function)
            return dispatcher
        return decorator

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants