Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[stdlib] Fix iterator ergonomics #3700

Open
wants to merge 44 commits into
base: main
Choose a base branch
from

Conversation

martinvuyk
Copy link
Contributor

@martinvuyk martinvuyk commented Oct 22, 2024

Fix iterator ergonomics. Add iter() -> Iterator and next[T](Iterator[T]) -> T for most common iterators, add missing reversed() -> Iterator[forward=False] for Stringlike types

Signed-off-by: martinvuyk <martin.vuyklop@gmail.com>
Signed-off-by: martinvuyk <martin.vuyklop@gmail.com>
@martinvuyk martinvuyk requested a review from a team as a code owner October 22, 2024 02:19
@soraros
Copy link
Contributor

soraros commented Oct 22, 2024

I think the name is dictated by the compiler, thus can't be changed.

@martinvuyk
Copy link
Contributor Author

Hi @soraros yep, was just about to @JoeLoser . IMO it needs to be changed as it deviates from other such dunders e.g. is_not and overall snake case conventions. I can revert the change if deemed unnecessary 🤷‍♂️

Signed-off-by: martinvuyk <martin.vuyklop@gmail.com>
Signed-off-by: martinvuyk <martin.vuyklop@gmail.com>
@JoeLoser
Copy link
Collaborator

Hi @soraros yep, was just about to @JoeLoser . IMO it needs to be changed as it deviates from other such dunders e.g. is_not and overall snake case conventions. I can revert the change if deemed unnecessary 🤷‍♂️

Yeah, its name is coupled in the compiler, so we'd need to change it both in the library and compiler. I'm not wedded to the name that @Mogball chose here, but I'll let him chime in as well. CC: @ConnorGray in case you have opinions too.

Signed-off-by: martinvuyk <martin.vuyklop@gmail.com>
Signed-off-by: martinvuyk <martin.vuyklop@gmail.com>
Signed-off-by: martinvuyk <martin.vuyklop@gmail.com>
Signed-off-by: martinvuyk <martin.vuyklop@gmail.com>
Signed-off-by: martinvuyk <martin.vuyklop@gmail.com>
@Mogball
Copy link

Mogball commented Oct 22, 2024

This function is temporary anyways, so the name doesn't really matter

martinvuyk and others added 4 commits October 22, 2024 08:10
Signed-off-by: martinvuyk <martin.vuyklop@gmail.com>
Signed-off-by: martinvuyk <martin.vuyklop@gmail.com>
Copy link
Collaborator

@JoeLoser JoeLoser left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Instead of making everything implement dunder bool, did you consider changing iter function to just directly call self.__hasmore__() in the debug_assert usage?

@martinvuyk
Copy link
Contributor Author

martinvuyk commented Oct 22, 2024

If __hasmore__ is to be temporary then I think it's all the more a reason to have every Iterator implement a __bool__ that returns whether it has more items. It seems a bit more idiomatic to me (weird to even write that for such a young language but you get my meaning).

I'm not so sure about implicit conversion and such, but to me this looks prettier:

iterator = iter("hello world")

fn example_optional_wrapper() -> Optional[StaticString]:
    return next(iterator) if iterator else None

@gryznar
Copy link
Contributor

gryznar commented Oct 22, 2024

IMO iterator.__has_more__() much better shows the intent than bool(iterator)

@martinvuyk
Copy link
Contributor Author

@gryznar does this seem readable to you?

    fn read(
        self, start: UInt = 0
    ) -> (UInt, JsonType, Self._Sp, UInt) as output:
        """Read the next `JsonType` **skipping over whitespace**.

        Args:
            start: The absolute offset in bytes to start reading a valid
                `JsonType`.

        Returns:
            The absolute offset, its `JsonType`, the `Span` containing it, and
            the start of the next item (does not check bounds).
        """

        alias `{` = UInt8(ord("{"))
        alias `}` = UInt8(ord("}"))
        alias `[` = UInt8(ord("["))
        alias `]` = UInt8(ord("]"))
        alias `,` = UInt8(ord(","))
        alias `"` = UInt8(ord('"'))
        alias ` ` = UInt8(ord(" "))
        alias `\n` = UInt8(ord("\n"))
        alias `\t` = UInt8(ord("\t"))
        alias `\r` = UInt8(ord("\r"))

        invalid = start, JsonType(JsonType.invalid), self._buffer, start

        ptr = self._buffer.unsafe_ptr()
        debug_assert(
            start <= len(self._buffer), "start is bigger than buffer length"
        )
        iterator = StringSlice[origin](
            unsafe_from_utf8_ptr=ptr + start, len=len(self._buffer) - start
        ).__iter__()
        iter_len = len(iterator)
        if iter_len == 0:
            output = invalid
            return
        char = iterator.__next__()
        while (
            char.byte_length() > 0
            and Self.isspace(char.unsafe_ptr()[0])
            and iterator.__hasmore__()
        ):
            char = iterator.__next__()
        if not char:
            output = invalid
            return
        char_p = char.unsafe_ptr()
        b0_char = char_p[0]
        if b0_char == `{`:
            if not iterator.__hasmore__():
                output = invalid
                return
            char = iterator.__next__()
            char_p = char.unsafe_ptr()
            b0_char = char_p[0]
            if Self.isspace(b0_char):
                if not iterator.__hasmore__():
                    output = invalid
                    return
                char = iterator.__next__()
                char_p = char.unsafe_ptr()
                b0_char = char_p[0]

            if b0_char == `}`:
                ...  # TODO: object end
            elif b0_char == `,`:
                ...  # TODO: next item
        elif b0_char == `[`:
            if not iterator.__hasmore__():
                output = invalid
                return
            char = iterator.__next__()
            char_p = char.unsafe_ptr()
            b0_char = char_p[0]
            if Self.isspace(b0_char):
                if not iterator.__hasmore__():
                    output = invalid
                    return
                char = iterator.__next__()
                char_p = char.unsafe_ptr()
                b0_char = char_p[0]

            if b0_char == `]`:
                ...  # TODO: array end
            elif b0_char == `,`:
                ...  # TODO: next item
        elif b0_char == `"`:
            ...  # TODO: string
        # TODO: value
        # TODO: escaped_value
        # TODO: parametrized support for NaN
        # TODO: parametrized support for Infinity
        # TODO: parametrized support for -Infinity
        output = invalid

yes this very much something I threw together without much thought and can do capturing functions to prettify some of the stuff, but it is still very ugly even adding iter and next builtins

@gryznar
Copy link
Contributor

gryznar commented Oct 22, 2024

@martinvuyk if in case above most code operates on dunders, it is more consistent to do the same with __has_more__(). Prettifying stuff is just matter of design choices which IMO is out of scope of this pr

@martinvuyk
Copy link
Contributor Author

This PR is exactly about the ergonomics of iterator related operations... dunders aren't meant to be used directly.

@gryznar
Copy link
Contributor

gryznar commented Oct 22, 2024

Yeah, sure, but this is for Modular team to make decision here :) Your proposition bool(iterator) deviates from Python and that's why this is not the best approach for me:
image

Following Python approach, something like has_more(iter) should be introduced

@martinvuyk
Copy link
Contributor Author

martinvuyk commented Oct 22, 2024

@gryznar Python raises when iterators have no more values, which is problematic for many reasons

>>> next(iter([]))
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
StopIteration

The Iterator proposal I did (which was rejected) was like Rust. Rust returns Option<T> which has a performance hit (building and destroying the optional instance and then unwrapping after runtime if check) and thus the Modular team (I guess) decided to go for an approach where you know the length of the iterator (C and C++ approach) or you have a method that determines whether to keep iterating (functional approach). I'm not privy to compiler internals so I can't say for sure.

What I am proposing here is still indirectly following the Optional style logic and IMO seems coherent with the way Mojo is turning out:

fn print_next_value(inout iterator: _StringSliceIter) -> Bool:    
    if not iterator:
        return False
    print(next(iterator))
    return True

fn print_newline_optional(a: Optional[StringLiteral]):
    if not a:
        print("a is empty")
        return
    iterator = iter(a.value())
    while print_next_value(iterator):
        continue
    print("finished printing")

fn main():
    a = "hello world"
    print_newline_optional(a)

And most Python code needing to work with iterators directly (which very few people do) is very clunky (if you follow best practices)

from typing import Iterator

def print_next_value(iterator: Iterator[str]) -> bool:
    try:
        print(next(iterator))
        return True
    except StopIteration:
        return False
    except:
        raise Exception("python can surprise you in unpleasant ways :')")

def print_newline_optional(a: str | None):
    if a is None:
        print("a is empty")
        return
    iterator = iter(a)
    while print_next_value(iterator):
        continue
    print("finished printing")

if __name__ == "__main__":
    a = "hello world"
    print_newline_optional(a)

We can actually "improve" Python here, there is no need to do everything the exact same. We can keep compatibility by adding an iterator wrapper that behaves like Python's if we wish to (probably easier said than done).

@JoeLoser JoeLoser requested a review from ConnorGray October 23, 2024 22:16
martinvuyk and others added 2 commits November 6, 2024 12:23
@JoeLoser
Copy link
Collaborator

!sync

…slice tests

Signed-off-by: martinvuyk <martin.vuyklop@gmail.com>
Signed-off-by: martinvuyk <martin.vuyklop@gmail.com>
Signed-off-by: martinvuyk <martin.vuyklop@gmail.com>
Signed-off-by: martinvuyk <martin.vuyklop@gmail.com>
Signed-off-by: martinvuyk <martin.vuyklop@gmail.com>
Signed-off-by: martinvuyk <martin.vuyklop@gmail.com>
@martinvuyk
Copy link
Contributor Author

@JoeLoser gentle ping, and also to ask if you want me to split this PR. IMHO it makes no difference since this barely changes any functionality, it's just wrappers and docstrings. But if it will be faster for you guys just let me know and I'll do it.

@skongum02 skongum02 deleted the branch modular:main January 29, 2025 18:58
@skongum02 skongum02 closed this Jan 29, 2025
@skongum02 skongum02 reopened this Jan 29, 2025
@skongum02 skongum02 changed the base branch from nightly to main January 29, 2025 20:49
Signed-off-by: martinvuyk <martin.vuyklop@gmail.com>
Signed-off-by: martinvuyk <martin.vuyklop@gmail.com>
Signed-off-by: martinvuyk <martin.vuyklop@gmail.com>
Signed-off-by: martinvuyk <martin.vuyklop@gmail.com>
Signed-off-by: martinvuyk <martin.vuyklop@gmail.com>
Signed-off-by: martinvuyk <martin.vuyklop@gmail.com>
Signed-off-by: martinvuyk <martin.vuyklop@gmail.com>
Signed-off-by: martinvuyk <martin.vuyklop@gmail.com>
Signed-off-by: martinvuyk <martin.vuyklop@gmail.com>
Signed-off-by: martinvuyk <martin.vuyklop@gmail.com>
Signed-off-by: martinvuyk <martin.vuyklop@gmail.com>
Signed-off-by: martinvuyk <martin.vuyklop@gmail.com>
Signed-off-by: martinvuyk <martin.vuyklop@gmail.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
imported-internally Signals that a given pull request has been imported internally.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

7 participants