Stop treating lists as typed arrays #2

Open
LTLA opened this issue Nov 7, 2023 · 0 comments

Comments

@LTLA
Member

LTLA commented Nov 7, 2023

We should stop considering lists to be typed arrays, because they're not.

Currently, a list of strings is treated as a typed array of strings. This is problematic because:

  • Every function needs to scan the list to check that, indeed, the list only contains strings.
  • Every function also needs to scan the list to check whether the list contains None values to represent missing strings.
  • It also introduces ambiguity, e.g., is a list of strings to be interpreted as a typed array that can only ever contain strings or as an unstructured list of arbitrary objects? What should we guess [] to be? (This has consequences for singledispatch.)
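
To make the points above concrete, here is a minimal sketch of what dispatching on a plain list involves. The names `is_list_of_strings` and `describe` are hypothetical and not part of any existing package; the point is only that `singledispatch` sees `list` and nothing else, so the element type has to be recovered by scanning, and `[]` cannot be resolved at all.

```python
from functools import singledispatch

import numpy as np


def is_list_of_strings(x: list) -> bool:
    """Hypothetical helper: an O(n) scan is the only way to 'type' a list."""
    # None is tolerated here because it is used to represent a missing string.
    return all(isinstance(y, str) or y is None for y in x)


@singledispatch
def describe(x) -> str:
    return "unknown"


@describe.register(list)
def _(x: list) -> str:
    # singledispatch only knows this is a list; the contents must be inspected.
    if len(x) == 0:
        return "list (element type unknown)"
    if is_list_of_strings(x):
        return "list that happens to hold strings"
    return "list of arbitrary objects"


@describe.register(np.ndarray)
def _(x: np.ndarray) -> str:
    # The dtype is stored on the array, so no scan is needed, even when empty.
    return f"array with dtype kind {x.dtype.kind!r}"
```

With this sketch, `describe([])` cannot say anything about the element type, whereas `describe(np.array([], dtype=str))` still reports a string dtype.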

So I propose that all arrays of strings should now be stored as NumPy arrays with a string dtype (e.g., created via numpy.array), which are closer to R's character vectors than a Python list is. This includes, e.g., the row and column names of the BiocFrame, the levels of the Factor, and so on.
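
As a rough illustration of the proposal, the conversion could happen once at the boundary, after which type checks become a constant-time dtype lookup. Both `as_string_array` and `is_string_array` are hypothetical names, and rejecting `None` outright is an assumption of this sketch; how missing strings should be represented is not settled here.

```python
import numpy as np


def as_string_array(values) -> np.ndarray:
    """Hypothetical helper: coerce a sequence to a NumPy string array once."""
    # Assumption: None is rejected, because dtype=str would otherwise
    # silently turn it into the literal string 'None'.
    if any(v is None for v in values):
        raise ValueError("missing values need an explicit representation")
    return np.array(values, dtype=np.str_)


def is_string_array(x) -> bool:
    """Constant-time check: the dtype carries the element type."""
    return isinstance(x, np.ndarray) and np.issubdtype(x.dtype, np.str_)


row_names = as_string_array(["gene1", "gene2", "gene3"])
empty_names = as_string_array([])  # still unambiguously a string array
```

Downstream functions would then only need `is_string_array(x)` rather than a scan over every element.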

The case is even easier to make for numeric types.
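
A small sketch of the numeric case, using only standard NumPy behaviour: a list can mix Python numeric types freely, while an array commits to a single dtype on construction.

```python
import numpy as np

mixed = [1, 2.5, True]  # a list gives no single element type
types_in_list = {type(v).__name__ for v in mixed}

arr = np.array(mixed)  # NumPy promotes everything to one common dtype
is_numeric = np.issubdtype(arr.dtype, np.number)
```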
