Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[ArrowStringArray] API: StringArray -> ObjectStringArray #40962

Closed

Conversation

simonjayhawkins
Copy link
Member

xref #35169 (comment)

this PR is draft as this change has not yet been discussed. (maybe could keep the StringArray the same and create a StringArrayBase instead)

also probably want to do this after #39908 (which is not yet ready) and then move the common base class into a separate module along with the dtype. (this could be string_base.py or create subdir with maybe base.py, string_arrow.py and string_python.py)

we should leave moving things around for a follow-up to keep the diff simpler to review.

This PR could be a precusor to a follow-up to #40708 to remove duplication #40708 (comment) (or could be done with common helper function or mixin #40708 (comment))

No deduplication here. just creating base class and changes to make the tests pass

This is API breaking, but for experimental API, so we would probably need some documentation or add to #40747 if we do this after #39908

@simonjayhawkins simonjayhawkins added API Design Strings String extension data type and string data labels Apr 15, 2021
@simonjayhawkins simonjayhawkins added this to the 1.3 milestone Apr 15, 2021
@simonjayhawkins
Copy link
Member Author

No deduplication here. just creating base class and changes to make the tests pass

more doctstings also need updating. will update shortly.

@jorisvandenbossche
Copy link
Member

(maybe could keep the StringArray the same and create a StringArrayBase instead)

Yes, I would do that, let's keep the existing StringArray name (for the class with object type) the same

@simonjayhawkins
Copy link
Member Author

Yes, I would do that, let's keep the existing StringArray name (for the class with object type) the same

I would tend to agree. Most of the changes here are the name change.

I'll close this for now. we can maybe consider renaming after the base class is in play. i'll do that in another PR just in case we want to come back to this.

@jreback
Copy link
Contributor

jreback commented May 12, 2021

looking at this again. I think we should do this, not averse to leaving StringArray = PythonStringArray. I just think the code would be less confusing.

@jorisvandenbossche
Copy link
Member

Why is it that confusing to have a BaseStringArray for shared code, and then two concrete subclasses StringArray and ArrowStringArray?

@jreback
Copy link
Contributor

jreback commented May 13, 2021

simple, since we are planning dtypes named string[python] and string[pyarrow] these match the array names. (not objecting to shared code in BaseStringArray, nor having a back-compa alias of StringArray

@simonjayhawkins
Copy link
Member Author

from the comments here and elsewhere, I think we want a BaseStringArray (which cannot be called StringArray for back compat)

we want to keep StringArray for now but also want PythonStringArray with StringArray being an alias for PythonStringArray.

we could maybe also deprecate StringArray too?

will reopen this and make those changes.

@jorisvandenbossche
Copy link
Member

As mentioned above, I don't see the upside of changing the "StringArray" name. Let's just make a BaseStringArray class if needed, and leave the other names as is?

@simonjayhawkins
Copy link
Member Author

Let's just make a BaseStringArray class if needed, and leave the other names as is?

That's quite straightforward, so will open another PR with just that (maybe could even add to #39908 to address #39908 (comment) directly)

and keep this PR for the name change discussion.


class StringArray(ExtensionArray):
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

will rename this StringArrayBase in next commit

@@ -186,7 +186,7 @@ def rep(x, r):

repeats = np.asarray(repeats, dtype=object)
result = libops.vec_binop(np.asarray(self), repeats, rep)
if isinstance(self, (StringArray, ArrowStringArray)):
if isinstance(self, (PythonStringArray, ArrowStringArray)):
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

could now use the base class. will change in next commit

@@ -119,7 +119,7 @@ def array(
:class:`datetime.timedelta` :class:`pandas.arrays.TimedeltaArray`
:class:`int` :class:`pandas.arrays.IntegerArray`
:class:`float` :class:`pandas.arrays.FloatingArray`
:class:`str` :class:`pandas.arrays.StringArray`
:class:`str` :class:`pandas.arrays.PythonStringArray`
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

there is overlap here with the dtype change. could add the docstring to the baseclass and no need to reference the two array types.

@jreback
Copy link
Contributor

jreback commented Jun 2, 2021

maybe let's call this ObjectStringArray? (as more accurate / easier to digest name)

@simonjayhawkins simonjayhawkins removed this from the 1.3 milestone Jun 3, 2021
@simonjayhawkins simonjayhawkins changed the title [ArrowStringArray] API: StringArray -> PythonStringArray [ArrowStringArray] API: StringArray -> ObjectStringArray Jun 5, 2021
@github-actions
Copy link
Contributor

github-actions bot commented Jul 7, 2021

This pull request is stale because it has been open for thirty days with no activity. Please update or respond to this comment if you're still interested in working on this.

@github-actions github-actions bot added the Stale label Jul 7, 2021
@simonjayhawkins
Copy link
Member Author

This PR needs are merge with master to be able to see clearly the changes. will hopefully circle back to this this week. This basically was creating a StringArray class which dispatches to either ArrowStringArray or ObjectStringArray which are both EAs in their own right but hidden from the user and also addressed the astype issue #41856

@jreback
Copy link
Contributor

jreback commented Nov 28, 2021

this is quite old, happen to reopen if actively worked on.

@jreback jreback closed this Nov 28, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
API Design Stale Strings String extension data type and string data
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants