

Add Names type to allow selecting elements by excluding others #1

Merged: 1 commit, merged Nov 17, 2013


nalimilan
Contributor

Here's a first stab at the kind of convenience syntax I spoke about on the mailing list. Comments welcome.

This allows things like:

julia> reload("src/NamedArray.jl")
       n = NamedArray(rand(2,4))
       setnames!(n, ["one", "two"], 1)
["one"=>1,"two"=>2]

julia> n[!Names(["one"]), Names(["1"])]
NamedArray{Float64,2}
names: two
1
dimnames: A B

.14859425443510443

To save more typing, the type could be renamed to e.g. N.

Add a simple immutable Names type storing a vector of names and
a boolean indicating whether these names should be retained or
excluded when indexing. Add support for this type in getindex().
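A minimal sketch of the idea in the commit message, written for current Julia 1.x (`struct` rather than the 2013 `immutable`) and not taken from the actual commit: a wrapper holding a vector of names plus an exclude flag, `!` flipping that flag, and a hypothetical helper `positions` that turns the selector into integer positions given one dimension's name => position dictionary. How the package wires this into `getindex()` is not shown here.

```julia
# Hypothetical sketch, not the code from the commit: a selector storing
# a vector of names plus a flag saying whether to keep or exclude them.
struct Names{T}
    names::Vector{T}
    exclude::Bool
end

# Names(["one"]) keeps the listed names.
Names(names::AbstractVector{T}) where {T} = Names{T}(collect(names), false)

# !Names(["one"]) flips the flag: "every name except these".
Base.:!(n::Names{T}) where {T} = Names{T}(n.names, !n.exclude)

# Hypothetical helper: translate a selector into integer positions along one
# dimension, given that dimension's name => position dictionary.
function positions(dict::AbstractDict, sel::Names)
    listed = [dict[k] for k in sel.names]   # KeyError for an unknown name
    sel.exclude || return listed
    # Exclusion: all positions of the dimension, in order, minus the listed ones.
    return [i for i in sort(collect(values(dict))) if !(i in listed)]
end

# Usage: keep every row except the one named "one".
rows = Dict("one" => 1, "two" => 2)
A = rand(2, 4)
A[positions(rows, !Names(["one"])), :]   # 1x4 matrix holding only row "two"
```

Storing the flag instead of a separate negated type means that `!!Names(x)` simply returns to the inclusive form.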
@nalimilan
Contributor Author

BTW, is the fact that names can be set to any type, not only strings, intended?

julia> setnames!(n, [1:4], 2)
[4=>4,2=>2,3=>3,1=>1]

julia> xdump(n)
NamedArray{Float64,2} 
  array: Array(Float64,(2,4)) 2x4 Array{Float64,2}:
 0.911343  0.60882  0.453109  0.272036
 0.148594  0.12549  0.336926  0.518748
  names: Array(Array{T,1},(2,)) [["one","two"],[1,2,3,4]]
  dimnames: Array(ASCIIString,(2,)) ["A","B"]
  dicts: Array(Dict{K,V},(2,)) [["one"=>1,"two"=>2],[4=>4,2=>2,3=>3,1=>1]]

I think this could be useful in some situations, but this requires some type checking (as I do in the commit), and at the moment I see no way to index using names that are not strings. My proposal would also allow that.

davidavdav added a commit that referenced this pull request Nov 17, 2013
Add Names type to allow selecting elements by excluding others
@davidavdav davidavdav merged commit 62c2a1c into davidavdav:master Nov 17, 2013
@nalimilan nalimilan deleted the Names branch November 17, 2013 19:37
@nalimilan
Contributor Author

Thanks!

@davidavdav
Owner

Hi Milan,

I simplified the code a bit, realising there is not really a need for
Names(names) without the exclusion.

So now there is only a NoNames type, whose only function is to be
recognised in indices(). The functions !(String) and !(Vector{String})
will construct NoNames objects.

What do you think?
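A rough sketch of the simplified design described above, for Julia 1.x and not the actual patch: `NoNames` only means "exclude these names", and `!` on a string or a vector of strings builds it. The helper `keep_positions` is a hypothetical stand-in for the step done in `indices()`. Adding `!` methods for `String` and `Vector` is type piracy in today's Julia; it is done here only to mirror the proposed syntax.

```julia
# Hypothetical sketch: a type that only means "exclude these names".
struct NoNames
    names::Vector{String}
end

# !"two" and !["one", "three"] construct NoNames objects
# (type piracy on Base.! in modern Julia, used only to mirror the proposal).
Base.:!(s::AbstractString) = NoNames([String(s)])
Base.:!(v::AbstractVector{<:AbstractString}) = NoNames(String.(v))

# Hypothetical stand-in for the indices() step: the positions to keep
# along one dimension, given its name => position dictionary.
function keep_positions(dict::AbstractDict{String,Int}, sel::NoNames)
    drop = Set(dict[k] for k in sel.names)   # KeyError for an unknown name
    return sort([i for i in values(dict) if !(i in drop)])
end

# Usage: a dimension with three names.
cols = Dict("one" => 1, "two" => 2, "three" => 3)
keep_positions(cols, !"two")              # positions 1 and 3
keep_positions(cols, !["one", "three"])   # position 2
```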

I also have some support for negative indexes, so you can also say

n[:,-2]

now. It doesn't always work, because n[2,-2] is dispatched through another
function.
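One way to avoid the dispatch problem mentioned above is to route every call through a single function that normalises all indices before delegating to ordinary indexing. A minimal sketch on a plain Julia 1.x array, assuming the convention that `-k` means "every position except k"; `expandindex` and `negindex` are hypothetical names, not part of NamedArray.

```julia
# Hypothetical sketch: a negative integer -k along a dimension of length
# `len` expands to every position except k.
expandindex(len::Integer, i::Integer) = i < 0 ? [j for j in 1:len if j != -i] : i
expandindex(len::Integer, i) = i   # Colon, ranges and vectors pass through unchanged

# Single entry point: expand each index against its own dimension,
# then defer to ordinary indexing, so (:, -2) and (2, -2) take the same path.
negindex(A::AbstractArray, I...) =
    A[ntuple(d -> expandindex(size(A, d), I[d]), length(I))...]

# Usage on a 2x4 matrix [1 3 5 7; 2 4 6 8]:
A = reshape(collect(1:8), 2, 4)
negindex(A, :, -2)   # columns 1, 3 and 4
negindex(A, 2, -2)   # row 2 without column 2
```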

Then the next question is: should we change the syntax !(String) to
-(String) for consistency?

---david

Will this email be delivered? And will it be stored on Github? No idea...
Let's see.

On Sun, Nov 17, 2013 at 8:37 PM, Milan Bouchet-Valat <
notifications@github.com> wrote:

Thanks!


David van Leeuwen

@nalimilan
Contributor Author

I don't really like using NoNames rather than Names, for a few reasons:

  • If not only strings, but any type, can be used as names, then Names could be used to index an array using its names. In my experience this could be useful in a few cases, if that's not too much a burden to support.
  • If base Julia does not accept the -"a string" or !"a string" syntax, !Names looks nicer to my eyes than NoNames (which sounds like "those elements that have no names").
  • Overall, I think it looks more natural to have a Names object which can be negated, with indexing by string being a shorthand for it, rather than a custom NoNames type.

Regarding whether - or ! should be used, I'm not sure. Indeed for consistency with numbers - could be preferred. Or we could allow both.

As for indexing with negative integers, I'm all for it, but I think it should go into the standard Array object first. Else the inconsistency will be confusing.

@davidavdav
Owner

Hi,

On Mon, Nov 18, 2013 at 10:14 AM, Milan Bouchet-Valat
notifications@github.com wrote:

I don't really like using NoNames rather than Names, for a few reasons:

If not only strings, but any type, can be used as names, then Names could be used to index an array using its names. In my experience this could be useful in a few cases, if that's not too much a burden to support.

I don't really understand what Names() does, except making its
argument of type Names. The NamedArray should be able to support
anything that Names supports.

If base Julia does not accept the -"a string" or !"a string" syntax, !Names looks nicer to my eyes than NoNames (which sounds like "those elements that have no names").

My idea was to not export NoNames or Names once NamedArray is in a
module. So I don't really mind what it is called, but since I didn't
see what exactly is the function of Names() other than to support
!Names(), I figured we might skip this step.

Also, I think that claiming Names as a general type might be a bit
bold; I can imagine a completely different use for a type
called Names. Would "Named" be an option, or "NamedIndex", or
"nIndex"?

Overall, I think it looks more natural to have a Names object which can be negated, with indexing by string being a shorthand for it, rather than a custom NoNames type.

I agree that !String is not for general types that may be used as an
index. I think currently in many places there still is the
assumption that the type of the index is <: String.

There is a potential ambiguity if I wanted to use, say, a Range
as a single index, i.e.,

1:5 => 1
5:7 => 2

etc. Pretty hairy, but OK. So what, then, does

n[1:5]

refer to?

Is this where your Names type is meant to help? I.e.,

n[1:5] # traditional meaning
n[Names(1:5)] # expands to n[1] in the above example

I'll revert my change if this is the case.
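The ambiguity can be illustrated with a minimal sketch, assuming a one-dimensional case with the range names from the example: an unwrapped range keeps its positional meaning, while a wrapper says "look this value up as a name". The thread calls that wrapper Names; here it is named `ByName` (hypothetical), together with the hypothetical helpers `pick` and `rangenames`, to keep the sketch separate from any real NamedArray API.

```julia
# Hypothetical wrapper: treat the wrapped value as a name (a dictionary key),
# not as a positional index.
struct ByName{T}
    key::T
end

# One dimension of length 2 whose names are ranges, as in the example.
rangenames = Dict(1:5 => 1, 5:7 => 2)
data = [10.0, 20.0]

# Unwrapped index: ordinary positional meaning.
pick(v, dict, i) = v[i]
# Wrapped index: translate name -> position first, then index.
pick(v, dict, n::ByName) = v[dict[n.key]]

# Usage:
pick(data, rangenames, 1:2)           # positions 1 and 2: [10.0, 20.0]
pick(data, rangenames, ByName(5:7))   # the element named 5:7: 20.0
```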

Still, I wouldn't mind if there would be a shorter version for Names,
perhaps a Greek letter? Talking about hairy...

Regarding whether - or ! should be used, I'm not sure. Indeed for consistency with numbers - could be preferred. Or we could allow both.

As for indexing with negative integers, I'm all for it, but I think it should go into the standard Array object first. Else the inconsistency will be confusing.

See it as an extension. I don't think it is bad if NamedArray
supports negative indexing, and Array doesn't. NamedArray is pretty
much a concept as it is---I'm not sure if people would actually start
using it.

But if the wise people from Base come up with an implementation for
Array I am all for it.

Cheers,

---david



I don't seem to be able to find these mails on github...

David van Leeuwen

@nalimilan
Contributor Author

Non-string indexes would only be supported by using Names, or whatever we call it, for any ambiguous case (Integer and Range, mainly).

But I think the need to export or not Names/NoNames really depends on what Julia developers think is reasonable, since the ultimate goal is to get NamedArrays in base Julia or in a blessed package. If they accept the -"a string" syntax, then I agree exporting it does not add any value. (If indexing by non-strings is supported, exporting might still be useful, but we could find a less generic name indeed.)

Anyway, it will be interesting to follow the design of NegatedIndex from JuliaLang/julia#1032 : NoNames is really NegatedNames. And the implementation looks very similar. The choice of ! or - or not() (see JuliaData/DataFrames.jl#182) should also be consistent.

@davidavdav
Owner

OK,

We'll leave the implementation as a hybrid ! or -, we can later cut out the
unwanted version.

I put Names() back. I tried using it with a Range1 type, but that was very
awkward to define (as [Range] expanded the range into its elements), and in
the end it still didn't work, even though the keys in the dictionary were correct.

---david

On Mon, Nov 18, 2013 at 1:34 PM, Milan Bouchet-Valat <
notifications@github.com> wrote:

Non-string indexes would only be supported by using Names, or whatever we
call it, for any ambiguous case (Integer and Range, mainly).

But I think the need to export or not Names/NoNames really depends on what
Julia developers think is reasonable, since the ultimate goal is to get
NamedArrays in base Julia or in a blessed package. If they accept the -"a
string" syntax, then I agree exporting it does not add any value. (If
indexing by non-strings is supported, exporting might still be useful, but
we could find a less generic name indeed.)

Anyway, it will be interesting to follow the design of NegatedIndex from
JuliaLang/julia#1032:
NoNames is really NegatedNames. And the implementation looks very similar.
The choice of ! or - or not() (see JuliaData/DataFrames.jl#182)
should also be consistent.



David van Leeuwen
