ak.from_iter equivalent to convert from various Julia types into AwkwardArray. #10
At least for the fallback implementation, you can rely on `keys`:

```julia
julia> a = (x = 1, y=2)
(x = 1, y = 2)

julia> keys(a)
(:x, :y)

julia> b = Dict("x" => 1, "y"=>2)
Dict{String, Int64} with 2 entries:
  "x" => 1
  "y" => 2

julia> keys(b)
KeySet for a Dict{String, Int64} with 2 entries. Keys:
  "x"
  "y"
```

Most likely you will get an iterator of dict-like or named-tuple-like objects.
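A minimal sketch of such a fallback, assuming every record in the iterator has the same keys. The helper name `records_to_columns` is hypothetical (not an AwkwardArray.jl function). It relies only on `keys` and indexing, so it accepts both `NamedTuple`s and `Dict`s like the ones above, and it produces one vector per key, roughly the column-wise layout a RecordArray uses.

```julia
# Hypothetical fallback (not an AwkwardArray.jl API): transpose an iterator of
# record-like objects (NamedTuples or Dicts) into one vector per key.
function records_to_columns(records)
    # The keys of the first record define the columns.
    fields = collect(keys(first(records)))
    buffers = Dict(f => Any[] for f in fields)
    for rec in records
        # Every record must have exactly the same set of keys.
        Set(keys(rec)) == Set(fields) ||
            error("inconsistent keys: $(collect(keys(rec)))")
        for f in fields
            push!(buffers[f], rec[f])
        end
    end
    # Narrow each Vector{Any} to a concrete element type where possible.
    columns = Dict(f => map(identity, buffers[f]) for f in fields)
    return fields, columns
end

fields, columns = records_to_columns([(x = 1, y = 2.0), (x = 3, y = 4.0)])
```

For `Dict` records, the column order follows the dictionary's iteration order, which is not guaranteed to match insertion order.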
That's good to know. I've also been wondering about structs: they're somewhat like Python's dataclasses and Scala's case classes in their regularity, and since reflection is available, maybe I could get the field names and types and thus ingest arrays of these structs. What I'm thinking of probably has to be a macro operating on types, whereas I'm planning on writing symmetric
Nah, no macro needed:

```julia
julia> a = 1 + 3im;

julia> dump(a)
Complex{Int64}
  re: Int64 1
  im: Int64 3

julia> fieldnames(typeof(a))
(:re, :im)

julia> fieldtypes(typeof(a))
(Int64, Int64)
```

However, you might want to use `propertynames` instead. In general, for simple structs (the default if you don't specialize anything), properties and fields are the same thing; for more complex structs, properties are probably the user-facing ones as far as data is concerned anyway:

```julia
julia> propertynames(a)
(:re, :im)
```
Now say you have a DataFrame, where the two differ:

```julia
julia> using DataFrames

julia> df = DataFrame(x=[1,2,3], y=[4,5,6]);

julia> fieldnames(typeof(df))
(:columns, :colindex, :metadata, :colmetadata, :allnotemetadata)

julia> propertynames(df)
2-element Vector{Symbol}:
 :x
 :y
```

For any table-like back-and-forth translation, we only need to implement a few interfaces from Tables.jl, and we will be able to interoperate with almost the entire Julia table ecosystem.
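Putting `propertynames` and `getproperty` together, a vector of plain structs can be turned into a `NamedTuple` of column vectors, which Tables.jl accepts as a column table. This is a sketch using only Base (not Tables.jl itself); the helper `structs_to_columntable` and the `Point` struct are hypothetical, and it assumes every element has the same properties as the first one.

```julia
# Hypothetical helper: turn a vector of structs into a NamedTuple of column
# vectors. It uses propertynames/getproperty (the user-facing view) rather
# than fieldnames/getfield (the storage layout).
function structs_to_columntable(items::AbstractVector)
    isempty(items) && error("need at least one element to discover the properties")
    # Assume every element has the same properties as the first one.
    props = Tuple(propertynames(first(items)))
    cols = map(p -> [getproperty(x, p) for x in items], props)
    return NamedTuple{props}(cols)
end

# A hypothetical plain struct: properties and fields coincide by default.
struct Point
    x::Float64
    y::Float64
end

points = structs_to_columntable([Point(1.0, 2.0), Point(3.0, 4.0)])
complexes = structs_to_columntable([1 + 3im, 2 + 5im])   # columns :re and :im
```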
Tables.jl looks like the perfect conversion target, as long as it's possible to put Tables inside of Tables, which I can experiment with. That would nicely narrow the scope from "all Julia objects" to "dataset-like Julia objects."

I guess? I think so, because "a column of the table" can be anything.

I think that would be possible, but it would be complicated to set up and should maybe be a stretch goal. I'll go back to the idea of converting to and from generic data, which doesn't have "row ###, column ###" concepts built in. (It's possible to slice columns through nested lists, so I think we would always be able to give an answer for a requested row and column number, but later.)
I think a RecordArray is unambiguously a ColumnTable, right? You simply return each column when asked (no copy). Other Tables-compatible implementations won't complain at all about what columns you give them, as long as they satisfy those very basic properties.
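To illustrate the "return each column when asked, no copy" idea, here is a toy column-wise record container. `ToyRecordArray` is hypothetical and far simpler than AwkwardArray.jl's actual RecordArray; it overloads `propertynames` and `getproperty` so that the user-facing properties are the column names, and `getproperty` hands back the stored vector itself.

```julia
# Hypothetical toy type (not AwkwardArray.jl's RecordArray): a record array
# stored column-wise, whose columns are handed out without copying.
struct ToyRecordArray{C<:NamedTuple}
    columns::C
end

# Expose the column names as the user-facing properties...
Base.propertynames(r::ToyRecordArray) = keys(getfield(r, :columns))

# ...and return the stored vector itself (no copy) when a column is requested.
Base.getproperty(r::ToyRecordArray, name::Symbol) = getfield(r, :columns)[name]

# Number of rows = length of any column (all columns are assumed equally long).
Base.length(r::ToyRecordArray) = length(first(getfield(r, :columns)))

xs = [1.1, 2.2, 3.3]
r = ToyRecordArray((x = xs, y = [1, 2, 3]))
```

Because `r.x === xs`, mutating a returned column also mutates the container; that is the trade-off of not copying.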
But yeah, don't worry about it for now; I can give it a try later if you want!
The biggest mismatch with the Tables data model is that an Awkward Array without any RecordArrays (at any level of nesting) has no columns. When interacting with Arrow, I solved that problem by making the column named

```julia
julia> Symbol("")
Symbol("")

julia> :("")
""
```

and wasn't convinced I was getting empty-string symbols. (If

But the reason that I think Tables.jl's model of accessing everything by (row, column) coordinates (e.g. random access column on an

For some records inside of lists of lists:

```python
>>> import awkward as ak
>>> array = ak.Array([[[{"x": 1.1, "y": [1]}, {"x": 2.2, "y": [1, 2]}], []], [], [[{"x": 3.3, "y": [1, 2, 3]}]]])
>>> array.show(type=True)
type: 3 * var * var * {
    x: float64,
    y: var * int64
}
[[[{x: 1.1, y: [1]}, {x: 2.2, y: [1, 2]}], []],
 [],
 [[{x: 3.3, y: [1, 2, 3]}]]]
```

You don't have to explicitly dig through the lists of lists to pick out one column:

```python
>>> array["x"].show(type=True)
type: 3 * var * var * float64
[[[1.1, 2.2], []],
 [],
 [[3.3]]]
>>> array["y"].show(type=True)
type: 3 * var * var * var * int64
[[[[1], [1, 2]], []],
 [],
 [[[1, 2, 3]]]]
```

So if nested lists are row-oriented Tables within Tables, all with the same set of columns, asking for column
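The column projection that `array["x"]` performs can be mimicked on plain nested Julia vectors by recursing through lists until a record is reached. This is a hypothetical sketch that treats `NamedTuple`s as records and `AbstractVector`s as lists; Awkward Array itself projects columnar buffers rather than recursing over objects, so this only illustrates the semantics. Selecting a row first and then projecting, as in `project(nested[3], :x)`, gives the kind of (row, column) answer mentioned earlier.

```julia
# Hypothetical sketch: project one field out of records nested inside
# arbitrarily deep lists, keeping the list structure intact.
project(record::NamedTuple, field::Symbol) = record[field]
project(list::AbstractVector, field::Symbol) = [project(item, field) for item in list]

# The same data as the Python example, as plain nested Julia vectors.
nested = [
    [[(x = 1.1, y = [1]), (x = 2.2, y = [1, 2])], []],
    [],
    [[(x = 3.3, y = [1, 2, 3])]],
]

xs = project(nested, :x)          # equal to [[[1.1, 2.2], []], [], [[3.3]]]
ys = project(nested, :y)          # equal to [[[[1], [1, 2]], []], [], [[[1, 2, 3]]]]
row3_x = project(nested[3], :x)   # equal to [[3.3]]
```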
That's indeed an empty symbol:

```julia
julia> Symbol() === Symbol("")
true
```

Yeah, I see what you mean; that's indeed too "flexible" (i.e. a "do something instead of throwing an error" philosophy) for a regular Tables.jl table.
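For an array without any records, the single-column workaround could be sketched like this, assuming the empty symbol is used as the column name. The function name `as_single_column` is hypothetical, and as noted above, this is probably too permissive for a regular Tables.jl table; it only shows that a `NamedTuple` can carry an empty-symbol key that wraps the original array without copying.

```julia
# Hypothetical sketch: wrap an array that has no records in a one-column
# NamedTuple whose only key is the empty symbol.
as_single_column(column::AbstractVector) = NamedTuple{(Symbol(""),)}((column,))

jagged = [[1, 2], Int[], [3]]
table = as_single_column(jagged)
```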