
Request - flags to allow additional arguments to be ignored and values supplied as "missing" to be overridden. #153

Open
Lincoln-Hannah opened this issue May 29, 2023 · 7 comments

@Lincoln-Hannah

The objective of this request is to make it very easy to use AsTable(:) to convert a DataFrame to an array of structs (AsTable(:) passes a DataFrame row as a named tuple).
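Since a DataFrame row arrives as a named tuple, it matters whether it is splatted positionally or as keywords. A minimal plain-Julia sketch, using `Base.@kwdef` as a dependency-free stand-in for Parameters.jl's `@with_kw` (the `Point` type is hypothetical, for illustration only):

```julia
# Base.@kwdef adds a keyword constructor, similar to what Parameters.@with_kw does.
Base.@kwdef struct Point
    x
    y = 0
end

row = (y = 2, x = 1)      # a named tuple whose field order differs from Point's

p_kw  = Point(; row...)   # keyword splat: values are matched to fields by name
p_pos = Point(row...)     # positional splat: values are matched by order, so x = 2, y = 1
```

Only the keyword form can fall back to a field's default, which is what both requested flags rely on.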
Example

@with_kw mystruct   (allow_additional_args  = true,   overwrite_missing = true)
    X
    Y  = 2X
end

@chain begin
         DataFrame( X=[1,2], Y=[1,missing],  Z=[1,2] )

          @rtransform   :mystruct    = mystruct(   AsTable(:)...   ) 
end

#   produces
#   mystruct(1, 1)
#   mystruct(2, 4)

allow_additional_args = true means column Z is ignored (rather than causing an error).

overwrite_missing = true means that when column Y is missing, the default of 2X is used, as it would be if field Y were not supplied, e.g. mystruct(X=1).

More generally: when the struct has 10 or 20 fields and the DataFrame has 50 (or when the struct is created from a larger struct with a superset of fields), it's nice not to have to restate the field names.
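For comparison, a sketch of what the example above takes today without the requested flags. `Base.@kwdef` stands in for `@with_kw`, and a vector of named tuples stands in for the DataFrame rows; both are assumptions made so the sketch runs without packages:

```julia
Base.@kwdef struct MyStruct
    X
    Y = 2X        # default computed from X
end

# Stand-in for the DataFrame rows: Z is not a field of MyStruct, and one Y is missing.
rows = [(X = 1, Y = 1, Z = 1), (X = 2, Y = missing, Z = 2)]

# MyStruct(; rows[1]...) would throw, because Z is not an accepted keyword.
# So each field is restated by hand, and missing is mapped to "use the default".
structs = [ismissing(r.Y) ? MyStruct(X = r.X) : MyStruct(X = r.X, Y = r.Y) for r in rows]
```

With 10 or 20 fields, this per-field restating is exactly the boilerplate the two flags are meant to remove.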

@mauro3
Owner

mauro3 commented Jun 20, 2023

Thanks for the input! First off, DataFrames are a mystery to me, so you'll have to give examples without them.

In principle, I can see that allow_additional_args could be useful. I wonder whether it should be a specific constructor as opposed to a property of the type itself. Something like:

julia> @with_kw struct Mystruct
       a
       b
       end

julia> Mystruct(Parameters.ignore_additional_kwargs, a=1, b=2, c=3)

A bit uglier, but maybe clearer?
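One way the dispatch-based syntax could be sketched is with a singleton sentinel type. `IgnoreAdditionalKwargs` and `ignore_additional_kwargs` are hypothetical names here, not part of Parameters.jl, and `Base.@kwdef` stands in for `@with_kw`:

```julia
# Hypothetical sentinel type; NOT an existing Parameters.jl API.
struct IgnoreAdditionalKwargs end
const ignore_additional_kwargs = IgnoreAdditionalKwargs()

Base.@kwdef struct Mystruct
    a
    b
end

# Extra constructor method selected by dispatch on the sentinel: keep only the
# keywords that name a field of Mystruct and forward them to the keyword constructor.
function Mystruct(::IgnoreAdditionalKwargs; kwargs...)
    kept = (k => v for (k, v) in kwargs if k in fieldnames(Mystruct))
    return Mystruct(; kept...)
end

m = Mystruct(ignore_additional_kwargs, a = 1, b = 2, c = 3)   # c is dropped
```

Because the extra method is opt-in at the call site, the ordinary constructor keeps rejecting unknown keywords; a macro could in principle generate such a method per type.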

Not sure about the other one. Seems too specific?

@Lincoln-Hannah
Author

That's a nice syntax. Adding an ignore_missing parameter to your example:

@with_kw struct MyStruct
    a
    b=2a
end

X = (a=1,b=missing, c=1)

MyStruct(   Parameters.ignore_additional_kwargs,   Parameters.ignore_missing ;  X...  )

gives MyStruct(a=1, b=2)

I encourage you to look at DataFrames with Chain and DataFramesMeta.
It's much more elegant than SQL or Pandas: you can select, filter, aggregate and pivot within a single Chain block, and Bogumił and the others are always helpful.

Maybe there's a better way to work, but my usual process is:

  Read data from SQL databases, CSV or XML files into DataFrames
  Manipulate with Chain and DataFramesMeta macros
  Convert to an array of structs
  Do calculations with broadcasted functions
  Use DataFrames to aggregate and summarize output
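The last three steps of that workflow can be sketched without DataFrames, using a vector of named tuples as the table (the Tables.jl interface treats such a vector as a table, so DataFrame(rows) would accept it). The `Item` type and the `value` function are hypothetical:

```julia
Base.@kwdef struct Item
    price::Float64
    qty::Int
end

# Stand-in for table rows read from a database or CSV file.
rows = [(price = 2.0, qty = 3), (price = 5.0, qty = 1)]

# Convert to an array of structs by splatting each row as keywords.
items = [Item(; r...) for r in rows]

# Do calculations with a broadcasted function.
value(it::Item) = it.price * it.qty
vals = value.(items)

# Back to a row table (a vector of named tuples) for aggregation and summaries.
out = [(price = it.price, qty = it.qty, value = v) for (it, v) in zip(items, vals)]
total = sum(r.value for r in out)
```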

While structs and functions are best for calculations, database tables are the standard for storage.
It would be nice to transition between the two in one short line of code.
This is the motivation for this request and the DataFramesMeta request linked to above.

The ignore_missing parameter is for when a database column is missing or NULL for some rows and you want the struct to supply a default (e.g. b=2a). For me this is common. Maybe it's just me :)

@mauro3
Owner

mauro3 commented Jun 21, 2023

lol, I don't know SQL or Pandas either ;-) But, yes, I probably should learn them...

@Lincoln-Hannah
Author

SQL, yes, cos it's easy and so much data is stored in SQL databases. Don't bother with Pandas unless you have to. IMO Julia's DataFrames ecosystem is the best in any language.

@mauro3
Owner

mauro3 commented Jun 21, 2023

The first one can just be done with a utility function:

using Parameters
@with_kw struct AA
    a = 0
    b = 0
    c = 0
end

function construct_it(T, tup::D) where D
    tf = fieldnames(D)
    fn = fieldnames(T)
    # keep only the fields of T that are present in tup and not missing
    out = (;)
    for f in fn
        if f in tf && tup[f]!==missing
            out = (out..., f=>tup[f])
        end
    end
    T(;out...)
end

construct_it(AA, (a=3, b=7, u=8))

(ok, the function needs a better name). The missing handling could also be incorporated into that function.

I think that would be the best approach.

Edit: added missing-check/feature
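The same helper could also be written more compactly with a filtered generator of `field => value` pairs splatted as keywords. A sketch, again with `Base.@kwdef` standing in for `@with_kw` so it runs without Parameters.jl; restricting `tup` to `NamedTuple` is an assumption of this version:

```julia
Base.@kwdef struct AA
    a = 0
    b = 0
    c = 0
end

# Build T from the entries of `tup` that name a field of T and are not missing;
# unknown keys (like u below) are dropped and missing values fall back to defaults.
function construct_it(T::Type, tup::NamedTuple)
    kept = (f => tup[f] for f in fieldnames(T) if haskey(tup, f) && !ismissing(tup[f]))
    return T(; kept...)
end

x = construct_it(AA, (a = 3, b = 7, u = 8))        # a = 3, b = 7, c = 0
y = construct_it(AA, (a = missing, b = 7, c = 1))  # a falls back to 0
```

Using `haskey` on the named tuple replaces the separate `fieldnames(D)` lookup, and no intermediate tuple is rebuilt on every loop iteration.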

@Lincoln-Hannah
Author

I like that :)

@Lincoln-Hannah
Author

Will you put this in?
