Set the encoding of datasets written by saspy. #317
Comments
Wow, what a great idea. How do I not have support for doing that already!? (I don't). Well, watching hockey right now, so I'll implement support for both of those tomorrow! |
Great! Thanks, enjoy the game! |
Ok, so I've looked at this and have the following info and implementation to throw out; see what you think of it.

First, inencoding= and outencoding= are libname options, and they can be used to make this work today, with saspy having no knowledge of it. You can assign a libref and specify these encodings, and when you use that libref (in your SASdata object, or in libref= on sd2df and df2sd), SAS will use that encoding to transcode to/from the session encoding when reading and writing that data set. That already works today for all of these cases.

Second, the data set option encoding= is used when reading or writing a data set, when specified. And, like most options that are scoped hierarchically (session, libname, data set), the encoding= data set option overrides the libname option, and each of those overrides the session. I already have a dsopts dictionary associated with a SASdata object, and a dsopts= parm for the sd2df methods, which is applied to the data set when reading it to return as a data frame. Adding encoding= to the dsopts will make reading and writing this data set use the specified encoding.

For df2sd, I would need to add an option for the output encoding, which I would use to write the data and then set in the dsopts of the SASdata object returned by the method, so it would already be correct. I think on df2sd() I would call it outencoding=, even though I'll be setting the encoding= data set option with it, so it's not confusing as to which encoding it refers to.

I think this is clean in all cases, and it allows you to apply this to specific data sets. Of course, I'll have to code it all up and test it to be sure all cases work as I expect. That will take longer than just today :) But this all makes sense in my head, so I don't expect to find any problems with it. Does this make sense, and is it what you're looking for? Or is simply assigning a libref with the in/out encodings, which already works, an acceptable solution instead of implementing this?
Just in case you like that instead, then I don't need to add the rest of this; just checking. But I think adding the encoding to the data set is a reasonable addition, so I'm ok adding it. Thoughts? |
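
A minimal sketch of the scoping described in this comment, assuming the simple rule stated there: a data set option overrides a libname option, which overrides the session encoding. The helper names `effective_encoding` and `libname_statement`, the libref `mylib`, and the path `/some/path` are hypothetical placeholders for illustration; they are not part of saspy.

```python
from typing import Optional


def effective_encoding(session: str,
                       libname: Optional[str] = None,
                       dataset: Optional[str] = None) -> str:
    """Pick the encoding that applies: data set option, else libname option, else session."""
    return dataset or libname or session


def libname_statement(libref: str, path: str,
                      inencoding: Optional[str] = None,
                      outencoding: Optional[str] = None) -> str:
    """Format a LIBNAME statement that carries the optional encoding options."""
    options = []
    if inencoding:
        options.append(f"inencoding='{inencoding}'")
    if outencoding:
        options.append(f"outencoding='{outencoding}'")
    return " ".join([f"libname {libref} '{path}'", *options]) + ";"


# Session only, then a libname override, then a data set override on top.
print(effective_encoding("utf-8"))
print(effective_encoding("utf-8", libname="latin9"))
print(effective_encoding("utf-8", libname="latin9", dataset="latin1"))

# The libref-only route that already works without any saspy changes.
print(libname_statement("mylib", "/some/path", outencoding="latin9"))
```

The libref route applies one encoding to everything written through that libref, while the data set option route targets a single table, which is the trade-off being weighed in this comment.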
Hey Tom, The implementation you describe above sounds good to me. |
Cool, then I'll work on implementing this! To add the outencoding to df2sd will require adding (in)encoding else you'd not be able to access the table you just wrote. I'll post when I have this for you to try out. |
Ok, actually, no: SAS will see that the encoding is different on input and just transcode without having to be told. But I still want the encoding in the dsopts so everything works right for other cases. For instance, add_vars() will recreate the data set, so it needs to be explicit about this. So, I've actually already implemented this and pushed it to a new branch, 'outencoding'. Obviously, I haven't fully tested it, but I did run through a number of paths and all seemed to work as I expected. I looked at the saslog to verify, also. I love it when my architecture allows me to implement something this pervasive so quickly! Having said that, I expect you'll find a problem first try, LOL. Anyway, I'm gonna go get some lunch, so feel free to grab this code and try it out. FWIW, here's some of the code I tried out. Feel free to explore more and let me know what you see; I'll obviously test this more before merging back into main.
Let me know what you think! |
Amazing, I will clone the branch and test it out. |
Hey Tom, Sorry for the off topic question, but I am having issues installing the package into a virtual env I am running:
On my old version
Any thoughts on what I am doing wrong? Packaging is always something I mess up. |
Hey Jeremy, Are you sure the environment you're running in is the one you installed in? Those environments get people all the time. I don't use them, myself. I did just uninstall and reinstalled, using cut-n-paste of your command above, and I'm seeing it run like I expect. Also, I always uninstall first, then install. That's always the cleanest.
|
oh, and
|
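
A quick way to check the "is this the environment you installed in?" question from inside Python, as a sketch using only the standard library. The helper name `describe_environment` is hypothetical; it reports which interpreter is running and where a package would be imported from, without importing it.

```python
import importlib.util
import sys


def describe_environment(package: str = "saspy") -> dict:
    """Report the running interpreter and where `package` would be imported from."""
    spec = importlib.util.find_spec(package)  # None when the package is not installed here
    return {
        "interpreter": sys.executable,
        "prefix": sys.prefix,
        # In a venv, sys.prefix points at the venv and differs from sys.base_prefix.
        "in_virtualenv": sys.prefix != sys.base_prefix,
        "package_location": spec.origin if spec else None,
    }


for key, value in describe_environment("saspy").items():
    print(f"{key}: {value}")
```

If `package_location` is `None`, or points outside the virtual env, the install most likely went into a different environment than the one running the code.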
Got it working now; no idea what I was doing wrong. Sometimes just starting over helps. I'll test it out now. |
cool :) |
good news and bad news. Using the
So that is great; I am getting the encoding I expect. However, if I write out the data to a sas7bdat file and
I am told that the encoding is UTF-8 not latin9, in the summary. Any suggestions?:
for good measure here are my session details
|
Well, what is this libref JLABARGE? The data set you wrote out in latin9 was work.cars9, but the proc contents was on jlabarge.cars_latin9. When I did a contents(), it correctly showed the encoding (sorry this contents output is garbage; it's pandas output); see the last line
|
look at your saslog too; I think you're just not referencing the right data set you created |
this is easier to read:
|
Yeah, it is likely that I screwed something up; I don't fully 'get' libref/table/saslib. Obviously I don't interact with SAS much directly, since they appear to be pretty core concepts 😆 Here is the full set of commands I ran:
|
ok, then run this (I changed results to text so you could read the contents output easier):
|
Thanks. Here is my output:
still is showing UTF-8. |
I'm not sure you're running this code. Can you submit this after that last contents?
|
run that whole set of code and then |
when I said I don't think you're running this code, I mean the source code with this enhancement in it; saspy outencoding branch of code. |
|
Alright! You found a bug. First, you are running the new code, so that's good. I missed one little thing in the stdio access method; I was running iom. I forgot to delete the ';\n' in the line before the new code that added the option. That's why you have the error above: 94 data 'cars9'n;
|
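
To illustrate the kind of bug described above, here is a hypothetical reconstruction, not saspy's actual source: if the generated DATA statement is terminated with ';\n' before the data set option is appended, SAS sees `data 'cars9'n;` as a complete statement and the encoding option never applies. The function name `data_statement` and its `buggy` flag are made up for this sketch.

```python
from typing import Optional


def data_statement(table: str, encoding: Optional[str] = None, buggy: bool = False) -> str:
    """Build the opening DATA statement, optionally with an encoding= data set option."""
    header = f"data '{table}'n"
    option = f" (encoding='{encoding}')" if encoding else ""
    if buggy:
        # The leftover terminator ends the statement before the option is appended.
        return header + ";\n" + option + ";\n"
    # The option must sit between the data set name and the terminating semicolon.
    return header + option + ";\n"


print(data_statement("cars9", "latin9", buggy=True))
print(data_statement("cars9", "latin9"))
```

In the buggy form the first emitted line is exactly `data 'cars9'n;`, which matches the statement quoted from the log in the comment above.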
excellent I will give it a try. |
Works like a charm! Thanks, Tom; as always, your help, patience and responsiveness are greatly appreciated.
|
I am happy and am ok with closing the issue. Do you know when you might merge this into main? |
Sweet! Yes, I've run regressions and, now, also the new code with all 3 access methods. I'll merge it in tomorrow. I can build a new release too, as I expect you really want it in a pypi release, not just in the repo. I have a few other things at main that can go into a new release. After I do that tomorrow, I'll post back here and we can close this then. |
Ok, Jeremy, this is merged, pushed, built and out on Pypi as V3.5.1, the current production version. Thanks, |
Hi Tom, sas |
Hey @AnandReddy23, this is an old closed issue. Can you open a new issue for this problem you're having? |
Is your feature request related to a problem? Please describe.
I have a customer that wants a sas7bdat dataset written with 'latin9' encoding. Is there a way to specify the output encoding within saspy?
Describe the solution you'd like
I would like to specify the OUTENCODING when calling the df2sd method.
If you are also feeling generous, it might also be nice to be able to specify the INENCODING when reading a sas dataset
Describe alternatives you've considered
I have tried modifying the encoding within the pandas df, but that did not help when writing out the sas dataset.
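
A small sketch of why re-encoding on the Python side is unlikely to help: Python strings (including those held in a pandas DataFrame) are Unicode text with no encoding attached, and an encoding only comes into play when text is turned into bytes. That the on-disk encoding of the sas7bdat file is chosen by SAS, not by pandas, is an assumption drawn from the discussion in this thread.

```python
text = "café €"

# The same text produces different bytes under different encodings.
utf8_bytes = text.encode("utf-8")
latin9_bytes = text.encode("latin9")  # Python alias for ISO-8859-15

print(utf8_bytes)
print(latin9_bytes)

# Decoding with the matching codec gives back the identical Unicode string,
# so nothing about the original string records which encoding was used.
assert utf8_bytes.decode("utf-8") == latin9_bytes.decode("latin9") == text
```

The encoding therefore has to be requested at the point where the file is written, which is what the requested OUTENCODING option is for.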