Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Scan SAS files #195

Merged
merged 3 commits into from
Dec 24, 2019
Merged

Scan SAS files #195

merged 3 commits into from
Dec 24, 2019

Conversation

MaximMoinat
Copy link
Collaborator

Uses the Parso library (https://lifescience.opensource.epam.com/parso.html) to read sas7bdat files and write a scan report. Implementation is similar to how CSV files are scanned.

Note that this PR does not support writing fake data in the sas7bdat format. The Parso library does not seem to support writing to sas files and as it is a proprietary format, I think this might not be possible.

Also, as this implementation processes a sas file similar to csv; it scans values to derive data types. However, a sas file has associated properties like a database which we can use to read data type and length without reading the values.

Copy link
Collaborator

@blootsvoets blootsvoets left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If inline processSasFile TODO comment you added is fixed, LGTM. Added some readability comments.

@@ -26,7 +26,8 @@
public class DbSettings {
public static int DATABASE = 1;
public static int CSVFILES = 2;

public static int SASFILES = 3;
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would it make sense to make this an enum? Alternatively, you could distinguish on “database” vs “file” as source type, and for each source type be able to select a format (db: postgres; file: csv)

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not implementing this atm, would affect a lot of parts of the code

@MaximMoinat MaximMoinat merged commit 9656b4f into OHDSI:develop Dec 24, 2019
@MaximMoinat MaximMoinat mentioned this pull request Dec 24, 2019
@MaximMoinat MaximMoinat deleted the wr-scan-sas branch February 28, 2020 14:12
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants