Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Compare dataframe classes #111

Merged
merged 6 commits into from
Jul 25, 2024
Merged

Compare dataframe classes #111

merged 6 commits into from
Jul 25, 2024

Conversation

gowerc
Copy link
Owner

@gowerc gowerc commented Jul 18, 2024

Closes #42

This one was a little more involved so a couple of points to bear in mind below:

  • I currently create a "summary" dataframe which contains the number of rows and columns as well as the class of each dataset. I wanted this summary to appear at the top of the printed output regardless of wether users wanted to check the class as its a nice high level summary. This thus requires some special case handling in the main diffdf function to adjust the corresponding issue. e.g. if the check is disabled and there aren't any other issues then remove the summary issue as it should be regarded as compared the same. Likewise the warning message needs to be different to the actual display string. (happy to talk about this more if the special case handling doesn't make sense

  • In general this highlights to me that our current issues setup is too rigid and needs to be overhauled to handle more dynamic content (e.g. display content that isn't strictly to be regarded as an issue). I didn't want to do this overhaul here as that would be too much for a single PR so have just added special case handling logic instead.

  • The printed output for this looked clunky so I played around with the newlines and spacings to try and make it look as good as I think I can get it (thus some of the print snapshots have changed slightly)

  • A slightly bigger change to the printed output is I took this moment to remove the "all rows are show below" as this is an utter waste of space

  • I also removed the "a summary is shown below" as it no longer makes sense when combined with the summary table

  • The unit tests for row_print where too fragile and broke despite only white space changes so I reworked them to be a little bit more robust by making them relative to each other rather than testing for specific row lengths (e.g. show that changing row_limit from 5 to 10 increases the number of lines by 5)

Copy link
Contributor

github-actions bot commented Jul 18, 2024

badge

Code Coverage Summary

Filename                Stmts    Miss  Cover    Missing
--------------------  -------  ------  -------  --------------------------
R/ascii_tables.R          105       8  92.38%   10, 148, 158, 163-166, 211
R/cast_variables.R         49       0  100.00%
R/diffdf.R                159      18  88.68%   323-340, 367
R/generate_keyname.R       10       1  90.00%   16
R/identify.R              134       9  93.28%   246, 283-290
R/is_different.R           52       0  100.00%
R/issuerows.R              40       0  100.00%
R/issues.R                 17       1  94.12%   51
R/misc_functions.R         34       2  94.12%   9, 13
R/print.R                  20       0  100.00%
TOTAL                     620      39  93.71%

Results for commit: c30e97f

Minimum allowed coverage is 80%

♻️ This comment has been updated with latest results

Copy link
Contributor

github-actions bot commented Jul 18, 2024

Unit Tests Summary

  1 files   11 suites   5s ⏱️
 43 tests  42 ✅ 1 💤 0 ❌
560 runs  556 ✅ 4 💤 0 ❌

Results for commit c30e97f.

♻️ This comment has been updated with latest results.

R/diffdf.R Outdated Show resolved Hide resolved
R/diffdf.R Outdated Show resolved Hide resolved
R/misc_functions.R Show resolved Hide resolved
tests/testthat/_snaps/class_compare.md Outdated Show resolved Hide resolved
@gowerc gowerc merged commit c665886 into master Jul 25, 2024
23 checks passed
@gowerc gowerc deleted the 42-data-class branch July 25, 2024 15:42
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Should diffdf care about data frame class
2 participants