-
Notifications
You must be signed in to change notification settings - Fork 646
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Initial xlsx support, as discussed in issue #466 #470
Conversation
Fixed problem with mapInto for missing value, shouldn't call function.
First of all, all tests pass locally, both when run as junit tests and with mvn test. |
} | ||
while (columns.remove(null)) | ||
; | ||
table.addColumns(columns.toArray(new Column<?>[columns.size()])); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It might be nicer to add an addColumns(Collection<Column> columns)
method rather than having to convert to an array here
according to comments.
changes according to comments.
This looks good! Not sure if @lwhite1 has any thoughts, but I think this looks about ready to merge. Left a few last comments. Also, Thanks for contributing this! And thanks for making all the changes |
As mentioned above, the tests pass when run locally, both using Run As -> JUnit tests and mvn test. The (improved) assert message tells that a double isn't read correctly. Could it be a platform issue? I'm on a mac, and the test Excel file was created on a mac. Can one of you try it locally, preferably on Linux? |
Ah, right. Forgot about that. I use Linux. I can try later |
Hmm, I checked out the PR locally on my machine and the tests passed there as well |
@hallvard I managed to get the tests passing on this PR, by tweaking our Travis config. It seems that the issue was an old version of Ubuntu and using a more recent version made it work |
of SHORT column type, replaced check prefix with assert for test helper methods.
@@ -108,13 +107,23 @@ private ColumnType getColumnType(Cell cell) { | |||
return null; | |||
} | |||
|
|||
private int[] findTableArea(Sheet sheet) { | |||
private static class TableRange { | |||
int startRow, endRow, startColumn, endColumn; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
these could probably be made final
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have one question for @hallvard. Otherwise this looks good.
} | ||
} | ||
Column<?> altColumn = appendValue(column, cell); | ||
if (altColumn != null && altColumn != column) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should this be if (altColumn != null && ! altColumn.equals(column))?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No. If appendValue return a new Column instance, that equals the old, it should still replace it.
Thanks @hallvard! Also, @benmccann for the very thorough review. |
Yes, thanks again @hallvard ! |
import tech.tablesaw.columns.Column; | ||
|
||
@Immutable | ||
public class XlsxReader { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I believe this class should be made final
so that users can't subclass it and make instances mutable by adding mutable fields. :)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
... I've only just realised that this PR was already merged in. 🤦♂️
Thanks for contributing.
Description
Added XlsxReader and XlsxReadOptions for the xlsx file format, using Apache's POI libraries.
Added xlsx methods in DataFrameReader to support xlsx file format.
Testing
Test for XlsxReader that reads a sample file with several column types.