Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Initial xlsx support, as discussed in issue #466 #470

Merged
merged 12 commits into from
Mar 3, 2019
Merged

Initial xlsx support, as discussed in issue #466 #470

merged 12 commits into from
Mar 3, 2019

Conversation

hallvard
Copy link
Contributor

Thanks for contributing.

Description

Added XlsxReader and XlsxReadOptions for the xlsx file format, using Apache's POI libraries.
Added xlsx methods in DataFrameReader to support xlsx file format.

Testing

Test for XlsxReader that reads a sample file with several column types.

@hallvard
Copy link
Contributor Author

hallvard commented Feb 23, 2019

First of all, all tests pass locally, both when run as junit tests and with mvn test.
I've been improving the assert messages, and this message seems to indicate there is a missing value, but it doesn't make much sense that it doesn't happen locally. The only thing I can think of is a locale issue.

}
while (columns.remove(null))
;
table.addColumns(columns.toArray(new Column<?>[columns.size()]));
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It might be nicer to add an addColumns(Collection<Column> columns) method rather than having to convert to an array here

@benmccann
Copy link
Collaborator

This looks good! Not sure if @lwhite1 has any thoughts, but I think this looks about ready to merge. Left a few last comments. Also, XlsxReaderTest is currently failing on Travis, so that'll need to be fixed first

Thanks for contributing this! And thanks for making all the changes

@hallvard
Copy link
Contributor Author

hallvard commented Mar 1, 2019

As mentioned above, the tests pass when run locally, both using Run As -> JUnit tests and mvn test. The (improved) assert message tells that a double isn't read correctly. Could it be a platform issue? I'm on a mac, and the test Excel file was created on a mac. Can one of you try it locally, preferably on Linux?

@benmccann
Copy link
Collaborator

Ah, right. Forgot about that. I use Linux. I can try later

@benmccann
Copy link
Collaborator

Hmm, I checked out the PR locally on my machine and the tests passed there as well

@benmccann benmccann closed this Mar 2, 2019
@benmccann benmccann reopened this Mar 2, 2019
@benmccann
Copy link
Collaborator

@hallvard I managed to get the tests passing on this PR, by tweaking our Travis config. It seems that the issue was an old version of Ubuntu and using a more recent version made it work

of SHORT column type, replaced check prefix with assert for test helper
methods.
@@ -108,13 +107,23 @@ private ColumnType getColumnType(Cell cell) {
return null;
}

private int[] findTableArea(Sheet sheet) {
private static class TableRange {
int startRow, endRow, startColumn, endColumn;
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

these could probably be made final

Copy link
Collaborator

@lwhite1 lwhite1 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have one question for @hallvard. Otherwise this looks good.

}
}
Column<?> altColumn = appendValue(column, cell);
if (altColumn != null && altColumn != column) {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should this be if (altColumn != null && ! altColumn.equals(column))?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No. If appendValue return a new Column instance, that equals the old, it should still replace it.

@lwhite1 lwhite1 merged commit 00eddbd into jtablesaw:master Mar 3, 2019
@lwhite1
Copy link
Collaborator

lwhite1 commented Mar 3, 2019

Thanks @hallvard! Also, @benmccann for the very thorough review.

@benmccann
Copy link
Collaborator

Yes, thanks again @hallvard !

import tech.tablesaw.columns.Column;

@Immutable
public class XlsxReader {
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I believe this class should be made final so that users can't subclass it and make instances mutable by adding mutable fields. :)

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

... I've only just realised that this PR was already merged in. 🤦‍♂️

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants