Support spreadsheet data in corpus form #1748

Open
lukavdplas opened this issue Feb 4, 2025 · 1 comment
Labels
enhancement improvements to user functionality

Comments

@lukavdplas
Contributor

lukavdplas commented Feb 4, 2025

Low priority, but this may be worthwhile in the future.

Is your feature request related to a problem? Please describe.

The corpus form currently only supports CSV source data, but users may receive or process the source data in a different format.

As far as I'm concerned, we will never add support for alternative formats like XML, HTML, RDF, JSON, etc. Due to the complexity of those formats, users are basically always better served by separate pre-processing software.

That said, maybe supporting spreadsheet data is still worthwhile. I think it's a different case for two reasons:

  • The data format for CSV and spreadsheets is basically the same (a table), so the interface of the form barely needs to change.
  • As an input format, spreadsheets are particularly suitable for users with no programming experience, who collect small datasets for qualitative research. So these are also the users who would benefit the most from built-in support.

Describe the solution you'd like

Allow users to upload XLSX files instead of CSV in the corpus form. Apart from some minor details, the layout of the form will remain the same.

Describe alternatives you've considered

Exporting an Excel file to CSV is quite straightforward; we could also just add instructions for this and encourage users to export their spreadsheets themselves. However:

  • It does make the process a bit more complicated for the user.
  • When you export a spreadsheet to CSV, you lose some information that I-analyzer then has to infer, namely the data type of each cell.
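
To illustrate the second point: once a spreadsheet is exported to CSV, every cell is just a string, so the column types have to be guessed from the text. Below is a minimal sketch of that kind of guessing, using only the Python standard library. The function names, the type labels and the sample data are made up for this example; this is not I-analyzer's actual inference code.

```python
import csv
import io
from datetime import date


def infer_cell_type(value: str) -> str:
    """Guess the type of a single CSV cell from its text (hypothetical helper)."""
    text = value.strip()
    if text == "":
        return "empty"
    if text.lower() in ("true", "false"):
        return "boolean"
    try:
        int(text)
        return "integer"
    except ValueError:
        pass
    try:
        float(text)
        return "float"
    except ValueError:
        pass
    try:
        # Only ISO-style dates are recognised in this sketch.
        date.fromisoformat(text)
        return "date"
    except ValueError:
        pass
    return "text"


def infer_column_types(csv_text: str, delimiter: str = ",") -> dict:
    """Guess one type per column; fall back to "text" when cells disagree."""
    reader = csv.DictReader(io.StringIO(csv_text), delimiter=delimiter)
    seen = {name: set() for name in (reader.fieldnames or [])}
    for row in reader:
        for name in seen:
            seen[name].add(infer_cell_type(row[name] or ""))

    result = {}
    for name, types in seen.items():
        types.discard("empty")  # empty cells say nothing about the type
        if len(types) == 1:
            result[name] = types.pop()
        elif types == {"integer", "float"}:
            result[name] = "float"  # mixed numbers are treated as floats
        else:
            result[name] = "text"
    return result


# Made-up sample data, as it might look after exporting a spreadsheet.
SAMPLE = (
    "title,year,rating,published\n"
    "A,1999,4.5,1999-03-01\n"
    "B,2004,3,2004-11-20\n"
)

column_types = infer_column_types(SAMPLE)
```

An XLSX file stores a type with each cell, so this guessing step (and its edge cases, like a text column that happens to contain only numbers) could mostly be skipped.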

Suggested implementation

  • Expand the corpus JSON schema: add an "xlsx" option to the data format.
  • When uploading the sample file, allow the user to pick whether they will upload their data as CSV or XLSX. (Making it either/or is easier to program than allowing users to mix the formats.)
  • If the user selected XLSX files, they don't need to select a delimiter character in the form.
  • The backend has a function to extract a list of columns with their respective data types from a CSV file. Add a similar function for XLSX files.
  • Adjust the make_reader function to pick CSVReader or XLSXReader depending on the selected input type.
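
A rough sketch of how the format option, the optional delimiter and the reader selection could fit together. Everything here is an assumption for illustration: `SourceDataOptions` is a hypothetical container for the form values, the two reader classes are empty stand-ins rather than the real `CSVReader` and `XLSXReader`, and the real `make_reader` may well have a different signature.

```python
from dataclasses import dataclass
from typing import Optional

# Allowed values for the data format; "xlsx" is the proposed addition.
DATA_FORMATS = ("csv", "xlsx")


@dataclass
class SourceDataOptions:
    """Hypothetical container for the source data settings from the form."""
    data_format: str = "csv"
    delimiter: Optional[str] = None  # only meaningful for CSV

    def validate(self) -> None:
        if self.data_format not in DATA_FORMATS:
            raise ValueError(f"Unsupported data format: {self.data_format!r}")
        if self.data_format == "csv" and not self.delimiter:
            raise ValueError("CSV source data requires a delimiter")


class CSVReader:
    """Stand-in for the real CSV reader class (assumption)."""

    def __init__(self, delimiter: str):
        self.delimiter = delimiter


class XLSXReader:
    """Stand-in for the real XLSX reader class (assumption)."""


def make_reader(options: SourceDataOptions):
    """Pick a reader based on the selected input type (either/or, no mixing)."""
    options.validate()
    if options.data_format == "xlsx":
        return XLSXReader()
    return CSVReader(delimiter=options.delimiter)


csv_reader = make_reader(SourceDataOptions(data_format="csv", delimiter=";"))
xlsx_reader = make_reader(SourceDataOptions(data_format="xlsx"))
```

Keeping the choice in a single field means the rest of the form only needs to check one value to decide whether to show the delimiter input.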
@lukavdplas lukavdplas added the enhancement improvements to user functionality label Feb 4, 2025
@jgonggrijp
Contributor

Against. Can of worms.
