Low priority, but this may be worthwhile in the future.
Is your feature request related to a problem? Please describe.
The corpus form currently only supports CSV source data, but users may receive or process the source data in a different format.
As far as I'm concerned, we will never add support for alternative formats like XML, HTML, RDF, JSON, etc. Due to the complexity of those formats, users are almost always better served by separate pre-processing software.
That said, maybe supporting spreadsheet data is still worthwhile. I think it's a different case for two reasons:
The data format for CSV and spreadsheets is basically the same (a table), so the interface of the form barely needs to change.
As an input format, spreadsheets are particularly suitable for users with no programming experience, who collect small datasets for qualitative research. So these are also the users who would benefit the most from built-in support.
Describe the solution you'd like
Allow users to upload XLSX files instead of CSV in the corpus form. Apart from some minor details, the layout of the form will remain the same.
Describe alternatives you've considered
Exporting an Excel file to CSV is quite straightforward; we could also just add instructions for this and encourage users to export their spreadsheets themselves. However:
It does make the process a bit more complicated for the user.
When you export a spreadsheet to CSV, you lose some data that I-analyzer then has to infer, namely the data type of each cell.
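Here is a quick illustration of that type loss using Python's standard `csv` module. The row is made up and stands in for what a spreadsheet export would contain. After a round trip through CSV, every cell is a string, so the reader has to work out again that `1853` is a number or that `True` is a boolean.

```python
import csv
import io
from datetime import date

# A made-up row where each cell still has its own type, as in a spreadsheet.
row = ["Letter to Anna", 1853, 4.5, True, date(1853, 3, 12)]

# Write the row to CSV; non-string values are converted with str().
buffer = io.StringIO()
csv.writer(buffer).writerow(row)

# Read it back: the csv module returns every cell as a string.
buffer.seek(0)
restored = next(csv.reader(buffer))

print(restored)  # ['Letter to Anna', '1853', '4.5', 'True', '1853-03-12']
print([type(value).__name__ for value in restored])  # ['str', 'str', 'str', 'str', 'str']
```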
Suggested implementation
Expand the corpus JSON schema: add an "xlsx" option to the data format.
When uploading the sample file, allow the user to pick whether they will upload their data as CSV or XLSX. (Making it either/or is easier to program than allowing users to mix the formats.)
If the user selected XLSX files, they don't need to select a delimiter character in the form.
The backend has a function to extract a list of columns with their respective data types from a CSV file. Add a similar function for XLSX files.
Adjust the make_reader function to pick CSVReader or XLSXReader depending on the selected input type.
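The following is a minimal sketch of the form, schema and reader changes from the steps above. It assumes the selected format is stored as a plain string (`"csv"` or `"xlsx"`). `SourceDataOptions` is a hypothetical stand-in for the form data. `CSVReader` and `XLSXReader` here are empty placeholders, not the real reader classes. This `make_reader` is a simplified version with an assumed signature, not the backend's actual function. The sketch only shows the either/or choice and that the delimiter matters for CSV alone.

```python
from dataclasses import dataclass
from typing import Optional

# Formats the (hypothetical) schema accepts; adding "xlsx" is the proposed change.
SUPPORTED_FORMATS = ("csv", "xlsx")


@dataclass
class SourceDataOptions:
    """Hypothetical stand-in for the source data settings chosen in the form."""

    data_format: str = "csv"
    delimiter: Optional[str] = ","

    def __post_init__(self):
        if self.data_format not in SUPPORTED_FORMATS:
            raise ValueError(f"unsupported data format: {self.data_format!r}")
        if self.data_format == "xlsx":
            # Spreadsheets have no delimiter, so any value from the form is dropped.
            self.delimiter = None
        elif not self.delimiter:
            raise ValueError("CSV sources need a delimiter character")


class CSVReader:
    """Placeholder for the real CSV reader class."""

    def __init__(self, delimiter: str):
        self.delimiter = delimiter


class XLSXReader:
    """Placeholder for the real XLSX reader class."""


def make_reader(options: SourceDataOptions):
    """Simplified sketch: choose the reader class from the selected format."""
    if options.data_format == "xlsx":
        return XLSXReader()
    return CSVReader(delimiter=options.delimiter)


reader = make_reader(SourceDataOptions(data_format="xlsx", delimiter=";"))
print(type(reader).__name__)  # XLSXReader
```

The format is stored as a single setting for the whole corpus rather than per file. This matches the either/or choice in the form and reduces the dispatch to a single branch.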
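The existing CSV column extraction isn't shown here, so the following only sketches what an XLSX counterpart could look like. It assumes cell values arrive already typed from a library such as openpyxl (`load_workbook(path, read_only=True, data_only=True)` and `iter_rows(values_only=True)`; `data_only=True` returns cached formula results instead of formulas). The function names and the type labels (`"text"`, `"integer"`, etc.) are hypothetical and not taken from I-analyzer. Because the spreadsheet already records each cell's type, inference reduces to mapping Python types and resolving mixed columns.

```python
from datetime import date, datetime

# Hypothetical field type names; the real corpus schema may use other labels.
TYPE_NAMES = [
    (bool, "boolean"),  # checked before int, because bool is a subclass of int
    (int, "integer"),
    (float, "float"),
    (date, "date"),     # also matches datetime, which subclasses date
    (str, "text"),
]


def cell_type(value):
    """Map a typed cell value to a field type name."""
    for python_type, name in TYPE_NAMES:
        if isinstance(value, python_type):
            return name
    return "text"  # anything unexpected is treated as text


def infer_column_types(rows):
    """Return (column name, field type) pairs; the first row is the header."""
    header, *body = rows
    columns = []
    for index, name in enumerate(header):
        # Collect the types of all non-empty cells in this column.
        seen = {
            cell_type(row[index])
            for row in body
            if index < len(row) and row[index] is not None
        }
        if seen == {"integer", "float"}:
            column_type = "float"  # mixed whole and decimal numbers
        elif len(seen) == 1:
            column_type = seen.pop()
        else:
            column_type = "text"  # empty or otherwise mixed columns fall back to text
        columns.append((name, column_type))
    return columns


def read_xlsx_rows(path):
    """Read typed cell values from the active sheet (requires openpyxl)."""
    import openpyxl  # third-party; only imported when reading a real file

    workbook = openpyxl.load_workbook(path, read_only=True, data_only=True)
    try:
        return [list(row) for row in workbook.active.iter_rows(values_only=True)]
    finally:
        workbook.close()


rows = [
    ["title", "year", "rating", "published", "date"],
    ["Letter to Anna", 1853, 4, True, datetime(1853, 3, 12)],
    ["Reply", 1853, 4.5, False, None],
]
print(infer_column_types(rows))
# [('title', 'text'), ('year', 'integer'), ('rating', 'float'), ('published', 'boolean'), ('date', 'date')]
```

Keeping the type inference separate from file reading means it can be tested without any spreadsheet files. The openpyxl call is isolated in `read_xlsx_rows`.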