Some forms with Unicode characters fail to deploy #2591

Closed
jnm opened this issue Mar 11, 2020 · 0 comments

jnm commented Mar 11, 2020

Symptoms:

  • Error messages when trying to deploy:
    • No JSON object could be decoded
    • Expecting value: line 1 column 1 (char 0)
  • Exporting an XLSForm from KPI and uploading it directly into KoBoCAT results in a 500 error
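
Both messages are what Python's `json` module reports when asked to parse text that is not JSON at all: "No JSON object could be decoded" is the Python 2 wording and "Expecting value: line 1 column 1 (char 0)" is the Python 3 wording. That would be consistent with the deploy step trying to parse an HTML error page as JSON, although the issue does not confirm it. A minimal sketch, where `parse_backend_response` and the response body are hypothetical and not KPI code:

```python
import json


def parse_backend_response(body):
    """Return (payload, error_message) for a response body that should be JSON."""
    try:
        return json.loads(body), None
    except ValueError as exc:  # json.JSONDecodeError is a subclass of ValueError
        return None, str(exc)


# Hypothetical body: an HTML error page instead of the expected JSON document.
payload, error = parse_backend_response("<h1>Server Error (500)</h1>")
print(error)
```

The message only says that parsing failed at the first character; the actual cause has to be read from the server-side traceback.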

Detail:

The XLSForm that KPI exports in Excel 2003 (XLS) format, written with https://pypi.org/project/xlwt/, is sometimes invalid, or at least impossible to read with https://pypi.org/project/xlrd/. The full exception from KoBoCAT when trying to read these files is:

UnicodeDecodeError: 'utf16' codec can't decode bytes in position 0-1: unexpected end of data
  File "django/core/handlers/base.py", line 132, in get_response
    response = wrapped_callback(request, *callback_args, **callback_kwargs)
  File "django/utils/decorators.py", line 145, in inner
    return func(*args, **kwargs)
  File "django/views/decorators/csrf.py", line 58, in wrapped_view
    return view_func(*args, **kwargs)
  File "rest_framework/viewsets.py", line 87, in view
    return self.dispatch(request, *args, **kwargs)
  File "rest_framework/views.py", line 466, in dispatch
    response = self.handle_exception(exc)
  File "rest_framework/views.py", line 463, in dispatch
    response = handler(request, *args, **kwargs)
  File "rest_framework/mixins.py", line 78, in partial_update
    return self.update(request, *args, **kwargs)
  File "onadata/apps/api/viewsets/xform_viewset.py", line 776, in update
    survey = utils.publish_xlsform(request, owner, existing_xform)
  File "onadata/apps/api/tools.py", line 266, in publish_xlsform
    return publish_form(set_form)
  File "onadata/libs/utils/logger_tools.py", line 455, in publish_form
    return callback()
  File "onadata/apps/api/tools.py", line 262, in set_form
    return form.publish(user, existing_xform.id_string)
  File "onadata/apps/main/forms.py", line 326, in publish
    return publish_xls_form(cleaned_xls_file, user, id_string)
  File "onadata/libs/utils/logger_tools.py", line 510, in publish_xls_form
    dd.save()
  File "onadata/apps/viewer/models/data_dictionary.py", line 150, in save
    survey = create_survey_from_xls(self.xls)
  File "pyxform/builder.py", line 295, in create_survey_from_xls
    excel_reader = SurveyReader(path_or_file)
  File "pyxform/xls2json.py", line 1176, in __init__
    path, warnings=self._warnings, file_object=self._file_object)
  File "pyxform/xls2json.py", line 1086, in parse_file_to_json
    workbook_dict = parse_file_to_workbook_dict(path, file_object)
  File "pyxform/xls2json.py", line 1065, in parse_file_to_workbook_dict
    return xls_to_dict(file_object if file_object is not None else path)
  File "pyxform/xls2json_backends.py", line 43, in xls_to_dict
    workbook = xlrd.open_workbook(file_contents=path_or_file.read())
  File "xlrd/__init__.py", line 162, in open_workbook
    ragged_rows=ragged_rows,
  File "xlrd/book.py", line 116, in open_workbook_xls
    bk.parse_globals()
  File "xlrd/book.py", line 1199, in parse_globals
    self.handle_sst(data)
  File "xlrd/book.py", line 1170, in handle_sst
    self._sharedstrings, rt_runlist = unpack_SST_table(strlist, uniquestrings)
  File "xlrd/book.py", line 1413, in unpack_SST_table
    accstrg += unicode(rawstrg, "utf_16_le")
  File "encodings/utf_16_le.py", line 16, in decode
    return codecs.utf_16_le_decode(input, errors, True)
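
The last frames show a chunk of the shared string table being decoded on its own as UTF-16-LE. One way that can fail, shown here as an illustration and not as a confirmed diagnosis, is a chunk ending halfway through a surrogate pair, which any character outside the Basic Multilingual Plane needs in UTF-16. A minimal Python 3 sketch (the traceback above is from Python 2); `decode_chunks_separately`, the label, and the split point are all hypothetical:

```python
def decode_chunks_separately(chunks):
    """Decode every chunk independently, then join the resulting text."""
    return "".join(chunk.decode("utf_16_le") for chunk in chunks)


label = "Name \U0001F600"          # U+1F600 takes two UTF-16 code units
raw = label.encode("utf_16_le")
cut = len(raw) - 2                 # assumed split point: between the two surrogates
chunks = [raw[:cut], raw[cut:]]

try:
    decode_chunks_separately(chunks)
    error = None
except UnicodeDecodeError as exc:
    error = exc                    # the first chunk ends with a lone high surrogate

# Joining the bytes before decoding recovers the original text.
whole = b"".join(chunks).decode("utf_16_le")
print(type(error).__name__, whole == label)
```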

Discussion: https://www.flowdock.com/app/kobotoolbox/kobo/threads/IyTCl2SFPKdRWuvxnT4Ed7FYc1E

@jnm jnm added the bug Things broken and not working as expected label Mar 11, 2020
@jnm jnm self-assigned this Mar 11, 2020
jnm added a commit that referenced this issue Mar 11, 2020
XLSX is now used instead of XLS when deploying forms to KoBoCAT
and downloading forms as XLSForm. Fixes #2591.
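
Since the download changes from XLS to XLSX, anything that consumes these files may need to tell the two apart. A minimal standard-library sketch, assuming the usual container signatures (XLS is an OLE2 compound file, XLSX is a ZIP archive containing `[Content_Types].xml`); `sniff_spreadsheet_format` is a hypothetical helper, not part of KPI or KoBoCAT:

```python
import io
import zipfile

# Signature at the start of OLE2 compound files, the container used by Excel 2003 XLS.
OLE2_MAGIC = bytes.fromhex("d0cf11e0a1b11ae1")


def sniff_spreadsheet_format(data):
    """Guess 'xls', 'xlsx', or 'unknown' from the raw bytes of a file."""
    if data.startswith(OLE2_MAGIC):
        return "xls"
    buffer = io.BytesIO(data)
    if zipfile.is_zipfile(buffer):
        with zipfile.ZipFile(buffer) as archive:
            # Office Open XML packages carry this part at the archive root.
            if "[Content_Types].xml" in archive.namelist():
                return "xlsx"
    return "unknown"


# Build a tiny stand-in archive; it is not a valid workbook, only the container shape.
fake_xlsx = io.BytesIO()
with zipfile.ZipFile(fake_xlsx, "w") as archive:
    archive.writestr("[Content_Types].xml", "<Types/>")

print(sniff_spreadsheet_format(fake_xlsx.getvalue()))
```

This only inspects the container; it does not validate the workbook contents.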
@jnm jnm closed this as completed Mar 12, 2020
jnm added a commit that referenced this issue May 14, 2020
XLSX is now used instead of XLS when deploying forms to KoBoCAT
and downloading forms as XLSForm. Fixes #2591.
WinnyTroy pushed a commit to onaio/kpi that referenced this issue Sep 17, 2020
XLSX is now used instead of XLS when deploying forms to KoBoCAT
and downloading forms as XLSForm. Fixes kobotoolbox#2591.
WinnyTroy added a commit to onaio/kpi that referenced this issue Jan 27, 2021
* Exclude KoBoCat internal permission assignments from ViewSet list

* Use `object.remove_perm()` to bulk delete permission assignments prior to assign new ones.

* new branch same issue, fixed all the comments in pull request 2500 as well

* Add KPI identifier to service_health response

and use the KoBoCAT response `text` (a `str`) instead of the `content`,
which is `bytes`

* fix AccessDenied import usage - consistent with other ui.es6 ones

* Renamed AssetEditorNestedObjectPermission to a less confusing name

* start rebuilding styles for service-row

* align columns

* fix z-indexes

* Remove USE_SAME_DATABASE code

since it's no longer possible for KPI and KoBoCAT to share a database

* Add checks for two-database upgrade problems

Towards kobotoolbox#2543

* Add management command to check for empty database

Will be used by kobo-install; see kobotoolbox/kobo-install#65

* Re-raise unrecognized OperationalErrors

* Add management command that waits for database…

to begin accepting connections

* Resolve `SyntaxWarning: "is" with a literal`

* Don't complain if KoBoCAT database has no KPI data

…when running two-database migration checks

* WIP: getting download URL for media files

* geopoint query limit removed if a specific question is queried

* removed console logs

* WIP: asset values causing cross-domain error when fetching

* WIP: got links to display in table, now need to get real links

* Displays correct url in table view for media links

* replaced .some with .forEach

* only remove 5000 limit if viewby exists

* made links open in new tab on click

* Update actions import

* Update actions import

* Make sure user matches when validating permissions

* fix tests and fix vendor splitting for npm run watch

* moved changes to map.es6 from dataInteface.es6

* Requested PR changes

* Last PR change

* undid indentation

* changed getMediaDownloadLink

* now prints submissions as expected

* removed annoying space ;)

* avoid spread crash

* added print button to submission view, refactored accordingly

* WIP: slider updates map but not slider itself

* WIP: Slider works as expected, working on visuals

* WIP: data pull bugs out when limit changes too quickly, added delay

* WIP: Added change to how slider should be displayed

* WIP: Moving slider to settings modal

* WIP: Moved slider to settings modal, need testing on large datasets

* points htmlFor of output to input

* Forced DB name for Mongo (needed with auth)

* Support special characters in MongoDB and redis passwords

* Force update yarn debian apt keys

* Use "quote_plus" instead of "quote" - Username and password must be escaped according to RFC 3986

* WIP: resets query to default limit on refresh in case limit crashes browser

* WIP: Set all values to thousands

* Made more verbose warning, PR ready :)

* changed where name is taken from file or url import, removed unused const

* removed getFilenameFromURI function

* Ignore map_style if selectedQuestion is not 'geopoint'

* implemented requested changes see PR

* removed leftover spaces and console logs

* made requested changes, see PR

* WIP: making some requested changes from pr, removing magic numbers

* made all requested changes except changing slider to number input

* Delete test3_-_latest_version_-_labels_-_2020-02-13-05-47-52.xlsx

* removed console logs

* removed weird random semi-colon

* Delete test3_-_latest_version_-_labels_-_2020-02-13-05-47-52.xlsx

* Update database split info in readme

* made all requested changes, see pr, added writeable number input alongside slider

* WIP: fixing bugs related to RESET and SAVE buttons, removed redundant css

* typo in docs

* reimport query_parser and grammar

* Use canopy parser for asset queries.

* reindentation & comment revisions

* toward kobotoolbox#2514 - remove woosh/haystack fallback

* return empty queryset for parse/field errors

toward kobotoolbox#2514

* Allow users to see AnonymousUser permissions whatever their permission on Asset

* fixed SAVE button input bug

* cleaned up a bit

* Finish integrating Canopy/PEG query parser

Towards kobotoolbox#2514

* Purge Haystack and Whoosh; clean up settings a bit

Closes kobotoolbox#2514

* Fix README typo

* Force update yarn apt repo key on build

* Revert asset.summary field back to jsonfield

* Changed JSONFields to JSONBFields

* Removed red warning when group title field is empty

* removed local media folder added by accident

* Get rid of "import_survey_drafts_from_dkobo" management command

* refactored protected "_filter_by_source_kludge" to public "filter_by_source"

* Requested changes for PR#2578

* Move `_set_auto_field_update()` into model_utils

Solves `django.db.utils.ProgrammingError: relation "django_content_type" does not exist`
when running tests or running migrations on an empty database

* Eliminate `filter_by_source()` in favor of…

`filter(data__source=source)` now that we have support for querying
inside JSONB columns

* Add Unicode character to asset search test query

* Add unit test for anonymous permissions in API

* Bumped Python version to 3.8 for TravisCI

* fix tests

* Added requested PR changes

* removed console logs

* made changes to reset button exclusive to the querylimit tab

* added the s ;)

* cleanup, fix onupdating actions, fix immediate fetch action

* escape name in delete modal to be safe

* escape one more place

* put back removed css, removed group label placeholder

* Read the CSRF token from the DOM instead of cookie

Closes kobotoolbox/tasks#343
Closes kobotoolbox/tasks#344

* Set the HttpOnly flag on the CSRF cookie

REQUIRES kobotoolbox#2588!
Closes kobotoolbox/tasks#116.

* Add XlsxWriter dependency

* Use XlsxWriter instead of xlwt in to_xls_io()

XLSX is now used instead of XLS when deploying forms to KoBoCAT
and downloading forms as XLSForm. Fixes kobotoolbox#2591.

* Flag CSRF and Session cookie as secure

* Set secure cookie based on PUBLIC_REQUEST_SCHEME

…or SECURE_PROXY_SSL_HEADER

* Fix my boneheaded typo

* Updated pip dependencies: added ssrf-protect, changed django-markdown and django-request-cache to their pypi version

* Use ssrf-protect to validate hook endpoint

* allow longer xml value names

* change slug limit to 40 to mimick BE code

* fix few small linter things

* fix group labels not saving

* Remove testing kludges for S3 storage

Closes kobotoolbox#2280

* Fix `TypeError: 'bool' object is not callable`

…when attempting to download an export

* Added better error message for deleted REST submission

* Removed Submission 0 or 1 if there is no sequence to display

* Add TODO marking front-end kludge

…referencing kobotoolbox#2562

* Fixed bug where adding translations to matrix would add another null language

* Moved matrix survey JSON to if statement

* Updated pip dependencies: added ssrf-protect, changed django-markdown and django-request-cache to their pypi version

* Use ssrf-protect to validate hook endpoint

* avoid spread crash

* Bring back locale as submodule

* Applied good practices for bash syntax

* Install "raven" in dev mode when Sentry DSN is present

* Added quotes around src= in copy paste for 'Embeddable web for code'

* Removed wrapping double quotes around UWSGI_COMMAND

* Revert "Protect hook endpoints against SSRF attacks"

* Capitalize KoBoCAT in docs and messages

* Add explanatory comment and adjust formatting

* `try` only the call that could raise the exception

and `assert` without parentheses

* Updated SSRF Protect version, fixed tests using it

* Add failing test case for denying anonymous perms

See kobotoolbox#2528

* Do not expose denied permissions

* Allow to assign a denied permission to AnonymousUser even if it does not belong to anonymous user's allowed permissions

* Prefer `not` to `is False` and clarify message

* Temporarily return to text-based JSON fields

Partially reverts kobotoolbox#2578

* Use JSON `\u` escaping when searching `summary`

This is necessary while JSON fields are stored as text in Postgres

* Add management command to migrate text to jsonb…

manually, without locking entire tables at once. This helps avoid
downtime on large databases

* Replace mutable `{}` defaults with `dict`

* Edit asset page : Unwanted hints shown in other items when user is pointing to an item icon.

* Added EXIF orientation fix

* Added network check before username check

* Added gradual pagination to version history

* Dropped makeEditable from Choices, used input instead

* Need to implement proper placeholders before switching to input tag for default options

* Fixing null character that prevents 0024
from completing if invalid records exist

* fixed typos in migration

* Remove unnecessary jsonb-to-jsonb conversion; use

`import…as JSONBField` consistently

* Mark converted jsonb fields as non-nullable

Toward kobotoolbox#2635

* When converting text to jsonb, make sure no NULLs…

exist in the text column

* Add management command suggestion to 0024 migration

* Remove reference to 2.019.52-final-shared-database

since we are now using the `shared-database-obsolete` branch instead of
a tag

* Add note about shared-database branch to README

* Made PR changes

* Allow edit form permissions to view media and sharing

* Changed all makeEditable to input fields

* Only owner can see settings/media

* Added check for owner

* Stop sharing cookie for CSRF, but continue…

sharing the session cookie with KoBoCAT. Requires accompanying changes in
KoBoCAT and kobo-docker.  See kobotoolbox#2658

* preload permissions config before loading app

* Removed Media from sidebar if not owner

* Added navigation for choice questions

* Removed duplicate failed notify

* Created TravisCI pip dependencies instead of listing them in ".travis.yml"

* Fixed permissions listing on collections with api V1

* Bypass unicode search test

* Force AssetSnapshot source to be null on init

* Fix AssetSnapshot.save() for non-nullable source

* Made query parser friendlier to import

* Fix query parser import in unit test

* Added navigation to making a new question

* less webpack output

* Shift+Up/Down for label navigation

* Ctrl+Alt+N for opening new question

* Removed makeEditable viewUtils

* Moved check for active connection

* Added tabindex navigation

* Fixed new input fields not saving

* Highlight newest option and small clean up

* Added boxes to all tabbable elements

* Optimize queries; fixes kobotoolbox#2671

Fixes the bug portion of kobotoolbox#2679, where the `/reports/` list view shows
public-but-unsubscribed assets. Closes kobotoolbox#2653, and includes
`defer('content')` on the report detail view in the hopes that this
improves performance for very large surveys

* WIP getting around 100 version limit

* Moved constant back to formLanding

* Removed old check for non-geopoint question

* Update message for consistency

* Use chunks to update XForms

* Fixed forgotten field for ordering

* Improved memory footprint

* Added summary counts at the end

* Replace recursive function to get chunks with "queryset.iterator()" instead

* Applied PR requested changed. Filtered queryset by "survey" assets' type

* Filtered queryset with new 'deployed()' Asset manager method

* Add failing test for kobotoolbox#2698

* Fix check for anonymous permissions when…

an authenticated user has no explicitly-assigned permissions.
Fixes kobotoolbox#2698.

* Use permission constants instead of strings…

in some tests

* Improve documentation for kobotoolbox#2698 test case

* Applied title fix to dropped files

* Added title fix to url imports

* get a better error message from Raven

Towards kobotoolbox#2450

* Added PR changes

* Added PR changes

* Added PR css changes

* Added taxIndex PR changes

* Added styling PR changes

* Fix KoBoCAT sequences after syncing users

Toward kobotoolbox#2704

* Simplify sync of django-digest `PartialDigest`s…

to avoid manually setting primary keys. Fixes kobotoolbox#2704

* Empty group labels showing for translated forms

* Increase request body & file upload max sizes

* move ChangePassword to separate file

* use same layout as account settings, introduce password strength

* Added PR changes

* Added SSRF options to constance configuration

* Pass SSRF options from constance config to SSRFProtect validation

* hackfix icons caching by applying a timestamp

* Reduce session timeout to 1 week

The documented Django default was 2 weeks. Closes kobotoolbox/tasks#336

* Have CSRF token stored in cookies

* Fix typo

* pull the 64 character csrftoken from document.cookie

if it exists. otherwise use existing call to `cookies.get(...)`

* cleanup and fix loading versions from further calls

and fix versions update during redeploy

* display add row button when focusing inside a row

* cleanup styles

* mixinize focus styles

* Removed white border when focused

* Removed leftover script from other issue

* Include HXL tag only once for select_multiples

(by upgrading formpack; see kobotoolbox/formpack#208)

* Handle XLSForm `disabled` column appropriately

…when building reports and exports, by upgrading formpack to fix
kobotoolbox/formpack#219

* Add basic test for exporting form with `disabled`

* Moved reading CSRF cookie to ajax

* fixed indentation

* moved var instantiation

* Include modifications to get kpi able to create and edit forms

* Create deployment for asset when importing xls data into empty asset

* Remove Celery files from kobo

* Remove Files referencing celery imports

* Install lower version of celery and include previous files with celery imports

* Include ability to grant default model level perms to authenticated user

* Remove 'Return to list' and 'close' icons from landing page when creating a form

* Remove ONA_TITLE constant from the utils file to the config file.
Rename vars from ONA_TITLE to WEB_PAGE_TITLE

* Prevent deployment for non-survey assets

Show successful update message when non-survey assets have been successfully updated

Signed-off-by: Mark Ekisa <mark.ekisa@gmail.com>

* Add can_publicize_collection key to extra_details json object

Signed-off-by: Mark Ekisa <mark.ekisa@gmail.com>

* Update raven pip package

* Rebase with kobo/two_databases

* Update KPI authentication Module (#28)

* Get authentication model from kobocat db
Use Token object to retrieve Onadata user associated with Token. Fetch username for Onadata User
Get or Create KPI User from Ondata user username and email var

* Grant permissions when creating user object.

* Return tuple with new KPI user and token

* Handle Token.DoesNotExist error

* Return namespace in urls, instead have this present in the permission endpoint

* Rebase with kobo/two_databases

* Retrieve form json payload from onadata then update asset

Signed-off-by: Mark Ekisa <mark.ekisa@gmail.com>

* Remove updates to Kobo linking forms on Kobo to assets on KPI (#32)

* Updates to the imports Endpoint. (#35)

* Authenticate user from Onadata db when making requests to import the form url into KPI and also when fetching form information from Onadata.

* code cleanup

Co-authored-by: Olivier Leger <olivierleger@gmail.com>
Co-authored-by: duvld <ollejna@gmail.com>
Co-authored-by: John N. Milner <john@tmoj.net>
Co-authored-by: Leszek Pietrzak <leszek@magicznyleszek.xyz>
Co-authored-by: Philip Edwards <phil@edwards.io>
Co-authored-by: Agus Hilman <gushil@gmail.com>
Co-authored-by: Alex Dorey <dorey415@gmail.com>
Co-authored-by: duvld <duvld@github.com>
Co-authored-by: Mark Ekisa <mark.ekisa@gmail.com>