
First step in updating dependencies (to Django 1.8), more-complete test coverage, some refactoring #48

Merged · 170 commits · Jun 11, 2019

Conversation

jthomale
Member

This gigantic set of changes should help set us down the road toward updating our long-outdated dependencies. The goal is to have dependencies fully updated (to Django 1.11) and to move to Python 3 by 2020. Then we'll work on supporting Django 2.

Merging this into master will create release 1.4.

Django 1.8 and More-recent Dependencies
The supported version of Django is now 1.8, up from 1.7. Other dependencies have been updated as far as they can go while still supporting Django 1.8.

Tests, Tests, and More Tests
To ensure the dependency-update process didn't break anything, we've added near-comprehensive test coverage: not necessarily strict unit tests, but tests that at least touch all of the major end-user-facing functions. The full test suite currently runs ~2300 individual tests. Writing them is the main reason this update took nearly a year; with tests now in place, future dependency updates should be much faster and easier.

Tests were particularly challenging to write because this system depends on a legacy vendor database (Sierra) plus Solr, neither of which has existing test-data-generation tooling equivalent to model-mommy. Test data and fixtures are included to simulate data flow from Sierra (via a test PostgreSQL Sierra database, including fixtures exported from our live catalog) into Solr. I've built modules for generating Solr test data a la model-mommy (utils.test_helpers.solr_factories and utils.test_helpers.solr_test_profiles) and added helper factories for generating objects that work well as pytest fixtures, which clean up Solr test data after each test or after each pytest module, depending on what's needed. This has led to a somewhat more complex test infrastructure than I really wanted, but it works.
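As a rough illustration of the cleanup-fixture pattern described above (the fixture names and the pysolr client are assumptions, not the project's actual utils.test_helpers API), a per-test Solr cleanup fixture might look like this:

```python
import pysolr  # assumption: any Solr client with add/delete/commit works here
import pytest


@pytest.fixture
def solr_conn():
    """Connect to a test Solr core (URL is a placeholder)."""
    return pysolr.Solr('http://localhost:8983/solr/test_core')


@pytest.fixture
def solr_docs(solr_conn):
    """Return an `add` function that records what it adds so the docs can
    be deleted again during teardown, keeping tests isolated."""
    added_ids = []

    def _add(docs):
        solr_conn.add(docs, commit=True)
        added_ids.extend(doc['id'] for doc in docs)
        return docs

    yield _add
    # Teardown: remove only what this test added.
    for doc_id in added_ids:
        solr_conn.delete(id=doc_id)
    solr_conn.commit()
```

A module-scoped variant is just the same fixture declared with scope='module'.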

Also, the Sierra DB tests that were originally meant to be run via the normal Django test runner have been moved to pytest. This addresses issue #40.
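Concretely, a Sierra DB test under pytest now just uses the standard pytest-django database marker; a minimal sketch (the model and assertion are stand-ins, not actual project tests):

```python
import pytest

from base import models  # the app that holds the Sierra models


@pytest.mark.django_db
def test_bib_records_are_queryable():
    # `BibRecord` stands in for whichever Sierra model a real test exercises.
    assert models.BibRecord.objects.count() >= 0
```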

Changes to Sierra Models
Django 1.8 made some changes to models that broke some of the workarounds we'd employed, e.g., to accommodate the lack of single primary keys in some Sierra tables and Django's lack of support for composite keys. This PR improves the models in the following ways.

  • Adds base.fields.VirtualCompField — a virtual composite field that works as a proper PK for Sierra models. Notably, it does NOT work in relationships as a foreign key, but the Sierra models don't need to employ it that way.

  • Adds base.models.ModelWithAttachedName, a base class for Sierra Property or Type models that have an associated PropertyName or TypeName table containing names or labels for a particular property by language, linked to the IIILanguage table. Previously these labels were assumed to always use the default language; this base class implements support for multiple languages. A new III_LANGUAGE_CODE setting defines the primary language, and the class allows you to request a label in a different language if it's available in your system (see the sketch after this list).

  • ForeignKey relations that had unique=True have been converted to OneToOneField relations.
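To make these model changes more concrete, here is a hedged sketch; the model and field names below are illustrative stand-ins, not the actual base app definitions:

```python
from django.conf import settings
from django.db import models


class IiiLanguage(models.Model):
    """Stand-in for the Sierra language table."""
    code = models.CharField(max_length=3, primary_key=True)


class ItemStatusProperty(models.Model):
    """A 'property' table whose human-readable labels live in a *Name table."""
    code = models.CharField(max_length=1, primary_key=True)

    def get_name(self, language_code=None):
        # Roughly what ModelWithAttachedName provides: look up the label in
        # the requested language, defaulting to settings.III_LANGUAGE_CODE.
        language_code = language_code or settings.III_LANGUAGE_CODE
        return (self.itemstatuspropertyname_set
                    .filter(iii_language__code=language_code)
                    .values_list('name', flat=True)
                    .first())


class ItemStatusPropertyName(models.Model):
    """One label per (property, language) pair."""
    item_status_property = models.ForeignKey(ItemStatusProperty,
                                             on_delete=models.CASCADE)
    iii_language = models.ForeignKey(IiiLanguage, on_delete=models.CASCADE)
    name = models.CharField(max_length=255)


class RecordMetadata(models.Model):
    """Stand-in for Sierra's central record metadata table."""
    id = models.BigIntegerField(primary_key=True)


class ItemRecord(models.Model):
    # The unique=True ForeignKey -> OneToOneField conversion: what used to be
    # ForeignKey(RecordMetadata, unique=True) is now a OneToOneField.
    record_metadata = models.OneToOneField(RecordMetadata,
                                           on_delete=models.CASCADE,
                                           primary_key=True)
```

With VirtualCompField in the mix, a table like ItemStatusPropertyName would also get a composite PK over (item_status_property, iii_language).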

New manage.py Command for Creating APIUsers

Previously, in order to create new APIUsers, you had to open the Python shell and run commands manually. This adds the importapiusers management command (via the api app), which lets you generate new APIUsers from the command line by importing a CSV file.
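For example (the command's exact argument signature and the expected CSV columns aren't documented here, so treat the call below as a hedged illustration):

```python
from django.core.management import call_command

# Equivalent to: python manage.py importapiusers /path/to/new_api_users.csv
call_command('importapiusers', '/path/to/new_api_users.csv')
```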

Refactored Code
This set of changes contains a LOT of refactored code, breaking things into more granular units to make them more testable. Although I wasn't really aiming for consistently granular unit tests, in some cases the methods or functions I wrote originally were awful and needed to be refactored.

In particular, all exporters have been completely refactored, and I've added several new types of base classes to help compose complex export workflows.

Bug Fixes
There are some fixes for various bugs or issues that came to light while writing tests, particularly surrounding API behavior.

  • API filters that use the utils.solr.Queryset exclude method now work correctly. Previously, in certain circumstances, they would cause default View querysets to lose their default filters.

  • API filter operators such as gt, gte, lt, etc. now handle queries against string fields that contain spaces. Previously these filters did not escape the spaces correctly in the underlying Solr query, which led to errors (see the escaping sketch after this list).

  • Incorrect syntax for API filters often led to 500 server errors when Solr couldn't parse the underlying query, which was neither useful nor informative for end users. Now incorrect API filter syntax raises a proper 400 error with a message explaining the issue.

  • API filters using the in and range operators can now escape or quote commas to include them as part of a data value. Previously commas were only used as delimiters in the query argument.

  • Shelflistitem manifests are now created/updated/refreshed for the correct item locations whenever any exporter that utilizes the shelflistitem Haystack index runs. The list of item locations needing a shelflistitem manifest refresh is stored and tracked as items are indexed, whether by an ItemsToSolr, BibsAndAttachedToSolr, or other type of exporter. (This addresses #16, BibsAndAttachedToSolr export not generating shelflist item manifests correctly.)
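The space-escaping fix mentioned above boils down to something like the following simplified sketch (not the project's actual utils.solr code):

```python
def escape_solr_term(value):
    """Backslash-escape characters the Solr query parser treats specially,
    including whitespace, so a multi-word string is parsed as one term in
    filters like gt/gte/lt/lte and range."""
    special = set('+-&|!(){}[]^"~*?:\\/ ')
    return ''.join('\\' + ch if ch in special else ch for ch in value)


# 'ABC 123 .B45' becomes 'ABC\ 123\ .B45', which Solr reads as a single value.
print(escape_solr_term('ABC 123 .B45'))
```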

BibsDownloadMarc Exporter Deprecation

  • The exporter.basic_exporters.BibsDownloadMarc exporter is now deprecated. All of the Solrmarc-related conversion logic has been moved from exporters down into a lower-level custom Solr backend for Haystack. A separate exporter that converts bibs to MARC and saves them to the filesystem is no longer needed.

jason-ellis and others added 30 commits September 10, 2018 13:21
…sn't appear that model-mommy > 1.3.2 works with Django 1.7.1. Deferring update.
… of sqlparse installed by django-debug-toolbar is broken with that version, so explicitly add highest working version.
…=True to OneToOneField, remove null=True from ManyToManyField.
This is a big commit, but it adds a new custom field type,
`base.fields.VirtualCompField`, a virtual composite field, that we can
use with Sierra models that are lacking a PK or even a truly unique
individual field. The field is virtual, so it isn't reflected in the
database itself.

This will allow us to fix some issues with a few of the models, where
we had defined a non-unique field as a PK just to satisfy Django's
requirements for models. Doing this led to inappropriately restricted
data in the Sierra test database, which means our test data wasn't
reflective of the live data.
This is another large-ish commit that fixes some issues with the initial
VirtualCompField implementation. The biggest improvement is that field
values are no longer represented as composite strings, but tuples, which
makes more sense. Composite string values are still needed for some
database operations and for serialization, but the default separator
value has been changed to `|_|` to reduce the chances that you'd
legitimately encounter that value in your data. Support for
serialization has been added. (Normally we'd just tell it not to
serialize the field, but that doesn't work when the field is used as a
PK.)
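The tuple-vs-composite-string distinction described in this commit can be pictured roughly like this (a simplified sketch; the real VirtualCompField also handles typing, NULLs, and database lookups):

```python
SEPARATOR = '|_|'


def comp_value_to_string(values):
    """Serialize a composite (tuple) value into a single string, e.g. for
    serialization or for DB operations that need one scalar value."""
    return SEPARATOR.join(str(v) for v in values)


def string_to_comp_value(value):
    """Split a serialized composite string back into a tuple."""
    return tuple(value.split(SEPARATOR))


key = ('b', 1234567, 'eng')
assert comp_value_to_string(key) == 'b|_|1234567|_|eng'
assert string_to_comp_value('b|_|1234567|_|eng') == ('b', '1234567', 'eng')
```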
This is a big update to models, which includes the following.

* Correct ForeignKey and OneToOne fields are sorted out, as much as
possible. Fields that had been changed to OneToOne earlier to prevent
having a ForeignKey with a `unique` constraint have been vetted and
updated as needed. Composite keys have been added (using
VirtualCompField) where needed to have proper keys that enforce
uniqueness accurately.
* Many of the tables whose fields had been changed to OneToOne and were then changed back are *Name tables. These tables implement multi-language support for naming system codes and properties, where a *Name table can have multiple entries for the same code in multiple languages. The true PK for these tables is (code, iii_language_id), but previously, without VirtualCompField, I had used `code` as the PK. This happened to work because we only use one language. These tables have been updated so that they now actually support multi-lingual sites. It's still assumed that you'll have one primary language, which you can now set via the `settings.III_LANGUAGE_CODE` option. I've added a new abstract model type along with methods to let you get a property name in the correct language more easily.
* I made tweaks to several of the fields to relax data integrity checks or to update things based on changes to the Sierra database that I had not yet accounted for. I'm working on updating the Sierra test fixtures, adding several additional models to the fixtures so that I can write tests against them. But, because I had previously set up things like ForeignKey fields on `code` columns that aren't actually FKs in the Sierra DB, trying to force data from the live DB into a more constrained set of models doesn't work. E.g., for all FKs that use a `code` column, I added `db_constraint=False` so I can load fixture data (see the sketch after this commit message).
* Updated some of the comments and docstrings in the models to reflect
changes.
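Two concrete pieces from the bullets above, sketched with hypothetical model and field names (db_constraint is a standard Django ForeignKey option; III_LANGUAGE_CODE is the new project setting):

```python
# settings.py -- the new primary-language setting used by *Name lookups.
III_LANGUAGE_CODE = 'eng'


# models.py -- a code-based "FK" that isn't an enforced foreign key in the
# Sierra DB, so the database-level constraint is skipped to let fixture data
# load even when referential integrity is imperfect.
from django.db import models


class Location(models.Model):
    code = models.CharField(max_length=5, primary_key=True)


class Checkout(models.Model):
    location = models.ForeignKey(Location, db_constraint=False,
                                 on_delete=models.DO_NOTHING)
```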
This changeset adds tests (including new fixtures) for testing that
OneToOneField relations work with the makefixtures and tracebranches
commands. Only one small change to `relationtrees` was needed, to
ensure that trying to access non-existent indirect 1-1 relations on a
model instance doesn't raise errors.
I forgot to add a `get_deletions` implementation on BatchExporter. This
adds it. I also added a few minor improvements to tests for
`get_records` and `get_deletions` methods.
While revisiting the tests for the `api` app (in order to add
`shelflist` app tests for the `shelflistitems` API resources), I
realized that the `solr_assemble_specific_record_data` fixture
almost exactly duplicates the
SolrTestDataAssembler.load_static_test_data method. I removed this
fixture (and the parallel `shelflist` app fixture) from conftest.py and
updated the applicable tests.
Preparing to write some api tests for the `shelflist` app, but I need to
access the test helper functions originally defined in
`api.tests.test_api`. I changed them to pytest fixtures and moved them
into the top-level conftest.py file.
This makes a small change to how SolrTestDataAssembler profiles are
generated (adding a helper method `make_profile`) to expose a way to add
new profiles to an existing assembler on the fly.
In order to use the `assemble_test_records` fixture in non-api-app
tests, I had to remove the `api_solr_env` and `basic_solr_assembler`
fixtures, and then create an api-specific `assemble_api_test_records`
fixture in the api-app tests that did use those fixtures. This fixes a
bug in the shelflist-api tests (not yet committed) where too much test
data was being created because I was inadvertently calling the
`api_solr_env` fixture without wanting to or realizing it.
This adds a set of basic tests for the api resources that the
`shelflist` app adds or modifies: primarily `shelflistitems`, but also
items and locations. Necessary fixtures are added to the shelflist conftest file, and a tiny change was made to the shelflist views to facilitate testing. More specific tests covering shelflist-specific features (like
updating/saving data to the API) are coming next.
... and implement as shared pytest fixtures.
Between writing the test_api.py and test_model.py tests I created a generic factory fixture for creating APIUser test objects, rather than instantiating APIUser directly. I neglected to switch the test_api tests over to that fixture. This takes care of that.
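The factory-as-fixture pattern referenced here looks roughly like the sketch below; APIUser's actual constructor arguments aren't spelled out in this PR, so the kwargs are placeholders:

```python
import pytest


@pytest.fixture
def apiuser_factory():
    """Return a factory that builds APIUser test objects with defaults,
    so individual tests only override the fields they care about."""
    from api.models import APIUser  # assumed import path for the model

    def _make(username='test_api_user', **extra_fields):
        return APIUser(username=username, **extra_fields)

    return _make


# Usage in a test:
# def test_something(apiuser_factory):
#     user = apiuser_factory(username='special_user')
```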
This finishes up the API tests for the `shelflist` app `shelflistitem`
resource: using PUT/PATCH for updating writable resource data, requiring
authentication for PUT/PATCH methods, and ensuring resources are always
listed in shelflist-manifest order.
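A trimmed-down version of this kind of test might look like the following; the URL and the expected status codes are placeholders, since the shelflistitems resource's exact routes and auth scheme aren't shown here:

```python
import pytest
from rest_framework.test import APIClient


@pytest.mark.django_db
def test_patch_shelflistitem_requires_authentication():
    """Unauthenticated PATCH requests to a writable resource are rejected."""
    client = APIClient()
    url = '/api/v1/shelflistitems/placeholder-id/'  # placeholder URL
    response = client.patch(url, {'shelfStatus': 'onShelf'}, format='json')
    assert response.status_code in (401, 403)
```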
As we're working on additional indexes fed by Solrmarc (for Blacklight),
it started making more sense to move the Solrmarc conversion process
into a lower-level class so that we can have more uniformity at the
index and exporter levels. This makes that change. It also means we can
remove some cruft from the solr/solrmarc project directory.
Following from the last commit, this removes unnecessary properties
files from the solr/solrmarc project directory. It also updates the
remaining `config.properties` files to remove the illusion that the
`index.properties` file is still defined there. Now it is defined more
granularly in the haystack index class or the Solr backend.
The `CustomQuerySetIndex` `get_qualified_id` method now takes the full
record object to help formulate the ID, rather than assuming it's the
record PK. This adds support for using alternative fields as IDs (in
Solr) on subclasses; they can override that method to provide a
different ID. Fixtures in the main conftest.py file have been updated as
well to utilize `get_qualified_id` rather than putting together the ID
manually. (I think this was always my intention, but I forgot to
implement it.)
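As a hedged illustration of the new hook (CustomQuerySetIndex and get_qualified_id come from this PR; the subclass, import path, and ID format below are made up):

```python
from base import search_indexes  # assumed module for CustomQuerySetIndex


class ExampleIndex(search_indexes.CustomQuerySetIndex):
    def get_qualified_id(self, record):
        # The method now receives the full record object rather than just a
        # PK, so a subclass can build the Solr ID from any field it likes.
        return 'example.{}'.format(record.code)
```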
Previously I had moved some of the methods for formulating
combined `select_` and `prefetch_related` lists for multiple children
from `CompoundMixin` down into the `AttachedRecordExporter`, because I
thought the latter class would be the only place where that was really
useful. In working on blacklight exporters, I discovered this is not the
case, so I've moved that functionality back up into `CompoundMixin`, but
a little more generalized. The functionality that IS specific to
`AttachedRecordExporter` has been moved into the associated `Child`
class, for determining relationship strings for attached children.
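The generalized behavior described here amounts to merging each child's relation paths into one combined list for the parent queryset; a purely illustrative sketch (attribute names are assumptions, not the actual CompoundMixin API):

```python
class CompoundRelationsSketch(object):
    """Illustrative only: combine select_related / prefetch_related paths
    from multiple child exporter configurations, preserving order and
    dropping duplicates."""
    children_config = ()  # each child exposes .select_related / .prefetch_related

    def _combined(self, attr):
        paths = []
        for child in self.children_config:
            paths.extend(p for p in getattr(child, attr) if p not in paths)
        return paths

    def combined_select_related(self):
        return self._combined('select_related')

    def combined_prefetch_related(self):
        return self._combined('prefetch_related')
```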
Previously if an index update operation was called for an empty record
set, the SolrmarcIndexBackend would still generate a MARC file (empty)
and attempt to run the Solrmarc Java process. This would lead to a weird
"divide by zero" error simply due to the empty file. This just adds
checks to make sure the file isn't empty before the Solrmarc process is
called.
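The guard described here is essentially a file-size check before spawning the Java process; roughly (the paths and command line are placeholders, not the backend's actual invocation):

```python
import os
import subprocess


def run_solrmarc_if_needed(marc_filepath, solrmarc_cmd):
    """Skip the Solrmarc Java process entirely for empty MARC files, which
    previously surfaced as a confusing 'divide by zero' error."""
    if not os.path.exists(marc_filepath) or os.path.getsize(marc_filepath) == 0:
        return None
    return subprocess.call(solrmarc_cmd)
```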
I had originally put the `importapiusers` management command in the
`sierra` app in order to keep it with the other commands, but this one
actually needs to be run in production, and the `sierra` app is not
installed there. It makes more sense to keep it with the `api` app, so
I've moved it there.

jthomale commented Jun 11, 2019

Argh. While looking over the updates in this branch already deployed to production, I realized I had an oversight in the shelflist.views.FirstItemPerLocationList API view. In refactoring api.simpleviews.SimpleGetMixin, I broke what was a monolithic paginate method into several sub-methods. I updated all of the views in the api app (including api.views.FirstItemPerLocationList), but by the time I was adding tests to the shelflist app I had forgotten I still needed to make this change to the overriding version in the shelflist app. Since that class is a subclass of the api version, it doesn't technically generate any errors; it's just missing the extensions that are added in the shelflist app, which means it doesn't work when the Inventory App tries to pull data (specifically a shelflist rowNumber) from that API resource.

Before merging this PR I'm going to go back now and add a test for the shelflist version of this resource that tests to make sure the added fields are there, and then fix the problem. It will take a little while, as I will also need to merge the changes into bl-alpha-solrmarc and redeploy to staging to make sure the problem is fixed.

This corrects a small oversight in which I neglected to realize that the
`shelflist` app implemented a specialized/extended version of
`api.views.FirstItemPerLocationList`. When refactoring
`api.simpleviews`, I broke the `SimpleGetMixin` `paginate` method into
several smaller methods. I updated the
`api.views.FirstItemPerLocationList` view to use the new method, but I
didn't update the `shelflist.views.FirstItemPerLocationList`, and I
didn't add tests for it. This corrects that mistake: it adds the
appropriate tests and fixes the issue by simply renaming that method on
the `shelflist` version of the view.