Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Use "text" type mysql field when exporting any id-type field #12598

Closed
wants to merge 2 commits into from

Conversation

adixon
Copy link
Contributor

@adixon adixon commented Jul 30, 2018

Use text type field when exporting any id-type field that might get converted into text of arbitrary length

Overview

Exports of fields that contain "id" type values may contain text of arbitrary length (e.g. values from option groups).

  1. Fix the code so that the previous collection of ad-hoc fixes are replaced by a more comprehensive solution.
  2. Replace the use of varchar(xyz) fields with text type fields to reduce the chance of the row length maximum being reached.

Before

Add a gender type "I am not sure and neither are you" (i.e. anything more than 16 characters). You will be unable to export contacts if you include gender in your export fields: you'll get the unhelpful "DB Error Unknown error" fatal error.

After

Export now works.

Technical Details

  1. This PR as per the discussion here: https://lab.civicrm.org/dev/core/issues/181
  2. Useful documentation about row length contributions of different field types (i.e. why we're using text instead of varchar) is here: https://dev.mysql.com/doc/refman/8.0/en/column-count-limit.html#row-size-limits.
  3. Replaces the code fixes of CRM-12100, CRM-16939, CRM-14398 (at least).

Comments

Testing the effects of the change in a large export would be worthwhile, i.e the performance and disk-usage implications.


@civibot
Copy link

civibot bot commented Jul 30, 2018

(Standard links)

@adixon adixon changed the title Use text type field when exporting any id-type field Use "text" type mysql field when exporting any id-type field Jul 30, 2018
@eileenmcnaughton
Copy link
Contributor

@adixon for now the test fails relate - https://test.civicrm.org/job/CiviCRM-Core-PR/21897/testReport/junit/(root)/CRM_Export_BAO_ExportTest__testGetSQLColumns/testGetSQLColumns_with_data_set__0/ but I agree that performance testing is the key here - we should make sure we get some exports done of ~60000+ rows & comparitive times on it

@adixon
Copy link
Contributor Author

adixon commented Jul 31, 2018

So there's one special case - the primary civicrm id really is exported as an integer number, and it's used as an index on the table, so we need to give it a varchar type. I just updated my PR.

@eileenmcnaughton
Copy link
Contributor

Once I've got the rest of the cleanup on export merged to an rc I'll test this + export in general on some larger exports & get some stats

@eileenmcnaughton
Copy link
Contributor

As as aside @adixon the more I dig into the export the less sure I am that the temp table is used in a way that it improves performance. We actually iterate through every row & write it to the temp table & then pull it out & write to csv. I think the way the merge household & merge address was written is reliant on that process but at least in the case of merge household I'm sure that it could be done at least as efficiently without the temp table

@davidjosephhayes
Copy link
Contributor

Is there anything I can do to help get this pull requested merged?

$lookUp = ['prefix_id', 'suffix_id'];
// special case: civicirm_primary_id exports the contact id and is an index field
if ($fieldName == 'civicrm_primary_id') {
return "$fieldName varchar(16)";
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why would we use varchar to store an integer?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So this change will need to be tested on servers with significant data sets for performance - we have a good theory as to what will work well but it needs some concerted testing. It's also possible we can & should eliminate the temp table per my comment above

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@colemanw : in an export, in general, a civi field stored as an integer might actually end up getting a long string value (e.g. the gender field, which is what triggerred this PR).

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok but in the case of civicrm_primary_id we know the temp table is storing an integer, right?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Okay, so you're asking why varchar(16) for the civicrm_primary_id instead of an integer. The short answer is, I'm not sure, but we're generating the export table and perhaps there's a reason they should all be text fields so we can process them into a csv file more easily? The existing code is converting everything to text fields as well, so that issue is kind of outside the scope of the PR (though it's a reasonable question).

@eileenmcnaughton
Copy link
Contributor

I'm closing this out because it's stale - I actually think the answer, after a lot of digging in that code, is not to write the translated fields to a temp table. We only actually need the temp table when merging same addresses

@adixon
Copy link
Contributor Author

adixon commented Feb 5, 2019

Are you proposing an alternative fix? This is a bug that a client experienced and the patch does resolve it (and also cleans up a few other past hacks), so if you're suggesting an alternative way of resolving it then a few more details might be reasonable to request?

@eileenmcnaughton
Copy link
Contributor

The code has changed quite a bit so the patch doesn't apply anymore - but basically I think we need to review what we are writing to the temp table and why - there really isn't much reason for the temp table in most cases as we iterate through the entire php array anyway it doesn't help with performance and we could just write straight to the csv and skip the temp table.

The one blocker to skipping the temp table is checking how we'd handle merge to address

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants