Unable to create case insensitive email index #6465

Closed
mtrezza opened this issue Mar 5, 2020 · 4 comments

@mtrezza
Member

mtrezza commented Mar 5, 2020

Issue Description

Upgrading from Parse Server 3.10.0 to 4.1.0 fails with this error message:

warn: Unable to create case insensitive email index: WiredTigerIndex::insert: key too large to index, failing 1157 { ... }

The indices on username and email both fail to create with the same error message.

  • Issue seems related to fix Case insensitive signup #5634
  • Issue occurs with Parse Server 4.1.0 on MongoDB 3.6.17
  • Issue does not occur with Parse Server 3.10.0 on MongoDB 4.2

Steps to reproduce

Expected Results

Indices should create.

Actual Outcome

Indices fail to create.

Environment Setup

  • Server

    • parse-server version (Be specific! Don't say 'latest'.) : 4.1.0
    • Localhost or remote server? (AWS, Heroku, Azure, Digital Ocean, etc): AWS
  • Database

    • MongoDB version: 3.6.17
    • Storage engine: WT
    • Localhost or remote server? (AWS, mLab, ObjectRocket, Digital Ocean, etc): AWS

Logs/Trace

mongoDB log:

2020-03-05T01:03:07.605+0000 I COMMAND [conn356038] command my-database.$cmd command: createIndexes { createIndexes: "_User", indexes: [ { name: "case_insensitive_username", key: { username: 1 }, background: true, sparse: true, collation: { locale: "en_US", strength: 2 } } ], writeConcern: { w: "majority" }, lsid: { id: UUID("xxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx") }, $clusterTime: { clusterTime: Timestamp(1583370186, 29), signature: { hash: BinData(0, 123456789), keyId: 123456789 } }, $db: "my-database" } exception: WiredTigerIndex::insert: key too large to index, failing 1157 { : "..." } code:KeyTooLong numYields:574 reslen:419 locks:{ Global: { acquireCount: { r: 576, w: 576 } }, Database: { acquireCount: { w: 577, W: 2 }, acquireWaitCount: { W: 2 }, timeAcquiringMicros: { W: 16745 } }, Collection: { acquireCount: { w: 576 } } } protocol:op_msg 1475ms

@acinader
Contributor

acinader commented Mar 5, 2020

I'm also using WT 3.6. According to the manual, for versions < 4.2 there is a limit of 1024 bytes for an index entry. Presumably, index entries for case-insensitive indexes are larger than for case-sensitive ones.

For MongoDB 4.2 and later, there is no limit.

My assumption is that you have at least one extraordinarily large entry. The error message indicates that the entry is 1157 bytes.

The max size for an email address is supposed to be between 250 and 360 characters, depending on who you ask, so I tried this:

db.getCollection('_User').insertOne({email: 'thisisaverylongemailaddressthatiswaybeoyondanythingiwouldeexpec@asdfasdfasdfasdfasdfasdfasdfasdfasdfasdfasdfasdfasdfasdfasdfasdfasdfasdfasdfasdfasdfasdfasdfasdfasdfasdfasdfasdfasdfasdfasdfasdfasdfasdfasdfasdfasdfasdfasdfasdfasdfasdfasdfasdfasdfasdfasdfasdfasdfasdfasdfasdfasdfasdfasdfasdfasdfasdfasdfasdfasdfasdfasdfasd'})
{
"acknowledged" : true,
"insertedId" : ObjectId("5e60980290772d0ed5cf094e")
}


I then verified the index with:

db.getCollection('_User').find({email: 'Thisisaverylongemailaddressthatiswaybeoyondanythingiwouldeexpec@asdfasdfasdfasdfasdfasdfasdfasdfasdfasdfasdfasdfasdfasdfasdfasdfasdfasdfasdfasdfasdfasdfasdfasdfasdfasdfasdfasdfasdfasdfasdfasdfasdfasdfasdfasdfasdfasdfasdfasdfasdfasdfasdfasdfasdfasdfasdfasdfasdfasdfasdfasdfasdfasdfasdfasdfasdfasdfasdfasdfasdfasdfasdfasd'}).collation({ locale: 'en_US', strength: 2 }).explain()
{
"queryPlanner" : {
"plannerVersion" : 1,
"namespace" : "active._User",
"indexFilterSet" : false,
"parsedQuery" : {
"email" : {
"$eq" : "Thisisaverylongemailaddressthatiswaybeoyondanythingiwouldeexpec@asdfasdfasdfasdfasdfasdfasdfasdfasdfasdfasdfasdfasdfasdfasdfasdfasdfasdfasdfasdfasdfasdfasdfasdfasdfasdfasdfasdfasdfasdfasdfasdfasdfasdfasdfasdfasdfasdfasdfasdfasdfasdfasdfasdfasdfasdfasdfasdfasdfasdfasdfasdfasdfasdfasdfasdfasdfasdfasdfasdfasdfasdfasdfasd"
}
},
"collation" : {
"locale" : "en_US",
"caseLevel" : false,
"caseFirst" : "off",
"strength" : 2,
"numericOrdering" : false,
"alternate" : "non-ignorable",
"maxVariable" : "punct",
"normalization" : false,
"backwards" : false,
"version" : "57.1"
},
"winningPlan" : {
"stage" : "FETCH",
"inputStage" : {
"stage" : "IXSCAN",
"keyPattern" : {
"email" : 1
},
"indexName" : "case_insensitive_email",
"collation" : {
"locale" : "en_US",
"caseLevel" : false,
"caseFirst" : "off",
"strength" : 2,
"numericOrdering" : false,
"alternate" : "non-ignorable",
"maxVariable" : "punct",
"normalization" : false,
"backwards" : false,
"version" : "57.1"
},
"isMultiKey" : false,
"multiKeyPaths" : {
"email" : [ ]
},
"isUnique" : false,
"isSparse" : true,
"isPartial" : false,
"indexVersion" : 2,
"direction" : "forward",
"indexBounds" : {
"email" : [
"["O79M9M)S1KY?EC51A)9?)//K1MMO7)O9MU)Y+1EYEC/)CYO79C59UEQ?/11WG1-\nz)M/3)M/3)M/3)M/3)M/3)M/3)M/3)M/3)M/3)M/3)M/3)M/3)M/3)M/3)M/3)M/3)M/3)M/3)M/3)M/3)M/3)...", "O79M9M)S1KY?EC51A)9?)//K1MMO7)O9MU)Y+1EYEC/)CYO79C59UEQ?/11WG1-\nz)M/3)M/3)M/3)M/3)M/3)M/3)M/3)M/3)M/3)M/3)M/3)M/3)M/3)M/3)M/3)M/3)M/3)M/3)M/3)M/3)M/3)..."]"
]
}
}
},
"rejectedPlans" : [ ]
},
"serverInfo" : {
"host" : "xxx",
"port" : 27017,
"version" : "3.6.8",
"gitVersion" : "6bc9ed599c3fa164703346a22bad17e33fa913e4"
},
"ok" : 1
}


If I make the email address 4x the size and try to insert it, I get this error:

```"errmsg" : "WiredTigerIndex::insert: key too large to index, failing  1926 ...```

Can you check whether any of your documents have very large values, using $strLenBytes (https://docs.mongodb.com/manual/reference/operator/aggregation/strLenBytes/#exp._S_strLenBytes)?
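
A minimal sketch of such a check, assuming the fields of interest are `email` and `username`. The helper name `longValuePipeline` and the 512-byte threshold are hypothetical and chosen only for illustration. The threshold is kept well below 1024 because the collation key stored in the index appears to be larger than the raw string.

```javascript
// Build an aggregation pipeline that lists documents whose `field` is a
// string longer than `maxBytes` UTF-8 bytes, largest first.
// `longValuePipeline` is a hypothetical helper, not part of Parse Server.
function longValuePipeline(field, maxBytes) {
  return [
    // $strLenBytes errors on null, missing or non-string values, so filter first.
    { $match: { [field]: { $type: 'string' } } },
    { $project: { [field]: 1, bytes: { $strLenBytes: '$' + field } } },
    { $match: { bytes: { $gt: maxBytes } } },
    { $sort: { bytes: -1 } },
  ];
}

// Usage in the mongo shell (threshold of 512 is an assumption):
//   db.getCollection('_User').aggregate(longValuePipeline('email', 512))
//   db.getCollection('_User').aggregate(longValuePipeline('username', 512))
```

Any document this returns is a candidate for the value that makes the index build fail.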

@mtrezza
Member Author

mtrezza commented Mar 5, 2020

@acinader You were right!

I removed some extra-long strings and was able to create the indices manually, without upgrading Parse Server to 4.x:

db.getCollection('_User').createIndex({
    "email": 1
}, {
    name: "case_insensitive_email",
    background: true,
    sparse: true,
    collation: { locale: 'en_US', strength: 2 }
})

db.getCollection('_User').createIndex({
    "username": 1
}, {
    name: "case_insensitive_username",
    background: true,
    sparse: true,
    collation: { locale: 'en_US', strength: 2 }
})

Did I read the code correctly that these indices are not unique and that the existing unique indices for username and email remain? Couldn't the new indices completely replace the existing ones if they were unique?

return Promise.all([
  usernameUniqueness,
  usernameCaseInsensitiveIndex,
  emailUniqueness,
  emailCaseInsensitiveIndex,
  roleUniqueness,
  adapterInit,
  indexPromise,
]);

@acinader
Contributor

acinader commented Mar 5, 2020

Whew! Glad to hear it. We just made your data a little better ;).

Yes, you are correct. The case-sensitive unique index remains and is important. The case-insensitive index is not unique. It is OK if there are case-insensitive clashes. Here are two use cases for why the case-insensitive index is not unique:

  1. Anonymous users and potentially some auth adapters use randomly generated strings for usernames. In this context, it's perfectly ok for there to be unique strings that could clash when compared case-insensitively.

  2. Existing installations may have clashes already and if we made these indexes unique, those clashes would have to be resolved before the index could be applied.

Q. So why add the case insensitive index at all?

A. To prevent case-insensitive username clashes (for example, Manuel and manuel), we need to validate each new username (or email) against the existing ones. We don't need an index to do this, but without one, the check would require a full collection scan, which would be expensive on large user collections.
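
To illustrate what that validation compares, here is a rough sketch in plain JavaScript. It assumes that `Intl.Collator` with `sensitivity: 'accent'` behaves approximately like the `{ locale: 'en_US', strength: 2 }` collation used by the index (case is ignored, accents are not). The helper `findClash` is hypothetical and is not how Parse Server implements the check; it scans an in-memory array, which is exactly the kind of full scan the index avoids on the database side.

```javascript
// Assumption: sensitivity 'accent' approximates MongoDB collation strength 2,
// i.e. 'Manuel' and 'manuel' compare equal, 'manuel' and 'manuél' do not.
const collator = new Intl.Collator('en-US', { sensitivity: 'accent' });

// Hypothetical helper: return the first existing username that clashes with
// `candidate` when compared case-insensitively, or undefined if there is none.
function findClash(existingUsernames, candidate) {
  return existingUsernames.find(
    (name) => collator.compare(name, candidate) === 0
  );
}

// On the database, a comparable lookup in the mongo shell could look like:
//   db.getCollection('_User')
//     .find({ username: 'manuel' })
//     .collation({ locale: 'en_US', strength: 2 })
//     .limit(1)
```

Because the query specifies the same collation as the index, the planner can answer it with an index scan, as the explain output earlier in the thread shows for email.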

@mtrezza
Member Author

mtrezza commented Mar 5, 2020

Makes sense! And yes, thanks, this did indeed lead to a data clean-up on our side 👍
