Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Release/r2018.04.23 #152

Merged
merged 29 commits into from
Apr 24, 2018
Merged

Release/r2018.04.23 #152

merged 29 commits into from
Apr 24, 2018

Conversation

domodwyer
Copy link

@domodwyer domodwyer commented Apr 23, 2018

Includes:

  • Implement maxIdleTimeout for pooled connections (details)
  • Connection pool waiting improvements (details)
  • Fixes BSON encoding for $in and friends (details)
  • Add BSON stream encoders (details)
  • Add integer map key support in the BSON encoder (details)
  • Support aggregation collations (details)

Thanks to:

jameinel and others added 28 commits February 12, 2018 12:42
* Fallback to JSON tags when BSON tag isn't present


Cleanup.

* Add test to demonstrate tagging fallback.

- Test coverage for tagging test.
Periodic cluster synchronisation calls isMaster() which currently resends the
"client" metadata every call - the spec specifies:

	isMaster commands issued after the initial connection handshake MUST NOT
	contain handshake arguments

	https://github.com/mongodb/specifications/blob/master/source/mongodb-handshake/handshake.rst#connection-handshake

This hotfix prevents subsequent isMaster calls from sending the client metadata
again - fixes #101 and fixes #103.

Thanks to @changwoo-nam @qhenkart @canthefason @jyoon17 for spotting the initial
issue, opening tickets, and having the problem debugged with a PoC fix before I
even woke up.
* Add a test that mongo Server gets their abended reset as necessary.

See https://github.com/go-mgo/mgo/issues/254 and
https://github.com/go-mgo/mgo/pull/255/files

* Include the patch from Issue 255.

This brings in a test which fails without the patch, and passes with the
patch. Still to be tested, manual tcpkill of a socket.
Add $changeStream support
* enable shrink the socket pool size

we found the mgo will allocate the pool size during burst traffic but won't
close the sockets any more until restart the client or server.

And the mongo document defines two related query options

- [minPoolSize](https://docs.mongodb.com/manual/reference/connection-string/#urioption.minPoolSize)
- [maxIdleTimeMS](https://docs.mongodb.com/manual/reference/connection-string/#urioption.maxIdleTimeMS)

By implementing these two options, it could shrink the pool to minPoolSize after
the sockets introduced by burst traffic timeout.

The idea comes from https://github.com/JodeZer/mgo , he investigated
this issue and provide the initial commits.

I found there are still some issue in sockets maintenance, and had a PR against
his repo JodeZer#1 .

This commit include JodeZer's commits and my fix, and I simplified the data structure.
What's in this commit could be described as this figure:

+------------------------+
| Session | <-------+ Add options here
+------------------------+

+------------------------+
| Cluster | <-------+ Add options here
+------------------------+

+------------------------+
| Server | <-------+*Add options here
| | *add timestamp when recycle a socket +---+
| +-----------+ | +---+ *periodically check the unused sockets |
| | shrinker <------+ and reclaim the timeout sockets. +---+
| +-----------+ | |
| | |
+------------------------+ |
|
+------------------------+ |
| Socket | <-------+ Add a field for last used times+---------+
+------------------------+

Signed-off-by: Wang Xu <gnawux@gmail.com>

* tests for shrink the socks pool

Signed-off-by: Wang Xu <gnawux@gmail.com>
* Ignore dial error when server is stopped. related to #117

* Print dbtest server starting error before panic.

As seen in #117, dbtest start() throws a panic when it can't start mongo.
This panic is picked up by a panichandler obscuring the actual problem.
This PR simply prints the error start() encounters to stderr, before throwing the panic.
We've seen a deadlock happen occasionally where syncServers needs to
acquire a socket to call isMaster, but the socket acquisition needs to
know the server topology which isn't known yet. See #120
issue for a detailed breakdown.

This replicates the issue by setting up a mongo "server" which closes
sockets as soon as they're opened; about 20% of the time, this will
trigger the deadlock because the acquired socket for ismaster() dies and
needs to be reacquired.
As discussed in the issue #120, isMaster() can cause a
deadlock with the topology scanner if the connection it makes dies
before running the command; mgo automagically attempts to make another
socket in acquireSocket, but this can't work without topology.

This commit forces isMaster() to actually run on the intended socket.
Make run on socket helper methods private
* Add signaling support for connection pool waiting

The current behaviour when the poolLimit is reached and a new connection
is required is to poll every 100ms to see if there is now headroom to
make a new connection. This adds tremendous latency to the
limit-hit-path.

This commit changes the checkout behaviour to watch on a condition
variable for connections to become available, and the checkin behaviour
to signal this variable. This should allow waiters to use connections
immediately after they become available.

A new parameter is also added to DialInfo, PoolTimeout, which is the
maximum time that clients will wait for connection headroom to become
available. By default this is unlimited.

* Add stats for connection pool timings

This exposes four new counters

* The number of times a socket was successfully obtained from the
  connection pool
* The number of times the connection pool needed to be waited on
* The total time that has been spent waiting for a conneciton to become
  available
* The number of times socket acquisition failed due to a pool timeout

* Gitignore .vscode directory

I'm using vscode and accidently committed the .vscode directroy;
.gitignore this footgun.
Why:

  * mongodb aquires array for $in/$nin/$all ops

How:

  * encode with array for spec
…127)

* bson: Added Encoder and Decoder types for stream encoding/decoding.

Those types are analog to those found in json and yaml. They allow us to operate on io.Readers/io.Writers instead of raw byte slices. Streams are expected to be sequences of concatenated BSON documents: *.bson files from MongoDB dumps, for example.

* Stream: NewEncoder and NewDecoder now return pointers.

JSON and YAML do that too, so let's be consistent.

* Stream decoder: added checks on document size limits, and panic handler.

Strangely, the BSON spec defines the document size header as a signed int32, but:
- No document can be smaller than 5 bytes (size header + null terminator).
- MongoDB constrains BSON documents to 16 MiB at most.

Therefore, documents whose header doesn't obey those limits are discarded and Decode returns ErrInvalidDocumentSize.

In addition, we're reusing the handleErr panic handler in Decode to protect from unwanted panics in Unmarshal.

* Exported MinDocumentSize and MaxDocumentSize consts.
* docs: updated readme and docs

* docs: added minimal readme for bson package
* socket: only send client metadata once per socket (#105)

Periodic cluster synchronisation calls isMaster() which currently resends the
"client" metadata every call - the spec specifies:

	isMaster commands issued after the initial connection handshake MUST NOT
	contain handshake arguments

	https://github.com/mongodb/specifications/blob/master/source/mongodb-handshake/handshake.rst#connection-handshake

This hotfix prevents subsequent isMaster calls from sending the client metadata
again - fixes #101 and fixes #103.

Thanks to @changwoo-nam @qhenkart @canthefason @jyoon17 for spotting the initial
issue, opening tickets, and having the problem debugged with a PoC fix before I
even woke up.

* Merge Development (#111)

* Brings in a patch on having flusher not suppress errors. (#81)

go-mgo#360

* Fallback to JSON tags when BSON tag isn't present (#91)

* Fallback to JSON tags when BSON tag isn't present


Cleanup.

* Add test to demonstrate tagging fallback.

- Test coverage for tagging test.

* socket: only send client metadata once per socket

Periodic cluster synchronisation calls isMaster() which currently resends the
"client" metadata every call - the spec specifies:

	isMaster commands issued after the initial connection handshake MUST NOT
	contain handshake arguments

	https://github.com/mongodb/specifications/blob/master/source/mongodb-handshake/handshake.rst#connection-handshake

This hotfix prevents subsequent isMaster calls from sending the client metadata
again - fixes #101 and fixes #103.

Thanks to @changwoo-nam @qhenkart @canthefason @jyoon17 for spotting the initial
issue, opening tickets, and having the problem debugged with a PoC fix before I
even woke up.

* Cluster abended test 254 (#100)

* Add a test that mongo Server gets their abended reset as necessary.

See https://github.com/go-mgo/mgo/issues/254 and
https://github.com/go-mgo/mgo/pull/255/files

* Include the patch from Issue 255.

This brings in a test which fails without the patch, and passes with the
patch. Still to be tested, manual tcpkill of a socket.

* changeStream support (#97)

Add $changeStream support

* readme: credit @peterdeka and @steve-gray (#110)

* Hotfix #120  (#136)

* cluster: fix deadlock in cluster synchronisation (#120)

For a impressively thorough breakdown of the problem, see:
	#120 (comment)

Huge thanks to @dvic and @KJTsanaktsidis for the report and fix.

* readme: credit @dvic and @KJTsanaktsidis

* added support for marshalling/unmarshalling maps with non-string keys

* refactor method receiver
#116 adds a much needed ability to shrink the connection pool, but requires
tracking the last-used timestamp for each socket after every operation. Frequent
calls to time.Now() in the hot-path reduced read throughput by ~6% and increased
the latency (and variance) of socket operations as a whole.

This PR adds a periodically updated time value to amortise the cost of the last-
used bookkeeping, restoring the original throughput at the cost of approximate
last-used values (configured to be ~25ms of potential skew).

On some systems (currently including FreeBSD) querying the time counter also
requires a syscall/context switch.

Fixes #142.
This was taken from @ReeseWang's PR and turned into an example - thanks!
@domodwyer domodwyer self-assigned this Apr 23, 2018
@gnawux
Copy link

gnawux commented Apr 24, 2018

great, looking forward to this release.

@domodwyer domodwyer merged commit efe0945 into master Apr 24, 2018
@domodwyer domodwyer deleted the release/r2018.04.23 branch April 24, 2018 09:14
libi pushed a commit to libi/mgo that referenced this pull request Dec 1, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.