-
Notifications
You must be signed in to change notification settings - Fork 1.1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[YSQL] Reads don't reflect all the transactions when table lacks primary key #1646
Comments
Tried to implement the same test in Java: @Test
public void testTableWithoutPrimaryKey() throws Exception {
Statement statement = connection.createStatement();
statement.execute("CREATE TABLE t (k INT, v INT)");
final int NUM_THREADS = 3;
final int NUM_INCREMENTS_PER_THREAD = 1000;
ExecutorService ecs = Executors.newFixedThreadPool(NUM_THREADS);
final AtomicBoolean hadErrors = new AtomicBoolean();
for (int i = 1; i <= NUM_THREADS; ++i) {
final int threadIndex = i;
ecs.submit(() -> {
LOG.info("Workload thread #" + threadIndex + " starting");
int numIncrements = 0;
try (Connection conn =
createConnection(IsolationLevel.REPEATABLE_READ, AutoCommit.ENABLED)) {
Statement stmt = conn.createStatement();
stmt.execute("INSERT INTO t (k, v) VALUES (" + threadIndex + ", 0)");
while (numIncrements < NUM_INCREMENTS_PER_THREAD) {
try {
stmt.execute("UPDATE t SET v = v + 1 WHERE k = " + threadIndex);
numIncrements++;
} catch (PSQLException ex) {
LOG.warn("Error updating in thread " + threadIndex);
}
}
} catch (Exception ex) {
LOG.error("Exception in thread " + threadIndex, ex);
hadErrors.set(true);
} finally {
LOG.info("Workload thread #" + threadIndex + " exiting, numIncrements=" + numIncrements);
}
});
}
ecs.shutdown();
ecs.awaitTermination(30, TimeUnit.SECONDS);
for (int i = 1; i <= NUM_THREADS; ++i) {
ResultSet rs = statement.executeQuery(
"SELECT v FROM t WHERE k = " + i);
assertTrue("Did not find any values with k=" + i, rs.next());
int v = rs.getInt("v");
LOG.info("Value for k=" + i + ": " + v);
assertEquals(NUM_INCREMENTS_PER_THREAD, v);
}
assertFalse("Test had errors, look for 'Exception in thread' above", hadErrors.get());
} Surprisingly, I do get a valid total sum at the end, but I get a lot of this log spew:
|
Got a failure in my Java test: https://gist.github.com/mbautin/d420bf2190381b0772da1451a6ab24f8/raw This is with this patch (with some extra debugging): https://gist.githubusercontent.com/mbautin/da02a70a71ccde1a185a5d9451536d48/raw |
Verified that this issue is related to read restarts. |
… and lost writes Summary: Fixing two bugs: - Only the first operation in a PostgreSQL transaction, no matter if it is a read or a write, is allowed to do an automatic server-side read restart. Previously we were always allowing read operations to do a read restart. - We should not set txn_in_progress_ to false in PgTxnManager::RestartTransaction (which we were doing by calling ResetTxnAndSession). This would sometimes cause the PgDocOp::CheckRestartUnlocked to PgDocWriteOp::SendRequestUnlocked to PgSession::PgApplyAsync to PgSession::GetSessionForOp to PgSession::GetSession to PgTxnManager::BeginWriteTransactionIfNecessary call chain to start a new transaction, abandoning the newly restarted transaction to which a write operation was applied, and as a result not writing the values that the higher layer expected to be written. Also we are now reusing the same YBSession for a restarted transaction rather than creating a new one. For reference, the original commit that introduced the read restart logic in YSQL is 046b4b6 Test Plan: Jenkins (including a new test) Run the new test in a loop: ./yb_build.sh release --java-test TestPgTransactions#testTableWithoutPrimaryKey --tp 1 -n 100 Also verify that the test time for each invocation of the above test is about the same as before this fix (30 seconds on my dev server). Run the old test in a loop from 046b4b6 in a loop: ./yb_build.sh --cxx-test pg_libpq-test --gtest_filter PgLibPqTest.ReadRestart -n 100 Perf test from the same commit (RF=1 local cluster): java -jar yb-sample-apps.jar --workload SqlSnapshotTxns --nodes 127.0.0.1:5433 Results without this commit (upstream commit 7c77cc1): https://gist.githubusercontent.com/mbautin/3a87b759a6091aea486169c87c164059/raw Results with this commit (on top of 7c77cc1): https://gist.githubusercontent.com/mbautin/46379cc7040f207ba0d89efe0d636bee/raw Reviewers: neil, mihnea, sergei, george Reviewed By: sergei, george Subscribers: yql Differential Revision: https://phabricator.dev.yugabyte.com/D6848
While porting some concurrency tests from using protobufs to using libpq (which allows sending SQL commands from C++ code), I ended up running into a strange issue. Here is the code for the test in question:
This test creates 3 threads, each of which commits in REPEATABLE READ mode 100 increments on their own per-thread counter. After all the threads finish, each counter is checked to make sure it is exactly 100. It frequently fails with issues akin to the following:
In experimentation by @mbautin and myself, we found a couple more things. First of all, this issue is even reproducible for as few as 2 threads and 1 increment each, though it happens much less frequently. Second, declaring
key
as aPRIMARY KEY
causes this issue to go away completely: after changingkey INT
tokey INT PRIMARY KEY
in the above code and making no other alterations, there are no test failures after 100 trials. Third, it does appear that all of the expected writes are going through, leaving it unclear why the reads don't work. No"Commit failed:"
or"Execution failed:"
messages appear in the logs, so the client is given the impression that everything was committed. And when we adding debugging statements to print out thepgsql_write_batch
structures on the tablet servers, we did see all of the expected updates (on some of the failed tests for smaller numbers of threads/increments).Both of us are currently debugging this, but any additional input about why this might happen is appreciated also.
The text was updated successfully, but these errors were encountered: