Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[YSQL] Reads don't reflect all the transactions when table lacks primary key #1646

Closed
georgeklees opened this issue Jun 28, 2019 · 3 comments
Assignees
Labels
area/ysql Yugabyte SQL (YSQL) kind/bug This issue is a bug

Comments

@georgeklees
Copy link
Contributor

While porting some concurrency tests from using protobufs to using libpq (which allows sending SQL commands from C++ code), I ended up running into a strange issue. Here is the code for the test in question:

void PgSerializableTxnTest::DoIncrement(int key, int num_increments, bool serializable) {
  auto conn = ASSERT_RESULT(Connect());

  // Perform increments
  int succeeded_incs = 0;
  while (succeeded_incs < num_increments) {
    ASSERT_OK(Execute(conn.get(), serializable ?
                                  "START TRANSACTION ISOLATION LEVEL SERIALIZABLE" :
                                  "START TRANSACTION ISOLATION LEVEL REPEATABLE READ"));
    auto exec_status = Execute(conn.get(),
                               Format("UPDATE t SET value = value + 1 WHERE key = $0", key));
    if (exec_status.ok()) {
      auto commit_status = Execute(conn.get(), "COMMIT");
      if (commit_status.ok()) {
        succeeded_incs++;
      } else {
        LOG(WARNING) << "Commit failed:" << commit_status;
        ASSERT_OK(Execute(conn.get(), "ROLLBACK"));
      }
    } else {
      LOG(WARNING) << "Execution failed: " << exec_status;
      ASSERT_OK(Execute(conn.get(), "ROLLBACK"));
    }
  }
}

void PgSerializableTxnTest::TestParallelCounter(bool serializable) {
  auto conn = ASSERT_RESULT(Connect());

  ASSERT_OK(Execute(conn.get(), "CREATE TABLE t (key INT, value INT)"));

  const auto kThreads = RegularBuildVsSanitizers(3, 2);
  const auto kIncrements = RegularBuildVsSanitizers(100, 20);

  // Have each thread increment a different counter after making it
  std::vector<std::thread> threads;
  while (threads.size() != kThreads) {
    int key = threads.size();
    ASSERT_OK(Execute(conn.get(),
                      Format("INSERT INTO t (key, value) VALUES ($0, 0)", key)));

    threads.emplace_back([this, key, serializable] {
      DoIncrement(key, kIncrements, serializable);
    });
  }

  // Wait for completion
  for (auto& thread : threads) {
    thread.join();
  }

  // Check each counter
  for (int i = 0; i < kThreads; i++) {
    auto res = ASSERT_RESULT(Fetch(conn.get(),
                                   Format("SELECT value FROM t WHERE key = $0", i)));

    auto row_val = ASSERT_RESULT(GetInt32(res.get(), 0, 0));
    ASSERT_EQ(row_val, kIncrements);
  }
}

/* SNIP */

TEST_F(PgSerializableTxnTest, TestParallelCounterRepeatRead) {
  TestParallelCounter(false);
}

This test creates 3 threads, each of which commits in REPEATABLE READ mode 100 increments on their own per-thread counter. After all the threads finish, each counter is checked to make sure it is exactly 100. It frequently fails with issues akin to the following:

      Expected: row_val
      Which is: 71
To be equal to: kIncrements
      Which is: 100

In experimentation by @mbautin and myself, we found a couple more things. First of all, this issue is even reproducible for as few as 2 threads and 1 increment each, though it happens much less frequently. Second, declaring key as a PRIMARY KEY causes this issue to go away completely: after changing key INT to key INT PRIMARY KEY in the above code and making no other alterations, there are no test failures after 100 trials. Third, it does appear that all of the expected writes are going through, leaving it unclear why the reads don't work. No "Commit failed:" or "Execution failed:" messages appear in the logs, so the client is given the impression that everything was committed. And when we adding debugging statements to print out the pgsql_write_batch structures on the tablet servers, we did see all of the expected updates (on some of the failed tests for smaller numbers of threads/increments).

Both of us are currently debugging this, but any additional input about why this might happen is appreciated also.

@georgeklees georgeklees added kind/bug This issue is a bug area/ysql Yugabyte SQL (YSQL) labels Jun 28, 2019
@mbautin
Copy link
Contributor

mbautin commented Jun 28, 2019

Tried to implement the same test in Java:

  @Test
  public void testTableWithoutPrimaryKey() throws Exception {
    Statement statement = connection.createStatement();
    statement.execute("CREATE TABLE t (k INT, v INT)");
    final int NUM_THREADS = 3;
    final int NUM_INCREMENTS_PER_THREAD = 1000;
    ExecutorService ecs = Executors.newFixedThreadPool(NUM_THREADS);
    final AtomicBoolean hadErrors = new AtomicBoolean();
    for (int i = 1; i <= NUM_THREADS; ++i) {
      final int threadIndex = i;
      ecs.submit(() -> {
        LOG.info("Workload thread #" + threadIndex + " starting");
        int numIncrements = 0;
        try (Connection conn =
                createConnection(IsolationLevel.REPEATABLE_READ, AutoCommit.ENABLED)) {
          Statement stmt = conn.createStatement();
          stmt.execute("INSERT INTO t (k, v) VALUES (" + threadIndex + ", 0)");
          while (numIncrements < NUM_INCREMENTS_PER_THREAD) {
            try {
              stmt.execute("UPDATE t SET v = v + 1 WHERE k = " + threadIndex);
              numIncrements++;
            } catch (PSQLException ex) {
              LOG.warn("Error updating in thread " + threadIndex);
            }
          }
        } catch (Exception ex) {
          LOG.error("Exception in thread " + threadIndex, ex);
          hadErrors.set(true);
        } finally {
          LOG.info("Workload thread #" + threadIndex + " exiting, numIncrements=" + numIncrements);
        }
      });
    }
    ecs.shutdown();
    ecs.awaitTermination(30, TimeUnit.SECONDS);
    for (int i = 1; i <= NUM_THREADS; ++i) {
      ResultSet rs = statement.executeQuery(
          "SELECT v FROM t WHERE k = " + i);
      assertTrue("Did not find any values with k=" + i, rs.next());
      int v = rs.getInt("v");
      LOG.info("Value for k=" + i + ": " + v);
      assertEquals(NUM_INCREMENTS_PER_THREAD, v);
    }
    assertFalse("Test had errors, look for 'Exception in thread' above", hadErrors.get());
  }

Surprisingly, I do get a valid total sum at the end, but I get a lot of this log spew:

ts1|pid115901|:19021|http://127.39.13.47:12954 2019-06-28 02:41:18.981 UTC [116181] STATEMENT:  UPDATE t SET v = v + 1 WHERE k = 2
2019-06-28 02:41:18,982 (pool-4-thread-2) [WARN - org.yb.pgsql.TestPgTransactions.lambda$testTableWithoutPrimaryKey$0(TestPgTransactions.java:94)] Error updating in thread 2
ts1|pid115901|:19021|http://127.39.13.47:12954 2019-06-28 02:41:18.991 UTC [116182] ERROR:  Illegal state: Attempted to restart when transaction does not require restart
ts1|pid115901|:19021|http://127.39.13.47:12954 2019-06-28 02:41:18.991 UTC [116182] STATEMENT:  UPDATE t SET v = v + 1 WHERE k = 3
2019-06-28 02:41:18,992 (pool-4-thread-3) [WARN - org.yb.pgsql.TestPgTransactions.lambda$testTableWithoutPrimaryKey$0(TestPgTransactions.java:94)] Error updating in thread 3
ts1|pid115901|:19021|http://127.39.13.47:12954 2019-06-28 02:41:19.029 UTC [116182] ERROR:  Illegal state: Attempted to restart when transaction does not require restart
ts1|pid115901|:19021|http://127.39.13.47:12954 2019-06-28 02:41:19.029 UTC [116182] STATEMENT:  UPDATE t SET v = v + 1 WHERE k = 3
2019-06-28 02:41:19,030 (pool-4-thread-3) [WARN - org.yb.pgsql.TestPgTransactions.lambda$testTableWithoutPrimaryKey$0(TestPgTransactions.java:94)] Error updating in thread 3
ts1|pid115901|:19021|http://127.39.13.47:12954 2019-06-28 02:41:19.039 UTC [116182] ERROR:  Illegal state: Attempted to restart when transaction does not require restart
ts1|pid115901|:19021|http://127.39.13.47:12954 2019-06-28 02:41:19.039 UTC [116182] STATEMENT:  UPDATE t SET v = v + 1 WHERE k = 3
2019-06-28 02:41:19,039 (pool-4-thread-3) [WARN - org.yb.pgsql.TestPgTransactions.lambda$testTableWithoutPrimaryKey$0(TestPgTransactions.java:94)] Error updating in thread 3
ts1|pid115901|:19021|http://127.39.13.47:12954 2019-06-28 02:41:19.054 UTC [116182] ERROR:  Illegal state: Attempted to restart when transaction does not require restart
ts1|pid115901|:19021|http://127.39.13.47:12954 2019-06-28 02:41:19.054 UTC [116182] STATEMENT:  UPDATE t SET v = v + 1 WHERE k = 3
2019-06-28 02:41:19,055 (pool-4-thread-3) [WARN - org.yb.pgsql.TestPgTransactions.lambda$testTableWithoutPrimaryKey$0(TestPgTransactions.java:94)] Error updating in thread 3
ts1|pid115901|:19021|http://127.39.13.47:12954 2019-06-28 02:41:19.065 UTC [116182] ERROR:  Illegal state: Attempted to restart when transaction does not require restart
ts1|pid115901|:19021|http://127.39.13.47:12954 2019-06-28 02:41:19.065 UTC [116182] STATEMENT:  UPDATE t SET v = v + 1 WHERE k = 3
2019-06-28 02:41:19,066 (pool-4-thread-3) [WARN - org.yb.pgsql.TestPgTransactions.lambda$testTableWithoutPrimaryKey$0(TestPgTransactions.java:94)] Error updating in thread 3
ts1|pid115901|:19021|http://127.39.13.47:12954 2019-06-28 02:41:19.074 UTC [116181] ERROR:  Illegal state: Attempted to restart when transaction does not require restart
ts1|pid115901|:19021|http://127.39.13.47:12954 2019-06-28 02:41:19.074 UTC [116181] STATEMENT:  UPDATE t SET v = v + 1 WHERE k = 2
2019-06-28 02:41:19,074 (pool-4-thread-2) [WARN - org.yb.pgsql.TestPgTransactions.lambda$testTableWithoutPrimaryKey$0(TestPgTransactions.java:94)] Error updating in thread 2
ts1|pid115901|:19021|http://127.39.13.47:12954 2019-06-28 02:41:19.100 UTC [116182] ERROR:  Illegal state: Attempted to restart when transaction does not require restart
ts1|pid115901|:19021|http://127.39.13.47:12954 2019-06-28 02:41:19.100 UTC [116182] STATEMENT:  UPDATE t SET v = v + 1 WHERE k = 3

@mbautin
Copy link
Contributor

mbautin commented Jun 28, 2019

Got a failure in my Java test: https://gist.github.com/mbautin/d420bf2190381b0772da1451a6ab24f8/raw

This is with this patch (with some extra debugging): https://gist.githubusercontent.com/mbautin/da02a70a71ccde1a185a5d9451536d48/raw

@mbautin
Copy link
Contributor

mbautin commented Jun 29, 2019

Verified that this issue is related to read restarts.
With this patch: https://gist.githubusercontent.com/mbautin/31798d432610ab91158e5951107b1d47/raw that disables automatic read restarts in the YSQL codepath
I get 100 successes in the new test: https://gist.githubusercontent.com/mbautin/8a81e8a9009c28145a2c565a0a5a5d19/raw

yugabyte-ci pushed a commit that referenced this issue Jul 2, 2019
… and lost writes

Summary:
Fixing two bugs:
- Only the first operation in a PostgreSQL transaction, no matter if it is a read or a write,
  is allowed to do an automatic server-side read restart. Previously we were always allowing read
  operations to do a read restart.
- We should not set txn_in_progress_ to false in PgTxnManager::RestartTransaction (which we were
  doing by calling ResetTxnAndSession). This would sometimes cause the PgDocOp::CheckRestartUnlocked
  to PgDocWriteOp::SendRequestUnlocked to PgSession::PgApplyAsync to PgSession::GetSessionForOp
  to PgSession::GetSession to PgTxnManager::BeginWriteTransactionIfNecessary call chain to start
  a new transaction, abandoning the newly restarted transaction to which a write operation was
  applied, and as a result not writing the values that the higher layer expected to be written.

Also we are now reusing the same YBSession for a restarted transaction rather than creating a new one.

For reference, the original commit that introduced the read restart logic in YSQL is
046b4b6

Test Plan:
Jenkins (including a new test)

Run the new test in a loop:
./yb_build.sh release --java-test TestPgTransactions#testTableWithoutPrimaryKey --tp 1 -n 100

Also verify that the test time for each invocation of the above test is about the same as before this fix (30 seconds on my dev server).

Run the old test in a loop from 046b4b6 in a loop:
./yb_build.sh --cxx-test pg_libpq-test --gtest_filter PgLibPqTest.ReadRestart -n 100

Perf test from the same commit (RF=1 local cluster):
java -jar yb-sample-apps.jar --workload SqlSnapshotTxns --nodes 127.0.0.1:5433

Results without this commit (upstream commit 7c77cc1): https://gist.githubusercontent.com/mbautin/3a87b759a6091aea486169c87c164059/raw
Results with this commit (on top of 7c77cc1): https://gist.githubusercontent.com/mbautin/46379cc7040f207ba0d89efe0d636bee/raw

Reviewers: neil, mihnea, sergei, george

Reviewed By: sergei, george

Subscribers: yql

Differential Revision: https://phabricator.dev.yugabyte.com/D6848
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area/ysql Yugabyte SQL (YSQL) kind/bug This issue is a bug
Projects
None yet
Development

No branches or pull requests

2 participants