Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

CassandraTransactionalKeyValue app pure write mode hit "Commit of expired transaction" with 32 writers #196

Closed
robertpang opened this issue Apr 19, 2018 · 1 comment
Assignees
Labels
kind/bug This issue is a bug

Comments

@robertpang
Copy link
Contributor

Ran

/usr/bin/java -jar /home/centos/yb-sample-apps.jar --num_unique_keys 1000000 --num_threads_read 0 --num_threads_write 32 --workload CassandraTransactionalKeyValue --value_size 16 --nodes 10.151.0.10:9042,10.151.0.11:9042,10.151.0.9:9042 --use_ascii_values --create_table_name PerfTest_0

and it immediately hit:

2018-04-17 21:13:30,753 test_base.py:468 INFO 2018-04-17 21:13:30,713 [INFO|com.yugabyte.sample.common.CmdLineOpts|CmdLineOpts] Num reader threads: 0, num writer threads: 32
...
2018-04-17 21:13:31,590 test_base.py:468 INFO 2018-04-17 21:13:31,549 [INFO|com.yugabyte.sample.apps.AppBase|AppBase] Connecting to nodes: /10.151.0.10:9042,/10.151.0.11:9042,/10.151.0.9:9042
2018-04-17 21:13:34,077 test_base.py:468 INFO 2018-04-17 21:13:34,036 [INFO|com.yugabyte.sample.apps.AppBase|AppBase] Created a Cassandra table using query: [CREATE TABLE IF NOT EXISTS PerfTest_0 (k varchar, v blob, primary key (k)) WITH transactions = { 'enabled' : true };]
2018-04-17 21:13:39,094 test_base.py:468 INFO 2018-04-17 21:13:39,053 [INFO|com.yugabyte.sample.common.metrics.MetricsTracker|MetricsTracker] Read: 0.00 ops/sec (0.00 ms/op), 0 total ops  |  Write: 460.24 ops/sec (59.42 ms/op), 2308 total ops  |  Uptime: 5016 ms | maxWrittenKey: 1582 | maxGeneratedKey: 2339 |
2018-04-17 21:13:43,194 test_base.py:468 INFO 2018-04-17 21:13:43,153 [WARN|com.datastax.driver.core.RequestHandler|RequestHandler$SpeculativeExecution] /10.151.0.10:9042 replied with server error (Operation expired (yb/common/wire_protocol.cc:188): Failed UpdateTransaction: tablet_id: "0ec22d447eb14a9cbeddb4758d251546" state { transaction_id: "ErCT\210YB\227\214\005\010q):\246M" status: COMMITTED tablets: "e5f66c717b224c5ea187654b64a3e723" tablets: "2a72a33d15b74ccbba1f49661bd181ab" } propagated_hybrid_time: 6242302448313036801, retrier: { task_id: -1 state: kIdle deadline: 617.918s } to tablet 0ec22d447eb14a9cbeddb4758d251546 on tablet server 5f9c811d179944d3a042d5f3af22885a (10.151.0.10:9100) after 61 attempt(s): Commit of expired transaction), defuncting connection.
2018-04-17 21:13:43,246 test_base.py:468 INFO 2018-04-17 21:13:43,157 [INFO|com.yugabyte.sample.apps.AppBase|AppBase] Caught Exception:
2018-04-17 21:13:43,246 test_base.py:468 INFO com.datastax.driver.core.exceptions.TransportException: [/10.151.0.10:9042] Connection has been closed
2018-04-17 21:13:43,247 test_base.py:468 INFO 	at com.datastax.driver.core.exceptions.TransportException.copy(TransportException.java:38)
2018-04-17 21:13:43,247 test_base.py:468 INFO 	at com.datastax.driver.core.exceptions.TransportException.copy(TransportException.java:24)
2018-04-17 21:13:43,247 test_base.py:468 INFO 	at com.datastax.driver.core.DriverThrowables.propagateCause(DriverThrowables.java:37)
2018-04-17 21:13:43,247 test_base.py:468 INFO 	at com.datastax.driver.core.DefaultResultSetFuture.getUninterruptibly(DefaultResultSetFuture.java:264)
2018-04-17 21:13:43,247 test_base.py:468 INFO 	at com.datastax.driver.core.AbstractSession.execute(AbstractSession.java:68)
2018-04-17 21:13:43,247 test_base.py:468 INFO 	at com.yugabyte.sample.apps.CassandraTransactionalKeyValue.doWrite(CassandraTransactionalKeyValue.java:223)
2018-04-17 21:13:43,247 test_base.py:468 INFO 	at com.yugabyte.sample.apps.AppBase.performWrite(AppBase.java:523)
2018-04-17 21:13:43,247 test_base.py:468 INFO 	at com.yugabyte.sample.common.IOPSThread.run(IOPSThread.java:87)
2018-04-17 21:13:43,248 test_base.py:468 INFO Caused by: com.datastax.driver.core.exceptions.TransportException: [/10.151.0.10:9042] Connection has been closed
2018-04-17 21:13:43,248 test_base.py:468 INFO 	at com.datastax.driver.core.Connection$ConnectionCloseFuture.force(Connection.java:1241)
2018-04-17 21:13:43,248 test_base.py:468 INFO 	at com.datastax.driver.core.Connection$ConnectionCloseFuture.force(Connection.java:1226)
2018-04-17 21:13:43,248 test_base.py:468 INFO 	at com.datastax.driver.core.Connection.defunct(Connection.java:464)
2018-04-17 21:13:43,248 test_base.py:468 INFO 	at com.datastax.driver.core.RequestHandler$SpeculativeExecution.onSet(RequestHandler.java:579)
2018-04-17 21:13:43,248 test_base.py:468 INFO 	at com.datastax.driver.core.Connection$Dispatcher.channelRead0(Connection.java:1101)
2018-04-17 21:13:43,248 test_base.py:468 INFO 	at com.datastax.driver.core.Connection$Dispatcher.channelRead0(Connection.java:1024)
2018-04-17 21:13:43,248 test_base.py:468 INFO 	at io.netty.channel.SimpleChannelInboundHandler.channelRead(SimpleChannelInboundHandler.java:105)
2018-04-17 21:13:43,249 test_base.py:468 INFO 	at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:318)
2018-04-17 21:13:43,249 test_base.py:468 INFO 	at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:304)
2018-04-17 21:13:43,249 test_base.py:468 INFO 	at io.netty.handler.timeout.IdleStateHandler.channelRead(IdleStateHandler.java:266)
2018-04-17 21:13:43,249 test_base.py:468 INFO 	at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:318)
2018-04-17 21:13:43,249 test_base.py:468 INFO 	at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:304)
2018-04-17 21:13:43,249 test_base.py:468 INFO 	at io.netty.handler.codec.MessageToMessageDecoder.channelRead(MessageToMessageDecoder.java:103)
2018-04-17 21:13:43,249 test_base.py:468 INFO 	at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:318)
2018-04-17 21:13:43,250 test_base.py:468 INFO 	at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:304)
2018-04-17 21:13:43,250 test_base.py:468 INFO 	at io.netty.handler.codec.ByteToMessageDecoder.fireChannelRead(ByteToMessageDecoder.java:276)
2018-04-17 21:13:43,250 test_base.py:468 INFO 	at io.netty.handler.codec.ByteToMessageDecoder.channelRead(ByteToMessageDecoder.java:263)
2018-04-17 21:13:43,250 test_base.py:468 INFO 	at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:318)
2018-04-17 21:13:43,250 test_base.py:468 INFO 	at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:304)
2018-04-17 21:13:43,250 test_base.py:468 INFO 	at io.netty.channel.DefaultChannelPipeline.fireChannelRead(DefaultChannelPipeline.java:846)
2018-04-17 21:13:43,250 test_base.py:468 INFO 	at io.netty.channel.epoll.AbstractEpollStreamChannel$EpollStreamUnsafe.epollInReady(AbstractEpollStreamChannel.java:823)
2018-04-17 21:13:43,250 test_base.py:468 INFO 	at io.netty.channel.epoll.EpollEventLoop.processReady(EpollEventLoop.java:339)
2018-04-17 21:13:43,251 test_base.py:468 INFO 	at io.netty.channel.epoll.EpollEventLoop.run(EpollEventLoop.java:255)
2018-04-17 21:13:43,251 test_base.py:468 INFO 	at io.netty.util.concurrent.SingleThreadEventExecutor$2.run(SingleThreadEventExecutor.java:112)
2018-04-17 21:13:43,251 test_base.py:468 INFO 	at io.netty.util.concurrent.DefaultThreadFactory$DefaultRunnableDecorator.run(DefaultThreadFactory.java:137)
2018-04-17 21:13:43,251 test_base.py:468 INFO 	at java.lang.Thread.run(Thread.java:748)

No such error is hit when the workload is run with 8 writers.

@robertpang robertpang changed the title CassandraTransactionalKeyValue app pure write mode hit "Failed UpdateTransaction" with 32 writers CassandraTransactionalKeyValue app pure write mode hit "Commit of expired transaction" with 32 writers Apr 19, 2018
@rkarthik007 rkarthik007 added the kind/bug This issue is a bug label Apr 19, 2018
yugabyte-ci pushed a commit that referenced this issue Apr 20, 2018
…ibuted transaction in high-load situation

Summary:
A pending distributed transaction will be expired by the transaction coordinator if the transaction has not heartbeated to the coordinator for the timeout specified by the "transaction_timeout_usec" flag (default value = 1.5 sec). The heartbeat frequency is specified by the "transaction_heartbeat_usec" flag (default value = 0.5 sec).

This revision changes the transaction timeout to be specified as multiples of heartbeat frequency using a new flag "transaction_max_missed_heartbeat_periods" with default value = 6, thereby doubling the default timeout.

Test Plan: Run CassandraTransactionalKeyValue pure-write workload with 32 writers against a GCP 3-node, n1-standard-16 cluster.

Reviewers: bharat, mikhail, sergei

Reviewed By: mikhail, sergei

Subscribers: ybase

Differential Revision: https://phabricator.dev.yugabyte.com/D4649
@robertpang
Copy link
Contributor Author

Fixed in commit a9fcc08.

jasonyb pushed a commit that referenced this issue Mar 18, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
kind/bug This issue is a bug
Projects
None yet
Development

No branches or pull requests

2 participants