CHECK(leader_state_cache_.is_lock_free()) fails in gcc 7 #1189

Closed
rajukumaryb opened this issue Apr 10, 2019 · 11 comments
Labels
area/docdb YugabyteDB core features

Comments

@rajukumaryb
Contributor

rajukumaryb commented Apr 10, 2019

Original description of the issue as filed by @rajukumaryb:
Master process does not start on Ubuntu 18.04 following a "destroy + create" sequence

@rajukumaryb
Contributor Author

Output of yb-master.raju-ubuntu1804.raju.log.INFO.20190402-234308.21926

Log file created at: 2019/04/02 23:43:08
Running on machine: raju-ubuntu1804
Application fingerprint: version 1.2.4.0 build PRE_RELEASE revision 2af8b68b360b90e8f8fa0b3b8db4d83f7c87d0e5 build_type DEBUG built at 02 Apr 2019 23:29:42 UTC
Running duration (h:mm:ss): 0:00:00
Log line format: [IWEF]mmdd hh:mm:ss.uuuuuu threadid file:line] msg
I0402 23:43:08.656505 21926 master_main.cc:93] NumCPUs determined to be: 8
I0402 23:43:08.658072 21926 mem_tracker.cc:240] MemTracker: hard memory limit is 0.778885 GB
I0402 23:43:08.658131 21926 mem_tracker.cc:242] MemTracker: soft memory limit is 0.662052 GB
I0402 23:43:08.668902 21926 master_main.cc:108] Initializing master server...
I0402 23:43:08.669593 21926 server_base.cc:411] Could not load existing FS layout: Not found (yb/util/env_posix.cc:1013): /home/raju/yugabyte-data/node-1/disk-1/yb-data/master/instance: No such file or directory (error 2)
I0402 23:43:08.669767 21926 server_base.cc:412] Creating new FS layout
I0402 23:43:08.679858 21926 fs_manager.cc:388] Generated new instance metadata in path /home/raju/yugabyte-data/node-1/disk-1/yb-data/master/instance:
uuid: "f67f97ddb95f43868d1a64d02b036bd5"
format_stamp: "Formatted at 2019-04-02 23:43:08 on raju-ubuntu1804"
I0402 23:43:08.687199 21926 fs_manager.cc:388] Generated new instance metadata in path /home/raju/yugabyte-data/node-1/disk-2/yb-data/master/instance:
uuid: "f67f97ddb95f43868d1a64d02b036bd5"
format_stamp: "Formatted at 2019-04-02 23:43:08 on raju-ubuntu1804"
I0402 23:43:08.692404 21926 fs_manager.cc:232] Opened local filesystem: /home/raju/yugabyte-data/node-1/disk-1,/home/raju/yugabyte-data/node-1/disk-2
uuid: "f67f97ddb95f43868d1a64d02b036bd5"
format_stamp: "Formatted at 2019-04-02 23:43:08 on raju-ubuntu1804"
I0402 23:43:08.694295 21926 server_base.cc:198] Auto setting FLAGS_num_reactor_threads to 8
I0402 23:43:08.707659 21926 master_main.cc:111] Starting Master server...
I0402 23:43:08.723924 21926 webserver.cc:147] Starting webserver on 127.0.0.1:7000
I0402 23:43:08.724134 21926 webserver.cc:152] Document root: /home/raju/code/yugabyte/www
I0402 23:43:08.725191 21926 webserver.cc:239] Webserver started. Bound to: http://127.0.0.1:7000/
I0402 23:43:08.729696 21926 rpc_server.cc:167] RPC server started. Bound to: 127.0.0.1:7100
I0402 23:43:08.729861 21926 server_base.cc:450] Using private ip address 127.0.0.1
I0402 23:43:08.731357 22007 sys_catalog.cc:261] Creating new SysCatalogTable data
I0402 23:43:08.746498 22007 sys_catalog.cc:311] Determining permanent_uuid for [127.0.0.1:7100]
I0402 23:43:08.758729 22008 server_base.cc:450] Using private ip address 127.0.0.1
I0402 23:43:08.761692 22007 sys_catalog.cc:326] Setting up raft configuration: opid_index: -1 peers { permanent_uuid: "f67f97ddb95f43868d1a64d02b036bd5" member_type: VOTER last_known_private_addr { host: "127.0.0.1" port: 7100 } cloud_info { placement_cloud: "cloud1" placement_region: "datacenter1" placement_zone: "rack1" } }
I0402 23:43:08.762526 22007 consensus_meta.cc:248] T 00000000000000000000000000000000 P f67f97ddb95f43868d1a64d02b036bd5: Updating active role from UNKNOWN_ROLE to FOLLOWER. Consensus state: current_term: 0 leader_uuid: "" config { opid_index: -1 peers { permanent_uuid: "f67f97ddb95f43868d1a64d02b036bd5" member_type: VOTER last_known_private_addr { host: "127.0.0.1" port: 7100 } cloud_info { placement_cloud: "cloud1" placement_region: "datacenter1" placement_zone: "rack1" } } }, has_pending_config = 0
I0402 23:43:08.772284 22007 server_base.cc:450] Using private ip address 127.0.0.1
I0402 23:43:08.772966 22007 tablet_peer.cc:963] T 00000000000000000000000000000000 P f67f97ddb95f43868d1a64d02b036bd5 [state=BOOTSTRAPPING]: Changed state from NOT_STARTED to BOOTSTRAPPING
I0402 23:43:08.773701 22007 consensus_meta.cc:248] T 00000000000000000000000000000000 P f67f97ddb95f43868d1a64d02b036bd5: Updating active role from UNKNOWN_ROLE to FOLLOWER. Consensus state: current_term: 0 leader_uuid: "" config { opid_index: -1 peers { permanent_uuid: "f67f97ddb95f43868d1a64d02b036bd5" member_type: VOTER last_known_private_addr { host: "127.0.0.1" port: 7100 } cloud_info { placement_cloud: "cloud1" placement_region: "datacenter1" placement_zone: "rack1" } } }, has_pending_config = 0
I0402 23:43:08.774181 22007 tablet_bootstrap_if.cc:71] T 00000000000000000000000000000000 P f67f97ddb95f43868d1a64d02b036bd5: Bootstrap starting.
I0402 23:43:08.778533 22007 docdb_rocksdb_util.cc:390] Auto setting FLAGS_rocksdb_max_background_flushes to 2
I0402 23:43:08.778910 22007 docdb_rocksdb_util.cc:409] Auto setting FLAGS_rocksdb_max_background_compactions to 2
I0402 23:43:08.778960 22007 docdb_rocksdb_util.cc:416] Auto setting FLAGS_rocksdb_base_background_compactions to 2
I0402 23:43:08.780329 22007 tablet.cc:421] T 00000000000000000000000000000000 P f67f97ddb95f43868d1a64d02b036bd5: Creating RocksDB database in dir /home/raju/yugabyte-data/node-1/disk-1/yb-data/master/data/rocksdb/table-sys.catalog.uuid/tablet-00000000000000000000000000000000
I0402 23:43:08.808970 22007 tablet.cc:507] Opening RocksDB at: /home/raju/yugabyte-data/node-1/disk-1/yb-data/master/data/rocksdb/table-sys.catalog.uuid/tablet-00000000000000000000000000000000
I0402 23:43:08.815714 22007 db_impl.cc:478] T 00000000000000000000000000000000: Creating manifest 1
I0402 23:43:08.830760 22007 version_set.cc:2802] T 00000000000000000000000000000000: Recovered from manifest file:/home/raju/yugabyte-data/node-1/disk-1/yb-data/master/data/rocksdb/table-sys.catalog.uuid/tablet-00000000000000000000000000000000/MANIFEST-000001 succeeded,manifest_file_number is 1, next_file_number is 3, last_sequence is 1125899906842624, log_number is 0,prev_log_number is 0,max_column_family is 0, flushed_values is
I0402 23:43:08.831295 22007 version_set.cc:2810] T 00000000000000000000000000000000: Column family [default] (ID 0), log number is 0
I0402 23:43:08.844755 22007 db_impl.cc:5820] T 00000000000000000000000000000000: DB pointer 0x55d62c404000
I0402 23:43:08.845494 22007 tablet.cc:554] T 00000000000000000000000000000000 P f67f97ddb95f43868d1a64d02b036bd5: Successfully opened a RocksDB database at /home/raju/yugabyte-data/node-1/disk-1/yb-data/master/data/rocksdb/table-sys.catalog.uuid/tablet-00000000000000000000000000000000, obj: 0x55d62c404000
I0402 23:43:08.845839 22007 tablet_bootstrap.cc:400] T 00000000000000000000000000000000 P f67f97ddb95f43868d1a64d02b036bd5: Time spent opening tablet: real 0.068s user 0.026s sys 0.000s
I0402 23:43:08.846583 22007 tablet_bootstrap.cc:339] T 00000000000000000000000000000000 P f67f97ddb95f43868d1a64d02b036bd5: No blocks or log segments found. Creating new log.
I0402 23:43:08.848775 22007 log.cc:396] durable_wal_write is turned on.
I0402 23:43:08.858319 22007 tablet_bootstrap_if.cc:71] T 00000000000000000000000000000000 P f67f97ddb95f43868d1a64d02b036bd5: No bootstrap required, opened a new log
I0402 23:43:08.869271 22007 consensus_meta.cc:248] T 00000000000000000000000000000000 P f67f97ddb95f43868d1a64d02b036bd5: Updating active role from UNKNOWN_ROLE to FOLLOWER. Consensus state: current_term: 0 leader_uuid: "" config { opid_index: -1 peers { permanent_uuid: "f67f97ddb95f43868d1a64d02b036bd5" member_type: VOTER last_known_private_addr { host: "127.0.0.1" port: 7100 } cloud_info { placement_cloud: "cloud1" placement_region: "datacenter1" placement_zone: "rack1" } } }, has_pending_config = 0
F0402 23:43:08.874306 22007 replica_state.cc:111] Check failed: leader_state_cache_.is_lock_free()

@rajukumaryb
Contributor Author

cc - @bmatican

@bmatican
Contributor

@spolitov does CHECK(leader_state_cache_.is_lock_free()); have to be a check?

https://en.cppreference.com/w/cpp/atomic/atomic/is_lock_free

All atomic types except for std::atomic_flag may be implemented using mutexes or other locking operations, rather than using the lock-free atomic CPU instructions. Atomic types are also allowed to be sometimes lock-free.
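
To make the quoted distinction concrete, here is a minimal sketch (not taken from the YugabyteDB code) that compares the compile-time guarantee `is_always_lock_free` with the per-object runtime query `is_lock_free()`. It assumes a C++17 compiler; `SixteenBytes`, `AlwaysLockFree`, and `SmallAtomicIsLockFree` are hypothetical names used only for illustration.

```cpp
#include <atomic>
#include <cstdint>

// Hypothetical 16-byte payload, standing in for a structure such as the
// cached leader state discussed in this issue.
struct SixteenBytes {
  std::uint64_t lo;
  std::uint64_t hi;
};
static_assert(sizeof(SixteenBytes) == 16, "sketch assumes a 16-byte payload");

// Compile-time view: true only if every std::atomic<T> object is lock-free
// on the target. For SixteenBytes the answer depends on compiler and flags.
template <typename T>
constexpr bool AlwaysLockFree() {
  return std::atomic<T>::is_always_lock_free;
}

// Runtime view: ask one concrete object. Shown for a small type only, so the
// sketch does not need to link against libatomic.
bool SmallAtomicIsLockFree() {
  std::atomic<int> value{0};
  return value.is_lock_free();
}
```

A runtime CHECK on `is_lock_free()` can therefore fail for a type whose `AlwaysLockFree<T>()` is false, even though the same code passed with an older compiler.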

@mbautin
Contributor

mbautin commented Jun 19, 2019

Moving comments from #1246 here:

From @d-uspenskiy :

It looks like g++-7 generates non-lock-free code for 128-bit atomic structures:

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=80878

As a result, CHECK(leader_state_cache_.is_lock_free()); fails at runtime:

https://github.com/YugaByte/yugabyte-db/blob/master/src/yb/consensus/replica_state.cc#L111


A description of the relevant changes in g++-7:

https://gcc.gnu.org/ml/gcc-patches/2017-01/msg02344.html

@mbautin
Contributor

mbautin commented Jun 19, 2019

mbautin changed the title from 'Master process does not start on Ubuntu 18.04 following a "destroy + create" sequence' to 'CHECK(leader_state_cache_.is_lock_free()) fails in gcc 7' on Jun 19, 2019
@spolitov
Contributor

Could we just use boost::atomic for 16-byte structures?

@mbautin
Contributor

mbautin commented Jun 20, 2019

From an offline discussion with @spolitov:

  • boost::atomic also has an assertion to check if it is lock-free.
  • We need to revisit all atomics in the codebase and add a check that they are lock-free.
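
One possible shape for such a check, sketched under the assumption of a C++17 compiler: an alias that refuses to compile when the wrapped type is not guaranteed lock-free, so the failure shows up at build time instead of at process start. `RequireLockFree`, `LockFreeAtomic`, and `BumpAndRead` are hypothetical names, not existing YugabyteDB or Boost APIs.

```cpp
#include <atomic>
#include <cstdint>
#include <type_traits>

// Hypothetical guard: instantiating it for a type that may fall back to
// locks is a compile error rather than a runtime CHECK failure.
template <typename T>
struct RequireLockFree {
  static_assert(std::is_trivially_copyable<T>::value,
                "std::atomic<T> requires a trivially copyable T");
  static_assert(std::atomic<T>::is_always_lock_free,
                "std::atomic<T> is not guaranteed lock-free on this target");
  using type = std::atomic<T>;
};

// Drop-in spelling for call sites: LockFreeAtomic<T> instead of std::atomic<T>.
template <typename T>
using LockFreeAtomic = typename RequireLockFree<T>::type;

// Small usage example with a type that is lock-free on mainstream targets.
std::uint32_t BumpAndRead(LockFreeAtomic<std::uint32_t>& counter) {
  return counter.fetch_add(1) + 1;
}

// A 16-byte struct wrapped in LockFreeAtomic<> would be expected to trip the
// second static_assert with gcc 7 on x86-64, per the discussion above.
```

The trade-off is that a compile-time guard rejects types that are only sometimes lock-free, which is stricter than the existing runtime check.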

@spolitov
Contributor

The check should be added for 16-byte atomics, so to find the candidates we could use the following command:
git grep 'std::atomic<' | grep -v -E 'std::atomic<(bool|int|.?int\d\d?_t|size_t|MonoTime|\w+\*|MonoDelta|double)>' | less

It lists all suspicious usages of std::atomic that should be checked.
Most of them are enums and other small types and can be skipped.
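
For readers who want to see what the filter keeps and drops, here is a rough C++ equivalent of the command above using `std::regex` (ECMAScript syntax, which understands `\d` and `\w`). The function name `IsSuspiciousAtomicUse` is hypothetical; the allow-list is copied from the command. As far as I know, GNU `grep -E` does not treat `\d` as a digit class, which may be why the later variant in this thread uses `-P`.

```cpp
#include <regex>
#include <string>

// Returns true when `line` mentions std::atomic<...> with a type that is NOT
// on the allow-list of small types, i.e. a line the grep pipeline would print.
bool IsSuspiciousAtomicUse(const std::string& line) {
  // Same alternatives as in the grep command; raw string avoids double escapes.
  static const std::regex kAllowed(
      R"(std::atomic<(bool|int|.?int\d\d?_t|size_t|MonoTime|\w+\*|MonoDelta|double)>)");
  // First stage of the pipeline: git grep 'std::atomic<'.
  if (line.find("std::atomic<") == std::string::npos) {
    return false;
  }
  // Second stage: grep -v drops every line matching the allow-list.
  return !std::regex_search(line, kAllowed);
}
```

For example, a line declaring `std::atomic<uint64_t>` or `std::atomic<Tablet*>` is filtered out, while one declaring `std::atomic<LeaderStateCache>` is reported for manual review.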

mbautin assigned d-uspenskiy and unassigned mbautin on Jun 21, 2019
@mbautin
Contributor

mbautin commented Jun 21, 2019

BTW here is the patch that @rajukumaryb has been using locally: https://gist.githubusercontent.com/mbautin/bbb02cbf60df635b8a92a548789d54ed/raw

@mbautin
Contributor

mbautin commented Jun 21, 2019

@d-uspenskiy: we should be able to upgrade to gcc 7.x (the default version on Ubuntu 18.04) when we fix the lock-free/atomic-related issues mentioned above.

@d-uspenskiy
Contributor

d-uspenskiy commented Jun 21, 2019

git grep 'std::atomic<' | grep -v -E 'std::atomic<(bool|int|.?int\d\d?_t|size_t|MonoTime|\w+\*|MonoDelta|double)>' | less

Slightly updated regex:
git grep 'std::atomic<' | grep -v -P 'std::atomic<(bool|int|char|.?int\d\d?(_t)?|size_t|MonoTime|(const\s?)?(\w+::)?\w+\*|[\w<]+>+\*|MonoDelta|double|ThreadStatus::\w+|T|(\w+::)?ScheduledTaskId|TransactionState|TransactionTableStatus|RejectMode|PackedRoleAndTerm|LockState|MicrosTime|MonitoredTaskState|SequenceNumber|FlushState|TransactionID|ExecutionStatus||State|ResolveState|ReactorState|RpcRetrierState|TestTaskState|enum\s+\w+|(std::)?ptrdiff_t|ThreadId|RedisClientMode)>'

yugabyte-ci pushed a commit that referenced this issue Jun 25, 2019
Summary:
For `std::atomic`, `gcc-7` generates non-lock-free code for 16-byte structures even when the `-mcx16` flag is
used for compilation.
The solution is to replace `std::atomic` with `boost::atomic` for all 16-byte structures used in YB code.

Also, a false-positive `array-bounds` error (due to aggressive `-O3` optimization in the `release` build) is
fixed in `slice_transform_test.cc`:

```
In file included from src/yb/rocksdb/slice_transform.h:34:0,
                 from src/yb/rocksdb/util/slice_transform_test.cc:24:
src/yb/util/slice.h: In member function ‘virtual void rocksdb::SliceTransformTest_CapPrefixTransform_Test::TestBody()’:
src/yb/util/slice.h:64:57: error: array subscript is below array bounds [-Werror=array-bounds]
   Slice(const uint8_t* d, size_t n) : begin_(d), end_(d + n) {}
```

Using `Slice()` explicitly instead of the implicit `Slice(const char* s)` constructor for `""` (empty string) solves the issue.
The problem is specific to this one place; creating a `Slice` from `""` does not trigger the error elsewhere.
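
A simplified sketch of the two spellings, for illustration only. `SliceSketch` is a hypothetical stand-in: the two-pointer layout and the `(const uint8_t*, size_t)` constructor follow the compiler output above, while the default constructor shown here is an assumption about how an empty slice could be built.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>

// Hypothetical, stripped-down stand-in for the real Slice class.
class SliceSketch {
 public:
  // Assumed default constructor: an empty range over a static empty string.
  SliceSketch()
      : SliceSketch(reinterpret_cast<const std::uint8_t*>(""), 0) {}

  // Constructor named in the compiler output: end_ is computed as d + n.
  SliceSketch(const std::uint8_t* d, std::size_t n) : begin_(d), end_(d + n) {}

  // Implicit conversion from a C string, as used by Slice("") call sites.
  SliceSketch(const char* s)  // NOLINT: implicit on purpose in this sketch
      : SliceSketch(reinterpret_cast<const std::uint8_t*>(s), std::strlen(s)) {}

  std::size_t size() const { return static_cast<std::size_t>(end_ - begin_); }
  bool empty() const { return begin_ == end_; }

 private:
  const std::uint8_t* begin_;
  const std::uint8_t* end_;
};

// Both helpers produce an empty slice; only the second goes through strlen
// and pointer arithmetic on the literal, the path the -O3 warning pointed at.
SliceSketch EmptyExplicit() { return SliceSketch(); }
SliceSketch EmptyFromLiteral() { return SliceSketch(""); }
```

Both forms yield a slice of size zero, which is consistent with the warning being a false positive rather than a real out-of-bounds access.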

Test Plan: build and run a YB cluster in `debug` and `release` modes using gcc-7 (`ubuntu` 16+ has gcc-7 as the default compiler)

Reviewers: raju, mikhail, sergei

Reviewed By: sergei

Subscribers: ybase, bogdan

Differential Revision: https://phabricator.dev.yugabyte.com/D6800
yugabyte-ci pushed a commit that referenced this issue Jun 26, 2019
Summary:
D6777 and D6800 were made in parallel, so one 16-byte std::atomic was left in LockFreeStack.
Changed it to boost::atomic.

Test Plan: Jenkins

Reviewers: dmitry, bogdan

Reviewed By: bogdan

Subscribers: ybase

Differential Revision: https://phabricator.dev.yugabyte.com/D6811
d-uspenskiy added a commit that referenced this issue Oct 10, 2019
…cture)

Summary:
Due to the gcc 7 issue with lock-free structures described in #1189, the following check fails:

https://github.com/yugabyte/yugabyte-db/blob/master/src/yb/tablet/operations/operation_driver.cc#L93

Solution: use boost::atomic instead of std::atomic.

Test Plan:
Manual: reinitialize the db on a CentOS build.
```
./yb_build.sh reinitdb
```

Reviewers: sergei

Reviewed By: sergei

Subscribers: ybase

Differential Revision: https://phabricator.dev.yugabyte.com/D7362