Upgrade fails from 0.14.6 to the latest version #160

Closed
Matsue opened this issue Mar 28, 2014 · 3 comments
Matsue commented Mar 28, 2014

I tried to upgrade LeoFS from 0.14.6 to the latest version. After the upgrade, I cannot execute the suspend and resume commands for storage nodes; they respond with "[ERROR] Node not exist".

Settings overview

  • LeoFS Version
    • From: 0.14.6 (installed by leofs-0.14.6-1.x86_64.rpm)
    • To: latest (compiled develop branch)
  • OS: CentOS release 6.5 (Final)
  • Erlang: R15B03 (erts-5.9.3.1)
  • Cluster setting on 0.14.6
  [System config]
                system version : 0.14.6
                total replicas : 2
           # of successes of R : 1
           # of successes of W : 1
           # of successes of D : 1
 # of DC-awareness replicas    : 0
 # of Rack-awareness replicas  : 0
                     ring size : 2^128
              ring hash (cur)  : 0e6faed4
              ring hash (prev) : 0e6faed4

[Node(s) state]
-------------------------------------------------------------------------------------------------------
 type node                          state       ring (cur)    ring (prev)   when
-------------------------------------------------------------------------------------------------------
 S    storage_0@192.168.101.123     running     0e6faed4      0e6faed4      2014-03-28 19:38:49 +0900
 S    storage_0@192.168.101.124     running     0e6faed4      0e6faed4      2014-03-28 19:38:49 +0900
 S    storage_0@192.168.101.125     running     0e6faed4      0e6faed4      2014-03-28 19:38:48 +0900
 S    storage_0@192.168.101.126     running     0e6faed4      0e6faed4      2014-03-28 19:38:48 +0900
 G    gateway_0@192.168.101.122     running     0e6faed4      0e6faed4      2014-03-28 19:38:49 +0900

Operation logs

I followed this document.
http://www.leofs.org/docs/admin_guide.html#upgrade-leofs-v0-14-9-v0-16-0-v0-16-5-to-v0-16-8-or-v1-0-0-pre3

On manager node

# /usr/local/leofs/current/leo_manager_1/bin/leo_manager stop
ok
# /usr/local/leofs/current/leo_manager_0/bin/leo_manager stop
ok
# ln -sTf /usr/local/leofs/1.0.0-pre4 /usr/local/leofs/current
# ls -l /usr/local/leofs/current
lrwxrwxrwx 1 root root 27 Mar 28 21:52 2014 /usr/local/leofs/current -> /usr/local/leofs/1.0.0-pre4

# cp -aT /usr/local/leofs/{0.14.6,1.0.0-pre4}/leo_manager_0/work
# find /usr/local/leofs/1.0.0-pre4/leo_manager_0/work/ -maxdepth 2
/usr/local/leofs/1.0.0-pre4/leo_manager_0/work/
/usr/local/leofs/1.0.0-pre4/leo_manager_0/work/mnesia
/usr/local/leofs/1.0.0-pre4/leo_manager_0/work/mnesia/127.0.0.1
/usr/local/leofs/1.0.0-pre4/leo_manager_0/work/queue
/usr/local/leofs/1.0.0-pre4/leo_manager_0/work/queue/membership

# cp -aT /usr/local/leofs/{0.14.6,1.0.0-pre4}/leo_manager_1/work
# find /usr/local/leofs/1.0.0-pre4/leo_manager_1/work/ -maxdepth 2
/usr/local/leofs/1.0.0-pre4/leo_manager_1/work/
/usr/local/leofs/1.0.0-pre4/leo_manager_1/work/mnesia
/usr/local/leofs/1.0.0-pre4/leo_manager_1/work/mnesia/127.0.0.1
/usr/local/leofs/1.0.0-pre4/leo_manager_1/work/queue
/usr/local/leofs/1.0.0-pre4/leo_manager_1/work/queue/membership

# /usr/local/leofs/current/leo_manager_0/bin/leo_manager start
# /usr/local/leofs/current/leo_manager_1/bin/leo_manager start

# telnet localhost 10010

status
[System config]
                System version : 1.0.0
                    Cluster Id : leofs_1
                         DC Id : dc_1
                Total replicas : 1
           # of successes of R : 1
           # of successes of W : 1
           # of successes of D : 1
 # of DC-awareness replicas    : 0
 # of Rack-awareness replicas  : 0
                     ring size : 2^128
             Current ring hash : c4ba139d
                Prev ring hash : c4ba139d

[Node(s) state]
-------+--------------------------------+--------------+----------------+----------------+----------------------------
 type  |              node              |    state     |  current ring  |   prev ring    |          updated at
-------+--------------------------------+--------------+----------------+----------------+----------------------------
  S    | storage_0@192.168.101.123      | running      | 0e6faed4       | 0e6faed4       | 2014-03-28 19:38:49 +0900
  S    | storage_0@192.168.101.124      | running      | 0e6faed4       | 0e6faed4       | 2014-03-28 19:38:49 +0900
  S    | storage_0@192.168.101.125      | running      | 0e6faed4       | 0e6faed4       | 2014-03-28 19:38:48 +0900
  S    | storage_0@192.168.101.126      | running      | 0e6faed4       | 0e6faed4       | 2014-03-28 19:38:48 +0900
  G    | gateway_0@192.168.101.122      | running      | 0e6faed4       | 0e6faed4       | 2014-03-28 19:38:49 +0900

suspend storage_0@192.168.101.123
[ERROR] Node not exist
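
A minimal way to narrow down "[ERROR] Node not exist" is to confirm that the manager's Erlang VM can reach the storage node under exactly that name, independently of the manager's member table. A sketch, assuming the cluster's Erlang cookie is known (the node name check@192.168.101.121 and the <cookie> placeholder are illustrative, not taken from this report):

# erl -name check@192.168.101.121 -setcookie <cookie> -remsh manager_0@192.168.101.121
(manager_0@192.168.101.121)1> net_adm:ping('storage_0@192.168.101.123').
%% pong = reachable over Erlang distribution, pang = not reachable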

On storage node

# /usr/local/leofs/current/leo_storage/bin/leo_storage stop
ok
# telnet manager_node 10010

status
[System config]
                System version : 1.0.0
                    Cluster Id : leofs_1
                         DC Id : dc_1
                Total replicas : 1
           # of successes of R : 1
           # of successes of W : 1
           # of successes of D : 1
 # of DC-awareness replicas    : 0
 # of Rack-awareness replicas  : 0
                     ring size : 2^128
             Current ring hash : c4ba139d
                Prev ring hash : c4ba139d

[Node(s) state]
-------+--------------------------------+--------------+----------------+----------------+----------------------------
 type  |              node              |    state     |  current ring  |   prev ring    |          updated at
-------+--------------------------------+--------------+----------------+----------------+----------------------------
  S    | storage_0@192.168.101.123      | stop         |                |                | 2014-03-28 22:07:24 +0900
  S    | storage_0@192.168.101.124      | running      | 0e6faed4       | 0e6faed4       | 2014-03-28 19:38:49 +0900
  S    | storage_0@192.168.101.125      | running      | 0e6faed4       | 0e6faed4       | 2014-03-28 19:38:48 +0900
  S    | storage_0@192.168.101.126      | running      | 0e6faed4       | 0e6faed4       | 2014-03-28 19:38:48 +0900
  G    | gateway_0@192.168.101.122      | running      | 0e6faed4       | 0e6faed4       | 2014-03-28 19:38:49 +0900

quit

# ln -sTf /usr/local/leofs/1.0.0-pre4 /usr/local/leofs/current
# ls -l /usr/local/leofs/current
lrwxrwxrwx 1 root root 27 Mar 28 22:55 2014 /usr/local/leofs/current -> /usr/local/leofs/1.0.0-pre4

# cp -aT /usr/local/leofs/{0.14.6,1.0.0-pre4}/leo_storage/work
# find /usr/local/leofs/1.0.0-pre4/leo_storage/work/ -maxdepth 2
/usr/local/leofs/1.0.0-pre4/leo_storage/work/
/usr/local/leofs/1.0.0-pre4/leo_storage/work/mnesia
/usr/local/leofs/1.0.0-pre4/leo_storage/work/queue
/usr/local/leofs/1.0.0-pre4/leo_storage/work/queue/membership
/usr/local/leofs/1.0.0-pre4/leo_storage/work/queue/5
/usr/local/leofs/1.0.0-pre4/leo_storage/work/queue/4
/usr/local/leofs/1.0.0-pre4/leo_storage/work/queue/1
/usr/local/leofs/1.0.0-pre4/leo_storage/work/queue/2
/usr/local/leofs/1.0.0-pre4/leo_storage/work/queue/3

# /usr/local/leofs/current/leo_storage/bin/leo_storage start
# telnet manager_node 10010

status
[System config]
                System version : 1.0.0
                    Cluster Id : leofs_1
                         DC Id : dc_1
                Total replicas : 1
           # of successes of R : 1
           # of successes of W : 1
           # of successes of D : 1
 # of DC-awareness replicas    : 0
 # of Rack-awareness replicas  : 0
                     ring size : 2^128
             Current ring hash : c4ba139d
                Prev ring hash : c4ba139d

[Node(s) state]
-------+--------------------------------+--------------+----------------+----------------+----------------------------
 type  |              node              |    state     |  current ring  |   prev ring    |          updated at
-------+--------------------------------+--------------+----------------+----------------+----------------------------
  S    | storage_0@192.168.101.123      | restarted    | 000000-1       | 000000-1       | 2014-03-28 22:08:59 +0900
  S    | storage_0@192.168.101.124      | running      | 0e6faed4       | 0e6faed4       | 2014-03-28 19:38:49 +0900
  S    | storage_0@192.168.101.125      | running      | 0e6faed4       | 0e6faed4       | 2014-03-28 19:38:48 +0900
  S    | storage_0@192.168.101.126      | running      | 0e6faed4       | 0e6faed4       | 2014-03-28 19:38:48 +0900
  G    | gateway_0@192.168.101.122      | running      | 0e6faed4       | 0e6faed4       | 2014-03-28 19:38:49 +0900

resume storage_0@192.168.101.123
[ERROR] Node not exist

Error logs

At manager_0

...snip...
[E]     manager_0@192.168.101.121       2014-03-28 22:05:53.667124 +0900        139611953       leo_redundant_manager_api:get_redundancies_by_addr_id_1/4       516     "Could not retrieve redundancies"
[E]     manager_0@192.168.101.121       2014-03-28 22:06:04.89115 +0900 139611964       leo_redundant_manager_api:get_redundancies_by_addr_id_1/4       516     "Could not retrieve redundancies"

At manager_1

[E]     manager_1@192.168.101.121       2014-03-28 22:12:40.425586 +0900        139612360       leo_manager_cluster_monitor:register_fun_1/2    438     cause:{aborted,{no_exists,leo_gateway_nodes}}
[E]     manager_1@192.168.101.121       2014-03-28 22:12:42.394524 +0900        139612362       leo_ring_tbl_transformer:migrate_ring/2 109     {aborted,{no_exists,{leo_members_cur,disc_copies}}}
[E]     manager_1@192.168.101.121       2014-03-28 22:12:54.39485 +0900 139612374       leo_ring_tbl_transformer:migrate_ring/2 109     {aborted,{no_exists,{leo_members_cur,disc_copies}}}
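
The {aborted,{no_exists,{leo_members_cur,disc_copies}}} entries suggest the mnesia migration never created the new member tables. A quick check from an Erlang shell on the running manager (same kind of remote shell as in the sketch above; leo_members_cur is the table named in the error, and the calls are plain mnesia API):

(manager_1@192.168.101.121)1> lists:sort(mnesia:system_info(tables)).
%% the new-format tables (e.g. leo_members_cur) should appear in this list
(manager_1@192.168.101.121)2> mnesia:table_info(leo_members_cur, disc_copies).
%% lists the nodes holding a disc copy; it exits with a no_exists error if the table was never created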

At gateway_0

[W]     gateway_0@192.168.101.122       2014-03-28 22:00:14.605102 +0900        139611614       leo_gateway_api:register_in_monitor/3   146     manager:'manager_0@192.168.101.121', cause:timeout
[W]     gateway_0@192.168.101.122       2014-03-28 22:00:19.606119 +0900        139611619       leo_gateway_api:register_in_monitor/3   146     manager:'manager_1@192.168.101.121', cause:timeout
[W]     gateway_0@192.168.101.122       2014-03-28 22:00:34.610075 +0900        139611634       leo_gateway_api:register_in_monitor/3   146     manager:'manager_0@192.168.101.121', cause:timeout
[W]     gateway_0@192.168.101.122       2014-03-28 22:00:39.611114 +0900        139611639       leo_gateway_api:register_in_monitor/3   146     manager:'manager_1@192.168.101.121', cause:timeout
[W]     gateway_0@192.168.101.122       2014-03-28 22:00:54.616080 +0900        139611654       leo_gateway_api:register_in_monitor/3   146     manager:'manager_0@192.168.101.121', cause:timeout
[W]     gateway_0@192.168.101.122       2014-03-28 22:00:59.617103 +0900        139611659       leo_gateway_api:register_in_monitor/3   146     manager:'manager_1@192.168.101.121', cause:timeout
[W]     gateway_0@192.168.101.122       2014-03-28 22:01:14.621074 +0900        139611674       leo_gateway_api:register_in_monitor/3   146     manager:'manager_0@192.168.101.121', cause:timeout
[W]     gateway_0@192.168.101.122       2014-03-28 22:01:19.622110 +0900        139611679       leo_gateway_api:register_in_monitor/3   146     manager:'manager_1@192.168.101.121', cause:timeout
[E]     gateway_0@192.168.101.122       2014-03-28 22:08:09.863765 +0900        139612089       leo_membership:compare_with_remote_chksum/3     393     {'storage_0@192.168.101.123',nodedown}
[E]     gateway_0@192.168.101.122       2014-03-28 22:08:29.871564 +0900        139612109       leo_membership:compare_with_remote_chksum/3     393     {'storage_0@192.168.101.123',nodedown}
[E]     gateway_0@192.168.101.122       2014-03-28 22:08:59.898321 +0900        139612139       leo_membership:notify_error_to_manager/3        418     {'manager_0@192.168.101.121',{error,"Could not get member"}}

...snip...
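
If the register_in_monitor timeouts persist after all nodes have been restarted, it may also be worth confirming that each host still advertises the expected Erlang node names via epmd (epmd and its -names option are standard Erlang/OTP tooling; run it on the manager, storage, and gateway hosts):

# epmd -names
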
@yosukehara (Member) commented:

Thank you for your report. We'll check this issue next Monday.

Matsue commented Mar 31, 2014

Last time I forgot to change the replication-number setting in the configuration file. I ran the same test with a fixed configuration file, but it shows the same responses.
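
For reference, a minimal sketch of the replica settings that have to match the old cluster before the new managers are started for the first time (the file path and key names follow the 1.x leo_manager.conf format and are assumptions here, so they should be verified against the file shipped with 1.0.0-pre4):

# excerpt from /usr/local/leofs/1.0.0-pre4/leo_manager_0/etc/leo_manager.conf
consistency.num_of_replicas = 2
consistency.write  = 1
consistency.read   = 1
consistency.delete = 1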

Status before upgrade

status
[System config]
                system version : 0.14.6
                total replicas : 2
           # of successes of R : 1
           # of successes of W : 1
           # of successes of D : 1
 # of DC-awareness replicas    : 0
 # of Rack-awareness replicas  : 0
                     ring size : 2^128
              ring hash (cur)  : 831f612a
              ring hash (prev) : 831f612a

[Node(s) state]
-------------------------------------------------------------------------------------------------------
 type node                          state       ring (cur)    ring (prev)   when
-------------------------------------------------------------------------------------------------------
 S    storage_0@192.168.101.123     running     831f612a      831f612a      2014-03-31 14:46:10 +0900
 S    storage_0@192.168.101.124     running     831f612a      831f612a      2014-03-31 14:46:10 +0900
 S    storage_0@192.168.101.125     running     831f612a      831f612a      2014-03-31 14:46:10 +0900
 S    storage_0@192.168.101.126     running     831f612a      831f612a      2014-03-31 14:46:10 +0900
 G    gateway_0@192.168.101.122     running     831f612a      831f612a      2014-03-31 14:46:10 +0900

Status after upgrading the managers

status
[System config]
                System version : 1.0.0
                    Cluster Id : leofs_1
                         DC Id : dc_1
                Total replicas : 2
           # of successes of R : 1
           # of successes of W : 1
           # of successes of D : 1
 # of DC-awareness replicas    : 0
 # of Rack-awareness replicas  : 0
                     ring size : 2^128
             Current ring hash : 1503900f
                Prev ring hash : 1503900f

[Node(s) state]
-------+--------------------------------+--------------+----------------+----------------+----------------------------
 type  |              node              |    state     |  current ring  |   prev ring    |          updated at
-------+--------------------------------+--------------+----------------+----------------+----------------------------
  S    | storage_0@192.168.101.123      | running      | 831f612a       | 831f612a       | 2014-03-31 14:46:10 +0900
  S    | storage_0@192.168.101.124      | running      | 831f612a       | 831f612a       | 2014-03-31 14:46:10 +0900
  S    | storage_0@192.168.101.125      | running      | 831f612a       | 831f612a       | 2014-03-31 14:46:10 +0900
  S    | storage_0@192.168.101.126      | running      | 831f612a       | 831f612a       | 2014-03-31 14:46:10 +0900
  G    | gateway_0@192.168.101.122      | running      | 831f612a       | 831f612a       | 2014-03-31 14:46:10 +0900


status storage_0@192.168.101.123
[config]
            version : 0.14.4
        # of vnodes : 168
      group level-1 :
      group level-2 :
      obj-container : [[{path,"/leofs"},{num_of_containers,8}]]
            log dir : /usr/local/leofs/current/leo_storage/log

[status-1: ring]
  ring state (cur)  : 831f612a
  ring state (prev) : 831f612a

[status-2: erlang-vm]
         vm version : 5.9.3.1
    total mem usage : 26146288
   system mem usage : 11515568
    procs mem usage : 14616520
      ets mem usage : 775352
              procs : 231/1048576
        kernel_poll : true
   thread_pool_size : 32

[status-3: # of msgs]
   replication msgs : 0
    vnode-sync msgs : 0
     rebalance msgs : 0


suspend storage_0@192.168.101.123
[ERROR] Node not exist

Status after upgrading storage_0

status
[System config]
                System version : 1.0.0
                    Cluster Id : leofs_1
                         DC Id : dc_1
                Total replicas : 2
           # of successes of R : 1
           # of successes of W : 1
           # of successes of D : 1
 # of DC-awareness replicas    : 0
 # of Rack-awareness replicas  : 0
                     ring size : 2^128
             Current ring hash : 1503900f
                Prev ring hash : 1503900f

[Node(s) state]
-------+--------------------------------+--------------+----------------+----------------+----------------------------
 type  |              node              |    state     |  current ring  |   prev ring    |          updated at
-------+--------------------------------+--------------+----------------+----------------+----------------------------
  S    | storage_0@192.168.101.123      | restarted    | 000000-1       | 000000-1       | 2014-03-31 14:57:43 +0900
  S    | storage_0@192.168.101.124      | running      | 831f612a       | 831f612a       | 2014-03-31 14:46:10 +0900
  S    | storage_0@192.168.101.125      | running      | 831f612a       | 831f612a       | 2014-03-31 14:46:10 +0900
  S    | storage_0@192.168.101.126      | running      | 831f612a       | 831f612a       | 2014-03-31 14:46:10 +0900
  G    | gateway_0@192.168.101.122      | running      | 831f612a       | 831f612a       | 2014-03-31 14:46:10 +0900


status storage_0@192.168.101.123
[config]
            version : 1.0.0-pre3
        # of vnodes : 168
      group level-1 :
      group level-2 :
      obj-container : [[{path,"/leofs"},{num_of_containers,8}]]
            log dir : /usr/local/leofs/current/leo_storage/log/erlang

[status-1: ring]
  ring state (cur)  : 000000-1
  ring state (prev) : 000000-1

[status-2: erlang-vm]
         vm version : 5.9.3.1
    total mem usage : 55757880
   system mem usage : 45413488
    procs mem usage : 10365256
      ets mem usage : 4875992
              procs : 294/1048576
        kernel_poll : true
   thread_pool_size : 32

[status-3: # of msgs]
   replication msgs : 0
    vnode-sync msgs : 0
     rebalance msgs : 0


resume storage_0@192.168.101.123
[ERROR] Node not exist

@yosukehara yosukehara added the Bug label Apr 15, 2014
@yosukehara yosukehara added this to the 1.0.1 milestone Apr 15, 2014
@yosukehara yosukehara self-assigned this Apr 15, 2014
@yosukehara (Member) commented:

Sharing my operation log:
