[FVT] 2.11 firmware update failed on some of the Firestone machines when running rflash concurrently #471

Closed
tingtli opened this issue Nov 25, 2015 · 5 comments

@tingtli
Contributor

tingtli commented Nov 25, 2015

xCAT 2.11 on rh7.2
[root@c910f03c05k23 xcat-dep]# lsxcatd -v
Version 2.11 (git commit 9885d9a, built Tue Nov 24 03:05:32 EST 2015)
[root@c910f03c05k23 firmware]# rflash c910f05c37,c910f05c33 8335_810.1549.20151116c.hpm
c910f05c37: rflash started, please wait.......
c910f05c33: rflash started, please wait.......
c910f05c33: Error: Running ipmitool command /opt/xcat/bin/ipmitool-xcat -H 50.5.33.10 -I lanplus -U ADMIN -P passw0rd -z 30000 hpm upgrade /firmware/8335_810.1549.20151116c.hpm force failed.
c910f05c37: rflash completed.

@chenglch
Contributor

With the sleep time set to 120s, @tingtli successfully updated 3 Firestone machines concurrently.
I haven't found the root cause yet, so as a first step I plan to add a recovery time option that sets the sleep time for the rflash command.

@whowutwut
Member

@chenglch, yes, this is the concern with adding a sleep time. It would be better to query for some status (say, every 5-10 seconds) and continue as soon as we detect a status change. Otherwise we are always guessing the right timeout, which won't work every time.
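
The polling idea above (check a status every few seconds instead of sleeping for a fixed time) could look roughly like the Python sketch below. It is not xCAT's implementation (xCAT is written in Perl); `wait_until_ready` and `bmc_responds` are hypothetical helpers, and treating a successful `ipmitool ... mc info` as "firmware is ready for `hpm upgrade`" is an assumption, since the thread does not say which status actually signals readiness.

```python
import subprocess
import time
from typing import Callable


def wait_until_ready(
    probe: Callable[[], bool],
    timeout: float = 120.0,
    interval: float = 5.0,
    clock: Callable[[], float] = time.monotonic,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Call `probe` every `interval` seconds until it returns True or
    `timeout` seconds have elapsed. Returns True if the probe succeeded."""
    deadline = clock() + timeout
    while True:
        if probe():
            return True
        remaining = deadline - clock()
        if remaining <= 0:
            return False
        # Never sleep past the deadline.
        sleep(min(interval, remaining))


def bmc_responds(host: str, user: str, password: str) -> bool:
    """Hypothetical readiness probe. ASSUMPTION: a successful `mc info`
    means the BMC has recovered from the cold reset."""
    try:
        result = subprocess.run(
            ["ipmitool", "-I", "lanplus", "-H", host,
             "-U", user, "-P", password, "mc", "info"],
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
            timeout=30,
        )
    except (subprocess.TimeoutExpired, FileNotFoundError):
        return False
    return result.returncode == 0
```

For example, `wait_until_ready(lambda: bmc_responds("<bmc-ip>", "<user>", "<password>"), timeout=120, interval=5)` keeps the 120s that worked in this thread as an upper bound, but returns as soon as the probe succeeds instead of always waiting the full time.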

chenglch added a commit to chenglch/xcat-core that referenced this issue Nov 27, 2015
Currently an abort failure (0x81) occurs during the initiate stage because
the firmware state is not stable enough after a cold reset. As the
required delay is not a constant value, this change adds an interface in
the site table for users to set this value.

Close-issue: xcat2#471
chenglch added a commit to chenglch/xcat-core that referenced this issue Nov 30, 2015
Currently an abort failure (0x81) occurs during the initiate stage because
the firmware state is not stable enough after a cold reset. As the
required delay is not a constant value, this change adds an interface in
the site table for users to set this value.

Close-issue: xcat2#471
chenglch added a commit to chenglch/xcat-core that referenced this issue Nov 30, 2015
Currently an abort failure (0x81) occurs during the initiate stage because
the firmware state is not stable enough after a cold reset. As the
required delay is not a constant value, this change adds an interface in
the site table for users to set this value.

Close-issue: xcat2#471
@chenglch
Contributor

chenglch commented Dec 9, 2015

@tingtli, the root cause of this issue is inside ipmitool. rflash is not stable enough, and a workaround has been provided. We should not describe this as a concurrency issue, since rflash also fails in some cases with only one node.

@whowutwut
Member

@tingtli, @pdlun92 made pull request #818, which should improve the stability of rflash.

@whowutwut
Member

Closing this issue for now. I think xCAT is doing the right thing for the rflash command with the pull requests associated with this issue.

We will open new issues if we see these failures on the large systems when running rflash with the 2.11.1 build.
