[Tracking] NewDot Deploy Improvements #12021

Closed
roryabraham opened this issue Oct 19, 2022 · 29 comments
Labels
Engineering Internal Requires API changes or must be handled by Expensify staff NewFeature Something to build that is a new item. Weekly KSv2

Comments

@roryabraham
Contributor

roryabraham commented Oct 19, 2022

Problem: For the most part, the NewDot deploy system has served us well over the last couple of years. We've successfully completed over 2,000 staging deploys and hundreds of production deploys. :star-struck: However, experience has revealed a few issues with the current system:

  • It's complex, which makes it harder to follow and to diagnose problems.
  • It's brittle. It's a complex state machine that doesn't operate well when it gets into an "unexpected state". In that state, it's difficult to predict what any given action taken to fix it will actually do. This has led to us making problems worse, and has even caused an accidental production deploy.
  • It's unpredictable. Deploy actions aren't idempotent, so the same action, taken twice, can produce different results.
  • It's weak. When our QA fails and we do ship bugs to customers, we have limited options to fix those bugs or otherwise prevent real users from experiencing them.
  • It's slow. Handling deploys and pushing out fixes eats up a lot of the deployer's time each week.
  • It's fundamentally flawed. In a few rare cases the underlying git logic is wrong, which can result in the wrong code being deployed to staging and/or production.

These problems prevent daily deploys, waste deployer time, and result in a higher probability of NewDot users experiencing bugs.

Solution: Let's engage in a holistic rebuild of the NewDot deploy system to address these issues and focus on improving the simplicity, durability, idempotency, speed, and flexibility of the system, and include this project in the scope of #vip-waq-everywhere.


Known Issues

TODO:

In Progress:

Done:

@roryabraham roryabraham added Engineering Monthly KSv2 NewFeature Something to build that is a new item. labels Oct 19, 2022
@roryabraham roryabraham self-assigned this Oct 19, 2022
@roryabraham roryabraham added the Internal Requires API changes or must be handled by Expensify staff label Oct 19, 2022
@Julesssss Julesssss added the Reviewing Has a PR in review label Oct 26, 2022
@roryabraham roryabraham removed the Reviewing Has a PR in review label Oct 26, 2022
@melvin-bot melvin-bot bot added the Overdue label Nov 28, 2022
@roryabraham roryabraham changed the title [Tracking] NewDot Deploy Improvements [HOLD][Tracking] NewDot Deploy Improvements Nov 28, 2022
@roryabraham
Contributor Author

Putting this on HOLD for WAQ

@roryabraham
Contributor Author

I am going to drop this for now with the intention of bringing it back up as a SWM project once they are done with the PR testing project.

@roryabraham
Contributor Author

roryabraham commented Jan 17, 2023

Another potentially valuable addition here would be to deprecate .github/libs and provide a cleaner way to share code between the main application, tests, and GitHub Actions. Right now we have issues where CJS and ES6 modules don't know how to talk to each other properly (example).

@melvin-bot melvin-bot bot added the Overdue label Mar 1, 2023
@melvin-bot melvin-bot bot closed this as completed Mar 31, 2023
@MelvinBot

@roryabraham, this Monthly task hasn't been acted upon in 6 weeks; closing.

If you disagree, feel encouraged to reopen it -- but pick your least important issue to close instead.

@Julesssss Julesssss reopened this Mar 31, 2023
@melvin-bot melvin-bot bot added the Overdue label May 4, 2023
@melvin-bot melvin-bot bot closed this as completed Jun 12, 2023
@melvin-bot

melvin-bot bot commented Jun 12, 2023

@roryabraham, this Monthly task hasn't been acted upon in 6 weeks; closing.

If you disagree, feel encouraged to reopen it -- but pick your least important issue to close instead.

@melvin-bot melvin-bot bot removed the Overdue label Jun 12, 2023
@roryabraham roryabraham reopened this Jun 13, 2023
@roryabraham
Contributor Author

Made a number of improvements last week, including a fix for #27123

@melvin-bot melvin-bot bot removed the Overdue label Nov 20, 2023
@melvin-bot melvin-bot bot added the Overdue label Nov 29, 2023
@roryabraham
Contributor Author

roryabraham commented Dec 7, 2023

I think the next improvement I would make here would be to split up the build and upload steps for TestFlight such that:

  1. There's a separate job or workflow for the build, and it generates the build as a build artifact
  2. The upload step just grabs that build artifact and uploads it to TestFlight

I think this could provide a few benefits:

  • If TestFlight upload is slow or flaky (as is pretty common), we can easily download the build artifact and upload it to TF manually
  • As soon as one build is done, we can start the next one without waiting for an upload to complete

In short, I think separating the build and deploy steps might DRY things up, improve atomicity, and lend itself to a more robust deploy process overall without additional complexity. It might also make the workflows more readable.
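The proposed build/upload split can be sketched as a small TypeScript model. This is a hypothetical illustration, not the actual NewDot workflow code: `ArtifactStore`, `build`, and `uploadWithRetry` are made-up names, and an in-memory `Map` stands in for a CI artifact store (such as GitHub Actions build artifacts). What it shows is the key property of the split: a slow or flaky upload is retried against an artifact that already exists, so retrying never triggers a rebuild.

```typescript
// Hypothetical model of decoupled build and upload stages.
// An in-memory Map stands in for a CI artifact store; none of these names
// come from the real NewDot workflows.

interface Artifact {
  platform: string;
  version: string;
  path: string;
}

class ArtifactStore {
  private readonly artifacts = new Map<string, Artifact>();

  save(artifact: Artifact): void {
    this.artifacts.set(`${artifact.platform}@${artifact.version}`, artifact);
  }

  get(platform: string, version: string): Artifact | undefined {
    return this.artifacts.get(`${platform}@${version}`);
  }
}

// Build stage: produces an artifact and stores it; it knows nothing about uploading.
let buildCount = 0;
function build(platform: string, version: string, store: ArtifactStore): Artifact {
  buildCount += 1;
  const artifact: Artifact = { platform, version, path: `build-artifacts/${platform}-${version}` };
  store.save(artifact);
  return artifact;
}

// Upload stage: fetches an existing artifact and retries only the upload.
// Real uploads would be async; a synchronous callback keeps the sketch short.
function uploadWithRetry(
  store: ArtifactStore,
  platform: string,
  version: string,
  upload: (artifact: Artifact) => void,
  maxAttempts = 3,
): number {
  const artifact = store.get(platform, version);
  if (!artifact) {
    throw new Error(`No artifact for ${platform}@${version}; run the build stage first`);
  }
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    try {
      upload(artifact);
      return attempt; // number of attempts the upload needed
    } catch (err) {
      if (attempt === maxAttempts) {
        throw err; // give up; the artifact is still available for a manual upload
      }
    }
  }
  throw new Error("unreachable");
}

// Usage: a flaky uploader that times out twice, then succeeds.
const store = new ArtifactStore();
build("ios", "1.0.0-0", store);
let calls = 0;
const attempts = uploadWithRetry(store, "ios", "1.0.0-0", () => {
  calls += 1;
  if (calls < 3) throw new Error("simulated upload timeout");
});
console.log(`upload attempts: ${attempts}, builds: ${buildCount}`); // prints "upload attempts: 3, builds: 1"
```

Because the build stage only writes to the store, a new build can start as soon as the previous one finishes, independently of whether its upload has completed.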

@melvin-bot melvin-bot bot removed the Overdue label Dec 7, 2023
@melvin-bot melvin-bot bot added the Overdue label Dec 15, 2023
@roryabraham
Contributor Author

No update – been OOO this week with jury duty

@melvin-bot melvin-bot bot removed the Overdue label Dec 16, 2023
@melvin-bot melvin-bot bot added the Overdue label Dec 25, 2023
@roryabraham
Contributor Author

No update

@melvin-bot melvin-bot bot removed the Overdue label Dec 27, 2023
@melvin-bot melvin-bot bot added the Overdue label Jan 5, 2024
@roryabraham
Contributor Author

No update, but we're discussing how to hook in HybridApp deploys in the near future.

@melvin-bot melvin-bot bot removed the Overdue label Jan 9, 2024
@melvin-bot melvin-bot bot added the Overdue label Jan 17, 2024
@roryabraham
Contributor Author

I think the last idea is a good one. I also think it's a bit of a problem that we can't retry a build on one platform until the builds for all platforms complete.

@melvin-bot melvin-bot bot removed the Overdue label Jan 17, 2024
@melvin-bot melvin-bot bot added the Overdue label Jan 29, 2024
@roryabraham
Contributor Author

No update right now

@melvin-bot melvin-bot bot removed the Overdue label Feb 6, 2024
@melvin-bot melvin-bot bot added the Overdue label Feb 15, 2024
@roryabraham
Contributor Author

Continued exploration in #22821

@melvin-bot melvin-bot bot removed the Overdue label Feb 16, 2024
@melvin-bot melvin-bot bot added the Overdue label Feb 26, 2024
@roryabraham
Contributor Author

Last week I found and fixed some issues with our pod cache that were making our iOS builds unreliable. I'm hoping this will significantly reduce build failures and eliminate the need to ever delete caches manually (though if you find you have to do this and it helps, please speak up so we can fix it).

@melvin-bot melvin-bot bot removed the Overdue label Feb 27, 2024
@melvin-bot melvin-bot bot added the Overdue label Mar 6, 2024
@roryabraham
Contributor Author

I think the time has come to close this long-standing tracking issue. There will always be more improvements to make, but overall the deploy system is in a much better place stability-wise than it used to be. Further improvements and new features would likely be better tracked in their own dedicated, focused issues going forward.

@melvin-bot melvin-bot bot removed the Overdue label Mar 12, 2024

3 participants