-
Notifications
You must be signed in to change notification settings - Fork 773
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Setup monorepo with lerna, account library integration #561
Comments
With the stable Istanbul being released, I think it's time to re-start this conversation. I had two specific discussion-points in mind, namely:
Re using the same version: I think this has some advantages. To achieve it we have 2 options as I see it. If we rename libraries we can safely reset the version. If we don't rename, we can upgrade all libraries to the biggest version among them (e.g. if VM has v5.x and all other v2.x, we upgrade all to v6) |
Yes, I also thought that it would be very much the time for the mono repo implementation. We can also very much do this step-by-step and on a first round e.g. keep everything "as is" so the VM integrating all the modules it currently does and just setup the outer mono-repo structure and maybe do a first bug fix release for testing the new structure (just a first thought, not sure if exactly this way makes sense, but generally: we can take on this one step after the other). @s1na on your questions:
|
This would definitely be the downside. The only approach by which we can avoid it is to both keep the names (i.e. no scoped renaming) and the versions. I guess it comes down to a pro/con discussion. To bring a pro for using the same version I'd refer to the discussion in ethereumjs/ethereumjs-block#74. @alcuadrado curious to hear your thoughts as well. |
I'd like to add one more topic for discussion: what should be the ideal test structure for the monorepo? Here's a brief analysis of the current situation:
Both CI platforms that we use still don't have a tailored support for monorepos. I see some approaches: 1. Build all 43 jobs, for every commitThis is the most straightforward way to go, but it is also resource-heavy. It strictly depends on a robust parallelization plan on the CI side, otherwise, we'd have an unpleasant job queue to cope with.
2. Only trigger jobs for packages that had changes (👎)In this scenario it would detect which package got changes and run only their jobs. That can be accomplished with a single test runner that uses a combo of The big 3. Run tests by affected packageWe'd use their explicit dependency relation to determine which jobs need to run. I still don't know how elegant this solution would be on the CI side, but I'd be happy to explore this path more. How to read the image:
In any case, it seems primordial to migrate them over to Edit: Added |
Great analysis @evertonfraga, and a good point! I'd also go either with 1 or 3. But yeah as you mentioned as it stands 1 doesn't seem very feasible. I do think however it might be possible to reduce the runtime of some of the tests, making option 1 more feasible. E.g. #536 was a low-hanging fruit I found a while ago which improved runtime quite a bit. I think it'd also make things a lot simpler if the projects were using a similar toolchain (and CI). This should be a good preparatory first step. |
I like the idea of doing lerna in independent mode and discovered semantic-release. |
Ryan brought this up and so I just started thinking about further library integration here (also inspired by the visualization done above by @evertonfraga): in past discussions we had some broader consensus to not include the Another topic: without further reflection on pros and cons on this I would also have a tendency to independent mode as suggested by @ryanio. |
In a monorepo world it might make sense to include util so everything remains under the same umbrella and we receive the full benefits of monorepo-ing. Would it be too much to nest MPT under util? |
I'm really excited that this migration is happening 🤩
Renaming the packages will prevent users to get updates. If I'm using, for example, This would only happen once, so it may not be that important.
I think I'd use different versions with independent mode, as @ryanio suggested. @holgerd77's argument about unnecessary major versions and bug fixes is similar to what I just wrote about renaming the packages, but it would be much more frequent so way more important. That alone is enough to use independent mode IMO. I understand your motivation and share your concern, @s1na . Keeping separate versions will probably lead to situations like ethereumjs/ethereumjs-block#74 again. I believe the root of the problem is that npm dependencies are handled incorrectly in most of the libraries. Things like Changing the libs to peer dependencies would be a big breaking change, so I'd keep it outside of the scope of the migration. Great job @evertonfraga ! I think option 3 is the most interesting, but it may be hard to implement. @ryanio's PR using Github Actions on the |
Also, I think this is a very good way to approach this. There are lots of things to migrate and config, so doing it gradually makes sense to me. |
Ok just a summary on how the consensus seem to form on how we proceed here (correct me or continue the discussion if you have strong opinions against any of the points):
Anything I forgot (feel free to directly edit the post)? 😀 |
My assessment to Github Actions: 👍 Pros
Concerns and "non-concerns"
|
I wonder if the cache limits of github actions won't be a problem. They cache up to 2gb per repo, and |
Thanks @alcuadrado, that is indeed an important aspect to keep in mind for the evaluation. I found this thread: actions/cache#6 It has some interesting discussion, including one user seeing success with running a 700mb node_modules and the cache compressing the archive. I’ll have to give the thread another closer look in relation to our specific needs to extract more conclusions. I can try to investigate these limits further and if we may find them constraining. 2GB per repo sounds extremely low for a monorepo use case like ours, especially with [vm, account, block, blockchain, tx] all running their own checks in matrix across 4 node versions on every PR push. From the top of your head, which ci steps / tests / builds / outputs / logs do you find being on the larger side? I also found Persisting workflow data using artifacts. |
As a separate point to add to the list of considerations, how do we feel about semantic-release? I think it would be nice to have a fully automated release process. |
@ryanio Can we have this as a separate topic, eventually just open a new issue on that? This can be discussed relatively separated from this issue (and also might need a bit longer and thoughtful discussion) and there is a bit the danger to overload the discussion here. |
@holgerd77 Yes great point, have started it here: #625 |
As a hands-on way to understand the requirements to the migration, I started to make this shell script monorepify.sh. This can be useful to rehearse the migration and get valuable information ahead of time (eg: would GitHub actions work with X jobs? What about their cache limit?). Also, this automation tool could be in fact used to perform the changes we need 😛. The migration can be run for RoadmapEdit: see this project for related tasks: https://github.com/orgs/ethereumjs/projects/1 I'll use my own fork as a playground, and you can inspect it here: https://github.com/evertonfraga/ethereumjs-vm/ |
Good news: GitHub actions make it trivial to implement my proposed solution (3). We’ll have to declare some exclusion paths to each workflow file, so the irrelevant builds are not ran for each |
Nice script @evertonfraga! You may want to install lerna in one of its first steps, as it has this very handy command https://github.com/lerna/lerna/tree/master/commands/import Should we create a new repo for the monorepo? This way we can leave all the different library-specific repos as they are now and archive them. |
Thanks, @alcuadrado! I was uncertain at first to use an importer like lerna's, as it rewrites all commits as if the files were always under the new file structure. But maybe it's the best way forward. Re: new repo for it — I lean towards to keep |
On the caching question: Just had a look at the VM |
What is with open PRs on the other repos (but eventually also on the VM?) containing a significant amount of work? These should likely (optimally) be merged before a transition (hope this is realistic)? |
PRs on the other repos |
Monorepo Transition PR AnalysisSome analysis of the currently open PRs on the repos. Feel free to edit the post. I would suggest not to include the monorepo working PRs (e.g. on the GitHub actions like ethereumjs/ethereumjs-account#59) to not clutter the list too much. AccountNo significant open PRs. Block
Blockchain
Transaction
Common
VM
Summary: Ok, PR situation is actually easier to handle than I initially thought. Basically for the moment we are fine and there is some defined path to go for all open PRs. 😄 |
Our goal is to have a smooth migration, without surprises or rushed changes because something was not considered first place. My approach is mostly programmatic, as IMO doing every step manually can be error-prone. I am using a project board to keep track of all requirements to accomplish that. Please take a look and see if I am forgetting anything. https://github.com/orgs/ethereumjs/projects/1 Feel free to add new cards, and if any of those are worth discussing, feel free to convert to an issue (Click on card … > Convert to Issue). |
@evertonfraga That's really cool, thanks for setting this up! 😄 |
Hi all, I need more context regarding the use of 1.
|
I've always found the vm setup somewhat frustrating, as I tend to forget to run |
Thanks for the feedback @alcuadrado. The VM npm scripts used to let you test both the code and dist, with the I’m taking the |
Some update on Greenkeeper, this is activated for many repositories (see ethereumjs/ethereumjs-account#55 for an exemplary PR) to notify about dependency updates (just had a look at the setup, not in a consistent state though, some repos missing, some not activated): First, monorepos seems to be supported. To ease the migration I've now disabled Greenkeeper to stop opening new PRs, I will also close existing open PRs on the respective libraries. We can then re-activate once the transition is done (or eventually also considering switching to another tool, seems you also have some alternative solution here on the Grid side, don't see any pressure for a switch though in general and have no comparison overview. Very much a side topic I would say, but anyhow). |
We started using RenovateBot for Grid repositories. It had lots of granularity for configurations, but in the end I had no special need with the array of settings for it. I had planned to implement Greenkeeper in the monorepo already, since it's already used by the target repository (ethereumjs-vm) and it supports multiple I have activated Greenkeeper today in our monorepo playground: https://github.com/evertonfraga/ethereumjs-vm/ Mind that Readme linked above is not yet complete, but it's already a good opportunity for feedback. |
@evertonfraga whuah, have to admit that I didn't look at this monorepo playground yet you created. This already looks fantastic! 😄 |
"This branch is 1305 commits ahead, 31 commits behind ethereumjs:master." Lol. 🤪 |
Being in discussion for some time there is now an agreement to integrate the following libraries here by setting up a monorepo with lerna similar to the setup in the ethereumjs-config:
We'll start with integrating the
account
library due to its small size, low direct usage level and proximity to the VM logic, integration of the other libraries should be handled as a separate issue.This has to go along with an update on the ethereumjs-config library (libraries) to be able to distribute the respective configurations to a monorepo-style consuming repository. This effort shouldn't be underestimated and will likely be as much work as the direct integration here.
The text was updated successfully, but these errors were encountered: