Run CT on Node20 #219

Closed
zepumph opened this issue Oct 28, 2024 · 11 comments

@zepumph
Member

zepumph commented Oct 28, 2024

Creating this issue because we broke all of CT. It may be an SELinux problem, or we may need to upgrade our Puppeteer version. I'll take a look.

CTQ is running correctly.

@zepumph zepumph self-assigned this Oct 28, 2024
zepumph added a commit to phetsims/perennial that referenced this issue Oct 28, 2024
@zepumph
Member Author

zepumph commented Oct 28, 2024

Today's investigation was largely blocked by changes made in phetsims/perennial#386 and phetsims/chipper#1498.

I am beginning to think that all the trouble we have had on main comes from the registerTasks arg splitting. It is hard to test on servers, but I seem to get consistent args that look like (pseudocode) "node pm2/ProcessForwarder.js quick-server", where ProcessForwarder knows how to splice in grunt itself. I'll need to come back to this tomorrow.
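
For reference, the argument shape described above can be sketched as follows. This is only an illustration of the idea of splicing grunt into forwarded args; `buildForwardedCommand` is a hypothetical name, and the real pm2/ProcessForwarder.js may work differently.

```javascript
// Hypothetical sketch, NOT the actual pm2/ProcessForwarder.js implementation.
// Given an argv like ['node', 'pm2/ProcessForwarder.js', 'quick-server'],
// drop the node binary and the forwarder script, then put grunt in front.
function buildForwardedCommand(argv) {
  const [, , ...taskArgs] = argv;
  if (taskArgs.length === 0) {
    throw new Error('expected a task name, e.g. quick-server');
  }
  return { command: 'grunt', args: taskArgs };
}

const forwarded = buildForwardedCommand(['node', 'pm2/ProcessForwarder.js', 'quick-server']);
console.log(forwarded.command, forwarded.args.join(' '));
```

If the arg splitting upstream hands over a different array shape (for example, with an extra leading element), a helper like this would silently forward the wrong task name, which is one way the symptoms on main could arise.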

@zepumph
Member Author

zepumph commented Oct 30, 2024

I made some progress today. I am seriously concerned that something in our processes is using far too much processing power. My theories:

  1. We are spawning many more child processes for grunt now, and this either adds overhead or leaks memory in some unknown way.
  2. There is a hidden error occurring internally (maybe in Puppeteer?), and the error handling is subpar, causing a memory leak or an infinite-loop style of repetition.

I'll need to come back to this.

zepumph added a commit that referenced this issue Oct 31, 2024
zepumph added a commit that referenced this issue Oct 31, 2024
zepumph added a commit that referenced this issue Oct 31, 2024
@zepumph
Member Author

zepumph commented Oct 31, 2024

Some more discussion and summary with @samreid this morning:

problems:

  1. pm2 start ctq triggers endless restarts plus Slack notifications. This may be related to trying to start it while the server is overloaded by (2) below.
  2. Launching 100 puppeteer/firefox clients is extremely slow.
    • "->" indicates a hypothesis, listed with potential investigations and solutions.
    • -> gruntSpawn subprocess is slow
      • Try using sage run; untested whether this fixes the slowness with 100 instances
        • 4|ct-puppe | 2024-10-30T20:18:49: Aborted due to warnings.
        • 4|ct-puppe | 2024-10-30T20:18:49: 2024-10-30T20:18:49: Warning: Task "../perennial/bin/sage" not found. Use --force to continue.
    • -> old puppeteer 19 + Node 20 = timeouts and other problems
      - 90|ct-node-puppeteer-client | 2024-10-30T16:22:45: error: FAILED TO RUN TEST, Tried to run 3 times, never completed, failure: Error: TimeoutError: Timed out after 30000 ms while trying to connect to the browser! Only Chrome at revision r1056772 is guaranteed to work.
      - 90|ct-node-puppeteer-client | browserPageLoad caught unexpected error: TimeoutError: Timed out after 30000 ms while trying to connect to the browser! Only Chrome at revision r1056772 is guaranteed to work.
    • -> ct-puppeteer-client gets a 502 on aquaserver/next-test. We traced this back to a failure on CT-main:
      - git ls-remote fatal: unable to access 'https://github.com/phetsims/acid-base-solutions.git/': error:0A000126:SSL routines::unexpected eof while reading
      * Upgrade git? Probably not
      * Add error handling for faulty git operations on ct-main? Probably not
    • Even if the above solutions work, should we investigate why the problem is occurring to begin with, and whether we should improve on the underlying change?
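
The "Tried to run 3 times" and "Timed out after 30000 ms" messages in the logs above suggest a retry-with-timeout pattern. A minimal sketch of that pattern follows; `withTimeout` and `runWithRetries` are hypothetical names for illustration, not the actual CT client code.

```javascript
// Hypothetical helpers, NOT the actual CT client implementation.

// Reject if `promise` has not settled within `ms` milliseconds.
function withTimeout(promise, ms, label) {
  let timer;
  const timeout = new Promise((_, reject) => {
    timer = setTimeout(() => reject(new Error(`Timed out after ${ms} ms while ${label}`)), ms);
  });
  // Clear the timer either way so a stale timeout does not keep the process alive.
  return Promise.race([promise, timeout]).finally(() => clearTimeout(timer));
}

// Run `task` up to `attempts` times, giving each attempt its own timeout.
async function runWithRetries(task, { attempts = 3, timeoutMs = 30000 } = {}) {
  let lastError;
  for (let i = 1; i <= attempts; i++) {
    try {
      return await withTimeout(task(i), timeoutMs, 'running the test');
    } catch (e) {
      lastError = e;
    }
  }
  throw new Error(`Tried to run ${attempts} times, never completed, failure: ${lastError}`);
}
```

Note that a timeout like this only stops waiting; it does not cancel the underlying work. If each attempt launches a browser, the attempt itself still has to close that browser, otherwise every timed-out attempt leaves a process behind and adds to the load described in (2).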

zepumph added a commit that referenced this issue Oct 31, 2024
zepumph added a commit that referenced this issue Oct 31, 2024
zepumph added a commit that referenced this issue Oct 31, 2024
zepumph added a commit that referenced this issue Oct 31, 2024
zepumph added a commit that referenced this issue Oct 31, 2024
zepumph added a commit that referenced this issue Oct 31, 2024
zepumph added a commit that referenced this issue Oct 31, 2024
@zepumph
Member Author

zepumph commented Nov 1, 2024

Ok. I'm testing without cluster mode, and I think this error may help show the memory leak from puppeteer's side. Likely we still need to do phetsims/perennial#393.

ct-chrom | 2024-10-31T17:59:03: (node:10112) MaxListenersExceededWarning: Possible EventEmitter memory leak detected. 101 SIGINT listeners added to [process]. MaxListeners is 100. Use emitter.setMaxListeners() to increase limit
2|ct-chrom | 2024-10-31T17:59:03: (node:10112) MaxListenersExceededWarning: Possible EventEmitter memory leak detected. 101 SIGTERM listeners added to [process]. MaxListeners is 100. Use emitter.setMaxListeners() to increase limit
2|ct-chrom | 2024-10-31T17:59:08: (node:10112) MaxListenersExceededWarning: Possible EventEmitter memory leak detected. 101 exit listeners added to [process]. MaxListeners is 100. Use emitter.setMaxListeners() to increase limit
2|ct-chrom | 2024-10-31T17:59:08: (node:10112) MaxListenersExceededWarning: Possible EventEmitter memory leak detected. 101 SIGHUP listeners added to [process]. MaxListeners is 100. Use emitter.setMaxListeners() to increase limit
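
For context, Puppeteer's launch options `handleSIGINT`, `handleSIGTERM`, and `handleSIGHUP` default to true, so each launched browser registers signal handlers on `process` until it is closed. The sketch below reproduces the same kind of listener pile-up using a plain `EventEmitter` as a stand-in for `process`; `launchFakeBrowser` is a hypothetical helper, and this is not our CT code.

```javascript
const { EventEmitter } = require('node:events');

// Hypothetical launcher: like a browser launcher that installs a signal handler
// per browser, it adds one SIGINT listener and removes it only on close().
function launchFakeBrowser(proc) {
  const onSigint = () => {};
  proc.on('SIGINT', onSigint);
  return { close: () => proc.off('SIGINT', onSigint) };
}

// Stand-in for `process`, so the demo never touches real signal handling.
const leakyProcess = new EventEmitter();
leakyProcess.setMaxListeners(100);
for (let i = 0; i < 101; i++) {
  // Never closed, so its listener stays registered.
  // Node prints a MaxListenersExceededWarning when the 101st listener is added.
  launchFakeBrowser(leakyProcess);
}
const leaked = leakyProcess.listenerCount('SIGINT');

const tidyProcess = new EventEmitter();
tidyProcess.setMaxListeners(100);
for (let i = 0; i < 101; i++) {
  launchFakeBrowser(tidyProcess).close(); // closed right away, listener removed
}
const afterCleanup = tidyProcess.listenerCount('SIGINT');

console.log({ leaked, afterCleanup });
```

Raising the limit with `emitter.setMaxListeners()` would only silence the warning. If the count keeps growing, the more likely problem is that browsers are launched faster than they are closed, or are not closed at all on some error path.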

zepumph added a commit that referenced this issue Nov 1, 2024
@zepumph
Member Author

zepumph commented Nov 1, 2024

I believe that all the trouble we have been encountering this week was because of a memory leak in browserPageLoad(). Fixed by phetsims/perennial@f6a0b7a above. CT is working well now.
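
The commit itself is not reproduced here. As a generic illustration of the pattern that usually prevents this kind of leak (not necessarily what that commit does), a page-load helper can close its browser in a `finally` block so that failures do not leave browsers open. All names below are hypothetical, and a fake browser is used so the sketch runs without Puppeteer.

```javascript
// Hypothetical stand-in for a real browser launcher; counts browsers still open.
let openBrowsers = 0;

async function launchFakeBrowser() {
  openBrowsers++;
  let closed = false;
  return {
    async loadPage(url) {
      if (url.includes('bad')) {
        throw new Error(`failed to load ${url}`);
      }
      return `loaded ${url}`;
    },
    async close() {
      if (!closed) {
        closed = true;
        openBrowsers--;
      }
    }
  };
}

// Generic sketch: the browser is closed whether the load succeeds or throws.
async function pageLoadWithCleanup(url) {
  const browser = await launchFakeBrowser();
  try {
    return await browser.loadPage(url);
  } finally {
    await browser.close();
  }
}
```

With this shape, a rejected `pageLoadWithCleanup('bad-url')` still leaves `openBrowsers` at 0, whereas closing the browser only on the success path would leak one browser per failed load.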

@zepumph
Member Author

zepumph commented Nov 1, 2024

We will continue optimizing sparky tasks over in #220

@zepumph zepumph closed this as completed Nov 1, 2024
zepumph added a commit to phetsims/perennial that referenced this issue Nov 1, 2024
zepumph added a commit to phetsims/joist that referenced this issue Nov 1, 2024
zepumph added a commit that referenced this issue Nov 4, 2024
@zepumph
Member Author

zepumph commented Nov 4, 2024

Added a TODO here to test phetsims/perennial#362

@phet-dev phet-dev reopened this Nov 5, 2024
@phet-dev
Contributor

phet-dev commented Nov 5, 2024

Reopening because there is a TODO marked for this issue.

@zepumph
Member Author

zepumph commented Nov 5, 2024

Re-closing for more testing.

@zepumph zepumph closed this as completed Nov 5, 2024
@phet-dev phet-dev reopened this Nov 5, 2024
@phet-dev
Contributor

phet-dev commented Nov 5, 2024

Reopening because there is a TODO marked for this issue.

@zepumph
Member Author

zepumph commented Nov 5, 2024

Excellent!

@zepumph zepumph closed this as completed Nov 5, 2024