Benchmark M4 Mac mini #47

Closed
geerlingguy opened this issue Nov 12, 2024 · 8 comments

Comments

@geerlingguy
Owner

I have an M4 Mac mini with 10 CPU cores and 32 GB of RAM. Would be nice to see the results and efficiency.

@geerlingguy
Owner Author

geerlingguy commented Nov 12, 2024

283.02 Gflops at 42W, for 6.74 Gflops/W (!)

================================================================================
HPLinpack 2.3  --  High-Performance Linpack benchmark  --   December 2, 2018
Written by A. Petitet and R. Clint Whaley,  Innovative Computing Laboratory, UTK
Modified by Piotr Luszczek, Innovative Computing Laboratory, UTK
Modified by Julien Langou, University of Colorado Denver
================================================================================

An explanation of the input/output parameters follows:
T/V    : Wall time / encoded variant.
N      : The order of the coefficient matrix A.
NB     : The partitioning blocking factor.
P      : The number of process rows.
Q      : The number of process columns.
Time   : Time in seconds to solve the linear system.
Gflops : Rate of execution for solving the linear system.

The following parameter values will be used:

N      :   50004
NB     :     256
PMAP   : Row-major process mapping
P      :       2
Q      :       5
PFACT  :   Right
NBMIN  :       4
NDIV   :       2
RFACT  :   Crout
BCAST  :  1ringM
DEPTH  :       1
SWAP   : Mix (threshold = 64)
L1     : transposed form
U      : transposed form
EQUIL  : yes
ALIGN  : 8 double precision words

--------------------------------------------------------------------------------

- The matrix A is randomly generated for each test.
- The following scaled residual check will be computed:
      ||Ax-b||_oo / ( eps * ( || x ||_oo * || A ||_oo + || b ||_oo ) * N )
- The relative machine precision (eps) is taken to be               1.110223e-16
- Computational tests pass if scaled residuals are less than                16.0

================================================================================
T/V                N    NB     P     Q               Time                 Gflops
--------------------------------------------------------------------------------
WR11C2R4       50004   256     2     5             294.53             2.8302e+02
HPL_pdgesv() start time Tue Nov 12 19:23:38 2024

HPL_pdgesv() end time   Tue Nov 12 19:28:32 2024

--------------------------------------------------------------------------------
||Ax-b||_oo/(eps*(||A||_oo*||x||_oo+||b||_oo)*N)=   3.30442718e-03 ...... PASSED
================================================================================

Finished      1 tests with the following results:
              1 tests completed and passed residual checks,
              0 tests completed and failed residual checks,
              0 tests skipped because of illegal input values.
--------------------------------------------------------------------------------

End of Tests.
================================================================================

This was run inside a Docker container, with Docker's Preferences set to pass through all 10 CPU cores, and all 32 GB of RAM:

docker run --name top500 -it -v $PWD:/code geerlingguy/docker-ubuntu2404-ansible:latest bash
cd /code
ansible-playbook main.yml --tags "setup,benchmark"
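
For context on the problem size: HPL factors a dense N×N matrix of 8-byte doubles, so N largely determines memory use, and the P×Q grid determines how many MPI processes run. Below is a rough sketch of that arithmetic for the parameters above. It assumes "32 GB" means 32 GiB, and it ignores HPL's workspace and whatever memory Docker's VM keeps for itself.

```python
N, P, Q = 50004, 2, 5   # values from the HPL output above
RAM_GIB = 32            # assumption: "32 GB" treated as 32 GiB

# The coefficient matrix holds N*N doubles at 8 bytes each.
matrix_gib = 8 * N * N / 2**30
processes = P * Q       # HPL lays its processes out on a P x Q grid

print(f"matrix: {matrix_gib:.2f} GiB ({matrix_gib / RAM_GIB:.0%} of RAM)")
print(f"MPI processes: {processes}")
```

The 2 × 5 grid gives 10 processes, which lines up with the 10 CPU cores passed through to the container.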

@geerlingguy
Owner Author

Power draw graph:

[Image: Screenshot 2024-11-12 at 1:37:40 PM]

@geerlingguy geerlingguy reopened this Nov 13, 2024
@geerlingguy
Owner Author

Paul Haddad on Mastodon mentions:

did you check idle power without ethernet connected, I found it to make a big difference.

I should do this for both idle and another top500 run, just to confirm.

@geerlingguy
Owner Author

Indeed, I'm getting idle power consumption between 4.2 and 4.6 W if I disconnect 10 GbE and use only WiFi 6.

@geerlingguy
Owner Author

geerlingguy commented Nov 20, 2024

299.60 Gflops at 39.7W for 7.55 Gflops/W

================================================================================
HPLinpack 2.3  --  High-Performance Linpack benchmark  --   December 2, 2018
Written by A. Petitet and R. Clint Whaley,  Innovative Computing Laboratory, UTK
Modified by Piotr Luszczek, Innovative Computing Laboratory, UTK
Modified by Julien Langou, University of Colorado Denver
================================================================================

An explanation of the input/output parameters follows:
T/V    : Wall time / encoded variant.
N      : The order of the coefficient matrix A.
NB     : The partitioning blocking factor.
P      : The number of process rows.
Q      : The number of process columns.
Time   : Time in seconds to solve the linear system.
Gflops : Rate of execution for solving the linear system.

The following parameter values will be used:

N      :   50004
NB     :     256
PMAP   : Row-major process mapping
P      :       2
Q      :       5
PFACT  :   Right
NBMIN  :       4
NDIV   :       2
RFACT  :   Crout
BCAST  :  1ringM
DEPTH  :       1
SWAP   : Mix (threshold = 64)
L1     : transposed form
U      : transposed form
EQUIL  : yes
ALIGN  : 8 double precision words

--------------------------------------------------------------------------------

- The matrix A is randomly generated for each test.
- The following scaled residual check will be computed:
      ||Ax-b||_oo / ( eps * ( || x ||_oo * || A ||_oo + || b ||_oo ) * N )
- The relative machine precision (eps) is taken to be               1.110223e-16
- Computational tests pass if scaled residuals are less than                16.0

================================================================================
T/V                N    NB     P     Q               Time                 Gflops
--------------------------------------------------------------------------------
WR11C2R4       50004   256     2     5             278.23             2.9960e+02
HPL_pdgesv() start time Wed Nov 20 03:58:05 2024

HPL_pdgesv() end time   Wed Nov 20 04:02:44 2024

--------------------------------------------------------------------------------
||Ax-b||_oo/(eps*(||A||_oo*||x||_oo+||b||_oo)*N)=   3.30442718e-03 ...... PASSED
================================================================================

Finished      1 tests with the following results:
              1 tests completed and passed residual checks,
              0 tests completed and failed residual checks,
              0 tests skipped because of illegal input values.
--------------------------------------------------------------------------------

End of Tests.
================================================================================
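
As a sanity check on the Gflops column, the reported rate can be reproduced from N and Time alone. The sketch below assumes HPL's nominal operation count for the factorization plus the solve, 2/3·N³ + 3/2·N². The function name `hpl_gflops` is made up for illustration and is not part of HPL.

```python
def hpl_gflops(n: int, seconds: float) -> float:
    """Rate implied by the assumed operation count 2/3*N^3 + 3/2*N^2."""
    flops = (2.0 / 3.0) * n**3 + 1.5 * n**2
    return flops / seconds / 1e9

# N and Time are taken from the two result lines posted so far in this thread.
for label, seconds in [("Nov 12 run", 294.53), ("Nov 20 run", 278.23)]:
    print(f"{label}: {hpl_gflops(50004, seconds):.2f} Gflops")
```

Because N is identical in both runs, the difference in Gflops comes entirely from the shorter wall time.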

I'm going to test one more time with the fan on auto. (I had ramped it to 3500 rpm via Macs Fan Control to keep the CPU temperature down to 95°C instead of a constant 105°C, which seemed to result in throttling. See the first run in the screenshot below; the second 'plateau' is the run with the fan set to 3500 rpm, showing that the power could stay ramped to 100% the whole time.)

[Image: Screenshot 2024-11-19 at 10:09:22 PM]

@geerlingguy
Owner Author

Temperatures hovered between 100 and 105°C, but with my desk open to the air and the M4 hanging out over the front so it pulls in fresh air with no chance of recirculation, it eked out a tiny bit more efficiency:

299.93 Gflops at 39.6W for 7.57 Gflops/W

================================================================================
HPLinpack 2.3  --  High-Performance Linpack benchmark  --   December 2, 2018
Written by A. Petitet and R. Clint Whaley,  Innovative Computing Laboratory, UTK
Modified by Piotr Luszczek, Innovative Computing Laboratory, UTK
Modified by Julien Langou, University of Colorado Denver
================================================================================

An explanation of the input/output parameters follows:
T/V    : Wall time / encoded variant.
N      : The order of the coefficient matrix A.
NB     : The partitioning blocking factor.
P      : The number of process rows.
Q      : The number of process columns.
Time   : Time in seconds to solve the linear system.
Gflops : Rate of execution for solving the linear system.

The following parameter values will be used:

N      :   50004
NB     :     256
PMAP   : Row-major process mapping
P      :       2
Q      :       5
PFACT  :   Right
NBMIN  :       4
NDIV   :       2
RFACT  :   Crout
BCAST  :  1ringM
DEPTH  :       1
SWAP   : Mix (threshold = 64)
L1     : transposed form
U      : transposed form
EQUIL  : yes
ALIGN  : 8 double precision words

--------------------------------------------------------------------------------

- The matrix A is randomly generated for each test.
- The following scaled residual check will be computed:
      ||Ax-b||_oo / ( eps * ( || x ||_oo * || A ||_oo + || b ||_oo ) * N )
- The relative machine precision (eps) is taken to be               1.110223e-16
- Computational tests pass if scaled residuals are less than                16.0

================================================================================
T/V                N    NB     P     Q               Time                 Gflops
--------------------------------------------------------------------------------
WR11C2R4       50004   256     2     5             277.92             2.9993e+02
HPL_pdgesv() start time Wed Nov 20 04:09:59 2024

HPL_pdgesv() end time   Wed Nov 20 04:14:37 2024

--------------------------------------------------------------------------------
||Ax-b||_oo/(eps*(||A||_oo*||x||_oo+||b||_oo)*N)=   3.30442718e-03 ...... PASSED
================================================================================

Finished      1 tests with the following results:
              1 tests completed and passed residual checks,
              0 tests completed and failed residual checks,
              0 tests skipped because of illegal input values.
--------------------------------------------------------------------------------

End of Tests.
================================================================================
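
The efficiency figures quoted at the top of each run are Gflops divided by the measured power draw. A minimal sketch of that arithmetic for the three runs in this thread is below; the run labels and the `efficiency` helper are only for illustration.

```python
# (label, Gflops, watts) as reported in the comments above.
runs = [
    ("run 1 (Nov 12)", 283.02, 42.0),
    ("run 2 (Nov 20)", 299.60, 39.7),
    ("run 3 (Nov 20)", 299.93, 39.6),
]

def efficiency(gflops: float, watts: float) -> float:
    """Energy efficiency in Gflops per watt."""
    return gflops / watts

baseline = efficiency(runs[0][1], runs[0][2])
for label, gflops, watts in runs:
    eff = efficiency(gflops, watts)
    print(f"{label}: {eff:.2f} Gflops/W ({eff / baseline - 1:+.1%} vs run 1)")
```

Both the higher Gflops and the lower wattage of the later runs contribute to the gain over the first run.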

@geerlingguy
Owner Author

[Image: HPL efficiency, M4 Mac mini]

@ShamoX

ShamoX commented Dec 5, 2024

Interesting. I ran the test natively on macOS, trying several sets of parameters, and got the following results on my M2 laptop:

rlaures@LiantOrdialpha-001 hpl % mpirun -n 8 testing/xhpl
================================================================================
HPLinpack 2.3  --  High-Performance Linpack benchmark  --   December 2, 2018
Written by A. Petitet and R. Clint Whaley,  Innovative Computing Laboratory, UTK
Modified by Piotr Luszczek, Innovative Computing Laboratory, UTK
Modified by Julien Langou, University of Colorado Denver
================================================================================

An explanation of the input/output parameters follows:
T/V    : Wall time / encoded variant.
N      : The order of the coefficient matrix A.
NB     : The partitioning blocking factor.
P      : The number of process rows.
Q      : The number of process columns.
Time   : Time in seconds to solve the linear system.
Gflops : Rate of execution for solving the linear system.

The following parameter values will be used:

N      :   20000 
NB     :     256 
PMAP   : Row-major process mapping
P      :       1        2        2        1 
Q      :       4        2        4        8 
PFACT  :   Right 
NBMIN  :       4 
NDIV   :       2 
RFACT  :   Crout 
BCAST  :  1ringM 
DEPTH  :       1 
SWAP   : Mix (threshold = 64)
L1     : transposed form
U      : transposed form
EQUIL  : yes
ALIGN  : 8 double precision words

--------------------------------------------------------------------------------

- The matrix A is randomly generated for each test.
- The following scaled residual check will be computed:
      ||Ax-b||_oo / ( eps * ( || x ||_oo * || A ||_oo + || b ||_oo ) * N )
- The relative machine precision (eps) is taken to be               1.110223e-16
- Computational tests pass if scaled residuals are less than                16.0

================================================================================
T/V                N    NB     P     Q               Time                 Gflops
--------------------------------------------------------------------------------
WR11C2R4       20000   256     1     4              23.27             2.2921e+02
HPL_pdgesv() start time Fri Nov 29 16:52:59 2024

HPL_pdgesv() end time   Fri Nov 29 16:53:22 2024

--------------------------------------------------------------------------------
||Ax-b||_oo/(eps*(||A||_oo*||x||_oo+||b||_oo)*N)=   3.22060915e-03 ...... PASSED
================================================================================
T/V                N    NB     P     Q               Time                 Gflops
--------------------------------------------------------------------------------
WR11C2R4       20000   256     2     2              25.92             2.0581e+02
HPL_pdgesv() start time Fri Nov 29 16:53:40 2024

HPL_pdgesv() end time   Fri Nov 29 16:54:05 2024

--------------------------------------------------------------------------------
||Ax-b||_oo/(eps*(||A||_oo*||x||_oo+||b||_oo)*N)=   4.41717176e-03 ...... PASSED
================================================================================
T/V                N    NB     P     Q               Time                 Gflops
--------------------------------------------------------------------------------
WR11C2R4       20000   256     2     4              25.47             2.0944e+02
HPL_pdgesv() start time Fri Nov 29 16:54:20 2024

HPL_pdgesv() end time   Fri Nov 29 16:54:46 2024

--------------------------------------------------------------------------------
||Ax-b||_oo/(eps*(||A||_oo*||x||_oo+||b||_oo)*N)=   3.39499366e-03 ...... PASSED
================================================================================
T/V                N    NB     P     Q               Time                 Gflops
--------------------------------------------------------------------------------
WR11C2R4       20000   256     1     8              18.56             2.8732e+02
HPL_pdgesv() start time Fri Nov 29 16:54:55 2024

HPL_pdgesv() end time   Fri Nov 29 16:55:14 2024

--------------------------------------------------------------------------------
||Ax-b||_oo/(eps*(||A||_oo*||x||_oo+||b||_oo)*N)=   2.96557012e-03 ...... PASSED
================================================================================

I got 287 Gflops with the cluster configuration P=1 and Q=8. I didn't check the configuration, but I could if you want.
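
When a single run sweeps several P×Q grids like this, picking the best one by eye gets tedious. The sketch below pulls the result rows out of HPL's text output and reports the fastest grid. The regular expression and the `parse_results` helper are assumptions written against the output format shown in this thread, not an official HPL tool.

```python
import re

# Matches result rows such as:
# WR11C2R4       20000   256     1     8              18.56             2.8732e+02
RESULT_LINE = re.compile(
    r"^(?P<tv>W[RC]\S+)\s+(?P<n>\d+)\s+(?P<nb>\d+)\s+(?P<p>\d+)\s+(?P<q>\d+)"
    r"\s+(?P<time>[\d.]+)\s+(?P<gflops>[\d.eE+-]+)$"
)

def parse_results(text: str) -> list[dict]:
    """Return one dict per HPL result row found in the given output text."""
    rows = []
    for line in text.splitlines():
        m = RESULT_LINE.match(line.strip())
        if m:
            rows.append({
                "p": int(m["p"]),
                "q": int(m["q"]),
                "time": float(m["time"]),
                "gflops": float(m["gflops"]),
            })
    return rows

# The four result rows from the M2 run above.
sample = """\
WR11C2R4       20000   256     1     4              23.27             2.2921e+02
WR11C2R4       20000   256     2     2              25.92             2.0581e+02
WR11C2R4       20000   256     2     4              25.47             2.0944e+02
WR11C2R4       20000   256     1     8              18.56             2.8732e+02
"""

rows = parse_results(sample)
best = max(rows, key=lambda r: r["gflops"])
print(f"best grid: P={best['p']} Q={best['q']} at {best['gflops']:.2f} Gflops")
```

Note that the 1×4 and 2×2 grids occupy only 4 of the 8 launched processes, so they are probably not a like-for-like comparison with the 8-process grids.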
