Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

CLCLE instruction performance #500

Merged
merged 6 commits into from
Sep 25, 2022
Merged

Conversation

JamesWekel
Copy link
Contributor

@JamesWekel JamesWekel commented Aug 31, 2022

Fish,

"(or do you prefer to be called Jim?)" Generally Jim, but usually anything written is James.

Here is the updated CLCLE instruction performance pull request with the performance test split into a basic function test and a no-op perfromance test unless manually enabled. I've included the following comment in the CLCLE-04-perfomance.tst for emphasis.

#  ----------------------------------------------------------------------------------
#  This ONLY tests the performance of the CLCLE instruction.
#
#  The default is to NOT run performance tests. To enable
#  performance test, uncomment the "#r           21fd=ff "
#  line below.
#
#        Tests:
#              1. CLCLE of 512 bytes
#              2. CLCLE of 512 bytes where operand 1 crosses a page boundary
#              3. CLCLE of 2048 bytes
#              4. CLCLE of 2048 bytes where operand 1 crosses a page boundary
#              5. CLCLE of 2048 bytes where both operand 1 and operand 2
#                                     crosses a page boundary
#
#        Output:
#               For each test, a console line will the generated with timing results,
#               as follows:
#               /         1,000,000 iterations of CLCLE took      38,698 microseconds
#               /         1,000,000 iterations of CLCLE took      48,617 microseconds
#               /         1,000,000 iterations of CLCLE took      49,178 microseconds
#               /         1,000,000 iterations of CLCLE took      68,355 microseconds
#               /         1,000,000 iterations of CLCLE took      69,991 microseconds
#  ----------------------------------------------------------------------------------

The CLCLE-04performance.tst does 5 tests. On my system, the performance results
before the change were:

/         1,000,000 iterations of CLCLE took   9,052,879 microseconds
/         1,000,000 iterations of CLCLE took   9,096,818 microseconds
/         1,000,000 iterations of CLCLE took  35,928,188 microseconds
/         1,000,000 iterations of CLCLE took  35,424,265 microseconds
/         1,000,000 iterations of CLCLE took  36,205,255 microseconds

and after the change:

/         1,000,000 iterations of CLCLE took      41,159 microseconds
/         1,000,000 iterations of CLCLE took      49,770 microseconds
/         1,000,000 iterations of CLCLE took      51,996 microseconds
/         1,000,000 iterations of CLCLE took      69,810 microseconds
/         1,000,000 iterations of CLCLE took      74,077 microseconds

There are 5 tests as I was curious on the impact of operands crossing page boundaries with the revised instruction. But a 99% reduction from the original timing still works for me. And, the reduction is all from your mem_cmp function!

Again, no rush in the review.

Jim

Copy link
Member

@Fish-Git Fish-Git left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  1. Remove the following unneeded statements in general1.c:

https://github.com/JamesWekel/hyperion/blob/710824b05b06d041e2f011fd53aae83fe580c0ce/general1.c#L3563-L3565

  1. CLCLE-03-basic.asm: bad (incorrect) TITLE directive: "CLCE-03-performance" should be something like "CLCLE-03-basic" instead.

  2. There appear to be several places in your CLCLE-01,02,03 .asm files where the comments still mention "CLCL" instead of "CLCLE".

  3. Trailing blanks (.asm, general1.c)

  4. Again, PLEASE LOSE the ASMA -d option!!

  ADDR      POS                                 OBJECT CONTENT                                         CHARACTER CONTENT

Region: CODE

00000000   00000   00080000 00000200 00000000 00000000  00000000 00000000 00000000 00000000   |................ ................|
00000020   00020   00000000 00000000 00000000 00000000  00000000 00000000 00000000 00000000   |................ ................|
00000040   00040   00000000 00000000 00000000 00000000  00000000 00000000 000A0000 00000018   |................ ................|
00000060   00060   000A0000 00000020 000A0000 00000028  000A0000 00000030 000A0000 00000038   |................ ................|
00000080   00080   00000000 00000000 00000000 00000000  00000000 00000000 00000000 00000000   |................ ................|
000000A0   000A0   00000000 00000000 00000000 00000000  00000000 00000000 00000000 00000000   |................ ................|
000000C0   000C0   00000000 00000000 00000000 00000000  00000000 00000000 00000000 00000000   |................ ................|
000000E0   000E0   00000000 00000000 00000000 00000000  00000000 00000000 00000000 00000000   |................ ................|
00000100   00100   00000000 00000000 00000000 00000000  00000000 00000000 00000000 00000000   |................ ................|
00000120   00120   00000000 00000000 00000000 00000000  00000000 00000000 00000000 00000000   |................ ................|
00000140   00140   00000000 00000000 00000000 00000000  00000000 00000000 00000000 00000000   |................ ................|
00000160   00160   00000000 00000000 00000000 00000000  00000000 00000000 00000000 00000000   |................ ................|
00000180   00180   00000000 00000000 00000000 00000000  00000000 00000000 00000000 00000000   |................ ................|
000001A0   001A0   00000000 00000000 00000000 00000000  00000000 00000000 00000000 00000000   |................ ................|
000001C0   001C0   00000000 00000000 00000000 00000000  00000000 00000000 00000000 00000000   |................ ................|
000001E0   001E0   00000000 00000000 00000000 00000000  00000000 00000000 00000000 00000000   |................ ................|
00000200   00200   05C006C0 06C041D0 C80041D0 D8009849  C0901598 47B0C01A 18891804 18251818   |........H...Q.q. ...q.....i......|
00000220   00220   18380E02 18061827 18181838 0E021804  18E61818 18F8A90E 00004710 C0364780   |................ .W...8z.........|
00000240   00240   C05C18A0 D500A000 E0004780 C0804A00  C0A84AE0 C0A80610 46F0C036 1E581E78   |.*..N.........¢. .y¢..y...0......|
00000260   00260   1B984780 C0704720 C01A1089 47F0C01A  8200C078 00000000 000A0000 00000000   |.q.........i.0.. b...............|
00000280   00280   8200C088 00000000 000A0000 00010BAD  00020320 00060000 00040A68 00080000   |b..h............ ................|

 
Results of my own performance runs:

  • Old:
    1,000,000 iterations of CLCLE took  24,387,750 microseconds
    1,000,000 iterations of CLCLE took  24,611,478 microseconds
    1,000,000 iterations of CLCLE took  96,883,210 microseconds
    1,000,000 iterations of CLCLE took  96,914,502 microseconds
    1,000,000 iterations of CLCLE took  97,088,257 microseconds
  • New:
    1,000,000 iterations of CLCLE took     118,936 microseconds
    1,000,000 iterations of CLCLE took     151,260 microseconds
    1,000,000 iterations of CLCLE took     241,981 microseconds
    1,000,000 iterations of CLCLE took     270,545 microseconds
    1,000,000 iterations of CLCLE took     286,604 microseconds

Overall, WELL DONE!

Make the above extremely minor changes and I will be very happy to accept your Pull request!

Thanks, James!   :)
 

@JamesWekel
Copy link
Contributor Author

Thank you for the review and the list of requested changes. Every interaction is a learning opportunity e.g. there is an option in VS Code for "Trim Trailing Whitespace" that I have now enabled.

The changes should be made early next week.

Jim

Copy link
Member

@Fish-Git Fish-Git left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Close enough!

@Fish-Git Fish-Git merged commit 9074326 into SDL-Hercules-390:develop Sep 25, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants