S2 amd64 assembly encoder #186

Merged: 74 commits merged on Jan 28, 2020

@klauspost (owner) commented Nov 26, 2019

Adds an amd64 assembly encoder for S2.

Stream compression (in memory)

file                           out  level  insize (bytes)  outsize (bytes)  millis  mb/s
sofia-air-quality-dataset.tar  s2   1      15464463872     4991794828       1343    10978.97
nyc-taxi-data-10M.csv          s2   1      3325605752      1096053615       301     10534.31
consensus.db.10gb              s2   1      10737418240     4552306611       850     12044.36
10gb.tar                       s2   1      10065157632     5945249284       1000    9596.74
rawstudio-mint14.tar           s2   1      8558382592      4443732919       778     10488.53
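The `mb/s` column can be reproduced from the other columns. As a hedged sanity check, the sketch below recomputes the compression ratio (`outsize / insize`) and the throughput for two rows. The results line up with the table if `mb/s` is read as MiB/s (2^20 bytes per second) and `millis` as wall-clock milliseconds; that reading is inferred from the numbers, not stated in the PR. The `streamResult` type is purely illustrative.

```go
package main

import "fmt"

// streamResult holds one row of the stream compression table.
// This type is illustrative only; its fields mirror the table columns.
type streamResult struct {
	file    string
	inSize  int64 // uncompressed size in bytes
	outSize int64 // compressed size in bytes
	millis  int64 // wall-clock time in milliseconds
}

// ratio returns the compressed size as a fraction of the input size.
func (r streamResult) ratio() float64 {
	return float64(r.outSize) / float64(r.inSize)
}

// mibPerSec returns throughput in MiB/s (2^20 bytes per second).
// Assumption: the table's "mb/s" column uses this unit; the small
// differences are consistent with "millis" being rounded.
func (r streamResult) mibPerSec() float64 {
	seconds := float64(r.millis) / 1000
	return float64(r.inSize) / (1 << 20) / seconds
}

func main() {
	rows := []streamResult{
		{"nyc-taxi-data-10M.csv", 3325605752, 1096053615, 301},
		{"consensus.db.10gb", 10737418240, 4552306611, 850},
	}
	for _, r := range rows {
		fmt.Printf("%-24s ratio %.3f  %.0f MiB/s\n", r.file, r.ratio(), r.mibPerSec())
	}
}
```

For `consensus.db.10gb` (exactly 10240 MiB) this gives a ratio of about 0.424 and roughly 12047 MiB/s, within rounding of the reported 12044.36.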

Blocks, S2 output (default and better modes).

BenchmarkEncodeS2Block/0-html/block-32                	  333254	      3518 ns/op	29110.56 MB/s
BenchmarkEncodeS2Block/0-html/block-better-32         	  123682	      9672 ns/op	10587.23 MB/s
BenchmarkEncodeS2Block/1-urls/block-32                	   26484	     44678 ns/op	15714.22 MB/s
BenchmarkEncodeS2Block/1-urls/block-better-32         	    9730	    124899 ns/op	5621.22 MB/s
BenchmarkEncodeS2Block/2-jpg/block-32                 	 3408324	       349 ns/op	352476.62 MB/s
BenchmarkEncodeS2Block/2-jpg/block-better-32          	 1254948	       944 ns/op	130439.88 MB/s
BenchmarkEncodeS2Block/3-jpg_200b/block-32            	152312077	         7.96 ns/op	25128.37 MB/s
BenchmarkEncodeS2Block/3-jpg_200b/block-better-32     	 3714338	       325 ns/op	 614.82 MB/s
BenchmarkEncodeS2Block/4-pdf/block-32                 	 1743794	       682 ns/op	150146.63 MB/s
BenchmarkEncodeS2Block/4-pdf/block-better-32          	  100816	     11568 ns/op	8851.84 MB/s
BenchmarkEncodeS2Block/5-html4/block-32               	  244842	      4771 ns/op	85843.15 MB/s
BenchmarkEncodeS2Block/5-html4/block-better-32        	   90204	     13229 ns/op	30963.35 MB/s
BenchmarkEncodeS2Block/6-txt1/block-32                	   80518	     14981 ns/op	10151.87 MB/s
BenchmarkEncodeS2Block/6-txt1/block-better-32         	   32779	     36831 ns/op	4129.42 MB/s
BenchmarkEncodeS2Block/7-txt2/block-32                	   93001	     12164 ns/op	10291.05 MB/s
BenchmarkEncodeS2Block/7-txt2/block-better-32         	   36465	     33163 ns/op	3774.71 MB/s
BenchmarkEncodeS2Block/8-txt3/block-32                	   28362	     42390 ns/op	10067.30 MB/s
BenchmarkEncodeS2Block/8-txt3/block-better-32         	   12338	     97039 ns/op	4397.76 MB/s
BenchmarkEncodeS2Block/9-txt4/block-32                	   22722	     51459 ns/op	9363.90 MB/s
BenchmarkEncodeS2Block/9-txt4/block-better-32         	    7497	    133417 ns/op	3611.70 MB/s
BenchmarkEncodeS2Block/10-pb/block-32                 	  413708	      2795 ns/op	42430.64 MB/s
BenchmarkEncodeS2Block/10-pb/block-better-32          	  148114	      8029 ns/op	14769.22 MB/s
BenchmarkEncodeS2Block/11-gaviota/block-32            	   93728	     12614 ns/op	14612.58 MB/s
BenchmarkEncodeS2Block/11-gaviota/block-better-32     	   41656	     28574 ns/op	6450.68 MB/s
BenchmarkEncodeS2Block/12-txt1_128b/block-32          	106272855	        11.1 ns/op	11554.70 MB/s
BenchmarkEncodeS2Block/12-txt1_128b/block-better-32   	 3999106	       298 ns/op	 430.06 MB/s
BenchmarkEncodeS2Block/13-txt1_1000b/block-32         	20685124	        53.6 ns/op	18664.71 MB/s
BenchmarkEncodeS2Block/13-txt1_1000b/block-better-32  	 2209448	       548 ns/op	1824.07 MB/s
BenchmarkEncodeS2Block/14-txt1_10000b/block-32        	 2221724	       540 ns/op	18525.65 MB/s
BenchmarkEncodeS2Block/14-txt1_10000b/block-better-32 	  413696	      2819 ns/op	3547.20 MB/s
BenchmarkEncodeS2Block/15-txt1_20000b/block-32        	  922834	      1152 ns/op	17358.94 MB/s
BenchmarkEncodeS2Block/15-txt1_20000b/block-better-32 	  214236	      5486 ns/op	3645.75 MB/s
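Go's `testing` package derives the `MB/s` column from the bytes registered with `b.SetBytes` and the measured time, in decimal megabytes (10^6 bytes). The sketch below redoes that arithmetic from `ns/op`, assuming the `_200b` and `_128b` suffixes in the benchmark names give the input size in bytes. It also expresses the cost of the better mode as a ratio of ns/op. `mbPerSec` and `slowdown` are hypothetical helpers, not part of the benchmark code.

```go
package main

import "fmt"

// mbPerSec converts bytes per operation and ns/op into decimal MB/s,
// the unit Go's testing package reports when a benchmark calls b.SetBytes.
func mbPerSec(bytesPerOp int64, nsPerOp float64) float64 {
	// Bytes per nanosecond equals GB/s; multiply by 1000 to get MB/s.
	return float64(bytesPerOp) / nsPerOp * 1e3
}

// slowdown reports how many times longer the "better" mode takes per op.
func slowdown(defaultNs, betterNs float64) float64 {
	return betterNs / defaultNs
}

func main() {
	// 3-jpg_200b: assumed to be a 200-byte input, based on its name.
	fmt.Printf("block:        %.0f MB/s\n", mbPerSec(200, 7.96))
	fmt.Printf("block-better: %.0f MB/s\n", mbPerSec(200, 325))
	// 6-txt1: ns/op for block and block-better, copied from the output.
	fmt.Printf("6-txt1 better/default: %.2fx\n", slowdown(14981, 36831))
}
```

On 6-txt1, block-better takes about 2.46 times as long per operation as block; on the tiny 3-jpg_200b input the gap is far larger (7.96 ns versus 325 ns).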

Blocks with Snappy-compatible output: S2's Snappy encoder (s2-snappy) compared to Snappy (snappy).

BenchmarkEncodeSnappyBlock/0-html/snappy-32            	  315715	      3827 ns/op	26756.61 MB/s
BenchmarkEncodeSnappyBlock/0-html/s2-snappy-32         	  324252	      3742 ns/op	27366.84 MB/s

BenchmarkEncodeSnappyBlock/1-urls/snappy-32            	   22939	     52542 ns/op	13362.30 MB/s
BenchmarkEncodeSnappyBlock/1-urls/s2-snappy-32         	   26839	     46473 ns/op	15107.53 MB/s

BenchmarkEncodeSnappyBlock/2-jpg/snappy-32             	 4284750	       273 ns/op	450687.89 MB/s
BenchmarkEncodeSnappyBlock/2-jpg/s2-snappy-32          	 3148904	       383 ns/op	321327.71 MB/s

BenchmarkEncodeSnappyBlock/3-jpg_200b/snappy-32        	108451161	        13.9 ns/op	14389.77 MB/s
BenchmarkEncodeSnappyBlock/3-jpg_200b/s2-snappy-32     	150685057	        12.7 ns/op	15775.01 MB/s

BenchmarkEncodeSnappyBlock/4-pdf/snappy-32             	 2672005	       453 ns/op	226076.15 MB/s
BenchmarkEncodeSnappyBlock/4-pdf/s2-snappy-32          	 1692146	       715 ns/op	143289.50 MB/s

BenchmarkEncodeSnappyBlock/5-html4/snappy-32           	   75454	     15987 ns/op	25621.10 MB/s
BenchmarkEncodeSnappyBlock/5-html4/s2-snappy-32        	  210477	      5665 ns/op	72308.77 MB/s

BenchmarkEncodeSnappyBlock/6-txt1/snappy-32            	   77401	     15287 ns/op	9948.61 MB/s
BenchmarkEncodeSnappyBlock/6-txt1/s2-snappy-32         	   79981	     14957 ns/op	10168.49 MB/s

BenchmarkEncodeSnappyBlock/7-txt2/snappy-32            	   86936	     13565 ns/op	9228.27 MB/s
BenchmarkEncodeSnappyBlock/7-txt2/s2-snappy-32         	   99151	     12307 ns/op	10171.18 MB/s

BenchmarkEncodeSnappyBlock/8-txt3/snappy-32            	   27579	     42179 ns/op	10117.64 MB/s
BenchmarkEncodeSnappyBlock/8-txt3/s2-snappy-32         	   24483	     45143 ns/op	9453.29 MB/s

BenchmarkEncodeSnappyBlock/9-txt4/snappy-32            	   20937	     56993 ns/op	8454.70 MB/s
BenchmarkEncodeSnappyBlock/9-txt4/s2-snappy-32         	   23250	     51667 ns/op	9326.19 MB/s

BenchmarkEncodeSnappyBlock/10-pb/snappy-32             	  342784	      3481 ns/op	34066.20 MB/s
BenchmarkEncodeSnappyBlock/10-pb/s2-snappy-32          	  387001	      2970 ns/op	39933.34 MB/s

BenchmarkEncodeSnappyBlock/11-gaviota/snappy-32        	  106170	     11343 ns/op	16249.90 MB/s
BenchmarkEncodeSnappyBlock/11-gaviota/s2-snappy-32     	   90888	     12623 ns/op	14602.20 MB/s

BenchmarkEncodeSnappyBlock/12-txt1_128b/snappy-32      	100000000	        11.6 ns/op	11041.53 MB/s
BenchmarkEncodeSnappyBlock/12-txt1_128b/s2-snappy-32   	206217410	         5.87 ns/op	21791.84 MB/s

BenchmarkEncodeSnappyBlock/13-txt1_1000b/snappy-32     	20334468	        61.0 ns/op	16381.88 MB/s
BenchmarkEncodeSnappyBlock/13-txt1_1000b/s2-snappy-32  	21047941	        53.2 ns/op	18788.60 MB/s

BenchmarkEncodeSnappyBlock/14-txt1_10000b/snappy-32    	 1753992	       677 ns/op	14760.93 MB/s
BenchmarkEncodeSnappyBlock/14-txt1_10000b/s2-snappy-32 	 2217615	       542 ns/op	18460.60 MB/s

BenchmarkEncodeSnappyBlock/15-txt1_20000b/snappy-32    	  749877	      1642 ns/op	12180.49 MB/s
BenchmarkEncodeSnappyBlock/15-txt1_20000b/s2-snappy-32 	  922885	      1162 ns/op	17214.15 MB/s
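Dividing Snappy's ns/op by s2-snappy's ns/op turns each pair into a speedup factor, where values above 1 mean the S2 encoder was faster. Going through the pairs above, s2-snappy is faster on most inputs (about 2.8x on 5-html4) and slower on 2-jpg, 4-pdf, 8-txt3 and 11-gaviota. The `pair` type below is only an illustrative container for numbers copied from the output.

```go
package main

import "fmt"

// pair holds ns/op for the Snappy encoder and for S2's Snappy-compatible
// encoder on the same input, copied from the benchmark output.
type pair struct {
	name       string
	snappyNs   float64
	s2SnappyNs float64
}

// speedup > 1 means s2-snappy was faster than snappy on that input.
func (p pair) speedup() float64 {
	return p.snappyNs / p.s2SnappyNs
}

func main() {
	pairs := []pair{
		{"5-html4", 15987, 5665},
		{"2-jpg", 273, 383},
		{"15-txt1_20000b", 1642, 1162},
	}
	for _, p := range pairs {
		fmt.Printf("%-16s %.2fx\n", p.name, p.speedup())
	}
}
```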

@mmcloughlin mentioned this pull request on Dec 17, 2019
@klauspost (owner, author) commented:

The parts are complete, but mmcloughlin/avo#100 needs to be fixed before everything can be enabled.

@klauspost marked this pull request as ready for review on January 23, 2020, 08:25
mmcloughlin added a commit to mmcloughlin/avo that referenced this pull request Jan 28, 2020
Adds a regression test based on klauspost/compress#186. This necessitated some related changes:

* Mark "RET" as a terminal instruction
* printer refactor to maintain compatibility with asmfmt
* Tweaks to other regression tests to ensure they are run correctly in CI

Updates #100 #65 #8
@klauspost merged commit a8f0b11 into master on Jan 28, 2020
@klauspost deleted the s2-assembler-encode branch on January 28, 2020, 22:41