NextLibrary - Z80 Assembly Utilities Library for Spectrum and Next

A high-performance, utility library for Z80 assembly development on the ZX Spectrum and ZX Spectrum Next platforms. The choice is yours, you can use device independent routines or limit yourself to platform specific routines for a single target architecture.

NextLibrary provides world-class mathematical operations, random number generation, screen management, DMA operations, and utility functions optimized for retro game development and system programming.

It offers hardware independent routines that work on both Spectrum and Spectrum Next hardware. It also provides equivalent and optimised Next only versions making use of Z80N Next only extended op codes and DMA for best performance.

T-State tables in this document also allow for easy performance and requirement assessment by the developer.

Release History

v1.10 - Plus 3 Hardware Double Buffering

Key improvements:

Plus 3 Double Buffering: Complete hardware double buffering implementation for Spectrum +3 and Next
Hardware Bank Switching: True hardware double buffering using +3 memory banking (99.93% faster than software copying)
Smart Buffer Management: Intelligent off-screen buffer setup with automatic bank switching at $C000
Frame-Synchronized Swapping: HALT-synchronized buffer swaps to eliminate screen tearing
State Synchronization: Robust tracking of hardware banking state with automatic correction
Optimized Performance: Buffer swaps in ~75-95 T-states vs 145,152 T-states for memory copying
Complete Double Buffering Suite: Software copying (all platforms) + hardware switching (+3/Next)
Developer-Friendly API: Simple setup, set, and toggle operations with consistent addressing

Plus 3 double buffering performance achievements:

Setup: ~200 T-states + screen clearing (one-time initialization)
Buffer Set: ~35-55 T-states (smart banking, only when needed)
Buffer Toggle: ~75-95 T-states (instant display swap with HALT synchronization)
Performance vs Software: 99.93% faster than traditional screen copying methods
Frame Rate: 1 frame per vsync due to use of HALT, so we get 50FPS (PAL 50Hz) or 60FPS (NTSC 60Hz)

Architecture completion:

Universal Double Buffering: Software methods (48K/128K/+2/+3/Next) + hardware methods (+3/Next)
Comprehensive Graphics Pipeline: Traditional screens + Layer 2 + hardware double buffering
Platform Optimization: Optimal methods for every Spectrum platform and capability
Complete Screen Operations: Clearing, copying, and double buffering with maximum performance

v1.9 - Layer 2 Screen Copying and Advanced Graphics Pipeline

Key improvements:

Layer 2 Screen Copying: Complete Layer 2 copying suite with 8 performance levels from LDIRX to DMA burst acceleration
Multi-Resolution Layer 2 Support: Full support for all Layer 2 modes (256×192, 320×256, 640×256) with optimized copying routines
Layer 2 Manual Copying: Direct address and resolution specification for maximum control
- LDIRX methods: 78,663 to 262,224 T-states depending on resolution
- DMA methods: 260 to 600 T-states for hardware-accelerated copying
Layer 2 Auto-Detection Copying: Intelligent Layer 2 configuration detection with optimal method selection
- Auto LDIRX: Automatic resolution detection with Z80N optimization
- Auto DMA: Automatic detection with DMA burst for maximum performance
Advanced Graphics Pipeline: Foundation for sophisticated Layer 2 graphics operations and double-buffering
Unified Copy API: Single interface supporting traditional screens and all Layer 2 modes seamlessly
Performance Optimization: Layer 2 256×192 mode 46% faster than traditional screen copying with LDIRX

Layer 2 copying performance achievements:

Layer 2 256×192 LDIRX: 44.4 FPS (46% faster than traditional screen)
Layer 2 320×256 LDIRX: 26.7 FPS (enhanced resolution with good performance)
Layer 2 640×256 LDIRX: 13.3 FPS (maximum resolution with acceptable performance)
Layer 2 DMA Methods: 5,800+ to 13,500+ FPS (99.5%+ faster than traditional methods)
Cross-Resolution Support: Same API works across all Layer 2 modes with automatic optimization

Notable pending features:

Hardware Double Buffer Swapping: Layer 2 bank switching for instant buffer swaps (SCREEN_COPY_LAYER2_DOUBLE_BUFFER_BANK)
Plus 3 Double Buffering: Spectrum Plus 3 specific double buffering optimization (SCREEN_COPY_DOUBLE_BUFFER_PLUS_3)

Architecture enhancements:

Layer 2 Memory Management: Intelligent handling of large Layer 2 buffers with multi-bank DMA operations
Resolution-Aware Operations: Automatic memory size calculation based on Layer 2 mode detection
Hardware Acceleration: DMA burst modes optimized for different Layer 2 resolutions
Performance Scaling: Excellent performance across all Next CPU speeds (3.5MHz to 28MHz)

v1.8 - Advanced Screen Copying and DMA Memory Operations

Key improvements:

Advanced Screen Copying: Complete screen copying suite with 9 performance levels from basic LDIR to DMA acceleration
Multi-Platform Screen Copy: Unified API supporting all Spectrum platforms (48K/128K/+2/+3/Next)
Software Optimization Mastery: Progressive stack-based optimizations (1PUSH through ALLPUSH) achieving up to 27.9 FPS on standard Z80
Z80N Screen Copy Acceleration: LDIRX optimization providing 31.6 FPS theoretical maximum (31% faster than LDIR)
DMA Screen Copy Revolution: DMA-based copying achieving 12,500+ FPS theoretical maximum (99.8% faster than LDIR)
Flexible Screen Addressing: Support for custom source and destination addresses enabling advanced software double-buffering
DMA Memory Copy Utilities: General-purpose memory copying routines with standard and burst modes
Frame Rate Analysis: Complete performance analysis across all Next CPU speeds (3.5MHz to 28MHz)
Optimal Algorithm Selection: Intelligent performance level selection based on available hardware capabilities

Screen copying performance achievements:

Standard Z80 Peak: ALLPUSH method achieving 27.9 FPS (14% faster than LDIR)
Z80N Enhancement: LDIRX achieving 31.6 FPS (24% faster than LDIR)
DMA Acceleration: DMA_BURST achieving 12,500+ FPS (99.8% faster than LDIR)

v1.7 - Layer 2 Graphics Support and Enhanced Modularity

Key improvements:

62 Test Cases: Expanded test suite from 59 to 62 comprehensive test cases including Layer 2 validation
Layer 2 Graphics Support: Complete Layer 2 utility functions for Next hardware graphics programming
Layer 2 Screen Clearing: Ultra-fast Layer 2 clearing using DMA with support for all resolutions (256×192, 320×256, 640×256)
Layer 2 Detection: Hardware detection for active Layer 2 with resolution and address retrieval
Enhanced Modularity: Split constants and variables into separate domain-specific files for easier partial library adoption
Improved Developer Experience: Cleaner file organization allowing developers to include only needed components
Layer 2 Information Retrieval: Complete Layer 2 configuration detection including resolution, color depth, and memory requirements
Extended Graphics Pipeline: Foundation for advanced Layer 2 graphics operations and double-buffering

Layer 2 performance improvements:

Layer 2 DMA Clearing: Up to 99.8% faster Layer 2 screen clearing using DMA burst mode
Resolution Detection: Fast Layer 2 configuration retrieval (83-157 T-states)
Memory Efficient: Optimized Layer 2 operations with minimal CPU overhead
Hardware Adaptive: Automatic fallback to standard operations if Layer 2 unavailable

File structure improvements:

Modular Constants: Separate files for Maths, Random, Display, and DMA constants
Organized Variables: Domain-specific variable files for cleaner partial library usage
Developer Friendly: Extract only needed components without dependency overhead

v1.6 - Enhanced Screen Management and DMA Support

Key improvements:

59 Test Cases: Expanded test suite from 57 to 59 comprehensive test cases including DMA validation
Flexible Screen Addressing: All screen clearing routines now accept HL parameter for custom screen locations
In-Memory Screen Support: Full support for off-screen rendering and secondary screen buffers
DMA Screen Clearing: Ultra-fast screen clearing using Spectrum Next DMA controller (99% faster)
Z80N Enhanced Screen Operations: LDIRX optimization for 33% faster screen clearing on Next hardware
Hardware Detection: Automatic detection of Z80N and DMA capabilities with graceful fallbacks
Memory Fill Operations: Complete DMA memory fill routines with burst mode support
Utility Functions: Enhanced utility library with Z80N detection and DMA availability checking
Performance Optimization: Comprehensive T-States analysis and optimization across all screen operations

Screen clearing performance improvements:

Standard Z80: Up to 74% faster with stack-based optimizations
Z80N LDIRX: Additional 33% improvement using extended opcodes
DMA Fill: 99.7% faster pixel clearing, 99.2% faster attribute setting
DMA Burst: 99.8% faster with maximum hardware acceleration

v1.5 - Enhanced Test Framework and Maintainability

Key improvements:

Streamlined Test Framework: Replaced repetitive test execution code with elegant table-driven loop system
Simplified Test Management: Adding new tests now requires only 3 simple steps: create TestCase0nn routine, add to table, update counter
Reduced Code Complexity: Test execution code reduced from ~240 lines to ~80 lines (70% reduction)
Improved Maintainability: Single point of control for test count and execution flow
Enhanced Reliability: Proper stack management and flag preservation throughout test execution
Developer Experience: Much easier to add, modify, or debug individual test cases
Code Quality: Eliminated copy-paste errors and inconsistencies in test execution flow

v1.4 - Enhanced 16-bit Random Operations with Z80N Support

Key improvements:

57 Test Cases: Expanded test suite from 53 to 57 comprehensive test cases
16-bit Random Algorithms: Complete suite of 16-bit random number generators (LFSR, XOR Shift, Middle Square) with both standard Z80 and Z80N optimized versions
Z80N 16-bit Random Performance: 33-38% faster 16-bit random generation using MUL instruction
Hardware-Accelerated 16-bit Random: Single-cycle multiplication for enhanced Middle Square, optimized bit operations for LFSR/XOR Shift
Seed Compatibility: Z80N 16-bit random versions maintain identical output sequences to standard algorithms
Complete T-state Analysis: Accurate performance benchmarks for all random number generation routines

v1.3 - Enhanced Random 8-bit Operations with Z80N Support

Key improvements:

53 Test Cases: Expanded test suite from 43 to 53 comprehensive test cases
8-bit Random Algorithms: Four standard Z80 algorithms (LCG, LFSR, XOR Shift, Middle Square) plus four Z80N optimized versions
Z80N 8-bit Random Performance: 20-47% faster random generation using MUL instruction
Hardware-Accelerated Random: Single-cycle multiplication for LCG, optimized bit operations for LFSR/XorShift/MiddleSquare
Seed Compatibility: Z80N random versions maintain identical output sequences to standard algorithms

v1.2 - Enhanced Division Operations with Z80N Support

Key improvements:

50 Test Cases: Expanded test suite from 43 to 50 comprehensive test cases
Enhanced 8÷8 Division: Three Z80N options (COMPACT hybrid, BALANCED 8-bit reciprocal, MAXIMUM 16-bit reciprocal)
Enhanced 16÷8 Division: Three Z80N options with hybrid algorithms and high-precision reciprocal methods
Accuracy Validation: All algorithms pass comprehensive test validation ensuring mathematical correctness
Performance Optimization: Up to 95% faster division on Spectrum Next hardware
Algorithm Selection: Intelligent hybrid approaches combining traditional and reciprocal methods for optimal speed/precision balance

Three primary Z80N division approaches are provided:

Hybrid routines: Combination of MUL DE and traditional subtraction for optimal convergence
8-bit Reciprocal methods: Pre-computed reciprocal tables using Z80N MUL for maximum speed with validated precision
16-bit Reciprocal methods: Pre-computed 16-bit reciprocal tables providing the highest precision over 8-bit reciprocals

Target Platforms

The following platforms are targetted. The main entry points and individual functionality are tagged with @COMPAT: 48K,128K,+2,+3,NEXT - where the list shown is the known compatability of the routine. I also detail below the main differences of the platforms which will give rise to the compatability of the routines. This means NEXT only routines will be tagged with just @COMPAT: NEXT, and if Z80N op codes are present, then @Z80N: MUL DE, ... will be documented as to which extended op codes are being used.

ZX SPECTRUM 48K:

CPU: Z80 @ 3.5MHz
Memory: 48KB RAM
Features: Basic ULA, beeper sound
Limitations: No extra RAM banks, no AY sound

ZX SPECTRUM 128K:

CPU: Z80 @ 3.5MHz
Memory: 128KB RAM (banked)
Features: AY-3-8912 sound chip, extra RAM banks
New: Memory paging, enhanced sound

ZX SPECTRUM +2:

CPU: Z80 @ 3.5MHz
Memory: 128KB RAM (banked)
Features: Built-in tape deck, AY sound
Differences: Different ROM, tape interface

ZX SPECTRUM +3:

CPU: Z80 @ 3.5MHz
Memory: 128KB RAM (banked)
Features: Built-in disk drive, +2A/+3 ROM
New: Disk interface, different memory map

ZX SPECTRUM NEXT:

CPU: Z80N @ 3.5/7/14/28MHz
Memory: 1MB+ RAM, advanced banking
Features: Enhanced graphics, sprites, DMA
New: Z80N extended opcodes, copper, DMA controller

🚀 Current Features

📊 Mathematical Operations - 12 optimized algorithms

8×8 Unsigned Multiplication: Six performance levels (10-160 T-states)
- Standard Z80: COMPACT, BALANCED, MAXIMUM (35-160 T-states)
- Next Z80N: NEXT_COMPACT, NEXT_BALANCED, NEXT_MAXIMUM (10-29 T-states)
16×8 Unsigned Multiplication: Six performance levels (45-380 T-states)
- Standard Z80: COMPACT, BALANCED, MAXIMUM (45-380 T-states)
- Next Z80N: NEXT_COMPACT, NEXT_BALANCED, NEXT_MAXIMUM (97 T-states)
8÷8 Unsigned Division: Six performance levels (25-1975 T-states)
- Standard Z80: COMPACT, BALANCED, MAXIMUM (25-1975 T-states)
- Next Z80N: NEXT_COMPACT (40-400 T-states), NEXT_BALANCED (~175 T-states), NEXT_MAXIMUM (~218 T-states)
16÷8 Unsigned Division: High-precision 16-bit division with Z80N support
- Standard Z80: COMPACT, BALANCED, MAXIMUM (45-1300 T-states)
- Next Z80N: NEXT_COMPACT/BALANCED/MAXIMUM (107-520 T-states)

🎲 Random Number Generation - 16 algorithms with Z80N acceleration

8-bit Random: Eight algorithms (Standard Z80 + Z80N optimized versions)
- Standard Z80: LCG (45-55 T-states), LFSR (85-95 T-states), XorShift (35-45 T-states), Middle Square (115-150 T-states)
- Next Z80N: 20-47% faster with hardware MUL instruction
16-bit Random: Six algorithms (Standard Z80 + Z80N optimized versions)
- Standard Z80: LCG (85-95 T-states), LFSR (68 T-states), XorShift (55 T-states), Middle Square (78 T-states)
- Next Z80N: 30-38% faster with hardware acceleration

🧹 Screen Clearing and Memory Fill - 17 performance levels

Traditional ZX Spectrum Screen: 9 performance levels from LDIR to DMA
- Cross-platform compatibility (48K, 128K, +2, +3, Next)
- Performance range: 149,504 T-states (LDIR) to 235 T-states (DMA)
- Frame rates: 23.4 FPS (LDIR) to 14,800+ FPS (DMA) at 3.5MHz
Layer 2 Screen Clearing: Next-only with manual and automatic modes
- Support for all resolutions: 256×192, 320×256, 640×256
- Manual address specification or automatic detection
- DMA acceleration for maximum performance

🖥️ Screen Copying and Memory Transfer - 17 performance levels

Unified Screen Copying: Complete screen copying suite with 17 performance levels from LDIR to DMA burst acceleration
- Cross-platform compatibility: Options available for all Spectrum variants (48K, 128K, +2, +3, Next)
- Flexible addressing: Support for custom source and destination screen buffers
- Automatic optimization: Hardware detection with optimal performance level selection
Traditional Screen Copying: 9 performance levels for standard ZX Spectrum screens
- SCREEN_COPY_COMPACT: Standard LDIR operation (145,152 T-states full copy)
- SCREEN_COPY_1PUSH to SCREEN_COPY_ALLPUSH: Stack optimizations (173,278 to 124,908 T-states)
- SCREEN_COPY_Z80N_COMPACT: Z80N LDIRX optimization (110,612 T-states, 24% faster)
- SCREEN_COPY_DMA_FILL: DMA memory transfer (300 T-states, 99.8% faster)
- SCREEN_COPY_DMA_BURST: DMA burst mode (270 T-states, 99.8% faster)
Layer 2 Screen Copying: 8 Next-only performance levels for enhanced graphics
- Manual Layer 2 Methods: Direct address and resolution specification
  - LDIRX methods: 78,663 to 262,224 T-states depending on resolution
  - DMA methods: 260 to 600 T-states for hardware-accelerated copying
- Auto Layer 2 Detection: Intelligent configuration detection with optimal method selection
  - Auto LDIRX: Automatic resolution detection with Z80N optimization
  - Auto DMA: Automatic detection with DMA burst for maximum performance
Frame Rate Capabilities: Revolutionary performance across Next CPU speeds
- Traditional Screen: 24.1 FPS (LDIR) to 12,500+ FPS (DMA) at 3.5MHz
- Layer 2 256×192: 44.4 FPS (LDIRX) to 13,500+ FPS (DMA) at 3.5MHz
- Layer 2 320×256: 26.7 FPS (LDIRX) to 10,000+ FPS (DMA) at 3.5MHz
- Layer 2 640×256: 13.3 FPS (LDIRX) to 5,800+ FPS (DMA) at 3.5MHz
- 28MHz scaling: Up to 100,000+ FPS (traditional) / 108,000+ FPS (Layer 2) maximum
Advanced Features: Double-buffering, off-screen composition, memory-to-memory transfers, multi-resolution support

🎨 Layer 2 Graphics - Complete Next graphics support

Layer 2 Detection: Hardware detection and configuration retrieval
Resolution Support: All Layer 2 modes (256×192, 320×256, 640×256)
Memory Management: Automatic address calculation and bank management

⚡ DMA Support - 4 hardware-accelerated operations

Memory Operations: Fill, copy, and burst modes
Hardware Detection: Automatic fallback if DMA unavailable
Performance: ~235-300 T-states CPU overhead + parallel hardware transfer

⌨️ Input and Keyboard Utilities - Cross-platform input handling

Keyboard Scanning: Comprehensive keyboard input detection across all Spectrum variants
Player Interaction: Wait for player input with timeout and validation options
Cross-Platform Input: Unified input handling for 48K, 128K, +2, +3, and Next
Performance Optimized: Fast input scanning with minimal CPU overhead

📝 Text and Font System - Embedded font with rendering utilities

Text Rendering: Advanced text display utilities for all screen modes
Embedded Font: Built-in font system for consistent text across platforms
String Utilities: Text manipulation and display positioning functions
Cross-Platform Text: Unified text rendering across all Spectrum variants

🏆 Scoring and Data Management - Score conversion and display

Score Conversion: 16-bit score to display string conversion utilities
Display Integration: Seamless integration with text rendering system
Performance Optimized: Fast score display for real-time games
Format Control: Flexible score formatting and padding options

🔧 Utility Functions - Hardware detection and memory management

Hardware Detection: Z80N processor and DMA controller detection
Memory Operations: Efficient memory management utilities

Screen Clearing T-States Performance

Traditional ZX Spectrum Screen (32x24 character, 256x192 pixel)

Performance Level	Full Clear	Pixel Clear	Attr Clear	Platform	Frame Rate (3.5MHz)
SCREEN_COMPACT	149,504 T	132,608 T	16,896 T	All	23.4 FPS
SCREEN_1PUSH	177,152 T	157,696 T	19,456 T	All	19.7 FPS
SCREEN_2PUSH	152,832 T	135,936 T	16,896 T	All	22.9 FPS
SCREEN_4PUSH	140,672 T	125,056 T	15,616 T	All	24.9 FPS
SCREEN_8PUSH	134,592 T	119,616 T	15,056 T	All	26.0 FPS
SCREEN_ALLPUSH	128,512 T	114,176 T	14,336 T	All	27.1 FPS
SCREEN_Z80N_COMPACT	114,176 T	101,376 T	12,800 T	Next	30.6 FPS
SCREEN_DMA_FILL	240 T	240 T	240 T	Next	14,500+ FPS
SCREEN_DMA_BURST	235 T	235 T	235 T	Next	14,800+ FPS

Spectrum Next Layer 2 Screen Clearing (Next Only)

Performance Level	256x192 Clear	320x256 Clear	640x256 Clear	Frame Rate (3.5MHz)
SCREEN_LAYER2_MANUAL_256by192	~205,000 T	N/A	N/A	17.1 FPS
SCREEN_LAYER2_MANUAL_320by256	N/A	~350,000 T	N/A	10.0 FPS
SCREEN_LAYER2_MANUAL_640by256	N/A	N/A	~700,000 T	5.0 FPS
SCREEN_LAYER2_MANUAL_DMA_256by192	~280 T	N/A	N/A	12,500+ FPS
SCREEN_LAYER2_MANUAL_DMA_320by256	N/A	~320 T	N/A	10,900+ FPS
SCREEN_LAYER2_MANUAL_DMA_640by256	N/A	N/A	~400 T	8,700+ FPS
SCREEN_LAYER2_AUTO_ACTIVE	~205,000 T	~350,000 T	~700,000 T	Variable
SCREEN_LAYER2_AUTO_DMA	~280 T	~320 T	~400 T	8,700+ FPS

Layer 2 Memory Requirements

Layer 2 Mode	Resolution	Memory Size	Bytes to Clear	DMA Operations
256x192	256×192×8bit	48KB	49,152 bytes	Single operation
320x256	320×256×8bit	80KB	81,920 bytes	Two 40KB operations
640x256	640×256×8bit	160KB	163,840 bytes	Four 40KB operations

Frame Rate Capabilities Across Next CPU Speeds

Traditional Screen Clearing (Fastest: DMA_BURST)

3.5MHz: 14,800+ FPS maximum (235 T-states per clear)
7MHz: 29,600+ FPS maximum (117 T-states effective)
14MHz: 59,200+ FPS maximum (58 T-states effective)
28MHz: 118,400+ FPS maximum (29 T-states effective)

Layer 2 Screen Clearing (Fastest: LAYER2_AUTO_DMA)

3.5MHz: 8,700+ FPS maximum (400 T-states for 640x256)
7MHz: 17,400+ FPS maximum (200 T-states effective)
14MHz: 34,800+ FPS maximum (100 T-states effective)
28MHz: 69,600+ FPS maximum (50 T-states effective)

🖥️ Screen Copying and Memory Transfer

Unified Screen Copying: Complete screen copying suite with 9 performance levels
- Cross-platform compatibility: Works on all Spectrum variants (48K, 128K, +2, +3, Next)
- Flexible addressing: Support for custom source and destination screen buffers
- Automatic optimization: Hardware detection with optimal performance level selection
Performance Progression: From basic LDIR to ultimate DMA acceleration
- SCREEN_COPY_COMPACT: Standard LDIR operation (145,152 T-states full copy)
- SCREEN_COPY_1PUSH to SCREEN_COPY_ALLPUSH: Stack optimizations (173,278 to 124,908 T-states)
- SCREEN_COPY_Z80N_COMPACT: Z80N LDIRX optimization (110,612 T-states, 24% faster)
- SCREEN_COPY_DMA_FILL: DMA memory transfer (300 T-states, 99.8% faster)
- SCREEN_COPY_DMA_BURST: DMA burst mode (270 T-states, 99.8% faster)
Frame Rate Capabilities: Revolutionary performance across Next CPU speeds
- 3.5MHz: Up to 27.9 FPS (ALLPUSH) / 31.6 FPS (Z80N) / 12,500+ FPS (DMA)
- 28MHz: Up to 224 FPS (ALLPUSH) / 253 FPS (Z80N) / 100,000+ FPS (DMA)
Advanced Features: Double-buffering, off-screen composition, memory-to-memory transfers

Screen Copying T-States Performance

Performance Level	Full Copy	Pixel Copy	Attr Copy	Platform	Frame Rate (3.5MHz)
SCREEN_COPY_COMPACT	145,152 T	129,024 T	16,128 T	All	24.1 FPS
SCREEN_COPY_1PUSH	173,278 T	154,036 T	19,342 T	All	20.2 FPS
SCREEN_COPY_2PUSH	149,086 T	132,532 T	16,654 T	All	23.5 FPS
SCREEN_COPY_4PUSH	136,990 T	121,780 T	15,310 T	All	25.5 FPS
SCREEN_COPY_8PUSH	130,942 T	116,404 T	14,638 T	All	26.7 FPS
SCREEN_COPY_ALLPUSH	124,908 T	111,042 T	13,980 T	All	27.9 FPS
SCREEN_COPY_Z80N_COMPACT	110,612 T	98,324 T	12,308 T	Next	31.6 FPS
SCREEN_COPY_DMA_FILL	300 T	300 T	300 T	Next	12,500+ FPS
SCREEN_COPY_DMA_BURST	270 T	270 T	270 T	Next	12,500+ FPS

Screen Copying Performance Compared with Layer 2

Method	Traditional Screen	Layer 2 256×192	Layer 2 320×256	Layer 2 640×256
LDIR	145,152 T (24.1 FPS)	N/A	N/A	N/A
ALLPUSH	124,908 T (27.9 FPS)	N/A	N/A	N/A
Z80N LDIRX	110,612 T (31.6 FPS)	78,663 T (44.4 FPS)	131,112 T (26.7 FPS)	262,224 T (13.3 FPS)
DMA	270 T (12,500+ FPS)	260 T (13,500+ FPS)	350 T (10,000+ FPS)	600 T (5,800+ FPS)

Plus 3 Hardware Double Buffering

Plus 3 Double Buffering Support (Hardware Bank Switching)

Operation	T-States	Platform	Description
PLUS3_SETUP_DOUBLE_BUFFER	~200 T + clearing	+3, Next	One-time initialization with screen clearing
PLUS3_SET_OFFSCREEN_BUFFER	~35-55 T	+3, Next	Smart banking (only when needed)
PLUS3_DOUBLE_BUFFER_TOGGLE	~75-95 T	+3, Next	Instant buffer swap with HALT sync

Performance Comparison: Software vs Hardware Double Buffering

Method	T-States	Frame Rate (3.5MHz)	Performance vs Software
Software Copy (LDIR)	145,152 T	24.1 FPS	Baseline
Software Copy (ALLPUSH)	124,908 T	27.9 FPS	14% faster
Software Copy (DMA)	270 T	12,500+ FPS	99.8% faster
Hardware Toggle (+3)	~95 T	Unlimited*	99.93% faster
Hardware Toggle (+3)	~95 T	50 FPS	VSync limited

*Software copying can exceed 50 FPS but results in screen tearing without VSync synchronization

Plus 3 Banking Memory Layout

Memory Range	Visible Screen	Off-Screen Buffer	Screen Display
$4000-$7FFF	Screen Bank 5	N/A	ULA displays Bank 5
$C000-$FFFF	N/A	Screen Bank 7	Ready for drawing
After Toggle	Screen Bank 7	Screen Bank 5	ULA displays Bank 7
$C000-$FFFF	N/A	Screen Bank 5	Ready for drawing

Plus 3 Double Buffering Advantages

Instant Swaps: Buffer switching in ~95 T-states regardless of screen content
No Memory Copying: Hardware bank switching eliminates data transfer overhead
Tearing Elimination: HALT synchronization ensures clean frame boundaries at 50 FPS
Consistent Performance: Buffer swap time independent of screen complexity
Memory Efficient: No duplicate screen data, uses existing +3 banking
Always $C000: Off-screen buffer always accessible at same address for drawing
Perfect 50 FPS: Synchronized with display refresh for smooth animation

Layer 2 Utility T-States

Function	Scenario	T-States	Description
CheckForActiveLayer2	Next Not Found	109	Z80N detection fails
CheckForActiveLayer2	Layer 2 Inactive	151	Z80N found, Layer 2 disabled
CheckForActiveLayer2	Layer 2 Active	157	Z80N found, Layer 2 enabled
GetActiveLayer2Addr	No Layer 2	87	Returns HL = 0
GetActiveLayer2Addr	Active Layer 2	83	Returns Layer 2 base address
GetLayer2FullInfo	Variable	200-300+	Complete configuration retrieval

Layer 2 Constants

; Layer 2 Memory Requirements
LAYER2_BYTES_256x192         EQU     $C000   ; 49,152 bytes (48KB) - 256×192 mode
LAYER2_BYTES_320x256         EQU     $14000  ; 81,920 bytes (80KB) - 320×256 mode  
LAYER2_BYTES_640x256         EQU     $28000  ; 163,840 bytes (160KB) - 640×256 mode
LAYER2_BYTES_320x256_HALF    EQU     $A000   ; 40,960 bytes (40KB) - Half for 16-bit DMA

; Layer 2 Resolutions
LAYER2_RESOLUTION_256x192    EQU     0       ; Standard Layer 2 resolution
LAYER2_RESOLUTION_320x256    EQU     1       ; Enhanced Layer 2 resolution
LAYER2_RESOLUTION_640x256    EQU     2       ; Maximum Layer 2 resolution

; Layer 2 Performance Levels
SCREEN_LAYER2_DMA_FILL       EQU     9       ; Layer 2 DMA fill mode
SCREEN_LAYER2_DMA_BURST      EQU     10      ; Layer 2 DMA burst mode

### Random Generation T-States

| Algorithm | 8-bit Standard | 8-bit Z80N | 16-bit Standard | 16-bit Z80N | Improvement |
|-----------|----------------|------------|-----------------|-------------|-------------|
| **LCG** | 45-55 | 25-35 | 85-95 | 55-65 | 30-35% faster |
| **LFSR** | 85-95 | 45-55 | 68 | 42 | 38-47% faster |
| **XorShift** | 35-45 | 20-30 | 55 | 35 | 36-43% faster |
| **Middle Square** | 115-150 | 65-75 | 78 | 48 | 33-43% faster |

#### Random Algorithm Constants

```asm
; 8-bit Standard Z80 Random Algorithms
PERFORMANCE_STANDARD_RANDOM_LCG       EQU 0    ; Linear Congruential Generator (45-55 T-states)
PERFORMANCE_STANDARD_RANDOM_LFSR      EQU 1    ; Linear Feedback Shift Register (85-95 T-states)
PERFORMANCE_STANDARD_RANDOM_XORSHIFT  EQU 2    ; XorShift Algorithm (35-45 T-states)
PERFORMANCE_STANDARD_RANDOM_MIDDLESQUARE EQU 3 ; Middle Square Method (115-150 T-states)

; 8-bit Next Z80N Random Algorithms  
PERFORMANCE_Z80N_RANDOM_LCG           EQU 4    ; Z80N optimized LCG (25-35 T-states)
PERFORMANCE_Z80N_RANDOM_LFSR          EQU 5    ; Z80N optimized LFSR (45-55 T-states)
PERFORMANCE_Z80N_RANDOM_XORSHIFT      EQU 6    ; Z80N optimized XorShift (20-30 T-states)
PERFORMANCE_Z80N_RANDOM_MIDDLESQUARE  EQU 7    ; Z80N optimized Middle Square (65-75 T-states)

; 16-bit Standard Z80 Random Algorithms
PERFORMANCE_STANDARD_RANDOM16_LCG      EQU 0   ; Linear Congruential Generator (85-95 T-states)
PERFORMANCE_STANDARD_RANDOM16_LFSR     EQU 1   ; Linear Feedback Shift Register (68 T-states)
PERFORMANCE_STANDARD_RANDOM16_XORSHIFT EQU 2   ; XorShift Algorithm (55 T-states)
PERFORMANCE_STANDARD_RANDOM16_MIDDLESQUARE EQU 3 ; Middle Square Method (78 T-states)

; 16-bit Next Z80N Random Algorithms
PERFORMANCE_Z80N_RANDOM16_LCG          EQU 4   ; Z80N optimized 16-bit LCG (55-65 T-states)
PERFORMANCE_Z80N_RANDOM16_LFSR         EQU 5   ; Z80N optimized 16-bit LFSR (42 T-states)
PERFORMANCE_Z80N_RANDOM16_XORSHIFT     EQU 6   ; Z80N optimized 16-bit XorShift (35 T-states)
PERFORMANCE_Z80N_RANDOM16_MIDDLESQUARE EQU 7   ; Z80N optimized 16-bit Middle Square (48 T-states)

Utility Functions T-States

Function	Available Path	Not Available Path	Description
CheckOnZ80N	81 T-states	60 T-states	Z80N processor detection
CheckDMAAvailable	66 T-states	58 T-states	DMA controller detection

⚡ DMA Support (Spectrum Next)

DMA Memory Fill: High-speed memory filling using DMA controller
- CPU overhead: ~240-260 T-states for setup
- Hardware transfer: Parallel to CPU execution
- Automatic fallback to standard routines if DMA unavailable
DMA Burst Fill: Maximum performance burst mode operations
- CPU overhead: ~235-250 T-states for setup
- Fastest possible memory operations on Next hardware
- Optimized wait loops with comprehensive status checking
DMA Memory Copy: General-purpose memory copying using DMA controller
- CPU overhead: ~280-300 T-states for setup
- Hardware transfer: Parallel to CPU execution (effectively instantaneous)
- Support for any source/destination addresses and transfer sizes
DMA Memory Copy Burst: Ultra-fast burst mode memory copying
- CPU overhead: ~260-280 T-states for setup
- Maximum DMA performance with burst transfer modes
- Ideal for screen copying and large memory operations

DMA Operations T-States

Operation	CPU Overhead	Hardware Time	Use Case
DMA_MemoryFill	~240-260 T	Parallel	Memory clearing, pattern fills
DMA_BurstFill	~235-250 T	Parallel	Ultra-fast memory clearing
DMA_MemoryCopy	~280-300 T	Parallel	Memory copying, screen copying
DMA_MemoryCopy_Burst	~260-280 T	Parallel	Ultra-fast memory copying

🧮 Random Number Generation Algorithms

NextLibrary provides comprehensive 8-bit and 16-bit random number generation algorithms optimized for different scenarios:

8-bit Standard Z80 Methods

LCG (Linear Congruential Generator): Fast with good uniformity (45-55 T-states)
- Best for: High-speed applications requiring uniform distribution
- Uses: Multiply-add formula (a*seed + c) mod m
- Quality: Good distribution, very fast, widely used in games
LFSR (Linear Feedback Shift Register): High-quality randomness (85-95 T-states)
- Best for: Cryptographic applications, high-quality sequences
- Uses: Bit shifting with XOR feedback polynomial
- Quality: Excellent distribution, maximum period (255 values)
XorShift: Fast with good quality (35-45 T-states)
- Best for: Games requiring fast, good-quality random numbers
- Uses: XOR and bit shifting operations
- Quality: Very good distribution, fast execution
Middle Square: Classic algorithm with moderate speed (115-150 T-states)
- Best for: Educational purposes, moderate quality needs
- Uses: Squares the seed and extracts middle digits
- Quality: Fair distribution, requires careful seed management

8-bit Next Z80N Methods (Spectrum Next Only)

Z80N_LCG: Hardware-accelerated LCG (25-35 T-states)
- Best for: Ultra-fast uniform random generation
- Uses: Z80N MUL for single-cycle multiplication in LCG formula
- Quality: Same as standard LCG, 35% faster execution
Z80N_LFSR: Z80N optimized LFSR (45-55 T-states)
- Best for: High-quality randomness with hardware acceleration
- Uses: Z80N MUL for efficient bit extraction and manipulation
- Quality: Same as standard LFSR, 47% faster execution
Z80N_XorShift: Ultra-fast XorShift (20-30 T-states)
- Best for: Fastest possible random generation with good quality
- Uses: Z80N MUL for optimized bit operations
- Quality: Same as standard XorShift, 43% faster execution
Z80N_MiddleSquare: Hardware-accelerated Middle Square (65-75 T-states)
- Best for: Classic algorithm with modern performance
- Uses: Z80N MUL for single-cycle squaring operation
- Quality: Same as standard Middle Square, 43% faster execution

16-bit Standard Z80 Methods

LCG (Linear Congruential Generator): Fast 16-bit uniform generation (85-95 T-states)
- Best for: High-speed 16-bit applications requiring uniform distribution
- Uses: Extended 16-bit multiply-add formula
- Quality: Good distribution, fastest 16-bit method, widely used
LFSR (Linear Feedback Shift Register): High-quality 16-bit randomness (68 T-states)
- Best for: High-quality sequences with extended precision
- Uses: 16-bit polynomial feedback with bit manipulation
- Quality: Excellent distribution, maximum period (65535 values)
XorShift: Fast 16-bit generation (55 T-states)
- Best for: Fast 16-bit random numbers with good quality
- Uses: Extended XOR and bit shifting operations
- Quality: Very good distribution, optimal speed/quality balance
Middle Square: 16-bit classic algorithm (78 T-states)
- Best for: Extended precision middle square method
- Uses: 16-bit squaring with middle extraction
- Quality: Good distribution with careful seed management

16-bit Next Z80N Methods (Spectrum Next Only)

Z80N_LCG: Hardware-accelerated 16-bit LCG (55-65 T-states)
- Best for: Ultra-fast 16-bit uniform random generation
- Uses: Z80N MUL for efficient 16-bit LCG calculations
- Quality: Same as standard 16-bit LCG, 30% faster execution
Z80N_LFSR: Z80N optimized 16-bit LFSR (42 T-states)
- Best for: High-quality 16-bit randomness with hardware acceleration
- Uses: Z80N MUL for efficient 16-bit polynomial operations
- Quality: Same as standard 16-bit LFSR, 38% faster execution
Z80N_XorShift: Ultra-fast 16-bit XorShift (35 T-states)
- Best for: Fastest possible 16-bit random generation
- Uses: Z80N MUL for optimized 16-bit bit operations
- Quality: Same as standard 16-bit XorShift, 36% faster execution
Z80N_MiddleSquare: Hardware-accelerated 16-bit Middle Square (48 T-states)
- Best for: 16-bit classic algorithm with modern performance
- Uses: Z80N MUL for single-cycle 16-bit squaring operation
- Quality: Same as standard 16-bit Middle Square, 38% faster execution

Z80N Performance Benefits

The Z80N optimized versions provide significant performance improvements:

Hardware Multiplication: Single-cycle MUL instruction vs multi-cycle addition loops
Efficient Bit Operations: MUL-based bit extraction vs traditional shifting
Maintained Quality: Identical mathematical properties and output sequences
Same Seed Compatibility: Drop-in replacements for standard versions

Performance Summary: Z80N random generators are 30-47% faster while maintaining identical output quality and seed compatibility with their standard Z80 counterparts.

📝 TODO List

🖥️ Display & Graphics

Extended screen manipulation functions, e.g. line draw, fill, patterned fill
Software sprite routines and Next hardware sprite routines

🎮 Input Systems

Joystick input support with multiple controller options
Enhanced text input utilities and keyboard handling

🏆 Scoring & Data

Extended scoring system supporting up to 12-digit scores (beyond 65535)
Leaderboard and score table management utilities

🔊 Audio Support

Beeper sound utilities (tags likely to be @COMPAT: 48K,128K,+2,+3,NEXT)
AY sound routines (tags likely to be @COMPAT: 128K,+2,+3,NEXT, @REQUIRES: AY-3-8912)

🏦 Memory Banking

Memory bank switching and loading (tags likely to be @COMPAT: 128K,+2,+3,NEXT, @REQUIRES: Memory banking)

💾 Loading and Saving

Tape routines (tags likely to be @COMPAT: 48K,128K,+2,+3,NEXT)
Microdrive routines (tags likely to be @COMPAT: 48K,128K,+2,+3) - likely requires Next to be in a required compatibility mode with microdrive interface attached and active - not sure yet
Disk routines (tags likely to be @COMPAT: +3, @REQUIRES: +3 disk interface) - potentially no Next compatibility as very hardware based

🚀 Advanced Next Features

Tagged @COMPAT NEXT as they will be specific to the Next, this list is not exhaustive

Sprites (tags likely to be @COMPAT: NEXT, @REQUIRES: Next sprites)
Copper (tags likely to be @COMPAT: NEXT, @REQUIRES: Next copper)
Enhanced DMA operations (pattern fills, memory copies, etc.)
More features... (one thing added at a time)

⚡ Optimization

Complete T-state optimization pass (e.g., replace JR with JP where beneficial)
Memory usage optimization analysis

🎯 Performance Levels

NextLibrary uses a unified performance system across all mathematical operations:

Performance Level	Characteristics	Use Case
PERFORMANCE_COMPACT	Variable timing, minimal code size	Memory-constrained applications
PERFORMANCE_BALANCED	Fixed timing, predictable performance	Real-time applications, games
PERFORMANCE_MAXIMUM	Optimized for speed, larger code size	Performance-critical operations
PERFORMANCE_NEXT_COMPACT	Z80N MUL instruction, fastest code	Next-only, maximum speed
PERFORMANCE_NEXT_BALANCED	Z80N MUL with overflow checking	Next-only, speed + validation
PERFORMANCE_NEXT_MAXIMUM	Z80N MUL with special case handling	Next-only, optimized edge cases

📋 Quick Start

Basic Usage Example

; Include the NextLibrary
INCLUDE "NextLibrary.asm"

; 8×8 Multiplication Example (Standard Z80)
LD      A, 25           ; Multiplicand
LD      B, 12           ; Multiplier  
LD      C, PERFORMANCE_BALANCED
CALL    Multiply8x8_Unified
; Result in HL = 300

; 8×8 Multiplication Example (Next Z80N - Ultra Fast!)
LD      A, 25           ; Multiplicand
LD      B, 12           ; Multiplier
LD      C, PERFORMANCE_NEXT_COMPACT   ; Uses Z80N MUL instruction
CALL    Multiply8x8_Unified
; Result in HL = 300 (85% faster!)

; 8-bit Random Number Generation
LD      A, 15           ; Upper limit (inclusive)
LD      B, 123          ; Seed value
LD      C, PERFORMANCE_Z80N_RANDOM_LFSR ; Z80N LFSR algorithm
CALL    Random8_Unified_Seed
; First random value in A

LD      A, 7            ; New upper limit
LD      C, PERFORMANCE_Z80N_RANDOM_LFSR ; Same algorithm  
CALL    Random8_Unified_Next
; Next random value in A

; 16-bit Random Number Generation
LD      HL, 1000        ; Upper limit (inclusive)
LD      BC, 9876        ; Seed value
LD      D, PERFORMANCE_Z80N_RANDOM16_MIDDLESQUARE ; Z80N 16-bit Middle Square
CALL    Random16_Unified_Seed
; First random value in HL

LD      HL, 5000        ; New upper limit
LD      D, PERFORMANCE_Z80N_RANDOM16_MIDDLESQUARE ; Same algorithm
CALL    Random16_Unified_Next  
; Next random value in HL

Enhanced Screen Management Examples

; Clear entire screen with white on black attributes (standard screen)
LD      A, %00000111    ; White ink, black paper
LD      C, SCREEN_4PUSH ; High performance mode
LD      HL, 0           ; Use default screen address (16384)
CALL    Screen_FullReset_Unified

; Clear off-screen buffer with different attributes
LD      A, %01000010    ; Green ink, black paper, bright
LD      C, SCREEN_DMA_FILL ; Ultra-fast DMA clearing (Next only)
LD      HL, $8000       ; Custom screen buffer address
CALL    Screen_FullReset_Unified

; Clear pixels only in secondary screen buffer
LD      A, 0            ; Not used for pixel clearing
LD      C, SCREEN_Z80N_COMPACT ; Z80N LDIRX optimization
LD      HL, $C000       ; Another screen buffer
CALL    Screen_ClearPixel_Unified

; Set attributes only in main screen, preserve pixels
LD      A, %01000010    ; Green ink, black paper, bright
LD      C, SCREEN_DMA_BURST ; Maximum DMA performance (Next only)
LD      HL, 0           ; Use default screen address
CALL    Screen_ClearAttr_Unified

; High-speed double buffering example
; Clear back buffer at maximum speed while displaying front buffer
LD      A, %00000111    ; White on black
LD      C, SCREEN_DMA_BURST ; Fastest possible
LD      HL, BackBuffer  ; Address of back buffer
CALL    Screen_FullReset_Unified
; ... render to back buffer ...
; ... swap buffers when ready ...

; Hardware detection example
CALL    CheckOnZ80N     ; Check if Z80N available
JR      Z, UseStandard  ; Use standard routines if not Z80N

CALL    CheckDMAAvailable ; Check if DMA available
JR      Z, Z80NOnly     ; Use Z80N only if no DMA

; Use DMA for maximum performance
LD      C, SCREEN_DMA_BURST
JR      ClearScreen

Z80NOnly:
LD      C, SCREEN_Z80N_COMPACT ; Use Z80N optimizations
JR      ClearScreen

UseStandard:
LD      C, SCREEN_ALLPUSH

ClearScreen:
LD      A, %00000111    ; Screen attribute
LD      HL, 0           ; Default screen
CALL    Screen_FullReset_Unified

Layer 2 Screen Clearing Examples

; Basic Layer 2 clear with manual address (256x192 mode)
LD      HL, $4000               ; Layer 2 bank address
LD      A, $E3                  ; Bright yellow color (palette index)
LD      C, SCREEN_LAYER2_MANUAL_256by192
CALL    Screen_FullReset_Unified

; High-resolution Layer 2 clear (320x256 mode)
LD      HL, $6000               ; Layer 2 bank address for 320x256
LD      A, $1F                  ; Bright blue color
LD      C, SCREEN_LAYER2_MANUAL_320by256
CALL    Screen_FullReset_Unified

; Maximum resolution Layer 2 clear (640x256 mode)
LD      HL, $8000               ; Layer 2 bank address for 640x256
LD      A, $07                  ; White color
LD      C, SCREEN_LAYER2_MANUAL_640x256
CALL    Screen_FullReset_Unified

; Ultra-fast DMA Layer 2 clear (auto-detection)
LD      HL, 0                   ; Use current active Layer 2
LD      A, $C0                  ; Red color
LD      C, SCREEN_LAYER2_AUTO_DMA
CALL    Screen_FullReset_Unified

; Manual DMA Layer 2 clear for maximum control
LD      HL, $4000               ; Specific Layer 2 address
LD      A, $38                  ; Orange color
LD      C, SCREEN_LAYER2_MANUAL_DMA_256by192
CALL    Screen_FullReset_Unified

; Layer 2 double buffering setup
; Clear back buffer
LD      HL, $8000               ; Back buffer Layer 2 address
LD      A, $00                  ; Black background
LD      C, SCREEN_LAYER2_MANUAL_DMA_256by192
CALL    Screen_FullReset_Unified

; ... render graphics to back buffer ...

; Switch Layer 2 to display back buffer
LD      BC, LAYER2_REGISTER_SELECT_PORT
LD      A, LAYER2_ADDRESS_REGISTER
OUT     (C), A
LD      BC, LAYER2_REGISTER_DATA_PORT
LD      A, $20                  ; Point to back buffer bank
OUT     (C), A

; Automatic Layer 2 mode detection and clearing
LD      HL, 0                   ; Use active Layer 2
LD      A, $F0                  ; Bright magenta
LD      C, SCREEN_LAYER2_AUTO_ACTIVE
CALL    Screen_FullReset_Unified

; Performance comparison - clear traditional screen vs Layer 2
; Traditional screen (fastest software method)
LD      HL, SCREEN_PIXEL_BASE   ; Traditional screen
LD      A, $07                  ; White on black
LD      C, SCREEN_ALLPUSH       ; 27.1 FPS
CALL    Screen_FullReset_Unified

; Layer 2 screen (DMA method)
LD      HL, 0                   ; Active Layer 2
LD      A, $FF                  ; White
LD      C, SCREEN_LAYER2_AUTO_DMA ; 8,700+ FPS
CALL    Screen_FullReset_Unified

; Multi-resolution Layer 2 game setup
; Check Layer 2 capabilities and set optimal mode
CALL    Layer2_GetActiveMode    ; Returns mode in A
CP      2                       ; Check if 640x256 mode
JR      Z, Setup640x256
CP      1                       ; Check if 320x256 mode  
JR      Z, Setup320x256
; Default to 256x192
LD      C, SCREEN_LAYER2_AUTO_DMA
JR      SetupComplete

Setup640x256:
LD      C, SCREEN_LAYER2_MANUAL_DMA_640by256
JR      SetupComplete

Setup320x256:
LD      C, SCREEN_LAYER2_MANUAL_DMA_320by256

SetupComplete:
LD      HL, 0                   ; Use active Layer 2
LD      A, $00                  ; Black background
CALL    Screen_FullReset_Unified

Screen Copying Examples

; Basic screen copy from buffer to display
LD      HL, BackBuffer          ; Source address
LD      DE, 0                   ; Destination (0 = default screen)
LD      BC, 6912                ; Full screen size
LD      C, SCREEN_COPY_COMPACT  ; Standard LDIR method
CALL    Screen_FullCopy_Unified

; High-performance copy using stack optimization
LD      HL, OffScreenBuffer     ; Source address
LD      DE, SCREEN_PIXEL_BASE   ; Destination address
LD      BC, 6144                ; Pixel area only
LD      C, SCREEN_COPY_ALLPUSH  ; Fastest software method (27.9 FPS)
CALL    Screen_PixelCopy_Unified

; Next-only Z80N optimized copy
LD      HL, BackBuffer          ; Source address
LD      DE, 0                   ; Destination (default screen)
LD      BC, 6912                ; Full screen
LD      C, SCREEN_COPY_Z80N_COMPACT ; Z80N LDIRX (31.6 FPS)
CALL    Screen_FullCopy_Unified

; Ultra-fast DMA copy (Next only)
LD      HL, BackBuffer          ; Source address
LD      DE, SCREEN_PIXEL_BASE   ; Destination address
LD      BC, 6912                ; Full screen
LD      C, SCREEN_COPY_DMA_BURST ; DMA burst mode (12,500+ FPS)
CALL    Screen_FullCopy_Unified

; Attribute-only copy for color changes
LD      HL, AttributeBuffer     ; Source attributes
LD      DE, SCREEN_ATTR_BASE    ; Destination attributes
LD      BC, 768                 ; Attribute area size
LD      C, SCREEN_COPY_4PUSH    ; Fast stack method
CALL    Screen_AttrCopy_Unified

; Automatic hardware detection and optimal performance
CALL    CheckOnZ80N             ; Check for Z80N
JR      Z, StandardHardware     ; Jump if standard Z80

CALL    CheckDMAAvailable       ; Check for DMA
JR      Z, Z80NOnly            ; Jump if no DMA

; Use DMA for maximum performance
LD      C, SCREEN_COPY_DMA_BURST
JR      CopyScreen

Z80NOnly:
LD      C, SCREEN_COPY_Z80N_COMPACT
JR      CopyScreen

StandardHardware:
LD      C, SCREEN_COPY_ALLPUSH

CopyScreen:
LD      HL, BackBuffer          ; Source
LD      DE, 0                   ; Destination
CALL    Screen_FullCopy_Unified

; Double buffering with optimal performance
; Render to back buffer
LD      HL, 0                   ; Clear back buffer first
LD      A, $07                  ; White on black
LD      C, SCREEN_DMA_BURST     ; Ultra-fast clearing
CALL    Screen_FullReset_Unified

; ... render graphics to back buffer ...

; Copy back buffer to display
LD      HL, BackBuffer          ; Source
LD      DE, 0                   ; Destination (screen)
LD      C, SCREEN_COPY_DMA_BURST ; Ultra-fast copy
CALL    Screen_FullCopy_Unified

; High-performance Layer 2 auto detect active layer 2 address providing just the back buffer address to copy from.
LD      HL, BackBuffer          ; Source: off-screen Layer 2 buffer
LD      DE, ActiveLayer2        ; Destination: active Layer 2 display
LD      C, SCREEN_COPY_LAYER2_AUTO_DMA ; Auto-detect mode, use DMA
CALL    Screen_FullCopy_Unified

; Manual Layer 2 copying for specific resolutions
LD      HL, Layer2Buffer        ; Source buffer
LD      DE, $4000               ; Layer 2 display address
LD      C, SCREEN_COPY_LAYER2_MANUAL_DMA_256by192 ; Fastest for 256×192
CALL    Screen_FullCopy_Unified

Plus 3 Hardware Double Buffering Examples

; Initialize Plus 3 double buffering (call once at start)
    CALL    PLUS3_SETUP_DOUBLE_BUFFER       ; Bank 5 visible, Bank 7 at $C000
; Now ready for hardware double buffering

GameLoop:
    ; Ensure off-screen buffer is ready for drawing
    CALL    PLUS3_SET_OFFSCREEN_BUFFER   ; HL = $C000, A = off-screen bank number
    ; Smart routine - only performs banking if needed (~35-55 T-states)
    
    ; Clear off-screen buffer (always at $C000)
    LD      A, %00000111                 ; White on black attributes
    LD      C, SCREEN_DMA_BURST          ; Use fastest clearing method
    ; HL already contains $C000 from SET_OFFSCREEN_BUFFER
    CALL    Screen_FullReset_Unified
    
    ; Draw your frame to $C000 (off-screen buffer)
    ; ... all rendering code draws to $C000 ...
    ; ... sprites, backgrounds, text, etc. ...
    
    ; Instant buffer swap (~95 T-states total)
    CALL    PLUS3_DOUBLE_BUFFER_TOGGLE   ; Hardware switch + HALT sync
    ; Previous off-screen buffer now visible, previous visible now off-screen
    
    ; Check for game exit condition
    CALL    ScanKeyboard
    BIT     0, A                         ; Check for SPACE
    JR      Z, GameLoop                  ; Continue if not pressed
    RET                                  ; Exit game

; Advanced Plus 3 double buffering with performance optimization
OptimizedGameLoop:
    ; Get off-screen buffer (smart banking)
    CALL    PLUS3_SET_OFFSCREEN_BUFFER   ; HL = $C000, A = bank number
    
    ; Ultra-fast screen clearing using DMA
    LD      A, $00                       ; Black screen
    LD      C, SCREEN_DMA_BURST          ; Maximum performance
    CALL    Screen_FullReset_Unified     ; Clear in ~235 T-states
    
    ; High-performance rendering to off-screen buffer
    ; All drawing operations target $C000
    
    ; Draw sprites using fast multiplication
    LD      A, SpriteX                   ; Sprite X position
    LD      B, 32                        ; Screen width in characters
    LD      C, PERFORMANCE_NEXT_COMPACT  ; Z80N MUL for speed
    CALL    Multiply8x8_Unified          ; HL = screen offset
    LD      BC, $C000                    ; Off-screen buffer base
    ADD     HL, BC                       ; HL = sprite screen address
    ; ... draw sprite at HL ...
    
    ; Generate random enemy positions using Z80N
    LD      A, 255                       ; Screen width range
    LD      C, PERFORMANCE_Z80N_RANDOM_XORSHIFT ; Fastest random
    CALL    Random8_Unified_Next         ; A = random X position
    ; ... use A for enemy placement ...
    
    ; Hardware buffer swap (instant display update)
    CALL    PLUS3_DOUBLE_BUFFER_TOGGLE   ; ~95 T-states
    
    ; Game logic during next frame preparation
    CALL    UpdatePlayerPosition
    CALL    MoveEnemies
    CALL    CheckCollisions
    
    JR      OptimizedGameLoop

; Multi-platform double buffering selection
; Automatically choose best method based on available hardware
InitializeDoubleBuffering:
    ; Check if we're on +3 or Next with banking support
    LD      A, $FF                       ; Test value
    LD      ($C000), A                   ; Try to write to potential RAM
    LD      A, ($C000)                   ; Read back
    CP      $FF                          ; Check if write worked
    JR      NZ, UseSoftwareBuffering     ; Use software if no banking
    
    ; Hardware double buffering available
    CALL    PLUS3_SETUP_DOUBLE_BUFFER    ; Initialize hardware method
    LD      A, 1                         ; Flag for hardware buffering
    LD      (BufferingMode), A
    RET

UseSoftwareBuffering:
    ; Set up software double buffering
    LD      HL, BackBuffer               ; Allocate back buffer
    LD      A, 0                         ; Flag for software buffering
    LD      (BufferingMode), A
    RET

; Unified buffer swap routine (hardware or software)
SwapBuffers:
    LD      A, (BufferingMode)
    OR      A                            ; Check buffering mode
    JR      Z, SoftwareSwap              ; Jump if software mode
    
    ; Hardware buffer swap
    CALL    PLUS3_DOUBLE_BUFFER_TOGGLE   ; ~95 T-states
    RET

SoftwareSwap:
    ; Software buffer copy
    LD      HL, BackBuffer               ; Source: back buffer
    LD      DE, 0                        ; Destination: screen
    LD      C, SCREEN_COPY_DMA_BURST     ; Use fastest available copy
    CALL    Screen_FullCopy_Unified      ; Copy buffer to screen
    RET

; Performance comparison demonstration
PerformanceTest:
    ; Test software copying speed
    LD      B, 100                       ; 100 iterations
    CALL    StartTimer
SoftwareLoop:
    LD      HL, TestBuffer
    LD      DE, 0
    LD      C, SCREEN_COPY_ALLPUSH
    CALL    Screen_FullCopy_Unified
    DJNZ    SoftwareLoop
    CALL    StopTimer                    ; Record software time
    
    ; Test hardware switching speed  
    CALL    PLUS3_SETUP_DOUBLE_BUFFER
    LD      B, 100                       ; 100 iterations
    CALL    StartTimer
HardwareLoop:
    CALL    PLUS3_DOUBLE_BUFFER_TOGGLE
    DJNZ    HardwareLoop
    CALL    StopTimer                    ; Record hardware time
    
    ; Hardware method will be ~1500× faster!
    RET

; Variables
BufferingMode:          DB      0       ; 0=software, 1=hardware
BackBuffer:             DS      6912    ; Software back buffer
TestBuffer:             DS      6912    ; Test data buffer

Plus 3 Double Buffering Best Practices

; 1. Always initialize double buffering before use
    CALL    PLUS3_SETUP_DOUBLE_BUFFER       ; One-time setup

; 2. Always call SET_OFFSCREEN_BUFFER before drawing
MainLoop:
    CALL    PLUS3_SET_OFFSCREEN_BUFFER   ; Ensures correct bank at $C000
    ; Now safe to draw to $C000
    
; 3. Use HALT synchronization for smooth animation
    CALL    PLUS3_DOUBLE_BUFFER_TOGGLE   ; Includes HALT for sync
    JR      MainLoop

; 4. Combine with DMA for maximum performance
    LD      A, ClearValue
    LD      C, SCREEN_DMA_BURST          ; Ultra-fast clearing
    ; HL already set to $C000 by SET_OFFSCREEN_BUFFER
    CALL    Screen_FullReset_Unified

; 5. Leverage always-$C000 addressing for consistent code
DrawSprite:
    ; Off-screen buffer is always at $C000 after SET_OFFSCREEN_BUFFER
    LD      HL, $C000                    ; Base address
    ; ... add sprite offset ...
    ; ... draw sprite data ...
    RET

; 6. Error handling for unsupported hardware
InitGame:
    CALL    PLUS3_SETUP_DOUBLE_BUFFER
    ; Check if setup succeeded by testing banking
    LD      A, $AA
    LD      ($C000), A
    LD      A, ($C000)
    CP      $AA
    JR      Z, BufferingOK
    
    ; Fall back to software rendering
    JP      InitSoftwareMode
    
BufferingOK:
    ; Continue with hardware double buffering
    JP      InitHardwareMode

Layer 2 Graphics Examples

; Detect if Layer 2 is available and active
CALL    CheckForActiveLayer2    ; Check Layer 2 availability
JR      Z, NoLayer2            ; Jump if not available

; Get Layer 2 configuration
CALL    GetLayer2FullInfo      ; Get complete Layer 2 info
; Layer2Resolution now contains: 0=256×192, 1=320×256, 2=640×256
; Layer2Width/Height contain pixel dimensions
; Layer2Bpp contains color depth (4 or 8 bits per pixel)

; Clear Layer 2 screen with DMA acceleration
LD      A, $FF                 ; Fill color (white in 8bpp mode)
LD      C, SCREEN_LAYER2_DMA_BURST ; Maximum performance
CALL    GetActiveLayer2Addr    ; Get Layer 2 address in HL
CALL    Screen_Layer2Clear_Unified ; Clear Layer 2 screen

; Double buffering example
CALL    GetActiveLayer2Addr    ; Get current display buffer
LD      (DisplayBuffer), HL    ; Store display buffer
LD      HL, BackBuffer         ; Set back buffer address
; ... render to back buffer ...
; ... swap buffers when ready ...

NoLayer2:
; Fall back to standard ULA screen operations
LD      HL, 0                  ; Use standard screen
LD      A, $07                 ; White on black
LD      C, SCREEN_DMA_BURST    ; Use DMA if available
CALL    Screen_FullReset_Unified

DMA Memory Operations

; Fill large memory area using DMA
LD      HL, $8000       ; Destination address
LD      A, $FF          ; Fill pattern
LD      BC, 16384       ; Size (16KB)
CALL    DMA_MemoryFill  ; ~240 T-states CPU + hardware time

; Ultra-fast burst fill
LD      HL, $C000       ; Destination
LD      A, $00          ; Clear pattern
LD      BC, 8192        ; Size (8KB)
LD      D, $FF          ; Burst mode flag
CALL    DMA_BurstFill   ; ~235 T-states CPU + hardware time

; Copy large memory blocks using DMA
LD      HL, SourceData          ; Source address
LD      DE, DestBuffer          ; Destination address
LD      BC, 16384               ; Size (16KB)
CALL    DMA_MemoryCopy          ; ~300 T-states CPU + hardware time

; Ultra-fast burst copy
LD      HL, LargeBuffer         ; Source address
LD      DE, TargetLocation      ; Destination address
LD      BC, 32768               ; Size (32KB)
CALL    DMA_MemoryCopy_Burst    ; ~270 T-states CPU + hardware time

; Screen copying with DMA utilities
LD      HL, OffScreenBuffer     ; Source screen
LD      DE, SCREEN_PIXEL_BASE   ; Destination screen
LD      BC, 6912                ; Full screen size
CALL    DMA_MemoryCopy_Burst    ; Fastest possible screen copy

; Memory preparation for double buffering
LD      HL, Buffer1             ; Source buffer
LD      DE, Buffer2             ; Destination buffer  
LD      BC, 6912                ; Screen size
CALL    DMA_MemoryCopy          ; Prepare buffer swap

Hardware Detection Utilities

; Check what hardware is available and select optimal routines
CALL    CheckOnZ80N     ; Returns NZ if Z80N available
JR      Z, StandardZ80  ; Jump if not Z80N

; Z80N detected - check for DMA
CALL    CheckDMAAvailable ; Returns NZ if DMA available
JR      Z, Z80NOnly     ; Jump if no DMA

; Full Next hardware available
LD      C, SCREEN_DMA_BURST ; Use maximum performance
JR      PerformOperation

Z80NOnly:
LD      C, SCREEN_Z80N_COMPACT ; Use Z80N optimizations
JR      PerformOperation

StandardZ80:
LD      C, SCREEN_ALLPUSH

PerformOperation:
LD      A, %00000111    ; Screen attribute
LD      HL, 0           ; Default screen
CALL    Screen_FullReset_Unified

⚡ Z80N Performance Comparison

8×8 Multiplication Performance

Performance Level	T-States	Platform	Description
PERFORMANCE_COMPACT	35-75	All	Variable timing, compact code
PERFORMANCE_BALANCED	~160	All	Fixed timing, predictable
PERFORMANCE_MAXIMUM	~120	All	Optimized for speed
PERFORMANCE_NEXT_COMPACT	~14	Next	Z80N MUL instruction
PERFORMANCE_NEXT_BALANCED	~29	Next	Z80N MUL + overflow check
PERFORMANCE_NEXT_MAXIMUM	~20	Next	Z80N MUL + special cases

Performance Improvement: Up to 85% faster on Spectrum Next!

16×8 Multiplication Performance

Performance Level	T-States	Platform	Description
PERFORMANCE_COMPACT	45-380	All	Variable timing, compact code
PERFORMANCE_BALANCED	~180	All	Fixed timing, predictable
PERFORMANCE_MAXIMUM	~140	All	Optimized for speed
PERFORMANCE_NEXT_COMPACT	~97	Next	Z80N MUL instruction
PERFORMANCE_NEXT_BALANCED	~97	Next	Z80N MUL + same algorithm
PERFORMANCE_NEXT_MAXIMUM	~97	Next	Z80N MUL + same algorithm

Performance Improvement: Up to 75% faster on Spectrum Next for balanced/maximum modes!

8÷8 Division Performance

Performance Level	T-States	Platform	Description
PERFORMANCE_COMPACT	25-1950	All	Variable timing, compact subtraction
PERFORMANCE_BALANCED	30-1975	All	Similar to compact, different registers
PERFORMANCE_MAXIMUM	40-1000	All	Optimized with 2× acceleration
PERFORMANCE_NEXT_COMPACT	40-400	Next	Z80N MUL hybrid method
PERFORMANCE_NEXT_BALANCED	~175	Next	Z80N MUL 8-bit reciprocal table
PERFORMANCE_NEXT_MAXIMUM	~218	Next	Z80N MUL 16-bit reciprocal table

Performance Improvement: Up to 90% faster on Spectrum Next!
✅ Accuracy Note: NEXT_COMPACT and NEXT_MAXIMUM provide exact mathematical results. NEXT_BALANCED uses 8-bit reciprocal for speed with minor accuracy trade-offs only for edge cases. NEXT_MAXIMUM uses 16-bit reciprocal for maximum precision. All algorithms pass comprehensive test validation.

16÷8 Division Performance

Performance Level	T-States	Platform	Description
PERFORMANCE_COMPACT	45-1300	All	Variable subtraction, worst case 65535÷1
PERFORMANCE_BALANCED	220-280	All	Fixed binary long division, consistent timing
PERFORMANCE_MAXIMUM	180-420	All	Optimized binary division with early exits
PERFORMANCE_NEXT_COMPACT	118-500	Next	Z80N hybrid: 8×8 for H=0, traditional for larger
PERFORMANCE_NEXT_BALANCED	118-500	Next	Uses 8-bit reciprocal table, some precision tradeoff
PERFORMANCE_NEXT_MAXIMUM	107-520	Next	Use 16-bit reciprocal table, high precision

Algorithm Selection (NEXT_COMPACT/BALANCED):

H=0 (dividend ≤255): Uses Z80N 8×8 hybrid division
H=1-15 (256-4095): Uses traditional balanced division
H≥16 (4096+): Uses traditional maximum division

Algorithm Selection (NEXT_MAXIMUM):

H=0 and L<B: Direct return with quotient=0, remainder=L
H=0 and L≥B: Uses Z80N 8×8 16-bit reciprocal division for maximum precision
H≠0: Uses traditional maximum division algorithm

Performance Improvement: Up to 65% faster for small dividends on Spectrum Next!

Screen Clearing Performance

Performance Level	Full Reset T-States	Platform	Improvement
SCREEN_COMPACT	145,265	All	Baseline
SCREEN_Z80N_COMPACT	96,700	Next	33% faster
SCREEN_ALLPUSH	40,344	All	72% faster
SCREEN_DMA_FILL	400	Next	99.7% faster
SCREEN_DMA_BURST	260	Next	99.8% faster

DMA Performance Notes:

T-States shown are CPU overhead only
Actual memory transfer happens in hardware parallel to CPU
DMA provides dramatic speed improvements for large memory operations
Automatic fallback to standard routines if DMA unavailable

16-bit Operations

; 16×8 Multiplication (Standard Z80)
LD      HL, 1000        ; 16-bit multiplicand
LD      B, 50           ; 8-bit multiplier
LD      C, PERFORMANCE_MAXIMUM
CALL    Multiply16x8_Unified
; Result in DE:HL = 50000

; 16×8 Multiplication (Next Z80N - Ultra Fast!)
LD      HL, 1000        ; 16-bit multiplicand  
LD      B, 50           ; 8-bit multiplier
LD      C, PERFORMANCE_NEXT_COMPACT   ; Uses Z80N MUL instruction
CALL    Multiply16x8_Unified
; Result in DE:HL = 50000 (75% faster!)

; 16÷8 Division (Standard Z80)
LD      HL, 1234        ; 16-bit dividend
LD      B, 10           ; 8-bit divisor
LD      C, PERFORMANCE_BALANCED
CALL    Divide16x8_Unified
; Quotient in HL = 123, remainder in A = 4

; 16÷8 Division (Next Z80N - Hybrid Method)
LD      HL, 5000        ; 16-bit dividend
LD      B, 25           ; 8-bit divisor
LD      C, PERFORMANCE_NEXT_COMPACT   ; Uses Z80N hybrid algorithm
CALL    Divide16x8_Unified
; Quotient in HL = 200, remainder in A = 0 (65% faster for H≥16!)

🔧 API Reference

Mathematical Operations

Multiplication

Multiply8x8_Unified - 8×8 bit unsigned multiplication
- Standard Z80: COMPACT/BALANCED/MAXIMUM (35-160 T-states)
- Next Z80N: NEXT_COMPACT/NEXT_BALANCED/NEXT_MAXIMUM (10-29 T-states)
Multiply16x8_Unified - 16×8 bit unsigned multiplication
- Standard Z80: COMPACT/BALANCED/MAXIMUM (45-380 T-states)
- Next Z80N: NEXT_COMPACT/NEXT_BALANCED/NEXT_MAXIMUM (97 T-states)

Input: A/HL = multiplicand, B = multiplier, C = performance level
Output: HL = result (8×8), DE:HL = result (16×8)
Z80N Performance: Up to 85% faster (8×8) and 75% faster (16×8) on Spectrum Next

Division

Divide8x8_Unified - 8÷8 bit unsigned division
- Standard Z80: COMPACT/BALANCED/MAXIMUM (25-1975 T-states)
- Next Z80N: NEXT_COMPACT (40-400 T-states hybrid - subtraction <128, 8-bit reciprocal ≥128)
- Next Z80N: NEXT_BALANCED (~175 T-states 8-bit reciprocal table)
- Next Z80N: NEXT_MAXIMUM (~175 T-states currently fallback to 8-bit reciprocal)
Divide16x8_Unified - 16÷8 bit unsigned division
- Standard Z80: COMPACT/BALANCED/MAXIMUM (45-1300 T-states)
- Next Z80N: NEXT_COMPACT/NEXT_BALANCED (118-500 T-states hybrid method)
- Next Z80N: NEXT_MAXIMUM (107-520 T-states reciprocal with fallback)

Input: A/HL = dividend, B = divisor, C = performance level
Output: A/HL = quotient, A/B = remainder
Z80N Performance: Up to 95% faster (8÷8), 65% faster (16÷8) on Spectrum Next
✅ Accuracy Note: All division algorithms pass comprehensive test validation. Reciprocal methods use optimized approximation with validated accuracy for typical use cases.

Random Number Generation

8-bit Random

Random8_Unified_Seed - Initialize 8-bit random seed and generate first value
Random8_Unified_Next - Generate subsequent 8-bit random numbers

Input: A = upper limit (inclusive), B = seed (seeding only), C = algorithm selection
Output: A = random value in range [0, limit]

16-bit Random

Random16_Unified_Seed - Initialize 16-bit random seed and generate first value
Random16_Unified_Next - Generate subsequent 16-bit random numbers

Input: HL = upper limit (inclusive), BC = seed (seeding only), D = algorithm selection
Output: HL = random value in range [0, limit]

Screen Clearing

Traditional ZX Spectrum Screen Clearing

Screen_FullReset_Unified - Clear complete screen (pixels + attributes)
Screen_PixelReset_Unified - Clear pixels only, preserve attributes
Screen_AttrReset_Unified - Clear attributes only, preserve pixels

Input:

HL = screen address (0 = use default screen address)
A = fill value (pixel value or attribute value)
C = performance level (SCREEN_COMPACT through SCREEN_DMA_BURST)

Spectrum Next Layer 2 Screen Clearing

Screen_FullReset_Unified - Clear Layer 2 screen (supports all Layer 2 modes)

Input:

HL = Layer 2 address (0 = use current active Layer 2)
A = color value (8-bit palette index)
C = Layer 2 performance level

Layer 2 Performance Levels:

Manual Modes: Direct address and resolution specification
- SCREEN_LAYER2_MANUAL_256by192: 256×192 LDIRX clearing (~205,000 T-states)
- SCREEN_LAYER2_MANUAL_320by256: 320×256 LDIRX clearing (~350,000 T-states)
- SCREEN_LAYER2_MANUAL_640by256: 640×256 LDIRX clearing (~700,000 T-states)
- SCREEN_LAYER2_MANUAL_DMA_256by192: 256×192 DMA clearing (~280 T-states)
- SCREEN_LAYER2_MANUAL_DMA_320by256: 320×256 DMA clearing (~320 T-states)
- SCREEN_LAYER2_MANUAL_DMA_640by256: 640×256 DMA clearing (~400 T-states)
Automatic Modes: Hardware detection with optimal selection
- SCREEN_LAYER2_AUTO_ACTIVE: Auto-detect mode, use LDIRX (variable T-states)
- SCREEN_LAYER2_AUTO_DMA: Auto-detect mode, use DMA burst (~280-400 T-states)

Output: Screen area cleared with specified color value

Legacy Individual Performance Methods (Deprecated)

Screen_FullReset_CompactLDIR through Screen_FullReset_ALLPUSH - Use unified versions instead

Unified Screen Clearing with Flexible Addressing

Screen_FullReset_Unified - Clear pixels and set attributes
Screen_ClearPixel_Unified - Clear pixels only, preserve attributes
Screen_ClearAttr_Unified - Set attributes only, preserve pixels

Input:

A = attribute value (for full reset and attribute operations)
C = performance level
HL = screen base address (0 = use default 16384, other values = custom address)

Output: Screen cleared according to specified operation

Enhanced Features:

Flexible Addressing: Support for any memory location as screen buffer
Off-Screen Rendering: Full support for secondary screen buffers
Double Buffering: Enable smooth animations with back-buffer rendering
Memory Conservation: Efficient in-memory screen composition

Performance Levels:

SCREEN_COMPACT: Standard LDIR operation (baseline, ~145,265 T-states)
SCREEN_Z80N_COMPACT: Z80N LDIRX optimization (33% faster, ~96,700 T-states)
SCREEN_1PUSH to SCREEN_ALLPUSH: Stack-based optimizations (37-72% faster)
SCREEN_DMA_FILL: DMA memory fill (99.7% faster, ~400 T-states CPU)
SCREEN_DMA_BURST: DMA burst mode (99.8% faster, ~260 T-states CPU)

Screen Copying

Unified Screen Copying with Maximum Performance

Screen_FullCopy_Unified - Copy complete screen (pixels + attributes)
Screen_PixelCopy_Unified - Copy pixels only, preserve destination attributes
Screen_AttrCopy_Unified - Copy attributes only, preserve destination pixels

Input:

HL = source address
DE = destination address (0 = use default screen address)
BC = byte count (6912 for full screen, 6144 for pixels, 768 for attributes)
C = performance level

Output: Memory copied according to specified operation

Performance Levels:

SCREEN_COPY_COMPACT: Standard LDIR operation (145,152 T-states, 24.1 FPS)
SCREEN_COPY_1PUSH to SCREEN_COPY_ALLPUSH: Stack optimizations (173,278 to 124,908 T-states, 20.2 to 27.9 FPS)
SCREEN_COPY_Z80N_COMPACT: Z80N LDIRX optimization (110,612 T-states, 31.6 FPS)
SCREEN_COPY_DMA_FILL: DMA memory copy (300 T-states, 12,500+ FPS)
SCREEN_COPY_DMA_BURST: DMA burst copy (270 T-states, 12,500+ FPS)

Input Utilities

ScanKeyboard - Comprehensive keyboard scanning across all rows
WaitForKey - Wait for any key press with optional timeout
GetKeyPress - Get current key state without waiting

Input: Various parameters depending on function Output: Key codes or status flags Performance: Optimized for minimal input lag

Text Utilities

DisplayText - Render text string to screen
DisplayTextAt - Render text at specific screen coordinates
GetTextWidth - Calculate text width for positioning
ClearTextArea - Clear specific text area

Input: Text strings, coordinates, formatting options Output: Text rendered to screen Performance: Fast text rendering for real-time display

Scoring Utilities

ConvertScoreToString - Convert 16-bit score to display string
DisplayScore - Render score to screen with formatting
FormatScore - Apply padding and alignment to score string

Input: Score values, formatting options, display coordinates Output: Formatted score displayed on screen Performance: Optimized for frequent score updates

Hardware Detection

Utility Functions

CheckOnZ80N - Detect Z80N processor availability
CheckDMAAvailable - Detect DMA controller availability

Input: None
Output: Z flag set if feature not available, NZ if available
Performance: 58-81 T-states depending on feature and availability

DMA Memory Operations (Spectrum Next Only)

Memory Copy Operations

DMA_MemoryCopy - DMA-accelerated memory copying
DMA_MemoryCopy_Burst - DMA burst mode memory copying

Input: HL = source address, DE = destination address, BC = byte count
Output: Memory copied via DMA controller
Performance: ~270-300 T-states CPU overhead + parallel hardware transfer

Memory Fill Operations

DMA_MemoryFill - DMA-accelerated memory fill
DMA_BurstFill - DMA burst mode memory fill

Input: HL = destination address, A = fill byte, BC = byte count, D = burst mode (BurstFill only)
Output: Memory filled via DMA controller
Performance: ~235-260 T-states CPU overhead + parallel hardware transfer

Algorithm Constants

; Performance Levels (Standard Z80)
PERFORMANCE_COMPACT                         EQU 0
PERFORMANCE_BALANCED                        EQU 1  
PERFORMANCE_MAXIMUM                         EQU 2

; Performance Levels (Next Z80N)
PERFORMANCE_NEXT_COMPACT                    EQU 3
PERFORMANCE_NEXT_BALANCED                   EQU 4
PERFORMANCE_NEXT_MAXIMUM                    EQU 5

; Screen Performance Levels
SCREEN_COMPACT                              EQU 0    ; Standard LDIR operation
SCREEN_1PUSH                                EQU 1    ; 2 bytes per iteration
SCREEN_2PUSH                                EQU 2    ; 4 bytes per iteration
SCREEN_4PUSH                                EQU 3    ; 8 bytes per iteration
SCREEN_8PUSH                                EQU 4    ; 16 bytes per iteration
SCREEN_ALLPUSH                              EQU 5    ; 256 bytes per iteration
SCREEN_Z80N_COMPACT                         EQU 6    ; Z80N LDIRX optimization
SCREEN_DMA_FILL                             EQU 7    ; DMA memory fill
SCREEN_DMA_BURST                            EQU 8    ; DMA burst fill
SCREEN_LAYER2_MANUAL_256by192               EQU 9    ; Manual Layer 2 256x192 LDIRX
SCREEN_LAYER2_MANUAL_320by256               EQU 10   ; Manual Layer 2 320x256 LDIRX
SCREEN_LAYER2_MANUAL_640by256               EQU 11   ; Manual Layer 2 640x256 LDIRX
SCREEN_LAYER2_MANUAL_DMA_256by192           EQU 12   ; Manual Layer 2 256x192 DMA
SCREEN_LAYER2_MANUAL_DMA_320by256           EQU 13   ; Manual Layer 2 320x256 DMA
SCREEN_LAYER2_MANUAL_DMA_640by256           EQU 14   ; Manual Layer 2 640x256 DMA
SCREEN_LAYER2_AUTO_ACTIVE                   EQU 15   ; Auto Layer 2 detection LDIRX
SCREEN_LAYER2_AUTO_DMA                      EQU 16   ; Auto Layer 2 detection DMA

; Layer 2 Display Constants
LAYER2_REGISTER_DATA_PORT                   EQU $243B ; Next register data port
LAYER2_REGISTER_SELECT_PORT                 EQU $253B ; Next register select port
LAYER2_ADDRESS_REGISTER                     EQU $12   ; Layer 2 address register
LAYER2_CONTROL_REGISTER                     EQU $15   ; Layer 2 control register
LAYER2_BYTES_256by192                       EQU $C000 ; 48KB (256x192 mode)
LAYER2_BYTES_320by256_HALF                  EQU $A000 ; 40KB (half of 320x256)
LAYER2_BYTES_640by256_QTR                   EQU $A000 ; 40KB (quarter of 640x256)

; Screen Copy Performance Levels
SCREEN_COPY_COMPACT                         EQU 0    ; Standard LDIR operation
SCREEN_COPY_1PUSH                           EQU 1    ; 2 bytes per iteration
SCREEN_COPY_2PUSH                           EQU 2    ; 4 bytes per iteration
SCREEN_COPY_4PUSH                           EQU 3    ; 8 bytes per iteration  
SCREEN_COPY_8PUSH                           EQU 4    ; 16 bytes per iteration
SCREEN_COPY_ALLPUSH                         EQU 5    ; 256 bytes per iteration
SCREEN_COPY_Z80N_COMPACT                    EQU 6    ; Z80N LDIRX optimization
SCREEN_COPY_DMA_FILL                        EQU 7    ; DMA memory copy
SCREEN_COPY_DMA_BURST                       EQU 8    ; DMA burst copy
SCREEN_COPY_LAYER2_MANUAL_256by192          EQU 9    ; Manual Layer 2 256x192 LDIRX
SCREEN_COPY_LAYER2_MANUAL_320by256          EQU 10   ; Manual Layer 2 320x256 LDIRX
SCREEN_COPY_LAYER2_MANUAL_640by256          EQU 11   ; Manual Layer 2 640x256 LDIRX
SCREEN_COPY_LAYER2_MANUAL_DMA_256by192      EQU 12   ; Manual Layer 2 256x192 DMA
SCREEN_COPY_LAYER2_MANUAL_DMA_320by256      EQU 13   ; Manual Layer 2 320x256 DMA
SCREEN_COPY_LAYER2_MANUAL_DMA_640by256      EQU 14   ; Manual Layer 2 640x256 DMA
SCREEN_COPY_LAYER2_AUTO_ACTIVE              EQU 15   ; Use automatic active Layer 2 address and resolution detection and LDIRX to copy Layer 2 screen - Next only.
SCREEN_COPY_LAYER2_AUTO_DMA                 EQU 16   ; Use automatic active Layer 2 address and resolution detection and DMA BURST to copy Layer 2 screen - Next only.

; 8-bit Random Algorithms (Standard Z80)
PERFORMANCE_STANDARD_RANDOM_LCG             EQU 0    ; Linear Congruential Generator
PERFORMANCE_STANDARD_RANDOM_LFSR            EQU 1    ; Linear Feedback Shift Register
PERFORMANCE_STANDARD_RANDOM_XORSHIFT        EQU 2    ; XorShift Algorithm
PERFORMANCE_STANDARD_RANDOM_MIDDLESQUARE    EQU 3    ; Middle Square Method

; 8-bit Random Algorithms (Next Z80N)
PERFORMANCE_Z80N_RANDOM_LCG                 EQU 4    ; Z80N optimized LCG
PERFORMANCE_Z80N_RANDOM_LFSR                EQU 5    ; Z80N optimized LFSR
PERFORMANCE_Z80N_RANDOM_XORSHIFT            EQU 6    ; Z80N optimized XorShift
PERFORMANCE_Z80N_RANDOM_MIDDLESQUARE        EQU 7    ; Z80N optimized Middle Square

; 16-bit Random Algorithms (Standard Z80)
PERFORMANCE_STANDARD_RANDOM16_LCG           EQU 0    ; 16-bit Linear Congruential Generator
PERFORMANCE_STANDARD_RANDOM16_LFSR          EQU 1    ; 16-bit Linear Feedback Shift Register
PERFORMANCE_STANDARD_RANDOM16_XORSHIFT      EQU 2    ; 16-bit XorShift Algorithm
PERFORMANCE_STANDARD_RANDOM16_MIDDLESQUARE  EQU 3   ; 16-bit Middle Square Method

; 16-bit Random Algorithms (Next Z80N)
PERFORMANCE_Z80N_RANDOM16_LCG               EQU 4    ; Z80N optimized 16-bit LCG
PERFORMANCE_Z80N_RANDOM16_LFSR              EQU 5    ; Z80N optimized 16-bit LFSR
PERFORMANCE_Z80N_RANDOM16_XORSHIFT          EQU 6    ; Z80N optimized 16-bit XorShift
PERFORMANCE_Z80N_RANDOM16_MIDDLESQUARE      EQU 7    ; Z80N optimized 16-bit Middle Square

; DMA Constants (Next Only)
DMA_RESET                                   EQU $C3    ; DMA reset command
DMA_FILL                                    EQU $79    ; DMA fill transfer mode
DMA_BURST_TRANSFER                          EQU $7F    ; DMA burst transfer mode
DMA_BURST_CONTROL                           EQU $18    ; DMA burst control
DMA_LOAD                                    EQU $CF    ; DMA load/start command
DMA_BURST_LOAD                              EQU $DF    ; DMA burst load/start command
ZXN_DMA_PORT                                EQU $6B    ; Next DMA port

🧪 Testing

NextLibrary includes comprehensive test suites:

62 Test Cases continually being expanded to cover more functionality
Algorithm Validation for all random number generators (8-bit and 16-bit)
Performance Verification across all performance levels
Edge Case Testing for boundary conditions
Statistical Distribution Testing for random number quality
Seed Compatibility Testing between standard and Z80N versions
Screen Management Testing for all performance levels and addressing modes
DMA Validation for hardware detection and memory operations
Hardware Detection Testing for Z80N and DMA capability verification

New Test Cases in v1.6

Test 058: Parameterized screen clearing with custom addresses
Test 059: DMA screen clearing validation (Next hardware only)

Run tests using the included test framework:

INCLUDE "Testing/TestCases.asm"

🏗️ Building

Requirements

sjasmplus assembler
ZX Spectrum Next development environment (for Next-specific features)

Build Instructions

# Assemble the library
sjasmplus --lst=NextLibrary.lst NextLibrary.asm

# Build output will be generated in Output/nextlibrary.nex

🔧 Modular Usage

NextLibrary is designed with a clear, modular structure that allows developers to extract only the routines they need:

Selective Inclusion: Each mathematical operation is self-contained in its own file
Minimal Dependencies: Most routines only depend on constants and variables
Clean Code Structure: Well-commented code makes extraction straightforward
No Overhead: Include only what you need for optimal memory usage

Example: If you only need 8×8 multiplication and basic screen clearing, simply extract:

Source/Multiply/Multiply8x8.asm - The multiplication routines
Source/Display/ScreenClearing.asm - Screen clearing routines
Relevant constants from Source/Constants.asm
Any required variables from Source/Variables.asm

This modular approach ensures you can integrate specific functionality into your projects without including the entire library.

📁 Project Structure

NextLibrary/
├── Source/
│   ├── Display/                # Screen, Layer 2, and text utilities
│   ├── Divide/                 # Division routines  
│   ├── DMA/                    # DMA support routines
│   ├── Input/                  # Input handling routines
│   ├── Multiply/               # Multiplication routines
│   ├── Random/                 # Random number generation
│   ├── Scoring/                # Score management
│   ├── Testing/                # Test suites
│   ├── Utility/                # Hardware detection utilities
│   ├── ConstantsDisplay.asm    # Display and graphics constants
│   ├── ConstantsDMA.asm        # DMA operation constants
│   ├── ConstantsMaths.asm      # Mathematical constants
│   ├── ConstantsRandom.asm     # Random generation constants 
│   ├── NextLibrary.asm         # Main library file
│   ├── Variables.asm           # Global ariables
│   ├── VariablesDisplay.asm    # Display-specific variables
│   ├── VariablesDMA.asm        # DMA operation variables
│   └── VariablesRandom.asm     # Random Generator variables
├── Output/
│   └── nextlibrary.nex     # Compiled library
└── README.md              # This file

📄 License

NextLibrary is available for free use under the following terms:

Free Use License

This software is provided FREE OF CHARGE for any purpose, including commercial and non-commercial use. You are granted the following rights:

✅ Use: Use this library in any project without restriction
✅ Modify: Modify the source code to suit your needs
✅ Distribute: Redistribute original or modified versions
✅ Commercial Use: Use in commercial projects without royalties or fees
✅ Private Use: Use in private/personal projects

Disclaimer of Warranty and Liability

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.

IN NO EVENT SHALL THE AUTHOR BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.

By using this library, you acknowledge that:

No Support Obligation: The author has no obligation to provide support, updates, or bug fixes
Use at Your Own Risk: You assume all risks associated with using this software
No Responsibility: The author takes no responsibility for any consequences of using this library in your projects
Your Responsibility: You are responsible for testing and validating the library's suitability for your specific use case

Attribution (Optional)

While not required, attribution is appreciated:

"Uses NextLibrary - Z80 Assembly Utilities by [Author Name]"

Usage Information (Encouraged)

Help the community! If you use NextLibrary in your project, please consider:

📝 Opening an issue or discussion in this repository with:

Project Name: What you're building
Routines Used: Which NextLibrary functions you're using (e.g., "8×8 multiplication (Z80N), XORShift random, DMA screen clearing")
Platform Target: 48K, 128K, +2, +3, Next, or multi-platform
Brief Description: What your project does
Optional Link: Share your project if it's public!

This information helps:

Other Developers: See real-world usage examples and inspiration
Library Development: The author gaining an understanding of which routines are most valuable and most used
Community Building: Connect developers using similar functionality
Documentation: Improve examples based on actual use cases

Example usage report:

Project: "RetroBlaster 2024"
Routines: Multiply8x8 (NEXT_COMPACT), DMA screen clearing (BURST), Random8 XORShift, Off-screen rendering
Platform: ZX Spectrum Next (Using Z80N optimizations and DMA)
Description: Side-scrolling shooter with procedural enemies, ultra-fast multiplication, and smooth double-buffered graphics

This information helps improve the library and provides inspiration to other developers.

TL;DR: Use it freely, modify it, distribute it, make money with it - just don't blame me if something goes wrong! And if you feel like sharing what you built, that's awesome! 😊

🤝 Contributing

Contributions are welcome! Please ensure:

Code Quality: Follow Z80 assembly best practices
Performance: Maintain T-state accuracy documentation
Testing: Add test cases for new functionality
Documentation: Update README and inline comments
Hardware Compatibility: Test on both standard Z80 and Next hardware where applicable

🎮 Use Cases

NextLibrary is perfect for:

Retro Game Development: High-performance math for physics, scoring, and graphics
System Programming: Efficient utilities for Next-specific applications
Educational Projects: Well-documented Z80 assembly examples
Performance-Critical Code: T-state accurate timing for real-time applications
Graphics Programming: Fast screen clearing and off-screen rendering
Hardware Optimization: Automatic detection and utilization of Next-specific features

🔗 Related Projects

📧 Contact

For questions, suggestions, or support, please open an issue on GitHub.

NextLibrary - Empowering Z80 assembly development with world-class mathematics, utilities, and hardware-accelerated performance.

Name		Name	Last commit message	Last commit date
Latest commit History 52 Commits
.vscode		.vscode
Output		Output
Source		Source
.gitignore		.gitignore
README.md		README.md

antwjadam/z80nextlibrary

Folders and files

Latest commit

History

Repository files navigation

NextLibrary - Z80 Assembly Utilities Library for Spectrum and Next

Release History

Target Platforms

ZX SPECTRUM 48K:

ZX SPECTRUM 128K:

ZX SPECTRUM +2:

ZX SPECTRUM +3:

ZX SPECTRUM NEXT:

🚀 Current Features

📊 Mathematical Operations - 12 optimized algorithms

🎲 Random Number Generation - 16 algorithms with Z80N acceleration

🧹 Screen Clearing and Memory Fill - 17 performance levels

🖥️ Screen Copying and Memory Transfer - 17 performance levels

🎨 Layer 2 Graphics - Complete Next graphics support

⚡ DMA Support - 4 hardware-accelerated operations

⌨️ Input and Keyboard Utilities - Cross-platform input handling

📝 Text and Font System - Embedded font with rendering utilities

🏆 Scoring and Data Management - Score conversion and display

🔧 Utility Functions - Hardware detection and memory management

Screen Clearing T-States Performance

Traditional ZX Spectrum Screen (32x24 character, 256x192 pixel)

Spectrum Next Layer 2 Screen Clearing (Next Only)

Layer 2 Memory Requirements

Frame Rate Capabilities Across Next CPU Speeds

Traditional Screen Clearing (Fastest: DMA_BURST)

Layer 2 Screen Clearing (Fastest: LAYER2_AUTO_DMA)

🖥️ Screen Copying and Memory Transfer

Screen Copying T-States Performance

Screen Copying Performance Compared with Layer 2

Plus 3 Hardware Double Buffering

Plus 3 Double Buffering Support (Hardware Bank Switching)

Performance Comparison: Software vs Hardware Double Buffering

Plus 3 Banking Memory Layout

Plus 3 Double Buffering Advantages

Layer 2 Utility T-States

Layer 2 Constants

Utility Functions T-States

⚡ DMA Support (Spectrum Next)

DMA Operations T-States

🧮 Random Number Generation Algorithms

8-bit Standard Z80 Methods

8-bit Next Z80N Methods (Spectrum Next Only)

16-bit Standard Z80 Methods

16-bit Next Z80N Methods (Spectrum Next Only)

Z80N Performance Benefits

📝 TODO List

🖥️ Display & Graphics

🎮 Input Systems

🏆 Scoring & Data

🔊 Audio Support

🏦 Memory Banking

💾 Loading and Saving

🚀 Advanced Next Features

⚡ Optimization

🎯 Performance Levels

📋 Quick Start

Basic Usage Example

Enhanced Screen Management Examples

Layer 2 Screen Clearing Examples

Screen Copying Examples

Plus 3 Hardware Double Buffering Examples

Plus 3 Double Buffering Best Practices

Layer 2 Graphics Examples

DMA Memory Operations

Hardware Detection Utilities

⚡ Z80N Performance Comparison

8×8 Multiplication Performance

16×8 Multiplication Performance

8÷8 Division Performance

16÷8 Division Performance

Screen Clearing Performance

16-bit Operations

🔧 API Reference

Mathematical Operations

Multiplication

Packages