https://github.com/angrysky56/llada_gui_new/tree/main
A graphical user interface for interacting with the LLaDA (Large Language Diffusion with mAsking) model.
Currently maxes out my 12GB of VRAM when using 4-bit quantization (reported usage reads around 20GB), but the new optimizations are working well and generation is much faster, maybe 10x.
```bash
./start_memory_optimized.sh
```
A prototype memory system is now available; it is slower and VRAM-intensive. Derived from:
https://github.com/synthience/mcp-titan-cognitive-memory
It generally uses around 40GB of RAM in CPU mode.
This is a GUI wrapper for the LLaDA model, an 8B scale diffusion model trained entirely from scratch that rivals LLaMA3 8B in performance. Unlike conventional autoregressive language models, LLaDA uses a diffusion approach with masking to generate text.
Important: This GUI is a third-party tool and not officially affiliated with the original LLaDA project. All credit for the underlying LLaDA model goes to the original authors at the Gaoling School of Artificial Intelligence, Renmin University of China. Please visit their official repository for more information about the model.
This GUI includes several optimizations to make the model run efficiently on consumer hardware:
- Smart CPU-GPU Offloading: Intelligently moves tensors between CPU and GPU to minimize memory usage
- Token Buffer Management: Manages token data efficiently to reduce peak memory requirements
- Adaptive Step Scheduling: Uses fewer steps for easier tokens, more for difficult ones
- Block-Level Processing: Processes tokens in blocks for better GPU utilization
- Progressive Generation: High-confidence tokens are revealed early in the process
- Chunked Operations: Large operations are broken into manageable chunks
These optimizations allow the model to run on GPUs with 8-12GB VRAM while providing faster generation than the original implementation.
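To make the chunked-operations idea concrete, here is a minimal sketch assuming a PyTorch tensor pipeline; the function name, chunk size, and demo are illustrative, not the GUI's actual code. Splitting work along the sequence dimension means only one chunk's intermediates are alive at a time:

```python
import torch

def chunked_apply(fn, hidden, chunk_size=1024):
    """Apply fn over the sequence dimension in fixed-size chunks.

    Only one chunk's intermediate activations are alive at a time,
    which bounds peak VRAM. Illustrative sketch only.
    """
    outputs = []
    for start in range(0, hidden.shape[1], chunk_size):
        outputs.append(fn(hidden[:, start:start + chunk_size]))
        if torch.cuda.is_available():
            torch.cuda.empty_cache()  # release the finished chunk's scratch buffers
    return torch.cat(outputs, dim=1)

# Demo: a (batch, seq, dim) tensor processed 1024 positions at a time.
x = torch.randn(1, 4096, 64)
print(chunked_apply(lambda t: t * 2, x).shape)  # torch.Size([1, 4096, 64])
```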
- Text Generation: Generate text responses to your prompts
- Intuitive Interface: Easy-to-use controls for interacting with the model
- Configurable Parameters: Adjust generation length, sampling steps, and more
- Diffusion Visualization: Watch the diffusion process unfold in real-time
- Token Evolution: See how masked tokens evolve into predicted text
- Memory Management: Options to optimize memory usage, including:
  - Real-time memory monitoring (see the sketch after this list)
  - 4-bit and 8-bit quantization options
  - CPU fallback for low-memory situations
  - Automatic parameter adjustment based on available memory
- Performance Optimizations: Built-in tools to improve performance:
  - Memory-efficient settings for lower GPU usage
  - Attention slicing for handling larger prompts
  - Precision control for speed/memory tradeoffs
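For a sense of what the real-time memory monitor tracks, here is a minimal sketch using standard psutil and PyTorch calls; the project's own monitor (memory_monitor.py) is more elaborate:

```python
import psutil
import torch

def memory_snapshot():
    """Current CPU and GPU memory usage in GB (illustrative sketch)."""
    usage = {"cpu_used_gb": psutil.virtual_memory().used / 1e9}
    if torch.cuda.is_available():
        usage["gpu_allocated_gb"] = torch.cuda.memory_allocated() / 1e9
        usage["gpu_reserved_gb"] = torch.cuda.memory_reserved() / 1e9
    return usage

print(memory_snapshot())
```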
- Python 3.10 or later
- PyQt6
- PyTorch 2.0 or later
- Transformers 4.38.2
- CUDA-capable GPU with at least 10GB memory (for optimal performance)
- CPU-only mode is also supported (slower but works on any machine)
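A quick way to check your environment against these requirements (a sketch using only standard version attributes):

```python
import sys
import torch
import transformers

print("Python      :", sys.version.split()[0])    # want 3.10+
print("PyTorch     :", torch.__version__)         # want 2.0+
print("Transformers:", transformers.__version__)  # want 4.38.2
if torch.cuda.is_available():
    gpu = torch.cuda.get_device_properties(0)
    print(f"GPU         : {gpu.name}, {gpu.total_memory / 1e9:.1f} GB VRAM")
else:
    print("GPU         : none found; CPU-only mode will be used")
```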
To install:

1. Clone this repository:

   ```bash
   git clone https://github.com/angrysky56/llada-gui.git
   cd llada-gui
   ```

2. Use the provided installation script:

   ```bash
   chmod +x install.sh
   ./install.sh
   ```
The script will:
- Create a virtual environment
- Install all required packages
- Set up desktop integration if applicable
3. Alternatively, you can manually set up the environment:

   ```bash
   python -m venv venv
   source venv/bin/activate  # On Windows: venv\Scripts\activate
   pip install -r requirements.txt
   ```
There are several ways to start the application:
1. Using the start script:

   ```bash
   ./start_gui.sh
   ```

2. Direct Python execution:

   ```bash
   ./venv/bin/python run.py
   ```

3. Using the desktop file (if installed): double-click the `LLaDA_GUI.desktop` file in your applications menu or desktop.
Once the application is running:

1. Enter your prompt in the text input area.
2. Adjust generation parameters as needed (the sketch after these steps shows how they map onto the underlying sampler):
   - Generation Length: Number of tokens to generate
   - Sampling Steps: Number of diffusion steps (higher = better quality but slower)
   - Block Length: Size of blocks for semi-autoregressive generation
   - Temperature: Controls randomness (0 = deterministic, higher = more random)
   - CFG Scale: Classifier-free guidance strength
   - Remasking Strategy: Method to select which tokens remain masked
3. Select hardware options:
   - Choose between CPU or GPU
   - Select memory optimization (normal precision, 8-bit, or 4-bit quantization)
4. Click "Generate" to start the process.
5. Watch the diffusion process in the visualization tab.
6. View the final output in the text output tab.
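The parameter names above correspond closely to the arguments of the generate() sampler in the official LLaDA reference code. Here is a hedged sketch of the equivalent programmatic call; the argument names follow that reference implementation as of this writing and may change upstream, and generate must be imported from the reference repo's generate.py:

```python
import torch
from transformers import AutoModel, AutoTokenizer

from generate import generate  # sampler from the official LLaDA reference repo

tokenizer = AutoTokenizer.from_pretrained("GSAI-ML/LLaDA-8B-Instruct", trust_remote_code=True)
model = AutoModel.from_pretrained("GSAI-ML/LLaDA-8B-Instruct",
                                  trust_remote_code=True,
                                  torch_dtype=torch.bfloat16).eval()

prompt_ids = tokenizer("Explain masked diffusion in one paragraph.", return_tensors="pt")["input_ids"]
out = generate(model, prompt_ids,
               gen_length=128,              # Generation Length
               steps=128,                   # Sampling Steps
               block_length=32,             # Block Length
               temperature=0.0,             # Temperature (0 = deterministic)
               cfg_scale=0.0,               # CFG Scale
               remasking="low_confidence")  # Remasking Strategy
print(tokenizer.decode(out[0, prompt_ids.shape[1]:], skip_special_tokens=True))
```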
If you encounter out-of-memory errors:
- Reduce Generation Length and Sampling Steps
- Try 8-bit or 4-bit quantization options
- Switch to CPU mode if necessary (will be slower but more reliable)
- Use the built-in performance optimizer (described below)
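If you want to reproduce the 4-bit option outside the GUI, here is a minimal sketch using the standard transformers/bitsandbytes path; the GUI's own loader may differ, and the model ID is the official Hugging Face release:

```python
import torch
from transformers import AutoModel, BitsAndBytesConfig

# Sketch of 4-bit loading via bitsandbytes; the GUI's loader may differ.
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_compute_dtype=torch.bfloat16,  # 4-bit weights, bf16 compute
)
model = AutoModel.from_pretrained(
    "GSAI-ML/LLaDA-8B-Instruct",
    quantization_config=bnb_config,
    trust_remote_code=True,
    device_map="auto",  # spill layers to CPU if VRAM runs short
)
```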
This application includes built-in performance optimization tools that can significantly reduce memory usage and improve generation speed.
1. Launch the optimizer:

   ```bash
   python optimize_launcher.py
   ```

   Or use the desktop shortcut: double-click the `LLaDA_Optimizer.desktop` file.

2. Select optimizations in the GUI:
   - GPU Memory Optimizations
   - Config File Patches
   - Worker Code Optimizations

3. Apply the optimizations by clicking "Apply Optimizations".

4. Restart the application to use the optimized version.
Unlike autoregressive models that generate one token at a time, LLaDA works by:
1. Starting with a completely masked sequence of the desired length
2. At each step, predicting values for all masked tokens simultaneously
3. Based on prediction confidence, keeping some tokens and remasking others
4. Repeating until all tokens are predicted
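A toy sketch of that loop; predict_fn stands in for the model, MASK_ID is a hypothetical mask marker, and the keep/remask rule mirrors the confidence-based remasking described above in spirit only:

```python
import torch

MASK_ID = -1  # hypothetical mask marker for this toy example

def toy_masked_diffusion(predict_fn, length, steps):
    """Predict every masked position, keep the highest-confidence
    predictions, leave the rest masked, repeat. Illustrative only."""
    tokens = torch.full((length,), MASK_ID, dtype=torch.long)
    for step in range(steps):
        masked = tokens == MASK_ID
        if not masked.any():
            break
        probs = predict_fn(tokens)      # (length, vocab) probabilities
        conf, pred = probs.max(dim=-1)  # confidence and argmax token per slot
        conf = torch.where(masked, conf, torch.full_like(conf, -1.0))
        k = max(1, int(masked.sum()) // (steps - step))  # equal share per step
        keep = conf.topk(k).indices     # highest-confidence masked positions
        tokens[keep] = pred[keep]
    return tokens

# Tiny demo with a random "model" over a 10-token vocabulary.
demo_fn = lambda t: torch.softmax(torch.randn(t.numel(), 10), dim=-1)
print(toy_masked_diffusion(demo_fn, length=16, steps=4))
```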
The visualization tab shows this process in action, with:
- Gray boxes for masked tokens
- Colored boxes for predicted tokens (color intensity indicates confidence)
The application is organized into the following components:
- `llada_gui.py`: Main GUI application code
- `llada_worker.py`: Worker thread for asynchronous model execution
- `diffusion_visualization.py`: Visualization of the diffusion process
- `memory_monitor.py`: Real-time memory usage tracking
- `config.py`: Application configuration and constants
- `utils.py`: Utility functions
- `run.py`: Entry point script
- `optimizations/`: Performance optimization tools
- `onnx/`: Experimental ONNX conversion utilities
This GUI is built on top of the LLaDA model developed by researchers at the Gaoling School of Artificial Intelligence, Renmin University of China. Please cite their work when using this application:
```bibtex
@article{nie2025large,
  title={Large Language Diffusion Models},
  author={Nie, Shen and Zhu, Fengqi and You, Zebin and Zhang, Xiaolu and Ou, Jingyang and Hu, Jun and Zhou, Jun and Lin, Yankai and Wen, Ji-Rong and Li, Chongxuan},
  journal={arXiv preprint arXiv:2502.09992},
  year={2025}
}
```
This application is provided as-is under the MIT License. See the LICENSE file for details.
The LLaDA model has its own license from the original developers. Please refer to the original repository for more information.
Contributions are welcome! Please feel free to submit a Pull Request.