Releases: FederatedAI/eggroll
Release v3.2.0
Release v3.1.0
- add 2.x api backport support
- bug fix
Release v3.0.0
Major Features and Improvements
Enhancements in the JVM Part:
- Core Component Reconstruction: The `cluster-manager` and `node-manager` components have been entirely rebuilt using Java, ensuring uniformity and enhanced performance.
- Transport Component Modification: The `rollsite` transport component has been removed and replaced with the more efficient `osx` component.
- Improved Process Management: Advanced logic has been implemented to manage processes more effectively, significantly reducing the risk of process leakage.
- Enhanced Data Storage Logic: Data storage mechanisms have been refined for better performance and reliability.
- Concurrency Control Improvements: We've upgraded the logic for concurrency control in the original components, leading to performance boosts.
- Visualization Component: A new visualization component has been added for convenient monitoring of computational information.
- Refined Logging: The logging system has been enhanced for more precise outputs, aiding in rapid anomaly detection.
Upgrades in the Python Part:
- Reconstruction of `roll_pair` and `egg_pair`: These components now support serialization and partition methods controlled by the caller. Serialization safety is uniformly managed by the caller.
- Automated Cleanup of Intermediate Tables: The issue of automatic cleaning for intermediate tables between federation and computing has been resolved, eliminating the need for extra operations by the caller.
- Unified Configuration Control: A flexible configuration system is introduced, supporting direct pass-through, configuration files, and environment variables to cater to diverse requirements.
- Client-Side PyPI Installation: Eggroll 3.0 supports easy installation via PyPI for clients.
- Optimized Log Configuration: Callers can now customize log formats according to their needs.
- Code Structure Refinement: The codebase has been streamlined for clarity, removing a substantial amount of redundant code.
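To illustrate what "serialization and partition methods controlled by the caller" can look like, here is a minimal sketch. It is not the `roll_pair` or `egg_pair` API: the names `scatter` and `crc32_partitioner` are hypothetical, and the only assumption is that the caller passes in its own key serializer, value serializer, and partitioner.

```python
import zlib


def crc32_partitioner(key_bytes, num_partitions):
    """Example partitioner (hypothetical): stable hash of the serialized key."""
    return zlib.crc32(key_bytes) % num_partitions


def scatter(pairs, num_partitions, key_serdes, value_serdes, partitioner):
    """Group (key, value) pairs into partitions.

    The caller supplies every policy: how keys and values become bytes,
    and which partition a serialized key belongs to. This sketch makes
    no choice on the caller's behalf.
    """
    partitions = [[] for _ in range(num_partitions)]
    for key, value in pairs:
        key_bytes = key_serdes(key)
        index = partitioner(key_bytes, num_partitions)
        partitions[index].append((key_bytes, value_serdes(value)))
    return partitions


if __name__ == "__main__":
    parts = scatter(
        [("a", 1), ("b", 2), ("a", 3)],
        num_partitions=4,
        key_serdes=str.encode,                      # caller-chosen key encoding
        value_serdes=lambda v: str(v).encode(),     # caller-chosen value encoding
        partitioner=crc32_partitioner,
    )
    for i, part in enumerate(parts):
        print(i, part)
```

Because the serializers are plain arguments, a caller that handles untrusted data can pass a restricted format instead of a general-purpose one, which is one way to read "serialization safety is managed by the caller".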
Eggroll 3.0 brings comprehensive enhancements in system performance, usability, and reliability with these significant updates in both the JVM and Python parts.
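The unified configuration control described above combines three sources: direct pass-through, configuration files, and environment variables. The sketch below shows one way such layering can work. It is not Eggroll's implementation: the function names, the precedence order (direct value, then environment, then file, then default), and the key-to-environment-variable mapping are all assumptions made for illustration.

```python
import os


def load_properties(path):
    """Parse a simple key=value file, skipping blanks, comments, and section headers."""
    props = {}
    with open(path, encoding="utf-8") as fh:
        for line in fh:
            line = line.strip()
            if not line or line.startswith("#") or line.startswith("["):
                continue
            key, sep, value = line.partition("=")
            if sep:
                props[key.strip()] = value.strip()
    return props


def resolve_option(key, overrides=None, env=None, file_props=None, default=None):
    """Resolve one option from layered sources.

    Assumed precedence (not taken from the release notes):
    direct override > environment variable > config file > default.
    """
    overrides = overrides or {}
    env = os.environ if env is None else env
    file_props = file_props or {}

    if key in overrides:
        return overrides[key]
    # Hypothetical mapping: "demo.workers" -> "DEMO_WORKERS".
    env_name = key.upper().replace(".", "_")
    if env_name in env:
        return env[env_name]
    return file_props.get(key, default)


if __name__ == "__main__":
    file_props = {"demo.workers": "2"}
    print(resolve_option("demo.workers", overrides={"demo.workers": "8"}, file_props=file_props))
    print(resolve_option("demo.workers", env={"DEMO_WORKERS": "4"}, file_props=file_props))
    print(resolve_option("demo.workers", env={}, file_props=file_props))
```

Passing `env` explicitly keeps the lookup easy to test without touching the real process environment.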
Commit Authors
- dylan-fan <289765648@qq.com> @dylan-fan
- forgivedengkai <forgive_dengkai@163.com> @forgivedengkai
- sagewe <wbwmat@gmail.com> @sagewe
- mgqa34 <mgq3374541@163.com> @mgqa34
- xiongli <740332065@qq.com> @Xiong-Li-github
- NingDu <598671894@qq.com> @REXNingDu
- chengtcc <864261919@qq.com> @chengtcc
v2.5.3
1. Change the NodeManager heartbeat logic
2. Add IP and port checks in the NodeManager file
3. You can use a shell to obtain the number of GPUs
4. LLM environment variables can be configured through additional configuration files
5. Specific network card names can be configured to obtain local IP
6. Improvement of Eggroll client logic
8. Add status to LLM sessions
9. Modify the cluster manager startup logic to prevent it from starting when the MySQL connection is unsuccessful
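Item 3 above mentions obtaining the GPU count through a shell. As a rough sketch of that idea, the helper below runs a shell command and parses its output as an integer. The command shown is an assumption chosen for illustration (it relies on `nvidia-smi` being installed) and is not the script that ships with Eggroll; `count_gpus` is a hypothetical name.

```python
import subprocess

# Assumption: an illustrative command, not the one Eggroll uses.
DEFAULT_GPU_COUNT_CMD = "nvidia-smi --list-gpus | wc -l"


def count_gpus(command=DEFAULT_GPU_COUNT_CMD, timeout=10):
    """Run a shell command and parse its stdout as a GPU count.

    Returns 0 if the command fails, times out, or prints something
    that is not an integer, so callers can treat "unknown" as "no GPUs".
    """
    try:
        result = subprocess.run(
            command,
            shell=True,
            capture_output=True,
            text=True,
            timeout=timeout,
        )
    except (OSError, subprocess.TimeoutExpired):
        return 0
    if result.returncode != 0:
        return 0
    try:
        return max(int(result.stdout.strip()), 0)
    except ValueError:
        return 0


if __name__ == "__main__":
    print(count_gpus())
```

Keeping the command configurable means a deployment can swap in whatever script matches its hardware without changing the caller.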
v3.0.0-beta
Merge pull request #651 from FederatedAI/dev-3.0.0-beta-merge merge
v2.5.2
1. Real-time log for LLM
2. Eggroll client for LLM
3. Improve downloading function for LLM
4. Fix security issue
5. Fix legacy bugs
Release v2.5.1
Changes:
1. Integrate DeepSpeed to train LLM models
2. Distributed GPU resource management
3. Enhance cluster high availability
Release v2.4.9
update whitelist and configuration description
v2.5.0-alpha
update whitelist to support fate-v2.0.0-alpha