The File Zipper project is a file compression utility that uses the greedy Huffman encoding algorithm to compress files efficiently. It demonstrates the practical use of a greedy algorithm, allowing users to reduce the size of large files while preserving the original content.
The project focuses on the balance between compression ratio and execution time, helping users understand how Huffman encoding reduces file sizes without compromising speed.
- File Compression: Compress large files using Huffman encoding.
- File Decompression: Restore compressed files to their original form.
- Greedy Algorithm Application: Learn how the greedy algorithm is applied in file compression.
- Compression Ratio vs Execution Time: Analyze the trade-offs between file size reduction and the time required for compression.
Huffman encoding is a lossless compression method that assigns variable-length codes to characters based on their frequency. Frequently occurring characters are given shorter codes, while less common characters receive longer ones, minimizing the overall file size.
- Frequency Calculation: The frequency of each character in the file is calculated.
- Building the Huffman Tree: A binary tree is constructed, where the least frequent characters are placed deeper in the tree and the most frequent characters are closer to the root.
- Assigning Codes: Shorter binary codes are assigned to frequent characters, and longer codes to infrequent characters.
- Encoding the File: The file is encoded using these binary codes.
- Decompression: The encoded file is decompressed by traversing the Huffman tree to map each binary code back to its original character.
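To make the greedy step concrete, here is a minimal, self-contained Python sketch of steps 1–3 (frequency counting, building the tree with a min-heap, and assigning codes). It illustrates the general technique only and is not the project's `huffman_encoder.py`; the function name `build_huffman_codes` is invented for this example.

```python
import heapq
from collections import Counter

def build_huffman_codes(text: str) -> dict[str, str]:
    # 1. Frequency calculation (assumes non-empty input).
    freq = Counter(text)

    # 2. Build the Huffman tree greedily: always merge the two least frequent
    #    subtrees first. Each heap entry is (frequency, tie_breaker, node),
    #    where a node is either a character or a (left, right) pair.
    heap = [(f, i, ch) for i, (ch, f) in enumerate(freq.items())]
    heapq.heapify(heap)
    next_id = len(heap)
    while len(heap) > 1:
        f1, _, left = heapq.heappop(heap)
        f2, _, right = heapq.heappop(heap)
        heapq.heappush(heap, (f1 + f2, next_id, (left, right)))
        next_id += 1

    # 3. Assign codes: '0' for a left branch, '1' for a right branch, so
    #    frequent characters (closer to the root) get shorter codes.
    codes: dict[str, str] = {}

    def assign(node, prefix=""):
        if isinstance(node, tuple):        # internal node
            assign(node[0], prefix + "0")
            assign(node[1], prefix + "1")
        else:                              # leaf: a single character
            codes[node] = prefix or "0"    # degenerate case: one unique character

    assign(heap[0][2])
    return codes

print(build_huffman_codes("abracadabra"))
# {'a': '0', 'c': '100', 'd': '101', 'b': '110', 'r': '111'}
```

The tie-breaker in each heap entry keeps comparisons well defined when two subtrees have the same frequency; different tie-breaking can produce different, but equally optimal, code tables.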
- Clone the repository:
  git clone https://github.com/CKShetty4/FileZipper.git
- Navigate to the project directory:
  cd FileZipper
You can compress a file by:
- Navigating to the `encoder` folder, adding your input file, and running:
  python huffman_encoder.py compress <input_file.txt>
- Or adding the input file to the `Data` folder, navigating to the `src` folder, and running:
  python -B main.py compress <input_file.txt> <output_file.txt>

The `-B` flag is used to prevent the generation of Python cache files. The compressed output file will be saved in the same directory as the input file.
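Once the code table exists, encoding the file is essentially a lookup-and-concatenate pass over the input, followed by packing the bit string into bytes. The snippet below is a rough illustration of that idea with a hard-coded code table; it is not the project's implementation, and a real encoder must also store the code table (or character frequencies) so the file can be decompressed later.

```python
# Illustrative only: encode a string with a precomputed code table and pack
# the resulting bit string into bytes.
codes = {"a": "0", "b": "110", "r": "111", "c": "100", "d": "101"}
text = "abracadabra"

bits = "".join(codes[ch] for ch in text)
padding = (8 - len(bits) % 8) % 8                 # pad to a whole number of bytes
packed = int(bits + "0" * padding, 2).to_bytes((len(bits) + padding) // 8, "big")

print(f"{len(text)} characters -> {len(packed)} bytes (plus the code table)")
```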
You can decompress a file by:
- Navigating to the `decoder` folder, adding your compressed file, and running:
  python huffman_decoder.py decompress <compressed_file.txt> <huffman_codes_file.txt>
- Or navigating to the `src` folder and running:
  python -B main.py decompress <compressed_file.txt> <decoded_file.txt>
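Conceptually, decompression reverses the encoding step: read the compressed bits and, because Huffman codes are prefix-free (no code is a prefix of another), emit a character as soon as the accumulated bits match an entry in the table. A minimal illustration, again with a hard-coded table rather than the project's `huffman_decoder.py`:

```python
# Illustrative only: decode a bit string by scanning it prefix by prefix
# against a reversed code table.
codes = {"a": "0", "b": "110", "r": "111", "c": "100", "d": "101"}
reverse = {code: ch for ch, code in codes.items()}

bits = "".join(codes[ch] for ch in "abracadabra")  # stands in for bits read from disk

decoded, buffer = [], ""
for bit in bits:
    buffer += bit
    if buffer in reverse:          # a complete code has been read
        decoded.append(reverse[buffer])
        buffer = ""

print("".join(decoded))            # abracadabra
```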
For example, using the standalone scripts:
python huffman_encoder.py compress example.txt
python huffman_decoder.py decompress compressed.txt
You can also run both encoding and decoding together (for testing purposes, as the input and output files will be identical) by using:
python -B main.py
To run the automated test cases, navigate to the `src` folder and run:
python -B test_huffman.py
This will execute 12 test cases, display the results in the terminal, and generate a `testResultlog.txt` file with the test results.
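The contents of `test_huffman.py` are specific to this project, but a round-trip check in the same spirit could look roughly like the sketch below, which drives the documented `main.py` commands via `subprocess` and compares the result with the original. The file names are the sample files described below, and the test is assumed to run from the `src` folder; treat it as a hypothetical outline, not the actual test suite.

```python
# Hypothetical round-trip check: compress, decompress, then compare files.
import filecmp
import subprocess
import unittest

class RoundTripTest(unittest.TestCase):
    def test_compress_then_decompress(self):
        subprocess.run(
            ["python", "-B", "main.py", "compress", "input_file.txt", "output_file.txt"],
            check=True,
        )
        subprocess.run(
            ["python", "-B", "main.py", "decompress", "output_file.txt", "decoded_file.txt"],
            check=True,
        )
        # The decoded file should be byte-for-byte identical to the original.
        self.assertTrue(filecmp.cmp("input_file.txt", "decoded_file.txt", shallow=False))

if __name__ == "__main__":
    unittest.main()
```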
- input_file.txt: A sample input file to demonstrate compression.
- output_file.txt: The output file generated after applying Huffman compression.
- decoded_file.txt: The file generated after decompression, which should match the original input.
FileZipper/
├── Data/
└── src/
    ├── encoder/
    │   └── huffman_encoder.py
    ├── decoder/
    │   └── huffman_decoder.py
    ├── main.py
    └── test_huffman.py
One of the core aspects of this project is to explore the trade-off between compression ratio and execution time:
- Compression Ratio: Huffman encoding can significantly reduce file size, particularly for files with many repeated characters.
- Execution Time: While the algorithm is efficient, compression and decompression times increase with larger files and more complex character distributions.
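A quick, hands-on way to explore this trade-off is to time the documented compress command and compare the input and output sizes. The snippet below is a rough sketch that assumes it is run from the `src` folder against the sample files; the ratio and timing you observe will depend on the file's size and character distribution.

```python
# Measure compression ratio and wall-clock time for one compress run.
import os
import subprocess
import time

input_path, output_path = "input_file.txt", "output_file.txt"

start = time.perf_counter()
subprocess.run(["python", "-B", "main.py", "compress", input_path, output_path], check=True)
elapsed = time.perf_counter() - start

original = os.path.getsize(input_path)
compressed = os.path.getsize(output_path)
print(f"compression ratio: {original / compressed:.2f}x in {elapsed:.3f}s")
```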
- Add support for compressing multiple input files into a single compressed file.
- Implement functionality to decompress a single compressed file back into multiple files.