author | date |
---|---|
Pulkit Goyal |
May 11, 2021 |
Problem Statement - ./docs/Data_Coding_Challenge.pdf
This problem is perfectly solved with a priority queue, which is a data structure that attaches a priority to every element and serves them in increasing (or decreasing) priority. Priority queues are implemented using heaps. The relevant time complexities of heaps using binary trees are as follows -
- Inserting a new element in a heap (of size n) takes O(log(n)) time
- Finding the lowest (or highest) priority element takes O(1) time
- Creating a heap out of an unsorted list (of size n) takes O(n) time
The presented solution (in Python) uses the heapq
module of the Python Standard Library to implement the priority queue using binary heaps.
The worst case performance of the algorithm is as follows:
- Time Complexity -
O((N-X).log(X))
- Space Complexity -
O(X)
where,
X
- Number of elements to be foundN
- Size of the input data
- Unix-like OS
python3
This program was developed in Python3.8 on MacOS
There are two options to pass the data stream to be processed to this program,
-
Pass the file path (relative or absolute) as an argument
python3 heapx.py X file
-
Pass the data via stdin
# Use the shell redirection operator '<' python3 heapx.py X <file
or
# Pipe it cat file | python3 heapx.py X
where,
X
- Number of elements to be foundfile
- Name of the file containing the data
I ran all the different permutations of sample data provided in the problem statement through my script (with X = 3), and compared it against the expected output.
This test is written in the
test.py
file. The sample data with the solution is stored under the\data
folder.
Use the following command to run the tests,
python3 -m unittest test