Skip to content
This repository was archived by the owner on Jun 23, 2023. It is now read-only.

Outputs a list of the unique ids associated with the X-largest values given a large data stream.

Notifications You must be signed in to change notification settings

pulkitgoyal56/x-largest_values

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

1 Commit
 
 
 
 
 
 
 
 
 
 

Repository files navigation

author date
Pulkit Goyal
May 11, 2021

X LARGEST VALUES

Problem Statement - ./docs/Data_Coding_Challenge.pdf


  1. Conceptual Overview
  2. Technical Overview
  3. Running
    1. Install Requirements
    2. Execute
  4. Testing

Conceptual Overview

This problem is perfectly solved with a priority queue, which is a data structure that attaches a priority to every element and serves them in increasing (or decreasing) priority. Priority queues are implemented using heaps. The relevant time complexities of heaps using binary trees are as follows -

  • Inserting a new element in a heap (of size n) takes O(log(n)) time
  • Finding the lowest (or highest) priority element takes O(1) time
  • Creating a heap out of an unsorted list (of size n) takes O(n) time

Technical Overview

The presented solution (in Python) uses the heapq module of the Python Standard Library to implement the priority queue using binary heaps.

The worst case performance of the algorithm is as follows:

  1. Time Complexity - O((N-X).log(X))
  2. Space Complexity - O(X)

where,

  • X - Number of elements to be found
  • N - Size of the input data

Running

Install Requirements

  • Unix-like OS
  • python3

This program was developed in Python3.8 on MacOS

Execute

There are two options to pass the data stream to be processed to this program,

  1. Pass the file path (relative or absolute) as an argument

    python3 heapx.py X file
  2. Pass the data via stdin

    # Use the shell redirection operator '<'
    python3 heapx.py X <file

    or

    # Pipe it
    cat file | python3 heapx.py X

where,

  • X - Number of elements to be found
  • file - Name of the file containing the data

Testing

I ran all the different permutations of sample data provided in the problem statement through my script (with X = 3), and compared it against the expected output.

This test is written in the test.py file. The sample data with the solution is stored under the \data folder.

Use the following command to run the tests,

python3 -m unittest test

About

Outputs a list of the unique ids associated with the X-largest values given a large data stream.

Topics

Resources

Stars

Watchers

Forks

Languages