
Comparing Data Structures #401

Open
sjathin opened this issue May 23, 2021 · 11 comments
@sjathin
Contributor

sjathin commented May 23, 2021

Description of the problem

Feature Request: Multiple data structures can solve the same problem; this feature would help users choose the best-suited data structure for a particular workload.

Example of the problem

Array vs Stack
  1. Access: O(1) vs O(N)
  2. Insertion: O(N) vs O(1)

References/Other comments

The example given here is very generic and basic, there can be multiple metrics to compare data structures.
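The Array-vs-Stack tradeoff above can be observed empirically. A minimal sketch, using plain Python structures in place of pydatastructs classes (a `list` standing in for the array, and a `list` used LIFO as the stack):

```python
import timeit

N = 100_000
arr = list(range(N))

# Array: index access is O(1); inserting at the front shifts elements, O(N).
access_time = timeit.timeit(lambda: arr[N // 2], number=1000)
insert_time = timeit.timeit(lambda: arr.insert(0, -1), number=1000)

# Stack: pushing at the top is O(1).
stack = []
push_time = timeit.timeit(lambda: stack.append(1), number=1000)

print(f"array access : {access_time:.6f}s")
print(f"array insert : {insert_time:.6f}s")
print(f"stack push   : {push_time:.6f}s")
```

On typical runs the front-insertion timing dominates the other two, matching the asymptotic comparison above.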

@sjathin
Contributor Author

sjathin commented May 23, 2021

It would be good to discuss how to show this comparison (a document?) and which metrics can be used to compare data structures.

Some initial metrics (CRUD):

  1. Access
  2. Search
  3. Insert/delete

@czgdp1807
Member

I was wondering if we can compare the common methods of two data structures or algorithms. What I mean is comparing two APIs which perform the same function. For example, if we want to compare two trees, AVL Tree and Red-Black Tree, there are two classes in pydatastructs for these, and we can check the timings and space requirements of the two trees using the proposed feature. Consider the following example.

Say there is a function, pydatastructs.compare(obj1, obj2), and two binary trees under consideration, avl_obj = AVLTree() and rb_obj = RedBlackTree(). Then I would call pydatastructs.compare(avl_obj, rb_obj). Some common methods of avl_obj and rb_obj are insert, delete and search. The call may return the comparison data in JSON format with the ratio of timings for each method of both objects. Alternatively, the compare function may just accept class/function names, as in pydatastructs.compare(AVLTree, RedBlackTree), and create objects inside its definition. Moreover, something like pydatastructs.compare(AVLTree, merge_sort) should raise a ValueError because these two APIs are non-comparable for the same data.

Let me know what you think.
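The common-method detection and the ValueError contract described above could be sketched as follows. The `common_methods` helper and both stub classes are hypothetical stand-ins, not pydatastructs API:

```python
def common_methods(cls1, cls2):
    """Hypothetical helper behind a compare() API: return the public
    methods two classes share; raise ValueError when the APIs are
    non-comparable (no shared methods)."""
    def public(cls):
        return {name for name in dir(cls)
                if not name.startswith('_') and callable(getattr(cls, name))}
    shared = public(cls1) & public(cls2)
    if not shared:
        raise ValueError(
            f"{cls1.__name__} and {cls2.__name__} are non-comparable: "
            "no common methods")
    return shared

# Stubs standing in for the real AVLTree / RedBlackTree classes.
class AVLTreeStub:
    def insert(self, key): pass
    def delete(self, key): pass
    def search(self, key): pass

class RedBlackTreeStub:
    def insert(self, key): pass
    def delete(self, key): pass
    def search(self, key): pass

print(sorted(common_methods(AVLTreeStub, RedBlackTreeStub)))
# → ['delete', 'insert', 'search']
```

A real implementation would then time each shared method on both objects; comparing a tree class against a sort function would hit the ValueError branch.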

@czgdp1807 czgdp1807 added enhancement New feature or request performance labels May 24, 2021
@sjathin
Contributor Author

sjathin commented May 27, 2021

Yes, that's very accurate. I have a few more points we can talk about:

  1. Comparing different data structure types, for instance pydatastructs.compare(AVLTree, LinkedList) - here as well there are common methods that can be compared; for instance, given a set of numbers, what's the time/space to create a tree/list?
  2. I really think we can start with a document/wiki page to document the design, so that it would be easier to talk about what can be compared and how we would go about it.
  3. I still have a question on how we would show this comparison.
    a. One option: when we run the pydatastructs.compare(obj1, obj2) API, show the time and space for each common method?
  4. I think it would be better to add an API like pydatastructs.compare(*objects) so that we can compare multiple data structures.

Thoughts on this?

@czgdp1807
Member

> 1. Comparing different data structure types, for instance `pydatastructs.compare(AVLTree, LinkedList)` - here as well there are common methods that can be compared; for instance, given a set of numbers, what's the time/space to create a tree/list?

Agreed. In fact, that would help in making the APIs consistent.

> 2. I really think we can start with a document/wiki page to document the design, so that it would be easier to talk about what can be compared and how we would go about it.

We can discuss and document the design here in this thread and edit the original post with the final design.

> 3. One option: when we run the `pydatastructs.compare(obj1, obj2)` API, show the time and space for each common method?

Yes. We can show that in JSON format for each data structure. For two arguments the comparison would be more detailed (ratios, which one is how much percent better than the other, etc.), and for multiple ones we can just sort the data structures with respect to the time and space of each common method. For example, if there are 3 common methods between ds1, ds2 and ds3, then we would receive 3 × 2 orderings, one each for every common method's time and space. Anyway, this is just speculation; once we start implementing it, the output format will become clearer.

> 4. I think it would be better to add an API like `pydatastructs.compare(*objects)` so that we can compare multiple data structures.

Agreed.

@codezonediitj codezonediitj deleted a comment from Rajveer100 Feb 23, 2022
@Abekaesh
Contributor

Hi,
I would like to work on this issue under GSOC 2023.

In order to compare data structures, it is also useful to report the features or functions that exist in one data structure but not in the other. I think we can compare the data structures on basic operations such as insertion, deletion, and traversal to get an element at a position, plus the extra operations that one data structure supports and the other doesn't, on the same dataset, with their time and space.
For example, if we call pydatastructs.compare(obj1, obj2), we could return JSON output such as: {obj1: {operation1: {time: <>, space: <>}, operation2: {time: <>, space: <>}}, obj2: {operation1: {time: <>, space: <>}, operation2: {time: <>, space: <>}, operation3: {time: <>, space: <>}}}
In the above format, operation3 under obj2 is an operation that obj1 does not have. This makes the output more helpful for analysing the data structures: users can see the extra functions of each data structure along with their time and space costs.

Any suggestions or thoughts on this?
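The report shape described above could look like the following. The `obj1`/`obj2`/`operationN` names and all numbers are placeholders, not measured values:

```python
import json

# Hypothetical comparison report: per object, per operation,
# measured time (seconds) and space (bytes). Values are placeholders.
report = {
    "obj1": {
        "insert": {"time": 0.012, "space": 2048},
        "search": {"time": 0.004, "space": 512},
    },
    "obj2": {
        "insert": {"time": 0.009, "space": 4096},
        "search": {"time": 0.006, "space": 512},
        # An operation obj2 supports but obj1 does not:
        "split":  {"time": 0.020, "space": 1024},
    },
}

print(json.dumps(report, indent=2))
```

Serialising through `json.dumps` keeps the output consumable by other tools (dashboards, CI checks) rather than only human-readable.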

@czgdp1807
Member

I think the comparison should be done for the set of operations common to both data structures. If nothing is common, we can return an empty JSON output along with a warning that no operation is common between the desired data structures.
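A minimal sketch of that behaviour (the `compare_common` helper is hypothetical; it takes operation-name lists rather than real objects):

```python
import json
import warnings

def compare_common(ops1, ops2):
    """Compare only the operations shared by both inputs; on an empty
    intersection, emit a warning and return an empty JSON object."""
    shared = set(ops1) & set(ops2)
    if not shared:
        warnings.warn("no operation is common between the data structures")
        return json.dumps({})
    # Placeholder payload; a real implementation would attach timings here.
    return json.dumps({op: "compare here" for op in sorted(shared)})

print(compare_common(["insert", "search"], ["push", "pop"]))  # warns, prints {}
```

Returning `{}` rather than raising keeps multi-structure comparisons usable even when one pair has nothing in common.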

@Ahmed-Gamal24

Hi @sjathin

I am interested in working on this issue as part of my contribution to PyDataStructs. I have reviewed the requirements and would like to propose the following approach for implementing the Comparing Data Structures feature.

Proposed Approach

Define Comparison Metrics

To effectively compare data structures, I plan to implement the following key metrics:

  • Time Complexity: Measure the execution time of key operations such as:

    • insert()
    • delete()
    • search()
    • access()

    I will use the timeit module for accurate time measurement.

  • Space Complexity: Measure memory consumption using:

    • sys.getsizeof() for basic object size.
    • tracemalloc module for deeper memory profiling.
  • Scalability & Performance:

    • Run benchmarks on different input sizes (e.g., 1000, 10,000, 100,000 elements).
    • Evaluate performance under different loads (single-threaded vs. multi-threaded execution).
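The time and memory metrics above can be combined in one helper. A sketch under the assumption that each benchmark is wrapped as a zero-argument callable (the `measure` name is illustrative):

```python
import timeit
import tracemalloc

def measure(op, repeat=5):
    """Measure one operation's best wall-clock time (timeit.repeat)
    and peak allocated memory (tracemalloc)."""
    elapsed = min(timeit.repeat(op, number=1, repeat=repeat))
    tracemalloc.start()
    op()
    _, peak = tracemalloc.get_traced_memory()
    tracemalloc.stop()
    return {"time": elapsed, "peak_bytes": peak}

# Example: building a 10,000-element list stands in for an insert benchmark.
result = measure(lambda: list(range(10_000)))
print(result)
```

Taking the minimum over repeats follows the usual `timeit` guidance: the fastest run is the least perturbed by other system activity.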

API Design

I propose creating a function-based API for ease of use:

import timeit

def compare_data_structures(ds1, ds2, operations, num_elements=1000):
    """
    Compare two data structures based on execution time and memory usage.

    Parameters:
        ds1, ds2: Data structure instances to compare.
        operations: List of operations to benchmark (e.g., ['insert', 'search']).
        num_elements: Number of elements to test performance with.

    Returns:
        A dictionary containing comparison results.
    """
    # Minimal body sketched in; assumes each operation takes a single key.
    results = {}
    for ds in (ds1, ds2):
        results[type(ds).__name__] = {
            op: timeit.timeit(
                lambda m=getattr(ds, op): [m(i) for i in range(num_elements)],
                number=1)
            for op in operations if hasattr(ds, op)
        }
    return results

Documentation

I will add:

  • Clear examples of how to use the function in real-world scenarios.
  • An explanation of each metric, what it represents, and how users can interpret the results.

Next Steps

Before starting, I would love to get your feedback on this approach. Do you have any suggestions or additional requirements?
Looking forward to contributing

Best,
Ahmed


@Ahmed-Gamal24
Hi @sjathin ,

I was thinking about how we could make the Comparing Data Structures feature even more user-friendly. In addition to running comparisons in a script, what if we develop a VS Code extension where users can:

  • Select two data structures (e.g., List vs. Heap).
  • Choose operations to benchmark (insert, search, delete).
  • Enter the number of elements to test performance on.
  • View results directly in a small UI panel inside VS Code.

**Proposed Implementation**

  • Use Node.js + Python to execute comparisons inside VS Code.
  • Use VS Code WebView API to create an interactive UI.
  • Display results in tables or simple charts inside VS Code.

Would love to hear your thoughts on this! Do you think this could be a useful enhancement?

Best,
Ahmad

@khadar1020

Hello @sjathin, I'm Khadar. I did GSoC '24 with Sugar Labs and would love to contribute to this project.

Thank you

@Saahi30

Saahi30 commented Mar 6, 2025

Hi @czgdp1807, @anutosh491

@Ahmed-Gamal24 has already laid out a solid approach that serves the purpose well. However, I believe there are a few key areas that could be improved:

Identified Gaps & Proposed Fixes

No Support for Edge Cases in Data

The current approach doesn't test various data distributions (sorted, reverse-sorted, duplicate-heavy, sparse, dense).

Fix: Introduce a Mutation Testing Mode to evaluate data structures under different conditions.

Limited Metrics for Comparison
Only execution time and memory usage are considered, but real-world applications often require:

  • Throughput (operations per second)
  • Cache performance (locality of reference)
  • Concurrency behavior (lock contention, multi-threading impact)

Fix: Extend metrics to include these additional performance factors.
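The throughput metric suggested above can be sketched as a fixed-budget counter (the `throughput` helper is illustrative, not an existing API):

```python
import time

def throughput(op, duration=0.1):
    """Count how many times a zero-argument operation completes within
    a fixed wall-clock budget and return operations per second."""
    count = 0
    deadline = time.perf_counter() + duration
    while time.perf_counter() < deadline:
        op()
        count += 1
    return count / duration

stack = []
print(f"{throughput(lambda: stack.append(1)):.0f} ops/sec")
```

Unlike per-call timing, a fixed-budget count stays meaningful for very fast operations where individual timings fall below clock resolution.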

Additionally, I’d like to highlight my experience in the required tech stack, ensuring I can implement all the functionalities proposed by User @Ahmed-Gamal24 with high quality and robustness.

Would love to hear your thoughts on these additions.

@Saahi30

Saahi30 commented Mar 6, 2025

I also thought of some features around the same problem. I know these might seem ambitious, but I'd love to get everyone's feedback; if any of these enhancements sound significant and worth exploring, please let me know.

Proposed Advanced Features

Instant Performance Profiling in IDEs

  • VS Code/Jupyter Extension: Run benchmarks inside the editor, view results in a sidebar panel, and get inline data structure optimization suggestions.
  • CLI with Rich Output: ASCII charts, color-coded performance indicators, and progress bars for dynamic tracking.

Mutation Testing Mode

  • Modify datasets (sorted, reversed, duplicate-heavy, sparse, dense) to analyze behavior under edge conditions.

Adaptive Benchmarking Mode

  • Dynamically adjust input sizes based on system performance.
  • If execution time exceeds 10 seconds, reduce the sample size automatically.
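The adaptive fallback described above can be sketched as a loop over candidate sizes (names here are illustrative; `op_for_n` builds a zero-argument benchmark for a given input size):

```python
import timeit

def adaptive_benchmark(op_for_n, sizes=(100_000, 10_000, 1_000), budget=10.0):
    """Try the largest input size first; fall back to smaller sizes
    while a single run exceeds the time budget (seconds)."""
    for n in sizes:
        elapsed = timeit.timeit(op_for_n(n), number=1)
        if elapsed <= budget:
            return n, elapsed
    # Every size was over budget; report the smallest attempt.
    return sizes[-1], elapsed

# Example: sorting a reverse-ordered range as the benchmarked operation.
n, elapsed = adaptive_benchmark(lambda n: (lambda: sorted(range(n, 0, -1))))
print(n, elapsed)
```

Sorting 100,000 integers comfortably fits the 10-second budget, so the largest size is kept; a slower operation would automatically degrade to the smaller samples.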

Visual Performance Reports

  • Interactive Dashboard (Streamlit/Dash) with filters for operation type, data size, and trends.
  • Heatmaps to highlight strengths and weaknesses.
  • Exportable Reports (PNG/CSV) for sharing and documentation.

Looking forward to your thoughts. Let me know if any feature should be explored further.
