Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Comparing Data Structures #401

Open
sjathin opened this issue May 23, 2021 · 6 comments
Open

Comparing Data Structures #401

sjathin opened this issue May 23, 2021 · 6 comments

Comments

@sjathin
Copy link
Contributor

sjathin commented May 23, 2021

Description of the problem

Feature Request: As there are multiple data structures available to solve a particular problem, this feature allows one to choose the most optimized data structure for a particular problem.

Example of the problem

Array vs Stack
  1. Access: O(1) vs O(N)
  2. Insertion: O(N) vs O(1)

References/Other comments

The example given here is very generic and basic, there can be multiple metrics to compare data structures.

@sjathin
Copy link
Contributor Author

sjathin commented May 23, 2021

It would to good to discuss, how to show this comparison? A document? And all the metrics that can be used to compare data structures.

Some initial metrics (CRUD):

  1. Access
  2. Search
  3. Insert/delete

@czgdp1807
Copy link
Member

I was wondering if we can compare the common methods of two data structures or algorithms. Technically, what I mean is that comparing two APIs which perform the same function. Say, for example if we want to compare two trees, AVL Tree and Red Black Tree, then there are two classes in pydatastructs for these and we can check the timings and space requirements of these two trees using the proposed feature. Consider the following example.

Say there is a class, pydatastructs.compare(obj1, obj2) and there are two binary trees under consideration, avl_obj = AVLTree() and rb_obj = RedBlackTree(). Then I would call, pydatastructs.compare(avl_obj, rb_obj). Some common methods of avl_obj and rb_obj are insert, delete, search. The above call may return the comparison data in JSON format with ratio of timings for each method of both the objects. Or in fact, the compare function may just accept class/function names as pydatastructs.compare(AVLTree, RedBlackTree) and create objects inside its definition. Moreover, doing something like, pydatastructs.compare(AVLTree, merge_sort) should give a ValueError because these two APIs are non-comparable for the same data.

Let me know what you think.

@czgdp1807 czgdp1807 added enhancement New feature or request performance labels May 24, 2021
@sjathin
Copy link
Contributor Author

sjathin commented May 27, 2021

Yes that's very accurate. I do have few more points that we can talk about:

  1. Comparing different data structure types for instance - pydatastructs.compare(AVLTree, LinkedList) - here as well there are common methods that can be compared, for instance given a set of numbers, what's the time/space to create a tree /list.
  2. I really think we can start with a document/wiki page, so that we can document the design, so that it would be easier to talk about what can be compared and how would we go about this.
  3. Although, I still have a question on how would we show this comparison?
    a. One option when we run the pydatastructs.compare(obj1, obj2) API, show the time and space for each common method?
  4. I think it would be better to add a API like pydatastructs.compare(*objects) so that we can compare multiple data structures

Thoughts on this?

@czgdp1807
Copy link
Member

1. Comparing different data structure types for instance - `pydatastructs.compare(AVLTree, LinkedList)` - here as well there are common methods that can be compared, for instance given a set of numbers, what's the time/space to create a tree /list.

Agreed. In fact that would help in making the APIs consistent.

2\. I really think we can start with a document/wiki page, so that we can document the design, so that it would be easier to talk about what can be compared and how would we go about this.

We can discuss and document the design here in this thread. We can edit the original post with the final design.

3\. One option when we run the `pydatastructs.compare(obj1, obj2)` API, show the time and space for each common method?

Yes. We can show that in JSON format for each data structure. For two arguments the comparison would be more detailed (ratios, which one is how much percent better than the other, etc.) and for multiple ones, we can just sort the data structures with respect to the time and space of each common method. For example if there are 3 common methods between ds1, ds2 and ds3 then, we would receive, 3 x 2 orders for each common method's time and space. Anyways, this is just speculation, once we start implementing it the output format will become more clear.

4\. I think it would be better to add a API like `pydatastructs.compare(*objects)` so that we can compare multiple data structures

Agreed.

@codezonediitj codezonediitj deleted a comment from Rajveer100 Feb 23, 2022
@Abekaesh
Copy link
Contributor

Hi,
I would like to work on this issue under GSOC 2023.

In order to compare data structures, it is essential to give the features or functions which are not in one of the data structures. I think that we can compare the data structures on the basic operations such as insertion, deletion, traversing to get element at a position and extra operations that can be done by one data structure which other can't with the same dataset with its time and space.
For example if we do pydatastructs.compare(obj1, obj2), we need to return a json format output ie, {obj1:{operation1:{time:<>,space:<>}, operation2:{time:<>,space:<>}},obj2:{operation1:{time:<>,space:<>}, operation2:{time:<>,space:<>},operation3:{time:<>,space:<>}}}
In the above output format operation3 in obj2 is another operation or function which obj1 doesnot have. Now this can be of more helpful to analyse the data structures.
It helps the user to analyse the extra functions of data structures with their time and space for their completion.

Any suggestions or thoughts on this?

@czgdp1807
Copy link
Member

I think comparison should be done for the set of operations which are common to both the data structures. If nothing is common then an empty JSON output along with a warning that no operation is common between the desired data structures.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

3 participants