Update main with change in memory-optimization (#799)
* Fixed testing to run on all feature branches for PRs (#793)

* cleanup time space analysis code (#797)

* quick update to feature/memory-optimization for merge to `main` (#802)

* Encode update format (#789)

* Update categorical test

* Fix encode tests to agree with new JSONEncoder

Fix categorical column tests

Fix DateTimeColumn tests

Fix IntColumn tests

Fix NumericStatsMixin tests

Fix OrderColumn tests

Fix BaseColumn tests

* Remove unnecessary cast() in csv_data.py (1) (#796)

Removing this cast() doesn't cause a mypy error.

self._header is set to type Optional[Union[str, int]] in the
constructor. Also, self.guess_header_row() has return type
Optional[int], so casting to int doesn't make sense here.

---------

Co-authored-by: Kshitij Sinha <55467782+kshitijavis@users.noreply.github.com>
Co-authored-by: Junho Lee <53921230+junholee6a@users.noreply.github.com>

* Update feat mem (#803)

* Encode update format (#789)

* Update categorical test

* Fix encode tests to agree with new JSONEncoder

Fix categorical column tests

Fix DateTimeColumn tests

Fix IntColumn tests

Fix NumericStatsMixin tests

Fix OrderColumn tests

Fix BaseColumn tests

* Remove unnecessary cast() in csv_data.py (1) (#796)

Removing this cast() doesn't cause a mypy error.

self._header is set to type Optional[Union[str, int]] in the
constructor. Also, self.guess_header_row() has return type
Optional[int], so casting to int doesn't make sense here.

* Remove unnecessary cast() in csv_data.py (2) (#798)

---------

Co-authored-by: Kshitij Sinha <55467782+kshitijavis@users.noreply.github.com>
Co-authored-by: Junho Lee <53921230+junholee6a@users.noreply.github.com>

---------

Co-authored-by: ksneab7 <91956551+ksneab7@users.noreply.github.com>
Co-authored-by: Kshitij Sinha <55467782+kshitijavis@users.noreply.github.com>
Co-authored-by: Junho Lee <53921230+junholee6a@users.noreply.github.com>
4 people authored Apr 27, 2023
1 parent 23da09d commit d9648fb
Showing 3 changed files with 15 additions and 11 deletions.
4 changes: 3 additions & 1 deletion .github/workflows/test-python-package.yml
@@ -5,7 +5,9 @@ name: Test Python Package

 on:
   pull_request:
-    branches: [ main ]
+    branches:
+      - 'main'
+      - 'feature/**'

 jobs:
   build:
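
Note on the trigger change: for `pull_request` events, the `branches` filter is matched against the base (target) branch of the pull request, so this hunk extends the test workflow to PRs that target any `feature/...` branch. A rough sketch of that matching in Python is below. It uses `fnmatch` as a stand-in for GitHub's own filter-pattern matcher, which is only an approximation (in `fnmatch` a single `*` also crosses `/`, whereas GitHub's single `*` does not), and the function name `pr_triggers_workflow` is hypothetical.

```python
from fnmatch import fnmatchcase

# Patterns taken from the updated workflow file.
BRANCH_PATTERNS = ("main", "feature/**")


def pr_triggers_workflow(base_branch: str, patterns=BRANCH_PATTERNS) -> bool:
    """Return True if a PR targeting `base_branch` matches any pattern.

    Approximation only: fnmatch stands in for GitHub's filter-pattern rules.
    """
    return any(fnmatchcase(base_branch, pattern) for pattern in patterns)


if __name__ == "__main__":
    for branch in ("main", "feature/memory-optimization", "dev"):
        print(branch, pr_triggers_workflow(branch))
```

Under the old filter (`[ main ]`), only the first of these three base branches would have matched.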
@@ -279,7 +279,7 @@ def dp_space_time_analysis(
         )
         print(f"Dataset of size {max(SAMPLE_SIZES)} created.")
     else:
-        full_dataset = dp.Data(DATASET_PATH)
+        _full_dataset = dp.Data(DATASET_PATH)

     dp_space_time_analysis(
         _rng,
@@ -7,18 +7,20 @@ testing mechanism for both structured and unstructured datasets.

 # Structured Dataset Throughput Evaluation

-The test script `structured_throughput_testing.py` has been provided to simplify
+The test script `structured_space_time_analysis.py` has been provided to simplify
 the throughput testing procedure. Simply running the script will provide a
 printed output as well as four files and saved to the working directory of where
 the script was ran.

-* `structured_profile_times.json`: dict of total time, time to merge, and
+* `time_analysis/structured_profile_times.json`: dict of total time, time to merge, and
 runtimes for each of the profiled functions within the library
-* `structured_profile_times.csv`: a flattened table of the above json
-* `profile_space_analysis.bin`: a bin file that contains information on the
+* `time_analysis/structured_profile_times.csv`: a flattened table of the above json
+* `space_analysis/profile_space_analysis_*.bin`: a bin files that contain information on the
 spatial analysis of running the dp.Profiler function
-* `merge_space_analysis.bin`: a bin file that contains information on the
+* `space_analysis/merge_space_analysis_*.bin`: a bin files that contain information on the
 spatial analysis of merging two profiles together
+* `time_analysis/time_report_*.txt`: a text file that shows the total time taken for
+profiling and merging a dataset

 Total time and merge time can be used for comparing the overall runtime changes,
 whereas the individual function times can detail bottlenecks or speed changes as
@@ -27,14 +29,14 @@ a result of alterations to a property's calculation.
 The spatial analysis `bin` files can be viewed in different report formats with memray.
 For example running:
 ```console
-python3 -m memray flamegraph profile_space_analysis.bin -o profile_space_analysis.html
+python3 -m memray flamegraph profile_space_analysis*.bin -o profile_space_analysis.html
 ```
 Gives a html formatted flamegraph that displays the distribution of space allocated by
 function calls involved in the dp.Profiler

 The script can be run as follows:
 ```console
-python structured_throughput_testing.py
+python structured_space_time_analysis.py
 ```

 ### Tunable parameters
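
Note on the output files: the documentation hunks above describe `structured_profile_times.csv` as a flattened table of `structured_profile_times.json`. A minimal sketch of one way such a flattening could work is below. The key names in the sample dict and the helper `flatten_times` are assumptions for illustration; they are not the script's actual schema or code.

```python
from typing import Any, Dict


def flatten_times(
    nested: Dict[str, Any], prefix: str = "", sep: str = "."
) -> Dict[str, Any]:
    """Collapse a nested dict into one level, joining keys with `sep`."""
    flat: Dict[str, Any] = {}
    for key, value in nested.items():
        name = f"{prefix}{sep}{key}" if prefix else key
        if isinstance(value, dict):
            # Recurse into sub-dicts, carrying the joined key as the prefix.
            flat.update(flatten_times(value, name, sep))
        else:
            flat[name] = value
    return flat


if __name__ == "__main__":
    # Hypothetical shape: total time, merge time, and per-function runtimes.
    sample = {
        "total_time": 12.5,
        "merge_time": 1.25,
        "functions": {"min": 0.5, "max": 0.25},
    }
    print(flatten_times(sample))
```

Each flattened key can then serve as a CSV column name, which keeps total time, merge time, and per-function runtimes side by side for comparison across runs.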
@@ -108,12 +110,12 @@ data.to_csv('data/time_structured_profiler.csv', index=False)


 ### Obtaining outputs
-- Run `python structured_spect_time_analysis.py`
+- Run `python structured_space_time_analysis.py`
 - This will output:
   - `.bin` files in the `./space_analysis` folder:
     - To generate readable flamegraph reports run:
     ```console
-    ./create_flamegraph.sh
+    ./create_flamegraphs.sh
     ```
   - Text files in the `./time_analysis` folder
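
Note on flamegraph generation: the contents of `create_flamegraphs.sh` are not shown in this diff. As a hedged sketch of an equivalent step, the snippet below builds one `memray flamegraph` command (the invocation form quoted in the documentation above) per `.bin` file in `./space_analysis`. The helper name `flamegraph_commands` and the choice to write each `.html` report next to its `.bin` file are assumptions.

```python
import subprocess
import sys
from pathlib import Path
from typing import List


def flamegraph_commands(bin_dir: str = "space_analysis") -> List[List[str]]:
    """Build one `memray flamegraph` command per .bin file in `bin_dir`."""
    commands = []
    for bin_file in sorted(Path(bin_dir).glob("*.bin")):
        # Assumption: the report is written alongside the input file.
        html_file = bin_file.with_suffix(".html")
        commands.append(
            [sys.executable, "-m", "memray", "flamegraph",
             str(bin_file), "-o", str(html_file)]
        )
    return commands


if __name__ == "__main__":
    # Requires memray to be installed; does nothing if no .bin files exist.
    for command in flamegraph_commands():
        subprocess.run(command, check=True)
```

Building the commands separately from running them makes it easy to print and inspect them before any reports are generated.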
