Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Timer function wrapper and instant timer #90

Open
wants to merge 10 commits into
base: main
Choose a base branch
from
Open

Timer function wrapper and instant timer #90

wants to merge 10 commits into from

Conversation

eatpk
Copy link
Collaborator

@eatpk eatpk commented Feb 5, 2024

Details below

return psutil.swap_memory().used


class FunctionTimer:
Copy link
Collaborator Author

@eatpk eatpk Feb 5, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a wrapper for the particular function:
can be used as the fowlloing

timer = FunctionTimer()

@timer
def some_function_to_time():
    time.sleep(1)
@timer
def another_function_to_time():
    time.sleep(2)


some_function_to_time()
another_function_to_time()

print(timer.get_log())

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can we add name as an argument for FunctionTimer? For example, we may want to do something like below:

@timer("some")
def some_function_to_time():
    time.sleep(1)

@timer("another")
def another_function_to_time():
    time.sleep(2)

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@timer("some") # label = "some"
def some_function_to_time():
    time.sleep(1)

@timer # label = "another_function_to_time" # which is equivalent to func.__name__
def another_function_to_time():
    time.sleep(2)

Now it supports both.

return self.log


class Timer:
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This timer is used in a way like..

timer= Timer()
timer_name = "some_function"
timer.start_timer(timer_name, host_timer=False)
timer.stop_timer(timer_name)

timer_log = timer.get_info() # returns dictionary for each name.
print(f" elapsed time is {timer_log[timer_name]} miliseconds")

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What do you think about assigning one timer for each AnaLog instance (or each HookLogger), and print all timer logs in update with a nice-looking format?

Copy link
Collaborator Author

@eatpk eatpk Feb 6, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am all in for that, but I am bit reluctant to shovel the Timer()s into the current code, since..

  1. The code can get slower if I set timer for every bit.
  2. To solve 1, I would have to put if timer_mode: then time the following approach, so I think this may make the code bit more dirty.(a lot of if else branches)
  3. The resoultion for this is to make the current tests codes to Time, and record in the PR actions.

I would like to ask your opinion on this.. I will follow whatever you suggest!

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the comment! Can you measure the slow down caused by Timer in the MNIST example (w/ and w/o LoRA). I suspect that time.time() would only incur a negligible cost compared to neural network operations. If it's negligible in the MNIST example, which is one of the smallest NN experiments, we can ignore the cost in the larger scale experiments.

Copy link
Collaborator Author

@eatpk eatpk Feb 19, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

WARNING:root:CUDA is not set, setting the timer is set to host timer.
[2024-02-18 23:28:50] [INFO] 1: Linear(in_features=784, out_features=512, bias=False)
[2024-02-18 23:28:50] [INFO] 3: Linear(in_features=512, out_features=256, bias=False)
[2024-02-18 23:28:50] [INFO] 5: Linear(in_features=256, out_features=10, bias=False)
[2024-02-18 23:28:50] [INFO] Tracking the following modules:
[2024-02-18 23:28:50] [INFO] Total number of parameters: 535,040

logging time: 28.088079929351807
Function Timer Logs:
  register_all_module_hooks:
     operation costs 0.0004165172576904297 seconds
  _forward_hook_fn:
     operation costs 0.013933420181274414 seconds
  _grad_hook_fn:
     operation costs 0.0003361701965332031 seconds
  _backward_hook_fn:
     operation costs 0.0050966739654541016 seconds
  buffer_write:
     operation costs 0.09905576705932617 seconds
  flush:
     operation costs 5.245208740234375e-06 seconds
  update:
     operation costs 0.10047507286071777 seconds
  build_log_dataset:
     operation costs 0.0857551097869873 seconds
  build_log_dataloader:
     operation costs 0.08604812622070312 seconds
  precondition:
     operation costs 0.12075304985046387 seconds
  compute_influence_all:
     operation costs 6.2306530475616455 seconds
Computation time: 6.238124847412109
Top influential data indices: [[1281, 5025, 2658, 3210, 5242, 5000, 140, 5811, 3804, 5280]]

This is how it is logged. I find that the logging part is 28 secs and ana part is 6.2 sec.
For the logging part, it seems like it is not coming from update but rather.. from the model training. It was not captured here. I tried other methods, but was not able to get it.

Can you check where else I may but the decorator?

Comment on lines 10 to 24
def get_gpu_memory(device_index=None):
return torch.cuda.memory_allocated(device_index)


def get_gpu_max_memory(device_index=None):
return torch.cuda.max_memory_allocated(device_index)


def get_device_memory():
process = psutil.Process(os.getpid())
return process.memory_info().rss


def get_cpu_swap_memory():
return psutil.swap_memory().used
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For memory, I think this is the best way to live monitor the memory pressure.
Say we do Operation A and want to measure the memory pressure.

If the memories are garbage collected after Operation A, the memory pressure cannot be accurately measured by subtracting after - before. Hence, we just need to print the memory inside the operation A using above functions.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We may need to do more research to figure out the best way to profile GPU memory usage. For instance, PyTorch has torch.profiler.profile, and we want to make sure that our implementation has some advantages over it (e.g. less extensive but faster).

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks pretty fun! I will split the memory profile with this PR.

@eatpk eatpk requested a review from sangkeun00 February 5, 2024 08:49
@eatpk eatpk force-pushed the timer branch 2 times, most recently from ae01df6 to 1f0c55b Compare February 19, 2024 04:43
requirements.txt Outdated
@@ -2,6 +2,5 @@ numpy
pandas
torch
einops

psutil
Copy link
Collaborator Author

@eatpk eatpk Feb 19, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I can just remove the psutil, since this is for the memory monitoring.. and it seems like the memory monitoring is not really doing anything at this point.(calculating the memory difference of before & after) We may need to do this via profiling as mentioned before

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I believe we can remove psutil for now. We can add it back once there is a request for CPU memory tracking from the users.

Copy link
Collaborator

@sangkeun00 sangkeun00 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Overall, looks good to me.

Two comments I have are:

  1. Instead of letting users manually print the timer log with FunctionTimer.print_log(), we can let AnaLog automatically print it, for example, in finalize.
  2. You may want to visually show the log in your terminal in the comment of this PR as below:
### TIMER LOG

@eatpk
Copy link
Collaborator Author

eatpk commented Feb 20, 2024

image Will print log like this @sangkeun00

@eatpk eatpk force-pushed the timer branch 2 times, most recently from b7042dc to 2c00270 Compare February 25, 2024 07:55
from torch.profiler import profile, ProfilerActivity


def memory_profiler(func):
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Addding this to the example codes will be added in the next PR.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants