Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Exponentially backs off retry attempts for sending usage data #117955

Merged
Show file tree
Hide file tree
Changes from 1 commit
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
291 changes: 187 additions & 104 deletions src/plugins/telemetry/public/services/telemetry_sender.test.ts
Original file line number Diff line number Diff line change
Expand Up @@ -127,67 +127,103 @@ describe('TelemetrySender', () => {
expect(telemetryService.getIsOptedIn).toBeCalledTimes(0);
expect(shouldSendReport).toBe(false);
});
});
describe('sendIfDue', () => {
let originalFetch: typeof window['fetch'];
let mockFetch: jest.Mock<typeof window['fetch']>;

describe('sendIfDue', () => {
let originalFetch: typeof window['fetch'];
let mockFetch: jest.Mock<typeof window['fetch']>;
beforeAll(() => {
originalFetch = window.fetch;
});

beforeAll(() => {
originalFetch = window.fetch;
});
beforeEach(() => (window.fetch = mockFetch = jest.fn()));
afterAll(() => (window.fetch = originalFetch));

beforeEach(() => (window.fetch = mockFetch = jest.fn()));
afterAll(() => (window.fetch = originalFetch));
it('does not send if shouldSendReport returns false', async () => {
const telemetryService = mockTelemetryService();
const telemetrySender = new TelemetrySender(telemetryService);
telemetrySender['shouldSendReport'] = jest.fn().mockReturnValue(false);
telemetrySender['retryCount'] = 0;
await telemetrySender['sendIfDue']();
Comment on lines +143 to +147
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(Thinking out loud, not related to the PR as it was already present) This is a code smell of bad isolation. Proper unit testing should never require to mock the blackboxed component.

e.g shouldSendReport mocking could be replaced with telemetryService.canSendTelemetry mocking (which would be the correct way to test the method's behavior btw). Here, we need to mock some of the TelemetrySender's methods to test the behavior of other methods of the class, which is not unit testing the class.


it('does not send if already sending', async () => {
const telemetryService = mockTelemetryService();
const telemetrySender = new TelemetrySender(telemetryService);
telemetrySender['shouldSendReport'] = jest.fn();
telemetrySender['isSending'] = true;
await telemetrySender['sendIfDue']();
expect(telemetrySender['shouldSendReport']).toBeCalledTimes(1);
expect(mockFetch).toBeCalledTimes(0);
});

expect(telemetrySender['shouldSendReport']).toBeCalledTimes(0);
expect(mockFetch).toBeCalledTimes(0);
});
it('does not send if we are in screenshot mode', async () => {
const telemetryService = mockTelemetryService({ isScreenshotMode: true });
const telemetrySender = new TelemetrySender(telemetryService);
await telemetrySender['sendIfDue']();

it('does not send if shouldSendReport returns false', async () => {
const telemetryService = mockTelemetryService();
const telemetrySender = new TelemetrySender(telemetryService);
telemetrySender['shouldSendReport'] = jest.fn().mockReturnValue(false);
telemetrySender['isSending'] = false;
await telemetrySender['sendIfDue']();
expect(mockFetch).toBeCalledTimes(0);
});

expect(telemetrySender['shouldSendReport']).toBeCalledTimes(1);
expect(mockFetch).toBeCalledTimes(0);
});
it('updates last lastReported and calls saveToBrowser', async () => {
const lastReported = Date.now() - (REPORT_INTERVAL_MS + 1000);

it('does not send if we are in screenshot mode', async () => {
const telemetryService = mockTelemetryService({ isScreenshotMode: true });
const telemetrySender = new TelemetrySender(telemetryService);
telemetrySender['isSending'] = false;
await telemetrySender['sendIfDue']();
const telemetryService = mockTelemetryService();
const telemetrySender = new TelemetrySender(telemetryService);
telemetrySender['shouldSendReport'] = jest.fn().mockReturnValue(true);
telemetrySender['sendUsageData'] = jest.fn().mockReturnValue(true);
telemetrySender['saveToBrowser'] = jest.fn();
telemetrySender['lastReported'] = `${lastReported}`;

expect(mockFetch).toBeCalledTimes(0);
});
await telemetrySender['sendIfDue']();

expect(telemetrySender['lastReported']).not.toBe(lastReported);
expect(telemetrySender['saveToBrowser']).toBeCalledTimes(1);
expect(telemetrySender['retryCount']).toEqual(0);
expect(telemetrySender['sendUsageData']).toHaveBeenCalledTimes(1);
});

it('sends report if due', async () => {
const mockClusterUuid = 'mk_uuid';
const mockTelemetryUrl = 'telemetry_cluster_url';
const mockTelemetryPayload = [
{ clusterUuid: mockClusterUuid, stats: 'hashed_cluster_usage_data1' },
];

const telemetryService = mockTelemetryService();
const telemetrySender = new TelemetrySender(telemetryService);
telemetryService.getTelemetryUrl = jest.fn().mockReturnValue(mockTelemetryUrl);
telemetryService.fetchTelemetry = jest.fn().mockReturnValue(mockTelemetryPayload);
telemetrySender['shouldSendReport'] = jest.fn().mockReturnValue(true);
telemetrySender['isSending'] = false;
await telemetrySender['sendIfDue']();

expect(telemetryService.fetchTelemetry).toBeCalledTimes(1);
expect(mockFetch).toBeCalledTimes(1);
expect(mockFetch.mock.calls[0]).toMatchInlineSnapshot(`
it('resets the retry counter when report is due', async () => {
const telemetryService = mockTelemetryService();
const telemetrySender = new TelemetrySender(telemetryService);
telemetrySender['shouldSendReport'] = jest.fn().mockReturnValue(true);
telemetrySender['sendUsageData'] = jest.fn();
telemetrySender['saveToBrowser'] = jest.fn();
telemetrySender['retryCount'] = 9;

await telemetrySender['sendIfDue']();
expect(telemetrySender['retryCount']).toEqual(0);
expect(telemetrySender['sendUsageData']).toHaveBeenCalledTimes(1);
});
});

describe('sendUsageData', () => {
let originalFetch: typeof window['fetch'];
let mockFetch: jest.Mock<typeof window['fetch']>;

beforeAll(() => {
originalFetch = window.fetch;
});

beforeEach(() => {
window.fetch = mockFetch = jest.fn();
jest.useFakeTimers();
});
afterAll(() => {
window.fetch = originalFetch;
jest.useRealTimers();
});
it('sends the report', async () => {
const mockClusterUuid = 'mk_uuid';
const mockTelemetryUrl = 'telemetry_cluster_url';
const mockTelemetryPayload = [
{ clusterUuid: mockClusterUuid, stats: 'hashed_cluster_usage_data1' },
];

const telemetryService = mockTelemetryService();
const telemetrySender = new TelemetrySender(telemetryService);
telemetryService.getTelemetryUrl = jest.fn().mockReturnValue(mockTelemetryUrl);
telemetryService.fetchTelemetry = jest.fn().mockReturnValue(mockTelemetryPayload);
telemetrySender['shouldSendReport'] = jest.fn().mockReturnValue(true);

await telemetrySender['sendUsageData']();

expect(telemetryService.fetchTelemetry).toBeCalledTimes(1);
expect(mockFetch).toBeCalledTimes(1);
expect(mockFetch.mock.calls[0]).toMatchInlineSnapshot(`
Array [
"telemetry_cluster_url",
Object {
Expand All @@ -202,73 +238,120 @@ describe('TelemetrySender', () => {
},
]
`);
});
});
it('sends report separately for every cluster', async () => {
// this now all needs to go into a new test for `sendUsageData`
TinaHeiligers marked this conversation as resolved.
Show resolved Hide resolved
const mockTelemetryUrl = 'telemetry_cluster_url';
const mockTelemetryPayload = ['hashed_cluster_usage_data1', 'hashed_cluster_usage_data2'];

it('sends report separately for every cluster', async () => {
const mockTelemetryUrl = 'telemetry_cluster_url';
const mockTelemetryPayload = ['hashed_cluster_usage_data1', 'hashed_cluster_usage_data2'];
const telemetryService = mockTelemetryService();
const telemetrySender = new TelemetrySender(telemetryService);
telemetryService.getTelemetryUrl = jest.fn().mockReturnValue(mockTelemetryUrl);
telemetryService.fetchTelemetry = jest.fn().mockReturnValue(mockTelemetryPayload);
telemetrySender['shouldSendReport'] = jest.fn().mockReturnValue(true);
await telemetrySender['sendIfDue']();

const telemetryService = mockTelemetryService();
const telemetrySender = new TelemetrySender(telemetryService);
telemetryService.getTelemetryUrl = jest.fn().mockReturnValue(mockTelemetryUrl);
telemetryService.fetchTelemetry = jest.fn().mockReturnValue(mockTelemetryPayload);
telemetrySender['shouldSendReport'] = jest.fn().mockReturnValue(true);
telemetrySender['isSending'] = false;
await telemetrySender['sendIfDue']();
expect(telemetryService.fetchTelemetry).toBeCalledTimes(1);
expect(mockFetch).toBeCalledTimes(2);
});

expect(telemetryService.fetchTelemetry).toBeCalledTimes(1);
expect(mockFetch).toBeCalledTimes(2);
});
it('does not increase the retry counter on successful send', async () => {
// this now all needs to go into a new test for `sendUsageData`
TinaHeiligers marked this conversation as resolved.
Show resolved Hide resolved
const mockTelemetryUrl = 'telemetry_cluster_url';
const mockTelemetryPayload = ['hashed_cluster_usage_data1'];

it('updates last lastReported and calls saveToBrowser', async () => {
const mockTelemetryUrl = 'telemetry_cluster_url';
const mockTelemetryPayload = ['hashed_cluster_usage_data1'];
const telemetryService = mockTelemetryService();
const telemetrySender = new TelemetrySender(telemetryService);
telemetryService.getTelemetryUrl = jest.fn().mockReturnValue(mockTelemetryUrl);
telemetryService.fetchTelemetry = jest.fn().mockReturnValue(mockTelemetryPayload);
telemetrySender['shouldSendReport'] = jest.fn().mockReturnValue(true);
telemetrySender['saveToBrowser'] = jest.fn();

const telemetryService = mockTelemetryService();
const telemetrySender = new TelemetrySender(telemetryService);
telemetryService.getTelemetryUrl = jest.fn().mockReturnValue(mockTelemetryUrl);
telemetryService.fetchTelemetry = jest.fn().mockReturnValue(mockTelemetryPayload);
telemetrySender['shouldSendReport'] = jest.fn().mockReturnValue(true);
telemetrySender['saveToBrowser'] = jest.fn();
await telemetrySender['sendUsageData']();

await telemetrySender['sendIfDue']();
expect(mockFetch).toBeCalledTimes(1);
expect(telemetrySender['retryCount']).toBe(0);
});

expect(mockFetch).toBeCalledTimes(1);
expect(telemetrySender['lastReported']).toBeDefined();
expect(telemetrySender['saveToBrowser']).toBeCalledTimes(1);
expect(telemetrySender['isSending']).toBe(false);
it('catches fetchTelemetry errors and retries again', async () => {
const telemetryService = mockTelemetryService();
const telemetrySender = new TelemetrySender(telemetryService);
telemetryService.getTelemetryUrl = jest.fn();
telemetryService.fetchTelemetry = jest.fn().mockImplementation(() => {
throw Error('Error fetching usage');
});
await telemetrySender['sendUsageData']();
expect(telemetryService.fetchTelemetry).toBeCalledTimes(1);
expect(telemetrySender['retryCount']).toBe(1);
expect(setTimeout).toBeCalledTimes(1);
expect(setTimeout).toBeCalledWith(telemetrySender['sendUsageData'], 120000);
});

it('catches fetchTelemetry errors and sets isSending to false', async () => {
const telemetryService = mockTelemetryService();
const telemetrySender = new TelemetrySender(telemetryService);
telemetryService.getTelemetryUrl = jest.fn();
telemetryService.fetchTelemetry = jest.fn().mockImplementation(() => {
throw Error('Error fetching usage');
});
await telemetrySender['sendIfDue']();
expect(telemetryService.fetchTelemetry).toBeCalledTimes(1);
expect(telemetrySender['lastReported']).toBeUndefined();
expect(telemetrySender['isSending']).toBe(false);
it('catches fetch errors and sets a new timeout if fetch fails more than once', async () => {
const mockTelemetryPayload = ['hashed_cluster_usage_data1', 'hashed_cluster_usage_data2'];
const telemetryService = mockTelemetryService();
const telemetrySender = new TelemetrySender(telemetryService);
telemetryService.getTelemetryUrl = jest.fn();
telemetryService.fetchTelemetry = jest.fn().mockReturnValue(mockTelemetryPayload);
mockFetch.mockImplementation(() => {
throw Error('Error sending usage');
});
telemetrySender['retryCount'] = 3;
await telemetrySender['sendUsageData']();

expect(telemetryService.fetchTelemetry).toBeCalledTimes(1);
expect(mockFetch).toBeCalledTimes(2);
expect(telemetrySender['retryCount']).toBe(4);
expect(setTimeout).toBeCalledWith(telemetrySender['sendUsageData'], 960000);

it('catches fetch errors and sets isSending to false', async () => {
const mockTelemetryPayload = ['hashed_cluster_usage_data1', 'hashed_cluster_usage_data2'];
const telemetryService = mockTelemetryService();
const telemetrySender = new TelemetrySender(telemetryService);
telemetryService.getTelemetryUrl = jest.fn();
telemetryService.fetchTelemetry = jest.fn().mockReturnValue(mockTelemetryPayload);
mockFetch.mockImplementation(() => {
throw Error('Error sending usage');
});
await telemetrySender['sendIfDue']();
expect(telemetryService.fetchTelemetry).toBeCalledTimes(1);
expect(mockFetch).toBeCalledTimes(2);
expect(telemetrySender['lastReported']).toBeUndefined();
expect(telemetrySender['isSending']).toBe(false);
await telemetrySender['sendUsageData']();
expect(telemetrySender['retryCount']).toBe(5);
expect(setTimeout).toBeCalledWith(telemetrySender['sendUsageData'], 1920000);
});

it('stops trying to resend the data after 20 retries', async () => {
const telemetryService = mockTelemetryService();
const telemetrySender = new TelemetrySender(telemetryService);
telemetryService.getTelemetryUrl = jest.fn();
telemetryService.fetchTelemetry = jest.fn().mockImplementation(() => {
throw Error('Error fetching usage');
});
telemetrySender['retryCount'] = 21;
await telemetrySender['sendUsageData']();
expect(setTimeout).not.toBeCalled();
});
});

describe('delay Retry Send', () => {
beforeEach(() => jest.useFakeTimers());
afterAll(() => jest.useRealTimers());

it('sets a minimum retry delay of 60 seconds', () => {
const telemetryService = mockTelemetryService();
const telemetrySender = new TelemetrySender(telemetryService);
expect(telemetrySender['retryCount']).toBe(0);
expect(telemetrySender['delayRetrySend']()).toBe(60000);
});

it('changes the retry delay depending on the retry count', () => {
const telemetryService = mockTelemetryService();
const telemetrySender = new TelemetrySender(telemetryService);
telemetrySender['retryCount'] = 3;
expect(telemetrySender['delayRetrySend']()).toBe(480000);
telemetrySender['retryCount'] = 5;
expect(telemetrySender['delayRetrySend']()).toBe(1920000);
});

it('sets a maximum retry delay of 64 min', () => {
const telemetryService = mockTelemetryService();
const telemetrySender = new TelemetrySender(telemetryService);
telemetrySender['retryCount'] = 8;
expect(telemetrySender['delayRetrySend']()).toBe(3840000);
telemetrySender['retryCount'] = 10;
expect(telemetrySender['delayRetrySend']()).toBe(3840000);
});
});

describe('startChecking', () => {
beforeEach(() => jest.useFakeTimers());
afterAll(() => jest.useRealTimers());
Expand Down
32 changes: 21 additions & 11 deletions src/plugins/telemetry/public/services/telemetry_sender.ts
Original file line number Diff line number Diff line change
Expand Up @@ -17,10 +17,10 @@ import type { EncryptedTelemetryPayload } from '../../common/types';

export class TelemetrySender {
private readonly telemetryService: TelemetryService;
private isSending: boolean = false;
private lastReported?: string;
private readonly storage: Storage;
private intervalId?: number;
private intervalId: number = 0; // setInterval returns a positive integer, 0 means no interval is set
private retryCount: number = 0;

constructor(telemetryService: TelemetryService) {
this.telemetryService = telemetryService;
Expand Down Expand Up @@ -53,13 +53,22 @@ export class TelemetrySender {
return false;
};

private delayRetrySend = () => {
return 60 * (1000 * Math.min(Math.pow(2, this.retryCount), 64)); // 120s, 240s, 480s, 960s, 1920s, 3840s, 3840s, 3840s
Copy link
Contributor Author

@TinaHeiligers TinaHeiligers Nov 8, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This could be written as 60000 * Math.min(.....) but writing it this way makes it easier to read at a glance.

We limit the backoff period to just over once per hour.

};

private sendIfDue = async (): Promise<void> => {
if (this.isSending || !this.shouldSendReport()) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Given the removal of isSending, could we add unit tests where sendIfDue is called multiple time in a row without delay just to be sure that shouldSendReport properly handles it (aka sendUsageData should only be called once)

Copy link
Contributor Author

@TinaHeiligers TinaHeiligers Nov 10, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

sendIfDue is private and we’re the only ones calling the method. It will never be called without a delay.

if (!this.shouldSendReport()) {
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Exit early if and only if the next report isn't due to be sent. This backs-off on the once-per minute fetch and send request calls we used to make.

return;
}
// optimistically update the report date and reset the retry counter for a new time report interval window
this.lastReported = `${Date.now()}`;
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We have a behavior change here where we're now updating the report date before sending the data vs only after a successful send. We need to do this to ensure we're backing off the retry attempts.

this.saveToBrowser();
this.retryCount = 0;
await this.sendUsageData();
};

// mark that we are working so future requests are ignored until we're done
this.isSending = true;
private sendUsageData = async (): Promise<void> => {
try {
const telemetryUrl = this.telemetryService.getTelemetryUrl();
const telemetryPayload: EncryptedTelemetryPayload =
Expand All @@ -80,17 +89,18 @@ export class TelemetrySender {
})
)
);
this.lastReported = `${Date.now()}`;
this.saveToBrowser();
} catch (err) {
// ignore err
} finally {
this.isSending = false;
// ignore err and try again but after a longer wait period.
this.retryCount = this.retryCount + 1;
if (this.retryCount < 20) {
// try once again at 60s, then exponentially backoff the time between subsequent retries
TinaHeiligers marked this conversation as resolved.
Show resolved Hide resolved
window.setTimeout(this.sendUsageData, this.delayRetrySend());
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

NIT/optional: delayRetrySend can be extracted as a static function getRetryDelay(retryCount: number)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Excellent suggestion, thank you! It also made the testing much easier.

}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

NIT: maybe we want to add else { console.log(something) } to at least surface the error in the console logs? I feel like it's better than just trapping the error.

Copy link
Contributor Author

@TinaHeiligers TinaHeiligers Nov 10, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

}
};

public startChecking = () => {
if (typeof this.intervalId === 'undefined') {
if (this.intervalId === 0) {
this.intervalId = window.setInterval(this.sendIfDue, 60000);
}
};
Expand Down