AMLII-933 - Linux log file tailing e2e test #20202

DDuongNguyen · 2023-10-17T18:26:46Z

What does this PR do?

This PR introduces a new end-to-end (e2e) test to verify the agent's capability to tail log files on a Linux OS. This serves as a step in our effort to shift our quality assurance testing towards more comprehensive e2e testing.

Motivation

This PR purpose is to ensure robustness in the agent's log-tailing feature, especially in various edge cases like file permission changes and log rotations in a Linux environment.
By converting this QA test to an e2e test, we aim to catch potential issues early in the development cycle in an automated way.

pducolin

Some changes required (gitlab rule, DevMode in the CI), and some suggestions.

While reviewing @pgimalac found an interesting case that generates empty logs payloads, happening when writing multiple lines of text to the monitored log file.

.gitlab/e2e.yml

test/new-e2e/tests/agent-metric-logs/linux-log-agent/01_file_tailing_test.go

pducolin · 2023-10-19T10:10:29Z

test/new-e2e/tests/agent-metric-logs/linux-log-agent/01_file_tailing_test.go

+
+	// Check if agent is tailing new log file via agent status
+	s.EventuallyWithT(func(c *assert.CollectT) {
+		newStatusOutput, err := s.Env().VM.ExecuteWithError("sudo datadog-agent status | grep -A 10 'custom_logs'")


💬 suggestion
You could use the stack definition with VM, fakeintake and agent client, to help running status command - the gain is tiny

Is there any built-in support to query the status object before its turned into template text?
Ie, instead of using agent status to query, query the status HTTP endpoint directly and expose the returned JSON?

I'm hesitant to rely too much on agent status for tests in general, but it would be less error-prone if we're querying and asserting on a specific json field vs assert.Contains

We could move this implementation by @KevinFairise2 to common helpers, it executes the status command on the agent client (s.Env().Agent) and parses it using json library

Agreed you can skip status, except if you want to test status, and check fakeintake directly

pducolin · 2023-10-19T10:15:28Z

test/new-e2e/tests/agent-metric-logs/linux-log-agent/file_tailing_utils_test.go

+func generateLog(s *vmFakeintakeSuite, t *testing.T, content string) {
+	// Determine the OS and set the appropriate log path and command.
+	var logPath, cmd, checkCmd string
+
+	osType, err := s.getOSType()
+	require.NoErrorf(t, err, "Failed to get OS type: %s", err)
+
+	switch osType {
+	case "windows":
+		t.Log("Generating Windows log.")
+		logPath = "C:\\logs\\hello-world.log"
+		cmd = fmt.Sprintf("echo %s > %s", strings.Repeat(content, 10), logPath)
+		checkCmd = fmt.Sprintf("Get-Content %s", logPath)
+	default: // Assuming Linux if not Windows.
+		t.Log("Generating Linux log.")
+		logPath = "/var/log/hello-world.log"
+		cmd = fmt.Sprintf("echo %s > %s", strings.Repeat(content, 10), logPath)
+		checkCmd = fmt.Sprintf("cat %s", logPath)
+	}


👏 praise
Nice !

test/new-e2e/tests/agent-metric-logs/linux-log-agent/file_tailing_utils_test.go

test/new-e2e/tests/agent-metric-logs/linux-log-agent/01_file_tailing_test.go

L3n41c

Apart from the other comments that I left, I think that EventuallyWithT(func(c *assert.CollectT) {…}, …) is misused.

Inside the inner function, we call require.XXX(t, …) which makes the test fail immediately without any retry.
In order to benefit from the EventuallyWithT retry mechanism, the errors need to be put on the c *assert.CollectT parameter instead.

Could you also add an entry in .github/CODEOWNERS for the test/new-e2e/tests/agent-metric-logs directory ? Here:

datadog-agent/.github/CODEOWNERS

Lines 468 to 477 in dcbe0ab

    
           /test/new-e2e/                          @DataDog/agent-e2e-testing 
        
           /test/new-e2e/language-detection        @DataDog/processes 
        
           /test/new-e2e/system-probe              @DataDog/ebpf-platform 
        
           /test/new-e2e/scenarios/system-probe    @DataDog/ebpf-platform 
        
           /test/new-e2e/tests/agent-subcommands   @DataDog/agent-shared-components 
        
           /test/new-e2e/tests/containers          @DataDog/container-integrations 
        
           /test/new-e2e/tests/language-detection  @DataDog/processes 
        
           /test/new-e2e/tests/ndm                 @DataDog/network-device-monitoring 
        
           /test/new-e2e/tests/npm                 @DataDog/Networks 
        
           /test/new-e2e/tests/agent-platform      @DataDog/agent-platform

.gitlab/e2e.yml

go.mod

test/new-e2e/pkg/utils/e2e/client/vm_client.go

test/new-e2e/tests/agent-metric-logs/linux-log-agent/file_tailing_utils_test.go

test/new-e2e/tests/agent-metric-logs/linux-log-agent/01_file_tailing_test.go

test/new-e2e/tests/agent-metric-logs/linux-log-agent/file_tailing_utils_test.go

go.mod

test/new-e2e/pkg/utils/e2e/client/vm.go

test/new-e2e/tests/agent-metric-logs/linux-log-agent/01_file_tailing_test.go

pducolin · 2023-11-08T15:37:30Z

test/new-e2e/tests/agent-metric-logs/linux-log-agent/01_file_tailing_test.go

+	s.Env().VM.Execute("sudo chmod 777 /var/log/hello-world.log")
+
+	// Part 6: Restart the agent, generate new logs
+	s.Env().VM.Execute("sudo service datadog-agent restart")


Interesting, is this an expected feature ?

pducolin · 2023-11-08T15:38:20Z

test/new-e2e/tests/agent-metric-logs/linux-log-agent/01_file_tailing_test.go

+	s.Env().VM.Execute("sudo chmod 777 /var/log/hello-world.log")
+
+	// Part 6: Restart the agent, generate new logs
+	s.Env().VM.Execute("sudo service datadog-agent restart")


After this you could add a call to wait for the agent to be ready, there should be an helper on the agent client

test/new-e2e/tests/agent-metric-logs/linux-log-agent/01_file_tailing_test.go

scottopell · 2023-12-01T15:36:08Z

test/new-e2e/tests/agent-metric-logs/linux-log-agent/01_file_tailing_test.go

+			cat, _ := s.Env().VM.ExecuteWithError("cat /var/log/hello-world.log")
+			t.Logf("Logs detected when none were expected: %v", cat)
+		}
+	}, 5*time.Minute, 10*time.Second)


Ah, got it, thanks for the explanation, I had misunderstood when it retries.

This makes sense to me then, in the normal case this should run once, find that neither of the asserts fails, and proceed with the rest of the test.

scottopell · 2023-12-01T15:38:14Z

test/new-e2e/tests/agent-metric-logs/linux-log-agent/01_file_tailing_test.go

+
+	// Part 1: Ensure no logs are present in fakeintake
+	s.EventuallyWithT(func(c *assert.CollectT) {
+		logs, err := fakeintake.FilterLogs("hello")


This filters for any logs from the service 'hello'. Should we simplify this and just assert that we don't find the log we emit below?
My thought is that by being more specific we would reduce any chances of false-positives.

So currently we do have different log content for different scenarios. Mainly:
hello-world
access-denied
access-granted
hello-world-new-content

We could check for all those 4 specific logs but i feel like that is less efficient vs just checking for the intake for all those logs sent via the same service tag. What do you think?

If that is the case, then this assertion should be part of the parent suite as a precondition to running the tests.

test/new-e2e/tests/agent-metric-logs/linux-log-agent/01_file_tailing_test.go

scottopell · 2023-12-01T15:43:18Z

test/new-e2e/tests/agent-metric-logs/linux-log-agent/01_file_tailing_test.go

+
+func (s *LinuxVMFakeintakeSuite) LogPermission() {
+
+	// Part 4: Block permission and check the Agent status


Do we not get a fresh filesystem in this test? If this is relying on /var/log/hello-world.log to exist from the previous test, we should make that dependency explicit in some way. Maybe we assert on the files existence?

Right now if this is just relying on the ordering of the cases in TestLinuxLogTailing, that is very subtle and feels easy to break in the future.

we technically check for the existence of the log file via error from the chmod and generate log since both will generate errors if the log file does not exist. I can put a sanity check at the beginning of the function to see if the log file exist

test/new-e2e/tests/agent-metric-logs/linux-log-agent/01_file_tailing_test.go

scottopell · 2023-12-01T15:47:18Z

test/new-e2e/tests/agent-metric-logs/linux-log-agent/file_tailing_utils_test.go

+		os = "linux"
+		t.Log("Generating Linux log.")
+		logPath = "/var/log/hello-world.log"
+		cmd = fmt.Sprintf("echo %s > %s", strings.Repeat(content, 10), logPath)


Why do we repeat the 'content' 10 times? This is slightly surprising to me as a user of generateLog, I would expect generateLog to write exactly the log that I pass as an argument.

I'm thinking of in the future if we have tests that require to send big payload. So for now I'll just pass in a parameter that allow users to repeat contents X amount of time?

scottopell · 2023-12-01T16:07:31Z

test/new-e2e/tests/agent-metric-logs/linux-log-agent/file_tailing_utils_test.go

+			assert.NoErrorf(c, err, "Error found: %s", err)
+			assert.NotEmpty(c, logs, "Expected at least 1 log with content: '%s', but received %v logs.", content, logs)
+		}
+	}, 5*time.Minute, 10*time.Second)


5 minutes seems like an awfully long time to wait for logs to show up. Do we have any idea how long this normally takes? I'd expect it to be a few seconds at most for the log to go through the agent and make it to fake-intake.

From my testing I don't really have a good way to predict how long it would take. Overtime i think the intake does become faster but at the end of the day this might relies on the connection and performance between Pulumi and AWS. For now ill adjust it to 2 mins timeout (which i think is pretty safe so far base on testing) and we can readjust it in the future

scottopell · 2023-12-01T16:08:17Z

test/new-e2e/tests/agent-metric-logs/linux-log-agent/file_tailing_utils_test.go

+
+}
+
+// cleanUp cleans up any existing log files (only useful when running dev mode/local runs).


why is this only useful in dev mode? It looks like we currently run this after every test, should we only run it after every test if devMode == true?

Overall since the log files will get removed outside of devmode, ill adjust the cleanup script so that it will only remove the files if dev mode is enabled. As for the intake, i think its necessary for the intake to be cleaned up after every test

…ent into e2e-linux-log-tailing

scottopell · 2023-12-05T19:57:44Z

test/new-e2e/tests/agent-metric-logs/linux-log-agent/01_file_tailing_test.go

+
+	// Part 1: Ensure no logs are present in fakeintake
+	s.EventuallyWithT(func(c *assert.CollectT) {
+		logs, err := fakeintake.FilterLogs("hello")


If that is the case, then this assertion should be part of the parent suite as a precondition to running the tests.

test/new-e2e/tests/agent-metric-logs/linux-log-agent/01_file_tailing_test.go

scottopell · 2023-12-05T20:01:59Z

test/new-e2e/tests/agent-metric-logs/linux-log-agent/01_file_tailing_test.go

+		generateLog(s, "hello-after-permission-world", 1)
+	}
+
+	// Grant log file permission and check intake for new logs


I'm not sure if I understand the motivation for the last two steps. Could you restate what the goal of this test is?

I was adjusting the test to this comment:

It would be nice to see two versions of this test, one where the log is generated before the file permissions are fixed and one that is generated after the file permissions are fixed. If we add the second variant, we would test that the Agent is correctly reading from the beginning of the file once it becomes accessible.

But let me know if it is too vague or if i misunderstood something

Got it, I can see that now. Lets move this scenario to the LogPermission test case. My thought here is that its much clearer that LogPermission is testing scenarios related to file permissions, whereas this test appears intended to test just the basic logging functionality.

This ties into my above comment about potentially factoring the "create an accessible log file" code into your utility so that the tests that aren't focused on permissions don't have to deal with permissions.

test/new-e2e/tests/agent-metric-logs/linux-log-agent/01_file_tailing_test.go

…ent into e2e-linux-log-tailing

scottopell

Looking better! My only blocking comments are the comments I have on better documenting the goal of each test case. I want the intent/goal of each test case to be crystal clear to Future Engineer who looks at this in a few months without any context.

The rest of these are just for your knowledge to improve future tests.

test/new-e2e/tests/agent-metric-logs/linux-log-agent/01_file_tailing_test.go

scottopell · 2023-12-07T16:39:12Z

test/new-e2e/tests/agent-metric-logs/linux-log-agent/01_file_tailing_test.go

+	// Check intake for new logs
+	checkLogs(s, "hello", "hello-before-permission-world", true)
+
+	//  Adjust permissions of new log file after log generation


should this comment line be combined into the below line? OR is this a stale comment?

test/new-e2e/tests/agent-metric-logs/linux-log-agent/01_file_tailing_test.go

scottopell · 2023-12-07T16:49:37Z

test/new-e2e/tests/agent-metric-logs/linux-log-agent/01_file_tailing_test.go

+	t := s.T()
+	// fakeintake := s.Env().Fakeintake
+
+	// Create a new log file with permissionn inaccessible to the agent


From here until the call to appendLog, we're creating a file and granting read permission to world. This is not really the "interesting" part of the test, but there are a lot of lines of code dedicated to this. It would be nice if this "create a log file accessible to the agent" were factored out into a helper function, this would make this test read much more easily.

This is not a blocking comment, feel free to leave this as-is for now and consider this for a future refactor.

scottopell · 2023-12-07T16:57:02Z

test/new-e2e/tests/agent-metric-logs/linux-log-agent/01_file_tailing_test.go

+	s.Env().VM.Execute("sudo chmod 777 /var/log/hello-world.log")
+
+	// Part 6: Restart the agent, generate new logs
+	s.Env().VM.Execute("sudo service datadog-agent restart")


Do we need to add a blocking call to wait for the agent to be ready as @pducolin suggests?

test/new-e2e/tests/agent-metric-logs/linux-log-agent/file_tailing_utils_test.go

scottopell · 2023-12-07T17:10:20Z

test/new-e2e/tests/agent-metric-logs/linux-log-agent/file_tailing_utils_test.go

+		os = "windows"
+		t.Log("Generating Windows log.")
+		logPath = "C:\\logs\\hello-world.log"
+		cmd = fmt.Sprintf("echo %s > %s", strings.Repeat(content+" ", recurrence), logPath)


This will overwrite the contents of the existing file, not append. Not a blocking comment for this PR as this is focused on linux, but it should be fixed for the windows version.

scottopell · 2023-12-07T17:11:57Z

test/new-e2e/tests/agent-metric-logs/linux-log-agent/file_tailing_utils_test.go

+		if err != nil {
+			assert.FailNowf(t, "Log content %s not found, instead received:: %s", content, output)
+		}
+		if strings.Contains(output, content) {


This check will give incorrectly pass if you ask for the same content multiple times. The first content addition may succeed, and the second one may fail, but if the second one fails, this check will pass because it was added in the first call to generateLogs.

This is also not a blocking comment for this review, just pointing this out for future improvement.

test/new-e2e/tests/agent-metric-logs/linux-log-agent/file_tailing_utils_test.go

DDuongNguyen added 5 commits August 28, 2023 16:08

+ bug fix

c014d79

Merge branch 'main' of github.com:DataDog/datadog-agent

a540004

Merge branch 'main' of github.com:DataDog/datadog-agent

2334aad

Merge branch 'main' of github.com:DataDog/datadog-agent

64c6cb1

Apply changes from linux-log-agent folder

9da23b0

DDuongNguyen requested a review from a team as a code owner October 17, 2023 18:26

DDuongNguyen self-assigned this Oct 17, 2023

DDuongNguyen added changelog/no-changelog team/agent-metrics-logs labels Oct 17, 2023

deploy files

51135e0

DDuongNguyen requested review from a team as code owners October 17, 2023 18:44

lint

93d335e

DDuongNguyen added this to the 7.50.0 milestone Oct 18, 2023

pducolin reviewed Oct 19, 2023

View reviewed changes

DDuongNguyen added 2 commits October 19, 2023 14:26

Merge branch 'main' of github.com:DataDog/datadog-agent

2063e69

adjusted to comments

9f869f3

DDuongNguyen requested a review from pducolin October 25, 2023 21:23

DDuongNguyen added 4 commits October 25, 2023 17:31

Merge branch 'main' of github.com:DataDog/datadog-agent

1f705ac

update test for e2e new changes

e21eeeb

lint

16e007f

Merge branch 'main' of github.com:DataDog/datadog-agent

66b5c1e

L3n41c requested changes Nov 2, 2023

View reviewed changes

DDuongNguyen and others added 2 commits November 2, 2023 16:40

adjusted to comments

7862cca

Merge branch 'main' into e2e-linux-log-tailing

91675df

DDuongNguyen requested a review from L3n41c November 3, 2023 16:03

DDuongNguyen added 2 commits November 3, 2023 14:25

Merge branch 'main' of github.com:DataDog/datadog-agent

399f3c9

Merge branch 'main' of github.com:DataDog/datadog-agent

59a8d8e

pducolin reviewed Nov 8, 2023

View reviewed changes

kacper-murzyn added the 7.50.0-drop label Nov 11, 2023

carlosroman requested a review from scottopell December 1, 2023 14:43

scottopell reviewed Dec 1, 2023

View reviewed changes

DDuongNguyen and others added 7 commits December 4, 2023 11:34

sanity checks

5621419

comments+lints

6d9964f

Merge branch 'main' into e2e-linux-log-tailing

16d3070

Merge branch 'main' into e2e-linux-log-tailing

c02e5da

lint

6f770cf

Merge branch 'e2e-linux-log-tailing' of github.com:DataDog/datadog-ag…

957e1f4

…ent into e2e-linux-log-tailing

make functions helper functions + lints

833f859

DDuongNguyen requested a review from scottopell December 5, 2023 16:59

scottopell reviewed Dec 5, 2023

View reviewed changes

DDuongNguyen and others added 4 commits December 6, 2023 14:19

Merge branch 'main' into e2e-linux-log-tailing

2ce3d15

new comments

4074606

Merge branch 'e2e-linux-log-tailing' of github.com:DataDog/datadog-ag…

0846fa8

…ent into e2e-linux-log-tailing

lint

7416bd8

scottopell reviewed Dec 7, 2023

View reviewed changes

DDuongNguyen added 5 commits December 11, 2023 10:31

documentation + lint

b25d5b9

Merge branch 'main' of github.com:DataDog/datadog-agent

caf9b62

Merge branch 'main' into e2e-linux-log-tailing

9496578

SSM issue

57d3ef4

documentation + break test down into smaller tests

36b7e50

DDuongNguyen requested a review from scottopell December 11, 2023 20:41

scottopell approved these changes Dec 12, 2023

View reviewed changes

pducolin and others added 5 commits December 13, 2023 12:00

[e2e] call base suite cleanup and setup

d3ba4d7

[e2e] pass devmode option to suite run

c89fbaf

[e2e] embed log config as string

6427e90

[e2e] run cleanup after tests

76e7371

CI name change

b4fd09e

DDuongNguyen merged commit fd883c3 into main Dec 13, 2023
7 checks passed

DDuongNguyen deleted the e2e-linux-log-tailing branch December 13, 2023 17:14

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

AMLII-933 - Linux log file tailing e2e test #20202

AMLII-933 - Linux log file tailing e2e test #20202

DDuongNguyen commented Oct 17, 2023 •

edited by carlosroman

Loading

pducolin left a comment

pducolin Oct 19, 2023

scottopell Nov 28, 2023

pducolin Nov 28, 2023

pducolin Nov 28, 2023

pducolin Oct 19, 2023

L3n41c left a comment

pducolin Nov 8, 2023

pducolin Nov 8, 2023

scottopell Dec 1, 2023

scottopell Dec 1, 2023

DDuongNguyen Dec 4, 2023

scottopell Dec 5, 2023

scottopell Dec 1, 2023

DDuongNguyen Dec 4, 2023

scottopell Dec 1, 2023

DDuongNguyen Dec 4, 2023

scottopell Dec 1, 2023

DDuongNguyen Dec 4, 2023 •

edited

Loading

scottopell Dec 1, 2023

DDuongNguyen Dec 4, 2023

scottopell Dec 5, 2023

scottopell Dec 5, 2023

DDuongNguyen Dec 6, 2023 •

edited

Loading

scottopell Dec 7, 2023

scottopell left a comment

scottopell Dec 7, 2023

scottopell Dec 7, 2023 •

edited

Loading

scottopell Dec 7, 2023

scottopell Dec 7, 2023

scottopell Dec 7, 2023

scottopell Dec 7, 2023

	/test/new-e2e/ @DataDog/agent-e2e-testing
	/test/new-e2e/language-detection @DataDog/processes
	/test/new-e2e/system-probe @DataDog/ebpf-platform
	/test/new-e2e/scenarios/system-probe @DataDog/ebpf-platform
	/test/new-e2e/tests/agent-subcommands @DataDog/agent-shared-components
	/test/new-e2e/tests/containers @DataDog/container-integrations
	/test/new-e2e/tests/language-detection @DataDog/processes
	/test/new-e2e/tests/ndm @DataDog/network-device-monitoring
	/test/new-e2e/tests/npm @DataDog/Networks
	/test/new-e2e/tests/agent-platform @DataDog/agent-platform


		func (s *LinuxVMFakeintakeSuite) LogPermission() {

		// Part 4: Block permission and check the Agent status


		}

		// cleanUp cleans up any existing log files (only useful when running dev mode/local runs).

AMLII-933 - Linux log file tailing e2e test #20202

AMLII-933 - Linux log file tailing e2e test #20202

Conversation

DDuongNguyen commented Oct 17, 2023 • edited by carlosroman Loading

What does this PR do?

Motivation

pducolin left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

L3n41c left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

DDuongNguyen Dec 4, 2023 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

DDuongNguyen Dec 6, 2023 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

scottopell left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

scottopell Dec 7, 2023 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

DDuongNguyen commented Oct 17, 2023 •

edited by carlosroman

Loading

DDuongNguyen Dec 4, 2023 •

edited

Loading

DDuongNguyen Dec 6, 2023 •

edited

Loading

scottopell Dec 7, 2023 •

edited

Loading