Add input for Azure AD Entity Analytics #34305

Merged
merged 17 commits into from
Feb 6, 2023

Conversation

taylor-swanson

@taylor-swanson taylor-swanson commented Jan 18, 2023

What does this PR do?

  • Add new generic input for Entity Analytics. The input can be extended further through providers, which interface with an external identity provider, such as Azure Active Directory.
  • Add new Azure AD provider for Entity Analytics
  • Add docs

Why is it important?

Supports the greater Entity Analytics project by ingesting user and group identities from Azure Active Directory into Elasticsearch.

Checklist

  • My code follows the style guidelines of this project
  • I have commented my code, particularly in hard-to-understand areas
  • I have made corresponding changes to the documentation
  • I have made corresponding changes to the default configuration files
  • I have added tests that prove my fix is effective or that my feature works
  • I have added an entry in CHANGELOG.next.asciidoc or CHANGELOG-developer.next.asciidoc.

How to test this PR locally

  • Run unit tests against x-pack/filebeat/input/entityanalytics
  • Testing the input outside of unit tests requires access to Azure Active Directory.

Related issues

Use cases

@taylor-swanson taylor-swanson requested a review from a team January 18, 2023 17:19
@taylor-swanson taylor-swanson self-assigned this Jan 18, 2023
@botelastic botelastic bot added needs_team Indicates that the issue/PR needs a Team:* label and removed needs_team Indicates that the issue/PR needs a Team:* label labels Jan 18, 2023
@mergify

mergify bot commented Jan 18, 2023

This pull request does not have a backport label.
If this is a bug or security fix, could you label this PR @taylor-swanson? 🙏.
For such, you'll need to label your PR with:

  • The upcoming major version of the Elastic Stack
  • The upcoming minor version of the Elastic Stack (if you're not pushing a breaking change)

To fixup this pull request, you need to add the backport labels for the needed
branches, such as:

  • backport-v8.\d.0 is the label to automatically backport to the 8.\d branch, where \d is a digit

@taylor-swanson taylor-swanson added the backport-skip Skip notification from the automated backport with mergify label Jan 18, 2023
@elasticmachine

elasticmachine commented Jan 18, 2023

💚 Build Succeeded


Build stats

  • Start Time: 2023-02-02T18:31:09.946+0000

  • Duration: 76 min 57 sec

Test stats 🧪

Test Results
Failed 0
Passed 7555
Skipped 746
Total 8301

💚 Flaky test report

Tests succeeded.

🤖 GitHub comments

To re-run your PR in the CI, just comment with:

  • /test : Re-trigger the build.

  • /package : Generate the packages and run the E2E tests.

  • /beats-tester : Run the installation tests with beats-tester.

  • run elasticsearch-ci/docs : Re-trigger the docs validation. (use unformatted text in the comment!)

@taylor-swanson taylor-swanson force-pushed the input-entity-analytics branch 3 times, most recently from 7b962f4 to e03d43d Compare January 19, 2023 21:39
- Add new generic input for Entity Analytics. The input can be
extended further through providers, which interface with an
external identity provider, such as Azure Active Directory.
- Add new Azure AD provider for Entity Analytics
- Add docs
@taylor-swanson taylor-swanson marked this pull request as ready for review January 23, 2023 15:51
@taylor-swanson taylor-swanson requested a review from a team as a code owner January 23, 2023 15:51
@taylor-swanson taylor-swanson requested review from belimawr and rdner and removed request for a team January 23, 2023 15:51
@elasticmachine

Pinging @elastic/security-external-integrations (Team:Security-External Integrations)

@taylor-swanson

This should be ready for general review now. A couple of notes:

  • I'd like to keep this input "experimental" for now.
    • I don't think we've completely narrowed down the data model for Entity Analytics yet, so things may change in the future.
    • We plan on adding another identity provider in the future (Okta). The process of doing that may identify issues with the current design.
  • @jamiehynds or @SourinPaul, how do we feel about entity-analytics as the input name? Do we expect this project to have a different name in the future? While not impossible, changing the name of the input later can be a bit painful.


@efd6 efd6 left a comment

Initial review. From a maintenance perspective, it would be helpful to have more godoc in this.

@mergify

mergify bot commented Jan 24, 2023

This pull request is now in conflicts. Could you fix it? 🙏
To fix up this pull request, you can check it out locally. See documentation: https://help.github.com/articles/checking-out-pull-requests-locally/

git fetch upstream
git checkout -b input-entity-analytics upstream/input-entity-analytics
git merge upstream/main
git push upstream input-entity-analytics

@SourinPaul

@taylor-swanson entity-analytics is suitable for this input. It covers the generic intent of our initiative, allowing you to reuse this input across additional vendor sources.

- Fixed issue where the group relationship tree was being passed to the
marshaling functions by value rather than by pointer
- Changed behavior of full sync so it will force a fresh sync from
Azure AD rather than try to use existing state via the delta link token.
It was observed in testing that the API sometimes doesn't report proper
group membership information and never seems to come back into alignment.
Forcing a fresh sync corrects this issue, and also aligns better with
the concepts mentioned in the RFC.
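
To make the full-sync change above concrete, here is a minimal sketch of the difference between resuming from a stored delta link and forcing a fresh sync. Everything in it is an assumption for illustration: `syncState`, `fetchFunc`, and `fakeFetch` are hypothetical names, and the fake fetcher only imitates a delta-style API (an empty link returns everything, a non-empty link returns only changes). The state is handled through a pointer so that updates made inside the sync functions are visible to the caller.

```go
package main

import "fmt"

// syncState is a hypothetical stand-in for the input's persisted state.
type syncState struct {
	deltaLink string
	users     map[string]struct{}
}

func newSyncState() *syncState {
	return &syncState{users: make(map[string]struct{})}
}

// fetchFunc imitates a delta-style API call: it receives the stored delta
// link and returns the users reported plus the link to use next time.
type fetchFunc func(deltaLink string) (users []string, nextLink string)

// apply merges one fetch result into the state.
func (s *syncState) apply(fetch fetchFunc) {
	users, next := fetch(s.deltaLink)
	for _, u := range users {
		s.users[u] = struct{}{}
	}
	s.deltaLink = next
}

// incrementalSync resumes from whatever delta link is stored.
func incrementalSync(s *syncState, fetch fetchFunc) {
	s.apply(fetch)
}

// fullSync discards the stored link and known users first, so the
// provider is asked for a complete, fresh listing.
func fullSync(s *syncState, fetch fetchFunc) {
	s.deltaLink = ""
	s.users = make(map[string]struct{})
	s.apply(fetch)
}

func (s *syncState) String() string {
	return fmt.Sprintf("users=%d deltaLink=%q", len(s.users), s.deltaLink)
}

// fakeFetch returns canned data: everything for an empty link,
// a single change otherwise.
func fakeFetch(deltaLink string) ([]string, string) {
	if deltaLink == "" {
		return []string{"alice", "bob"}, "delta-1"
	}
	return []string{"carol"}, "delta-2"
}

func main() {
	s := newSyncState()
	incrementalSync(s, fakeFetch) // first run: no link stored yet
	incrementalSync(s, fakeFetch) // second run: resumes from delta-1
	fmt.Println(s)
	fullSync(s, fakeFetch) // drops the link and starts over
	fmt.Println(s)
}
```

The trade-off is that a forced fresh sync re-reads everything, but it does not depend on the delta stream having stayed consistent.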

@efd6 efd6 left a comment

Looking good, thank you for adding all the docs. Just a couple of queries.

@mergify

mergify bot commented Jan 27, 2023

This pull request is now in conflicts. Could you fix it? 🙏
To fix up this pull request, you can check it out locally. See documentation: https://help.github.com/articles/checking-out-pull-requests-locally/

git fetch upstream
git checkout -b input-entity-analytics upstream/input-entity-analytics
git merge upstream/main
git push upstream input-entity-analytics


@efd6 efd6 left a comment

LGTM after conflict is resolved.

@taylor-swanson

Hey @belimawr or @rdner, looks like I will need approval from either one of you. Thanks!


@rdner rdner left a comment

Testing the input outside of unit tests requires access to Azure Active Directory.

I assume it was actually tested manually with AD, could you confirm that and describe the test steps and the example of the output?

Having just unit tests is not enough in this case.

@taylor-swanson

Testing the input outside of unit tests requires access to Azure Active Directory.

I assume it was actually tested manually with AD, could you confirm that and describe the test steps and the example of the output?

Having just unit tests is not enough in this case.

Yes, this was tested manually with Azure AD. For a list of test cases, I ran through the following scenarios (this was linked in the related issue):

Test Criteria

  • Verify users are synced.
  • Verify documents contain group membership info.
  • Update a user in Azure AD, verify the change is reflected in Elasticsearch as a new document.
  • Delete a user in Azure AD, verify the change is reflected in Elasticsearch as a new document that indicates a deleted status.
  • Create a new user in Azure AD, verify the new user is reflected in Elasticsearch.
  • Verify that the data stream contains a "full sync marker" document with event.action: started when a new sync starts.
  • Verify that the data stream contains a "full sync marker" document with event.action: completed when a new sync completes.
  • Documentation exists that explains how the input works (what APIs it utilizes, how it persists information, etc)
  • Documentation exists that explains how to authenticate and authorize the input with least privileges.
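
The two "full sync marker" criteria above could be checked mechanically along these lines. This is only a sketch: `checkFullSync` is a hypothetical helper, it assumes the `event.action` values of one full sync have already been extracted in order, and it assumes no marker should appear between the start and end markers.

```go
package main

import (
	"errors"
	"fmt"
)

// checkFullSync reports whether a sequence of event.action values is
// bracketed by a "started" marker and a "completed" marker, with no
// stray markers in between.
func checkFullSync(actions []string) error {
	if len(actions) < 2 {
		return errors.New("need at least a start marker and an end marker")
	}
	if actions[0] != "started" {
		return fmt.Errorf("first action is %q, want %q", actions[0], "started")
	}
	last := len(actions) - 1
	if actions[last] != "completed" {
		return fmt.Errorf("last action is %q, want %q", actions[last], "completed")
	}
	for i := 1; i < last; i++ {
		if actions[i] == "started" || actions[i] == "completed" {
			return fmt.Errorf("unexpected marker %q at position %d", actions[i], i)
		}
	}
	return nil
}

func main() {
	ok := []string{"started", "user-discovered", "user-discovered", "completed"}
	fmt.Println(checkFullSync(ok))

	truncated := []string{"started", "user-discovered"}
	fmt.Println(checkFullSync(truncated))
}
```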

It takes a while to get everything set up, and I just returned from traveling, so it may take me a bit to get sample documents.

- Switch from assert to require, which stops a test at the first failure
- Remove panics from test setup and replace with require.NoError
- Use t.TempDir instead of os.MkdirTemp
@taylor-swanson

Here are some sample documents:
Start write marker:

{
    "@timestamp": "2023-02-02T15:18:08.691Z",
    "@metadata": {
        "beat": "filebeat",
        "type": "_doc",
        "version": "8.7.0"
    },
    "labels": {
        "identity_source": "azure-1"
    },
    "event": {
        "start": "2023-02-02T15:18:08.691Z",
        "action": "started"
    },
    "input": {
        "type": "entity-analytics"
    },
    "ecs": {
        "version": "8.0.0"
    },
    "host": {
        "hostname": "agent1",
        "architecture": "x86_64",
        "name": "agent1",
        "os": {
            "codename": "jammy",
            "type": "linux",
            "platform": "ubuntu",
            "version": "22.04.1 LTS (Jammy Jellyfish)",
            "family": "debian",
            "name": "Ubuntu",
            "kernel": "5.15.0-58-generic"
        },
        "id": "c7e8e9335ba042fabbbe850aa104d692",
        "containerized": false,
        "ip": [
            "10.0.2.15",
            "fe80::a00:27ff:fe88:e72a"
        ],
        "mac": [
            "08-00-27-88-E7-2A"
        ]
    },
    "agent": {
        "id": "28e36fc0-551b-4ac2-95f0-4ed66455756e",
        "name": "agent1",
        "type": "filebeat",
        "version": "8.7.0",
        "ephemeral_id": "c53ad281-54db-4263-87bf-b6da287bfd9c"
    }
}

User document:

{
    "@timestamp": "2023-02-02T15:18:08.693Z",
    "@metadata": {
        "beat": "filebeat",
        "type": "_doc",
        "version": "8.7.0"
    },
    "ecs": {
        "version": "8.0.0"
    },
    "host": {
        "mac": [
            "08-00-27-88-E7-2A"
        ],
        "name": "agent1",
        "hostname": "agent1",
        "architecture": "x86_64",
        "os": {
            "platform": "ubuntu",
            "version": "22.04.1 LTS (Jammy Jellyfish)",
            "family": "debian",
            "name": "Ubuntu",
            "kernel": "5.15.0-58-generic",
            "codename": "jammy",
            "type": "linux"
        },
        "id": "c7e8e9335ba042fabbbe850aa104d692",
        "containerized": false,
        "ip": [
            "10.0.2.15",
            "fe80::a00:27ff:fe88:e72a"
        ]
    },
    "agent": {
        "version": "8.7.0",
        "ephemeral_id": "c53ad281-54db-4263-87bf-b6da287bfd9c",
        "id": "28e36fc0-551b-4ac2-95f0-4ed66455756e",
        "name": "agent1",
        "type": "filebeat"
    },
    "azure_ad": {
        "surname": "User1",
        "userPrincipalName": "test.user1@azure2elasticsearch.onmicrosoft.com",
        "displayName": "Test User1",
        "givenName": "Test"
    },
    "labels": {
        "identity_source": "azure-1"
    },
    "user": {
        "id": "aeb2dc6a-797d-4e6d-8552-df43e4200f79",
        "group": [
            {
                "id": "a36ac877-d4e4-41d2-b2f8-5895c1ec3eb5",
                "name": "Test Group 1"
            },
            {
                "id": "3ef344f3-3cb9-45ba-b997-057b76b3c1f7",
                "name": "Test Group 2"
            },
            {
                "id": "6e47d59e-9e02-4c6b-bbb9-43ee57637619",
                "name": "Test Group 3"
            }
        ]
    },
    "event": {
        "action": "user-discovered"
    },
    "input": {
        "type": "entity-analytics"
    }
}

End write marker:

{
    "@timestamp": "2023-02-02T15:18:08.693Z",
    "@metadata": {
        "beat": "filebeat",
        "type": "_doc",
        "version": "8.7.0"
    },
    "labels": {
        "identity_source": "azure-1"
    },
    "event": {
        "action": "completed",
        "end": "2023-02-02T15:18:08.693Z"
    },
    "input": {
        "type": "entity-analytics"
    },
    "agent": {
        "version": "8.7.0",
        "ephemeral_id": "c53ad281-54db-4263-87bf-b6da287bfd9c",
        "id": "28e36fc0-551b-4ac2-95f0-4ed66455756e",
        "name": "agent1",
        "type": "filebeat"
    },
    "ecs": {
        "version": "8.0.0"
    },
    "host": {
        "containerized": false,
        "name": "agent1",
        "ip": [
            "10.0.2.15",
            "fe80::a00:27ff:fe88:e72a"
        ],
        "mac": [
            "08-00-27-88-E7-2A"
        ],
        "hostname": "agent1",
        "architecture": "x86_64",
        "os": {
            "type": "linux",
            "platform": "ubuntu",
            "version": "22.04.1 LTS (Jammy Jellyfish)",
            "family": "debian",
            "name": "Ubuntu",
            "kernel": "5.15.0-58-generic",
            "codename": "jammy"
        },
        "id": "c7e8e9335ba042fabbbe850aa104d692"
    }
}

Incremental update to user (removed group memberships):

{
    "@timestamp": "2023-02-02T15:27:17.543Z",
    "@metadata": {
        "beat": "filebeat",
        "type": "_doc",
        "version": "8.7.0"
    },
    "labels": {
        "identity_source": "azure-1"
    },
    "user": {
        "id": "aeb2dc6a-797d-4e6d-8552-df43e4200f79"
    },
    "event": {
        "action": "user-modified"
    },
    "input": {
        "type": "entity-analytics"
    },
    "ecs": {
        "version": "8.0.0"
    },
    "host": {
        "hostname": "agent1",
        "architecture": "x86_64",
        "os": {
            "family": "debian",
            "name": "Ubuntu",
            "kernel": "5.15.0-58-generic",
            "codename": "jammy",
            "type": "linux",
            "platform": "ubuntu",
            "version": "22.04.1 LTS (Jammy Jellyfish)"
        },
        "id": "c7e8e9335ba042fabbbe850aa104d692",
        "containerized": false,
        "ip": [
            "10.0.2.15",
            "fe80::a00:27ff:fe88:e72a"
        ],
        "mac": [
            "08-00-27-88-E7-2A"
        ],
        "name": "agent1"
    },
    "agent": {
        "id": "28e36fc0-551b-4ac2-95f0-4ed66455756e",
        "name": "agent1",
        "type": "filebeat",
        "version": "8.7.0",
        "ephemeral_id": "619d854f-1882-41b2-a189-9a914e068236"
    },
    "azure_ad": {
        "surname": "User1",
        "userPrincipalName": "test.user1@azure2elasticsearch.onmicrosoft.com",
        "displayName": "Test User1",
        "givenName": "Test"
    }
}

- Lone vertices aren't included in the group relationship tree, so
they won't be included in the expansion from the group tree. Direct
members are inserted into the transitiveMemberOf set first, then
expansion occurs.
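
A minimal sketch of the ordering described in this commit message, under stated assumptions: the group relationship tree is modeled here as a hypothetical `parents` map from a group ID to the groups it belongs to, and a lone group (no parents, no children) simply has no entry in that map. Seeding the result set with the direct memberships before walking the tree keeps such lone groups in the result. The function name `transitiveMemberOf` echoes the set named in the commit message, but the implementation is illustrative only.

```go
package main

import (
	"fmt"
	"sort"
)

// transitiveMemberOf returns every group a user belongs to, directly or
// through nested groups. Direct memberships are inserted first, so groups
// that do not appear in the parents map are still part of the result.
func transitiveMemberOf(direct []string, parents map[string][]string) map[string]struct{} {
	set := make(map[string]struct{}, len(direct))
	queue := make([]string, 0, len(direct))
	for _, g := range direct {
		if _, seen := set[g]; !seen {
			set[g] = struct{}{}
			queue = append(queue, g)
		}
	}
	// Breadth-first expansion through parent groups; the seen check also
	// guards against cycles in the relationship data.
	for len(queue) > 0 {
		g := queue[0]
		queue = queue[1:]
		for _, p := range parents[g] {
			if _, seen := set[p]; !seen {
				set[p] = struct{}{}
				queue = append(queue, p)
			}
		}
	}
	return set
}

// sortedGroups returns the set's members in a stable order for display.
func sortedGroups(set map[string]struct{}) []string {
	out := make([]string, 0, len(set))
	for g := range set {
		out = append(out, g)
	}
	sort.Strings(out)
	return out
}

func main() {
	// group-1 is nested inside group-2; group-3 is a lone group and has
	// no entry in the relationship map.
	parents := map[string][]string{"group-1": {"group-2"}}
	direct := []string{"group-1", "group-3"}
	fmt.Println(sortedGroups(transitiveMemberOf(direct, parents)))
}
```

If the expansion were driven only by the relationship map, `group-3` would be dropped, which is the situation the commit message describes.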
@taylor-swanson taylor-swanson merged commit c8578a0 into elastic:main Feb 6, 2023
@taylor-swanson taylor-swanson deleted the input-entity-analytics branch February 6, 2023 14:21
chrisberkhout pushed a commit that referenced this pull request Jun 1, 2023
- Add new generic input for Entity Analytics. The input can be
extended further through providers, which interface with an
external identity provider, such as Azure Active Directory.
- Add new Azure AD provider for Entity Analytics
- Add docs for the new input/provider
Labels
8.7-candidate, backport-skip, Filebeat, new input (filebeat)
Development

Successfully merging this pull request may close these issues.

Build new input for ingesting user and group metadata from Azure AD
6 participants