[Meta] Audit Logging #52125
Comments
Pinging @elastic/kibana-security (Team:Security) |
@arisonl FYI |
I see the output format is going to be in ECS which is great. Will we support ingesting this data into Elasticsearch and using it in the product for inspection by admins? We should be able to leverage Core's logging appenders to accomplish the ingestion piece. |
My take on it is that the ingestion itself is out of scope for this feature. As long as we can output to JSON on the file system (which we were intending to use Core's logging appenders to do), Filebeat can be used for ingestion. Is that what you meant? Or are the logging appenders going to support ingestion directly? |
Filebeat would definitely work. It'd be interesting if we could actually ship Filebeat with Kibana configured to do this automatically. Of course there's some complexity with that as well (process monitoring, licensing, etc.). My broader question is about whether or not there are plans to use this data in the product. For example, it'd be great if there was a menu item on a visualization that opened a UI with a history of edits to that visualization. |
In short: no. There is overlap of what information we need / what conclusions we can draw with audit logging and what we're calling "usage data". However, there is a strong separation of concerns there. We ultimately decided to keep this at a smaller scope just for the auditing use case. I do think that once we have all of the new audit logging in place, we'll have all of the hooks/plumbing necessary to track and provide robust usage data. But we don't want to conflate audit records and usage data. |
During a Zoom meeting today, there was some discussion about which events and attributes should be in the "normal logs" vs what should be in the "audit logs". @jportner and I discussed this further and I've summarized the consensus that we reached. The normal logs should not include user-specific information. User information is particularly sensitive, and augmenting normal log events with this information is potentially problematic. However, it's perfectly fine for these to include opaque identifiers for the session and the HTTP request. The normal logs should include all events which are logged using the standard logging infrastructure and be filtered however the user chooses. The audit logs should include user-specific information, and controls will be put in place to only log entries for specific users or only specific user information. The audit logs will include only audit specific events. There is potentially some overlap here with regard to the events which appear in the normal logs and in the audit logs, but they're generally completely separate. The audit logs will include all authorization and authentication based events, in addition to events for specific operations of interest, including but not limited to: saved-object CRUD, Elasticsearch queries. The mechanism for creating the audit events for operations which aren't auth related needs to be explored further. |
Components needed:
Open questions:
|
I see the Audit service as a separate top-level service (the outer circle in the onion architecture) No plugins depend on the AuditTrail. AuditTrail Service may depend on any plugin. security.on('authenticationSuccess', (message: string, request: KibanaRequest) => {
const auditData = {
message,
action: 'authenticationSuccess',
user: security.getUser(request),
spaces: spaces.getSpace(request),
server: core.http.getServerInfo(),
...
}
// has a well-known prefix
log.logger(auditData);
});
As an alternative, the Platform provides an Auditable hook and the AuditTrail service registers itself via this hook. registerAuditable(({ action: string, message: string, request: KibanaRequest }) => void): void; To define the logging layout, we can use the same approach as Elasticsearch does for SecurityAudit: add an explicit config in x-pack that enhances OSS. The open question for me: what unique data does each auditable event have? I suspect a dataset for
Elasticsearch doesn't use the
We already have RequestHandlerContext. It might expose |
I think we are largely on the same page here. I'd like to lay out this plan with a distinction between some of the concerns. Namely, I'd like to separate what is necessary to support general observability and tracing within Kibana logs (OSS and otherwise) and what is necessary to support audit logs (X-Pack). General observability requirements:
Audit logging requirements:
For the general observability case, we need a couple new components:
I think we're both in agreement on how to accomplish these two requirements. (1) can be solved by introducing a formal `LogContext` struct that is used by both the Logger and the Elasticsearch and SavedObjects clients. This struct would be created by Core's request context provider and injected into the ES and SO clients exposed by RequestHandlerContext. This enables every log message in those clients to include data about the current request (but not user data). (2) is solved by changing our JSON log layout to be ECS-compatible. For the audit logging case, we need:
(1) is where I think we need some discussion. My only concern about adding domain-specific events is that they may be abused by other plugins for different purposes. For example, we've gotten requests to add hooks like I think we just need to take care in how we implement such events so that the timing of when they are executed is not depended on by business logic. In other words, I want to avoid a situation where an app is dependent on these hooks in order to function correctly (other than audit logging itself). This makes me lean slightly towards the |
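To make the `LogContext` idea concrete, here is a minimal sketch, assuming a context that carries only opaque identifiers and is shared by the logger and a client. Every name in it (`ContextLogger`, `FakeSavedObjectsClient`, `createRequestScopedClient`, the field names) is hypothetical and chosen for illustration; none of it is Kibana's actual API.

```typescript
// Hypothetical shapes: these names are illustrative, not Kibana's actual API.
interface LogContext {
  requestId: string; // opaque identifier for the HTTP request
  sessionId?: string; // opaque identifier for the session, if there is one
}

interface LogRecord {
  message: string;
  trace: { id: string };
  session?: { id: string };
}

// A logger bound to one request's context. It never sees user data.
class ContextLogger {
  readonly records: LogRecord[] = [];
  private readonly context: LogContext;

  constructor(context: LogContext) {
    this.context = context;
  }

  info(message: string): void {
    const record: LogRecord = { message, trace: { id: this.context.requestId } };
    if (this.context.sessionId !== undefined) {
      record.session = { id: this.context.sessionId };
    }
    this.records.push(record);
  }
}

// Stand-in for an ES or SO client that logs through the request-scoped logger.
class FakeSavedObjectsClient {
  private readonly log: ContextLogger;

  constructor(log: ContextLogger) {
    this.log = log;
  }

  create(type: string, id: string): void {
    this.log.info(`creating ${type}:${id}`);
  }
}

// Roughly what a request context provider would do once per request.
function createRequestScopedClient(context: LogContext) {
  const log = new ContextLogger(context);
  return { log, client: new FakeSavedObjectsClient(log) };
}
```

Because the context is built once per request and handed to both the logger and the client, every log line the client writes carries the same request identifier without the client knowing anything about the user.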
Sorry for being dense, I'm a bit confused about the proposed use of Can we outline what a couple of domain action events might look like? Let's say that both Security and Spaces are enabled:
[Diagram: Create Dashboard]
In this example, we have 2 requests made to ES: one for the privileges check, and another to actually index the saved object. In this example, I'd expect a single "Create dashboard" audit record, as the privileges check is a simple implementation detail, which would still be captured by the ES audit logs. What about a more complex example though? Consider the "Copy to space" feature. This works by first performing a server-side export, followed by a server-side import:
[Diagram: Copy to Space]
How many audit records would we expect to see here? Somewhere between 1 and 3?
My initial reaction is that To make a comparison to the ES audit logs, I don't think they record shard read/writes that occur as part of a user's request. They log that the request happened, and the "implementation details" are kept out of the audit logs. I only bring this up because it's not immediately clear to me where we'll choose to generate/emit these audit events. Doing so at the saved objects client would cause these "implementation details" to be logged for various domain action events. Emitting from the http routes (the public API) would probably get us most of the way there, but that doesn't handle actions like background jobs. |
I'd expect to see
The same logic here. I expect the only one event here -
The Infrastructure level (ES / SO clients) cannot emit domain events. A plugin code emits them. Depending on the plugin workflow, it can be done:
I proposed using an Audit Trail service that receives those domain events and calculates the data needed to build an Audit Logging Record: // in plugin code
auditTrail.add({event, message, request});
// in http request handler context can be bound to a request
auditTrail.add({event, message});
// in background task we haven't got a context pattern and might have to introduce one
auditTrail.add({event, message});
// in audit trail plugin code
class AuditTrail {
on(event, message, request){
const auditData = {
message,
action: 'authenticationSuccess',
user: security.getUser(request),
spaces: spaces.getSpace(request),
server: core.http.getServerInfo(),
...
}
// has a well-known prefix
log.logger(auditData);
}
}
Audit Logger doesn't deal with any observability concerns (ES query performance, for example). Let me know if it makes sense to you or if I missed something. |
That all makes sense, thanks. My primary question was how we would allow plugin code to emit events. Something like My initial confusion was around |
I'd expect it to be used by AuditTrail plugin to extend the platform. There are several benefits of using it in this manner:
AuditTrail plugin can depend on any plugin and uses plugin public API to calculate audit data: // package.json
requiredPlugins: ['security', 'spaces'],
// plugin.ts
class AuditTrail {
on(event, message, request){
const auditData = {
message,
action: 'authenticationSuccess',
user: security.getUser(request),
spaces: spaces.getSpace(request),
server: core.http.getServerInfo(),
...
}
// has a well-known prefix
log.logger(auditData);
}
}
platform.registerAuditable(auditTrail.on)
Probably Also, I'd like to hear from Josh. He might have a different vision. |
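The registration flow discussed above can be sketched end to end as follows. This is only an illustration under stated assumptions: `Platform`, `FakeRequest`, and the `getUser` / `getSpace` callbacks are hypothetical stand-ins for the platform hook and for the security and spaces plugin APIs, not real Kibana interfaces.

```typescript
// All names below are hypothetical stand-ins for the services discussed above.
interface FakeRequest {
  headers: Record<string, string>;
}

interface AuditableEvent {
  action: string;
  message: string;
  request: FakeRequest;
}

type AuditableHandler = (event: AuditableEvent) => void;

// Minimal stand-in for the platform side of the hook.
class Platform {
  private readonly handlers: AuditableHandler[] = [];

  registerAuditable(handler: AuditableHandler): void {
    this.handlers.push(handler);
  }

  emit(event: AuditableEvent): void {
    for (const handler of this.handlers) handler(event);
  }
}

interface AuditRecord {
  message: string;
  action: string;
  user: string;
  space: string;
}

class AuditTrail {
  readonly records: AuditRecord[] = [];
  private readonly getUser: (request: FakeRequest) => string;
  private readonly getSpace: (request: FakeRequest) => string;

  constructor(
    getUser: (request: FakeRequest) => string,
    getSpace: (request: FakeRequest) => string
  ) {
    this.getUser = getUser;
    this.getSpace = getSpace;
  }

  // Arrow-function property so `this` stays bound when passed as a callback.
  on = (event: AuditableEvent): void => {
    this.records.push({
      message: event.message,
      action: event.action,
      user: this.getUser(event.request),
      space: this.getSpace(event.request),
    });
  };
}
```

Declaring `on` as an arrow-function property is a deliberate choice in this sketch: passing a plain method as `auditTrail.on` would lose its `this` binding as soon as the handler touches instance state.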
Good idea making a diagram @legrego -- OK, so
In In my mind it would look something like this. Click to see JSON
Note 1: I omitted some attributes in the interest of brevity. Note 2: each record can be correlated with each other by trace.id (which should also be sent to Elasticsearch as Note 3: the four records in the audit trail with the So, this approach would generically audit all API routes and SOC calls. It would show what's happening "under the hood" for the SOC and its wrappers. Of course this is more verbose than the alternative of writing a single audit event for each request. Potential advantages of
Disadvantages:
Thoughts? |
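Since the more verbose approach relies on correlating records by trace.id, a consumer of the audit log would need to group records back into requests. A minimal sketch, assuming each record carries a `trace.id` string as in the notes above (the `AuditRecord` shape and `groupByTraceId` name are hypothetical):

```typescript
// Hypothetical record shape: only the fields needed for correlation.
interface AuditRecord {
  message: string;
  trace: { id: string };
}

// Groups audit records by trace.id, preserving the order in which they were written.
function groupByTraceId(records: AuditRecord[]): Map<string, AuditRecord[]> {
  const groups = new Map<string, AuditRecord[]>();
  for (const record of records) {
    const existing = groups.get(record.trace.id);
    if (existing) {
      existing.push(record);
    } else {
      groups.set(record.trace.id, [record]);
    }
  }
  return groups;
}
```

Each map entry then represents everything that happened "under the hood" for a single request, which is the reading experience an auditor would have with the verbose approach.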
I think we're on the same page here. The only part I'm confused about in your example is the If we're on the same page there, then the final result is Platform would need to expose two APIs:
In terms of what produces the audit events themselves (@jportner's discussion above), I think I do favor Approach #2 for its completeness. It seems less likely that we may miss a critical event that should be included in the audit log if we log the lower-level details. That said, I'm not very familiar with how audit logs are used by customers. If the low-level logs are too opaque to understand, that could make these logs much less useful. So really it seems the question is: do we favor completeness or clearer semantics? Could we do both? Could the semantic, high-level action be provided as a "scope" for the lower-level audit events? For example, what if we had an API that allows an HTTP endpoint to start an auditable event scope so that all audit events that are produced while that scope is open are associated with the high-level semantic action?
router.post(
{ path: '/api/do_action' },
async (context, req, res) => {
const auditScope = context.audit.openScope('copy_to_space');
try {
// Any audit events produced by SO client while scope is open
// would be associated with the `copy_to_space` scope.
const result = await copyToSpace(context.savedObjects.client);
return res.ok({ body: result });
} finally {
auditScope.close();
}
}
); Or we could change the API a bit to: router.post(
{ path: '/api/do_action' },
async (context, req, res) => context.audit.openScope(
'copy_to_space',
async () => {
// Any audit events produced by SO client while scope is open
// would be associated with the `copy_to_space` scope.
const result = await copyToSpace(context.savedObjects.client);
return res.ok({ body: result });
}
)
);
The tricky part about this in Node.js is that these async actions are running in the same memory space, which makes associating the scope with any asynchronous code difficult. A couple of options for solving this:
|
Correct 👍
AFAIK Nodejs provides built-in primitives that we can try to use for this case https://nodejs.org/api/async_hooks.html |
I agree async_hooks could be a solution. My concern is just that it's still experimental, even in the latest Node version. It does look like the working group is discussing stabilization. If it does go stable in v14 LTS, it could be a viable option for us. |
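For illustration, the scope idea could be sketched with `AsyncLocalStorage`, which newer Node.js releases export from the same `async_hooks` module. This is an assumption-laden sketch, not a proposal for the actual implementation: `openScope`, `emitAuditEvent`, and `auditEvents` are hypothetical names, and the stability of the API should be checked against the Node.js version in use.

```typescript
import { AsyncLocalStorage } from 'async_hooks';

interface AuditEvent {
  action: string;
  scope: string | undefined; // the high-level semantic action, if any
}

// Holds the name of the currently open audit scope for the running async context.
const scopeStorage = new AsyncLocalStorage<string>();
const auditEvents: AuditEvent[] = [];

// Hypothetical counterpart to `context.audit.openScope` from the example above.
function openScope<T>(name: string, fn: () => T): T {
  return scopeStorage.run(name, fn);
}

// Low-level emitters (for example a saved objects client) would call this.
function emitAuditEvent(action: string): void {
  auditEvents.push({ action, scope: scopeStorage.getStore() });
}

// Simulated multi-step operation with an asynchronous gap between its steps.
async function copyToSpace(): Promise<void> {
  emitAuditEvent('saved_object_read');
  await new Promise<void>((resolve) => setTimeout(resolve, 1));
  emitAuditEvent('saved_object_create');
}
```

A handler would wrap its work as `openScope('copy_to_space', () => copyToSpace())`. The store follows the asynchronous call chain, so events emitted after the `await` should still carry the scope, while events emitted outside of `openScope` have no scope.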
Hi team, I'm new to the project and am starting to get up to speed with the audit log feature. From speaking to different people there still seem to be a few outstanding questions and different ideas as to what the audit log should provide, to what level of detail and how it differs from existing logging. In order to help us define a clear approach I wanted to define some guiding principles that we can agree on and then refer back to when making a decision about whether something should be included in the audit log or not and what the implementation should look like. I have written these as statements but they are all open questions / up for debate. I might have gotten this completely wrong so would be great to get your thoughts!
Guiding Principles
What’s the difference between our audit log and system log?
What events need to be captured?
When are events logged?
Can an action trigger multiple events (log lines)?
How does Kibana audit logging tie into Elasticsearch audit logging?
Examples
|
ECS Audit Log Proposal
Field Reference: https://www.elastic.co/guide/en/ecs/current/ecs-field-reference.html
Approach
Authorisation / privilege checks are logged as an outcome of an action rather than as a separate log line since they are implementation details. This is the same approach as error/success results in ECS standard.
Bulk operations are logged as separate events. It would be less verbose to combine a bulk operation into a single log line but that would mean that we can't record successes/failures individually using ECS standard.
Saved object details are extracted into a non-standard
Events
User Authentication
{
"message": "User 'jdoe' logged in successfully using realm 'native'|Failed login attempt using realm 'native'|User re-authentication failed",
"event": {
"action": "user_login|user_logout|user_reauth",
"category": ["authentication"],
"type": ["user"],
"outcome": "success|failure",
"module": "kibana",
"dataset": "kibana.audit"
},
"error": {
"code": "spaces_authorization_failure",
"message": "jdoe unauthorized to getAll spaces"
},
"trace": {
"id": "opaque-id"
}
}
Saved Object CRUD
{
"message": "User 'jdoe' created dashboard 'new-saved-object' in space 'default'",
"event": {
"action": "saved_object_create",
"category": ["database"],
"type": ["creation|access|change|deletion", "allowed|denied"],
"outcome": "success|failure"
},
"document": {
"space": "default",
"type": "dashboard",
"id": "new-saved-object"
},
"error": {
"code": "spaces_authorization_failure",
"message": "jdoe unauthorized to getAll spaces"
},
"trace": {
"id": "opaque-id"
}
}
HTTP Response
{
"message": "HTTP request 'login' by user 'jdoe' succeeded",
"event": {
"action": "http_request",
"category": ["web"],
"outcome": "success|failure"
},
"http": {
"request": {
"method": "POST",
"body": {
"content": "{\"objects\":[{\"type\":\"dashboard\",\"id\":\"foo\"}],\"spaces\":[\"destspace\"],\"includeReferences\":true,\"overwrite\":true}"
}
},
"response": {
"status_code": 200
}
},
"source": {
"address": "12.34.56.78",
"ip": "12.34.56.78"
},
"url": {
"domain": "kibana",
"full": "https://kibana/api/spaces/_copy_saved_objects",
"path": "/api/spaces/_copy_saved_objects",
"port": "443",
"query": "",
"scheme": "https"
},
"user": {
"email": "john.doe@company.com",
"full_name": "John Doe",
"hash": "D30A5F57532A603697CCBB51558FA02CCADD74A0C499FCF9D45B...",
"sid": "2FBAF28F6427B1832F2924E4C22C66E85FE96AFBDC3541C659B67...",
"name": "jdoe",
"roles": [ "kibana_user" ]
},
"trace": {
"id": "opaque-id"
}
}
Scenarios
Copy to space
{
"message": "User 'jdoe' accessed dashboard 'first-object' in space 'default'",
"event": { "action": "saved_object_read", "category": ["database"], "type": ["access"], "outcome": "success" },
"document": { "id": "first-object", "type": "dashboard", "space": "default" }
}
{
"message": "User 'jdoe' accessed dashboard 'second-object' in space 'default'",
"event": { "action": "saved_object_read", "category": ["database"], "type": ["access"], "outcome": "success" },
"document": { "id": "second-object", "type": "dashboard", "space": "default" }
}
{
"message": "User 'jdoe' created dashboard 'first-object' in space 'copy'",
"event": { "action": "saved_object_create", "category": ["database"], "type": ["creation"], "outcome": "success" },
"document": { "id": "first-object", "type": "dashboard", "space": "copy" }
}
{
"message": "User 'jdoe' created dashboard 'second-object' in space 'copy'",
"event": { "action": "saved_object_create", "category": ["database"], "type": ["creation"], "outcome": "success" },
"document": { "id": "second-object", "type": "dashboard", "space": "copy" }
}
{
"message": "HTTP request 'copy-to-space' by user 'jdoe' succeeded",
"event": { "action": "http_request", "category": ["web"], "outcome": "success" }
}
Error: User not authorised to access dashboard (Kibana authZ):
{
"message": "User 'jdoe' not authorised to access dashboard 'first-object' in space 'default'",
"event": { "action": "saved_object_read", "category": ["database"], "type": ["access"], "outcome": "failure" },
"error": { "code": "spaces_authorization_failure", "message": "jdoe unauthorized to getAll spaces" },
"document": { "id": "first-object", "type": "dashboard", "space": "default" }
}
{
"message": "HTTP request 'copy-to-space' by user 'jdoe' failed",
"event": { "action": "http_request", "category": ["web"], "outcome": "failure" },
"error": { "code": "spaces_authorization_failure", "message": "jdoe unauthorized to getAll spaces" }
}
Error: Session expired (Kibana authN):
{
"message": "Unknown user not authenticated to request 'copy-to-space'",
"event": { "action": "http_request", "category": ["web", "authentication"], "type": ["denied"], "outcome": "failure" }
}
Error: User not authorised to access data index (Elasticsearch authZ):
{
"message": "User 'jdoe' not authorised to access index 'products'"
}
{
"message": "HTTP request 'copy-to-space' by user 'jdoe' failed",
"event": { "action": "http_request", "category": ["web", "authentication"], "type": ["allowed"], "outcome": "failure" }
}
User login
{
"message": "User 'jdoe' logged in successfully using realm 'native'",
"event": { "action": "user_login", "category": ["authentication"], "type": ["user"], "outcome": "success" }
}
{
"message": "HTTP request 'login' by user 'jdoe' succeeded",
"event": { "action": "http_request", "category": ["web"], "outcome": "success" }
}
Open question
|
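As a rough illustration of the proposal above, the saved object events could be produced by a small builder that emits one ECS-style record per object, so that bulk operations yield separate events. This is a sketch under assumptions: the `document` field set follows the examples in the proposal, while `savedObjectEvent`, `bulkSavedObjectEvents`, and the `OPERATIONS` table are hypothetical names introduced here.

```typescript
type Outcome = 'success' | 'failure';

interface SavedObjectRef {
  space: string;
  type: string;
  id: string;
}

interface SavedObjectAuditEvent {
  message: string;
  event: { action: string; category: string[]; type: string[]; outcome: Outcome };
  document: SavedObjectRef;
  trace: { id: string };
}

// Hypothetical lookup table mapping an operation to the values used in the proposal.
const OPERATIONS = {
  create: { action: 'saved_object_create', type: 'creation', past: 'created', verb: 'create' },
  read: { action: 'saved_object_read', type: 'access', past: 'accessed', verb: 'access' },
} as const;

type Operation = keyof typeof OPERATIONS;

function savedObjectEvent(
  op: Operation,
  user: string,
  doc: SavedObjectRef,
  traceId: string,
  outcome: Outcome = 'success'
): SavedObjectAuditEvent {
  const { action, type, past, verb } = OPERATIONS[op];
  const target = `${doc.type} '${doc.id}' in space '${doc.space}'`;
  const message =
    outcome === 'success'
      ? `User '${user}' ${past} ${target}`
      : `User '${user}' not authorised to ${verb} ${target}`;
  return {
    message,
    event: { action, category: ['database'], type: [type], outcome },
    document: doc,
    trace: { id: traceId },
  };
}

// One event per object, so successes and failures can be recorded individually.
function bulkSavedObjectEvents(
  op: Operation,
  user: string,
  docs: SavedObjectRef[],
  traceId: string
): SavedObjectAuditEvent[] {
  return docs.map((doc) => savedObjectEvent(op, user, doc, traceId));
}
```

All events from one bulk call share the same `trace.id`, which keeps them correlatable with the enclosing `http_request` event even though they are written as separate lines.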
@thomheymann thank you for the logging format proposal. I have a couple of questions about the
|
Thanks for feedback Mikhail!
These are only example events, there are a lot more events we would audit but I wanted to establish some kind of a pattern first since most of the other events would follow a similar approach. I've added a list of the possible other events below. (again, not complete / reviewed)
The way I understood HTTP based audit logging is that it's a way of very quickly and easily getting most of our auditing requirements ticked off without forcing plugin authors to manually create audit specific events. It feeds into one of my open questions though around the overlap of these (i.e. do we need an http_request event for the login route in our audit log if we already log user logins as a separate event?)
I have no view on this at this point, I'm purely looking at it from a requirements perspective. Would be great to get a steer in terms of what is actually feasible based on the implementation. |
Thanks for the writeup @thomheymann! A quick note on your guiding principles:
When discussing how this ties into ES audit logs, you mention:
I agree with this. I wouldn't expect Kibana to log responses returned by ES that result from queries against users' data indices. The full list of events might be easier to curate and discuss in a google doc. Entries under user and role management should be left to ES audit logs, as they are the authoritative source of this information. I expect logstash pipelines fall into this category as well.
At the most verbose level, we may want to include everything, or almost everything here. The ability to filter this out will be critical though, and it'll probably make sense to come up with a sensible configuration so that we don't log everything by default, but instead allow administrators to opt-in to more granularity. Perhaps the platform could add a route option to the interface to allow a route to exclude itself from auditing, if we find that we need this flexibility.
I'm leaning towards having the security plugin log these events (it's what we do today). It's technically possible to create a SOC without the security wrapper applied, but in those cases, we'd expect consumers to audit their own SO events. Alerting is one such example: https://github.com/gmmorris/kibana/blob/alerting/consumer-based-rbac/x-pack/plugins/alerts/server/authorization/alerts_authorization.ts#L158 |
There might be an exception to this that I'm overlooking, but I believe all bulk operations are all-or-nothing today, so we don't have a need for logging success/failures individually. Our current approach (which isn't necessarily the right one) is to log bulk operations as a single entry, but that entry identifies the objects in question.
It might be unnecessary duplication, but I think it's hard to definitively say that a certain API endpoint will only ever do a single operation. We could attempt to tag routes as such, but that requires manual effort on the engineering side which could be easily overlooked during a seemingly unrelated refactor. At the moment, I'm thinking we'll accept the duplication since we'll have the ability to filter events, but we can always revisit this if we find a clear pattern to these events. I'm interested in hearing other thoughts though! My opinions here are just that. |
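The filtering idea from this comment (a conservative default, administrator opt-in for more granularity, and routes that exclude themselves) might look roughly like the sketch below. The configuration shape and the `shouldLog` function are hypothetical and only illustrate the decision logic; no such setting names are defined in this thread.

```typescript
interface AuditEvent {
  action: string;
  routePath?: string; // set for events that originate from an HTTP route
}

// Hypothetical configuration shape, not an existing Kibana setting.
interface AuditFilterConfig {
  defaultActions: string[]; // logged without any opt-in
  optInActions: string[]; // extra actions an administrator has enabled
  excludedRoutes: string[]; // routes that excluded themselves from auditing
}

function shouldLog(event: AuditEvent, config: AuditFilterConfig): boolean {
  // A route-level exclusion wins over any action-level setting.
  if (event.routePath !== undefined && config.excludedRoutes.includes(event.routePath)) {
    return false;
  }
  return (
    config.defaultActions.includes(event.action) ||
    config.optInActions.includes(event.action)
  );
}
```

With a filter like this in place, the duplication between `http_request` events and domain events becomes a configuration question rather than something each route has to be tagged for.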
Closing this meta issue, as we have sub-issues open to track the remaining individual tasks that we care about at this time. |
Please can you add the saved object name/description so we can provide reports to IT controls? Reports with the saved object ID aren't user friendly. |
Overview
The current state of audit logging in Kibana is not sufficient for many users' needs. Kibana outputs only a few types of events, without much detail, in the same transport as regular log messages. This can be improved in many ways.
Enhancements in scope:
Current state vs. desired state...
Current state
Audit records in Kibana are displayed in plaintext like so:
If JSON output is enabled:
Future state
Audit records should be written in a standard format (ECS), should contain more information about the event that occurred and who originated the action, and fields should be configurable to include more or less information. Such an audit record would look something like this:
Note: in the example above, the user.hash (a hash of the user.name field) would not be included by default; it would be an optional field that could be included if the user.name needed to be excluded for privacy reasons.
First Phase
Prerequisites (in progress):
X-Opaque-Id header to AuditTrail logs and Elasticsearch API calls #62018
Phase 1 implementation: #54836
Future Phase