
Audit provenance

Ashish Gehani edited this page Mar 14, 2023 · 56 revisions

Data Model

The Audit reporter transforms records into an Open Provenance Model (OPM) representation.

The table below outlines the key-value annotations that decorate the OPM elements generated.

OPM element Annotation Key Annotation Value's semantics Annotation Value's type Presence
Agent
uid operating system identifier of user that ran the program unsigned integer required
euid operating system identifier of effective user of program unsigned integer required
gid operating system identifier of user's group when they ran the program unsigned integer required
egid operating system identifier of effective group of program unsigned integer required
suid saved identifier when program's effective user has changed unsigned integer optional
sgid saved identifier when program's effective group has changed unsigned integer optional
fsuid program's user identifier for filesystem access checks unsigned integer optional
fsgid program's group identifier for filesystem access checks unsigned integer optional
source can be one of:
syscall - if information came from a Linux kernel Audit system call record
/proc - if information came from Linux's /proc pseudofilesystem
string (as enumerated) required
Process
name command used to invoke program string optional
pid operating system process identifier integer required
ns pid operating system process identifier in pid namespace integer optional
ppid parent's process identifier integer required
cwd only for process from operation execve, current working directory of user (in the shell when they ran the program) string optional
command line only for process from operation execve, program name and arguments provided string optional
start time if known, when the process (or unit) started (in Unix time) floating point optional
seen time if start time not known, (Unix) time of first event seen from process floating point optional
unit only if UBSI¹ used, unique identifier of unit (with 0 denoting the non-unit part of the process) long integer optional
count only if UBSI¹ used and unit ≠ 0, number of times entire unit loop ran previously long integer optional
iteration only if UBSI¹ used and unit ≠ 0, number of times unit loop has iterated long integer optional
mount namespace filesystem mount point namespace integer optional
user namespace user/group identifier namespace integer optional
net namespace network namespace integer optional
pid namespace process identifier namespace integer optional
children pid namespace process identifier namespace of children integer optional
ipc namespace inter-process message queue namespace integer optional
source can be one of:
syscall - if information came from a Linux kernel Audit system call record
/proc - if information came from Linux's /proc pseudofilesystem
beep - if information came from UBSI¹
string (as enumerated) required
Artifact
subtype can be one of:
memory - for memory addresses
file, link, directory, block device, character device - for filesystem entities
named pipe, unnamed pipe, unix socket, unix socket pair, network socket pair, system v message queue, system v shared memory, and posix message queue - for inter-process flow
network socket - for network flows
unknown - underlying artifact can be of subtype file, link, directory, block device, character device, named pipe, unnamed pipe, unix socket, network socket, network socket pair, unix socket pair, system v message queue, system v shared memory, or posix message queue
string (as enumerated) required
memory address only for subtype memory, location in memory integer (in hexadecimal) optional
size only for subtype memory, length of allocated memory hexadecimal integer optional
tgid only for subtype memory, unnamed pipe, unknown, unix socket pair, or network socket pair, group identifier of threads that share memory or file descriptors integer optional
time only for subtype memory, unnamed pipe, unknown, unix socket pair, or network socket pair, start or seen time of group identifier of threads that share memory or file descriptors floating point optional
path only for subtype file, named pipe, link, directory, block device, character device, unix socket, or posix message queue, location in the local filesystem string optional
root path only for subtype file, named pipe, link, directory, block device, character device, unix socket, or posix message queue, root filesystem location string optional
inode only for subtype file, named pipe, link, directory, block device, character device, unix socket, or posix message queue, inode in the local filesystem string optional
permissions only for subtype file, link, directory, block device, character device, named pipe, or unix socket, filesystem access mode integer (in octal) optional
version only for subtype file, link, directory, block device, character device, named pipe, unnamed pipe, memory, unix socket, or unknown, how many times it has been written integer optional
epoch only for subtype file, link, directory, block device, character device, named pipe, unnamed pipe, unix socket, network socket, or unknown, how many times an artifact has been created at specified path integer optional
fd only for subtype unknown, descriptor used to access file integer optional
read fd only for subtype unnamed pipe, descriptor used to read pipe integer optional
write fd only for subtype unnamed pipe, descriptor used to write pipe integer optional
fd 0 only for subtypes unix socket pair and network socket pair, descriptor used to access connected socket pair integer optional
fd 1 only for subtypes unix socket pair and network socket pair, descriptor used to access connected socket pair integer optional
local address only for subtype network socket, host from which connection originates dotted octet optional
local port only for subtype network socket, connection port used at originating host unsigned short integer optional
remote address only for subtype network socket, host at which connection terminates dotted octet optional
remote port only for subtype network socket, connection port used at terminating host unsigned short integer optional
protocol only for subtype network socket, connection protocol used, can be one of: udp or tcp string (as enumerated) optional
net namespace only for subtype network socket, net namespace of process initiating connection integer optional
ipc namespace only for subtypes system v message queue and system v shared memory, ipc namespace of operating process integer optional
id only for subtypes system v message queue and system v shared memory, System V resource identifier integer optional
owner uid only for subtypes system v message queue and system v shared memory, user identifier for owner of System V resource integer optional
owner gid only for subtypes system v message queue and system v shared memory, group identifier for owner of System V resource integer optional
source can be one of:
syscall - if information came from a Linux kernel Audit system call record
netfilter - if information came from a Linux kernel Audit network filter record
/proc - if information came from Linux's /proc pseudofilesystem
beep - if information came from UBSI¹
string (as enumerated) required
WasControlledBy
operation can be one of:
update - implicit process ownership change
setuid or setgid - explicit process ownership change
string (as enumerated) optional
time if known, when the event occurred (in Unix time) floating point optional
event id if source is syscall, underlying event's identifier unsigned integer optional
source can be one of:
syscall - if information came from a Linux kernel Audit system call record
/proc - if information came from Linux's /proc pseudofilesystem
string (as enumerated) required
WasTriggeredBy
operation can be one of:
fork - another independent process was created
clone - another process created with shared state
execve - child process replaced parent
unknown - underlying operation can be of type fork, clone, or execve
update - implicit process ownership change
setuid or setgid - explicit process ownership change
unit - creation of a UBSI¹ unit (by a program loop)
unit dependency - dependent unit read memory written by another unit
ptrace - trace another process
kill - send signal to another process
setns - join existing namespace
unshare - move to new namespace
string (as enumerated) optional
flags only for operation clone, clone flags string optional
request only for operation ptrace, can be one of:
PTRACE_POKETEXT or PTRACE_POKEDATA or PTRACE_POKEUSER or PTRACE_SETREGS or PTRACE_SETFPREGS or PTRACE_SETREGSET or PTRACE_SETSIGINFO or PTRACE_SETSIGMASK or PTRACE_SET_THREAD_AREA or PTRACE_SETOPTIONS - data of tracee modified
or
PTRACE_CONT or PTRACE_SYSCALL or PTRACE_SINGLESTEP or PTRACE_SYSEMU or PTRACE_SYSEMU_SINGLESTEP or PTRACE_LISTEN or PTRACE_KILL or PTRACE_INTERRUPT or PTRACE_ATTACH or PTRACE_DETACH - execution of tracee modified
string optional
signal only for operation kill, signal sent integer optional
time if known, when the event occurred (in Unix time) floating point optional
event id if source is syscall, underlying event's identifier unsigned integer optional
source can be one of:
syscall - if information came from a Linux kernel Audit system call record
/proc - if information came from Linux's /proc pseudofilesystem
beep - if information came from UBSI¹
string (as enumerated) required
WasGeneratedBy
operation can be one of:
create - file was created
open - file was opened for writing
write - data was transferred to memory, file, or network
send - data was transferred from process to network
connect - outgoing network connection was established
truncate - data at end of file was removed
rename (write) - to new file, after renaming
link (write) - to new file, after linking
mmap (write) - to mapped memory
tee (write) - data copied to pipe
splice (write) - data transferred to destination
vmsplice (write) - data mapped to pipe
chmod - changed file permissions
mprotect - changed memory protection
unlink - file was deleted
close - file was closed
lseek - file offset was updated
madvise - set memory advice
mq_open - message queue was opened for writing
mq_timedsend - data was transferred from process to process
mq_unlink - message queue was deleted
shmget - shared memory was opened for writing or created
shmat - shared memory was mapped to process memory address space for writing
shmctl - shared memory marked for deletion
msgget - message queue was opened for writing or created
msgsnd - data was transferred from process to process
msgctl - message queue marked for deletion
string (as enumerated) required
size only for operations truncate, tee (write), splice (write), vmsplice (write), write, send, mq_timedsend, and msgsnd, number of bytes transferred long integer optional
mode only for operations chmod, open and create, permissions applied to file integer (in octal) optional
flags only for operations open, create, shmat, msgget, and shmget, status or creation flags string optional
protection only for operations mmap and mprotect, permissions set for memory location hexadecimal integer optional
offset only for system calls lseek, pwrite, and pwritev, offset in the file long optional
whence only for system call lseek, can be one of:
SEEK_SET or SEEK_CUR or SEEK_END or SEEK_DATA or SEEK_HOLE - directive on how to use the offset value for lseek system call
string optional
advice only for system call madvise, can be one of:
MADV_NORMAL or MADV_RANDOM or MADV_SEQUENTIAL or MADV_WILLNEED or MADV_DONTNEED or MADV_FREE or MADV_REMOVE or MADV_DONTFORK or MADV_DOFORK or MADV_MERGEABLE or MADV_UNMERGEABLE or MADV_HUGEPAGE or MADV_NOHUGEPAGE or MADV_DONTDUMP or MADV_DODUMP or MADV_WIPEONFORK or MADV_KEEPONFORK or MADV_HWPOISON or MADV_OFFLINE - advice on memory use
string optional
time if known, when the event occurred (in Unix time) floating point required
event id if source is syscall, underlying event's identifier unsigned integer required
source can be one of:
syscall - if information came from a Linux kernel Audit system call record
/proc - if information came from Linux's /proc pseudofilesystem
beep - if information came from UBSI¹
string (as enumerated) required
Used
operation can be one of:
open - file was opened for reading
read - data was transferred from memory, file, or network
recv - data was transferred from network to process
accept - incoming network connection was established
rename (read) - from original file, before renaming
link (read) - from original file, before linking
mmap (read) - from mapped file
tee (read) - data copied from pipe
splice (read) - data transferred from source
vmsplice (read) - data mapped from pipe
load - dynamic library loaded
close - file was closed
init_module - module loaded from memory
finit_module - module loaded from file
mq_open - message queue was opened for reading
mq_timedreceive - data was received from process
shmget - shared memory was opened for reading
shmat - shared memory was mapped to process memory address space for reading
msgget - message queue was opened for reading
msgrcv - data was received from process
string (as enumerated) required
size only for operations read, tee (read), splice (read), vmsplice (read), recv, mq_timedreceive, and msgrcv, number of bytes transferred long integer optional
mode only for operation open, permissions applied to file integer (in octal) optional
flags only for operations open, shmat, msgget, and shmget, status flags string optional
offset only for system calls pread and preadv, offset in the file from where bytes were read long optional
time if known, when the event occurred (in Unix time) floating point required
event id if source is syscall, underlying event's identifier unsigned integer required
source can be one of:
syscall - if information came from a Linux kernel Audit system call record
/proc - if information came from Linux's /proc pseudofilesystem
beep - if information came from UBSI¹
string (as enumerated) required
WasDerivedFrom
operation can be one of:
update - the artifact has been modified
rename - the same artifact has a new name
link - a new name can be used to refer to the old artifact
mmap - a file has been mapped into memory
tee - data copied between pipes
splice - data transferred between artifacts
string (as enumerated) required
pid process that performed the operation integer optional
time if known, when the event occurred (in Unix time) floating point required
event id if source is syscall, underlying event's identifier unsigned integer required
source can be one of:
syscall - if information came from a Linux kernel Audit system call record
netfilter - if information came from a Linux kernel Audit network filter record
/proc - if information came from Linux's /proc pseudofilesystem
beep - if information came from UBSI¹
string (as enumerated) required
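
To make the Presence column concrete, the following is a minimal sketch (not part of the Audit reporter itself) that checks whether a set of key-value annotations carries every key the table marks as required. The `REQUIRED_KEYS` mapping is transcribed from the table; the function name `missing_required` and the sample annotation values are hypothetical and exist only for illustration.

```python
# Required annotation keys per OPM element, transcribed from the table above.
REQUIRED_KEYS = {
    "Agent": {"uid", "euid", "gid", "egid", "source"},
    "Process": {"pid", "ppid", "source"},
    "Artifact": {"subtype", "source"},
    "WasControlledBy": {"source"},
    "WasTriggeredBy": {"source"},
    "WasGeneratedBy": {"operation", "time", "event id", "source"},
    "Used": {"operation", "time", "event id", "source"},
    "WasDerivedFrom": {"operation", "time", "event id", "source"},
}


def missing_required(element_type, annotations):
    """Return the sorted list of required keys absent from `annotations`."""
    return sorted(REQUIRED_KEYS[element_type] - set(annotations))


# Hypothetical sample annotations (values are made up for illustration).
process = {"pid": "4242", "ppid": "1", "name": "cat", "source": "syscall"}
used_edge = {"operation": "read", "size": "128", "source": "syscall"}

print(missing_required("Process", process))  # []
print(missing_required("Used", used_edge))   # ['event id', 'time']
```

A check of this kind can help when consuming the reporter's output, since optional keys such as name or size may be absent while the required ones should always be present.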

NOTE: Though some operation values match system call names, the semantics differ. In particular, the interpretation is provenance-oriented. Multiple system calls may map to a single operation value (such as chmod() and fchmod() both reported as chmod). Some system calls have an indirect effect (such as dup() resulting in a new file descriptor resolving to the old path during read() and write() calls). The mapping of system calls to OPM edges is outlined here.
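
The two points in the note can be sketched as follows. This is an illustrative model under stated assumptions, not the reporter's actual implementation: only the chmod()/fchmod() pair is taken from the note, the pread and pwrite entries are assumed mappings, and `FdTable` and `operation_and_path` are hypothetical names.

```python
# Several system calls can collapse to a single operation value.
# chmod/fchmod come from the note above; the remaining entries are
# assumptions made for this sketch.
SYSCALL_TO_OPERATION = {
    "chmod": "chmod",
    "fchmod": "chmod",
    "read": "read",
    "pread": "read",    # assumption
    "write": "write",
    "pwrite": "write",  # assumption
}


class FdTable:
    """Hypothetical per-process map from file descriptor to path."""

    def __init__(self):
        self._paths = {}

    def open(self, fd, path):
        self._paths[fd] = path

    def dup(self, old_fd, new_fd):
        # dup() has only an indirect effect: the new descriptor
        # resolves to the same path as the old one.
        self._paths[new_fd] = self._paths[old_fd]

    def resolve(self, fd):
        return self._paths.get(fd)


def operation_and_path(syscall, fd, table):
    """Return the operation value and the artifact path the descriptor resolves to."""
    return SYSCALL_TO_OPERATION[syscall], table.resolve(fd)


table = FdTable()
table.open(3, "/tmp/example.txt")
table.dup(3, 4)
print(operation_and_path("pread", 4, table))  # ('read', '/tmp/example.txt')
```

Here the read on descriptor 4 is attributed to the path originally opened on descriptor 3, which is the indirect effect of dup() that the note describes.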

¹ Unit-based selective instrumentation (UBSI). For more information, see:

  • Hassaan Irshad, Gabriela Ciocarlie, Ashish Gehani, Vinod Yegneswaran, Kyu Lee, Jignesh Patel, Somesh Jha, Yonghwi Kwon, Dongyan Xu, Xiangyu Zhang, TRACE: Enterprise-Wide Provenance Tracking For Real-Time APT Detection, IEEE Transactions on Information Forensics and Security (TIFS), 2021. [PDF]