Program slicing is a technique for extracting the parts of a program that are relevant to a given criterion. Atom (powered by the chen library) is a static, opinionated data flow slicer optimized for application and dependency analysis use cases with up to 100K LOC.
- Precise - With static analysis, atom can generate precise slices with verifiable location information from the application source code.
- Non-deterministic - The slicing operation is optimized for constant-time generation performance and is therefore non-deterministic. Repeated runs could yield slightly varying results depending on code complexity.
- Secure - It is not possible to reverse-engineer and obtain the application source code from the atom slices alone.
All slices produce machine-readable json output that can be parsed using the atom proto specification.
Usage slices can help answer two key questions about the usages of external libraries.
- HOW? Are the libraries used as-is or via a custom alias or derived type?
- WHERE? File and line number locations of the definitions, imports, usages, calls, etc.
The mind map below offers an overview.
- Parse the usages json.
- Iterate over the `objectSlices` array. For each slice, store its `fileName` and `lineNumber`.
```json
{
  "objectSlices": [
    {
      "code": "",
      "fullName": "com.example.vulnspring.WebController.jwt:java.lang.String(javax.servlet.http.HttpSession,org.springframework.ui.Model)",
      "signature": "java.lang.String(javax.servlet.http.HttpSession,org.springframework.ui.Model)",
      "fileName": "src/main/java/com/example/vulnspring/WebController.java",
      "lineNumber": 274,
      "columnNumber": 2,
      "usages": [
        {
          "targetObj": {
            "name": "username",
            "typeFullName": "java.lang.String",
            "lineNumber": 276,
            "columnNumber": 3,
            "label": "LOCAL"
          },
```
- Iterate over the `usages` array. The attributes `typeFullName` (found in `targetObj` and `definedBy`) and `resolvedMethod` (found in `invokedCalls` and `argToCalls`) under each category are of interest.
- Iterate over the `userDefinedTypes` array. Note the `fileName` and `lineNumber` for each type. For each `field`, the attribute `typeFullName` indicates the aliased field. For each `procedure`, the `paramTypes` array lists the custom types from index 1 onward; index 0 holds the enclosing type itself.

```json
"userDefinedTypes": [
  {
    "name": "com.example.vulnspring.WebController",
    "fields": [
      {
        "name": "jdbcTemplate",
        "typeFullName": "org.springframework.jdbc.core.JdbcTemplate",
        "lineNumber": 42,
        "columnNumber": 15,
        "label": "LOCAL"
      },
      {
        "name": "logger",
        "typeFullName": "org.slf4j.Logger",
        "lineNumber": 44,
        "columnNumber": 30,
        "label": "LOCAL"
      }
    ],
    "procedures": [
      {
        "callName": "home",
        "resolvedMethod": "com.example.vulnspring.WebController.home:java.lang.String(org.springframework.ui.Model,javax.servlet.http.HttpSession)",
        "paramTypes": [
          "com.example.vulnspring.WebController",
          "org.springframework.ui.Model",
          "javax.servlet.http.HttpSession"
        ],
        "returnType": "java.lang.String",
        "lineNumber": 46,
        "columnNumber": 2
      },
```
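The usages steps above can be sketched in Python as follows. This is a minimal, hypothetical helper (`summarize_usages` is not part of atom): it assumes the key names shown in the excerpts above, treats missing keys as empty, and skips any `invokedCalls`/`argToCalls` entry that is not a json object, since the exact shape of those entries is not shown here.

```python
import json
from collections import defaultdict


def summarize_usages(usages_json: str) -> dict:
    """Map each type or resolved method to the (fileName, lineNumber) pairs where it appears.

    Hypothetical helper; key names follow the usages slice excerpts above.
    """
    data = json.loads(usages_json)
    locations = defaultdict(set)

    for obj_slice in data.get("objectSlices", []):
        file_name = obj_slice.get("fileName")
        for usage in obj_slice.get("usages", []):
            # typeFullName is found in targetObj and definedBy
            for key in ("targetObj", "definedBy"):
                entry = usage.get(key) or {}
                if entry.get("typeFullName"):
                    locations[entry["typeFullName"]].add((file_name, entry.get("lineNumber")))
            # resolvedMethod is found in invokedCalls and argToCalls
            for key in ("invokedCalls", "argToCalls"):
                for call in usage.get(key) or []:
                    # Assumption: each entry is an object; anything else is skipped
                    if isinstance(call, dict) and call.get("resolvedMethod"):
                        locations[call["resolvedMethod"]].add((file_name, call.get("lineNumber")))

    for udt in data.get("userDefinedTypes", []):
        file_name = udt.get("fileName")
        # Field types reveal aliased library types (e.g. a JdbcTemplate field)
        for field in udt.get("fields", []):
            if field.get("typeFullName"):
                locations[field["typeFullName"]].add((file_name, field.get("lineNumber")))
        # paramTypes[0] is the enclosing type; custom parameter types start at index 1
        for proc in udt.get("procedures", []):
            for param_type in proc.get("paramTypes", [])[1:]:
                locations[param_type].add((file_name, proc.get("lineNumber")))

    # Sort locations for stable output; a missing line number sorts as 0
    return {
        name: sorted(locs, key=lambda loc: (str(loc[0]), loc[1] or 0))
        for name, locs in locations.items()
    }
```

Calling `summarize_usages(open("usages.json").read())` on the output of `atom usages` would list each library type or method next to the locations where it is used, which answers both the HOW and WHERE questions.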
Data flow slices represent the data-dependency information computed statically from the source code using a Reverse-Reachability algorithm. The full list of `nodes` and `edges` from the Data Dependency Graph (DDG) is also made available for custom visualization and traversal purposes. For convenience, the atom cli precomputes up to 50 reachable paths and exposes them via the `paths` attribute in the json.
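To illustrate the reverse-reachability idea on the exported DDG, the sketch below walks the edges backwards from a chosen sink node and collects every node whose data can flow into it. It is not atom's implementation: the function name `reverse_reachable` is hypothetical, and it assumes each edge is an object with `src` and `dst` node ids, with data flowing from `src` to `dst`.

```python
from collections import defaultdict, deque


def reverse_reachable(edges, sink_id):
    """Return the ids of all DDG nodes from which sink_id can be reached.

    Assumption: each edge is a dict with "src" and "dst" node ids (src flows into dst).
    """
    # Index edges by destination so they can be walked backwards
    predecessors = defaultdict(list)
    for edge in edges:
        predecessors[edge["dst"]].append(edge["src"])

    # Breadth-first search against the edge direction, starting at the sink
    seen = {sink_id}
    queue = deque([sink_id])
    while queue:
        node = queue.popleft()
        for prev in predecessors[node]:
            if prev not in seen:
                seen.add(prev)
                queue.append(prev)

    # The sink itself is not reported as one of its own sources
    seen.discard(sink_id)
    return seen
```

Walking backwards from external sinks needs no knowledge of entrypoints, which is why this style of slicing suits libraries that lack identifiable sources.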
It is quite common for organizations to keep shared libraries and modules in separate repositories, jar files, and other packages. These modules would use external libraries as sinks and might lack any entrypoints (sources). Data flow slicing could work well in these scenarios, where the entrypoints (sources) cannot be identified. Because of its brute-force nature, however, data flow slicing often takes significantly longer than usages or reachables slicing.
- Parse the data flow json.
- Iterate over the `graph.nodes` array and build a Map with each node's `id` as the key and the node itself as the value.
- Iterate over the `paths` array. For each id, look up the node in the map built in the previous step.
- Filter out any operator calls whose name starts with `<operator`. Note that operator calls could start with either `<operator>` or `<operators>` (with an `s`) due to a known unresolved bug.
- All `CALL` nodes with `isExternal=true` indicate external method calls. For such external calls, the `fullName` property is of interest, along with all the `parent*` attributes such as `parentFileName`, `parentMethodName`, etc.
- Nodes with the label `METHOD_PARAMETER_IN` are method parameters. These could be user-provided input depending on the framework and filename. For instance, method parameters in a controller or service class usually take input from users or another service.
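A minimal Python sketch of the data flow steps above. The helper name `external_calls_on_paths` is hypothetical, and it assumes each entry in `paths` is a list of node ids; the atom proto specification remains the authoritative description of the shape.

```python
import json


def external_calls_on_paths(df_json: str):
    """For each precomputed path, report external calls and method parameters.

    Hypothetical helper; assumes "paths" is a list of lists of node ids.
    """
    data = json.loads(df_json)
    # Map node id -> node so that path ids can be resolved
    nodes_by_id = {node["id"]: node for node in data.get("graph", {}).get("nodes", [])}

    results = []
    for path in data.get("paths", []):
        resolved = [nodes_by_id[node_id] for node_id in path if node_id in nodes_by_id]
        # Drop operator calls; the "<operator" prefix matches both <operator> and <operators>
        resolved = [n for n in resolved if not (n.get("name") or "").startswith("<operator")]
        # External method calls, with the parent* attributes that locate them
        external = [
            (n.get("fullName"), n.get("parentFileName"), n.get("parentMethodName"))
            for n in resolved
            if n.get("label") == "CALL" and n.get("isExternal")
        ]
        # Method parameters, which may carry user-provided input
        params = [n.get("name") for n in resolved if n.get("label") == "METHOD_PARAMETER_IN"]
        results.append({"externalCalls": external, "parameters": params})
    return results
```

Matching on the `<operator` prefix rather than on `<operator>` keeps the filter working for both spellings mentioned above.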
The information in a data-flow slice can be used as component evidence in a CycloneDX 1.5 document.
| Data Flow Slice Attribute | CycloneDX Attribute | Comments |
|---|---|---|
| parentPackageName | package | Will be based on the filename for JavaScript/TypeScript |
| parentClassName | module | Will be based on the filename for JavaScript/TypeScript |
| parentMethodName | function | |
| parentMethodSignature | parameters | Could be customized to ignore return types |
| lineNumber | line | Could be unavailable for certain projects |
| columnNumber | column | Could be unavailable for certain projects |
| parentFileName | fullFilename | |
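The mapping in the table can be sketched as a function that turns one data flow node into a frame for CycloneDX 1.5 `evidence.callstack.frames`. The helper `node_to_frame` is hypothetical; it strips the return type from `parentMethodSignature`, as the table suggests, and drops `line`/`column` when they are unavailable.

```python
def node_to_frame(node: dict) -> dict:
    """Map a data flow slice node to a CycloneDX 1.5 callstack frame (hypothetical helper)."""
    signature = node.get("parentMethodSignature") or ""
    # Keep only the parameter list between the parentheses, ignoring the return type
    start, end = signature.find("("), signature.rfind(")")
    inner = signature[start + 1:end] if 0 <= start < end else ""
    frame = {
        "package": node.get("parentPackageName"),
        "module": node.get("parentClassName"),
        "function": node.get("parentMethodName"),
        "parameters": [p for p in inner.split(",") if p],
        "line": node.get("lineNumber"),
        "column": node.get("columnNumber"),
        "fullFilename": node.get("parentFileName"),
    }
    # line and column could be unavailable, so drop empty values instead of emitting nulls
    return {key: value for key, value in frame.items() if value not in (None, "", [])}
```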
Reachables represent data flows that can originate from an entrypoint (source) and reach an external sink. These potentially represent the paths an adversary could take to reach and exploit a known vulnerability in a third-party library and hence the terms "reachable flows" or "Forward-Reachability". With atom, reachable slices can be generated for Java, Jars, JavaScript, and TypeScript applications.
A necessary prerequisite is the presence of a Software Bill-of-Materials (SBOM) file in the directory containing the source code. At present, only SBOMs generated by CycloneDX Generator (cdxgen) have the precision and depth required for computing reachables.
- Parse the reachables json.
- Iterate over the `reachables` array. Each item in this array is an object containing `flows` (reachable data flows) and `purls` (a list of Package URLs).
- Each item in the `flows` array is of type `node`, similar to the items in the `nodes` array of the data flow slice above.
- Each item in the `purls` array is a string.
```json
{
  "reachables": [
    {
      "flows": [
        {
          "id": 44,
          "label": "METHOD_PARAMETER_IN",
          "name": "this",
          "fullName": "",
          "signature": "",
          "isExternal": false,
          "code": "this",
          "typeFullName": "com.example.SpringKafkaDemo.config.KafkaConsumerConfig",
          "parentMethodName": "consumerFactory",
          "parentMethodSignature": "org.springframework.kafka.core.ConsumerFactory()",
          "parentFileName": "src/main/java/com/example/SpringKafkaDemo/config/KafkaConsumerConfig.java",
          "parentPackageName": "com.example.SpringKafkaDemo.config",
          "parentClassName": "com.example.SpringKafkaDemo.config.KafkaConsumerConfig",
          "lineNumber": 36,
          "columnNumber": null,
          "tags": "framework-input"
        }
      ],
      "purls": [
        "pkg:maven/org.springframework.kafka/spring-kafka@2.8.11?type=jar"
      ]
    }
  ]
}
```
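The reachables steps above can be sketched as grouping each flow's starting node by the package URLs it reaches. `reachable_sources_by_purl` is a hypothetical helper, and treating the first item in `flows` as the entrypoint assumes each flow is ordered from source to sink.

```python
import json
from collections import defaultdict


def reachable_sources_by_purl(reachables_json: str) -> dict:
    """Group flow entrypoints by the package URLs (purls) they reach.

    Hypothetical helper; assumes flows[0] is the source node of each reachable flow.
    """
    data = json.loads(reachables_json)
    sources = defaultdict(set)
    for reachable in data.get("reachables", []):
        flows = reachable.get("flows", [])
        if not flows:
            continue
        first = flows[0]
        # Record where the flow starts: file, method, and line of the source node
        origin = (first.get("parentFileName"), first.get("parentMethodName"), first.get("lineNumber"))
        for purl in reachable.get("purls", []):
            sources[purl].add(origin)
    # Sort for stable output; str() tolerates None values inside the tuples
    return {purl: sorted(origins, key=str) for purl, origins in sources.items()}
```

Keying the result by purl makes it straightforward to check whether a package with a known vulnerability is actually reachable from an entrypoint.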
Use the atom cli to generate slices.
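```shell
# Generate the CycloneDX SBOM first; reachables slicing requires it
cdxgen -t java --deep -o bom.json .
# Write each slice to its own file so later commands do not overwrite earlier output
atom reachables -o app.atom --slice-outfile reachables.json -l java .
atom data-flow -o app.atom --slice-outfile df.json -l java .
atom usages -o app.atom --slice-outfile usages.json -l java .
```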
Planned for 2.0.0 release