From 0ac165e74837ab277937ca4585b72631fb25f51b Mon Sep 17 00:00:00 2001 From: facebook-github-bot Date: Fri, 20 Oct 2023 09:36:04 +0000 Subject: [PATCH] =?UTF-8?q?Deploying=20to=20gh-pages=20from=20=20@=209c5ca?= =?UTF-8?q?e6ca6b764cee16b2da8b7c05ef29620c180=20=F0=9F=9A=80?= MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit --- 404.html | 4 ++-- _src/models.md | 4 ++-- assets/js/{d6ed0749.2129e329.js => d6ed0749.f1f4c22d.js} | 2 +- .../{runtime~main.4240a0ef.js => runtime~main.12fefa2a.js} | 2 +- blog/archive/index.html | 4 ++-- blog/index.html | 4 ++-- blog/tags/facebook/index.html | 4 ++-- blog/tags/hello/index.html | 4 ++-- blog/tags/index.html | 4 ++-- blog/welcome/index.html | 4 ++-- docs/configuration/index.html | 4 ++-- docs/contribution/index.html | 4 ++-- docs/customize-sources-and-sinks/index.html | 4 ++-- docs/debugging-fp-fns/index.html | 4 ++-- docs/feature-descriptions/index.html | 4 ++-- docs/getting-started/index.html | 4 ++-- docs/known-false-negatives/index.html | 4 ++-- docs/models/index.html | 6 +++--- docs/overview/index.html | 4 ++-- docs/rules/index.html | 4 ++-- docs/shims/index.html | 4 ++-- index.html | 4 ++-- 22 files changed, 43 insertions(+), 43 deletions(-) rename assets/js/{d6ed0749.2129e329.js => d6ed0749.f1f4c22d.js} (79%) rename assets/js/{runtime~main.4240a0ef.js => runtime~main.12fefa2a.js} (85%) diff --git a/404.html b/404.html index 635627a5..a38c3e21 100644 --- a/404.html +++ b/404.html @@ -5,14 +5,14 @@ Page Not Found | Mariana Trench - +
Skip to main content

Page Not Found

We could not find what you were looking for.

Please contact the owner of the site that linked you to the original URL and let them know their link is broken.

- + \ No newline at end of file diff --git a/_src/models.md b/_src/models.md index bbcf62d4..9f1b58ce 100644 --- a/_src/models.md +++ b/_src/models.md @@ -960,7 +960,7 @@ Each "rule" defines a "filter" (which uses "constraints" to specify methods for - `signature_match`: Expects at least one of the two allowed groups of extra properties: `[name | names] [parent | parents | extends [include_self]]` where: - `name` (a single string) or `names` (a list of alternative strings): is exact matched to the method name - - `parent` (a single string) or `parents` (a list of alternative strings) is exact matched to the class of the method or `extends` (either a single string or a list of alternative strings) is exact matched to the base classes or interfaces of the method. `extends` allows an additional property `includes_self` which is a boolean to indicate if the constraint is applied to the class itself or not. + - `parent` (a single string) or `parents` (a list of alternative strings) is exact matched to the class of the method or `extends` (either a single string or a list of alternative strings) is exact matched to the base classes or interfaces of the method. `extends` allows an optional property `include_self` which is a boolean to indicate if the constraint is applied to the class itself or not (defaults to `true`). - `signature | signature_pattern`: Expects an extra property `pattern` which is a regex to fully match the full signature (class, method, argument types) of a method; - **NOTE:** Usage of this constraint is discouraged as it has poor performance. Try using `signature_match` instead! 
- `parent`: Expects an extra property `inner` [Type] which contains a nested constraint to apply to the class holding the method; @@ -977,7 +977,7 @@ Each "rule" defines a "filter" (which uses "constraints" to specify methods for - **Type:** - - `extends`: Expects an extra property `inner` [Type] which contains a nested constraint that must apply to one of the base classes or itself. The optional property `includes_self` is a boolean that tells whether the constraint must be applied on the type itself or not; + - `extends`: Expects an extra property `inner` [Type] which contains a nested constraint that must apply to one of the base classes or itself. The optional property `include_self` is a boolean that tells whether the constraint must be applied on the type itself or not (defaults to `true`); - `super`: Expects an extra property `inner` [Type] which contains a nested constraint that must apply on the direct superclass; - `is_class | is_interface`: Accepts an extra property `value` which is either `true` or `false`. 
By default, `value` is considered `true`; diff --git a/assets/js/d6ed0749.2129e329.js b/assets/js/d6ed0749.f1f4c22d.js similarity index 79% rename from assets/js/d6ed0749.2129e329.js rename to assets/js/d6ed0749.f1f4c22d.js index 6f1a0254..4a575252 100644 --- a/assets/js/d6ed0749.2129e329.js +++ b/assets/js/d6ed0749.f1f4c22d.js @@ -1 +1 @@ -"use strict";(self.webpackChunkwebsite=self.webpackChunkwebsite||[]).push([[133],{3905:(e,n,a)=>{a.r(n),a.d(n,{MDXContext:()=>d,MDXProvider:()=>u,mdx:()=>x,useMDXComponents:()=>p,withMDXComponents:()=>m});var t=a(67294);function i(e,n,a){return n in e?Object.defineProperty(e,n,{value:a,enumerable:!0,configurable:!0,writable:!0}):e[n]=a,e}function r(){return r=Object.assign||function(e){for(var n=1;n=0||(i[a]=e[a]);return i}(e,n);if(Object.getOwnPropertySymbols){var r=Object.getOwnPropertySymbols(e);for(t=0;t=0||Object.prototype.propertyIsEnumerable.call(e,a)&&(i[a]=e[a])}return i}var d=t.createContext({}),m=function(e){return function(n){var a=p(n.components);return t.createElement(e,r({},n,{components:a}))}},p=function(e){var n=t.useContext(d),a=n;return e&&(a="function"==typeof e?e(n):l(l({},n),e)),a},u=function(e){var n=p(e.components);return t.createElement(d.Provider,{value:n},e.children)},c={inlineCode:"code",wrapper:function(e){var n=e.children;return t.createElement(t.Fragment,{},n)}},h=t.forwardRef((function(e,n){var a=e.components,i=e.mdxType,r=e.originalType,o=e.parentName,d=s(e,["components","mdxType","originalType","parentName"]),m=p(a),u=i,h=m["".concat(o,".").concat(u)]||m[u]||c[u]||r;return a?t.createElement(h,l(l({ref:n},d),{},{components:a})):t.createElement(h,l({ref:n},d))}));function x(e,n){var a=arguments,i=n&&n.mdxType;if("string"==typeof e||i){var r=a.length,o=new Array(r);o[0]=h;var l={};for(var s in n)hasOwnProperty.call(n,s)&&(l[s]=n[s]);l.originalType=e,l.mdxType="string"==typeof e?e:i,o[1]=l;for(var 
d=2;d{a.r(n),a.d(n,{assets:()=>s,contentTitle:()=>o,default:()=>c,frontMatter:()=>r,metadata:()=>l,toc:()=>d});var t=a(87462),i=(a(67294),a(3905));const r={id:"models",title:"Models & Model Generators",sidebar_label:"Models & Model Generators"},o=void 0,l={unversionedId:"models",id:"models",title:"Models & Model Generators",description:"The main way to configure the analysis is through defining model generators. Each model generator defines (1) a filter, made up of constraints to specify the methods (or fields) for which a model should be generated, and (2) a model, an abstract representation of how data flows through a method.",source:"@site/documentation/models.md",sourceDirName:".",slug:"/models",permalink:"/docs/models",draft:!1,editUrl:"https://github.com/facebook/mariana-trench/tree/main/documentation/website/documentation/models.md",tags:[],version:"current",frontMatter:{id:"models",title:"Models & Model Generators",sidebar_label:"Models & Model Generators"},sidebar:"docs",previous:{title:"Rules",permalink:"/docs/rules"},next:{title:"Shims",permalink:"/docs/shims"}},s={},d=[{value:"Models",id:"models",level:2},{value:"Method name format",id:"method-name-format",level:3},{value:"Access path format",id:"access-path-format",level:3},{value:"Kinds",id:"kinds",level:3},{value:"Sources",id:"sources",level:3},{value:"Sinks",id:"sinks",level:3},{value:"Return Sinks",id:"return-sinks",level:3},{value:"Propagation",id:"propagation",level:3},{value:"Features",id:"features",level:3},{value:"Attach to Sources",id:"attach-to-sources",level:4},{value:"Attach to Sinks",id:"attach-to-sinks",level:4},{value:"Attach to Propagations",id:"attach-to-propagations",level:4},{value:"Add Features to Arguments",id:"add-features-to-arguments",level:4},{value:"Via-type Features",id:"via-type-features",level:4},{value:"Via-value Features",id:"via-value-features",level:4},{value:"Taint Broadening",id:"taint-broadening",level:3},{value:"Propagation 
Broadening",id:"propagation-broadening",level:4},{value:"Issue Broadening Feature",id:"issue-broadening-feature",level:5},{value:"Widen Broadening Feature",id:"widen-broadening-feature",level:5},{value:"Sanitizers",id:"sanitizers",level:3},{value:"Kind-specific Sanitizers",id:"kind-specific-sanitizers",level:4},{value:"Port-specific Sanitizers",id:"port-specific-sanitizers",level:4},{value:"Modes",id:"modes",level:3},{value:"Default model",id:"default-model",level:3},{value:"Field Models",id:"field-models",level:3},{value:"Literal Models",id:"literal-models",level:3},{value:"Model Generators",id:"model-generators",level:2},{value:"Example",id:"example",level:3},{value:"Specification",id:"specification",level:3},{value:"Development",id:"development",level:3},{value:"When Sources or Sinks don't appear in Results",id:"when-sources-or-sinks-dont-appear-in-results",level:4}],m=(p="FbModels",function(e){return console.warn("Component "+p+" was not imported, exported, or provided by MDXProvider as global scope"),(0,i.mdx)("div",e)});var p;const u={toc:d};function c(e){let{components:n,...a}=e;return(0,i.mdx)("wrapper",(0,t.Z)({},u,a,{components:n,mdxType:"MDXLayout"}),(0,i.mdx)("p",null,"The main way to configure the analysis is through defining model generators. Each model generator defines (1) a ",(0,i.mdx)("strong",{parentName:"p"},"filter"),", made up of constraints to specify the methods (or fields) for which a model should be generated, and (2) a ",(0,i.mdx)("strong",{parentName:"p"},"model"),", an abstract representation of how data flows through a method."),(0,i.mdx)("p",null,"Model generators are what define Sink and Source kinds which are the key component of ",(0,i.mdx)("a",{parentName:"p",href:"/docs/rules"},"Rules"),". Model generators can do other things too, like attach ",(0,i.mdx)("strong",{parentName:"p"},"features")," (a.k.a. 
breadcrumbs) to flows and ",(0,i.mdx)("strong",{parentName:"p"},"sanitize"),' (redact) flows which go through certain "data-safe" methods (e.g. a method which hashes a user\'s password).'),(0,i.mdx)("p",null,"Filters are conceptually straightforward. Thus, this page focuses heavily on conceptualizing and providing examples for the various types of models. See the ",(0,i.mdx)("a",{parentName:"p",href:"#model-generators"},"Model Generators")," section for full implementation documentation for both filters and models."),(0,i.mdx)("h2",{id:"models"},"Models"),(0,i.mdx)("p",null,"A model is an abstract representation of how data flows through a method."),(0,i.mdx)("p",null,"A model essentialy consists of:"),(0,i.mdx)("ul",null,(0,i.mdx)("li",{parentName:"ul"},(0,i.mdx)("a",{parentName:"li",href:"#sources"},"Sources"),": a set of sources that the method produces or receives on parameters;"),(0,i.mdx)("li",{parentName:"ul"},(0,i.mdx)("a",{parentName:"li",href:"#sinks"},"Sinks"),": a set of sinks on the method;"),(0,i.mdx)("li",{parentName:"ul"},(0,i.mdx)("a",{parentName:"li",href:"#propagation"},"Propagation"),": a description of how the method propagates taint coming into it (e.g, the first parameter updates the second, the second parameter updates the return value, etc.);"),(0,i.mdx)("li",{parentName:"ul"},(0,i.mdx)("a",{parentName:"li",href:"#attach-to-sources"},"Attach to Sources"),": a set of features/breadcrumbs to add on an any sources flowing out of the method;"),(0,i.mdx)("li",{parentName:"ul"},(0,i.mdx)("a",{parentName:"li",href:"#attach-to-sinks"},"Attach to Sinks"),": a set of features/breadcrumbs to add on sinks of a given parameter;"),(0,i.mdx)("li",{parentName:"ul"},(0,i.mdx)("a",{parentName:"li",href:"#attach-to-propagations"},"Attach to Propagations"),": a set of features/breadcrumbs to add on propagations for a given parameter or return value;"),(0,i.mdx)("li",{parentName:"ul"},(0,i.mdx)("a",{parentName:"li",href:"#add-features-to-arguments"},"Add 
Features to Arguments"),": a set of features/breadcrumbs to add on any taint that might flow in a given parameter;"),(0,i.mdx)("li",{parentName:"ul"},(0,i.mdx)("a",{parentName:"li",href:"#sanitizers"},"Sanitizers"),": specifications of taint flows to stop;"),(0,i.mdx)("li",{parentName:"ul"},(0,i.mdx)("a",{parentName:"li",href:"#modes"},"Modes"),": a set of flags describing specific behaviors (see below).")),(0,i.mdx)("p",null,"Models can be specified in JSON. For example to mark the string parameter to our ",(0,i.mdx)("inlineCode",{parentName:"p"},"Logger.log")," function as a sink we can specify it as"),(0,i.mdx)("pre",null,(0,i.mdx)("code",{parentName:"pre",className:"language-json"},'{\n "find": "methods",\n "where": [\n {\n "constraint": "signature_match",\n "parent": "Lcom/example/Logger;",\n "name": "log"\n }\n ],\n "model": {\n "sinks": [\n {\n "kind": "Logging",\n "port": "Argument(1)"\n }\n ]\n }\n}\n')),(0,i.mdx)("p",null,"Note that the naming of methods follow the ",(0,i.mdx)("a",{parentName:"p",href:"#method-name-format"},"Dalvik's bytecode format"),"."),(0,i.mdx)("h3",{id:"method-name-format"},"Method name format"),(0,i.mdx)("p",null,"The format used for method names is:"),(0,i.mdx)("p",null,(0,i.mdx)("inlineCode",{parentName:"p"},".:()")),(0,i.mdx)("p",null,"Example: ",(0,i.mdx)("inlineCode",{parentName:"p"},"Landroidx/fragment/app/Fragment;.startActivity:(Landroid/content/Intent;)V")),(0,i.mdx)("p",null,"For the parameters and return types use the following table to pick the correct one (please refer to ",(0,i.mdx)("a",{parentName:"p",href:"https://docs.oracle.com/javase/specs/jvms/se7/html/jvms-4.html#jvms-4.3.2-200"},"JVM doc")," for more details)"),(0,i.mdx)("ul",null,(0,i.mdx)("li",{parentName:"ul"},"V - void"),(0,i.mdx)("li",{parentName:"ul"},"Z - boolean"),(0,i.mdx)("li",{parentName:"ul"},"B - byte"),(0,i.mdx)("li",{parentName:"ul"},"S - short"),(0,i.mdx)("li",{parentName:"ul"},"C - char"),(0,i.mdx)("li",{parentName:"ul"},"I - 
int"),(0,i.mdx)("li",{parentName:"ul"},"J - long (64 bits)"),(0,i.mdx)("li",{parentName:"ul"},"F - float"),(0,i.mdx)("li",{parentName:"ul"},"D - double (64 bits)")),(0,i.mdx)("p",null,"Classes take the form ",(0,i.mdx)("inlineCode",{parentName:"p"},"Lpackage/name/ClassName;")," - where the leading ",(0,i.mdx)("inlineCode",{parentName:"p"},"L")," indicates that it is a class type, ",(0,i.mdx)("inlineCode",{parentName:"p"},"package/name/")," is the package that the class is in. A nested class will take the form ",(0,i.mdx)("inlineCode",{parentName:"p"},"Lpackage/name/ClassName$NestedClassName")," (the ",(0,i.mdx)("inlineCode",{parentName:"p"},"$")," will need to be double escaped ",(0,i.mdx)("inlineCode",{parentName:"p"},"\\\\$")," in json regex)."),(0,i.mdx)("blockquote",null,(0,i.mdx)("p",{parentName:"blockquote"},(0,i.mdx)("strong",{parentName:"p"},"NOTE:")," Instance (i.e, non-static) method parameters are indexed starting from 1! The 0th parameter is the ",(0,i.mdx)("inlineCode",{parentName:"p"},"this")," parameter in dalvik byte-code. For static method parameter, indices start from 0.")),(0,i.mdx)("h3",{id:"access-path-format"},"Access path format"),(0,i.mdx)("p",null,'An access path describes the symbolic location of a taint. This is commonly used to indicate where a source or a sink originates from. The "port" field of any model is represented by an access path.'),(0,i.mdx)("p",null,"An access path is composed of a root and a path."),(0,i.mdx)("p",null,"The root is either:"),(0,i.mdx)("ul",null,(0,i.mdx)("li",{parentName:"ul"},(0,i.mdx)("inlineCode",{parentName:"li"},"Return"),", representing the returned value;"),(0,i.mdx)("li",{parentName:"ul"},(0,i.mdx)("inlineCode",{parentName:"li"},"Argument(x)")," (where ",(0,i.mdx)("inlineCode",{parentName:"li"},"x")," is an integer), representing the parameter number ",(0,i.mdx)("inlineCode",{parentName:"li"},"x"),";")),(0,i.mdx)("p",null,"The path is a (possibly empty) list of path elements. 
A path element can be any of the following kinds:"),(0,i.mdx)("ul",null,(0,i.mdx)("li",{parentName:"ul"},(0,i.mdx)("inlineCode",{parentName:"li"},"field"),": represents a field name. String encoding is a dot followed by the field name: ",(0,i.mdx)("inlineCode",{parentName:"li"},".field_name"),";"),(0,i.mdx)("li",{parentName:"ul"},(0,i.mdx)("inlineCode",{parentName:"li"},"index"),": represents a user defined index for dictionary like objects. String encoding uses square braces to enclose any user defined index: ",(0,i.mdx)("inlineCode",{parentName:"li"},"[index_name]"),";"),(0,i.mdx)("li",{parentName:"ul"},(0,i.mdx)("inlineCode",{parentName:"li"},"any index"),": represents any or unresolved indices in dictionary like objects. String encoding is an asterisk enclosed in square braces: ",(0,i.mdx)("inlineCode",{parentName:"li"},"[*]"),";"),(0,i.mdx)("li",{parentName:"ul"},(0,i.mdx)("inlineCode",{parentName:"li"},"index from value of"),": captures the value of the specified callable's port seen at its callsites during taint flow analysis as an ",(0,i.mdx)("inlineCode",{parentName:"li"},"index")," or ",(0,i.mdx)("inlineCode",{parentName:"li"},"any index")," (if the value cannot be resolved). 
String encoding uses ",(0,i.mdx)("em",{parentName:"li"},"argument root")," to specify the callable's port and encloses it in ",(0,i.mdx)("inlineCode",{parentName:"li"},"[<"),"...",(0,i.mdx)("inlineCode",{parentName:"li"},">]")," to represent that its value is resolved at the callsite to create an index: ",(0,i.mdx)("inlineCode",{parentName:"li"},"[]"),";")),(0,i.mdx)("p",null,"Examples:"),(0,i.mdx)("ul",null,(0,i.mdx)("li",{parentName:"ul"},(0,i.mdx)("inlineCode",{parentName:"li"},"Argument(1).name")," corresponds to the ",(0,i.mdx)("em",{parentName:"li"},"field")," ",(0,i.mdx)("inlineCode",{parentName:"li"},"name")," of the second parameter;"),(0,i.mdx)("li",{parentName:"ul"},(0,i.mdx)("inlineCode",{parentName:"li"},"Argument(1)[name]")," corresponds to the ",(0,i.mdx)("em",{parentName:"li"},"index")," ",(0,i.mdx)("inlineCode",{parentName:"li"},"name")," of the dictionary like second parameter;"),(0,i.mdx)("li",{parentName:"ul"},(0,i.mdx)("inlineCode",{parentName:"li"},"Argument(1)[*]")," corresponds to ",(0,i.mdx)("em",{parentName:"li"},"any index")," of the dictionary like second parameter;"),(0,i.mdx)("li",{parentName:"ul"},(0,i.mdx)("inlineCode",{parentName:"li"},"Argument(1)[]")," corresponds to an ",(0,i.mdx)("em",{parentName:"li"},"index")," of the dictionary like second parameter whose value is resolved from the third parameter;"),(0,i.mdx)("li",{parentName:"ul"},(0,i.mdx)("inlineCode",{parentName:"li"},"Return")," corresponds to the returned value;"),(0,i.mdx)("li",{parentName:"ul"},(0,i.mdx)("inlineCode",{parentName:"li"},"Return.x")," correpsonds to the field ",(0,i.mdx)("inlineCode",{parentName:"li"},"x")," of the returned value;")),(0,i.mdx)("h3",{id:"kinds"},"Kinds"),(0,i.mdx)("p",null,"A source has a ",(0,i.mdx)("strong",{parentName:"p"},"kind")," that describes its content (e.g, user input, file system, etc). 
A sink also has a ",(0,i.mdx)("strong",{parentName:"p"},"kind")," that describes the operation the method performs (e.g, execute a command, read a file, etc.). Kinds can be arbitrary strings (e.g, ",(0,i.mdx)("inlineCode",{parentName:"p"},"UserInput"),"). We usually avoid whitespaces."),(0,i.mdx)("h3",{id:"sources"},"Sources"),(0,i.mdx)("p",null,"Sources describe sources produced or received by a given method. A source can either flow out via the return value or flow via a given parameter. A source has a ",(0,i.mdx)("strong",{parentName:"p"},"kind")," that describes its content (e.g, user input, file system, etc)."),(0,i.mdx)("p",null,"Here is an example where the source flows by return value:"),(0,i.mdx)("pre",null,(0,i.mdx)("code",{parentName:"pre",className:"language-java"},'public static String getPath() {\n return System.getenv().get("PATH");\n}\n')),(0,i.mdx)("p",null,"The JSON model generator for this method could be:"),(0,i.mdx)("pre",null,(0,i.mdx)("code",{parentName:"pre",className:"language-json"},'{\n "find": "methods",\n "where": [\n {\n "constraint": "signature_match",\n "parent": "Lcom/example/Class;",\n "name": "getPath"\n }\n ],\n "model": {\n "sources": [\n {\n "kind": "UserControlled",\n "port": "Return"\n }\n ]\n }\n}\n')),(0,i.mdx)("p",null,"Here is an example where the source flows in via an argument:"),(0,i.mdx)("pre",null,(0,i.mdx)("code",{parentName:"pre",className:"language-java"},"class MyActivity extends Activity {\n public void onNewIntent(Intent intent) {\n // intent should be considered a source here.\n }\n}\n")),(0,i.mdx)("p",null,"The JSON model generator for this method could be:"),(0,i.mdx)("pre",null,(0,i.mdx)("code",{parentName:"pre",className:"language-json"},'{\n "find": "methods",\n "where": [\n {\n "constraint": "signature_match",\n "extends": "Landroid/app/Activity",\n "name": "onNewIntent"\n }\n ],\n "model": {\n "sources": [\n {\n "kind": "UserControlled",\n "port": "Argument(1)"\n }\n ]\n 
}\n}\n')),(0,i.mdx)("p",null,"Note that the implicit ",(0,i.mdx)("inlineCode",{parentName:"p"},"this")," parameter is considered the argument 0."),(0,i.mdx)("h3",{id:"sinks"},"Sinks"),(0,i.mdx)("p",null,"Sinks describe dangerous or sensitive methods in the code. A sink has a ",(0,i.mdx)("strong",{parentName:"p"},"kind")," that represents the type of operation the method does (e.g, command execution, file system operation, etc). A sink must be attached to a given parameter of the method. A method can have multiple sinks."),(0,i.mdx)("p",null,"Here is an example of a sink:"),(0,i.mdx)("pre",null,(0,i.mdx)("code",{parentName:"pre",className:"language-java"},"public static String readFile(String path, String extension, int mode) {\n // Return the content of the file path.extension\n}\n")),(0,i.mdx)("p",null,"Since ",(0,i.mdx)("inlineCode",{parentName:"p"},"path")," and ",(0,i.mdx)("inlineCode",{parentName:"p"},"extension")," can be used to read arbitrary files, we consider them sinks. We do not consider ",(0,i.mdx)("inlineCode",{parentName:"p"},"mode")," as a sink since we do not care whether the user can control it. The JSON model generator for this method could be:"),(0,i.mdx)("pre",null,(0,i.mdx)("code",{parentName:"pre",className:"language-json"},'{\n "find": "methods",\n "where": [\n {\n "constraint": "signature_match",\n "parent": "Lcom/example/Class",\n "name": "readFile"\n }\n ],\n "model": {\n "sinks": [\n {\n "kind": "FileRead",\n "port": "Argument(0)"\n },\n {\n "kind": "FileRead",\n "port": "Argument(1)"\n }\n ]\n }\n}\n')),(0,i.mdx)("h3",{id:"return-sinks"},"Return Sinks"),(0,i.mdx)("p",null,"Return sinks can be used to describe that a method should not return tainted information. 
A return sink is just a normal sink with a ",(0,i.mdx)("inlineCode",{parentName:"p"},"Return")," port."),(0,i.mdx)("h3",{id:"propagation"},"Propagation"),(0,i.mdx)("p",null,"Propagations \u2212 also called ",(0,i.mdx)("strong",{parentName:"p"},"tito")," (Taint In Taint Out) or ",(0,i.mdx)("strong",{parentName:"p"},"passthrough")," in other tools \u2212 describe how the method propagates taint. A propagation as an ",(0,i.mdx)("strong",{parentName:"p"},"input")," (where the taint comes from) and an ",(0,i.mdx)("strong",{parentName:"p"},"output")," (where the taint is moved to)."),(0,i.mdx)("p",null,"Here is an example of a propagation:"),(0,i.mdx)("pre",null,(0,i.mdx)("code",{parentName:"pre",className:"language-java"},"public static String concat(String x, String y) {\n return x + y;\n}\n")),(0,i.mdx)("p",null,"The return value of the method can be controlled by both parameters, hence it has the propagations ",(0,i.mdx)("inlineCode",{parentName:"p"},"Argument(0) -> Return")," and ",(0,i.mdx)("inlineCode",{parentName:"p"},"Argument(1) -> Return"),". The JSON model generator for this method could be:"),(0,i.mdx)("pre",null,(0,i.mdx)("code",{parentName:"pre",className:"language-json"},'{\n "find": "methods",\n "where": [\n {\n "constraint": "signature_match",\n "parent": "Lcom/example/Class",\n "name": "concat"\n }\n ],\n "model": {\n "propagation": [\n {\n "input": "Argument(0)",\n "output": "Return"\n },\n {\n "input": "Argument(1)",\n "output": "Return"\n }\n ]\n }\n}\n')),(0,i.mdx)("h3",{id:"features"},"Features"),(0,i.mdx)("p",null,"Features (also called ",(0,i.mdx)("strong",{parentName:"p"},"breadcrumbs"),") can be used to tag a flow and help filtering issues. A feature describes a property of a flow. 
A feature can be any arbitrary string."),(0,i.mdx)("p",null,"For instance, the feature ",(0,i.mdx)("inlineCode",{parentName:"p"},"via-numerical-operator")," is used to describe that the data flows through a numerical operator such as an addition."),(0,i.mdx)("p",null,"Features are very useful to filter flows in the SAPP UI. E.g. flows with a cast from string to integer are can sometimes be less important during triaging since controlling an integer is more difficult to exploit than controlling a full string."),(0,i.mdx)("p",null,"Note that features ",(0,i.mdx)("strong",{parentName:"p"},"do not stop")," the flow, they just help triaging."),(0,i.mdx)("h4",{id:"attach-to-sources"},"Attach to Sources"),(0,i.mdx)("p",null,(0,i.mdx)("em",{parentName:"p"},"Attach to sources")," is used to add a set of ",(0,i.mdx)("a",{parentName:"p",href:"#features"},"features")," on any sources flowing out of a method through a given parameter or return value."),(0,i.mdx)("p",null,"For instance, if we want to add the feature ",(0,i.mdx)("inlineCode",{parentName:"p"},"via-signed")," to all sources flowing out of the given method:"),(0,i.mdx)("pre",null,(0,i.mdx)("code",{parentName:"pre",className:"language-java"},"public String getSignedCookie();\n")),(0,i.mdx)("p",null,"We could use the following JSON model generator:"),(0,i.mdx)("pre",null,(0,i.mdx)("code",{parentName:"pre",className:"language-json"},'{\n "find": "methods",\n "where": [\n {\n "constraint": "signature_match",\n "parent": "Lcom/example/Class",\n "name": "getSignedCookie"\n }\n ],\n "model": {\n "attach_to_sources": [\n {\n "features": [\n "via-signed"\n ],\n "port": "Return"\n }\n ]\n }\n}\n')),(0,i.mdx)("p",null,"Note that this is only useful for sources inferred by the analysis. 
If you know that ",(0,i.mdx)("inlineCode",{parentName:"p"},"getSignedCookie")," returns a source of a given kind, you should use a source instead."),(0,i.mdx)("h4",{id:"attach-to-sinks"},"Attach to Sinks"),(0,i.mdx)("p",null,(0,i.mdx)("em",{parentName:"p"},"Attach to sinks")," is used to add a set of ",(0,i.mdx)("a",{parentName:"p",href:"#features"},"features")," on all sinks on the given parameter of a method."),(0,i.mdx)("p",null,"For instance, if we want to add the feature ",(0,i.mdx)("inlineCode",{parentName:"p"},"via-user")," on all sinks of the given method:"),(0,i.mdx)("pre",null,(0,i.mdx)("code",{parentName:"pre",className:"language-java"},"class User {\n public static User findUser(String username) {\n // The code here might use SQL, Thrift, or anything. We don't need to know.\n }\n}\n")),(0,i.mdx)("p",null,"We could use the following JSON model generator:"),(0,i.mdx)("pre",null,(0,i.mdx)("code",{parentName:"pre",className:"language-json"},'{\n "find": "methods",\n "where": [\n {\n "constraint": "signature_match",\n "parent": "Lcom/example/User",\n "name": "findUser"\n }\n ],\n "model": {\n "attach_to_sinks": [\n {\n "features": [\n "via-user"\n ],\n "port": "Argument(0)"\n }\n ]\n }\n}\n')),(0,i.mdx)("p",null,"Note that this is only useful for sinks inferred by the analysis. 
If you know that ",(0,i.mdx)("inlineCode",{parentName:"p"},"findUser")," is a sink of a given kind, you should use a sink instead."),(0,i.mdx)("h4",{id:"attach-to-propagations"},"Attach to Propagations"),(0,i.mdx)("p",null,(0,i.mdx)("em",{parentName:"p"},"Attach to propagations")," is used to add a set of ",(0,i.mdx)("a",{parentName:"p",href:"#features"},"features")," on all propagations from or to a given parameter or return value of a method."),(0,i.mdx)("p",null,"For instance, if we want to add the feature ",(0,i.mdx)("inlineCode",{parentName:"p"},"via-concat")," to the propagations of the given method:"),(0,i.mdx)("pre",null,(0,i.mdx)("code",{parentName:"pre",className:"language-java"},"public static String concat(String x, String y);\n")),(0,i.mdx)("p",null,"We could use the following JSON model generator:"),(0,i.mdx)("pre",null,(0,i.mdx)("code",{parentName:"pre",className:"language-json"},'{\n "find": "methods",\n "where": [\n {\n "constraint": "signature_match",\n "parent": "Lcom/example/Class",\n "name": "concat"\n }\n ],\n "model": {\n "attach_to_propagations": [\n {\n "features": [\n "via-concat"\n ],\n "port": "Return" // We could also use Argument(0) and Argument(1)\n }\n ]\n }\n}\n')),(0,i.mdx)("p",null,"Note that this is only useful for propagations inferred by the analysis. 
If you know that ",(0,i.mdx)("inlineCode",{parentName:"p"},"concat")," has a propagation, you should model it as a propagation directly."),(0,i.mdx)("h4",{id:"add-features-to-arguments"},"Add Features to Arguments"),(0,i.mdx)("p",null,(0,i.mdx)("em",{parentName:"p"},"Add features to arguments")," is used to add a set of ",(0,i.mdx)("a",{parentName:"p",href:"#features"},"features")," on all sources that ",(0,i.mdx)("strong",{parentName:"p"},"might")," flow on a given parameter of a method."),(0,i.mdx)("p",null,(0,i.mdx)("em",{parentName:"p"},"Add features to arguments")," implies ",(0,i.mdx)("em",{parentName:"p"},"Attach to sources"),", ",(0,i.mdx)("em",{parentName:"p"},"Attach to sinks")," and ",(0,i.mdx)("em",{parentName:"p"},"Attach to propagations"),", but it also accounts for possible side effects at call sites."),(0,i.mdx)("p",null,"For instance:"),(0,i.mdx)("pre",null,(0,i.mdx)("code",{parentName:"pre",className:"language-java"},'public static void log(String message) {\n System.out.println(message);\n}\npublic void buyView() {\n String username = getParameter("username");\n String product = getParameter("product");\n log(username);\n buy(username, product);\n}\n')),(0,i.mdx)("p",null,"Technically, the ",(0,i.mdx)("inlineCode",{parentName:"p"},"log")," method doesn't have any source, sink or propagation. We can use ",(0,i.mdx)("em",{parentName:"p"},"add features to arguments")," to add a feature ",(0,i.mdx)("inlineCode",{parentName:"p"},"was-logged")," on the flow from ",(0,i.mdx)("inlineCode",{parentName:"p"},'getParameter("username")')," to ",(0,i.mdx)("inlineCode",{parentName:"p"},"buy(username, product)"),". 
We could use the following JSON model generator for the ",(0,i.mdx)("inlineCode",{parentName:"p"},"log")," method:"),(0,i.mdx)("pre",null,(0,i.mdx)("code",{parentName:"pre",className:"language-json"},'{\n "find": "methods",\n "where": [\n {\n "constraint": "signature_match",\n "parent": "Lcom/example/Class",\n "name": "log"\n }\n ],\n "model": {\n "add_features_to_arguments": [\n {\n "features": [\n "was-logged"\n ],\n "port": "Argument(0)"\n }\n ]\n }\n}\n')),(0,i.mdx)("h4",{id:"via-type-features"},"Via-type Features"),(0,i.mdx)("p",null,(0,i.mdx)("em",{parentName:"p"},"Via-type")," features are used to keep track of the type of a callable\u2019s port seen at its callsites during taint flow analysis. They are specified in model generators within the \u201csources\u201d or \u201csinks\u201d field of a model with the \u201cvia_type_of\u201d field. It is mapped to a nonempty list of ports of the method for which we want to create via-type features."),(0,i.mdx)("p",null,"For example, if we were interested in the specific Activity subclasses with which the method below was called:"),(0,i.mdx)("pre",null,(0,i.mdx)("code",{parentName:"pre",className:"language-java"},"\npublic void startActivityForResult(Intent intent, int requestCode);\n\n// At some callsite:\nActivitySubclass activitySubclassInstance;\nactivitySubclassInstance.startActivityForResult(intent, requestCode);\n\n")),(0,i.mdx)("p",null,"we could use the following JSON to specifiy a via-type feature that would materialize as ",(0,i.mdx)("inlineCode",{parentName:"p"},"via-type:ActivitySubclass"),":"),(0,i.mdx)("pre",null,(0,i.mdx)("code",{parentName:"pre",className:"language-json"},'{\n "find": "methods",\n "where": [\n {\n "constraint": "signature_match",\n "parent": "Lcom/example/Class",\n "name": "startActivityForResult"\n }\n ],\n "model": {\n "sinks": [\n {\n "port": "Argument(1)",\n "kind": "SinkKind",\n "via_type_of": [\n "Argument(0)"\n ]\n }\n ]\n 
}\n}\n')),(0,i.mdx)("h4",{id:"via-value-features"},"Via-value Features"),(0,i.mdx)("p",null,(0,i.mdx)("em",{parentName:"p"},"Via-value")," features capture the value of the specified callable's port seen at its callsites during taint flow analysis. They are specified similarly to ",(0,i.mdx)("inlineCode",{parentName:"p"},"Via-type"),' features -- in model generators within the "sources" or "sinks" field of a model with the "via_value_of" field. It is mapped to a nonempty list of ports of the method for which we want to create via-value features.'),(0,i.mdx)("p",null,"For example, if we were interested in the specific ",(0,i.mdx)("inlineCode",{parentName:"p"},"mode")," with which the method below was called:"),(0,i.mdx)("pre",null,(0,i.mdx)("code",{parentName:"pre",className:"language-java"},'public void log(String mode, String message);\n\nclass Constants {\n public static final String MODE = "M1";\n}\n\n// At some callsite:\nlog(Constants.MODE, "error message");\n\n')),(0,i.mdx)("p",null,"we could use the following JSON to specify a via-value feature that would materialize as ",(0,i.mdx)("inlineCode",{parentName:"p"},"via-value:M1"),":"),(0,i.mdx)("pre",null,(0,i.mdx)("code",{parentName:"pre",className:"language-json"},'{\n "find": "methods",\n "where": [\n {\n "constraint": "signature_match",\n "parent": "Lcom/example/Class",\n "name": "log"\n }\n ],\n "model": {\n "sinks": [\n {\n "port": "Argument(1)",\n "kind": "SinkKind",\n "via_value_of": [\n "Argument(0)"\n ]\n }\n ]\n }\n}\n')),(0,i.mdx)("p",null,"Note that this only works for numeric and string literals. 
In cases where the argument is not a constant, the feature will appear as ",(0,i.mdx)("inlineCode",{parentName:"p"},"via-value:unknown"),"."),(0,i.mdx)("h3",{id:"taint-broadening"},"Taint Broadening"),(0,i.mdx)("p",null,(0,i.mdx)("strong",{parentName:"p"},"Taint broadening")," (also called ",(0,i.mdx)("strong",{parentName:"p"},"collapsing"),") happens when Mariana Trench needs to make an approximation about a taint flow. It is the operation of reducing a ",(0,i.mdx)("strong",{parentName:"p"},"taint tree")," into a single element. A ",(0,i.mdx)("strong",{parentName:"p"},"taint tree")," is a tree where edges are field names and nodes are taint element. This is how Mariana Trench represents internally which fields (or sequence of fields) are tainted."),(0,i.mdx)("p",null,"For instance, analyzing the following code:"),(0,i.mdx)("pre",null,(0,i.mdx)("code",{parentName:"pre",className:"language-java"},"MyClass var = new MyClass();\nvar.a = sourceX();\nvar.b.c = sourceY();\nvar.b.d = sourceZ();\n")),(0,i.mdx)("p",null,"The taint tree of variable ",(0,i.mdx)("inlineCode",{parentName:"p"},"var")," would be:"),(0,i.mdx)("pre",null,(0,i.mdx)("code",{parentName:"pre"}," .\n a / \\ b\n { X } .\n c / \\ d\n { Y } { Z }\n")),(0,i.mdx)("p",null,"After collapsing, the tree is reduced to a single node ",(0,i.mdx)("inlineCode",{parentName:"p"},"{ X, Y, Z }"),", which is less precise."),(0,i.mdx)("p",null,"In conclusion, taint broadening effectively leads to considering the whole object as tainted while only some specific fields were initially tainted. This might happen for the correctness of the analysis or for performance reasons."),(0,i.mdx)("p",null,"In the following sections, we will discuss when collapsing can happen. 
In most cases, a feature is automatically added on collapsed taint to help detect false positives."),(0,i.mdx)("h4",{id:"propagation-broadening"},"Propagation Broadening"),(0,i.mdx)("p",null,"Taint collapsing is applied when taint is propagated through a method."),(0,i.mdx)("p",null,"For instance:"),(0,i.mdx)("pre",null,(0,i.mdx)("code",{parentName:"pre",className:"language-java"},"MyClass input = new MyClass();\ninput.a = SourceX();\nMyClass output = SomeClass.UnknownMethod(input);\nSink(output.b); // Considered an issue since `output` is considered tainted. This could be a False Negative without collapsing.\n")),(0,i.mdx)("p",null,"In that case, the ",(0,i.mdx)("a",{parentName:"p",href:"#feature"},"feature")," ",(0,i.mdx)("inlineCode",{parentName:"p"},"via-propagation-broadening")," will be automatically added on the taint. This can help identify false positives."),(0,i.mdx)("p",null,"If you know that this method ",(0,i.mdx)("strong",{parentName:"p"},"preserves the structure")," of the parameter, you could specify a model and disable collapsing using the ",(0,i.mdx)("inlineCode",{parentName:"p"},"collapse")," attribute within a ",(0,i.mdx)("a",{parentName:"p",href:"#propagation"},(0,i.mdx)("inlineCode",{parentName:"a"},"propagation")),":"),(0,i.mdx)("pre",null,(0,i.mdx)("code",{parentName:"pre",className:"language-json"},'{\n "find": "methods",\n "where": [\n {\n "constraint": "signature_match",\n "parent": "Lcom/example/SomeClass",\n "name": "UnknownMethod"\n }\n ],\n "model": {\n "propagation": [\n {\n "input": "Argument(0)",\n "output": "Return",\n "collapse": false\n }\n ]\n }\n}\n')),(0,i.mdx)("p",null,"Note that Mariana Trench can usually infer when a method propagates taint without collapsing it when it has access to the code of that method and subsequent calls. 
For instance:"),(0,i.mdx)("pre",null,(0,i.mdx)("code",{parentName:"pre",className:"language-java"},"public String identity(String x) {\n // Automatically infers a propagation `Arg(0) -> Return` with `collapse=false`\n return x;\n}\n")),(0,i.mdx)("h5",{id:"issue-broadening-feature"},"Issue Broadening Feature"),(0,i.mdx)("p",null,"The ",(0,i.mdx)("inlineCode",{parentName:"p"},"via-issue-broadening")," feature is added to issues where the taint flowing into the sink was not held directly on the object passed in but on one of its fields. For example:"),(0,i.mdx)("pre",null,(0,i.mdx)("code",{parentName:"pre",className:"language-java"},"Class input = new Class();\ninput.field = source();\nsink(input); // `input` is not tainted, but `input.field` is tainted and creates an issue\n")),(0,i.mdx)("h5",{id:"widen-broadening-feature"},"Widen Broadening Feature"),(0,i.mdx)("p",null,"For performance reasons, if a given taint tree becomes very large (either in depth or in number of nodes at a given level), Mariana Trench collapses the tree to a smaller size. In these cases, the ",(0,i.mdx)("inlineCode",{parentName:"p"},"via-widen-broadening")," feature is added to the collapsed taint:"),(0,i.mdx)("pre",null,(0,i.mdx)("code",{parentName:"pre",className:"language-java"},"Class input = new Class();\nif (/* condition */) {\n input.field1 = source();\n input.field2 = source();\n ...\n} else {\n input.fieldA = source();\n input.fieldB = source();\n ...\n}\nsink(input); // Too many fields are sources so the whole input object becomes tainted\n")),(0,i.mdx)("h3",{id:"sanitizers"},"Sanitizers"),(0,i.mdx)("p",null,"Specifying sanitizers on a model allows us to stop taint flowing through that method. 
In Mariana Trench, they can be one of three types:"),(0,i.mdx)("ul",null,(0,i.mdx)("li",{parentName:"ul"},(0,i.mdx)("inlineCode",{parentName:"li"},"sources"),": prevent any taint sources from flowing out of the method"),(0,i.mdx)("li",{parentName:"ul"},(0,i.mdx)("inlineCode",{parentName:"li"},"sinks"),": prevent taint from reaching any sinks within the method"),(0,i.mdx)("li",{parentName:"ul"},(0,i.mdx)("inlineCode",{parentName:"li"},"propagations"),": prevent propagations from being inferred between any two ports of the method.")),(0,i.mdx)("p",null,"These can be specified in model generators as follows:"),(0,i.mdx)("pre",null,(0,i.mdx)("code",{parentName:"pre",className:"language-json"},'{\n "find": "methods",\n "where": ...,\n "model": {\n "sanitizers": [\n {\n "sanitize": "sources"\n },\n {\n "sanitize": "sinks"\n },\n {\n "sanitize": "propagations"\n }\n ],\n ...\n }\n}\n')),(0,i.mdx)("p",null,"Note that if there are any user-specified sources, sinks or propagations on the model, sanitizers will not affect them, but they will prevent them from being propagated outward to callsites."),(0,i.mdx)("h4",{id:"kind-specific-sanitizers"},"Kind-specific Sanitizers"),(0,i.mdx)("p",null,(0,i.mdx)("inlineCode",{parentName:"p"},"sources")," and ",(0,i.mdx)("inlineCode",{parentName:"p"},"sinks")," sanitizers may include a list of kinds (each with or without a partial_label) to restrict the sanitizer to only sanitizing taint of those kinds. 
(When unspecified, as in the example above, all taint is sanitized regardless of kind)."),(0,i.mdx)("pre",null,(0,i.mdx)("code",{parentName:"pre",className:"language-json"},'"sanitizers": [\n {\n "sanitize": "sinks",\n "kinds": [\n {\n "kind": "SinkKindA"\n },\n {\n "kind": "SinkKindB",\n "partial_label": "A"\n }\n ]\n }\n]\n')),(0,i.mdx)("h4",{id:"port-specific-sanitizers"},"Port-specific Sanitizers"),(0,i.mdx)("p",null,"Sanitizers can also specify a specific port (",(0,i.mdx)("a",{parentName:"p",href:"/docs/models#access-path-format"},"access path")," root) they sanitize (ignoring all the rest). This field ",(0,i.mdx)("inlineCode",{parentName:"p"},"port")," has a slightly different meaning for each kind of sanitizer -"),(0,i.mdx)("ul",null,(0,i.mdx)("li",{parentName:"ul"},(0,i.mdx)("inlineCode",{parentName:"li"},"sources"),": represents the output port through which sources may not leave the method"),(0,i.mdx)("li",{parentName:"ul"},(0,i.mdx)("inlineCode",{parentName:"li"},"sinks"),": represents the input port through which taint may not trigger any sinks within the model"),(0,i.mdx)("li",{parentName:"ul"},(0,i.mdx)("inlineCode",{parentName:"li"},"propagations"),": represents the input port through which a propagation to any other port may not be inferred")),(0,i.mdx)("p",null,"For example if the following method"),(0,i.mdx)("pre",null,(0,i.mdx)("code",{parentName:"pre",className:"language-java"},"public void someMethod(Object argument1, Object argument2) {\n toSink(argument1);\n toSink(argument2);\n}\n")),(0,i.mdx)("p",null,"had the following sanitizer in its model,"),(0,i.mdx)("pre",null,(0,i.mdx)("code",{parentName:"pre",className:"language-json"},'"sanitizers": [\n {\n "sanitize": "sinks",\n "port": "Argument(1)"\n }\n]\n')),(0,i.mdx)("p",null,"Then a source flowing into ",(0,i.mdx)("inlineCode",{parentName:"p"},"argument1")," would be able to cause an issue, but not a source flowing into 
",(0,i.mdx)("inlineCode",{parentName:"p"},"argument2"),"."),(0,i.mdx)("p",null,"Kind and port specifications may be included in the same sanitizer."),(0,i.mdx)("h3",{id:"modes"},"Modes"),(0,i.mdx)("p",null,"Modes are used to describe specific behaviors of methods. Available modes are:"),(0,i.mdx)("ul",null,(0,i.mdx)("li",{parentName:"ul"},(0,i.mdx)("inlineCode",{parentName:"li"},"skip-analysis"),": skip the analysis of the method;"),(0,i.mdx)("li",{parentName:"ul"},(0,i.mdx)("inlineCode",{parentName:"li"},"add-via-obscure-feature"),": add a feature/breadcrumb called ",(0,i.mdx)("inlineCode",{parentName:"li"},"via-obscure:")," to sources flowing through this method;"),(0,i.mdx)("li",{parentName:"ul"},(0,i.mdx)("inlineCode",{parentName:"li"},"taint-in-taint-out"),": propagate the taint on arguments to the return value;"),(0,i.mdx)("li",{parentName:"ul"},(0,i.mdx)("inlineCode",{parentName:"li"},"taint-in-taint-this"),": propagate the taint on arguments into the ",(0,i.mdx)("inlineCode",{parentName:"li"},"this")," parameter;"),(0,i.mdx)("li",{parentName:"ul"},(0,i.mdx)("inlineCode",{parentName:"li"},"no-join-virtual-overrides"),": do not consider all possible overrides when handling a virtual call to this method;"),(0,i.mdx)("li",{parentName:"ul"},(0,i.mdx)("inlineCode",{parentName:"li"},"no-collapse-on-propagation"),": do not collapse input paths when applying propagations;"),(0,i.mdx)("li",{parentName:"ul"},(0,i.mdx)("inlineCode",{parentName:"li"},"alias-memory-location-on-invoke"),": aliases existing memory location at the callsite instead of creating a new one;"),(0,i.mdx)("li",{parentName:"ul"},(0,i.mdx)("inlineCode",{parentName:"li"},"strong-write-on-propagation"),": performs a strong write from input path to the output path on propagation;")),(0,i.mdx)("h3",{id:"default-model"},"Default model"),(0,i.mdx)("p",null,"A default model is created for each method, except if it is provided by a model generator. 
The default model has a set of heuristics:"),(0,i.mdx)("p",null,"If the method has no source code, the model is automatically marked with the modes ",(0,i.mdx)("inlineCode",{parentName:"p"},"skip-analysis")," and ",(0,i.mdx)("inlineCode",{parentName:"p"},"add-via-obscure-feature"),"."),(0,i.mdx)("p",null,"If the method has more than 40 overrides, it is marked with the mode ",(0,i.mdx)("inlineCode",{parentName:"p"},"no-join-virtual-overrides"),"."),(0,i.mdx)("p",null,"Otherwise, the default model is empty (no sources/sinks/propagations)."),(0,i.mdx)("h3",{id:"field-models"},"Field Models"),(0,i.mdx)("p",null,"These models represent user-defined taint on class fields (as opposed to methods, as described in all the previous sections on this page). They are specified in a similar way to method models as described below."),(0,i.mdx)("blockquote",null,(0,i.mdx)("p",{parentName:"blockquote"},(0,i.mdx)("strong",{parentName:"p"},"NOTE:")," Field sources should not be applied to fields that are both final and of a primitive type (",(0,i.mdx)("inlineCode",{parentName:"p"},"int"),", ",(0,i.mdx)("inlineCode",{parentName:"p"},"char"),", ",(0,i.mdx)("inlineCode",{parentName:"p"},"float"),", etc as well as ",(0,i.mdx)("inlineCode",{parentName:"p"},"java.lang.String"),") as the Java compiler optimizes accesses of these fields in the bytecode into accesses of the constant value they hold. 
In this scenario, Mariana Trench has no way of recognizing that the constant was meant to carry a source.")),(0,i.mdx)("p",null,"Example field model generator for sources:"),(0,i.mdx)("pre",null,(0,i.mdx)("code",{parentName:"pre",className:"language-json"},'{\n "find": "fields",\n "where": [\n {\n "constraint": "name",\n "pattern": "SOURCE_EXAMPLE"\n }\n ],\n "model": {\n "sources" : [\n {\n "kind": "FieldSource"\n }\n ]\n }\n}\n')),(0,i.mdx)("p",null,"Example code:"),(0,i.mdx)("pre",null,(0,i.mdx)("code",{parentName:"pre",className:"language-java"},"public class TestClass {\n // Field that we know to be tainted\n public Object SOURCE_EXAMPLE = ...;\n\n void flow() {\n sink(SOURCE_EXAMPLE, ...);\n }\n}\n")),(0,i.mdx)("p",null,"Example field model generator for sinks:"),(0,i.mdx)("pre",null,(0,i.mdx)("code",{parentName:"pre",className:"language-json"},'{\n "find": "fields",\n "where": [\n {\n "constraint": "name",\n "pattern": "SINK_EXAMPLE"\n }\n ],\n "model": {\n "sinks" : [\n {\n "kind": "FieldSink"\n }\n ]\n }\n}\n')),(0,i.mdx)("p",null,"Example code:"),(0,i.mdx)("pre",null,(0,i.mdx)("code",{parentName:"pre",className:"language-java"},"public class TestClass {\n public Object SINK_EXAMPLE = ...;\n\n void flow() {\n SINK_EXAMPLE = source();\n }\n}\n")),(0,i.mdx)("p",null,"Field signature formats follow the Dalvik bytecode format similar to methods as discussed ",(0,i.mdx)("a",{parentName:"p",href:"#method-name-format"},"above"),". This is of the form ",(0,i.mdx)("inlineCode",{parentName:"p"},".:"),"."),(0,i.mdx)("h3",{id:"literal-models"},"Literal Models"),(0,i.mdx)("p",null,"Literal models represent user-defined taints on string literals matching configurable regular expressions. 
They can only be configured as sources and are intended to identify suspicious patterns, such as user-controlled data being concatenated with a string literal which looks like an SQL query."),(0,i.mdx)("blockquote",null,(0,i.mdx)("p",{parentName:"blockquote"},(0,i.mdx)("strong",{parentName:"p"},"NOTE:")," Each use of a literal in the analysed code which matches a pattern in a literal model will generate a new taint which needs to be explored by Mariana Trench. Using overly broad patterns like ",(0,i.mdx)("inlineCode",{parentName:"p"},".*")," should thus be avoided, as they can lead to poor performance and high memory usage.")),(0,i.mdx)("p",null,"Example literal models:"),(0,i.mdx)("pre",null,(0,i.mdx)("code",{parentName:"pre"},'[\n {\n "pattern": "SELECT \\\\*.*",\n "description": "Potential SQL Query",\n "sources": [\n {\n "kind": "SqlQuery"\n }\n ]\n },\n {\n "pattern": "AI[0-9A-Z]{16}",\n "description": "Suspected Google API Key",\n "sources": [\n {\n "kind": "GoogleAPIKey"\n }\n ]\n }\n]\n')),(0,i.mdx)("p",null,"Example code:"),(0,i.mdx)("pre",null,(0,i.mdx)("code",{parentName:"pre",className:"language-java"},'void testRegexSource() {\n String prefix = "SELECT * FROM USERS WHERE id = ";\n String aci = getAttackerControlledInput();\n String query = prefix + aci; // Sink\n}\n\nvoid testRegexSourceGoogleApiKey() {\n String secret = "AIABCD1234EFGH5678";\n sink(secret);\n}\n')),(0,i.mdx)("h2",{id:"model-generators"},"Model Generators"),(0,i.mdx)("p",null,"Mariana Trench allows for dynamic model specifications. This allows a user to specify models of methods before running the analysis. This is used to specify sources, sinks, propagation and modes."),(0,i.mdx)("p",null,"Model generators are specified in a generator configuration file, specified by the ",(0,i.mdx)("inlineCode",{parentName:"p"},"--generator-configuration-path")," parameter. 
By default, we use ",(0,i.mdx)("a",{parentName:"p",href:"https://github.com/facebook/mariana-trench/blob/main/configuration/default_generator_config.json"},(0,i.mdx)("inlineCode",{parentName:"a"},"default_generator_config.json")),"."),(0,i.mdx)("h3",{id:"example"},"Example"),(0,i.mdx)("p",null,"Examples of model generators are located in the ",(0,i.mdx)("a",{parentName:"p",href:"https://github.com/facebook/mariana-trench/tree/main/configuration/model-generators"},(0,i.mdx)("inlineCode",{parentName:"a"},"configuration/model-generators"))," directory."),(0,i.mdx)("p",null,"Below is an example of a JSON model generator:"),(0,i.mdx)("pre",null,(0,i.mdx)("code",{parentName:"pre",className:"language-json"},'{\n "model_generators": [\n {\n "find": "methods",\n "where": [{"constraint": "name", "pattern": "toString"}],\n "model": {\n "propagation": [\n {\n "input": "Argument(0)",\n "output": "Return"\n }\n ]\n }\n },\n {\n "find": "methods",\n "where": [\n {\n "constraint": "parent",\n "inner": {\n "constraint": "extends",\n "inner": {\n "constraint": "name",\n "pattern": "SandcastleCommand"\n }\n }\n },\n {"constraint": "name", "pattern": "Time"}\n ],\n "model": {\n "sources": [\n {\n "kind": "Source",\n "port": "Return"\n }\n ]\n }\n },\n {\n "find": "methods",\n "where": [\n {\n "constraint": "parent",\n "inner": {\n "constraint": "extends",\n "inner": {"constraint": "name", "pattern": "IEntWithPurposePolicy"}\n }\n },\n {"constraint": "name", "pattern": "gen.*"},\n {\n "constraint": "parameter",\n "idx": 0,\n "inner": {\n "constraint": "type",\n "kind": "extends",\n "class": "IViewerContext"\n }\n },\n {\n "constraint": "return",\n "inner": {\n "constraint": "extends",\n "inner": {"constraint": "name", "pattern": "Ent"}\n }\n }\n ],\n "model": {\n "modes": ["add-via-obscure-feature"],\n "sinks": [\n {\n "kind": "Sink",\n "port": "Argument(0)",\n "features": ["via-gen"]\n }\n ]\n }\n }\n 
]\n}\n')),(0,i.mdx)("h3",{id:"specification"},"Specification"),(0,i.mdx)("p",null,"Each JSON file is a JSON object with a key ",(0,i.mdx)("inlineCode",{parentName:"p"},"model_generators"),' associated with a list of "rules".'),(0,i.mdx)("p",null,'Each "rule" defines a "filter" (which uses "constraints" to specify methods for which a "model" should be generated) and a "model". A rule has the following key/values:'),(0,i.mdx)("ul",null,(0,i.mdx)("li",{parentName:"ul"},(0,i.mdx)("p",{parentName:"li"},(0,i.mdx)("inlineCode",{parentName:"p"},"find"),": The type of thing to find. We support ",(0,i.mdx)("inlineCode",{parentName:"p"},"methods")," and ",(0,i.mdx)("inlineCode",{parentName:"p"},"fields"),";")),(0,i.mdx)("li",{parentName:"ul"},(0,i.mdx)("p",{parentName:"li"},(0,i.mdx)("inlineCode",{parentName:"p"},"where"),': A list of "constraints". All constraints ',(0,i.mdx)("strong",{parentName:"p"},"must be satisfied")," by a method or field in order to generate a model for it. All the constraints are listed below, grouped by the type of object they are applied to:"),(0,i.mdx)("ul",{parentName:"li"},(0,i.mdx)("li",{parentName:"ul"},(0,i.mdx)("p",{parentName:"li"},(0,i.mdx)("strong",{parentName:"p"},"Method"),":"),(0,i.mdx)("ul",{parentName:"li"},(0,i.mdx)("li",{parentName:"ul"},(0,i.mdx)("inlineCode",{parentName:"li"},"signature_match"),": Expects at least one of the two allowed groups of extra properties: ",(0,i.mdx)("inlineCode",{parentName:"li"},"[name | names] [parent | parents | extends [include_self]]")," where:",(0,i.mdx)("ul",{parentName:"li"},(0,i.mdx)("li",{parentName:"ul"},(0,i.mdx)("inlineCode",{parentName:"li"},"name")," (a single string) or ",(0,i.mdx)("inlineCode",{parentName:"li"},"names")," (a list of alternative strings): is exact matched to the method name"),(0,i.mdx)("li",{parentName:"ul"},(0,i.mdx)("inlineCode",{parentName:"li"},"parent")," (a single string) or ",(0,i.mdx)("inlineCode",{parentName:"li"},"parents")," (a list of alternative strings) is 
exact matched to the class of the method or ",(0,i.mdx)("inlineCode",{parentName:"li"},"extends")," (either a single string or a list of alternative strings) is exact matched to the base classes or interfaces of the method. ",(0,i.mdx)("inlineCode",{parentName:"li"},"extends")," allows an additional property ",(0,i.mdx)("inlineCode",{parentName:"li"},"includes_self")," which is a boolean to indicate if the constraint is applied to the class itself or not."))),(0,i.mdx)("li",{parentName:"ul"},(0,i.mdx)("inlineCode",{parentName:"li"},"signature | signature_pattern"),": Expects an extra property ",(0,i.mdx)("inlineCode",{parentName:"li"},"pattern")," which is a regex to fully match the full signature (class, method, argument types) of a method;",(0,i.mdx)("ul",{parentName:"li"},(0,i.mdx)("li",{parentName:"ul"},(0,i.mdx)("strong",{parentName:"li"},"NOTE:")," Usage of this constraint is discouraged as it has poor performance. Try using ",(0,i.mdx)("inlineCode",{parentName:"li"},"signature_match")," instead!"))),(0,i.mdx)("li",{parentName:"ul"},(0,i.mdx)("inlineCode",{parentName:"li"},"parent"),": Expects an extra property ",(0,i.mdx)("inlineCode",{parentName:"li"},"inner")," ","[Type]"," which contains a nested constraint to apply to the class holding the method;"),(0,i.mdx)("li",{parentName:"ul"},(0,i.mdx)("inlineCode",{parentName:"li"},"parameter"),": Expects an extra properties ",(0,i.mdx)("inlineCode",{parentName:"li"},"idx")," and ",(0,i.mdx)("inlineCode",{parentName:"li"},"inner")," ","[Parameter]"," or ","[Type]",", matches when the idx-th parameter of the function or method matches the nested constraint inner;"),(0,i.mdx)("li",{parentName:"ul"},(0,i.mdx)("inlineCode",{parentName:"li"},"any_parameter"),": Expects an optional extra property ",(0,i.mdx)("inlineCode",{parentName:"li"},"start_idx")," and ",(0,i.mdx)("inlineCode",{parentName:"li"},"inner")," ","[Parameter]"," or ","[Type]",", matches when there is any parameters (starting at start_idx) of the function 
or method matches the nested constraint inner;"),(0,i.mdx)("li",{parentName:"ul"},(0,i.mdx)("inlineCode",{parentName:"li"},"return"),": Expects an extra property ",(0,i.mdx)("inlineCode",{parentName:"li"},"inner")," ","[Type]"," which contains a nested constraint to apply to the return of the method;"),(0,i.mdx)("li",{parentName:"ul"},(0,i.mdx)("inlineCode",{parentName:"li"},"is_static | is_constructor | is_native | has_code"),": Accepts an extra property ",(0,i.mdx)("inlineCode",{parentName:"li"},"value")," which is either ",(0,i.mdx)("inlineCode",{parentName:"li"},"true")," or ",(0,i.mdx)("inlineCode",{parentName:"li"},"false"),". By default, ",(0,i.mdx)("inlineCode",{parentName:"li"},"value")," is considered ",(0,i.mdx)("inlineCode",{parentName:"li"},"true"),";"),(0,i.mdx)("li",{parentName:"ul"},(0,i.mdx)("inlineCode",{parentName:"li"},"number_parameters"),": Expects an extra property ",(0,i.mdx)("inlineCode",{parentName:"li"},"inner")," ","[Integer]"," which contains a nested constraint to apply to the number of parameters (counting the implicit ",(0,i.mdx)("inlineCode",{parentName:"li"},"this")," parameter);"),(0,i.mdx)("li",{parentName:"ul"},(0,i.mdx)("inlineCode",{parentName:"li"},"number_overrides"),": Expects an extra property ",(0,i.mdx)("inlineCode",{parentName:"li"},"inner")," ","[Integer]"," which contains a nested constraint to apply on the number of method overrides."))),(0,i.mdx)("li",{parentName:"ul"},(0,i.mdx)("p",{parentName:"li"},(0,i.mdx)("strong",{parentName:"p"},"Parameter:")),(0,i.mdx)("ul",{parentName:"li"},(0,i.mdx)("li",{parentName:"ul"},(0,i.mdx)("inlineCode",{parentName:"li"},"parameter_has_annotation"),": Expects an extra property ",(0,i.mdx)("inlineCode",{parentName:"li"},"type")," and an optional property ",(0,i.mdx)("inlineCode",{parentName:"li"},"pattern"),", respectively a string and a regex fully matching the value of the parameter 
annotation."))),(0,i.mdx)("li",{parentName:"ul"},(0,i.mdx)("p",{parentName:"li"},(0,i.mdx)("strong",{parentName:"p"},"Type:")),(0,i.mdx)("ul",{parentName:"li"},(0,i.mdx)("li",{parentName:"ul"},(0,i.mdx)("inlineCode",{parentName:"li"},"extends"),": Expects an extra property ",(0,i.mdx)("inlineCode",{parentName:"li"},"inner")," ","[Type]"," which contains a nested constraint that must apply to one of the base classes or itself. The optional property ",(0,i.mdx)("inlineCode",{parentName:"li"},"includes_self")," is a boolean that tells whether the constraint must be applied on the type itself or not;"),(0,i.mdx)("li",{parentName:"ul"},(0,i.mdx)("inlineCode",{parentName:"li"},"super"),": Expects an extra property ",(0,i.mdx)("inlineCode",{parentName:"li"},"inner")," ","[Type]"," which contains a nested constraint that must apply on the direct superclass;"),(0,i.mdx)("li",{parentName:"ul"},(0,i.mdx)("inlineCode",{parentName:"li"},"is_class | is_interface"),": Accepts an extra property ",(0,i.mdx)("inlineCode",{parentName:"li"},"value")," which is either ",(0,i.mdx)("inlineCode",{parentName:"li"},"true")," or ",(0,i.mdx)("inlineCode",{parentName:"li"},"false"),". By default, ",(0,i.mdx)("inlineCode",{parentName:"li"},"value")," is considered ",(0,i.mdx)("inlineCode",{parentName:"li"},"true"),";"))),(0,i.mdx)("li",{parentName:"ul"},(0,i.mdx)("p",{parentName:"li"},(0,i.mdx)("strong",{parentName:"p"},"Field"),":"),(0,i.mdx)("ul",{parentName:"li"},(0,i.mdx)("li",{parentName:"ul"},(0,i.mdx)("inlineCode",{parentName:"li"},"signature"),": Expects an extra property ",(0,i.mdx)("inlineCode",{parentName:"li"},"pattern")," which is a regex to fully match the full signature of the field. 
This is of the form ",(0,i.mdx)("inlineCode",{parentName:"li"},".:"),";"),(0,i.mdx)("li",{parentName:"ul"},(0,i.mdx)("inlineCode",{parentName:"li"},"parent"),": Expects an extra property ",(0,i.mdx)("inlineCode",{parentName:"li"},"inner")," ","[Type]"," which contains a nested constraint to apply to the class holding the field;"),(0,i.mdx)("li",{parentName:"ul"},(0,i.mdx)("inlineCode",{parentName:"li"},"is_static"),": Accepts an extra property ",(0,i.mdx)("inlineCode",{parentName:"li"},"value")," which is either ",(0,i.mdx)("inlineCode",{parentName:"li"},"true")," or ",(0,i.mdx)("inlineCode",{parentName:"li"},"false"),". By default, ",(0,i.mdx)("inlineCode",{parentName:"li"},"value")," is considered ",(0,i.mdx)("inlineCode",{parentName:"li"},"true"),";"))),(0,i.mdx)("li",{parentName:"ul"},(0,i.mdx)("p",{parentName:"li"},(0,i.mdx)("strong",{parentName:"p"},"Method, Type or Field:")),(0,i.mdx)("ul",{parentName:"li"},(0,i.mdx)("li",{parentName:"ul"},(0,i.mdx)("inlineCode",{parentName:"li"},"name"),": Expects an extra property ",(0,i.mdx)("inlineCode",{parentName:"li"},"pattern")," which is a regex to fully match the name of the item;"),(0,i.mdx)("li",{parentName:"ul"},(0,i.mdx)("inlineCode",{parentName:"li"},"has_annotation"),": Expects an extra property ",(0,i.mdx)("inlineCode",{parentName:"li"},"type")," and an optional property ",(0,i.mdx)("inlineCode",{parentName:"li"},"pattern"),", respectively a string and a regex fully matching the value of the annotation."),(0,i.mdx)("li",{parentName:"ul"},(0,i.mdx)("inlineCode",{parentName:"li"},"visibility"),": Expects an extra property ",(0,i.mdx)("inlineCode",{parentName:"li"},"is")," which is either ",(0,i.mdx)("inlineCode",{parentName:"li"},"public"),", ",(0,i.mdx)("inlineCode",{parentName:"li"},"private")," or ",(0,i.mdx)("inlineCode",{parentName:"li"},"protected"),"; (Note this does not apply to 
",(0,i.mdx)("inlineCode",{parentName:"li"},"Field"),")"))),(0,i.mdx)("li",{parentName:"ul"},(0,i.mdx)("p",{parentName:"li"},(0,i.mdx)("strong",{parentName:"p"},"Integer:")),(0,i.mdx)("ul",{parentName:"li"},(0,i.mdx)("li",{parentName:"ul"},(0,i.mdx)("inlineCode",{parentName:"li"},"< | <= | == | > | >= | !="),": Expects an extra property ",(0,i.mdx)("inlineCode",{parentName:"li"},"value")," which contains an integer that the input integer is compared with. The input is the left hand side."))),(0,i.mdx)("li",{parentName:"ul"},(0,i.mdx)("p",{parentName:"li"},(0,i.mdx)("strong",{parentName:"p"},"Any (Method, Parameter, Type, Field or Integer):")),(0,i.mdx)("ul",{parentName:"li"},(0,i.mdx)("li",{parentName:"ul"},(0,i.mdx)("inlineCode",{parentName:"li"},"all_of"),": Expects an extra property ",(0,i.mdx)("inlineCode",{parentName:"li"},"inners")," ","[Any]"," which is an array holding nested constraints which must all apply;"),(0,i.mdx)("li",{parentName:"ul"},(0,i.mdx)("inlineCode",{parentName:"li"},"any_of"),": Expects an extra property ",(0,i.mdx)("inlineCode",{parentName:"li"},"inners")," ","[Any]"," which is an array holding nested constraints where one of them must apply;"),(0,i.mdx)("li",{parentName:"ul"},(0,i.mdx)("inlineCode",{parentName:"li"},"not"),": Expects an extra property ",(0,i.mdx)("inlineCode",{parentName:"li"},"inner")," ","[Any]"," which contains a nested constraint that should not apply. 
(Note this is not yet implemented for ",(0,i.mdx)("inlineCode",{parentName:"li"},"Field"),"s)"))))),(0,i.mdx)("li",{parentName:"ul"},(0,i.mdx)("p",{parentName:"li"},(0,i.mdx)("inlineCode",{parentName:"p"},"model"),": A model, describing sources/sinks/propagations/etc."),(0,i.mdx)("ul",{parentName:"li"},(0,i.mdx)("li",{parentName:"ul"},(0,i.mdx)("p",{parentName:"li"},(0,i.mdx)("strong",{parentName:"p"},"For method models")),(0,i.mdx)("ul",{parentName:"li"},(0,i.mdx)("li",{parentName:"ul"},(0,i.mdx)("inlineCode",{parentName:"li"},"sources"),"*",": A list of sources, i.e a source flowing out of the method via return value or flowing in via an argument. A source has the following key/values:",(0,i.mdx)("ul",{parentName:"li"},(0,i.mdx)("li",{parentName:"ul"},(0,i.mdx)("inlineCode",{parentName:"li"},"kind"),": The source name;"),(0,i.mdx)("li",{parentName:"ul"},(0,i.mdx)("inlineCode",{parentName:"li"},"port"),"*","*",": The source access path (e.g, ",(0,i.mdx)("inlineCode",{parentName:"li"},'"Return"')," or ",(0,i.mdx)("inlineCode",{parentName:"li"},'"Argument(1)"'),");"),(0,i.mdx)("li",{parentName:"ul"},(0,i.mdx)("inlineCode",{parentName:"li"},"features"),"*",": A list of features/breadcrumbs names;"),(0,i.mdx)("li",{parentName:"ul"},(0,i.mdx)("inlineCode",{parentName:"li"},"via_type_of"),"*",": A list of ports;"))),(0,i.mdx)("li",{parentName:"ul"},(0,i.mdx)("inlineCode",{parentName:"li"},"sinks"),"*",": A list of sinks, i.e describing that a parameter of the method flows into a sink. 
A sink has the following key/values:",(0,i.mdx)("ul",{parentName:"li"},(0,i.mdx)("li",{parentName:"ul"},(0,i.mdx)("inlineCode",{parentName:"li"},"kind"),": The sink name;"),(0,i.mdx)("li",{parentName:"ul"},(0,i.mdx)("inlineCode",{parentName:"li"},"port"),": The sink access path (e.g, ",(0,i.mdx)("inlineCode",{parentName:"li"},'"Return"')," or ",(0,i.mdx)("inlineCode",{parentName:"li"},'"Argument(1)"'),");"),(0,i.mdx)("li",{parentName:"ul"},(0,i.mdx)("inlineCode",{parentName:"li"},"features"),"*",": A list of features/breadcrumbs names;"),(0,i.mdx)("li",{parentName:"ul"},(0,i.mdx)("inlineCode",{parentName:"li"},"via_type_of"),"*",": A list of ports;"))),(0,i.mdx)("li",{parentName:"ul"},(0,i.mdx)("inlineCode",{parentName:"li"},"propagation"),"*",": A list of propagations (also called passthrough) that describe whether a taint on a parameter should result in a taint on the return value or another parameter. A propagation has the following key/values:",(0,i.mdx)("ul",{parentName:"li"},(0,i.mdx)("li",{parentName:"ul"},(0,i.mdx)("inlineCode",{parentName:"li"},"input"),": The input access path (e.g, ",(0,i.mdx)("inlineCode",{parentName:"li"},'"Argument(1)"'),");"),(0,i.mdx)("li",{parentName:"ul"},(0,i.mdx)("inlineCode",{parentName:"li"},"output"),": The output access path (e.g, ",(0,i.mdx)("inlineCode",{parentName:"li"},'"Return"')," or ",(0,i.mdx)("inlineCode",{parentName:"li"},'"Argument(2)"'),");"),(0,i.mdx)("li",{parentName:"ul"},(0,i.mdx)("inlineCode",{parentName:"li"},"features"),"*",": A list of features/breadcrumbs names;"))),(0,i.mdx)("li",{parentName:"ul"},(0,i.mdx)("inlineCode",{parentName:"li"},"attach_to_sources"),"*",": A list of attach-to-sources that describe that all sources flowing out of the method on the given parameter or return value must have the given features. 
An attach-to-source has the following key/values:",(0,i.mdx)("ul",{parentName:"li"},(0,i.mdx)("li",{parentName:"ul"},(0,i.mdx)("inlineCode",{parentName:"li"},"port"),": The access path root (e.g, ",(0,i.mdx)("inlineCode",{parentName:"li"},'"Return"')," or ",(0,i.mdx)("inlineCode",{parentName:"li"},'"Argument(1)"'),");"),(0,i.mdx)("li",{parentName:"ul"},(0,i.mdx)("inlineCode",{parentName:"li"},"features"),": A list of features/breadcrumb names;"))),(0,i.mdx)("li",{parentName:"ul"},(0,i.mdx)("inlineCode",{parentName:"li"},"attach_to_sinks"),"*",": A list of attach-to-sinks that describe that all sources flowing in the method on the given parameter must have the given features. An attach-to-sink has the following key/values:",(0,i.mdx)("ul",{parentName:"li"},(0,i.mdx)("li",{parentName:"ul"},(0,i.mdx)("inlineCode",{parentName:"li"},"port"),": The access path root (e.g, ",(0,i.mdx)("inlineCode",{parentName:"li"},'"Argument(1)"'),");"),(0,i.mdx)("li",{parentName:"ul"},(0,i.mdx)("inlineCode",{parentName:"li"},"features"),": A list of features/breadcrumb names;"))),(0,i.mdx)("li",{parentName:"ul"},(0,i.mdx)("inlineCode",{parentName:"li"},"attach_to_propagations"),"*",": A list of attach-to-propagations that describe that inferred propagations of sources flowing in or out of a given parameter or return value must have the given features. 
An attach-to-propagation has the following key/values:",(0,i.mdx)("ul",{parentName:"li"},(0,i.mdx)("li",{parentName:"ul"},(0,i.mdx)("inlineCode",{parentName:"li"},"port"),": The access path root (e.g, ",(0,i.mdx)("inlineCode",{parentName:"li"},'"Return"')," or ",(0,i.mdx)("inlineCode",{parentName:"li"},'"Argument(1)"'),");"),(0,i.mdx)("li",{parentName:"ul"},(0,i.mdx)("inlineCode",{parentName:"li"},"features"),": A list of features/breadcrumb names;"))),(0,i.mdx)("li",{parentName:"ul"},(0,i.mdx)("inlineCode",{parentName:"li"},"add_features_to_parameters"),"*",": A list of add-features-to-parameters that describe that flows that might flow on the given parameter must have the given features. An add-features-to-parameter has the following key/values:",(0,i.mdx)("ul",{parentName:"li"},(0,i.mdx)("li",{parentName:"ul"},(0,i.mdx)("inlineCode",{parentName:"li"},"port"),": The access path root (e.g, ",(0,i.mdx)("inlineCode",{parentName:"li"},'"Argument(1)"'),");"),(0,i.mdx)("li",{parentName:"ul"},(0,i.mdx)("inlineCode",{parentName:"li"},"features"),": A list of features/breadcrumb names;"))),(0,i.mdx)("li",{parentName:"ul"},(0,i.mdx)("inlineCode",{parentName:"li"},"modes"),"*",": A list of mode names that describe specific behaviors of a method;"),(0,i.mdx)("li",{parentName:"ul"},(0,i.mdx)("inlineCode",{parentName:"li"},"for_all_parameters"),": Generate sources/sinks/propagations/attach",(0,i.mdx)("em",{parentName:"li"},"to"),"*"," for all parameters of a method that satisfy some constraints. 
It accepts the following key/values:",(0,i.mdx)("ul",{parentName:"li"},(0,i.mdx)("li",{parentName:"ul"},(0,i.mdx)("inlineCode",{parentName:"li"},"variable"),": A symbolic name for the parameter;"),(0,i.mdx)("li",{parentName:"ul"},(0,i.mdx)("inlineCode",{parentName:"li"},"where"),": An optional list of ","[Parameter]"," or ","[Type]"," constraints on the parameter;"),(0,i.mdx)("li",{parentName:"ul"},(0,i.mdx)("inlineCode",{parentName:"li"},"sources | sinks | propagation"),': Same as under "model", but we accept the variable name as a parameter number.'))))),(0,i.mdx)("li",{parentName:"ul"},(0,i.mdx)("p",{parentName:"li"},(0,i.mdx)("inlineCode",{parentName:"p"},"verbosity"),"*",": A logging level, to help debugging. 1 is the most verbose, 5 is the least. The default verbosity level is 5.")),(0,i.mdx)("li",{parentName:"ul"},(0,i.mdx)("p",{parentName:"li"},(0,i.mdx)("strong",{parentName:"p"},"For Field models")),(0,i.mdx)("ul",{parentName:"li"},(0,i.mdx)("li",{parentName:"ul"},(0,i.mdx)("inlineCode",{parentName:"li"},"sources"),"*",": A list of sources the field should hold. A source has the following key/values:",(0,i.mdx)("ul",{parentName:"li"},(0,i.mdx)("li",{parentName:"ul"},(0,i.mdx)("inlineCode",{parentName:"li"},"kind"),": The source name;"),(0,i.mdx)("li",{parentName:"ul"},(0,i.mdx)("inlineCode",{parentName:"li"},"features"),"*",": A list of features/breadcrumbs names;"))),(0,i.mdx)("li",{parentName:"ul"},(0,i.mdx)("inlineCode",{parentName:"li"},"sinks"),"*",": A list of sinks the field should hold. 
A sink has the following key/values:",(0,i.mdx)("ul",{parentName:"li"},(0,i.mdx)("li",{parentName:"ul"},(0,i.mdx)("inlineCode",{parentName:"li"},"kind"),": The sink name;"),(0,i.mdx)("li",{parentName:"ul"},(0,i.mdx)("inlineCode",{parentName:"li"},"features"),"*",": A list of features/breadcrumbs names;")))))))),(0,i.mdx)("p",null,"In the above bullets,"),(0,i.mdx)("ul",null,(0,i.mdx)("li",{parentName:"ul"},(0,i.mdx)("inlineCode",{parentName:"li"},"*")," denotes optional key/value."),(0,i.mdx)("li",{parentName:"ul"},(0,i.mdx)("inlineCode",{parentName:"li"},"**")," denotes optional key/value. Default is ",(0,i.mdx)("inlineCode",{parentName:"li"},'"Return"'),".")),(0,i.mdx)("p",null,"Note, the implicit ",(0,i.mdx)("inlineCode",{parentName:"p"},"this")," parameter for methods has the parameter number 0."),(0,i.mdx)("h3",{id:"development"},"Development"),(0,i.mdx)("h4",{id:"when-sources-or-sinks-dont-appear-in-results"},"When Sources or Sinks don't appear in Results"),(0,i.mdx)("ol",null,(0,i.mdx)("li",{parentName:"ol"},(0,i.mdx)("p",{parentName:"li"},"This could be because your model generator did not find any method matching your query. You can use the ",(0,i.mdx)("inlineCode",{parentName:"p"},'"verbosity": 1')," option in your model generator to check if it matched any method. For instance:"),(0,i.mdx)("pre",{parentName:"li"},(0,i.mdx)("code",{parentName:"pre",className:"language-json"},'{\n "model_generators": [\n {\n "find": "methods",\n "where": /* ... */,\n "model": {\n /* ... */\n },\n "verbosity": 1\n }\n ]\n}\n')),(0,i.mdx)("p",{parentName:"li"},"When running Mariana Trench, this should print:"),(0,i.mdx)("pre",{parentName:"li"},(0,i.mdx)("code",{parentName:"pre"},"INFO Method `...` satisfies all constraints in json model generator ...\n"))),(0,i.mdx)("li",{parentName:"ol"},(0,i.mdx)("p",{parentName:"li"},"Make sure that your model generator is actually running. You can use the ",(0,i.mdx)("inlineCode",{parentName:"p"},"--verbosity 2")," option to check that. 
Make sure your model generator is specified in ",(0,i.mdx)("inlineCode",{parentName:"p"},"configuration/default_generator_config.json"),".")),(0,i.mdx)("li",{parentName:"ol"},(0,i.mdx)("p",{parentName:"li"},"You can also check the output models. Use ",(0,i.mdx)("inlineCode",{parentName:"p"},"grep SourceKind models@*")," to see if your source or sink kind exists. Use ",(0,i.mdx)("inlineCode",{parentName:"p"},"grep 'Lcom/example/;.:' models@*")," to see if a given method exists in the app."))),(0,i.mdx)(m,{mdxType:"FbModels"}))}c.isMDXComponent=!0}}]); \ No newline at end of file +"use strict";(self.webpackChunkwebsite=self.webpackChunkwebsite||[]).push([[133],{3905:(e,n,a)=>{a.r(n),a.d(n,{MDXContext:()=>d,MDXProvider:()=>u,mdx:()=>x,useMDXComponents:()=>p,withMDXComponents:()=>m});var t=a(67294);function i(e,n,a){return n in e?Object.defineProperty(e,n,{value:a,enumerable:!0,configurable:!0,writable:!0}):e[n]=a,e}function r(){return r=Object.assign||function(e){for(var n=1;n=0||(i[a]=e[a]);return i}(e,n);if(Object.getOwnPropertySymbols){var r=Object.getOwnPropertySymbols(e);for(t=0;t=0||Object.prototype.propertyIsEnumerable.call(e,a)&&(i[a]=e[a])}return i}var d=t.createContext({}),m=function(e){return function(n){var a=p(n.components);return t.createElement(e,r({},n,{components:a}))}},p=function(e){var n=t.useContext(d),a=n;return e&&(a="function"==typeof e?e(n):l(l({},n),e)),a},u=function(e){var n=p(e.components);return t.createElement(d.Provider,{value:n},e.children)},c={inlineCode:"code",wrapper:function(e){var n=e.children;return t.createElement(t.Fragment,{},n)}},h=t.forwardRef((function(e,n){var a=e.components,i=e.mdxType,r=e.originalType,o=e.parentName,d=s(e,["components","mdxType","originalType","parentName"]),m=p(a),u=i,h=m["".concat(o,".").concat(u)]||m[u]||c[u]||r;return a?t.createElement(h,l(l({ref:n},d),{},{components:a})):t.createElement(h,l({ref:n},d))}));function x(e,n){var a=arguments,i=n&&n.mdxType;if("string"==typeof e||i){var r=a.length,o=new 
Array(r);o[0]=h;var l={};for(var s in n)hasOwnProperty.call(n,s)&&(l[s]=n[s]);l.originalType=e,l.mdxType="string"==typeof e?e:i,o[1]=l;for(var d=2;d{a.r(n),a.d(n,{assets:()=>s,contentTitle:()=>o,default:()=>c,frontMatter:()=>r,metadata:()=>l,toc:()=>d});var t=a(87462),i=(a(67294),a(3905));const r={id:"models",title:"Models & Model Generators",sidebar_label:"Models & Model Generators"},o=void 0,l={unversionedId:"models",id:"models",title:"Models & Model Generators",description:"The main way to configure the analysis is through defining model generators. Each model generator defines (1) a filter, made up of constraints to specify the methods (or fields) for which a model should be generated, and (2) a model, an abstract representation of how data flows through a method.",source:"@site/documentation/models.md",sourceDirName:".",slug:"/models",permalink:"/docs/models",draft:!1,editUrl:"https://github.com/facebook/mariana-trench/tree/main/documentation/website/documentation/models.md",tags:[],version:"current",frontMatter:{id:"models",title:"Models & Model Generators",sidebar_label:"Models & Model Generators"},sidebar:"docs",previous:{title:"Rules",permalink:"/docs/rules"},next:{title:"Shims",permalink:"/docs/shims"}},s={},d=[{value:"Models",id:"models",level:2},{value:"Method name format",id:"method-name-format",level:3},{value:"Access path format",id:"access-path-format",level:3},{value:"Kinds",id:"kinds",level:3},{value:"Sources",id:"sources",level:3},{value:"Sinks",id:"sinks",level:3},{value:"Return Sinks",id:"return-sinks",level:3},{value:"Propagation",id:"propagation",level:3},{value:"Features",id:"features",level:3},{value:"Attach to Sources",id:"attach-to-sources",level:4},{value:"Attach to Sinks",id:"attach-to-sinks",level:4},{value:"Attach to Propagations",id:"attach-to-propagations",level:4},{value:"Add Features to Arguments",id:"add-features-to-arguments",level:4},{value:"Via-type Features",id:"via-type-features",level:4},{value:"Via-value 
Features",id:"via-value-features",level:4},{value:"Taint Broadening",id:"taint-broadening",level:3},{value:"Propagation Broadening",id:"propagation-broadening",level:4},{value:"Issue Broadening Feature",id:"issue-broadening-feature",level:5},{value:"Widen Broadening Feature",id:"widen-broadening-feature",level:5},{value:"Sanitizers",id:"sanitizers",level:3},{value:"Kind-specific Sanitizers",id:"kind-specific-sanitizers",level:4},{value:"Port-specific Sanitizers",id:"port-specific-sanitizers",level:4},{value:"Modes",id:"modes",level:3},{value:"Default model",id:"default-model",level:3},{value:"Field Models",id:"field-models",level:3},{value:"Literal Models",id:"literal-models",level:3},{value:"Model Generators",id:"model-generators",level:2},{value:"Example",id:"example",level:3},{value:"Specification",id:"specification",level:3},{value:"Development",id:"development",level:3},{value:"When Sources or Sinks don't appear in Results",id:"when-sources-or-sinks-dont-appear-in-results",level:4}],m=(p="FbModels",function(e){return console.warn("Component "+p+" was not imported, exported, or provided by MDXProvider as global scope"),(0,i.mdx)("div",e)});var p;const u={toc:d};function c(e){let{components:n,...a}=e;return(0,i.mdx)("wrapper",(0,t.Z)({},u,a,{components:n,mdxType:"MDXLayout"}),(0,i.mdx)("p",null,"The main way to configure the analysis is through defining model generators. Each model generator defines (1) a ",(0,i.mdx)("strong",{parentName:"p"},"filter"),", made up of constraints to specify the methods (or fields) for which a model should be generated, and (2) a ",(0,i.mdx)("strong",{parentName:"p"},"model"),", an abstract representation of how data flows through a method."),(0,i.mdx)("p",null,"Model generators are what define Sink and Source kinds which are the key component of ",(0,i.mdx)("a",{parentName:"p",href:"/docs/rules"},"Rules"),". Model generators can do other things too, like attach ",(0,i.mdx)("strong",{parentName:"p"},"features")," (a.k.a. 
breadcrumbs) to flows and ",(0,i.mdx)("strong",{parentName:"p"},"sanitize"),' (redact) flows which go through certain "data-safe" methods (e.g. a method which hashes a user\'s password).'),(0,i.mdx)("p",null,"Filters are conceptually straightforward. Thus, this page focuses heavily on conceptualizing and providing examples for the various types of models. See the ",(0,i.mdx)("a",{parentName:"p",href:"#model-generators"},"Model Generators")," section for full implementation documentation for both filters and models."),(0,i.mdx)("h2",{id:"models"},"Models"),(0,i.mdx)("p",null,"A model is an abstract representation of how data flows through a method."),(0,i.mdx)("p",null,"A model essentially consists of:"),(0,i.mdx)("ul",null,(0,i.mdx)("li",{parentName:"ul"},(0,i.mdx)("a",{parentName:"li",href:"#sources"},"Sources"),": a set of sources that the method produces or receives on parameters;"),(0,i.mdx)("li",{parentName:"ul"},(0,i.mdx)("a",{parentName:"li",href:"#sinks"},"Sinks"),": a set of sinks on the method;"),(0,i.mdx)("li",{parentName:"ul"},(0,i.mdx)("a",{parentName:"li",href:"#propagation"},"Propagation"),": a description of how the method propagates taint coming into it (e.g, the first parameter updates the second, the second parameter updates the return value, etc.);"),(0,i.mdx)("li",{parentName:"ul"},(0,i.mdx)("a",{parentName:"li",href:"#attach-to-sources"},"Attach to Sources"),": a set of features/breadcrumbs to add on any sources flowing out of the method;"),(0,i.mdx)("li",{parentName:"ul"},(0,i.mdx)("a",{parentName:"li",href:"#attach-to-sinks"},"Attach to Sinks"),": a set of features/breadcrumbs to add on sinks of a given parameter;"),(0,i.mdx)("li",{parentName:"ul"},(0,i.mdx)("a",{parentName:"li",href:"#attach-to-propagations"},"Attach to Propagations"),": a set of features/breadcrumbs to add on propagations for a given parameter or return value;"),(0,i.mdx)("li",{parentName:"ul"},(0,i.mdx)("a",{parentName:"li",href:"#add-features-to-arguments"},"Add 
Features to Arguments"),": a set of features/breadcrumbs to add on any taint that might flow in a given parameter;"),(0,i.mdx)("li",{parentName:"ul"},(0,i.mdx)("a",{parentName:"li",href:"#sanitizers"},"Sanitizers"),": specifications of taint flows to stop;"),(0,i.mdx)("li",{parentName:"ul"},(0,i.mdx)("a",{parentName:"li",href:"#modes"},"Modes"),": a set of flags describing specific behaviors (see below).")),(0,i.mdx)("p",null,"Models can be specified in JSON. For example to mark the string parameter to our ",(0,i.mdx)("inlineCode",{parentName:"p"},"Logger.log")," function as a sink we can specify it as"),(0,i.mdx)("pre",null,(0,i.mdx)("code",{parentName:"pre",className:"language-json"},'{\n "find": "methods",\n "where": [\n {\n "constraint": "signature_match",\n "parent": "Lcom/example/Logger;",\n "name": "log"\n }\n ],\n "model": {\n "sinks": [\n {\n "kind": "Logging",\n "port": "Argument(1)"\n }\n ]\n }\n}\n')),(0,i.mdx)("p",null,"Note that the naming of methods follow the ",(0,i.mdx)("a",{parentName:"p",href:"#method-name-format"},"Dalvik's bytecode format"),"."),(0,i.mdx)("h3",{id:"method-name-format"},"Method name format"),(0,i.mdx)("p",null,"The format used for method names is:"),(0,i.mdx)("p",null,(0,i.mdx)("inlineCode",{parentName:"p"},".:()")),(0,i.mdx)("p",null,"Example: ",(0,i.mdx)("inlineCode",{parentName:"p"},"Landroidx/fragment/app/Fragment;.startActivity:(Landroid/content/Intent;)V")),(0,i.mdx)("p",null,"For the parameters and return types use the following table to pick the correct one (please refer to ",(0,i.mdx)("a",{parentName:"p",href:"https://docs.oracle.com/javase/specs/jvms/se7/html/jvms-4.html#jvms-4.3.2-200"},"JVM doc")," for more details)"),(0,i.mdx)("ul",null,(0,i.mdx)("li",{parentName:"ul"},"V - void"),(0,i.mdx)("li",{parentName:"ul"},"Z - boolean"),(0,i.mdx)("li",{parentName:"ul"},"B - byte"),(0,i.mdx)("li",{parentName:"ul"},"S - short"),(0,i.mdx)("li",{parentName:"ul"},"C - char"),(0,i.mdx)("li",{parentName:"ul"},"I - 
int"),(0,i.mdx)("li",{parentName:"ul"},"J - long (64 bits)"),(0,i.mdx)("li",{parentName:"ul"},"F - float"),(0,i.mdx)("li",{parentName:"ul"},"D - double (64 bits)")),(0,i.mdx)("p",null,"Classes take the form ",(0,i.mdx)("inlineCode",{parentName:"p"},"Lpackage/name/ClassName;")," - where the leading ",(0,i.mdx)("inlineCode",{parentName:"p"},"L")," indicates that it is a class type, ",(0,i.mdx)("inlineCode",{parentName:"p"},"package/name/")," is the package that the class is in. A nested class will take the form ",(0,i.mdx)("inlineCode",{parentName:"p"},"Lpackage/name/ClassName$NestedClassName")," (the ",(0,i.mdx)("inlineCode",{parentName:"p"},"$")," will need to be double escaped ",(0,i.mdx)("inlineCode",{parentName:"p"},"\\\\$")," in json regex)."),(0,i.mdx)("blockquote",null,(0,i.mdx)("p",{parentName:"blockquote"},(0,i.mdx)("strong",{parentName:"p"},"NOTE:")," Instance (i.e, non-static) method parameters are indexed starting from 1! The 0th parameter is the ",(0,i.mdx)("inlineCode",{parentName:"p"},"this")," parameter in dalvik byte-code. For static method parameter, indices start from 0.")),(0,i.mdx)("h3",{id:"access-path-format"},"Access path format"),(0,i.mdx)("p",null,'An access path describes the symbolic location of a taint. This is commonly used to indicate where a source or a sink originates from. The "port" field of any model is represented by an access path.'),(0,i.mdx)("p",null,"An access path is composed of a root and a path."),(0,i.mdx)("p",null,"The root is either:"),(0,i.mdx)("ul",null,(0,i.mdx)("li",{parentName:"ul"},(0,i.mdx)("inlineCode",{parentName:"li"},"Return"),", representing the returned value;"),(0,i.mdx)("li",{parentName:"ul"},(0,i.mdx)("inlineCode",{parentName:"li"},"Argument(x)")," (where ",(0,i.mdx)("inlineCode",{parentName:"li"},"x")," is an integer), representing the parameter number ",(0,i.mdx)("inlineCode",{parentName:"li"},"x"),";")),(0,i.mdx)("p",null,"The path is a (possibly empty) list of path elements. 
A path element can be any of the following kinds:"),(0,i.mdx)("ul",null,(0,i.mdx)("li",{parentName:"ul"},(0,i.mdx)("inlineCode",{parentName:"li"},"field"),": represents a field name. String encoding is a dot followed by the field name: ",(0,i.mdx)("inlineCode",{parentName:"li"},".field_name"),";"),(0,i.mdx)("li",{parentName:"ul"},(0,i.mdx)("inlineCode",{parentName:"li"},"index"),": represents a user defined index for dictionary like objects. String encoding uses square braces to enclose any user defined index: ",(0,i.mdx)("inlineCode",{parentName:"li"},"[index_name]"),";"),(0,i.mdx)("li",{parentName:"ul"},(0,i.mdx)("inlineCode",{parentName:"li"},"any index"),": represents any or unresolved indices in dictionary like objects. String encoding is an asterisk enclosed in square braces: ",(0,i.mdx)("inlineCode",{parentName:"li"},"[*]"),";"),(0,i.mdx)("li",{parentName:"ul"},(0,i.mdx)("inlineCode",{parentName:"li"},"index from value of"),": captures the value of the specified callable's port seen at its callsites during taint flow analysis as an ",(0,i.mdx)("inlineCode",{parentName:"li"},"index")," or ",(0,i.mdx)("inlineCode",{parentName:"li"},"any index")," (if the value cannot be resolved). 
String encoding uses ",(0,i.mdx)("em",{parentName:"li"},"argument root")," to specify the callable's port and encloses it in ",(0,i.mdx)("inlineCode",{parentName:"li"},"[<"),"...",(0,i.mdx)("inlineCode",{parentName:"li"},">]")," to represent that its value is resolved at the callsite to create an index: ",(0,i.mdx)("inlineCode",{parentName:"li"},"[]"),";")),(0,i.mdx)("p",null,"Examples:"),(0,i.mdx)("ul",null,(0,i.mdx)("li",{parentName:"ul"},(0,i.mdx)("inlineCode",{parentName:"li"},"Argument(1).name")," corresponds to the ",(0,i.mdx)("em",{parentName:"li"},"field")," ",(0,i.mdx)("inlineCode",{parentName:"li"},"name")," of the second parameter;"),(0,i.mdx)("li",{parentName:"ul"},(0,i.mdx)("inlineCode",{parentName:"li"},"Argument(1)[name]")," corresponds to the ",(0,i.mdx)("em",{parentName:"li"},"index")," ",(0,i.mdx)("inlineCode",{parentName:"li"},"name")," of the dictionary like second parameter;"),(0,i.mdx)("li",{parentName:"ul"},(0,i.mdx)("inlineCode",{parentName:"li"},"Argument(1)[*]")," corresponds to ",(0,i.mdx)("em",{parentName:"li"},"any index")," of the dictionary like second parameter;"),(0,i.mdx)("li",{parentName:"ul"},(0,i.mdx)("inlineCode",{parentName:"li"},"Argument(1)[]")," corresponds to an ",(0,i.mdx)("em",{parentName:"li"},"index")," of the dictionary like second parameter whose value is resolved from the third parameter;"),(0,i.mdx)("li",{parentName:"ul"},(0,i.mdx)("inlineCode",{parentName:"li"},"Return")," corresponds to the returned value;"),(0,i.mdx)("li",{parentName:"ul"},(0,i.mdx)("inlineCode",{parentName:"li"},"Return.x")," corresponds to the field ",(0,i.mdx)("inlineCode",{parentName:"li"},"x")," of the returned value;")),(0,i.mdx)("h3",{id:"kinds"},"Kinds"),(0,i.mdx)("p",null,"A source has a ",(0,i.mdx)("strong",{parentName:"p"},"kind")," that describes its content (e.g, user input, file system, etc). 
A sink also has a ",(0,i.mdx)("strong",{parentName:"p"},"kind")," that describes the operation the method performs (e.g, execute a command, read a file, etc.). Kinds can be arbitrary strings (e.g, ",(0,i.mdx)("inlineCode",{parentName:"p"},"UserInput"),"). We usually avoid whitespaces."),(0,i.mdx)("h3",{id:"sources"},"Sources"),(0,i.mdx)("p",null,"Sources describe sources produced or received by a given method. A source can either flow out via the return value or flow via a given parameter. A source has a ",(0,i.mdx)("strong",{parentName:"p"},"kind")," that describes its content (e.g, user input, file system, etc)."),(0,i.mdx)("p",null,"Here is an example where the source flows by return value:"),(0,i.mdx)("pre",null,(0,i.mdx)("code",{parentName:"pre",className:"language-java"},'public static String getPath() {\n return System.getenv().get("PATH");\n}\n')),(0,i.mdx)("p",null,"The JSON model generator for this method could be:"),(0,i.mdx)("pre",null,(0,i.mdx)("code",{parentName:"pre",className:"language-json"},'{\n "find": "methods",\n "where": [\n {\n "constraint": "signature_match",\n "parent": "Lcom/example/Class;",\n "name": "getPath"\n }\n ],\n "model": {\n "sources": [\n {\n "kind": "UserControlled",\n "port": "Return"\n }\n ]\n }\n}\n')),(0,i.mdx)("p",null,"Here is an example where the source flows in via an argument:"),(0,i.mdx)("pre",null,(0,i.mdx)("code",{parentName:"pre",className:"language-java"},"class MyActivity extends Activity {\n public void onNewIntent(Intent intent) {\n // intent should be considered a source here.\n }\n}\n")),(0,i.mdx)("p",null,"The JSON model generator for this method could be:"),(0,i.mdx)("pre",null,(0,i.mdx)("code",{parentName:"pre",className:"language-json"},'{\n "find": "methods",\n "where": [\n {\n "constraint": "signature_match",\n "extends": "Landroid/app/Activity",\n "name": "onNewIntent"\n }\n ],\n "model": {\n "sources": [\n {\n "kind": "UserControlled",\n "port": "Argument(1)"\n }\n ]\n 
}\n}\n')),(0,i.mdx)("p",null,"Note that the implicit ",(0,i.mdx)("inlineCode",{parentName:"p"},"this")," parameter is considered the argument 0."),(0,i.mdx)("h3",{id:"sinks"},"Sinks"),(0,i.mdx)("p",null,"Sinks describe dangerous or sensitive methods in the code. A sink has a ",(0,i.mdx)("strong",{parentName:"p"},"kind")," that represents the type of operation the method does (e.g, command execution, file system operation, etc). A sink must be attached to a given parameter of the method. A method can have multiple sinks."),(0,i.mdx)("p",null,"Here is an example of a sink:"),(0,i.mdx)("pre",null,(0,i.mdx)("code",{parentName:"pre",className:"language-java"},"public static String readFile(String path, String extension, int mode) {\n // Return the content of the file path.extension\n}\n")),(0,i.mdx)("p",null,"Since ",(0,i.mdx)("inlineCode",{parentName:"p"},"path")," and ",(0,i.mdx)("inlineCode",{parentName:"p"},"extension")," can be used to read arbitrary files, we consider them sinks. We do not consider ",(0,i.mdx)("inlineCode",{parentName:"p"},"mode")," as a sink since we do not care whether the user can control it. The JSON model generator for this method could be:"),(0,i.mdx)("pre",null,(0,i.mdx)("code",{parentName:"pre",className:"language-json"},'{\n "find": "methods",\n "where": [\n {\n "constraint": "signature_match",\n "parent": "Lcom/example/Class",\n "name": "readFile"\n }\n ],\n "model": {\n "sinks": [\n {\n "kind": "FileRead",\n "port": "Argument(0)"\n },\n {\n "kind": "FileRead",\n "port": "Argument(1)"\n }\n ]\n }\n}\n')),(0,i.mdx)("h3",{id:"return-sinks"},"Return Sinks"),(0,i.mdx)("p",null,"Return sinks can be used to describe that a method should not return tainted information. 
A return sink is just a normal sink with a ",(0,i.mdx)("inlineCode",{parentName:"p"},"Return")," port."),(0,i.mdx)("h3",{id:"propagation"},"Propagation"),(0,i.mdx)("p",null,"Propagations \u2212 also called ",(0,i.mdx)("strong",{parentName:"p"},"tito")," (Taint In Taint Out) or ",(0,i.mdx)("strong",{parentName:"p"},"passthrough")," in other tools \u2212 describe how the method propagates taint. A propagation has an ",(0,i.mdx)("strong",{parentName:"p"},"input")," (where the taint comes from) and an ",(0,i.mdx)("strong",{parentName:"p"},"output")," (where the taint is moved to)."),(0,i.mdx)("p",null,"Here is an example of a propagation:"),(0,i.mdx)("pre",null,(0,i.mdx)("code",{parentName:"pre",className:"language-java"},"public static String concat(String x, String y) {\n return x + y;\n}\n")),(0,i.mdx)("p",null,"The return value of the method can be controlled by both parameters, hence it has the propagations ",(0,i.mdx)("inlineCode",{parentName:"p"},"Argument(0) -> Return")," and ",(0,i.mdx)("inlineCode",{parentName:"p"},"Argument(1) -> Return"),". The JSON model generator for this method could be:"),(0,i.mdx)("pre",null,(0,i.mdx)("code",{parentName:"pre",className:"language-json"},'{\n "find": "methods",\n "where": [\n {\n "constraint": "signature_match",\n "parent": "Lcom/example/Class",\n "name": "concat"\n }\n ],\n "model": {\n "propagation": [\n {\n "input": "Argument(0)",\n "output": "Return"\n },\n {\n "input": "Argument(1)",\n "output": "Return"\n }\n ]\n }\n}\n')),(0,i.mdx)("h3",{id:"features"},"Features"),(0,i.mdx)("p",null,"Features (also called ",(0,i.mdx)("strong",{parentName:"p"},"breadcrumbs"),") can be used to tag a flow and help filtering issues. A feature describes a property of a flow. 
A feature can be any arbitrary string."),(0,i.mdx)("p",null,"For instance, the feature ",(0,i.mdx)("inlineCode",{parentName:"p"},"via-numerical-operator")," is used to describe that the data flows through a numerical operator such as an addition."),(0,i.mdx)("p",null,"Features are very useful to filter flows in the SAPP UI. E.g. flows with a cast from string to integer can sometimes be less important during triaging since controlling an integer is more difficult to exploit than controlling a full string."),(0,i.mdx)("p",null,"Note that features ",(0,i.mdx)("strong",{parentName:"p"},"do not stop")," the flow, they just help triaging."),(0,i.mdx)("h4",{id:"attach-to-sources"},"Attach to Sources"),(0,i.mdx)("p",null,(0,i.mdx)("em",{parentName:"p"},"Attach to sources")," is used to add a set of ",(0,i.mdx)("a",{parentName:"p",href:"#features"},"features")," on any sources flowing out of a method through a given parameter or return value."),(0,i.mdx)("p",null,"For instance, if we want to add the feature ",(0,i.mdx)("inlineCode",{parentName:"p"},"via-signed")," to all sources flowing out of the given method:"),(0,i.mdx)("pre",null,(0,i.mdx)("code",{parentName:"pre",className:"language-java"},"public String getSignedCookie();\n")),(0,i.mdx)("p",null,"We could use the following JSON model generator:"),(0,i.mdx)("pre",null,(0,i.mdx)("code",{parentName:"pre",className:"language-json"},'{\n "find": "methods",\n "where": [\n {\n "constraint": "signature_match",\n "parent": "Lcom/example/Class",\n "name": "getSignedCookie"\n }\n ],\n "model": {\n "attach_to_sources": [\n {\n "features": [\n "via-signed"\n ],\n "port": "Return"\n }\n ]\n }\n}\n')),(0,i.mdx)("p",null,"Note that this is only useful for sources inferred by the analysis. 
If you know that ",(0,i.mdx)("inlineCode",{parentName:"p"},"getSignedCookie")," returns a source of a given kind, you should use a source instead."),(0,i.mdx)("h4",{id:"attach-to-sinks"},"Attach to Sinks"),(0,i.mdx)("p",null,(0,i.mdx)("em",{parentName:"p"},"Attach to sinks")," is used to add a set of ",(0,i.mdx)("a",{parentName:"p",href:"#features"},"features")," on all sinks on the given parameter of a method."),(0,i.mdx)("p",null,"For instance, if we want to add the feature ",(0,i.mdx)("inlineCode",{parentName:"p"},"via-user")," on all sinks of the given method:"),(0,i.mdx)("pre",null,(0,i.mdx)("code",{parentName:"pre",className:"language-java"},"class User {\n public static User findUser(String username) {\n // The code here might use SQL, Thrift, or anything. We don't need to know.\n }\n}\n")),(0,i.mdx)("p",null,"We could use the following JSON model generator:"),(0,i.mdx)("pre",null,(0,i.mdx)("code",{parentName:"pre",className:"language-json"},'{\n "find": "methods",\n "where": [\n {\n "constraint": "signature_match",\n "parent": "Lcom/example/User",\n "name": "findUser"\n }\n ],\n "model": {\n "attach_to_sinks": [\n {\n "features": [\n "via-user"\n ],\n "port": "Argument(0)"\n }\n ]\n }\n}\n')),(0,i.mdx)("p",null,"Note that this is only useful for sinks inferred by the analysis. 
If you know that ",(0,i.mdx)("inlineCode",{parentName:"p"},"findUser")," is a sink of a given kind, you should use a sink instead."),(0,i.mdx)("h4",{id:"attach-to-propagations"},"Attach to Propagations"),(0,i.mdx)("p",null,(0,i.mdx)("em",{parentName:"p"},"Attach to propagations")," is used to add a set of ",(0,i.mdx)("a",{parentName:"p",href:"#features"},"features")," on all propagations from or to a given parameter or return value of a method."),(0,i.mdx)("p",null,"For instance, if we want to add the feature ",(0,i.mdx)("inlineCode",{parentName:"p"},"via-concat")," to the propagations of the given method:"),(0,i.mdx)("pre",null,(0,i.mdx)("code",{parentName:"pre",className:"language-java"},"public static String concat(String x, String y);\n")),(0,i.mdx)("p",null,"We could use the following JSON model generator:"),(0,i.mdx)("pre",null,(0,i.mdx)("code",{parentName:"pre",className:"language-json"},'{\n "find": "methods",\n "where": [\n {\n "constraint": "signature_match",\n "parent": "Lcom/example/Class",\n "name": "concat"\n }\n ],\n "model": {\n "attach_to_propagations": [\n {\n "features": [\n "via-concat"\n ],\n "port": "Return" // We could also use Argument(0) and Argument(1)\n }\n ]\n }\n}\n')),(0,i.mdx)("p",null,"Note that this is only useful for propagations inferred by the analysis. 
If you know that ",(0,i.mdx)("inlineCode",{parentName:"p"},"concat")," has a propagation, you should model it as a propagation directly."),(0,i.mdx)("h4",{id:"add-features-to-arguments"},"Add Features to Arguments"),(0,i.mdx)("p",null,(0,i.mdx)("em",{parentName:"p"},"Add features to arguments")," is used to add a set of ",(0,i.mdx)("a",{parentName:"p",href:"#features"},"features")," on all sources that ",(0,i.mdx)("strong",{parentName:"p"},"might")," flow on a given parameter of a method."),(0,i.mdx)("p",null,(0,i.mdx)("em",{parentName:"p"},"Add features to arguments")," implies ",(0,i.mdx)("em",{parentName:"p"},"Attach to sources"),", ",(0,i.mdx)("em",{parentName:"p"},"Attach to sinks")," and ",(0,i.mdx)("em",{parentName:"p"},"Attach to propagations"),", but it also accounts for possible side effects at call sites."),(0,i.mdx)("p",null,"For instance:"),(0,i.mdx)("pre",null,(0,i.mdx)("code",{parentName:"pre",className:"language-java"},'public static void log(String message) {\n System.out.println(message);\n}\npublic void buyView() {\n String username = getParameter("username");\n String product = getParameter("product");\n log(username);\n buy(username, product);\n}\n')),(0,i.mdx)("p",null,"Technically, the ",(0,i.mdx)("inlineCode",{parentName:"p"},"log")," method doesn't have any source, sink or propagation. We can use ",(0,i.mdx)("em",{parentName:"p"},"add features to arguments")," to add a feature ",(0,i.mdx)("inlineCode",{parentName:"p"},"was-logged")," on the flow from ",(0,i.mdx)("inlineCode",{parentName:"p"},'getParameter("username")')," to ",(0,i.mdx)("inlineCode",{parentName:"p"},"buy(username, product)"),". 
We could use the following JSON model generator for the ",(0,i.mdx)("inlineCode",{parentName:"p"},"log")," method:"),(0,i.mdx)("pre",null,(0,i.mdx)("code",{parentName:"pre",className:"language-json"},'{\n "find": "methods",\n "where": [\n {\n "constraint": "signature_match",\n "parent": "Lcom/example/Class",\n "name": "log"\n }\n ],\n "model": {\n "add_features_to_arguments": [\n {\n "features": [\n "was-logged"\n ],\n "port": "Argument(0)"\n }\n ]\n }\n}\n')),(0,i.mdx)("h4",{id:"via-type-features"},"Via-type Features"),(0,i.mdx)("p",null,(0,i.mdx)("em",{parentName:"p"},"Via-type")," features are used to keep track of the type of a callable\u2019s port seen at its callsites during taint flow analysis. They are specified in model generators within the \u201csources\u201d or \u201csinks\u201d field of a model with the \u201cvia_type_of\u201d field. It is mapped to a nonempty list of ports of the method for which we want to create via-type features."),(0,i.mdx)("p",null,"For example, if we were interested in the specific Activity subclasses with which the method below was called:"),(0,i.mdx)("pre",null,(0,i.mdx)("code",{parentName:"pre",className:"language-java"},"\npublic void startActivityForResult(Intent intent, int requestCode);\n\n// At some callsite:\nActivitySubclass activitySubclassInstance;\nactivitySubclassInstance.startActivityForResult(intent, requestCode);\n\n")),(0,i.mdx)("p",null,"we could use the following JSON to specify a via-type feature that would materialize as ",(0,i.mdx)("inlineCode",{parentName:"p"},"via-type:ActivitySubclass"),":"),(0,i.mdx)("pre",null,(0,i.mdx)("code",{parentName:"pre",className:"language-json"},'{\n "find": "methods",\n "where": [\n {\n "constraint": "signature_match",\n "parent": "Lcom/example/Class",\n "name": "startActivityForResult"\n }\n ],\n "model": {\n "sinks": [\n {\n "port": "Argument(1)",\n "kind": "SinkKind",\n "via_type_of": [\n "Argument(0)"\n ]\n }\n ]\n 
}\n}\n')),(0,i.mdx)("h4",{id:"via-value-features"},"Via-value Features"),(0,i.mdx)("p",null,(0,i.mdx)("em",{parentName:"p"},"Via-value")," features capture the value of the specified callable's port seen at its callsites during taint flow analysis. They are specified similarly to ",(0,i.mdx)("inlineCode",{parentName:"p"},"Via-type"),' features -- in model generators within the "sources" or "sinks" field of a model with the "via_value_of" field. It is mapped to a nonempty list of ports of the method for which we want to create via-value features.'),(0,i.mdx)("p",null,"For example, if we were interested in the specific ",(0,i.mdx)("inlineCode",{parentName:"p"},"mode")," with which the method below was called:"),(0,i.mdx)("pre",null,(0,i.mdx)("code",{parentName:"pre",className:"language-java"},'public void log(String mode, String message);\n\nclass Constants {\n public static final String MODE = "M1";\n}\n\n// At some callsite:\nlog(Constants.MODE, "error message");\n\n')),(0,i.mdx)("p",null,"we could use the following JSON to specify a via-value feature that would materialize as ",(0,i.mdx)("inlineCode",{parentName:"p"},"via-value:M1"),":"),(0,i.mdx)("pre",null,(0,i.mdx)("code",{parentName:"pre",className:"language-json"},'{\n "find": "methods",\n "where": [\n {\n "constraint": "signature_match",\n "parent": "Lcom/example/Class",\n "name": "log"\n }\n ],\n "model": {\n "sinks": [\n {\n "port": "Argument(1)",\n "kind": "SinkKind",\n "via_value_of": [\n "Argument(0)"\n ]\n }\n ]\n }\n}\n')),(0,i.mdx)("p",null,"Note that this only works for numeric and string literals. 
In cases where the argument is not a constant, the feature will appear as ",(0,i.mdx)("inlineCode",{parentName:"p"},"via-value:unknown"),"."),(0,i.mdx)("h3",{id:"taint-broadening"},"Taint Broadening"),(0,i.mdx)("p",null,(0,i.mdx)("strong",{parentName:"p"},"Taint broadening")," (also called ",(0,i.mdx)("strong",{parentName:"p"},"collapsing"),") happens when Mariana Trench needs to make an approximation about a taint flow. It is the operation of reducing a ",(0,i.mdx)("strong",{parentName:"p"},"taint tree")," into a single element. A ",(0,i.mdx)("strong",{parentName:"p"},"taint tree")," is a tree where edges are field names and nodes are taint elements. This is how Mariana Trench represents internally which fields (or sequence of fields) are tainted."),(0,i.mdx)("p",null,"For instance, analyzing the following code:"),(0,i.mdx)("pre",null,(0,i.mdx)("code",{parentName:"pre",className:"language-java"},"MyClass var = new MyClass();\nvar.a = sourceX();\nvar.b.c = sourceY();\nvar.b.d = sourceZ();\n")),(0,i.mdx)("p",null,"The taint tree of variable ",(0,i.mdx)("inlineCode",{parentName:"p"},"var")," would be:"),(0,i.mdx)("pre",null,(0,i.mdx)("code",{parentName:"pre"}," .\n a / \\ b\n { X } .\n c / \\ d\n { Y } { Z }\n")),(0,i.mdx)("p",null,"After collapsing, the tree is reduced to a single node ",(0,i.mdx)("inlineCode",{parentName:"p"},"{ X, Y, Z }"),", which is less precise."),(0,i.mdx)("p",null,"In conclusion, taint broadening effectively leads to considering the whole object as tainted while only some specific fields were initially tainted. This might happen for the correctness of the analysis or for performance reasons."),(0,i.mdx)("p",null,"In the following sections, we will discuss when collapsing can happen. 
In most cases, a feature is automatically added on collapsed taint to help detect false positives."),(0,i.mdx)("h4",{id:"propagation-broadening"},"Propagation Broadening"),(0,i.mdx)("p",null,"Taint collapsing is applied when taint is propagated through a method."),(0,i.mdx)("p",null,"For instance:"),(0,i.mdx)("pre",null,(0,i.mdx)("code",{parentName:"pre",className:"language-java"},"MyClass input = new MyClass();\ninput.a = SourceX();\nMyClass output = SomeClass.UnknownMethod(input);\nSink(output.b); // Considered an issue since `output` is considered tainted. This could be a False Negative without collapsing.\n")),(0,i.mdx)("p",null,"In that case, the ",(0,i.mdx)("a",{parentName:"p",href:"#feature"},"feature")," ",(0,i.mdx)("inlineCode",{parentName:"p"},"via-propagation-broadening")," will be automatically added on the taint. This can help identify false positives."),(0,i.mdx)("p",null,"If you know that this method ",(0,i.mdx)("strong",{parentName:"p"},"preserves the structure")," of the parameter, you could specify a model and disable collapsing using the ",(0,i.mdx)("inlineCode",{parentName:"p"},"collapse")," attribute within a ",(0,i.mdx)("a",{parentName:"p",href:"#propagation"},(0,i.mdx)("inlineCode",{parentName:"a"},"propagation")),":"),(0,i.mdx)("pre",null,(0,i.mdx)("code",{parentName:"pre",className:"language-json"},'{\n "find": "methods",\n "where": [\n {\n "constraint": "signature_match",\n "parent": "Lcom/example/SomeClass",\n "name": "UnknownMethod"\n }\n ],\n "model": {\n "propagation": [\n {\n "input": "Argument(0)",\n "output": "Return",\n "collapse": false\n }\n ]\n }\n}\n')),(0,i.mdx)("p",null,"Note that Mariana Trench can usually infer when a method propagates taint without collapsing it when it has access to the code of that method and subsequent calls. 
For instance:"),(0,i.mdx)("pre",null,(0,i.mdx)("code",{parentName:"pre",className:"language-java"},"public String identity(String x) {\n // Automatically infers a propagation `Arg(0) -> Return` with `collapse=false`\n return x;\n}\n")),(0,i.mdx)("h5",{id:"issue-broadening-feature"},"Issue Broadening Feature"),(0,i.mdx)("p",null,"The ",(0,i.mdx)("inlineCode",{parentName:"p"},"via-issue-broadening")," feature is added to issues where the taint flowing into the sink was not held directly on the object passed in but on one of its fields. For example:"),(0,i.mdx)("pre",null,(0,i.mdx)("code",{parentName:"pre",className:"language-java"},"Class input = new Class();\ninput.field = source();\nsink(input); // `input` is not tainted, but `input.field` is tainted and creates an issue\n")),(0,i.mdx)("h5",{id:"widen-broadening-feature"},"Widen Broadening Feature"),(0,i.mdx)("p",null,"For performance reasons, if a given taint tree becomes very large (either in depth or in number of nodes at a given level), Mariana Trench collapses the tree to a smaller size. In these cases, the ",(0,i.mdx)("inlineCode",{parentName:"p"},"via-widen-broadening")," feature is added to the collapsed taint."),(0,i.mdx)("pre",null,(0,i.mdx)("code",{parentName:"pre",className:"language-java"},"Class input = new Class();\nif (/* condition */) {\n input.field1 = source();\n input.field2 = source();\n ...\n} else {\n input.fieldA = source();\n input.fieldB = source();\n ...\n}\nsink(input); // Too many fields are sources so the whole input object becomes tainted\n")),(0,i.mdx)("h3",{id:"sanitizers"},"Sanitizers"),(0,i.mdx)("p",null,"Specifying sanitizers on a model allows us to stop taint flowing through that method. 
In Mariana Trench, they can be one of three types -"),(0,i.mdx)("ul",null,(0,i.mdx)("li",{parentName:"ul"},(0,i.mdx)("inlineCode",{parentName:"li"},"sources"),": prevent any taint sources from flowing out of the method"),(0,i.mdx)("li",{parentName:"ul"},(0,i.mdx)("inlineCode",{parentName:"li"},"sinks"),": prevent taint from reaching any sinks within the method"),(0,i.mdx)("li",{parentName:"ul"},(0,i.mdx)("inlineCode",{parentName:"li"},"propagations"),": prevent propagations from being inferred between any two ports of the method.")),(0,i.mdx)("p",null,"These can be specified in model generators as follows -"),(0,i.mdx)("pre",null,(0,i.mdx)("code",{parentName:"pre",className:"language-json"},'{\n "find": "methods",\n "where": ...,\n "model": {\n "sanitizers": [\n {\n "sanitize": "sources"\n },\n {\n "sanitize": "sinks"\n },\n {\n "sanitize": "propagations"\n }\n ],\n ...\n }\n}\n')),(0,i.mdx)("p",null,"Note that if there are any user-specified sources, sinks or propagations on the model, sanitizers will not affect them, but they will prevent them from being propagated outward to callsites."),(0,i.mdx)("h4",{id:"kind-specific-sanitizers"},"Kind-specific Sanitizers"),(0,i.mdx)("p",null,(0,i.mdx)("inlineCode",{parentName:"p"},"sources")," and ",(0,i.mdx)("inlineCode",{parentName:"p"},"sinks")," sanitizers may include a list of kinds (each with or without a partial_label) to restrict the sanitizer to only sanitizing taint of those kinds. 
(When unspecified, as in the example above, all taint is sanitized regardless of kind)."),(0,i.mdx)("pre",null,(0,i.mdx)("code",{parentName:"pre",className:"language-json"},'"sanitizers": [\n {\n "sanitize": "sinks",\n "kinds": [\n {\n "kind": "SinkKindA"\n },\n {\n "kind": "SinkKindB",\n "partial_label": "A"\n }\n ]\n }\n]\n')),(0,i.mdx)("h4",{id:"port-specific-sanitizers"},"Port-specific Sanitizers"),(0,i.mdx)("p",null,"Sanitizers can also specify a specific port (",(0,i.mdx)("a",{parentName:"p",href:"/docs/models#access-path-format"},"access path")," root) they sanitize (ignoring all the rest). This field ",(0,i.mdx)("inlineCode",{parentName:"p"},"port")," has a slightly different meaning for each kind of sanitizer -"),(0,i.mdx)("ul",null,(0,i.mdx)("li",{parentName:"ul"},(0,i.mdx)("inlineCode",{parentName:"li"},"sources"),": represents the output port through which sources may not leave the method"),(0,i.mdx)("li",{parentName:"ul"},(0,i.mdx)("inlineCode",{parentName:"li"},"sinks"),": represents the input port through which taint may not trigger any sinks within the model"),(0,i.mdx)("li",{parentName:"ul"},(0,i.mdx)("inlineCode",{parentName:"li"},"propagations"),": represents the input port through which a propagation to any other port may not be inferred")),(0,i.mdx)("p",null,"For example if the following method"),(0,i.mdx)("pre",null,(0,i.mdx)("code",{parentName:"pre",className:"language-java"},"public void someMethod(Object argument1, Object argument2) {\n toSink(argument1);\n toSink(argument2);\n}\n")),(0,i.mdx)("p",null,"had the following sanitizer in its model,"),(0,i.mdx)("pre",null,(0,i.mdx)("code",{parentName:"pre",className:"language-json"},'"sanitizers": [\n {\n "sanitize": "sinks",\n "port": "Argument(1)"\n }\n]\n')),(0,i.mdx)("p",null,"Then a source flowing into ",(0,i.mdx)("inlineCode",{parentName:"p"},"argument1")," would be able to cause an issue, but not a source flowing into 
",(0,i.mdx)("inlineCode",{parentName:"p"},"argument2"),"."),(0,i.mdx)("p",null,"Kind and port specifications may be included in the same sanitizer."),(0,i.mdx)("h3",{id:"modes"},"Modes"),(0,i.mdx)("p",null,"Modes are used to describe specific behaviors of methods. Available modes are:"),(0,i.mdx)("ul",null,(0,i.mdx)("li",{parentName:"ul"},(0,i.mdx)("inlineCode",{parentName:"li"},"skip-analysis"),": skip the analysis of the method;"),(0,i.mdx)("li",{parentName:"ul"},(0,i.mdx)("inlineCode",{parentName:"li"},"add-via-obscure-feature"),": add a feature/breadcrumb called ",(0,i.mdx)("inlineCode",{parentName:"li"},"via-obscure:")," to sources flowing through this method;"),(0,i.mdx)("li",{parentName:"ul"},(0,i.mdx)("inlineCode",{parentName:"li"},"taint-in-taint-out"),": propagate the taint on arguments to the return value;"),(0,i.mdx)("li",{parentName:"ul"},(0,i.mdx)("inlineCode",{parentName:"li"},"taint-in-taint-this"),": propagate the taint on arguments into the ",(0,i.mdx)("inlineCode",{parentName:"li"},"this")," parameter;"),(0,i.mdx)("li",{parentName:"ul"},(0,i.mdx)("inlineCode",{parentName:"li"},"no-join-virtual-overrides"),": do not consider all possible overrides when handling a virtual call to this method;"),(0,i.mdx)("li",{parentName:"ul"},(0,i.mdx)("inlineCode",{parentName:"li"},"no-collapse-on-propagation"),": do not collapse input paths when applying propagations;"),(0,i.mdx)("li",{parentName:"ul"},(0,i.mdx)("inlineCode",{parentName:"li"},"alias-memory-location-on-invoke"),": aliases existing memory location at the callsite instead of creating a new one;"),(0,i.mdx)("li",{parentName:"ul"},(0,i.mdx)("inlineCode",{parentName:"li"},"strong-write-on-propagation"),": performs a strong write from input path to the output path on propagation;")),(0,i.mdx)("h3",{id:"default-model"},"Default model"),(0,i.mdx)("p",null,"A default model is created for each method, except if it is provided by a model generator. 
The default model has a set of heuristics:"),(0,i.mdx)("p",null,"If the method has no source code, the model is automatically marked with the modes ",(0,i.mdx)("inlineCode",{parentName:"p"},"skip-analysis")," and ",(0,i.mdx)("inlineCode",{parentName:"p"},"add-via-obscure-feature"),"."),(0,i.mdx)("p",null,"If the method has more than 40 overrides, it is marked with the mode ",(0,i.mdx)("inlineCode",{parentName:"p"},"no-join-virtual-overrides"),"."),(0,i.mdx)("p",null,"Otherwise, the default model is empty (no sources/sinks/propagations)."),(0,i.mdx)("h3",{id:"field-models"},"Field Models"),(0,i.mdx)("p",null,"These models represent user-defined taint on class fields (as opposed to methods, as described in all the previous sections on this page). They are specified in a similar way to method models as described below."),(0,i.mdx)("blockquote",null,(0,i.mdx)("p",{parentName:"blockquote"},(0,i.mdx)("strong",{parentName:"p"},"NOTE:")," Field sources should not be applied to fields that are both final and of a primitive type (",(0,i.mdx)("inlineCode",{parentName:"p"},"int"),", ",(0,i.mdx)("inlineCode",{parentName:"p"},"char"),", ",(0,i.mdx)("inlineCode",{parentName:"p"},"float"),", etc as well as ",(0,i.mdx)("inlineCode",{parentName:"p"},"java.lang.String"),") as the Java compiler optimizes accesses of these fields in the bytecode into accesses of the constant value they hold. 
In this scenario, Mariana Trench has no way of recognizing that the constant was meant to carry a source.")),(0,i.mdx)("p",null,"Example field model generator for sources:"),(0,i.mdx)("pre",null,(0,i.mdx)("code",{parentName:"pre",className:"language-json"},'{\n "find": "fields",\n "where": [\n {\n "constraint": "name",\n "pattern": "SOURCE_EXAMPLE"\n }\n ],\n "model": {\n "sources" : [\n {\n "kind": "FieldSource"\n }\n ]\n }\n}\n')),(0,i.mdx)("p",null,"Example code:"),(0,i.mdx)("pre",null,(0,i.mdx)("code",{parentName:"pre",className:"language-java"},"public class TestClass {\n // Field that we know to be tainted\n public Object SOURCE_EXAMPLE = ...;\n\n void flow() {\n sink(SOURCE_EXAMPLE, ...);\n }\n}\n")),(0,i.mdx)("p",null,"Example field model generator for sinks:"),(0,i.mdx)("pre",null,(0,i.mdx)("code",{parentName:"pre",className:"language-json"},'{\n "find": "fields",\n "where": [\n {\n "constraint": "name",\n "pattern": "SINK_EXAMPLE"\n }\n ],\n "model": {\n "sinks" : [\n {\n "kind": "FieldSink"\n }\n ]\n }\n}\n')),(0,i.mdx)("p",null,"Example code:"),(0,i.mdx)("pre",null,(0,i.mdx)("code",{parentName:"pre",className:"language-java"},"public class TestClass {\n public Object SINK_EXAMPLE = ...;\n\n void flow() {\n SINK_EXAMPLE = source();\n }\n}\n")),(0,i.mdx)("p",null,"Field signature formats follow the Dalvik bytecode format similar to methods as discussed ",(0,i.mdx)("a",{parentName:"p",href:"#method-name-format"},"above"),". This is of the form ",(0,i.mdx)("inlineCode",{parentName:"p"},".:"),"."),(0,i.mdx)("h3",{id:"literal-models"},"Literal Models"),(0,i.mdx)("p",null,"Literal models represent user-defined taints on string literals matching configurable regular expressions. 
They can only be configured as sources and are intended to identify suspicious patterns, such as user-controlled data being concatenated with a string literal which looks like an SQL query."),(0,i.mdx)("blockquote",null,(0,i.mdx)("p",{parentName:"blockquote"},(0,i.mdx)("strong",{parentName:"p"},"NOTE:")," Each use of a literal in the analysed code which matches a pattern in a literal model will generate a new taint which needs to be explored by Mariana Trench. Using overly broad patterns like ",(0,i.mdx)("inlineCode",{parentName:"p"},".*")," should thus be avoided, as they can lead to poor performance and high memory usage.")),(0,i.mdx)("p",null,"Example literal models:"),(0,i.mdx)("pre",null,(0,i.mdx)("code",{parentName:"pre"},'[\n {\n "pattern": "SELECT \\\\*.*",\n "description": "Potential SQL Query",\n "sources": [\n {\n "kind": "SqlQuery"\n }\n ]\n },\n {\n "pattern": "AI[0-9A-Z]{16}",\n "description": "Suspected Google API Key",\n "sources": [\n {\n "kind": "GoogleAPIKey"\n }\n ]\n }\n]\n')),(0,i.mdx)("p",null,"Example code:"),(0,i.mdx)("pre",null,(0,i.mdx)("code",{parentName:"pre",className:"language-java"},'void testRegexSource() {\n String prefix = "SELECT * FROM USERS WHERE id = ";\n String aci = getAttackerControlledInput();\n String query = prefix + aci; // Sink\n}\n\nvoid testRegexSourceGoogleApiKey() {\n String secret = "AIABCD1234EFGH5678";\n sink(secret);\n}\n')),(0,i.mdx)("h2",{id:"model-generators"},"Model Generators"),(0,i.mdx)("p",null,"Mariana Trench allows for dynamic model specifications. This allows a user to specify models of methods before running the analysis. This is used to specify sources, sinks, propagation and modes."),(0,i.mdx)("p",null,"Model generators are specified in a generator configuration file, specified by the ",(0,i.mdx)("inlineCode",{parentName:"p"},"--generator-configuration-path")," parameter. 
By default, we use ",(0,i.mdx)("a",{parentName:"p",href:"https://github.com/facebook/mariana-trench/blob/main/configuration/default_generator_config.json"},(0,i.mdx)("inlineCode",{parentName:"a"},"default_generator_config.json")),"."),(0,i.mdx)("h3",{id:"example"},"Example"),(0,i.mdx)("p",null,"Examples of model generators are located in the ",(0,i.mdx)("a",{parentName:"p",href:"https://github.com/facebook/mariana-trench/tree/main/configuration/model-generators"},(0,i.mdx)("inlineCode",{parentName:"a"},"configuration/model-generators"))," directory."),(0,i.mdx)("p",null,"Below is an example of a JSON model generator:"),(0,i.mdx)("pre",null,(0,i.mdx)("code",{parentName:"pre",className:"language-json"},'{\n "model_generators": [\n {\n "find": "methods",\n "where": [{"constraint": "name", "pattern": "toString"}],\n "model": {\n "propagation": [\n {\n "input": "Argument(0)",\n "output": "Return"\n }\n ]\n }\n },\n {\n "find": "methods",\n "where": [\n {\n "constraint": "parent",\n "inner": {\n "constraint": "extends",\n "inner": {\n "constraint": "name",\n "pattern": "SandcastleCommand"\n }\n }\n },\n {"constraint": "name", "pattern": "Time"}\n ],\n "model": {\n "sources": [\n {\n "kind": "Source",\n "port": "Return"\n }\n ]\n }\n },\n {\n "find": "methods",\n "where": [\n {\n "constraint": "parent",\n "inner": {\n "constraint": "extends",\n "inner": {"constraint": "name", "pattern": "IEntWithPurposePolicy"}\n }\n },\n {"constraint": "name", "pattern": "gen.*"},\n {\n "constraint": "parameter",\n "idx": 0,\n "inner": {\n "constraint": "type",\n "kind": "extends",\n "class": "IViewerContext"\n }\n },\n {\n "constraint": "return",\n "inner": {\n "constraint": "extends",\n "inner": {"constraint": "name", "pattern": "Ent"}\n }\n }\n ],\n "model": {\n "modes": ["add-via-obscure-feature"],\n "sinks": [\n {\n "kind": "Sink",\n "port": "Argument(0)",\n "features": ["via-gen"]\n }\n ]\n }\n }\n 
]\n}\n')),(0,i.mdx)("h3",{id:"specification"},"Specification"),(0,i.mdx)("p",null,"Each JSON file is a JSON object with a key ",(0,i.mdx)("inlineCode",{parentName:"p"},"model_generators"),' associated with a list of "rules".'),(0,i.mdx)("p",null,'Each "rule" defines a "filter" (which uses "constraints" to specify methods for which a "model" should be generated) and a "model". A rule has the following key/values:'),(0,i.mdx)("ul",null,(0,i.mdx)("li",{parentName:"ul"},(0,i.mdx)("p",{parentName:"li"},(0,i.mdx)("inlineCode",{parentName:"p"},"find"),": The type of thing to find. We support ",(0,i.mdx)("inlineCode",{parentName:"p"},"methods")," and ",(0,i.mdx)("inlineCode",{parentName:"p"},"fields"),";")),(0,i.mdx)("li",{parentName:"ul"},(0,i.mdx)("p",{parentName:"li"},(0,i.mdx)("inlineCode",{parentName:"p"},"where"),': A list of "constraints". All constraints ',(0,i.mdx)("strong",{parentName:"p"},"must be satisfied")," by a method or field in order to generate a model for it. All the constraints are listed below, grouped by the type of object they are applied to:"),(0,i.mdx)("ul",{parentName:"li"},(0,i.mdx)("li",{parentName:"ul"},(0,i.mdx)("p",{parentName:"li"},(0,i.mdx)("strong",{parentName:"p"},"Method"),":"),(0,i.mdx)("ul",{parentName:"li"},(0,i.mdx)("li",{parentName:"ul"},(0,i.mdx)("inlineCode",{parentName:"li"},"signature_match"),": Expects at least one of the two allowed groups of extra properties: ",(0,i.mdx)("inlineCode",{parentName:"li"},"[name | names] [parent | parents | extends [include_self]]")," where:",(0,i.mdx)("ul",{parentName:"li"},(0,i.mdx)("li",{parentName:"ul"},(0,i.mdx)("inlineCode",{parentName:"li"},"name")," (a single string) or ",(0,i.mdx)("inlineCode",{parentName:"li"},"names")," (a list of alternative strings): is exact matched to the method name"),(0,i.mdx)("li",{parentName:"ul"},(0,i.mdx)("inlineCode",{parentName:"li"},"parent")," (a single string) or ",(0,i.mdx)("inlineCode",{parentName:"li"},"parents")," (a list of alternative strings) is 
exact matched to the class of the method or ",(0,i.mdx)("inlineCode",{parentName:"li"},"extends")," (either a single string or a list of alternative strings) is exact matched to the base classes or interfaces of the method. ",(0,i.mdx)("inlineCode",{parentName:"li"},"extends")," allows an optional property ",(0,i.mdx)("inlineCode",{parentName:"li"},"include_self")," which is a boolean to indicate if the constraint is applied to the class itself or not (defaults to ",(0,i.mdx)("inlineCode",{parentName:"li"},"true"),")."))),(0,i.mdx)("li",{parentName:"ul"},(0,i.mdx)("inlineCode",{parentName:"li"},"signature | signature_pattern"),": Expects an extra property ",(0,i.mdx)("inlineCode",{parentName:"li"},"pattern")," which is a regex to fully match the full signature (class, method, argument types) of a method;",(0,i.mdx)("ul",{parentName:"li"},(0,i.mdx)("li",{parentName:"ul"},(0,i.mdx)("strong",{parentName:"li"},"NOTE:")," Usage of this constraint is discouraged as it has poor performance. Try using ",(0,i.mdx)("inlineCode",{parentName:"li"},"signature_match")," instead!"))),(0,i.mdx)("li",{parentName:"ul"},(0,i.mdx)("inlineCode",{parentName:"li"},"parent"),": Expects an extra property ",(0,i.mdx)("inlineCode",{parentName:"li"},"inner")," ","[Type]"," which contains a nested constraint to apply to the class holding the method;"),(0,i.mdx)("li",{parentName:"ul"},(0,i.mdx)("inlineCode",{parentName:"li"},"parameter"),": Expects an extra properties ",(0,i.mdx)("inlineCode",{parentName:"li"},"idx")," and ",(0,i.mdx)("inlineCode",{parentName:"li"},"inner")," ","[Parameter]"," or ","[Type]",", matches when the idx-th parameter of the function or method matches the nested constraint inner;"),(0,i.mdx)("li",{parentName:"ul"},(0,i.mdx)("inlineCode",{parentName:"li"},"any_parameter"),": Expects an optional extra property ",(0,i.mdx)("inlineCode",{parentName:"li"},"start_idx")," and ",(0,i.mdx)("inlineCode",{parentName:"li"},"inner")," ","[Parameter]"," or ","[Type]",", matches when 
any parameter (starting at start_idx) of the function or method matches the nested constraint inner;"),(0,i.mdx)("li",{parentName:"ul"},(0,i.mdx)("inlineCode",{parentName:"li"},"return"),": Expects an extra property ",(0,i.mdx)("inlineCode",{parentName:"li"},"inner")," ","[Type]"," which contains a nested constraint to apply to the return of the method;"),(0,i.mdx)("li",{parentName:"ul"},(0,i.mdx)("inlineCode",{parentName:"li"},"is_static | is_constructor | is_native | has_code"),": Accepts an extra property ",(0,i.mdx)("inlineCode",{parentName:"li"},"value")," which is either ",(0,i.mdx)("inlineCode",{parentName:"li"},"true")," or ",(0,i.mdx)("inlineCode",{parentName:"li"},"false"),". By default, ",(0,i.mdx)("inlineCode",{parentName:"li"},"value")," is considered ",(0,i.mdx)("inlineCode",{parentName:"li"},"true"),";"),(0,i.mdx)("li",{parentName:"ul"},(0,i.mdx)("inlineCode",{parentName:"li"},"number_parameters"),": Expects an extra property ",(0,i.mdx)("inlineCode",{parentName:"li"},"inner")," ","[Integer]"," which contains a nested constraint to apply to the number of parameters (counting the implicit ",(0,i.mdx)("inlineCode",{parentName:"li"},"this")," parameter);"),(0,i.mdx)("li",{parentName:"ul"},(0,i.mdx)("inlineCode",{parentName:"li"},"number_overrides"),": Expects an extra property ",(0,i.mdx)("inlineCode",{parentName:"li"},"inner")," ","[Integer]"," which contains a nested constraint to apply on the number of method overrides."))),(0,i.mdx)("li",{parentName:"ul"},(0,i.mdx)("p",{parentName:"li"},(0,i.mdx)("strong",{parentName:"p"},"Parameter:")),(0,i.mdx)("ul",{parentName:"li"},(0,i.mdx)("li",{parentName:"ul"},(0,i.mdx)("inlineCode",{parentName:"li"},"parameter_has_annotation"),": Expects an extra property ",(0,i.mdx)("inlineCode",{parentName:"li"},"type")," and an optional property ",(0,i.mdx)("inlineCode",{parentName:"li"},"pattern"),", respectively a string and a regex fully matching the value of the parameter 
annotation."))),(0,i.mdx)("li",{parentName:"ul"},(0,i.mdx)("p",{parentName:"li"},(0,i.mdx)("strong",{parentName:"p"},"Type:")),(0,i.mdx)("ul",{parentName:"li"},(0,i.mdx)("li",{parentName:"ul"},(0,i.mdx)("inlineCode",{parentName:"li"},"extends"),": Expects an extra property ",(0,i.mdx)("inlineCode",{parentName:"li"},"inner")," ","[Type]"," which contains a nested constraint that must apply to one of the base classes or itself. The optional property ",(0,i.mdx)("inlineCode",{parentName:"li"},"include_self")," is a boolean that tells whether the constraint must be applied on the type itself or not (defaults to ",(0,i.mdx)("inlineCode",{parentName:"li"},"true"),");"),(0,i.mdx)("li",{parentName:"ul"},(0,i.mdx)("inlineCode",{parentName:"li"},"super"),": Expects an extra property ",(0,i.mdx)("inlineCode",{parentName:"li"},"inner")," ","[Type]"," which contains a nested constraint that must apply on the direct superclass;"),(0,i.mdx)("li",{parentName:"ul"},(0,i.mdx)("inlineCode",{parentName:"li"},"is_class | is_interface"),": Accepts an extra property ",(0,i.mdx)("inlineCode",{parentName:"li"},"value")," which is either ",(0,i.mdx)("inlineCode",{parentName:"li"},"true")," or ",(0,i.mdx)("inlineCode",{parentName:"li"},"false"),". By default, ",(0,i.mdx)("inlineCode",{parentName:"li"},"value")," is considered ",(0,i.mdx)("inlineCode",{parentName:"li"},"true"),";"))),(0,i.mdx)("li",{parentName:"ul"},(0,i.mdx)("p",{parentName:"li"},(0,i.mdx)("strong",{parentName:"p"},"Field"),":"),(0,i.mdx)("ul",{parentName:"li"},(0,i.mdx)("li",{parentName:"ul"},(0,i.mdx)("inlineCode",{parentName:"li"},"signature"),": Expects an extra property ",(0,i.mdx)("inlineCode",{parentName:"li"},"pattern")," which is a regex to fully match the full signature of the field. 
This is of the form ",(0,i.mdx)("inlineCode",{parentName:"li"},".:"),";"),(0,i.mdx)("li",{parentName:"ul"},(0,i.mdx)("inlineCode",{parentName:"li"},"parent"),": Expects an extra property ",(0,i.mdx)("inlineCode",{parentName:"li"},"inner")," ","[Type]"," which contains a nested constraint to apply to the class holding the field;"),(0,i.mdx)("li",{parentName:"ul"},(0,i.mdx)("inlineCode",{parentName:"li"},"is_static"),": Accepts an extra property ",(0,i.mdx)("inlineCode",{parentName:"li"},"value")," which is either ",(0,i.mdx)("inlineCode",{parentName:"li"},"true")," or ",(0,i.mdx)("inlineCode",{parentName:"li"},"false"),". By default, ",(0,i.mdx)("inlineCode",{parentName:"li"},"value")," is considered ",(0,i.mdx)("inlineCode",{parentName:"li"},"true"),";"))),(0,i.mdx)("li",{parentName:"ul"},(0,i.mdx)("p",{parentName:"li"},(0,i.mdx)("strong",{parentName:"p"},"Method, Type or Field:")),(0,i.mdx)("ul",{parentName:"li"},(0,i.mdx)("li",{parentName:"ul"},(0,i.mdx)("inlineCode",{parentName:"li"},"name"),": Expects an extra property ",(0,i.mdx)("inlineCode",{parentName:"li"},"pattern")," which is a regex to fully match the name of the item;"),(0,i.mdx)("li",{parentName:"ul"},(0,i.mdx)("inlineCode",{parentName:"li"},"has_annotation"),": Expects an extra property ",(0,i.mdx)("inlineCode",{parentName:"li"},"type")," and an optional property ",(0,i.mdx)("inlineCode",{parentName:"li"},"pattern"),", respectively a string and a regex fully matching the value of the annotation."),(0,i.mdx)("li",{parentName:"ul"},(0,i.mdx)("inlineCode",{parentName:"li"},"visibility"),": Expects an extra property ",(0,i.mdx)("inlineCode",{parentName:"li"},"is")," which is either ",(0,i.mdx)("inlineCode",{parentName:"li"},"public"),", ",(0,i.mdx)("inlineCode",{parentName:"li"},"private")," or ",(0,i.mdx)("inlineCode",{parentName:"li"},"protected"),"; (Note this does not apply to 
",(0,i.mdx)("inlineCode",{parentName:"li"},"Field"),")"))),(0,i.mdx)("li",{parentName:"ul"},(0,i.mdx)("p",{parentName:"li"},(0,i.mdx)("strong",{parentName:"p"},"Integer:")),(0,i.mdx)("ul",{parentName:"li"},(0,i.mdx)("li",{parentName:"ul"},(0,i.mdx)("inlineCode",{parentName:"li"},"< | <= | == | > | >= | !="),": Expects an extra property ",(0,i.mdx)("inlineCode",{parentName:"li"},"value")," which contains an integer that the input integer is compared with. The input is the left hand side."))),(0,i.mdx)("li",{parentName:"ul"},(0,i.mdx)("p",{parentName:"li"},(0,i.mdx)("strong",{parentName:"p"},"Any (Method, Parameter, Type, Field or Integer):")),(0,i.mdx)("ul",{parentName:"li"},(0,i.mdx)("li",{parentName:"ul"},(0,i.mdx)("inlineCode",{parentName:"li"},"all_of"),": Expects an extra property ",(0,i.mdx)("inlineCode",{parentName:"li"},"inners")," ","[Any]"," which is an array holding nested constraints which must all apply;"),(0,i.mdx)("li",{parentName:"ul"},(0,i.mdx)("inlineCode",{parentName:"li"},"any_of"),": Expects an extra property ",(0,i.mdx)("inlineCode",{parentName:"li"},"inners")," ","[Any]"," which is an array holding nested constraints where one of them must apply;"),(0,i.mdx)("li",{parentName:"ul"},(0,i.mdx)("inlineCode",{parentName:"li"},"not"),": Expects an extra property ",(0,i.mdx)("inlineCode",{parentName:"li"},"inner")," ","[Any]"," which contains a nested constraint that should not apply. 
(Note this is not yet implemented for ",(0,i.mdx)("inlineCode",{parentName:"li"},"Field"),"s)"))))),(0,i.mdx)("li",{parentName:"ul"},(0,i.mdx)("p",{parentName:"li"},(0,i.mdx)("inlineCode",{parentName:"p"},"model"),": A model, describing sources/sinks/propagations/etc."),(0,i.mdx)("ul",{parentName:"li"},(0,i.mdx)("li",{parentName:"ul"},(0,i.mdx)("p",{parentName:"li"},(0,i.mdx)("strong",{parentName:"p"},"For method models")),(0,i.mdx)("ul",{parentName:"li"},(0,i.mdx)("li",{parentName:"ul"},(0,i.mdx)("inlineCode",{parentName:"li"},"sources"),"*",": A list of sources, i.e a source flowing out of the method via return value or flowing in via an argument. A source has the following key/values:",(0,i.mdx)("ul",{parentName:"li"},(0,i.mdx)("li",{parentName:"ul"},(0,i.mdx)("inlineCode",{parentName:"li"},"kind"),": The source name;"),(0,i.mdx)("li",{parentName:"ul"},(0,i.mdx)("inlineCode",{parentName:"li"},"port"),"*","*",": The source access path (e.g, ",(0,i.mdx)("inlineCode",{parentName:"li"},'"Return"')," or ",(0,i.mdx)("inlineCode",{parentName:"li"},'"Argument(1)"'),");"),(0,i.mdx)("li",{parentName:"ul"},(0,i.mdx)("inlineCode",{parentName:"li"},"features"),"*",": A list of features/breadcrumbs names;"),(0,i.mdx)("li",{parentName:"ul"},(0,i.mdx)("inlineCode",{parentName:"li"},"via_type_of"),"*",": A list of ports;"))),(0,i.mdx)("li",{parentName:"ul"},(0,i.mdx)("inlineCode",{parentName:"li"},"sinks"),"*",": A list of sinks, i.e describing that a parameter of the method flows into a sink. 
A sink has the following key/values:",(0,i.mdx)("ul",{parentName:"li"},(0,i.mdx)("li",{parentName:"ul"},(0,i.mdx)("inlineCode",{parentName:"li"},"kind"),": The sink name;"),(0,i.mdx)("li",{parentName:"ul"},(0,i.mdx)("inlineCode",{parentName:"li"},"port"),": The sink access path (e.g, ",(0,i.mdx)("inlineCode",{parentName:"li"},'"Return"')," or ",(0,i.mdx)("inlineCode",{parentName:"li"},'"Argument(1)"'),");"),(0,i.mdx)("li",{parentName:"ul"},(0,i.mdx)("inlineCode",{parentName:"li"},"features"),"*",": A list of features/breadcrumbs names;"),(0,i.mdx)("li",{parentName:"ul"},(0,i.mdx)("inlineCode",{parentName:"li"},"via_type_of"),"*",": A list of ports;"))),(0,i.mdx)("li",{parentName:"ul"},(0,i.mdx)("inlineCode",{parentName:"li"},"propagation"),"*",": A list of propagations (also called passthrough) that describe whether a taint on a parameter should result in a taint on the return value or another parameter. A propagation has the following key/values:",(0,i.mdx)("ul",{parentName:"li"},(0,i.mdx)("li",{parentName:"ul"},(0,i.mdx)("inlineCode",{parentName:"li"},"input"),": The input access path (e.g, ",(0,i.mdx)("inlineCode",{parentName:"li"},'"Argument(1)"'),");"),(0,i.mdx)("li",{parentName:"ul"},(0,i.mdx)("inlineCode",{parentName:"li"},"output"),": The output access path (e.g, ",(0,i.mdx)("inlineCode",{parentName:"li"},'"Return"')," or ",(0,i.mdx)("inlineCode",{parentName:"li"},'"Argument(2)"'),");"),(0,i.mdx)("li",{parentName:"ul"},(0,i.mdx)("inlineCode",{parentName:"li"},"features"),"*",": A list of features/breadcrumbs names;"))),(0,i.mdx)("li",{parentName:"ul"},(0,i.mdx)("inlineCode",{parentName:"li"},"attach_to_sources"),"*",": A list of attach-to-sources that describe that all sources flowing out of the method on the given parameter or return value must have the given features. 
An attach-to-source has the following key/values:",(0,i.mdx)("ul",{parentName:"li"},(0,i.mdx)("li",{parentName:"ul"},(0,i.mdx)("inlineCode",{parentName:"li"},"port"),": The access path root (e.g, ",(0,i.mdx)("inlineCode",{parentName:"li"},'"Return"')," or ",(0,i.mdx)("inlineCode",{parentName:"li"},'"Argument(1)"'),");"),(0,i.mdx)("li",{parentName:"ul"},(0,i.mdx)("inlineCode",{parentName:"li"},"features"),": A list of features/breadcrumb names;"))),(0,i.mdx)("li",{parentName:"ul"},(0,i.mdx)("inlineCode",{parentName:"li"},"attach_to_sinks"),"*",": A list of attach-to-sinks that describe that all sources flowing in the method on the given parameter must have the given features. An attach-to-sink has the following key/values:",(0,i.mdx)("ul",{parentName:"li"},(0,i.mdx)("li",{parentName:"ul"},(0,i.mdx)("inlineCode",{parentName:"li"},"port"),": The access path root (e.g, ",(0,i.mdx)("inlineCode",{parentName:"li"},'"Argument(1)"'),");"),(0,i.mdx)("li",{parentName:"ul"},(0,i.mdx)("inlineCode",{parentName:"li"},"features"),": A list of features/breadcrumb names;"))),(0,i.mdx)("li",{parentName:"ul"},(0,i.mdx)("inlineCode",{parentName:"li"},"attach_to_propagations"),"*",": A list of attach-to-propagations that describe that inferred propagations of sources flowing in or out of a given parameter or return value must have the given features. 
An attach-to-propagation has the following key/values:",(0,i.mdx)("ul",{parentName:"li"},(0,i.mdx)("li",{parentName:"ul"},(0,i.mdx)("inlineCode",{parentName:"li"},"port"),": The access path root (e.g, ",(0,i.mdx)("inlineCode",{parentName:"li"},'"Return"')," or ",(0,i.mdx)("inlineCode",{parentName:"li"},'"Argument(1)"'),");"),(0,i.mdx)("li",{parentName:"ul"},(0,i.mdx)("inlineCode",{parentName:"li"},"features"),": A list of features/breadcrumb names;"))),(0,i.mdx)("li",{parentName:"ul"},(0,i.mdx)("inlineCode",{parentName:"li"},"add_features_to_parameters"),"*",": A list of add-features-to-parameters that describe that flows that might flow on the given parameter must have the given features. An add-features-to-parameter has the following key/values:",(0,i.mdx)("ul",{parentName:"li"},(0,i.mdx)("li",{parentName:"ul"},(0,i.mdx)("inlineCode",{parentName:"li"},"port"),": The access path root (e.g, ",(0,i.mdx)("inlineCode",{parentName:"li"},'"Argument(1)"'),");"),(0,i.mdx)("li",{parentName:"ul"},(0,i.mdx)("inlineCode",{parentName:"li"},"features"),": A list of features/breadcrumb names;"))),(0,i.mdx)("li",{parentName:"ul"},(0,i.mdx)("inlineCode",{parentName:"li"},"modes"),"*",": A list of mode names that describe specific behaviors of a method;"),(0,i.mdx)("li",{parentName:"ul"},(0,i.mdx)("inlineCode",{parentName:"li"},"for_all_parameters"),": Generate sources/sinks/propagations/attach",(0,i.mdx)("em",{parentName:"li"},"to"),"*"," for all parameters of a method that satisfy some constraints. 
It accepts the following key/values:",(0,i.mdx)("ul",{parentName:"li"},(0,i.mdx)("li",{parentName:"ul"},(0,i.mdx)("inlineCode",{parentName:"li"},"variable"),": A symbolic name for the parameter;"),(0,i.mdx)("li",{parentName:"ul"},(0,i.mdx)("inlineCode",{parentName:"li"},"where"),": An optional list of ","[Parameter]"," or ","[Type]"," constraints on the parameter;"),(0,i.mdx)("li",{parentName:"ul"},(0,i.mdx)("inlineCode",{parentName:"li"},"sources | sinks | propagation"),': Same as under "model", but we accept the variable name as a parameter number.'))))),(0,i.mdx)("li",{parentName:"ul"},(0,i.mdx)("p",{parentName:"li"},(0,i.mdx)("inlineCode",{parentName:"p"},"verbosity"),"*",": A logging level, to help debugging. 1 is the most verbose, 5 is the least. The default verbosity level is 5.")),(0,i.mdx)("li",{parentName:"ul"},(0,i.mdx)("p",{parentName:"li"},(0,i.mdx)("strong",{parentName:"p"},"For Field models")),(0,i.mdx)("ul",{parentName:"li"},(0,i.mdx)("li",{parentName:"ul"},(0,i.mdx)("inlineCode",{parentName:"li"},"sources"),"*",": A list of sources the field should hold. A source has the following key/values:",(0,i.mdx)("ul",{parentName:"li"},(0,i.mdx)("li",{parentName:"ul"},(0,i.mdx)("inlineCode",{parentName:"li"},"kind"),": The source name;"),(0,i.mdx)("li",{parentName:"ul"},(0,i.mdx)("inlineCode",{parentName:"li"},"features"),"*",": A list of features/breadcrumbs names;"))),(0,i.mdx)("li",{parentName:"ul"},(0,i.mdx)("inlineCode",{parentName:"li"},"sinks"),"*",": A list of sinks the field should hold. 
A sink has the following key/values:",(0,i.mdx)("ul",{parentName:"li"},(0,i.mdx)("li",{parentName:"ul"},(0,i.mdx)("inlineCode",{parentName:"li"},"kind"),": The sink name;"),(0,i.mdx)("li",{parentName:"ul"},(0,i.mdx)("inlineCode",{parentName:"li"},"features"),"*",": A list of features/breadcrumds names;")))))))),(0,i.mdx)("p",null,"In the above bullets,"),(0,i.mdx)("ul",null,(0,i.mdx)("li",{parentName:"ul"},(0,i.mdx)("inlineCode",{parentName:"li"},"*")," denotes optional key/value."),(0,i.mdx)("li",{parentName:"ul"},(0,i.mdx)("inlineCode",{parentName:"li"},"**")," denotes optional key/value. Default is ",(0,i.mdx)("inlineCode",{parentName:"li"},'"Return"'),".")),(0,i.mdx)("p",null,"Note, the implicit ",(0,i.mdx)("inlineCode",{parentName:"p"},"this")," parameter for methods has the parameter number 0."),(0,i.mdx)("h3",{id:"development"},"Development"),(0,i.mdx)("h4",{id:"when-sources-or-sinks-dont-appear-in-results"},"When Sources or Sinks don't appear in Results"),(0,i.mdx)("ol",null,(0,i.mdx)("li",{parentName:"ol"},(0,i.mdx)("p",{parentName:"li"},"This could be because your model generator did not find any method matching your query. You can use the ",(0,i.mdx)("inlineCode",{parentName:"p"},'"verbosity": 1')," option in your model generator to check if it matched any method. For instance:"),(0,i.mdx)("pre",{parentName:"li"},(0,i.mdx)("code",{parentName:"pre",className:"language-json"},'{\n "model_generators": [\n {\n "find": "methods",\n "where": /* ... */,\n "model": {\n /* ... */\n },\n "verbosity": 1\n }\n ]\n}\n')),(0,i.mdx)("p",{parentName:"li"},"When running mariana trench, this should print:"),(0,i.mdx)("pre",{parentName:"li"},(0,i.mdx)("code",{parentName:"pre"},"INFO Method `...` satisfies all constraints in json model generator ...\n"))),(0,i.mdx)("li",{parentName:"ol"},(0,i.mdx)("p",{parentName:"li"},"Make sure that your model generator is actually running. You can use the ",(0,i.mdx)("inlineCode",{parentName:"p"},"--verbosity 2")," option to check that. 
Make sure your model generator is specified in ",(0,i.mdx)("inlineCode",{parentName:"p"},"configuration/default_generator_config.json"),".")),(0,i.mdx)("li",{parentName:"ol"},(0,i.mdx)("p",{parentName:"li"},"You can also check the output models. Use ",(0,i.mdx)("inlineCode",{parentName:"p"},"grep SourceKind models@*")," to see if your source or sink kind exists. Use ",(0,i.mdx)("inlineCode",{parentName:"p"},"grep 'Lcom/example/;.:' models@*")," to see if a given method exists in the app."))),(0,i.mdx)(m,{mdxType:"FbModels"}))}c.isMDXComponent=!0}}]); \ No newline at end of file diff --git a/assets/js/runtime~main.4240a0ef.js b/assets/js/runtime~main.12fefa2a.js similarity index 85% rename from assets/js/runtime~main.4240a0ef.js rename to assets/js/runtime~main.12fefa2a.js index e7e535dc..4515853e 100644 --- a/assets/js/runtime~main.4240a0ef.js +++ b/assets/js/runtime~main.12fefa2a.js @@ -1 +1 @@ -(()=>{"use strict";var e,t,r,a,c,f={},d={};function o(e){var t=d[e];if(void 0!==t)return t.exports;var r=d[e]={id:e,loaded:!1,exports:{}};return f[e].call(r.exports,r,r.exports,o),r.loaded=!0,r.exports}o.m=f,o.c=d,e=[],o.O=(t,r,a,c)=>{if(!r){var f=1/0;for(b=0;b=c)&&Object.keys(o.O).every((e=>o.O[e](r[n])))?r.splice(n--,1):(d=!1,c0&&e[b-1][2]>c;b--)e[b]=e[b-1];e[b]=[r,a,c]},o.n=e=>{var t=e&&e.__esModule?()=>e.default:()=>e;return o.d(t,{a:t}),t},r=Object.getPrototypeOf?e=>Object.getPrototypeOf(e):e=>e.__proto__,o.t=function(e,a){if(1&a&&(e=this(e)),8&a)return e;if("object"==typeof e&&e){if(4&a&&e.__esModule)return e;if(16&a&&"function"==typeof e.then)return e}var c=Object.create(null);o.r(c);var f={};t=t||[null,r({}),r([]),r(r)];for(var d=2&a&&e;"object"==typeof d&&!~t.indexOf(d);d=r(d))Object.getOwnPropertyNames(d).forEach((t=>f[t]=()=>e[t]));return f.default=()=>e,o.d(c,f),c},o.d=(e,t)=>{for(var r in 
t)o.o(t,r)&&!o.o(e,r)&&Object.defineProperty(e,r,{enumerable:!0,get:t[r]})},o.f={},o.e=e=>Promise.all(Object.keys(o.f).reduce(((t,r)=>(o.f[r](e,t),t)),[])),o.u=e=>"assets/js/"+({13:"01a85c17",42:"b27405df",48:"dc74e05e",53:"935f2afb",89:"a6aa9e1f",93:"123f1cf0",102:"1287daf0",103:"ccc49370",110:"66406991",133:"d6ed0749",178:"096bfee4",195:"c4f5d8e4",361:"143ae67f",434:"9c92d0fc",453:"30a24c52",477:"b2f554cd",485:"08379d6f",506:"f7096b83",512:"fdc007b8",514:"1be78505",533:"b2b675dd",535:"814f3328",608:"9e4087bc",610:"6875c492",633:"031793e1",648:"747cb41a",683:"068c57da",713:"a7023ddc",800:"208ad3c1",848:"a2d634e7",918:"17896441",949:"e013b240",959:"6eb23e6e"}[e]||e)+"."+{13:"e74cb76f",42:"392ad01a",48:"8af97fbe",53:"792c8ea3",89:"9d8e1dc6",93:"8c08c261",102:"24133656",103:"5cd26f2b",110:"f24f8aa5",133:"2129e329",178:"e16cdc43",195:"d9dc1603",361:"a5e062e2",434:"97cddd67",453:"857ada13",477:"6c544c1e",485:"24180917",506:"12141996",512:"b88c52f2",514:"1ad936f6",533:"37b72b08",535:"7b549cba",608:"370d1eb5",610:"3838b8d3",633:"d46437ba",648:"c3d27131",683:"13450460",713:"be55e6f4",800:"4431c552",848:"f1914ba1",887:"c5736f9a",918:"a29b3b89",949:"49992f75",959:"5251cbb0",972:"b8d9e251"}[e]+".js",o.miniCssF=e=>{},o.g=function(){if("object"==typeof globalThis)return globalThis;try{return this||new Function("return this")()}catch(e){if("object"==typeof window)return window}}(),o.o=(e,t)=>Object.prototype.hasOwnProperty.call(e,t),a={},c="website:",o.l=(e,t,r,f)=>{if(a[e])a[e].push(t);else{var d,n;if(void 0!==r)for(var i=document.getElementsByTagName("script"),b=0;b{d.onerror=d.onload=null,clearTimeout(s);var c=a[e];if(delete a[e],d.parentNode&&d.parentNode.removeChild(d),c&&c.forEach((e=>e(r))),t)return t(r)},s=setTimeout(u.bind(null,void 0,{type:"timeout",target:d}),12e4);d.onerror=u.bind(null,d.onerror),d.onload=u.bind(null,d.onload),n&&document.head.appendChild(d)}},o.r=e=>{"undefined"!=typeof 
Symbol&&Symbol.toStringTag&&Object.defineProperty(e,Symbol.toStringTag,{value:"Module"}),Object.defineProperty(e,"__esModule",{value:!0})},o.nmd=e=>(e.paths=[],e.children||(e.children=[]),e),o.p="/",o.gca=function(e){return e={17896441:"918",66406991:"110","01a85c17":"13",b27405df:"42",dc74e05e:"48","935f2afb":"53",a6aa9e1f:"89","123f1cf0":"93","1287daf0":"102",ccc49370:"103",d6ed0749:"133","096bfee4":"178",c4f5d8e4:"195","143ae67f":"361","9c92d0fc":"434","30a24c52":"453",b2f554cd:"477","08379d6f":"485",f7096b83:"506",fdc007b8:"512","1be78505":"514",b2b675dd:"533","814f3328":"535","9e4087bc":"608","6875c492":"610","031793e1":"633","747cb41a":"648","068c57da":"683",a7023ddc:"713","208ad3c1":"800",a2d634e7:"848",e013b240:"949","6eb23e6e":"959"}[e]||e,o.p+o.u(e)},(()=>{var e={303:0,532:0};o.f.j=(t,r)=>{var a=o.o(e,t)?e[t]:void 0;if(0!==a)if(a)r.push(a[2]);else if(/^(303|532)$/.test(t))e[t]=0;else{var c=new Promise(((r,c)=>a=e[t]=[r,c]));r.push(a[2]=c);var f=o.p+o.u(t),d=new Error;o.l(f,(r=>{if(o.o(e,t)&&(0!==(a=e[t])&&(e[t]=void 0),a)){var c=r&&("load"===r.type?"missing":r.type),f=r&&r.target&&r.target.src;d.message="Loading chunk "+t+" failed.\n("+c+": "+f+")",d.name="ChunkLoadError",d.type=c,d.request=f,a[1](d)}}),"chunk-"+t,t)}},o.O.j=t=>0===e[t];var t=(t,r)=>{var a,c,f=r[0],d=r[1],n=r[2],i=0;if(f.some((t=>0!==e[t]))){for(a in d)o.o(d,a)&&(o.m[a]=d[a]);if(n)var b=n(o)}for(t&&t(r);i{"use strict";var e,t,r,a,c,f={},d={};function o(e){var t=d[e];if(void 0!==t)return t.exports;var r=d[e]={id:e,loaded:!1,exports:{}};return f[e].call(r.exports,r,r.exports,o),r.loaded=!0,r.exports}o.m=f,o.c=d,e=[],o.O=(t,r,a,c)=>{if(!r){var f=1/0;for(i=0;i=c)&&Object.keys(o.O).every((e=>o.O[e](r[n])))?r.splice(n--,1):(d=!1,c0&&e[i-1][2]>c;i--)e[i]=e[i-1];e[i]=[r,a,c]},o.n=e=>{var t=e&&e.__esModule?()=>e.default:()=>e;return o.d(t,{a:t}),t},r=Object.getPrototypeOf?e=>Object.getPrototypeOf(e):e=>e.__proto__,o.t=function(e,a){if(1&a&&(e=this(e)),8&a)return e;if("object"==typeof 
e&&e){if(4&a&&e.__esModule)return e;if(16&a&&"function"==typeof e.then)return e}var c=Object.create(null);o.r(c);var f={};t=t||[null,r({}),r([]),r(r)];for(var d=2&a&&e;"object"==typeof d&&!~t.indexOf(d);d=r(d))Object.getOwnPropertyNames(d).forEach((t=>f[t]=()=>e[t]));return f.default=()=>e,o.d(c,f),c},o.d=(e,t)=>{for(var r in t)o.o(t,r)&&!o.o(e,r)&&Object.defineProperty(e,r,{enumerable:!0,get:t[r]})},o.f={},o.e=e=>Promise.all(Object.keys(o.f).reduce(((t,r)=>(o.f[r](e,t),t)),[])),o.u=e=>"assets/js/"+({13:"01a85c17",42:"b27405df",48:"dc74e05e",53:"935f2afb",89:"a6aa9e1f",93:"123f1cf0",102:"1287daf0",103:"ccc49370",110:"66406991",133:"d6ed0749",178:"096bfee4",195:"c4f5d8e4",361:"143ae67f",434:"9c92d0fc",453:"30a24c52",477:"b2f554cd",485:"08379d6f",506:"f7096b83",512:"fdc007b8",514:"1be78505",533:"b2b675dd",535:"814f3328",608:"9e4087bc",610:"6875c492",633:"031793e1",648:"747cb41a",683:"068c57da",713:"a7023ddc",800:"208ad3c1",848:"a2d634e7",918:"17896441",949:"e013b240",959:"6eb23e6e"}[e]||e)+"."+{13:"e74cb76f",42:"392ad01a",48:"8af97fbe",53:"792c8ea3",89:"9d8e1dc6",93:"8c08c261",102:"24133656",103:"5cd26f2b",110:"f24f8aa5",133:"f1f4c22d",178:"e16cdc43",195:"d9dc1603",361:"a5e062e2",434:"97cddd67",453:"857ada13",477:"6c544c1e",485:"24180917",506:"12141996",512:"b88c52f2",514:"1ad936f6",533:"37b72b08",535:"7b549cba",608:"370d1eb5",610:"3838b8d3",633:"d46437ba",648:"c3d27131",683:"13450460",713:"be55e6f4",800:"4431c552",848:"f1914ba1",887:"c5736f9a",918:"a29b3b89",949:"49992f75",959:"5251cbb0",972:"b8d9e251"}[e]+".js",o.miniCssF=e=>{},o.g=function(){if("object"==typeof globalThis)return globalThis;try{return this||new Function("return this")()}catch(e){if("object"==typeof window)return window}}(),o.o=(e,t)=>Object.prototype.hasOwnProperty.call(e,t),a={},c="website:",o.l=(e,t,r,f)=>{if(a[e])a[e].push(t);else{var d,n;if(void 0!==r)for(var b=document.getElementsByTagName("script"),i=0;i{d.onerror=d.onload=null,clearTimeout(s);var c=a[e];if(delete 
a[e],d.parentNode&&d.parentNode.removeChild(d),c&&c.forEach((e=>e(r))),t)return t(r)},s=setTimeout(u.bind(null,void 0,{type:"timeout",target:d}),12e4);d.onerror=u.bind(null,d.onerror),d.onload=u.bind(null,d.onload),n&&document.head.appendChild(d)}},o.r=e=>{"undefined"!=typeof Symbol&&Symbol.toStringTag&&Object.defineProperty(e,Symbol.toStringTag,{value:"Module"}),Object.defineProperty(e,"__esModule",{value:!0})},o.nmd=e=>(e.paths=[],e.children||(e.children=[]),e),o.p="/",o.gca=function(e){return e={17896441:"918",66406991:"110","01a85c17":"13",b27405df:"42",dc74e05e:"48","935f2afb":"53",a6aa9e1f:"89","123f1cf0":"93","1287daf0":"102",ccc49370:"103",d6ed0749:"133","096bfee4":"178",c4f5d8e4:"195","143ae67f":"361","9c92d0fc":"434","30a24c52":"453",b2f554cd:"477","08379d6f":"485",f7096b83:"506",fdc007b8:"512","1be78505":"514",b2b675dd:"533","814f3328":"535","9e4087bc":"608","6875c492":"610","031793e1":"633","747cb41a":"648","068c57da":"683",a7023ddc:"713","208ad3c1":"800",a2d634e7:"848",e013b240:"949","6eb23e6e":"959"}[e]||e,o.p+o.u(e)},(()=>{var e={303:0,532:0};o.f.j=(t,r)=>{var a=o.o(e,t)?e[t]:void 0;if(0!==a)if(a)r.push(a[2]);else if(/^(303|532)$/.test(t))e[t]=0;else{var c=new Promise(((r,c)=>a=e[t]=[r,c]));r.push(a[2]=c);var f=o.p+o.u(t),d=new Error;o.l(f,(r=>{if(o.o(e,t)&&(0!==(a=e[t])&&(e[t]=void 0),a)){var c=r&&("load"===r.type?"missing":r.type),f=r&&r.target&&r.target.src;d.message="Loading chunk "+t+" failed.\n("+c+": "+f+")",d.name="ChunkLoadError",d.type=c,d.request=f,a[1](d)}}),"chunk-"+t,t)}},o.O.j=t=>0===e[t];var t=(t,r)=>{var a,c,f=r[0],d=r[1],n=r[2],b=0;if(f.some((t=>0!==e[t]))){for(a in d)o.o(d,a)&&(o.m[a]=d[a]);if(n)var i=n(o)}for(t&&t(r);b Archive | Mariana Trench - +

Archive

Archive

- + \ No newline at end of file diff --git a/blog/index.html b/blog/index.html index 3f91e84b..b375af06 100644 --- a/blog/index.html +++ b/blog/index.html @@ -5,14 +5,14 @@ Blog | Mariana Trench - +
- + \ No newline at end of file diff --git a/blog/tags/facebook/index.html b/blog/tags/facebook/index.html index 52a768d0..fc0972b6 100644 --- a/blog/tags/facebook/index.html +++ b/blog/tags/facebook/index.html @@ -5,14 +5,14 @@ One post tagged with "facebook" | Mariana Trench - +

One post tagged with "facebook"

View All Tags
- + \ No newline at end of file diff --git a/blog/tags/hello/index.html b/blog/tags/hello/index.html index d9598d3f..e78a7262 100644 --- a/blog/tags/hello/index.html +++ b/blog/tags/hello/index.html @@ -5,14 +5,14 @@ One post tagged with "hello" | Mariana Trench - +

One post tagged with "hello"

View All Tags
- + \ No newline at end of file diff --git a/blog/tags/index.html b/blog/tags/index.html index 11756b05..9f597224 100644 --- a/blog/tags/index.html +++ b/blog/tags/index.html @@ -5,14 +5,14 @@ Tags | Mariana Trench - +

Tags

- + \ No newline at end of file diff --git a/blog/welcome/index.html b/blog/welcome/index.html index e60b1791..4db5097d 100644 --- a/blog/welcome/index.html +++ b/blog/welcome/index.html @@ -5,14 +5,14 @@ Welcome | Mariana Trench - +
- + \ No newline at end of file diff --git a/docs/configuration/index.html b/docs/configuration/index.html index b8386cd2..f34b502a 100644 --- a/docs/configuration/index.html +++ b/docs/configuration/index.html @@ -5,14 +5,14 @@ Configuration | Mariana Trench - +

Configuration

Mariana Trench is highly configurable and we recommend that you invest time into adjusting the tool to your specific use cases. At Facebook, we have dedicated security engineers that will spend a significant amount of their time adding new rules and model generators to improve the analysis results.

This page covers the more important, non-trivial configuration options. Note that most of the time you spend configuring Mariana Trench will go into writing model generators. These are covered in the next section.

Command Line Options

You can get a full set of options by running mariana-trench --help. The following is an abbreviated version of the output.

$ mariana-trench --help

Target arguments:
--apk-path APK_PATH The APK to analyze.

Output arguments:
--output-directory OUTPUT_DIRECTORY
The directory to store results in.

Configuration arguments:
--system-jar-configuration-path SYSTEM_JAR_CONFIGURATION_PATH
A JSON configuration file with a list of paths to the system jars.
--rules-paths RULES_PATHS
A `;`-separated list of rules files and directories containing rules files.
--repository-root-directory REPOSITORY_ROOT_DIRECTORY
The root of the repository. Resulting paths will be relative to this.
--source-root-directory SOURCE_ROOT_DIRECTORY
The root where source files for the APK can be found.
--model-generator-configuration-paths MODEL_GENERATOR_CONFIGURATION_PATHS
A `;`-separated list of paths specifying JSON configuration files. Each file is a list of paths to JSON model generators relative to the
configuration file or names of CPP model generators.
--model-generator-search-paths MODEL_GENERATOR_SEARCH_PATHS
A `;`-separated list of paths where we look up JSON model generators.
--maximum-source-sink-distance MAXIMUM_SOURCE_SINK_DISTANCE
Limits the distance of sources and sinks from a trace entry point.

--apk-path

Mariana Trench analyzes Dalvik bytecode. Provide the path to the Android app (APK) to analyze.

--output-directory OUTPUT_DIRECTORY

The output of the analysis is a file containing metadata about the particular run in JSON format as well as sharded files containing data flow specifications for every method in the APK. These files need to be processed by SAPP (see Getting Started) after the analysis. The flag specifies where these files are saved.

--system-jar-configuration-path SYSTEM_JAR_CONFIGURATION_PATH

This path points to a JSON file containing a list of .jar files to include in the analysis. It's important that this list contains at least the android.jar on your system. This file is typically located in your Android SDK distribution at $ANDROID_SDK/platforms/android-30/android.jar. Without the android.jar, Mariana Trench will not know about many methods from the standard library that might be important for your model generators.
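As a minimal sketch, assuming the configuration file is a plain JSON array of jar paths (the text above only says it contains "a list of .jar files"), such a file could be generated as follows. The helper name `write_system_jar_config` and the output file name are hypothetical.

```python
import json
import os


def write_system_jar_config(config_path, jar_paths):
    """Write the given jar paths as a JSON array (assumed file format)."""
    with open(config_path, "w") as config_file:
        json.dump(list(jar_paths), config_file, indent=2)
    return config_path


if __name__ == "__main__":
    # $ANDROID_SDK is the SDK location mentioned above; the fallback is a placeholder.
    sdk = os.environ.get("ANDROID_SDK", "/path/to/android-sdk")
    android_jar = os.path.join(sdk, "platforms", "android-30", "android.jar")
    # "system_jars.json" is a hypothetical file name.
    write_system_jar_config("system_jars.json", [android_jar])
```

The resulting file path would then be passed to --system-jar-configuration-path.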

--rules-paths RULES_PATHS

A ; separated search path pointing to files and directories containing rules files. These files specify what taint flows Mariana Trench should look for. Check out the rules.json that's provided by default. It specifies that we want to find flows from user controlled input (ActivityUserInput) into CodeExecution sinks and that this constitutes a remote code execution.

--source-root-directory SOURCE_ROOT_DIRECTORY

Mariana Trench will do a source indexing pass before the analysis. This is because Dalvik/Java bytecode does not contain complete location information, only filenames (not paths) and line numbers. The index is later used to emit precise locations.
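To illustrate why an index is needed, here is a rough sketch of mapping bare file names back to paths under the source root. This is an illustration only and not Mariana Trench's actual implementation; the function name `build_source_index` and the choice of file extensions are assumptions, and how the tool resolves file names that occur in several directories is not described here.

```python
import os
from collections import defaultdict


def build_source_index(source_root):
    """Map each source file name to every path (relative to source_root) that has that name."""
    index = defaultdict(list)
    for directory, _subdirectories, files in os.walk(source_root):
        for name in files:
            # Assumption: only Java and Kotlin sources are of interest.
            if name.endswith((".java", ".kt")):
                full_path = os.path.join(directory, name)
                index[name].append(os.path.relpath(full_path, source_root))
    return index
```

Given a file name from the bytecode, such as `Context.java`, a lookup in this index returns the candidate paths relative to the source root.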

--model-generator-configuration-paths MODEL_GENERATOR_CONFIGURATION_PATHS

A ; separated set of files containing the names of model generators to run. See default_generator_config.json for an example.

--model-generator-search-paths MODEL_GENERATOR_SEARCH_PATHS

A ; separated search path where Mariana Trench will try to find the model generators specified in the generator configuration.
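Several of the options above take `;`-separated lists. The sketch below assembles such a command line in Python, using only flags that appear in the help output above. The function name `build_command` and all file and directory names in the usage note are hypothetical.

```python
def build_command(apk_path, output_directory, rules_paths,
                  generator_configs, generator_search_paths):
    """Build an argument list for mariana-trench, joining list options with ';'."""
    return [
        "mariana-trench",
        "--apk-path", apk_path,
        "--output-directory", output_directory,
        "--rules-paths", ";".join(rules_paths),
        "--model-generator-configuration-paths", ";".join(generator_configs),
        "--model-generator-search-paths", ";".join(generator_search_paths),
    ]
```

For example, `build_command("app.apk", "out", ["rules.json", "extra_rules"], ["generators.json"], ["my_generators"])` yields `--rules-paths rules.json;extra_rules` among its arguments. Passing the list directly to `subprocess.run` avoids shell quoting issues; when typing the command in a shell instead, the `;`-separated values must be quoted, because `;` otherwise terminates the command.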

--maximum-source-sink-distance MAXIMUM_SOURCE_SINK_DISTANCE

For performance reasons, it can be useful to limit the maximum length of a trace Mariana Trench tries to find (note that longer traces also tend to be harder to interpret). Due to the modular nature of the analysis, the value specified here limits the maximum length from the trace root to the source, and from the trace root to the sink. This means found traces can have a length of up to 2 x MAXIMUM_SOURCE_SINK_DISTANCE.
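The bound described above can be sketched as simple arithmetic. The function names are hypothetical, and the check only illustrates the stated limit; it is not the tool's implementation.

```python
def maximum_trace_length(maximum_source_sink_distance):
    """Upper bound on trace length: root-to-source plus root-to-sink."""
    return 2 * maximum_source_sink_distance


def within_limit(source_distance, sink_distance, maximum_source_sink_distance):
    """A trace is kept only if both halves respect the limit individually."""
    return (source_distance <= maximum_source_sink_distance
            and sink_distance <= maximum_source_sink_distance)
```

With a limit of 7, a trace whose source and sink are each 7 frames from the root (14 frames in total) is within the limit, while a trace with 8 frames to the source and 1 frame to the sink is not, even though it is shorter overall.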

- + \ No newline at end of file diff --git a/docs/contribution/index.html b/docs/contribution/index.html index dc9aba29..c9a30c1d 100644 --- a/docs/contribution/index.html +++ b/docs/contribution/index.html @@ -5,7 +5,7 @@ Contribution | Mariana Trench - + @@ -15,7 +15,7 @@ We recommend to run this step inside a virtual environment.

$ cd .. # Go back to the root directory
$ python scripts/setup.py \
--binary "$MT_INSTALL_DIRECTORY/bin/mariana-trench-binary" \
--pyredex "$MT_INSTALL_DIRECTORY/bin/pyredex" \
install

Development

If you are making changes to Mariana Trench, you can use the mariana-trench wrapper inside the build directory:

$ cd build
$ ./mariana-trench --help

This way, you don't have to call scripts/setup.py after every change. Python changes will be automatically picked up. C++ changes will be picked up after running make.

Note that you will need to install all python dependencies:

$ pip install pyre_extensions fb-sapp

Run the tests

To run the tests after building Mariana Trench, use:

$ cd build
$ make check
- + \ No newline at end of file diff --git a/docs/customize-sources-and-sinks/index.html b/docs/customize-sources-and-sinks/index.html index 33b3baca..480ead6f 100644 --- a/docs/customize-sources-and-sinks/index.html +++ b/docs/customize-sources-and-sinks/index.html @@ -5,14 +5,14 @@ Customize Sources and Sinks | Mariana Trench - +

Customize Sources and Sinks

This page provides a high-level overview of the steps needed to update or create new sources and sinks.

Overview

In the context of Mariana Trench, we talk about sources and sinks in terms of methods (or, rarely, fields). For example, we may say that the return value of a method is a source (or a sink). We may also say that the 2nd parameter of a method is a source (or a sink). Such a description of a method is called a "model". See Models & Model Generators for more details about models and writing them.

To define sources or sinks that are not contained in the default set of sources and sinks, a user needs to:

  1. Write one or more JSON files that respect our model generator Domain Specific Language (DSL), which express how to generate models from methods and are hence called "model generators".

    • For example, a model generator may say that, for all methods (that will be analyzed by Mariana Trench) whose name is onActivityResult, specify their 2nd parameter as a source.
    {
    "model_generators": [
    {
    "find": "methods",
    "where": [
    {
    "constraint": "name",
    "pattern": "onActivityResult"
    }
    ],
    "model": {
    "sources": [
    {
    "kind": "TestSensitiveUserInput",
    "port": "Argument(2)"
    }
    ]
    }
    }
    ]
    }
  2. Instruct Mariana Trench to read from your model generator, so that Mariana Trench will generate models at runtime.

    • Intuitively, the models generated (by interpreting model generators) express sources and sinks for each method before running Mariana Trench. Based on such models, Mariana Trench will automatically infer new models for each method at runtime.
    • To instruct Mariana Trench to read from customized JSON model generators, add your json model generator here.
    • Add the model generator name (i.e., the file name) to the JSON configuration file.
  3. Update "rules" if necessary.

    • Background: Mariana Trench categorizes sources and sinks into different "kinds", which are string-typed. For example, a source may have a kind of JavascriptInterfaceUserInput. A sink may have a kind of Logging. Mariana Trench only finds data flows from sources of a particular kind to sinks of another particular kind; these source/sink pairings are called "rules". See Rules for writing them.
    • To use kinds that are not mentioned in the default set of rules, or to define rules that differ from the default rules, you need to specify a new rule in the rules.json file, in order to instruct Mariana Trench to find data flows that match the new rule.
    • For example, to catch flows from TestSensitiveUserInput in the example above and the sink kind Logging, you can add the following rule to the default rules.json:
    {
    "name": "TestRule",
    "code": 18,
    "description": "A test rule",
    "sources": [
    "TestSensitiveUserInput"
    ],
    "sinks": [
    "Logging"
    ]
    }
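To make the pairing of source kinds and sink kinds concrete, the sketch below loads the rule from the example above and checks which flows it would cover. This is a deliberately simplified illustration, not Mariana Trench's rule matching; the function name `rule_matches` is hypothetical.

```python
import json

# The example rule from above, loaded as a Python dictionary.
RULE = json.loads("""
{
  "name": "TestRule",
  "code": 18,
  "description": "A test rule",
  "sources": ["TestSensitiveUserInput"],
  "sinks": ["Logging"]
}
""")


def rule_matches(rule, source_kind, sink_kind):
    """Return True if the rule lists both the source kind and the sink kind."""
    return source_kind in rule["sources"] and sink_kind in rule["sinks"]
```

Under this simplified reading, a flow from `TestSensitiveUserInput` to `Logging` is covered by `TestRule`, while a flow from `TestSensitiveUserInput` to `CodeExecution` would need its own rule.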
- + \ No newline at end of file diff --git a/docs/debugging-fp-fns/index.html b/docs/debugging-fp-fns/index.html index e9af44af..23c736d1 100644 --- a/docs/debugging-fp-fns/index.html +++ b/docs/debugging-fp-fns/index.html @@ -5,7 +5,7 @@ Debugging False Positives/False Negatives | Mariana Trench - + @@ -13,7 +13,7 @@

Debugging False Positives/False Negatives

This document is mainly intended for software engineers, to help them debug false positives and false negatives.

Setup

First, you need to run the analysis on your computer. This will create model@XXX.json files in the current directory, containing the results of the analysis.

Investigate the output models

Now, your objective is to understand in which method we lost the flow (false negative) or introduced the invalid flow (false positive). You will need to look into the output models for that. We recommend using the explore_models.py bento script.

Run the following command in the directory containing the output model files (i.e., model@XXX.json):

python3 -i mariana_trench_repository/scripts/explore_models.py

This provides you with a few helper functions:

index('.')                     Index all available models in the given directory.
method_containing('Foo;.bar')  Find all methods containing the given string.
method_matching('Foo.*')       Find all methods matching the given regular expression.
get_model('Foo;.bar')          Get the model for the given method.
print_model('Foo;.bar')        Pretty print the model for the given method.

Use index to index all models first:

In [1]: index()

Now you can search for methods with method_containing and print their models with print_model. You probably want to look at the first or last frame of the trace, to see if the source or sink is present. Then, you will want to follow the frames until you find the problematic method.

Example

Let's suppose I am investigating a false negative and want to find the method in which we are losing the flow. I could start by looking at the last frame, i.e., the sink:

In [2]: method_containing('Landroid/content/Context;.sendOrderedBroadcast')
Out[2]:
['Landroid/content/Context;.sendOrderedBroadcast:(Landroid/content/Intent;Ljava/lang/String;)V',
'Landroid/content/Context;.sendOrderedBroadcast:(Landroid/content/Intent;Ljava/lang/String;Landroid/content/BroadcastReceiver;Landroid/os/Handler;ILjava/lang/String;Landroid/os/Bundle;)V',
'Landroid/content/Context;.sendOrderedBroadcastAsUser:(Landroid/content/Intent;Landroid/os/UserHandle;Ljava/lang/String;Landroid/content/BroadcastReceiver;Landroid/os/Handler;ILjava/lang/String;Landroid/os/Bundle;)V']

In [3]: print_model('Landroid/content/Context;.sendOrderedBroadcast:(Landroid/content/Intent;Ljava/lang/String;Landroid/content/BroadcastReceiver;Landroid/os/Handler;ILjava/lang/String;Landroid/os/Bundle;)V')
{
"method": "Landroid/content/Context;.sendOrderedBroadcast:(Landroid/content/Intent;Ljava/lang/String;Landroid/content/BroadcastReceiver;Landroid/os/Handler;ILjava/lang/String;Landroid/os/Bundle;)V",
"modes": [
"skip-analysis",
"add-via-obscure-feature",
"taint-in-taint-out",
"taint-in-taint-this",
"no-join-virtual-overrides"
],
"position": {
"path": "android/content/Context.java"
},
...
"sinks": [
{
"callee_port": "Leaf",
"caller_port": "Argument(1)",
"kind": "LaunchingComponent",
"origins": [
"Landroid/content/Context;.sendOrderedBroadcast:(Landroid/content/Intent;Ljava/lang/String;Landroid/content/BroadcastReceiver;Landroid/os/Handler;ILjava/lang/String;Landroid/os/Bundle;)V"
]
}
]
}

As expected, the method has a sink on Argument(1), so we are good for now. Next, I want to check the previous frame, which calls Context.sendOrderedBroadcast:

In [4]: method_containing('ShortcutManagerCompat;.requestPinShortcut:')
Out[4]: ['Landroidx/core/content/pm/ShortcutManagerCompat;.requestPinShortcut:(Landroid/content/Context;Landroidx/core/content/pm/ShortcutInfoCompat;Landroid/content/IntentSender;)Z']

In [5]: print_model('Landroidx/core/content/pm/ShortcutManagerCompat;.requestPinShortcut:(Landroid/content/Context;Landroidx/core/content/pm/ShortcutInfoCompat;Landroid/content/IntentSender;)Z')
{
"method": "Landroidx/core/content/pm/ShortcutManagerCompat;.requestPinShortcut:(Landroid/content/Context;Landroidx/core/content/pm/ShortcutInfoCompat;Landroid/content/IntentSender;)Z",
"position": {
"line": 112,
"path": "androidx/core/content/pm/ShortcutManagerCompat.java"
},
...
"sinks": [
...
{
"always_features": [
"via-obscure",
"via-obscure-taint-in-taint-this",
"via-intent-extra",
"has-intent-extras"
],
"call_position": {
"line": 130,
"path": "androidx/core/content/pm/ShortcutManagerCompat.java"
},
"callee": "Landroid/content/Context;.sendOrderedBroadcast:(Landroid/content/Intent;Ljava/lang/String;Landroid/content/BroadcastReceiver;Landroid/os/Handler;ILjava/lang/String;Landroid/os/Bundle;)V",
"callee_port": "Argument(1)",
"caller_port": "Argument(1).mIntents",
"distance": 1,
"kind": "LaunchingComponent",
"local_positions": [
{
"line": 121
}
],
"origins": [
"Landroid/content/Context;.sendOrderedBroadcast:(Landroid/content/Intent;Ljava/lang/String;Landroid/content/BroadcastReceiver;Landroid/os/Handler;ILjava/lang/String;Landroid/os/Bundle;)V"
]
}
]
}

I can see the frame from ShortcutManagerCompat.requestPinShortcut on Argument(1).mIntents to Context.sendOrderedBroadcast on Argument(1). I can keep following frames until I find the method that misses a source or sink.

For frames from the source to the root callable, I should look at generations, and for frames from the root callable to the sink, I should look at sinks. On the root callable, I should look at issues.
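The frame-following steps above can be sketched in plain Python. This is a minimal sketch that assumes a model is a dict shaped like the print_model output shown earlier; `sink_frames` is a hypothetical helper and is not part of explore_models.py.

```python
# Hypothetical helper: summarize each sink frame of a model dict.
# Assumes the dict layout shown in the print_model output above.
def sink_frames(model):
    frames = []
    for sink in model.get("sinks", []):
        frames.append({
            "kind": sink["kind"],
            "caller_port": sink["caller_port"],
            # Leaf frames carry no "callee" key: the method itself is the sink.
            "callee": sink.get("callee"),
            "callee_port": sink["callee_port"],
        })
    return frames


SEND_ORDERED_BROADCAST = (
    "Landroid/content/Context;.sendOrderedBroadcast:"
    "(Landroid/content/Intent;Ljava/lang/String;"
    "Landroid/content/BroadcastReceiver;Landroid/os/Handler;"
    "ILjava/lang/String;Landroid/os/Bundle;)V"
)

# Trimmed copy of the requestPinShortcut model from the example.
model = {
    "sinks": [
        {
            "callee": SEND_ORDERED_BROADCAST,
            "callee_port": "Argument(1)",
            "caller_port": "Argument(1).mIntents",
            "distance": 1,
            "kind": "LaunchingComponent",
        }
    ]
}

for frame in sink_frames(model):
    print(frame["caller_port"], "->", frame["callee"], frame["callee_port"])
```

The "callee" printed for each frame is the method whose model you would inspect next. For frames on the source side, the same loop would read "generations" instead of "sinks".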

Investigating the transfer function

Once you know in which method you are losing the flow or introducing an invalid flow, you will need to run the analysis with logging enabled for that method, using:

```
mariana-trench \
  --apk-path='your-apk' \
  --log-method='method-name'
```

This will log everything the transfer function does in that method, which might be a lot of logs. You can pipe this into a file or into less. Using the logs, you should be able to see at which instruction you are losing the taint. Remember, the analysis computes a fixpoint, so the method will be analyzed multiple times. You should look at the last time it was analyzed (i.e., the end of the logs).

Happy debugging!

- + \ No newline at end of file diff --git a/docs/feature-descriptions/index.html b/docs/feature-descriptions/index.html index ea4ba874..df0fe3bc 100644 --- a/docs/feature-descriptions/index.html +++ b/docs/feature-descriptions/index.html @@ -5,14 +5,14 @@ Feature Glossary | Mariana Trench - +

Feature Glossary

As explained in the features section of the models wiki, a feature can be used to tag a flow and help filter issues. A feature describes a property of a flow and can be any arbitrary string. A feature prefixed with always- signals that every path in the issue has that feature, while a feature without that prefix is present on at least one path, but not all paths.
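The always- rule can be illustrated with a small sketch. It is not how Mariana Trench or SAPP computes features; `summarize_features` and the per-path feature sets are hypothetical and only mirror the wording of the paragraph above.

```python
def summarize_features(paths):
    """paths: non-empty list of sets of feature names, one set per path."""
    # Features present on every path get the always- prefix.
    always = set.intersection(*paths)
    # Features present on some, but not all, paths keep their plain name.
    sometimes = set.union(*paths) - always
    return {f"always-{name}" for name in always} | sometimes


# Two hypothetical paths of one issue.
paths = [{"via-obscure", "cast:boolean"}, {"via-obscure"}]
print(sorted(summarize_features(paths)))
```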

This page will cover the purpose of the pre-configured features to help you understand how you can use them best.

Pre-configured features

  • via-caller-exported
    • This feature is applied when the root callable is directly or indirectly called from an exported component defined in the Android manifest. For example, if the root callable is in the MainActivity and the MainActivity is exported, this feature will be attached. It is needed in order to determine if an Intent source is third-party controllable or not. This feature is sometimes accompanied by via-class which tells you which class Mariana Trench used to determine that the root callable is called from an exported class.
  • via-caller-unexported
    • Same as via-caller-exported, but applied if the root callable is considered to be called only via unexported components.
  • via-caller-permission
    • Similar to via-caller-exported, but applied if the root callable traces back to a manifest entry that declares a protectionLevel or an Android permission.
  • via-explicit-intent
    • Applied when the taint flow goes via a class or package name setter on an Intent. This can be used to infer whether a launched Intent can resolve to third party apps or only to a specifically defined app (implicit versus explicit intents).
  • via-inner-class-this
    • Anonymous classes in Java byte code transfer the taint from the parent class to the anonymous class via this.this$0, which can lead to broad false positives. This feature can be used to filter out such flows when they are a common false positive pattern.
  • cast:[...]
    • Cast features such as cast:boolean are applied when the tainted data is converted to that specific type. This allows you, for example, to filter out data flows such as taintedString.length(), where the returned tainted integer may no longer be of interest.
  • via-obscure
    • Obscure methods are methods for which Mariana Trench doesn't have any byte code available. Therefore we generally apply taint-in-taint-out behaviour on these methods and add the feature via-obscure to tell the user that the data flow went along an obscure method.
  • via-[...]-broadening
    • Applied when any of the four broadening operations is performed (see Models).
- + \ No newline at end of file diff --git a/docs/getting-started/index.html b/docs/getting-started/index.html index c0dfe951..5e7386a3 100644 --- a/docs/getting-started/index.html +++ b/docs/getting-started/index.html @@ -5,14 +5,14 @@ Getting Started | Mariana Trench - +

Getting Started

This guide will walk you through setting up Mariana Trench on your machine and get you to find your first remote code execution vulnerability in a small sample app.

Prerequisites

Mariana Trench requires a recent version of Python. On macOS you can get a current version through Homebrew:

$ brew install python3

On a Debian-flavored Linux (Ubuntu, Mint, Debian), you can use apt-get:

$ sudo apt-get install python3 python3-pip python3-venv

This guide also assumes you have the Android SDK installed and an environment variable $ANDROID_SDK pointed to the location of the SDK.

For the rest of this guide, we assume that you are working inside of a virtual environment. You can set this up with

$ python3 -m venv ~/.venvs/mariana-trench
$ source ~/.venvs/mariana-trench/bin/activate
(mariana-trench)$

The name of the virtual environment in front of your shell prompt indicates that the virtual environment is active.

Installing Mariana Trench

Inside your virtual environment, installing Mariana Trench is as easy as running:

(mariana-trench)$ pip install mariana-trench

Running Mariana Trench

We'll use a small app that is part of our documentation. You can get it by running

(mariana-trench)$ git clone https://github.com/facebook/mariana-trench
(mariana-trench)$ cd mariana-trench/documentation/sample-app

We are now ready to run the analysis

(mariana-trench)$ mariana-trench \
--system-jar-configuration-path=$ANDROID_SDK/platforms/android-30/android.jar \
--apk-path=sample-app-debug.apk \
--source-root-directory=app/src/main/java
# ...
INFO Analyzed 68886 models in 4.04s. Found 4 issues!
# ...

The analysis has found 4 issues in our sample app. The output of the analysis is a set of specifications for each method of the application.

Post Processing

The specifications themselves are not meant to be read by humans. We need an additional processing step to make the results more presentable. We do this with SAPP, which PyPI installed for us:

(mariana-trench)$ sapp --tool=mariana-trench analyze .
(mariana-trench)$ sapp --database-name=sapp.db server --source-directory=app/src/main/java
# ...
2021-05-12 12:27:22,867 [INFO] * Running on http://localhost:13337/ (Press CTRL+C to quit)

The last line of the output tells us that SAPP started a local webserver that lets us look at the results. Open the link and you will see the 4 issues found by the analysis.

Exploring Results

Let's focus on the remote code execution issue found in the sample app. You can identify it by its issue code 1 (for all remote code executions) and the callable void MainActivity.onCreate(Bundle). With only 4 issues to see, it's easy to identify the issue manually, but once more rules run, the filter functionality at the top right of the page comes in handy.

Single Issue Display

The issue tells you that Mariana Trench found a remote code execution in MainActivity.onCreate where the data is coming from Activity.getIntent one call away, and flows into the constructor of ProcessBuilder 3 calls away. Click on "Traces" in the top right corner of the issue to see an example trace.

The trace surfaced by Mariana Trench consists of three parts.

The source trace represents where the data is coming from. In our example, the trace is very short: Activity.getIntent is called in MainActivity.onCreate directly.

Trace Source

The trace root represents where the source trace meets the sink trace. In our example this is the activity's onCreate method.

Trace Root

The final part of the trace is the sink trace: this is where the data from the source flows down into a sink. In our example, the data flows from onCreate to onClick, to execute, and finally into the constructor of ProcessBuilder.

Trace Source

Configuring Mariana Trench

You might be asking yourself, "how does the tool know what is user controlled data, and what is a sink?". This guide is meant to quickly get you started on a small app. We did not cover how to configure Mariana Trench. You can read more about that in the Configuration section.

- + \ No newline at end of file diff --git a/docs/known-false-negatives/index.html b/docs/known-false-negatives/index.html index 69896785..95590666 100644 --- a/docs/known-false-negatives/index.html +++ b/docs/known-false-negatives/index.html @@ -5,14 +5,14 @@ Known False Negatives | Mariana Trench - +

Known False Negatives

Like any static analysis tool, Mariana Trench has false negatives. This page documents the more well-known places where taint is dropped. Note that this is not an exhaustive list. See this wiki for instructions on how to debug them.

Many of these options are configurable, not hard limits. There are analysis time, memory, and quality tradeoffs.

Trace too Long

Mariana Trench stops propagating taint beyond a certain depth. This depth is currently configured at 7. In code:

// This method has depth 1.
public int get_source_1() { return source(); }

// This method has depth 2.
public int get_source_2() { return get_source_1(); }

...

// This method has depth 7.
public int get_source_7() { return get_source_6(); }

// This method theoretically has depth 8, but MT drops the source here.
public int get_source_8() { return get_source_7(); }

Workaround: If the chain of wrappers obviously leads to a source or sink, instead of defining the source at source(), one could write an additional model marking get_source_7() as a source.
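The depth cutoff can be simulated with a few lines of Python. This is only a sketch of the rule as described above, not the analysis itself; `wrapper_depths` is a hypothetical name, and the limit of 7 is the value quoted in the text.

```python
MAX_SOURCE_DEPTH = 7  # value quoted in the text above


def wrapper_depths(wrapper_count, max_depth=MAX_SOURCE_DEPTH):
    """Depth seen by each wrapper in a chain, or None once the source is dropped."""
    depths = []
    depth = 0  # source() itself
    for _ in range(wrapper_count):
        if depth is not None:
            # Each wrapper is one call further away from the source.
            depth = depth + 1 if depth + 1 <= max_depth else None
        depths.append(depth)
    return depths


# get_source_1 ... get_source_8 from the example.
print(wrapper_depths(8))
```

With the workaround, get_source_7() would itself be modeled as the source, so the count restarts there and get_source_8() is back at depth 1.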

Fields of Fields of Fields of Fields...

Taint of an object is dropped when it occurs too deep within the object. This depth is configured at 4. In code:

public void taintedThis() {
this.mField1 = source(); // This is OK
this.mField1.mField2.mField3.mField4.mField5 = source(); // This gets dropped
}

Workaround: This isn’t much of a workaround, but one can manually configure the source on “this.mField1.....mField4” instead. This will be a form of over-abstraction and could lead to false positives.
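To decide where to place the manual source, it can help to compute the deepest prefix of a field chain that still fits within the limit. The helper below is a hypothetical sketch of that bookkeeping; it assumes the limit of 4 quoted above and says nothing about how Mariana Trench stores paths internally.

```python
MAX_FIELD_DEPTH = 4  # value quoted in the text above


def trackable_prefix(fields, max_depth=MAX_FIELD_DEPTH):
    """Return (prefix that fits within the limit, whether anything was cut off)."""
    return fields[:max_depth], len(fields) > max_depth


chain = ["mField1", "mField2", "mField3", "mField4", "mField5"]
prefix, truncated = trackable_prefix(chain)
print("this." + ".".join(prefix), "(over-approximated)" if truncated else "")
```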

Fanouts

If a virtual method has too many overrides, beyond a certain number (currently configured at 40), we stop considering all overrides and look only at the direct method being called. In code:

interface IFace {
public int possibleSource();
}

class Class1 implements IFace {
public int possibleSource() { return 1; }
}
...

class Class41 implements IFace {
public int possibleSource() { return source(); }
}

void maybeIssue(IFace iface) {
// The source will get dropped here because there are too many overrides.
// MT will not report an issue.
sink(iface.possibleSource());
}


Workaround: Unfortunately, there are no known workarounds.
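The fanout rule can be summarized as a small decision function. This is a sketch of the behaviour described above under the stated limit of 40; `call_targets` and the string method names are hypothetical and not part of Mariana Trench.

```python
MAX_OVERRIDES = 40  # value quoted in the text above


def call_targets(direct_method, overrides, max_overrides=MAX_OVERRIDES):
    """Methods whose models are considered at a virtual call site."""
    if len(overrides) > max_overrides:
        # Too many overrides: only the directly called method is considered.
        return [direct_method]
    return [direct_method, *overrides]


# IFace.possibleSource with 41 implementing classes, as in the example.
overrides = [f"Class{i}.possibleSource" for i in range(1, 42)]
print(call_targets("IFace.possibleSource", overrides))
```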

Propagation across Arguments

Mariana Trench computes propagations for each method (this may be known as “tito” (taint-in-taint-out) in other tools). Propagations tell the analysis whether, when an argument is tainted by a source, the method’s return value or its “this” object becomes tainted by that argument. However, unless --propagate-across-arguments is explicitly specified, Mariana Trench does not propagate taint from one argument to another. In code:

void setIntentValue(Intent intent, Uri uri) {
// MT sees that intent.putExtra has a propagation from uri (Argument(2)) to
// intent (Argument(0) or this).
intent.putExtra("label", uri);

// However, when it finishes analyzing setIntentValue, it will not track the
// propagation from uri to intent.
}

void falseNegative() {
Uri uri = source();
Intent intent = new Intent();

// If this were the code, MT will detect a source->sink flow at launchActivitySink.
// intent.putExtra("label", uri);

// MT loses the flow from uri->intent at this point.
setIntentValue(intent, uri);

launchActivitySink(intent);
}

Workaround 1: Write an explicit propagation model for the method. While Mariana Trench does not infer propagations across arguments, it does allow manual specification of such models.

Workaround 2: Enable --propagate-across-arguments, which enables taint propagation across method invocations for objects. Note that the behaviour is enabled globally, so it may incur a significant runtime and memory overhead.
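For Workaround 1, the explicit propagation model could look like the one generated below. This is a sketch only: the class name Lcom/example/Class; is an assumption (the example does not say which class declares setIntentValue), and the ports assume an instance method, so intent is Argument(1) and uri is Argument(2). The JSON layout follows the model generator examples on the Models page.

```python
import json


def propagation_generator(parent, name, input_port, output_port):
    """Build one model generator entry that declares a single propagation."""
    return {
        "find": "methods",
        "where": [
            {"constraint": "signature_match", "parent": parent, "name": name}
        ],
        "model": {
            "propagation": [{"input": input_port, "output": output_port}]
        },
    }


# Hypothetical class name; uri (Argument(2)) taints intent (Argument(1)).
generator = propagation_generator(
    "Lcom/example/Class;", "setIntentValue", "Argument(2)", "Argument(1)"
)
print(json.dumps(generator, indent=2))
```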

- + \ No newline at end of file diff --git a/docs/models/index.html b/docs/models/index.html index e4fefdc3..ea404961 100644 --- a/docs/models/index.html +++ b/docs/models/index.html @@ -5,14 +5,14 @@ Models & Model Generators | Mariana Trench - +
-

Models & Model Generators

The main way to configure the analysis is through defining model generators. Each model generator defines (1) a filter, made up of constraints to specify the methods (or fields) for which a model should be generated, and (2) a model, an abstract representation of how data flows through a method.

Model generators are what define Sink and Source kinds which are the key component of Rules. Model generators can do other things too, like attach features (a.k.a. breadcrumbs) to flows and sanitize (redact) flows which go through certain "data-safe" methods (e.g. a method which hashes a user's password).

Filters are conceptually straightforward. Thus, this page focuses heavily on conceptualizing and providing examples for the various types of models. See the Model Generators section for full implementation documentation for both filters and models.

Models

A model is an abstract representation of how data flows through a method.

A model essentially consists of:

  • Sources: a set of sources that the method produces or receives on parameters;
  • Sinks: a set of sinks on the method;
  • Propagation: a description of how the method propagates taint coming into it (e.g, the first parameter updates the second, the second parameter updates the return value, etc.);
  • Attach to Sources: a set of features/breadcrumbs to add on any sources flowing out of the method;
  • Attach to Sinks: a set of features/breadcrumbs to add on sinks of a given parameter;
  • Attach to Propagations: a set of features/breadcrumbs to add on propagations for a given parameter or return value;
  • Add Features to Arguments: a set of features/breadcrumbs to add on any taint that might flow in a given parameter;
  • Sanitizers: specifications of taint flows to stop;
  • Modes: a set of flags describing specific behaviors (see below).

Models can be specified in JSON. For example, to mark the string parameter to our Logger.log function as a sink, we can specify it as:

{
"find": "methods",
"where": [
{
"constraint": "signature_match",
"parent": "Lcom/example/Logger;",
"name": "log"
}
],
"model": {
"sinks": [
{
"kind": "Logging",
"port": "Argument(1)"
}
]
}
}

Note that the naming of methods follows the Dalvik bytecode format.

Method name format

The format used for method names is:

<className>.<methodName>:(<parameterType1><parameterType2>)<returnType>

Example: Landroidx/fragment/app/Fragment;.startActivity:(Landroid/content/Intent;)V

For the parameter and return types, use the following table to pick the correct one (please refer to the JVM documentation for more details):

  • V - void
  • Z - boolean
  • B - byte
  • S - short
  • C - char
  • I - int
  • J - long (64 bits)
  • F - float
  • D - double (64 bits)

Classes take the form Lpackage/name/ClassName;, where the leading L indicates that it is a class type and package/name/ is the package that the class is in. A nested class takes the form Lpackage/name/ClassName$NestedClassName; (the $ needs to be double escaped as \\$ in a JSON regex).

NOTE: Instance (i.e., non-static) method parameters are indexed starting from 1! The 0th parameter is the this parameter in Dalvik bytecode. For static methods, parameter indices start from 0.
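The method name format can be checked mechanically. The sketch below splits a signature into its parts; `parse_method` and `split_types` are hypothetical helpers, not part of Mariana Trench. Array types (a leading `[`) are not covered by the table above; they are handled here because they are part of the standard descriptor syntax.

```python
import re

PRIMITIVES = {
    "V": "void", "Z": "boolean", "B": "byte", "S": "short", "C": "char",
    "I": "int", "J": "long", "F": "float", "D": "double",
}


def split_types(descriptor):
    """Split a concatenated type list such as 'ILjava/lang/String;'."""
    types, i = [], 0
    while i < len(descriptor):
        start = i
        while descriptor.startswith("[", i):  # array dimensions come first
            i += 1
        if descriptor.startswith("L", i):
            i = descriptor.index(";", i) + 1  # class type ends at ';'
        elif descriptor[i:i + 1] in PRIMITIVES:
            i += 1
        else:
            raise ValueError(f"bad type at offset {i} in {descriptor!r}")
        types.append(descriptor[start:i])
    return types


def parse_method(signature):
    """Parse '<className>.<methodName>:(<parameterTypes>)<returnType>'."""
    match = re.fullmatch(r"(L[^;]+;)\.([^:]+):\((.*)\)(.+)", signature)
    if match is None:
        raise ValueError(f"not a method signature: {signature!r}")
    class_name, name, parameters, return_type = match.groups()
    return {
        "class": class_name,
        "name": name,
        "parameters": split_types(parameters),
        "return": return_type,
    }


print(parse_method(
    "Landroidx/fragment/app/Fragment;.startActivity:(Landroid/content/Intent;)V"
))
```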

Access path format

An access path describes the symbolic location of a taint. This is commonly used to indicate where a source or a sink originates from. The "port" field of any model is represented by an access path.

An access path is composed of a root and a path.

The root is either:

  • Return, representing the returned value;
  • Argument(x) (where x is an integer), representing the parameter number x;

The path is a (possibly empty) list of path elements. A path element can be any of the following kinds:

  • field: represents a field name. String encoding is a dot followed by the field name: .field_name;
  • index: represents a user defined index for dictionary like objects. String encoding uses square braces to enclose any user defined index: [index_name];
  • any index: represents any or unresolved indices in dictionary like objects. String encoding is an asterisk enclosed in square braces: [*];
  • index from value of: captures the value of the specified callable's port seen at its callsites during taint flow analysis as an index or any index (if the value cannot be resolved). String encoding uses argument root to specify the callable's port and encloses it in [<...>] to represent that its value is resolved at the callsite to create an index: [<Argument(x)>];

Examples:

  • Argument(1).name corresponds to the field name of the second parameter;
  • Argument(1)[name] corresponds to the index name of the dictionary like second parameter;
  • Argument(1)[*] corresponds to any index of the dictionary like second parameter;
  • Argument(1)[<Argument(2)>] corresponds to an index of the dictionary like second parameter whose value is resolved from the third parameter;
  • Return corresponds to the returned value;
  • Return.x corresponds to the field x of the returned value;
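The string encoding described above is regular enough to parse with two regular expressions. The following is an illustrative sketch, not Mariana Trench's parser; `parse_access_path` and the element kind names it returns are hypothetical.

```python
import re

ROOT = re.compile(r"Return|Argument\(\d+\)")
ELEMENT = re.compile(r"\.(?P<field>[A-Za-z_$][\w$]*)|\[(?P<index>[^\]]*)\]")


def parse_access_path(text):
    """Split an access path into its root and a list of (kind, value) elements."""
    root = ROOT.match(text)
    if root is None:
        raise ValueError(f"unknown root in {text!r}")
    elements, pos = [], root.end()
    while pos < len(text):
        match = ELEMENT.match(text, pos)
        if match is None:
            raise ValueError(f"bad path element at offset {pos} in {text!r}")
        if match.group("field") is not None:
            elements.append(("field", match.group("field")))
        else:
            index = match.group("index")
            if index == "*":
                elements.append(("any_index", None))
            elif index.startswith("<") and index.endswith(">"):
                # [<Argument(x)>]: index resolved from another port's value.
                elements.append(("index_from_value_of", index[1:-1]))
            else:
                elements.append(("index", index))
        pos = match.end()
    return root.group(0), elements


for path in ["Argument(1).name", "Argument(1)[*]", "Argument(1)[<Argument(2)>]", "Return.x"]:
    print(path, "=>", parse_access_path(path))
```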

Kinds

A source has a kind that describes its content (e.g, user input, file system, etc). A sink also has a kind that describes the operation the method performs (e.g, execute a command, read a file, etc.). Kinds can be arbitrary strings (e.g, UserInput). We usually avoid whitespaces.

Sources

Sources describe sources produced or received by a given method. A source can either flow out via the return value or flow via a given parameter. A source has a kind that describes its content (e.g, user input, file system, etc).

Here is an example where the source flows by return value:

public static String getPath() {
return System.getenv().get("PATH");
}

The JSON model generator for this method could be:

{
"find": "methods",
"where": [
{
"constraint": "signature_match",
"parent": "Lcom/example/Class;",
"name": "getPath"
}
],
"model": {
"sources": [
{
"kind": "UserControlled",
"port": "Return"
}
]
}
}

Here is an example where the source flows in via an argument:

class MyActivity extends Activity {
public void onNewIntent(Intent intent) {
// intent should be considered a source here.
}
}

The JSON model generator for this method could be:

{
"find": "methods",
"where": [
{
"constraint": "signature_match",
"extends": "Landroid/app/Activity;",
"name": "onNewIntent"
}
],
"model": {
"sources": [
{
"kind": "UserControlled",
"port": "Argument(1)"
}
]
}
}

Note that the implicit this parameter is considered the argument 0.
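Because of the implicit this parameter, the port index of a Java parameter depends on whether the method is static. A minimal sketch of that offset, using a hypothetical helper name:

```python
def argument_port(parameter_position, is_static):
    """Port for the parameter at a 0-based position in the Java parameter list."""
    # Instance methods reserve Argument(0) for `this`.
    offset = 0 if is_static else 1
    return f"Argument({parameter_position + offset})"


# onNewIntent(Intent intent) is an instance method.
print(argument_port(0, is_static=False))
# getPath() style static helpers start counting at 0.
print(argument_port(0, is_static=True))
```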

Sinks

Sinks describe dangerous or sensitive methods in the code. A sink has a kind that represents the type of operation the method does (e.g, command execution, file system operation, etc). A sink must be attached to a given parameter of the method. A method can have multiple sinks.

Here is an example of a sink:

public static String readFile(String path, String extension, int mode) {
// Return the content of the file path.extension
}

Since path and extension can be used to read arbitrary files, we consider them sinks. We do not consider mode as a sink since we do not care whether the user can control it. The JSON model generator for this method could be:

{
"find": "methods",
"where": [
{
"constraint": "signature_match",
"parent": "Lcom/example/Class;",
"name": "readFile"
}
],
"model": {
"sinks": [
{
"kind": "FileRead",
"port": "Argument(0)"
},
{
"kind": "FileRead",
"port": "Argument(1)"
}
]
}
}

Return Sinks

Return sinks can be used to describe that a method should not return tainted information. A return sink is just a normal sink with a Return port.

Propagation

Propagations (also called tito (Taint In Taint Out) or passthrough in other tools) describe how the method propagates taint. A propagation has an input (where the taint comes from) and an output (where the taint is moved to).

Here is an example of a propagation:

public static String concat(String x, String y) {
return x + y;
}

The return value of the method can be controlled by both parameters, hence it has the propagations Argument(0) -> Return and Argument(1) -> Return. The JSON model generator for this method could be:

{
"find": "methods",
"where": [
{
"constraint": "signature_match",
"parent": "Lcom/example/Class;",
"name": "concat"
}
],
"model": {
"propagation": [
{
"input": "Argument(0)",
"output": "Return"
},
{
"input": "Argument(1)",
"output": "Return"
}
]
}
}

Features

Features (also called breadcrumbs) can be used to tag a flow and help filter issues. A feature describes a property of a flow. A feature can be any arbitrary string.

For instance, the feature via-numerical-operator is used to describe that the data flows through a numerical operator such as an addition.

Features are very useful to filter flows in the SAPP UI. For example, flows with a cast from string to integer can sometimes be less important during triaging, since controlling an integer is more difficult to exploit than controlling a full string.

Note that features do not stop the flow, they just help triaging.

Attach to Sources

Attach to sources is used to add a set of features on any sources flowing out of a method through a given parameter or return value.

For instance, if we want to add the feature via-signed to all sources flowing out of the given method:

public String getSignedCookie();

We could use the following JSON model generator:

{
"find": "methods",
"where": [
{
"constraint": "signature_match",
"parent": "Lcom/example/Class;",
"name": "getSignedCookie"
}
],
"model": {
"attach_to_sources": [
{
"features": [
"via-signed"
],
"port": "Return"
}
]
}
}

Note that this is only useful for sources inferred by the analysis. If you know that getSignedCookie returns a source of a given kind, you should use a source instead.

Attach to Sinks

Attach to sinks is used to add a set of features on all sinks on the given parameter of a method.

For instance, if we want to add the feature via-user on all sinks of the given method:

class User {
public static User findUser(String username) {
// The code here might use SQL, Thrift, or anything. We don't need to know.
}
}

We could use the following JSON model generator:

{
"find": "methods",
"where": [
{
"constraint": "signature_match",
"parent": "Lcom/example/User;",
"name": "findUser"
}
],
"model": {
"attach_to_sinks": [
{
"features": [
"via-user"
],
"port": "Argument(0)"
}
]
}
}

Note that this is only useful for sinks inferred by the analysis. If you know that findUser is a sink of a given kind, you should use a sink instead.

Attach to Propagations

Attach to propagations is used to add a set of features on all propagations from or to a given parameter or return value of a method.

For instance, if we want to add the feature via-concat to the propagations of the given method:

public static String concat(String x, String y);

We could use the following JSON model generator:

{
"find": "methods",
"where": [
{
"constraint": "signature_match",
"parent": "Lcom/example/Class;",
"name": "concat"
}
],
"model": {
"attach_to_propagations": [
{
"features": [
"via-concat"
],
"port": "Return" // We could also use Argument(0) and Argument(1)
}
]
}
}

Note that this is only useful for propagations inferred by the analysis. If you know that concat has a propagation, you should model it as a propagation directly.

Add Features to Arguments

Add features to arguments is used to add a set of features on all sources that might flow on a given parameter of a method.

Add features to arguments implies Attach to sources, Attach to sinks and Attach to propagations, but it also accounts for possible side effects at call sites.

For instance:

public static void log(String message) {
System.out.println(message);
}
public void buyView() {
String username = getParameter("username");
String product = getParameter("product");
log(username);
buy(username, product);
}

Technically, the log method doesn't have any source, sink or propagation. We can use add features to arguments to add a feature was-logged on the flow from getParameter("username") to buy(username, product). We could use the following JSON model generator for the log method:

{
"find": "methods",
"where": [
{
"constraint": "signature_match",
"parent": "Lcom/example/Class;",
"name": "log"
}
],
"model": {
"add_features_to_arguments": [
{
"features": [
"was-logged"
],
"port": "Argument(0)"
}
]
}
}

Via-type Features

Via-type features are used to keep track of the type of a callable’s port seen at its callsites during taint flow analysis. They are specified in model generators within the “sources” or “sinks” field of a model with the “via_type_of” field. It is mapped to a nonempty list of ports of the method for which we want to create via-type features.

For example, if we were interested in the specific Activity subclasses with which the method below was called:


public void startActivityForResult(Intent intent, int requestCode);

// At some callsite:
ActivitySubclass activitySubclassInstance;
activitySubclassInstance.startActivityForResult(intent, requestCode);

we could use the following JSON to specify a via-type feature that would materialize as via-type:ActivitySubclass:

{
"find": "methods",
"where": [
{
"constraint": "signature_match",
"parent": "Lcom/example/Class;",
"name": "startActivityForResult"
}
],
"model": {
"sinks": [
{
"port": "Argument(1)",
"kind": "SinkKind",
"via_type_of": [
"Argument(0)"
]
}
]
}
}

Via-value Features

Via-value features capture the value of the specified callable's port seen at its callsites during taint flow analysis. They are specified similarly to via-type features: in model generators, within the "sources" or "sinks" field of a model, with the "via_value_of" field. It is mapped to a nonempty list of ports of the method for which we want to create via-value features.

For example, if we were interested in the specific mode with which the method below was called:

public void log(String mode, String message);

class Constants {
public static final String MODE = "M1";
}

// At some callsite:
log(Constants.MODE, "error message");

we could use the following JSON to specify a via-value feature that would materialize as via-value:M1:

{
"find": "methods",
"where": [
{
"constraint": "signature_match",
"parent": "Lcom/example/Class;",
"name": "log"
}
],
"model": {
"sinks": [
{
"port": "Argument(1)",
"kind": "SinkKind",
"via_value_of": [
"Argument(0)"
]
}
]
}
}

Note that this only works for numeric and string literals. In cases where the argument is not a constant, the feature will appear as via-value:unknown.
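The naming rule for via-value features can be written down as a tiny function. This is a sketch of the rule as stated above, with `None` standing in for "not a constant at the call site"; the helper name is hypothetical and the exact formatting of numeric values is an assumption.

```python
def via_value_feature(value):
    """Feature name for a call-site argument; None means 'not a constant'."""
    # Only string and numeric literals are resolved (bool is excluded on purpose).
    if isinstance(value, (str, int, float)) and not isinstance(value, bool):
        return f"via-value:{value}"
    return "via-value:unknown"


print(via_value_feature("M1"))   # Constants.MODE from the example
print(via_value_feature(None))   # argument that is not a constant
```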

Taint Broadening

Taint broadening (also called collapsing) happens when Mariana Trench needs to make an approximation about a taint flow. It is the operation of reducing a taint tree into a single element. A taint tree is a tree where edges are field names and nodes are taint elements. This is how Mariana Trench internally represents which fields (or sequences of fields) are tainted.

For instance, analyzing the following code:

MyClass var = new MyClass();
var.a = sourceX();
var.b.c = sourceY();
var.b.d = sourceZ();

The taint tree of variable var would be:

       .
    a / \ b
  { X }  .
      c / \ d
    { Y } { Z }

After collapsing, the tree is reduced to a single node { X, Y, Z }, which is less precise.
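Collapsing can be sketched as a recursive union over the tree. The nested-dict representation and the `collapse` helper below are assumptions made for illustration; they are not Mariana Trench's internal data structures.

```python
def collapse(tree):
    """Reduce a taint tree to the single set of kinds found anywhere in it.

    tree: {"taint": set of kinds, "children": {field name: subtree}};
    both keys are optional.
    """
    kinds = set(tree.get("taint", ()))
    for child in tree.get("children", {}).values():
        kinds |= collapse(child)
    return kinds


# The taint tree of `var` from the example above.
var = {
    "children": {
        "a": {"taint": {"X"}},
        "b": {"children": {"c": {"taint": {"Y"}}, "d": {"taint": {"Z"}}}},
    }
}
print(sorted(collapse(var)))
```

The result no longer records that X came from var.a or that Y came from var.b.c, which is the loss of precision the text refers to.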

In conclusion, taint broadening effectively leads to considering the whole object as tainted while only some specific fields were initially tainted. This might happen for the correctness of the analysis or for performance reasons.

In the following sections, we will discuss when collapsing can happen. In most cases, a feature is automatically added on collapsed taint to help detect false positives.

Propagation Broadening

Taint collapsing is applied when taint is propagated through a method.

For instance:

MyClass input = new MyClass();
input.a = SourceX();
MyClass output = SomeClass.UnknownMethod(input);
Sink(output.b); // Considered an issue since `output` is considered tainted. This could be a False Negative without collapsing.

In that case, the feature via-propagation-broadening will be automatically added on the taint. This can help identify false positives.

If you know that this method preserves the structure of the parameter, you could specify a model and disable collapsing using the collapse attribute within a propagation:

{
"find": "methods",
"where": [
{
"constraint": "signature_match",
"parent": "Lcom/example/SomeClass;",
"name": "UnknownMethod"
}
],
"model": {
"propagation": [
{
"input": "Argument(0)",
"output": "Return",
"collapse": false
}
]
}
}

Note that Mariana Trench can usually infer when a method propagates taint without collapsing it when it has access to the code of that method and subsequent calls. For instance:

public String identity(String x) {
// Automatically infers a propagation `Arg(0) -> Return` with `collapse=false`
return x;
}

Issue Broadening Feature

The via-issue-broadening feature is added to issues where the taint flowing into the sink was not held directly on the object passed in but on one of its fields. For example:

Class input = new Class();
input.field = source();
sink(input); // `input` is not tainted, but `input.field` is tainted and creates an issue

Widen Broadening Feature

For performance reasons, if a given taint tree becomes very large (either in depth or in number of nodes at a given level), Mariana Trench collapses the tree to a smaller size. In these cases, the via-widen-broadening feature is added to the collapsed taint. For example:

Class input = new Class();
if (/* condition */) {
input.field1 = source();
input.field2 = source();
...
} else {
input.fieldA = source();
input.fieldB = source();
...
}
sink(input); // Too many fields are sources so the whole input object becomes tainted
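As a rough illustration of widening, the sketch below folds everything deeper than a chosen depth into the node at the cut-off. The class name `WidenSketch`, the `maxDepth` parameter and the depth-only strategy are assumptions made for this example; the actual thresholds and the handling of wide levels in Mariana Trench are not described in this section.

```java
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

// Hypothetical taint tree with a depth-based widening step.
class WidenSketch {
    final Set<String> kinds = new TreeSet<>();
    final Map<String, WidenSketch> children = new TreeMap<>();

    // Returns the child for `field`, creating it if needed.
    WidenSketch child(String field) {
        return children.computeIfAbsent(field, f -> new WidenSketch());
    }

    // Gathers the taint of this node and all of its descendants.
    Set<String> allKinds() {
        Set<String> all = new TreeSet<>(kinds);
        for (WidenSketch c : children.values()) {
            all.addAll(c.allKinds());
        }
        return all;
    }

    // Keeps at most `maxDepth` levels below this node; deeper taint is
    // collapsed into the node sitting at the cut-off depth.
    void widen(int maxDepth) {
        if (maxDepth == 0) {
            kinds.addAll(allKinds());
            children.clear();
            return;
        }
        for (WidenSketch c : children.values()) {
            c.widen(maxDepth - 1);
        }
    }
}
```

With taint stored at `a.b.c`, calling `widen(1)` on the root leaves a single child `a` that now holds the taint directly, so later reads of any field under `a` see it.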

Sanitizers

Specifying sanitizers on a model allows us to stop taint from flowing through that method. In Mariana Trench, they can be one of three types:

  • sources: prevent any taint sources from flowing out of the method
  • sinks: prevent taint from reaching any sinks within the method
  • propagations: prevent propagations from being inferred between any two ports of the method.

These can be specified in model generators as follows:

{
"find": "methods",
"where": ...,
"model": {
"sanitizers": [
{
"sanitize": "sources"
},
{
"sanitize": "sinks"
},
{
"sanitize": "propagations"
}
],
...
}
}

Note that if there are any user-specified sources, sinks or propagations on the model, sanitizers will not affect them, but they will prevent them from being propagated outward to callsites.

Kind-specific Sanitizers

sources and sinks sanitizers may include a list of kinds (each with or without a partial_label) to restrict the sanitizer to only sanitizing taint of those kinds. When unspecified, as in the example above, all taint is sanitized regardless of kind.

"sanitizers": [
{
"sanitize": "sinks",
"kinds": [
{
"kind": "SinkKindA"
},
{
"kind": "SinkKindB",
"partial_label": "A"
}
]
}
]

Port-specific Sanitizers

Sanitizers can also specify a single port (access path root) that they sanitize, ignoring all others. The port field has a slightly different meaning for each kind of sanitizer:

  • sources: represents the output port through which sources may not leave the method
  • sinks: represents the input port through which taint may not trigger any sinks within the model
  • propagations: represents the input port through which a propagation to any other port may not be inferred

For example, if the following method

public void someMethod(Object argument1, Object argument2) {
toSink(argument1);
toSink(argument2);
}

had the following sanitizer in its model,

"sanitizers": [
{
"sanitize": "sinks",
"port": "Argument(1)"
}
]

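then a source flowing into argument2 would still be able to cause an issue, but not a source flowing into argument1 (someMethod is an instance method, so Argument(0) is the implicit this parameter and Argument(1) is argument1).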

Kind and port specifications may be included in the same sanitizer.
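To make the combination of kind and port restrictions concrete, here is a minimal Java sketch of how a sinks sanitizer could decide whether it applies. The class `SinkSanitizerSketch` and its `blocks` method are hypothetical, `partial_label` is ignored, and this is only a reading of the rules above, not Mariana Trench's implementation.

```java
import java.util.Set;

// Hypothetical model of one "sanitize": "sinks" entry.
class SinkSanitizerSketch {
    private final String port;       // null means "any port"
    private final Set<String> kinds; // empty means "any kind"

    SinkSanitizerSketch(String port, Set<String> kinds) {
        this.port = port;
        this.kinds = kinds;
    }

    // True when taint entering through `taintPort` must not trigger a sink of `sinkKind`.
    boolean blocks(String taintPort, String sinkKind) {
        boolean portMatches = port == null || port.equals(taintPort);
        boolean kindMatches = kinds.isEmpty() || kinds.contains(sinkKind);
        return portMatches && kindMatches;
    }
}
```

Under this reading, a sanitizer with both a port and a list of kinds only stops taint that matches the port and one of the kinds; everything else still flows.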

Modes

Modes are used to describe specific behaviors of methods. Available modes are:

  • skip-analysis: skip the analysis of the method;
  • add-via-obscure-feature: add a feature/breadcrumb called via-obscure:<method> to sources flowing through this method;
  • taint-in-taint-out: propagate the taint on arguments to the return value;
  • taint-in-taint-this: propagate the taint on arguments into the this parameter;
  • no-join-virtual-overrides: do not consider all possible overrides when handling a virtual call to this method;
  • no-collapse-on-propagation: do not collapse input paths when applying propagations;
  • alias-memory-location-on-invoke: aliases existing memory location at the callsite instead of creating a new one;
  • strong-write-on-propagation: performs a strong write from input path to the output path on propagation;

Default model

A default model is created for each method, except if it is provided by a model generator. The default model has a set of heuristics:

If the method has no source code, the model is automatically marked with the modes skip-analysis and add-via-obscure-feature.

If the method has more than 40 overrides, it is marked with the mode no-join-virtual-overrides.

Otherwise, the default model is empty (no sources/sinks/propagations).
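The heuristics above can be summarized as a small decision function. The sketch below is illustrative only: `DefaultModelSketch`, the `Mode` enum and `defaultModes` are made-up names, and it assumes the two heuristics are checked independently of each other, which the text does not state explicitly.

```java
import java.util.EnumSet;
import java.util.Set;

// Hypothetical summary of the default-model heuristics.
class DefaultModelSketch {
    enum Mode { SKIP_ANALYSIS, ADD_VIA_OBSCURE_FEATURE, NO_JOIN_VIRTUAL_OVERRIDES }

    // Threshold taken from the text above.
    static final int MAX_OVERRIDES = 40;

    static Set<Mode> defaultModes(boolean hasCode, int overrideCount) {
        Set<Mode> modes = EnumSet.noneOf(Mode.class);
        if (!hasCode) {
            // No source code: the method is treated as obscure.
            modes.add(Mode.SKIP_ANALYSIS);
            modes.add(Mode.ADD_VIA_OBSCURE_FEATURE);
        }
        if (overrideCount > MAX_OVERRIDES) {
            // Too many overrides to join at virtual call sites.
            modes.add(Mode.NO_JOIN_VIRTUAL_OVERRIDES);
        }
        return modes;
    }
}
```

A method with code and at most 40 overrides gets an empty set of modes, matching the "otherwise empty" case.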

Field Models

These models represent user-defined taint on class fields (as opposed to methods, as described in all the previous sections on this page). They are specified in a similar way to method models as described below.

NOTE: Field sources should not be applied to fields that are both final and of a primitive type (int, char, float, etc., as well as java.lang.String), as the Java compiler optimizes accesses of these fields in the bytecode into accesses of the constant value they hold. In this scenario, Mariana Trench has no way of recognizing that the constant was meant to carry a source.

Example field model generator for sources:

{
"find": "fields",
"where": [
{
"constraint": "name",
"pattern": "SOURCE_EXAMPLE"
}
],
"model": {
"sources" : [
{
"kind": "FieldSource"
}
]
}
}

Example code:

public class TestClass {
// Field that we know to be tainted
public Object SOURCE_EXAMPLE = ...;

void flow() {
sink(SOURCE_EXAMPLE, ...);
}
}

Example field model generator for sinks:

{
"find": "fields",
"where": [
{
"constraint": "name",
"pattern": "SINK_EXAMPLE"
}
],
"model": {
"sinks" : [
{
"kind": "FieldSink"
}
]
}
}

Example code:

public class TestClass {
public Object SINK_EXAMPLE = ...;

void flow() {
SINK_EXAMPLE = source();
}
}

Field signature formats follow the Dalvik bytecode format similar to methods as discussed above. This is of the form <className>.<fieldName>:<fieldType>.

Literal Models

Literal models represent user-defined taints on string literals matching configurable regular expressions. They can only be configured as sources and are intended to identify suspicious patterns, such as user-controlled data being concatenated with a string literal which looks like an SQL query.

NOTE: Each use of a literal in the analysed code which matches a pattern in a literal model will generate a new taint which needs to be explored by Mariana Trench. Using overly broad patterns like .* should thus be avoided, as they can lead to poor performance and high memory usage.

Example literal models:

[
{
"pattern": "SELECT \\*.*",
"description": "Potential SQL Query",
"sources": [
{
"kind": "SqlQuery"
}
]
},
{
"pattern": "AI[0-9A-Z]{16}",
"description": "Suspected Google API Key",
"sources": [
{
"kind": "GoogleAPIKey"
}
]
}
]

Example code:

void testRegexSource() {
String prefix = "SELECT * FROM USERS WHERE id = ";
String aci = getAttackerControlledInput();
String query = prefix + aci; // Sink
}

void testRegexSourceGoogleApiKey() {
String secret = "AIABCD1234EFGH5678";
sink(secret);
}
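The matching step for literal models can be sketched as follows. This example assumes that a pattern must match the whole literal (which is consistent with the trailing `.*` in the SQL pattern above, but is an assumption); `LiteralModelSketch` and its methods are hypothetical names, not part of Mariana Trench.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Optional;
import java.util.regex.Pattern;

// Hypothetical lookup from a string literal to the source kind of the first matching model.
class LiteralModelSketch {
    // regex -> source kind, in declaration order
    private final Map<String, String> models = new LinkedHashMap<>();

    void add(String regex, String kind) {
        models.put(regex, kind);
    }

    Optional<String> sourceKindFor(String literal) {
        for (Map.Entry<String, String> model : models.entrySet()) {
            // Assumption: the regex must match the entire literal.
            if (Pattern.matches(model.getKey(), literal)) {
                return Optional.of(model.getValue());
            }
        }
        return Optional.empty();
    }
}
```

Because every matching literal creates new taint, a pattern such as `.*` would turn every string constant in the app into a source, which is why broad patterns are discouraged.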

Model Generators

Mariana Trench allows for dynamic model specifications. This allows a user to specify models of methods before running the analysis. This is used to specify sources, sinks, propagation and modes.

Model generators are specified in a generator configuration file, specified by the --generator-configuration-path parameter. By default, we use default_generator_config.json.

Example

Examples of model generators are located in the configuration/model-generators directory.

Below is an example of a JSON model generator:

{
"model_generators": [
{
"find": "methods",
"where": [{"constraint": "name", "pattern": "toString"}],
"model": {
"propagation": [
{
"input": "Argument(0)",
"output": "Return"
}
]
}
},
{
"find": "methods",
"where": [
{
"constraint": "parent",
"inner": {
"constraint": "extends",
"inner": {
"constraint": "name",
"pattern": "SandcastleCommand"
}
}
},
{"constraint": "name", "pattern": "Time"}
],
"model": {
"sources": [
{
"kind": "Source",
"port": "Return"
}
]
}
},
{
"find": "methods",
"where": [
{
"constraint": "parent",
"inner": {
"constraint": "extends",
"inner": {"constraint": "name", "pattern": "IEntWithPurposePolicy"}
}
},
{"constraint": "name", "pattern": "gen.*"},
{
"constraint": "parameter",
"idx": 0,
"inner": {
"constraint": "type",
"kind": "extends",
"class": "IViewerContext"
}
},
{
"constraint": "return",
"inner": {
"constraint": "extends",
"inner": {"constraint": "name", "pattern": "Ent"}
}
}
],
"model": {
"modes": ["add-via-obscure-feature"],
"sinks": [
{
"kind": "Sink",
"port": "Argument(0)",
"features": ["via-gen"]
}
]
}
}
]
}

Specification

Each JSON file is a JSON object with a key model_generators associated with a list of "rules".

Each "rule" defines a "filter" (which uses "constraints" to specify methods for which a "model" should be generated) and a "model". A rule has the following key/values:

  • find: The type of thing to find. We support methods and fields;

  • where: A list of "constraints". All constraints must be satisfied by a method or field in order to generate a model for it. All the constraints are listed below, grouped by the type of object they are applied to:

    • Method:

      • signature_match: Expects at least one of the two allowed groups of extra properties: [name | names] [parent | parents | extends [include_self]] where:
        • name (a single string) or names (a list of alternative strings): is exact matched to the method name
        • parent (a single string) or parents (a list of alternative strings) is exact matched to the class of the method or extends (either a single string or a list of alternative strings) is exact matched to the base classes or interfaces of the method. extends allows an optional property include_self which is a boolean to indicate if the constraint is applied to the class itself or not (defaults to true).
      • signature | signature_pattern: Expects an extra property pattern which is a regex to fully match the full signature (class, method, argument types) of a method;
        • NOTE: Usage of this constraint is discouraged as it has poor performance. Try using signature_match instead!
      • parent: Expects an extra property inner [Type] which contains a nested constraint to apply to the class holding the method;
      • parameter: Expects extra properties idx and inner [Parameter] or [Type]; matches when the idx-th parameter of the function or method matches the nested constraint inner;
      • any_parameter: Expects an optional extra property start_idx and a property inner [Parameter] or [Type]; matches when any parameter (starting at start_idx) of the function or method matches the nested constraint inner;
      • return: Expects an extra property inner [Type] which contains a nested constraint to apply to the return of the method;
      • is_static | is_constructor | is_native | has_code: Accepts an extra property value which is either true or false. By default, value is considered true;
      • number_parameters: Expects an extra property inner [Integer] which contains a nested constraint to apply to the number of parameters (counting the implicit this parameter);
      • number_overrides: Expects an extra property inner [Integer] which contains a nested constraint to apply on the number of method overrides.
    • Parameter:

      • parameter_has_annotation: Expects an extra property type and an optional property pattern, respectively a string and a regex fully matching the value of the parameter annotation.
    • Type:

      • extends: Expects an extra property inner [Type] which contains a nested constraint that must apply to one of the base classes or itself. The optional property includes_self is a boolean that tells whether the constraint must be applied on the type itself or not;
      • super: Expects an extra property inner [Type] which contains a nested constraint that must apply on the direct superclass;
      • is_class | is_interface: Accepts an extra property value which is either true or false. By default, value is considered true;
    • Field:

      • signature: Expects an extra property pattern which is a regex to fully match the full signature of the field. This is of the form <className>.<fieldName>:<fieldType>;
      • parent: Expects an extra property inner [Type] which contains a nested constraint to apply to the class holding the field;
      • is_static: Accepts an extra property value which is either true or false. By default, value is considered true;
    • Method, Type or Field:

      • name: Expects an extra property pattern which is a regex to fully match the name of the item;
      • has_annotation: Expects an extra property type and an optional property pattern, respectively a string and a regex fully matching the value of the annotation.
      • visibility: Expects an extra property is which is either public, private or protected; (Note this does not apply to Field)
    • Integer:

      • < | <= | == | > | >= | !=: Expects an extra property value which contains an integer that the input integer is compared with. The input is the left hand side.
    • Any (Method, Parameter, Type, Field or Integer):

      • all_of: Expects an extra property inners [Any] which is an array holding nested constraints which must all apply;
      • any_of: Expects an extra property inners [Any] which is an array holding nested constraints where one of them must apply;
      • not: Expects an extra property inner [Any] which contains a nested constraint that should not apply. (Note this is not yet implemented for Fields)
  • model: A model, describing sources/sinks/propagations/etc.

    • For method models

      • sources*: A list of sources, i.e a source flowing out of the method via return value or flowing in via an argument. A source has the following key/values:
        • kind: The source name;
        • port**: The source access path (e.g, "Return" or "Argument(1)");
        • features*: A list of features/breadcrumbs names;
        • via_type_of*: A list of ports;
      • sinks*: A list of sinks, i.e describing that a parameter of the method flows into a sink. A sink has the following key/values:
        • kind: The sink name;
        • port: The sink access path (e.g, "Return" or "Argument(1)");
        • features*: A list of features/breadcrumbs names;
        • via_type_of*: A list of ports;
      • propagation*: A list of propagations (also called passthrough) that describe whether a taint on a parameter should result in a taint on the return value or another parameter. A propagation has the following key/values:
        • input: The input access path (e.g, "Argument(1)");
        • output: The output access path (e.g, "Return" or "Argument(2)");
        • features*: A list of features/breadcrumbs names;
      • attach_to_sources*: A list of attach-to-sources that describe that all sources flowing out of the method on the given parameter or return value must have the given features. An attach-to-source has the following key/values:
        • port: The access path root (e.g, "Return" or "Argument(1)");
        • features: A list of features/breadcrumb names;
      • attach_to_sinks*: A list of attach-to-sinks that describe that all sources flowing in the method on the given parameter must have the given features. An attach-to-sink has the following key/values:
        • port: The access path root (e.g, "Argument(1)");
        • features: A list of features/breadcrumb names;
      • attach_to_propagations*: A list of attach-to-propagations that describe that inferred propagations of sources flowing in or out of a given parameter or return value must have the given features. An attach-to-propagation has the following key/values:
        • port: The access path root (e.g, "Return" or "Argument(1)");
        • features: A list of features/breadcrumb names;
      • add_features_to_arguments*: A list of add-features-to-arguments that describe that taint that might flow into the given argument must have the given features. An add-features-to-argument has the following key/values:
        • port: The access path root (e.g, "Argument(1)");
        • features: A list of features/breadcrumb names;
      • modes*: A list of mode names that describe specific behaviors of a method;
      • for_all_parameters: Generate sources/sinks/propagations/attach_to_* for all parameters of a method that satisfy some constraints. It accepts the following key/values:
        • variable: A symbolic name for the parameter;
        • where: An optional list of [Parameter] or [Type] constraints on the parameter;
        • sources | sinks | propagation: Same as under "model", but we accept the variable name as a parameter number.
    • verbosity*: A logging level, to help debugging. 1 is the most verbose, 5 is the least. The default verbosity level is 5.

    • For Field models

      • sources*: A list of sources the field should hold. A source has the following key/values:
        • kind: The source name;
        • features*: A list of features/breadcrumbs names;
      • sinks*: A list of sinks the field should hold. A sink has the following key/values:
        • kind: The sink name;
        • features*: A list of features/breadcrumbs names;

In the above bullets,

  • * denotes optional key/value.
  • ** denotes optional key/value. Default is "Return".

Note, the implicit this parameter for methods has the parameter number 0.
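The way constraints combine can be pictured with ordinary predicates: the `where` list behaves like all_of, while any_of and not nest other constraints. The Java sketch below is purely illustrative; `ConstraintSketch` and its helpers are hypothetical, and only the `name` constraint (a regex that must fully match) is modeled.

```java
import java.util.List;
import java.util.function.Predicate;

// Hypothetical combinators mirroring all_of, any_of, not and name.
class ConstraintSketch {
    // all_of: every nested constraint must apply.
    static <T> Predicate<T> allOf(List<Predicate<T>> inners) {
        return item -> inners.stream().allMatch(inner -> inner.test(item));
    }

    // any_of: at least one nested constraint must apply.
    static <T> Predicate<T> anyOf(List<Predicate<T>> inners) {
        return item -> inners.stream().anyMatch(inner -> inner.test(item));
    }

    // not: the nested constraint must not apply.
    static <T> Predicate<T> not(Predicate<T> inner) {
        return inner.negate();
    }

    // name: the regex must match the whole name.
    static Predicate<String> name(String pattern) {
        return candidate -> candidate.matches(pattern);
    }
}
```

For example, combining `name("get.*")` with `not(name("getClass"))` under `allOf` accepts `getPath` but rejects both `getClass` and `setPath`.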

Development

When Sources or Sinks don't appear in Results

  1. This could be because your model generator did not find any method matching your query. You can use the "verbosity": 1 option in your model generator to check if it matched any method. For instance:

    {
    "model_generators": [
    {
    "find": "methods",
    "where": /* ... */,
    "model": {
    /* ... */
    },
    "verbosity": 1
    }
    ]
    }

    When running Mariana Trench, this should print:

    INFO Method `...` satisfies all constraints in json model generator ...
  2. Make sure that your model generator is actually running. You can use the --verbosity 2 option to check that. Make sure your model generator is specified in configuration/default_generator_config.json.

  3. You can also check the output models. Use grep SourceKind models@* to see if your source or sink kind exists. Use grep 'Lcom/example/<class-name>;.<method-name>:' models@* to see if a given method exists in the app.


Models & Model Generators

The main way to configure the analysis is through defining model generators. Each model generator defines (1) a filter, made up of constraints to specify the methods (or fields) for which a model should be generated, and (2) a model, an abstract representation of how data flows through a method.

Model generators are what define Sink and Source kinds which are the key component of Rules. Model generators can do other things too, like attach features (a.k.a. breadcrumbs) to flows and sanitize (redact) flows which go through certain "data-safe" methods (e.g. a method which hashes a user's password).

Filters are conceptually straightforward. Thus, this page focuses heavily on conceptualizing and providing examples for the various types of models. See the Model Generators section for full implementation documentation for both filters and models.

Models

A model is an abstract representation of how data flows through a method.

A model essentially consists of:

  • Sources: a set of sources that the method produces or receives on parameters;
  • Sinks: a set of sinks on the method;
  • Propagation: a description of how the method propagates taint coming into it (e.g, the first parameter updates the second, the second parameter updates the return value, etc.);
  • Attach to Sources: a set of features/breadcrumbs to add on any sources flowing out of the method;
  • Attach to Sinks: a set of features/breadcrumbs to add on sinks of a given parameter;
  • Attach to Propagations: a set of features/breadcrumbs to add on propagations for a given parameter or return value;
  • Add Features to Arguments: a set of features/breadcrumbs to add on any taint that might flow in a given parameter;
  • Sanitizers: specifications of taint flows to stop;
  • Modes: a set of flags describing specific behaviors (see below).

Models can be specified in JSON. For example, to mark the string parameter of our Logger.log function as a sink, we can specify it as:

{
"find": "methods",
"where": [
{
"constraint": "signature_match",
"parent": "Lcom/example/Logger;",
"name": "log"
}
],
"model": {
"sinks": [
{
"kind": "Logging",
"port": "Argument(1)"
}
]
}
}

Note that the naming of methods follows the Dalvik bytecode format.

Method name format

The format used for method names is:

<className>.<methodName>:(<parameterType1><parameterType2>)<returnType>

Example: Landroidx/fragment/app/Fragment;.startActivity:(Landroid/content/Intent;)V

For the parameter and return types, use the following table to pick the correct one (please refer to the JVM documentation for more details):

  • V - void
  • Z - boolean
  • B - byte
  • S - short
  • C - char
  • I - int
  • J - long (64 bits)
  • F - float
  • D - double (64 bits)

Classes take the form Lpackage/name/ClassName; where the leading L indicates that it is a class type and package/name/ is the package that the class is in. A nested class takes the form Lpackage/name/ClassName$NestedClassName; (the $ needs to be double escaped as \\$ in JSON regexes).

NOTE: Instance (i.e, non-static) method parameters are indexed starting from 1! The 0th parameter is the this parameter in dalvik byte-code. For static method parameter, indices start from 0.
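The naming and indexing rules above are easy to get wrong by hand, so here is a small Java sketch that builds them mechanically. `DalvikNamingSketch` and its methods are hypothetical helpers written for this page, not part of Mariana Trench; parameter and return types are expected to be already-encoded descriptors such as `I` or `Ljava/lang/String;`.

```java
import java.util.List;

// Hypothetical helpers for the method name format and argument indices.
class DalvikNamingSketch {
    // "com.example.Logger" -> "Lcom/example/Logger;"
    // Nested classes use their binary name, e.g. "com.example.Outer$Inner".
    static String classDescriptor(String javaClassName) {
        return "L" + javaClassName.replace('.', '/') + ";";
    }

    // Builds <className>.<methodName>:(<parameterTypes>)<returnType>
    static String methodSignature(
            String classDescriptor, String methodName, List<String> parameterTypes, String returnType) {
        return classDescriptor + "." + methodName + ":(" + String.join("", parameterTypes) + ")" + returnType;
    }

    // Maps the 0-based position in the Java parameter list to a port.
    // Instance methods reserve Argument(0) for the implicit `this`.
    static String argumentPort(boolean isStatic, int sourceIndex) {
        int offset = isStatic ? 0 : 1;
        return "Argument(" + (sourceIndex + offset) + ")";
    }
}
```

The first declared parameter of an instance method therefore maps to `Argument(1)`, while the first parameter of a static method maps to `Argument(0)`.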

Access path format

An access path describes the symbolic location of a taint. This is commonly used to indicate where a source or a sink originates from. The "port" field of any model is represented by an access path.

An access path is composed of a root and a path.

The root is either:

  • Return, representing the returned value;
  • Argument(x) (where x is an integer), representing the parameter number x;

The path is a (possibly empty) list of path elements. A path element can be any of the following kinds:

  • field: represents a field name. String encoding is a dot followed by the field name: .field_name;
  • index: represents a user defined index for dictionary like objects. String encoding uses square braces to enclose any user defined index: [index_name];
  • any index: represents any or unresolved indices in dictionary like objects. String encoding is an asterisk enclosed in square braces: [*];
  • index from value of: captures the value of the specified callable's port seen at its callsites during taint flow analysis as an index or any index (if the value cannot be resolved). String encoding uses argument root to specify the callable's port and encloses it in [<...>] to represent that its value is resolved at the callsite to create an index: [<Argument(x)>];

Examples:

  • Argument(1).name corresponds to the field name of the second parameter;
  • Argument(1)[name] corresponds to the index name of the dictionary like second parameter;
  • Argument(1)[*] corresponds to any index of the dictionary like second parameter;
  • Argument(1)[<Argument(2)>] corresponds to an index of the dictionary like second parameter whose value is resolved from the third parameter;
  • Return corresponds to the returned value;
  • Return.x corresponds to the field x of the returned value;
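To show how the string encoding decomposes, the following Java sketch splits an access path into its root and its path elements. It is a simplified reading of the format described above: `AccessPathSketch` is a hypothetical class, and the regular expressions are assumptions that cover only the examples in this section.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Illustrative parser only; not Mariana Trench's own access path parser.
class AccessPathSketch {
    private static final Pattern ROOT = Pattern.compile("Return|Argument\\(\\d+\\)");
    // A field (".name") or a bracketed index ("[name]", "[*]", "[<Argument(2)>]").
    private static final Pattern ELEMENT = Pattern.compile("\\.[A-Za-z_$][\\w$]*|\\[[^\\[\\]]+\\]");

    final String root;
    final List<String> elements;

    private AccessPathSketch(String root, List<String> elements) {
        this.root = root;
        this.elements = elements;
    }

    static AccessPathSketch parse(String text) {
        Matcher rootMatcher = ROOT.matcher(text);
        if (!rootMatcher.lookingAt()) {
            throw new IllegalArgumentException("Unknown root in: " + text);
        }
        List<String> elements = new ArrayList<>();
        Matcher elementMatcher = ELEMENT.matcher(text);
        int position = rootMatcher.end();
        // Consume path elements one after another until the string is exhausted.
        while (position < text.length()) {
            elementMatcher.region(position, text.length());
            if (!elementMatcher.lookingAt()) {
                throw new IllegalArgumentException("Bad path element at offset " + position + " in: " + text);
            }
            elements.add(elementMatcher.group());
            position = elementMatcher.end();
        }
        return new AccessPathSketch(rootMatcher.group(), elements);
    }
}
```

Parsing `Argument(1)[<Argument(2)>]` gives the root `Argument(1)` and a single element `[<Argument(2)>]`, while `Return` gives an empty path.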

Kinds

A source has a kind that describes its content (e.g, user input, file system, etc). A sink also has a kind that describes the operation the method performs (e.g, execute a command, read a file, etc.). Kinds can be arbitrary strings (e.g, UserInput). We usually avoid whitespaces.

Sources

Sources describe sources produced or received by a given method. A source can either flow out via the return value or flow via a given parameter. A source has a kind that describes its content (e.g, user input, file system, etc).

Here is an example where the source flows by return value:

public static String getPath() {
return System.getenv().get("PATH");
}

The JSON model generator for this method could be:

{
"find": "methods",
"where": [
{
"constraint": "signature_match",
"parent": "Lcom/example/Class;",
"name": "getPath"
}
],
"model": {
"sources": [
{
"kind": "UserControlled",
"port": "Return"
}
]
}
}

Here is an example where the source flows in via an argument:

class MyActivity extends Activity {
public void onNewIntent(Intent intent) {
// intent should be considered a source here.
}
}

The JSON model generator for this method could be:

{
"find": "methods",
"where": [
{
"constraint": "signature_match",
"extends": "Landroid/app/Activity;",
"name": "onNewIntent"
}
],
"model": {
"sources": [
{
"kind": "UserControlled",
"port": "Argument(1)"
}
]
}
}

Note that the implicit this parameter is considered the argument 0.

Sinks

Sinks describe dangerous or sensitive methods in the code. A sink has a kind that represents the type of operation the method does (e.g, command execution, file system operation, etc). A sink must be attached to a given parameter of the method. A method can have multiple sinks.

Here is an example of a sink:

public static String readFile(String path, String extension, int mode) {
// Return the content of the file path.extension
}

Since path and extension can be used to read arbitrary files, we consider them sinks. We do not consider mode as a sink since we do not care whether the user can control it. The JSON model generator for this method could be:

{
"find": "methods",
"where": [
{
"constraint": "signature_match",
"parent": "Lcom/example/Class;",
"name": "readFile"
}
],
"model": {
"sinks": [
{
"kind": "FileRead",
"port": "Argument(0)"
},
{
"kind": "FileRead",
"port": "Argument(1)"
}
]
}
}

Return Sinks

Return sinks can be used to describe that a method should not return tainted information. A return sink is just a normal sink with a Return port.

Propagation

Propagations, also called tito (Taint In Taint Out) or passthrough in other tools, describe how the method propagates taint. A propagation has an input (where the taint comes from) and an output (where the taint is moved to).

Here is an example of a propagation:

public static String concat(String x, String y) {
return x + y;
}

The return value of the method can be controlled by both parameters, hence it has the propagations Argument(0) -> Return and Argument(1) -> Return. The JSON model generator for this method could be:

{
"find": "methods",
"where": [
{
"constraint": "signature_match",
"parent": "Lcom/example/Class;",
"name": "concat"
}
],
"model": {
"propagation": [
{
"input": "Argument(0)",
"output": "Return"
},
{
"input": "Argument(1)",
"output": "Return"
}
]
}
}

Features

Features (also called breadcrumbs) can be used to tag a flow and help filtering issues. A feature describes a property of a flow. A feature can be any arbitrary string.

For instance, the feature via-numerical-operator is used to describe that the data flows through a numerical operator such as an addition.

Features are very useful to filter flows in the SAPP UI. E.g. flows with a cast from string to integer can sometimes be less important during triaging, since controlling an integer is more difficult to exploit than controlling a full string.

Note that features do not stop the flow, they just help triaging.

Attach to Sources

Attach to sources is used to add a set of features on any sources flowing out of a method through a given parameter or return value.

For instance, if we want to add the feature via-signed to all sources flowing out of the given method:

public String getSignedCookie();

We could use the following JSON model generator:

{
"find": "methods",
"where": [
{
"constraint": "signature_match",
"parent": "Lcom/example/Class;",
"name": "getSignedCookie"
}
],
"model": {
"attach_to_sources": [
{
"features": [
"via-signed"
],
"port": "Return"
}
]
}
}

Note that this is only useful for sources inferred by the analysis. If you know that getSignedCookie returns a source of a given kind, you should use a source instead.

Attach to Sinks

Attach to sinks is used to add a set of features on all sinks on the given parameter of a method.

For instance, if we want to add the feature via-user on all sinks of the given method:

class User {
public static User findUser(String username) {
// The code here might use SQL, Thrift, or anything. We don't need to know.
}
}

We could use the following JSON model generator:

{
"find": "methods",
"where": [
{
"constraint": "signature_match",
"parent": "Lcom/example/User;",
"name": "findUser"
}
],
"model": {
"attach_to_sinks": [
{
"features": [
"via-user"
],
"port": "Argument(0)"
}
]
}
}

Note that this is only useful for sinks inferred by the analysis. If you know that findUser is a sink of a given kind, you should use a sink instead.

Attach to Propagations

Attach to propagations is used to add a set of features on all propagations from or to a given parameter or return value of a method.

For instance, if we want to add the feature via-concat to the propagations of the given method:

public static String concat(String x, String y);

We could use the following JSON model generator:

{
"find": "methods",
"where": [
{
"constraint": "signature_match",
"parent": "Lcom/example/Class;",
"name": "concat"
}
],
"model": {
"attach_to_propagations": [
{
"features": [
"via-concat"
],
"port": "Return" // We could also use Argument(0) and Argument(1)
}
]
}
}

Note that this is only useful for propagations inferred by the analysis. If you know that concat has a propagation, you should model it as a propagation directly.

Add Features to Arguments

Add features to arguments is used to add a set of features on all sources that might flow on a given parameter of a method.

Add features to arguments implies Attach to sources, Attach to sinks and Attach to propagations, but it also accounts for possible side effects at call sites.

For instance:

public static void log(String message) {
System.out.println(message);
}
public void buyView() {
String username = getParameter("username");
String product = getParameter("product");
log(username);
buy(username, product);
}

Technically, the log method doesn't have any source, sink or propagation. We can use add features to arguments to add a feature was-logged on the flow from getParameter("username") to buy(username, product). We could use the following JSON model generator for the log method:

{
"find": "methods",
"where": [
{
"constraint": "signature_match",
"parent": "Lcom/example/Class;",
"name": "log"
}
],
"model": {
"add_features_to_arguments": [
{
"features": [
"was-logged"
],
"port": "Argument(0)"
}
]
}
}

Via-type Features

Via-type features are used to keep track of the type of a callable’s port seen at its callsites during taint flow analysis. They are specified in model generators within the “sources” or “sinks” field of a model with the “via_type_of” field. It is mapped to a nonempty list of ports of the method for which we want to create via-type features.

For example, if we were interested in the specific Activity subclasses with which the method below was called:


public void startActivityForResult(Intent intent, int requestCode);

// At some callsite:
ActivitySubclass activitySubclassInstance;
activitySubclassInstance.startActivityForResult(intent, requestCode);

we could use the following JSON to specify a via-type feature that would materialize as via-type:ActivitySubclass:

{
"find": "methods",
"where": [
{
"constraint": "signature_match",
"parent": "Lcom/example/Class;",
"name": "startActivityForResult"
}
],
"model": {
"sinks": [
{
"port": "Argument(1)",
"kind": "SinkKind",
"via_type_of": [
"Argument(0)"
]
}
]
}
}

Via-value Features

Via-value features capture the value of the specified callable's port seen at its callsites during taint flow analysis. They are specified similarly to via-type features: in model generators, within the "sources" or "sinks" field of a model, with the "via_value_of" field. It is mapped to a nonempty list of ports of the method for which we want to create via-value features.

For example, if we were interested in the specific mode with which the method below was called:

public static void log(String mode, String message);

class Constants {
public static final String MODE = "M1";
}

// At some callsite:
log(Constants.MODE, "error message");

we could use the following JSON to specify a via-value feature that would materialize as via-value:M1:

{
"find": "methods",
"where": [
{
"constraint": "signature_match",
"parent": "Lcom/example/Class",
"name": "log"
}
],
"model": {
"sinks": [
{
"port": "Argument(1)",
"kind": "SinkKind",
"via_value_of": [
"Argument(0)"
]
}
]
}
}

Note that this only works for numeric and string literals. In cases where the argument is not a constant, the feature will appear as via-value:unknown.
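The materialization rule can be sketched in a few lines of Java. This is only an illustration of the behavior described above: the class ViaValueSketch and the method featureFor are hypothetical names, not part of Mariana Trench, and the way numeric literals are rendered is an assumption.

```java
// Hypothetical sketch for illustration; not Mariana Trench code.
final class ViaValueSketch {
    // `constant` is the literal seen at the callsite,
    // or null when the argument is not a constant.
    static String featureFor(Object constant) {
        if (constant instanceof String || constant instanceof Number) {
            // Assumption: the literal is appended as-is to the feature name.
            return "via-value:" + constant;
        }
        return "via-value:unknown";
    }
}
```

With the callsite above, featureFor("M1") yields via-value:M1, while a non-constant argument (modeled here as null) yields via-value:unknown.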

Taint Broadening

Taint broadening (also called collapsing) happens when Mariana Trench needs to make an approximation about a taint flow. It is the operation of reducing a taint tree into a single element. A taint tree is a tree where edges are field names and nodes are taint elements. This is how Mariana Trench internally represents which fields (or sequences of fields) are tainted.

For instance, analyzing the following code:

MyClass var = new MyClass();
var.a = sourceX();
var.b.c = sourceY();
var.b.d = sourceZ();

The taint tree of variable var would be:

      .
a / \ b
{ X } .
c / \ d
{ Y } { Z }

After collapsing, the tree is reduced to a single node { X, Y, Z }, which is less precise.
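To make the operation concrete, here is a minimal Java sketch of a taint tree and its collapse. It is a simplified model written for this page, not Mariana Trench's internal data structure: the class TaintTreeSketch and its methods write and collapse are hypothetical names, and taint elements are reduced to plain strings.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical sketch for illustration; not Mariana Trench's implementation.
final class TaintTreeSketch {
    // Taint elements held directly on this node.
    final Set<String> taint = new TreeSet<>();
    // Edges are field names; each edge leads to a child node.
    final Map<String, TaintTreeSketch> children = new LinkedHashMap<>();

    // Adds `kind` on the node reached by following `path` (a sequence of field names).
    void write(String kind, String... path) {
        TaintTreeSketch node = this;
        for (String field : path) {
            node = node.children.computeIfAbsent(field, f -> new TaintTreeSketch());
        }
        node.taint.add(kind);
    }

    // Collapsing joins the taint of every node into one set and forgets the field paths.
    Set<String> collapse() {
        Set<String> result = new TreeSet<>(taint);
        for (TaintTreeSketch child : children.values()) {
            result.addAll(child.collapse());
        }
        return result;
    }
}
```

Writing X at path a, Y at path b.c and Z at path b.d reproduces the tree above, and collapse() then returns the single set containing X, Y and Z. The information about which field carried which source is lost, which is why collapsed taint is less precise.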

In conclusion, taint broadening effectively leads to considering the whole object as tainted while only some specific fields were initially tainted. This might happen for the correctness of the analysis or for performance reasons.

In the following sections, we will discuss when collapsing can happen. In most cases, a feature is automatically added on collapsed taint to help detect false positives.

Propagation Broadening

Taint collapsing is applied when taint is propagated through a method.

For instance:

MyClass input = new MyClass();
input.a = SourceX();
MyClass output = SomeClass.UnknownMethod(input);
Sink(output.b); // Considered an issue since `output` is considered tainted. This could be a False Negative without collapsing.

In that case, the feature via-propagation-broadening will be automatically added on the taint. This can help identify false positives.

If you know that this method preserves the structure of the parameter, you could specify a model and disable collapsing using the collapse attribute within a propagation:

{
"find": "methods",
"where": [
{
"constraint": "signature_match",
"parent": "Lcom/example/SomeClass",
"name": "UnknownMethod"
}
],
"model": {
"propagation": [
{
"input": "Argument(0)",
"output": "Return",
"collapse": false
}
]
}
}

Note that Mariana Trench can usually infer when a method propagates taint without collapsing it when it has access to the code of that method and subsequent calls. For instance:

public String identity(String x) {
// Automatically infers a propagation `Argument(1) -> Return` with `collapse=false` (`Argument(0)` is the implicit `this`)
return x;
}

Issue Broadening Feature

The via-issue-broadening feature is added to issues where the taint flowing into the sink was not held directly on the object passed in but on one of its fields. For example:

Class input = new Class();
input.field = source();
sink(input); // `input` is not tainted, but `input.field` is tainted and creates an issue

Widen Broadening Feature

For performance reasons, if a given taint tree becomes very large (either in depth or in number of nodes at a given level), Mariana Trench collapses the tree to a smaller size. In these cases, the via-widen-broadening feature is added to the collapsed taint. For example:

Class input = new Class();
if (/* condition */) {
input.field1 = source();
input.field2 = source();
...
} else {
input.fieldA = source();
input.fieldB = source();
...
}
sink(input); // Too many fields are sources so the whole input object becomes tainted

Sanitizers

Specifying sanitizers on a model allows us to stop taint from flowing through that method. In Mariana Trench, a sanitizer can be one of three types:

  • sources: prevent any taint sources from flowing out of the method
  • sinks: prevent taint from reaching any sinks within the method
  • propagations: prevent propagations from being inferred between any two ports of the method.

These can be specified in model generators as follows:

{
"find": "methods",
"where": ...,
"model": {
"sanitizers": [
{
"sanitize": "sources"
},
{
"sanitize": "sinks"
},
{
"sanitize": "propagations"
}
],
...
}
}

Note that if there are any user-specified sources, sinks or propagations on the model, sanitizers will not affect them, but they will prevent them from being propagated outward to callsites.

Kind-specific Sanitizers

sources and sinks sanitizers may include a list of kinds (each with or without a partial_label) to restrict the sanitizer to only sanitizing taint of those kinds. (When unspecified, as in the example above, all taint is sanitized regardless of kind).

"sanitizers": [
{
"sanitize": "sinks",
"kinds": [
{
"kind": "SinkKindA"
},
{
"kind": "SinkKindB",
"partial_label": "A"
}
]
}
]

Port-specific Sanitizers

Sanitizers can also specify a specific port (access path root) they sanitize (ignoring all the rest). The port field has a slightly different meaning for each kind of sanitizer:

  • sources: represents the output port through which sources may not leave the method
  • sinks: represents the input port through which taint may not trigger any sinks within the model
  • propagations: represents the input port through which a propagation to any other port may not be inferred

For example, if the following method

public static void someMethod(Object argument1, Object argument2) {
toSink(argument1);
toSink(argument2);
}

had the following sanitizer in its model,

"sanitizers": [
{
"sanitize": "sinks",
"port": "Argument(1)"
}
]

then a source flowing into argument1 would be able to cause an issue, but not a source flowing into argument2.

Kind and port specifications may be included in the same sanitizer.
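The way the optional port and kinds fields narrow a sanitizer can be sketched as a simple predicate. This is a hedged illustration of the rules stated above, not the analysis' actual code: SanitizerSketch and sanitizes are hypothetical names, ports and kinds are plain strings, and partial_label is ignored for brevity.

```java
import java.util.Set;

// Hypothetical sketch for illustration; not Mariana Trench code.
final class SanitizerSketch {
    final String port;       // null means the sanitizer applies to every port
    final Set<String> kinds; // empty means the sanitizer applies to every kind

    SanitizerSketch(String port, Set<String> kinds) {
        this.port = port;
        this.kinds = kinds;
    }

    // Taint is sanitized only when both the port and the kind restrictions match.
    boolean sanitizes(String taintPort, String taintKind) {
        boolean portMatches = port == null || port.equals(taintPort);
        boolean kindMatches = kinds.isEmpty() || kinds.contains(taintKind);
        return portMatches && kindMatches;
    }
}
```

Under these assumptions, a sanitizer restricted to Argument(1) and SinkKindA leaves taint on Argument(0), and taint of kind SinkKindB on Argument(1), untouched.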

Modes

Modes are used to describe specific behaviors of methods. Available modes are:

  • skip-analysis: skip the analysis of the method;
  • add-via-obscure-feature: add a feature/breadcrumb called via-obscure:<method> to sources flowing through this method;
  • taint-in-taint-out: propagate the taint on arguments to the return value;
  • taint-in-taint-this: propagate the taint on arguments into the this parameter;
  • no-join-virtual-overrides: do not consider all possible overrides when handling a virtual call to this method;
  • no-collapse-on-propagation: do not collapse input paths when applying propagations;
  • alias-memory-location-on-invoke: aliases existing memory location at the callsite instead of creating a new one;
  • strong-write-on-propagation: performs a strong write from input path to the output path on propagation;

Default model

A default model is created for each method, except if it is provided by a model generator. The default model has a set of heuristics:

If the method has no source code, the model is automatically marked with the modes skip-analysis and add-via-obscure-feature.

If the method has more than 40 overrides, it is marked with the mode no-join-virtual-overrides.

Otherwise, the default model is empty (no sources/sinks/propagations).
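The heuristics above can be summarized in a short Java sketch. It assumes the two checks are applied independently of each other, which the text does not state explicitly; DefaultModelSketch and defaultModes are hypothetical names introduced for illustration.

```java
import java.util.LinkedHashSet;
import java.util.Set;

// Hypothetical sketch for illustration; not Mariana Trench code.
final class DefaultModelSketch {
    // Threshold taken from the description above.
    static final int OVERRIDE_THRESHOLD = 40;

    // Returns the modes a default model would carry under the stated heuristics.
    static Set<String> defaultModes(boolean hasCode, int numberOfOverrides) {
        Set<String> modes = new LinkedHashSet<>();
        if (!hasCode) {
            modes.add("skip-analysis");
            modes.add("add-via-obscure-feature");
        }
        // Assumption: this check is independent of the previous one.
        if (numberOfOverrides > OVERRIDE_THRESHOLD) {
            modes.add("no-join-virtual-overrides");
        }
        return modes;
    }
}
```

A method with source code and few overrides gets an empty set of modes, which corresponds to the empty default model.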

Field Models

These models represent user-defined taint on class fields (as opposed to methods, as described in all the previous sections on this page). They are specified in a similar way to method models as described below.

NOTE: Field sources should not be applied to fields that are both final and of a primitive type (int, char, float, etc., as well as java.lang.String), as the Java compiler optimizes accesses of these fields in the bytecode into accesses of the constant value they hold. In this scenario, Mariana Trench has no way of recognizing that the constant was meant to carry a source.

Example field model generator for sources:

{
"find": "fields",
"where": [
{
"constraint": "name",
"pattern": "SOURCE_EXAMPLE"
}
],
"model": {
"sources" : [
{
"kind": "FieldSource"
}
]
}
}

Example code:

public class TestClass {
// Field that we know to be tainted
public Object SOURCE_EXAMPLE = ...;

void flow() {
sink(SOURCE_EXAMPLE, ...);
}
}

Example field model generator for sinks:

{
"find": "fields",
"where": [
{
"constraint": "name",
"pattern": "SINK_EXAMPLE"
}
],
"model": {
"sinks" : [
{
"kind": "FieldSink"
}
]
}
}

Example code:

public class TestClass {
public Object SINK_EXAMPLE = ...;

void flow() {
SINK_EXAMPLE = source();
}
}

Field signature formats follow the Dalvik bytecode format similar to methods as discussed above. This is of the form <className>.<fieldName>:<fieldType>.
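As a rough illustration of the <className>.<fieldName>:<fieldType> format, the following Java sketch builds the signature of a field from dotted Java names. The helper names are hypothetical, the package com.example is an assumption (the TestClass examples above do not show a package), and only reference types are handled (no primitives or arrays).

```java
// Hypothetical sketch for illustration; not Mariana Trench code.
final class FieldSignatureSketch {
    // Converts a dotted Java class name into a Dalvik type descriptor,
    // e.g. java.lang.Object becomes Ljava/lang/Object;
    static String descriptor(String javaClassName) {
        return "L" + javaClassName.replace('.', '/') + ";";
    }

    // Builds <className>.<fieldName>:<fieldType> for a field of a reference type.
    static String fieldSignature(String className, String fieldName, String fieldType) {
        return descriptor(className) + "." + fieldName + ":" + descriptor(fieldType);
    }
}
```

For a field SOURCE_EXAMPLE of type java.lang.Object in a class com.example.TestClass, this produces Lcom/example/TestClass;.SOURCE_EXAMPLE:Ljava/lang/Object;. When such a string is used inside a regex pattern, characters like the dot need to be escaped.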

Literal Models

Literal models represent user-defined taints on string literals matching configurable regular expressions. They can only be configured as sources and are intended to identify suspicious patterns, such as user-controlled data being concatenated with a string literal which looks like an SQL query.

NOTE: Each use of a literal in the analysed code which matches a pattern in a literal model will generate a new taint which needs to be explored by Mariana Trench. Using overly broad patterns like .* should thus be avoided, as they can lead to poor performance and high memory usage.

Example literal models:

[
{
"pattern": "SELECT \\*.*",
"description": "Potential SQL Query",
"sources": [
{
"kind": "SqlQuery"
}
]
},
{
"pattern": "AI[0-9A-Z]{16}",
"description": "Suspected Google API Key",
"sources": [
{
"kind": "GoogleAPIKey"
}
]
}
]

Example code:

void testRegexSource() {
String prefix = "SELECT * FROM USERS WHERE id = ";
String aci = getAttackerControlledInput();
String query = prefix + aci; // Sink
}

void testRegexSourceGoogleApiKey() {
String secret = "AIABCD1234EFGH5678";
sink(secret);
}
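To check which literals a pattern would pick up, the two patterns above can be tried against the literals from the example code. The sketch below uses java.util.regex as a stand-in for whatever regex engine Mariana Trench uses, and it assumes that a pattern must match the whole literal (the trailing .* in the first pattern suggests this, but it is an assumption). LiteralPatternSketch and matchesLiteral are hypothetical names.

```java
import java.util.regex.Pattern;

// Hypothetical sketch for illustration; not Mariana Trench code.
final class LiteralPatternSketch {
    // Returns true when `pattern` matches the whole string literal.
    // Assumption: full-match semantics; java.util.regex is only a stand-in engine.
    static boolean matchesLiteral(String pattern, String literal) {
        return Pattern.matches(pattern, literal);
    }
}
```

Note that the JSON string "SELECT \\*.*" decodes to the regex SELECT \*.*, so the asterisk after SELECT is matched literally. A short string such as "AI" is not matched by the second pattern, since it requires exactly 16 further characters.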

Model Generators

Mariana Trench allows for dynamic model specifications. This allows a user to specify models of methods before running the analysis. This is used to specify sources, sinks, propagation and modes.

Model generators are specified in a generator configuration file, specified by the --generator-configuration-path parameter. By default, we use default_generator_config.json.

Example

Examples of model generators are located in the configuration/model-generators directory.

Below is an example of a JSON model generator:

{
"model_generators": [
{
"find": "methods",
"where": [{"constraint": "name", "pattern": "toString"}],
"model": {
"propagation": [
{
"input": "Argument(0)",
"output": "Return"
}
]
}
},
{
"find": "methods",
"where": [
{
"constraint": "parent",
"inner": {
"constraint": "extends",
"inner": {
"constraint": "name",
"pattern": "SandcastleCommand"
}
}
},
{"constraint": "name", "pattern": "Time"}
],
"model": {
"sources": [
{
"kind": "Source",
"port": "Return"
}
]
}
},
{
"find": "methods",
"where": [
{
"constraint": "parent",
"inner": {
"constraint": "extends",
"inner": {"constraint": "name", "pattern": "IEntWithPurposePolicy"}
}
},
{"constraint": "name", "pattern": "gen.*"},
{
"constraint": "parameter",
"idx": 0,
"inner": {
"constraint": "type",
"kind": "extends",
"class": "IViewerContext"
}
},
{
"constraint": "return",
"inner": {
"constraint": "extends",
"inner": {"constraint": "name", "pattern": "Ent"}
}
}
],
"model": {
"modes": ["add-via-obscure-feature"],
"sinks": [
{
"kind": "Sink",
"port": "Argument(0)",
"features": ["via-gen"]
}
]
}
}
]
}
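The where lists in the example can be read as predicates that must all hold, with each name pattern matching the whole name rather than a substring (both points are stated in the specification below). The following Java sketch illustrates this reading for name constraints only; WhereClauseSketch, name and matchesAll are hypothetical names, and java.util.regex stands in for the actual regex engine.

```java
import java.util.List;
import java.util.function.Predicate;
import java.util.regex.Pattern;

// Hypothetical sketch for illustration; not Mariana Trench code.
final class WhereClauseSketch {
    // A "name" constraint: the pattern must match the whole method name.
    static Predicate<String> name(String pattern) {
        Pattern compiled = Pattern.compile(pattern);
        return methodName -> compiled.matcher(methodName).matches();
    }

    // A "where" list is satisfied only when every constraint is satisfied.
    static boolean matchesAll(List<Predicate<String>> where, String methodName) {
        return where.stream().allMatch(constraint -> constraint.test(methodName));
    }
}
```

Under this reading, the pattern gen.* selects a method named genUser, while the pattern Time selects only a method named exactly Time and not, for instance, getTime.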

Specification

Each JSON file is a JSON object with a key model_generators associated with a list of "rules".

Each "rule" defines a "filter" (which uses "constraints" to specify methods for which a "model" should be generated) and a "model". A rule has the following key/values:

  • find: The type of thing to find. We support methods and fields;

  • where: A list of "constraints". All constraints must be satisfied by a method or field in order to generate a model for it. All the constraints are listed below, grouped by the type of object they are applied to:

    • Method:

      • signature_match: Expects at least one of the two allowed groups of extra properties: [name | names] [parent | parents | extends [include_self]] where:
        • name (a single string) or names (a list of alternative strings) is matched exactly against the method name;
        • parent (a single string) or parents (a list of alternative strings) is matched exactly against the class of the method. Alternatively, extends (either a single string or a list of alternative strings) is matched exactly against the base classes or interfaces of the class of the method. extends allows an optional boolean property include_self, which indicates whether the constraint is also applied to the class itself (defaults to true).
      • signature | signature_pattern: Expects an extra property pattern which is a regex to fully match the full signature (class, method, argument types) of a method;
        • NOTE: Usage of this constraint is discouraged as it has poor performance. Try using signature_match instead!
      • parent: Expects an extra property inner [Type] which contains a nested constraint to apply to the class holding the method;
      • parameter: Expects the extra properties idx and inner [Parameter] or [Type], matches when the idx-th parameter of the function or method matches the nested constraint inner;
      • any_parameter: Expects an optional extra property start_idx and an extra property inner [Parameter] or [Type], matches when any parameter (starting at start_idx) of the function or method matches the nested constraint inner;
      • return: Expects an extra property inner [Type] which contains a nested constraint to apply to the return of the method;
      • is_static | is_constructor | is_native | has_code: Accepts an extra property value which is either true or false. By default, value is considered true;
      • number_parameters: Expects an extra property inner [Integer] which contains a nested constraint to apply to the number of parameters (counting the implicit this parameter);
      • number_overrides: Expects an extra property inner [Integer] which contains a nested constraint to apply on the number of method overrides.
    • Parameter:

      • parameter_has_annotation: Expects an extra property type and an optional property pattern, respectively a string and a regex fully matching the value of the parameter annotation.
    • Type:

      • extends: Expects an extra property inner [Type] which contains a nested constraint that must apply to one of the base classes or itself. The optional property include_self is a boolean that tells whether the constraint must be applied on the type itself or not (defaults to true);
      • super: Expects an extra property inner [Type] which contains a nested constraint that must apply on the direct superclass;
      • is_class | is_interface: Accepts an extra property value which is either true or false. By default, value is considered true;
    • Field:

      • signature: Expects an extra property pattern which is a regex to fully match the full signature of the field. This is of the form <className>.<fieldName>:<fieldType>;
      • parent: Expects an extra property inner [Type] which contains a nested constraint to apply to the class holding the field;
      • is_static: Accepts an extra property value which is either true or false. By default, value is considered true;
    • Method, Type or Field:

      • name: Expects an extra property pattern which is a regex to fully match the name of the item;
      • has_annotation: Expects an extra property type and an optional property pattern, respectively a string and a regex fully matching the value of the annotation.
      • visibility: Expects an extra property is which is either public, private or protected; (Note this does not apply to Field)
    • Integer:

      • < | <= | == | > | >= | !=: Expects an extra property value which contains an integer that the input integer is compared with. The input is the left hand side.
    • Any (Method, Parameter, Type, Field or Integer):

      • all_of: Expects an extra property inners [Any] which is an array holding nested constraints which must all apply;
      • any_of: Expects an extra property inners [Any] which is an array holding nested constraints where one of them must apply;
      • not: Expects an extra property inner [Any] which contains a nested constraint that should not apply. (Note this is not yet implemented for Fields)
  • model: A model, describing sources/sinks/propagations/etc.

    • For method models

      • sources*: A list of sources, i.e a source flowing out of the method via return value or flowing in via an argument. A source has the following key/values:
        • kind: The source name;
        • port**: The source access path (e.g, "Return" or "Argument(1)");
        • features*: A list of features/breadcrumbs names;
        • via_type_of*: A list of ports;
      • sinks*: A list of sinks, i.e describing that a parameter of the method flows into a sink. A sink has the following key/values:
        • kind: The sink name;
        • port: The sink access path (e.g, "Return" or "Argument(1)");
        • features*: A list of features/breadcrumbs names;
        • via_type_of*: A list of ports;
      • propagation*: A list of propagations (also called passthrough) that describe whether a taint on a parameter should result in a taint on the return value or another parameter. A propagation has the following key/values:
        • input: The input access path (e.g, "Argument(1)");
        • output: The output access path (e.g, "Return" or "Argument(2)");
        • features*: A list of features/breadcrumbs names;
      • attach_to_sources*: A list of attach-to-sources that describe that all sources flowing out of the method on the given parameter or return value must have the given features. An attach-to-source has the following key/values:
        • port: The access path root (e.g, "Return" or "Argument(1)");
        • features: A list of features/breadcrumb names;
      • attach_to_sinks*: A list of attach-to-sinks that describe that all sources flowing in the method on the given parameter must have the given features. An attach-to-sink has the following key/values:
        • port: The access path root (e.g, "Argument(1)");
        • features: A list of features/breadcrumb names;
      • attach_to_propagations*: A list of attach-to-propagations that describe that inferred propagations of sources flowing in or out of a given parameter or return value must have the given features. An attach-to-propagation has the following key/values:
        • port: The access path root (e.g, "Return" or "Argument(1)");
        • features: A list of features/breadcrumb names;
      • add_features_to_arguments*: A list of add-features-to-arguments that describe that flows that might flow on the given argument must have the given features. An add-features-to-argument has the following key/values:
        • port: The access path root (e.g, "Argument(1)");
        • features: A list of features/breadcrumb names;
      • modes*: A list of mode names that describe specific behaviors of a method;
      • for_all_parameters: Generate sources/sinks/propagations/attach_to_* for all parameters of a method that satisfy some constraints. It accepts the following key/values:
        • variable: A symbolic name for the parameter;
        • where: An optional list of [Parameter] or [Type] constraints on the parameter;
        • sources | sinks | propagation: Same as under "model", but we accept the variable name as a parameter number.
    • verbosity*: A logging level, to help debugging. 1 is the most verbose, 5 is the least. The default verbosity level is 5.

    • For Field models

      • sources*: A list of sources the field should hold. A source has the following key/values:
        • kind: The source name;
        • features*: A list of features/breadcrumbs names;
      • sinks*: A list of sinks the field should hold. A sink has the following key/values:
        • kind: The sink name;
        • features*: A list of features/breadcrumbs names;

In the above bullets,

  • * denotes optional key/value.
  • ** denotes optional key/value. Default is "Return".

Note, the implicit this parameter for methods has the parameter number 0.
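The Integer constraints and the counting of the implicit this parameter can be illustrated together. The sketch below is a hedged reading of the specification above, not the tool's implementation: IntegerConstraintSketch and its methods are hypothetical names, and it assumes that static methods have no implicit this parameter.

```java
// Hypothetical sketch for illustration; not Mariana Trench code.
final class IntegerConstraintSketch {
    // Parameter count as seen by number_parameters: declared parameters
    // plus the implicit `this` for non-static methods (assumption for static methods: no `this`).
    static int numberOfParameters(boolean isStatic, int declaredParameters) {
        return isStatic ? declaredParameters : declaredParameters + 1;
    }

    // Evaluates an Integer constraint; `input` is the left-hand side,
    // `value` is the constraint's "value" property.
    static boolean satisfies(String operator, int input, int value) {
        switch (operator) {
            case "<": return input < value;
            case "<=": return input <= value;
            case "==": return input == value;
            case ">": return input > value;
            case ">=": return input >= value;
            case "!=": return input != value;
            default: throw new IllegalArgumentException("Unknown operator: " + operator);
        }
    }
}
```

For example, an instance method declared with two parameters counts three parameters, so a number_parameters constraint with operator == and value 3 would match it under these assumptions.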

Development

When Sources or Sinks don't appear in Results

  1. This could be because your model generator did not find any method matching your query. You can use the "verbosity": 1 option in your model generator to check if it matched any method. For instance:

    {
    "model_generators": [
    {
    "find": "methods",
    "where": /* ... */,
    "model": {
    /* ... */
    },
    "verbosity": 1
    }
    ]
    }

    When running Mariana Trench, this should print:

    INFO Method `...` satisfies all constraints in json model generator ...
  2. Make sure that your model generator is actually running. You can use the --verbosity 2 option to check that. Make sure your model generator is specified in configuration/default_generator_config.json.

  3. You can also check the output models. Use grep SourceKind models@* to see if your source or sink kind exists. Use grep 'Lcom/example/<class-name>;.<method-name>:' models@* to see if a given method exists in the app.

+ \ No newline at end of file diff --git a/docs/overview/index.html b/docs/overview/index.html index 72a977cf..3be53c2e 100644 --- a/docs/overview/index.html +++ b/docs/overview/index.html @@ -5,14 +5,14 @@ Overview | Mariana Trench - +

Overview

What is Mariana Trench

Mariana Trench is a security-focused static analysis platform targeting Android. The tool provides an extensible global taint analysis similar to pre-existing tools like Pysa for Python. The tool leverages existing static analysis infrastructure (e.g., SPARTA) built on top of Redex.

By default, Mariana Trench analyzes Dalvik bytecode and can work with or without access to the source code.

Background

Sources and Sinks

Under the context of taint analysis [1], "sources" usually mean sensitive data that originates from users. For example, sources can be users' passwords or locations. "Sinks" usually mean functions or methods that use data that "flows" from sources, where the term "flow" is generally defined under the context of "information flow" [2].

An operation, or series of operations, that uses the value of some object, say x, to derive a value for another, say y, causes a flow from x to y.

As an example, sinks can be a logging API that writes data into a log file.

What does Mariana Trench do?

A flow from sources to sinks indicates that, for example, user passwords may get logged into a file. This is not desirable and is called an "issue" in the context of Mariana Trench. Mariana Trench is designed to automatically discover such issues.

Usage

The usage of Mariana Trench can be summarized in three steps:

  1. Specify customized "sources" and "sinks". (See Customize Sources and Sinks)
  2. Run Mariana Trench on an arbitrary Java repository (with the sources and sinks specified in Step 1), whether it be a repository for an Android application project or for a vanilla (or plain old) Java project.
  3. View the analysis results from a web browser. (For steps 2 and 3 see Getting Started)

References

  1. Tripp, Omer, et al. "TAJ: effective taint analysis of web applications." ACM Sigplan Notices 44.6 (2009): 87-97.
  2. Denning, Dorothy E., and Peter J. Denning. "Certification of programs for secure information flow." Communications of the ACM 20.7 (1977): 504-513.
- + \ No newline at end of file diff --git a/docs/rules/index.html b/docs/rules/index.html index 2a27f56e..3a125119 100644 --- a/docs/rules/index.html +++ b/docs/rules/index.html @@ -5,7 +5,7 @@ Rules | Mariana Trench - + @@ -13,7 +13,7 @@

Rules

A rule describes flows that we want to catch (e.g, user input flowing into command execution). A rule is made of a set of source kinds, a set of sink kinds, a name, a code, and a description.

Here is an example of a rule in JSON:

{
"name": "User input flows into code execution (RCE)",
"code": 1,
"description": "Values from user-controlled source may eventually flow into code execution",
"sources": [
"UserCamera",
"UserInput"
],
"sinks": [
"CodeAsyncJob",
"CodeExecution"
]
}

For guidance on modeling sources and sinks, see the next section, Models and Model Generators.

Rules used by Mariana Trench can be specified with the --rules-paths argument. The default set of rules that run can be found in configuration/rules.json.
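Conceptually, a rule pairs a set of source kinds with a set of sink kinds, and a flow is of interest when its source kind and its sink kind both belong to the rule. The following Java sketch illustrates that reading with the example rule above; RuleSketch and catches are hypothetical names, and the sink kind Logging used in the note below is made up for the counterexample.

```java
import java.util.Set;

// Hypothetical sketch for illustration; not Mariana Trench code.
final class RuleSketch {
    final String name;
    final int code;
    final Set<String> sources;
    final Set<String> sinks;

    RuleSketch(String name, int code, Set<String> sources, Set<String> sinks) {
        this.name = name;
        this.code = code;
        this.sources = sources;
        this.sinks = sinks;
    }

    // A flow is caught when its source kind and sink kind are both listed in the rule.
    boolean catches(String sourceKind, String sinkKind) {
        return sources.contains(sourceKind) && sinks.contains(sinkKind);
    }

    // The example rule from this page.
    static RuleSketch example() {
        return new RuleSketch(
            "User input flows into code execution (RCE)",
            1,
            Set.of("UserCamera", "UserInput"),
            Set.of("CodeAsyncJob", "CodeExecution"));
    }
}
```

With the example rule, a flow from UserInput to CodeExecution is caught, while a flow from UserInput to a sink of kind Logging is not.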

- + \ No newline at end of file diff --git a/docs/shims/index.html b/docs/shims/index.html index 30f11424..6c413e6d 100644 --- a/docs/shims/index.html +++ b/docs/shims/index.html @@ -5,7 +5,7 @@ Shims | Mariana Trench - + @@ -35,7 +35,7 @@ shim-target, the trace following the call-site of the shimmed-method will be the shim-target and a feature via-shim:<shimmed-method> will be introduced at that point.

Sample shim definitions here.

- + \ No newline at end of file diff --git a/index.html b/index.html index 9d6a46a8..44705313 100644 --- a/index.html +++ b/index.html @@ -5,14 +5,14 @@ Mariana Trench | Mariana Trench - +

Mariana Trench

Security-Focused Static Analysis for Android and Java Applications

Android and Java

Find security vulnerabilities in Android and Java applications. Mariana Trench analyzes Dalvik bytecode.

Fast

Mariana Trench is built to run fast on large codebases (10s of millions of lines of code). It can find vulnerabilities as code changes, before it ever lands in your repository.

Customizable

Adapt Mariana Trench to your needs. Find the vulnerabilities that you care about by teaching Mariana Trench where data comes from and where you do not want it to go.

- + \ No newline at end of file