Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add coverage for MSTG-CODE-9 on Android #2089

Merged
merged 20 commits into from
Jul 8, 2022
Merged
Show file tree
Hide file tree
Changes from 14 commits
Commits
Show all changes
20 commits
Select commit Hold shift + click to select a range
5fe4a89
add coverage for MSTG-CODE-9
cpholguera Mar 22, 2022
7d09335
Merge branch 'master' of https://github.com/OWASP/owasp-mstg into cod…
cpholguera Mar 25, 2022
9defc3f
add more references
cpholguera Mar 29, 2022
a9c2c5a
add java2oat compilation
cpholguera Mar 29, 2022
8c32d8e
add section about the main android compiled binary
cpholguera Mar 29, 2022
efae08d
add general section about obfuscation; update iOS and Android tests f…
cpholguera Mar 30, 2022
32699a6
Update Document/0x05i-Testing-Code-Quality-and-Build-Settings.md
cpholguera Mar 30, 2022
5e0f800
Update Document/0x05i-Testing-Code-Quality-and-Build-Settings.md
cpholguera Mar 30, 2022
26ab8ee
Update Document/0x05i-Testing-Code-Quality-and-Build-Settings.md
cpholguera Mar 30, 2022
a580164
Update Document/0x05i-Testing-Code-Quality-and-Build-Settings.md
cpholguera Mar 30, 2022
3d1d29a
add general section about Binary Protection Mechanisms
cpholguera Mar 31, 2022
4e05093
improve content and consistency of the Android and iOS sections about…
cpholguera Mar 31, 2022
f91af25
improve info about iOS native libraries
cpholguera Mar 31, 2022
0c7e7f3
Merge branch 'master' of https://github.com/OWASP/owasp-mstg into cod…
cpholguera Jun 29, 2022
d22c7f6
Apply suggestions from code review
cpholguera Jul 8, 2022
3faa32d
Update Document/0x04c-Tampering-and-Reverse-Engineering.md
cpholguera Jul 8, 2022
c1b9b86
Merge branch 'master' of https://github.com/OWASP/owasp-mstg into cod…
cpholguera Jul 8, 2022
b56313e
fix linting issues
cpholguera Jul 8, 2022
77744f9
ignore uris in codespell
cpholguera Jul 8, 2022
e25f507
fix spell
cpholguera Jul 8, 2022
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
61 changes: 60 additions & 1 deletion Document/0x04c-Tampering-and-Reverse-Engineering.md
Original file line number Diff line number Diff line change
Expand Up @@ -69,10 +69,69 @@ In theory, the mapping between assembly and machine code should be one-to-one, a
- Position independent code (PIC) sequences.
- Hand crafted assembly code.

On a similar vein, decompilation is a very complicated process, involving many deterministic and heuristic based approaches. As a consequence, decompilation is usually not really accurate, but nevertheless very helpful in getting a quick understanding of the function being analyzed. The accuracy of decompilation depends on the amount of information available in the code being decompiled and the sophistication of the decompiler. In addition, many compilation and post-compilation tools introduce additional complexity to the compiled code in order to increase the difficulty of comprehension and/or even decompilation itself. Such code referred to as _obfuscated code_.
On a similar vein, decompilation is a very complicated process, involving many deterministic and heuristic based approaches. As a consequence, decompilation is usually not really accurate, but nevertheless very helpful in getting a quick understanding of the function being analyzed. The accuracy of decompilation depends on the amount of information available in the code being decompiled and the sophistication of the decompiler. In addition, many compilation and post-compilation tools introduce additional complexity to the compiled code in order to increase the difficulty of comprehension and/or even decompilation itself. Such code referred to as [_obfuscated code_](#obfuscation).
cpholguera marked this conversation as resolved.
Show resolved Hide resolved

Over the past decades many tools have perfected the process of disassembly and decompilation, producing output with high fidelity. Advanced usage instructions for any of the available tools can often easily fill a book of their own. The best way to get started is to simply pick up a tool that fits your needs and budget and get a well-reviewed user guide. In this section, we will provide an introduction to some of those tools and in the subsequent "Reverse Engineering and Tampering" Android and iOS chapters we'll focus on the techniques themselves, especially those that are specific to the platform at hand.

### Obfuscation

Obfuscation is the process of transforming code and data to make it more difficult to comprehend (and sometimes even difficult to disassemble). It is usually an integral part of the software protection scheme. Obfuscation isn't something that can be simply turned on or off, programs can be made incomprehensible, in whole or in part, in many ways and to different degrees.

> Note: All presented techniques below may not stop reverse engineers, but combining all of those techniques will make their job significantly harder. The aim of those techniques is to discourage reverse engineers from performing further analysis.
cpholguera marked this conversation as resolved.
Show resolved Hide resolved

The following techniques can be used to obfuscate an application:

- Name obfuscation
- Instruction substitution
- Control flow flattening
- Dead code injection
- String encryption
- Packing

#### Name Obfuscation

The standard compiler generates binary symbols based on class and function names from the source code. Therefore, if no obfuscation was applied, symbol names remain meaningful and can be easily read straight from the app binary. For instance, a function which detects a jailbreak can be located by searching for relevant keywords (e.g. "jailbreak"). The listing below shows the disassembled function `JailbreakDetectionViewController.jailbreakTest4Tapped` from the Damn Vulnerable iOS App (DVIA-v2).
cpholguera marked this conversation as resolved.
Show resolved Hide resolved

```assembly
__T07DVIA_v232JailbreakDetectionViewControllerC20jailbreakTest4TappedyypF:
stp x22, x21, [sp, #-0x30]!
mov rbp, rsp
```

After the obfuscation we can observe that the symbol’s name is no longer meaningful as shown on the listing below.

```assembly
__T07DVIA_v232zNNtWKQptikYUBNBgfFVMjSkvRdhhnbyyFySbyypF:
stp x22, x21, [sp, #-0x30]!
mov rbp, rsp
```

Nevertheless, this only applies to the names of functions, classes and fields. The actual code remains unmodified, so an attacker can still read the disassembled version of the function and try to understand its purpose (e.g. to retrieve the logic of a security algorithm).

#### Instruction Substitution

This technique replaces standard binary operators like addition or subtraction with more complex representations. For example an addition `x = a + b` can be represented as `x = -(-a) - (-b)`. However, using the same replacement representation could be easily reversed, so it is recommended to add multiple substitution techniques for a single case and introduce a random factor. This technique is vulnerable to deobfuscation, but depending on the complexity and depth of the substitutions, applying it can still be time consuming.
cpholguera marked this conversation as resolved.
Show resolved Hide resolved

#### Control Flow Flattening

Control flow flattening replaces original code with a more complex representation. The transformation breaks the body of a function into basic blocks and puts them all inside a single infinite loop with a switch statement that controls the program flow. This makes the program flow significantly harder to follow because it removes the natural conditional constructs that usually make the code easier to read.

![control-flow-flattening](./Images/Chapters/0x06j/control-flow-flattening.png) \

The image shows how control flow flattening alters code (see "[Obfuscating C++ programs via control flow flattening](http://ac.inf.elte.hu/Vol_030_2009/003.pdf)")

#### Dead Code Injection

This technique makes the program's control flow more complex by injecting dead code into the program. Dead code is a stub of code that doesn’t affect the original program’s behaviour but increases the overhead for the reverse engineering process.
cpholguera marked this conversation as resolved.
Show resolved Hide resolved

#### String Encryption

Applications are often compiled with hardcoded keys, licences, tokens and endpoint URLs. By default, all of them are stored in plaintext in the data section of an application’s binary. This technique encrypts these values and injects stubs of code into the program that will decrypt that data before it is used by the program.

#### Packing

[Packing](https://attack.mitre.org/techniques/T1027/002/) is a dynamic rewriting obfuscation technique which compresses or encrypts the original executable into data and dynamically recovers it during execution. Packing an executable changes the file signature in an attempt to avoid signature-based detection.

### Debugging and Tracing

In the traditional sense, debugging is the process of identifying and isolating problems in a program as part of the software development life cycle. The same tools used for debugging are valuable to reverse engineers even when identifying bugs is not the primary goal. Debuggers enable program suspension at any point during runtime, inspection of the process' internal state, and even register and memory modification. These abilities simplify program inspection.
Expand Down
32 changes: 32 additions & 0 deletions Document/0x04h-Testing-Code-Quality.md
Original file line number Diff line number Diff line change
Expand Up @@ -271,6 +271,38 @@ Fuzz testing techniques or scripts (often called "fuzzers") will typically gener

For more information on fuzzing, refer to the [OWASP Fuzzing Guide](https://www.owasp.org/index.php/Fuzzing "OWASP Fuzzing Guide").

## Binary Protection Mechanisms

### Position Independent Code

[PIC (Position Independent Code)](https://en.wikipedia.org/wiki/Position-independent_code) is code that, being placed somewhere in the primary memory, executes properly regardless of its absolute address. PIC is commonly used for shared libraries, so that the same library code can be loaded in a location in each program address space where it does not overlap with other memory in use (for example, other shared libraries).

PIE (Position Independent Executable) are executable binaries made entirely from PIC. PIE binaries are used to enable [ASLR (Address Space Layout Randomization)](https://en.wikipedia.org/wiki/Address_space_layout_randomization) which randomly arranges the address space positions of key data areas of a process, including the base of the executable and the positions of the stack, heap and libraries.

### Memory management

#### Automatic Reference Counting

[ARC (Automatic Reference Counting)](https://en.wikipedia.org/wiki/Automatic_Reference_Counting) is a memory management feature of the Clang compiler exclusive to [Objective-C](https://developer.apple.com/library/content/releasenotes/ObjectiveC/RN-TransitioningToARC/Introduction/Introduction.html) and [Swift](https://docs.swift.org/swift-book/LanguageGuide/AutomaticReferenceCounting.html). ARC automatically frees up the memory used by class instances when those instances are no longer needed. ARC differs from tracing garbage collection in that there is no background process that deallocates the objects asynchronously at runtime.

Unlike tracing garbage collection, ARC does not handle reference cycles automatically. This means that as long as there are "strong" references to an object, it will not be deallocated. Strong cross-references can accordingly create deadlocks and memory leaks. It is up to the developer to break cycles by using weak references. You can learn more about how it differs from Garbage Collection [here](https://fragmentedpodcast.com/episodes/064/).

#### Garbage Collection

[Garbage Collection (GC)](https://en.wikipedia.org/wiki/Garbage_collection_(computer_science)) is an automatic memory management feature of some languages such as Java/Kotlin/Dart. The garbage collector attempts to reclaim memory which was allocated by the program, but is no longer referenced—also called garbage. The Android runtime (ART) makes use of an [improved version of GC](https://source.android.com/devices/tech/dalvik#Improved_GC). You can learn more about how it differs from ARC [here](https://fragmentedpodcast.com/episodes/064/).

#### Manual Memory Management

[Manual memory management](https://en.wikipedia.org/wiki/Manual_memory_management) is typically required in native libraries written in C/C++ where ARC and GC do not apply. The developer is responsible for doing proper memory management. Manual memory management is known to enable several major classes of bugs into a program when used incorrectly, notably violations of [memory safety](https://en.wikipedia.org/wiki/Memory_safety) or [memory leaks](https://en.wikipedia.org/wiki/Memory_leak).

See more information in ["Memory Corruption Bugs (MSTG-CODE-8)"](#memory-corruption-bugs-mstg-code-8).
cpholguera marked this conversation as resolved.
Show resolved Hide resolved

### Stack Smashing Protection

SSP (Stack Smashing Protection aka. [canaries](https://en.wikipedia.org/wiki/Stack_buffer_overflow#Stack_canaries)) helps prevent stack buffer overflow attacks by means of having a small integer right before the return pointer. A buffer overflow attack often overwrites a region of memory in order to overwrite the return pointer and take over the process-control. In that case, the canary gets overwritten as well. Therefore, the value of the canary is always checked to make sure it has not changed before a routine uses the return pointer on the stack.
cpholguera marked this conversation as resolved.
Show resolved Hide resolved

Stack buffer overflow is a type of the more general programming malfunction known as [buffer overflow](https://en.wikipedia.org/wiki/Buffer_overflow) (or buffer overrun). Overfilling a buffer on the stack is more likely to **derail program execution** than overfilling a buffer on the heap because the stack contains the return addresses for all active function calls.
cpholguera marked this conversation as resolved.
Show resolved Hide resolved

## References

### OWASP MASVS
Expand Down
4 changes: 4 additions & 0 deletions Document/0x05a-Platform-Overview.md
Original file line number Diff line number Diff line change
Expand Up @@ -28,6 +28,10 @@ The current version of Android executes this bytecode on the Android runtime (AR

In the DVM, bytecode is translated into machine code at execution time, a process known as *just-in-time* (JIT) compilation. This enables the runtime to benefit from the speed of compiled code while maintaining the flexibility of code interpretation. To further improve performance, Android introduced the [Android Runtime (ART)](https://source.android.com/devices/tech/dalvik/configure#how_art_works) to replace the DVM. ART uses a hybrid combination of *ahead-of-time* (AOT), JIT and profile-guided compilation. Apps are recompiled on the device when they are installed, or when the OS undergoes a major update. While the code is being recompiled, device-specific and advanced code optimizations techniques can be applied. The final recompiled code is then used for all subsequent executions. AOT improves performance by a factor of two while reducing power consumption, due to the device-specific optimizations.

![OWASP MSTG](Images/Chapters/0x05a/java2oat.png) \

Source: <https://lief-project.github.io/doc/latest/tutorials/10_android_formats.html>

Android apps don't have direct access to hardware resources, and each app runs in its own virtual machine or sandbox. This enables the OS to have precise control over resources and memory access on the device. For instance, a crashing app doesn't affect other apps running on the same device. Android controls the maximum number of system resources allocated to apps, preventing any one app from monopolizing too many resources. At the same time, this sandbox design can be considered as one of the many principles in Android's global defense-in-depth strategy. A malicious third-party application, with low privileges, shouldn't be able to escape its own runtime and read the memory of a victim application on the same device. In the following section we take a closer look at the different defense layers in the Android operating system.

## Android Security: Defense-in-Depth Approach
Expand Down
20 changes: 20 additions & 0 deletions Document/0x05b-Basic-Security_Testing.md
Original file line number Diff line number Diff line change
Expand Up @@ -511,6 +511,26 @@ As seen above in "[Exploring the App Package](#exploring-the-app-package "Explor

Refer to the section "[Reviewing Decompiled Java Code](0x05c-Reverse-Engineering-and-Tampering.md#reviewing-decompiled-java-code "Reviewing Decompiled Java Code")" in the chapter "[Tampering and Reverse Engineering on Android](0x05c-Reverse-Engineering-and-Tampering.md)" for more information about how to reverse engineer DEX files.

##### Compiled App Binary

In some cases it might be useful to retrieve the compiled app binary (.odex).

First get the path to the app's data directory:

```bash
adb shell pm path com.example.myapplication
package:/data/app/~~DEMFPZh7R4qfUwwwh1czYA==/com.example.myapplication-pOslqiQkJclb_1Vk9-WAXg==/base.apk
```

Remove the `/base.apk` part, add `/oat/arm64/base.odex` and use the resulting path to pull the base.odex from the device:

```bash
adb root
adb pull /data/app/~~DEMFPZh7R4qfUwwwh1czYA==/com.example.myapplication-pOslqiQkJclb_1Vk9-WAXg==/oat/arm64/base.odex
```

Note that you must execute adb with root privileges. If the above doesn't work because the file cannot be found, try exploring yourself using `adb shell` and `cd` to that path until you find the `base.odex`.
cpholguera marked this conversation as resolved.
Show resolved Hide resolved

##### Native Libraries

You can inspect the `lib` folder in the APK:
Expand Down
10 changes: 7 additions & 3 deletions Document/0x05c-Reverse-Engineering-and-Tampering.md
Original file line number Diff line number Diff line change
Expand Up @@ -22,9 +22,13 @@ Nevertheless, if the code has been purposefully obfuscated (or some tool-breakin

#### Decompiling Java Code

If you don't mind looking at Smali instead of Java, you can simply [open your APK in Android Studio](https://developer.android.com/studio/debug/apk-debugger "Debug pre-built APKs") by clicking **Profile or debug APK** from the Welcome screen (even if you don't intend to debug it you can take a look at the smali code).
**Java Disassembled Code (Smali):**
cpholguera marked this conversation as resolved.
Show resolved Hide resolved

Alternatively you can use [apktool](0x08-Testing-Tools.md#apktool) to extract and disassemble resources directly from the APK archive and disassemble Java bytecode to Smali. apktool allows you to reassemble the package, which is useful for patching and applying changes to e.g. the Android Manifest.
If you want to inspect the app's Smali code (instead of Java), you can simply [open your APK in Android Studio](https://developer.android.com/studio/debug/apk-debugger "Debug pre-built APKs") by clicking **Profile or debug APK** from the "Welcome screen" (even if you don't intend to debug it you can take a look at the smali code).
cpholguera marked this conversation as resolved.
Show resolved Hide resolved

Alternatively you can use [apktool](0x08-Testing-Tools.md#apktool) to extract and disassemble resources directly from the APK archive and disassemble Java bytecode to Smali. apktool allows you to reassemble the package, which is useful for [patching](#patching-repackaging-and-re-signing) the app or applying changes to e.g. the Android Manifest.
cpholguera marked this conversation as resolved.
Show resolved Hide resolved

**Java Decompiled Code:**

If you want to look directly into Java source code on a GUI, simply open your APK using [jadx](0x08-Testing-Tools.md#jadx) or [Bytecode Viewer](0x08-Testing-Tools.md#bytecode-viewer).

Expand Down Expand Up @@ -299,7 +303,7 @@ Remember that in most of the cases, just using static analysis will not be enoug

#### Reviewing Decompiled Java Code

Following the example from "Decompiling Java Code", we assume that you've successfully decompiled and opened the crackme app in IntelliJ. As soon as IntelliJ has indexed the code, you can browse it just like you'd browse any other Java project. Note that many of the decompiled packages, classes, and methods have weird one-letter names; this is because the bytecode has been "minified" with ProGuard at build time. This is a basic type of obfuscation that makes the bytecode a little more difficult to read, but with a fairly simple app like this one, it won't cause you much of a headache. When you're analyzing a more complex app, however, it can get quite annoying.
Following the example from ["Decompiling Java Code"](#decompiling-java-code), we assume that you've successfully decompiled and opened the crackme app in IntelliJ. As soon as IntelliJ has indexed the code, you can browse it just like you'd browse any other Java project. Note that many of the decompiled packages, classes, and methods have weird one-letter names; this is because the bytecode has been "minified" with ProGuard at build time. This is a basic type of [obfuscation](0x04c-Tampering-and-Reverse-Engineering.md#obfuscation) that makes the bytecode a little more difficult to read, but with a fairly simple app like this one, it won't cause you much of a headache. When you're analyzing a more complex app, however, it can get quite annoying.

When analyzing obfuscated code, annotating class names, method names, and other identifiers as you go along is a good practice. Open the `MainActivity` class in the package `sg.vantagepoint.uncrackable1`. The method `verify` is called when you tap the "verify" button. This method passes the user input to a static method called `a.a`, which returns a boolean value. It seems plausible that `a.a` verifies user input, so we'll refactor the code to reflect this.

Expand Down
Loading