4_Understanding_Algorithms/index.html

<!DOCTYPE html>
<html lang="en">
<head>
<title>4 Understanding Algorithms</title>
<meta charset="utf-8">
<link rel="stylesheet" href="../tabs.css">
</head><body><h1>4 Understanding Algorithms</h1><a href="../index.html">Back to index</a><br><br><div id="container"><input id="tab-1" type="radio" name="tab-group" checked="checked" />
<label for="tab-1">common</label><input id="tab-2" type="radio" name="tab-group" />
<label for="tab-2">ida</label><input id="tab-3" type="radio" name="tab-group" />
<label for="tab-3">radare2</label><div id="content"><div id="content-1"><body>
<h1 id="understanding-algorithms">Understanding Algorithms</h1>
<p>When analyzing binaries, a very important thing is to know what the code does. For this, you need to load the binary in your preferred analyzation tool and look at the disassembled code. You could also use the integrated debugger, but it should not be needed, since the exercises are not too hard to understand at all.</p>
<h2 id="exercise-1">Exercise 1</h2>
<p>In this exercise, you have to analyze 3 binary files. Try to figure out what the containing function does and for what platform it is for. Also point out the differences.</p>
<p>Hint: they are libraries and the function-header is similar to <code>int func(int x, int y)</code></p>
<p><a href="ex1.zip">download binary (zip)</a></p>
<h2 id="exercise-2">Exercise 2</h2>
<p>Now we have not only a function, but a program to deal with.</p>
<p>The usage of the program is <code>./ex2 string1 string2</code></p>
<p><a href="ex2.zip">download binary (zip)</a></p>
<h2 id="exercise-3">Exercise 3</h2>
<p>This is now a more complex program with some function-calls from libraries and from the program itself.</p>
<p>Hint: renaming items is very helpful</p>
<p><a href="ex3.zip">download binary (zip)</a></p>
</div><div id="content-2"><body>
<h1 id="solutions-for-ida-pro">Solutions for IDA Pro</h1>
<p>When opening a binary in IDA Pro, we get a dialog with some settings. In our examples, we can just load the binary with the standard-setting.</p>
<h2 id="exercise-1">Exercise 1</h2>
<p>Since .so files are dynamic libraries, we check what functions are exported. Some of them are standard ones, so we may ignore it for this exercise.</p>
<h3 id="ex1_1.so">ex1_1.so</h3>
<p>Not supported (64 bit binary)</p>
<h3 id="ex1_2.so">ex1_2.so</h3>
<p>Exported functions:</p>
<p><img src="ex1_2_exports.jpg" alt="ex1_2.so exports" /></p>
<p>Functions in the binary:</p>
<p><img src="ex1_2_functions.jpg" alt="ex1_2.so functions" /></p>
<p>So it seems, that <code>func</code> is the function we are looking for. Let's view what happens in there:</p>
<p><img src="ex1_2_func.jpg" alt="ex1_2.so func" /></p>
<h3 id="ex1_3.so">ex1_3.so</h3>
<p>Exported functions:</p>
<p><img src="ex1_3_exports.jpg" alt="ex1_3.so exports" /></p>
<p><code>func</code> disassembly:</p>
<p><img src="ex1_3_func.jpg" alt="ex1_3.so func" /></p>
<h3 id="ex1_4.so">ex1_4.so</h3>
<p>Not supported (64 bit binary)</p>
<h3 id="interpretation">Interpretation</h3>
<p>If we look at the disassembled code, we see that they follow a similar structure:</p>
<ol>
<li>base- and stack-pointer preparation</li>
<li>load a value into eax</li>
<li>compare eax with another value</li>
<li>conditional move (<code>cmovge</code> - Move if Greater Than or Equal to Zero)</li>
<li>restore base-pointer and return</li>
</ol>
<p>A hint what is going on here are the <a href="https://en.wikipedia.org/wiki/X86_calling_conventions">calling conventions</a> for the different binaries (only the relevant are listed):</p>
<table>
<thead>
<tr class="header">
<th>Architecture</th>
<th>OS</th>
<th>Parameters in registers</th>
<th>Parameter order on stack</th>
</tr>
</thead>
<tbody>
<tr class="odd">
<td>IA-32</td>
<td>linux</td>
<td></td>
<td>right to left</td>
</tr>
<tr class="even">
<td>IA-32</td>
<td>windows</td>
<td></td>
<td>right to left</td>
</tr>
</tbody>
</table>
<p>The return value is in the <code>eax</code> register.</p>
<p>Now we see that the 2 function-parameters from <code>int func(int, int)</code> are either on the stack or in registers (depending on the architecture/OS). Since the difference in the code is only the copying of these parameters on the stack and the logic afterwards is the same we may assume that the base-logic of these 4 functions is the same.<br />
When we now look at the 2 lines <code>mov eax, ...</code> and <code>cmp dword ...</code>, we see that this is a comparison between the 2 parameters of the function. When the value of the local value is greater then the one already in <code>eax</code>, the instruction <code>cmovge</code> puts this local value on <code>eax</code>, which is also the return value.</p>
<p>So basically, a C-like code of this function would look like:</p>
<pre><code>int func(int a, int b)
{
    if(a &gt; b)
    {
        a = b;
    }
    return a;
}</code></pre>
<p>or shorter:</p>
<pre><code>int func(int a, int b)
{
    return (a &gt; b) ? a : b;
}</code></pre>
<h2 id="exercise-2">Exercise 2</h2>
<p>After loading the binary, we see that there is automatically a graph generated starting at the main-function:</p>
<p><img src="ex2_graph.jpg" alt="ex2 graph" /></p>
<p>But just to be sure, that this is the only interesting part in the binary, we check the functions:</p>
<p><img src="ex2_functions.jpg" alt="ex2 functions" /></p>
<p>Ok, it really seems like that we only have to deal with the main-function here.</p>
<p>Now let's see what happens here, so first we take a look st the starting block. We know, that the program takes 2 strings as input. We also know, that the entry-point of C-programs is <code>int main(int argc, char* argv[])</code>, so by looking at the code, we can assume that the 2 arguments are <code>argc</code> and <code>argv</code>, and therefore we can rename them by right clicking and select 'rename' (or select the argument and press 'n')</p>
<p><img src="ex2_block1.jpg" alt="ex2 block1" /></p>
<p>So at after dealing with stack- and base-pointer, there is a check, if there are exactly 3 parameters, and if not, we set <code>eax</code> to <code>-1</code> and exit the program.</p>
<p><img src="ex2_block2.jpg" alt="ex2 block2" /></p>
<p><img src="ex2_block3.jpg" alt="ex2 block3" /></p>
<p>Now we take a look to the other blocks. There are 3 blocks which set the exit-code to 0, 1 and 2.</p>
<p><img src="ex2_block5.jpg" alt="ex2 block5" /></p>
<p>And we also have some sort of counter (<code>var_4</code>):</p>
<p><img src="ex2_block6.jpg" alt="ex2 block6" /></p>
<p>The interesting part lies in the 4 similar looking blocks:</p>
<p><img src="ex2_block4.jpg" alt="ex2 block4" /></p>
<p>Which is equally to the c-code <code>if (argv[2][counter]==0)</code> or <code>argv[1][counter]==0</code> for some of the other similar blocks).</p>
<p>Using this we can step through the graph to rebuild a C-code:</p>
<pre><code>int main(int argc, char* argv[])
{
    int counter = 0;
    if(argc == 3)
    {
start:
        if (argv[1][counter]==0)
        {
            if (argv[2][counter]==0)
            {
                return 0;
            }
            else
            {
                counter++;
                goto start;
            }
        }
        else
        {
            if (argv[2][counter]==0)
            {
                return 1;
            }
            else
            {
                if (argv[1][counter]==0)
                {
                    return 2;
                }
                else
                {
                    counter++;
                    goto start;
                }
                
            }
        }

    }
    return -1;
}
</code></pre>
<p>after some cleaning-up we get:</p>
<pre><code>int main(int argc, char* argv[])
{
    int counter = 0;
    if(argc == 3)
    {
        while(1)
        {
            if (argv[1][counter]==0)
            {
                if (argv[2][counter]==0)
                {
                    return 0;
                }
            }
            else
            {
                if (argv[2][counter]==0)
                {
                    return 1;
                }
                else
                {
                    if (argv[1][counter]==0)
                    {
                        return 2;
                    }
                }
            }
        }

    }
    return -1;
}
</code></pre>
<p>So this program returns which of the 2 given strings is smaller (or <code>0</code> if they are equally long) and <code>-1</code> when the wrong amount of arguments are provided.</p>
<h2 id="exercise-3">Exercise 3</h2>
<p>Again, first we check the functions of this binary:</p>
<p><img src="ex3_functions.jpg" alt="ex3 functions" /></p>
<p>Well, here we have some new interesting sounding functions:</p>
<ul>
<li><code>swap</code></li>
<li><code>getLetterCount</code></li>
<li><code>getMOLetter</code></li>
</ul>
<p>Next we take a look at the graph:</p>
<p><img src="ex3_graph.jpg" alt="ex3 graph" /></p>
<p>Now we have calls to additional functions, some of them are c-functions (<code>_printf</code> and <code>_strlen</code>) and some are in the same binary (<code>getMOLetter</code> and <code>swap</code>)</p>
<p>The strings of the <code>printf</code> function are giving us very good hints what is going on.<br />
First, we have a help-text to display the correct usage of the program. Then we will display the text 'Reversed: ' followed by a string and then we will display the letter with the most occurrences.</p>
<p>Similar to the second exercise, we have some local variables, which we can try to figure out what their purpose is:</p>
<ul>
<li><code>local_4</code> is set to the result of strlen, so it should be better named length</li>
<li><code>local_3</code> is initialized with 0, gets increased by 1, and gets compared with <code>length/2</code>, so that must be a counter of something like a for-loop</li>
<li><code>local_2</code> is not clear, but since it is at the clean-up part of the program, we can ignore it (seems to be used to save ecx, ebx and ebp to ebp-0x8, ebp-0x4, ebp)</li>
</ul>
<p>We might also notice, that <code>ebx</code> is used quiet often, and is only set once in the beginning to the address of <code>argc</code></p>
<p>Also, we have some code-blocks with</p>
<pre><code>mov eax, dword [ebx + 4]
add eax, 4
mov register, dword [eax]</code></pre>
<p>which is like <code>register = argv[1]</code></p>
<p>So what is the main-function doing now?</p>
<ul>
<li>At the beginning, <code>argc</code> is compared with <code>2</code>, and if it fails, a help-text about the program usage is displayed and the program quits.</li>
<li>Then, we initialize <code>length</code> with <code>strlen(argv[1])</code> and <code>counter</code> with <code>0</code>.</li>
<li>Afterwards we jump to a comparison if <code>counter</code> is smaller than <code>length/2</code>, and if it succeed we call <code>swap</code> with <code>argv[1][counter]</code> and <code>argv[1][length-1-counter]</code>. From the function name, we can guess that the variables just get swapped in place, since we pass the addresses of the variables and not the value.</li>
<li>When the loop is finished, we print <code>argv[1]</code>, which is now reversed.</li>
<li>Next, <code>getMOLetter</code> is called and the result is displayed with the format-text <code>Letter with the most occurrency is: %c</code>. This may give us the conclusion, that <code>getMOLetter</code> is returning the character of the letter with the highest occurrence.</li>
</ul>
<p>So, with this, the task of this exercise is completed. We could also analyze the functions itself, but it is not really necessary, because we already know what is going on. However, when we inspect some malware or binaries with some obfuscation, we would still hate to take a look inside of these content, because we can never be sure that the functions do what the name says.</p>
</div><div id="content-3"><body>
<h1 id="solutions-for-radare2">Solutions for Radare2</h1>
<p>To understand what a binary does, we have to load it into Radare2. In this exercised, we don't need debugging yet, so the command is just a simple <code>r2 binary</code></p>
<p>The first thing to do will always be to look at the properties of the binary using the command</p>
<p><code>i</code> - information</p>
<p>There we can see what platform and architecture this binary is for and also some other information (programming language, security-flags, ...)</p>
<p>Another important command is to analyze the binary. (note: This could also be done with the command line parameter <code>r2 binary -A</code>)</p>
<p><code>aa</code> - analyze all</p>
<h2 id="exercise-1">Exercise 1</h2>
<p>First we look at the properties of the binaries (using the <code>i</code> command):</p>
<table>
<thead>
<tr class="header">
<th>File</th>
<th>Bits</th>
<th>Architecture</th>
<th>OS</th>
</tr>
</thead>
<tbody>
<tr class="odd">
<td>ex1_1.so</td>
<td>64</td>
<td>AMD x86-64</td>
<td>linux</td>
</tr>
<tr class="even">
<td>ex1_2.so</td>
<td>32</td>
<td>Intel 80386</td>
<td>linux</td>
</tr>
<tr class="odd">
<td>ex1_3.so</td>
<td>32</td>
<td>i386</td>
<td>windows</td>
</tr>
<tr class="even">
<td>ex1_4.so</td>
<td>64</td>
<td>AMD 64</td>
<td>windows</td>
</tr>
</tbody>
</table>
<p>Here we see, that this are Windows and Linux binaries compiled for 32 and 64bit architectures.</p>
<p>The next step is to look at the function <code>func</code>. To achieve this, we need to analyze the code with <code>aa</code> and then look for our function-name in the list given by <strong>a</strong>nalyze <strong>f</strong>unction <strong>l</strong>ist <code>afl</code></p>
<p>Since this list could be rather long, we can use the grep-command to find the function:</p>
<p><code>afl~func</code></p>
<p>To print the code of the function, we use <strong>p</strong>rint <strong>d</strong>isassembled <strong>f</strong>unction with the found function name:</p>
<p><code>pdf @ functionname</code></p>
<h3 id="ex1_1.so">ex1_1.so</h3>
<p><code>sym.func</code> seems to be the right one.</p>
<p>so, <code>pdf @ sym.func</code> gives us:</p>
<pre><code>/ (fcn) sym.func 22
|          ; arg int arg_2_3      @ rbp+0x13
|          ; var int local_0_4    @ rbp-0x4
|          ; var int local_1      @ rbp-0x8
|          ;-- sym.func:
|          0x00000620    55             push rbp
|          0x00000621    4889e5         mov rbp, rsp
|          0x00000624    897dfc         mov dword [rbp-local_0_4], edi
|          0x00000627    8975f8         mov dword [rbp-local_1], esi
|          0x0000062a    8b45fc         mov eax, dword [rbp-local_0_4]
|          0x0000062d    3945f8         cmp dword [rbp-local_1], eax    ; [0x13:4]=256
|          0x00000630    0f4d45f8       cmovge eax, dword [rbp-local_1]
|          0x00000634    5d             pop rbp
\          0x00000635    c3             ret</code></pre>
<h3 id="ex1_2.so">ex1_2.so</h3>
<p>again, <code>sym.func</code> is the function we want:</p>
<pre><code>/ (fcn) sym.func 15
|          ; arg int arg_2        @ ebp+0x8
|          ; arg int arg_3        @ ebp+0xc
|          ; arg int arg_4_3      @ ebp+0x13
|          ;-- sym.func:
|          0x000004b0    55             push ebp
|          0x000004b1    89e5           mov ebp, esp
|          0x000004b3    8b4508         mov eax, dword [ebp + 8]        ; [0x8:4]=0
|          0x000004b6    39450c         cmp dword [ebp + 0xc], eax      ; [0x13:4]=256
|          0x000004b9    0f4d450c       cmovge eax, dword [ebp + 0xc]   ; [0xc:4]=0
|          0x000004bd    5d             pop ebp
\          0x000004be    c3             ret</code></pre>
<h3 id="ex1_3.so">ex1_3.so</h3>
<p>Now, we have a different function name (<code>sym.ex1_3.so_func</code>)</p>
<pre><code>/ (fcn) sym.ex1_3.so_func 15
|          ; arg int arg_2        @ ebp+0x8
|          ; arg int arg_3        @ ebp+0xc
|          ; arg int arg_4_3      @ ebp+0x13
|          ;-- sym.ex1_3.so_func:
|          0x677c14b0    55             push ebp
|          0x677c14b1    89e5           mov ebp, esp
|          0x677c14b3    8b4508         mov eax, dword [ebp + 8]        ; [0x8:4]=4
|          0x677c14b6    39450c         cmp dword [ebp + 0xc], eax      ; [0x13:4]=0
|          0x677c14b9    0f4d450c       cmovge eax, dword [ebp + 0xc]   ; [0xc:4]=0xffff 
|          0x677c14bd    5d             pop ebp
\          0x677c14be    c3             ret</code></pre>
<h3 id="ex1_4.so">ex1_4.so</h3>
<p>Here the name is similar to the one above (<code>sym.ex1_4.so_func</code>)</p>
<pre><code>/ (fcn) sym.ex1_4.so_func 22
|          ; arg int arg_2        @ rbp+0x10
|          ; arg int arg_2_3      @ rbp+0x13
|          ; arg int arg_3        @ rbp+0x18
|          ;-- sym.ex1_4.so_func:
|          0x65dc1420    55             push rbp
|          0x65dc1421    4889e5         mov rbp, rsp
|          0x65dc1424    894d10         mov dword [rbp + 0x10], ecx     ; [0x10:4]=184
|          0x65dc1427    895518         mov dword [rbp + 0x18], edx     ; [0x18:4]=64
|          0x65dc142a    8b4510         mov eax, dword [rbp + 0x10]     ; [0x10:4]=184
|          0x65dc142d    394518         cmp dword [rbp + 0x18], eax     ; [0x13:4]=0
|          0x65dc1430    0f4d4518       cmovge eax, dword [rbp + 0x18]  ; [0x18:4]=64
|          0x65dc1434    5d             pop rbp
\          0x65dc1435    c3             ret
</code></pre>
<h3 id="interpretation">Interpretation</h3>
<p>If we look at the disassembled code, we see that they follow a similar structure:</p>
<ol>
<li>base- and stack-pointer preparation</li>
<li>64-bit only: copy registers into local variables on stack</li>
<li>load a value into eax</li>
<li>compare eax with another value</li>
<li>conditional move (<code>cmovge</code> - Move if Greater Than or Equal to Zero)</li>
<li>restore base-pointer and return</li>
</ol>
<p>A hint what is going on here are the <a href="https://en.wikipedia.org/wiki/X86_calling_conventions">calling conventions</a> for the different binaries (only the relevant are listed):</p>
<table>
<thead>
<tr class="header">
<th>Architecture</th>
<th>OS</th>
<th>Parameters in registers</th>
<th>Parameter order on stack</th>
</tr>
</thead>
<tbody>
<tr class="odd">
<td>IA-32</td>
<td>linux</td>
<td></td>
<td>right to left</td>
</tr>
<tr class="even">
<td>IA-32</td>
<td>windows</td>
<td></td>
<td>right to left</td>
</tr>
<tr class="odd">
<td>x86-64</td>
<td>linux</td>
<td>RDI, RSI, RDX, RCX, R8, R9, XMM0–7</td>
<td>right to left</td>
</tr>
<tr class="even">
<td>x86-64</td>
<td>windows</td>
<td>RCX/XMM0, RDX/XMM1, R8/XMM2, R9/XMM3</td>
<td>right to left</td>
</tr>
</tbody>
</table>
<p>The return value is in the <code>eax</code> register.</p>
<p>Now we see that the 2 function-parameters from <code>int func(int, int)</code> are either on the stack or in registers (depending on the architecture/OS). Since the difference in the code is only the copying of these parameters on the stack and the logic afterwards is the same we may assume that the base-logic of these 4 functions is the same.<br />
When we now look at the 2 lines <code>mov eax, ...</code> and <code>cmp dword ...</code>, we see that this is a comparison between the 2 parameters of the function. When the value of the local value is greater than the one already in <code>eax</code>, the instruction <code>cmovge</code> puts this local value on <code>eax</code>, which is also the return value.</p>
<p>So basically, a C-like code of this function would look like:</p>
<pre><code>int func(int a, int b)
{
    if(a &gt; b)
    {
        a = b;
    }
    return a;
}</code></pre>
<p>or shorter:</p>
<pre><code>int func(int a, int b)
{
    return (a &gt; b) ? a : b;
}</code></pre>
<h2 id="exercise-2">Exercise 2</h2>
<p>Now we have another program fo inspect. First, we start Radare2 with <code>r2 ex2 -A</code> and look at the properties with <code>i</code>:</p>
<pre><code>type     EXEC (Executable file)
file     ex2
fd       6
size     0x1334
blksz    0x0
mode     r--
block    0x100
format   elf
pic      false
canary   false
nx       true
crypto   false
va       true
bintype  elf
class    ELF32
lang     c
arch     x86
bits     32
machine  Intel 80386
os       linux
subsys   linux
endian   little
stripped false
static   false
linenum  true
lsyms    true
relocs   true
rpath    NONE
binsz    3673</code></pre>
<p>Here we see, that the binary is a 32-bit Linux binary.</p>
<p>For the next step, let's see the functions (<code>afl</code>):</p>
<pre><code>0x080482c0  34  1  entry0
0x080482a0  6  1  sym.imp.__libc_start_main
0x080482a6  10  2  fcn.080482a6
0x08048290  12  1  section..plt
0x0804829c  10  1  sub.__libc_start_main_192_29c
0x080482b0  6  1  section..plt.got
0x080482b8  42  1  section_end..plt.got
0x080482f0  4  1  sym.__x86.get_pc_thunk.bx
0x08048300  43  4  sym.deregister_tm_clones
0x0804832c  57  4  fcn.0804832c
0x08048365  41  3  fcn.08048365
0x08048390  43  8  sym.frame_dummy
0x080483bb  135  11  sym.main
0x08048450  93  4  sym.__libc_csu_init
0x080484ad  5  1  fcn.080484ad
0x080484b2  22  1  section_end..text
0x0804826c  35  3  section..init
0x0804828f  13  1  section_end..init</code></pre>
<p>Also let's see where our entry-point is with <code>iM</code>.</p>
<pre><code>[Main]
vaddr=0x080483bb paddr=0x000003bb</code></pre>
<p>which is <code>sym.main</code></p>
<p>Since this names gives us no additional info what is going on in the binary we take a look at the main-function (<code>pdf @ sym.main</code>):</p>
<pre><code>/ (fcn) sym.main 135
|           ; arg int arg_0_3      @ ebp+0x3
|           ; arg int arg_3        @ ebp+0xc
|           ; var int local_1      @ ebp-0x4
|           ; DATA XREF from 0x080482d7 (entry0)
|           ;-- main:
|           ;-- sym.main:
|           0x080483bb    55             push ebp
|           0x080483bc    89e5           mov ebp, esp
|           0x080483be    83ec10         sub esp, 0x10
|           0x080483c1    c745fc000000.  mov dword [ebp-local_1], 0
|           0x080483c8    837d0803       cmp dword [ebp + 8], 3         ; [0x3:4]=0x1010146 
|       ,=&lt; 0x080483cc    7407           je 0x80483d5                 
|       |   0x080483ce    b8ffffffff     mov eax, sym.imp.__gmon_start__ ; sym.imp.__gmon_start__
|      ,==&lt; 0x080483d3    eb6b           jmp 0x8048440                
|      ||   ; JMP XREF from 0x080483cc (sym.main)
|      ||   ; JMP XREF from 0x0804843e (sym.main)
|      |`-&gt; 0x080483d5    8b450c         mov eax, dword [ebp + 0xc]     ; [0xc:4]=0
|      |    0x080483d8    83c004         add eax, 4
|      |    0x080483db    8b10           mov edx, dword [eax]
|      |    0x080483dd    8b45fc         mov eax, dword [ebp-local_1]
|      |    0x080483e0    01d0           add eax, edx
|      |    0x080483e2    0fb600         movzx eax, byte [eax]
|      |    0x080483e5    84c0           test al, al
|     ,===&lt; 0x080483e7    751b           jne 0x8048404                
|     ||    0x080483e9    8b450c         mov eax, dword [ebp + 0xc]     ; [0xc:4]=0
|     ||    0x080483ec    83c008         add eax, 8
|     ||    0x080483ef    8b10           mov edx, dword [eax]
|     ||    0x080483f1    8b45fc         mov eax, dword [ebp-local_1]
|     ||    0x080483f4    01d0           add eax, edx
|     ||    0x080483f6    0fb600         movzx eax, byte [eax]
|     ||    0x080483f9    84c0           test al, al
|    ,====&lt; 0x080483fb    7507           jne 0x8048404                
|    |||    0x080483fd    b800000000     mov eax, 0
|   ,=====&lt; 0x08048402    eb3c           jmp 0x8048440                
|   |||     ; JMP XREF from 0x080483e7 (sym.main)
|   |||     ; JMP XREF from 0x080483fb (sym.main)
|   |``---&gt; 0x08048404    8b450c         mov eax, dword [ebp + 0xc]     ; [0xc:4]=0
|   |  |    0x08048407    83c008         add eax, 8
|   |  |    0x0804840a    8b10           mov edx, dword [eax]
|   |  |    0x0804840c    8b45fc         mov eax, dword [ebp-local_1]
|   |  |    0x0804840f    01d0           add eax, edx
|   |  |    0x08048411    0fb600         movzx eax, byte [eax]
|   |  |    0x08048414    84c0           test al, al
|  ,======&lt; 0x08048416    7507           jne 0x804841f                
|  ||  |    0x08048418    b801000000     mov eax, 1
| ,=======&lt; 0x0804841d    eb21           jmp 0x8048440                
| ||        ; JMP XREF from 0x08048416 (sym.main)
| |`------&gt; 0x0804841f    8b450c         mov eax, dword [ebp + 0xc]     ; [0xc:4]=0
| | |  |    0x08048422    83c004         add eax, 4
| | |  |    0x08048425    8b10           mov edx, dword [eax]
| | |  |    0x08048427    8b45fc         mov eax, dword [ebp-local_1]
| | |  |    0x0804842a    01d0           add eax, edx
| | |  |    0x0804842c    0fb600         movzx eax, byte [eax]
| | |  |    0x0804842f    84c0           test al, al
| ========&lt; 0x08048431    7507           jne 0x804843a                
| | |  |    0x08048433    b802000000     mov eax, 2
| ========&lt; 0x08048438    eb06           jmp 0x8048440                
|           ; JMP XREF from 0x08048431 (sym.main)
| --------&gt; 0x0804843a    8345fc01       add dword [ebp-local_1], 1
| | |  |    0x0804843e    eb95           jmp 0x80483d5                
| | |  |    ; JMP XREF from 0x08048438 (sym.main)
| | |  |    ; JMP XREF from 0x0804841d (sym.main)
| | |  |    ; JMP XREF from 0x08048402 (sym.main)
| | |  |    ; JMP XREF from 0x080483d3 (sym.main)
| `-`--`--&gt; 0x08048440    c9             leave
\           0x08048441    c3             ret</code></pre>
<p>What we see here is that there are quite a few jumps in the function, but no call to another function.</p>
<p>We can now try to figure out what is going on here, but to make this easier for us we can use a different display mode to view the function.</p>
<p>First, we change into the visual mode with <code>V</code>. Now we can change the display mode to see the disassembly (cycle through the modes with <code>p</code> or <code>P</code>). In this mode, we can scroll through the listing with the cursor keys. We can now either find <code>sym.main</code> by our self or just jump to it with (press <code>v</code> to list the functions and select <code>sym.main</code> with the cursor. Then press <code>g</code> to jump to it)</p>
<p>At the beginning of the function, we see a listing of arguments given to the function and local variables.</p>
<pre><code>|           ; arg int arg_0_3      @ ebp+0x3
|           ; arg int arg_3        @ ebp+0xc
|           ; var int local_1      @ ebp-0x4</code></pre>
<p>When we look at <code>local_1</code>, we see that it is incremented at the bottom before we jump back to the beginning. else it is only used to load the variable into <code>eax</code>. So <code>local_1</code> will most like be a counter and we want to rename it for more comfortable reading.</p>
<p>To achieve this, we press <code>:</code>, so we can get a command line and then enter <code>afvn local_1 counter</code>. Now we can see <code>counter</code> instead of <code>local_1</code> in the listing of the function.</p>
<p>The two arguments are not very helpful here, because they do not occur in the listing. But since we know that the program usage is <code>./ex2 string1 string2</code>, we have 2 arguments (<code>int argc</code> and <code>char *argv[]</code>)</p>
<p>Now we look at the stack-layout of x86 c programs:</p>
<table>
<thead>
<tr class="header">
<th>Address</th>
<th>Content</th>
</tr>
</thead>
<tbody>
<tr class="odd">
<td>...</td>
<td>...</td>
</tr>
<tr class="even">
<td>ebp + 0xC</td>
<td>argument 2</td>
</tr>
<tr class="odd">
<td>ebp + 0x8</td>
<td>argument 1</td>
</tr>
<tr class="even">
<td>ebp + 0x4</td>
<td>return address</td>
</tr>
<tr class="odd">
<td>ebp</td>
<td>saved ebp</td>
</tr>
<tr class="even">
<td>ebp - 0x4</td>
<td>local var 1</td>
</tr>
<tr class="odd">
<td>ebp - 0x8</td>
<td>local var 2</td>
</tr>
<tr class="even">
<td>...</td>
<td>...</td>
</tr>
</tbody>
</table>
<p><code>int argc</code> would be <code>[ebp + 8]</code> and <code>char *argv[]</code> would be <code>[ebp + C]</code></p>
<p>A good way to get an overview of the function is to enter the graph mode by pressing <code>V</code> in the Visual mode.</p>
<p>The basic usage in this mode is:</p>
<p><code>w</code>/<code>s</code>/<code>a</code>/<code>d</code> navigate (faster with <code>shift</code>)<br />
<code>cursor</code> move node (faster with <code>shift</code>)<br />
<code>t</code>/<code>f</code>/<code>u</code> walk around the nodes</p>
<p>The graph will look like this (sorry, no color here -&gt; just use radare2):</p>
<pre><code>     =-------------------------=
     |    0x080483bb           |
     | push ebp                |
     | mov ebp, esp            |
     | sub esp, 0x10           |
     | mov dword [ebp - 4], 0  |
     | cmp dword [ebp + 8], 3  |
     | je 0x80483d5            |
     =-------------------------=
          t f
          | |
          | &#39;---------.
          |           |
          |    =---------------------------------=
          |    | -[ 0x080483ce ]-                |
          |    | mov eax, sym.imp.__gmon_start__ |
          |    | jmp 0x8048440                   |
          |    =---------------------------------=
          |       v
          |       |
        .---------&#39;
        |-------------------------------------------------------------.
     =----------------=    |       =----------------------------=     |
     |    0x08048440  |    |       |    0x080483e9     |        |     |
     | leave          |    |       | mov eax, dword [ebp + 0xc] |     |
     | ret|           |    |       | add eax, 8        |        |     |
     =----------------=    |       | mov edx, dword [eax]       |     |
          |                |       | mov eax, dword [ebp - 4]   |     |
          |                |       | add eax, edx      |        |     |
          |                |       | movzx eax, byte [eax]      |     |
          |                |       | test al, al       |        |     |
          |                |       | jne 0x8048404     |        |     |
          |                |       =----------------------------=     |
          |                |            t f            |              |
          |                |            | |            |              |
          |         .-------------------&#39; &#39;---------------------------|-----------------------------------------------------------------.
          |    .----|      |             .-------------|------. .-----|-----------------------------------.                             |
          |    =----------------------------=          | =----------------------------=            =----------------=            =----------------=
          |    |    0x08048404           |  |          | |    0x0804841f              |            |    0x08048418  |            |    0x080483fd  |
          |    | mov eax, dword [ebp + 0xc] |          | | mov eax, dword [ebp + 0xc] |            | mov eax, 1     |            | mov eax, 0     |
          |    | add eax, 8|             |  |          | | add eax, 4 |               |            | jmp 0x8048440  |            | jmp 0x8048440  |
          |    | mov edx, dword [eax]    |  |          | | mov edx, dword [eax]       |            =----------------=            =----------------=
          |    | mov eax, dword [ebp - 4]|  |          | | mov eax, dword [ebp - 4]   |               v                             v
          |    | add eax, edx            |  |          `-|-add eax, edx---------------|---------------------------------------------&#39;
          |    | movzx eax, byte [eax]   |  |            | movzx eax, byte [eax]      |
          |    | test al, al             |  |            | test al, al                |
          |    | jne 0x804841f           |  |            | jne 0x804843a              |
          |    =----------------------------=            =----------------------------=
          |    |    t f    |             |                    t f
          |    |    `-`----|----------------------------------|-|
          .----------------|----------------------.-------------&#39;
          |    |           |                      |
     =------------------------=            =----------------=
     |    0x0804843a       |  |            |    0x08048433  |
     | add dword [ebp - 4], 1 |            | mov eax, 2     |
     | jmp 0x80483d5       |  |            | jmp 0x8048440  |
     =------------------------=            =----------------=
        v |    |           |                  v
        | |    |           `------------------&#39;
        | |    |           |
        | |    |           |
     =----------------------------=
     |    0x080483d5       |      |
     | mov eax, dword [ebp + 0xc] |
     | add eax, 4          |      |
     | mov edx, dword [eax]|      |
     | mov eax, dword [ebp - 4]   |
     | add eax, edx        |      |
     | movzx eax, byte [eax]      |
     | test al, al         |      |
     | jne 0x8048404       |      |
     =----------------------------=
          t f  |           |
          `-`--------------&#39;
</code></pre>
<p>With this, we may now try to figure out what this program does.</p>
<p>Let's start with the first node</p>
<pre><code>=-------------------------=
| -[ 0x080483bb ]-        |
| push ebp                |
| mov ebp, esp            |
| sub esp, 0x10           |
| mov dword [ebp - 4], 0  |
| cmp dword [ebp + 8], 3  |
| je 0x80483d5            |
=-------------------------=</code></pre>
<p>Here we first have the usual <code>epb</code>/<code>esp</code> stuff, which is done automatically in C. Then <code>counter</code> is initialized with 0 and <code>argc</code> is compared with 3. When we now follow the jump when <code>argc</code> is not 3 (press <code>f</code>), we see that the program quits.</p>
<p>The code-part <code>mov eax, sym.imp.__gmon_start__</code> may be confusing, but looking at the instruction, we see that <code>b8ffffffff</code> means <code>mov eax, 0xffffffff</code>, which is <code>-1</code>, because we return an integer.</p>
<p>So, this would be identical to the following C-snippet (assuming the counter is a 32-bit integer):</p>
<pre><code>int main(int argc, char* argv[])
{
    int counter = 0;
    if(argc == 3)
    {
        ...
    }
    return;
}
</code></pre>
<p>Now we continue with the next node after <code>argc</code> is true:</p>
<pre><code>=----------------------------=
| -[ 0x080483d5 ]-           |   
| mov eax, dword [ebp + 0xc] |
| add eax, 4                 |
| mov edx, dword [eax]       |
| mov eax, dword [ebp - 4]   |
| add eax, edx               |
| movzx eax, byte [eax]      |
| test al, al                |
| jne 0x8048404              |
=----------------------------=</code></pre>
<p>We might notice, that there are 3 similar blocks like this (different <code>jne</code>-address and/or different addition to <code>eax</code> in the second line)</p>
<p>To make notes while analyzing the block we can make comments in the visual mode with <code>;</code></p>
<pre><code>mov eax, dword [ebp + 0xc]    ; [0xc:4]=0 ; load char* argv[]                                                                                                            
add eax, 4                    ; get the address of argv[1]                                                                                                                                                                                         
mov edx, dword [eax]          ; load argv[1]                                                                                                                              
mov eax, dword [ebp-local_1]  ; load i                                                                                                                                    
add eax, edx                  ; load the address of argv[1][i]                                                                                                            
movzx eax, byte [eax]         ; load the byte of argv[1][i] into eax                                                                                                      
test al, al                   ; check if zero                                                                                                                             
jne 0x8048404                 ;[3]                     </code></pre>
<p>Which is equally to the c-code <code>if (argv[1][i]==0)</code> or <code>argv[2][i]==0</code> for some of the other similar blocks).</p>
<p>Using this we can step through the graph to rebuild a c-code:</p>
<pre><code>int main(int argc, char* argv[])
{
    int counter = 0;
    if(argc == 3)
    {
start:
        if (argv[1][counter]==0)
        {
            if (argv[2][counter]==0)
            {
                return 0;
            }
            else
            {
                counter++;
                goto start;
            }
        }
        else
        {
            if (argv[2][counter]==0)
            {
                return 1;
            }
            else
            {
                if (argv[1][counter]==0)
                {
                    return 2;
                }
                else
                {
                    counter++;
                    goto start;
                }
                
            }
        }

    }
    return -1;
}
</code></pre>
<p>after some cleaning-up we get:</p>
<pre><code>int main(int argc, char* argv[])
{
    int counter = 0;
    if(argc == 3)
    {
        while(1)
        {
            if (argv[1][counter]==0)
            {
                if (argv[2][counter]==0)
                {
                    return 0;
                }
            }
            else
            {
                if (argv[2][counter]==0)
                {
                    return 1;
                }
                else
                {
                    if (argv[1][counter]==0)
                    {
                        return 2;
                    }
                }
            }
        }

    }
    return -1;
}
</code></pre>
<p>So this program returns which of the 2 given strings is smaller (or <code>0</code> if they are equally long) and <code>-1</code> when the wrong amount of arguments is provided.</p>
<h2 id="exercise-3">Exercise 3</h2>
<p><em>Please note: Since stepping through all of the instructions like in example 2, we try now a more abstract approach. Our main-goal is to figure out what is going on in the program, so it is sufficient that we only analyze the parts, which are unclear and need deeper inspection.</em></p>
<p>Again, we load the binary with <code>r2 ex3 -A</code> and look at the properties <code>i</code> and what the name of the main-function is.</p>
<p>It is a 32-bit Linux binary and the name of the main-function is again <code>sym.main</code></p>
<p>The function-listing (<code>afl</code>) is now:</p>
<pre><code>0x08048360  34  1  entry0
0x08048340  6  1  sym.imp.__libc_start_main
0x08048346  10  2  fcn.08048346
0x08048300  12  1  section..plt
0x0804830c  10  1  sub.printf_108_30c
0x08048316  10  1  fcn.08048316
0x08048320  6  1  sym.imp.tolower
0x08048326  10  1  fcn.08048326
0x08048330  6  1  sym.imp.strlen
0x08048336  10  1  fcn.08048336
0x08048350  6  1  section..plt.got
0x08048358  42  1  section_end..plt.got
0x08048390  4  1  sym.__x86.get_pc_thunk.bx
0x080483a0  43  4  sym.deregister_tm_clones
0x080483cc  57  4  fcn.080483cc
0x08048405  41  3  fcn.08048405
0x08048430  43  8  sym.frame_dummy
0x0804845b  38  1  sym.swap
0x08048481  92  6  sym.getLetterCount
0x080484dd  88  6  sym.getMOLetter
0x08048535  235  7  sym.main
0x08048620  93  4  sym.__libc_csu_init
0x0804867d  5  1  fcn.0804867d
0x08048682  22  1  section_end..text
0x080482d0  35  3  section..init
0x080482f3  25  1  section_end..init</code></pre>
<p>Well, here we have some new interesting sounding functions:</p>
<ul>
<li><code>sym.swap</code></li>
<li><code>sym.getLetterCount</code></li>
<li><code>sym.getMOLetter</code></li>
</ul>
<p>Now let's take a look at the main-function. using the commands <code>s sym.main</code> to search the function and <code>pdf</code> to display it:</p>
<pre><code>/ (fcn) sym.main 235
|          ; var int local_2      @ ebp-0x8
|          ; var int local_3      @ ebp-0xc
|          ; var int local_4      @ ebp-0x10
|          ; DATA XREF from 0x08048377 (entry0)
|          ;-- main:
|          ;-- sym.main:
|          0x08048535    8d4c2404       lea ecx, dword [esp + 4]        ; 0x4 
|          0x08048539    83e4f0         and esp, 0xfffffff0
|          0x0804853c    ff71fc         push dword [ecx - 4]
|          0x0804853f    55             push ebp
|          0x08048540    89e5           mov ebp, esp
|          0x08048542    53             push ebx
|          0x08048543    51             push ecx
|          0x08048544    83ec10         sub esp, 0x10
|          0x08048547    89cb           mov ebx, ecx
|          0x08048549    833b02         cmp dword [ebx], 2              ; [0x2:4]=0x101464c 
|      ,=&lt; 0x0804854c    7420           je 0x804856e                  
|      |   0x0804854e    8b4304         mov eax, dword [ebx + 4]        ; [0x4:4]=0x10101 
|      |   0x08048551    8b00           mov eax, dword [eax]
|      |   0x08048553    83ec08         sub esp, 8
|      |   0x08048556    50             push eax
|      |   0x08048557    68a0860408     push str.Program_usage:__s_string ; &quot;Program usage: %s string&quot; @ 0x80486a0
|      |   0x0804855c    e8affdffff     call sym.imp.printf             ; sub.printf_108_30c+0x4 ;sub.printf_108_30c(unk, unk) ; sym.imp.printf
|      |   0x08048561    83c410         add esp, 0x10
|      |   0x08048564    b8ffffffff     mov eax, sym.imp.__gmon_start__ ; sym.imp.__gmon_start__
|     ,==&lt; 0x08048569    e9a8000000     jmp 0x8048616                 
|     ||   ; JMP XREF from 0x0804854c (sym.main)
|     |`-&gt; 0x0804856e    8b4304         mov eax, dword [ebx + 4]        ; [0x4:4]=0x10101 
|     |    0x08048571    83c004         add eax, 4
|     |    0x08048574    8b00           mov eax, dword [eax]
|     |    0x08048576    83ec0c         sub esp, 0xc
|     |    0x08048579    50             push eax
|     |    0x0804857a    e8b1fdffff     call sym.imp.strlen ;sym.imp.strlen(unk)
|     |    0x0804857f    83c410         add esp, 0x10
|     |    0x08048582    8945f0         mov dword [ebp-local_4], eax
|     |    0x08048585    c745f4000000.  mov dword [ebp-local_3], 0
|    ,===&lt; 0x0804858c    eb31           jmp 0x80485bf                 
|          ; JMP XREF from 0x080485ce (sym.main)
|   .----&gt; 0x0804858e    8b4304         mov eax, dword [ebx + 4]        ; [0x4:4]=0x10101 
|   |||    0x08048591    83c004         add eax, 4
|   |||    0x08048594    8b00           mov eax, dword [eax]
|   |||    0x08048596    8b55f0         mov edx, dword [ebp-local_4]
|   |||    0x08048599    83ea01         sub edx, 1
|   |||    0x0804859c    2b55f4         sub edx, dword [ebp-local_3]
|   |||    0x0804859f    01c2           add edx, eax
|   |||    0x080485a1    8b4304         mov eax, dword [ebx + 4]        ; [0x4:4]=0x10101 
|   |||    0x080485a4    83c004         add eax, 4
|   |||    0x080485a7    8b08           mov ecx, dword [eax]
|   |||    0x080485a9    8b45f4         mov eax, dword [ebp-local_3]
|   |||    0x080485ac    01c8           add eax, ecx
|   |||    0x080485ae    83ec08         sub esp, 8
|   |||    0x080485b1    52             push edx
|   |||    0x080485b2    50             push eax
|   |||    0x080485b3    e8a3feffff     call sym.swap ;sym.swap(unk, unk)
|   |||    0x080485b8    83c410         add esp, 0x10
|   |||    0x080485bb    8345f401       add dword [ebp-local_3], 1
|   ||     ; JMP XREF from 0x0804858c (sym.main)
|   |`---&gt; 0x080485bf    8b45f0         mov eax, dword [ebp-local_4]
|   | |    0x080485c2    89c2           mov edx, eax
|   | |    0x080485c4    c1ea1f         shr edx, 0x1f
|   | |    0x080485c7    01d0           add eax, edx
|   | |    0x080485c9    d1f8           sar eax, 1
|   | |    0x080485cb    3b45f4         cmp eax, dword [ebp-local_3]
|   `====&lt; 0x080485ce    7fbe           jg 0x804858e                  
|     |    0x080485d0    8b4304         mov eax, dword [ebx + 4]        ; [0x4:4]=0x10101 
|     |    0x080485d3    83c004         add eax, 4
|     |    0x080485d6    8b00           mov eax, dword [eax]
|     |    0x080485d8    83ec08         sub esp, 8
|     |    0x080485db    50             push eax
|     |    0x080485dc    68b9860408     push str.Reversed:__s_n        ; &quot;Reversed: %s.&quot; @ 0x80486b9
|     |    0x080485e1    e82afdffff     call sym.imp.printf             ; sub.printf_108_30c+0x4 ;sub.printf_108_30c(unk, unk) ; sym.imp.printf
|     |    0x080485e6    83c410         add esp, 0x10
|     |    0x080485e9    8b4304         mov eax, dword [ebx + 4]        ; [0x4:4]=0x10101 
|     |    0x080485ec    83c004         add eax, 4
|     |    0x080485ef    8b00           mov eax, dword [eax]
|     |    0x080485f1    83ec0c         sub esp, 0xc
|     |    0x080485f4    50             push eax
|     |    0x080485f5    e8e3feffff     call sym.getMOLetter ;sym.getMOLetter(unk)
|     |    0x080485fa    83c410         add esp, 0x10
|     |    0x080485fd    0fbec0         movsx eax, al
|     |    0x08048600    83ec08         sub esp, 8
|     |    0x08048603    50             push eax
|     |    0x08048604    68c8860408     push str.Letter_with_the_most_occurrency_is:__c_n ; &quot;Letter with the most occurrency is: %c.&quot; @ 0x80486c8
|     |    0x08048609    e802fdffff     call sym.imp.printf             ; sub.printf_108_30c+0x4 ;sub.printf_108_30c(unk, unk) ; sym.imp.printf
|     |    0x0804860e    83c410         add esp, 0x10
|     |    0x08048611    b800000000     mov eax, 0
|     |    ; JMP XREF from 0x08048569 (sym.main)
|     `--&gt; 0x08048616    8d65f8         lea esp, dword [ebp-local_2]
|          0x08048619    59             pop ecx
|          0x0804861a    5b             pop ebx
|          0x0804861b    5d             pop ebp
|          0x0804861c    8d61fc         lea esp, dword [ecx - 4]
\          0x0804861f    c3             ret</code></pre>
<p>Now we have calls to additional functions, some of them are c-functions (<code>sym.imp.printf</code> and <code>sym.imp.strlen</code>) and some are in the same binary (<code>sym.getMOLetter</code> and <code>sym.swap</code>)</p>
<p>The strings of the <code>printf</code> function are giving us very good hints what is going on.<br />
First, we have a help-text to display the correct usage of the program. Then we will display the text 'Reversed: ' followed by a string and then we will display the letter with the most occurrences.</p>
<p>Similar to the second exercise, we have some local variables, which we can try to figure out what their purpose is:</p>
<ul>
<li><code>local_4</code> is set to the result of strlen (0x08048582), so it should be better named length</li>
<li><code>local_3</code> is initialized with 0 (0x08048585), gets increased by 1 (0x080485bb), and gets compared with <code>length/2</code> (0x080485cb), so that must be a counter of something like a for-loop</li>
<li><code>local_2</code> is not clear, but since it is at the clean-up part of the program, we can ignore it (seems to be used to save ecx, ebx and ebp to ebp-0x8, ebp-0x4, ebp)</li>
</ul>
<p>There variables can be renamed with <code>afvn name newname</code></p>
<p>We might also notice, that <code>ebx</code> is used quiet often, and is only set once in the beginning to the address of <code>argc</code></p>
<p>Also, we have some code-blocks with</p>
<pre><code>mov eax, dword [ebx + 4]        ; [0x4:4]=0x10101 
add eax, 4
mov register, dword [eax]</code></pre>
<p>which is like <code>register = argv[1]</code></p>
<p>So what is the main-function doing now?</p>
<ul>
<li>At the beginning, <code>argc</code> is compared with <code>2</code>, and if it fails, a help-text about the program usage is displayed and the program quits. (0x08048549)</li>
<li>Then, we initialize <code>length</code> with <code>strlen(argv[1])</code> and <code>counter</code> with <code>0</code>. (0x0804856e - 0x08048585)</li>
<li>Afterwards we jump to a comparison if <code>counter</code> is smaller than <code>length/2</code>, and if it succeed we call <code>sym.swap</code> with <code>argv[1][counter]</code> and <code>argv[1][length-1-counter]</code>. From the function name, we can guess that the variables just get swapped in place, since we pass the addresses of the variables and not the value.</li>
<li>When the loop is finished, we print <code>argv[1]</code>, which is now reversed.</li>
<li>Next, <code>sym.getMOLetter</code> is called and the result is displayed with the format-text <code>Letter with the most occurrence is: %c</code>. This may give us the conclusion, that <code>getMOLetter</code> is returning the character of the letter with the highest occurrence.</li>
</ul>
<p>So, with this, the task of this exercise is completed. We could also analyze the functions itself, but it is not really necessary, because we already know what is going on. However, when we inspect some malware or binaries with some obfuscation, we would still hate to take a look inside of these content, because we can never be sure that the functions does what the name says.</p>
</div></div></div></body>