forked from aaronbloomfield/pdr
-
Notifications
You must be signed in to change notification settings - Fork 228
/
Copy path12-memory.html
639 lines (560 loc) · 20.4 KB
/
12-memory.html
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
312
313
314
315
316
317
318
319
320
321
322
323
324
325
326
327
328
329
330
331
332
333
334
335
336
337
338
339
340
341
342
343
344
345
346
347
348
349
350
351
352
353
354
355
356
357
358
359
360
361
362
363
364
365
366
367
368
369
370
371
372
373
374
375
376
377
378
379
380
381
382
383
384
385
386
387
388
389
390
391
392
393
394
395
396
397
398
399
400
401
402
403
404
405
406
407
408
409
410
411
412
413
414
415
416
417
418
419
420
421
422
423
424
425
426
427
428
429
430
431
432
433
434
435
436
437
438
439
440
441
442
443
444
445
446
447
448
449
450
451
452
453
454
455
456
457
458
459
460
461
462
463
464
465
466
467
468
469
470
471
472
473
474
475
476
477
478
479
480
481
482
483
484
485
486
487
488
489
490
491
492
493
494
495
496
497
498
499
500
501
502
503
504
505
506
507
508
509
510
511
512
513
514
515
516
517
518
519
520
521
522
523
524
525
526
527
528
529
530
531
532
533
534
535
536
537
538
539
540
541
542
543
544
545
546
547
548
549
550
551
552
553
554
555
556
557
558
559
560
561
562
563
564
565
566
567
568
569
570
571
572
573
574
575
576
577
578
579
580
581
582
583
584
585
586
587
588
589
590
591
592
593
594
595
596
597
598
599
600
601
602
603
604
605
606
607
608
609
610
611
612
613
614
615
616
617
618
619
620
621
622
623
624
625
626
627
628
629
630
631
632
633
634
635
636
637
638
639
<!doctype html>
<html lang="en">
<head>
<meta charset="utf-8">
<title>CS 2150: 12-memory slide set</title>
<meta name="description" content="A set of slides for a course on Program and Data Representation">
<meta name="apple-mobile-web-app-capable" content="yes" />
<meta name="apple-mobile-web-app-status-bar-style" content="black-translucent" />
<meta name="viewport" content="width=device-width, initial-scale=1.0, maximum-scale=1.0, user-scalable=no, minimal-ui">
<link rel="stylesheet" href="../slides/reveal.js/dist/reset.css">
<link rel="stylesheet" href="../slides/reveal.js/dist/reveal.css">
<link rel="stylesheet" href="../slides/reveal.js/dist/theme/black.css" id="theme">
<link rel="stylesheet" href="../slides/css/pdr.css">
<!-- Code syntax highlighting -->
<link rel="stylesheet" href="../slides/reveal.js/plugin/highlight/zenburn.css">
<!-- Printing and PDF exports -->
<script>
var link = document.createElement( 'link' );
link.rel = 'stylesheet';
link.type = 'text/css';
link.href = window.location.search.match( /print-pdf/gi ) ? '../slides/reveal.js/css/print/pdf.scss' : '../slides/reveal.js/css/print/paper.scss';
document.getElementsByTagName( 'head' )[0].appendChild( link );
</script>
<!--[if lt IE 9]>
<script src="../slides/reveal.js/lib/js/html5shiv.js"></script>
<![endif]-->
<style>.reveal li { font-size:93%; line-height:120%; }</style>
</head>
<body>
<div class="reveal">
<!-- Any section element inside of this container is displayed as a slide -->
<div class="slides">
<section data-markdown id="cover"><script type="text/template">
# CS 2150
### Program and Data Representation
<p class='titlep'> </p>
<div class="titlesmall"><p>
<a href="http://www.cs.virginia.edu/~mrf8t">Mark Floryan</a> (mrf8t@virginia.edu)<br>
<a href="http://www.cs.virginia.edu/~asb">Aaron Bloomfield</a> (aaron@virginia.edu)<br>
<a href="http://github.com/uva-cs/pdr">@github</a> | <a href="index.html">↑</a> | <a href="./12-memory.html?print-pdf"><img class="print" width="20" src="../slides/images/print-icon.png" style="top:0px;vertical-align:middle"></a>
</p></div>
<p class='titlep'> </p>
## Memory
</script></section>
<section>
<h2>CS 2150 Roadmap</h2>
<table class="wide">
<tr><td colspan="3"><p class="center">Data Representation</p></td><td></td><td colspan="3"><p class="center">Program Representation</p></td></tr>
<tr>
<td class="top"><small> <br> <br>string<br> <br> <br> <br>int x[3]<br> <br> <br> <br>char x<br> <br> <br> <br>0x9cd0f0ad<br> <br> <br> <br>01101011</small></td>
<!-- image adapted from http://openclipart.org/detail/3677/arrow-left-right-by-torfnase -->
<td><img class="noborder" src="images/red-double-arrow.png" height="500" alt="vertical red double arrow"></td>
<td class="top"> <br>Objects<br> <br>Arrays<br> <br>Primitive types<br> <br>Addresses<br> <br>bits</td>
<td> </td>
<td class="top"><small> <br> <br>Java code<br> <br> <br>C++ code<br> <br> <br>C code<br> <br> <br>x86 code<br> <br> <br>IBCM<br> <br> <br>hexadecimal</small></td>
<!-- image adapted from http://openclipart.org/detail/3677/arrow-left-right-by-torfnase -->
<td><img class="noborder" src="images/green-double-arrow.png" height="500" alt="vertical green double arrow"></td>
<td class="top"> <br>High-level language<br> <br>Low-level language<br> <br>Assembly language<br> <br>Machine code</td>
</tr>
</table>
</section>
<section data-markdown><script type="text/template">
# Contents
[Memory Hierarchy](#memorysec)
[String Functions](#strings)
</script></section>
<section>
<section id="memorysec" data-markdown class="center"><script type="text/template">
# Memory Hierarchy
</script></section>
<section data-markdown><script type="text/template">
## Static/Dynamic Allocation
- Static: space required is *known* before program starts (at "compile time")
- Dynamic: space required is *not known* before the program starts
- Can be placed on either the stack or the heap
- But most often on the heap
</script></section>
<section data-markdown><script type="text/template">
## Memory in C
- Stack
- Managed by compiler automatically
- Lifetime is determined by program scope
- Cannot outlive procedure return
- Address space grows "down" from the top
- Heap
- Managed by programmer explicitly
- Lifetime is controlled by programmer
- Lives until freed by program
- Address space grows "up" from the bottom
</script></section>
<section data-markdown><script type="text/template">
## Memory Layout
![memory diagram](images/12-memory/memory-diagram.svg)
</script></section>
<section>
<h2>Memory Allocators</h2>
<table>
<tr style="background-color:transparent"><td colspan="2" style="border-bottom:0em"></td><th colspan="2">Lifetime</th></tr>
<tr style="background-color:transparent"><td colspan="2"></td><th style="border-right:1px solid">Scoped</th><th>Unlimited</th></tr>
<tr><th style="border-bottom:0em;border-right:1px solid"> <br>Size</th><th>Known</th><td style="border-right:1px solid">local variable declarations</td><td>global, static variable declarations</td></tr>
<tr><th style="border-right:1px solid"></th><th>Unknown</th><td style="border-right:1px solid"><code>alloca()</code></td><td><code>new</code> & <code>malloc()</code></td></tr>
</table>
<p> </p>
<p><code>alloca()</code>: like <code>malloc()</code>, but on the stack, not the heap (rarely used)</p>
</section>
<section>
<h2>x86 callee epilogue</h2>
<table class="transparent">
<tr><td class="top"><pre><code>; subroutine epilogue
; recover saved registers,
; reverse of push order
pop esi
pop edi
; deallocate local var(s)
mov esp, ebp
; restore base pointer
pop ebp
; return
ret </code></pre></td><td style="width:50px"></td>
<td><img src="images/08-x86/x86-activation-record.svg" alt="activation record"></td></tr></table>
<p> </p>
<p>How would this change with <code>alloca()</code>?</p>
</section>
<section data-markdown><script type="text/template">
## malloc
```
void *malloc (size_t size)
```
- Returns an untyped pointer (can point to anything)
- Returns an address that is the location of at least size bytes of previously unused memory, and reserves that space.
- Returns `NULL` if there isn't enough space.
- Parameter is the size in bytes
</script></section>
<section data-markdown><script type="text/template">
## malloc Example
```
char *s = (char *) malloc (sizeof(*s) * n)
```
- `(char *s)`
- Type cast: malloc only returns `void *`
- Cast tells compiler that program will use it as a `char *`
- `sizeof(*s)`
- sizeof operator: Takes a type or expression, evaluates to the number of bytes to store it
</script></section>
<section data-markdown><script type="text/template">
## How is the heap managed?
- Compiled into the program's code is a heap directory management routine
- (it's in the libc library, really)
- This adds extra time to a new or `malloc()`
- It allocates memory in two ways:
- Fixed-size-blocks allocation
- A free list of available blocks is kept, and the appropriate number are allocated
- Buddy blocks
- Blocks are divided into powers of 2, so the memory takes up the next highest power of 2 bytes
</script></section>
<section data-markdown><script type="text/template">
## What happens on dynamic memory allocation
- A subroutine is invoked (for either `new` or `malloc()`)
- The OS is consulted, if necessary, to allocate a *page* of memory
- This requires switching context back to the OS
- The heap directory is examined
- And new space is determined somehow
- The subroutine returns
How expensive would this be?
</script></section>
<section data-markdown><script type="text/template">
## Why Do We Care About Memory?
- Memory hierarchy, locality
- Computer architecture
- What does this have to do with this course?
- Course: Program and data representation
- Does the memory hierarchy affect how we think about efficiency?
- What about data in memory?
</script></section>
<section>
<h2>Memory Hierarchy, part 1</h2>
<table class="transparent"><tr><td class="top">
<img src="images/07-ibcm/memory-hierarchy.png" alt="memory hierarchy">
</td><td>
<ul>
<li>CPU registers<ul>
<li>1 access per CPU cycle</li>
<li>3*10<sup>9</sup> accesses per sec</li>
<li>1 Kb total storage</li></ul></li>
<li>Cache<ul>
<li>SDRAM: 10 ns</li>
<li>10<sup>8</sup> accesses per sec</li>
<li>Multiple levels possible</li>
<li>Higher levels are bigger and slower</li>
<li>1 Mb total storage</li></ul></li>
</ul>
</td>
</tr></table>
</section>
<section>
<h2>Memory Hierarchy, part 2</h2>
<table class="transparent"><tr><td class="top">
<img src="images/07-ibcm/memory-hierarchy.png" alt="memory hierarchy">
</td><td class="top">
<ul>
<li>Main memory<ul>
<li>DRAM: 60 ns</li>
<li>2*10<sup>7</sup> accesses per sec</li>
<li>Limited by bus speeds</li>
<li>1 Gb total storage</li></ul></li>
<li>Disk<ul>
<li>HDD speeds: 5 ms</li>
<li>200 accesses per sec</li>
<li>1 Tb total storage</li></ul></li>
</ul>
</td></tr></table>
</section>
<section data-markdown><script type="text/template">
## Definitions
- Processor Cycle
- (for our purposes) the time it takes to execute a "simple" instruction
- E.g., add eax, ebx
- Memory access time (Latency)
- Time it takes to access memory
- Memory cycle time
- Time to write to memory
</script></section>
<section data-markdown><script type="text/template">
## Some (older) numbers
- Typical numbers for 1 GHz processor
- Cycle time 1 ns (10<sup>-9</sup>)
- L1 cache: accessed in 2-3 ns
- L2 cache: 20-50 ns
- Main memory: 60-100 ns
- Disk: 5-12 ms (5 x 10<sup>6</sup> ns to 12 x 10<sup>6</sup> ns)
1 ns = 10<sup>-9</sup> s; 1 ms = 10<sup>-3</sup> s
</script></section>
<section data-markdown><script type="text/template">
## More Example Numbers: Memory Capacities
- Page size: 1 Kb
- L1 cache: 64 Kb
- L2 cache: 0.5 Mb
- L3 cache: 6 Mb
- Main memory: 4 Gb
- Disk: 1 Tb
</script></section>
<section data-markdown><script type="text/template">
## Caches
- Content at each level is a *subset* of the level below
- Cache Hit: address requested is in cache
- Cache Miss: address requested is NOT in cache
- Cache page size (chunk size, line size): the number of contiguous bytes that are moved into the cache at one time
</script></section>
<section data-markdown><script type="text/template">
## So What?
- Program speed is affected by where the data is
- Data in main memory much slower to access than data in cache
- Goal: attempt to reduce the number of access to slower levels
- How?
</script></section>
<section data-markdown><script type="text/template">
## Locality
- Temporal Locality
- Locality in time
- If an item is referenced, it will tend to be referenced again soon
- Spatial Locality
- Locality in space
- If an item is referenced, items whose addresses are close by will tend to be referenced soon
</script></section>
<section data-markdown><script type="text/template">
## A Locality Example
```
int sum = 0;
for ( int i = 0; i < n; i++ ) {
sum += a[i];
}
```
</script></section>
<section data-markdown><script type="text/template">
## Example code
Source code: [cache.cpp](code/12-memory/cache.cpp.html) ([src](code/12-memory/cache.cpp))
```
cout << "page size: " << getpagesize() << endl;
int array[1024][1024];
for ( int i = 0; i < 1024; i++ )
for ( int j = 0; j < 1024; j++ )
array[i][j] = 0;
for ( int c = 0; c < 1024; c++ )
for ( int i = 0; i < 1024; i++ )
for ( int j = 0; j < 1024; j++ )
array[i][j]++;
int sum = 0;
for ( int i = 0; i < 1024; i++ )
for ( int j = 0; j < 1024; j++ )
sum += array[i][j];
cout << sum << endl;
```
What if we swap each of the three `i` and `j` loops?
</script></section>
<section data-markdown><script type="text/template">
## Execution of cache.cpp
- On an older Linux machine (3 Ghz Intel)
- Page size is 4Kb (4096 bytes)
- With `i` as the outer and `j` as the inner: 2.7 seconds
- With `j` as the outer and `i` as the inner: 124.9 seconds
- A factor of almost 50!
- On a more modern Linux machine (3.4 Ghz AMD)
- Page size is also 4Kb (4096 bytes)
- With `i` as the outer and `j` as the inner: 0.696 seconds
- With `j` as the outer and `i` as the inner: 31.4 seconds
- Also a factor of almost 50!
</script></section>
<section data-markdown><script type="text/template">
## Trends (Example)
Over a 20 year period:
- CPU speed: 600x speed
- SRAM (cache memory):
- 200x capacity
- 100x latency
- DRAM (main memory):
- 8,000x capacity
- 6x latency
- Disk:
- 50,000x capacity
- 10x latency
See any implications?
</script></section>
<section data-markdown><script type="text/template">
## Who is Working on This Problem?
- Architects
- Compiler writers
- Programmers
- Operating system
</script></section>
<section data-markdown><script type="text/template">
## Locality and Data Structures
- Which has better spatial locality?
- Between arrays or linked lists?
- How does this change the big-theta analysis of the data structures?
</script></section>
<section data-markdown><script type="text/template">
## Some actual numbers
![memtest shot](images/12-memory/amd-bios-memtest.png)
</script></section>
</section>
<section>
<section id="strings" data-markdown class="center"><script type="text/template">
# String Functions
</script></section>
<section data-markdown><script type="text/template">
## C-strings
- A C-style string is just a pointer to a `char`
- Consider:
```
char *str = "hello world";
```
- This produces:
![c string](graphs/c-string-1.svg)
- Each byte is the ASCII encoding of the character
- The last byte is binary zero
</script></section>
<section data-markdown><script type="text/template">
## C-strings
How we view it:
![c string](graphs/c-string-1.svg)
How the computer views it:
![c string](graphs/c-string-2.svg)
</script></section>
<section data-markdown><script type="text/template">
## C-string functions
- `int strlen(char *s)`: returns number of chars in `s`
- `char *strcpy(char *s1, const char *s2)`: copies `s2` to `s1`
- `char *strcat(char *s1, const char * s2)`: appends `s2` to `s1`
</script></section>
<section data-markdown><script type="text/template">
## String Concatenation
Source code: [strings.c](code/12-memory/strings.c.html) ([src](code/12-memory/strings.c))
```
int main (int argc, char **argv) {
int i = 0;
// allocate a space in memory for result
char *result = (char *) malloc (sizeof (*result));
*result = '\0';
while (i < argc) { // while there are still args
char *s = (char *) malloc (sizeof (*s) *
(strlen(result) + strlen(argv[i]) + 1));
strcpy (s, result);
strcat (s, argv[i]);
result = s;
i++;
}
printf ("Concatenation: %s\n", result);
return 0;
}
```
Do you see any memory leak problems?
</script></section>
<section data-markdown><script type="text/template">
## Concatenating Strings
- Allocate space for result and next argument:
```
char *s = (char *)
malloc (sizeof (*s) *
(strlen(result) + strlen(argv[i]) + 1));
```
- Why +1?
- Copy result to s:
```
strcpy (s, result);
```
- Concatenate next argument:
```
strcat (s, argv[i]);
```
</script></section>
<section data-markdown><script type="text/template">
## Reclaiming Storage
- Storage allocated by malloc is reserved forever
- Give it back by passing it to free
```
void free(void *ptr);
```
</script></section>
<section data-markdown><script type="text/template">
## Memory Leaks
- Program fails to release memory when no longer needed
- Consequences/symptoms
- Reduces amount of available memory, run out of memory available for allocation
- Virtual memory. Effect: once RAM has run out, increasing use of hard disk
- Usually, the performance of the application and/or system will have become so slow that they will be considered to have failed
</script></section>
<section data-markdown><script type="text/template">
## Plugging Memory Leaks
```
while (i < argc) { // while there are still arguments
char *s = (char *) malloc (sizeof (*s) *
(strlen(result) + strlen(argv[i]) + 1));
strcpy (s, result);
// add "free(result);" here...
strcat (s, argv[i]);
// ... or here
result = s;
i++;
}
```
</script></section>
<section data-markdown><script type="text/template">
## Memory Leak
- There is no reference to allocated storage
- It can never be reached
- It can never be reclaimed
- Losing references
- Variable goes out of scope
- Variable reassigned
</script></section>
<section data-markdown><script type="text/template">
## When not to free...
```
while (i < argc) { // while there are still arguments
char *s = (char *) malloc (sizeof (*s) *
(strlen(result) + strlen(argv[i]) + 1));
strcpy (s, result);
// add "free(result)" here ...
strcat (s, argv[i]);
// or here
result = s;
i++;
}
```
- Upon close fo the `while` block, the scope of `s` is closed; should we
`free(s)` first?
- No! `result` now references same storage
- After the `result = s` line, there is no way to reach storage `result` previously pointed to
</script></section>
<section data-markdown><script type="text/template">
## realloc
```
void *realloc(void *ptr, size_t size);
```
- The `realloc()` function changes the size of the memory object pointed to by `ptr` to the size specified by `size`
- But it's risky...
- It changes the address of the pointer
- So other pointers pointing there are now pointing to invalid memory
</script></section>
<section data-markdown><script type="text/template">
## Running time of realloc
- Time to reserve new space: Θ(1)
- Time to copy old data into new space:
- Θ(*n*) where *n* is the size of the old data
</script></section>
<section data-markdown><script type="text/template">
## Efficiency and Order Classes
- Recall Big-Oh, Big-Theta, ...
- When are the differences meaningful?
- Inputs get large
- Difference in order class
- High-level comparison tool
- Not as useful when:
- 2 solutions are in the same order class
- Fine-tuning of an algorithm is important
</script></section>
<section data-markdown><script type="text/template">
## Efficiency and Order Classes
- Order classes based on counting statements
- Is this useful for fine-grained algorithm analysis?
- One assumption
- All operations take the same amount of time
- Is this really true?
</script></section>
<section data-markdown><script type="text/template">
## Factorial: Recursive
```
int factorial_recursive(int x) {
if (x <= 1)
return 1;
else
return x * factorial_recursive(x-1);
}
```
- Note that we have to find the recursive value, and make a modification to it (multiply it by x) before we return it
</script></section>
<section data-markdown><script type="text/template">
## Factorial: Tail Recursion
```
int factorial_tail_recursive(int x, int y) {
if (x <= 1)
return y;
else
return factorial_tail_recursive(x-1, x*y);
}
```
- Note that the answer to a given invocation is exactly the answer to the recursive call
- How could we modify the calling convention for this?
</script></section>
<section data-markdown><script type="text/template">
## Activation Records
![activation record](images/08-x86/x86-activation-record.svg)
</script></section>
<section data-markdown><script type="text/template">
## Factorial: Loop Implementation
```
int factorial_loop(int x) {
int y = 1;
while (x > 1) {
y *= x;
x--;
}
return y;
}
```
</script></section>
</section>
</div>
</div>
<script src='../slides/reveal.js/dist/reveal.js'></script><script src='../slides/reveal.js/plugin/zoom/zoom.js'></script><script src='../slides/reveal.js/plugin/notes/notes.js'></script><script src='../slides/reveal.js/plugin/search/search.js'></script><script src='../slides/reveal.js/plugin/markdown/markdown.js'></script><script src='../slides/reveal.js/plugin/highlight/highlight.js'></script><script src='../slides/reveal.js/plugin/math/math.js'></script>
<script src="js/settings.js"></script>
</body>
</html>