<!DOCTYPE html>
<html lang="en">
<head>
<meta charset="utf-8">
<meta http-equiv="X-UA-Compatible" content="IE=edge">
<meta name="viewport" content="width=device-width, initial-scale=1">
<!-- The above 3 meta tags *must* come first in the head; any other head content must come *after* these tags -->
<title>CSC2626, UofT</title>
<!-- Bootstrap -->
<link href="./css/bootstrap.min.css" rel="stylesheet" media="screen">
<link href="./css/bootstrap-theme.min.css" rel="stylesheet" media="screen">
<link href="./css/main.css" rel="stylesheet" media="screen">
<link rel="stylesheet" href="https://ajax.googleapis.com/ajax/libs/jqueryui/1.11.4/themes/smoothness/jquery-ui.css">
<link href='https://fonts.googleapis.com/css?family=Libre+Baskerville' rel='stylesheet' type='text/css'>
<link href="https://fonts.googleapis.com/css?family=PT+Serif" rel="stylesheet">
</head>
<body>
<div class="container theme-showcase" role="main">
<div class="page-header">
<h1>CSC2626: Imitation Learning for Robotics, Fall 2022</h1>
</div>
<!--div class="imgs">
<div id="pic1">
<a href="http://www.roboticsconference.org/"><img src="./pic1.png"></img></a>
</div>
<div id="pic2">
<a href="https://mars.nasa.gov/mer/home/"><img src="./pic2.jpg"></img></a>
</div>
</div-->
<div class="row">
<div id="ov">
<h3>Overview</h3>
In the next few decades, millions of people, from various backgrounds and levels of technical expertise, will need to
interact effectively with robotic technologies on a daily basis. As such, people will need to modify the behavior of their robots not by
explicitly writing code, but by providing only a small number of kinesthetic or visual demonstrations, or even natural language commands. At the same time, robots should
infer and predict the human's intentions and internal objectives from past interactions, in order to provide assistance before it is explicitly asked for.
This <b>graduate-level course</b> will examine some of the most important papers in imitation learning for robot control, placing particular emphasis on developments
from the last 10 years. Its purpose is to familiarize students with the frontiers of this research area, to help them identify open problems, and
to enable them to make a novel contribution.
<h3>Prerequisites</h3>
You need to be comfortable with: introductory machine learning concepts (such as from CSC411/CSC413/ECE521 or equivalent), linear algebra, basic multivariable
calculus, and introductory probability. You also need to have strong programming skills in Python. <b>Note:</b> if you don't meet all the prerequisites above,
please contact the instructor by email. Optional, but recommended: experience with neural networks, such as from CSC321, and introductory-level familiarity
with reinforcement learning and control.
</div>
<h3>Teaching Staff</h3>
<div id="dts">
<div class="ts">
<div><b>Instructor</b></div>
<div><a href="http://www.cs.toronto.edu/~florian">Florian Shkurti</a></div>
<div>x@cs.toronto.edu, x=csc2626-instructor</div>
<div>Office Hours: Mon 12-1pm ET, in person at Sandford Fleming 3328 + on Zoom</div>
</div>
<div class="ts">
<div><b>Teaching Assistants</b></div>
<div><a href="https://www.cs.toronto.edu/~lorraine/">Jonathan Lorraine</a>,
<a href="https://khodeir.github.io/">Mohamed Khodeir</a>,
and <a href="">Skylar Hao</a></div>
<div>Please use csc2626-tas@cs.toronto.edu, not personal emails</div>
<div>Office Hours (Jonathan): Tue 11am-12pm ET, on Zoom</div>
<div>Office Hours (Skylar): Thu 11am-12pm ET, on Zoom</div>
</div>
</div>
<div class="host">
</div>
<h3>Course Details</h3>
<div id="dts">
<div class="ts">
<div>Lectures: Wednesdays, 11am-1pm ET (in-person, OISE Building 2-212, lectures recorded on Zoom)</div>
<div>Zoom link is posted on the course's Quercus homepage</div>
<div>Announcements will be posted on Quercus</div>
<div>Discussions will take place on <a href="http://www.piazza.com/utoronto.ca/fall2022/csc2626">Piazza</a></div>
<div><a href="https://www.surveymonkey.com/r/LJJV5LY">Anonymous feedback form</a> for suggested improvements</div>
</div>
</div>
<div class="host">
</div>
<div id="cd">
<h3>Grading and Important Dates</h3>
<ul>
<li><b><a href="https://github.com/florianshkurti/csc2626w22/blob/master/assignments/A1/A1.pdf">Assignment 1</a></b> (25%): due Oct 3 at 6pm ET</li>
<li><b><a href="https://github.com/florianshkurti/csc2626w22/blob/master/assignments/A2/CSC2626_Assignment_2.pdf">Assignment 2</a></b> (25%): due Oct 18th at 6pm ET</li>
<li><b>Project Proposal</b> (10%): due Oct 25 at 6pm. Students can take on projects in groups of 2-3 people.
Tips for a good project proposal can be found <a href="./CSC2626_Project_Guidelines.pdf">here</a>. Proposals should not be based only on papers covered in class by the proposal due date.
Students are encouraged to look further ahead in the schedule and to start planning their project definition well ahead of this deadline.
Students who need help choosing or crystallizing a project idea should email the instructor or the TAs, come to office hours, or book appointments to discuss ideas.
</li>
<li><b>Midterm Progress Report</b> (5%): due Nov 10 at 6pm ET. Tips and expectations for a good midterm progress report are <a href="./CSC2626_Project_Guidelines.pdf">here</a>.
</li>
<li><b>Project Presentation</b> (5%): in class on Dec 7. This will be a short presentation, approximately 5 minutes, depending on the number of groups. More detailed instructions
will be posted towards the end of the term.
</li>
<li><b>Final Project Report and Code</b> (30%): due Dec 12 at 6pm ET. Tips and expectations for a good final project report can be found <a href="./CSC2626_Project_Guidelines.pdf">here</a>.</li>
</ul>
<h3>Course Description</h3>
This course will broadly cover the following areas:
<br/><br/>
<ul>
<li>Imitating the policies of demonstrators (people, expensive algorithms, optimal controllers)</li>
<li>Connections between imitation learning, optimal control, and reinforcement learning </li>
<li>Learning the cost functions that best explain a set of demonstrations</li>
<li>Shared autonomy between humans and robots for real-time control</li>
</ul>
</div>
</div>
<h3>Schedule</h3>
<div class="row">
<div class="col-md-12">
<table class="table">
<thead>
<tr>
<th>Lecture</th>
<th>Date</th>
<th>Topics</th>
<th></th>
<th>Slides</th>
</tr>
</thead>
<tbody>
<tr>
<td>1</td>
<td id="day">Sep 14</td>
<td>
<b>Introduction</b><br/>
Motivation, logistics, rough description of the topics to be covered.<br/><br/>
<b>Imitation vs. Robust Behavioral Cloning</b><br/>
<a href="https://papers.nips.cc/paper/95-alvinn-an-autonomous-land-vehicle-in-a-neural-network">ALVINN: An autonomous land vehicle in a neural network</a><br/>
<a href="https://ieeexplore.ieee.org/document/5509140/">Visual path following on a manifold in unstructured three-dimensional terrain</a><br/>
<a href="https://arxiv.org/abs/1604.07316">End-to-end learning for self-driving cars</a><br/>
<a href="https://ieeexplore.ieee.org/document/7358076/">A machine learning approach to visual perception of forest trails for mobile robots</a><br/>
<a href="https://arxiv.org/abs/1011.0686">DAgger: A reduction of imitation learning and structured prediction to no-regret online learning</a><br/>
<a href="https://arxiv.org/abs/1211.1690">Learning monocular reactive UAV control in cluttered natural environments</a><br/>
<a href="https://www.ri.cmu.edu/pub_files/2015/3/InvitationToImitation_3_1415.pdf">An invitation to imitation</a><br/>
<br/>
<b>Optional Reading</b><br/>
<a href="https://dl.acm.org/citation.cfm?id=1524008">A survey of robot learning from demonstration</a><br/>
<a href="https://arxiv.org/abs/1812.03079">ChauffeurNet: Learning to Drive by Imitating the Best and Synthesizing the Worst</a><br/>
<a href="https://arxiv.org/abs/1906.05838">Goal-conditioned imitation learning</a><br/>
<a href="http://www.roboticsproceedings.org/rss16/p048.html">Vision-based goal-conditioned policies for underwater navigation in the presence of obstacles</a><br/>
<a href="https://arxiv.org/abs/1907.03423">On-policy robot imitation learning from a converging supervisor</a><br/>
<a href="https://openreview.net/forum?id=rkgbYyHtwB">Disagreement-regularized imitation learning</a><br/>
<br/>
<b>Optional Reading: Only Query the Expert when the Learner is Uncertain</b><br/>
<a href="https://arxiv.org/abs/1506.02142">Dropout as a Bayesian approximation: representing model uncertainty in deep learning</a><br/>
<a href="http://jmlr.org/papers/v15/srivastava14a.html">Dropout: A simple way to prevent neural networks from overfitting</a><br/>
<a href="http://mlg.eng.cam.ac.uk/yarin/blog_3d801aa532c1ce.html">What my deep model doesn't know</a><br/>
<a href="https://arxiv.org/abs/1505.05424">Weight uncertainty in neural networks</a><br/>
<a href="http://www.roboticsproceedings.org/rss09/p38.pdf">Maximum mean discrepancy imitation learning</a><br/>
<a href="https://arxiv.org/abs/1709.06166">DropoutDAgger: A Bayesian approach to safe imitation learning</a><br/>
<a href="https://ieeexplore.ieee.org/document/7487167/">SHIV: Reducing supervisor burden in DAgger using support vectors</a><br/>
<a href="https://arxiv.org/abs/1605.06450">Query-efficient imitation learning for end-to-end autonomous driving</a><br/>
<a href="https://arxiv.org/abs/2006.01862">Consistent estimators for learning to defer to an expert</a><br/>
<br/>
</td>
<td></td>
<td><a href="Quiz_0.pdf">Quiz 0</a><br/><a href="CSC2626_Syllabus.pdf">Syllabus</a><br/><a href="https://github.com/florianshkurti/csc2626w22/blob/master/lectures/Lecture1.pdf">Slides</a></td>
</tr>
<tr>
<td>2</td>
<td>Sep 21</td>
<td>
<b>Intro to Optimal Control and Model-Based Reinforcement Learning</b><br>
<a href="http://people.eecs.berkeley.edu/~somil/Papers/lqrlecture.pdf">Linear Quadratic Regulator</a> and some <a href="https://web.archive.org/web/20210613114312/https://stanford.edu/class/engr108/lectures/control_slides.pdf">examples</a><br/>
<a href="https://homes.cs.washington.edu/~todorov/papers/LiICINCO04.pdf">Iterative Linear Quadratic Regulator</a><br/>
<a href="https://ieeexplore.ieee.org/document/845037">Model Predictive Control</a><br/>
<a href="http://www.argmin.net/2018/06/25/outsider-rl/">Ben Recht: An outsider's tour of RL</a> (watch his <a href="https://people.eecs.berkeley.edu/~brecht/l2c-icml2018/">ICML'18 tutorial</a>, too)<br/>
<br/>
<b>Optional Reading: Model-based RL</b><br/>
<a href="http://mlg.eng.cam.ac.uk/pilco/">PILCO: Probabilistic inference for learning control</a><br/>
<a href="https://arxiv.org/abs/1805.12114">Deep reinforcement learning in a handful of trials using probabilistic dynamics models</a><br/>
<a href="https://arxiv.org/abs/1810.01566">Learning particle dynamics for manipulating rigid bodies, deformable objects, and fluids</a><br/>
<a href="https://papers.nips.cc/paper/7948-end-to-end-differentiable-physics-for-learning-and-control">End-to-end differentiable physics for learning and control</a><br/>
<a href="https://arxiv.org/abs/1803.02291">Synthesizing neural network controllers with probabilistic model based reinforcement learning</a><br/>
<a href="https://arxiv.org/abs/1807.02303">A survey on policy search algorithms for learning robot controllers in a handful of trials</a><br/>
<a href="https://journals.sagepub.com/doi/abs/10.1177/0278364913495721">Reinforcement learning in robotics: a survey</a><br/>
<a href="http://deepmpc.cs.cornell.edu/">DeepMPC: Learning deep latent features for model predictive control</a><br/>
<a href="https://planetrl.github.io/">Learning latent dynamics for planning from pixels</a><br/>
<br/>
<b>Optional Reading: Monotonic Improvement of the Value Function</b><br/>
<a href="https://arxiv.org/abs/1807.03858">Algorithmic framework for model-based deep reinforcement learning with theoretical guarantees</a><br/>
<a href="https://arxiv.org/abs/1906.08253">When to Trust Your Model: Model-Based Policy Optimization</a><br/>
<br/>
<b>Optional Reading: Learning Dynamics Where it Matters for the Value Function</b><br/>
<a href="https://arxiv.org/abs/2204.01464">Value Gradient Weighted Model-Based Reinforcement Learning</a><br/>
<br/>
</td>
<td></td>
<td><a href="https://github.com/florianshkurti/csc2626w22/blob/master/lectures/Lecture2.pdf">Slides</a></td>
</tr>
<tr>
<td>3</td>
<td>Sep 28</td>
<td>
<b>Offline / Batch Reinforcement Learning</b><br>
<a href="https://arxiv.org/abs/1909.12200">Scaling data-driven robotics with reward sketching and batch reinforcement learning</a><br/>
<a href="https://arxiv.org/abs/1812.02900">Off-policy deep reinforcement learning without exploration</a><br/>
<a href="https://arxiv.org/abs/2006.04779">Conservative Q-Learning for offline reinforcement learning</a><br/>
<a href="https://arxiv.org/abs/2004.07219">D4RL: Datasets for deep data-driven reinforcement learning</a><br/>
<a href="https://arxiv.org/abs/2108.03298">What matters in learning from offline human demonstrations for robot manipulation</a><br/>
<a href="https://sites.google.com/view/offlinerltutorial-neurips2020/home">NeurIPS 2020 tutorial on offline RL</a><br/>
<br/>
<b>Optional Reading</b><br/>
<a href="https://arxiv.org/abs/2005.01643">Offline reinforcement learning: tutorial, review, and perspectives on open problems</a><br/>
<a href="https://openreview.net/forum?id=AP1MKT37rJ">Should I run offline reinforcement learning or behavioral cloning?</a><br/>
<a href="https://arxiv.org/abs/2201.12417">Why should I trust you, Bellman? The Bellman error is a poor replacement for value error</a><br/>
<a href="https://arxiv.org/abs/2106.06860">A minimalist approach to offline reinforcement learning</a><br/>
<a href="https://arxiv.org/abs/1910.01708">Benchmarking batch deep reinforcement learning algorithms</a><br/>
<a href="https://arxiv.org/abs/1906.00949">Stabilizing off-policy Q-Learning via bootstrapping error reduction</a><br/>
<a href="https://arxiv.org/abs/1907.04543">An optimistic perspective on offline reinforcement learning</a><br/>
<a href="https://arxiv.org/abs/2010.14500">COG: Connecting new skills to past experience with offline reinforcement learning</a><br/>
<a href="https://arxiv.org/abs/1911.05321">IRIS: Implicit reinforcement without interaction at scale for learning control from offline robot manipulation data</a><br/>
<a href="http://ml.informatik.uni-freiburg.de/former/_media/publications/gr_09.pdf">(Batch) reinforcement learning for robot soccer</a><br/>
<a href="https://arxiv.org/abs/2103.04947">Instabilities of offline RL with pre-trained neural representation</a><br/>
<a href="https://openreview.net/forum?id=Is5Hpwg2R-h">Targeted environment design from offline data</a><br/>
<br/>
</td>
<td></td>
<td><a href="https://github.com/florianshkurti/csc2626w22/blob/master/lectures/Lecture3.pdf">Slides</a></td>
</tr>
<tr>
<td>4</td>
<td>Oct 5</td>
<td>
<b>Imitation Learners Guided by Optimal Control Experts and Physics-based Dynamics Models</b><br>
<a href="https://people.eecs.berkeley.edu/~svlevine/papers/mfcgps.pdf">Learning neural network policies with guided policy search under unknown dynamics</a><br/>
<a href="https://arxiv.org/abs/1603.00622">PLATO: Policy learning using adaptive trajectory optimization</a><br/>
<a href="https://link.springer.com/article/10.1007/s10514-017-9648-7">Using probabilistic movement primitives in robotics</a><br/>
<a href="https://arxiv.org/abs/1804.02717">DeepMimic: Example-guided deep reinforcement learning of physics-based character skills</a><br/>
<a href="https://arxiv.org/abs/2102.03861">Dynamic Movement Primitives in robotics: a tutorial survey</a><br/>
<br/>
<b>Optional Reading</b><br/>
<a href="https://ieeexplore.ieee.org/document/6630832/">Model-based imitation learning by probabilistic trajectory matching</a><br/>
<a href="https://ropemanipulation.github.io/">Combining self-supervised learning and imitation for vision-based rope manipulation</a><br/>
<a href="https://arxiv.org/abs/1905.11108">SQIL: Imitation learning via reinforcement learning with sparse rewards</a><br/>
<a href="https://arxiv.org/abs/2006.09359">Accelerating online reinforcement learning with offline datasets</a><br/>
<a href="http://journals.sagepub.com/doi/abs/10.1177/0278364917713116">Learning movement primitive libraries through probabilistic segmentation </a><br/>
<br/>
</td>
<td></td>
<td><a href="https://github.com/florianshkurti/csc2626w22/blob/master/lectures/Lecture9.pdf">Slides</a></td>
</tr>
<tr>
<td>5</td>
<td>Oct 12</td>
<td>
<b>Imitation as Program Induction. Modular Decomposition of Demonstrations into Skills. Imitating Long-Horizon Tasks.</b><br>
<a href="https://arxiv.org/abs/1710.01813">Neural Task Programming: Learning to generalize across hierarchical tasks</a><br/>
<a href="https://arxiv.org/abs/1803.01840">TACO: Learning task decomposition via temporal alignment for control</a><br/>
<a href="https://arxiv.org/abs/2003.06085">Learning to generalize across long-horizon tasks from human demonstrations</a><br/>
<a href="https://arxiv.org/abs/1511.06279">Neural programmer-interpreters</a><br/>
<a href="https://ieeexplore.ieee.org/document/6457507/">The motion grammar: analysis of a linguistic method for robot control</a><br/>
<br/>
<b>Optional Reading</b><br/>
<a href="https://www.sciencedirect.com/science/article/pii/S0010027709001607?via%3Dihub">Action understanding as inverse planning</a><br/>
<a href="https://ieeexplore.ieee.org/document/5650500/">Incremental learning of subtasks from unsegmented demonstration</a><br/>
<a href="https://www.ias.informatik.tu-darmstadt.de/uploads/Team/RudolfLioutikov/lioutikov_movement_pcfg_icra2018.pdf">Inducing probabilistic context-free grammars for the sequencing of movement primitives</a><br/>
<a href="https://arxiv.org/abs/1807.03480">Neural Task Graphs: Generalizing to unseen tasks from a single video demonstration</a><br/>
<a href="https://shaohua0116.github.io/demo2program/">Neural program synthesis from diverse demonstration videos</a><br/>
<a href="https://arxiv.org/abs/1809.06305">Automata guided reinforcement learning with demonstrations</a><br/>
<a href="https://www.sciencedirect.com/science/article/pii/S0921889013001449">A syntactic approach to robot imitation learning using probabilistic activity grammars</a><br/>
<a href="http://journals.sagepub.com/doi/abs/10.1177/0278364911428653">Robot learning from demonstration by constructing skill trees</a><br/>
<a href="https://journals.sagepub.com/doi/abs/10.1177/0278364917743319">Transition state clustering: Unsupervised surgical trajectory segmentation for robot learning</a><br/>
<a href="https://ieeexplore.ieee.org/document/6943187">Learning to sequence movement primitives from demonstrations</a><br/>
<a href="https://arxiv.org/abs/1907.05431">Imitation-projected programmatic reinforcement learning</a><br/>
<a href="https://arxiv.org/abs/1802.09564">Reinforcement and imitation learning for diverse visuomotor skills</a><br/>
<a href="https://proceedings.mlr.press/v100/park20a.html">Inferring task goals and constraints using Bayesian nonparametric inverse reinforcement learning</a><br/>
<a href="https://arxiv.org/abs/2201.12716">You only demonstrate once: category-level manipulation from single visual demonstration</a><br/>
<a href="https://ieeexplore.ieee.org/abstract/document/9695333">Bottom-up skill discovery from unsegmented demonstrations for long-horizon robot manipulation</a><br/>
<br/>
</td>
<td></td>
<td><a href="https://github.com/florianshkurti/csc2626w22/blob/master/lectures/Lecture5.pdf">Slides</a></td>
</tr>
<tr>
<td>6</td>
<td>Oct 19</td>
<td>
<b>Inverse Reinforcement Learning</b><br>
<a href="https://www.aaai.org/Papers/AAAI/2008/AAAI08-227.pdf">Maximum entropy inverse reinforcement learning</a><br/>
<a href="http://rss2017.lids.mit.edu/program/papers/04/">Active preference-based learning of reward functions</a><br/>
<a href="https://arxiv.org/abs/1904.06387">Extrapolating beyond suboptimal demonstrations via inverse reinforcement learning from observations</a><br/>
<a href="https://journals.sagepub.com/doi/abs/10.1177/0278364917722396">Large-scale cost function learning for path planning using deep inverse reinforcement learning</a><br/>
<a href="https://arxiv.org/abs/1603.00448">Guided Cost Learning: Deep inverse optimal control via policy optimization</a><br/>
<a href="http://journals.sagepub.com/doi/abs/10.1177/0278364917745980">Inverse KKT: Learning cost functions of manipulation tasks from demonstrations</a><br/>
<a href="https://www.aaai.org/Papers/IJCAI/2007/IJCAI07-416.pdf">Bayesian inverse reinforcement learning</a><br/>
<br/>
<b>Optional Reading</b><br/>
<a href="https://arxiv.org/abs/1711.02827">Inverse reward design</a><br/>
<a href="https://papers.nips.cc/paper/4420-nonlinear-inverse-reinforcement-learning-with-gaussian-processes">Nonlinear inverse reinforcement learning with gaussian processes</a><br/>
<a href="https://dl.acm.org/citation.cfm?id=1143936">Maximum margin planning</a><br/>
<a href="https://papers.nips.cc/paper/6800-compatible-reward-inverse-reinforcement-learning.pdf">Compatible reward inverse reinforcement learning</a><br/>
<a href="https://arxiv.org/abs/1512.05832">Learning the preferences of ignorant, inconsistent agents</a><br/>
<a href="https://ieeexplore.ieee.org/abstract/document/6045410">Imputing a convex objective function</a><br/>
<a href="https://arxiv.org/abs/2106.12142">IQ-Learn: inverse soft-Q learning for imitation</a><br/>
<a href="https://arxiv.org/abs/1907.03976">Better-than-demonstrator imitation learning via automatically-ranked demonstrations</a><br/>
<br/>
<b>Optional Reading: Applications of IRL</b><br/>
<a href="http://journals.sagepub.com/doi/10.1177/0278364915619772">Socially compliant mobile robot navigation via inverse reinforcement learning</a><br/>
<a href="http://www.cim.mcgill.ca/~florian/pdfs/pursuit-icra2018.pdf">Model-based probabilistic pursuit via inverse reinforcement learning</a><br/>
<a href="https://arxiv.org/abs/1612.07796">First-person activity forecasting with online inverse reinforcement learning</a><br/>
<a href="https://www.ias.informatik.tu-darmstadt.de/uploads/Site/EditPublication/Muelling_BICY_2014.pdf">Learning strategies in table tennis using inverse reinforcement learning</a><br/>
<a href="https://www.ri.cmu.edu/pub_files/2009/10/planning-based-prediction-pedestrians.pdf">Planning-based prediction for pedestrians</a><br/>
<a href="http://www.cs.cmu.edu/~kkitani/pdf/KZBH-ECCV12.pdf">Activity forecasting</a><br/>
<br/>
</td>
<td></td>
<td><a href="https://github.com/florianshkurti/csc2626w22/blob/master/lectures/Lecture6.pdf">Slides</a></td>
</tr>
<tr>
<td>7</td>
<td>Oct 26</td>
<td>
<b>Shared Autonomy for Robot Control and Human in-the-Loop Imitation</b><br>
<a href="http://bair.berkeley.edu/blog/2018/04/18/shared-autonomy/">Shared autonomy via deep reinforcement learning</a><br/>
<a href="https://www.ri.cmu.edu/pub_files/2015/7/Javdani15Hindsight.pdf">Shared autonomy via hindsight optimization</a><br/>
<a href="https://arxiv.org/abs/1808.08268">Learning models for shared control of human-machine systems with unknown dynamics</a><br/>
<a href="http://www.roboticsproceedings.org/rss14/p43.html">RelaxedIK: Real-time synthesis of accurate and feasible robot arm motion</a><br/>
<a href="https://arxiv.org/abs/2012.06733">Human-in-the-loop imitation learning using remote teleoperation</a><br/>
<a href="https://proceedings.mlr.press/v164/wong22a.html">Error-aware imitation learning from teleoperation data for mobile manipulation</a><br/>
<a href="https://ieeexplore.ieee.org/abstract/document/9197197">Controlling assistive robots with learned latent actions</a><br/>
<br/>
<b>Optional Reading</b><br/>
<a href="https://www.cc.gatech.edu/social-machines/papers/cakmak12_hri_active.pdf">Designing robot learners that ask good questions</a><br/>
<a href="https://ieeexplore.ieee.org/document/1513835/">Blending human and robot inputs for sliding scale autonomy</a><br/>
<a href="https://ieeexplore.ieee.org/document/7799299/">Inferring and assisting with constraints in shared autonomy</a><br/>
<a href="https://ieeexplore.ieee.org/document/6135817/">Collaborative control for a robotic wheelchair: evaluation of performance, attention, and workload</a><br/>
<a href="https://onlinelibrary.wiley.com/doi/full/10.1002/rob.21681">Director: A user interface designed for robot operation with shared autonomy</a><br/>
<a href="https://ieeexplore.ieee.org/abstract/document/9561491">Learning multi-arm manipulation through collaborative teleoperation</a><br/>
<a href="http://www.cim.mcgill.ca/~mrl/pubs/anqixu/icius2014_apexcommander.pdf">Interactive autonomous driving through adaptation from participation</a><br/>
<br/>
</td>
<td></td>
<td><a href="https://github.com/florianshkurti/csc2626w22/blob/master/lectures/Lecture7.pdf">Slides</a></td>
</tr>
<tr>
<td>8</td>
<td>Nov 2</td>
<td>
<b>Adversarial Imitation Learning</b><br>
<a href="https://arxiv.org/abs/1606.03476">GAIL: Generative adversarial imitation learning</a><br/>
<a href="https://arxiv.org/abs/1710.11248">Learning robust rewards with adversarial inverse reinforcement learning</a><br/>
<a href="https://arxiv.org/abs/1703.08840">InfoGAIL: interpretable imitation learning from visual demonstrations</a><br/>
<a href="https://arxiv.org/abs/1605.08478">Model-free imitation learning with policy optimization</a><br/>
<a href="https://openreview.net/forum?id=Hyg-JC4FDr">Imitation learning via off-policy distribution matching</a><br/>
<a href="https://arxiv.org/abs/1910.00105">Domain adaptive imitation learning</a><br/>
<a href="https://arxiv.org/abs/2106.00672">What matters for adversarial imitation learning?</a><br/>
<br/>
</td>
<td></td>
<td><a href="https://github.com/florianshkurti/csc2626w22/blob/master/lectures/Lecture8.pdf">Slides</a></td>
</tr>
<tr>
<td>9</td>
<td>Nov 9</td>
<td>
<b>Reading Week</b><br>
<br/>No lectures or office hours this week (Nov 7 - 11)<br/><br/>
</td>
<td></td>
<td></td>
</tr>
<tr>
<td>10</td>
<td>Nov 16</td>
<td>
<b>Imitation Learning Combined with Reinforcement Learning and Planning. Imitating Long-Horizon Tasks.</b><br>
<a href="https://arxiv.org/abs/1406.5979">AggreVaTe: Reinforcement and imitation learning via interactive no-regret learning</a><br/>
<a href="https://arxiv.org/abs/1709.07174">Agile off-road autonomous driving using end-to-end deep imitation learning</a><br/>
<a href="https://arxiv.org/abs/1710.02410">End-to-end driving via conditional imitation learning</a><br/>
<a href="https://arxiv.org/abs/1910.11956">Relay Policy Learning: solving long-horizon tasks via imitation and reinforcement learning</a><br/>
<a href="https://arxiv.org/abs/1704.03732">Deep Q-learning from demonstrations</a><br/>
<a href="https://openaccess.thecvf.com/content_CVPR_2019/html/Zeng_End-To-End_Interpretable_Neural_Motion_Planner_CVPR_2019_paper.html">End-to-end interpretable neural motion planner</a><br/>
<a href="https://arxiv.org/abs/1709.10087">Learning complex dexterous manipulation with deep reinforcement learning and demonstrations</a><br/>
<a href="https://arxiv.org/abs/1803.00590">Hierarchical imitation and reinforcement learning</a><br/>
<br/>
<b>Optional Reading: Imitation from Cost-to-Go Queries</b><br/>
<a href="https://arxiv.org/abs/1703.01030">Deeply AggreVaTeD: Differentiable imitation learning for sequential prediction</a><br/>
<a href="https://arxiv.org/abs/1801.07292">Convergence of value aggregation for imitation learning</a><br/>
<a href="https://arxiv.org/abs/1805.11240">Truncated Horizon Policy Search: Combining reinforcement learning & imitation learning</a><br/>
<a href="https://arxiv.org/abs/1805.10413">Fast policy learning through imitation and reinforcement</a><br/>
<br/>
<b>Optional Reading: Imitation and Reinforcement Learning with Imperfect Demonstrations</b><br/>
<a href="https://arxiv.org/abs/1802.05313">Reinforcement learning from imperfect demonstrations</a><br/>
<a href="https://arxiv.org/abs/2011.01298">Shaping rewards for reinforcement learning with imperfect demonstrations using generative models</a><br/>
<a href="https://arxiv.org/abs/1911.07109">Reinforcement learning from imperfect demonstrations under soft expert guidance</a><br/>
<a href="https://arxiv.org/abs/2010.10181">Robust imitation learning from noisy demonstrations</a><br/>
<br/>
<b>Optional Reading: Imitation can Improve Search and Exploration</b><br/>
<a href="https://arxiv.org/abs/1709.10089">Overcoming exploration in reinforcement learning with demonstrations</a><br/>
<a href="https://arxiv.org/abs/1611.04180">Learning to gather information via imitation</a><br/>
<a href="https://dl.acm.org/citation.cfm?id=2936990">Exploration from demonstration for interactive reinforcement learning</a><br/>
<a href="https://arxiv.org/abs/1804.00846">Learning to search via retrospective imitation</a><br/>
<br/>
</td>
<td></td>
<td><a href="https://github.com/florianshkurti/csc2626w22/blob/master/lectures/Lecture9.pdf">Slides</a></td>
</tr>
<tr>
<td>11</td>
<td>Nov 23</td>
<td>
<b>Representation Learning and Generalization Guarantees for Imitation Learning</b><br>
<a href="https://arxiv.org/abs/2008.01913">Generalization guarantees for imitation learning</a><br/>
<a href="https://arxiv.org/abs/2105.12272">Provable representation learning for imitation with contrastive Fourier features</a><br/>
<a href="https://arxiv.org/abs/2110.14770">TRAIL: near-optimal imitation learning with suboptimal data</a><br/>
<a href="http://proceedings.mlr.press/v139/yang21h.html">Representation matters: offline pretraining for sequential decision making</a><br/>
<a href="https://arxiv.org/abs/2012.09293">Imitation learning with stability and safety guarantees</a><br/>
<br/>
<b>Optional Reading</b><br/>
<a href="https://arxiv.org/abs/2002.10544">Provable representation learning for imitation learning via bi-level optimization</a><br/>
<a href="https://openreview.net/forum?id=kBNhgqXatI">An empirical investigation of representation learning for imitation</a><br/>
<a href="https://arxiv.org/abs/2111.14629">Improving zero-shot generalization in offline reinforcement learning using Generalized Similarity Functions</a><br/>
<a href="https://arxiv.org/abs/2201.00632">Neural network training under semidefinite constraints</a><br/>
<br/>
</td>
<td></td>
<td></td>
</tr>
<tr>
<td>12</td>
<td>Nov 30</td>
<td>
<b>Rewards, Task Specification, and Value Alignment</b><br>
<a href="https://people.eecs.berkeley.edu/~pabbeel/cs287-fa09/readings/NgHaradaRussell-shaping-ICML1999.pdf">Policy invariance under reward transformations: theory and applications to reward shaping</a><br/>
<a href="https://arxiv.org/abs/1606.06565">Concrete problems in AI safety</a><br/>
<a href="http://interactive.mit.edu/bayesian-inference-temporal-task-specifications-demonstrations-0">Bayesian inference of temporal task specifications from demonstrations</a><br/>
<a href="https://dl.acm.org/doi/10.5555/2900423.2900661">Understanding natural language commands for robotic navigation and mobile manipulation</a><br/>
<a href="https://proceedings.mlr.press/v168/cui22a.html">Can foundation models perform zero-shot task specification for robot manipulation?</a><br/>
<a href="https://say-can.github.io/">Do as I can, not as I say: grounding language in robotic affordances</a><br/>
<br/>
<b>Optional Reading</b><br/>
<a href="https://arxiv.org/abs/1606.03137">Cooperative inverse reinforcement learning</a><br/>
<a href="https://arxiv.org/abs/1811.07871">Scalable agent alignment via reward modeling: a research direction</a><br/>
<a href="https://www.annualreviews.org/doi/abs/10.1146/annurev-control-101119-071628">Robots that use language</a><br/>
<a href="https://dspace.mit.edu/handle/1721.1/81275">Learning perceptually grounded word meanings from unaligned parallel data</a><br/>
<a href="https://dspace.mit.edu/handle/1721.1/116010">Asking for help using inverse semantics</a><br/>
<a href="https://arxiv.org/abs/1801.09624">Learning the reward function for a misspecified model</a><br/>
<a href="https://peract.github.io/">Perceiver-Actor: A multi-task transformer for robotic manipulation</a><br/>
<a href="https://code-as-policies.github.io/">Code as Policies: Language model programs for embodied control</a><br/>
<br/>
</td>
<td></td>
<td></td>
</tr>
<tr>
<td>13</td>
<td>Dec 7</td>
<td>
<b>Project Presentations</b><br>
</td>
<td></td>
<td></td>
</tr>
</tbody>
</table>
</div>
</div>
<div class="rt">
<h3>Recommended, but optional, books</h3>
<ul>
<li>Robot programming by demonstration, by Aude Billard, Sylvain Calinon, Rudiger Dillmann, Stefan Schaal</li>
<li>Robot learning from human teachers, by Sonia Chernova, Andrea Thomaz</li>
<li>An algorithmic perspective on imitation learning, by Takayuki Osa, Joni Pajarinen, Gerhard Neumann, Andrew Bagnell, Pieter Abbeel, Jan Peters</li>
</ul>
</div>
<div class="rc">
<h3>Recommended simulators and datasets</h3>
You are encouraged to use the simplest possible simulator that accomplishes the task you are interested in. In most cases this means MuJoCo, but feel free to build your own. A minimal sketch of driving a Gym-style environment is shown after the list below.<br/>
For all the starred environments below, please be aware of the one-machine-per-student licensing restriction for the MuJoCo physics engine:<br/><br/>
<ul>
<li><a href="https://gym.openai.com/envs/#robotics">OpenAI Gym</a> (Robotics*, Mujoco*, Box2D, Classic Control)</li>
<li><a href="https://github.com/deepmind/dm_control">DeepMind control suite</a>*</li>
<li><a href="https://github.com/StanfordVL/robosuite">Surreal Robosuite</a> (manipulation*)</li>
<li><a href="http://motion.pratt.duke.edu/klampt/">Klampt</a> (manipulation and locomotion tasks, contact modeling)</li>
<li><a href="http://dartsim.github.io/">DART</a> (manipulation and locomotion tasks, contact modeling)</li>
<li><a href="https://github.com/udacity/self-driving-car-sim">Udacity self-driving car simulator</a> (based on Unity, needs a GPU)</li>
<li><a href="http://carla.org/">CARLA self-driving car simulator</a> (based on Unreal Engine 4, needs a GPU)</li>
<li><a href="https://holodeck.cs.byu.edu/">Holodeck</a> (based on Unreal Engine 4, needs a GPU)</li>
<li><a href="https://github.com/Microsoft/AirSim/">AirSim</a> (flying vehicles and cars, based on Unreal Engine 4, needs a GPU)</li>
<li><a href="https://github.com/ugo-nama-kun/gym_torcs">TORCS self-driving car simulator</a></li>
<li><a href="http://www.coppeliarobotics.com/">V-REP</a> (robot arms, humanoids, hexapods)</li>
<li><a href="https://github.com/deepmind/lab">DeepMind Lab</a> (navigation in mazes)</li>
<li><a href="https://github.com/StanfordVL/GibsonEnv">Gibson environment</a> (navigation, locomotion in indoor environments, needs a GPU)</li>
<li><a href="https://github.com/stepjam/RLBench">RLBench</a> (vision-based manipulation, has demonstrations)</li>
<li><a href="https://clvrai.github.io/furniture/">IKEA furniture assembly environment</a> (vision-based dual-arm manipulation for furniture assembly)</li>
<li><a href="https://askforalfred.com/">ALFRED</a> (vision and language based navigation and manipulation)</li>
<li><a href="https://sites.google.com/view/d4rl/home">D4RL</a> (manipulation and navigation datasets for offline RL)</li>
<li><a href="https://roboturk.stanford.edu/">RoboTurk</a> (demonstration data for manipulation)</li>
<li><a href="https://aihabitat.org/">AI Habitat</a> (visual navigation)</li>
<li><a href="https://developer.nvidia.com/isaac-gym">Isaac Gym</a> (gym environments and more, but blazing fast, end-to-end GPU accelerated)</li>
<li><a href="https://raisim.com/">RaiSim</a> (supports biomechanics of human motion, as well as quadrupeds)</li>
<li><a href="https://uzh-rpg.github.io/flightmare/">Flightmare</a> (fast multi-quadrotor simulation)</li>
<li><a href="https://github.com/utiasDSL/gym-pybullet-drones">PyBullet Drones</a> (fast multi-quadrotor simulation, more aerodynamic effects)</li>
<li><a href="https://berkeleyautomation.github.io/bags/">Deformable Ravens</a> (deformable object simulation in PyBullet with demonstrations)</li>
</ul>
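Most of the Gym-style environments above are driven by the same reset/step loop. The sketch below is illustrative only: it assumes the <code>gym</code> package with the classic (pre-0.26) API, and the environment id and random policy are placeholders, not course-provided code.
<pre>
import gym

# Create any Gym environment; "CartPole-v1" is just an example id.
env = gym.make("CartPole-v1")
obs = env.reset()

done = False
episode_return = 0.0
while not done:
    # Replace the random action with your learned or expert policy.
    action = env.action_space.sample()
    obs, reward, done, info = env.step(action)
    episode_return += reward
env.close()

print("Return of one random-policy episode:", episode_return)
</pre>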
</div>
<div class="rc">
<h3>Resources for planning, control, and RL</h3>
<ul>
<li><a href="http://ompl.kavrakilab.org/">Open Motion Planning Library</a></li>
<li><a href="https://gym.openai.com/envs/#robotics">Control Toolbox from ETHZ</a> (C++ only at the moment, but includes automatic differentiation)</li>
<li><a href="http://rll.berkeley.edu/trajopt/doc/sphinx_build/html/">Trajectory optimization</a></li>
<li><a href="https://github.com/resibots/blackdrops">Black-DROPS Policy Search</a> (C++ only at the moment)</li>
<li><a href="http://rll.berkeley.edu/gps/">Guided Policy Search</a></li>
<li><a href="https://github.com/openai/baselines">OpenAI Baselines</a></li>
</ul>
</div>
<div class="rc">
<h3>Resources for ML</h3>
<ul>
<li><a href="https://pytorch.org/">PyTorch</a></li>
<li><a href="https://www.tensorflow.org/">Tensorflow</a></li>
<li><a href="https://gpytorch.ai/">GPyTorch</a> (for gaussian processes)</li>
</ul>
</div>
<div class="rc">
<h3>Recommended courses</h3>
<ul>
<li><a href="http://rl.cs.rutgers.edu/robotlearningseminar.html">Robot Learning Seminar</a> by Abdeslam Boularias</li>
<li><a href="http://rail.eecs.berkeley.edu/deeprlcourse/">Deep RL course</a> by Sergey Levine, John Schulman, Chelsea Finn</li>
<li><a href="https://csc2541-f18.github.io/">Deep RL course</a> by Jimmy Ba</li>
<li><a href="http://wcms.inf.ed.ac.uk/ipab/rlsc">Robot Learning and Sensorimotor Control course</a> by Sethu Vijayakumar</li>
<li><a href="http://people.eecs.berkeley.edu/~anca/AHRI.html">Algorithmic HRI course</a> by Anca Dragan</li>
<li>Related sections from <a href="http://underactuated.mit.edu/">Russ Tedrake's underactuated robotics course</a></li>
</ul>
</div>
<!-- jQuery (necessary for Bootstrap's JavaScript plugins) -->
<script src="https://ajax.googleapis.com/ajax/libs/jquery/1.11.3/jquery.min.js"></script>
<script src="https://ajax.googleapis.com/ajax/libs/jqueryui/1.11.3/jquery-ui.min.js"></script>
<!-- Include all compiled plugins (below), or include individual files as needed -->
<script src="./js/bootstrap.min.js"></script>
</body>
</html>