Seminar: Reinforcement Learning and Its Applications

58314105
3
Algorithms and machine learning
Advanced studies
Year Semester Date Period Language Person in charge
2014 spring 15.01-23.04. 3-4 English Dorota Glowacka

Lectures

Time Room Lecturer Date
Wed 12-14 C220 Dorota Glowacka 15.01.2014-19.02.2014
Wed 12-14 C220 Dorota Glowacka 12.03.2014-23.04.2014

General

The subject of the seminar is reinforcement learning, a field of machine learning in which an agent explores a problem by performing actions and learning from their consequences. In this course students first get acquainted with the basic concepts of reinforcement learning and where it can be used. They then choose a subtopic of their own, on which they write a report, give presentations and give feedback to other students. Subjects covered will include robotics, game AI, personalisation, etc.

A workshop runs alongside the seminar. It is worth 4 additional credits and is recommended for all participants. During the workshop you have the opportunity to implement a reinforcement learning agent in an environment of your choice (preferably on the same topic as your seminar work). https://www.cs.helsinki.fi/courses/582376/2014/k/k/1

Completing the course

In order to complete the course, students will need to:

1. Write an 8-10 page report on a subject that has been accepted by the supervisors.

2. Give two presentations on the chosen topic.

3. Give feedback to other students on their work.

 

Schedule (details are still under consideration; further instructions will follow as the course advances):

15.1.   Introductory lecture. Covers technical details, the basics of reinforcement learning and suggested topics. Attendance is compulsory.

22.1.   Second lecture on reinforcement learning, plus personal help with choosing a topic.

26.1.   Deadline for the topic. Send the topic and a preliminary abstract by email to both supervisors.

-Include: A short abstract on your topic, half a page long, citations included. Chosen topics are listed below, so if you wish to change your topic or have not yet chosen one, please send an email to the supervisors to discuss it. Some topics cannot have multiple students working on them due to lack of material.

-Also: If you are taking the project, include the description for that as well. Further instructions are on the project webpages.

-And: Let us know which topics you would like us to discuss in the next lecture.

29.1.   Third lecture, advanced topics on what people are doing.

18.2.   Send in the current version of your seminar report, with 2-4 pages written. You may discuss which chapters to work on first by email or at one of the meetings.

19.2.   Presentation on the chosen topic, 5 minutes, ~5 slides. You may send in the slides for feedback before you present them.

24.2. - 7.3. Personal meetings for project work / seminar. Contact Joel and agree on a date.

25.3.   Send in the current version of the paper, whatever its state. We will send everyone 3 papers to read. Comment on them before the next meeting. Start your comments by recapping the content of the paper. After that, they may include grammar corrections, suggestions on structure, and notes on what should be discussed more or less. For each paper, include at least one "I did not understand this, please elaborate" comment, if possible. If everything was crystal clear, mention that instead. UPDATE: Everyone has received 3 papers to review, with instructions on how to do it. Most were sent to the CS mail address, so please check that one.

2.4.     Feedback session. The deadline for reviews is before the session. Discussion on writing, templates, citations and related topics. Slides: http://www.cs.helsinki.fi/u/jgpyykko/RLslides.pdf

9.4.     Feedback session. Deadline for the last review. We will choose the dates of everyone's final presentations during this session, so please attend! You will also receive the instructors' reviews by this date.

You may send your slides to the instructors for another review. It is strongly recommended to get some final notes on them before presenting.

Final presentations: 20 minutes, ~15 slides, plus 5 minutes for discussion. You may send the current version of your paper for feedback. Include any questions you would like us to answer specifically; otherwise we will give an estimate of the current grade plus some general notes on how to improve the paper.

Wed 16.4.

10:00-12:00  A307
Johannes
Lasse
Anssi
Arto
12:00-13:00  C220
Joseph
Jaana
14:00-16:00  A307
Jin
Han
Mubarok
Yuan

Thu 17.4.

10:00-12:00  A307
Eric
Tommi
Wang
Sayantan
14:00-15:00  A307
Meysam
Yina

5.5. Deadline for the paper. (Deadline extended; contact us if you would like to have the credits sooner.)

 

Report template and format 

The report should be written in LaTeX, with citations done in BibTeX. The LaTeX template for the paper may be any established publisher's template, such as ACM, IEEE or NIPS. The department's template may also be used. Please note that the length of the report will vary between templates, so we will give further instructions on the length, based on a common template, later on.

The chosen common template is the NIPS one; the minimum length with it is 10 pages. Please check your current length with this template to estimate the required length with your chosen template. http://nips.cc/Conferences/2013/PaperInformation/StyleFiles

 

Course is over, thank you for participating! The results should be visible in your register after 10.6. Give us feedback if you feel there was something worth noting!

Literature and material

Basics:
http://www.scholarpedia.org/article/Reinforcement_learning
http://webdocs.cs.ualberta.ca/~sutton/book/ebook/the-book.html

Applications:
Games:
http://www0.cs.ucl.ac.uk/staff/d.silver/web/Applications_files/dyna2.pdf - Go
http://www0.cs.ucl.ac.uk/staff/d.silver/web/Applications_files/bootstrapping.pdf - Chess
http://cs229.stanford.edu/proj2012/LiaoYiYang-RLtoPlayMario.pdf - Mario

http://people.csail.mit.edu/camato/publications/LearningInCiv-final.pdf - Civilization

http://ieeexplore.ieee.org/xpl/login.jsp?tp=&arnumber=5035664&url=http%3A%2F%2Fieeexplore.ieee.org%2Fiel5%2F4967991%2F5035601%2F05035664.pdf%3Farnumber%3D5035664

Robotics:
http://www.ias.tu-darmstadt.de/uploads/Publications/Kober_IJRR_2013.pdf

http://www-robotics.usc.edu/~maja/teaching/cs584/papers/reinf.pdf 

http://www.stanford.edu/~svlevine/papers/dlctrl.pdf - Simulated

http://www.is.tuebingen.mpg.de/fileadmin/user_upload/files/publications/Humanoids2003-Peters_%5B0%5D.pdf - Humanoid robots overview

 

Clinical trials:
http://www.cs.mcgill.ca/~vkules/bandits.pdf

Image retrieval:
http://jmlr.org/proceedings/papers/v11/auer10a/auer10a.pdf

Auctions and pricing:
https://explochallenge.inria.fr/wp-content/uploads/2012/05/paper1.pdf

Advert display:
http://books.nips.cc/papers/files/nips24/NIPS2011_1232.pdf

Information retrieval:
http://www.cs.ubc.ca/~hutter/nips2011workshop/papers_and_posters/nips-2012-rl4ir.pdf

Algorithms (introductory; there are better sources for implementation):

http://www.acm.uiuc.edu/sigart/docs/QLearning.pdf
http://www.scholarpedia.org/article/Temporal_difference_learning

http://videolectures.net/nips09_littman_mbrl/ - Model-based reinforcement learning, and basics too!

Johannes was kind enough to suggest a script for libproxy (you can, for example, add the script to your bookmarks and click it while on a paper's page). Libproxy lets you access domains as if you were in the university network, which gives access to PDFs of papers that would otherwise be inaccessible.

javascript:(function(d,u){d.domain.indexOf(u)<0&&(location.href=d.URL.replace(d.domain,d.domain+u))})(document,'.libproxy.helsinki.fi')
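
The one-line script above is compact and can be hard to read. As a rough sketch of what it appears to do, the same logic is written out below as a plain function. The name toProxyUrl and its parameters are hypothetical and are introduced here only for illustration; they are not part of the original script.

```javascript
// Readable sketch of the bookmarklet's logic as a pure function.
// NOTE: `toProxyUrl` is a hypothetical helper name used for illustration only.
function toProxyUrl(url, domain, suffix) {
  // If the host name already contains the proxy suffix, the page is
  // already being viewed through the proxy, so there is nothing to do.
  if (domain.indexOf(suffix) >= 0) {
    return null;
  }
  // String.prototype.replace with a string pattern replaces only the first
  // occurrence, which in an ordinary URL is the host name.
  return url.replace(domain, domain + suffix);
}

// In a browser, the bookmarklet does the equivalent of:
//   const target = toProxyUrl(document.URL, document.domain, '.libproxy.helsinki.fi');
//   if (target !== null) { location.href = target; }
```

For instance, a page on ieeexplore.ieee.org would be redirected to the same path on ieeexplore.ieee.org.libproxy.helsinki.fi, while a page that is already on a libproxy host is left alone.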