Seminar on Reinforcement Learning and Information Retrieval

58317102
3
Algorithms and machine learning
Advanced studies
Year Semester Date Period Language Person in charge
2017 spring 18.01-03.05. 3-4 English Dorota Glowacka

Lectures

Time Room Lecturer Date
Wed 10-12 C220 Dorota Glowacka 18.01.2017-01.03.2017
Wed 10-12 C220 Dorota Glowacka 15.03.2017-03.05.2017

General

The first week of the seminar will briefly cover the main concepts of reinforcement learning and information retrieval. During the first half of the seminar, the following topics will be covered in more detail: Markov decision processes (MDPs) and early information retrieval systems based on MDPs; bandit algorithms and their application to user modelling, ad and news recommendation, document retrieval, image retrieval and music retrieval; and examples of real-life systems based on bandit algorithms. Assessment is based on an essay on a selected subject related to the main topic of the seminar and on a short presentation.

Completing the course

Any changes to the schedule will be posted on the course pages.

18.1. - 29.1. Introductory lectures

30.1. Deadline for topic selection - Send in the topic you wish to write about. If it is not on the list, also include at least 2 papers that outline the topic. Do this via email by 23:59.

Send it to both of us: firstname.lastname at cs.helsinki.fi.

15.2. Presentation of the chosen topic, 5 minutes, ~5 slides.

22.2. Lecture on writing, finding references and presentation.

You're on your own! We will email you later to ask for the current version of your essay and will give you feedback on it by email. The same will happen for the presentation slides. Contact us by email if you have any questions, and see the slides below for instructions on writing and presenting. Happy writing!

29.3. Feedback session (email us if you wish to meet on this day).

5.4. Feedback session (email us if you wish to meet on this day).

12.4. Final presentations, part 1. 20 minutes, ~20 slides.

25.4. Final presentations, part 2. 20 minutes, ~20 slides. (Note: the original date, the 19th, fell during the university's Easter break.)

26.4. Final presentations, part 3. 20 minutes, ~20 slides.

3.5. Deadline for the final paper submission.

 

Literature and material

The slides:

http://www.cs.helsinki.fi/u/jgpyykko/RL-slides.pdf

Writing/presenting instructions: https://docs.google.com/presentation/d/1cDQnmW-RbgGzUW1UMzDXvT_8npFLfGsZN7snz3qquuE/edit?usp=sharing

The book:

http://people.inf.elte.hu/lorincz/Files/RL_2006/SuttonBook.pdf

 

Writing platforms

sharelatex.com

overleaf.com

Conferences worth noting:

ICML https://2017.icml.cc/

NIPS https://nips.cc/

CIKM http://www.cikmconference.org/

SIGIR http://sigir.org/

RecSys https://recsys.acm.org/

And check earlier years too.

 

Here are some available topics:

1. Reinforcement learning in music retrieval

X. Wang, Y. Wang, D. Hsu, and Y. Wang. Exploration in interactive personalized music recommendation: A reinforcement learning approach. ACM Trans. Multimedia Comput. Commun. Appl., 11(1):7:1–7:22, Sept. 2014.

 

2. Markov decision processes in document and web page recommendation

S. Zhang, J. Luo, and H. Yang. A POMDP model for content-free document re-ranking. In Proceedings of the 37th International ACM SIGIR Conference on Research & Development in Information Retrieval, SIGIR ’14, pages 1139–1142, New York, NY, USA, 2014. ACM.

 

3. Application of Q-learning in information retrieval

B.-T. Zhang and Y.-W. Seo. Personalized web-document filtering using reinforcement learning. Applied Artificial Intelligence, 15(7):665–685, 2001.

 

4. Application of bandit algorithms to ranker evaluation

M. Zoghi, S. Whiteson, and M. de Rijke. MergeRUCB: A method for large-scale online ranker evaluation. In Proceedings of the Eighth ACM International Conference on Web Search and Data Mining, WSDM ’15, pages 17–26, New York, NY, USA, 2015. ACM.

 

B. Brost, Y. Seldin, I. J. Cox, and C. Lioma. Multi-dueling bandits and their application to online ranker evaluation. In Proceedings of the 25th ACM International on Conference on Information and Knowledge Management, pages 2161–2166. ACM, 2016.

 

5. Reinforcement learning in image retrieval

A reinforcement learning approach to query-less image retrieval

S Hore, L Tyrvainen, J Pyykko, D Glowacka - International Workshop on Symbiotic Interaction, 2014

P.-Y. Yin, B. Bhanu, K.-C. Chang, and A. Dong. Integrating relevance feedback techniques for image retrieval using reinforcement learning. IEEE Trans. Pattern Anal. Mach. Intell., 27(10):1536–1551, Oct. 2005.

 

 

6. Balancing exploration and exploitation in information retrieval

Balancing exploration and exploitation: Empirical parameterization of exploratory search systems

K Ahukorala, A Medlar, K Ilves, D Glowacka

Proceedings of the 24th ACM International on Conference on Information and ...

 

K. Hofmann, S. Whiteson, and M. de Rijke. Balancing exploration and exploitation in learning to rank online. In European Conference on Information Retrieval, pages 251–263. Springer, 2011.

 

7. Online ranking with bandit algorithms

F. Radlinski, R. Kleinberg, and T. Joachims. Learning diverse rankings with multi-armed bandits. In Proceedings of the 25th International Conference on Machine Learning, pages 784–791. ACM, 2008.

 

K. Hofmann, S. Whiteson, and M. de Rijke. Balancing exploration and exploitation in listwise and pairwise online learning to rank for information retrieval. Information Retrieval, 16(1):63–90, 2013.

 

8. Reinforcement learning for news and ad recommendation

L. Li, W. Chu, J. Langford, and R. E. Schapire. A contextual-bandit approach to personalized news article recommendation. In Proceedings of the 19th International Conference on World Wide Web, WWW ’10, pages 661–670, New York, NY, USA, 2010. ACM.

 

S. Yuan and J. Wang. Sequential selection of correlated ads by POMDPs. In Proceedings of the 21st ACM International Conference on Information and Knowledge Management, pages 515–524. ACM, 2012.

 

9. Reinforcement learning in recommender systems

N. Taghipour, A. Kardan, and S. S. Ghidary. Usage-based web recommendations: A reinforcement learning approach. In Proceedings of the 2007 ACM Conference on Recommender Systems, RecSys ’07, pages 113–120, New York, NY, USA, 2007. ACM.

 

P. Kohli, M. Salek, and G. Stoddard. A fast bandit algorithm for recommendation to users with heterogeneous tastes. In AAAI, 2013.