Jyrki Kivinen: Publications by topic

Online prediction with expert advice

These lecture notes give a tutorial introduction to the experts model:

Online prediction with expert advice. Notes for a presentation at Machine Learning Summer School 2003 in Canberra.
A detailed analysis of a large class of loss functions shows that the loss bounds Vovk obtains with his Aggregating Algorithm are optimal:
David Haussler, Jyrki Kivinen and Manfred K. Warmuth: Sequential prediction of individual sequences under general loss functions. IEEE Transactions on Information Theory 44(5):1906-1925, September 1998. (Preliminary results appeared in EuroCOLT '93 and EuroCOLT '95.)
For the same loss functions as above, loss bounds with only slightly worse constants can be achieved by a simpler algorithm:
Jyrki Kivinen and Manfred K. Warmuth: Averaging expert predictions. In Paul Fischer and Hans Ulrich Simon, editors, Proc. 4th European Conference on Computational Learning Theory, pages 153-167, Springer LNAI 1572, Berlin, March 1999.
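
For readers new to the setting, here is a small sketch of the kind of algorithm analysed in the second paper: the learner predicts with a weighted average of the experts' predictions, where each expert's weight decays exponentially in its cumulative loss. The square loss and the learning rate eta below are illustrative choices, not the tunings from the papers.

    import numpy as np

    def weighted_average_forecaster(expert_preds, outcomes, eta=2.0):
        """Predict with a loss-weighted average of expert predictions.

        expert_preds: array of shape (T, n), predictions of n experts over T trials.
        outcomes:     array of shape (T,), outcomes in [0, 1].
        eta:          learning rate (illustrative value).
        Returns the learner's cumulative square loss.
        """
        T, n = expert_preds.shape
        log_w = np.zeros(n)                 # log-weights; all experts start equal
        total_loss = 0.0
        for t in range(T):
            w = np.exp(log_w - log_w.max())
            w /= w.sum()
            y_hat = w @ expert_preds[t]     # weighted-average prediction
            total_loss += (y_hat - outcomes[t]) ** 2
            # each expert's weight decays exponentially in its own loss so far
            log_w -= eta * (expert_preds[t] - outcomes[t]) ** 2
        return total_loss

    # toy usage: one perfect expert, one random expert
    rng = np.random.default_rng(0)
    y = rng.integers(0, 2, size=100).astype(float)
    preds = np.column_stack([y, rng.random(100)])
    print(weighted_average_forecaster(preds, y))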

Linear online prediction with sparse target

The Exponentiated Gradient algorithm (EG) for online linear regression is derived and analysed, showing that it outperforms the classic Gradient Descent (GD) when the target weight vector is sparse:
Jyrki Kivinen and Manfred K. Warmuth: Exponentiated gradient versus gradient descent for linear predictors. Information and Computation 132(1):1-63, January 1997. (A preliminary version appeared in STOC '95.)
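
For concreteness, here are the two update rules in a minimal form for online linear regression with square loss; the learning rates and the toy data below are illustrative, not the settings analysed in the paper. GD updates the weights additively, while EG updates them multiplicatively and keeps them on the probability simplex, which is what makes it effective when only a few components of the target are nonzero.

    import numpy as np

    def gd_update(w, x, y, eta=0.01):
        """Gradient Descent: additive update proportional to (y_hat - y) * x."""
        y_hat = w @ x
        return w - eta * (y_hat - y) * x

    def eg_update(w, x, y, eta=0.5):
        """Exponentiated Gradient: multiplicative update; w stays a probability vector."""
        y_hat = w @ x
        v = w * np.exp(-eta * (y_hat - y) * x)
        return v / v.sum()

    # toy run with a sparse target: only coordinate 0 is relevant
    rng = np.random.default_rng(1)
    d = 100
    w_gd = np.zeros(d)
    w_eg = np.full(d, 1.0 / d)
    for _ in range(50):
        x = rng.choice([-1.0, 1.0], size=d)
        y = x[0]
        w_gd = gd_update(w_gd, x, y)
        w_eg = eg_update(w_eg, x, y)
    # EG has essentially found the relevant coordinate; GD is much slower here
    print(w_gd[0], w_eg[0])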

The results for EG and GD from the previous paper are extended to generalised linear models (e.g. logistic regression):

David P. Helmbold, Jyrki Kivinen and Manfred K. Warmuth: Relative loss bounds for single neurons. IEEE Transactions on Neural Networks 10(6):1291-1304, November 1999. (A preliminary version appeared in NIPS*95.)

The EG and GD algorithms are derived and analysed as instances of a common framework based on Bregman divergences:

Jyrki Kivinen and Manfred K. Warmuth: Relative loss bounds for multidimensional regression problems. Machine Learning 45(3):301-329, December 2001. (A preliminary version appeared in NIPS*97.)
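
A small sketch of the shared pattern, using the standard link-function view (the choices below are illustrative): the additive update is applied not to the weight vector itself but to its image under a link function determined by the Bregman divergence; the identity link gives GD, and the componentwise logarithm (with renormalisation) gives EG.

    import numpy as np

    def mirror_update(w, grad, eta, link, link_inv):
        """General update in the transformed parameters: link(w_new) = link(w) - eta * grad.
        The link function is the gradient of the potential generating the Bregman divergence."""
        return link_inv(link(w) - eta * grad)

    def gd_step(w, grad, eta=0.1):
        """Identity link (squared Euclidean divergence) gives Gradient Descent."""
        return mirror_update(w, grad, eta, lambda v: v, lambda v: v)

    def eg_step(w, grad, eta=0.1):
        """Logarithmic link (relative entropy divergence) gives Exponentiated Gradient,
        renormalised so that w stays on the probability simplex."""
        v = mirror_update(w, grad, eta, np.log, np.exp)
        return v / v.sum()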

Unlike Littlestone's Winnow, the classic Perceptron cannot take advantage of sparseness of the target in high-dimensional problems:

Jyrki Kivinen, Manfred K. Warmuth and Peter Auer: The Perceptron Algorithm vs. Winnow: linear vs. logarithmic mistake bounds when few input variables are relevant. Artificial Intelligence 97(1-2):325-343, December 1997. (A preliminary version appeared in COLT '95.)
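
The contrast in one picture (a rough sketch with illustrative constants, not the exact tunings from the paper): the Perceptron updates its weights additively on a mistake, Winnow multiplicatively, and it is the multiplicative update that needs only logarithmically many mistakes when few variables are relevant.

    import numpy as np

    def perceptron_update(w, x, y):
        """Additive update on a mistake; labels y in {-1, +1}."""
        if y * (w @ x) <= 0:
            w = w + y * x
        return w

    def winnow_update(w, x, y, eta=1.0, theta=None):
        """Multiplicative update on a mistake; inputs x in {0,1}^n, weights kept positive,
        threshold theta (defaults to n/2)."""
        n = len(w)
        if theta is None:
            theta = n / 2.0
        y_hat = 1 if w @ x > theta else -1
        if y_hat != y:
            w = w * np.exp(eta * y * x)   # promotes or demotes exactly the active weights
        return w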

Online loss bounds for the p-norm Perceptron can be translated into the signal processing setting of H-infinity optimal filtering:

Jyrki Kivinen, Manfred K. Warmuth and Babak Hassibi: The p-norm generalization of the LMS algorithm for adaptive filtering. IEEE Transactions on Signal Processing 54(5):1782-1793, May 2006. (A preliminary version was presented at 13th IFAC Symposium on System Identification (SYSID 2003).)
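
A hedged sketch of the p-norm idea (the exact normalisations and step sizes in the paper differ): the ordinary additive LMS step is taken in a transformed parameter space, and the prediction weights are recovered through the gradient of (1/2)||theta||_p^2. With p = 2 this reduces to standard LMS, while large p gives multiplicative, Winnow-like behaviour.

    import numpy as np

    def grad_half_norm_sq(theta, p):
        """Gradient of (1/2) * ||theta||_p^2, used as the map between the two parameter spaces."""
        norm = np.linalg.norm(theta, p)
        if norm == 0.0:
            return np.zeros_like(theta)
        return np.sign(theta) * np.abs(theta) ** (p - 1) / norm ** (p - 2)

    def p_norm_lms_step(theta, x, y, p=2.0, eta=0.05):
        """One illustrative p-norm LMS step: predict with w = grad (1/2)||theta||_p^2,
        then take a plain additive LMS step on theta. With p = 2, w = theta and this
        is the classical LMS update."""
        w = grad_half_norm_sq(theta, p)
        y_hat = w @ x
        theta = theta - eta * (y_hat - y) * x
        return theta, y_hat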

Online learning with large margins

Online learning methods are combined with kernels and large margins:
Jyrki Kivinen, Alexander J. Smola and Robert C. Williamson: Online learning with kernels. IEEE Transactions on Signal Processing 52(8):2165-2176, August 2004. (Preliminary results were presented at NIPS*2001 and ALT 2002).
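
A minimal sketch of this style of algorithm with illustrative choices (Gaussian kernel, hinge loss, simple step size), not the exact method or tuning from the paper: the hypothesis is a kernel expansion whose existing coefficients shrink at every step (the effect of regularisation) and which gains one new term whenever the loss is positive.

    import numpy as np

    def gaussian_kernel(x, z, sigma=1.0):
        return np.exp(-np.sum((x - z) ** 2) / (2 * sigma ** 2))

    class OnlineKernelClassifier:
        """Online gradient descent on a regularised hinge loss in a kernel space (sketch)."""

        def __init__(self, kernel=gaussian_kernel, eta=0.2, lam=0.01):
            self.kernel, self.eta, self.lam = kernel, eta, lam
            self.centres, self.alphas = [], []

        def decision(self, x):
            return sum(a * self.kernel(c, x) for a, c in zip(self.alphas, self.centres))

        def step(self, x, y):
            """Process one example with label y in {-1, +1}."""
            margin = y * self.decision(x)
            # shrink all existing coefficients: the effect of the regulariser
            self.alphas = [(1.0 - self.eta * self.lam) * a for a in self.alphas]
            if margin < 1.0:                    # hinge loss is positive
                self.centres.append(x)
                self.alphas.append(self.eta * y)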

We study the effect of the margin in online linear classification with the p-norm Perceptron, in particular with a moving target:

Jyrki Kivinen, Alexander J. Smola and Robert C. Williamson: Large margin classification for moving targets. In Nicolò Cesa-Bianchi, Masayuki Numao and Rüdiger Reischuk, editors, Proc. 13th International Conference on Algorithmic Learning Theory (ALT 2002), pages 113-127, Springer LNAI 2533, Berlin, November 2002.
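
As a generic illustration of margin-driven updates (this is not the algorithm of the paper, which adds p-norm machinery and control of the weight norm): the Perceptron step is triggered not only on mistakes but whenever the margin falls below a target value, so the hypothesis keeps adjusting even while it classifies correctly, which helps when the target is moving.

    import numpy as np

    def margin_perceptron_step(w, x, y, rho=1.0, eta=0.1):
        """Update whenever the (unnormalised) margin y * (w @ x) is below the target rho;
        rho and eta are illustrative parameters."""
        if y * (w @ x) < rho:
            w = w + eta * y * x
        return w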

A tutorial paper based on my presentation at the Machine Learning Summer School 2002 in Canberra, including some introductory material and results from the above two papers:

Jyrki Kivinen: Online Learning of Linear Classifiers. In S. Mendelson and A. J. Smola, editors, Advanced Lectures on Machine Learning, pages 235-257, Springer LNCS 2600, Berlin, January 2003.

An online approximation to the Bayes Point Machine gives a practical method for generating large-margin hypotheses:

Edward Harrington, Ralf Herbrich, Jyrki Kivinen, John Platt and Robert C. Williamson: Online Bayes Point Machines. In K.-Y. Whang, J. Jeon, K. Shim and J. Srivastava, editors, 7th Pacific-Asia Conference on Knowledge Discovery and Data Mining (PAKDD 2003), pages 241-252, Springer LNAI 2637, Berlin, April 2003.
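
A heavily simplified sketch of the underlying idea (the paper's algorithm differs in its details): train several randomised online Perceptrons and average their weight vectors, using the average as an approximation to the Bayes point of the version space.

    import numpy as np

    def averaged_perceptrons(X, y, n_runs=10, seed=0):
        """Average the final weight vectors of several Perceptrons, each trained on its
        own random ordering of the data; labels y in {-1, +1}. Illustrative sketch only."""
        rng = np.random.default_rng(seed)
        w_sum = np.zeros(X.shape[1])
        for _ in range(n_runs):
            w = np.zeros(X.shape[1])
            for i in rng.permutation(len(y)):
                if y[i] * (w @ X[i]) <= 0:
                    w = w + y[i] * X[i]
            w_sum += w
        return w_sum / n_runs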

Online Bayes Point Machine applied to signal processing:

Edward Harrington, Jyrki Kivinen and Robert C. Williamson: Channel equalization and the Bayes Point Machine. In Proc. 2003 IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP 2003), pages 493-496.

Other machine learning papers

AdaBoost can be motivated as constrained relative entropy minimisation:

Jyrki Kivinen and Manfred K. Warmuth: Boosting as entropy projection. In Proc. 12th Annual Conference on Computational Learning Theory, pages 134-144, ACM, New York, July 1999.
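
A small sketch of the connection for binary weak hypotheses (illustrative, and assuming the weighted error lies strictly between 0 and 1/2): AdaBoost's exponential reweighting produces exactly the distribution closest in relative entropy to the previous one among all distributions under which the last weak hypothesis has zero edge.

    import numpy as np

    def adaboost_reweight(D, margins):
        """One AdaBoost weight update; margins[i] = y_i * h(x_i) in {-1, +1}.
        The result minimises KL(D_new || D) subject to sum_i D_new[i] * margins[i] = 0,
        i.e. the last weak hypothesis has zero edge under the new distribution."""
        eps = np.sum(D[margins < 0])              # weighted error, assumed in (0, 1/2)
        alpha = 0.5 * np.log((1.0 - eps) / eps)   # AdaBoost's coefficient
        D_new = D * np.exp(-alpha * margins)
        return D_new / D_new.sum()

    # toy check of the zero-edge constraint
    D = np.full(6, 1.0 / 6.0)
    m = np.array([1, 1, 1, 1, -1, -1])
    print(np.dot(adaboost_reweight(D, m), m))     # approximately 0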

By repeated application of the well-known greedy heuristic for Set Cover, we can PAC-learn limited types of decision lists:

Jyrki Kivinen, Heikki Mannila and Esko Ukkonen: Learning hierarchical rule sets. In Proceedings of the 5th Annual ACM Workshop on Computational Learning Theory, pages 37-44. The Association for Computing Machinery, New York, July 1992.

Jyrki Kivinen, Heikki Mannila and Esko Ukkonen: Learning rules with local exceptions. In John Shawe-Taylor and Martin Anthony, editors, Computational Learning Theory: EuroCOLT '93, pages 35-46. Clarendon Press, Oxford, March 1994.
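
The greedy Set Cover heuristic these papers build on, in a minimal sketch: repeatedly choose the set that covers the most still-uncovered elements, which is well known to give a cover within a logarithmic factor of the optimum.

    def greedy_set_cover(universe, sets):
        """Greedy heuristic: repeatedly pick the set covering most uncovered elements.
        Returns the indices of the chosen sets (assumes the sets can cover the universe)."""
        uncovered = set(universe)
        chosen = []
        while uncovered:
            best = max(range(len(sets)), key=lambda i: len(uncovered & sets[i]))
            if not uncovered & sets[best]:
                break                             # remaining elements cannot be covered
            chosen.append(best)
            uncovered -= sets[best]
        return chosen

    print(greedy_set_cover(range(6), [{0, 1, 2}, {2, 3}, {3, 4, 5}, {0, 5}]))   # [0, 2]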

The k-means++ clustering algorithm can be combined with Bregman clustering while maintaining an approximation guarantee:

Richard Nock, Panu Luosto and Jyrki Kivinen: Mixed Bregman clustering with approximation guarantees. In Walter Daelemans, Bart Goethals and Katharina Morik, editors, European Conference on Machine Learning and Knowledge Discovery in Databases (ECML PKDD 2008), part II, pages 154–169, Springer LNAI 5212, Berlin, September 2008.
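
A sketch of the seeding step with a Bregman divergence in place of the squared Euclidean distance (the divergence below and the other details are illustrative; the paper works with mixed Bregman divergences): each new centre is drawn with probability proportional to its divergence from the nearest centre chosen so far.

    import numpy as np

    def kl_divergence(p, q):
        """Bregman divergence generated by x log x (an illustrative choice); p, q > 0."""
        return np.sum(p * np.log(p / q) - p + q)

    def bregman_kmeanspp_seeding(X, k, divergence=kl_divergence, seed=0):
        """k-means++-style seeding where each next centre is sampled with probability
        proportional to its divergence from the closest centre already picked."""
        rng = np.random.default_rng(seed)
        centres = [X[rng.integers(len(X))]]
        while len(centres) < k:
            d = np.array([min(divergence(x, c) for c in centres) for x in X])
            centres.append(X[rng.choice(len(X), p=d / d.sum())])
        return np.array(centres)

    # toy usage on strictly positive data, as required by this divergence
    X = np.abs(np.random.default_rng(1).normal(size=(50, 3))) + 0.1
    print(bregman_kmeanspp_seeding(X, 3))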

I also have some older papers on decision tree learning (with Tapio Elomaa) and on reliable and useful learning (a variant of PAC learning). Let me know if you want copies.

Sampling in databases

We study various notions of when a functional dependency holds approximately in a relation, and how this can be tested:
Jyrki Kivinen and Heikki Mannila: Approximate dependency inference from relations. Theoretical Computer Science 149(1):129-149, September 1995. (A preliminary version appeared in ICDT '92.)
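
A sketch of one natural such notion (illustrative; the paper compares several measures): estimate, by sampling pairs of rows, the fraction of pairs that violate the dependency X -> Y, that is, agree on the X columns but differ on the Y columns.

    import random

    def sampled_fd_violation(rows, x_cols, y_cols, n_samples=10000, seed=0):
        """Estimate the fraction of row pairs violating the functional dependency X -> Y:
        pairs that agree on the X columns but differ on the Y columns."""
        rng = random.Random(seed)
        violations = 0
        for _ in range(n_samples):
            r, s = rng.choice(rows), rng.choice(rows)
            if all(r[c] == s[c] for c in x_cols) and any(r[c] != s[c] for c in y_cols):
                violations += 1
        return violations / n_samples

    # toy relation: column 0 almost determines column 1
    rows = [(0, 'a'), (0, 'a'), (0, 'b'), (1, 'c'), (1, 'c')]
    print(sampled_fd_violation(rows, x_cols=[0], y_cols=[1]))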

The ideas from the previous paper are extended to more general logical formulas that might be approximately true in a relation:

Jyrki Kivinen and Heikki Mannila: The power of sampling in knowledge discovery. In Proceedings of the 13th ACM SIGACT-SIGMOD-SIGART Symposium on Principles of Database Systems, pages 77-85. The Association for Computing Machinery, New York, May 1994.


Jyrki Kivinen 25 September 2008