Google-UW Machine Learning Seminar Series
The Machine Learning Seminar Series provides a forum for the presentation
and discussion of current topics in machine learning.
The scheduled talks are listed below.
Unless otherwise noted, all talks will be held in room DC 1302. Coffee service to be confirmed.
Presentation slides will be posted whenever possible. Please
click on a presentation title to access the slides (usually in PDF format).
2010-2011 Seminars - Distinguished Speakers
Wednesday, March 17, 2010, 4:00 p.m., DC 1302 |
Title: |
On Noise-Tolerant Learning using Linear Classifiers |
Speaker: |
Phil Long |
Abstract: |
This talk is about learning with linear hypotheses in the
presence of noise, and covers the following topics:
* New algorithms that tolerate a large amount of "malicious noise", given
constraints on the probability distribution generating the examples.
* The ability of linear classifiers to approximate the optimal error rate
for some tree-structured two-layer sources with the class designation at
the root, the observed variables at the leaves, and some hidden variables
in between.
* Limitations on the noise tolerance of some boosting algorithms
based on convex optimization.
(This is joint work with Nader Bshouty, Adam Klivans and Rocco Servedio.) |
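As an illustrative companion to the abstract, here is a toy sketch of the general setting only, not the talk's algorithms: it corrupts labels with uniform random flips (a much weaker corruption model than malicious noise) and fits the classical "averaging" linear classifier, which is known to be robust under benign example distributions. All sizes and parameter values are illustrative.

```python
# Toy sketch of noise-tolerant linear classification (NOT the talk's
# algorithms): uniform random label flips stand in for the malicious-noise
# model, and we fit the "averaging" estimator w = mean(y_i * x_i).
import numpy as np

rng = np.random.default_rng(0)
n, d, noise_rate = 2000, 20, 0.15        # illustrative sizes

w_true = rng.normal(size=d)
w_true /= np.linalg.norm(w_true)

X = rng.normal(size=(n, d))              # isotropic Gaussian examples
y = np.sign(X @ w_true)                  # clean labels from a linear target
flip = rng.random(n) < noise_rate        # corrupt a fraction of the labels
y_noisy = np.where(flip, -y, y)

w_hat = (y_noisy[:, None] * X).mean(axis=0)   # averaging estimator

X_test = rng.normal(size=(5000, d))
acc = np.mean(np.sign(X_test @ w_hat) == np.sign(X_test @ w_true))
print(f"test accuracy under {noise_rate:.0%} random label noise: {acc:.3f}")
```

Under the Gaussian marginal used here, the averaging estimator stays aligned with the true weight vector even after random flips, which is why the test accuracy remains high.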
Bio: |
Dr. Phil Long is a world-renowned researcher in theoretical machine
learning. His leadership roles include co-chairing the program committee of
COLT'99, serving as an editor for the Machine Learning Journal, and currently
serving as an area chair for both ICML 2010 and NIPS 2010.
Dr. Long completed his Ph.D. at UC Santa Cruz and held postdoctoral positions at
Technische Universitaet Graz and Duke. He then joined the faculty
at the National University of Singapore, followed by a stint at the
Genome Institute of Singapore. Next, he moved to the Center for
Computational Learning Systems at Columbia. Since 2005 he has been
a member of Google's research unit. |
Wednesday, May 5, 2010, 4:00 p.m., MC 5158 webcast |
Title: |
Frequentists vs. Bayesians, the PAC-Bayesian Synthesis, and Support
Vector Machines |
Speaker: |
David McAllester |
Abstract: |
We will start with a description of the frequentist (objective probability)
and Bayesian (subjective probability) positions. We will then describe the
PAC-Bayesian theorem which allows for a kind of formal synthesis of the two
positions. The talk will then focus on support vector machines as a case
study in PAC-Bayesian analysis. We will discuss the "SVM scandal" ---
no meaningful formal justification for the hinge loss of soft SVMs has ever
been given. We will also apply PAC-Bayesian analysis to recent trends in
structural SVMs. Structural SVMs are a way of training the parameters of
graphical models and are becoming increasingly popular in areas such as computer
vision and natural language processing. |
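For readers unfamiliar with the PAC-Bayesian theorem mentioned above, one standard statement from the literature (not necessarily the exact form used in the talk, and with constants that vary across versions) reads as follows, where P is a prior fixed before seeing the data, Q ranges over posterior distributions, and L and \hat{L} denote the true and empirical Gibbs risks:

```latex
% One common form of McAllester's PAC-Bayesian bound: with probability
% at least 1 - \delta over an i.i.d. sample of size n, simultaneously
% for all posterior distributions Q,
L(Q) \;\le\; \hat{L}(Q)
      + \sqrt{\frac{\mathrm{KL}(Q \,\|\, P) + \ln(n/\delta)}{2(n-1)}}
```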
Bio: |
Professor McAllester received his B.S., M.S., and Ph.D. degrees from the
Massachusetts Institute of Technology in 1978, 1979, and 1987 respectively.
He served on the faculty of Cornell University for the 1987-1988 academic year
and served on the faculty of MIT from 1988 to 1995. He was a member of technical
staff at AT&T Labs-Research from 1995 to 2002. He has been a fellow of
the American Association of Artificial Intelligence (AAAI) since 1997. Since
2002 he has been Chief Academic Officer at the Toyota Technological Institute
at Chicago. He has authored over 90 refereed publications. Professor McAllester's
research areas include machine learning, the theory of programming languages,
automated reasoning, AI planning, computer game playing (computer chess),
computational linguistics and computer vision. A 1991 paper on AI planning
proved to be one of the most influential papers of the decade in that area.
A 1993 paper on computer game algorithms influenced the design of the algorithms
used in the Deep Blue system that defeated Garry Kasparov. A 1998 paper on
machine learning theory introduced PAC-Bayesian theorems, which combine Bayesian
and non-Bayesian methods. He is currently part of a team that has scored in
the top two places in the PASCAL object detection challenge (computer vision)
in 2007, 2008 and 2009. |
Wednesday, July 14, 2010, 2:15 p.m., DC 1304 webcast |
Title: |
Machine Learning in the Data Revolution Era |
Speaker: |
Shai Shalev-Shwartz |
Abstract: |
Machine learning is playing a central role in the digital revolution,
in which massive streams of data are continually collected from
sources such as online commerce, social networking, and online
collaboration. This large amount of data is often noisy or partial.
In this talk I will present learning algorithms appropriate for this
new era: algorithms that not only can handle massive amounts of data
but can also leverage large data sets to reduce the required runtime;
and algorithms that can use the multitude of examples to compensate
for lack of full information on each individual example. |
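One concrete example of the "more data, less runtime" phenomenon the abstract alludes to is Pegasos, a stochastic sub-gradient SVM solver co-developed by the speaker, whose runtime depends on the target accuracy rather than on the number of examples. The following is a minimal sketch (no projection step, no bias term, illustrative parameters), not the exact algorithm from the talk:

```python
# Pegasos-style stochastic sub-gradient descent for a linear SVM.
import numpy as np

def pegasos(X, y, lam=0.01, T=10_000, seed=0):
    rng = np.random.default_rng(seed)
    n, d = X.shape
    w = np.zeros(d)
    for t in range(1, T + 1):
        i = rng.integers(n)              # one random example per step
        eta = 1.0 / (lam * t)            # Pegasos step size
        if y[i] * (w @ X[i]) < 1:        # hinge loss active: descend on it
            w = (1 - eta * lam) * w + eta * y[i] * X[i]
        else:                            # only the regularizer contributes
            w = (1 - eta * lam) * w
    return w

# Toy usage on synthetic, linearly separable data.
rng = np.random.default_rng(1)
X = rng.normal(size=(1000, 10))
y = np.sign(X @ rng.normal(size=10))
w = pegasos(X, y)
print("train accuracy:", np.mean(np.sign(X @ w) == y))
```

Because each step touches a single random example, the total work is governed by the number of iterations T, not by the dataset size, which is the sense in which more data can reduce the runtime needed to reach a fixed accuracy.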
Bio: |
Shai Shalev-Shwartz is on the faculty of the Department of Computer
Science and Engineering at the Hebrew University of Jerusalem, Israel.
Dr. Shalev-Shwartz received his Ph.D. in computer science from
the Hebrew University in 2007. From 2007 to 2009 he was a research
assistant professor at the Toyota Technological Institute at Chicago.
Shai has written more than 40 research papers, focusing on learning theory,
online prediction, optimization techniques, and practical algorithms.
He served as a program committee member for the COLT conference in
2008-2010 and for ALT in 2009, and he serves
on the editorial boards of the Journal of Machine Learning Research
(JMLR) and the Machine Learning Journal (MLJ). |
Tuesday, September 14, 2010, 2:00 p.m., MC 5158 |
Title: |
Hierarchical Bayesian Models of Language and Text |
Speaker: |
Yee Whye Teh |
Abstract: |
In this talk I will present a new approach to modelling sequence data
called the sequence memoizer. As opposed to most other sequence models,
our model does not make any Markovian assumptions. Instead, we use a
hierarchical Bayesian approach which enforces sharing of statistical
strength across the different parts of the model. To make computations
with the model efficient, and to better model the power-law statistics
often observed in sequence data, we use a Bayesian nonparametric prior
called the Pitman-Yor process as a building block in the hierarchical
model. We show state-of-the-art results on language modelling and text
compression.
This is joint work with Frank Wood, Jan Gasthaus, Cedric Archambeau and
Lancelot James. |
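As a rough illustration of the power-law behaviour mentioned above, here is a minimal sketch of a single Pitman-Yor "Chinese restaurant" sampling scheme; the sequence memoizer itself stacks such processes hierarchically, which this sketch does not attempt. The discount and concentration values are illustrative:

```python
# Sample table occupancies from a single Pitman-Yor Chinese restaurant
# process; the heavy-tailed table sizes reflect its power-law statistics.
import numpy as np

def pitman_yor_crp(n_customers, d=0.5, alpha=1.0, seed=0):
    rng = np.random.default_rng(seed)
    tables = []                                   # customer counts per table
    for n in range(n_customers):
        # new table with probability (alpha + d * #tables) / (alpha + n)
        p_new = (alpha + d * len(tables)) / (alpha + n)
        if rng.random() < p_new:
            tables.append(1)
        else:
            # join an existing table with prob. proportional to (count - d)
            weights = np.array(tables) - d
            k = rng.choice(len(tables), p=weights / weights.sum())
            tables[k] += 1
    return tables

sizes = sorted(pitman_yor_crp(10_000), reverse=True)
print("number of tables:", len(sizes))
print("largest tables:", sizes[:5])   # heavy-tailed, power-law-like sizes
```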
Bio: |
Yee Whye Teh is a Lecturer (equivalent to an assistant professor in the US
system) at the Gatsby Computational Neuroscience Unit, UCL. He is
interested in machine learning and Bayesian statistics. His current
focus is on developing Bayesian nonparametric methodologies for
unsupervised learning, computational linguistics, and genetics. Prior
to his appointment he was a Lee Kuan Yew Postdoctoral Fellow at the
National University of Singapore and a postdoctoral fellow at the
University of California, Berkeley. He obtained his Ph.D. in
Computer Science from the University of Toronto in 2003. He is
programme co-chair of AISTATS 2010. |
Monday, June 6, 2011, 2:00 p.m., Google, 200-151 Charles Street W, Kitchener, ON N2G 1H6, 519 880 2300 |
Title: |
Hypothesis Testing and Bayesian Inference: New Applications
of Kernel Methods |
Speaker: |
Arthur Gretton |
Abstract: |
In the early days of kernel machines research, the "kernel trick" was
considered a useful way of constructing nonlinear learning algorithms
from linear ones, by applying the linear algorithms to feature space
mappings of the original data. More recently, it has become clear that
a potentially more far-reaching use of kernels is as a linear way of
dealing with higher order statistics, by mapping probabilities to a
suitable reproducing kernel Hilbert space (i.e., the feature space is
an RKHS).
I will describe how probabilities can be mapped to kernel feature
spaces, and how to compute distances between these mappings. A
measure of the strength of dependence between two random variables follows
naturally from this distance. Applications that make use of kernel
probability embeddings include:
* Nonparametric two-sample testing and independence testing in complex
(high dimensional) domains. In the latter case, we test whether text
in English is translated from the French, as opposed to being random
extracts on the same topic.
* Inference on graphical models, in cases where the variable
interactions are modeled nonparametrically (i.e., when parametric
models are impractical or unknown). In experiments, this approach
outperforms state-of-the-art nonparametric techniques in 3-D depth
reconstruction from 2-D images, and on a protein structure prediction
task. |
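A minimal sketch of the distance between kernel embeddings that the abstract describes: the biased (V-statistic) estimate of the squared Maximum Mean Discrepancy under a Gaussian RBF kernel. The bandwidth and the synthetic data are illustrative, and the full two-sample test (e.g., a permutation threshold) is omitted:

```python
# Biased estimate of the squared Maximum Mean Discrepancy (MMD) between
# two samples, i.e., the RKHS distance between their mean embeddings.
import numpy as np

def rbf_kernel(A, B, sigma=1.0):
    # pairwise squared distances via ||a-b||^2 = ||a||^2 + ||b||^2 - 2 a.b
    sq = (A**2).sum(1)[:, None] + (B**2).sum(1)[None, :] - 2 * A @ B.T
    return np.exp(-sq / (2 * sigma**2))

def mmd2(X, Y, sigma=1.0):
    # squared MMD = mean k(x,x') + mean k(y,y') - 2 mean k(x,y)
    return (rbf_kernel(X, X, sigma).mean()
            + rbf_kernel(Y, Y, sigma).mean()
            - 2 * rbf_kernel(X, Y, sigma).mean())

rng = np.random.default_rng(0)
X = rng.normal(0.0, 1.0, size=(500, 5))
Y = rng.normal(0.5, 1.0, size=(500, 5))   # shifted mean: different dist.
print("MMD^2 (same dist.):", mmd2(X, rng.normal(0.0, 1.0, (500, 5))))
print("MMD^2 (diff dist.):", mmd2(X, Y))
```

When the two samples come from the same distribution the estimate is close to zero, and it grows as the distributions separate, which is what makes it usable as a two-sample test statistic.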
Bio: |
Arthur Gretton has been a lecturer at the Gatsby Computational
Neuroscience Unit since August 2010, and is affiliated as a research
scientist with the Max Planck Institute for Biological Cybernetics, where
he has worked since September 2002. He received degrees in physics and systems
engineering from the Australian National University in 1996 and 1998, respectively,
and studied machine learning from 1999 to 2003 with Microsoft Research and
the Signal Processing and Communications Laboratory at the University of
Cambridge, where he completed his PhD. From 2009 to 2010 he worked as a
project scientist with the Machine Learning Department at Carnegie
Mellon University.
Arthur's research interests include machine learning, kernel
methods, nonparametric inference in graphical models, statistical
learning theory, nonparametric hypothesis testing, and blind source separation.
He has been an associate editor of IEEE Transactions on Pattern
Analysis and Machine Intelligence since March 2009, was a member
of the NIPS Program Committee in 2008 and 2009, and is an Area
Chair for ICML 2011. |