Google-UW Machine Learning Seminar Series
The Machine Learning Seminar Series provides a forum for presenting and discussing interesting, current issues in machine learning.
Scheduled talks are listed below.
Unless otherwise noted, all talks will be held in room DC 1302. Coffee to be confirmed.
Presentation slides will be posted whenever possible. Please click on a presentation title to access its slides (usually in PDF format).
 
Machine Learning Seminar Series is supported by
 
 
2010 Seminars - Distinguished Speakers 
| Wednesday, March 17, 2010, 4:00 p.m., DC 1302 |
| Title: | On Noise-Tolerant Learning using Linear Classifiers | 
| Speaker: | Phil Long | 
| Abstract: | This talk is about learning with linear hypotheses in the
			presence of noise, and covers the following topics: * New algorithms
			that tolerate a high rate of "malicious noise" under constraints on
			the probability distribution generating the examples. * The ability
			of linear classifiers to approximate the optimal error rate for some
			tree-structured two-layer sources, with the class designation at the
			root, the observed variables at the leaves, and some hidden variables
			in between. * Limitations on the noise tolerance of some boosting
			algorithms based on convex optimization. (This is joint work with
			Nader Bshouty, Adam Klivans and Rocco Servedio.) | 
| Bio: | Dr. Phil Long is an internationally known researcher in theoretical machine
			learning. His leadership roles include co-chairing the program committee of
			COLT '99, serving as an editor for the Machine Learning Journal, and
			currently serving as an area chair for both ICML 2010 and NIPS 2010.
			Dr. Long received his Ph.D. from UC Santa Cruz and held postdoctoral
			positions at Technische Universitaet Graz and Duke. He then joined the
			faculty at the National University of Singapore, followed by a stint at
			the Genome Institute of Singapore, and next moved to the Center for
			Computational Learning Systems at Columbia. Since 2005 he has been
			a member of Google's research unit.
 | 
| Wednesday, May 5, 2010, 4:00 p.m., MC 5158 (webcast) | 
| Title: | Frequentists vs. Bayesians, the PAC-Bayesian Synthesis, and Support Vector Machines | 
| Speaker: | David McAllester | 
| Abstract: | We will start with a description of the frequentist (objective probability)
			and Bayesian (subjective probability) positions. We will then describe the
			PAC-Bayesian theorem, which allows for a kind of formal synthesis of the two
			positions. Next, the talk will focus on support vector machines as a case
			study in PAC-Bayesian analysis. We will discuss the "SVM scandal": no
			meaningful formal justification has ever been given for the hinge loss of
			soft SVMs. We will also apply PAC-Bayesian analysis to recent trends in
			structural SVMs. Structural SVMs are a way of training the parameters of
			graphical models and are becoming increasingly popular in areas such as
			computer vision and natural language processing. | 
| Bio: | Professor McAllester received his B.S., M.S., and Ph.D. degrees from the
			Massachusetts Institute of Technology in 1978, 1979, and 1987, respectively.
			He served on the faculty of Cornell University for the 1987-1988 academic year
			and on the faculty of MIT from 1988 to 1995. He was a member of technical
			staff at AT&T Labs-Research from 1995 to 2002. He has been a fellow of
			the American Association for Artificial Intelligence (AAAI) since 1997. Since
			2002 he has been Chief Academic Officer at the Toyota Technological Institute
			at Chicago. He has authored over 90 refereed publications. Professor McAllester's
			research areas include machine learning, the theory of programming languages,
			automated reasoning, AI planning, computer game playing (computer chess),
			computational linguistics, and computer vision. A 1991 paper of his on AI planning
			proved to be one of the most influential papers of the decade in that area.
			A 1993 paper on computer game algorithms influenced the design of the algorithms
			used in the Deep Blue system that defeated Garry Kasparov. A 1998 paper on
			machine learning theory introduced PAC-Bayesian theorems, which combine Bayesian
			and non-Bayesian methods. He is currently part of a team that has placed in
			the top two in the PASCAL object detection challenge (computer vision)
			in 2007, 2008, and 2009. | 
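For readers unfamiliar with the hinge loss mentioned in the abstract above, here is a minimal sketch of the standard soft-margin SVM training objective: an L2 penalty (lam/2)·||w||² plus the average hinge loss max(0, 1 − y·⟨w, x⟩) over the training examples, with labels y in {−1, +1}. The function names, the parameter `lam`, and the omission of a bias term are illustrative assumptions, not material from the talk.

```python
from typing import Sequence


def hinge_loss(w: Sequence[float], x: Sequence[float], y: int) -> float:
    """Hinge loss max(0, 1 - y * <w, x>) for a label y in {-1, +1}."""
    margin = y * sum(wi * xi for wi, xi in zip(w, x))
    return max(0.0, 1.0 - margin)


def soft_svm_objective(w: Sequence[float],
                       xs: Sequence[Sequence[float]],
                       ys: Sequence[int],
                       lam: float) -> float:
    """L2-regularized empirical hinge loss (no bias term, for brevity)."""
    regularizer = 0.5 * lam * sum(wi * wi for wi in w)
    data_term = sum(hinge_loss(w, x, y) for x, y in zip(xs, ys)) / len(xs)
    return regularizer + data_term


if __name__ == "__main__":
    w = [1.0, -0.5]
    xs = [[2.0, 0.0], [0.5, 1.0], [-1.0, 1.0]]
    ys = [1, 1, -1]
    # Only the second example violates the margin (its margin is 0).
    print(soft_svm_objective(w, xs, ys, lam=0.1))
```

The hinge loss is zero once an example is classified with margin at least 1 and grows linearly otherwise; it is a convex upper bound on the 0-1 loss, which is what makes the objective tractable to minimize.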
| Wednesday, July 14, 2010, 2:15 p.m., DC 1304 (webcast) | 
| Title: | Machine Learning in the Data Revolution Era | 
| Speaker: | Shai Shalev-Shwartz | 
| Abstract: | Machine learning is playing a central role in the digital revolution,
			in which massive, never-ending streams of data are collected from
			sources such as online commerce, social networking, and online
			collaboration. These data are often noisy or incomplete.
			In this talk I will present learning algorithms suited to this
			new era: algorithms that not only handle massive amounts of data
			but also leverage large data sets to reduce the required runtime,
			and algorithms that use the multitude of examples to compensate
			for the lack of full information about each individual example. | 
| Bio: | Shai Shalev-Shwartz is on the faculty of the Department of Computer
			Science and Engineering at the Hebrew University of Jerusalem, Israel.
			Dr. Shalev-Shwartz received his Ph.D. in computer science from
			the Hebrew University in 2007. From 2007 to 2009 he was a research
			assistant professor at the Toyota Technological Institute at Chicago.
			Shai has written more than 40 research papers, focusing on learning theory,
			online prediction, optimization techniques, and practical algorithms.
			He served as a program committee member for COLT in 2008-2010 and
			for ALT in 2009, and he serves on the editorial boards of the Journal
			of Machine Learning Research (JMLR) and the Machine Learning Journal (MLJ). | 
| Tuesday, September 14, 2010, 2:00 p.m., MC 5158 | 
| Title: | Hierarchical Bayesian Models of Language and Text | 
| Speaker: | Yee Whye Teh | 
| Abstract: | In this talk I will present a new approach to modelling sequence data
			called the sequence memoizer. Unlike most other sequence models, our
			model makes no Markovian assumptions. Instead, we use a hierarchical
			Bayesian approach that enforces sharing of statistical strength across
			the different parts of the model. To make computations with the model
			efficient, and to better model the power-law statistics often observed
			in sequence data, we use a Bayesian nonparametric prior called the
			Pitman-Yor process as the building block of the hierarchical model. We
			show state-of-the-art results on language modelling and text
			compression. This is joint work with Frank Wood, Jan Gasthaus, Cedric
			Archambeau and Lancelot James. | 
| Bio: | Yee Whye Teh is a Lecturer (equivalent to an assistant professor in the US
			system) at the Gatsby Computational Neuroscience Unit, UCL. He is
			interested in machine learning and Bayesian statistics. His current
			focus is on developing Bayesian nonparametric methodologies for
			unsupervised learning, computational linguistics, and genetics. Prior
			to his appointment he was a Lee Kuan Yew Postdoctoral Fellow at the
			National University of Singapore and a postdoctoral fellow at the
			University of California, Berkeley. He obtained his Ph.D. in
			Computer Science from the University of Toronto in 2003. He is
			programme co-chair of AISTATS 2010. | 
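The Pitman-Yor process mentioned in the abstract above can be illustrated through its "Chinese restaurant" seating scheme: with discount d (0 ≤ d < 1) and concentration θ > −d, each new customer joins an existing table k with probability proportional to n_k − d, or opens a new table with probability proportional to θ + d·K, where n_k is the number of customers at table k and K is the number of occupied tables. The sketch below simulates this single-level scheme only; it is not the sequence memoizer itself, which stacks many such processes hierarchically, and the function name and parameters are illustrative assumptions.

```python
import random


def pitman_yor_crp(n: int, discount: float, concentration: float,
                   seed: int = 0) -> list[int]:
    """Seat n customers by the Pitman-Yor Chinese restaurant process.

    Returns the number of customers at each table, in order of creation.
    """
    if not (0.0 <= discount < 1.0) or concentration <= -discount:
        raise ValueError("need 0 <= discount < 1 and concentration > -discount")
    rng = random.Random(seed)
    tables: list[int] = []
    for i in range(n):  # i customers are already seated
        # Weights: n_k - discount for each existing table k, and
        # concentration + discount * len(tables) for a new table.
        # Together they sum to i + concentration.
        r = rng.random() * (i + concentration)
        acc = 0.0
        for k, count in enumerate(tables):
            acc += count - discount
            if r < acc:
                tables[k] += 1
                break
        else:
            tables.append(1)  # no existing table chosen: open a new one
    return tables


if __name__ == "__main__":
    for d in (0.0, 0.5, 0.9):
        tables = pitman_yor_crp(2000, discount=d, concentration=1.0)
        print(f"discount={d}: {len(tables)} tables")
```

With d = 0 this reduces to the Dirichlet-process restaurant, where the number of tables grows logarithmically in n; with d > 0 it grows roughly like n^d, which gives the heavy-tailed, power-law table sizes that motivate using the Pitman-Yor process for text.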
| Monday, June 6, 2011, 2:00 p.m., Google, 200-151 Charles Street W, Kitchener, ON N2G 1H6, 519 880 2300 | 
| Title: | Hypothesis Testing and Bayesian Inference: New Applications of Kernel Methods | 
| Speaker: | Arthur Gretton | 
| Abstract: | In the early days of kernel machines research, the "kernel trick" was
			considered a useful way of constructing nonlinear learning algorithms
			from linear ones, by applying the linear algorithms to feature space
			mappings of the original data. More recently, it has become clear
			that a potentially more far-reaching use of kernels is as a linear way
			of dealing with higher-order statistics, by mapping probabilities to
			a suitable reproducing kernel Hilbert space (i.e., the feature space
			is an RKHS). I will describe how probabilities can be mapped to kernel
			feature spaces, and how to compute distances between these mappings. A
			measure of the strength of dependence between two random variables
			follows naturally from this distance. Applications that make use of
			kernel probability embeddings include: * Nonparametric two-sample
			testing and independence testing in complex (high-dimensional) domains.
			In the latter case, we test whether text in English is translated from
			the French, as opposed to being random extracts on the same topic. * Inference
			on graphical models in cases where the variable interactions are
			modeled nonparametrically (i.e., when parametric models are
			impractical or unknown). In experiments, this approach outperforms
			state-of-the-art nonparametric techniques in 3-D depth reconstruction
			from 2-D images and on a protein structure prediction task. | 
| Bio: | Arthur Gretton has been a lecturer at the Gatsby Computational
			Neuroscience Unit since August 2010, and is affiliated as a research
			scientist with the Max Planck Institute for Biological Cybernetics, where
			he has worked since September 2002. He received degrees in physics and systems
			engineering from the Australian National University in 1996 and 1998, respectively,
			and studied machine learning from 1999 to 2003 with Microsoft Research and
			the Signal Processing and Communications Laboratory at the University of
			Cambridge, where he completed his Ph.D. From 2009 to 2010 he worked as a
			project scientist with the Machine Learning Department at Carnegie
			Mellon University. Arthur's research interests include machine learning, kernel
			methods, nonparametric inference in graphical models, statistical
			learning theory, nonparametric hypothesis testing, and blind source separation.
			He has been an associate editor of IEEE Transactions on Pattern
			Analysis and Machine Intelligence since March 2009, was a member
			of the NIPS Program Committee in 2008 and 2009, and is an Area
			Chair for ICML 2011. | 
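The distance between kernel embeddings of probabilities described in the abstract above is commonly known as the maximum mean discrepancy (MMD). The following pure-Python sketch computes the biased estimate of squared MMD, mean k(x, x') + mean k(y, y') − 2·mean k(x, y), using a Gaussian kernel. The kernel choice, the bandwidth `sigma`, and the function names are assumptions for illustration; an actual two-sample test would also need a null distribution (for example, by permutation) to choose a rejection threshold, which this sketch omits.

```python
import math
import random
from typing import Sequence

Point = Sequence[float]


def gaussian_kernel(a: Point, b: Point, sigma: float = 1.0) -> float:
    """Gaussian (RBF) kernel k(a, b) = exp(-||a - b||^2 / (2 sigma^2))."""
    sq_dist = sum((ai - bi) ** 2 for ai, bi in zip(a, b))
    return math.exp(-sq_dist / (2.0 * sigma ** 2))


def mean_kernel(xs: Sequence[Point], ys: Sequence[Point], sigma: float) -> float:
    """Average k(x, y) over all pairs: the RKHS inner product of the two
    empirical mean embeddings."""
    total = sum(gaussian_kernel(x, y, sigma) for x in xs for y in ys)
    return total / (len(xs) * len(ys))


def mmd2_biased(xs: Sequence[Point], ys: Sequence[Point],
                sigma: float = 1.0) -> float:
    """Biased estimate of squared MMD, ||mu_X - mu_Y||^2 in the RKHS."""
    return (mean_kernel(xs, xs, sigma) + mean_kernel(ys, ys, sigma)
            - 2.0 * mean_kernel(xs, ys, sigma))


if __name__ == "__main__":
    rng = random.Random(0)
    p = [[rng.gauss(0.0, 1.0)] for _ in range(100)]
    q_same = [[rng.gauss(0.0, 1.0)] for _ in range(100)]
    q_shift = [[rng.gauss(3.0, 1.0)] for _ in range(100)]
    print("same distribution:   ", mmd2_biased(p, q_same))
    print("shifted distribution:", mmd2_biased(p, q_shift))
```

Because the estimate is the squared RKHS norm of the difference between two empirical mean embeddings, it is never negative (up to rounding) and is exactly zero when the two samples coincide; samples from clearly different distributions give a noticeably larger value.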
| Date TBA, Time TBA | 
|---|
| Title: | TBA | 
| Speaker: |  | 
| Abstract: |   | 
| Bio: |  |