To our LIDS blog readers and contributors,

The time has come to bid the LIDS blog a fond farewell as we move towards using other means (some less technical than others) of sharing what's happening in LIDS and finding out what's on the minds of the LIDS community.

For all the most up to date information about the lab, please visit our web site: www.lids.mit.edu.

Thank you all for your time, efforts, and interest!

I have finished my studies at MIT and commenced a new journey. Having submitted my thesis, entitled "Frugal Hypothesis Testing and Classification," I have begun life as a postdoctoral researcher at the IBM Thomas J. Watson Research Center in Yorktown Heights, NY, specifically in the Business Analytics & Mathematical Sciences Department. This book excerpt and this article give a high-level view of the kind of work I will be doing.

Join me as I say "so long, for now" to LIDS. For now, because life may lead me back. If you like my voice and haven't gotten enough, check out the Information Ashvins blog.

First of all, props to John Downer for the title of this post.

Energy is becoming a key area of focus at MIT. Within LIDS, there were several energy-related seminars this academic year. Two weeks ago, the Secretary of Energy, Steven Chu, gave a talk in which he said that a major increase in basic research is necessary in the United States to provide the new energy technologies needed to avert catastrophic climate change.

Unfortunately, I was not able to attend because it coincided with a talk by Al Hero on a topic very close to my research: linear dimensionality reduction. Prof. Hero discussed dimensionality reduction when models of statistical structure underlying the data are applicable, for example a graphical model that generates data. He also focused on distributed implementations. In some of my work, I have looked at linear dimensionality reduction for margin-based classification, including with distributed implementation. I will be presenting that work this summer at the Fusion conference.

The *fusion* in this case is information fusion, not the type of fusion that "has the potential to meet future worldwide energy needs in a safe, sustainable manner without carbon dioxide emissions."

I am doing an internship this summer at Lawrence Livermore National Laboratory, in the Systems and Decision Sciences section. The National Ignition Facility, a device to create a fusion reaction, is being dedicated at the lab on Friday. Some people think it's a folly, but for me it's a jolly. I'll get to hear Secretary Chu after all.

I recently returned from SPARS09, a small workshop dedicated to sparse approximation and compressed sensing, held in St. Malo, France. I presented a paper on model selection and nonnegative matrix factorization.

My research this year has primarily been focused on rates of convergence for learning tree structured graphical models. This work, co-authored with A. Anandkumar, L. Tong and my advisor Alan Willsky, has been accepted to ISIT 2009 and the longer version has been posted on arXiv.

I'll be interning at Microsoft Research again this summer. It should be interesting because I can do machine learning on real data.

I took a class on measure theory and functional analysis this semester. I think both subjects are absolutely fascinating. The next thing I would like to learn is Geometry on Manifolds.

Vincent

The ocean is deep.

Last week, there was some *halchal* (commotion) regarding the purported discovery of the lost city of Atlantis by users of Google Earth, which has recently added bathymetry data. Here is Google's write-up about it. (By the way, that write-up also describes the Loihi Seamount, which is a Hawaiian 'almost' island.)

So what was this so-called discovery? It was an artifact of the estimated depth. High-quality ocean depth data can be collected by boats using echosounding, but this data acquisition is local to where the boat travels, and it is expensive to cover large areas. Lower-quality data can be collected using remote sensing satellites. The artifact was that in places where there were boat measurements, a lower depth was estimated than in places where there weren't.

An estimation procedure ought to assimilate the various data available and perhaps also incorporate a prior model so as not to leave artifacts that people might think are Atlantis. As described in the related paper, it is not an easy problem. However, perhaps benefits could be gained by utilizing methods such as those described in this and this.

Speaking of measuring the ocean and boats, after a talk by Marco Duarte a couple of weeks ago, my officemate Matt Johnson was saying how it would be a great demonstration of the power of compressed sensing (CS) if you made a version of the game Battleship in which one player could take CS measurements and exploit structured sparsity, while the other player played normally. The CS player would win, either every time or with overwhelming probability --- I'm not sure exactly which.

An interesting statement in the Atlantis write-up is: "If there really are little green men hiding somewhere, the ocean's not a bad place to do it. Mars, Venus, the moon, and even some asteroids are mapped at far higher resolution than our own oceans (the global map of Mars is about 250 times as accurate as the global map of our own ocean)." It is easier to figure out relief above water than underwater, but the opposite is true for figuring out what is in the crust. Doing large seismological surveys under the ocean with boats is easier than doing it on land with heavy trucks, as I've come to learn from Richard Sears, who sits two doors down from my office.

The ocean is deep and the crust is thick.

The organizers would like to thank everyone for making the LIDS Student Conference a resounding success.

**The Fourteenth Annual LIDS Student Conference**

8:30am-5:00pm, Thursday and Friday, January 29-30, 2009

Stata Center, 32-155, MIT

Sponsored by Draper Laboratory

The 14th Annual LIDS Student Conference is to be held on January 29 and 30, 2009, on Student Street in the Stata Center. The conference promises to be a stimulating two days of student presentations, an intriguing panel discussion, and lectures by our eminent guests:

-- Andrew Barron, Yale

-- Christos Cassandras, BU

-- Vincent Poor, Princeton

-- Steve Shreve, CMU

Student presentations will be in the areas of signals and estimation, optimization and control systems, information and coding theory, communications and networks, and graphical models. Abstracts of both student presentations and invited talks are available at the conference website: http://lidsconf.mit.edu/

We hope to see you at the student conference.

Sincerely,

LIDS Student Conference Committee

One of the many books I read over the summer. This one is definitely worth a read: an introduction to the prime number theorem and the Riemann hypothesis, accessible to anyone who is interested in math and prime numbers.

This is a harder book on the Riemann hypothesis. It's way more technical. I found the du Sautoy book more entertaining.

This is a book on the origins of algebra by the same author as "Prime Obsession".

One of the seven Millennium Prize Problems, the Poincaré conjecture, was solved by Perelman. This book describes the problem and its solution for laymen.

A powerful book by the President-Elect on his values, his view on politics, the world and his family. The chapter on his family was, by far, my favorite one.

Last Friday before a sumptuous Thanksgiving-themed LIDS lunch, a seminar was presented by Xu Huan of McGill University. Mr. Xu's talk was entitled "Miracle of Regularization," and was joint work with his advisor Shie Mannor, formerly a LIDS post-doc, and Constantine Caramanis, formerly a LIDS student and author of some very nice expository articles. Part of the work was recently featured on An Ergodic Walk, the blog of Anand Sarwate.

The supervised learning problem with regularization is studied from the viewpoint of robust optimization. Xu first motivated why regularization is needed in supervised learning from finite training data by appealing to properties of ill-posed problems stated by Hadamard, in particular that a unique solution does not exist and that the solution does not depend continuously on the data. Other motivations for regularization in supervised learning come from the structural risk minimization principle.

I first learned about robust optimization in a lecture by Melvyn Sim tele-delivered from Singapore for the class 6.255. When data is uncertain but known to belong to an uncertainty set, the basic idea of robust optimization is to optimize the objective function with respect to the worst-case point in the uncertainty set, i.e. doing a min-max or a max-min optimization.
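Schematically, and in my own notation rather than that of the lecture, the robust counterpart of an optimization problem with uncertain data looks like this: with decision variable $x$, uncertain data $u$, and uncertainty set $\mathcal{U}$,

```latex
\min_{x} \; \max_{u \in \mathcal{U}} \; f(x, u),
```

that is, choose the $x$ whose worst-case objective over all of $\mathcal{U}$ is smallest.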

In typical supervised classification formulations, a decision function is to be found that minimizes an empirical risk of training data, often with a margin-based loss function, plus a regularization term, often a norm in the space of decision functions, weighted by a non-negative scalar *c*. Xu, Caramanis, and Mannor show how this regularization formulation arises when a robust optimization problem with uncertainty around training examples is solved. They also discuss what the uncertainty set corresponding to standard regularization terms is, and what *c* means in terms of the uncertainty set.
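Schematically (again in my own notation, for a linear decision function $w$ and margin-based loss $\ell$, with per-sample disturbances $\delta_i$), the two formulations being related are

```latex
\min_{w} \; \sum_{i=1}^{n} \ell\!\left(y_i,\, w^{\top} x_i\right) + c\,\|w\|
\qquad \text{and} \qquad
\min_{w} \; \max_{(\delta_1,\dots,\delta_n) \in \mathcal{U}} \;
\sum_{i=1}^{n} \ell\!\left(y_i,\, w^{\top} (x_i - \delta_i)\right),
```

with the result, roughly, that for suitable uncertainty sets $\mathcal{U}$, whose size is tied to the weight $c$, the two problems share the same optimal $w$.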

The training data is generally assumed to be independent samples from a joint distribution of features and class labels. (This joint distribution is unknown.) A classifier is said to be consistent if, in the limit as the size of the training set goes to infinity, its probability of error converges in probability to that of the optimal classifier, the one that would minimize probability of error if the joint distribution were known. Several recent papers, including those by Lin; Steinwart; and Bartlett, Jordan, and McAuliffe, discuss the properties of the margin-based loss function in the empirical risk and of the regularization term needed for a classifier to be consistent. Xu et al. show consistency of classifiers using the robust optimization perspective they develop.
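In symbols: writing $R(f) = \Pr\left(Y \neq \operatorname{sign} f(X)\right)$ for the probability of error and $\hat{f}_n$ for the classifier learned from $n$ training samples, consistency means

```latex
R(\hat{f}_n) \;\xrightarrow{\;p\;}\; \inf_{f} R(f)
\qquad \text{as } n \to \infty.
```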

Munther Dahleh made several remarks/questions at the end of the talk, including asking about connections to robust estimation studied twenty years ago. Interestingly, when Mike Jordan was visiting LIDS a few weeks ago, he also discussed classifier consistency in his SSG talk, which was based on joint work with XuanLong Nguyen and former LIDS student Martin Wainwright. That work links valid margin-based loss functions and Ali-Silvey distances.

Barack Obama to Joe "The Plumber" Wurzelbacher: *I think when you spread the wealth around, it’s good for everybody.*

I was out of the country last week and as always, was glad to hear the words "welcome home" from the immigration officer upon my return. I came back to America, but apparently not to "real" America. I participated in the IEEE International Workshop on Machine Learning for Signal Processing (MLSP), which was held in Mexico this year. At the conference banquet, I sat between J. J. Remus of Duke University and Sergios Theodoridis of the University of Athens. Remus, who is originally from Wasilla, Alaska (for certain a part of "real" America), had a poster on distance-weighted nearest neighbor classification. Theodoridis gave a plenary talk and a regular oral presentation on using a sequence of projections onto convex sets in reproducing kernel Hilbert spaces to find feasible, but not necessarily optimized, solutions to problems such as finding classifiers.

A paper on sparsity measures, which relates to the quotation at the top of this post, was presented by Niall Hurley and Scott Rickard. Rickard had proposed six desiderata for sparsity measures in 2004, of which four were originally proposed by economists in the early part of the last century in the context of wealth inequality. In the MLSP work, it is shown that among fifteen sparsity measures, including the popular ℓ_{0} and ℓ_{1}, the Gini index alone satisfies the six desiderata. Considering a vector of coefficient absolute values, where a large valued coefficient is analogous to a rich person, the six desiderata are:

- Robin Hood. Stealing from the rich and giving to the poor decreases sparsity.
- Scaling. Multiplying the wealth distribution by a constant does not change sparsity.
- Rising Tide. Adding a constant to each coefficient decreases sparsity.
- Cloning. If there is a twin population with identical wealth distribution, the sparsity in one population is the same as in the combination of the two.
- Bill Gates. As one individual becomes infinitely wealthy, the wealth distribution becomes as sparse as possible.
- Babies. Adding individuals with zero wealth increases sparsity.
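To make the desiderata concrete, here is a minimal sketch of the Gini index as a sparsity measure, following the sort-and-rank-weight form used by Hurley and Rickard; the function name and the specific numerical checks below are my own.

```python
def gini(values):
    """Gini index of a vector, treating absolute values as 'wealth'.

    Ranges from 0 (all coefficients equal: least sparse) toward 1
    (a single nonzero coefficient among many: sparsest).
    """
    c = sorted(abs(v) for v in values)  # poorest to richest
    n = len(c)
    total = sum(c)
    if total == 0:
        return 0.0
    # Weight each normalized coefficient by its reversed rank (N - k + 1/2)/N.
    return 1.0 - 2.0 * sum(
        (v / total) * ((n - k - 0.5) / n) for k, v in enumerate(c)
    )

# Robin Hood: moving wealth from rich to poor decreases sparsity
print(gini([1, 9]), ">", gini([3, 7]))
# Scaling: multiplying all wealth by a constant leaves sparsity unchanged
print(gini([2, 18]), "==", gini([1, 9]))
# Babies: adding individuals with zero wealth increases sparsity
print(gini([0, 1, 9]), ">", gini([1, 9]))
```

A constant vector gives an index of exactly 0, and a vector with one nonzero entry among $N$ gives $1 - 1/N$, approaching the maximum as the population grows.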

According to the analogy, a sparse signal representation is clearly not an Obama signal representation. I think I'll put in a request to change the title of a journal paper of mine to *Palin Representation in Structured Dictionaries*.