Image parsing remains difficult due to the need to combine local and contextual information when labeling a scene. We approach this problem by using the epitome as a prior over label configurations. Several properties make the epitome well suited to this task. First, it allows a condensed patch-based representation. Second, efficient EM-based learning and inference algorithms can be used. Third, non-stationarity is easily incorporated. We consider three existing priors, and show how each can be extended using the epitome.
The simplest prior assumes patches of labels are drawn independently from either a mixture model or an epitome. Next we investigate a ‘conditional epitome’ model, which substitutes an epitome for a conditional mixture model. Finally, we develop an ‘epitome tree’ model, which combines the epitome with a tree-structured belief network prior. Each model is combined with a per-pixel classifier to perform segmentation. In each case, the epitomized form of the prior provides superior segmentation performance, with the epitome tree performing best overall. We also apply the same models to denoising binary images, with similar results.