Publications in learning deep networks
- Marc'Aurelio Ranzato, Martin Szummer. Semi-supervised
Learning of Compact Document Representations with Deep Networks July
2008 Proc. Intl. Conf. on Machine Learning (ICML) 2008 792-799
Finding good representations of text documents is crucial in information
retrieval and classification systems. Today the most popular document
representation is based on a vector of word counts in the document. This
representation neither captures dependencies between related words, nor
handles synonyms or polysemous words.
In this paper, we propose an algorithm to learn text document representations
based on semi-supervised autoencoders that are stacked to form a deep
network. The model can be trained efficiently on partially labeled corpora,
producing very compact representations of documents, while retaining as
much class information and joint word statistics as possible. We show that
it is advantageous to exploit even a few labeled samples during training.
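As a minimal illustration of the word-count limitation described above, the following Python sketch builds count vectors for two short documents that say nearly the same thing with different words. The function names and example sentences are hypothetical and are not taken from the paper.

```python
from collections import Counter

def word_count_vector(text, vocabulary):
    """Return the word-count vector of `text` over a fixed vocabulary."""
    counts = Counter(text.lower().split())
    return [counts[word] for word in vocabulary]

def dot(u, v):
    """Inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))

doc_a = "the car is fast"
doc_b = "the automobile is quick"

# Shared vocabulary: one dimension per distinct word.
vocabulary = sorted(set(doc_a.split()) | set(doc_b.split()))
vec_a = word_count_vector(doc_a, vocabulary)
vec_b = word_count_vector(doc_b, vocabulary)

# "car"/"automobile" and "fast"/"quick" occupy unrelated dimensions,
# so the only overlap comes from the function words "the" and "is".
overlap = dot(vec_a, vec_b)
```

A learned compact representation, as proposed in the paper, aims to place such documents close together even though their count vectors barely overlap.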
Publications in spoken dialog systems
- M. Szummer, M. Henderson, C. Breslin, M. Gašić, D. Kim, B. Thomson,
P. Tsiakoulis, S. Young. The BUDS POMDP Dialogue System. Advances in Neural Information Processing Systems (NIPS) Demo 2012.
- M. Gašić, C. Breslin, M. Henderson, D. Kim, M. Szummer, B. Thomson, P. Tsiakoulis,
S. Young. POMDP-based dialogue manager adaptation to extended domains.
SIGDIAL 2013. Best Paper Award.
- M. Gašić, D. Kim, P. Tsiakoulis, C. Breslin, M. Henderson, M. Szummer, B. Thomson,
S. Young. Incremental on-line adaptation of POMDP-based dialogue managers to extended
domains. Interspeech 2014.
- P. Tsiakoulis, C. Breslin, M. Gašić, M. Henderson, D. Kim, M. Szummer, B. Thomson, S. Young.
Dialogue Context Sensitive HMM-Based Speech Synthesis.
ICASSP 2014.
- M. Gašić, C. Breslin, M. Henderson, D. Kim, M. Szummer, B. Thomson, P. Tsiakoulis and
S. Young. On-line policy optimisation of Bayesian spoken dialogue systems via human interaction. ICASSP 2013.
- S. Young, C. Breslin, M. Gašić, M. Henderson, D. Kim, M. Szummer, B. Thomson, P. Tsiakoulis and E. Tzirkel Hancock.
Evaluation of Statistical POMDP-based Dialogue Systems in Noisy Environments.
International Workshop on Spoken Dialogue Systems (IWSDS) 2014.
- C. Breslin, M. Gašić, M. Henderson, D. Kim, M. Szummer, B. Thomson,
P. Tsiakoulis, K. Yu and S. Young. Continuous ASR for Flexible Incremental Dialogue. Intl. Conf. Acoustics Speech and Signal Processing (ICASSP).
Publications in information retrieval
- Martin Szummer, Emine Yilmaz
Semi-supervised Learning to Rank with Preference Regularization October 2011.
Conf. Information and Knowledge Management (CIKM) Poster
We propose a semi-supervised learning to rank algorithm. It learns from both labeled data (pairwise preferences or absolute labels) and unlabeled data. The data can consist of multiple groups of items (such as queries), some of which may contain only unlabeled items. We introduce a preference regularizer that favors assigning similar preferences to similar items. The regularizer captures manifold structure in the data, and we also propose a rank-sensitive version designed for top-heavy retrieval metrics including NDCG and mean average precision.
The regularizer is employed in SSLambdaRank, a semi-supervised version of LambdaRank. This algorithm directly optimizes popular retrieval metrics and improves retrieval accuracy over LambdaRank, a state-of-the-art ranker that was used as part of the winner of the Yahoo! Learning to Rank challenge 2010. The algorithm runs in time linear in the number of queries, and can work with huge datasets.
- Chang Wang, Emine Yilmaz, Martin Szummer
Relevance Feedback Exploiting Query-Specific Document Manifolds October 2011.
Conf. Information and Knowledge Management (CIKM)
We incorporate relevance feedback into a learning to rank framework by exploiting query-specific document similarities. Given a few judged feedback documents and many retrieved but unjudged documents for a query, we learn a function that adjusts the initial ranking score of each document. Scores are fit so that documents with similar term content get similar scores, and scores of judged documents are close to their labels. By such smoothing along the manifold of retrieved documents, we avoid overfitting, and can therefore learn a detailed query-specific scoring function with several dozen term weights.
- Daniel Sheldon, Milad Shokouhi, Martin Szummer, Nick Craswell
LambdaMerge: Merging the Results of Query Reformulations February 2011.
Web Search and Data Mining (WSDM) Poster
Search engines can automatically reformulate user queries in a variety of ways, often leading to multiple queries that are candidates to replace the original. However, selecting a replacement can be risky: a reformulation may be more effective than the original or significantly worse, depending on the nature of the query, the source of reformulation candidates, and the corpus. In this paper, we explore methods to mitigate this risk by issuing several versions of the query (including the original) and merging their results. We focus on reformulations generated by random walks on the click graph, a method that can produce very good reformulations but is also variable and prone to topic drift. Our primary contribution is LambdaMerge, a supervised merging method that is trained to directly optimize a retrieval metric (such as NDCG or MAP) using features that describe both the reformulations and the documents they return. In experiments on Bing data and GOV2, LambdaMerge outperforms the original query and several unsupervised merging methods. LambdaMerge also outperforms a supervised method to predict and select the best single formulation, and is competitive with an oracle that always selects the best formulation.
- Lorenzo Torresani, Martin Szummer, Andrew Fitzgibbon. Efficient
Object Category Recognition using Classemes September
2010. European Conference on Computer Vision (ECCV)
We introduce a new descriptor for images which allows the construction of efficient and compact classifiers with good accuracy on object category recognition. The descriptor is the output of a large
number of weakly trained object category classifiers on the image. The
trained categories are selected from an ontology of visual concepts,
but the intention is not to encode an explicit decomposition of the scene.
Rather, we accept that existing object category classifiers often encode
not the category per se but ancillary image characteristics; and that these ancillary characteristics can combine to represent visual classes unrelated to the constituent categories' semantic meanings.
The advantage of this descriptor is that it allows object-category queries
to be made against image databases using efficient classifiers (efficient
at test time) such as linear support vector machines, and allows these
queries to be for novel categories. Even when the representation is
reduced to 200 bytes per image, classification accuracy on object category
recognition is comparable with the state of the art (36% versus 42%), but
at orders of magnitude lower computational cost.
- Filip Radlinski, Martin Szummer, Nick Craswell. Metrics for Assessing Sets of Subtopics July 2010. SIGIR Conf Research and Development in Information Retrieval.
To evaluate the diversity of search results, test collections have been
developed that identify multiple intents for each query. Intents are
the different meanings or facets that should be covered in a search results
list. This means that topic development involves proposing a set of
intents. We propose four measurable properties of query-to-intent mappings,
allowing for more principled topic development for such test collections.
- Filip Radlinski, Martin Szummer, Nick Craswell. Inferring
Query Intent from Reformulations and Clicks April 2010. World
Wide Web Conference (WWW).
Many researchers have noted that web search queries are often ambiguous or
unclear. We present an approach for identifying the popular meanings of
queries using web search logs and user click behavior. By evaluating on TREC
queries, we show that our approach produces more complete and user-centric
intents than expert judges. This approach was also used by the TREC 2009 Web
Track judges to obtain more representative topic descriptions from real
queries.
- Lorenzo Torresani, Martin Szummer, Andrew Fitzgibbon. Learning
Query-dependent Prefilters for Scalable Image Retrieval (supplement)
June 2009 Proc Comp. Vision Pattern Recogn. (CVPR)
We describe an algorithm for similar-image search which is designed to be
efficient for extremely large collections of images. For each query, a
small response set is selected by a fast prefilter, after which
a more accurate ranker may be applied to each image in the response set.
We consider a class of prefilters comprising disjunctions of conjunctions
("ORs of ANDs") of Boolean features. AND filters can be implemented
efficiently using skipped inverted files, a key component of
web-scale text search engines. These structures permit search in time
proportional to the response set size. The prefilters are learned from
training examples, and refined at query time to produce an approximately
bounded response set.
We cast prefiltering as an optimization problem: for each test query,
select the OR-of-AND filter which maximizes training-set recall for an
adjustable bound on response set size. This may be efficiently implemented
by selecting from a large pool of candidate conjunctions of Boolean
features using a linear program relaxation. Tests on object class
recognition show that this relatively simple filter
is nevertheless powerful enough to capture some
semantic information.
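The "ORs of ANDs" prefilter described above can be sketched in a few lines of Python. This is only an illustration under simplifying assumptions: plain Python sets stand in for the skipped inverted files, the filter is chosen by hand rather than learned through the linear program relaxation, and the feature names and image ids are hypothetical.

```python
def and_filter(index, features):
    """Images that have ALL of the given Boolean features (a conjunction)."""
    # Intersect the shortest posting lists first to keep intermediate sets small.
    postings = sorted((index.get(f, set()) for f in features), key=len)
    if not postings:
        return set()
    result = set(postings[0])
    for posting in postings[1:]:
        result &= posting
    return result

def or_of_ands(index, conjunctions):
    """Union of the response sets of several conjunctions (a disjunction)."""
    response = set()
    for conjunction in conjunctions:
        response |= and_filter(index, conjunction)
    return response

# Hypothetical inverted index: Boolean feature -> ids of images having it.
index = {
    "f1": {1, 2, 3},
    "f2": {2, 3, 4},
    "f3": {5},
    "f4": {5, 6},
}

# Prefilter (f1 AND f2) OR (f3 AND f4).
response_set = or_of_ands(index, [["f1", "f2"], ["f3", "f4"]])
```

A more accurate but slower ranker would then be applied only to the images in `response_set`.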
- Martin Szummer, Nick Craswell. Behavioral
Classification on the Click Graph April 2008 World Wide Web
Conference 1241-1242
A bipartite query-URL graph, where an edge
indicates that a document was clicked for a query, is a useful construct for
finding groups of related queries and URLs. Here we use this behavior graph
for classification. We choose a click graph sampled from two weeks of image
search activity, and the task of "adult" filtering: identifying content in
the graph that is inappropriate for minors. We show how to perform
classification using random walks on this graph, and two methods for
estimating classifier parameters.
- Onno Zoeter, Michael Taylor, Ed Snelson, John Guiver, Nick Craswell,
Martin Szummer. A
Decision Theoretic Framework for Ranking using Implicit Feedback July
2008 SIGIR 2008 Workshop on Learning to Rank for Information Retrieval
This paper presents a decision theoretic ranking system that incorporates
both explicit and implicit feedback. The system has a model that predicts,
given all available data at query time, different interactions a person might
have with search results. Possible interactions include relevance labelling
and clicking. We define a utility function that takes as input the outputs of
the interaction model and assigns a real-valued score to the user's session.
The optimal ranking is the list of documents that, in expectation under the
model, maximizes the utility for a user session. The system presented is based
on a simple example utility function that combines both click behavior and
labelling. The click prediction model is a Bayesian generalized linear model.
Its notable characteristic is that it incorporates both weights for
explanatory features and weights for each query-document pair. This allows the
model to generalize to unseen queries while remaining flexible enough to keep
a 'memory' of where it should deviate from its feature-based
prediction. Such a click-predicting model could be particularly useful
in an application such as enterprise search, allowing on-site adaptation to
local documents and user behaviour. The example utility function has a
parameter that controls the tradeoff between optimizing for clicks and
optimizing for labels. Experimental results in the context of enterprise
search show that a balance in the tradeoff leads to the best NDCG and good
(predicted) clickthrough.
- Nick Craswell, Martin Szummer. Random
Walks on the Click Graph July 2007 SIGIR Conf Research and
Development in Information Retrieval 239-246
Search engines can
record which documents were clicked for which query, and use these
query-document pairs as 'soft' relevance judgments. However, compared to the
true judgments, click logs give noisy and sparse relevance information. We
apply a Markov random walk model to a large click log, producing a
probabilistic ranking of documents for a given query. A key advantage of the
model is its ability to retrieve relevant documents that have not yet been
clicked for that query and rank those effectively. We conduct experiments on
click logs from image search, comparing our ('backward') random walk model to
a different ('forward') random walk, varying parameters such as walk length
and self-transition probability. The most effective combination is a long
backward walk with high self-transition probability.
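The backward random walk described above can be sketched as follows. This is a simplified illustration, not the paper's implementation: the click-count normalization, the uniform prior over starting documents, and all names and data are assumptions made for the example.

```python
def build_click_graph(click_log):
    """Build a symmetric bipartite graph from (query, document, clicks) triples."""
    graph = {}
    for query, doc, count in click_log:
        graph.setdefault(query, {})[doc] = count
        graph.setdefault(doc, {})[query] = count
    return graph

def transition_probs(graph, self_prob):
    """Row-normalized transitions; each node stays put with probability self_prob."""
    probs = {}
    for node, neighbors in graph.items():
        total = sum(neighbors.values())
        row = {node: self_prob}
        for neighbor, count in neighbors.items():
            row[neighbor] = (1.0 - self_prob) * count / total
        probs[node] = row
    return probs

def walk(probs, start, steps):
    """Distribution over nodes after `steps` transitions starting from `start`."""
    dist = {start: 1.0}
    for _ in range(steps):
        next_dist = {}
        for node, mass in dist.items():
            for neighbor, p in probs[node].items():
                next_dist[neighbor] = next_dist.get(neighbor, 0.0) + mass * p
        dist = next_dist
    return dist

def backward_scores(probs, query, documents, steps):
    """Score each document by P(walk started at document | walk ended at query),
    assuming a uniform prior over starting documents."""
    raw = {d: walk(probs, d, steps).get(query, 0.0) for d in documents}
    total = sum(raw.values())
    return {d: v / total for d, v in raw.items()} if total > 0 else raw

# Hypothetical click log: d2 was never clicked for q1.
log = [("q1", "d1", 2), ("q2", "d1", 1), ("q2", "d2", 1)]
probs = transition_probs(build_click_graph(log), self_prob=0.5)
scores = backward_scores(probs, "q1", ["d1", "d2"], steps=3)
```

In this toy graph, `d2` receives a nonzero score for `q1` through the path via `q2` and `d1`, which illustrates how the walk can retrieve documents that have not yet been clicked for a query.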
Publications on Handwriting Adaptation (Personalization)
- Martin Szummer, Christopher M. Bishop. Discriminative
Writer Adaptation October 2006 10th Intl. Workshop on Frontiers in
Handwriting Recognition (IWFHR) 293-298
We propose a general method
for adapting a writer-independent classifier to an individual writer. We
employ a mixture of experts formulation, where the classifiers are trained on
weighted clusters of writers. The clusters are determined by which experts
classify individual writing correctly. The method adapts by choosing the
appropriate combination of classifiers for a new user. It applies to any
probabilistic discriminative classifier, and adapts discriminatively without
modeling the input feature distribution. We apply the method to online
character recognition. Specifically, we use a mixture of neural networks as
well as a mixture of logistic regressions. We train the mixture via conjugate
gradient ascent or via the EM algorithm on 192,000 Latin characters of 98
classes and 216 writers, and show adaptation results for 21
writers.
Publications on Conditional Random Fields, applied to hand-drawing analysis
- Martin Szummer, Pushmeet Kohli, Derek Hoiem.
Learning Random Fields using Graph Cuts. October 2010. Chapter in a book on MRFs, MIT Press, edited by Andrew Blake, Carsten Rother, Pushmeet Kohli.
- Martin Szummer, Pushmeet Kohli, Derek Hoiem. Learning
CRFs using Graph Cuts October 2008 European Conference on Computer
Vision
Many computer vision problems are naturally formulated as
random fields, specifically MRFs or CRFs. The introduction of graph cuts has
enabled efficient and optimal inference in associative random fields, greatly
advancing applications such as segmentation, stereo reconstruction and many
others. However, while fast inference is now widespread, parameter learning in
random fields has remained an intractable problem. This paper shows how to
apply fast inference algorithms, in particular graph cuts, to learn parameters
of random fields with similar efficiency. We find optimal parameter values
under standard regularized objective functions that ensure good
generalization. Our algorithm enables learning of many parameters in
reasonable time, and we explore further speedup techniques. We also discuss
extensions to non-associative and multi-class problems. We evaluate the method
on image segmentation and geometry recognition.
- Carsten Rother, Vladimir Kolmogorov, Victor Lempitsky, Martin Szummer. Optimizing
Binary MRFs via Extended Roof Duality June 2007 Proc Comp. Vision
Pattern Recogn. (CVPR)
Many computer vision applications rely on the
efficient optimization of challenging, so-called non-submodular, binary
pairwise MRFs. A promising graph cut based approach for optimizing such MRFs
known as "roof duality" was recently introduced into computer vision. We study
two methods which extend this approach. First, we discuss an efficient
implementation of the "probing" technique introduced recently by Boros et al.
2006. It simplifies the MRF while preserving the global optimum. Our code is
400-700 times faster on some graphs than the implementation of [Boros 2006]. Second,
we present a new technique which takes an arbitrary input labeling and tries
to improve its energy. We give theoretical characterizations of local minima
of this procedure. We applied both techniques to many applications, including
image segmentation, new view synthesis, super-resolution, diagram recognition,
parameter learning, texture restoration, and image deconvolution. For several
applications we see that we are able to find the global minimum very
efficiently, and considerably outperform the original roof duality approach.
In comparison to existing techniques, such as graph cut, TRW, BP, ICM, and
simulated annealing, we nearly always find a lower energy.
- Philip J. Cowans, Martin Szummer. A
Graphical Model for Simultaneous Partitioning and Labeling January
2005 AI & Statistics
In this work we develop a graphical model
for describing probability distributions over labeled partitions of an
undirected graph which are conditioned on observed data. We show how to
efficiently perform exact inference in these models, by exploiting the
structure of the graph and adapting the sum-product and max-product
algorithms. We demonstrate our approach on the task of segmenting and labeling
hand-drawn ink fragments, and show that a significant performance increase is
obtained by labeling and partitioning simultaneously.
Best Student Paper Award
- Yuan Qi, Martin Szummer, Thomas P. Minka. Bayesian
Conditional Random Fields January 2005 AI & Statistics
269-276
We propose Bayesian Conditional Random Fields (BCRFs) for
classifying interdependent and structured data, such as sequences, images or
webs. BCRFs are a Bayesian approach to training and inference with conditional
random fields, which were previously trained by maximizing likelihood (ML)
(Lafferty et al., 2001). Our framework avoids the problem of overfitting, and
offers the full advantages of a Bayesian treatment. Unlike the ML approach, we
estimate the posterior distribution of the model parameters during training,
and average predictions over this posterior during inference. We apply two
extensions of expectation propagation (EP), the power EP and the novel
transformed EP methods, to incorporate the partition function. For algorithmic
stability and accuracy, we flatten the approximation structures to avoid
two-level approximations. We demonstrate the superior prediction accuracy of
BCRFs over conditional random fields trained with ML or MAP on synthetic and
real datasets.
- Yuan Qi, Martin Szummer, Thomas P. Minka. Diagram
Structure Recognition by Bayesian Conditional Random Fields June 2005
Proc Comp. Vision Pattern Recogn. (CVPR) C. Schmid and S. Soatto and C.
Tomasi 191-196
Hand-drawn diagrams present a complex recognition problem.
Elements of the diagram are often individually ambiguous, and require context
to be interpreted. We present a recognition method based on Bayesian
conditional random fields (BCRFs) that jointly analyzes all drawing elements
in order to incorporate contextual cues. The classification of each object
affects the classification of its neighbors. BCRFs allow flexible and
correlated features, and take both spatial and temporal information into
account. BCRFs estimate the posterior distribution of parameters during
training, and average predictions over the posterior for testing. As a result
of model averaging, BCRFs avoid the overfitting problems associated with
maximum likelihood training. We also incorporate Automatic Relevance
Determination (ARD), a Bayesian feature selection technique, into BCRFs. The
result is significantly lower error rates compared to ML- and MAP-trained
CRFs.
- Martin Szummer. Learning
Diagram Parts with Hidden Random Fields August 2005 Intl Conf
Document Analysis and Recognition (ICDAR) 1188-1193
Many diagrams
contain compound objects composed of parts. We propose a recognition framework
that learns parts in an unsupervised way, and requires training labels only
for compound objects. Thus, human labeling effort is reduced and parts are not
predetermined; instead, appropriate parts are discovered based on the data. We
model contextual relations between parts, such that the label of a part can
depend simultaneously on the labels of its neighbors, as well as spatial and
temporal information. The model is a Hidden Random Field (HRF), an extension
of a Conditional Random Field. We apply it to find parts of boxes, arrows and
flowchart shapes in hand-drawn diagrams, and also demonstrate improved
recognition accuracy over the conditional random field model without
parts.
- Martin Szummer, Yuan Qi. Contextual
Recognition of Hand-drawn Diagrams with Conditional Random Fields
October 2004 9th Intl. Workshop on Frontiers in Handwriting Recognition
(IWFHR) F. Kimura and H. Fujisawa 32-37
Hand-drawn diagrams present a
complex recognition problem. Fragments of the drawing are often individually
ambiguous, and require context to be interpreted. We present a recognizer
based on conditional random fields (CRFs) that jointly analyze all drawing
fragments in order to incorporate contextual cues. The classification of each
fragment influences the classification of its neighbors. CRFs allow flexible
and correlated features, and take temporal information into account. Training
is done via conditional MAP estimation that is guaranteed to reach the global
optimum. During recognition we propagate information globally to find the
joint MAP or maximum marginal solution for each fragment. We demonstrate the
framework on a container versus connector recognition task.
- Martin Szummer, Philip J. Cowans. Incorporating
Context and User Feedback in Pen-Based Interfaces October 2004 AAAI
Fall symposium, Making Pen-Based Interaction Intelligent and Natural R.
Davis and J. Landay et al. 159-166 FS-04-06
We propose a joint
probabilistic model for grouping and labeling hand-drawn ink strokes. We
demonstrate that simultaneous grouping and labeling yields superior accuracy
to labeling alone. Our probabilistic formulation has many advantages: exact
inference is feasible, and we obtain confidence estimates. We show how to
incorporate user feedback by conditioning our model, and discuss different
types of inference tasks suited for various user interactions.
- Balaji Krishnapuram, Christopher M. Bishop, Martin Szummer. Generative
Models and Bayesian Model Comparison for Shape Recognition October
2004 9th Intl. Workshop on Frontiers in Handwriting Recognition (IWFHR)
F. Kimura and H. Fujisawa 20-25
Recognition of hand-drawn shapes is an
important and widely studied problem. By adopting a generative probabilistic
framework we are able to formulate a robust and flexible approach to shape
recognition which allows for a wide range of shapes and which can recognize
new shapes from a single exemplar. It also provides meaningful probabilistic
measures of model score which can be used as part of a larger probabilistic
framework for interpreting a page of ink. We also show how Bayesian model
comparison allows the trade-off between data fit and model complexity to be
optimized automatically.
Learning from partially labeled data (semi-supervised learning)
- Martin Szummer, Tommi Jaakkola. Information
Regularization with Partially Labeled Data January 2003 Advances in
Neural Information Processing Systems (NIPS) 1025-1032 15
Classification with partially labeled data requires using a large number
of unlabeled examples (or an estimated marginal P(x)), to further constrain
the conditional P(y|x) beyond a few available labeled examples. We formulate a
regularization approach to linking the marginal and the conditional in a
general way. The regularization penalty measures the information that is
implied about the labels over covering regions. No parametric assumptions are
required and the approach remains tractable even for continuous marginal
densities P(x). We develop algorithms for solving the regularization problem
for finite covers, establish a limiting differential equation, and exemplify
the behavior of the new regularization approach in simple cases.
- Chen-Hsiang Yeang, Martin Szummer. Markov
Random Walk Representations with Continuous Distributions August 2003
Proc. Uncertainty in Artificial Intelligence, UAI U. Kjærulff and C.
Meek 600-607
Representations based on random walks can exploit discrete
data distributions for clustering and classification. We extend such
representations from discrete to continuous distributions. Transition
probabilities are now calculated using a diffusion equation with a diffusion
coefficient that varies inversely with the data density. We relate this
diffusion equation to a path integral and derive the corresponding path
probability measure. The framework is useful for incorporating continuous data
densities and prior knowledge.
- Martin Szummer, Tommi Jaakkola. Partially
labeled classification with Markov random walks January 2002
Advances in Neural Information Processing Systems (NIPS) 945-952 14
To classify a large number of unlabeled examples we combine a limited
number of labeled examples with a Markov random walk representation over the
unlabeled examples. The random walk representation exploits any low
dimensional structure in the data in a robust, probabilistic manner. We
develop and compare several estimation criteria/algorithms suited to this
representation. This includes in particular multi-way classification with an
average margin criterion which permits a closed form solution. The time scale
of the random walk regularizes the representation and can be set through a
margin-based criterion favoring unambiguous classification. We also extend
this basic regularization by adapting time scales for individual examples. We
demonstrate the approach on synthetic examples and on text classification
problems.
- Martin Szummer, Tommi Jaakkola. Kernel
expansions with unlabeled examples January 2001 Advances in Neural
Information Processing Systems (NIPS) 626-632 13
Modern
classification applications necessitate supplementing the few available
labeled examples with unlabeled examples to improve classification
performance. We present a new tractable algorithm for exploiting unlabeled
examples in discriminative classification. This is achieved essentially by
expanding the input vectors into longer feature vectors via both labeled and
unlabeled examples. The resulting classification method can be interpreted as
a discriminative kernel density estimate and is readily trained via the EM
algorithm, which in this case is both discriminative and achieves the optimal
solution. We provide, in addition, a purely discriminative formulation of the
estimation problem by appealing to the maximum entropy framework. We
demonstrate that the proposed approach requires very few labeled examples for
high classification accuracy.
Image retrieval and texture modeling
- Wolfgang Sörgel, Sabine Girod, Martin Szummer, Bernd Girod. Computer
Aided Diagnosis of Bone Lesions in the Facial Skeleton March 1998
Aachen, Germany Workshop Bildverarbeitung für die Medizin
We
present a system for computer aided diagnosis of bone tumors in the facial
skeleton. There are many different lesions with radiographic
manifestation in the jaws. Our system helps perform the differential
diagnosis of these. The input is a digitized orthopantomograph (OPG) in which
the user marks the position of the lesion with a single mouse click. An active
contour model then automatically finds the boundaries of the lesion.
Graylevel histograms, MRSAR texture features and Gabor filter
features are computed for the lesion region. These features are then combined
and used to query a database containing expert-diagnosed reference cases. The
result is a number of similar cases, with tumor position marked and with
available expert annotations. We show good agreement between our results and
the differential diagnoses given by humans. The system is also a suitable tool for
training and education.
- Martin Szummer, Rosalind W. Picard. Indoor-Outdoor
Image Classification January 1998 Bombay, India IEEE International
Workshop on Content-Based Access of Image and Video Databases, CAIVD
42-51
We show how high-level scene properties can be inferred from
classification of low-level image features, specifically for the
indoor-outdoor scene retrieval problem. We systematically studied the
features: (1) histograms in the Ohta color space, (2) multiresolution
simultaneous autoregressive model parameters, and (3) coefficients of a
shift-invariant DCT. We demonstrate that performance is improved by computing
features on subblocks, classifying these subblocks, and then combining these
results in a way reminiscent of "stacking." State-of-the-art single-feature
methods are shown to result in about 75-86% performance, while the new
method results in 90.3% correct classification, when evaluated on a diverse
database of over 1300 consumer images provided by Kodak.
- Rosalind W. Picard, Thomas P. Minka, Martin Szummer. Modeling
User Subjectivity In Image Libraries September 1996 Lausanne,
Switzerland IEEE Intl Conf On Image Processing (ICIP) 777-780 2
In
addition to the problem of which image analysis models to use in digital
libraries, e.g. wavelet, Wold, color histograms, is the problem of how to
combine these models with their different strengths. Most present systems
place the burden of combination on the user, e.g. the user specifies 50%
texture features, 20% color features, etc. This is a problem since most users
do not know how to best pick the settings for the given data and search
problem. This paper addresses this problem, describing research in progress
for a system that (1) automatically infers which combination of models best
represents the data of interest to the user and (2) learns continuously during
interaction with each user. In particular, these two components -- inference
and learning -- provide a solution that adapts to the subjective and
hard-to-predict behaviors frequently seen when people query or browse image
libraries.
- Martin Szummer, Rosalind W. Picard. Temporal
Texture Modeling September 1996 Lausanne, Switzerland IEEE Intl
Conf On Image Processing (ICIP) 823-826 3
Temporal textures are
textures with motion. Examples include wavy water, rising steam and fire. We
model image sequences of temporal textures using the spatio-temporal
autoregressive model (STAR). This model expresses each pixel as a linear
combination of surrounding pixels lagged both in space and in time. The model
provides a base for both recognition and synthesis. We show how the least
squares method can accurately estimate model parameters for large, causal
neighborhoods with more than 1000 parameters. Synthesis results show that the
model can adequately capture the spatial and temporal characteristics of many
temporal textures. A 95% recognition rate is achieved for a 135 element
database with 15 texture classes.
Dataset: Temporal Textures
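To make the least-squares estimation of the STAR model above concrete, here is a small Python sketch. It is heavily simplified and every detail is an assumption for illustration: frames are 1-D rows rather than images, the causal neighborhood is three pixels from the previous frame only, boundaries wrap around, and the data are synthetic and noise-free.

```python
import random

def solve_linear(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        tail = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - tail) / m[r][r]
    return x

def synthesize(first_frame, coeffs, offsets, steps):
    """Generate frames where each pixel is a linear combination of
    neighboring pixels in the previous frame (wrap-around boundary)."""
    frames = [list(first_frame)]
    width = len(first_frame)
    for _ in range(steps):
        prev = frames[-1]
        frames.append([
            sum(c * prev[(i + d) % width] for c, d in zip(coeffs, offsets))
            for i in range(width)
        ])
    return frames

def fit_star(frames, offsets):
    """Least-squares estimate of the coefficients via the normal equations."""
    k = len(offsets)
    ata = [[0.0] * k for _ in range(k)]
    atb = [0.0] * k
    width = len(frames[0])
    for t in range(1, len(frames)):
        for i in range(width):
            feats = [frames[t - 1][(i + d) % width] for d in offsets]
            target = frames[t][i]
            for r in range(k):
                atb[r] += feats[r] * target
                for c in range(k):
                    ata[r][c] += feats[r] * feats[c]
    return solve_linear(ata, atb)

rng = random.Random(0)
offsets = [-1, 0, 1]            # spatial lags within the previous frame
true_coeffs = [0.5, 0.3, 0.2]   # hypothetical model parameters
first = [rng.uniform(-1.0, 1.0) for _ in range(50)]
frames = synthesize(first, true_coeffs, offsets, steps=4)
estimated = fit_star(frames, offsets)
```

The same fitted coefficients can be fed back into `synthesize` to generate new frames, mirroring the abstract's point that one model serves both recognition and synthesis.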
- Martin Szummer. An
Image Browser that learns from User Interaction December 1995
Large image databases with millions of images are being built. It
is very tedious to browse these databases; the user will only have time to see
a small fraction of the images. Currently, there are very few tools that
assist the user in finding the right selection of images. This project
combines learning algorithms and machine vision techniques to create a
flexible and powerful image browser. The user is presented with a selection of
images. They select positive and negative examples of the type of images they
want to see or avoid seeing. The browser analyzes the examples and chooses the
best search metrics. It then uses these metrics to find images similar to the
examples. The results form a hierarchy that the user can browse with a tree
browser. Next, the user selects more positive and negative examples, and the
process repeats.
- Martin Szummer. Temporal
Texture Modeling September 1995 346 MIT Media Lab Perceptual
Computing
Temporal textures are textures with motion. Examples include
wavy water, rising steam and a crowd milling about. We model image sequences
of temporal textures using the spatio-temporal autoregressive model (STAR).
This model expresses each pixel as a linear combination of surrounding pixels
lagged both in space and in time. The model provides a basis both for
recognition and synthesis. We show how the least squares method can accurately
estimate model parameters for large, causal neighborhoods with more than 1000
parameters. Synthesis results show that the model can adequately capture the
spatial and temporal characteristics of many temporal
textures.
Dataset: Temporal Textures
Natural Language Processing
- Wlodek Zadrozny, Marcin Szummer, Stanislaw Jarecki, David Johnson, Leora
Morgenstern. NL
Understanding with a Grammar of Constructions August 1994 Kyoto, Japan
Intl. Conf. on Computational Linguistics COLING 1289-1293 15
We
present an approach to natural language understanding based on a computable
grammar of constructions. A "construction" consists of a set of features of
form and a description of meaning in a context. A grammar is a set of
constructions. This kind of grammar is the key element of Mincal, an
implemented natural language, speech-enabled interface to an on-line calendar
system. The system consists of a NL grammar, a parser, an on-line calendar, a
domain knowledge base (about dates, times and meetings), an application
knowledge base (about the calendar), a speech recognizer, a speech generator,
and the interfaces between those modules. We claim that this architecture
should work in general for spoken interfaces in small domains. In this paper
we present two novel aspects of the architecture: (a) the use of
constructions, integrating descriptions of form, meaning and context into one
whole; and (b) the separation of domain knowledge from application knowledge.
We describe the data structures for encoding constructions, the structure of
the knowledge bases, and the interactions of the key modules of the
system.