Program Details:
Tutorials and Workshops
Workshops, Part 1: Sunday, January 9th, 8:30 - 10:00 am
Workshops, Part 2: Sunday, January 9th, 10:30 am - 12:00 pm
Tutorials, Part 1: Sunday, January 9th, 1:30 - 3:00 pm
Tutorials, Part 2: Sunday, January 9th, 3:30 - 5:00 pm
Opening Reception
Sunday, January 9th, 7:00 - 9:30 pm
Kontiki Room, Catamaran Resort Hotel
San Diego, California
Welcome and Outstanding Paper Award
Presentation
Monday, January 10th, 9:00 - 9:25 am
Robert St. Amant (North Carolina State University)
John Riedl (University of Minnesota)
Anthony Jameson (DFKI and International University
in Germany)
Invited Talk: Justine Cassell
Monday, January 10th, 9:25 - 10:55 am
Chair: W. Lewis Johnson (University
of Southern California)
Oral Tradition, Aboral Coordination: Building
Rapport with Embodied Conversational Agents
Justine Cassell (Northwestern University)
Abstract
Harmony or rapport between people is essential
for relationships as diverse as seller-buyer and
teacher-learner. In this talk I describe the kinds
of verbal behaviors - such as common interactional
structures and narrative resonance - and non-verbal
behaviors - such as attention, positivity, and
coordination - that function together to establish
a sense of rapport between two people in conversation.
These studies are used as the basis for the implementation
of virtual peers - adults, but also more recently
embodied conversational virtual children who are
capable of acting as friends and learning partners
with real children from different ethnic traditions,
collaborating to tell stories from the child's
own cultural context, and aiding children in making
the transition between home and school language.
About Justine Cassell
Justine Cassell is a full professor in the departments
of Communication Studies and Computer Science
at Northwestern University, the director of the
ArticuLab research group, and the graduate director
of the interdisciplinary Technology and Social
Behavior Ph.D. program in the School of Communication.
Before coming to Northwestern, Cassell was a tenured
associate professor at the MIT Media Lab where
she directed the Gesture and Narrative Language
Research Group. In 2001, Cassell was awarded the
Edgerton Faculty Achievement Award at MIT.
Cassell's research concentrates on building technologies
that simulate, mediate, and facilitate everyday
kinds of talk. These technologies, such as Embodied
Conversational Agents, Story Listening Systems,
and Lowest Common Denominator Online Communities,
in turn allow her to study the nature of human
communication with and through technology.
Papers: Affective Computing
Monday, January 10th, 11:25 am - 12:40 pm
Chair: Dina Goren-Bar (Ben-Gurion
University of the Negev)
Experimental Evaluation of Polite Interaction
Tactics for Pedagogical Agents
Ning Wang and W. Lewis Johnson (University of
Southern California)
Paola Rizzo (University of Rome "La Sapienza")
Erin Shaw and Richard E. Mayer (University of
California, Santa Barbara)
Abstract
Recent research shows that instructors commonly
use politeness strategies to achieve affective
scaffolding in educational contexts. The importance
of affective factors such as self-confidence and
interest that contribute to learner motivation
is well recognized. In this paper, we describe
the results of a Wizard-of-Oz experiment to study
the effect of politeness strategies on both cognitive
and motivational factors. We compare the results
of two different politeness strategies, direct
and polite, in assisting seventeen students in
a computer-based learning task. We find that politeness
can affect students' motivational state and help
students learn difficult concepts. The results
of the experiment provide a basis for the design
of a polite pedagogical agent and its tutorial
intervention strategies.
Recognising Emotions in Human and Synthetic
Faces: The Role of the Upper and Lower Parts of
the Face
Erica Costantini, Fabio Pianesi, and Michela Prete
(ITC-irst)
Abstract
Embodied Conversational Agents that can express
emotions are a popular topic. Yet, despite recent
attempts, reliable methods are still lacking to
assess the quality of facial displays. This paper
extends and refines previous work, focusing on
the role of the upper and the lower portions of
the face. We analysed the recognition rates and
errors from the responses of 74 subjects to the
presentations of dynamic (human and synthetic)
faces. The results point to the possibility of:
a) addressing the issue of the naturalness of
synthetic faces, and b) a greater importance of
the upper part.
Extraction and Classification of Facemarks
with Kernel Methods
Yuki Tanaka, Hiroya Takamura, and Manabu Okumura
(Tokyo Institute of Technology)
Abstract
We propose methods for extracting facemarks (emoticons)
in text and classifying them into some emotional
categories. In text-based communication, facemarks
have gained popularity, since they help us understand
what writers imply. However, there are two problems
in text-based communication using facemarks: the
first is the variety of facemarks, and the second
is the lack of good comprehension in using facemarks.
These problems are more serious in areas where
2-byte characters are used, because 2-byte
characters can generate quite a large number of
different facemarks. Therefore, we
propose methods for the extraction and classification
of facemarks. Regarding the extraction of facemarks
as a chunking task, we automatically annotate
a tag to each character in text. In the classification
of the extracted facemarks, we apply the dynamic
time alignment kernel (DTAK) and the string subsequence
kernel (SSK) for scoring in the k-nearest neighbor
(k-NN) method and for expanding usual Support
Vector Machines (SVMs) to accept sequential data
such as facemarks. We empirically show that our
methods work well in classification and extraction
of facemarks, with appropriate settings of parameters.
Papers: Multimodal Interaction
Monday, January 10th, 2:10 - 3:50 pm
Chair: Antonio Krüger (University
of Münster)
Two-Way Adaptation for Robust Input Interpretation
in Practical Multimodal Conversation Systems
Shimei Pan, Michelle Zhou, and Keith Houck (IBM
T. J. Watson Research Center)
Siwei Shen (University of Michigan)
Abstract
Multimodal conversation systems allow users to
interact with computers effectively using multiple
modalities, such as natural language and gesture.
However, these systems have not been widely used
in practical applications mainly due to their
limited input understanding capability. As a result,
conversation systems often fail to understand
user requests and leave users frustrated. To address
this issue, most existing approaches focus on
improving a system's interpretation capability.
Nonetheless, such improvements may still be limited,
since they would never cover the entire range
of input expressions. Alternatively, we present
a two-way adaptation framework that allows both
users and systems to dynamically adapt to each
other's capability and needs during the course
of interaction. Compared to existing methods,
our approach offers two unique contributions.
First, it improves the usability and robustness
of a conversation system by helping users to dynamically
learn the system's capabilities in context. Second,
our approach enhances the overall interpretation
capability of a conversation system by learning
new user expressions on the fly. Our preliminary
evaluation shows the promise of this approach.
Linguistic Theories in Efficient Multimodal
Reference Resolution: An Empirical Investigation
Joyce Y. Chai, Zahar Prasov, Joseph Blaim, and
Rong Jin (Michigan State University)
Abstract
Multimodal conversational interfaces provide a
natural means for users to communicate with computer
systems through multiple modalities such as speech,
gesture, and gaze. To build effective multimodal
interfaces, understanding user multimodal inputs
is important. Previous linguistic and cognitive
studies indicate that user language behavior does
not occur randomly, but rather follows certain
linguistic and cognitive principles. Therefore,
this paper investigates the use of linguistic
theories in multimodal interpretation. In particular,
we present a greedy algorithm that incorporates
Conversation Implicature and Givenness Hierarchy
for efficient multimodal reference resolution.
Empirical studies indicate that this algorithm
significantly reduces the complexity in multimodal
reference resolution compared to a previous graph-matching
approach. One major advantage of this greedy algorithm
is that the prior linguistic and cognitive knowledge
can be used to guide the search and significantly
prune the search space. Because of its simplicity
and generality, this approach has the potential
to improve the robustness of interpretation and
provide a more practical solution to multimodal
input interpretation.
Multimodal New Vocabulary Recognition Through
Speech and Handwriting in a Whiteboard Scheduling
Application
Edward C. Kaiser (Oregon Health & Science University)
Abstract
Our goal is to automatically recognize and enroll
new vocabulary in a multimodal interface. To accomplish
this our technique aims to leverage the mutually
disambiguating aspects of co-referenced, co-temporal
handwriting and speech. The co-referenced semantics
are spatially and temporally determined by our
multimodal interface for schedule chart creation.
This paper motivates and describes our technique
for recognizing out-of-vocabulary (OOV) terms
and enrolling them dynamically in the system.
We report results for the detection and segmentation
of OOV words within a small multimodal test set.
On the same test set we also report utterance,
word and pronunciation level error rates both
over individual input modes and multimodally.
We show that combining information from handwriting
and speech yields significantly better results
than achievable by either mode alone.
Multimodal Interaction for Pedestrians: An
Evaluation Study
Matthias Jöst, Jochen Häußler, Matthias Merdes,
and Rainer Malaka (European Media Laboratory)
Abstract
What are the most suitable interaction paradigms
for navigational and informative tasks for pedestrians?
Is there an influence of social and situational
context on multimodal interaction? Our study takes
a closer look at a multimodal system on a handheld
device that was recently developed as a prototype
for mobile navigation assistance. The system allows
visitors of a city to navigate, to get information
on sights, and to use and manipulate map information.
In an outdoor evaluation, we studied the usability
of such a system on site. The study yields insight
about how multimodality can enhance the usability
of hand-held devices with their future services.
Papers: Personal Assistants
Monday, January 10th, 4:20 - 6:00 pm
Chair: Mathias Bauer (DFKI)
Automated Email Activity Management: An Unsupervised
Learning Approach
(Honorable Mention for Outstanding Paper Award)
Nicholas Kushmerick (University College Dublin)
Tessa Lau (IBM T. J. Watson Research Center)
Abstract
Many structured activities are managed by email.
For instance, a consumer purchasing an item from
an e-commerce vendor may receive a message confirming
the order, a warning of a delay, and then a shipment
notification. Existing email clients do not understand
this structure, forcing users to manage their
activities by sifting through lists of messages.
As a first step to developing email applications
that provide high-level support for structured
activities, we consider the problem of automatically
learning an activity's structure. We formalize
activities as finite-state automata, where states
correspond to the status of the process, and transitions
represent messages sent between participants.
We propose several unsupervised machine learning
algorithms in this context, and evaluate them
on a collection of e-commerce email.
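To make the finite-state formalization concrete, here is a minimal sketch of one purchase activity written by hand as an automaton. The state names, message-type names, and the `track` helper are illustrative assumptions; the abstract describes learning such structure automatically, which this sketch does not attempt.

```python
# Hypothetical hand-written automaton for one e-commerce purchase activity.
# States are process statuses; transitions are message types received.
# All names here are illustrative assumptions, not taken from the paper.
TRANSITIONS = {
    ("start", "order_confirmation"): "ordered",
    ("ordered", "delay_warning"): "delayed",
    ("ordered", "shipment_notification"): "shipped",
    ("delayed", "shipment_notification"): "shipped",
}


def track(messages, state="start"):
    """Replay a sequence of message types and return the states visited."""
    states = [state]
    for message in messages:
        key = (state, message)
        if key not in TRANSITIONS:
            raise ValueError(f"unexpected message {message!r} in state {state!r}")
        state = TRANSITIONS[key]
        states.append(state)
    return states
```

Replaying the confirmation, delay warning, and shipment notification from the example in the abstract walks the automaton from its start state to the shipped state.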
TaskTracer: A Desktop Environment to Support
Multi-Tasking Knowledge Workers
Anton Dragunov, Thomas G. Dietterich, Kevin Johnsrude,
Matthew McLaughlin, Lida Li, and Jonathan L. Herlocker
(Oregon State University)
Abstract
This paper reports on TaskTracer - a software
system being designed to help highly multitasking
knowledge workers rapidly locate, discover, and
reuse past processes they used to successfully
complete tasks. The system monitors users' interaction
with a computer, collects detailed records of
users' activities and resources accessed, associates
(automatically or with users' assistance) each
interaction event with a particular task, and enables
users to access records of past activities and
quickly restore task contexts. We present a novel
Publisher-Subscriber architecture for collecting
and processing users' activity data, describe
several different user interfaces tried with TaskTracer,
and discuss the possibility of applying machine
learning techniques to recognize/predict users'
tasks.
Intelligent Data Entry Assistant for XML Using
Ensemble Learning
Danico Lee and Costas Tsatsoulis (University of
Kansas)
Abstract
XML has emerged as the primary standard of data
representation and data exchange. Although many
software tools exist to assist the XML implementation
process, data must be manually entered into the
XML documents. Current form filling technologies
are mostly for simple data entry and do not provide
support for the complexity and nested structures
of XML grammars. This paper presents SmartXAutofill,
an intelligent data entry assistant for predicting
and automating inputs for XML documents based
on the contents of historical document collections
in the same XML domain. SmartXAutofill incorporates
an ensemble classifier, which integrates multiple
internal classification algorithms into a single
architecture. Each internal classifier uses approximate
techniques to propose a value for an empty XML
field, and, through voting, the ensemble classifier
determines which value to accept. As the system
operates it learns which internal classification
algorithms work better for a specific XML document
domain and modifies its weights (confidence) in
their predictive ability. As a result, the ensemble
classifier adapts itself to the specific XML domain,
without the need to develop special learners for
the infinite number of domains that XML users
have created. We evaluated our system performance
using data from eleven different XML domains.
The results show that the ensemble classifier
adapted itself to different XML document domains,
and most of the time (for 9 out of 11 domains)
produced predictive accuracies as good as or better
than the best individual classifier for a domain.
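The voting-and-reweighting idea described above can be sketched as follows. This is a simplified illustration under stated assumptions: the class name, the multiplicative update rule, and the `learning_rate` parameter are hypothetical and are not the SmartXAutofill implementation.

```python
from collections import defaultdict


class WeightedVotingEnsemble:
    """Hypothetical weighted-vote ensemble; not the paper's actual system."""

    def __init__(self, classifiers, learning_rate=0.1):
        # classifiers maps a name to a callable: field -> proposed value.
        self.classifiers = classifiers
        self.weights = {name: 1.0 for name in classifiers}
        self.learning_rate = learning_rate

    def predict(self, field):
        """Return the value with the highest total weight, plus all proposals."""
        proposals = {name: clf(field) for name, clf in self.classifiers.items()}
        scores = defaultdict(float)
        for name, value in proposals.items():
            scores[value] += self.weights[name]
        return max(scores, key=scores.get), proposals

    def update(self, proposals, accepted_value):
        """Raise the weight of classifiers that were right, lower the rest.

        The multiplicative rule is an assumption chosen for brevity.
        """
        for name, value in proposals.items():
            if value == accepted_value:
                self.weights[name] *= 1 + self.learning_rate
            else:
                self.weights[name] *= 1 - self.learning_rate
```

In use, `predict` would be called for each empty field and `update` once the user confirms or corrects the suggestion, so classifiers that suit a given document domain gradually dominate the vote.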
Active Preference Learning for Personalized
Calendar Scheduling Assistance
Melinda Gervasio (SRI International)
Michael D. Moffitt, Martha E. Pollack, and Joseph
M. Taylor (University of Michigan)
Tomas E. Uribe (SRI International)
Abstract
We present PLIANT, a learning system that supports
adaptive assistance in an open calendaring system.
PLIANT learns user preferences from the feedback
that naturally occurs during interactive scheduling.
It contributes a novel application of active learning
in a domain where the choice of candidate schedules
to present to the user must balance usefulness
to the learning module with immediate benefit
to the user. Our experimental results provide
evidence of PLIANT's ability to learn user preferences
under various conditions and reveal the tradeoffs
made by the different active learning selection
strategies.
Invited Talk: Stuart Card
Tuesday, January 11th, 9:00 - 10:30 am
Chair: John Riedl (University
of Minnesota)
Attention-Reactive User Interfaces for Sensemaking
Stuart Card (Palo Alto Research Center)
Abstract
I will talk about an emerging class of user interfaces
that if not exactly intelligent are at least attention-reactive.
They are being developed to handle "sensemaking"
tasks, in which users find, analyze, and create
products or action from large collections of documents.
Applications might be expected to develop in law,
education, scholarship, security, and medicine.
These interfaces have a focus + context visualization
on the front end and a semantic contextual computing
engine on the back end. Ultimately they can be
expected to have mixed initiatives. These interfaces
require the development of a supporting science
of human information interaction that stresses
interaction between the user and information and
deemphasizes the platform through which this occurs.
About Stuart Card
Stuart Card is a Senior Research Fellow and the
manager of the User Interface Research group at
the Palo Alto Research Center. His study of input
devices led to the Fitts's Law characterization
of the mouse and was a major factor leading to
the mouse's commercial introduction by Xerox.
His group has developed theoretical characterizations
of human-machine interaction, including the Model
Human Processor, the GOMS theory of user interaction,
information foraging theory, and statistical descriptions
of Internet use. These theories have been put
to use in new paradigms of human-machine interaction
including the Rooms workspace manager, papertronic
systems, and the Information Visualizer. The work
of his group has resulted in a dozen Xerox products
as well as contributing to the founding of three
software companies, Inxight Software, Outride,
and Content Guard. Card is a co-author of the
book The Psychology of Human-Computer Interaction
and a co-editor of the book, Human Performance
Models for Computer-Aided Engineering. He
has served on many editorial boards, government
panels, and university review boards. He received
his A.B. in Physics from Oberlin College and his
Ph.D. in Psychology from Carnegie Mellon University,
where he pursued an interdisciplinary program
in psychology, artificial intelligence, and computer
science. He has been an adjunct faculty member
at Stanford University. His most recent book,
Readings in Information Visualization, was
published in 1999. Card is currently concentrating
on the theory and design of systems for attending
to and interpreting large amounts of information
(information foraging theory and sensemaking theory).
Card is a Fellow of the ACM, the first recipient
of the ACM CHI Lifetime Achievement Award, and
the first member of the ACM CHI Academy.
Papers: Visualization and Presentation
Tuesday, January 11th, 11:00 am - 12:40 pm
Chair: Jim Blythe (University
of Southern California)
The Centrality of Pivotal Points in the Evolution
of Scientific Networks
Chaomei Chen (Drexel University)
Abstract
In this paper, we describe the development of
CiteSpace as an integrated environment for identifying
and tracking thematic trends in scientific literature.
The goal is to simplify the process of finding
not only highly cited clusters of scientific articles,
but also pivotal points and trails that are likely
to characterize fundamental transitions of a knowledge
domain as a whole. The trails of an advancing
research field are captured through a sequence
of snapshots of its intellectual structure over
time in the form of Pathfinder networks. These
networks are subsequently merged with a localized
pruning algorithm. Pivotal points in the merged
network are algorithmically identified and visualized
using the betweenness centrality metric. An example
of finding clinical evidence associated with reducing
risks of heart diseases is included to illustrate
how CiteSpace could be used. The contribution
of the work is its integration of various change
detection algorithms and interactive visualization
capabilities to simplify users' tasks.
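As a rough illustration of the betweenness centrality metric mentioned above, the following sketch computes unnormalized betweenness on a small unweighted, undirected graph using Brandes' algorithm. The graph representation and function name are assumptions for this example; this is not CiteSpace code and it ignores the Pathfinder pruning step.

```python
from collections import deque


def betweenness_centrality(graph):
    """Brandes' algorithm for an unweighted, undirected graph.

    graph maps each node to the set of its neighbors.
    Returns unnormalized betweenness scores.
    """
    scores = {v: 0.0 for v in graph}
    for source in graph:
        # Breadth-first search that counts shortest paths from the source.
        stack = []
        predecessors = {v: [] for v in graph}
        num_paths = {v: 0 for v in graph}
        num_paths[source] = 1
        distance = {source: 0}
        queue = deque([source])
        while queue:
            v = queue.popleft()
            stack.append(v)
            for w in graph[v]:
                if w not in distance:
                    distance[w] = distance[v] + 1
                    queue.append(w)
                if distance[w] == distance[v] + 1:
                    num_paths[w] += num_paths[v]
                    predecessors[w].append(v)
        # Accumulate dependencies in order of decreasing distance.
        dependency = {v: 0.0 for v in graph}
        while stack:
            w = stack.pop()
            for v in predecessors[w]:
                dependency[v] += num_paths[v] / num_paths[w] * (1 + dependency[w])
            if w != source:
                scores[w] += dependency[w]
    # Each undirected pair was counted once from each endpoint.
    return {v: s / 2 for v, s in scores.items()}
```

A node that bridges two otherwise separate clusters receives the highest score, which is the intuition behind treating high-betweenness nodes as pivotal points.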
Interfaces for Networked Media Exploration
and Collaborative Annotation
Preetha Appan, Bageshree Shevade, Hari Sundaram,
and David Birchfield (Arizona State University)
Abstract
In this paper, we present our efforts towards
creating interfaces for networked media exploration
and collaborative annotation. The problem is important
since online social networks are emerging as conduits
for exchange of everyday experiences. These networks
do not currently provide media-rich communication
environments. Our approach has two parts: collaborative
annotation and a media exploration framework.
The collaborative annotation takes place through
a web based interface, and provides to each user
personalized recommendations, based on media features,
and by using a common sense inference toolkit.
We develop three media exploration interfaces
that allow for two-way interaction amongst the
participants: (a) spatio-temporal evolution, (b)
event cones, and (c) viewpoint-centric interaction.
We also analyze the user activity to determine
important people and events, for each user. We
also develop subtle visual interface cues for
activity feedback. Preliminary user studies indicate
that the system performs well and is well liked
by the users.
A Graph-Matching Approach to Dynamic Media
Allocation in Intelligent Multimedia Interfaces
(Outstanding Paper Award)
Michelle X. Zhou, Zhen Wen, and Vikram Aggarwal
(IBM T. J. Watson Research Center)
Abstract
To aid users in exploring large and complex data
sets, we are building an intelligent multimedia
conversation system. Given a user request, our
system dynamically creates a multimedia response
that is tailored to the interaction context. In
this paper, we focus on the problem of media allocation,
a process that assigns one or more media, such
as graphics or speech, to best convey the intended
response content. Specifically, we develop a graph-matching
approach to media allocation, whose goal is to
find a set of data-media mappings that maximizes
the satisfaction of various allocation constraints
(e.g., data-media compatibility and presentation
consistency constraints). Compared to existing
rule-based or plan-based approaches to media allocation,
our work offers three unique contributions. First,
we provide an extensible computational framework
that optimizes media assignments by dynamically
balancing all relevant constraints. Second, we
use feature-based metrics to uniformly model various
allocation constraints, including those cross-content
and cross-media constraints, which often require
special treatment in existing approaches. Third,
we further improve the quality of a response by
automatically detecting and repairing undesired
allocation results. We have applied our approach
to two different applications and our preliminary
study has shown the promise of our work.
A Location Representation for Generating Descriptive
Walking Directions
Gary Look, Buddhika Kottahachchi, Robert Laddaga,
and Howard Shrobe (Massachusetts Institute of
Technology)
Abstract
An expressive representation for location is an
important component in many applications. However,
while many location-aware applications can reason
about space at the level of coordinates and containment
relationships, they have no way to express the
semantics that define how a particular space is
used. We present lair, an ontology that addresses
this problem by modeling both the geographical
relationships between spaces as well as the functional
purpose of a given space. We describe how lair
was used to create an application that produces
walking directions comparable to those given by
a person, and a pilot study that evaluated the
quality of these directions. We also describe
how lair can be used to evaluate other intelligent
user interfaces.
Papers: Natural Language and Gestural
Input
Tuesday, January 11th, 2:10 - 3:25 pm
Chair: Alfred Kobsa (University
of California at Irvine)
User Interfaces with Semi-Formal Representations:
A Study of Designing Argumentation Structures
Timothy Chklovski, Varun Ratnakar, and Yolanda
Gil (University of Southern California)
Abstract
When designing mixed-initiative systems, full
formalization of all potentially relevant knowledge
may not be cost-effective or practical. This paper
motivates the need for semi-formal representations
that combine machine-processable structures with
free text statements, and discusses the need to
design them in a way that makes the free text
more amenable to automated structuring and processing.
Our work is done in the context of argumentation
systems, and has explored a range of tradeoffs
in combining informal free-text statements with
formal connectors. The paper compares alternative
argument representations which combine structured
argument connectors with free text. We discuss
merits of the systems based on a variety of analysis
structures that we have collected from Web users
to date.
An Agent-Based Approach to Dialogue Management
in Personal Assistants
Anh Nguyen and Wayne Wobcke (University of New
South Wales)
Abstract
Personal assistants need to allow the user to
interact with the system in a flexible and adaptive
way such as through spoken language dialogue.
In this research we focus on an application in
which the user can use a variety of devices to
interact with a collection of personal assistants
each specializing in a task domain such as email
or calendar management, information seeking, etc.
We propose an agent-based approach for developing
the dialogue manager that acts as the central
point maintaining continuous user-system interaction
and coordinating the activities of the assistants.
In addition, this approach enables development
of multi-modal interfaces. We describe our initial
implementation which contains an email management
agent that the user can interact with through
a spoken dialogue and an interface on PDAs. The
dialogue manager was implemented by extending
a BDI agent architecture.
Relaxing Stylus Typing Precision by Geometric
Pattern Matching
Per-Ola Kristensson (Linköping University)
Shumin Zhai (IBM Almaden Research Center)
Abstract
Fitts' law models the inherent speed-accuracy
trade-off constraint in stylus typing. Users attempting
to go beyond the Fitts' law speed ceiling will
tend to land the stylus outside the targeted key,
resulting in erroneous words and increasing users'
frustration. We propose a geometric pattern matching
technique to overcome this problem. Our solution
can be used either as an enhanced spell checker
or as a way to enable users to escape the Fitts'
law constraint in stylus typing, potentially resulting
in higher text entry speeds than what is currently
theoretically modeled. We view the hit points
on a stylus keyboard as a high resolution geometric
pattern. This pattern can be matched against patterns
formed by the letter key center positions of legitimate
words in a lexicon. We present the development
and evaluation of an "elastic" stylus keyboard
capable of correcting words even if the user misses
all the intended keys, as long as the user's tapping
pattern is close enough to the intended word.
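The idea of matching a tap pattern against the key-center patterns of lexicon words can be sketched as below. This is a deliberately simplified stand-in: the keyboard geometry, the mean point-to-point distance measure, the same-length restriction, and the `max_distance` threshold are all assumptions, and the paper's actual matching technique is not reproduced here.

```python
import math

# Hypothetical keyboard geometry: unit-width keys on a staggered QWERTY grid.
ROWS = ["qwertyuiop", "asdfghjkl", "zxcvbnm"]
ROW_OFFSETS = [0.0, 0.5, 1.0]

KEY_CENTERS = {
    letter: (col + ROW_OFFSETS[row], float(row))
    for row, letters in enumerate(ROWS)
    for col, letter in enumerate(letters)
}


def pattern_distance(taps, word):
    """Mean Euclidean distance between tap points and the word's key centers."""
    centers = [KEY_CENTERS[ch] for ch in word]
    return sum(math.dist(t, c) for t, c in zip(taps, centers)) / len(word)


def best_match(taps, lexicon, max_distance=1.0):
    """Return the closest same-length lexicon word, or None if none is close."""
    candidates = [w for w in lexicon if len(w) == len(taps)]
    if not candidates:
        return None
    best = min(candidates, key=lambda w: pattern_distance(taps, w))
    return best if pattern_distance(taps, best) <= max_distance else None
```

Because the whole pattern is compared rather than each tap in isolation, a sequence of taps that is uniformly shifted off the intended keys can still resolve to the intended word, provided no other lexicon word lies closer.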
Panel Discussion
Tuesday, January 11th, 4:00 - 5:30 pm
The Usability Crisis in High-Tech Home Products:
An Opportunity for Intelligent User Interfaces?
Charles Rich (Mitsubishi Electric Research Laboratories)
David Keyson (Technical University of Delft)
Yogendra Jain (Personica)
Boris de Ruyter (Philips)
Abstract
Ordinary people already have great difficulty
using the advanced features of digitally operated
household devices such as personal video recorders
and DVD burners and "white goods" such as washing
machines, microwave ovens, and programmable thermostats.
The problem is getting worse as more customization
and programming features are continually being
added. This is a challenging and practical application
for intelligent user interface research, one in
which new ideas are badly needed. This panel brings
together industrial and academic researchers as
well as business people to report on their activities
and to stimulate others to join.
Poster / Demo Session and Banquet
Tuesday, January 11th, 6:30 - 10:00 pm
Chairs:
Tessa Lau (IBM T. J. Watson Research)
Daniel Billsus (FX Palo Alto Laboratory)
Note: During this session,
system demonstrations will be given by some of
the short paper presenters, as well as by presenters
of some of the long papers.
Affective Computing
Person-Independent Estimation of Emotional
Experiences From Facial Expressions
Timo Partala, Veikko Surakka, and Toni Vanhala
(University of Tampere)
Abstract
The aim of this research was to develop methods
for the automatic person-independent estimation
of experienced emotions from facial expressions.
Ten subjects watched series of emotionally arousing
pictures and videos, while the electromyographic
(EMG) activity of two facial muscles: zygomaticus
major (activated in smiling) and corrugator supercilii
(activated in frowning) was registered. Based
on the changes in the activity of these two facial
muscles, it was possible to distinguish between
ratings of positive and negative emotional experiences
at a rate of almost 70% for pictures and over
80% for videos. Using these methods, the computer
could adapt its behavior according to the user's
emotions during human-computer interaction.
Pedagogical Agent Image Matters
Amy Baylor (Florida State University)
Abstract
Pedagogical agent image is a key feature for animated
interface agents. Experimental research indicates
that agent interface images should be carefully
designed, considering both the relevant outcomes
(learning or motivational) together with student
characteristics. This paper summarizes empirically-derived
design guidelines for pedagogical agent image.
Emotive Alert: HMM-Based Emotion Detection
In Voicemail Messages
Zeynep Inanoglu and Ron Caneel (MIT Media Laboratory)
Abstract
Voicemail has become an integral part of our personal
and professional communication. The number of
messages that accumulate in our voice mailboxes
necessitates new ways of prioritizing them. Currently,
we are forced to actively listen to all messages
in order to find out which ones are important
and which ones can be attended to later on. In
this paper, we describe Emotive Alert, a system
that can detect some of the significant emotions
in a new message and notify the account owner
along various affective axes, including urgency,
formality, valence (happy vs. sad) and arousal
(calm vs. excited). We have used a purely acoustic,
HMM-based approach for identifying the emotions,
which allows application of this system to all
messages independent of language.
Human-Robot Interaction
Vision-Based GUI for Interactive Mobile Robots
Randeep Singh, Bhartendu Seth, and Uday B. Desai
(Indian Institute of Technology)
Abstract
Interactive mobile robots are an active area of
research. This paper presents a framework for
designing a real-time vision-based hand-body gesture
user interface for such robots. The said framework
works in real world lighting conditions, with
complex background, and can handle intermittent
motion of the camera. The input signal is captured
by using a single monocular color camera. Vision
is the only feedback sensor being used. It is
assumed that the gesturer is wearing clothes that
are slightly different from the background. We
have tested this framework on a gesture database
consisting of 11 hand-body gestures and have recorded
recognition accuracy up to 90%.
User Intentions Funneled Through a Human-Robot
Interface
Michael T. Rosenstein, Andrew H. Fagg, Shichao
Ou, and Roderic A. Grupen (University of Massachusetts
Amherst)
Abstract
We describe a method for predicting user intentions
as part of a human-robot interface. In particular,
we show that funnels, i.e., geometric objects
that partition an input space, provide a convenient
means for discriminating individual objects and
for clustering sets of objects for hierarchical
tasks. One advantage of the proposed implementation
is that a simple parametric model can be used
to specify the shape of a funnel, and a straightforward
heuristic for setting initial parameter values
appears promising. We discuss the possibility
of adapting the user interface with machine learning
techniques, and we illustrate the approach with
a humanoid robot performing a variation of a standard
peg-insertion task.
Personal Assistants
Context-Based Similar Words Detection and
Its Application in Specialized Search Engines
Hisham Al-Mubaid and Ping Chen (University of
Houston)
Abstract
This paper presents a new context-based method
for automatic detection and extraction of similar
and related words from texts. Finding similar
words is a very important task for many NLP applications
including anaphora resolution, document retrieval,
text segmentation, and text summarization. Here
we use word similarity to improve search quality
for search engines in (general and) specific domains.
Our method is based on rules for extracting the
words in the neighborhood of a target word, then
connecting this with the surroundings of other
occurrences of the same word in the (training)
text corpus. This is ongoing work and is still
under extensive testing. The preliminary results,
however, are promising and encourage more work
in this direction.
Interactively Building Agents for Consumer-Side
Data Mining
Rattapoom Tuchinda and Craig A. Knoblock (University
of Southern California)
Abstract
Integrating and mining data from different web
sources can make end-users well-informed when
they make decisions. One of the many limitations that
bar end-users from taking advantage of such a
process is the complexity in each of the steps
required to gather, integrate, monitor, and mine
data from different websites. We present the idea
of combining the data integration, monitoring,
and mining as one single process in the form of
an intelligent assistant that guides end-users
to specify their mining tasks by just answering
questions. This easy-to-use approach, which trades
off complexity in terms of available operations
with the ease of use, has the ability to provide
interesting insight into the data that would require
days of human effort to gather, combine, and mine
manually from the web.
Adaptive Teaching Strategy for Online Learning
Jungsoon Yoo, Cen Li, and Chrisila Pettey (Middle
Tennessee State University)
Abstract
Finding the optimal teaching strategy for an individual
student is difficult even for an experienced teacher.
Identifying and incorporating multiple optimal
teaching strategies for different students in
a class is even harder. This paper presents an
Adaptive tutor for online Learning, AtoL, for
Computer Science laboratories that identifies
and applies the appropriate teaching strategies
for students on an individual basis. The optimal
strategy for a student is identified in two steps.
First, a basic strategy for a student is identified
using rules learned from a supervised learning
system. Then the basic strategy is refined to
better fit the student using models learned using
an unsupervised learning system that takes into
account the temporal nature of the problem solving
process. The learning algorithms as well as the
initial experimental results are presented.
Providing Intelligent Help Across Applications
in Dynamic User and Environment Contexts
Ashwin Ramachandran and R. Michael Young (North
Carolina State University)
Abstract
The problem of providing help for complex application
interfaces has been a source of interest for a
number of research efforts. As the computational
power of computers increases, typical applications
not only increase in functionality but also in
the degree of interaction with the computational
environment in which they reside. This paper describes
an ongoing project to design an Intelligent Help
System (IHS) that provides context-sensitivity
not only through its modeling of application states
but also its modeling of the interaction between
applications and between an application and the
environment in which it resides.
Visualization and Presentation
ScentHighlights: Highlighting Conceptually
Related Sentences During Reading
Ed Chi, Lichan Hong, Michelle Gumbrecht, and Stuart
Card (Palo Alto Research Center)
Abstract
Researchers have noticed that readers are increasingly
skimming instead of reading in depth. Skimming
also occurs in re-reading activities, where the
goal is to recall specific topical facts. Bookmarks
and highlighters were invented precisely to achieve
this goal. For skimming activities, readers need
effective ways to direct their attention toward
the most relevant passages within text. We describe
how we have enhanced skimming activity by conceptually
highlighting sentences within electronic text
that relate to search keywords. We perform the
conceptual highlighting by computing what conceptual
keywords are related to each other via word co-occurrence
and spreading activation. Spreading activation
is a cognitive model developed in psychology to
simulate how memory chunks and conceptual items
are retrieved in our brain. We describe the method
used, and illustrate the idea with realistic scenarios
using our system.
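As a rough illustration of the idea described in this abstract, the sketch below builds a word co-occurrence graph and runs a simple spreading-activation pass from the search keywords, then scores sentences by the activation of their words. It is a minimal sketch under stated assumptions, not the authors' implementation: the function names, the `decay` and `steps` parameters, and the per-sentence co-occurrence window are all hypothetical choices made for the example.

```python
from collections import defaultdict


def cooccurrence_graph(sentences):
    """Count how often two distinct words share a sentence (assumed window)."""
    graph = defaultdict(lambda: defaultdict(float))
    for words in sentences:
        unique = set(words)
        for a in unique:
            for b in unique:
                if a != b:
                    graph[a][b] += 1.0
    return graph


def spread_activation(graph, seeds, decay=0.5, steps=2):
    """Propagate activation from seed keywords to co-occurring words.

    decay and steps are illustrative parameters, not values from the paper.
    """
    activation = {word: 1.0 for word in seeds}
    for _ in range(steps):
        updated = dict(activation)
        for word, level in activation.items():
            neighbors = graph.get(word, {})
            total = sum(neighbors.values())
            if total == 0:
                continue
            # Each neighbor receives a share proportional to its edge weight.
            for other, weight in neighbors.items():
                share = decay * level * weight / total
                updated[other] = updated.get(other, 0.0) + share
        activation = updated
    return activation


def score_sentence(words, activation):
    """Sum the activation of the distinct words in a sentence."""
    return sum(activation.get(word, 0.0) for word in set(words))


sentences = [
    ["robot", "gesture", "camera"],
    ["camera", "lens"],
    ["music", "tempo"],
]
graph = cooccurrence_graph(sentences)
activation = spread_activation(graph, ["robot"])
ranked = sorted(sentences, key=lambda s: score_sentence(s, activation), reverse=True)
```

In this toy run, a sentence that never mentions the keyword "robot" can still receive a nonzero score if its words are linked to the keyword through co-occurrence, which is the effect conceptual highlighting relies on.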
Personal Reporting of a Museum Visit as an
Entry Point to Future Cultural Experience
Charles Callaway, Tsvi Kuflik, Elena Not, Alessandra
Novello, Oliviero Stock, and Massimo Zancanaro
(ITC-irst)
Abstract
Museum visitors can continue interacting with
museum exhibits even after they have left the
museum. We can help them do this by creating a
report that includes a basic, personalized narration
of their visit, the items and relationships they
found most interesting, pointers to additional
related online information, and suggestions for
future visits to the current and other museums.
In this work we describe the automatic generation
of personalized natural language reports to help
create one episode in an ongoing coherent sequence
of cultural activities.
Speech- and Vision-Based Interfaces
How to Wreck a Nice Beach You Sing Calm Incense
Henry Lieberman, Alexander Faaborg, Waseem Daher,
and José Espinosa (MIT Media Laboratory)
Abstract
A principal problem in speech recognition is distinguishing
between words and phrases that sound similar but
have different meanings. Speech recognition programs
produce a list of weighted candidate hypotheses
for a given audio segment, and choose the "best"
candidate. If the choice is incorrect, the user
must invoke a correction interface that displays
a list of the hypotheses and choose the desired
one. The correction interface is time-consuming,
and accounts for much of the frustration of today's
dictation systems. Conventional dictation systems
prioritize hypotheses based on language models
derived from statistical techniques such as n-grams
and Hidden Markov Models. We propose a supplementary
method for ordering hypotheses based on Commonsense
Knowledge. We filter acoustical and word-frequency
hypotheses by testing their plausibility with
a semantic network derived from 700,000 statements
about everyday life. This often filters out possibilities
that "don't make sense" from the user's viewpoint,
and leads to improved recognition. Reducing the
hypothesis space in this way also makes possible
streamlined correction interfaces that improve
the overall throughput of dictation systems.
HMM-Based Efficient Sketch Recognition
Tevfik Metin Sezgin and Randall Davis (Massachusetts
Institute of Technology)
Abstract
Current sketch recognition systems treat sketches
as images or a collection of strokes, rather than
viewing sketching as an interactive and incremental
process. We show how viewing sketching as an interactive
process allows us to recognize sketches using
Hidden Markov Models. We report results of a user
study indicating that in certain domains people
draw objects using consistent stroke orderings.
We show how this consistency, when present, can
be used to perform sketch recognition efficiently.
This novel approach enables us to have polynomial
time algorithms for sketch recognition and segmentation,
unlike conventional methods with exponential complexity.
Interaction Techniques Using Prosodic Features
of Speech and Audio Localization
Alex Olwal (Royal Institute of Technology)
Steven Feiner (Columbia University)
Abstract
We describe several approaches for using prosodic
features of speech and audio localization to control
interactive applications. This information can
be applied to parameter control, as well as to
speech disambiguation. We discuss how characteristics
of spoken sentences can be exploited in the user
interface; for example, by considering the speed
with which a sentence is spoken and the presence
of extraneous utterances. We also show how coarse
audio localization can be used for low-fidelity
gesture tracking, by inferring the speaker's head
position.
Doubleshot: An Interactive User-Aided Segmentation
Tool
Tom Yeh and Trevor Darrell (Massachusetts Institute
of Technology)
Abstract
In this paper, we describe an intelligent user
interface designed for camera phones to allow
mobile users to specify the object of interest
in the scene simply by taking two pictures: one
with the object and one without the object. By
comparing these two images, the system can reliably
extract the visual appearance of the object, which
can be useful to a wide range of applications
such as content-based image retrieval and object
recognition.
Generating Semantic Contexts from Spoken Conversation
in Meetings
Jürgen Ziegler and Zoulfa El Jerroudi (University
of Duisburg-Essen)
Karsten Böhm (University of Leipzig)
Abstract
SemanticTalk is a tool for supporting face-to-face
meetings and discussions by automatically generating
a semantic context from spoken conversations.
We use speech recognition and topic extraction
from a large terminological database to create
a network of discussion topics in real-time. This
network includes concepts explicitly addressed
in the discussion as well as semantically associated
terms, and is visualized to increase conversational
awareness and creativity in the group.
Conventions in Human-Human Multi-Threaded
Dialogues: A Preliminary Study
Peter Heeman and Fan Yang (Oregon Health & Science
University)
Andrew Kun and Alexander Shyrokov (University
of New Hampshire)
Abstract
In this paper, we explore the conventions that
people use in managing multiple dialogue threads.
In particular, we focus on where in a thread people
interrupt when switching to another thread. We
find that some subjects are able to vary where they
switch depending on how urgent the interrupting
task is. When time allowed, they switched at the
end of a discourse segment, which we hypothesize
is less disruptive to the interrupted task when
it is later resumed.
Towards Automatic Transcription of Expressive
Oral Percussive Performances
Amaury Hazan and Rafael Ramirez (Pompeu Fabra
University)
Abstract
We describe a tool for transcribing voice-generated
percussive rhythms. The system consists of: (a)
a segmentation component which separates the monophonic
input stream into percussive events, (b) a descriptor
generation component that computes a set of acoustic
features from each of the extracted segments,
(c) a machine learning component which assigns
to each of the segmented sounds of the input stream
a symbolic class. We describe each of these components
and compare different machine learning strategies
that can be used to obtain a symbolic representation
of the oral percussive performance.
Communicating the User's Focus of Attention
by Image Processing as Input for a Mobile Museum
Guide
Adriano Albertini, Roberto Brunelli, Oliviero
Stock, and Massimo Zancanaro (ITC-irst)
Abstract
The paper presents a first prototype of a handheld
museum guide delivering contextualized information
based on the recognition of drawing details selected
by the user through the guide camera. The resulting
interaction modality has been analyzed and compared
to previous approaches. Finally, alternative,
more scalable, solutions are presented that preserve
the most interesting features of the system described.
Knowledge Acquisition and Knowledge-Based Design
Acquiring Story Scripts Using Common Sense Feedback
Ryan Williams, Barbara Barry, and Push Singh (MIT
Media Laboratory)
Abstract
At the Media Lab we are developing a resource
called StoryNet, a very large database of story
scripts that can be used for commonsense reasoning
by computers. This paper introduces ComicKit,
an interface for acquiring StoryNet scripts from
casual internet users. The core element of the
interface is its ability to dynamically make commonsense
suggestions that guide user story construction.
We describe the encouraging results of a preliminary
user study, and discuss future directions for
ComicKit.
Metafor: Visualizing Stories as Code
Hugo Liu and Henry Lieberman (MIT Media Laboratory)
Abstract
Every program tells a story. Programming, then,
is the art of constructing a story about the objects
in the program and what they do in various situations.
So-called programming languages, while easy for
the computer to accurately convert into code,
are, unfortunately, difficult for people to write
and understand. We explore the idea of using descriptions
in a natural language as a representation for
programs. While we cannot yet convert arbitrary
English to fully specified code, we can use a
reasonably expressive subset of English as a visualization
tool. Simple descriptions of program objects and
their behavior generate scaffolding (underspecified)
code fragments that can be used as feedback for
the designer. Roughly speaking, noun phrases can
be interpreted as program objects; verbs can be
functions, adjectives can be properties. A surprising
amount of what we call programmatic semantics
can be inferred from linguistic structure. We
present a program editor, Metafor, that dynamically
converts a user's stories into program code, and
in a user study, participants found it useful
as a brainstorming tool.
Task-Aware Information Access for Diagnosis
of Manufacturing Problems
Larry Birnbaum, Wallace Hopp, Seyed Iravani, Kevin
Livingston, and Biying Shou (Northwestern University)
Abstract
Pinpoint is a promising first step towards using
a rich model of task context in proactive and
dynamic IR systems. Pinpoint allows a user to
navigate decision tree representations of problem
spaces, built by domain experts, while dynamically
entering annotations specific to their problem.
The system then automatically generates queries
to information repositories based on both the
user's annotations and location in the problem
space, producing results that are both task focused
and problem specific. Initial feedback from users
and domain experts has been positive.
Designing Interfaces for Guided Incremental
Collection of Knowledge About Everyday Objects
from Volunteers
Timothy Chklovski (University of Southern California)
Abstract
A new generation of intelligent applications can
be enabled by broad-coverage knowledge repositories
about everyday objects. We distill lessons in
design of intelligent user interfaces which collect
such broad-coverage knowledge from untrained volunteers.
We motivate the knowledge-driven template-based
approach adopted in LEARNER2, a second generation
proactive acquisition interface for eliciting
such knowledge. We present volume, accuracy, and
recall of knowledge collected by fielding the
system for 5 months. LEARNER2 has so far acquired
99,018 general statements, emphasizing knowledge
about parts of and typical uses of objects.
An Ontology-Based Interface for Machine Learning
Mathias Bauer and Stephan Baldes (DFKI, German
Research Center for Artificial Intelligence)
Abstract
Machine learning (ML) is a complex process that
can hardly be carried out by non-expert users.
Especially when using adaptive systems that interpret
and exploit observations of the user to modify
their behavior according to the user's perceived
preferences, even naïve users may be confronted
with learning systems. This paper presents an
approach to make non-expert users understand and
influence an ML system so as to improve trust
and acceptance of the overall system behavior.
Smart Environments and Ubiquitous Computing
A Framework for Designing Intelligent Task-Oriented
Augmented Reality User Interfaces
Leonardo Bonanni, Chia-Hsun Lee, and Ted Selker
(MIT Media Laboratory)
Abstract
A task-oriented space can benefit from an augmented
reality interface that layers the existing tools
and surfaces with useful information to make cooking
easier, safer and more efficient. To serve experienced
users as well as novices, augmented reality interfaces
need to adapt modalities to the user's expertise
and allow for multiple ways to perform tasks.
We present a framework for designing an intelligent
user interface that informs and choreographs multiple
tasks in a single space according to a model of
tasks and users. A residential kitchen has been
outfitted with systems to gather data from tools
and surfaces and project multi-modal interfaces
back onto the tools and surfaces themselves. Based
on user evaluations of this augmented reality
kitchen, we propose a system to tailor information
modalities based on the spatial and temporal qualities
of the task, and the expertise, location and progress
of the user. The intelligent augmented reality
user interface choreographs multiple tasks in
the same space at the same time.
Seamless User Notification in Ambient Soundscapes
Andreas Butz (University of Munich)
Ralf Jung (Saarland University)
Abstract
We describe a method for notifying users through
auditory cues embedded in an ambient soundscape
in the environment. It uses pieces of music which
are composed in such a way that particular instruments
or motifs can be added or omitted without losing
the aesthetic quality of the overall composition.
This allows for very subtle modifications in the
soundscape which are only noticed by those users
who have chosen this particular instrument or
motif as their notification instrument before.
As a side effect, the soundscape itself can be
used to subtly influence the mood of users. The
method has been implemented in a prototype, which
we briefly discuss. The prototype is implemented
using a spatial audio framework and can hence
notify users from particular directions.
A Cart-Mounted Intelligent Shopping Assistant
Chad Cumby, Andrew Fano, Rayid Ghani, and Marko
Krema (Accenture Technology Labs)
Abstract
This paper describes an Intelligent Shopping Assistant
designed for a shopping-cart-mounted tablet PC
that enables individual interactions with customers.
We use machine learning algorithms to predict
a shopping list for the customer's current trip
and present this list on the device. As they navigate
through the store, personalized promotions are
presented using consumer models derived from loyalty
card data for each individual. In order for shopping
assistant devices to be effective, we believe
that they have to be powered by algorithms that
are tuned for individual customers and can make
accurate predictions about an individual's actions.
We formally frame the shopping list prediction
as a classification problem, describe the algorithms
and methodology behind our system, and show that
shopping list prediction can be done with high
levels of accuracy, precision, and recall. Beyond
the prediction of shopping lists we briefly introduce
other aspects of the shopping assistant project,
such as the use of consumer models to select appropriate
promotional tactics, and the development of promotion
planning simulation tools to enable retailers
to plan personalized promotions delivered through
such a shopping assistant.
Adaptive Navigation Support with Public Displays
Christian Kray and Gerd Kortuem (Lancaster University)
Antonio Krüger (University of Münster)
Abstract
In this paper, we describe a public navigation
system which uses adaptive displays as directional
signs. The displays are mounted to walls where
they provide passersby with directional information.
Each sign is an autonomous, wirelessly networked
digital display connected to a central server.
The signs are position-aware and able to adapt
their display content in accordance with their
current position. Advantages of such a navigation
system include improved flexibility, dynamic adaptation
and ease of setup and maintenance.
Invited Talk: Barry Smyth
Wednesday, January 12th, 9:00 - 10:30 am
Chair: Anthony Jameson (DFKI and
International University in Germany)
Adaptive Information Access and the Quest
for the Personalization-Privacy Sweetspot
Barry Smyth (University College Dublin and ChangingWorlds)
Abstract
This talk will focus on how so-called personalization
techniques are being used in response to the information
overload problem. Personalization research brings
together ideas from artificial intelligence, user
profiling, information retrieval and user-interface
design to provide users with more proactive and
intelligent information services that are capable
of predicting the needs of individuals and adapting
to their implicit preferences. We will describe
how personalization techniques have been successfully
applied to the two dominant modes of information
access, browsing and search, with reference to
deployed applications in the mobile Internet and
Web search arenas. Particular attention will be
paid to the natural tension that exists between
the potential value of personalization, on the
one hand, and the perceived privacy risk associated
with profiling, on the other. We will highlight
certain recent approaches to personalization that
appear to achieve a useful balance between personalization
and privacy and argue that realizing this personalization-privacy
sweetspot may be the key to the large-scale success
of personalization technologies in the future.
(The support of the Informatics Research Initiative
of Enterprise Ireland and Science Foundation Ireland
is gratefully acknowledged.)
About Barry Smyth
Barry Smyth holds the Digital Chair of Computer
Science. He is an ECCAI Fellow and is currently
the head of the Department of Computer Science
at University College Dublin. He is also a founder
of ChangingWorlds Ltd., a company that has developed
a range of personalization solutions for the Mobile
Internet space. He received a BSc in Computer
Science from University College Dublin and a PhD
from Trinity College, Dublin. Barry works in several
areas of artificial intelligence including personalization,
case-based reasoning and information retrieval.
He has authored almost 200 technical papers and
received best paper awards from conferences such
as the International Joint Conference on Artificial
Intelligence and the European Conference on Artificial
Intelligence.
Papers: Recommendation and Instruction
Wednesday, January 12th, 11:00 am - 12:40 pm
Chair: Jon Herlocker (Oregon State
University)
Improving Proactive Information Systems
Daniel Billsus and David M. Hilbert (FX Palo Alto
Laboratory)
Dan Maynes-Aminzade (Stanford University)
Abstract
Proactive contextual information systems help
people locate information by automatically suggesting
potentially relevant resources based on their
current tasks or interests. Such systems are becoming
increasingly popular, but designing user interfaces
that effectively communicate recommended information
is a challenge: the interface must be unobtrusive,
yet communicate enough information at the right
time to provide value to the user. In this paper
we describe our experience with the FXPAL Bar,
a proactive information system designed to provide
contextual access to corporate and personal resources.
In particular, we present three features designed
to communicate proactive recommendations more
effectively: translucent recommendation windows
increase the user's awareness of particularly
highly-ranked recommendations, query term highlighting
communicates the relationship between a recommended
document and the user's current context, and a
novel recommendation digest function allows users
to return to the most relevant previously recommended
resources. We present empirical evidence supporting
our design decisions and relate lessons learned
for other designers of contextual recommendation
systems.
Trust in Recommender Systems
John O'Donovan and Barry Smyth (University College
Dublin)
Abstract
Recommender systems have proven to be an important
response to the information overload problem,
by providing users with more proactive and personalized
information services. Collaborative filtering
techniques have also proven to be a vital component
of many such recommender systems as they facilitate
the generation of high-quality recommendations
by leveraging the preferences of communities of
similar users. In this paper we suggest that the
traditional emphasis on user similarity may be
overstated. We argue that additional factors have
an important role to play in guiding recommendation.
Specifically we propose that the trustworthiness
of users must be an important consideration. We
present two computational models of trust and
show how they can be readily incorporated into
standard collaborative filtering frameworks in
a variety of ways. We also show how these trust
models can lead to improved predictive accuracy
during recommendation.
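To make the idea of folding trust into collaborative filtering concrete, the sketch below adjusts a standard mean-centered, similarity-weighted rating prediction so that each neighbor's weight combines similarity with a trust score. This is a minimal sketch under stated assumptions, not the authors' models: the abstract does not specify how trust is computed or combined, so the harmonic-mean combination, the function names, and the tuple layout of `neighbors` are hypothetical choices for illustration.

```python
def harmonic_mean(a, b):
    """Combine two non-negative scores; returns 0.0 when both are zero."""
    return 0.0 if a + b == 0 else 2 * a * b / (a + b)


def predict(target_mean, neighbors, use_trust=True):
    """Predict a rating as the target's mean plus weighted neighbor deviations.

    neighbors: list of (similarity, trust, neighbor_rating, neighbor_mean).
    With use_trust=False the weight is similarity alone; with use_trust=True
    it is the harmonic mean of similarity and trust (an assumed combination).
    """
    numerator = 0.0
    denominator = 0.0
    for similarity, trust, rating, mean in neighbors:
        weight = harmonic_mean(similarity, trust) if use_trust else similarity
        numerator += weight * (rating - mean)
        denominator += abs(weight)
    if denominator == 0:
        return target_mean
    return target_mean + numerator / denominator


# Two equally similar neighbors; the first has a poor (hypothetical) trust score.
neighbors = [
    (0.9, 0.1, 5.0, 3.0),
    (0.9, 0.9, 2.0, 3.0),
]
similarity_only = predict(3.0, neighbors, use_trust=False)
trust_weighted = predict(3.0, neighbors, use_trust=True)
```

In this toy example the two neighbors are equally similar to the target user, so similarity alone cannot separate them; adding trust shifts the prediction toward the neighbor with the better trust score.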
Experiments in Dynamic Critiquing
Kevin McCarthy, James Reilly, Lorraine McGinty,
and Barry Smyth (University College Dublin)
Abstract
Conversational recommender systems are commonly
used to help users to navigate through complex
product-spaces by alternately making product
suggestions and soliciting user feedback in order
to guide subsequent suggestions. Recently, there
has been a surge of interest in developing effective
interfaces that support user interaction in domains
of limited user expertise. Critiquing has proven
to be a popular and successful user feedback mechanism
in this regard, but is typically limited to the
modification of single features. We review a novel
approach to critiquing, dynamic critiquing, that
allows users to modify multiple features simultaneously
by choosing from a range of so-called compound
critiques that are automatically proposed based
on their current position within the product-space.
In addition, we introduce the results of an important
new live-user study that evaluates the practical
benefits of dynamic critiquing.
Animating an Interactive Conversational Character
for an Educational Game System
Andrea Corradini, Manish Mehta, Niels-Ole Bernsen,
and Marcela Charfuelan (University of Southern
Denmark)
Abstract
Within the framework of the project NICE (Natural
Interactive Communication for Edutainment) [2],
we have been developing an educational and entertaining
computer game that allows children and teenagers
to interact with a conversational character impersonating
the fairy tale writer H.C. Andersen (HCA). The
rationale behind our system is to make kids learn
about HCA's life, fairy tales and historical period
while playing and having fun. We report on the
character's generation and realization of both
verbal and 3D graphical non-verbal output behaviors,
such as speech, body gestures and facial expressions.
This conveys the impression of a human-like agent
with relevant domain knowledge and a distinct personality.
With the educational goal in the foreground, coherent
and synchronized output presentation becomes mandatory,
as any inconsistency may undermine the user's
learning process rather than reinforcing it.