Multimodal Signal Processing

Theory and Applications for Human-Computer Interaction

  • 1st Edition - November 11, 2009
  • Latest edition
  • Editors: Jean-Philippe Thiran, Ferran Marqués, Hervé Bourlard
  • Language: English

Description

Multimodal signal processing is an important research and development field that processes signals and combines information from a variety of modalities – speech, vision, language, text – to significantly enhance the understanding, modelling, and performance of human-computer interaction devices and of systems that enhance human-human communication. The overarching theme of this book is the application of signal processing and statistical machine learning techniques to problems arising in this multi-disciplinary field. It describes the capabilities and limitations of current technologies, and discusses the technical challenges that must be overcome to develop efficient and user-friendly multimodal interactive systems.

With contributions from the leading experts in the field, the present book should serve as a reference in multimodal signal processing for signal processing researchers, graduate students, R&D engineers, and computer engineers who are interested in this emerging field.

Key features

  • Presents state-of-the-art methods for multimodal signal processing, analysis, and modeling
  • Contains numerous examples of systems with different modalities combined
  • Describes advanced applications in multimodal Human-Computer Interaction (HCI) as well as in computer-based analysis and modelling of multimodal human-human communication scenes

Readership

University and applied researchers in signal, acoustic, speech, image, and video processing; R&D engineers; computer engineers

Table of contents

1. Introduction
Jean-Philippe Thiran, Ferran Marqués, and Hervé Bourlard

Part I -- Signal Processing, Modelling and Related Mathematical Tools

2. Statistical Machine Learning for HCI
Samy Bengio

2.1. Introduction

2.2. Introduction to Statistical Learning

2.3. Support Vector Machines for Binary Classification

2.4. Hidden Markov Models for Speech Recognition

2.5. Conclusion

3. Speech Processing
Thierry Dutoit and Stéphane Dupont

3.1. Introduction

3.2. Speech Recognition

3.3. Speaker Recognition

3.4. Text-to-Speech Synthesis

3.5. Conclusions

4. Natural Language and Dialogue Processing
Olivier Pietquin

4.1. Introduction

4.2. Natural Language Understanding

4.3. Natural Language Generation

4.4. Dialogue Processing

4.5. Conclusion

5. Image and Video Processing Tools for HCI
Montse Pardàs, Verónica Vilaplana and Cristian Canton-Ferrer

5.1. Introduction

5.2. Face Analysis

5.3. Hand-Gesture Analysis

5.4. Head Orientation Analysis and FoA Estimation

5.5. Body Gesture Analysis

5.6. Conclusions

6. Processing of Handwriting and Sketching Dynamics
Claus Vielhauer

6.1. Introduction

6.2. History of Handwriting Modality and the Acquisition of Online Handwriting Signals

6.3. Basics in Acquisition, Examples for Sensors

6.4. Analysis of Online Handwriting and Sketching Signals

6.5. Overview of Recognition Goals in HCI

6.6. Sketch Recognition for User Interface Design

6.7. Similarity Search in Digital Ink

6.8. Summary and Perspectives for Handwriting and Sketching in HCI

Part II -- Multimodal Signal Processing and Modelling

7. Basic Concepts of Multimodal Analysis
Mihai Gurban and Jean-Philippe Thiran

7.1. Defining Multimodality

7.2. Advantages of Multimodal Analysis

7.3. Conclusion

8. Multimodal Information Fusion
Norman Poh and Josef Kittler

8.1. Introduction

8.2. Levels of Fusion

8.3. Adaptive versus Non-Adaptive Fusion

8.4. Other Design Issues

8.5. Conclusions

9. Modality Integration Methods
Mihai Gurban and Jean-Philippe Thiran

9.1. Introduction

9.2. Multimodal Fusion for AVSR

9.3. Multimodal Speaker Localisation

9.4. Conclusion

10. A Multimodal Recognition Framework for Joint Modality Compensation and Fusion
Konstantinos Moustakas, Savvas Argyropoulos and Dimitrios Tzovaras

10.1. Introduction

10.2. Joint Modality Recognition and Applications

10.3. A New Joint Modality Recognition Scheme

10.4. Joint Modality Audio-Visual Speech Recognition

10.5. Joint Modality Recognition in Biometrics

10.6. Conclusions

11. Managing Multimodal Data, Metadata and Annotations: Challenges and Solutions
Andrei Popescu-Belis

11.1. Introduction

11.2. Setting the Stage: Concepts and Projects

11.3. Capturing and Recording Multimodal Data

11.4. Reference Metadata and Annotations

11.5. Data Storage and Access

11.6. Conclusions and Perspectives

Part III -- Multimodal Human–Computer and Human-to-Human Interaction

12. Multimodal Input
Natalie Ruiz, Fang Chen, and Sharon Oviatt

12.1. Introduction

12.2. Advantages of Multimodal Input Interfaces

12.3. Multimodality, Cognition and Performance

12.4. Understanding Multimodal Input Behaviour

12.5. Adaptive Multimodal Interfaces

12.6. Conclusions and Future Directions

13. Multimodal HCI Output: Facial Motion, Gestures and Synthesised Speech Synchronisation
Igor S. Pandžić

13.1. Introduction

13.2. Basic AV Speech Synthesis

13.3. The Animation System

13.4. Coarticulation

13.5. Extended AV Speech Synthesis

13.6. Embodied Conversational Agents

13.7. TTS Timing Issues

13.8. Conclusion

14. Interactive Representations of Multimodal Databases
Stéphane Marchand-Maillet, Donn Morrison, Enikö Szekely, and Eric Bruno

14.1. Introduction

14.2. Multimodal Data Representation

14.3. Multimodal Data Access

14.4. Gaining Semantic from User Interaction

14.5. Conclusion and Discussion

15. Modelling Interest in Face-to-Face Conversations from Multimodal Nonverbal Behaviour
Daniel Gatica-Perez

15.1. Introduction

15.2. Perspectives on Interest Modelling

15.3. Computing Interest from Audio Cues

15.4. Computing Interest from Multimodal Cues

15.5. Other Concepts Related to Interest

15.6. Concluding Remarks

About the editors

Jean-Philippe Thiran

Affiliations and expertise
EPFL, Lausanne, Switzerland

Ferran Marqués

Affiliations and expertise
Technical University of Catalonia, Spain

Hervé Bourlard

Affiliations and expertise
Director, IDIAP Research Institute, EPFL, Lausanne, Switzerland
