Modeling the multimodal flow of human communication

PI: Cristóbal Pagán Cánovas

Funding: Research Consolidation Grant, Spain’s Ministry of Science, to Cristóbal Pagán Cánovas. 2023-2024. 170,000€.

MULTIFLOW investigates whether gesture, prosody, and their co-occurring language predict one another over time. Do words and phrases have their own profiles of bodily motion, pitch, and intensity of sound? We can now detect detailed correlations in big data that suggest that this may be so. These multimodal signatures co-occurring with verbal patterns question the long-held view that meaning resides in discrete units that abstract away the complexity of the communicative flow. Instead, semantic distinctions could emerge from the subtle interplay of body and voice in the continuity of communicative interaction.

How extended are these multimodal signatures? Do they form systems that assist in discriminating across meanings? What are the principles that allow gesture and prosody to form signatures? What role does temporal (dis)alignment play in their formation? And what can multimodal signatures tell us about the way we think and communicate?

The MULTIFLOW interdisciplinary team, with experts on language, gesture, prosody, computing, and modeling, will build and analyze large datasets containing multiple videos in which the same linguistic expressions are uttered, in English and Spanish. We use an innovative workflow that integrates automatic and manual annotation tools for large-scale audiovisual corpora. Thanks to recent improvements in statistical methods such as generalized additive models, we will predict detailed gesture trajectories and small oscillations of pitch and intensity as a function of meaning and of one another over time, isolating semantic and linguistic variables from the effects of usage and context. Our results may change the way we see communication, from a succession of discrete units to continuous, nuanced action. From chunks to flow.

MULTIFLOW is the research pillar of a large-scale initiative that also includes education and infrastructure, through the establishment of an international master's degree in multimodal data science (MULTICOM) and the construction of an online platform for the analysis of multimodal data from video collections (MULTIDATA).

For code and datasets, see our RESOURCES page on this site.

Articles and conference papers in preparation:

  • R functions for creating dataframes from OpenPose raw data
  • Analyzing gesture trajectories from normalized body keypoint detection
  • Building Massive Co-Speech Gesture Datasets for Specific Linguistic Patterns
  • Gestural behavior is systematically attuned to language: Novel data analysis of co-speech gesture and its implications for multimodal interfaces
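
The first two titles above concern turning OpenPose output into tabular data and normalizing body keypoints. As a rough illustration of what such a step can involve, here is a minimal Python sketch. It is not the team's code (the planned functions are in R), and it rests on assumptions: that each frame is an OpenPose JSON object with a "people" list whose entries carry a flat "pose_keypoints_2d" array of (x, y, confidence) triplets in the BODY_25 layout, and that wrist positions are normalized by centering on the neck and dividing by shoulder width. The function names, the confidence threshold, and the normalization scheme are hypothetical choices made for this example.

```python
import json
import math

# BODY_25 keypoint indices as documented by OpenPose.
NECK, R_SHOULDER, R_WRIST, L_SHOULDER, L_WRIST = 1, 2, 4, 5, 7


def parse_frame(frame_json):
    """Return (x, y, confidence) triplets for the first detected person, or None."""
    people = json.loads(frame_json).get("people", [])
    if not people:
        return None
    flat = people[0]["pose_keypoints_2d"]
    return [tuple(flat[i:i + 3]) for i in range(0, len(flat), 3)]


def normalize_wrists(keypoints, min_conf=0.1):
    """Express wrist positions relative to the neck, in shoulder-width units.

    Hypothetical normalization: it removes differences in speaker position
    and camera distance so that trajectories are comparable across videos.
    Returns None when a required keypoint is missing or poorly detected.
    """
    needed = (NECK, R_SHOULDER, L_SHOULDER, R_WRIST, L_WRIST)
    if any(keypoints[i][2] < min_conf for i in needed):
        return None
    neck_x, neck_y, _ = keypoints[NECK]
    rs_x, rs_y, _ = keypoints[R_SHOULDER]
    ls_x, ls_y, _ = keypoints[L_SHOULDER]
    width = math.hypot(rs_x - ls_x, rs_y - ls_y)
    if width == 0:
        return None

    def relative(index):
        x, y, _ = keypoints[index]
        return ((x - neck_x) / width, (y - neck_y) / width)

    return {"right_wrist": relative(R_WRIST), "left_wrist": relative(L_WRIST)}


if __name__ == "__main__":
    # Synthetic frame for illustration only (not real data): 25 keypoints,
    # all zero except the five used here.
    flat = [0.0] * 75
    for index, (x, y) in {NECK: (100, 100), R_SHOULDER: (80, 100),
                          L_SHOULDER: (120, 100), R_WRIST: (60, 140),
                          L_WRIST: (140, 60)}.items():
        flat[3 * index:3 * index + 3] = [x, y, 0.9]
    frame = json.dumps({"people": [{"pose_keypoints_2d": flat}]})
    print(normalize_wrists(parse_frame(frame)))
```

Applying the two functions to every frame of a video would give one row per frame, which is the kind of table that trajectory analyses can then work from.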



Cristóbal Pagán Cánovas

Modeling the multimodal flow: gesture and semantics. Case Western Reserve University, College of Arts and Sciences, Data Science Colloquium. 28 April 2021.