KN5 | Reimagining Speaking Assessment: The Multimodal Revolution Powered by Generative AI | Dr Sha Liu
-
Recent advances in multimodal AI, driven by Multimodal Large Language Models (MLLMs) such as GPT-4o and Google Gemini, are fundamentally transforming L2 speaking assessment. This talk provides an overview of that transformation, showing how the integration of text, audio, and visual data enables richer approaches to evaluating speaking ability. It briefly grounds multimodal speaking assessment in established theories of multimodality, contrasts it with traditional methods, and highlights how MLLMs open new horizons: greater authenticity through realistic tasks, seamless integration of multimodal resources, enhanced user interaction, scalability, and the evaluation of complex skills such as interactional competence, which traditional speaking assessments often struggle to capture. The talk illustrates these capabilities with examples from research and platforms developed over the past five years.
Key highlights include the expanding MLLM toolkit, which supports creating authentic multimodal assessment tasks, validating their design (e.g., via simulation or synthetic data), and performing automated scoring across linguistic, paralinguistic, and non-verbal dimensions (such as grammar, prosody, and facial expressions). These tools also enable the delivery of real-time, personalised feedback. These opportunities, however, demand careful consideration of significant challenges. The talk addresses critical areas such as ensuring the responsible and ethical use of AI, promoting fairness by actively mitigating algorithmic bias, safeguarding data privacy, and fostering AI literacy among all stakeholders. Finally, it identifies priority areas for further research and commercial initiatives to ensure the effective and equitable implementation of multimodal AI in speaking assessment.