Accurate and timely assessment of damage severity is vital for effective disaster response and resource prioritization. However, the unstructured and noisy nature of multimodal social media data poses challenges for reliable automated analysis. This paper introduces a robust vision-language pipeline that combines domain-adapted caption generation with BLIP-2, semantic alignment filtering using CLIPScore and Image-Text Matching (ITM), and chain-of-thought reasoning with a large multimodal language model (LLM) to estimate quantitative severity scores. Evaluations on the CrisisMMD dataset demonstrate significant improvements over traditional unimodal and fusion baselines, achieving up to 80.2% precision and strong F1 performance on high-confidence severe-damage cases. To handle uncertain or low-confidence instances, we advocate a human-in-the-loop design that routes ambiguous cases for manual verification, ensuring trustworthy decision support for emergency response operations.
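The alignment-filtering and routing steps described in the abstract can be sketched as follows. This is a minimal illustration, not the paper's implementation: the CLIPScore formula (w * max(cos, 0), w = 2.5) follows its standard definition, while the function names, the 0.7 threshold, and the dummy embeddings are assumptions for demonstration only.

```python
import numpy as np

def clipscore(image_emb, text_emb, w=2.5):
    # Standard CLIPScore: w * max(cosine similarity, 0); w = 2.5 per the
    # original formulation. Embeddings here stand in for CLIP outputs.
    v = image_emb / np.linalg.norm(image_emb)
    t = text_emb / np.linalg.norm(text_emb)
    return w * max(float(v @ t), 0.0)

def route_pairs(pairs, threshold=0.7):
    # Human-in-the-loop routing: well-aligned image-text pairs continue to
    # automated severity scoring; low-alignment (ambiguous) pairs go to
    # manual review. The threshold value is illustrative.
    automated, manual_review = [], []
    for image_emb, text_emb, item_id in pairs:
        if clipscore(image_emb, text_emb) >= threshold:
            automated.append(item_id)
        else:
            manual_review.append(item_id)
    return automated, manual_review

# Example with toy 2-D embeddings: identical vectors are well aligned,
# orthogonal vectors are not.
v = np.array([1.0, 0.0])
u = np.array([0.0, 1.0])
auto, review = route_pairs([(v, v, "post_a"), (v, u, "post_b")])
```

Here `post_a` (perfect alignment, score 2.5) is routed to automated scoring, while `post_b` (orthogonal embeddings, score 0.0) is flagged for manual verification.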
Shaik, Fuzel & Mourad Oussalah (2025). Towards Trustworthy Disaster Severity Scoring: Combining Semantic Alignment and Chain-of-Thought LLMs. Presented at the 14th IEEE International Conference on Image Processing Theory, Tools and Applications (IPTA 2025), Istanbul, Türkiye, 13-16 October 2025.