Special Sessions are a vital part of the PRCV conference, providing focused, smaller-scale forums for emerging and interdisciplinary topics in pattern recognition and computer vision. Each special session concentrates on a specific theme, presenting timely research findings to foster in-depth academic discussion and collaboration.
Special sessions follow a dual submission model that accepts both invited and open submissions. Papers must be submitted through the official PRCV 2025 submission system, with the corresponding special session track selected. All manuscripts will undergo the same rigorous peer-review process as main conference submissions. Accepted special session papers will be presented at the conference and included in the official PRCV 2025 proceedings, which will be published by Springer and indexed by EI and ISTP.
Submission Deadline:
July 10, 2025, 11:59 PM (UTC+8)
Acceptance Notification:
August 10, 2025 (same as the main track)
Camera-Ready Deadline:
August 20, 2025 (same as the main track)
To submit your paper, please follow these steps:
1. Access the PRCV submission portal:
https://cmt3.research.microsoft.com/PRCV2025/Submission/Index
2. Click on "Create new submission" in the top-left corner and select "Special Sessions".
3. Under "Subject Areas", choose the specific special session you wish to submit to. All other paper information requirements are identical to those for main conference submissions; please refer to this website for more details: https://www.prcv.cn/CN/CallforPapers/
Multimodal Brain-Computer Interfaces (BCIs), which integrate diverse neurophysiological signals such as EEG, fNIRS, MEG, and ECoG alongside peripheral physiological signals like EOG and ECG, are emerging as a cutting-edge interdisciplinary research area. By leveraging the complementary temporal and spatial characteristics of these modalities, multimodal BCIs significantly enhance neural decoding accuracy, reliability, and user adaptability, addressing key limitations of traditional unimodal systems.
This special session focuses on the theoretical foundations, algorithmic innovations, and application-driven developments of multimodal BCIs. Topics include multimodal signal fusion, cross-modal learning, robust neural decoding, and real-world applications in neuroscience, clinical diagnostics, personalized health monitoring, and human-computer interaction. We aim to establish a collaborative platform for researchers from neuroscience, biomedical engineering, machine learning, and cognitive artificial intelligence to jointly explore and accelerate the advancement of next-generation multimodal BCI technologies.
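For illustration only, the sketch below shows one simple form of the feature-level signal fusion mentioned above: each modality is encoded separately and the embeddings are concatenated for decoding. The channel counts, single-layer encoders, and classification head are illustrative assumptions, not a method endorsed by the session.

    import torch
    import torch.nn as nn

    class MultimodalBCIFusion(nn.Module):
        def __init__(self, eeg_dim=32, fnirs_dim=16, embed_dim=64, n_classes=4):
            super().__init__()
            # One encoder per modality, mapping windowed features into a
            # shared embedding space (dimensions are illustrative).
            self.eeg_encoder = nn.Sequential(nn.Linear(eeg_dim, embed_dim), nn.ReLU())
            self.fnirs_encoder = nn.Sequential(nn.Linear(fnirs_dim, embed_dim), nn.ReLU())
            # Feature-level fusion: concatenate embeddings, then classify.
            self.classifier = nn.Linear(2 * embed_dim, n_classes)

        def forward(self, eeg, fnirs):
            z = torch.cat([self.eeg_encoder(eeg), self.fnirs_encoder(fnirs)], dim=-1)
            return self.classifier(z)

    model = MultimodalBCIFusion()
    eeg = torch.randn(8, 32)    # hypothetical per-window EEG band-power features
    fnirs = torch.randn(8, 16)  # hypothetical per-window fNIRS hemodynamic features
    logits = model(eeg, fnirs)  # (8, 4) class scores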
Session Organizers:
Ziyu Jia, Institute of Automation, Chinese Academy of Sciences
Roger Mark, Massachusetts Institute of Technology
Yansen Wang, Microsoft Research Asia
Xinliang Zhou, Stanford University
In recent years, large language models (LLMs) have expanded from natural language processing into interdisciplinary fields such as robotics, healthcare, and scientific exploration, demonstrating immense potential to drive societal progress. This session aims to explore how LLMs can address complex real-world challenges through multimodal data integration and intelligent decision-making. In robotics, LLMs enhance human-machine interaction and autonomous decision-making by processing language, visual, and sensor inputs—enabling precise task execution in dynamic environments. In healthcare, LLMs integrate medical imaging and patient data to improve diagnostic accuracy, personalize treatment plans, and optimize doctor-patient communication, thereby promoting equitable distribution of medical resources. In scientific discovery, LLMs accelerate hypothesis generation and anomaly detection by analyzing multimodal datasets (e.g., astronomical observations, biological experiment data), fostering cross-domain breakthroughs.
The academic significance of this session lies in uncovering innovative applications of LLMs in multimodal pattern recognition, transcending the limitations of traditional autoregressive models and exploring the potential of non-sequential generation and global context modeling. Key innovations include: 1) proposing a unified framework for LLMs in robotics, healthcare, and science to facilitate multimodal fusion; 2) demonstrating the efficiency of LLMs in complex reasoning and real-time decision-making; and 3) examining the societal impact and ethical challenges of LLMs. The value of this session to the PRCV conference is reflected in its integration of core computer vision and pattern recognition technologies with the linguistic capabilities of LLMs, exploring the prospects of multimodal intelligent systems in societal applications and fostering interdisciplinary collaboration. By inviting experts from robotics, healthcare, and scientific research, this session will facilitate cutting-edge discussions, introduce fresh research perspectives to the PRCV conference, and contribute to building a smarter, more inclusive socio-technological ecosystem.
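As a concrete illustration of the multimodal fusion discussed above, the sketch below follows a common pattern for connecting a vision encoder to an LLM: patch features are linearly projected into the model's token-embedding space and prepended to the text sequence. The dimensions and the single linear projector are illustrative assumptions, not a framework proposed by the session.

    import torch
    import torch.nn as nn

    class VisionToLLMProjector(nn.Module):
        # Maps frozen vision-encoder patch features into the token-embedding
        # space of a language model so the two can share one input sequence.
        def __init__(self, vision_dim=768, llm_dim=4096):
            super().__init__()
            self.proj = nn.Linear(vision_dim, llm_dim)

        def forward(self, patch_feats, text_embeds):
            visual_tokens = self.proj(patch_feats)  # (B, n_patches, llm_dim)
            # Prepend visual tokens; the combined sequence would then be
            # fed to the LLM backbone (omitted here).
            return torch.cat([visual_tokens, text_embeds], dim=1)

    proj = VisionToLLMProjector()
    img = torch.randn(2, 196, 768)   # e.g., ViT patch features
    txt = torch.randn(2, 10, 4096)   # LLM token embeddings
    seq = proj(img, txt)             # (2, 206, 4096)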
Session Organizers:
Dongfang Liu, Rochester Institute of Technology
Zhuang Shao, Newcastle University
Junhan Zhao, Harvard Medical School, The University of Chicago
Kaicheng Yu, Westlake University
This special session targets the emerging paradigm where rigorous quality assessment is embedded directly within the generative pipeline of AI models. By closing the loop between generation and evaluation, we aim to foster high-fidelity, verifiable outputs for safety-critical domains such as medical imaging, autonomous systems, creative media, and decision support. We invite original contributions on (i) quality-aware generative frameworks, (ii) metrics and benchmarks for text-to-image, large language, and multimodal models, (iii) robustness and reliability evaluation, and (iv) generative models purpose-built for quality assessment. The session will bridge technical innovation with practical considerations of trust, reliability, and responsible AI deployment across disciplines.
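The closed-loop idea above can be illustrated with a minimal sketch: a generator is repeatedly sampled until a quality metric accepts its output. The generator and metric below are toy placeholders; in practice they would be a generative model and a learned or no-reference quality score.

    import random

    def generate_with_quality_gate(generate, score, threshold=0.8, max_attempts=5):
        # Closed-loop generation: keep sampling until the quality metric
        # accepts an output, and return the best candidate seen.
        best, best_score = None, float("-inf")
        for _ in range(max_attempts):
            candidate = generate()
            s = score(candidate)
            if s > best_score:
                best, best_score = candidate, s
            if s >= threshold:
                break
        return best, best_score

    # Toy stand-ins for a generative model and a quality metric.
    output, quality = generate_with_quality_gate(
        generate=lambda: random.random(),
        score=lambda x: x,
    )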
Session Organizers:
Xiawu Zheng, Xiamen University
Yan Zhang, Xiamen University
Xiongkuo Min, Shanghai Jiao Tong University
Li Yuan, Peking University Shenzhen Graduate School
In recent years, Vision Foundation Models such as ViT and Swin Transformer have achieved remarkable performance in tasks including image classification, object detection, and semantic segmentation, gradually becoming a dominant paradigm in computer vision research. Their strong representational and generalization capabilities have significantly advanced various downstream tasks. However, their large-scale parameterization, high computational and storage demands, and complex deployment pipelines pose substantial challenges for real-world applications.
This session will focus on the efficient computation and cross-domain applications of vision foundation models, covering three major technical directions. First, efficient computation strategies, including distributed training frameworks, dynamic inference mechanisms, model compression and pruning techniques, and neural architecture search, aimed at improving computational efficiency and resource utilization. Second, system deployment optimization, focusing on model adaptation and coordination across heterogeneous hardware platforms, with emphasis on deployment in edge devices, mobile platforms, and cloud environments. Third, cross-domain application practices, exploring the transferability and robustness of vision models in complex tasks such as smart cities, medical image analysis, and industrial inspection, and investigating their generalization and adaptation capabilities across diverse data sources and tasks.
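As one concrete example of the compression and pruning techniques listed above, the sketch below applies PyTorch's built-in magnitude pruning to a single linear layer standing in for one block of a vision foundation model; the layer size and 50% sparsity level are illustrative assumptions.

    import torch
    import torch.nn as nn
    import torch.nn.utils.prune as prune

    # A single MLP layer standing in for one block of a vision foundation model.
    layer = nn.Linear(768, 3072)

    # Zero out the 50% of weights with the smallest L1 magnitude.
    prune.l1_unstructured(layer, name="weight", amount=0.5)

    # Fold the binary mask into the weights to make the pruning permanent.
    prune.remove(layer, "weight")

    sparsity = (layer.weight == 0).float().mean().item()
    print(f"sparsity after pruning: {sparsity:.0%}")  # ~50%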
Session Organizers:
Hua Huo, Henan University of Science and Technology
Yan Pang, Shenzhen Institutes of Advanced Technology, Chinese Academy of Sciences
Qingjie Meng, University of Birmingham
Infrared images offer unique advantages, such as strong adaptability to low-light environments and the ability to reveal targets through smoke, haze, and other obscurants. They are widely applied in critical fields including night vision surveillance, autonomous driving, military security, and remote sensing. However, the technology still faces several challenges, such as low spatial resolution, severe noise interference, limited target information, and difficulty in achieving robust target recognition against complex backgrounds, all of which hinder the accurate and rapid detection of distant targets.
In recent years, with the rapid development of deep learning and multimodal perception, infrared imagery has made significant progress in visual perception, understanding, and intelligent processing. This has given rise to multiple research hotspots, such as infrared image enhancement, cross-modal fusion, semantic segmentation, and object detection. This special session aims to bring together cutting-edge research in this field, focusing on key challenges and innovative methods in infrared visual perception and understanding. Topics include, but are not limited to, infrared image enhancement and reconstruction, infrared-visible image fusion, infrared object detection and tracking under complex scenarios, and cross-modal semantic alignment.
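As a minimal baseline for the infrared-visible fusion topic above, the sketch below performs pixel-wise weighted averaging of a co-registered image pair. Real fusion methods typically rely on multi-scale decompositions or learned networks; the random inputs here are placeholders.

    import numpy as np

    def fuse_ir_visible(ir, vis, alpha=0.5):
        # Pixel-wise weighted fusion of a co-registered infrared/visible pair;
        # both inputs are single-channel arrays normalized to [0, 1].
        assert ir.shape == vis.shape, "images must be co-registered"
        return np.clip(alpha * ir + (1.0 - alpha) * vis, 0.0, 1.0)

    # Random arrays standing in for a registered image pair.
    ir = np.random.rand(240, 320)
    vis = np.random.rand(240, 320)
    fused = fuse_ir_visible(ir, vis, alpha=0.6)  # weight thermal cues more heavily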
By promoting the deep integration of algorithmic innovation, performance optimization, and real-world applications, this session aims to build a collaborative bridge between academia, industry, and research institutes. We warmly welcome researchers in image processing, computer vision, and multimodal learning to submit high-quality contributions, jointly enhancing the academic impact of PRCV in infrared vision research and supporting applications in security and intelligent industry.
Session Organizer:
Lei Deng, Beijing Information Science and Technology University