| Downloads | Citations | Reads |
| 66 | 0 | 19 |
Abstract: Extracting highly discriminative facial features remains a challenging problem, particularly in the presence of interference factors common in real-world scenarios, such as changing lighting conditions and partial occlusion, which, together with imbalanced data distributions, often weaken model generalization. To improve the perception of subtle expressions and overall robustness, this paper proposes ERAnet, an enhanced facial expression recognition model built upon the ConvNeXt architecture. The core contribution lies in the design and integration of an efficient regional attention module that deeply fuses global semantic information with local fine-grained features and employs a dynamic region-focusing mechanism. By combining learnable regional masks with channel attention, the model automatically captures the most discriminative facial areas. Extensive experiments validate the effectiveness of the approach: the regional attention mechanism dynamically adjusts attention across facial regions, channel weights are optimally reallocated, and multi-scale grouped convolutions are fused with attention to enrich feature representations while maintaining efficient extraction under complex conditions. On the public FERPlus and RAF-DB datasets, ERAnet achieves recognition accuracies of 91.45% and 90.29%, respectively, improvements of 1.76 and 2.43 percentage points over the baseline models. These results demonstrate the practical potential and technical advantages of the proposed method for real-world facial expression recognition.
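The abstract's key mechanism, a learnable spatial region mask combined with SE-style channel attention, can be sketched numerically as follows. This is a minimal illustration of the general technique, not the authors' exact ERAnet design: the function name, weight shapes, and reduction ratio are assumptions for demonstration.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def region_channel_attention(x, w_mask, w1, w2):
    """Sketch of a learnable region mask followed by channel attention.

    x:      feature map of shape (C, H, W)
    w_mask: (C,) weights of a 1x1 conv predicting a spatial saliency map
    w1:     (C//r, C) squeeze weights of the SE-style bottleneck
    w2:     (C, C//r) excite weights of the SE-style bottleneck
    All names/shapes are illustrative assumptions, not the paper's design.
    """
    # Learnable region mask: 1x1 conv over channels -> (H, W) saliency map,
    # letting the model focus on discriminative facial regions.
    mask = sigmoid(np.tensordot(w_mask, x, axes=(0, 0)))  # (H, W)
    x = x * mask[None, :, :]
    # Channel attention: global average pool -> bottleneck -> sigmoid gate,
    # reallocating weight across feature channels.
    pooled = x.mean(axis=(1, 2))                          # (C,)
    gate = sigmoid(w2 @ np.maximum(w1 @ pooled, 0.0))     # (C,)
    return x * gate[:, None, None]

rng = np.random.default_rng(0)
C, r = 8, 2
x = rng.standard_normal((C, 6, 6))
out = region_channel_attention(
    x,
    rng.standard_normal(C),
    rng.standard_normal((C // r, C)),
    rng.standard_normal((C, C // r)),
)
print(out.shape)  # (8, 6, 6)
```

In a real network both branches would be trained end-to-end inside a convolutional backbone (here, ConvNeXt), so the mask learns to highlight expression-relevant areas such as eyes and mouth.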
Basic information:
DOI:
CLC number: TP391.41
Citation:
[1] ZHANG Zhiyong, WANG Yu, ZHANG Shun. Facial expression recognition based on an efficient regional attention mechanism[J]. Journal of Lanzhou Jiaotong University, 2025, 44(06): 35-44.
Funding:
Gansu Province Major Science and Technology Special Project (22ZD6GA063); Central Government Guided Local Science and Technology Development Fund (225107); Dunhuang Science and Technology Plan Project (20250101)