WuDao-WenLan: What Do Very-Large Multimodal Pre-Training Models Bring?
Published: 2022-04-08  Authors: LU Zhiwu, JIN Qin, SONG Ruihua, WEN Jirong



WuDao-WenLan: What Do Very-Large Multimodal Pre-Training Models Bring?

LU Zhiwu1, JIN Qin2, SONG Ruihua1, WEN Jirong1,2
(1. Gaoling School of Artificial Intelligence, Renmin University of China, Beijing 100872, China; 2. School of Information, Renmin University of China, Beijing 100872, China)

Abstract: A multimodal pre-training two-tower model called WuDao-WenLan BriVL is proposed, which is trained through self-supervised learning over 650 million image-text pairs crawled from the Web; it is currently the largest open-sourced Chinese general-purpose image-text pre-training model. Extensive experiments show that BriVL achieves new state-of-the-art results on multiple benchmark datasets. Moreover, a multi-lingual multimodal pre-training single-tower model called WuDao-WenLan MLMM is also proposed. Extensive experiments show that MLMM achieves superior performance on multiple multi-lingual benchmark datasets and can learn cross-lingual, cross-modal commonsense knowledge. In addition, experiments are designed to discuss what very-large multimodal pre-training models bring to text encoding, text-to-image generation, and image-text retrieval, as well as the real-world applications of WenLan and its interdisciplinary results.
Keywords: multimodal pre-training; multi-lingual pre-training; two-tower model; single-tower model
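To make the two-tower idea in the abstract concrete: each tower encodes one modality (image or text) into a shared embedding space, so image-text retrieval reduces to comparing embeddings with a dot product, with no cross-modal attention at inference time. The following is a minimal sketch of that retrieval step only, with random vectors standing in for real encoder outputs; the embedding dimension and function names are illustrative assumptions, and BriVL's actual training objective (a contrastive loss over such a similarity matrix) is more elaborate than what is shown here.

```python
import numpy as np

def l2_normalize(x, axis=-1):
    """Scale each row to unit length so dot products become cosine similarities."""
    return x / np.linalg.norm(x, axis=axis, keepdims=True)

def retrieval_scores(img_emb, txt_emb):
    """Two-tower scoring: one matrix multiply between normalized
    image embeddings (rows) and text embeddings (columns)."""
    return l2_normalize(img_emb) @ l2_normalize(txt_emb).T

# Stand-ins for the outputs of the image tower and the text tower
# (hypothetical 8-dimensional embeddings for a batch of 4 pairs).
rng = np.random.default_rng(0)
scores = retrieval_scores(rng.normal(size=(4, 8)), rng.normal(size=(4, 8)))
# In a contrastive training batch, matched image-text pairs lie on the
# diagonal of this 4x4 matrix; retrieval ranks candidates by each row/column.
```

Because the towers are independent, text and image embeddings can be precomputed and indexed separately, which is what makes the two-tower design practical for large-scale retrieval, in contrast to a single-tower model that must jointly encode every candidate pair.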
