First VALSE Workshop on Methods and Technologies for Looking At People (MATLAP)

Workshop Co-Chairs: 山世光 (Institute of Computing Technology, CAS), 孙剑 (Megvii), 华刚 (Microsoft)

Time: April 20, 2018 (afternoon)

Confirmed speakers so far:

常乐, Institute of Neuroscience, CAS
邓伟洪, Beijing University of Posts and Telecommunications
韩琥, Institute of Computing Technology, CAS
乔宇, Shenzhen Institutes of Advanced Technology, CAS
俞刚, Megvii
郑良, Singapore University of Technology and Design
郑伟诗, Sun Yat-sen University


First VALSE Workshop on Vision and Language (VL)

Workshop Co-Chairs: 白翔 (Huazhong University of Science and Technology), 罗杰波 (University of Rochester)

Time: April 20, 2018 (afternoon)

Confirmed speakers so far:

白翔, Huazhong University of Science and Technology
罗杰波, University of Rochester
黄伟林, Malong Technologies
刘成林, Institute of Automation, Chinese Academy of Sciences
梅涛, JD.com
沈春华, University of Adelaide
吴双, YITU Technology
姚霆, Microsoft Research


First VALSE Workshop on Deep Learning Models: Modern Architecture, Compression and Acceleration (DLMMACA)

Workshop Chair: 王乃岩 (TuSimple)

Time: April 20, 2018 (morning)

Confirmed speakers so far:

王乃岩, TuSimple
程健, Institute of Automation, Chinese Academy of Sciences
彭玺, Sichuan University
刘日升, Dalian University of Technology
王井东, MSRA
陈添水, Sun Yat-sen University


First VALSE Workshop on Pixel Level Image Understanding (PLIU)

Workshop Co-Chairs: 程明明 (Nankai University), 林倞 (Sun Yat-sen University)

Time: April 20, 2018 (morning)

Confirmed speakers so far:

程明明, Nankai University
刘偲, Institute of Information Engineering, CAS
魏云超, University of Illinois at Urbana-Champaign
王兴刚, Huazhong University of Science and Technology
董超, SenseTime


First VALSE Workshop on Brain-Inspired Vision & Learning (BIVIL)

Workshop Co-Chairs: 潘纲 (Zhejiang University), 胡晓林 (Tsinghua University)

Time: April 20, 2018 (afternoon)

Confirmed speakers so far:

张兆翔, Institute of Automation, Chinese Academy of Sciences
唐华锦, Sichuan University
陈霸东, Xi'an Jiaotong University
何晖光, Institute of Automation, Chinese Academy of Sciences
刘健, University of Leicester, UK
胡晓林, Tsinghua University


Speaker Introductions

  常乐, Institute of Neuroscience, CAS

Talk Title: Neural Mechanisms of Face Recognition

Abstract: The world is made up of objects of every kind. Through interaction with the external world, higher animals have evolved elaborate visual systems to extract object information from the environment. In primates, object recognition is carried out by inferotemporal (IT) cortex. Because object identity is an abstract property, how this area encodes objects has long been a hard question. Faces are a special class of objects critical to animals' survival and social behavior. In 2006, researchers discovered face-selective patches in IT cortex, but the precise face code used there remained unclear. To study the encoding of faces by IT neurons precisely and quantitatively, we used a face generation model developed in computer vision, which maps any 50-dimensional vector to a face image. We presented faces generated by this model to macaques while recording from neurons in the IT face patches and found that, contrary to the traditional view that a single face cell encodes a single face, each cell encodes one axis of face space (i.e., its response is a linear combination of certain facial features). This research strategy may serve as a general template for understanding how complex systems encode complex objects.
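
The axis-code claim lends itself to a toy demonstration. Below is a minimal numpy sketch (all names are illustrative, not from the study): a model "cell" responds linearly to the projection of a 50-dimensional face vector onto its preferred axis, so two faces that differ only orthogonally to that axis evoke the same response.

```python
import numpy as np

rng = np.random.default_rng(0)

# Face space: the generative model maps any 50-dimensional vector to a face.
faces = rng.standard_normal((1000, 50))      # 1000 random faces

# Axis coding: a face cell has a preferred axis in face space, and its
# response is a linear function of the face's projection onto that axis.
axis = rng.standard_normal(50)
axis /= np.linalg.norm(axis)
responses = faces @ axis                     # one response per face

# The cell is blind to feature changes orthogonal to its axis.
orth = rng.standard_normal(50)
orth -= (orth @ axis) * axis                 # remove the on-axis component
f1, f2 = faces[0], faces[0] + 3.0 * orth     # two quite different faces
print(np.isclose(f1 @ axis, f2 @ axis))      # identical responses: True
```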

Speaker Bio: Dr. 常乐 received his B.Sc. from the Department of Mathematics at Nankai University (2002-2006) and his Ph.D. from the Institute of Biophysics, Chinese Academy of Sciences (2006-2013). From 2013 to 2017 he was a postdoctoral researcher at the California Institute of Technology. His research focuses on how the primate visual system extracts object information. Since January 2018 he has been a Principal Investigator at the Institute of Neuroscience, Chinese Academy of Sciences.

  邓伟洪, Beijing University of Posts and Telecommunications

Talk Title: Real-World Facial Expression Recognition

Abstract: Facial expressions provide important feedback on our daily experience, yet most existing databases and studies are limited to single posed faces captured under controlled laboratory conditions. We therefore built a novel facial expression database, the Real-world Affective Faces Database (RAF-DB). It contains nearly 30,000 face images spanning different ages and ethnicities with a wide range of uncontrolled poses and illumination conditions. Through crowdsourcing, each sample was independently labeled about 40 times, and we developed an EM algorithm to assess annotation reliability and select the best expression label. Cross-database experiments on RAF-DB and CK+ further show that real-world expressions are far more complex and varied than the strictly controlled expressions collected in the lab. For real-world expression recognition we propose a Deep Locality-Preserving CNN (DLP-CNN), which improves the discriminative power of deep features by preserving the local closeness of the learned features while maximizing the between-class distance. Experiments on the seven basic and eleven compound expression categories of RAF-DB, along with additional experiments on CK+, MMI and SFEW 2.0, show that DLP-CNN is better suited to real-world expression recognition than state-of-the-art handcrafted features and deep learning methods. To foster further research, we have released RAF-DB together with the extracted features and benchmark results. (http://www.whdeng.cn/RAF/model1.html)
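
The locality-preserving term can be illustrated with a short PyTorch sketch (assumptions: `k`, the 0.5 weight, and the use of per-class k-NN means are illustrative choices, not the paper's exact formulation): each deep feature is pulled toward the mean of its k nearest same-class neighbours, while the usual softmax cross-entropy keeps classes apart.

```python
import torch
import torch.nn.functional as F

def locality_preserving_loss(features, labels, k=5):
    # Pull each feature toward the mean of its k nearest neighbours of
    # the same class, preserving local closeness within each expression.
    loss = features.new_zeros(())
    for c in labels.unique():
        f = features[labels == c]
        if f.size(0) <= 1:
            continue
        d = torch.cdist(f, f)                     # pairwise L2 distances
        d.fill_diagonal_(float('inf'))            # exclude self-matches
        nn_idx = d.topk(min(k, f.size(0) - 1), largest=False).indices
        centers = f[nn_idx].mean(dim=1)           # k-NN mean per sample
        loss = loss + ((f - centers) ** 2).sum(dim=1).mean()
    return loss

# Joint objective with an illustrative weight:
# total = F.cross_entropy(logits, labels) + 0.5 * locality_preserving_loss(feats, labels)
```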

Speaker Bio: 邓伟洪 is a professor and Ph.D. supervisor in the School of Information and Communication Engineering at Beijing University of Posts and Telecommunications. His research covers computer vision and pattern recognition, with over 100 papers in international journals and conferences including IEEE TPAMI and CVPR. He has been recognized with a Beijing Excellent Doctoral Dissertation award, the Ministry of Education's New Century Excellent Talents program, and the Beijing Nova program.

  韩琥, Institute of Computing Technology, CAS

Talk Title: Multimodal Face Recognition and Attribute Learning

Abstract: Face recognition from 2D images has advanced by leaps and bounds in recent years, reaching over 99.8% accuracy on controlled-scenario benchmarks such as LFW; yet 2D face recognition remains far from robust in unconstrained settings, especially under poor illumination and large pose variation. In September 2013, Apple introduced TouchID on the iPhone 5S, ushering in fingerprint-based smartphone authentication; in September 2017, Apple introduced FaceID on the iPhone X, thrusting multimodal face recognition for smartphone authentication into the spotlight. Beyond identity recognition, face images can also yield structured facial attribute representations serving group-level applications such as social media, face retrieval, and targeted advertising. In fact, a full year before FaceID was announced, the Visual Information Processing and Learning group (VIPL) at ICT, CAS had already applied face recognition to smartphone authentication (Huawei Magic FaceCode) and pioneered research on large-scale RGB-D multimodal face recognition. The speaker has worked on multimodal face recognition and attribute learning since 2012, on topics including complementary multimodal feature learning and fusion, missing-modality completion, cross-modal representation learning, and single- and multi-task attribute learning. In this talk, he will briefly review the field, present representative work, and discuss future trends.

Speaker Bio: 韩琥 is an associate professor and M.Sc. supervisor at the Institute of Computing Technology (ICT), Chinese Academy of Sciences. He received his B.Sc. from Shandong University in 2005 and his Ph.D. from ICT, CAS in 2011, then conducted biometrics research at Michigan State University and at Google headquarters, where he was a core member of Google's Abacus project. He joined ICT in 2015. His research covers biometrics, computer vision, and pattern recognition, particularly biologically inspired visual perception such as multi-object detection, multimodal perception, and multi-task learning. He has published more than 30 papers in venues including IEEE T-PAMI, IEEE T-IFS, PR, and ECCV, among them 7 first- or corresponding-author long papers in CCF-A journals (including 2 first-author T-PAMI papers), with 1,100+ Google Scholar citations (H-index 19). He has repeatedly served on the program committees of biometrics and vision conferences such as ICB, IJCB, ACCV, and CCBR. He is PI of an NSFC key program subproject, a CAS international cooperation key project, and an NSFC general program, and a core member of a MOST 973 subproject; he was also technical lead on several U.S. Department of Justice and NSF funded projects on forensic sketch recognition and face anti-spoofing. Students he co-supervised won first runner-up in the ICCV 2015 apparent age estimation challenge and the Best Student Paper Award at CCBR 2016.

  乔宇, Shenzhen Institutes of Advanced Technology, CAS

Talk Title: Deeply Understanding Human Poses and Actions in the Wild

Abstract: Human pose estimation and action recognition are receiving extensive research interest in computer vision due to their wide applications in surveillance, human-computer interfaces, sports video analysis, and content-based video retrieval. The challenges of pose and action understanding come from background clutter, viewpoint changes, and motion and appearance variations. This talk will summarize recent progress toward human pose estimation and action recognition in the wild, especially with deep learning methods. We will also analyze the challenges and future directions.

Speaker Bio: Yu Qiao is a professor with the Shenzhen Institutes of Advanced Technology (SIAT), Chinese Academy of Sciences, and deputy director of its multimedia research lab. His research interests include computer vision, deep learning, and robotics. He has published more than 150 papers in international journals and conferences, including IEEE T-PAMI, IJCV, IEEE T-IP, IEEE T-SP, CVPR, ICCV, AAAI, and ECCV. He received the Lu Jiaxi Young Researcher Award from the Chinese Academy of Sciences. He is a senior member of IEEE. He was first runner-up in scene recognition at the ImageNet Large Scale Visual Recognition Challenge 2015 and winner in video classification at the ActivityNet Large Scale Activity Recognition Challenge 2016. His group has also achieved top places in international challenges such as ChaLearn, LSUN, and THUMOS.

  俞刚, Megvii

Talk Title: Cascaded Pyramid Network for Multi-Person Pose Estimation

Abstract: Multi-person pose estimation has improved greatly in recent years, driven by the rapid progress of convolutional neural networks. Yet many hard cases remain, such as overlapping keypoints, invisible keypoints, and cluttered backgrounds. We propose a new network structure, the Cascaded Pyramid Network (CPN), to address keypoint localization in these difficult cases. The algorithm has two stages: GlobalNet and RefineNet. GlobalNet is a feature pyramid network that localizes all the "easy" keypoints, while RefineNet handles the "hard" keypoints by integrating the feature representations from all levels of GlobalNet together with an online hard keypoint mining loss. Our algorithm achieves 73.0 average precision (AP) on the COCO test-dev set and 72.1 AP on the COCO test-challenge set, a 19% relative improvement over the 60.5 AP winning result of the COCO 2016 keypoint challenge.
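
The hard keypoint mining loss is simple to sketch. Below is a hedged PyTorch illustration (the plain L2 heatmap loss and `top_m=8` are assumptions, not necessarily the paper's exact settings): per image, only the keypoints with the largest losses contribute gradient, so RefineNet concentrates on the hard cases.

```python
import torch

def hard_keypoint_mining_loss(pred, target, top_m=8):
    # pred, target: (batch, num_keypoints, H, W) heatmaps.
    per_kpt = ((pred - target) ** 2).mean(dim=(2, 3))  # loss per keypoint
    hardest = per_kpt.topk(top_m, dim=1).values        # keep the top_m worst
    return hardest.mean()                              # only they get gradient
```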

Speaker Bio: 俞刚 is a senior researcher at Megvii (Face++). He received his Ph.D. from Nanyang Technological University in 2014 and stayed on as a research fellow before joining Megvii at the end of 2014. His research interests lie in computer vision and machine learning, including human action analysis, human pose estimation, object detection, and semantic segmentation. Since 2010, he has published more than ten papers in top conferences such as CVPR, AAAI, and ECCV and top journals such as IEEE Transactions on Image Processing and IEEE Transactions on Multimedia, and he has authored one book. He led the Megvii team to first place in both the detection and human keypoint tracks of the COCO 2017 challenge.

  郑良, Singapore University of Technology and Design

Talk Title: Improving Person Re-identification with Generative Adversarial Networks

Abstract: The Generative Adversarial Network (GAN) has made impressive achievements in image generation. Basically, it is composed of a Discriminator and a Generator: the former reveals whether a generated sample is fake or real, while the latter produces samples to fool the discriminator. While major attention is paid to the visual quality of the generated samples and to semi-supervised learning in vivo, we focus on the in vitro application of the generated samples in supervised learning. Particularly, we consider person re-identification, a popular vision task aiming to retrieve images of a queried identity from a large person gallery. In this talk, after introducing related works and baselines in GANs and person re-identification, I will describe our recent attempts at using generated samples to improve re-identification performance. First, for CNN regularization, we propose label smoothing regularization for outliers (LSRO) and show that unlabelled samples generated by GANs effectively improve the baseline accuracy. Second, taking advantage of CycleGAN, we propose a new data augmentation approach named CamStyle, which smooths camera style disparities by generating new labeled training samples in a camera-informed manner; we employ the label smoothing loss to address style transfer errors during supervised learning. Finally, we consider the transfer learning setting and propose the similarity preserving generative adversarial network (SPGAN), which aims to preserve the underlying ID cues after style transfer. We show that this method considerably improves the CycleGAN baselines and yields accuracy competitive with state-of-the-art transfer learning methods in person re-identification.
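
As a concrete reference for LSRO, here is a hedged sketch (illustrative, not the authors' released code): real images keep their one-hot identity labels, while GAN-generated images are trained against a uniform distribution over all K identities.

```python
import torch
import torch.nn.functional as F

def lsro_loss(logits, labels, is_generated):
    # logits: (N, K); labels: (N,) identities (ignored for generated images);
    # is_generated: (N,) boolean mask. Assumes the batch mixes both kinds.
    log_p = F.log_softmax(logits, dim=1)
    loss_real = F.nll_loss(log_p[~is_generated], labels[~is_generated])
    # Uniform target: -sum_j (1/K) log p_j equals -mean_j log p_j.
    loss_gen = -log_p[is_generated].mean(dim=1).mean()
    return loss_real + loss_gen
```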

Speaker Bio: Dr Liang Zheng is joining Singapore University of Technology and Design (SUTD) as an Assistant Professor. Prior to that, he worked as a postdoctoral researcher at the University of Technology Sydney and at the University of Texas at San Antonio. He obtained his B.E. and PhD degrees from Tsinghua University. He has published over 20 papers in highly selective venues such as TPAMI, IJCV, CVPR, ECCV, and ICCV, and made one of the initial attempts at large-scale person re-identification. His works are extensively cited by the community; his most highly cited paper has been cited 250+ times since 2015. Dr Zheng received the Outstanding PhD Thesis Award from the Chinese Association of Artificial Intelligence and the Early Career R&D Award from D2D CRC, Australia. His research was featured by MIT Technology Review and selected into computer science courses at Stanford University and the University of Texas at Austin.

  郑伟诗, Sun Yat-sen University

Talk Title: Open Challenges in Person Re-identification

Abstract: To enable continuous tracking of pedestrians across large multi-camera networks, person re-identification has been studied widely and deeply over the past years, and existing methods now achieve very high accuracy on many landmark benchmarks. In practice, however, person re-identification still faces a host of open problems, including low resolution, occlusion, cross-modality matching, and scarce labels. This talk will survey progress on these fronts and report our own work and results on them.

Speaker Bio: Dr. 郑伟诗 is a professor in the School of Data and Computer Science at Sun Yat-sen University. His research addresses video and image processing for large-scale intelligent video surveillance, together with algorithms and theory for large-scale machine learning. His current focus is person identification and behavior understanding in surveillance video. For pedestrian tracking across camera networks, he was among the earliest, in China and abroad, to pursue cross-view person re-identification, publishing a line of work centered on cross-view metric learning; his relative-comparison modeling approach has been studied widely and in depth in the re-identification community. He has published over 100 papers, more than 60 of them in mainstream journals on image recognition and pattern classification such as IEEE TPAMI, IEEE TIP, IEEE TNN, PR, IEEE TCSVT, and IEEE TSMC-B, and in CCF rank-A conferences such as ICCV, CVPR, and IJCAI. His work has been supported by the NSFC Excellent Young Scientists Fund, a Royal Society-Newton Advanced Fellowship, the Guangdong Natural Science Funds for Distinguished Young Scholars, and a Guangdong Innovative Leading Talent project. He is an editorial board member of Pattern Recognition. Homepage: http://isee.sysu.edu.cn/~zhwshi/

  白翔, Huazhong University of Science and Technology

Speaker Bio: 白翔 is a professor and Ph.D. supervisor in the School of Electronic Information and Communications at Huazhong University of Science and Technology and deputy director of the National Anti-Counterfeiting Engineering Center. He received his B.S., M.S., and Ph.D. degrees from Huazhong University of Science and Technology. His research covers computer vision, pattern recognition, and deep learning, with notable contributions to shape matching and retrieval, similarity measurement and fusion, and scene OCR; he was named an Elsevier Most Cited Chinese Researcher in 2014-17. His work has been supported by a Microsoft Fellowship and the NSFC Excellent Young Scientists Fund. He serves on the VALSE steering committee and chairs the IEEE Signal Processing Society (SPS) Wuhan Chapter; he previously chaired the VALSE Online Committee (VOOC) and was general chair of VALSE 2016.

  罗杰波, University of Rochester

Speaker Bio: Professor 罗杰波 (Jiebo Luo) is on the faculty of the Department of Computer Science at the University of Rochester, USA, and is a Fellow of IEEE, SPIE, and IAPR. His research spans image processing, computer vision, machine learning, data mining, social media, medical image analysis, and ubiquitous computing. He spent fifteen years at Kodak Research Laboratories, where he served as the lab's chief scientist. He was general co-chair of ACM Multimedia 2010/2018 and CVPR 2012, and serves as associate editor of leading journals including IEEE Transactions on PAMI, IEEE Transactions on Multimedia, IEEE Transactions on CSVT, ACM Transactions on Intelligent Systems and Technology, and Pattern Recognition. He has published over 350 papers and holds more than 90 U.S. patents. In recent years, he has made pioneering contributions to social multimedia research and its applications.

  黄伟林, Malong Technologies

Talk Title: Reading Text in the Wild: from Text Detection to End-to-End Recognition

Abstract: Text reading in natural images has received increasing attention in the vision community, and recent deep learning technologies have advanced this task significantly. In this talk, we present our recent work that goes from scene text detection to end-to-end recognition. We describe recent detection approaches, including both bounding-box based detectors and segmentation based methods. We study the main challenges and limitations of these approaches, and discuss the difficulty of extending detection approaches to end-to-end recognition. In addition, we analyze various attention mechanisms, which are of critical importance to this task. The talk covers our recent work published in ECCV 2016, ICCV 2017 and CVPR 2018.

Speaker Bio: Dr. Weilin Huang is the Chief Scientist of Malong Technologies. He previously worked as a postdoctoral researcher with Prof. Andrew Zisserman in the Visual Geometry Group (VGG) at the University of Oxford, and was an Assistant Professor with the Chinese Academy of Sciences. He received his Ph.D. degree from The University of Manchester, U.K. His research interests include scene text detection and recognition, large-scale image classification, and medical image analysis. He has served as a PC member or reviewer for major computer vision conferences, including ICCV, CVPR, ECCV and AAAI. His team was first runner-up in scene recognition at ImageNet 2015 and won the WebVision Challenge at CVPR 2017.

  刘成林, Institute of Automation, Chinese Academy of Sciences

Talk Title: Trends in Document Image Recognition

Abstract: After more than 50 years of research, document image analysis and recognition (text recognition for short) has produced a large body of results, yet many technical gaps remain in real applications, and the research questions deserve rethinking from the application perspective. In this talk, I will first briefly review the application background, history, and state of the art of document image analysis (including the main techniques and performance status of handwritten document recognition and scene text detection and recognition). I will then discuss, from the standpoint of future application needs, research problems and directions related to document content understanding, such as document structure understanding, symbol recognition, recognition reliability analysis, interactive recognition, big-data-driven recognition, semantic extraction, summarization, and translation.

Speaker Bio: 刘成林 is deputy director of the Institute of Automation, Chinese Academy of Sciences, director of the National Laboratory of Pattern Recognition, professor, and Ph.D. supervisor. He was selected for the CAS Hundred Talents Program in 2005 and received the National Science Fund for Distinguished Young Scholars in 2008. He graduated from the Department of Radio Information Engineering at Wuhan University in 1989, received his M.E. in circuits and systems from Beijing University of Technology in 1992, and his Ph.D. in pattern recognition and intelligent control from the Institute of Automation, CAS in 1995. He was a postdoctoral researcher at KAIST, Korea (March 1996 to October 1997) and at Tokyo University of Agriculture and Technology, Japan (November 1997 to March 1999), then a researcher and senior researcher at Hitachi Central Research Laboratory, Tokyo (March 1999 to December 2004). His research interests include image processing, pattern recognition, machine learning, and character recognition and document analysis. He has published over 200 papers in international journals and conferences and co-authored an English monograph. He is an associate editor of Pattern Recognition, an editorial board member of Image and Vision Computing, International Journal on Document Analysis and Recognition, and Cognitive Computation, and an associate editor of Acta Automatica Sinica. He is a Fellow of the IEEE and of the IAPR.

  梅涛, JD.com

Talk Title: Recent Advances in Vision to Language

Abstract: Visual recognition has been a fundamental challenge in computer vision for decades. Thanks to the recent development of deep learning techniques, researchers are striving to bridge vision (images and video) and natural language, which has become an emerging research area. We will present a few recent advances in bridging vision and language with deep learning techniques, including image and video captioning, image and video chatting, storytelling, vision-language grounding, datasets, grand challenges, and open issues.

Speaker Bio: Tao Mei is the Deputy Managing Director of JD AI Research and Technical Vice President of JD.com. He was previously a Senior Researcher and Research Manager with Microsoft Research Asia. His current research interests include multimedia analysis and computer vision. He has authored or co-authored over 150 papers with 11 best paper awards. He holds over 50 filed U.S. patents (20 granted) and has shipped a dozen inventions and technologies to Microsoft products and services. He is or has been an editorial board member of IEEE Transactions on Circuits and Systems for Video Technology, ACM Transactions on Multimedia Computing, Communications, and Applications, and IEEE Transactions on Multimedia. He is the General Co-chair of IEEE ICME 2019 and Program Co-chair of ACM Multimedia 2018, IEEE ICME 2015, and IEEE MMSP 2015. Tao is a Fellow of IAPR, a Distinguished Scientist of ACM, and an IEEE Signal Processing Society Distinguished Industry Speaker.

  沈春华, University of Adelaide

Talk Title: Visual Question Answering: New Datasets and Approaches

Abstract: Combining computer vision and natural language processing is an emerging topic that has received much research attention recently. Visual Question Answering (VQA) can be seen as a proxy task for evaluating a vision system's capacity for deeper image understanding. Current datasets, and the models built upon them, have focused on questions which are answerable by direct analysis of the question and image alone. The set of such questions that require no external information to answer is interesting but very limited. We thus propose a new VQA dataset (FVQA) with additional supporting facts. In response to the observed limitations of RNN-based approaches, we propose a method based on explicit reasoning about the visual concepts detected in images.
Second, an intriguing feature of the Visual Question Answering (VQA) challenge is the unpredictability of the questions. Extracting the information required to answer them demands a variety of image operations, from detection and counting to segmentation and reconstruction. We propose a general and scalable approach which exploits the fact that very good methods for these operations already exist and thus do not need to be trained.
Last, the key challenge in Visual Dialogue is to maintain a consistent and natural dialogue while continuing to answer questions correctly. We show how to combine Reinforcement Learning and Generative Adversarial Networks (GANs) to generate more human-like responses to questions.

Speaker Bio: Chunhua Shen is a Full Professor at the School of Computer Science, University of Adelaide, where he leads the Statistical Machine Learning Group. He is a Project Leader and Chief Investigator at the Australian Research Council Centre of Excellence for Robotic Vision (ACRV), leading the project on machine learning for robotic vision. Before moving to Adelaide, he was with the computer vision program at NICTA (National ICT Australia), Canberra Research Laboratory, for about six years. He studied at Nanjing University and the Australian National University, and received his PhD degree from the University of Adelaide. From 2012 to 2016 he held an Australian Research Council Future Fellowship.

  吴双, YITU Technology

Talk Title: Automatic Diagnosis Report Generation from Lung CT Images

Abstract: In lung CT based diagnosis, radiologists read the scans and write diagnostic reports. To improve their quality and efficiency, we developed algorithms that automatically detect, segment, and classify (benign vs. malignant) candidate lung lesions across multiple CT slices, and then generate a written diagnostic report from the outputs. In routine clinical use in hospitals, more than 92% of the automatically generated reports are adopted by physicians without any modification, demonstrating the system's accuracy.

Speaker Bio: Dr Wu Shuang is a Research Scientist at YITU Technology. He is a former senior research scientist at Baidu Research's Silicon Valley AI Lab and a former senior architect at Baidu's US Development Center. He holds a PhD in Physics from the University of Southern California and was a postdoctoral scholar at the University of California, Los Angeles (UCLA). He specializes in machine learning, computer vision, deep neural networks, speech recognition and other related topics in artificial intelligence.

  姚霆, Microsoft Research

Talk Title: Describing Multimedia by Localization and Generation

Abstract: Automatically describing an image or video with natural language is regarded as a fundamental challenge in computer vision. The problem is nevertheless not trivial, especially when an image or video contains multiple salient regions or events worth mentioning, as often happens in real data. A valid question is how to spatially or temporally localize and then describe regions and events, which is known as "dense image/video captioning." The goal of this talk is to present recent advances on this topic, understand the opportunities and challenges, showcase innovative methodologies, and evaluate the state of the art. We will also reflect on what is likely to be the next set of developments in captioning and the next big leap.

Speaker Bio: Ting Yao is currently a Researcher in the Multimedia Search and Mining group at Microsoft Research, Beijing, China. His research interests include video understanding, large-scale multimedia search, and deep learning. He has shipped several technologies to Microsoft products, such as Windows Photos, Microsoft XiaoIce, Microsoft Cognitive Services, and Microsoft Bing Multimedia Search. Dr. Yao is an active participant in several benchmark evaluations. He is the principal designer of top-performing multimedia analytics systems in worldwide competitions such as COCO Image Captioning, the Visual Domain Adaptation Challenge 2017, the ActivityNet Large Scale Activity Recognition Challenge 2017 & 2016, the THUMOS Action Recognition Challenge 2015, and the MSR-Bing Image Retrieval Challenge 2014 & 2013. He is one of the organizers of the MSR Video to Language Challenge 2017 & 2016. For his contributions to multimedia search by self, external and crowdsourcing knowledge, he was awarded the 2015 SIGMM Outstanding Ph.D. Thesis Award. He completed a Ph.D. in computer science (2014) at the City University of Hong Kong, advised by Prof. Chong-Wah Ngo, and received his B.Sc. in theoretical and applied mechanics (2004), B.Eng. in electronic information engineering (2004), and M.Eng. in signal and information processing (2008), all from the University of Science and Technology of China, Hefei, China.

  王乃岩, TuSimple

Talk Title: Towards Practical Deep Learning Model Compression and Acceleration

Abstract: Deep neural networks have demonstrated extraordinary power on various tasks. However, it is still very challenging to deploy state-of-the-art models in real-world applications due to their high computational complexity. In this talk, I will start with the background of deep model compression and acceleration, and discuss the practical aspects of this technique. Then I will introduce three recent works done at TuSimple using novel techniques in model distillation and sparse model structure selection. By combining these techniques, we can build a fully automatic pipeline for joint model training, performance boosting and model acceleration. These works all demonstrate superior performance in practice and have been deployed in TuSimple's production systems.
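
Of the ingredients named above, model distillation is the easiest to sketch. The block below shows classic logit distillation (Hinton-style knowledge distillation), given only as a generic reference point; TuSimple's three works use their own distillation and structure-selection formulations.

```python
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, labels, T=4.0, alpha=0.9):
    # Student matches the teacher's softened class distribution (KL term)
    # while still fitting the ground-truth labels (cross-entropy term).
    soft = F.kl_div(F.log_softmax(student_logits / T, dim=1),
                    F.softmax(teacher_logits / T, dim=1),
                    reduction='batchmean') * (T * T)
    hard = F.cross_entropy(student_logits, labels)
    return alpha * soft + (1 - alpha) * hard
```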

Speaker Bio: Naiyan Wang is currently the principal scientist of TuSimple, where he leads the algorithm research group in the Beijing office. He received his PhD degree from the CSE department of Hong Kong University of Science and Technology in 2015. His research focuses on applying statistical computational models to real problems in computer vision and data mining. Currently he works mainly on vision-based perception and localization for autonomous driving, in particular integrating and improving cutting-edge technologies from academia to make them work properly on autonomous trucks.

  程健, Institute of Automation, Chinese Academy of Sciences

Talk Title: Efficient Computation for Deep Learning and Processor Design

Abstract: In recent years, deep learning has achieved great success in computer vision, speech recognition, natural language processing, and other fields. However, deep network models require massive amounts of dense computation, which puts great pressure on conventional general-purpose processors; meanwhile, with the spread of mobile applications and the Internet of Things, high efficiency and low power have become key considerations in processor design. This talk first reviews the challenges deep learning faces in efficient computation, then surveys mainstream ideas at home and abroad on optimized network computation and novel processor architectures, together with recent progress from my own group, and closes with my views on future trends in this field.

Speaker Bio: 程健 is a professor at the National Laboratory of Pattern Recognition, Institute of Automation, Chinese Academy of Sciences, executive deputy director of the Nanjing Institute of Artificial Intelligence Chips, and director of the Joint Laboratory of Artificial Intelligence and Advanced Computing. He received his B.S. and M.S. from Wuhan University in 1998 and 2001, and his Ph.D. from the Institute of Automation, CAS in 2004. He was a postdoctoral researcher at Nokia Research Center from 2004 to 2006, and has been with the Institute of Automation since September 2006. His current research covers deep learning, AI chip design, and image and video content analysis; he has published over 100 papers in these areas and co-edited two English volumes. His honors include the CAS Lu Jiaxi Young Talent Award, the Outstanding Member Award of the CAS Youth Innovation Promotion Association, the first prize in natural science of the Chinese Institute of Electronics, and the second prize in natural science of the Ministry of Education. He is an editorial board member of Pattern Recognition, and served as conference chair of ICIMCS 2010, organizing chair of HHME 2010, and publication chair of CCPR 2012.

  彭玺, Sichuan University

Talk Title: Advances in Differentiable Programming: Beyond Simple RNN and Optimization

Abstract: Differentiable programming treats the neural network as a language, reformulating traditional statistical inference methods (e.g. sparse coding, CRM) as neural networks. As a result, the newly designed networks simultaneously enjoy the high interpretability and problem-specific structure given by statistical inference methods, and the larger learning capacity and better utilization of big data offered by neural networks. Existing works have two limitations. On the one hand, the newly designed models are simple recurrent neural networks, which are just a small portion of neural networks. On the other hand, they focus on implementing existing optimizers as neural networks, which hinders the application and progress of this topic. In this talk, I will introduce two of our works which address the above issues in an elegant way. The first bridges an L1-solver with an LSTM and shows its effectiveness in image reconstruction and abnormal event detection. The second recasts the objective function of vanilla k-means as a feedforward neural network and demonstrates its promising performance in data clustering.
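
To make the second work's flavor concrete, here is a minimal sketch of recasting the k-means objective as a differentiable feedforward layer (the temperature-softmax assignment and gradient-descent training of centroids are illustrative assumptions; the paper's actual network may differ):

```python
import torch
import torch.nn as nn

class SoftKMeans(nn.Module):
    def __init__(self, n_clusters, dim, temperature=0.1):
        super().__init__()
        self.centroids = nn.Parameter(torch.randn(n_clusters, dim))
        self.temperature = temperature

    def forward(self, x):
        d2 = torch.cdist(x, self.centroids) ** 2                # (N, K) squared distances
        assign = torch.softmax(-d2 / self.temperature, dim=1)   # soft assignments
        loss = (assign * d2).sum(dim=1).mean()                  # soft k-means objective
        return assign, loss
```

Minimizing `loss` with any optimizer over `centroids` performs clustering, and the layer can sit inside a larger network trained end to end.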

Speaker Bio: 彭玺 received his Ph.D. from the College of Computer Science at Sichuan University in December 2013 and was recruited back to Sichuan University as a distinguished research fellow in 2017. From 2014 to 2017 he was a Scientist at the Institute for Infocomm Research, A*STAR, Singapore. His research interests include deep learning, subspace learning and clustering, and their applications in computer vision and data analysis. He has served as a guest editor of IEEE Transactions on Neural Networks and Learning Systems and Image and Vision Computing, a tutorial organizer at ECCV 2016, a session chair at AAAI 2017, and an area chair at IJCAI 2017 and VCIP 2017.

  刘日升, Dalian University of Technology

Talk Title: From Optimization to Deep Architecture Design: Unrolling with Strict Convergence Guarantee

Abstract: Deep learning models have gained great success in many real-world applications. However, most existing networks are designed in heuristic manners and thus lack rigorous mathematical principles and derivations. Several recent studies build deep structures by unrolling a particular optimization model that involves task information. Unfortunately, due to the dynamic nature of network parameters, the resulting deep propagation networks do not possess the nice convergence property of the original optimization scheme. This talk introduces a series of new ideas for establishing deep models by integrating experimentally verified network architectures and rich cues of the tasks. More importantly, we prove in theory that 1) the propagation generated by our unrolled deep model globally converges to a critical point of a given variational energy, and 2) the proposed framework can still learn priors from training data to generate a convergent propagation even when task information is only partially available. Indeed, these theoretical results are the best we can ask for unless stronger assumptions are enforced.
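
A standard concrete instance of "unrolling a particular optimization model" is turning T iterations of ISTA for the LASSO into a feedforward network with learnable step sizes. The sketch below shows only that baseline construction; the talk's models add experimentally verified architectures and the convergence guarantees discussed above.

```python
import torch
import torch.nn as nn

class UnrolledISTA(nn.Module):
    # Unrolls min_x 0.5*||Ax - y||^2 + lam*||x||_1 for a fixed matrix A.
    def __init__(self, A, n_steps=10, lam=0.1):
        super().__init__()
        self.register_buffer('A', A)
        self.lam = lam
        self.steps = nn.Parameter(torch.full((n_steps,), 0.1))  # learnable step sizes

    def forward(self, y):
        x = torch.zeros(self.A.size(1), device=y.device)
        for t in self.steps:
            grad = self.A.t() @ (self.A @ x - y)     # data-term gradient
            z = x - t * grad                         # gradient step
            x = torch.sign(z) * torch.clamp(z.abs() - t * self.lam, min=0.0)  # soft-threshold
        return x
```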

Speaker Bio: 刘日升 is an associate professor in the International School of Information Science and Software Engineering at Dalian University of Technology. He received his B.S. and Ph.D. from Dalian University of Technology and was a postdoctoral fellow at The Hong Kong Polytechnic University. His research focuses on learnable optimization algorithms for vision problems, especially modeling and solving non-convex, non-smooth optimization with deep models. In recent years he has published over 70 papers in major journals (TPAMI, TIP, TNNLS, TMM, Machine Learning, etc.) and conferences (CVPR, NIPS, AAAI, ACM MM, ECCV, CIKM, ICDM, etc.), with over 1,200 citations in total and over 500 citations for his most-cited paper. He won the Best Student Paper Award at ICME (CCF-B) in both 2014 and 2015, had two papers shortlisted as ICME 2017 Best Paper Finalists (top 3%), and received a Best 10% Paper Award at ICIP 2015 (CCF-C), a Best Paper nomination at ICIMCS 2017, and an IEEE technical committee Publication Spotlight. He received a second prize in natural science from the Ministry of Education (ranked third) and a second prize in natural science from Liaoning Province (ranked third). He is an associate editor of The Visual Computer (CCF-C), IET Image Processing (CCF-C), and Journal of Electronic Imaging, a PC member or reviewer for CVPR, ICCV, ECCV, NIPS, IJCAI, AAAI, ACCV, BMVC, ICIP and other conferences, and a reviewer for IJCV, TPAMI, TIP, TNNLS, TKDE, TCSVT, TPDS and other journals. He is a member of the CCF technical committees on computer vision and multimedia, the CSIG technical committees on machine vision and multimedia, and CCF YOCSEF Dalian.

  王井东, MSRA

Talk Title: Interleaved Structured Sparse Convolution for Efficient and Effective Deep Neural Networks

Abstract: Eliminating the redundancy in convolution kernels has been attracting increasing interest for designing efficient convolutional neural network architectures with three goals: small models, fast computation, and high accuracy. Existing solutions include low-precision kernels, structured sparse kernels, low-rank kernels, and products of low-rank kernels. In this talk, I will introduce a novel framework, Interleaved Structured Sparse Convolution (ISSC), which uses the product of structured sparse kernels to compose a dense convolution kernel. It is a drop-in replacement for normal convolution and can be applied to any network that depends on convolution. I present the complementary condition and the balance condition to guide the design of structured sparse kernels, obtaining a balance among three aspects: model size, computation complexity and classification performance. I will show empirical and theoretical justification of the advantage of the proposed approach over Xception and MobileNet. In addition, ISSC raises a rarely-studied matrix decomposition problem: sparse matrix factorization (SMF). I expect more research effort in SMF from researchers in the area of matrix analysis.
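
The "product of structured sparse kernels" can be sketched with grouped convolutions plus a channel permutation, in the spirit of interleaved group convolutions (a hedged illustration; ISSC's complementary and balance conditions constrain the group counts, and the actual design may differ):

```python
import torch
import torch.nn as nn

def channel_shuffle(x, groups):
    # Permute channels so each consecutive block mixes all groups.
    n, c, h, w = x.size()
    return x.view(n, groups, c // groups, h, w).transpose(1, 2).reshape(n, c, h, w)

class InterleavedSparseConv(nn.Module):
    # Two block-diagonal (grouped) kernels whose composition, thanks to the
    # interleaving permutation, connects every input channel to every output.
    def __init__(self, channels, g1=4, g2=4):   # channels divisible by g1, g2
        super().__init__()
        self.g1 = g1
        self.conv1 = nn.Conv2d(channels, channels, 3, padding=1, groups=g1)
        self.conv2 = nn.Conv2d(channels, channels, 1, groups=g2)

    def forward(self, x):
        x = self.conv1(x)                 # spatial conv, block-sparse in channels
        x = channel_shuffle(x, self.g1)   # interleave channels across groups
        return self.conv2(x)              # 1x1 conv with complementary sparsity
```

After the shuffle, each group of the 1x1 convolution sees channels drawn from different groups of the first convolution, which is the sense in which the product of two block-sparse kernels acts as a dense kernel.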

Speaker Bio: Jingdong Wang is a Senior Researcher at the Visual Computing Group, Microsoft Research, Beijing, China. His areas of interest include computer vision, machine learning, and multimedia. He is currently working on CNN architecture design, human understanding, person re-identification, multimedia search, and large-scale indexing. He has served or will serve as an area chair of IJCAI 2018, ACM MM 2018, ICPR 2018, AAAI 2018, ICCV 2017, CVPR 2017, ECCV 2016, ACM MM 2015 and ICME 2015, and a track chair of ICME 2012. He is an editorial board member of IEEE Transactions on Circuits and Systems for Video Technology, IEEE Transactions on Multimedia, and Multimedia Tools and Applications. He has shipped 10+ technologies to Microsoft products, including the XiaoIce chatbot, Microsoft Cognitive Services, and Bing search.

  陈添水, Sun Yat-sen University

Talk Title: Image Distillation for Deep Neural Networks Acceleration

Abstract: Accelerating deep neural networks (DNNs) has been attracting increasing attention, as it can benefit a wide range of applications, e.g., enabling mobile systems with limited computing resources to own powerful visual recognition abilities. A practical strategy toward this goal usually relies on a two-stage process: operating on the trained DNNs (e.g., approximating the convolutional filters with tensor decomposition) and fine-tuning the amended network, leading to difficulty in balancing the trade-off between acceleration and recognition performance. In this work, we propose a general and comprehensive approach to network acceleration that distills the knowledge of the original input image into low-resolution channels (sub-images). Specifically, we develop a Wavelet-like Auto-Encoder (WAE) that decomposes the original input image into two low-resolution channels and incorporate the WAE into the classification network for joint training. The two decomposed channels are encoded to carry the low-frequency information (e.g., image profiles) and the high-frequency information (e.g., image details or noise), respectively, and enable reconstructing the original input image through the decoding process. We then feed the low-frequency channel into a standard classification network such as VGG or ResNet and employ a very lightweight network to fuse it with the high-frequency channel to obtain the classification result. During training, we use the class probabilities predicted on the original input images as soft targets, enabling the transfer of their knowledge to help improve classification results. Compared with existing DNN acceleration solutions, our framework has two advantages: i) it is tolerant to any existing convolutional neural network for classification without amending its structure; ii) the WAE provides an interpretable way to preserve the main components of the input image for classification.
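
A hedged sketch of the wavelet-like auto-encoder's overall shape (layer sizes and the plain reconstruction loss are illustrative, not the paper's): the input image is encoded into two half-resolution 3-channel maps that together must reconstruct it; the low-frequency channel alone then feeds the classification network.

```python
import torch
import torch.nn as nn

class WaveletLikeAE(nn.Module):
    def __init__(self):
        super().__init__()
        # Two encoders, each halving resolution: low- and high-frequency channels.
        self.enc_low = nn.Sequential(nn.Conv2d(3, 16, 3, stride=2, padding=1),
                                     nn.ReLU(), nn.Conv2d(16, 3, 3, padding=1))
        self.enc_high = nn.Sequential(nn.Conv2d(3, 16, 3, stride=2, padding=1),
                                      nn.ReLU(), nn.Conv2d(16, 3, 3, padding=1))
        # Decoder reconstructs the full-resolution input from both channels.
        self.dec = nn.Sequential(nn.ConvTranspose2d(6, 16, 4, stride=2, padding=1),
                                 nn.ReLU(), nn.Conv2d(16, 3, 3, padding=1))

    def forward(self, img):
        low, high = self.enc_low(img), self.enc_high(img)
        recon = self.dec(torch.cat([low, high], dim=1))
        return low, high, recon   # train with ||recon - img||^2 plus constraints
```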

Speaker Bio: Tianshui Chen received his B.S. degree from the School of Information and Science Technology, Sun Yat-sen University, Guangzhou, China, in 2013, where he is currently pursuing his Ph.D. degree in computer science in the School of Data and Computer Science. His current research interests include computer vision and machine learning. He was the recipient of the World's FIRST 10K Best Paper Award (Diamond Award) at ICME 2017.

  程明明, Nankai University

Talk Title: Pixel-Level Semantic Recognition in Internet Images

Abstract: Understanding the semantic information in images is a fundamental problem in computer vision with important applications in many fields. Despite rapid recent progress, the best existing solutions rely heavily on massive amounts of high-quality image annotation. By contrast, humans effortlessly learn to perform high-accuracy semantic recognition and object extraction autonomously, for example by searching the web. Motivated by this observation, the speaker will start from category-agnostic semantic feature extraction techniques such as salient object detection, image segmentation, and edge extraction, and then describe how such category-agnostic image semantic features can reduce the dependence on precise annotation during semantic learning, ultimately enabling image semantic understanding without any explicit manual annotation.

Speaker Bio: 程明明 received his Ph.D. from Tsinghua University in 2012, then carried out computer vision research in Oxford, UK, and returned to China as faculty in 2014. He is now a professor at Nankai University, a Young Top-Notch Talent of the national Ten Thousand Talents Program, and a member of the CAST Young Elite Scientists Sponsorship Program. His research covers computer graphics, computer vision, and image processing. He has published over 30 papers in CCF rank-A venues such as IEEE PAMI and ACM TOG. His work is widely recognized at home and abroad, with over 7,000 citations in total and over 2,000 citations for his most-cited paper, and has been covered by authoritative international media including the BBC, The Daily Telegraph, Der Spiegel, and The Huffington Post.

  刘偲, Institute of Information Engineering, CAS

Talk Title: Image Understanding and Editing

Abstract: Deep learning based image and video analysis has achieved great success in recent years. Compared with traditional object classification and recognition, pixel-level semantic understanding of images, also known as semantic segmentation, provides much richer pixel-level information and has become a new research focus. Taking three typical instances of semantic segmentation (scene parsing, face parsing, and human parsing) as entry points, this talk presents our work on two challenges of semantic segmentation. First, reducing annotation effort: in many practical settings images are large and label sets are rich, making purely manual per-pixel annotation expensive and inefficient; we propose a series of unsupervised, semi-supervised, and weakly supervised semantic segmentation algorithms that greatly reduce the annotation workload without sacrificing accuracy. Second, improving segmentation accuracy: by jointly exploiting contextual information, such as the co-occurrence and mutual exclusion of semantic labels and the complementarity of different information sources, we substantially improve segmentation accuracy. Finally, we will show applications of semantic segmentation in smart cameras, video surveillance, smart homes, and e-commerce search.

Speaker Bio: 刘偲 is an associate professor at the Institute of Information Engineering, Chinese Academy of Sciences. She received her bachelor's degree from a university-level honors class at Beijing Institute of Technology and her Ph.D. from the Institute of Automation, CAS, and worked as a research assistant and postdoctoral fellow at the National University of Singapore. Her research covers computer vision and multimedia analysis, centered on human-centric analysis in images and videos, where she has built a fairly complete body of work. She is a member of the CAST Young Elite Scientists Sponsorship Program (2017-2019), a researcher in Microsoft Research Asia's StarTrack Program, and a recipient of a CCF-Tencent Rhino-Bird research grant.

  魏云超, University of Illinois at Urbana-Champaign

Talk Title: Towards Weakly- and Semi-Supervised Object Localization and Semantic Segmentation

Abstract: Over the past few years, the great success of CNNs in object detection and image semantic segmentation has relied on large amounts of human annotation. However, collecting annotations such as bounding boxes and segmentation masks is very costly. To reduce the demand for money and human effort, in this talk Dr. Yunchao Wei will introduce his recent works, which utilize weak information as supervision to address the more challenging object localization and semantic segmentation tasks. In particular, he proposes several novel solutions to produce dense object localization maps using only image-level labels as supervision. The dense object localization maps successfully build the relationship between image-level labels and pixels, and effectively boost the accuracy of localization and segmentation tasks. His works are published in top-tier journals and conferences (e.g., T-PAMI and CVPR) and achieve state-of-the-art performance.
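
As a generic reference point for "dense object localization maps from image-level labels", the block below sketches standard class activation mapping (Zhou et al., CVPR 2016); it illustrates the family of techniques rather than Dr. Wei's specific methods.

```python
import torch

def class_activation_map(features, fc_weight, class_idx):
    # features: (C, H, W) conv features before global average pooling;
    # fc_weight: (num_classes, C) weights of the image-level classifier.
    cam = torch.einsum('c,chw->hw', fc_weight[class_idx], features)
    cam = torch.relu(cam)              # keep positive evidence only
    return cam / (cam.max() + 1e-8)    # normalized (H, W) localization map
```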

Speaker Bio: Yunchao Wei is currently a postdoctoral researcher in the Beckman Institute at the University of Illinois at Urbana-Champaign, working with Prof. Thomas Huang. He received his Ph.D. degree from Beijing Jiaotong University in 2016, advised by Prof. Yao Zhao. He received Excellent Doctoral Dissertation Awards from the Chinese Institute of Electronics (CIE) and Beijing Jiaotong University in 2016, the winner prize of the object detection task (1a) in ILSVRC 2014, and the runner-up prizes of all video object detection tasks in ILSVRC 2017. His current research focuses on computer vision techniques for large-scale data analysis, in particular weakly- and semi-supervised object recognition, multi-label image classification, video object detection and multi-modal analysis.

  王兴刚, Huazhong University of Science and Technology

Talk Title: Weakly-Supervised Semantic Segmentation Network with Deep Seeded Region Growing

Abstract: Recent state-of-the-art methods on this problem first infer sparse, discriminative regions for each object class using deep classification networks, then train semantic segmentation networks using the discriminative regions as supervision. Inspired by the traditional seeded region growing method in image segmentation, we propose to train semantic segmentation networks starting from the discriminative regions and progressively expand the pixel-level supervision using the idea of seeded region growing. The seeded region growing module is integrated into a deep segmentation network and benefits from deep features. Unlike conventional deep networks with fixed, static supervision, the proposed weakly-supervised network produces new labels for its input data using the contextual information within an image. The proposed method significantly outperforms weakly-supervised semantic segmentation methods that use static supervision, and obtains state-of-the-art performance on both PASCAL VOC 2012 and COCO: 63.2% mIoU on the PASCAL VOC 2012 test set and 26.0% mIoU on COCO.
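
The classical seeded region growing rule that inspires the method is easy to state in code (a generic sketch; the paper integrates a differentiable variant into the segmentation network, with its own growing criterion):

```python
import numpy as np
from collections import deque

def seeded_region_growing(prob, seeds, threshold=0.85):
    # prob: (H, W) class probability map; seeds: boolean (H, W) seed mask.
    h, w = prob.shape
    grown = seeds.copy()
    queue = deque(zip(*np.nonzero(seeds)))
    while queue:
        y, x = queue.popleft()
        for dy, dx in ((1, 0), (-1, 0), (0, 1), (0, -1)):   # 4-connectivity
            ny, nx = y + dy, x + dx
            if 0 <= ny < h and 0 <= nx < w and not grown[ny, nx] \
               and prob[ny, nx] > threshold:
                grown[ny, nx] = True          # absorb confident neighbour
                queue.append((ny, nx))
    return grown   # enlarged pixel-level supervision for the next round
```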

Speaker Bio: 王兴刚 is a lecturer in the School of Electronic Information and Communications at Huazhong University of Science and Technology. His research covers computer vision and machine learning, especially object detection and deep learning. He received his B.S. (2009) and Ph.D. (2014) from Huazhong University of Science and Technology. He has published over 50 papers to date, including in top international conferences (ICML, NIPS, CVPR, ICCV, ECCV) and journals (IEEE TIP, Information Sciences, Pattern Recognition, Neural Computation, etc.), with over 1,000 Google Scholar citations. He received a Microsoft Fellowship in 2012 and was selected for the CAST Young Elite Scientists Sponsorship Program in 2015.

  董超, SenseTime

Talk Title: Reinforcement Learning Based Image Restoration and Semantic Image Super-Resolution

Abstract: We introduce two of our recent works (published at CVPR 2018) on low-level vision problems. In the first paper, we investigate a novel approach to image restoration via reinforcement learning. Unlike existing studies that mostly train a single large network for a specialized task, we prepare a toolbox of small-scale convolutional networks of different complexities, specialized in different tasks. Our method, RL-Restore, then learns a policy to select appropriate tools from the toolbox to progressively restore the quality of a corrupted image. Compared with conventional human-designed networks, RL-Restore can restore images corrupted by complex and unknown distortions in a more parameter-efficient manner using the dynamically formed toolchain. In the second paper, we address semantic super-resolution and show that it is possible to recover textures faithful to semantic classes. In particular, we only need to modulate the features of a few intermediate layers in a single network, conditioned on semantic segmentation probability maps. This is made possible by a novel Spatial Feature Modulation (SFM) layer that generates affine transformation parameters for spatial-wise feature modulation. Our final results show that an SR network equipped with SFM can generate more realistic and visually pleasing textures than state-of-the-art methods.
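
The second paper's modulation layer is straightforward to sketch in PyTorch (layer shapes are assumptions; shown as an illustration of the mechanism, not the released implementation): small 1x1 convolutions map segmentation probability maps to per-pixel affine parameters that modulate intermediate SR features.

```python
import torch
import torch.nn as nn

class SpatialFeatureModulation(nn.Module):
    def __init__(self, n_classes, n_features):
        super().__init__()
        self.to_gamma = nn.Conv2d(n_classes, n_features, 1)  # per-pixel scale
        self.to_beta = nn.Conv2d(n_classes, n_features, 1)   # per-pixel shift

    def forward(self, feat, seg_prob):
        # feat: (N, n_features, H, W); seg_prob: (N, n_classes, H, W),
        # spatially aligned with feat.
        return self.to_gamma(seg_prob) * feat + self.to_beta(seg_prob)
```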

Speaker Bio: 董超 is a senior research manager at SenseTime Research, where he leads the image and video quality team, working on super-resolution, denoising, and enhancement algorithms and their deployment in products. He received his bachelor's degree from Beijing Institute of Technology and his Ph.D. from The Chinese University of Hong Kong, supervised by Prof. 汤晓鸥 and Prof. 吕建勤. He was the first to apply deep learning to image super-resolution (2014) and to compression artifact removal (2015), and the first to apply reinforcement learning to image restoration (2018). His papers have been cited over 1,500 times; his journal paper on SRCNN was selected as a TPAMI "Most Popular Article" from March to August 2016. He was nominated for the CUHK outstanding doctoral thesis award in 2016, and his team took 2nd place in the NTIRE 2017 image super-resolution challenge at CVPR. In 2017 he founded SenseTime's image quality team to bring deep learning based image processing into real products; during 2017-2018 the team delivered single-image super-resolution and portrait beautification algorithms to well-known Chinese smartphone makers and developed a multi-frame 4K image quality solution for real-world photography.

  张兆翔, Institute of Automation, Chinese Academy of Sciences

Talk Title: Brain-Inspired Neural Network Modeling and Learning

Abstract: Pattern recognition methods exemplified by deep learning have achieved remarkable success in many vision applications, even rivaling human performance, yet compared with biological pattern recognition systems they still show clear deficits in adaptability, generalization, and multi-task cooperation. Seeking inspiration from the brain's neural information processing mechanisms, cognition, and behavior promises to guide better neural network modeling and more robust human-like learning, and is of great research and application value. After an overview of current deep learning methods, this talk presents our recent work on brain-inspired neural network modeling and learning, covering structural modeling of neural networks, multi-task neural architecture learning, audio-visual modality analysis and integration, knowledge distillation, and multi-agent collaboration.

Speaker Bio: 张兆翔, Ph.D., is a professor and Ph.D. supervisor at the Institute of Automation, Chinese Academy of Sciences, a young core member of the CAS Center for Excellence in Brain Science and Intelligence Technology, an IEEE Senior Member, a member of CCF YOCSEF, and a member of the technical committees on computer vision, on pattern recognition and artificial intelligence, and of the CAAI pattern recognition committee. He received his bachelor's degree in circuits and systems from the University of Science and Technology of China in 2004 and his Ph.D. from the Institute of Automation, CAS in 2009. In 2015 he became a professor at the institute's Research Center for Brain-Inspired Intelligence. He has long worked on intelligent visual surveillance, recently focusing on visual computing models that combine brain-inspired intelligence and human-like learning mechanisms, with systematic work on modeling available information and model-based object recognition; his results have been successfully deployed in systems serving national public security and smart city administration, with notable social and economic impact. In the past five years he has published over 100 papers in mainstream journals and conferences, including over 40 SCI journal papers. He has served on the program committees of international conferences including ICPR, IJCNN, AVSS, and PCM; he is an editorial board member of Neurocomputing and IEEE Access, a guest editor of Pattern Recognition Letters, a young editorial board member of Frontiers of Computer Science, and a reviewer for more than 20 journals in the field, including TPAMI, TIP, TCSVT, and PR. He was selected for the Ministry of Education's New Century Excellent Talents program, the Beijing Young Talents program, and Microsoft Research Asia's StarTrack program.

  唐华锦, Sichuan University

Talk Title: Neuromorphic Visual Computing

Abstract: Emulating brain intelligence has long been a goal of computer science. Neuromorphic computing, driven mainly by advances in neuroscience, is a new computing paradigm built on the brain's neural circuit structure and on the principles of neural information processing and spike-based computation, ultimately realizing brain-like cognitive computing and low-power operation in neuromorphic hardware. Starting from the main open problems that neuromorphic computing needs to solve, this talk surveys the latest progress in neuromorphic visual computing.

Speaker Bio: Professor 唐华锦 studied at Zhejiang University (undergraduate, 1994-1998) and Shanghai Jiao Tong University (master's, 1998-2001), and received his Ph.D. from the Department of Computer Engineering at the National University of Singapore in 2004. He was an R&D engineer at STMicroelectronics (2004-2006) and a postdoctoral researcher at the Queensland Brain Institute, University of Queensland, Australia (2006-2008). From 2008 he was head of the cognitive computing and robot cognition lab at the Institute for Infocomm Research, Singapore, and since 2014 he has directed the Center for Brain-Inspired Computing Research in the College of Computer Science at Sichuan University. His research covers brain-inspired computing, neuromorphic chips, intelligent hardware, and intelligent robotics. He received the 2016 IEEE TNNLS Outstanding Paper Award, and his brain-inspired GPS model was highly praised by MIT Technology Review. He was selected for the national Thousand Youth Talents Plan, and serves as an Associate Editor of IEEE Transactions on Neural Networks and Learning Systems, IEEE Transactions on Cognitive and Developmental Systems, and Frontiers in Neuromorphic Engineering.

  何晖光, Institute of Automation, Chinese Academy of Sciences

Talk Title: Multimodal Emotion Recognition and Cross-Subject Transfer Learning

Abstract: Research on artificial intelligence needs to emphasize affective interaction and affective computing between robots and humans in order to achieve emotional intelligence. Emotion recognition is a basic problem of affective computing. Emotions are related to biochemical and electrical signals, so EEG can assist in identifying emotional states; however, individual differences in EEG mean that data collection, labeling, and training must be repeated for each subject, limiting convenience in applications. This talk introduces several of our works on emotion recognition: 1) rapid model adaptation from a small number of labeled samples; 2) transfer learning from large amounts of unlabeled samples; 3) semi-supervised emotion recognition from multimodal signals.

Speaker Bio: 何晖光 is a professor and Ph.D. supervisor at the Institute of Automation, Chinese Academy of Sciences, head and adjunct professor of the Brain Cognition and Intelligent Medicine section in the School of Artificial Intelligence, University of Chinese Academy of Sciences, and an outstanding member of the CAS Youth Innovation Promotion Association. He received his bachelor's (1994) and master's (1997) degrees from Dalian Maritime University and his Ph.D. from the Institute of Automation, CAS in 2002. He has led five National Natural Science Foundation projects (including one key project) and two 863 program projects. His honors include two National Science and Technology Progress second prizes (ranked second and third, respectively), two Beijing Science and Technology Progress awards, the first CAS Outstanding Doctoral Dissertation award, the Beijing Nova program, the CAS Lu Jiaxi Young Talent Award, and a Ministry of Education Science and Technology Progress first prize in 2017 (ranked third). His research covers brain science, artificial intelligence, medical image processing, and brain-computer interfaces, with over 100 papers in core journals and mainstream international conferences in these fields, including NeuroImage, Human Brain Mapping, Pattern Recognition, and MICCAI.

  刘健, University of Leicester, UK

Talk Title: Retinal Computation: Neuroscience, Neuroprosthesis and Neurorobotics

Abstract: The retina is the first stage of visual processing in the brain. However, how natural scenes are encoded by retinal ganglion cells (RGCs) remains unsolved. Here, I will discuss this topic by studying recorded RGC spiking responses. We developed data-driven approaches, spike-triggered non-negative matrix factorization and deep learning networks, for characterizing the underlying mechanisms of retinal visual processing. I will further demonstrate how these computational principles of neuroscience can be translated to neuromorphic chips for the next generation of artificial retinas, to enhance the performance of neuroprosthesis and neurorobotics.
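
Spike-triggered non-negative matrix factorization can be approximated with an off-the-shelf NMF, as in the simplified sketch below (assumes non-negative stimuli; the published method differs in details such as the semi-NMF variant and regularization):

```python
import numpy as np
from sklearn.decomposition import NMF

def spike_triggered_nmf(stimuli, spike_counts, n_modules=4):
    # stimuli: (n_frames, n_pixels) non-negative stimulus frames;
    # spike_counts: (n_frames,) spikes evoked by each frame.
    fired = spike_counts > 0
    ste = stimuli[fired] * spike_counts[fired][:, None]  # spike-triggered ensemble
    model = NMF(n_components=n_modules, init='nndsvda', max_iter=500)
    weights = model.fit_transform(ste)   # per-spike module activations
    subunits = model.components_         # candidate subunit filters
    return weights, subunits
```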

Speaker Bio: Dr. Jian Liu received his Ph.D. in mathematics from UCLA. He is currently an Assistant Professor/Lecturer of Systems Neuroscience at the University of Leicester, UK. His research includes computational neuroscience and brain-inspired computation for artificial intelligence. His work has been published in Nature Communications, eLife, Journal of Neuroscience, PLOS Computational Biology, IEEE TNNLS, etc.

  陈霸东, Xi'an Jiaotong University

Talk Title: Neural Decoding from fMRI

Abstract: With recent advances in cognitive psychology and cognitive neuroscience, and in particular the advent of brain imaging techniques such as EEG, fMRI, and MEG, scientific decoding of the mind has become possible. This talk introduces the concepts and state of the art of fMRI-based neural encoding and decoding and their applications in cognitive science, discusses the main encoding and decoding approaches, and highlights new models and methods proposed by the speaker's group, such as brain-inspired spiking neural networks and the CorrNet model for reconstructing visual stimuli.

Speaker Bio: 陈霸东 is a professor and Ph.D. supervisor at Xi'an Jiaotong University and a distinguished professor under the Shaanxi Hundred Talents Program. He received his Ph.D. in computer science from Tsinghua University in 2008, was a postdoctoral researcher in the Department of Precision Instruments and Mechanology at Tsinghua University (July 2008 to September 2010) and in the Department of Electrical and Computer Engineering at the University of Florida (October 2010 to September 2012), and was a visiting scientist at Nanyang Technological University in July and August 2015. His research interests include signal processing, machine learning, artificial intelligence, and brain-computer interfaces. He has published over 200 papers, including more than 120 SCI journal papers in venues such as IEEE TSP, IEEE TNNLS, IEEE SPL, and Automatica. His first-authored English monograph (Elsevier) was named a Notable Book of 2013 by Computing Reviews. His papers have over 2,300 Google Scholar citations, and 8 are ESI highly cited papers. He is an IEEE Senior Member and an editorial board member of IEEE TNNLS, IEEE TCDS, Journal of the Franklin Institute, and Entropy. As PI he has led several major research projects, including NSFC Young Scientists, General, and Key projects and a 973 subproject.

  胡晓林, Tsinghua University

Talk Title: Deep Learning Predicts Correlation between a Functional Signature of Higher Visual Areas and Sparse Firing of Neurons

Abstract: Visual information in the visual cortex is processed in a hierarchical manner. Recent studies show that higher visual areas, such as V2, V3, and V4, respond more vigorously to images with naturalistic higher-order statistics than to images lacking them. This property is a functional signature of higher areas, as it is much weaker or even absent in the primary visual cortex (V1). However, the mechanism underlying this signature remains elusive. We studied this problem using computational models. In several typical hierarchical visual models, including AlexNet, VGGNet and SHMAX, this signature was found to be prominent in higher layers but much weaker in lower layers. By changing both the model structure and the experimental settings, we found that the signature correlated strongly with sparse firing of units in higher layers but not with any other factor, including model structure, training algorithm (supervised or unsupervised), receptive field size, and properties of the training stimuli. The results suggest an important role of sparse neuronal activity underlying this special feature of higher visual areas.
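
For reference, one common way to quantify the "sparse firing" invoked above is the Treves-Rolls sparseness index, sketched below (the talk may use a different measure):

```python
import numpy as np

def treves_rolls_sparseness(responses, eps=1e-12):
    # responses: (n_units,) non-negative activations to one stimulus.
    # Returns a value in [0, 1]; higher means sparser firing.
    r = np.maximum(responses, 0)
    a = (r.mean() ** 2) / ((r ** 2).mean() + eps)  # activity ratio
    n = r.size
    return (1 - a) / (1 - 1 / n)
```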

Speaker Bio: 胡晓林 received his Ph.D. in automation and computer-aided engineering from The Chinese University of Hong Kong in 2007, then did postdoctoral research in the Department of Computer Science and Technology at Tsinghua University, where he joined the faculty in 2009 and is now an associate professor. His research covers artificial neural networks and computational neuroscience, with particular interest in developing brain-inspired computational models and revealing how the brain processes visual and auditory information. He has published over 70 papers in international journals including IEEE Transactions on Neural Networks and Learning Systems, IEEE Transactions on Image Processing, IEEE Transactions on Cybernetics, PLoS ONE, Neural Computation, European Journal of Neuroscience, Journal of Neurophysiology, Frontiers in Human Neuroscience, and Frontiers in Computational Neuroscience, and at international conferences including CVPR, NIPS, and AAAI. He is an editorial board member of IEEE Transactions on Neural Networks and Learning Systems.