“阿法狗”之父:关于围棋,人类3000年来犯了一个错

作者:刘秀云 

来源:澎湃新闻网

AlphaGo之父杰米斯·哈萨比斯(Demis Hassabis)在母校英国剑桥大学做了一场题为“超越人类认知的极限”的演讲,解答了世人对于人工智能,对于阿尔法狗的诸多疑问——过去3000年里人类低估了棋局哪个区域的重要性?阿尔法狗去年赢了韩国职业九段李世石靠哪几个绝招?今年年初拿下数位国际大师的神秘棋手Master究竟是不是阿尔法狗?为什么围棋是人工智能难解之谜?

杰米斯·哈萨比斯,Deep Mind创始人, AlphaGo之父。

杰米斯·哈萨比斯,Deep Mind创始人,AlphaGo(阿尔法狗)之父, 4岁开始下象棋,8岁时在棋盘上的成功促使他开始思考两个至今令他困扰的问题:第一,人脑是如何学会完成复杂任务的?第二,电脑能否做到这一点?17岁时,哈萨比斯就负责了经典模拟游戏《主题公园》的开发,并在1994年发布。他随后读完了剑桥大学计算机科学学位,2005年进入伦敦大学学院,攻读神经科学博士学位,希望了解真正的大脑究竟是如何工作的,以此促进人工智能的发展。2014年他创办公司Deep Mind, 公司产品阿尔法狗在2016年大战围棋冠军李世石事件上一举成名。

哈萨比斯在当天的演讲中透露了韩国棋手李世石去年输给阿尔法狗的致命原因,他最后也提到了阿尔法狗即将迎战的中国棋手柯洁,他说,“柯洁也在网上和阿尔法狗对决过,比赛之后柯洁说人类已经研究围棋研究了几千年了,然而人工智能却告诉我们,我们甚至连其表皮都没揭开。异曲同工,柯洁提到了围棋的真理,我们在这里谈的是科学的真理。”

AlphaGo(阿尔法狗)之父在剑桥大学历时45分钟的演讲,干货满满,请不要漏掉任何一个细节:

非常感谢大家今天能够到场,今天,我将谈谈人工智能,以及DeepMind近期在做些什么,我把这场报告命名为“超越人类认知的极限”,我希望到了报告结束的时候,大家都清晰了解我想传达的思想。

1.你真的知道什么是人工智能吗?

对于不知道DeepMind公司的朋友,我做个简单介绍,我们是在2010年于伦敦成立了这家公司,在2014年我们被谷歌收购,希望借此加快我们人工智能技术的脚步。我们的使命是什么呢?我们的首要使命便是解决人工智能问题;一旦这个问题解决了,理论上任何问题都可以被解决。这就是我们的两大使命了,听起来可能有点狡猾,但是我们真的相信,如果人工智能最基本的问题都解决了的话,没有什么问题是困难的。

那么我们准备怎样实现这个目标呢?DeepMind现在在努力制造世界上第一台通用学习机,大体上学习可以分为两类:一种就是直接从输入和经验中学习,没有既定的程序或者规则可循,系统需要从原始数据自己进行学习;第二种学习系统就是通用学习系统,指的是一种算法可以用于不同的任务和领域,甚至是一些从未见过的全新领域。大家肯定会问,系统是怎么做到这一点的?

其实,人脑就是一个非常明显的例子,这是可能的,关键在于如何通过大量的数据资源,寻找到最合适的解决方式和算法。我们把这种系统叫做通用人工智能,来区别于如今我们当前大部分人在用的仅在某一领域发挥特长的狭义人工智能,这种狭义人工智能在过去的40-50年非常流行。

IBM 发明的深蓝系统(Deep Blue)就是一个很好的狭义人工智能的例子,他在上世纪90年代末期曾打败了国际象棋冠军加里·卡斯帕罗夫(Gary Kasporov) 。如今,我们到了人工智能的新的转折点,我们有着更加先进、更加匹配的技术。

1997年5月,IBM与世界国际象棋冠军加里·卡斯帕罗夫对决

2.如何让机器听从人类的命令?

大家可能想问机器是如何听从人类的命令的,其实并不是机器或者算法本身,而是一群聪明的编程者智慧的结晶。他们与每一位国际象棋大师对话,汲取他们的经验,把其转化成代码和规则,组建了人类最强的象棋大师团队。但是这样的系统仅限于象棋,不能用于其他游戏。对于新的游戏,你需要重新开始编程。在某种程度上,这些技术仍然不够完美,并不是传统意义上的完全人工智能,其中所缺失的就是普适性和学习性。我们想通过“增强学习”来解决这一难题。在这里我解释一下增强学习,我相信很多人都了解这个算法。

首先,想像一下有一个主体,在AI领域我们称我们的人工智能系统为主体,它需要了解自己所处的环境,并尽力找出自己要达到的目的。这里的环境可以指真实事件,可以是机器人,也可以是虚拟世界,比如游戏环境;主体通过两种方式与周围环境接触;它先通过观察熟悉环境,我们起初通过视觉,也可以通过听觉、触觉等,我们也在发展多感觉的系统;

第二个任务,就是在此基础上,建模并找出最佳选择。这可能涉及到对未来的预期,想像,以及假设检验。这个主体经常处在真实环境中,当时间节点到了的时候,系统需要输出当前找到的最佳方案。这个方案可能或多或少会改变所处环境,从而进一步驱动观察的结果,并反馈给主体。

简单来说,这就是增强学习的原则,示意图虽然简单,但是其中却涉及了极其复杂的算法和原理。如果我们能够解决大部分问题,我们就能够搭建普适人工智能。这是因为两个主要原因:首先,从数学角度来讲,我的合伙人,一名博士,他搭建了一个系统叫‘AI-XI’,用这个模型,他证明了在计算机硬件条件和时间无限的情况下,搭建一个普适人工智能,需要的信息。另外,从生物角度来讲,动物和人类等,人类的大脑是多巴胺控制的,它在执行增强学习的行为。因此,不论是从数学的角度,还是生物的角度,增强学习是一个有效的解决人工智能问题的工具。

3.为什么围棋是人工智能难解之谜?

接下来,我要主要讲讲我们最近的技术,那就是去年诞生的阿尔法狗;希望在座的大家了解这个游戏,并尝试玩玩,这是个非常棒的游戏。围棋使用方形格状棋盘及黑白二色圆形棋子进行对弈,棋盘上有纵横各19条直线将棋盘分成361个交叉点,棋子走在交叉点上,双方交替行棋,以围地多者为胜。围棋规则没有多复杂,我可以在五分钟之内教给大家。这张图展示的就是一局已结束,整个棋盘基本布满棋子,然后数一下你的棋子圈出的空间以及对方棋子圈出的空间,谁的空间大,谁就获胜。在图示的这场势均力敌的比赛中,白棋一格之差险胜。

白棋以一格之差险胜

其实,了解这个游戏的最终目的非常难,因为它并不像象棋那样,有着直接明确的目标,在围棋里,完全是凭直觉的,甚至连如何决定游戏结束对于初学者来说,都很难。围棋是个历史悠久的游戏,有着3000多年的历史,起源于中国,在亚洲,围棋有着很深的文化意义。孔子还曾指出,围棋是每一个真正的学者都应该掌握的四大技能之一(琴棋书画),所以在亚洲围棋是种艺术,专家们都会玩。

如今,这个游戏更加流行,有4000万人在玩围棋,超过2000多个顶级专家,如果你在4-5岁的时候就展示了围棋的天赋,这些小孩将会被选中,并进入特殊的专业围棋学校,在那里,学生从6岁起,每天花12个小时学习围棋,一周七天,天天如此。直到你成为这个领域的专家,才可以离开学校毕业。这些专家基本是投入人生全部的精力,去揣摩学习掌握这门技巧,我认为围棋也许是最优雅的一种游戏了。

像我说的那样,这个游戏只有两个非常简单的规则,而其复杂性却是难以想象的,一共有10170 (10的170次方) 种可能性,这个数字比整个宇宙中的原子数1080(10的80次方)都多的去了,是没有办法穷举出围棋所有的可能结果的。我们需要一种更加聪明的方法。你也许会问为什么计算机进行围棋的游戏会如此困难,1997年,IBM的人工智能DeepBlue(深蓝)打败了当时的象棋世界冠军GarryKasparov,围棋一直是人工智能领域的难解之谜。我们能否做出一个算法来与世界围棋冠军竞争呢?要做到这一点,有两个大的挑战:

一、搜索空间庞大(分支因数就有200),一个很好的例子,就是在围棋中,平均每一个棋子有两百个可能的位置,而象棋仅仅是20. 围棋的分支因数远大于象棋。

二、比这个更难的是,几乎没有一个合适的评价函数来定义谁是赢家,赢了多少;这个评价函数对于该系统是至关重要的。而对于象棋来说,写一个评价函数是非常简单的,因为象棋不仅是个相对简单的游戏,而且是实体的,只用数一下双方的棋子,就能轻而易举得出结论了。你也可以通过其他指标来评价象棋,比如棋子移动性等。

所有的这些在围棋里都是不可能的,并不是所有的部分都一样,甚至一个小小部分的变动,会完全变化格局,所以每一个小的棋子都对棋局有着至关重要的影响。最难的部分是,我称象棋为毁灭性的游戏,游戏开始的时候,所有的棋子都在棋盘上了,随着游戏的进行,棋子被对方吃掉,棋子数目不断减少,游戏也变得越来越简单。相反,围棋是个建设性的游戏,开始的时候,棋盘是空的,慢慢的下棋双方把棋盘填满。

因此,如果你准备在中场判断一下当前形势,在象棋里,你只需看现在的棋盘,就能告诉你大致情况;在围棋里,你必须评估未来可能会发生什么,才能评估当前局势,所以相比较而言,围棋难得多。也有很多人试着将DeepBlue的技术应用在围棋上,但是结果并不理想,这些技术连一个专业的围棋手都打不赢,更别说世界冠军了。

所以大家就要问了,连电脑操作起来都这么难,人类是怎样解决这个问题的?其实,人类是靠直觉的,而围棋一开始就是一个靠直觉而非计算的游戏。所以,如果你问一个象棋选手,为什么这步这样走,他会告诉你,这样走完之后,下一步和下下一步会怎样走,就可以达到什么样的目的。这样的计划,有时候也许不尽如人意,但是起码选手是有原因的。

然而围棋就不同了,如果你去问世界级的大师,为什么走这一步,他们经常回答你直觉告诉他这么走,这是真的,他们是没法描述其中的原因的。我们通过用加强学习的方式来提高人工神经网络算法,希望能够解决这一问题。

我们试图通过深度神经网络模仿人类的这种直觉行为,在这里,需要训练两个神经网络,一种是决策网络,我们从网上下载了成百万的业余围棋游戏,通过监督学习,我们让阿尔法狗模拟人类下围棋的行为;我们从棋盘上任意选择一个落子点,训练系统去预测下一步人类将作出的决定;系统的输入是在那个特殊位置最有可能发生的前五或者前十的位置移动;这样,你只需看那5-10种可能性,而不用分析所有的200种可能性了。

一旦我们有了这个,我们对系统进行几百万次的训练,通过误差加强学习,对于赢了的情况,让系统意识到,下次出现类似的情形时,更有可能做相似的决定。相反,如果系统输了,那么下次再出现类似的情况,就不会选择这种走法。我们建立了自己的游戏数据库,通过百万次的游戏,对系统进行训练,得到第二种神经网络。选择不同的落子点,经过置信区间进行学习,选出能够赢的情况,这个几率介于0-1之间,0是根本不可能赢,1是百分之百赢。

通过把这两个神经网络结合起来(决策网络和数值网络),我们可以大致预估出当前的情况。这两个神经网络树,通过蒙特卡洛算法,把这种本来不能解决的问题,变得可以解决。我们网罗了大部分的围棋下法,然后和欧洲的围棋冠军比赛,结果是阿尔法狗赢了,那是我们的第一次突破,而且相关算法还被发表在《自然》科学杂志。

接下来,我们在韩国设立了100万美元的奖金,并在2016年3月,与世界围棋冠军李世石进行了对决。李世石先生是围棋界的传奇,在过去的10年里都被认为是最顶级的围棋专家。我们与他进行对决,发现他有非常多创新的玩法,有的时候阿尔法狗很难掌控。比赛开始之前,世界上每个人(包括他本人在内)都认为他一定会很轻松就打赢这五场比赛,但实际结果是我们的阿尔法狗以4:1获胜。围棋专家和人工智能领域的专家都称这具有划时代的意义。对于业界人员来说,之前根本没想到。

4.棋局哪个关键区域被人类忽视了?

这对于我们来说也是一生仅有一次的偶然事件。这场比赛,全世界28亿人在关注,35000多篇关于此的报道。整个韩国那一周都在围绕这个话题。真是一件非常美妙的事情。对于我们而言,重要的不是阿尔法狗赢了这个比赛,而是了解分析他是如何赢的,这个系统有多强的创新能力。阿尔法狗不仅仅只是模仿其他人类选手的下法,他在不断创新。在这里举个例子 ,这是第二局里的一个情况,第37步,这一步是我整个比赛中最喜欢的一步。在这里,黑棋代表阿尔法狗,他将棋子落在了图中三角标出的位置。为什么这步这么关键呢?为什么大家都被震惊到了。

图左:第二局里,第37步,黑棋的落子位置 图右:之前貌似陷入困境的两个棋子

其实在围棋中有两条至关重要的分界线,从右数第三根线。如果在第三根线上移动棋子,意味着你将占领这个线右边的领域。而如果是在第四根线上落子,意味着你想向棋盘中部进军,潜在的,未来你会占棋盘上其他部分的领域,可能和你在第三根线上得到的领域相当。

所以在过去的3000多年里,人们认为在第三根线上落子和第四根线上落子有着相同的重要性。但是在这场游戏中,大家看到在这第37步中,阿尔法狗落子在了第五条线,进军棋局的中部区域。与第四根线相比,这根线离中部区域更近。这可能意味着,在几千年里,人们低估了棋局中部区域的重要性。

有趣的是,围棋就是一门艺术,是一种客观的艺术。我们坐在这里的每一个人,都可能因为心情好坏产生成千上百种的新想法,但并不意味着每一种想法都是好的。而阿尔法狗却是客观的,他的目标就是赢得游戏。

5.阿尔法狗拿下李世石靠哪几个绝招?

大家看到在当前的棋局下,左下角那两个用三角标出的棋子看起来好像陷入了困难,而15步之后,这两个棋子的力量扩散到了棋局中心,一直延续到棋盘的右边,使得这第37步恰恰落在这里,成为一个获胜的决定性因素。在这一步上阿尔法狗非常具有创新性。我本人是一个很业余的棋手,让我们看看一位世界级专家Michael Redmond对这一步的评价。 Michael是一位9段选手(围棋最高段),就像是功夫中的黑段一样,他说:“这是非常令人震惊的一步,就像是一个错误的决定。”在实际模拟中,Michael其实一开始把棋子放在了另外一个地方,根本没想到阿尔法狗会走这一步。像这样的创新,在这个比赛中,阿尔法狗还有许多。在这里,我特别感谢李世石先生,其实在我们赢了前三局的时候,他下去了。

2016年3月阿尔法狗大战世界围棋冠军李世石,以4:1的总分战胜了人类

那是三场非常艰难的比赛,尤其是第一场。因为我们需要不断训练我们的算法,阿尔法狗之前打赢了欧洲冠军,经过这场比赛,我们知道了欧洲冠军和世界冠军的差别。理论上来讲,我们的系统也进步了。但是当你训练这个系统的时候,我们不知道有多少是过度拟合的,因此,在第一局比赛结束之前,系统是不知道自己的统计结果的。所以,其实第一局,我们非常紧张,因为如果第一局输了,很有可能我们的算法存在巨大漏洞,有可能会连输五局。但是如果我们第一局赢了,证明我们的加权系统是对的。

不过,李世石先生在第四场的时候,回来了,也许压力缓解了许多,他做出了一步非常创新性的举动,我认为这是历史上的创新之举。这一步迷惑了阿尔法狗,使他的决策树进行了错误估计,一些中国的专家甚至称之为“黄金之举”。通过这个例子,我们可以看到多少的哲理蕴含于围棋中。这些顶级专家,用尽必生的精力,去找出这种黄金之举。其实,在这步里,阿尔法狗知道这是非常不寻常的一步,他当时估计李世石通过这步赢的可能性是0.007%,阿尔法狗之前没有见过这样的落子方式,在那2分钟里,他需要重新搜索决策计算。我刚刚已经提到过这个游戏的影响:28亿人观看,35000相关文章的媒体报道,在西方网售的围棋被一抢而空,我听说MIT(美国麻省理工学院)还有其他很多高校,许多人新加入了围棋社。

第四局里,李世石第78步的创新之举

我刚才谈到了直觉和创新,直觉是一种含蓄的表达,它是基于人类的经历和本能的一种思维形式,不需要精确计算。这一决策的准确性可以通过行为进行评判。在围棋里很简单,我们给系统输入棋子的位置,来评估其重要性。阿尔法狗就是在模拟人类这种直觉行为。创新,我认为就是在已有知识和经验的基础上,产生一种原始的,创新的观点。阿尔法狗很明显的示范了这两种能力。

6.神秘棋手Master究竟是不是阿尔法狗?

那么我们今天的主题是“超越人类认知的极限”,下一步应该是什么呢?从去年三月以来,我们一直在不断完善和改进阿尔法狗,大家肯定会问,既然我们已经是世界冠军了,还有什么可完善的? 其实,我们认为阿尔法狗还不是完美的,还需要做更多的研究。

首先,我们想要继续研究刚才提到的和李世石的第四局的比赛,来填充知识的空白;这个问题其实已经被解决了,我们建立了一个新的阿尔法狗分系统,不同于主系统,这个分支系统是用来困惑主系统的。我们也优化了系统的行为,以前我们需要花至少3个月来训练系统,现在只需要一周时间。

第二,我们需要理解阿尔法狗所采取的决定,并对其进行解释;阿尔法狗这样做的原因是什么,是否符合人类的想法等等;我们通过对比人类大脑对于不同落子位置的反应以及阿尔法狗对于棋子位置的反应,以期找到一些新的知识;本质上就是想让系统更专业。我们在网络上与世界顶级的专家对决,一开始我们使用了一个假名(Master),在连胜之后被大家猜出是阿尔法狗。这些都是顶级的专家,我们至今已赢了60位大师了。如果你做个简单的贝叶斯分析,你会发现阿尔法狗赢不同对手的难易也不一样。而且,阿尔法狗也在不断自我创新,比如说图中右下角这个棋子(圆圈标处),落在第二根线里,以往我们并不认为这是个有效的位置。实际上,韩国有的团队预约了这些游戏,想研究其中新的意义和信息。

阿尔法狗自我创新,落在第二格线的旗子

柯洁,既是中国的围棋冠军,也是目前的世界围棋冠军,他才19岁。他也在网上和阿尔法狗对决过,比赛之后他说人类已经研究围棋研究了几千年了,然而人工智能却告诉我们,我们甚至连其表皮都没揭开。他也说人类和人工智能的联合将会开创一个新纪元,将共同发现围棋的真谛。异曲同工,柯洁提到了围棋的真理,我们在这里谈的是科学的真理。

红遍网络的神秘棋手Master2017年1月3日在腾讯围棋对弈平台赢了柯洁

Master执白中盘胜柯洁,Master就是AlphaGo的升级版

那么围棋的新纪元是否真的到来了呢?围棋史上这样的划时代事件曾经发生过两次,第一次是发生在1600年左右的日本,20世纪30-40年代的日本,日本一位当时非常杰出的围棋高手吴清源提出了一个全新的关于围棋的理论,将围棋提升到了一个全新的境界。大家说如今,阿尔法狗带来的是围棋界的第三次变革。

7.为什么人工智能“下围棋”强于“下象棋”?

我想解释一下,为什么人工智能在围棋界所作出的贡献,要远大于象棋界。如果我们看看当今的世界国际象棋冠军芒努斯·卡尔森,他其实和之前的世界冠军没什么大的区别,他们都很优秀,都很聪明。但为什么当人工智能出现的时候,他们可以远远超越人类?我认为其中的原因是,国际象棋更注重战术,而阿尔法狗更注重战略。如今世界顶级的国际象棋程序再不会犯技术性的错误,而在人类身上,不可能不犯错。第二,国际象棋有着巨大的数据库,如果棋盘上少于9个棋子的时候,通过数学算法就可以计算出谁胜谁败了。计算机通过成千上万的迭代算法,就可以计算出来了。因此,当棋盘上少于九个棋子的时候,下象棋时人类是没有办法获胜的。

因此,国际象棋的算法已经近乎极致,我们没有办法再去提高它。然而围棋里的阿尔法狗,在不断创造新的想法,这些全新的想法,在和真人对决的时候,顶级的棋手也可以把其纳入到考虑的范畴,不断提高自己。

就如欧洲围棋冠军樊麾(第一位与阿尔法狗对阵的人类职业棋手)所说的那样,在和阿尔法狗对决的过程中,机器人不断创新的下法,也让人类不断跳出自己的思维局限,不断提高自己。大家都知道,经过专业围棋学校里30多年的磨练,他们的很多思维已经固化,机器人的创新想法能为其带来意想不到的灵感。我真的相信如果人类和机器人结合在一起,能创造出许多不可思议的事情。我们的天性和真正的潜力会被真正释放出来。

8.阿尔法狗不为了赢取比赛又是为了什么?

就像是天文学家利用哈勃望远镜观察宇宙一样,利用阿尔法狗,围棋专家可以去探索他们的未知世界,探索围棋世界的奥秘。我们发明阿尔法狗,并不是为了赢取围棋比赛,我们是想为测试我们自己的人工智能算法搭建一个有效的平台,我们的最终目的是把这些算法应用到真实的世界中,为社会所服务。

当今世界面临的一个巨大挑战就是过量的信息和复杂的系统,我们怎么才能找到其中的规律和结构,从疾病到气候,我们需要解决不同领域的问题。这些领域十分复杂,对于这些问题,即使是最聪明的人类也无法解决的。

我认为人工智能是解决这些问题的一个潜在方式。在如今这个充斥着各种新技术的时代,人工智能必须在人类道德基准范围内被开发和利用。本来,技术是中性的,但是我们使用它的目的和使用它的范围,大大决定了其功能和性质,这必须是一个让人人受益的技术才行。

我自己的理想是通过自己的努力,让人工智能科学家或者人工智能助理和医药助理成为可能,通过该技术,我们可以真正加速技术的更新和进步。

注:本文作者系英国剑桥大学神经学博士生,AlphaGo之父哈萨比斯在剑桥大学的校友

【重磅】Google开源全球最精准自然语言解析器SyntaxNet

来源:Google Research

编译:胡祥杰  朱焕

 

【新智元导读】Google Research今天宣布,世界准确度最高的自然语言解析器SyntaxNet开源。谷歌开源再进一步。据介绍,谷歌在该平台上训练的模型的语言理解准确率超过90%。近日,众多科技巨头人工智能相关平台开源步伐明显加快:谷歌和Facebook一直在领跑,马斯克的OpenAI欲打造一个完全公开的AI模型训练营,就连一直被批评“保守”的亚马逊也在尝试开源。这一股开源热潮背后,是人工智能研究者的福利,但同时也是一场激烈的数据和平台争夺战。

Google环境计算( Ambient  computing) 架构师Yonatan Zunger说:事实上,语言理解被我们认为是“AI的终极任务”,要解决这一难题,前提是要能解决全部人类水平人工智能的问题。

机 器对语言的理解过程,可以分为几个步骤,其中很多的不确定性是逐渐明晰的(语音识别的不确定性更多,因为还要解决从声音到词的转换)。第一步是要把词分 开,放到依存树上,看哪一个词是动词,对名词有哪些影响等等。随后,要理解每一个名字的含义。再次,再加入许多先验知识,即对这个世界的理解,因为很多句 子只有使用了这些信息才能真正理解。如果足够幸运的话,到这就能得到清晰的理解了。

谷歌资深研究科学家Slav Petrov在Google Research的博客上写到:在谷歌,我们花费了大量的时间在思考,计算机系统如何才能阅读和理解人类语言,以一种更加智能的方式处理这些语言?今天,我们激动地跟大家分享我们的研究,向更广阔的人群发布SyntaxNet。这是一个在TensoFlow中运行的开源神经网络框架,提供自然语言理解系统基础。我们所公开的包含了所有用你自己的数据训练新的SyntaxNet模型所需要的代码,以及Paesey  McParseface——我们已经训练好的,可用于分析英语文本的模型。

Paesey  McParseface 建立于强大的机器学习算法,可以学会分析句子的语言结构,能解释特定句子中每一个词的功能。此类模型中,Paesey  McParseface是世界上最精确的,我们希望他能帮助对自动提取信息、翻译和其它自然语言理解(NLU)中的应用感兴趣的研究者和开放者。

SyntaxNet是怎么工作的?

SyntaxNet是一个框架,即学术圈所指的SyntacticParser,他是许多NLU系统中的关键组件。在这个系统中输入一个句子,他会自动给句子中的每一个单词打上POS(part-of-Speech)标签,用来描述这些词的句法功能,并在依存句法树中呈现。这些句法关系直接涉及句子的潜在含义。

举一个很简单的例子,看下面这个句子“Alice saw Bob”的依存句法树:

在这个结构中,Alice和Bob被编码为名词,Saw是动词。只要的动词saw 是句子的根,Alice是saw的主语,Bob是直接宾语(dobj)。和期待的一样,Paesey  McParseface能正确地分析这一句子,也能理解下面这个更加复杂的例子:

句子:Alice, who had been reading about SynataxNet, saw Bob in the hallwayyesterday

在这个句子的编码中,Alice 和 Bob的分别是saw的主语和宾语,Alice由一个带动词“reading”的关系从句来修饰,而saw则由时态“yesterday”来修饰。依存句法树中的语法关系让我们可以轻易地找到不同问题的答案,比如,Alice看见了谁?谁看到了Bob?Alice正在读的是什么?或者Alice是在什么时候看到Bob的。

为什么让计算机正确处理句法分析如此困难?

 

句法分析如此困难的一个主要问题是,人类语言具有显著的歧义性。包含 20 到 30 个单词的中等长度的句子会具有数百、数千甚至数万种可能的句法结构,这样的情况并不少见。一个自然语言句法分析器必须能够搜索所有这些结构选择,并找到给定语境下最合理的那个结构。作为一个非常简单的例子,“Alice drove down the streetin her car”这个句子就具有至少两种可能的依存分析:

第一种分析是对应这句话的(正确)解释,按照这种解释,爱丽丝在汽车里进行驾驶,而汽车位于街道上;第二种分析对应于一种对这句话的(荒诞但仍然可能的)解释,按照这种解释,爱丽丝在街道上驾驶,而街道位于汽车之内。之所以会产生这种歧义,是因为“in”这个介词既可以用来修饰“drove(驾驶)”也可以用来修饰“street(街道)”。上面这个例子是所谓的“介词短语附着歧义”的一个实例。

 

人类在处理歧义方面有超强的能力,以至于人们甚至注意不到句子有歧义。而这里的挑战是,如何能让计算机做到同样好。长句中的多重歧义会共同造成句子的可能结构数量的组合爆炸。通常,这些结构中的绝大多数都极其不合理,但它们仍然是可能的,句法分析器必须以某种方式来丢弃它们。

 

SyntaxNet 将神经网络运用于歧义问题。一个输入句子被从左到右地处理。当句子中的每个词被处理时,词与词之间的依存关系也会被逐步地添加进来。由于歧义的存在,在处理过程的每个时间点上都存在多种可能的决策,而神经网络会基于这些决策的合理性向这些彼此竞争的决策分配分数。出于这一原因,在该模型中使用 Beam Search (集束搜索)就变得十分重要。不是直接取每个时间点上的最优决定,而是在每一步都保留多个部分性假设。只有当存在多个得分更高的假设的时候,一个假设才会被抛弃。下图将展示的,是“I booked a ticket to Google”这句话经过从左到右的决策过程而产生的简单句法分析。

而且,正如我们在论文中所描述的,十分重要的一点是,要把学习和搜索紧密整合起来才能取得最高的预测准确度。Parsey McParseface 和其他 SyntaxNet 模型是我们用谷歌的 TensorFlow 框架训练过的最复杂的网络结构。通过利用谷歌支持的 Universal Treebanks 项目中的数据,你也可以在自己的机器上训练句法分析模型。

 

 Parsey McParseface 的准确度到底有多高?

 

在(从具有二十年历史的宾大树库Penn Treebank中)随机抽取的英语新闻句子构成的标准测试中,Parsey McParseface 在提取词之间的个体依存关系时的准确率超过 94%,这打败了我们自己先前的最高水平,也超过了任何以前的方法。尽管在文献中并没有关于人类的句法分析成绩的明确研究,我们从我们内部的句法标注项目中了解到,那些在该任务上受过训练的语言学家在 96-97% 的情况下能达成一致。这说明,我们正在接近人类的水平——不过这仍然限于那些格式良好的文本。按照我们从 Google WebTreebank (谷歌网络树库,发布于 2011 年)中所学到的,那些从互联网上获得的句子要远远更难分析。在该网络数据集上,Parsey McParseface 只取得了略高于 90% 的句法分析准确率。

 

尽管准确率还不够完美,它已经足够高,能够用于许多应用程序了。目前,错误的主要来源是像上面描述过的介词短语附着歧义这样的情况,对这些情况的处理要求对现实世界的知识(例如,“街道不太可能位于汽车之内”)和深度语境推理。机器学习(特别是神经网络)已在解决这些歧义方面取得了显著的进展。不过我们仍想做进一步的工作:我们想要发展出一些方法,这些方法能够学习现实世界知识,也能够在所有语言和语境中都取得同样好的自然语言理解。

 

想试试吗,请阅读 SyntaxNet 的代码。并下载 Parsey McParseface 句法分析模型。主要研发者Chris Alberti, David Weiss, Daniel Andor, Michael Collins 和 Slav Petrov 祝你成功。

DeepMind成员、谷歌资深员工:神经网络序列学习突破及发展

2016-05-02 新智元

文章来源:O’Reilly 报告《The Future of Machine Intelligence)

作者:David Beyer

题目:Oriol : Sequence-to-Sequence Machine Learning

下载: future-of-machine-intelligence

【新智元导读】谷歌CEO在给投资人的信中写道谷歌搜索将更具有情景意识,其关键技术自然是深度学习。本文中,谷歌资深员工、DeepMind 成员 Oriol Vinyals 全面剖析神经网络序列学习的优势、瓶颈及解决方案。他指出机器翻译实质上是基于序列的深度学习问题,其团队希望用机器学习替代启发式算法,最后推测机器阅读并理解文本将在未来几年实现。

文章来源:O’Reilly 报告《The Future of Machine Intelligence)

作者:David Beyer

题目:Oriol Vinyals: Sequence-to-Sequence Machine Learning

关注新智元公众号,回复“0502”下载报告全文

受访者 Oriol Vinyals 是 Google 的研究科学家,在 DeepMind 团队工作,曾前在 Google Brain 团队工作。他在加州大学伯克利分校拿到 EECS 博士学位,在加州大学圣地亚哥分校拿到硕士学位。

要   点
使用神经网络的序列到序列学习(Sequence-to-sequence learning)在一些领域拥有最前沿的表现,比如机器翻译。

虽然很强大,序列到序列的学习方法也受到一些因素的制约,包括计算能力。长短期记忆(LSTM)在推动该领域前进方面作了很大贡献。

除了图像和文本理解,深度学习模型可以学会为一些著名的算法难题“编写”解决方案,其中包括邮差问题(Salesman Problem)。

机器翻译是基于序列的深度学习问题

【O’Reilly】让我们先了解一下你的背景吧。

【Oriol Vinyals】我来自西班牙巴塞罗那,在那里我完成了数学和通信工程的本科学习。很早,我就知道自己想要到美国学习 AI。我在卡耐基梅隆大学待了9个月,在那里我完成了本科毕业论文。之后我在加州大学圣地亚哥分校拿到硕士学位,然后 2009年在伯克利拿到博士学位。

读博期间,在 Google 实习时,我遇到了 Geoffrey Hinton 并和他一起工作;这段经历催化了我对深度学习的兴趣。加上我在微软和 Google 愉快的实习经历,当时我便下定决心要在产业界工作。2013 年我全职加入 Google。我起初对语音识别和优化 (重点放在自然语言处理和理解上) 有着浓厚的兴趣,后来转到使用深度学习解决这些以及别的问题这方面,包括最近基于数据来让算法自动学习的工作。

【O’Reilly】能不能谈一下你的关注点的变化,既然你离开了语言识别领域。现在最让你兴奋的是哪些领域?

【Oriol Vinyals】我的语言识别背景激发了我对序列的兴趣。最近,Ilya Sutskever, Quoc Le,还有我发表了一篇文章,是关于序列到序列映射的,可以使用循环神经网络(recurrent neural net) 进行从法语到英语的机器翻译

作为背景,监督学习在输入和输出是矢量的情形下取得了成功。往这些经典的模型输入图片,可以输出相应的类别标签。直到不久前,我们还不能通过输入图片就得到一个单词序列作为对这幅图片的描述。目前的快速进展是得益于可以获取带有图片描述的高质量数据集 (MSCOCO),以及与此并行的循环神经网络的复兴。

我们的工作把机器翻译问题重塑为基于序列的深度学习问题。结果表明深度学习可以把英语的单词序列映射为西班牙语的单词序列。由于深度学习令人吃惊的能力,我们可以相当快地达到领域前沿水平。这些结果本身暗示了新的应用,比如,自动把视频提炼成四个描述性句子。

序列到序列的瓶颈及解决方法 

【O’Reilly】序列到序列这种方法在什么地方工作得不好?

 

【Oriol Vinyals】假设你要把一个英语句子翻译成法语。你可以使用一个巨大的政治言论和辩论语料库作为训练数据。应用得当的话,你可以把政治言论转化为任何别的语言。但是,当你试图把——比如说——莎士比亚式的英语——翻译成法语的时候,你会遇到问题。这种领域切换对深度学习方法压力比较大,而传统机器翻译系统是基于规则的,这让它们能适应这种切换。

还有更多的难点。当序列长度超过一定值时,我们缺乏相应的计算能力。当前的模型可以把长度为 200 的序列与对应的同样长度的序列匹配。当序列变长,运行时间也变长。虽然目前我们被局限于相对较短的文档,我相信随着时间推移这个限制会越来越宽松。正如 GPU 压缩了大而复杂的模型的运行时间,内存和计算能力的提高会让可计算的序列越来越长。

除了计算的瓶颈,更长的序列还带来了有趣的数学问题。若干年前 Hochreiter 引入了梯度消失的概念。当你阅读数千个单词,你很容易忘掉三千个单词前的信息;如果不记得第三章的关键情节转换,(小说的) 结局就失去意义。从结果上讲,挑战来自记忆。循环神经网络一般能记住 10 到 15 个词。但如果你把一个矩阵乘 15 次,输出会收缩到 0。换句话说,梯度消失,学习变得不可能。

 

这个问题的一种重要解决方案依赖于长短期记忆 (LSTM)。这种结构对循环神经网络做了聪明的修改,让它们能记住远超正常极限的东西。我见过能记住 300 到 400 个词的 LSTM。虽然已经相当长了,这样的增长只是个开始,以后的神经网络将能处理日常生活规模的文本。

退一步讲,近几年我们看到出现了一些处理记忆问题的模型。我个人尝试过添加这种记忆到神经网络:与其把所有东西塞进循环神经网络的隐含态,记忆让你回忆起之前见过的词,从而帮助解决手头的优化任务。虽然这些年进展迅速,更深层的、关于知识表达究竟意味着什么这一挑战仍然存在,并且其本身仍旧是一个开放问题。尽管如此,我相信接下来我们会看到沿着这些方向的重大进展。

用机器学习代替启发式算法

【O’Reilly】让我们换个话题,谈谈你在算法生成方面的工作。你能不能讲讲这些努力背后的历史和动机?

【Oriol Vinyals】一个展示监督学习能力的经典练习涉及到把一些给定点分割为不同类别:这是 A 类,这是 B 类,等等。XOR (异或) (the“exclusive or” logical connective) 问题特别有教益。目标是要学会异或操作,也就是,给定两个二进制位作为输入,学习正确的输出。精确地讲,这涉及到两个位也就是四个实例:00,01,10,11。对于这些例子,输出是 0,1,1,0。这个问题不是线性模型能解决的,但深度学习可以。即便如此,目前计算能力的限制排除了更复杂的问题。

 

最近,Wojciech Zaremba (我们组的一个实习生) 发表了一篇文章,标题是“Learningto Execute”,描述了一个基于循环神经网络的从 Python 程序到执行这些程序的结果的映射。这个模型可以仅仅通过阅读源代码来预测 Python 程序的结果。这个问题虽然看起来简单,提供了一个良好开端。于是我把注意力转向一个 NP-hard 问题。

 

我们考虑的是一个高度复杂且资源需求高的方法,用来求解经过所有点的最短路径的问题,也就是著名的邮差问题。这个问题从提出开始,就吸引了大量解法;人们发明了各种启发式算法,在效率和精度之间求得平衡。在我们的情形,我们研究了深度学习系统是否能仅仅基于训练数据推断出与已有文献比肩的启发式算法。

出于效率的考虑,我们只考虑 10 个城市,而不是常见的10000 或 100000 个。我们的训练集输入城市位置,输出最短路径。就这样。我们不想让网络知晓任何别的关于这个问题的假设。

成功的神经网络应该能再现遍历所有点且最小化路程的行为。事实上,在一个可以称作奇迹的时刻,我们发现它能做到。

我应该补充一下,输出可能不是最优,因为毕竟是概率性的;但这是个好的开始。我们希望把这个方法应用到一些新问题。目标不是为了替换现有的、人工编码的解决方案,而是,我们要用机器学习代替启发式算法。

【O’Reilly】这会最终让我们成为更好的程序员吗?

【Oriol Vinyals】比如在编程竞赛中。开始是一段问题陈述,用直白的英语写:“在这个程序中,你需要找出 A,B,C,在 X,Y,以及 Z 的前提下。” 你编码你的解决方案,然后在服务器上测试。与此不同的是,想象一下,一个神经网络读入这样一个自然语言写的问题陈述,然后学到一个至少能给出近似解的算法,甚至能给出精确解。这个图景可能听起来太遥远。记住,仅仅几年前,读入 Python 程序然后输出答案也是听起来相当不靠谱的。

 未来几年机器能阅读并理解文本

【O’Reilly】你怎么看待接下来五年你的工作会如何进展?最大的未解决问题有哪些?

【Oriol Vinyals】也许五年的时间有点紧,但机器阅读并理解一本书这样的事不会离我们很远。类似地,我们可以预期看到机器通过从数据学习来回答问题,而不是基于给定的规则集合。现在如果我问你一个问题,你打开 Google 开始搜索;几次尝试后你可能得到答案。跟你一样,机器应该能返回一个答案作为对某个问题的响应。我们已经有沿着这个方向基于紧凑数据集的模型。更往前的挑战是深刻的:你如何区分正确和错误的答案?如何量化正确和错误?这些以及别的重要问题决定未来研究的进程。

谷歌搜索算法如何排名医疗广告?

2016-05-02 新智元

 新智元原创1

【新智元导读】青年魏则西的不幸病逝激起了国内公众对搜索引擎虚假医疗网络广告问题的热议。提到搜索引擎,必须想到谷歌,那么谷歌是如何处理医疗广告的呢,答案是使用机器学习的RankBrain算法。

青年魏则西的不幸病逝,激起了国内公众对搜索引擎虚假医疗网络广告问题的热议。根据《商业价值》微信公众号今日文章《谷歌也曾涉足医疗广告,美国司法是如何监管的呢?》,可以发现在谷歌搜索“滑膜肉瘤”也会出医疗广告,但都有明显的“Ad”标识。同时,与百度相比,谷歌的付费广告并不影响排名。

谷歌关于滑膜肉瘤治疗的搜索广告,有明确的广告标志。来源:商业价值

此外,《商业价值》文中提到,根据谷歌的搜索广告政策,要投放药品广告需要获得 FDA 以及美国药房理事会(NABP)认证。也就是说,只有获得政府审批的正规网上药店、药品与治疗才能在网站投放药品类广告。同时,谷歌的自动广告过滤机制,在很大程度上也能有效杜绝虚假医疗广告出现。根据谷歌发布的报告,他们 2015 年总计预先屏蔽了 7.8 亿条违规广告,封杀 21.4 万家广告商,其中包括 1250 万条违规的医疗和药品广告,涉及药品未获批准或者虚假误导性宣传等原因。

谷歌如何用算法排名

据统计,每天向 Google 提交的查询中有约 15% 是其未曾见过的。公司的资深研究科学家 Greg Corrado 透露,为了更好回答这些问题,Google 利用了 RankBrain 来将海量的书面语嵌入到计算机可以理解的向量里面。

如果 RankBrain 看到自己不熟悉的单词或短语,它会去猜测其类似的意思并对结果进行相应过滤,从而有效地处理一些从未见过的搜索查询。比方说 RankBrain 能够有效回答 “What’ s the title of the consumer at the highest level of a food chain?(食物链当中最高级的消费者的头衔叫做什么?)” 这样的问题。

对于 Google 的搜索处理机制来说,RankBrain 只是为其搜索算法提供输入的数百个信号之一,但这种信号跟别的信号的不同之处在于它懂得学习,而别的只是别人在信息获取中的发现和洞察。Google 内部曾让做算法的工程师人工去猜测搜索算法会选择哪个页面作为排名第一的结果,其准确率为 70%,然后 RankBrain 去做了同样的事情,准确率达到了 80%,超过了做算法的工程师的平均水平。

随着时间的推移,RankBrain 可能能够处理越来越多的当前通过手写代码分析来改善 Google 算法的各种各样的信号。Google 的各项业务也会发展地越来越智能。机器学习将会以各种有意义的方式整合进 Google 的搜索引擎中。Google 这所有的举动将会继续保持其搜索引擎的领头地位。

RankBrain 运行原理解析

RankBrain 是 Google 蜂鸟搜索算法的一部分。蜂鸟是整个搜索算法,就好比车里面有个引擎。引擎本身可能由许多部分组成,比如滤油器、燃油泵、散热器等。同理,蜂鸟也由多个部分组成,RankBrain就是其中一个组成部分。

蜂鸟同时包含其他的部分,这些名字对 SEO圈的人来说已经耳熟能详了,比如 Panda、 Penguin 和 Payday 用于垃圾邮件过滤, Pigeon 用于优化本地结果, Top Heavy 用于给广告太多的页面降级,Mobile Friendly 用于给移动友好型页面加分,Pirate 用于打击版权侵犯。

Google 用于排序的“信号”是什么?

Google 使用信号来决定如何为网页排序。比如,它会读取网页上的词语,那么词语就是一个信号。如果某些词语是粗体,那么这又是一个值得注意的信号。计算的结果作为PageRank的一部分,给一个网页设定一个PageRank分数,这作为一个信号。如果一张网页被检测到是移动友好型的,那么这又会成为一个信号。所有的这些信号都由蜂鸟算法中的各个部分处理,最后决定针对不同搜索返回哪些网页。

一共有多少种信号?

Google 称进行评估的主要排序信号大约有 200多种,反过来, 可能有上万种变种信号或者子信号。如果你想有一个更直观的排序信号向导,来看看 Google SEO成功因素元素周期表:

RankBrain到底做什么?

从与 Google 的来往电子邮件之中,RankBrain 主要用于翻译人们可能不清楚该输入什么确切词语的搜索词条。

Google 很早就找到不根据具体词条搜索页面的方式。比如,许多年前,如果你输入“鞋”(shoe), Google 可能不会找到那些有“鞋”(shoes)的页面,因为从技术上来说这是两个不同的词汇,但是“stemming”使得 Google 变得更聪明,让引擎了解shoes的词根是shoe,就像“running”的词根是“run”。 Google 同样了解同义词,因此,如果你搜索“运动鞋”,它可能知道你想找“跑鞋”。它甚至有概念性的知识,知道哪些网页是关于“苹果”公司,哪些是关于水果“苹果”的。

参考资料:

http://mp.weixin.qq.com/s?__biz=MTA2MTMwNjYwMQ==&mid=2650693625&idx=1&sn=8ab532faa66e69cc447e250f58807dda&scene=1&srcid=0502LFwayyLBIMhASaZX4zrt#rd

Inside The Mind That Built Google Brain: On Life, Creativity, And Failure

Source: The Huffington Post

(Photo: Jemal Countess/Getty)

Here’s a list of universities with arguably the greatest computer science programs: Carnegie Mellon, MIT, UC Berkeley, and Stanford. These are the same places, respectively, where Andrew Ng received his bachelor’s degree, his master’s, his Ph.D., and has taught for 12 years.

Ng is an icon of the artificial intelligence world with the pedigree to match, and he is not yet 40 years old. In 2011, he founded Google Brain, a deep-learning research project supercharged by Google’s vast stores of computing power and data. Delightfully, one of its most important achievements came when computers analyzing scores of YouTube screenshots were able to recognize a cat. (The New York Timesheadline: “How Many Computers to Identify a Cat? 16,000.”) As Ng explained, “The remarkable thing was that [the system] had discovered the concept of a cat itself. No one had ever told it what a cat is. That was a milestone in machine learning.”

Ng exudes a cheerful but profound calm. He happily discusses the various mistakes and failures of his career, the papers he read but didn’t understand. He wears identical blue oxford shirts each and every day. He is blushing but proud when a colleague mentions his adorable robot-themed engagement photo shoot with his now-wife, a surgical roboticist named Carol Reiley (note his shirt in the photo).

One-on-one, he speaks with a softer voice than anyone you know, though this has not hindered his popularity as a lecturer. In 2011, when he posted videos from his own Stanford machine learning course on the web, over 100,000 people registered. Within a year, Ng had co-founded Coursera, which is today the largest provider of open online courses. Its partners include Princeton and Yale, top schools in China and across Europe. It is a for-profit venture, though all classes are accessible for free. “Charging for content would be a tragedy,” Ng has said.

(Photo: Colson Griffith)

Then, last spring, a shock. Ng announced he was departing Google and stepping away from day-to-day involvement at Coursera. The Chinese tech giant Baidu was establishing an ambitious $300 million research lab devoted to artificial intelligence just down the road from Google’s Silicon Valley headquarters, and Andrew Ng would head it up.

At Baidu, as before, Ng is trying to help computers identify audio and images with incredible accuracy, in realtime. (On Tuesday, Baidu announced it had achieved the world’s best results on a key artificial intelligence benchmark related to image identification, besting Google and Microsoft.) Ng believes speech recognition with 99 percent accuracy will spur revolutionary changes to how humans interact with computers, and how operating systems are designed. Simultaneously, he must help Baidu work well for the millions of search users who are brand new to digital life. “You get queries [in China] that you just wouldn’t get in the United States,” Ng explained. “For example, we get queries like, ‘Hi Baidu, how are you? I ate noodles at a corner store last week and they were delicious. Do you think they’re on sale this weekend?’ That’s the query.” Ng added: “I think we make a good attempt at answering.”

Elon Musk and Stephen Hawking have been sounding alarms over the potential threat to humanity from advanced artificial intelligence. Andrew Ng has not. “I don’t work on preventing AI from turning evil for the same reason that I don’t work on combating overpopulation on the planet Mars,” he has said. AI is many decades away (if not longer) from achieving something akin to consciousness, according to Ng. In the meantime, there’s a far more urgent problem. Computers enhanced by machine learning are eliminating jobs long done by humans. The trend is only accelerating, and Ng frequently calls on policymakers to prepare for the socioeconomic consequences.

At Baidu’s new lab in Sunnyvale, Calif., we spoke to Andrew Ng for Sophia, a HuffPost project to collect life lessons from fascinating people. He explained why he thinks “follow your passion” is terrible career advice and he shared his strategy for teaching creativity; Ng discussed his failures and his helpful habits, the most influential books he’s read, and his latest thoughts on the frontiers of AI.

You recently said, “I’ve seen people learn to be more creative.” Can you explain?

The question is, how does one create new ideas? Is it those unpredictable lone acts of genius, people like Steve Jobs, who are special in some way? Or is it something that can be taught and that one can be systematic about?

I believe that the ability to innovate and to be creative are teachable processes. There are ways by which people can systematically innovate or systematically become creative. One thing I’ve been doing at Baidu is running a workshop on the strategy of innovation. The idea is that innovation is not these random unpredictable acts of genius, but that instead one can be very systematic in creating things that have never been created before.

In my own life, I found that whenever I wasn’t sure what to do next, I would go and learn a lot, read a lot, talk to experts. I don’t know how the human brain works but it’s almost magical: when you read enough or talk to enough experts, when you have enough inputs, new ideas start appearing. This seems to happen for a lot of people that I know.

When you become sufficiently expert in the state of the art, you stop picking ideas at random. You are thoughtful in how to select ideas, and how to combine ideas. You are thoughtful about when you should be generating many ideas versus pruning down ideas.

Now there is a challenge still — what do you do with the new ideas, how can you be strategic in how to advance the ideas to build useful things? That’s another whole piece.

Can you talk about your information diet, how you approach learning?

I read a lot and I also spend time talking to people a fair amount. I think two of the most efficient ways to learn, to get information, are reading and talking to experts. So I spend quite a bit of time doing both of them. I think I have just shy of a thousand books on my Kindle. And I’ve probably read about two-thirds of them.

At Baidu, we have a reading group where we read about half a book a week. I’m actually part of two reading groups at Baidu, each of which reads about half a book a week. I think I’m the only one who’s in both of those groups [laughter]. And my favorite Saturday afternoon activity is sitting by myself at home reading.

 

 

Let me ask about your early influences. Is there something your parents did for you that many parents don’t do that you feel had a lasting impact on your life?

I think when I was about six, my father bought a computer and helped me learn to program. A lot of computer scientists learned to program from an early age, so it’s probably not that unique, but I think I was one of the ones that was fortunate to have had a computer and could learn to start to program from a very young age.

Unlike the stereotypical Asian parents, my parents were very laid back. Whenever I got good grades in school, my parents would make a fuss, and I actually found that slightly embarrassing. So I used to hide them. [Laughter] I didn’t like showing my report card to my parents, not because I was doing badly but because of their reaction.

I was also fortunate to have gotten to live and work in many different places. I was born in the U.K., raised in Hong Kong and Singapore, and came to the U.S. for college. Then for my own studies, I have degrees from Carnegie Mellon, MIT, and Berkeley, and then I was at Stanford.

I was very fortunate to have moved to all these places and gotten to meet some of the top people. I interned at AT&T Bell Labs when it existed, one of the top labs, and then at Microsoft Research. I got to see a huge diversity of points of view.

Is there anything about your education or your early career that you would have done differently? Any lessons you’ve learned that people could benefit from?

I wish we as a society gave better career advice to young adults. I think that “follow your passion” is not good career advice. It’s actually one of the most terrible pieces of career advice we give people.

If you are passionate about driving your car, it doesn’t necessarily mean you should aspire to be a race car driver. In real life, “follow your passion” actually gets amended to, “Follow your passion of all the things that happen to be a major at the university you’re attending.”

But often, you first become good at something, and then you become passionate about it. And I think most people can become good at almost anything.

So when I think about what to do with my own life, what I want to work on, I look at two criteria. The first is whether it’s an opportunity to learn. Does the work on this project allow me to learn new and interesting and useful things? The second is the potential impact. The world has an infinite supply of interesting problems. The world also has an infinite supply of important problems. I would love for people to focus on the latter.

I’ve been fortunate to have repeatedly been able to find opportunities that had a lot of potential for impact and also gave me fantastic opportunities to learn. I think young people optimizing for these two things will often have the best careers.

Our team here has a mission of developing hard AI technologies, advanced AI technologies that let us impact hundreds of millions of users. That’s a mission I’m genuinely excited about.

 

 

Do you define importance primarily by the number of people who are impacted?

No, I don’t think the number is the only thing that’s important. Changing hundreds of millions of people’s lives in a significant way, I think that’s the level of impact that we can reasonably aspire to. That is one way of making sure we do work that isn’t just interesting, but that also has an impact.

You’ve talked previously about projects of yours that have failed. How do you respond to failure?

Well, it happens all the time, so it’s a long story. [Laughter] A few years ago, I made a list in Evernote and tried to remember all the projects I had started that didn’t work out, for whatever reason. Sometimes I was lucky and it worked out in a totally unexpected direction, through luck rather than skill.

But I made a list of all the projects I had worked on that didn’t go anywhere, or that didn’t succeed, or that had much less to show for it relative to the effort that we put into it. Then I tried to categorize them in terms of what went wrong and tried to do a pretty rigorous post mortem on them.

So, one of these failures was at Stanford. For a while we were trying to get aircraft to fly in formation to realize fuel savings, inspired by geese flying in a V-shaped formation. The aerodynamics are actually pretty solid. So we spent about a year working on making these aircraft fly autonomously. Then we tried to get the airplanes to fly in formation.

But after a year of work, we realized that there is no way that we could control the aircraft with sufficient accuracy to realize fuel savings. Now, if at the start of the project we had thought through the position requirements, we would have realized that with the small aircraft we were using, there is just no way we could do it. Wind gusts will blow you around far more than the precision needed to fly the aircraft in formation.

So one pattern of mistakes I’ve made in the past, hopefully much less now, is doing projects where you do step one, you do step two, you do step three, and then you realize that step four has been impossible all along. I talk about this specific example in the strategy innovation workshop I talked about. The lesson is to de-risk projects early.

I’ve become much better at identifying risks and assessing them earlier on. Now when I say things like, “We should de-risk a project early,” everyone will nod their head because it’s just so obviously true. But the problem is when you’re actually in this situation and facing a novel project, it’s much harder to apply that to the specific project you are working on.

The reason is these sorts of research projects, they’re a strategic skill. In our educational system we’re pretty good at teaching facts and procedures, like recipes. How do you cook spaghetti bolognese? You follow the recipe. We’re pretty good at teaching facts and recipes.

But innovation or creativity is a strategic skill where every day you wake up and it’s a totally unique context that no one’s ever been in, and you need to make good decisions in your completely unique environment. So as far as I can tell, the only was we know way to teach strategic skills is by example, by seeing tons of examples. The human brain, when you see enough examples, learns to internalize those rules and guidelines for making good strategic decisions.

Very often, what I find is that for people doing research, it takes years to see enough examples and to learn to internalize those guidelines. So what I’ve been experimenting with here is to build a flight simulator for innovation strategy. Instead of having everyone spend five years before you see enough examples, to deliver many examples in a much more compressed time frame.

Just as in a flight simulator, if you want to learn to fly a 747, you need to fly for years, maybe decades, before you see any emergencies. But in a flight simulator, we can show you tons of emergencies in a very compressed period of time and allow you to learn much faster. Those are the sorts of things we’ve been experimenting with.

When this lab first opened, you noted that for much of your career you hadn’t seen the importance of team culture, but that you had come to realize its value. Several months in, is there anything you’ve learned about establishing the right culture?

A lot of organizations have cultural documents like, “We empower each other,” or whatever. When you say it, everyone nods their heads, because who wouldn’t want to empower your teammates. But when they go back to their desks five minutes later, do they actually do it? It’s difficult for people to bridge the abstract and the concrete.

At Baidu, we did one thing for the culture that I think is rare. I don’t know of any organization that has done this. We created a quiz that describes to employees specific scenarios — it says, “You’re in this situation and this happens. What do you do: A, B, C, or D?”

No one has ever gotten full marks on this quiz the first time out. I think the quiz interactivity, asking team members to apply specifics to hypothetical scenarios, has been our way of trying to connect the abstract culture with the concrete; what do you actually do when a teammate comes to you and does this thing?

What are some books that had a substantial impact on your intellectual development?

Recently I’ve been thinking about the set of books I’d recommend to someone wanting to do something innovative, to create something new.

The first is “Zero to One“ by Peter Thiel, a very good book that gives an overview of entrepreneurship and innovation.

We often break down entrepreneurship into B2B (“business to business,” i.e., businesses whose customers are other businesses) and B2C (“business to consumer”). For B2B, I recommend “Crossing the Chasm.” For B2C, one of my favorite books is “The Lean Startup,” which takes a narrower view but it gives one specific tactic for innovating quickly. It’s a little narrow but it’s very good in the area that it covers.

Then to break B2C down even further, two of my favorites are “Talking to Humans,” which is a very short book that teaches you how to develop empathy for users you want to serve by talking to them. Also, “Rocket Surgery Made Easy.” If you want to build products that are important, that users care about, this teaches you different tactics for learning about users, either through user studies or by interviews.

Then finally there is “The Hard Thing about Hard Things.“ It’s a bit dark but it does cover a lot of useful territory on what building an organization is like.

For people who are trying to figure out career decisions, there’s a very interesting one: “So Good They Can’t Ignore You.” That gives a valuable perspective on how to select a path for one’s career.

Do you have any helpful habits or routines?

I wear blue shirts every day, I don’t know if you know that. [laughter] Yes. One of the biggest levers on your own life is your ability to form useful habits.

When I talk to researchers, when I talk to people wanting to engage in entrepreneurship, I tell them that if you read research papers consistently, if you seriously study half a dozen papers a week and you do that for two years, after those two years you will have learned a lot. This is a fantastic investment in your own long term development.

But that sort of investment, if you spend a whole Saturday studying rather than watching TV, there’s no one there to pat you on the back or tell you you did a good job. Chances are what you learned studying all Saturday won’t make you that much better at your job the following Monday. There are very few, almost no short-term rewards for these things. But it’s a fantastic long-term investment. This is really how you become a great researcher, you have to read a lot.

People that count on willpower to do these things, it almost never works because willpower peters out. Instead I think people that are into creating habits — you know, studying every week, working hard every week — those are the most important. Those are the people most likely to succeed.

For myself, one of the habits I have is working out every morning for seven minutes with an app. I find it much easier to do the same thing every morning because it’s one less decision that you have to make. It’s the same reason that my closet is full of blue shirts. I used to have two color shirts actually, blue and magenta. I thought that’s just too many decisions. [Laughter] So now I only wear blue shirts.

 

You’ve urged policymakers to spend time thinking about a future where computing and robotics have eliminated some substantial portion of the jobs people have now. Do you have any ideas about possible solutions?

It’s a really tough question. Computers are good at routine repetitive tasks. Thus far, the main things that computers have been good at automating are tasks where you kind of do the same thing day after day.

Now this can be at multiple points on the spectrum. Humans work on an assembly line, making the same motion for months on end, and now robots are doing some of that work. A midrange challenge might be truck-driving. Truck drivers do very similar things day after day, so computers are trying to do that too. It’s harder than most people think, but automated driving might happen in the next decade or so, we don’t know. Then, even higher-end things, like some radiologists read the same types of x-rays over and over each day. Again, computers may have traction in those areas.

But for the social tasks which are non-routine and non-repetitive, those are the tasks that humans will be better at than computers for quite a period of time, I think. In many of our jobs we do different things every day. We meet different people, we have to arrange different things, solve problems differently. Those things are relatively difficult for computers to do, for now.

The challenge that faces us is that, when the U.S. transformed from an agricultural to a manufacturing and services economy, we had people move from one routine task, such as farming, to a different routine task, such as manufacturing or working call service centers. A large fraction of the population has made that transition, so they’ve been okay, they’ve found other jobs. But many of their jobs are still routine and repetitive.

The challenge that faces us is to find a way to scalably teach people to do non-routine non-repetitive work. Our education system, historically, has not been good at doing that at scale. The top universities are good at doing that for a relatively modest fraction of the population. But a lot of our population ends up doing work that is important but also routine and repetitive. That’s a challenge that faces our educational system.

I think it can be solved. That’s one of the reasons why I’ve been thinking about teaching innovation strategy, teaching creativity strategy. We need to enable a lot of people to do non-routine, non-repetitive tasks. These tactics for teaching innovation and creativity, these flight simulators for innovation, could be one way to get there. I don’t think we’ve figured out yet how to do it, but I’m optimistic it can be done.

You’ve said, “Engineers in China work much harder than the average Silicon Valley engineer. Engineers in Silicon Valley at startups work really hard. At mature companies, I don’t see the same intensity as you do in startups and at Baidu.” Why do you think that is?

I don’t know. I think the individual engineers in China are great. The individual engineers in Silicon Valley are great. The difference I think is the company. The teams of engineers at Baidu tend to be incredibly nimble.

There is much less appreciation for the status quo in the Chinese internet economy and I think there’s a much bigger sense that all assumptions can be challenged and everything is up for grabs. The Chinese internet ecosystem is very dynamic. Everyone sees huge opportunity, everyone sees massive competition. Stuff changes all the time. New inventions arise, and large companies will one day suddenly jump into a totally new business sector.

To give you an idea, here in the United States, if Facebook were to start a brand new web search engine, that might feel like a slightly strange thing to do. Why would Facebook build a search engine? It’s really difficult. But that sort of thing is much more thinkable in China, where there is more of an assumption that there will be new creative business models.

 

 

This seems to suggests a different management culture, where you can make important decisions quickly and have them be intelligent and efficient and not chaotic. Is Baidu operating in a unique way that you feel is particularly helpful to its growth?

Gosh, that’s a good question. I’m trying to think what to point to. I think decision making is pushed very far down in the organization at Baidu. People have a lot of autonomy, and they are very strategic. One of the things I really appreciate about the company, especially the executives, is there’s a very clear-eyed view of the world and of the competition.

When executives meet, and the way we speak with the whole company, there is a refreshing absence of bravado. The statements that are made internally — they say, “We did a great job on that. We’re not so happy with those things. This is going well. This is not going well. These are the things we think we should emphasize. And let’s do a post-mortem on the mistakes we made.” There’s just a remarkable lack of bravado, and I think this gives the organization great context on the areas to innovate and focus on.

You’re very focused on speech recognition, among other problems. What are the challenges you’re facing that, when solved, will lead to a significant jump in the accuracy of speech recognition technology?

We’re building machine learning systems for speech recognition. Some of the machine learning technologies we’re using now have been around for decades. It was only in the last several years that they’ve really taken off.

Why is that? I often make an analogy to building a rocket ship. A rocket ship is a giant engine together with a ton of fuel. Both need to be really big. If you have a lot of fuel and a tiny engine, you won’t get off the ground. If you have a huge engine and a tiny amount of fuel, you can lift up, but you probably won’t make it to orbit. So you need a big engine and a lot of fuel.

The reason that machine learning is really taking off now is that we finally have the tools to build the big rocket engine — that is giant computers, that’s our rocket engine. And the fuel is the data. We finally are getting the data that we need.

The digitization of society creates a lot of data and we’ve been creating data for a long time now. But it was just in the last several years we’ve been finally able to build big enough rocket engines to absorb the fuel. So part of our approach, not the whole thing, but a lot of our approach to speech recognition is finding ways to build bigger engines and get more rocket fuel.

For example, here is one thing we did, a little technical. Where do you get a lot of data for speech recognition? One of the things we did was we would take audio data. Other groups use maybe a couple thousand hours of data. We use a hundred thousand hours of data. That is much more rocket fuel than what you see in academic literature.

Then one of the things we did was, if we have an audio clip of you saying something, we would take that audio clip of you and add background noise to it, like a clip recorded in a cafe. So we synthesize an audio clip of what you would sound like if you were speaking in a cafe. By synthesizing your voice against lots of backgrounds, we just multiply the amount of data that we have. We use tactics like that to create more data to feed to our machines, to feed to our rocket engines.

One thing about speech recognition: most people don’t understand the difference between 95 and 99 percent accurate. Ninety-five percent means you get one-in-20 words wrong. That’s just annoying, it’s painful to go back and correct it on your cell phone.

Ninety-nine percent is game changing. If there’s 99 percent, it becomes reliable. It just works and you use it all the time. So this is not just a four percent incremental improvement, this is the difference between people rarely using it and people using it all the time.

So what is the hurdle to 99 percent at this point?

We need even bigger rocket engines and we still need even more rocket fuel. Both are still constrained and the two have to grow together. We’re still working on pushing that boundary.