Speech by Kai-Fu Lee, Former Microsoft Corporate Vice President (3): Slides

Ubiquitous Computing: A Vision for How Speech Becomes Mainstream
Kai-Fu Lee
Corporate Vice President, Microsoft Corporation
Presented at SpeechTEK, October 29, 2002

Talk Outline
- Four trends in the "Digital Decade".
- Natural UI (speech & language UI) vision.
- Four trends accelerate natural UI.
- A technology roadmap to natural UI.

The Digital Decade
[Timeline figure: Computing Revolutions. 1985: PC; 1990: GUI; 1995: Internet; 2001: The Digital Decade. Product labels: MS-DOS, Windows, IE, IIS.]

Trends in the Digital Decade: Everything connected
- More computing and capacity.
- Ubiquity of connected devices.
- Structured content.
- Distributed computing platform.

2001: The Digital Decade
- More computing and capacity.
  - Moore's Law
  - CPU: 2X / 18 months
  - Bandwidth: 3X / 18 months
  - Disk capacity: 3X / 18 months
- Ubiquity of connected devices.
  - PCs, telephones, smart phones, televisions, cars
  - Moore's Law applies here! (e.g., PocketPC, TabletPC)
  - Standards on how devices "talk" to each other: HTML, HTTP, XML, SOAP, UDDI, WSDL
  - Metcalfe's Law: Value of network = nodes^2.
  [Figure: connected devices. Tablet PCs & Laptops, Digital Video Camera, Web Pads, Security & Surveillance, Video Conferencing, Phone & Voicemail, TV, Auto PC, HiFi Audio, Games, Pocket PCs, Internet.]
- Structured content.
  - XML = Universal standards for describing data.
  - XML makes content "readable" by programs.
  - XML makes content more like databases than text.
  [Figure labels: On Pocket PC; On PC.]
- Distributed computing platform.
  - Built on standards (XML web services).
  - Transparent to the end-user.
  - One development & execution model.
  - Security and privacy critical.
  [Figure: Web Software Today (Conventional Browser, Bank Account) compared with XML Web Services Software (Conventional Browser, Rich Application, Bank Account, Stock Trading, Personal Finance Portal, linked through XML Web Services).]
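The growth rates quoted in the Digital Decade slides compound quickly. The following is a minimal sketch of that arithmetic, assuming smooth exponential growth at exactly the quoted rates; the function names and the ten-year horizon are illustrative assumptions, not something the slides specify.

```python
def growth_factor(multiple: float, period_months: float, elapsed_months: float) -> float:
    """Total growth after `elapsed_months` if capacity multiplies by
    `multiple` every `period_months` (assumed smooth compounding)."""
    return multiple ** (elapsed_months / period_months)


def metcalfe_value(nodes: int) -> int:
    """Metcalfe's Law as stated on the slide: value of network = nodes^2."""
    return nodes ** 2


DECADE_MONTHS = 120  # assumed horizon: one "Digital Decade"

# Rates taken from the "More computing and capacity" slide.
cpu_growth = growth_factor(2, 18, DECADE_MONTHS)
bandwidth_growth = growth_factor(3, 18, DECADE_MONTHS)
disk_growth = growth_factor(3, 18, DECADE_MONTHS)

print(f"CPU over a decade:       {cpu_growth:,.0f}x")
print(f"Bandwidth over a decade: {bandwidth_growth:,.0f}x")
print(f"Disk over a decade:      {disk_growth:,.0f}x")

# Doubling the number of connected nodes quadruples the network value.
print(metcalfe_value(2_000) / metcalfe_value(1_000))
```

Under these assumptions, doubling every 18 months compounds to roughly a hundredfold gain over ten years, which is why the talk treats today's constrained devices as a temporary condition.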

The Vision for "Natural" UI
- Users naturally articulate what they mean, on any device, to any application or web service, and have their intention interpreted and executed accurately.
- Why NUI?
  - Expressive -> more powerful.
  - Delegation -> more efficient.
  - Natural -> no learning.
  - Scalable -> any device.

Natural UI Will Enable
- Smart Search: "Find the Bill Gates book on future"
- Smart Help: "How do I replace my printer cartridge?"
- Question Answering: "When is the Britney Spears concert?"
- Commands / Tasks: "Send flowers to mom on her birthday"
- Pro-Active Agent: "Hold all calls unless it's from my family."

1. Moore's Law & Speech
- Moore's Law helps ASR accuracy
  - Leveraging Moore's Law + more data + research.
  - Predictable 10% error reduction or more per year.
  - Human-level performance possible in 10-20 years.

  Task                            ASR error rate   Human error rate   ASR-Human gap
  Free-style transcription        30%              4%                 19 years
  Digit strings                   0.7%             0.009%             41 years
  Alphabet letters                5%               1%                 15 years
  Read newspaper transcription    3%               0.9%               11 years

1. Moore's Law & Speech
- No need to wait for 10 to 20 years.
  - With real systems, error reduction is 10% / year.
  - Most applications don't need human-level performance.
  - Every year, new applications will be enabled:
    - Hierarchical -> Natural language dialog.
    - Fixed vocabulary -> Natural dictation.
    - Limited commands -> "How may I help you."

"To me, speech recognition will be a transforming capability once it finally comes into being. I'm talking about when you can speak to your computer and it will understand what you're saying in context."
Gordon Moore, 2002

2. Ubiquitous Devices & Speech
[Chart: devices (PC, Tablet PC, Screen Phone, Phone, PDA, Car, Internet TV) plotted by ease of text input (keyboard/pen), low to high, against ease of GUI (screen/pointer), low to high.]
[Chart: Opportunities for Speech. Same devices and axes, with regions labeled Speech-Only Command/Control, Dictation, and Multimodal Command/Control.]

2. Ubiquitous Devices & Speech
- Increasing number of screen phones.
  - Multimodal opportunity (speech-in, screen-out).
- Increasing number of keyboard-less devices.
  - Dictation opportunity.
- Increasing number of mouse-less devices.
  - Multimodal opportunity (speech+pen input).

Demonstration: Speech on TabletPC
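The "ASR-Human gap" column in the Moore's Law & Speech table can be reproduced from the other two columns. The sketch below assumes the gap is the number of years needed for the ASR error rate to reach the human error rate at a constant 10% relative reduction per year; the slides do not state how the column was derived, so this formula and the function name are assumptions that happen to match the quoted figures after rounding.

```python
import math


def years_to_close_gap(asr_error: float, human_error: float,
                       yearly_reduction: float = 0.10) -> float:
    """Solve asr_error * (1 - yearly_reduction)**n == human_error for n."""
    return math.log(human_error / asr_error) / math.log(1.0 - yearly_reduction)


# (ASR error %, human error %) as listed in the table.
tasks = {
    "Free-style transcription": (30.0, 4.0),
    "Digit strings": (0.7, 0.009),
    "Alphabet letters": (5.0, 1.0),
    "Read newspaper transcription": (3.0, 0.9),
}

for task, (asr, human) in tasks.items():
    print(f"{task}: {years_to_close_gap(asr, human):.1f} years")
```

Rounded to whole years, the four results are 19, 41, 15, and 11, the same values as the table. A faster yearly reduction shortens every gap proportionally, which is the point of the "10% or more per year" bullet.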

2. Ubiquitous Devices & Speech (continued)
- Need for a model to program across devices!
  - Different compute power, screen, peripherals.

3. XML/Web Integration & Speech
- Huge industry momentum behind XML.
  - Structured XML format.
  - "Database-like" capabilities.
  - Designed for "web programming".
  - Accessible by any device/OS/browser/app.
- Implications for speech:
  - Integration of web and call center.
  - Structured storage / database is a core requirement for speech & language UI.

Web and Call Center Today
[Figure: Web Center and Call Center, joined by a question mark.]

Integrating Web and Call Center
[Figure: .NET Speech Platform.]

Demonstration: Speech UI in .NET Web Environment
[Table, built up over five slides. Column labels: Speech Recognition, Device, Browser, Application. Devices: PC, Phone, Smart Phone. Locations: Local, Server, Distributed. Applications: Smart Search; Speech Form in IE; Speech-Only Interaction by Telephone; Multimodal Interaction on Smart Phone.]

4. Distributed Computing Platform & Speech
- Distributed computing is hard!
  - Author once, use anywhere.
  - Client, server, distributed: one runtime?
  - Debugging?
- Speech authoring is hard!
  - Appeal to web programmers.
  - Grammar writing takes 70% of authoring.
  - Debugging?

Need: Platform for Distributed Speech
[Figure labels: Richer Platform; Richer & More Applications; Happier & More Users; Businesses w/ better ROI.]

.NET Speech Platform & SDK
- .NET is about "connected computing"
  - Get client + server + distributed for free (.NET Framework)
  - Connect to Web assets for free (ASP, Webforms)
  - Get multimodal for free (SALT).
- Familiar programming model (VS.NET).
  - Mobilize 7 million developers.
- Make apps & technologies interoperable
- Great technologies, applications, tools:
  - Technology & Tool Partners: Microsoft, Speechworks, Intervoice.

  - Applications: 10 JDP partners now, 50+ beta partners in 6 months.

Tools in .NET Speech SDK
- Part of Visual Studio (with the right IDE look & feel).
- Prompt tool.
- Graphical grammar tool.
  - Graphical, easy to use.
- Dialog editor
  - Drag & drop, easy to use; no need to write SALT.
  - "Invisible to user".
  - Leveraged authoring for MM and telephony
  - Can be "customized" or "extended" by industry partners.
- Debugger
  - Can run locally
  - Works for multimodal or speech-only.
  - Works for text input.
  - Can debug just grammar tool or just dialog tool.

Demonstration: .NET Speech Platform & SDK

Advances in Speech Technologies
- Accuracy, scalability.
- "Natural" speech.
- Unrestricted vocabulary.
- Noise robustness.
- Dialog UI.
- Multimodal UI.

Natural UI Vision: Reiterated
- Smart Search: "Find the Bill Gates book on future"
- Smart Help: "How do I replace my printer cartridge?"
- Question Answering: "When is the Britney Spears concert?"
- Commands / Tasks: "Send flowers to mom on her birthday"
- Pro-Active Agent: "Hold all calls unless it's from my family."

Central Technology: Structure

"Find email from John about the Budget"
  SELECT DocumentName
  FROM LocalFileSystem, LocalMailSystem, LocalDocuments
  WHERE Author contains "John" and Subject contains LSP_Expand("Budget")
  ORDER BY ModifiedDate, CreateDate

"My printer is stuck"
  SELECT DocumentName
  FROM LocalFileSystem, LocalHardware, HardwareVendorSupport
  WHERE Vendor="HP" and Model="DeskJet 550" and Body contains LSP_Expand("jam")
  ORDER BY Context, LearnedBehavior

"Show me the new Dell notebook"
  SELECT DocumentName
  FROM WebSearch, UserHeuristics
  WHERE Vendor=Dell and Category=laptops and Age<60

"Make the letters on my screen bigger"
  SELECT TaskNames
  FROM LocalTasks, OSTasks, UserHeuristics
  WHERE Name contains Video Resolution
  ORDER BY Context, LearnedBehavior

"Play Britney"
  SELECT MediaName
  FROM LocalFileSystem, WebMediaProviders
  WHERE Name contains Britney OR Name contains Spears
  ORDER BY CurrentDirectoryBias, LearnedBehavior, Name, Context

Other Technology Advances
- Task-oriented user interface
  - Help -> limited tasks -> delegation -> pro-active agent.
- "Intelligence"
  - Syntax, domain-dependent & independent semantics.
  - Context, history, inference, planning, customization.
- Re-usable knowledge representation.
  - Personal data -> "Dialog modules" -> Data sharing?
- Multiple applications
  - Registration, brokering/federation, aggregation.
- These advances possible without speech:
  - Smart help, search, alerts, services, applications, OS.

The Natural UI Technology Roadmap
[Chart: devices (Phone, PC, Screen Phone, PDA, Tablet PC, Car, Internet TV) plotted by ease of GUI (screen/pointer), low to high, with roadmap items: Noise robustness, Dialog UI, Delegation UI, Unrestricted vocabulary, Multimodal UI, Structure content, Tasks, "Intelligence", Knowledge rep., Multiple apps.]

Conclusion
- Speech-enabled IVR is only the tip of the iceberg!
- Ubiquitous telephony + device speech UI through:
  - Improvement in recognition accuracy (Moore's Law).
  - Proliferation of multimodal UI (speech, pen, dictation).
  - Distributed computing platform (with great tools).
- PC/web infrastructure will develop organically:
  - Structured content, tasks, "intelligence"
  - Re-usable knowledge, reasoning across applications.
- Together, this paves the way for natural UI.

Microsoft Vision
- Empower people through great software, any time, any place, and on any device, naturally.
- Eventually delivering software that hears what you say, knows what you mean, and does what you want.

© 2001 Microsoft Corporation. All rights reserved.
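The "Central Technology: Structure" slides pair each spoken request with a SQL-like query. As a rough illustration of that mapping, the sketch below handles a single utterance shape with a regular expression and renders the query text shown on the slide. The class, the pattern, and the function name are hypothetical; `LSP_Expand` is only copied through as text, since the slides do not define what it does.

```python
import re
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class StructuredQuery:
    """A SQL-like query in the style used on the slides (hypothetical type)."""
    select: str
    sources: List[str]
    where: List[str]
    order_by: List[str]

    def render(self) -> str:
        return "\n".join([
            f"SELECT {self.select}",
            f"FROM {', '.join(self.sources)}",
            f"WHERE {' and '.join(self.where)}",
            f"ORDER BY {', '.join(self.order_by)}",
        ])


# Assumed pattern for one request shape: "Find email from <author> about [the] <topic>".
EMAIL_PATTERN = re.compile(
    r"find email from (?P<author>\w+) about (?:the )?(?P<topic>\w+)",
    re.IGNORECASE,
)


def parse_find_email(utterance: str) -> Optional[StructuredQuery]:
    """Return a structured query for the one supported shape, else None."""
    match = EMAIL_PATTERN.fullmatch(utterance.strip())
    if match is None:
        return None
    author = match.group("author")
    topic = match.group("topic")
    return StructuredQuery(
        select="DocumentName",
        sources=["LocalFileSystem", "LocalMailSystem", "LocalDocuments"],
        where=[
            f'Author contains "{author}"',
            # LSP_Expand is emitted verbatim as on the slide; its behavior is not modeled here.
            f'Subject contains LSP_Expand("{topic}")',
        ],
        order_by=["ModifiedDate", "CreateDate"],
    )


query = parse_find_email("Find email from John about the Budget")
if query is not None:
    print(query.render())
```

A fixed pattern like this covers only one phrasing; the slides' emphasis on context, learned behavior, and "intelligence" suggests that a real system would need far more than pattern matching to choose sources and ordering.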
