Neural Network based Learning Control


7.1 Reinforcement Learning (再励学习)

Neural network learning methods fall into three classes:

a) Supervised learning: an explicit "teacher" signal is available; BP (back-propagation) is a typical example.

b) Unsupervised learning: no teacher signal of any kind; learning draws only on the internal structure of the input data, much like the self-organizing class of methods.

c) Reinforcement learning: a method that originated in psychology.

Put simply: a person has a sum of money and several investment options A, B, C. He invests in B and makes money, so he invests in B again, until B stops paying off, or some unexpected event makes A look better, at which point he moves the money to A.

The reinforcement learning scheme proposed by Barto et al. can be called the ASE/ACE model, consisting of:

ASE (Associative Search Element), which determines the control signal y;
ACE (Adaptive Critic Element), which improves the raw reinforcement signal r, yielding the improved (internal) reinforcement signal r̂.

ASE and ACE each have n input channels, decoded from the system state S (as in CMAC), and only one channel is active at any instant.
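The decoding works as in CMAC: the continuous state is quantized and exactly one channel fires. A Python sketch of such a one-of-n "boxes" decoder for the Cart-Pole follows; the bin edges are assumptions, loosely following the 3 × 3 × 6 × 3 = 162-box quantization used by Barto et al.:

```python
import numpy as np

# Illustrative one-of-n decoder for the Cart-Pole state.
# Bin edges are assumptions, not values from the slides.
POS_BINS    = np.array([-0.8, 0.8])                       # m
VEL_BINS    = np.array([-0.5, 0.5])                       # m/s
ANGLE_BINS  = np.array([-0.10, -0.02, 0.0, 0.02, 0.10])   # rad
ANGVEL_BINS = np.array([-0.87, 0.87])                     # rad/s

def decode(pos, vel, angle, ang_vel):
    """Return a one-of-n vector with a 1 in the selected channel."""
    dims  = [np.digitize(pos, POS_BINS), np.digitize(vel, VEL_BINS),
             np.digitize(angle, ANGLE_BINS), np.digitize(ang_vel, ANGVEL_BINS)]
    sizes = [len(POS_BINS) + 1, len(VEL_BINS) + 1,
             len(ANGLE_BINS) + 1, len(ANGVEL_BINS) + 1]
    idx = 0
    for d, s in zip(dims, sizes):          # flatten the 4-D box index
        idx = idx * s + int(d)
    x = np.zeros(int(np.prod(sizes)))      # 3*3*6*3 = 162 channels here
    x[idx] = 1.0
    return x
```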

The control signal and the per-channel weight updates are determined as follows (the equations appeared as an image on the slides; the standard Barto-Sutton-Anderson form is):

y(t) = f( Σ_i w_i(t)·x_i(t) + noise(t) )
w_i(t+1) = w_i(t) + α·r̂(t)·e_i(t),      e_i(t+1) = δ·e_i(t) + (1−δ)·y(t)·x_i(t)
p(t) = Σ_i v_i(t)·x_i(t),               r̂(t) = r(t) + γ·p(t) − p(t−1)
v_i(t+1) = v_i(t) + β·r̂(t)·x̄_i(t),      x̄_i(t+1) = λ·x̄_i(t) + (1−λ)·x_i(t)

where w_i and v_i are the channel weights of ASE and ACE respectively, r̂ is the improved reinforcement signal, α, β, γ (and the trace constants) are the relevant coefficients, and noise is a random noise term.

[Figure: block diagram in which a DECODER maps the Cart-Pole system state to n channels feeding ASE (weights w_1, ..., w_n) and ACE (weights v_1, ..., v_n).]

[The Cart-Pole mathematical model and the failure conditions were given on the slides.]

Clearly, each element's output depends almost entirely on the weight of the currently selected channel (the ASE is also slightly affected by the noise). The weights learn almost independently: only channels that have actually been selected get modified, while all others stay unchanged. As a result, when a completely new situation is encountered, the controller may output a completely wrong control signal, leading to FAIL.
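A minimal Python sketch of this ASE/ACE scheme, assuming the one-of-n decoded state vector x from the decoder above; every constant below is an illustrative assumption, not a value from the slides:

```python
import numpy as np

# Illustrative ASE/ACE sketch; coefficients are assumptions.
class ASEACE:
    def __init__(self, n, alpha=1000.0, beta=0.5, gamma=0.95,
                 delta=0.9, lam=0.8, noise_std=0.01, rng=None):
        self.w = np.zeros(n)         # ASE channel weights
        self.v = np.zeros(n)         # ACE channel weights
        self.e = np.zeros(n)         # ASE eligibility traces
        self.xbar = np.zeros(n)      # ACE input traces
        self.p_prev = 0.0            # previous prediction p(t-1)
        self.alpha, self.beta, self.gamma = alpha, beta, gamma
        self.delta, self.lam, self.noise_std = delta, lam, noise_std
        self.rng = rng or np.random.default_rng()

    def act(self, x):
        """Bang-bang action from the selected channel's weight plus noise."""
        s = self.w @ x + self.rng.normal(0.0, self.noise_std)
        y = 1.0 if s >= 0.0 else -1.0
        self.e = self.delta * self.e + (1.0 - self.delta) * y * x
        self.xbar = self.lam * self.xbar + (1.0 - self.lam) * x
        return y

    def learn(self, x, r):
        """r: external reinforcement, e.g. -1 on failure and 0 otherwise."""
        p = self.v @ x                               # ACE prediction
        r_hat = r + self.gamma * p - self.p_prev     # improved signal
        self.w += self.alpha * r_hat * self.e        # only visited channels
        self.v += self.beta * r_hat * self.xbar
        self.p_prev = p
```

Because x is one-of-n, each update touches only the channel that was actually selected, which is exactly the locality behind the failure mode just described.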

Two approaches to Neural Network based Learning Control

7.2 Direct Inverse Modelling
7.3 Learning Control with a Distal Teacher (Distal Learning)

The control problem

[Figure: a Learner acting on an Environment, mapping an intention to an action that produces an outcome; an Inverse Model relates the desired outcome y*, the state x(n−1), the action u(n−1) and the outcome y(n).]

1. The Direct Inverse Modeling approach to learning an inverse model

[Figure: the Environment maps (x(n−1), u(n−1)) to y(n); the Inverse Model is trained on the reversed pairs, with a +/− junction forming its error signal.]
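A minimal sketch of direct inverse modelling, assuming a toy scalar plant and a random-feature ridge regressor standing in for the neural network; the plant, sizes and regularisation are all illustrative:

```python
import numpy as np

# Direct inverse modelling sketch on an assumed toy plant.
rng = np.random.default_rng(0)

def plant(x, u):                       # y_n = f(x_{n-1}, u_{n-1})
    return 0.8 * x + np.tanh(u)

# 1. Exercise the plant with random actions; record (x, y) -> u pairs.
X, U = [], []
x = 0.0
for _ in range(2000):
    u = rng.uniform(-2.0, 2.0)
    y = plant(x, u)
    X.append([x, y])                   # inverse-model input: state, outcome
    U.append(u)                        # inverse-model target: action taken
    x = y

# 2. Fit the inverse model u = g(x, y) by ridge regression on random
#    tanh features (a stand-in for the network of the slides).
P = rng.normal(size=(2, 50))
Phi = np.tanh(np.array(X) @ P)
w = np.linalg.solve(Phi.T @ Phi + 1e-3 * np.eye(50), Phi.T @ np.array(U))

# 3. Control: query the inverse model with the current state and the goal.
def inverse_control(x, y_star):
    return np.tanh(np.array([x, y_star]) @ P) @ w
```

A known weakness, and the point of Jordan and Rumelhart's distal-teacher work: if the plant is many-to-one, this regression can average several valid actions into an invalid one, which motivates the approach in 7.3.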

2. The distal learning approach to learning an inverse model

[Figure: a Forward Model is placed alongside the Environment, receiving x(n−1) and u(n−1) and predicting y(n), with a +/− junction forming the error.]

2.1 Learning the forward model using the prediction error y(n) − ŷ(n).
2.2 Learning the inverse model via the forward model using the performance error y*(n) − y(n).

[Figure: the Inverse Model maps (y*, x(n−1)) to u(n−1); the Forward Model then predicts y(n), and the performance error y* − y(n) is propagated back through the forward model to train the inverse model.]
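A sketch of the two distal-learning phases in PyTorch, assuming the same kind of toy plant; network sizes, learning rates and the exploration scheme are illustrative:

```python
import torch
import torch.nn as nn

# Distal learning sketch (toy plant; sizes and rates are assumptions).
forward_model = nn.Sequential(nn.Linear(2, 16), nn.Tanh(), nn.Linear(16, 1))
inverse_model = nn.Sequential(nn.Linear(2, 16), nn.Tanh(), nn.Linear(16, 1))
opt_f = torch.optim.Adam(forward_model.parameters(), lr=1e-2)
opt_i = torch.optim.Adam(inverse_model.parameters(), lr=1e-2)

def plant(x, u):                          # assumed environment y_n = f(x, u)
    return 0.8 * x + torch.tanh(u)

x = torch.zeros(64, 1)
for step in range(500):
    # 2.1 Train the forward model on the prediction error y_n - y_hat_n.
    u = torch.rand(64, 1) * 4.0 - 2.0     # exploratory random actions
    y = plant(x, u)
    y_hat = forward_model(torch.cat([x, u], dim=1))
    loss_f = ((y - y_hat) ** 2).mean()
    opt_f.zero_grad(); loss_f.backward(); opt_f.step()

    # 2.2 Train the inverse model on the performance error y*_n - y_n,
    #     back-propagated THROUGH the forward model. Only the inverse
    #     model is stepped; the forward model's stray gradients are
    #     cleared by opt_f.zero_grad() on the next iteration.
    y_star = torch.rand(64, 1) * 2.0 - 1.0
    u_cmd = inverse_model(torch.cat([x, y_star], dim=1))
    y_pred = forward_model(torch.cat([x, u_cmd], dim=1))
    loss_i = ((y_star - y_pred) ** 2).mean()
    opt_i.zero_grad(); loss_i.backward(); opt_i.step()

    x = y.detach()                        # move on to the next state
```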

The control systems

1. The direct inverse modeling approach

[Figure: the Inverse Model receives the reference y*(n) and drives the Environment with u(n−1); a +/− junction compares the output y(n) with y*(n).]

1.2 E.g., learning control of a CSTR using CMAC

[Figure: control structure built from CMAC memory, CMAC training and CMAC response blocks, a P controller, an extreme controller, a control switch, a Coordinator and the reference, with signals Sd, ep, ed, ud, up, ue, uc and So around the CSTR.]

The CSTR system (continuous-stirred tank reactor)

[The reactor model equations were given on the slides.]

This may be transformed to the dimensionless form (the dimensionless equations likewise appeared as an image), where x1 is the conversion rate relating to the reaction concentration; x2 is the reaction temperature in dimensionless form; Uf and Uc are control variables corresponding to the input flow rate F and the coolant temperature Tc, respectively; and the remaining symbols are system parameters.
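Since the slides give the equations only as an image, the sketch below uses one common dimensionless CSTR model from the literature; the exact form and the parameter values are assumptions, not necessarily the slides' model:

```python
import numpy as np

# A common dimensionless CSTR model (an assumption; the slides' exact
# equations were an image). x1: conversion rate, x2: dimensionless
# temperature; uf, uc: flow-rate and coolant-temperature inputs.
Da, B, beta, gamma = 0.072, 8.0, 0.3, 20.0   # illustrative parameters

def cstr_rhs(x1, x2, uf, uc):
    rate = Da * (1.0 - x1) * np.exp(x2 / (1.0 + x2 / gamma))
    dx1 = -uf * x1 + rate
    dx2 = -uf * x2 + B * rate - beta * (x2 - uc)
    return dx1, dx2

def step(x1, x2, uf, uc, dt=0.05):
    """Forward-Euler integration of the dimensionless CSTR model."""
    dx1, dx2 = cstr_rhs(x1, x2, uf, uc)
    return x1 + dt * dx1, x2 + dt * dx2
```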

Temperature control

[Figure: reactor schematic with feed, product and cooling jacket.]

CMAC based learning control approach

a) Observe the current outcome state So = (x1, x2, dx1), the current setting x1e(k) and the next setting x1e(k+1), where dx1(k) = x1(k) − x1(k−1).

b) Let ed = x1e(k+1) − x1(k) and ep = x1e(k) − x1(k), where ed is the difference between the next setting and the current output, and ep is the current deviation between the desired and the actual output.

c) IF |ed| > threshold, THEN take the extreme control, i.e.
   IF ed > threshold, THEN Uc = Umax;
   IF ed < −threshold, THEN Uc = Umin;
   OTHERWISE take the learning control:
   Uc = Up + Ud, where Up = ep · Kp and Ud is the CMAC response
   (a sketch of this switching law and the CMAC store follows below).
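A Python sketch of the switching law in c) together with a CMAC store for Ud; the tile-coding scheme, memory size, gains and thresholds are all illustrative assumptions, and the training call anticipates the CMAC training step described next:

```python
import numpy as np

# Minimal hashed tile-coding CMAC plus the control switch of step c).
# All sizes, gains and thresholds are illustrative assumptions.
class CMAC:
    def __init__(self, n_tilings=8, n_bins=16, memory=4096, lr=0.1):
        self.n_tilings, self.n_bins = n_tilings, n_bins
        self.w = np.zeros(memory)
        self.lr = lr

    def _cells(self, s):
        """Map state s (components scaled to [0, 1]) to one cell per tiling."""
        cells = []
        for t in range(self.n_tilings):
            offset = t / (self.n_tilings * self.n_bins)
            idx = tuple(int((v + offset) * self.n_bins) for v in s)
            cells.append(hash((t,) + idx) % len(self.w))
        return cells

    def response(self, s):
        return sum(self.w[c] for c in self._cells(s))

    def train(self, s, target):
        """Move the stored response at s toward the teacher signal."""
        cells = self._cells(s)
        err = target - self.response(s)
        for c in cells:
            self.w[c] += self.lr * err / len(cells)

U_MAX, U_MIN, THRESHOLD, KP = 2.0, -2.0, 0.25, 1.5

def control(ed, ep, So, cmac):
    """Step c): extreme control for large ed, else P + CMAC learning control."""
    if ed > THRESHOLD:
        return U_MAX
    if ed < -THRESHOLD:
        return U_MIN
    up = KP * ep                  # P controller on the current deviation
    ud = cmac.response(So)        # learned feedforward term
    return up + ud

# One control-learning cycle (see "CMAC training" below): after applying
# Uc(k) and observing the resulting state So(k+1), store Uc(k) there:
#   cmac.train(So_next, Uc_k)
```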

CMAC training

So(x1(k+1), x2(k+1), dx1(k+1)) is used as the input to the CMAC, and Uc(k) as the "teacher signal" for the training. Consider that So is the result caused by Uc(k); therefore, if the input to the CMAC is So, the corresponding output should be Uc(k).

This is the end of one control-learning cycle, and successive cycles are just the same.

The Distal Learning Control Approach

[Figure: distal-learning control structure with NN1, a P controller, the extreme controller, a control switch, NN2, the coordinator and the reference driving the CSTR.]
