Foreign-Language Paper Translation: Iterative Linear Least Squares Method of Parameter Estimation for Linear-Fractional Models of Molecular Biological Systems

Iterative Linear Least Squares Method of Parameter Estimation for Linear-Fractional Models of Molecular Biological Systems

Li-Ping Tian1, Lei Mu2, and Fang-Xiang Wu2,3*
1School of Information, Beijing Wuzi University, No.1 Fuhe Street, Tongzhou District, Beijing, P.R. China
2Department of Mechanical Engineering, 3Division of Biomedical Engineering, University of Saskatchewan, 57 Campus Dr., Saskatoon, SK S7N 5A9, Canada
*Corresponding author: faw341@mail.usask.ca

Abstract: Based on statistical thermodynamics principles or the Michaelis-Menten kinetics equation, models of biological systems contain linear fractional functions as reaction rates, which are nonlinear in both parameters and states. It is generally challenging to estimate parameters that enter a model nonlinearly, although there are many traditional nonlinear parameter estimation methods, such as the Gauss-Newton iteration method and its
variants. However, in a linear fractional model both the denominator and the numerator are linear in the parameters. Based on this observation, we develop an iterative linear least squares method for estimating parameters in biological systems modeled by linear fractional functions. The basic idea is to transform the optimization of a nonlinear least squares objective function into iteratively solving a sequence of linear least squares problems. The developed method is applied to a linear fractional function and an auto-regulatory gene network. The simulation results show the superior performance of the proposed method over some existing algorithms.

Keywords: Linear fractional model (LFM), models nonlinear in parameters, parameter estimation, iterative linear least squares algorithm

I. INTRODUCTION

Genomic-alerted diseases (e.g., obesity, cancer, HIV, H1N1) stem from the dysfunction of molecular biological systems, not only of their isolated components (e.g., genes, proteins). With advances in high-throughput measurement techniques such as microarray, ChIP-chip, and mass spectrometry, large-scale biological data have been and will continue to be produced. Such data contain insightful information for
understanding the mechanisms of molecular biological systems and have proved useful in diagnosis, treatment, and drug design for genomic-alerted diseases. This information can be extracted with methods of system modeling and simulation [1]. Most, if not all, molecular biological systems are nonlinear in both parameters and system state variables. Estimation of parameters in these models is a nonlinear estimation problem. Based on statistical thermodynamics [2, 3] or Michaelis-Menten kinetics [4, 5], molecular biological systems can be modeled with linear fractional functions as their nonlinear terms. The linear fractional model (LFM) is a rational function whose numerator and denominator are linear in the parameters. Parameters in the linear fractional model are typically reaction constants of interest. Estimation of these parameters is crucial to constructing a whole molecular biological system [6-8].

In general, all algorithms for nonlinear parameter estimation can be used to estimate parameters in linear fractional functions (models), for example the Gauss-Newton iteration method and its variants such as the Box-Kanemasu interpolation method, Levenberg's damped least squares method, and Marquardt's method [9, 10]. However, these iterative methods are sensitive to the initial values. Another main shortcoming is that these methods may converge to a local minimum of the least squares cost function and thus fail to find the true values of the parameters.

The general form of an LFM is

$$\eta(X, \beta) = \frac{N_0(X) + \sum_{i=1}^{p_N} N_i(X)\,\beta_{Ni}}{D_0(X) + \sum_{j=1}^{p_D} D_j(X)\,\beta_{Dj}} \tag{1}$$

where the vector $X$ consists of the independent observation variables and the $p$-dimensional vector $\beta$ consists of all parameters in the linear fractional function. The parameters divide naturally into two groups: those in the numerator, $\beta_{Ni}$ ($i = 1, \dots, p_N$), and those in the denominator, $\beta_{Dj}$ ($j = 1, \dots, p_D$), where $p_N + p_D = p$. The coefficient functions $N_i(X)$ ($i = 0, 1, \dots, p_N$) and $D_j(X)$ ($j = 0, 1, \dots, p_D$) are known functions of the independent variables and do not contain any unknown parameters. Either $N_0(X)$ or $D_0(X)$ must be nonzero; otherwise, from sensitivity analysis [9, 10],
the parameters cannot be uniquely identified.

978-1-4244-4713-8/10/$25.00 ©2010 IEEE

Recently, several methods have been developed for estimating parameters in linear fractional models [6, 7, 11, 12]. From LFM (1), we observe that both the denominator and the numerator are linear in the parameters. Based on this observation, we develop an iterative linear least squares method for estimating parameters in biological systems modeled by linear fractional functions. The basic idea is to transform the optimization of a nonlinear least squares objective function into iteratively solving a sequence of linear least squares problems. Briefly, the remainder of the paper is organized as follows. Section II describes an iterative least squares algorithm for estimating parameters in the LFM. Section III provides two illustrative examples to demonstrate the effectiveness of the proposed algorithm. Finally, we give
conclusions in Section IV.

II. ALGORITHM DESCRIPTION

Suppose that in a series of experiments we obtain a sequence of measurements (observations) of the dependent variable, $y_t$ ($t = 1, 2, \dots, n$), which can be represented by LFM (1) in terms of the independent variables and parameters. In practice, any measurement can be contaminated by random noise. For simplicity, we assume that the measurement errors are additive. Thus we have the relationships

$$y_t = \eta(X_t, \beta) + \varepsilon_t = \frac{N_0(X_t) + \sum_{i=1}^{p_N} N_i(X_t)\,\beta_{Ni}}{D_0(X_t) + \sum_{j=1}^{p_D} D_j(X_t)\,\beta_{Dj}} + \varepsilon_t \tag{2}$$

where $\varepsilon_t$ ($t = 1, 2, \dots, n$) stands for the measurement error in experiment $t$, and $X_t$ ($t = 1, 2, \dots, n$) stands for the measured
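or known values of the independent variables in experiment $t$. Assume that the independent variables $X$ and the parameters $\beta$ are non-random. Further, without loss of generality, assume that the measurement errors $\varepsilon_t$ ($t = 1, 2, \dots, n$) have zero mean. In the sequel, we use the following notation to describe the algorithm:

$$\beta_N = [\beta_{N1}, \beta_{N2}, \dots, \beta_{Np_N}]^T \in R^{p_N}, \quad \beta_D = [\beta_{D1}, \beta_{D2}, \dots, \beta_{Dp_D}]^T \in R^{p_D}, \quad \beta = [\beta_N^T, \beta_D^T]^T,$$

$$N(X_t) = [N_1(X_t), N_2(X_t), \dots, N_{p_N}(X_t)] \in R^{p_N}, \quad D(X_t) = [D_1(X_t), D_2(X_t), \dots, D_{p_D}(X_t)] \in R^{p_D},$$

$$\eta_t(\beta) = \eta(X_t, \beta), \quad Y = [y_1, y_2, \dots, y_n]^T \in R^n, \quad \eta(\beta) = [\eta_1(\beta), \eta_2(\beta), \dots, \eta_n(\beta)]^T \in R^n,$$

$$N = \begin{bmatrix} N(X_1) \\ N(X_2) \\ \vdots \\ N(X_n) \end{bmatrix} \in R^{n \times p_N}, \quad N_0 = \begin{bmatrix} N_0(X_1) \\ N_0(X_2) \\ \vdots \\ N_0(X_n) \end{bmatrix} \in R^n, \quad D = \begin{bmatrix} D(X_1) \\ D(X_2) \\ \vdots \\ D(X_n) \end{bmatrix} \in R^{n \times p_D}, \quad D_0 = \begin{bmatrix} D_0(X_1) \\ D_0(X_2) \\ \vdots \\ D_0(X_n) \end{bmatrix} \in R^n,$$

$$\Delta(\beta_D) = \mathrm{diag}\left[D_0(X_1) + D(X_1)\beta_D,\; D_0(X_2) + D(X_2)\beta_D,\; \dots,\; D_0(X_n) + D(X_n)\beta_D\right] \in R^{n \times n}.$$

With these definitions, Equation (2) can be written in vector-matrix form as

$$y_t = \frac{N_0(X_t) + N(X_t)\beta_N}{D_0(X_t) + D(X_t)\beta_D} + \varepsilon_t \tag{3}$$

To estimate the parameters in LFM (1), form the sum of squared errors (the cost function)

$$J(\beta) = J(\beta_N, \beta_D) = \|Y - \eta(\beta)\|^2 = \sum_{t=1}^{n}\left(y_t - \frac{N_0(X_t) + N(X_t)\beta_N}{D_0(X_t) + D(X_t)\beta_D}\right)^2 \tag{4}$$

Minimizing $J(\beta)$ gives the nonlinear least squares estimate of the parameters $\beta_N$ and $\beta_D$. Because the LFM is nonlinear in the parameters $\beta_N$ and $\beta_D$, the Newton-Gauss iteration method and its variants [7] can typically be applied to estimate these parameters by minimizing the cost function (4). However, it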
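is well known that the Newton-Gauss method may fall into a local minimum and thus fail to find the correct estimates of the parameters.

We rewrite the objective function (4) as follows:

$$J(\beta) = \sum_{t=1}^{n}\left(\frac{y_t\,[D_0(X_t) + D(X_t)\beta_D] - N_0(X_t) - N(X_t)\beta_N}{D_0(X_t) + D(X_t)\beta_D}\right)^2 = \sum_{t=1}^{n}\left(\frac{[y_t D_0(X_t) - N_0(X_t)] - [N(X_t)\beta_N - y_t D(X_t)\beta_D]}{D_0(X_t) + D(X_t)\beta_D}\right)^2 \tag{5}$$

or, in vector-matrix form,

$$J(\beta) = (b - A\beta)^T\,\Delta^{-2}(\beta_D)\,(b - A\beta) \tag{6}$$

where $b = \mathrm{diag}(Y)\,D_0 - N_0 \in R^n$ and $A = [\,N,\; -\mathrm{diag}(Y)\,D\,] \in R^{n \times p}$.

From objective function (5), we observe that, given the value of the parameters $\beta_D$ in the denominator, we can estimate the parameters $\beta = [\beta_N^T, \beta_D^T]^T$ in the numerator by linear least squares as

$$\hat{\beta} = \left[A^T \Delta^{-2}(\beta_D) A\right]^{-1} A^T \Delta^{-2}(\beta_D)\, b.$$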
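
The weighted linear solve above, repeated with the denominator values refreshed from each new estimate, can be sketched as follows. This is a minimal illustration and not the authors' Matlab implementation: the toy model y = (bN1 + bN2 x) / (1 + bD1 x), the parameter values, the function names (`solve`, `weighted_ls`, `fit_toy_lfm`), the tolerance, and the relative-change stopping test are all assumptions made for the example. It uses only the Python standard library and forms the normal equations directly, which is adequate for three parameters.

```python
import math


def solve(M, v):
    """Solve the square system M z = v by Gaussian elimination with partial pivoting."""
    n = len(v)
    aug = [list(M[i]) + [v[i]] for i in range(n)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[piv] = aug[piv], aug[col]
        for r in range(col + 1, n):
            f = aug[r][col] / aug[col][col]
            aug[r] = [a - f * c for a, c in zip(aug[r], aug[col])]
    z = [0.0] * n
    for i in range(n - 1, -1, -1):
        tail = sum(aug[i][j] * z[j] for j in range(i + 1, n))
        z[i] = (aug[i][n] - tail) / aug[i][i]
    return z


def weighted_ls(A, b, w):
    """Return the minimiser of sum_t w[t] * (b[t] - A[t] . beta)^2 via the normal equations."""
    p = len(A[0])
    AtWA = [[sum(wt * row[i] * row[j] for wt, row in zip(w, A)) for j in range(p)]
            for i in range(p)]
    AtWb = [sum(wt * row[i] * bt for wt, row, bt in zip(w, A, b)) for i in range(p)]
    return solve(AtWA, AtWb)


def norm(v):
    return math.sqrt(sum(c * c for c in v))


def fit_toy_lfm(xs, ys, beta_d_guess, tol=1e-5, max_iter=50):
    """Fit y = (bN1 + bN2*x) / (1 + bD1*x), i.e. N0 = 0, N = [1, x], D0 = 1, D = [x]."""
    b = list(ys)                                     # b = diag(Y) D0 - N0
    A = [[1.0, x, -y * x] for x, y in zip(xs, ys)]   # A = [N, -diag(Y) D]
    beta = [0.0, 0.0, beta_d_guess]
    for iteration in range(1, max_iter + 1):
        # Weights 1 / (D0 + D beta_D)^2 use the current denominator estimate.
        w = [1.0 / (1.0 + beta[2] * x) ** 2 for x in xs]
        new_beta = weighted_ls(A, b, w)
        diff = [new - old for new, old in zip(new_beta, beta)]
        change = norm(diff) / (norm(beta) + 1.0)
        beta = new_beta
        if change <= tol:
            break
    return beta, iteration


true_beta = [0.5, 2.0, 0.8]                          # assumed values for the demo
xs = [0.2 * k for k in range(1, 21)]                 # x = 0.2, 0.4, ..., 4.0
ys = [(true_beta[0] + true_beta[1] * x) / (1.0 + true_beta[2] * x) for x in xs]
beta_hat, n_iter = fit_toy_lfm(xs, ys, beta_d_guess=3.0)
print(beta_hat, n_iter)
```

With noise-free data the residual b - A beta vanishes at the true parameters for any positive weights, so a single solve already recovers them and the second pass only confirms convergence. The weights start to matter once the measurements carry noise, which this sketch does not exercise.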

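Therefore, we propose the following iterative least squares method.

Step 1. Choose the initial guess $\beta_D^0$.

Step 2. Iteratively solve the linear least squares problem

$$\min_{\beta^{k+1}} J(\beta^{k+1}) = (b - A\beta^{k+1})^T\,\Delta^{-2}(\beta_D^k)\,(b - A\beta^{k+1}) \tag{7}$$

which gives the solution

$$\beta^{k+1} = \left[A^T \Delta^{-2}(\beta_D^k) A\right]^{-1} A^T \Delta^{-2}(\beta_D^k)\, b \tag{8}$$

until the stopping criterion is met, where $\beta^k = [(\beta_N^k)^T, (\beta_D^k)^T]^T$.

From equation (7), if the sequence $\beta^1, \beta^2, \dots$ converges to $\beta^*$, the objective function (6) reaches its minimum value at $\beta^*$. In this paper the stopping criterion is set as

$$\frac{\|\beta(i) - \beta(i-1)\|}{\|\beta(i-1)\| + 1} \le \delta \tag{9}$$

where $\beta(i) = [\beta_N^T(i), \beta_D^T(i)]^T$ and $\delta$ is a preset small positive number, for example $10^{-5}$.

III. ILLUSTRATIVE EXAMPLES

The expression of a gene is regulated by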
regulatory proteins and/or RNA polymerase (RNAP), which bind to the gene's regulatory binding sites [2]. The regulatory binding site of a gene is a short piece of DNA sequence close to it. One gene can have a number of binding sites. The binding sites for regulatory proteins are called operators, while those for RNAP are called promoters. A gene regulatory network is a collection of genes that regulate one another's expression rates through their encoded proteins, which serve as regulatory proteins. To illustrate the proposed algorithm, this section considers the parameter identification of one simple gene regulatory network with one gene, two operators, and one promoter, as shown in Figure 1.

Figure 1. A gene regulatory network with one gene, two operators (Op1 and Op2) and one promoter (Pr).

Based on statistical thermodynamic theory and biochemical kinetics, the model of this network
can be expressed as follows [2]:

$$\dot{x} = \frac{a_0 + a_1 x + a_2 x^2}{1 + b_1 x + b_2 x^2} - \gamma x \tag{10}$$

where $x$ is the concentration of the protein encoded by the gene, $a_i$ ($i = 0, 1, 2$) and $b_i$ ($i = 1, 2$) are positive constants related to the biochemical kinetics, and $\gamma$ is a positive constant representing the protein degradation rate. Model (10) has positive parameters $\gamma$, $a_i$ and $b_i$. Note that model (10) is slightly different from the one in reference [1]. To uniquely identify the parameters in model (10), we have rescaled them such that the constant term in the denominator is 1. In system (10), the right-hand side is a linear fractional function plus a function linear in one parameter, which is not the same format as in (1). For the purpose of parameter estimation by the proposed method, we transform system (10) into the following model [11]:

$$\dot{x} = \frac{a_0 + (a_1 - \gamma)\,x + (a_2 - \gamma b_1)\,x^2 - \gamma b_2\,x^3}{1 + b_1 x + b_2 x^2} \tag{11}$$

The numerator is no longer linear in the 6 original parameters
in model (11). However, if we view each single coefficient as a new parameter, the numerator in model (11) is linear in the new parameters. In addition, as there are 6 parameters in 6 coefficients in model (11), the 6 original parameters can be uniquely identified using our method.

In this study, a group of artificial data is generated from the model of the gene regulatory system (10), with nominal parameter values and initial states provided. The nominal values of the parameters are set as: a0 = 0.4, a1 = 2.8, a2 = 0.24, b1 = 0.5, b2 = 1.4, γ = 0.4. In this example, we use the nominal values to generate the trajectory of x(t) shown in Figure 2. The time starts at t = 0 s. From Figure 2, system (11) is stable at its steady state x* = 2.18 after 5 s. Therefore, we do not use the simulated data after 5 seconds.

Figure 2. Trajectory of system (10) (horizontal axis: time (s), 0 to 10; vertical axis: protein concentration, 0 to 2.5).

No noise is added to the artificial data in the simulation, so they can be considered noise-free measurements. Nevertheless, spurious noise can be introduced when the derivatives are calculated numerically by finite difference formulas. In general, the higher the sampling frequency and the more data points are used, the more accurate the numerical derivatives are. On the other hand, in practice we may not obtain data at high frequency because of experimental limitations. In this study, the sampling frequency is 100 Hz. In numerically calculating the concentration change rate (the derivative $\dot{x}(t)$) at each
time point from the concentration $x$, we adopt the five-point central finite difference formula

$$\dot{x}(t_n) = \frac{1}{12\,\Delta t}\left[x(t_{n-2}) - 8x(t_{n-1}) + 8x(t_{n+1}) - x(t_{n+2})\right] \tag{12}$$

The performance of the proposed method is investigated in terms of relative estimation error, consumed CPU time, and robustness.
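
The five-point formula (12) can be sketched in a few lines. The function name `five_point_derivative` and the cubic test signal are assumptions chosen for illustration; only the 100 Hz sampling interval is taken from the study. The formula is exact for polynomials up to degree four, so a cubic gives a convenient check.

```python
def five_point_derivative(x, dt):
    """Estimate dx/dt at interior samples n = 2, ..., len(x) - 3 with formula (12)."""
    return [
        (x[n - 2] - 8.0 * x[n - 1] + 8.0 * x[n + 1] - x[n + 2]) / (12.0 * dt)
        for n in range(2, len(x) - 2)
    ]


dt = 0.01                                   # 100 Hz sampling, as in the study
ts = [n * dt for n in range(11)]
samples = [t ** 3 for t in ts]              # assumed test signal, derivative 3 t^2
estimates = five_point_derivative(samples, dt)
exact = [3.0 * t ** 2 for t in ts[2:-2]]
max_error = max(abs(e - d) for e, d in zip(estimates, exact))
print(len(estimates), max_error)
```

The two samples at each end of the record get no estimate, because the stencil needs two neighbours on both sides; a one-sided formula would be needed there.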

The proposed method is compared with our recently developed method in [11]. The relative estimation error (REE) is defined as

$$\mathrm{REE} = \frac{\|\text{estimate} - \text{true\_value}\|}{\|\text{true\_value}\|} \tag{13}$$

Both the proposed method and the one in [11] are iterative and need initial values to start. In this study, the initial values are chosen as the true values plus relative Gaussian noise, i.e.,

$$\text{initial\_value} = \text{true\_value} \times (1 + \sigma\varepsilon) \tag{14}$$

where $\varepsilon$ follows the standard normal distribution and $\sigma$ is the standard deviation. These methods are implemented in Matlab version 7.01 on a laptop with MS Windows XP Professional Version 2002, an Intel(R) Core(TM) 2 Duo CPU T940 at 2.53 GHz, and 2.00 GB of RAM. The consumed CPU time is measured on this computer. The program implemented for each method is run 100 times with different initial values chosen by formula (14) with σ = 2. The robustness is defined as the percentage of runs in which the method converges. The results are listed in Table 1. REE is the minimum REE over 100 runs. The CPU time is the average running time over 100 runs. Robustness is the percentage of runs converging with the minimum REE.

Table 1. Comparison between proposed method and nonlinear optimization method

Method            CPU Time (s)   REE      Robustness
Proposed method   0.0150         0.0153   100%
Method in [11]    0.8897         0.2466   87%

In Table 1, the REE for the method in [11] is calculated based on all converged runs. From Table 1, the proposed method gives more accurate estimates than the method proposed in [11]. Furthermore, the proposed method uses much less CPU time
to converge than the method in [11]. The proposed method converges in all 100 runs, while the method in [11] converges in 87 out of 100 runs, which indicates that the proposed method is more robust (less sensitive) to the initial values than the method in [11]. In summary, the proposed method outperforms the method in [11], and thus outperforms the traditional methods against which the method in [11] was compared.

IV. CONCLUSION

In this paper, we have developed a new method for estimating parameters in the linear fractional models that arise in molecular biological systems. The results from the illustrative example show that the proposed method outperforms both our recently developed method in [11] and the traditional nonlinear optimization methods. In this study, we do not consider noise in the data except that introduced by numerical derivatives. One direction of future work is to investigate the
robustness of the proposed method to noise in the data. In addition, low sampling frequency is expected in practice, particularly for biological systems. Another direction of future work is to investigate the performance of the proposed method at low sampling frequency. Finally, although the proposed method converged in 100% of the runs in the illustrative example, we will study its convergence theoretically.

ACKNOWLEDGMENTS

This study was supported by the Base Fund of Beijing Wuzi University and the Fund for Beijing Excellent Team for Teaching Mathematics through the first author, and by the Natural Science and Engineering Research Council of Canada (NSERC) through the second and third authors.

REFERENCES

[1] IC Chou and EO Voit, "Recent developments in parameter estimation and structure identification of biochemical and genomic systems," Mathematical Biosciences, 219(2): 57-83, 2009.
[2] DM Wolf and FH Eeckman, "On the relationship between genomic regulatory element organization and gene regulatory dynamics," Journal of Theoretical Biology, 195: 167-186, 1998.
[3] M Fussenegger, JE Bailey, and J Varner, "A mathematical model of caspase function in apoptosis," Nature Biotechnology, 18: 768-774, 2000.
[4] J Nielsen, J Villadsen, and G Liden, "Bioreaction Engineering Principles," 2nd edition, New York: Kluwer Academic/Plenum Publishers, 2003.
[5] GN Stephanopoulos, AA Aristidou, and J Nielsen, "Metabolic Engineering: Principles and Methodologies," San Diego: Academic Press, 1998.
[6] KG Gadkar, R Gunawan, and FJ Doyle III, "Iterative approach to model identification of biological networks," BMC Bioinformatics, 6: 155, 2005.
[7] KG Gadkar, J Varner, and FJ Doyle III, "Model identification of signal transduction networks from data using a state regulator problem," Systems Biology, 2: 17-30, 2005.
[8] FX Wu, L Mu, and RZ Luo, "Complexity analysis and optimal experimental design for parameter estimation of biological systems," Proceedings of the 21st IEEE Canadian Conference on Electrical and Computer Engineering, pp. 393-397, 2008.
[9] A van den Bos, "Parameter Estimation for Scientists and Engineers," New Jersey: John Wiley & Sons, 2007.
[10] JV Beck and KJ Arnold, "Parameter Estimation in Engineering and Science," New York: John Wiley & Sons, 1977.
[11] FX Wu and L Mu, "Parameter estimation in rational models of molecular biological systems," Proceedings of the 31st Annual International Conference of the IEEE Engineering in Medicine and Biology Society, pp. 3263-3266, 2009.
[12] FX Wu, ZK Shi, and L Mu, "Estimating parameters in the caspase activated apoptosis system," International Journal of Biomedical Engineering and Technology, accepted in September, 2009.
