Loss function

Definition

Formally, we begin by considering some family of distributions for a random variable $X$, indexed by some $\theta$.

More intuitively, we can think of $X$ as our "data", perhaps $X = (X_1, \ldots, X_n)$, where the $X_i \sim F_\theta$ are i.i.d. $X$ is what the decision rule makes decisions on. There are some number of possible ways $F_\theta$ to model our data $X$, which our decision function can use to make decisions. For a finite number of models, we can think of $\theta$ as the index into this family of probability models. For an infinite family of models, it is a set of parameters of the family of distributions.

On a more practical note, it is tempting to think of loss functions as necessarily parametric, since they seem to take $\theta$ as a "parameter". However, $\theta$ need not be finite-dimensional, which is incompatible with this notion; for example, if the family of probability functions is uncountably infinite, $\theta$ indexes an uncountably infinite space.

From here, given a set $A$ of possible actions, a decision rule is a function $\delta : \mathcal{X} \to A$, where $\mathcal{X}$ is the set of possible values of $X$.

A loss function is a real, lower-bounded function $L$ on $\Theta \times A$, where $\theta \in \Theta$. The value $L(\theta, \delta(X))$ is the cost of action $\delta(X)$ under parameter $\theta$. [1]

Decision rules

A decision rule makes a choice using an optimality criterion. Some commonly used criteria, stated in terms of the risk function $R(\theta, \delta)$ defined under "Frequentist risk", are:

Minimax: Choose the decision rule with the lowest worst loss; that is, minimize the worst-case (maximum possible) loss:

$$\arg\min_{\delta} \max_{\theta \in \Theta} R(\theta, \delta)$$

Invariance: Choose the optimal decision rule which satisfies an invariance requirement.

Choose the decision rule with the lowest average loss (i.e. minimize the expected value of the loss function), where $p(\theta)$ is a weighting (prior) density over $\theta$:

$$\arg\min_{\delta} \operatorname{E}_{\theta \in \Theta}[R(\theta, \delta)] = \arg\min_{\delta} \int_{\theta \in \Theta} R(\theta, \delta)\, p(\theta)\, d\theta$$

Expected loss

The value of the loss function itself is a random quantity because it depends on the outcome of a random variable $X$. Both frequentist and Bayesian statistical theory involve making a decision based on the expected value of the loss function; however, this quantity is defined differently under the two paradigms.

Frequentist risk

The expected loss in the frequentist context is obtained by taking the expected value with respect to the probability distribution, $P_\theta$, of the observed data, $X$. This is also referred to as the risk function [2] of the decision rule $\delta$ and the parameter $\theta$. Here the decision rule depends on the outcome of $X$. The risk function is given by

$$R(\theta, \delta) = \operatorname{E}_\theta\, L(\theta, \delta(X)) = \int_{\mathcal{X}} L(\theta, \delta(x))\, dP_\theta(x)$$

Bayesian expected loss

In a Bayesian approach, the expectation is calculated using the posterior distribution $\pi^*$ of the parameter $\theta$:

$$\rho(\pi^*, a) = \int_{\Theta} L(\theta, a)\, d\pi^*(\theta)$$

One should then choose the action $a^*$ which minimises the expected loss. Although this will result in choosing the same action as would be chosen using the Bayes risk, the emphasis of the Bayesian approach is that one is only interested in choosing the optimal action under the actual observed data, whereas choosing the actual Bayes optimal decision rule, which is a function of all possible observations, is a much more difficult problem.

Selecting a loss function

Sound statistical practice requires selecting an estimator consistent with the actual loss experienced in the context of a particular applied problem. Thus, in the applied use of loss functions, selecting which statistical method to use to model an applied problem depends on knowing the losses that will be experienced from being wrong under the problem's particular circumstances, which introduces an element of teleology into problems of scientific decision-making.

A common example involves estimating "location". Under typical statistical assumptions, the mean or average is the statistic for estimating location that minimizes the expected loss under the Taguchi or squared-error loss function, while the median is the estimator that minimizes expected loss under the absolute-difference loss function. Still different estimators would be optimal under other, less common circumstances.

In economics, when an agent is risk neutral, the loss function is simply expressed in monetary terms, such as profit, income, or end-of-period wealth.

But for risk-averse (or risk-loving) agents, loss is measured as the negative of a utility function, which represents satisfaction and is usually interpreted in ordinal terms rather than in cardinal (absolute) terms.

Other measures of cost are possible, for example mortality or morbidity in the field of public health or safety engineering.

For most optimization algorithms, it is desirable to have a loss function that is globally continuous and differentiable.

Two very commonly used loss functions are the squared loss, $L(a) = a^2$, and the absolute loss, $L(a) = |a|$. However, the absolute loss has the disadvantage that it is not differentiable at $a = 0$. The squared loss has the disadvantage that it tends to be dominated by outliers.
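The link between the choice of loss and the resulting location estimate can be checked numerically. The following is a minimal sketch, not part of the original text: the sample values, the grid search, and the helper names (`total_loss`, `grid_argmin`) are all illustrative assumptions. It searches a grid of candidate estimates and compares the minimizers of the summed squared loss and the summed absolute loss with the sample mean and median.

```python
import statistics


def squared(a):
    """Squared loss L(a) = a^2."""
    return a * a


def absolute(a):
    """Absolute loss L(a) = |a|."""
    return abs(a)


def total_loss(data, estimate, loss):
    """Sum of loss(x - estimate) over the sample."""
    return sum(loss(x - estimate) for x in data)


def grid_argmin(data, loss, lo, hi, steps):
    """Return the grid point in [lo, hi] with the smallest total loss."""
    grid = [lo + (hi - lo) * i / steps for i in range(steps + 1)]
    return min(grid, key=lambda c: total_loss(data, c, loss))


# Hypothetical sample: four small values and one outlier (100.0).
sample = [1.0, 2.0, 3.0, 4.0, 100.0]

best_sq = grid_argmin(sample, squared, 0.0, 100.0, 1000)
best_abs = grid_argmin(sample, absolute, 0.0, 100.0, 1000)

print("squared-loss minimizer:", best_sq, "mean:", statistics.mean(sample))
print("absolute-loss minimizer:", best_abs, "median:", statistics.median(sample))
```

On this sample the squared-loss minimizer coincides with the mean, which the single outlier pulls far from the bulk of the data. The absolute-loss minimizer coincides with the median and stays among the small values.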

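The frequentist risk and the minimax and average-loss criteria described in this article can be illustrated on a small example. The sketch below rests on assumptions that do not come from the original text: a binomial model with `n = 10` trials, squared-error loss, three hypothetical candidate rules, and an evenly weighted grid of `theta` values standing in for the weighting over the parameter. It computes the risk of each rule exactly by summing over all outcomes, then picks a rule under each criterion. The comparison covers only these three candidates on this grid, not all possible decision rules.

```python
import math


def binom_pmf(x, n, theta):
    """P(X = x) for X ~ Binomial(n, theta)."""
    return math.comb(n, x) * theta**x * (1 - theta) ** (n - x)


def risk(rule, theta, n):
    """Frequentist risk under squared-error loss: E_theta[(theta - rule(X))^2]."""
    return sum(
        (theta - rule(x, n)) ** 2 * binom_pmf(x, n, theta) for x in range(n + 1)
    )


# Hypothetical candidate decision rules (estimators of theta).
rules = {
    "mle": lambda x, n: x / n,
    "uniform_prior_mean": lambda x, n: (x + 1) / (n + 2),
    "constant_risk": lambda x, n: (x + math.sqrt(n) / 2) / (n + math.sqrt(n)),
}

n = 10
thetas = [i / 20 for i in range(21)]  # assumed grid over [0, 1]

# Worst-case risk over the grid (minimax criterion).
worst = {name: max(risk(r, t, n) for t in thetas) for name, r in rules.items()}

# Average risk with equal weight on each grid point (average-loss criterion).
average = {
    name: sum(risk(r, t, n) for t in thetas) / len(thetas)
    for name, r in rules.items()
}

minimax_choice = min(worst, key=worst.get)
average_choice = min(average, key=average.get)

print("minimax choice:", minimax_choice)
print("average-loss choice:", average_choice)
```

The two criteria pick different rules here. The rule whose risk does not depend on `theta` has the smallest worst-case risk, while the rule derived from a uniform prior has the smallest risk averaged over the grid.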