Part One
1.
No matter how careful one is with observations concerning the measurement of physical quantities, they are inevitably subject to errors of varying degrees. These errors, in most cases, are not simple but arise from several distinct sources, which it is best to divide into two classes.
Some causes of errors depend, for each observation, on variable circumstances independent of the result obtained: the errors arising from these are called "irregular" or "random," and like the circumstances that produce them, their value is not amenable to calculation. Such are the errors that arise from the imperfection of our senses and all those due to irregular external causes, e.g. vibrations of the air that blur our vision. Some of the errors due to the inevitable imperfection of even the best instruments, e.g. the roughness of the inner part of a level, its lack of absolute rigidity, etc., belong to this same category.
On the other hand, there are other causes that produce an identical error in all observations of the same kind, or one whose magnitude depends only on circumstances that can be viewed as essentially connected to the observation. We will call errors of this category "constant" or "regular" errors.
Moreover, one can see that this distinction is to a certain extent relative, and has a broader or narrower sense depending on the meaning one attaches to the idea of observations of the same nature. E.g. if one indefinitely repeats the measurement of the same angle, the errors arising from imperfect division of the instrument belong to the class of constant errors. If, on the other hand, one successively measures several different angles, the errors due to imperfect division will be considered random until a table of errors relative to each division has been formed.
2.
We exclude the consideration of regular errors from our discussion. It is up to the observer to carefully investigate the causes that can produce a constant error, to eliminate them if possible, or at least assess their effect in order to correct it for each observation, which will then give the same result as if the constant cause had not existed. It is quite different for irregular errors: by their nature, they resist any calculation, and they must be tolerated in observations. However, by skillfully combining results, their influence can be minimized as much as possible. The following investigation is devoted to this most important topic.
3.
Errors arising from a simple and determinate cause in observations of the same kind are confined within certain limits that could undoubtedly be assigned if the nature of this cause were perfectly known. In most cases, all errors between these extreme limits must be considered possible. A thorough knowledge of each cause would reveal whether all these errors have equal or unequal likelihood, and in the latter case, what the relative probability of each of them is. The same remark applies to the total error resulting from the combination of several simple errors. This error will also be confined between two limits, one being the sum of the upper limits, the other the sum of the lower limits corresponding to the simple errors. All errors between these limits will be possible, and each can result, in an infinite number of ways, from suitable values attributed to the partial errors. Nevertheless, it is possible to assign a larger or smaller likelihood for each result, from which the law of relative probability can be derived, provided that the laws of each of the simple errors are assumed to be known, and ignoring the analytical difficulties involved in collecting all of the combinations.
Of course, certain sources of error produce errors that cannot vary according to a continuous law, but are instead capable of only a finite number of values, such as errors arising from the imperfect division of instruments (if indeed one wants to classify them among random errors), because the number of divisions in a given instrument is essentially finite. Nevertheless, if it is assumed that not all sources of error are of this type, then it is clear that the complex of all possible total errors will form a series subject to the law of continuity, or, at least, several distinct series, if it so happens that, upon arranging all possible values of the discontinuous errors in order of magnitude, the difference between a pair of consecutive terms is greater than the difference between the extreme limits of the errors subject to the law of continuity. In practice, such a case will almost never occur, unless the instrument is subject to gross defects.
4.
Let $\varphi x$ denote the relative likelihood of an error $x$; this means, due to the continuity of the errors, that $\varphi x \cdot dx$ is the probability that the error lies between the limits $x$ and $x + dx$. In practice it is hardly possible, or perhaps impossible, to assign a form to the function $\varphi x$ a priori. Nevertheless, several general characteristics that it must necessarily present can be established: $\varphi x$ is obviously a discontinuous function; it vanishes for all values of $x$ not lying between the extreme errors. For any value between these limits, the function is positive (excluding the case indicated at the end of the previous article); in most cases, errors of opposite signs will be equally possible, and thus we will have $\varphi(-x) = \varphi x$. Finally, since small errors are more easily made than large ones, $\varphi x$ will generally have a maximum when $x = 0$ and will continually decrease as $x$ increases.

In general, the integral $\int \varphi x \cdot dx$, taken from $x = a$ to $x = b$, expresses the probability that the unknown error falls between the limits $a$ and $b$. It follows that the value of this integral taken between the extreme limits of the possible errors will always be $1$. And since $\varphi x$ is zero for all values not between these limits, it is clear that in all cases the value of the integral

$$\int_{-\infty}^{+\infty} \varphi x \cdot dx$$

will always be $1$.
5.
Let us consider the integral $\int x\,\varphi x \cdot dx$, extended over all possible errors, and denote its value by $k$. If the sources of error are such that there is no reason for two equal errors of opposite signs to have unequal likelihood, we will have $\varphi(-x) = \varphi x$, and consequently, $k = 0$. We conclude that if $k$ does not vanish and has e.g. a positive value, then there necessarily exists an error source that produces only positive errors or, at least, produces them more easily than negative errors. This quantity $k$, which is the average of all possible errors, or the average value of $x$, can conveniently be referred to as the "constant part of the error". Moreover, it is easily proven that the constant part of the total error is the sum of the constant parts of the simple errors of which it is composed.

If the quantity $k$ is assumed to be known and subtracted from the result of each observation, then, denoting the error of the corrected observation by $x'$ and the corresponding probability by $\varphi' x'$, we will have $\varphi' x' = \varphi(x' + k)$, and consequently,

$$\int x'\,\varphi' x' \cdot dx' = 0,$$

i.e. the errors of the corrected observations will have no constant part, which is clear in and of itself.
6.
The value of the integral $\int x\,\varphi x \cdot dx$, which is the average value of $x$, reveals the presence or absence of a constant error, as well as the value of this error. Similarly, the integral $\int x^2\,\varphi x \cdot dx$, which is the average value of $x^2$, seems very suitable for defining and measuring, in a general manner, the uncertainty of a system of observations. Therefore, between two systems of observations of unequal precision, the one giving a smaller value to the integral $\int x^2\,\varphi x \cdot dx$ should be considered preferable. If it is argued that this convention is arbitrary and seemingly unnecessary, then we readily agree. The question at hand is inherently vague and can only be delimited by a somewhat arbitrary principle. Determining a quantity through observation can be likened, with some accuracy, to a game in which there is a loss to be feared and no gain to be expected; each error being likened to a loss incurred, the relative apprehension about such a game should be expressed by the probable loss, i.e. by the sum of the products of the various possible losses by their respective probabilities. But what loss should be likened to a given error? This is not clear in itself; its determination depends in part on our choice. It is evident, first of all, that the loss should not be regarded as proportional to the error committed; for, in that hypothesis, a positive error representing a loss, the corresponding negative error would have to be regarded as a gain: rather, the magnitude of the loss should be evaluated by a function of the error whose value is always positive. Among the infinite number of functions that fulfill this condition, it seems natural to choose the simplest one, which is undoubtedly the square of the error, and thus we are led to the principle proposed above.
Laplace considered the question in a similar manner, but adopted as a measure of loss the error itself, taken positively. This assumption, if we do not deceive ourselves, is no less arbitrary than ours: should we, indeed, consider a double error as more or less regrettable than a simple error repeated twice, and should we, consequently, assign it a double or more than double importance? This is a question that is not clear, and on which mathematical arguments have no bearing; each must resolve it according to their preference. Nevertheless, it cannot be denied that Laplace's assumption deviates from the law of continuity and is therefore less suitable for analytical study; ours, on the other hand, is recommended by the generality and simplicity of its consequences.
7.
Let us define

$$m = \sqrt{\int x^2\,\varphi x \cdot dx},$$

the integral being extended to all possible errors; we will call $m$ the "mean error to be feared" or simply the "mean error" of the observation whose indefinite errors $x$ have a relative probability of $\varphi x$. We do not limit this designation to the immediate result of the observations, but rather extend it to any quantity that can be derived from them in any way. It is important not to confuse this mean error with the arithmetic mean of the errors, which is discussed in art. 5.

When comparing several systems of observations or several quantities resulting from observations that are not given the same precision, we will consider their relative "weight" to be inversely proportional to $m^2$, and their "precision" to be inversely proportional to $m$. In order to represent the weights by numbers, we should take, as the unit, the weight of a certain arbitrarily chosen system of observations.
8.
If the errors of the observations have a constant part $k$, subtracting it from each obtained result reduces the mean error, and increases the weight and precision. Retaining the notation of art. 5, and letting $m'$ denote the mean error of the corrected observations, we have

$$m'^2 = m^2 - k^2.$$

If, instead of the constant part $k$, another number $l$ were subtracted from each observation, the square of the mean error would become

$$m^2 - 2kl + l^2 = m'^2 + (k - l)^2.$$
9.
Let $\lambda$ be a determined coefficient and let $\mu$ be the value of the integral

$$\int_{-\lambda m}^{+\lambda m} \varphi x \cdot dx.$$

Then $\mu$ will be the probability that the error of a certain observation is less than $\lambda m$ in absolute value, and $1 - \mu$ will be the probability that this error exceeds $\lambda m$. If, for $\lambda = \rho$, $\mu$ has the value $\tfrac12$, it will be equally likely for the error to be smaller or larger than $\rho m$; thus, $\rho m$ can be called the probable error. The relationship between the probable error and the mean error depends on the nature of the function $\varphi x$, which is unknown in most cases. However, it is interesting to study this relationship in some particular cases.

I. If the extreme limits of the possible errors are $-a$ and $+a$, and if, between these limits, all errors are equally probable, the function $\varphi x$ will be constant between these same limits, and, consequently, equal to $\frac{1}{2a}$. Hence, we have $m = \frac{a}{\sqrt 3}$ and $\mu = \frac{\lambda}{\sqrt 3}$ so long as $\lambda$ is less than or equal to $\sqrt 3$; finally $\rho = \frac{\sqrt 3}{2} = 0.8660$, and the probability that the error does not exceed the mean error is $\frac{1}{\sqrt 3} = 0.5774$.

II. If as before $-a$ and $+a$ are the limits of possible errors, and if we assume that the probability of these errors decreases from the error $0$ onwards like the terms of an arithmetic progression, then we will have

$$\varphi x = \frac{a - x}{a^2}$$

for values of $x$ between $0$ and $a$, and

$$\varphi x = \frac{a + x}{a^2}$$

for values of $x$ between $-a$ and $0$. From this, we deduce that $m = \frac{a}{\sqrt 6}$, and

$$\mu = \lambda\sqrt{\tfrac23} - \tfrac16\lambda^2$$

as long as $\lambda$ is between $0$ and $\sqrt 6$, i.e. as long as $\mu$ is between $0$ and $1$; and finally,

$$\rho = \sqrt 6 - \sqrt 3 = 0.7174.$$

In this case, the probability that the error remains below the mean error is $0.6498$.

III. If we assume the function $\varphi x$ to be proportional to $e^{-h^2x^2}$, then it must be equal to

$$\frac{h}{\sqrt\pi}\,e^{-h^2x^2},$$

where $\pi$ denotes the semiperimeter of a circle of radius $1$; from which we deduce $m = \frac{1}{h\sqrt 2}$ (see Disquisitiones generales circa seriem infinitam, art. 28). If we let $\theta t$ denote the value of the integral

$$\frac{2}{\sqrt\pi}\int_0^t e^{-u^2}\,du,$$

then we have $\mu = \theta\!\left(\frac{\lambda}{\sqrt 2}\right)$. The following table gives some values of this quantity:

| $\lambda$ | $\mu$ |
|---|---|
| 0.6744897 | 0.5 |
| 0.8416212 | 0.6 |
| 1.0364334 | 0.7 |
| 1.2815516 | 0.8 |
| 1.6448536 | 0.9 |

In particular, the probable error is here $\rho m$ with $\rho = 0.6744897$, and the probability that the error does not exceed the mean error is $0.6827$.
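As a small numerical check (an editorial Python illustration, not part of the memoir), the probable-error coefficients $\rho$ of the three laws above can be recomputed; the third case uses the relation $\mu = \theta(\lambda/\sqrt 2)$, expressed through the error function.

```python
import math

# Editorial check of the probable-error coefficient rho for the three
# error laws of art. 9 (the probable error is rho * m).

# I. Uniform on [-a, a]: m = a/sqrt(3); half the mass lies below a/2,
#    hence rho = (a/2) / (a/sqrt(3)) = sqrt(3)/2.
print(math.sqrt(3) / 2)                  # 0.8660...

# II. Triangular on [-a, a]: m = a/sqrt(6), P(|x| < t) = 1 - (1 - t/a)^2,
#     hence rho = sqrt(6) - sqrt(3).
print(math.sqrt(6) - math.sqrt(3))       # 0.7174...

# III. phi(x) = (h/sqrt(pi)) exp(-h^2 x^2): mu = erf(lambda/sqrt(2));
#      solve erf(rho/sqrt(2)) = 1/2 by bisection.
lo, hi = 0.0, 2.0
for _ in range(60):
    mid = (lo + hi) / 2
    if math.erf(mid / math.sqrt(2)) < 0.5:
        lo = mid
    else:
        hi = mid
print((lo + hi) / 2)                     # 0.6744897...
```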
10.
Although the relationship between $\rho$ and $m$ depends on the nature of the function $\varphi x$, some general results can be established that apply to all cases where this function does not increase with the absolute value of the variable $x$. Under this hypothesis we have the following theorems:

$\lambda$ will not exceed $\mu\sqrt 3$ whenever $\mu$ is less than $\tfrac23$;

$\lambda$ will not exceed $\dfrac{2}{3\sqrt{1 - \mu}}$ whenever $\mu$ exceeds $\tfrac23$.

When $\mu = \tfrac23$, the two limits coincide and $\lambda$ cannot exceed $\dfrac{2}{\sqrt 3}$.

To prove this remarkable theorem, let $y$ be the value of the integral

$$\int_{-x}^{+x}\varphi z\,dz.$$

Then $y$ will be the probability that an error is between $-x$ and $+x$. Let us set $x = fy$; then we have $y = 2\int_0^x\varphi z\,dz$ and $\frac{dx}{dy} = f'y = \frac{1}{2\varphi x}$, and by hypothesis $f'y$ is always increasing between $y = 0$ and $y = 1$, or at least is not decreasing, or equivalently $f''y$ is always positive, or at least not negative. Now we have

$$d\left(y - \frac{fy}{f'y}\right) = \frac{fy\cdot f''y}{(f'y)^2}\,dy;$$

thus, since $fy$ and $f''y$ are never negative, this differential always has a positive value, or at least is never negative, and therefore the difference $y - \frac{fy}{f'y}$, which vanishes for $y = 0$, will always be positive and less than unity. Let $\theta$ be the value of this difference for $y = \mu$; since

$$\theta = \mu - \frac{f\mu}{f'\mu},$$

we have $f\mu = (\mu - \theta)f'\mu$, or $f'\mu = \frac{f\mu}{\mu - \theta}$.

This being prepared, let us consider the function

$$Fy = (y - \theta)\,f'\mu = \frac{(y - \theta)\,f\mu}{\mu - \theta},$$

for which we have $F\mu = f\mu$ and $F'y = f'\mu$. Then it is clear that the difference $fy - Fy$ vanishes for $y = \mu$. Since $f'y$ is continually increasing with $y$ (or at least does not decrease, which should always be understood), and at the same time $F'y$ is constant, the difference $f'y - F'y$ will be positive for all values of $y$ greater than $\mu$, and negative for all values of $y$ smaller than $\mu$. It follows that the difference $fy - Fy$ is always positive, and consequently, $fy$ will certainly be greater than $Fy$ in absolute value, as long as the function $Fy$ is positive, i.e. between $y = \theta$ and $y = 1$. The value of the integral

$$\int_\theta^1 (Fy)^2\,dy$$

will therefore be less than that of the integral $\int_\theta^1 (fy)^2\,dy$, and a fortiori less than $\int_0^1 (fy)^2\,dy$, i.e., less than $m^2$. Now the value of the first of these integrals is found to be

$$\frac{(1 - \theta)^3\,(f\mu)^2}{3(\mu - \theta)^2};$$

and therefore $(f\mu)^2 = \lambda^2m^2$ is less than

$$\frac{3m^2(\mu - \theta)^2}{(1 - \theta)^3},$$

with $\theta$ being a number between $0$ and $\mu$. If we consider $\theta$ as a variable, then this fraction, whose differential is

$$\frac{3m^2(\mu - \theta)(3\mu - 2 - \theta)}{(1 - \theta)^4}\,d\theta,$$

will be continually decreasing as $\theta$ increases from $0$ to $\mu$, so long as $\mu$ is less than $\tfrac23$, and therefore its maximum value will be found when $\theta = 0$, and will be $3m^2\mu^2$; so that in this case, the coefficient $\lambda$ will certainly be less, or at least not greater than $\mu\sqrt 3$. Q.E.P. On the other hand, when $\mu$ is greater than $\tfrac23$, the maximum value of the function will be found when $3\mu - 2 - \theta = 0$, i.e. for $\theta = 3\mu - 2$, and this maximum value will be

$$\frac{4m^2}{9(1 - \mu)},$$

so in this case, the coefficient $\lambda$ will not be greater than $\dfrac{2}{3\sqrt{1 - \mu}}$. Q.E.S.

Thus e.g. for $\mu = \tfrac12$, it is certain that $\lambda$ will not exceed $\tfrac12\sqrt 3 = 0.8660$, which means that the probable error cannot exceed $0.8660\,m$, to which it was found to be equal in the first example in art. 9. Furthermore, it is easily concluded from our theorem that $\mu$ is not less than $\frac{\lambda}{\sqrt 3}$ when $\lambda$ is less than $\frac{2}{\sqrt 3}$, and on the other hand, it is not less than $1 - \frac{4}{9\lambda^2}$ when $\lambda$ is greater than $\frac{2}{\sqrt 3}$.
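As a hedged numerical illustration (ours, not Gauss's), the following Python sketch compares the exact quantile $\lambda(\mu)$ of the third error law of art. 9 with the two bounds just proved; the bound itself is attained by the first (uniform) law.

```python
import math

# Editorial check of the bounds of art. 10 against the Gaussian law:
# lambda(mu) is defined by P(|error| < lambda * m) = mu.
def gaussian_lambda(mu):
    lo, hi = 0.0, 10.0
    for _ in range(80):                  # bisection on mu = erf(x/sqrt(2))
        mid = (lo + hi) / 2
        if math.erf(mid / math.sqrt(2)) < mu:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

for mu in (0.3, 0.5, 2 / 3, 0.9, 0.99):
    bound = mu * math.sqrt(3) if mu <= 2 / 3 else 2 / (3 * math.sqrt(1 - mu))
    print(mu, gaussian_lambda(mu), bound)   # lambda never exceeds the bound
```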
11.
Since several of the problems discussed below involve the integral $\int x^4\,\varphi x\cdot dx$, it will be worthwhile for us to evaluate it in some special cases. Let us denote the value of this integral, extended to all possible errors, by $n^4$.

I. When $\varphi x = \frac{1}{2a}$ for values of $x$ between $-a$ and $+a$, we have

$$n^4 = \frac{a^4}{5} = \frac95\,m^4.$$

II. In the second case of art. 9, with $x$ still between $-a$ and $+a$, we have

$$n^4 = \frac{a^4}{15} = \frac{12}{5}\,m^4.$$

III. In the third case, where $\varphi x = \frac{h}{\sqrt\pi}e^{-h^2x^2}$, we find, as explained in the commentary cited above, that

$$n^4 = \frac{3}{4h^4} = 3m^4.$$

It can also be demonstrated, with only the assumptions of the previous article, that the ratio $\frac{n^4}{m^4}$ is never less than $\frac95$.
12.
Let $e, e', e''$, etc. denote the errors made in observations of the same kind, and suppose that these errors are independent of each other. Let $\varphi e$ be the relative probability of error $e$, and let $y$ be a rational function of the variables $e, e', e''$, etc. Then the multiple integral

(I)
$$\int \varphi e\cdot\varphi e'\cdot\varphi e''\cdots de\,de'\,de''\cdots,$$

extended to all values of the variables $e, e', e''$, etc. for which the value of $y$ falls between the given limits $y_0$ and $y_1$, represents the probability that the value of $y$ is between $y_0$ and $y_1$. This integral is evidently a function of $y_1$, whose differential we set $= \psi y_1\cdot dy_1$, so that the integral in question is equal to $\int_{y_0}^{y_1}\psi y\cdot dy$, and therefore, $\psi y$ represents the relative probability of an arbitrary value of $y$. Since $e$ can be regarded as a function of the variables $y, e', e''$, etc., which we set $e = F(y, e', e'', \ldots)$, the integral (I) will be

$$\int \varphi e\cdot\varphi e'\cdot\varphi e''\cdots\frac{\partial e}{\partial y}\,dy\,de'\,de''\cdots,$$

where $y$ takes values between $y_0$ and $y_1$, and the other variables take all values for which $e$ is real. Hence we have

$$\psi y = \int \varphi e\cdot\varphi e'\cdot\varphi e''\cdots\frac{\partial e}{\partial y}\,de'\,de''\cdots,$$

the integration, where $y$ is to be regarded as a constant, being extended to all values of the variables $e', e''$, etc. for which $e$ takes a real value.
13.
The previous integration would require knowledge of the function $\varphi$, which is unknown in most cases. Even if this function were known, the calculation would often exceed the capabilities of analysis. Therefore, it will be impossible to obtain the probability of each value of $y$; but it is different if one asks only for the average value of $y$, which is given by the integral $\int y\,\psi y\cdot dy$ extended to all possible values of $y$. And since it is evident that for all values which $y$ cannot attain, either due to the nature of the function (e.g. for negative values, if $y = e^2 + e'^2 + \cdots$, etc.), or because of the limits imposed on $e, e'$, etc., one can assume that $\psi y = 0$, it is clear that the integration can be extended to all real values of $y$, from $y = -\infty$ to $y = +\infty$. But the integral $\int y\,\psi y\cdot dy$, taken between determinate limits $y_0$ and $y_1$, is equal to the integral

$$\int y\cdot\varphi e\cdot\varphi e'\cdot\varphi e''\cdots\frac{\partial e}{\partial y}\,dy\,de'\,de''\cdots,$$

taken from $y = y_0$ to $y = y_1$, and extended to all values of the variables $e', e''$, etc. for which $e$ is real. This integral is therefore equal to the integral

$$\int y\cdot\varphi e\cdot\varphi e'\cdot\varphi e''\cdots de\,de'\,de''\cdots,$$

in which $y$ is expressed as a function of $e, e'$, etc., and the integration is extended to all values of the variables that leave $y$ between $y_0$ and $y_1$. Thus, the integral $\int y\,\psi y\cdot dy$, extended to all possible values of $y$, can be obtained from the integral

$$\int y\cdot\varphi e\cdot\varphi e'\cdot\varphi e''\cdots de\,de'\,de''\cdots,$$

where the integration is extended to all real values of $e, e', e''$, etc., that is, from $-\infty$ to $+\infty$ for each of them.

14.

If the function $y$ reduces to a sum of terms of the form $M e^\alpha e'^\beta e''^\gamma\cdots$, then the value of the integral of the previous article, or equivalently the average value of $y$, will be equal to a sum of terms of the form

$$M\int e^\alpha\varphi e\cdot de\cdot\int e'^\beta\varphi e'\cdot de'\cdot\int e''^\gamma\varphi e''\cdot de''\cdots;$$

that is, the average value of $y$ is equal to a sum of terms derived from those that make up $y$ by replacing the powers $e^\alpha, e'^\beta$, etc. with their average values. The proof of this important theorem could easily be derived from other considerations.
15.
Let us apply the theorem of the previous article to the case where

$$y = \frac{e^2 + e'^2 + e''^2 + \cdots}{\sigma},$$

$\sigma$ denoting the number of terms in the numerator. We immediately find that the average value of $y$ is equal to $m^2$, the letter $m$ having the same meaning as above. The true value of $y$ may be lower or higher than its average, just as the true value of $\sqrt y$ may, in each case, be lower or higher than $m$; but the probability that, by chance, the value of $y$ differs by only a small amount from $m^2$ will approach certainty as $\sigma$ becomes larger. In order to clarify this, since it is not possible to determine this probability exactly, let us investigate the mean error to be feared when we set $m^2 = y$. It is clear from the principles of art. 6 that this error will be the square root of the average value of the function $(y - m^2)^2$. To find it, it suffices to observe that the average value of a term such as $\frac{e^4}{\sigma^2}$ is equal to $\frac{n^4}{\sigma^2}$ ($n$ having the same meaning as in art. 11), and that the average value of a term such as $\frac{2e^2e'^2}{\sigma^2}$ is equal to $\frac{2m^4}{\sigma^2}$; therefore, the average value of this function will be

$$\frac{n^4 - m^4}{\sigma},$$

and the mean error to be feared in this determination will be $\sqrt{\frac{n^4 - m^4}{\sigma}}$. Since this last formula contains the quantity $n$, if we only want to get an idea of the precision of this determination, it will suffice to adopt a certain hypothesis about the function $\varphi$. E.g. if we take the third assumption of arts. 9 and 11, this error will be equal to $m^2\sqrt{\frac{2}{\sigma}}$. Alternatively, we can obtain an approximate value of $n^4$ by means of the errors themselves, using the formula

$$n^4 = \frac{e^4 + e'^4 + e''^4 + \cdots}{\sigma}.$$

In general, it can be stated that a precision twice as great in this determination will require a quadruple number of errors, meaning that the weight of the determination is proportional to the number $\sigma$.

Similarly, if the errors of the observations contain a constant part, we will deduce from their arithmetic mean a value of the constant part, and this value will be approached more and more closely as the number of errors increases. In this determination, the mean error to be feared will be represented by

$$\sqrt{\frac{m^2 - k^2}{\sigma}},$$

where $k$ denotes the constant part, and $m$ denotes the mean error of the observations uncorrected for their constant error. It will be simply represented by $\frac{m'}{\sqrt\sigma}$, if $m'$ represents the mean error of the observations corrected for the constant part (see art. 8).
16.
In arts. 12-15, we assumed that the errors $e, e', e''$, etc. belonged to the same type of observation, so that the probability of each of these errors was represented by the same function. However, it is clear that the general principles outlined in arts. 12-14 can be applied with equal ease in the more general case where the probabilities of the errors $e, e', e''$, etc., are represented by different functions $\varphi e, \varphi'e', \varphi''e''$, etc., i.e. when these errors belong to observations of varying precision or uncertainty. Let $e$ denote the error of an observation with a mean error to be feared of $m$, and let $e', e''$, etc. denote the errors of other observations with mean errors to be feared of $m', m''$, etc. Then the average value of the sum $e^2 + e'^2 + e''^2 +$ etc. will be $m^2 + m'^2 + m''^2 +$ etc. Now, if it is also known that the quantities $m, m', m''$, etc. are respectively proportional to the known numbers $\mu, \mu', \mu''$, etc., then the average value of the expression

$$\frac{1}{\sigma}\left(\frac{e^2}{\mu^2} + \frac{e'^2}{\mu'^2} + \frac{e''^2}{\mu''^2} + \cdots\right)$$

will be $\frac{m^2}{\mu^2}$. However, if we adopt for $\frac{m^2}{\mu^2}$ the value that this expression will take, by substituting the errors $e, e', e''$, etc., as chance offers them, then the mean error affecting this determination will become, just as in the preceding article,

$$\sqrt{\frac{1}{\sigma^2}\left(\frac{n^4}{\mu^4} + \frac{n'^4}{\mu'^4} + \frac{n''^4}{\mu''^4} + \cdots\right) - \frac{m^4}{\sigma\mu^4}},$$

where $n', n''$, etc., have the same meaning with respect to the second and third observation as $n$ does with respect to the first; and if we can assume the quantities $n^4, n'^4, n''^4$, etc., proportional to $m^4, m'^4, m''^4$, etc., this mean error to be feared will be equal to

$$\frac{1}{\mu^2}\sqrt{\frac{n^4 - m^4}{\sigma}}.$$

But this method of determining an approximate value for $\frac{m^2}{\mu^2}$ is not the most advantageous. Consider the more general expression

$$\frac{\alpha e^2 + \alpha'e'^2 + \alpha''e''^2 + \cdots}{\alpha\mu^2 + \alpha'\mu'^2 + \alpha''\mu''^2 + \cdots},$$

whose average value will also be $\frac{m^2}{\mu^2}$, regardless of the coefficients $\alpha, \alpha', \alpha''$, etc. The mean error to be feared when substituting the value of this expression, as chance determines it, for $\frac{m^2}{\mu^2}$, will, according to the principles above, be given by the formula

$$\frac{\sqrt{\alpha^2(n^4 - m^4) + \alpha'^2(n'^4 - m'^4) + \alpha''^2(n''^4 - m''^4) + \cdots}}{\alpha\mu^2 + \alpha'\mu'^2 + \alpha''\mu''^2 + \cdots}.$$

To minimize this error, we must set

$$\alpha : \alpha' : \alpha'' : \cdots = \frac{\mu^2}{n^4 - m^4} : \frac{\mu'^2}{n'^4 - m'^4} : \frac{\mu''^2}{n''^4 - m''^4} : \cdots.$$

These values cannot be evaluated until the exact ratios of the quantities $n^4 - m^4$, $n'^4 - m'^4$, etc. are known. In the absence of exact knowledge[1], it is safest to assume them proportional to $m^4, m'^4, m''^4$, etc. (see art. 11), in which case these ratios become $\frac{1}{\mu^2} : \frac{1}{\mu'^2} : \frac{1}{\mu''^2} : \cdots$; i.e. the coefficients $\alpha, \alpha', \alpha''$, etc., should be assumed equal to the relative weights $p, p', p''$, etc. of the various observations, taking the weight of the one corresponding to the error $e$ as the unit. With this assumption, let $\sigma$ denote, as above, the number of proposed errors. Then the average value of the expression

$$\frac{e^2 + p'e'^2 + p''e''^2 + \cdots}{\sigma}$$

will be $m^2$, and when we take, for the true value of $m^2$, the randomly determined value of this expression, the mean error to be feared will be

$$\sqrt{\frac{n^4 + p'^2n'^4 + p''^2n''^4 + \cdots - \sigma m^4}{\sigma^2}};$$

and, finally, if we are allowed to assume that the quantities $n^4, n'^4, n''^4$, etc., are proportional to $m^4, m'^4, m''^4$, etc., this expression reduces to

$$\sqrt{\frac{n^4 - m^4}{\sigma}},$$

which is identical to what we found in the case where all observations were of the same type.
17.
When the value of a quantity, which depends on an unknown magnitude, is determined by an observation whose precision is not absolute, the result of this observation may provide an erroneous value for the unknown, but there is no room for discretion in this determination. But if several functions of the same unknown have been found by imperfect observations,
we can obtain the value of the unknown either by any one of these observations, or by a combination of several observations, which can be carried out in infinitely many ways. The result will be subject, in all cases, to a possible error, and depending on the combination chosen, the mean error to be feared may be greater or smaller. The same applies if several observed quantities depend on multiple unknowns. Depending on whether the number of observations equals the number of unknowns, or is smaller or larger than this number, the problem will be determined, undetermined, or more than determined (at least in general), and in this third case, the observations can be combined in infinitely many ways to provide values for the unknowns. Among these combinations, the most advantageous ones must be chosen, i.e., those that provide values for which the mean error to be feared is as small as possible. This problem is certainly the most important one presented by the application of mathematics to natural philosophy.
In Theoria motus corporum coelestium we have shown how to find the most probable values of the unknowns when the probability law of the observational errors is known, and since, in almost all cases, this law remains hypothetical by its nature, we have applied this theory to the highly plausible hypothesis that the probability of an error $x$ is proportional to $e^{-h^2x^2}$. Hence arose the method that I have followed, especially in astronomical calculations, and which most calculators now use under the name of the Method of Least Squares.
Laplace later considered the question from another point of view, and showed that this principle is preferable to all others, regardless of the probability law of the errors, provided that the number of observations is very large. But when this number is limited, the question remains open; so that, if we reject our hypothetical law, the method of least squares would be preferable to others, for the sole reason that it leads to simpler calculations.
We therefore hope to please geometers by demonstrating in this Memoir that the method of least squares provides the most advantageous combination of observations, not only approximately, but also absolutely, regardless of the probability law of errors and regardless of the number of observations, provided that we adopt for the mean error, not Laplace's definition, but the one which we have given in arts. 5 and 6.
It is necessary to warn here that, in the following investigations, only random errors reduced by their constant part will be considered; it is up to the observer to carefully eliminate the causes of constant errors. We reserve for another Memoir the examination of the case where the observations are affected by an unknown constant error.
18.
Problem. Let $U$ be a given function of the unknowns $V, V', V''$, etc.; we ask for the mean error $M$ to be feared in determining the value of $U$ when, instead of the true values of $V, V', V''$, etc., we take the values derived from independent observations; $m, m', m''$, etc., being the mean errors corresponding to these various observations.

Solution. Let $e, e', e''$, etc. denote the errors of the observed values of $V, V', V''$, etc.; the resulting error for the value of the function $U$ can be expressed by the linear function

$$E = \lambda e + \lambda'e' + \lambda''e'' + \cdots,$$

where $\lambda, \lambda', \lambda''$, etc., represent the derivatives $\frac{dU}{dV}, \frac{dU}{dV'}, \frac{dU}{dV''}$, etc., when $V, V', V''$, etc., are replaced by their true values. This value of $E$ is evident if we assume the observations to be accurate enough so that the squares and products of the errors are negligible. It follows that the average value of $E$ is zero, since we assume that the errors of the observations have no constant part. Now the mean error $M$ to be feared in the value of $U$ will be the square root of the average value of $E^2$, or equivalently $M^2$ will be the average value of the sum

$$\lambda^2e^2 + \lambda'^2e'^2 + \lambda''^2e''^2 + \cdots + 2\lambda\lambda'ee' + 2\lambda\lambda''ee'' + \cdots;$$

but the average value of $\lambda^2e^2$ is $\lambda^2m^2$, that of $\lambda'^2e'^2$ is $\lambda'^2m'^2$, etc., and finally the average values of the products $2\lambda\lambda'ee'$, etc., are all zero. Hence we find that

$$M^2 = \lambda^2m^2 + \lambda'^2m'^2 + \lambda''^2m''^2 + \cdots.$$

It is good to add several remarks to this solution.

I. Since we neglect powers of the errors higher than the first, we can, in our formula, take for $\lambda, \lambda', \lambda''$, etc., the values of the differential coefficients $\frac{dU}{dV}, \frac{dU}{dV'}$, etc., derived from the observed values of $V, V', V''$, etc. Whenever $U$ is a linear function, this substitution is rigorously exact.
II. If, instead of mean errors, one prefers to introduce the weights $p, p', p''$, etc. of the respective observations, with the unit being arbitrary, and $P$ being the weight of the value of $U$, then we will have

$$\frac1P = \frac{\lambda^2}{p} + \frac{\lambda'^2}{p'} + \frac{\lambda''^2}{p''} + \cdots.$$

III. Let $U'$ be another function of $V, V', V''$, etc., and let

$$E' = \lambda_1e + \lambda_1'e' + \lambda_1''e'' + \cdots$$

be the error in the determination of $U'$ from the observed values of $V, V', V''$, etc.; the mean error to be feared in this determination will be

$$M' = \sqrt{\lambda_1^2m^2 + \lambda_1'^2m'^2 + \lambda_1''^2m''^2 + \cdots}.$$

It is obvious that the errors $E$ and $E'$ will not be independent of each other, and the mean value of the product $EE'$ will not be zero, like the mean value of $ee'$, but instead it will be equal to

$$\lambda\lambda_1m^2 + \lambda'\lambda_1'm'^2 + \lambda''\lambda_1''m''^2 + \cdots.$$

IV. The problem includes the case where the values of the quantities $V, V', V''$, etc., are not immediately given by observation, but are deduced from any combinations of direct observations. For this extension to be legitimate, the determinations of these quantities must be independent, i.e., they must be provided by different observations. If this condition of independence is not fulfilled, the formula giving the value of $M$ would no longer be accurate. For example, if the same observation were used both in determining $V$ and in determining $V'$, the errors $e$ and $e'$ would no longer be independent, and the mean value of the product $ee'$ would no longer be zero. If, in this case, the relationship between $V$ and $V'$ and the results of the simple observations from which they derive is known, we can calculate the mean value of the product $ee'$, as indicated in remark III, and consequently correct the formula which gives $M^2$.
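To make the propagation formula $M^2 = \lambda^2m^2 + \lambda'^2m'^2 + \cdots$ concrete, here is a short Python simulation; the function $U = VV'$ and all numbers are our own hypothetical choices, not from the text.

```python
import math
import random
import statistics

# Editorial sketch of art. 18 with the hypothetical function U = V * V':
# lambda = dU/dV = V', lambda' = dU/dV' = V, so, neglecting second-order
# terms, M = sqrt((V'*m)^2 + (V*m1)^2).
random.seed(2)
V, V1 = 3.0, 5.0                 # true values of V and V'
m, m1 = 0.01, 0.02               # mean errors of the two observations
errors = [(V + random.gauss(0, m)) * (V1 + random.gauss(0, m1)) - V * V1
          for _ in range(100000)]
predicted = math.sqrt((V1 * m) ** 2 + (V * m1) ** 2)
print(statistics.pstdev(errors), predicted)   # both close to 0.0781
```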
19.
Let $V, V', V''$, etc., be functions of the unknowns $x, y, z$, etc. Let $\pi$ be the number of these functions, and let $\rho$ be the number of unknowns. Suppose that observations have given, immediately or indirectly, $V = L$, $V' = L'$, $V'' = L''$, etc., and that these determinations are absolutely independent of each other. If $\rho$ is greater than $\pi$, then the determination of the unknowns is an indeterminate problem. If $\rho$ is equal to $\pi$, then each of the unknowns $x, y, z$, etc., can be reduced to a function of $V, V', V''$, etc., so that the values of the former can be deduced from the observed values of the latter, and the previous article will allow us to calculate the relative accuracy of these various determinations. If $\rho$ is less than $\pi$, then each unknown $x, y, z$, etc., can be expressed in infinitely many ways as a function of $V, V', V''$, etc., and, in general, these values will be different; they would coincide only if the observations were, contrary to our assumptions, rigorously accurate. It is clear, moreover, that the various combinations will provide results whose accuracy will generally be different.

Moreover, if, in the second and third cases, the quantities $V, V', V''$, etc., are such that $\pi - \rho + 1$ of them, or more, can be regarded as functions of the others, the problem is more than determined relative to these latter functions and indeterminate relative to the unknowns $x, y, z$, etc.; we could not then determine these unknowns even if the functions $V, V', V''$, etc., were exactly known: but we exclude this case from our investigations.

If $V, V', V''$, etc., are not linear functions of the unknowns, we can always assign them this form, by replacing the primitive unknowns with their differences from approximate values of them, which we assume known. The mean errors to be feared in the determinations $V = L$, $V' = L'$, $V'' = L''$, etc., being respectively denoted by $m, m', m''$, etc., and the weights of these determinations by $p, p', p''$, etc., so that

$$pm^2 = p'm'^2 = p''m''^2 = \cdots,$$

we will assume that the ratios of the mean errors, and hence the weights, are known, the unit of weight being arbitrarily chosen. Finally, if we set

$$(V - L)\sqrt p = v,\qquad (V' - L')\sqrt{p'} = v',\qquad (V'' - L'')\sqrt{p''} = v'',\ \text{etc.},$$

then things will proceed as if immediate observations, equally precise and with mean error $m\sqrt p = m'\sqrt{p'} = \cdots$, had given

$$v = 0,\qquad v' = 0,\qquad v'' = 0,\ \text{etc.}$$
20.
Problem. Let $v, v', v''$, etc., be the following linear functions of the unknowns $x, y, z$, etc.,

(1)
$$\begin{aligned} v &= ax + by + cz + \cdots + l,\\ v' &= a'x + b'y + c'z + \cdots + l',\\ v'' &= a''x + b''y + c''z + \cdots + l'',\\ &\qquad\cdots \end{aligned}$$

Among all systems of coefficients $k, k', k''$, etc., that identically satisfy

$$kv + k'v' + k''v'' + \cdots = x + \mathrm{Const.},$$

$\mathrm{Const.}$ being independent of $x, y, z$, etc., find the one for which $k^2 + k'^2 + k''^2 + \cdots$ obtains its minimum value.

Solution. — Let us set

(2)
$$\begin{aligned} \xi &= av + a'v' + a''v'' + \cdots,\\ \eta &= bv + b'v' + b''v'' + \cdots,\\ \zeta &= cv + c'v' + c''v'' + \cdots,\\ &\qquad\cdots \end{aligned}$$

so that $\xi, \eta, \zeta$, etc. are linear functions of $x, y, z$, etc., and we have

(3)
$$\begin{aligned} \xi &= x\,\Sigma a^2 + y\,\Sigma ab + z\,\Sigma ac + \cdots + \Sigma al,\\ \eta &= x\,\Sigma ab + y\,\Sigma b^2 + z\,\Sigma bc + \cdots + \Sigma bl,\\ \zeta &= x\,\Sigma ac + y\,\Sigma bc + z\,\Sigma c^2 + \cdots + \Sigma cl,\\ &\qquad\cdots \end{aligned}$$

where $\Sigma a^2$ denotes the sum $aa + a'a' + a''a'' + \cdots$, and similarly for the other sums. The number of quantities $\xi, \eta, \zeta$, etc., is equal to the number of unknowns $x, y, z$, etc., namely $\rho$. Thus, by elimination, one can obtain an equation of the following form,[2]

$$x = A + \alpha\xi + \beta\eta + \gamma\zeta + \cdots,$$

which will be identically satisfied if we replace $\xi, \eta, \zeta$ with their values from (3). Consequently, if we set

(4)
$$\begin{aligned} \alpha a + \beta b + \gamma c + \cdots &= \kappa,\\ \alpha a' + \beta b' + \gamma c' + \cdots &= \kappa',\\ \alpha a'' + \beta b'' + \gamma c'' + \cdots &= \kappa'',\\ &\qquad\cdots \end{aligned}$$

then we will have identically

(5)
$$x = A + \kappa v + \kappa'v' + \kappa''v'' + \cdots.$$

This equation shows that among the different systems of coefficients $k, k', k''$, etc., we must consider the system $k = \kappa$, $k' = \kappa'$, $k'' = \kappa''$, etc. Moreover, for any admissible system, we will have identically $kv + k'v' + k''v'' + \cdots = x + \mathrm{Const.}$, and this equation, being identical, leads to the following:

$$\Sigma ka = 1,\qquad \Sigma kb = 0,\qquad \Sigma kc = 0,\ \text{etc.}$$

Adding these equations after multiplying them, respectively, by $\alpha, \beta, \gamma$, etc., we will have, by virtue of the system (4), $\Sigma k\kappa = \alpha$, which, applied to the system $\kappa, \kappa', \kappa''$ itself, is the same as $\Sigma\kappa^2 = \alpha$. Since, therefore,

$$\Sigma k^2 = \Sigma\kappa^2 + 2\,\Sigma\kappa(k - \kappa) + \Sigma(k - \kappa)^2 = \alpha + \Sigma(k - \kappa)^2,$$

the sum $k^2 + k'^2 + k''^2 + \cdots$ will have its minimum value when $k = \kappa$, $k' = \kappa'$, etc. Q.E.I.

Moreover, this minimum value will be obtained as follows. Equation (5) shows that we have $\Sigma\kappa a = 1$, $\Sigma\kappa b = 0$, $\Sigma\kappa c = 0$, etc. Let us multiply these equations, respectively, by $\alpha, \beta, \gamma$, etc., and add them; considering the relations (4), we find

$$\kappa^2 + \kappa'^2 + \kappa''^2 + \cdots = \alpha.$$
21.
When the observations have provided the approximate equations $v = 0$, $v' = 0$, $v'' = 0$, etc., it will be necessary, to determine the unknown $x$, to choose a combination of the form

$$kv + k'v' + k''v'' + \cdots = 0,$$

such that the unknown $x$ acquires a coefficient equal to $1$, and that the other unknowns are eliminated. According to art. 18, the weight of this determination will be given by

$$\frac{1}{k^2 + k'^2 + k''^2 + \cdots},$$

the weight of an observation being taken as the unit. According to the previous article, the most suitable determination will be obtained by taking $k = \kappa$, $k' = \kappa'$, $k'' = \kappa''$, etc. Then $x$ will have the value $A$, and it is clear the same value would be obtained (without knowing the multipliers $\alpha, \beta, \gamma$, etc.), by performing elimination on the equations $\xi = 0$, $\eta = 0$, $\zeta = 0$, etc. The weight of this determination will be given by $\frac1\alpha$, and the mean error to be feared will be $m\sqrt\alpha$, where $m$ denotes the mean error of an observation of unit weight. A similar approach would lead to the most suitable values of the other unknowns $y, z$, etc., which would likewise be those obtained by performing elimination on the equations $\xi = 0$, $\eta = 0$, $\zeta = 0$, etc.

If we denote the sum $vv + v'v' + v''v'' + \cdots$ by $\Omega$, then it is clear that $\xi, \eta, \zeta$, etc. will be the partial differential quotients of the function $\frac12\Omega$, i.e.

$$\xi = \tfrac12\frac{d\Omega}{dx},\qquad \eta = \tfrac12\frac{d\Omega}{dy},\qquad \zeta = \tfrac12\frac{d\Omega}{dz},\ \text{etc.}$$

Therefore, the values of the unknowns that are deduced from the most suitable combination, and which we can call the most plausible values, are precisely those that minimize $\Omega$. Now $v$ represents the difference between the computed value and the observed value, multiplied by the square root of the weight of the observation. Thus, the most plausible values of the unknowns are those that minimize the sum of the squares of the differences between the calculated and observed values of the quantities $V, V', V''$, etc., these squares being respectively multiplied by the weights of the observations. I had established this principle a long time ago through other considerations, in Theoria Motus Corporum Coelestium.

If one wants to assign the relative precision of each determination, it is necessary to deduce the values of $x, y, z$, etc. from the equations (3), which gives them in the following form:

(7)
$$\begin{aligned} x &= A + \alpha\xi + \beta\eta + \gamma\zeta + \cdots,\\ y &= B + \beta\xi + \beta'\eta + \gamma'\zeta + \cdots,\\ z &= C + \gamma\xi + \gamma'\eta + \gamma''\zeta + \cdots,\\ &\qquad\cdots \end{aligned}$$

Accordingly, the most plausible values of the unknowns $x, y, z$, etc., will be $A, B, C$, etc. The weights of these determinations will be

$$\frac1\alpha,\qquad \frac1{\beta'},\qquad \frac1{\gamma''},\ \text{etc.},$$

and the mean errors to be feared will be

$$m\sqrt\alpha\ \text{for}\ x,\qquad m\sqrt{\beta'}\ \text{for}\ y,\qquad m\sqrt{\gamma''}\ \text{for}\ z,\ \text{etc.},$$

in agreement with the results obtained in Theoria Motus Corporum Coelestium.
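The elimination of arts. 20-21 is easy to carry out numerically. The following minimal Python sketch (coefficients and observations invented for illustration) forms the sums $\Sigma a^2, \Sigma ab, \ldots$, solves $\xi = 0$, $\eta = 0$ for two unknowns, and reads the weights $\frac1\alpha$, $\frac1{\beta'}$ off the inverted system.

```python
# Editorial sketch of arts. 20-21 for two unknowns x, y and invented
# unit-weight equations v_i = a_i*x + b_i*y + l_i.
a = [1.0, 1.0, 1.0, 1.0]
b = [0.1, 0.5, 0.9, 1.3]
l = [-1.02, -1.50, -1.88, -2.31]            # minus the observed values

Saa = sum(ai * ai for ai in a)
Sab = sum(ai * bi for ai, bi in zip(a, b))
Sbb = sum(bi * bi for bi in b)
Sal = sum(ai * li for ai, li in zip(a, l))
Sbl = sum(bi * li for bi, li in zip(b, l))

det = Saa * Sbb - Sab * Sab
alpha, beta1 = Sbb / det, Saa / det         # diagonal of the inverse
x = -(Sbb * Sal - Sab * Sbl) / det          # solves xi = 0, eta = 0
y = -(Saa * Sbl - Sab * Sal) / det
print(x, y, 1 / alpha, 1 / beta1)           # plausible values, weights
```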
22.
The case where there is only one unknown is the most frequent and simplest of all. In this case we have $b = 0$, $c = 0$, etc. We will then have $v = (x - L)\sqrt p$, $v' = (x - L')\sqrt{p'}$, etc., so that $a = \sqrt p$, $l = -L\sqrt p$, $a' = \sqrt{p'}$, $l' = -L'\sqrt{p'}$, etc., and consequently,

$$\xi = x\,\Sigma p - \Sigma pL.$$

Hence

$$A = \frac{\Sigma pL}{\Sigma p},\qquad \alpha = \frac{1}{\Sigma p}.$$

Therefore, if by several observations that do not have the same precision and whose respective weights are $p, p', p''$, etc., we have found, for the same quantity, a first value $L$, a second $L'$, a third $L''$, etc., then the most plausible value will be

$$\frac{pL + p'L' + p''L'' + \cdots}{p + p' + p'' + \cdots},$$

and the weight of this determination will be $p + p' + p'' + \cdots$. If all observations are equally precise, then the most plausible value will be

$$\frac{L + L' + L'' + \cdots}{\sigma},$$

i.e. the arithmetic mean of the observed values; taking the weight of an individual observation as the unit, the weight of the average will be $\sigma$, the number of observations.
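In code, the rule of this article is a single line; the numbers below are invented for illustration.

```python
# Editorial illustration of art. 22: observed values L of one quantity,
# with weights p; the most plausible value is the weighted mean, and the
# weight of this determination is the sum of the weights.
L = [5.12, 5.08, 5.15]
p = [1.0, 4.0, 2.0]
most_plausible = sum(pi * Li for pi, Li in zip(p, L)) / sum(p)
print(most_plausible, sum(p))
```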
Part Two
23.
A number of investigations still remain to be discussed, through which the preceding theory will be clarified and extended.
Let us first investigate whether the elimination used to express the variables $x, y, z$, etc., in terms of $\xi, \eta, \zeta$, etc., is always possible. Since the number of equations is equal to the number of unknowns, we know that this elimination will be possible if $\xi, \eta, \zeta$, etc. are independent of each other; otherwise, it is impossible.

Suppose, for a moment, that $\xi, \eta, \zeta$, etc. are not independent, but rather that there exists between these quantities an identical equation

$$F\xi + G\eta + H\zeta + \cdots = 0.$$

Let us set

(1)
$$\begin{aligned} Fa + Gb + Hc + \cdots &= \theta,\\ Fa' + Gb' + Hc' + \cdots &= \theta',\\ Fa'' + Gb'' + Hc'' + \cdots &= \theta'',\\ &\qquad\cdots \end{aligned}$$

from which it follows, the coefficients of $x, y, z$, etc. in the identity having to vanish separately, that

$$\Sigma a\theta = 0,\qquad \Sigma b\theta = 0,\qquad \Sigma c\theta = 0,\ \text{etc.}$$

Multiplying the equations (1) resp. by $\theta, \theta', \theta''$, etc., and adding, we obtain

$$\theta^2 + \theta'^2 + \theta''^2 + \cdots = F\,\Sigma a\theta + G\,\Sigma b\theta + H\,\Sigma c\theta + \cdots = 0,$$

and this equation leads to $\theta = 0$, $\theta' = 0$, $\theta'' = 0$, etc. From this we conclude, first of all, that an identical equation between $\xi, \eta, \zeta$, etc. can subsist only if all the quantities $\theta, \theta', \theta''$, etc. vanish. Secondly, the equations (1) then show that the functions $v, v', v''$, etc., are such that their values do not change when the variables $x, y, z$, etc., increase or decrease proportionally to $F, G, H$, etc. respectively. It is clear that the same holds for the functions $V, V', V''$, etc.: but this can only happen in the case where it would be impossible to determine $x, y, z$, etc. using the values of $V, V', V''$, etc., even if these were exactly known; but then the problem would be indeterminate by its nature, and we have excluded this case from our investigations.
24.
If $\lambda, \lambda', \lambda''$, etc. denote multipliers playing the same role relative to the unknown $y$ as the multipliers $\kappa, \kappa', \kappa''$, etc. relative to the unknown $x$, i.e. so that we have identically

$$y = B + \lambda v + \lambda'v' + \lambda''v'' + \cdots,$$

then we will identically have

$$\Sigma\lambda a = 0,\qquad \Sigma\lambda b = 1,\qquad \Sigma\lambda c = 0,\ \text{etc.}$$

Let $\mu, \mu', \mu''$, etc. be the analogous multipliers relative to the variable $z$, so that we have:

$$z = C + \mu v + \mu'v' + \mu''v'' + \cdots,$$

and consequently,

$$\Sigma\mu a = 0,\qquad \Sigma\mu b = 0,\qquad \Sigma\mu c = 1,\ \text{etc.}$$

In the same way as we found in art. 20 that $\kappa = \alpha a + \beta b + \gamma c + \cdots$, we will find here

$$\lambda = \beta_0a + \beta'b + \gamma'c + \cdots,\qquad \mu = \gamma_0a + \gamma_0'b + \gamma''c + \cdots,$$

and so on, where $\beta_0, \gamma_0, \gamma_0'$, etc. denote the coefficients furnished by the elimination for $y$ and $z$. We will also have, as in art. 20,

$$\Sigma\lambda^2 = \beta',\qquad \Sigma\mu^2 = \gamma''.$$

If we multiply the values of $\kappa, \kappa', \kappa''$, etc. (art. 20, (4)), respectively, by $\lambda, \lambda', \lambda''$, etc., and add, we obtain

$$\Sigma\kappa\lambda = \alpha\,\Sigma\lambda a + \beta\,\Sigma\lambda b + \gamma\,\Sigma\lambda c + \cdots,$$

or $\Sigma\kappa\lambda = \beta$. If we multiply $\lambda, \lambda', \lambda''$, etc., respectively, by $\kappa, \kappa', \kappa''$, etc., using the expressions of $\lambda, \lambda', \lambda''$ just written, and add, we will find

$$\Sigma\kappa\lambda = \beta_0\,\Sigma\kappa a + \beta'\,\Sigma\kappa b + \gamma'\,\Sigma\kappa c + \cdots = \beta_0,$$

and thus $\beta_0 = \beta$. In the same manner, we find $\gamma_0 = \gamma$, $\gamma_0' = \gamma'$, and so on; this justifies the symmetric notation adopted in the formulas (7) of art. 21.
25.
Let $v^0, v'^0, v''^0$, etc. denote the values taken by the functions $v, v', v''$, etc., when $x, y, z$, etc. are replaced by their most plausible values $A, B, C$, etc., i.e.

$$v^0 = aA + bB + cC + \cdots + l,\ \text{etc.}$$

If we set

$$v^0v^0 + v'^0v'^0 + v''^0v''^0 + \cdots = M,$$

so that $M$ is the value of the function $\Omega$ corresponding to the most plausible values of the variables, and therefore, as was shown in art. 21, the minimum value of $\Omega$, then the value of $\xi$ corresponding to $x = A$, $y = B$, $z = C$, etc. is

$$av^0 + a'v'^0 + a''v''^0 + \cdots,$$

and this value is zero, according to the way $A, B, C$, etc. have been obtained. Thus, we have

$$\Sigma av^0 = 0,$$

and similarly we would obtain $\Sigma bv^0 = 0$ and $\Sigma cv^0 = 0$, etc. Finally, multiplying the values of $v^0, v'^0, v''^0$, etc. respectively by $l, l', l''$, etc., and adding, we get

$$\Sigma lv^0 = \Sigma v^0\left(v^0 - aA - bB - cC - \cdots\right) = \Sigma v^0v^0,$$

or $\Sigma lv^0 = M$.
26.
Replacing $x, y, z$, etc., with the expressions (7) from art. 21 in the equation $v = ax + by + cz + \cdots + l$, we find, through the same reductions as before,

$$v = v^0 + \kappa\xi + \lambda\eta + \mu\zeta + \cdots,\qquad v' = v'^0 + \kappa'\xi + \lambda'\eta + \mu'\zeta + \cdots,\ \text{etc.}$$

Multiplying either these equations or the equations (1) of art. 20, by $\kappa, \kappa', \kappa''$, etc., and then adding, we obtain the identity

$$x - A = \kappa v^0 + \kappa'v'^0 + \kappa''v''^0 + \cdots + \alpha\xi + \beta\eta + \gamma\zeta + \cdots,$$

which, compared with (7), shows that $\Sigma\kappa v^0 = 0$; and similarly $\Sigma\lambda v^0 = 0$, $\Sigma\mu v^0 = 0$, etc.
27.
The function $\Omega$ can take several forms, which are worth developing.

Let us square the equations (1) of art. 20, and add them. Then we find

$$\Omega = x^2\,\Sigma a^2 + y^2\,\Sigma b^2 + \cdots + 2xy\,\Sigma ab + \cdots + 2x\,\Sigma al + 2y\,\Sigma bl + \cdots + \Sigma l^2;$$

this is the first form.

Next, let us multiply the same equations by $v, v', v''$, etc. respectively, and add. Then we obtain

$$\Omega = x\xi + y\eta + z\zeta + \cdots + lv + l'v' + l''v'' + \cdots,$$

and replacing $v, v', v''$, etc., with the values indicated in the previous article, we find that

$$lv + l'v' + l''v'' + \cdots = M - A\xi - B\eta - C\zeta - \cdots,$$

or

$$\Omega = M + (x - A)\xi + (y - B)\eta + (z - C)\zeta + \cdots;$$

this is the second form.

Finally, replacing, in this second form, $x, y, z$, etc. by the expressions (7) of art. 21, we obtain the third form:

$$\Omega = M + \alpha\xi^2 + \beta'\eta^2 + \gamma''\zeta^2 + \cdots + 2\beta\xi\eta + 2\gamma\xi\zeta + 2\gamma'\eta\zeta + \cdots$$

We can also give a fourth form, which results automatically from the third form and the formulas of the previous article:

$$\Omega = M + (\kappa\xi + \lambda\eta + \mu\zeta + \cdots)^2 + (\kappa'\xi + \lambda'\eta + \mu'\zeta + \cdots)^2 + \cdots,$$

or

$$\Omega = M + (v - v^0)^2 + (v' - v'^0)^2 + (v'' - v''^0)^2 + \cdots.$$

From this last form we clearly see that $M$ is the minimum value of $\Omega$.
28.
Let $e, e', e''$, etc., be the errors made in the observations that gave $v = 0$, $v' = 0$, $v'' = 0$, etc. Then the true values of the functions $v, v', v''$, etc., will be $-e, -e', -e''$, etc. respectively, and the true values of $\xi, \eta, \zeta$, etc., will be

$$-\Sigma ae,\qquad -\Sigma be,\qquad -\Sigma ce,\ \text{etc.}$$

respectively; therefore, the true value of $x$ will be

$$A - \alpha\,\Sigma ae - \beta\,\Sigma be - \gamma\,\Sigma ce - \cdots,$$

and the error made in the most suitable determination of the unknown $x$, which we will denote by $E_x$, will be

$$E_x = \kappa e + \kappa'e' + \kappa''e'' + \cdots.$$

Similarly, the error made in the most suitable determination of the value of $y$ will be

$$E_y = \lambda e + \lambda'e' + \lambda''e'' + \cdots.$$

The average value of the square $E_x^2$ will be

$$m^2\left(\kappa^2 + \kappa'^2 + \kappa''^2 + \cdots\right) = \alpha m^2.$$

The average value of $E_y^2$ will similarly be $\beta'm^2$, as shown above. We can also determine the average value of the product $E_xE_y$, which will be

$$m^2\left(\kappa\lambda + \kappa'\lambda' + \kappa''\lambda'' + \cdots\right) = \beta m^2.$$

These results can be stated more briefly as follows:

The average values of the squares $E_x^2, E_y^2$, etc., are respectively equal to the products of $\frac12m^2$ with the second-order partial differential quotients

$$\frac{d^2\Omega}{d\xi^2},\qquad \frac{d^2\Omega}{d\eta^2},\ \text{etc.},$$

and the average value of a product such as $E_xE_y$ is the product of $\frac12m^2$ with

$$\frac{d^2\Omega}{d\xi\,d\eta},$$

where $\Omega$ is regarded as a function of $\xi, \eta, \zeta$, etc.
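A simulation (ours, assuming unit-weight Gaussian errors) can confirm the first of these statements: the mean square of the error committed on $x$ approaches $\alpha m^2$.

```python
import random
import statistics

# Editorial simulation of art. 28: the mean square of the error of the
# most plausible x equals alpha * m^2 (invented two-unknown setup).
random.seed(3)
a, b = [1.0, 1.0, 1.0, 1.0], [0.1, 0.5, 0.9, 1.3]
x0, y0, m = 1.0, 1.0, 0.05                     # true values, mean error
Saa = sum(u * u for u in a)
Sab = sum(u * v for u, v in zip(a, b))
Sbb = sum(v * v for v in b)
det = Saa * Sbb - Sab * Sab
alpha = Sbb / det                              # from the inverted system
sq_errors = []
for _ in range(20000):
    l = [-(ai * x0 + bi * y0 + random.gauss(0, m)) for ai, bi in zip(a, b)]
    Sal = sum(u * v for u, v in zip(a, l))
    Sbl = sum(u * v for u, v in zip(b, l))
    x_hat = -(Sbb * Sal - Sab * Sbl) / det     # solves xi = 0, eta = 0
    sq_errors.append((x_hat - x0) ** 2)
print(statistics.mean(sq_errors), alpha * m * m)   # nearly equal
```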
29.
Let $t$ be a given linear function of the quantities $x, y, z$, etc., i.e.

$$t = fx + gy + hz + \cdots + k.$$

The value of $t$ deduced from the most plausible values of $x, y, z$, etc., will then be

$$fA + gB + hC + \cdots + k,$$

and we denote this by $K$. The error thus committed will be

$$fE_x + gE_y + hE_z + \cdots,$$

which we denote by $E_t$. The average value of this error will obviously be zero, meaning the error will not contain a constant part, but the average value of $E_t^2$, i.e., the sum

$$m^2\left[(f\kappa + g\lambda + h\mu + \cdots)^2 + (f\kappa' + g\lambda' + h\mu' + \cdots)^2 + \cdots\right],$$

will, according to the preceding article, be equal to the product of $\frac12m^2$ with the sum

$$f^2\frac{d^2\Omega}{d\xi^2} + g^2\frac{d^2\Omega}{d\eta^2} + \cdots + 2fg\,\frac{d^2\Omega}{d\xi\,d\eta} + \cdots,$$

i.e., the product of $m^2$ with the value produced by the function $\Omega - M$, regarded as a function of $\xi, \eta, \zeta$, etc., when we substitute

$$\xi = f,\qquad \eta = g,\qquad \zeta = h,\ \text{etc.}$$

If we let $w$ denote this value of $\Omega - M$, then the mean error to be feared when we take $t = K$ will be $m\sqrt w$, and the weight of this determination will be $\frac1w$.

Since we have identically $x - A = \alpha\xi + \beta\eta + \gamma\zeta + \cdots$, etc., $w$ will also be equal to the value of the expression

$$f(x - A) + g(y - B) + h(z - C) + \cdots,$$

or the value produced by $t - K$, when we substitute for $x, y, z$, etc. the values corresponding to $\xi = f$, $\eta = g$, $\zeta = h$, etc.

Finally, observing that $t$, expressed as a function of the quantities $\xi, \eta, \zeta$, etc., will have $K$ as its constant part, if we suppose that

$$t = K + f^0\xi + g^0\eta + h^0\zeta + \cdots,$$

then we will have

$$w = ff^0 + gg^0 + hh^0 + \cdots.$$
30.
We have seen that the function $\Omega$ attains its absolute minimum $M$ when we substitute $x = A$, $y = B$, $z = C$, etc., or, equivalently, $\xi = 0$, $\eta = 0$, $\zeta = 0$, etc. If we assign another value to one of the unknowns, e.g. $x$, while the other unknowns remain variable, $\Omega$ may acquire a relative minimum value, which can be obtained from the equations

$$\eta = 0,\qquad \zeta = 0,\ \text{etc.}$$

Therefore, by (7), we must have $y - B = \beta\xi$, $z - C = \gamma\xi$, etc., and since $x - A = \alpha\xi$, we have

$$\xi = \frac{x - A}{\alpha}.$$

Likewise, we have

$$y = B + \frac{\beta}{\alpha}(x - A),\qquad z = C + \frac{\gamma}{\alpha}(x - A),\ \text{etc.},$$

and the relative minimum value of $\Omega$ will be

$$M + \frac{(x - A)^2}{\alpha}.$$

Reciprocally, we conclude that if $\Omega$ is not to exceed $M + \mu^2$, then the value of $x$ must necessarily be between the limits $A - \mu\sqrt\alpha$ and $A + \mu\sqrt\alpha$. It is important to note that $\mu\sqrt\alpha$ becomes equal to the mean error to be feared in the most plausible value $A$ of $x$, if we set $\mu = m$, i.e., if $\mu$ is the mean error of observations whose weight is $1$.

More generally, let us find the smallest value of the function $\Omega$ that can correspond to a given value of $t$, where $t$ denotes, as in the previous article, a linear expression $fx + gy + hz + \cdots + k$ whose most plausible value is $K$. Let us denote the prescribed value of $t$ by $T$. According to the theory of maxima and minima, the solution to the problem will be given by the equations

$$\frac{d\Omega}{dx} = \vartheta\,\frac{dt}{dx},\qquad \frac{d\Omega}{dy} = \vartheta\,\frac{dt}{dy},\ \text{etc.},$$

or

$$\xi = \tfrac12\vartheta f,\qquad \eta = \tfrac12\vartheta g,\qquad \zeta = \tfrac12\vartheta h,\ \text{etc.},$$

where $\vartheta$ denotes an as yet undetermined multiplier. If, as in the previous article, we identically set

$$t = K + f^0\xi + g^0\eta + h^0\zeta + \cdots,$$

then we will have

$$T - K = \tfrac12\vartheta\left(ff^0 + gg^0 + hh^0 + \cdots\right),$$

or

$$T - K = \tfrac12\vartheta w,$$

where $w$ has the same meaning as in the previous article. Since $\Omega - M$ is a homogeneous function of the second degree with respect to the variables $\xi, \eta, \zeta$, etc., its value when

$$\xi = \tfrac12\vartheta f,\qquad \eta = \tfrac12\vartheta g,\qquad \zeta = \tfrac12\vartheta h,\ \text{etc.},$$

will evidently be $\tfrac14\vartheta^2w$, and thus the minimum value of $\Omega$, when $t = T$, will be

$$M + \frac{(T - K)^2}{w}.$$

Reciprocally, if $\Omega$ must remain less than a given value $M + \mu^2$, the value of $t$ will necessarily be between the limits $K - \mu\sqrt w$ and $K + \mu\sqrt w$; and $\mu\sqrt w$ will be the mean error to be feared in the most plausible value of $t$, if $\mu$ represents the mean error of observations whose weight is $1$.
31.
When the number of unknowns $x, y, z$, etc. is quite large, the determination of the numerical values of $A, B, C$, etc. by ordinary elimination is quite tedious. For this reason we have indicated, in Theoria Motus Corporum Coelestium, art. 182, and later developed, in Disquisitio de elementis ellipticis Palladis (Comm. recent. Soc. Gotting. Vol. I), a method that simplifies this work as much as possible. Namely, the function $\Omega$ must be reduced to the following form:

$$\Omega = M + \frac{u^0u^0}{\mathfrak A^0} + \frac{u'u'}{\mathfrak B'} + \frac{u''u''}{\mathfrak C''} + \cdots,$$

where the divisors $\mathfrak A^0, \mathfrak B', \mathfrak C''$, etc., are determined quantities; $u^0, u', u''$, etc., are linear functions of $x, y, z$, etc., such that the second, $u'$, does not contain $x$; the third, $u''$, contains neither $x$ nor $y$; the fourth contains neither $x$, nor $y$, nor $z$; and so on, so that the last contains only the last of the unknowns $x, y, z$, etc.; and finally, the coefficients of $x, y, z$, etc., in $u^0, u', u''$, etc., are respectively equal to $\mathfrak A^0, \mathfrak B', \mathfrak C''$, etc. Then we set

$$u^0 = 0,\qquad u' = 0,\qquad u'' = 0,\ \text{etc.},$$

and we will easily obtain the values of $x, y, z$, etc. by solving these equations, starting with the last one. I do not believe it necessary to repeat here the algorithm that leads to this transformation of the function $\Omega$.

However, the elimination required to find the weights of these determinations requires even longer calculations. We have shown in the Theoria Motus Corporum Coelestium that the weight of the last unknown (which appears by itself in the last of the functions $u$) is equal to the last term in the series of divisors $\mathfrak A^0, \mathfrak B', \mathfrak C''$, etc. This is easily found; hence, several calculators, wanting to avoid cumbersome elimination, have had the idea, in the absence of another method, to repeat the indicated transformation by successively considering each unknown as the last one. Therefore, I hope that geometers will appreciate my indication of a new method for calculating the weights of determinations, which seems to leave nothing more to be desired on this point.
32.

Setting

(1)
$$u^0 = \xi,\qquad u' = \eta - \frac{\mathfrak B^0}{\mathfrak A^0}\,u^0,\qquad u'' = \zeta - \frac{\mathfrak C^0}{\mathfrak A^0}\,u^0 - \frac{\mathfrak C'}{\mathfrak B'}\,u',\qquad\ldots$$

we have identically

$$\Omega = M + \frac{u^0u^0}{\mathfrak A^0} + \frac{u'u'}{\mathfrak B'} + \frac{u''u''}{\mathfrak C''} + \cdots,$$

and from this we deduce:

(2)
$$\xi = u^0,\qquad \eta = \frac{\mathfrak B^0}{\mathfrak A^0}\,u^0 + u',\qquad \zeta = \frac{\mathfrak C^0}{\mathfrak A^0}\,u^0 + \frac{\mathfrak C'}{\mathfrak B'}\,u' + u'',\qquad\ldots$$

The values of $u^0, u', u''$, etc. deduced from these equations will be presented in the following form:

(3)
$$u^0 = \xi,\qquad u' = \eta - \frac{\mathfrak B^0}{\mathfrak A^0}\,\xi,\qquad u'' = \zeta - \frac{\mathfrak C'}{\mathfrak B'}\,\eta - \left(\frac{\mathfrak C^0}{\mathfrak A^0} - \frac{\mathfrak C'\mathfrak B^0}{\mathfrak B'\mathfrak A^0}\right)\xi,\qquad\ldots$$

By taking the complete differential of the equation

$$\Omega = M + \frac{u^0u^0}{\mathfrak A^0} + \frac{u'u'}{\mathfrak B'} + \frac{u''u''}{\mathfrak C''} + \cdots,$$

we obtain

$$\xi\,dx + \eta\,dy + \zeta\,dz + \cdots = \frac{u^0\,du^0}{\mathfrak A^0} + \frac{u'\,du'}{\mathfrak B'} + \frac{u''\,du''}{\mathfrak C''} + \cdots,$$

and thus, replacing $\xi, \eta, \zeta$, etc. by the expressions (2), the left-hand side becomes

$$u^0\left(dx + \frac{\mathfrak B^0}{\mathfrak A^0}\,dy + \frac{\mathfrak C^0}{\mathfrak A^0}\,dz + \cdots\right) + u'\left(dy + \frac{\mathfrak C'}{\mathfrak B'}\,dz + \cdots\right) + u''\left(dz + \cdots\right) + \cdots.$$

This expression must be equivalent to the one obtained from the equations (3), whatever the values of $u^0, u', u''$, etc., and therefore we have

(4)
$$dx = \frac{du^0}{\mathfrak A^0} - \frac{\mathfrak B^0}{\mathfrak A^0}\,dy - \frac{\mathfrak C^0}{\mathfrak A^0}\,dz - \cdots,\qquad dy = \frac{du'}{\mathfrak B'} - \frac{\mathfrak C'}{\mathfrak B'}\,dz - \cdots,\qquad dz = \frac{du''}{\mathfrak C''} - \cdots,\qquad\ldots$$

By substituting in these expressions the values of $dy$ and $dz$, etc. obtained from the subsequent equations, we will have performed the elimination: $x, y, z$, etc. become linear functions of $u^0, u', u''$, etc., whose constant parts are the most plausible values $A, B, C$, etc. For the determination of the weights, we have

(5)
$$\alpha = \frac{1}{\mathfrak A^0} + \frac{\mathfrak B^0\mathfrak B^0}{\mathfrak A^0\mathfrak A^0\,\mathfrak B'} + \left(\frac{\mathfrak B^0\mathfrak C'}{\mathfrak A^0\mathfrak B'} - \frac{\mathfrak C^0}{\mathfrak A^0}\right)^2\frac{1}{\mathfrak C''} + \cdots,\qquad \beta' = \frac{1}{\mathfrak B'} + \frac{\mathfrak C'\mathfrak C'}{\mathfrak B'\mathfrak B'\,\mathfrak C''} + \cdots,\qquad \gamma'' = \frac{1}{\mathfrak C''} + \cdots$$

The simplicity of these formulas leaves nothing to be desired. Equally simple formulas could be found to express the other coefficients $\beta$, $\gamma$, $\gamma'$, etc.; however, as their use is less frequent, we will refrain from presenting them.
33.
The importance of the subject has prompted us to prepare everything for the calculation and to form explicit expressions for the coefficients $\frac{\mathfrak B^0}{\mathfrak A^0}$, $\frac{\mathfrak C^0}{\mathfrak A^0}$, $\frac{\mathfrak C'}{\mathfrak B'}$, etc. This calculation can be approached in two ways. The first involves substituting the values of $u^0, u', u''$, and so forth, deduced from the equations (3), into the equations (2); the second involves substituting the values of $\xi, \eta, \zeta$, etc. given by the equations (2) into the equations (3). The first method leads to the following formulas:
These formulas will determine
and so on.
We will then have,
which will determine
and so forth; then
which will determine
etc., and so on.
The second method yields the following system:
from which we deduce
from which we deduce
and
from which we deduce
and so on.
Both systems of formulas offer nearly equal advantages when the weights of the determinations of all the unknowns $x, y, z$, and so forth, are sought; however, if only one of the quantities $\alpha, \beta', \gamma''$, and so forth, is required, the first system is much preferable.

Moreover, the combination of equations (1) and (4) yields the same formulas, and provides, in addition, a second way to obtain the most plausible values $A, B, C$, and so forth, of the unknowns. The rest of the calculation is identical to the ordinary calculation, in which it is assumed that $\xi = 0$, $\eta = 0$, $\zeta = 0$, etc.
34.
The results obtained in art. 32 are only particular cases of a more general theorem which can be stated as follows:
Theorem. If $t$ represents the following linear function of the unknowns $x, y, z$, etc.,

$$t = k + fx + gy + hz + \cdots,$$

whose expression in terms of the variables $u^0, u', u''$, etc., is

$$t = K + E^0u^0 + E'u' + E''u'' + \cdots,$$

then $K$ will be the most plausible value of $t$, and the weight of this determination will be

$$P = \frac{1}{E^0E^0\mathfrak A^0 + E'E'\mathfrak B' + E''E''\mathfrak C'' + \cdots}.$$

Proof. The first part of the theorem is obvious, since the most plausible values of the unknowns correspond to the values $u^0 = 0$, $u' = 0$, $u'' = 0$, etc.

To demonstrate the second part, let us note that we have, identically,

$$\xi\,dx + \eta\,dy + \zeta\,dz + \cdots = \frac{u^0\,du^0}{\mathfrak A^0} + \frac{u'\,du'}{\mathfrak B'} + \frac{u''\,du''}{\mathfrak C''} + \cdots,$$

and consequently, when $\Omega$ is regarded as a function of $u^0, u', u''$, etc., we have

$$\tfrac12\,d\Omega = \frac{u^0}{\mathfrak A^0}\,du^0 + \frac{u'}{\mathfrak B'}\,du' + \frac{u''}{\mathfrak C''}\,du'' + \cdots,$$

whatever the differentials $du^0, du', du''$, etc. Hence, assuming always $t = K + E^0u^0 + E'u' + E''u'' + \cdots$, we obtain, for the smallest value of $\Omega$ corresponding to a prescribed value of $t$, the conditions

$$\frac{u^0}{\mathfrak A^0} : \frac{u'}{\mathfrak B'} : \frac{u''}{\mathfrak C''} : \cdots = E^0 : E' : E'' : \cdots.$$

Now it is easily seen that if the differentials $dx, dy, dz$, etc. are independent of each other, so will be $du^0, du', du''$, etc.; therefore, setting $u^0 = \vartheta E^0\mathfrak A^0$, $u' = \vartheta E'\mathfrak B'$, $u'' = \vartheta E''\mathfrak C''$, etc., we will have

$$t - K = \vartheta\left(E^0E^0\mathfrak A^0 + E'E'\mathfrak B' + E''E''\mathfrak C'' + \cdots\right).$$

Hence, the value of $\Omega - M$ corresponding to the same assumptions will be

$$\vartheta^2\left(E^0E^0\mathfrak A^0 + E'E'\mathfrak B' + \cdots\right) = \frac{(t - K)^2}{E^0E^0\mathfrak A^0 + E'E'\mathfrak B' + E''E''\mathfrak C'' + \cdots},$$

which, by arts. 29 and 30, demonstrates the truth of our theorem.

Moreover, if we wish to perform the transformation of the function $t$ without resorting to the formulas (4) of art. 32, we immediately have the relations

$$f = \mathfrak A^0E^0,\qquad g = \mathfrak B^0E^0 + \mathfrak B'E',\qquad h = \mathfrak C^0E^0 + \mathfrak C'E' + \mathfrak C''E'',\ \text{etc.},$$

which will allow us to determine $E^0, E', E''$, etc. successively, and we will finally have

$$\frac1P = E^0E^0\mathfrak A^0 + E'E'\mathfrak B' + E''E''\mathfrak C'' + \cdots.$$
35.
We will particularly address the following problem, both because of its practical utility and the simplicity of the solution:
Find the changes that the most plausible values of the unknowns undergo by adding a new equation, and assign the weights of these new determinations.
Let us keep the previous notations. The primitive equations, reduced to have a weight of unity, will be $v = 0$, $v' = 0$, $v'' = 0$, etc.; we will have

$$\Omega = vv + v'v' + v''v'' + \cdots,$$

and $\xi, \eta, \zeta$, etc., will be the partial derivatives

$$\tfrac12\frac{d\Omega}{dx},\qquad \tfrac12\frac{d\Omega}{dy},\qquad \tfrac12\frac{d\Omega}{dz},\ \text{etc.}$$

Finally, by elimination, we will have

(1)
$$\begin{aligned} x &= A + \alpha\xi + \beta\eta + \gamma\zeta + \cdots,\\ y &= B + \beta\xi + \beta'\eta + \gamma'\zeta + \cdots,\\ z &= C + \gamma\xi + \gamma'\eta + \gamma''\zeta + \cdots,\\ &\qquad\cdots \end{aligned}$$

Now suppose we have a new approximate equation

$$t = fx + gy + hz + \cdots + k = 0$$

(which we assume to have a weight equal to unity), and we seek the changes undergone by the most plausible values of $x, y, z$, etc., and of the coefficients $\alpha, \beta', \gamma''$, etc.

Let us set $\Omega + tt = \Omega^*$, and let

$$x = A^* + \alpha^*\xi^* + \beta^*\eta^* + \gamma^*\zeta^* + \cdots,\ \text{etc.}$$

be the result of the elimination applied to $\Omega^*$. Finally, let $t$, expressed in terms of $\xi, \eta, \zeta$, etc., be

$$t = K + f^0\xi + g^0\eta + h^0\zeta + \cdots,$$

which, taking into account the equations (1), gives

$$K = fA + gB + hC + \cdots + k,\qquad f^0 = f\alpha + g\beta + h\gamma + \cdots,\qquad g^0 = f\beta + g\beta' + h\gamma' + \cdots,\ \text{etc.},$$

and let

$$w = ff^0 + gg^0 + hh^0 + \cdots.$$

It is clear that $K$ will be the most plausible value of the function $t$, as resulting from the primitive equations, without considering the value provided by the new observation, and that $\frac1w$ will be the weight of this determination (art. 29).

Now we have

$$\xi^* = \xi + ft,\qquad \eta^* = \eta + gt,\qquad \zeta^* = \zeta + ht,\ \text{etc.},$$

and consequently, the most plausible values deduced from all the observations correspond to $\xi = -ft$, $\eta = -gt$, $\zeta = -ht$, etc., or, by (1),

$$x = A - f^0t,\qquad y = B - g^0t,\qquad z = C - h^0t,\ \text{etc.}$$

Furthermore, substituting these values in $t$ itself gives $t = K - wt$, whence

$$t = \frac{K}{1 + w}.$$

From this, we deduce,

$$x = A - \frac{f^0K}{1 + w},$$

which will be the most plausible value of $x$ deduced from all observations. We will also have

$$\alpha^* = \alpha - \frac{f^0f^0}{1 + w};$$

thus

$$\frac{1}{\alpha - \dfrac{f^0f^0}{1 + w}}$$

will be the weight of this determination. Similarly, for the most plausible value of $y$ deduced from all observations, we find

$$y = B - \frac{g^0K}{1 + w};$$

the weight of this determination will be

$$\frac{1}{\beta' - \dfrac{g^0g^0}{1 + w}},$$

and so on. Q.E.I.

Let us add some remarks.

I. After substituting the new values of $x, y, z$, etc., the function $t$ will obtain the most plausible value

$$\frac{K}{1 + w},$$

and since we have, identically,

$$\frac{1 + w}{w} = \frac1w + 1,$$

the weight of this determination, according to art. 29, will be $\frac{1 + w}{w}$. These results could be deduced immediately from the rules explained at the end of art. 21. The original equations had, indeed, provided the determination $t = K$, whose weight was $\frac1w$; a new observation gives another determination, $t = 0$, independent of the first, whose weight is $1$; and their combination produces the determination $t = \frac{K}{1 + w}$, with a weight of $\frac1w + 1$.

II. It follows from the above that, for the corrected determinations, we must have

$$\alpha^* \le \alpha,\qquad \beta'^* \le \beta',\ \text{etc.},$$

and consequently, the precision of the determinations can only gain from the new observation. Furthermore, since the new weights must remain positive, we must have

$$f^0f^0 < \alpha(1 + w)\qquad\text{and}\qquad g^0g^0 < \beta'(1 + w),\ \text{etc.}$$

III. Comparing these results with those of art. 30, we see that here the function $\Omega$ has the smallest value it can obtain when subjected to the condition

$$t = \frac{K}{1 + w}.$$
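Remark I is the germ of a sequential procedure: each new observation is merged with the previous determination by the weighted combination rule recalled above. A minimal Python sketch, with invented numbers:

```python
# Editorial sketch of remark I in the simplest setting: a previous
# determination K of a quantity, with weight P, combined with one new
# observation O of unit weight, yields (P*K + O)/(P + 1), of weight P + 1.
K, P = 10.02, 5.0
O = 10.14
print((P * K + O) / (P + 1), P + 1)
```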
36.
We will give here the solution to the following problem, which is analogous to the previous one, but we will refrain from indicating the demonstration, which can be easily found, as in the previous article.
Find the changes in the most plausible values of the unknowns and the weights of the new determinations when changing the weight of one of the primitive observations.
Suppose that after completing the calculation, it is noticed that the weight which has been assigned to one of the observations is too strong or too weak, e.g. the first one, which gave $v = 0$, and that it would be more accurate to assign it the weight $p$ instead of the weight $1$. It is not necessary to then restart the calculation. Instead it is convenient to form the corrections using the following formulas. Setting

$$w = a\kappa + b\lambda + c\mu + \cdots,$$

the most plausible values of the unknowns will be corrected as follows:

$$x = A - \frac{(p - 1)\,\kappa v^0}{1 + (p - 1)w},\qquad y = B - \frac{(p - 1)\,\lambda v^0}{1 + (p - 1)w},\qquad z = C - \frac{(p - 1)\,\mu v^0}{1 + (p - 1)w},\ \text{etc.},$$

and the weights of these determinations will be found upon dividing unity by

$$\alpha - \frac{(p - 1)\,\kappa\kappa}{1 + (p - 1)w},\qquad \beta' - \frac{(p - 1)\,\lambda\lambda}{1 + (p - 1)w},\qquad \gamma'' - \frac{(p - 1)\,\mu\mu}{1 + (p - 1)w},$$

respectively.

This solution applies in the case where, after completing the calculation, it is necessary to completely reject one of the observations, since this amounts to making $p = 0$; similarly, $p = \infty$ will be suitable for the case where the equation $v = 0$, which in the calculation had been regarded as approximate, is in fact absolutely precise.

If, after completing the calculation, several new equations were to be added to those proposed, or if the weights assigned to several of them were incorrect, the calculation of the corrections becomes too complicated, and it is preferable to start over.
37.
In arts. 15 and 16, we have given a method to evaluate, approximately, the precision of a system of observations; but this method assumes that the real errors committed in a large number of observations are known exactly, a condition which is rarely fulfilled, if ever.

If the quantities for which the observation provides approximate values depend on one or more unknowns, according to a given law, then the method of least squares allows us to find the most plausible values of these unknowns. If we then calculate the corresponding values of the observed quantities, they can be regarded as differing little from the true values, so that their differences from the observed values will represent the errors committed, with a confidence that will increase with the number of observations. This is the procedure followed in practice by calculators who have attempted, in complicated cases, to evaluate retrospectively the precision of the observations. Although sufficient in many cases, this method is theoretically inaccurate and can sometimes lead to serious errors; therefore, it is very important to treat the issue with more care.

In the following discussion, we retain the notation used in art. 19 and the subsequent articles. The method in question consists of considering $A, B, C$, etc., as the true values of the unknowns $x, y, z$, etc., and, consequently, the quantities $v^0, v'^0, v''^0$, etc., changed in sign, as the errors of the observations. If all observations have equal precision and their common weight is taken to be unity, then, according to art. 15,

$$\sqrt{\frac{M}{\pi}}$$

will be the mean error of the observations. If the observations do not have the same precision, then $v^0, v'^0, v''^0$, etc., represent the errors of the observations respectively multiplied by the square roots of the weights, and the rules of art. 16 lead to the same formula, which then expresses the mean error of observations whose weight is $1$. However, it is clear that an exact calculation would require replacing $v^0, v'^0, v''^0$, etc. with the values of $v, v', v''$, etc., deduced from the true values of the unknowns $x, y, z$, etc., and replacing the quantity $M$ by the corresponding value of $\Omega$. Although we cannot assign this latter value, we are nonetheless certain that it is greater than $M$ (which is its minimum possible value), and it would only reach this limit in the infinitely unlikely case where the true values of the unknowns coincide with the most plausible ones. We can therefore affirm, in general, that the mean error calculated by ordinary practice is smaller than the exact mean error, and consequently, that too much precision is attributed to the observations. Now let us see what a rigorous theory yields.
38.
First of all, we need to determine how the quantity $M$ depends on the true errors of the observations. As in art. 28, let us denote these errors by $e, e', e''$, etc., and let us set, for simplicity,

$$ae + a'e' + a''e'' + \cdots = \epsilon\qquad\text{and}\qquad be + b'e' + b''e'' + \cdots = \epsilon',\qquad ce + c'e' + c''e'' + \cdots = \epsilon'',\ \text{etc.}$$

Let $x_0, y_0, z_0$, etc., be the true values of the unknowns $x, y, z$, etc., for which the values of $v, v', v''$, etc., are, respectively, $-e, -e', -e''$, etc. The corresponding values of $\xi, \eta, \zeta$, etc., will obviously be

$$-\epsilon,\qquad -\epsilon',\qquad -\epsilon'',\ \text{etc.},$$

so that we will have

$$x_0 = A - \alpha\epsilon - \beta\epsilon' - \gamma\epsilon'' - \cdots,\ \text{etc.}$$

Finally,

$$e^2 + e'^2 + e''^2 + \cdots$$

will be the value of the function $\Omega$ corresponding to the true values of the unknowns $x, y, z$, etc. Since we also have, identically, by the second form of art. 27,

$$\Omega = M + (x - A)\xi + (y - B)\eta + (z - C)\zeta + \cdots,$$

we will also have

$$M = \Sigma e^2 - \alpha\epsilon^2 - 2\beta\epsilon\epsilon' - \beta'\epsilon'^2 - 2\gamma\epsilon\epsilon'' - 2\gamma'\epsilon'\epsilon'' - \gamma''\epsilon''^2 - \cdots.$$

From this, it is clear that $M$ is a homogeneous function of the second degree of the errors $e, e', e''$, etc.; for various values of the errors this function may become greater or smaller. However, the magnitudes of the errors remain unknown to us, so it is good to carefully examine the function $M$, and to first calculate its average value according to the elementary calculus of probability. We will obtain this average value by replacing the squares $e^2, e'^2, e''^2$, etc. with $m^2$, and omitting the terms in $ee', ee''$, etc., whose average value is zero. Accordingly, the term $\Sigma e^2$ will provide $\pi m^2$; the term

$$-\alpha\epsilon^2 - \beta\epsilon\epsilon' - \gamma\epsilon\epsilon'' - \cdots = -\epsilon\left(\kappa e + \kappa'e' + \kappa''e'' + \cdots\right)$$

will produce

$$-m^2\left(\kappa a + \kappa'a' + \kappa''a'' + \cdots\right) = -m^2;$$

each of the other terms will also give $-m^2$; so that the total average value will be

$$m^2(\pi - \rho),$$

where $\pi$ denotes the number of observations, and $\rho$ denotes the number of unknowns. Due to errors offered by chance, the true value of $M$ may be greater or smaller than this average value, but the difference decreases as the number of observations increases, so that

$$\sqrt{\frac{M}{\pi - \rho}}$$

can be regarded as an approximate value of $m$. Consequently, the value of $m$ provided by the erroneous method we discussed in the previous article must be multiplied by

$$\sqrt{\frac{\pi}{\pi - \rho}}.$$
39.

To clearly understand the extent to which it is permissible to consider the value of $m^2$ provided by the observations as equal to the exact value, we must seek the mean error to be feared when we take $m^2 = \frac{M}{\pi - \rho}$. This mean error is the square root of the average value of the quantity $\left(\frac{M}{\pi - \rho} - m^2\right)^2$, which we will write as:

$$\frac{M^2 - 2(\pi - \rho)m^2\left(M - (\pi - \rho)m^2\right) - (\pi - \rho)^2m^4}{(\pi - \rho)^2};$$

and since the average value of the second term of the numerator is evidently zero, the question reduces to finding the average value of the function $M^2$. If we denote this average value by $\mathfrak M$, then the mean error we seek will be

$$\frac{\sqrt{\mathfrak M - (\pi - \rho)^2m^4}}{\pi - \rho}.$$
we see that it is a homogeneous function of the errors
etc., or equivalently, of the quantities
etc.; therefore, we will find the average value by:
1. Replacing the fourth powers
etc., by their average values;
2. Replacing the products
etc., by their average values, that is, by
etc.;
3. Neglecting products such as
etc.. We will assume (see art. 16) that the average values of
etc., are proportional to
etc., so that the ratios of one to another are
where
denotes the average value of the fourth powers of the errors for observations whose weight is
. Thus the previous rules could also be expressed as follows: Replace each fourth power
etc., by
each product
etc., by
and neglect all terms such as
or
These principles being understood, it is easy to see that:
I. The average value of
is
II. The average value of the product
is
because
Similarly, the average value of
is
the average value of
is
and so on. Thus the average value of the product
or
will be
The products
or
etc., will have the same average value. Thus the product
will have an average value of
III. To shorten the following developments, we will adopt the following notation. We give the character
a more extended meaning than we have done so far, by making it designate the sum of similar but not identical terms arising from all permutations of the observations. According to this notation, we will have
Calculating the average value of
term by term, we first have, for the average value of the product
Similarly, the average value of the product
is
and so on. Therefore, the average value of the product
is
Now the average value of
is
The average value of
is
and so on. Hence, we easily conclude that the average value of the product
is
Thus, for the average value of the product
we have
IV. Similarly, for the average value of the product
we find
Now, we have
so this average value will be
V. By a similar calculation, we find that the average value of
is
and so on. Adding up, we obtain the average value of the product
this value is
VI. Similarly, we find that
is the average value of the product
and
is the average value of the product
and so on.
Hence by addition we find the average value of the square
which is
VII. Finally, from all these preliminaries, we conclude that

$$\mathfrak M - (\pi - \rho)^2m^4 = \left(n^4 - 3m^4\right)(\pi - 2\rho + S) + 2m^4(\pi - \rho),$$

where $S$ denotes the sum

$$S = (a\kappa + b\lambda + c\mu + \cdots)^2 + (a'\kappa' + b'\lambda' + c'\mu' + \cdots)^2 + (a''\kappa'' + b''\lambda'' + c''\mu'' + \cdots)^2 + \cdots.$$

Therefore, the mean error to be feared when we take $m^2 = \frac{M}{\pi - \rho}$ will be

$$\frac{\sqrt{\left(n^4 - 3m^4\right)(\pi - 2\rho + S) + 2m^4(\pi - \rho)}}{\pi - \rho}.$$
40.
The quantity $S$, which occurs in the expression above, generally cannot be reduced to a simpler form. However, we can assign two limits between which its value must necessarily lie. First, it is easily deduced from the previous relations that

$$(a\kappa + b\lambda + c\mu + \cdots) + (a'\kappa' + b'\lambda' + c'\mu' + \cdots) + (a''\kappa'' + b''\lambda'' + c''\mu'' + \cdots) + \cdots = \rho,$$

from which we conclude that $a\kappa + b\lambda + c\mu + \cdots$ is a positive quantity smaller than unity, or at least not larger. The same will be true for the quantity $a'\kappa' + b'\lambda' + c'\mu' + \cdots$; similarly, $a''\kappa'' + b''\lambda'' + c''\mu'' + \cdots$ will be smaller than unity; and so on. Therefore, $S$ must be smaller than $\rho$. Second, we have

$$\pi S \ge \left(a\kappa + b\lambda + c\mu + \cdots + a'\kappa' + b'\lambda' + c'\mu' + \cdots + \cdots\right)^2 = \rho^2,$$

since the mean of the squares of $\pi$ quantities is never less than the square of their mean; from which it is easily deduced that $S$ is greater, or at least not smaller, than $\frac{\rho^2}{\pi}$. Therefore, the term $\left(n^4 - 3m^4\right)(\pi - 2\rho + S)$ must necessarily lie between the limits

$$\frac{\left(n^4 - 3m^4\right)(\pi - \rho)^2}{\pi}\qquad\text{and}\qquad\left(n^4 - 3m^4\right)(\pi - \rho).$$

Thus, the square of the mean error to be feared for the value $m^2 = \frac{M}{\pi - \rho}$ lies between the limits

$$\frac{n^4 - 3m^4}{\pi} + \frac{2m^4}{\pi - \rho}\qquad\text{and}\qquad\frac{n^4 - m^4}{\pi - \rho},$$

so that a degree of precision as great as desired can be achieved, provided the number of observations is sufficiently large.

It is very remarkable that in hypothesis III of art. 9, on which we had formerly relied to establish the theory of least squares, the second term of the square of the mean error completely disappears (since in this hypothesis $n^4 = 3m^4$); and because, to find the approximate value $\sqrt{\frac{M}{\pi - \rho}}$ of the mean error of the observations, it is always necessary to treat the sum $M$ as if it were equal to the sum of the squares of $\pi - \rho$ random errors, it follows that, in this hypothesis, the precision of this determination becomes equal to that which we found, in art. 15, for the determination from $\pi - \rho$ true errors.
- ↑ The exact determination of the ratios of the quantities $n^4 - m^4$, $n'^4 - m'^4$, etc., is conceivable only in the case where, by the nature of the matter, errors $e, e', e''$, etc., proportional to $\mu, \mu', \mu''$, etc., are considered equally probable, or rather in the case where the probability functions $\varphi, \varphi', \varphi''$, etc. differ only by the scale of the error; the ratios in question are then those of $\mu^4, \mu'^4, \mu''^4$, etc.
- ↑ We will later explain (art. 24) the reasoning that led us to denote the coefficients of this formula by the notation $\alpha, \beta, \gamma$, etc.