BACK TO 125 EXAMPLES in Engineering, Operations Management and Computer Information Systems

           Apply  Worldwide Now         

Do it once, do it right, and do it now.

Email Lawson Computing

Back to Lawson Computing Homepage

Apply as needed, when needed.

 

APPENDIX D

MORE ABOUT STATISTICS

          In this Appendix we will discuss, in additional detail, some statistical topics including error, sample size, the Student's t distribution, the F distribution, and confidence intervals.

 

STATISTICAL INFERENCE

 

          In statistical inference we make generalizations based on samples, and, traditionally, such inferences have been divided into problems of estimation and hypothesis testing. In estimation we assign a numerical value to a population parameter on the basis of sample data. In any CER, we are attempting to predict the behavior of a population from a sample, so we need to ask ourselves the question, "How good is our estimate?" When we test a hypothesis, we accept or reject assumptions concerning the parameters or the form of a population. When we use a CER, we need to be confident of the CER's ability to predict the future. Without this confidence, the CER should not be used.

 

Z-scores

          In general, if X is a measurement belonging to a set of data having the mean $\bar{X}$ (or $\mu$ for a population) and the standard deviation $s$ (or $\sigma$ for a population), then its value in standard units, denoted by Z, is

$$Z = \frac{X - \bar{X}}{s} \qquad \text{or} \qquad Z = \frac{X - \mu}{\sigma}$$

depending on whether the data constitute a sample or a population. In these units, Z tells us how many standard deviations a value lies above or below the mean of the set of data to which it belongs.
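
As an illustration of the standard-unit conversion, the following is a minimal Python sketch. The function name `z_score` and the five sample values are hypothetical and chosen only for the example; the sample standard deviation uses the usual n - 1 divisor.

```python
from statistics import mean, stdev


def z_score(x, center, spread):
    """Return x in standard units: (x - center) / spread."""
    if spread <= 0:
        raise ValueError("standard deviation must be positive")
    return (x - center) / spread


# Hypothetical sample of five measurements (illustrative values only).
sample = [10.0, 12.0, 14.0, 16.0, 18.0]
x_bar = mean(sample)   # sample mean, 14.0
s = stdev(sample)      # sample standard deviation (n - 1 divisor), about 3.162

z = z_score(18.0, x_bar, s)
print(round(z, 3))     # about 1.265 standard deviations above the mean
```

For a population, the same function applies with $\mu$ and $\sigma$ passed in place of the sample mean and sample standard deviation.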

 

Error

          An estimate is generally called a point estimate, since it consists of a single number, or a single point on the real number scale. Although this is the most common way in which estimates are expressed, it leaves room for many questions. For instance, it does not tell us on how much information the estimate is based, and it does not tell us anything about the possible size of the error. And, of course, we must expect an error. An estimate's reliability depends upon two things: the size of the sample, $n$, and the size of the population standard deviation, $\sigma$. Any statistics textbook will show that the error term is

$$E = Z(\alpha/2) \cdot \frac{\sigma}{\sqrt{n}}$$

where $Z(\alpha/2)$ is the value of the standard normal distribution that leaves an area of $\alpha/2$ in the upper tail. It represents the number of standard deviations from the mean by which we are willing to allow our estimate to be "off" in either direction, so that with probability $1 - \alpha$ the error will be less than E. This result applies when n is large and the population is infinite. The two values which are most commonly used for $1 - \alpha$ are 0.95 and 0.99, with corresponding Z scores $Z(0.025) = 1.96$ (standard deviations) for $1 - \alpha = 0.95$ and $Z(0.005) = 2.575$ for $1 - \alpha = 0.99$, respectively.

 

          There is one complication with this result. To be able to judge the size of the error we might make when we use $\bar{X}$ as an estimate of $\mu$, we must know the value of the population standard deviation, $\sigma$. Since this is not the case in most practical situations, we have no choice but to replace $\sigma$ with an estimate, usually the sample standard deviation, $s$. In general, this is considered to be reasonable provided the sample is sufficiently large ($n \ge 30$).
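
The error term can be computed directly. The sketch below is one possible implementation using only the Python standard library; the names `z_critical` and `margin_of_error` and the values $\sigma = 12$, $n = 36$ are assumptions made for the example. `NormalDist().inv_cdf` returns unrounded critical values, so the 0.99 case comes out near 2.576 rather than the table value 2.575 quoted above.

```python
from math import sqrt
from statistics import NormalDist


def z_critical(confidence):
    """Return Z(alpha/2) for a two-sided degree of confidence 1 - alpha."""
    alpha = 1.0 - confidence
    return NormalDist().inv_cdf(1.0 - alpha / 2.0)


def margin_of_error(sigma, n, confidence=0.95):
    """Return E = Z(alpha/2) * sigma / sqrt(n) (large n, infinite population)."""
    if n <= 0 or sigma < 0:
        raise ValueError("n must be positive and sigma non-negative")
    return z_critical(confidence) * sigma / sqrt(n)


# Hypothetical inputs: sigma = 12 (or s, if n >= 30), sample size n = 36.
E = margin_of_error(12.0, 36, confidence=0.95)
print(round(z_critical(0.95), 2))  # 1.96
print(round(E, 2))                 # 3.92
```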

 

Sample Size

          The formula for E can also be used to determine the sample size that is needed to attain a desired degree of precision. Suppose that we want to use the mean of a large random sample to estimate the mean of a population, and we want to be able to assert with probability $1 - \alpha$ that the error of this estimate will be less than some prescribed quantity E. Solving the previous equation, $E = Z(\alpha/2) \cdot \sigma / \sqrt{n}$, for n, we get

$$n = \left[ \frac{Z(\alpha/2) \cdot \sigma}{E} \right]^2$$
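
A short Python sketch of the sample-size calculation follows. The function name `required_sample_size` and the inputs ($\sigma = 12$, E = 2) are hypothetical. Rounding the result up to the next whole number is an added assumption, made because a sample size must be an integer and rounding down would not meet the prescribed precision.

```python
from math import ceil
from statistics import NormalDist


def required_sample_size(sigma, max_error, confidence=0.95):
    """Return the smallest integer n with Z(alpha/2) * sigma / sqrt(n) <= max_error."""
    if sigma <= 0 or max_error <= 0:
        raise ValueError("sigma and max_error must be positive")
    z = NormalDist().inv_cdf(1.0 - (1.0 - confidence) / 2.0)
    return ceil((z * sigma / max_error) ** 2)


# Hypothetical inputs: sigma = 12, prescribed maximum error E = 2.
n_95 = required_sample_size(12.0, 2.0, confidence=0.95)
n_99 = required_sample_size(12.0, 2.0, confidence=0.99)
print(n_95)  # 139
print(n_99)  # 239
```

Raising the degree of confidence from 0.95 to 0.99 while holding E fixed increases the required sample size substantially, since n grows with the square of $Z(\alpha/2)$.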

 

 

Confidence Intervals

          For large random samples from infinite populations, the sampling distribution of the mean is approximately normal with the mean $\mu$ and the standard deviation $\sigma / \sqrt{n}$, namely, that

$$Z = \frac{\bar{X} - \mu}{\sigma / \sqrt{n}}$$

is a random variable having approximately the standard normal distribution. Since the probability is $1 - \alpha$ that a random variable having the standard normal distribution will take on a value between $-Z(\alpha/2)$ and $Z(\alpha/2)$, namely, that $-Z(\alpha/2) < Z < Z(\alpha/2)$, we can substitute into this inequality the foregoing expression for Z, and it yields

$$-Z(\alpha/2) < \frac{\bar{X} - \mu}{\sigma / \sqrt{n}} < Z(\alpha/2)$$

Using some algebraic manipulation, we get

$$\bar{X} - Z(\alpha/2) \cdot \frac{\sigma}{\sqrt{n}} < \mu < \bar{X} + Z(\alpha/2) \cdot \frac{\sigma}{\sqrt{n}}$$

and we can assert with probability $1 - \alpha$ that it will be satisfied for any given sample. In other words, we can assert with $(1 - \alpha) \cdot 100\%$ confidence that the interval above, determined on the basis of a large random sample, contains the population mean we are trying to estimate. When $\sigma$ is unknown and n is at least 30, we replace $\sigma$ by the sample standard deviation, $s$.

          An interval such as this is called a confidence interval, its endpoints are called confidence limits, and the probability $1 - \alpha$ is called the degree of confidence. Again, the values most commonly used for $1 - \alpha$ are 0.95 and 0.99, the corresponding values of $Z(\alpha/2)$ are 1.96 and 2.575, and the resulting confidence intervals are referred to as 95% and 99% confidence intervals for $\mu$.
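
The large-sample interval can be sketched in Python as follows. The function name `large_sample_ci` and the summary figures (sample mean 100, standard deviation 12, n = 36) are hypothetical values chosen for illustration; the standard deviation argument may be $\sigma$ or, for n of at least 30, the sample value $s$.

```python
from math import sqrt
from statistics import NormalDist


def large_sample_ci(x_bar, sd, n, confidence=0.95):
    """Return (lower, upper) confidence limits for the population mean."""
    if n <= 0 or sd < 0:
        raise ValueError("n must be positive and sd non-negative")
    z = NormalDist().inv_cdf(1.0 - (1.0 - confidence) / 2.0)
    half_width = z * sd / sqrt(n)   # this is the error term E
    return x_bar - half_width, x_bar + half_width


# Hypothetical summary statistics from a sample of n = 36 observations.
lower, upper = large_sample_ci(100.0, 12.0, 36, confidence=0.95)
print(round(lower, 2), round(upper, 2))  # 96.08 103.92
```

Under these assumed inputs we would assert with 95% confidence that the population mean lies between roughly 96.08 and 103.92.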

 

Confidence Intervals for Means (Small Samples)

          To develop corresponding theory which applies also to small samples, it will be necessary to assume that the population we are sampling has roughly the shape of a normal distribution. We can then base our methods on the statistic

$$t = \frac{\bar{X} - \mu}{s / \sqrt{n}}$$

whose sampling distribution is a continuous distribution called the t distribution. This distribution is symmetrical and bell-shaped with zero mean. The exact shape of the t distribution depends on a parameter called the number of degrees of freedom, given by n - 1, the sample size less one. For the t distribution we define $t(\alpha/2)$ in the same way in which $Z(\alpha/2)$ was defined. However, $t(\alpha/2)$ depends on n - 1 (degrees of freedom) and its value must be looked up in a table of values. In the same way as before we can arrive at the following small sample confidence interval for $\mu$:

$$\bar{X} - t(\alpha/2) \cdot \frac{s}{\sqrt{n}} < \mu < \bar{X} + t(\alpha/2) \cdot \frac{s}{\sqrt{n}}$$

          The degree of confidence is $1 - \alpha$, and the only difference between this formula and the large sample formula (with $s$ in place of $\sigma$) is that $t(\alpha/2)$ takes the place of $Z(\alpha/2)$.
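
A minimal Python sketch of the small-sample interval is given below. Because the Python standard library has no t distribution, the critical value is passed in as an argument, mirroring the table lookup described above. The function name `small_sample_ci` and the five data values are hypothetical; 2.776 is the standard table value of t(0.025) for 4 degrees of freedom.

```python
from math import sqrt
from statistics import mean, stdev


def small_sample_ci(sample, t_crit):
    """Return (lower, upper) limits using a t value looked up for n - 1 df."""
    n = len(sample)
    if n < 2:
        raise ValueError("at least two observations are required")
    x_bar = mean(sample)
    s = stdev(sample)               # sample standard deviation (n - 1 divisor)
    half_width = t_crit * s / sqrt(n)
    return x_bar - half_width, x_bar + half_width


# Hypothetical sample of n = 5 observations, so n - 1 = 4 degrees of freedom.
sample = [10.0, 12.0, 14.0, 16.0, 18.0]
T_0025_DF4 = 2.776                  # t(0.025), 4 df, from a standard t table
low, high = small_sample_ci(sample, T_0025_DF4)
print(round(low, 2), round(high, 2))  # 10.07 17.93
```

With the large-sample value 1.96 in place of 2.776 the interval would be noticeably narrower, which is why the t value should be used when n is small.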

 

Analysis of Variance and the F Statistic

          The F statistic is a statistic for a test concerning the differences among means. It is defined as:

 

$$F = \frac{\text{estimate of } \sigma^2 \text{ based on the variation among the } \bar{X}\text{'s}}{\text{estimate of } \sigma^2 \text{ based on the variation within the samples}}$$

 

and is called a variance ratio. The F distribution is a theoretical distribution which depends on two parameters called the numerator and denominator degrees of freedom. When the F statistic is used to compare the means of k samples of size n, the numerator and denominator degrees of freedom are, respectively, k-1 and k(n-1).

 

          This is a simple form of an analysis of variance. The basic idea of an analysis of variance is to express a measure of the total variation of a set of data as a sum of terms, which can be attributed to specific sources, or causes, of variation. Two such sources of variation could be (1) actual differences and (2) chance differences. As a measure of the total variation of a set of observations consisting of k samples of size n, we use the total sum of squares,

$$SST = \sum_{i=1}^{k} \sum_{j=1}^{n} \left( X_{ij} - \bar{X}_{..} \right)^2$$

where $X_{ij}$ is the jth observation of the ith sample, i = 1, 2, ... , k, and j = 1, 2, ... , n, and $\bar{X}_{..}$, the mean of all the kn measurements or observations, is called the grand mean. If we divide the total sum of squares by kn - 1, we get the variance of the combined data.

 

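          Letting $\bar{X}_{i.}$ denote the mean of the ith sample, i = 1, 2, ..., k, we can write the following identity:

$$\sum_{i=1}^{k} \sum_{j=1}^{n} \left( X_{ij} - \bar{X}_{..} \right)^2 = n \sum_{i=1}^{k} \left( \bar{X}_{i.} - \bar{X}_{..} \right)^2 + \sum_{i=1}^{k} \sum_{j=1}^{n} \left( X_{ij} - \bar{X}_{i.} \right)^2$$
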

          Looking closely at the two terms into which the total sum of squares SST has been partitioned, we find that the first term is a measure of the variation among the sample means. Similarly, the second term is a measure of the variation within the individual samples, or chance variation. Dividing the first term by k - 1 and the second by k(n - 1), we get the numerator and the denominator of the F statistic as defined above. The first term is often referred to as the treatment sum of squares, written here as SS(Tr) to distinguish it from the total sum of squares SST, and the second term as the error sum of squares, SSE, which reflects experimental error, or chance.
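
The partition of the total sum of squares and the resulting F statistic can be checked numerically. The following Python sketch assumes k samples of equal size n, as in the text; the function name `one_way_anova`, the dictionary keys (including `ss_tr` for the treatment sum of squares), and the three small samples are hypothetical and chosen so the arithmetic is easy to follow.

```python
from statistics import mean


def one_way_anova(samples):
    """Return sums of squares, degrees of freedom, and F for k equal-size samples."""
    k = len(samples)
    n = len(samples[0]) if k else 0
    if k < 2 or n < 2 or any(len(s) != n for s in samples):
        raise ValueError("need k >= 2 samples, each of the same size n >= 2")

    all_values = [x for s in samples for x in s]
    grand_mean = mean(all_values)
    sample_means = [mean(s) for s in samples]

    # Total variation about the grand mean.
    sst = sum((x - grand_mean) ** 2 for x in all_values)
    # Variation among the sample means (treatment sum of squares).
    ss_tr = n * sum((m - grand_mean) ** 2 for m in sample_means)
    # Variation within the samples (error sum of squares).
    sse = sum((x - m) ** 2 for s, m in zip(samples, sample_means) for x in s)
    if sse == 0:
        raise ValueError("no within-sample variation; F is undefined")

    df_num, df_den = k - 1, k * (n - 1)
    f_stat = (ss_tr / df_num) / (sse / df_den)
    return {"sst": sst, "ss_tr": ss_tr, "sse": sse,
            "df": (df_num, df_den), "f": f_stat}


# Hypothetical data: k = 3 samples of size n = 3.
result = one_way_anova([[1.0, 2.0, 3.0], [2.0, 3.0, 4.0], [6.0, 7.0, 8.0]])
print(result["sst"], result["ss_tr"], result["sse"])  # 48.0 42.0 6.0
print(result["df"], result["f"])                      # (2, 6) 21.0
```

In this example 42 + 6 = 48, so the identity holds, and the computed F would then be compared with a tabulated F value for 2 and 6 degrees of freedom.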

 

 

          Refer to a statistics textbook for a further explanation of these and other statistical subjects.

 
