Statistical Distribution Measures


F distribution: A continuous, right-skewed statistical distribution, also known as Snedecor's F distribution or the Fisher–Snedecor distribution (after R. A. Fisher and George W. Snedecor) (2), which arises in testing whether two observed samples have the same variance. (1) Note that three of the most important distributions (namely the normal distribution, the t distribution, and the chi-square distribution) may be seen as special cases of the F distribution. (3)

Example: We want to compare the monthly sales volume of Microsoft and Apple. We collect data for a year (12 months), calculate the variance for each, define the degrees of freedom (n − 1 = 11), and can then build the F distribution.

F statistic: Defined as the ratio of the dispersions of two distributions; in other words, it is the value calculated as the ratio of two sample variances. By convention the larger variance is placed in the numerator, so F ≥ 1. The F statistic can test the null hypothesis: (1) that the two sample variances come from normal populations with a common variance; (2) that two population means are equal; (3) that no connection exists between the dependent variable and all or some of the independent variables.

Formally, let U and V be independent variates distributed as chi-squared with m and n degrees of freedom; then F = (U/m) / (V/n). In the sales example, we collect a year of monthly data (12 months) for each company, calculate the two sample variances, and note the degrees of freedom (n − 1 = 11). Then F = V(M) / V(A), where V(M) is the sample variance for Microsoft and V(A) the sample variance for Apple; each sample variance already incorporates its degrees of freedom (m = a = 11), since it divides the sum of squared deviations by n − 1.
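As a sketch of the sales example, the F statistic can be computed with Python's standard library; the monthly figures below are hypothetical, since the text gives no data:

```python
import statistics

# Hypothetical monthly sales volumes for one year (12 months) -- illustrative only
microsoft = [52, 48, 55, 60, 47, 51, 58, 62, 49, 53, 57, 61]
apple     = [45, 50, 44, 48, 52, 46, 49, 51, 47, 43, 50, 48]

# statistics.variance computes the sample variance with an (n - 1) divisor,
# so the degrees of freedom (12 - 1 = 11) are already accounted for.
var_m = statistics.variance(microsoft)
var_a = statistics.variance(apple)

# By convention the larger variance goes in the numerator, so F >= 1.
f_stat = max(var_m, var_a) / min(var_m, var_a)
print(f"F = {f_stat:.3f}")
```

Comparing this F value against a critical value of the F distribution with (11, 11) degrees of freedom would complete the test of equal variances.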

Chi-square distribution: The distribution of the sum of the squares of a set of variables, each of which has a normal distribution and is expressed in standardized units. It consists of a family of curves based on the number of degrees of freedom, and is denoted by the symbol χ², pronounced "chi-square". More precisely, and more formally:

Let {X1, X2, …, Xn} be n independent random variables, all ~ N(0, 1). Then the chi-square distribution with n degrees of freedom is defined as the distribution of the sum X1² + X2² + … + Xn²; that is, (X1² + X2² + … + Xn²) ~ χ²(n).
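This definition can be checked numerically; the sketch below (standard library only) sums squares of simulated N(0, 1) draws and confirms that the sample mean is close to n, the theoretical mean of χ²(n):

```python
import random

random.seed(0)  # reproducible draws

n = 3           # degrees of freedom: number of squared N(0, 1) terms
trials = 10_000

# Each trial sums the squares of n independent standard normal variates,
# giving one draw from the chi-square distribution with n degrees of freedom.
draws = [sum(random.gauss(0, 1) ** 2 for _ in range(n)) for _ in range(trials)]

mean = sum(draws) / trials
print(f"sample mean = {mean:.3f} (theory: E[chi-square with {n} df] = {n})")
```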

T distribution: A distribution used to test a hypothesis about a population mean when the population standard deviation is not known, the sample size is small, and the normal distribution is assumed for the sample mean. This distribution is known as "Student's t distribution", or simply the "t distribution". It depends on n, which is therefore a parameter of the distribution. The distribution of T (for n-observation samples) is called the "t distribution with (n − 1) degrees of freedom" and is denoted t(n − 1): T ~ t(n − 1).
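A minimal sketch of the T statistic behind this definition, using a hypothetical small sample (the data are illustrative, not from the text):

```python
import math
import statistics

# Hypothetical small sample (n = 8) -- illustrative only
sample = [101.2, 98.7, 103.5, 99.1, 100.8, 97.9, 102.4, 100.3]
mu0 = 100.0  # hypothesized population mean

n = len(sample)
xbar = statistics.mean(sample)
s = statistics.stdev(sample)    # sample standard deviation ((n - 1) divisor)
se = s / math.sqrt(n)           # estimated standard error of the mean

# Under H0, T follows the t distribution with n - 1 = 7 degrees of freedom.
t = (xbar - mu0) / se
print(f"t = {t:.3f} with {n - 1} degrees of freedom")
```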

Dependent samples: Two samples are dependent if the sample from one population is used to determine the individuals in the second sample. With dependent samples we sometimes collect two data values from related individuals (before/after, or husband/wife).

These are called matched-pairs samples. Example: We want to estimate the difference between the mean blood pressure of all participants before and after an exercise session. We take a sample of 40 participants and measure their blood pressure before and after the exercise session. These two samples include the same 40 people. Note: both samples are the same size.

Independent samples: Two samples are independent if the sample selected from one population is not related in any way to the sample from the other population. Example: We want to estimate the learning curve of 2 classes over the same period. At the beginning, a group of 20 students from each class takes a pretest. At the end of the period, another test is taken by the same 20 students from each class. Although the two samples have the same size, the sample from one class is not related to the sample from the other.

Degrees of freedom: In statistics, the number of degrees of freedom is the number of values in the final calculation of a statistic that are free to vary. Equivalently, it is the number of independent variables that must be specified to determine the state of a system. Example: To specify the position of a box in space, the x, y, and z coordinates are required, so it has 3 degrees of freedom. A collection of N boxes has 3N degrees of freedom.
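The n − 1 = 11 in the sales example reflects exactly this idea: once the sample mean is fixed, only n − 1 deviations are free to vary. A sketch with hypothetical monthly values:

```python
import statistics

# 12 hypothetical monthly values -- illustrative only
monthly = [10, 12, 9, 11, 14, 13, 10, 12, 15, 11, 9, 13]
n = len(monthly)

xbar = statistics.mean(monthly)
# Sample variance divides the sum of squared deviations by the
# degrees of freedom n - 1 (one value is "used up" estimating the mean);
# this matches what statistics.variance(monthly) computes.
var_manual = sum((x - xbar) ** 2 for x in monthly) / (n - 1)

print(f"df = {n - 1}, sample variance = {var_manual:.4f}")
```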

T statistic: The t statistic is a measure of how extreme a statistical estimate is. We calculate this statistic by subtracting the hypothesized value from the statistical estimate and then dividing by the estimated standard error. In most cases, the hypothesized value would be zero. If the t statistic is close to zero, there is an indication that the hypothesized value is reasonable. If the t statistic is a large positive number, there is an indication that the hypothesized value is not large enough. If the t statistic is a large negative number, there is an indication that the hypothesized value is too large.

“To formalize this approach, you need to compare the t-statistic to a percentile from the t-distribution. The t-statistic is sometimes also referred to as a t-test, t-ratio, or Wald statistic. In a study of how low triiodothyronine in preterm infants affects IQ at 8 years' follow-up (BMJ 1996; 312: 1132–1133), the estimated deficit in IQ (standard error) was 6.6 (3.0) for overall IQ, 8.5 (3.6) for verbal IQ, and 5.0 (3.0) for performance IQ.

These deficits were adjusted for sex, gestation, birth weight, Apgar score at 5 minutes, and days of ventilation. We wish to compare these estimated deficits to a hypothesized deficit of zero. The t-statistics would be: (6.6-0) / 3.0 = 2.20 for overall IQ,

(8.5 − 0) / 3.6 = 2.36 for verbal IQ, and (5.0 − 0) / 3.0 = 1.67 for performance IQ. Since the t-statistics are large and positive, this gives some indication that the deficit is greater than the hypothesized value of zero.” (4)
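The arithmetic of the quoted example can be reproduced directly; the deficits and standard errors are the values given above:

```python
# Estimated IQ deficits and their standard errors from the quoted study
deficits = {"overall": (6.6, 3.0), "verbal": (8.5, 3.6), "performance": (5.0, 3.0)}
h0 = 0.0  # hypothesized deficit

# t statistic = (estimate - hypothesized value) / standard error
for name, (estimate, se) in deficits.items():
    t = (estimate - h0) / se
    print(f"{name} IQ: t = {t:.2f}")
# prints t = 2.20, 2.36, and 1.67 respectively
```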

Paired difference: “A type of location test that is used when comparing two sets of measurements to assess whether their population means differ. A paired difference test uses additional information about the sample that is not present in an ordinary unpaired testing situation.” (Source: Wikipedia)
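A sketch of a paired difference (matched-pairs) t test, echoing the blood-pressure example earlier; the before/after readings below are hypothetical:

```python
import math
import statistics

# Hypothetical before/after systolic blood pressure for 8 participants
before = [130, 142, 128, 135, 150, 138, 129, 144]
after  = [127, 138, 129, 131, 144, 135, 128, 140]

# Paired test: work with the per-person differences, not the raw samples.
diffs = [b - a for b, a in zip(before, after)]
n = len(diffs)

d_bar = statistics.mean(diffs)   # mean difference
s_d = statistics.stdev(diffs)    # standard deviation of the differences
# Under H0 (mean difference = 0), t has n - 1 degrees of freedom.
t = d_bar / (s_d / math.sqrt(n))

print(f"mean difference = {d_bar:.2f}, t = {t:.2f}")
```

Pairing uses the extra information that each before/after pair comes from the same person, which an unpaired two-sample test would ignore.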
