Edinburgh Research Archive

On the test of a hypothesis concerning two independent frequency distributions

Abstract


One of the chief uses of statistical analysis of given samples, which are of course assumed to be representative in the statistical sense, is to draw correct inferences regarding their populations.
Those cases in which we know a priori the exact distribution of their populations are evidently trivial and do not present any such statistical problem.
In some cases, however, knowing nothing about the nature of the populations, we may attempt to obtain a hypothetical population from which the two given samples may reasonably be assumed to have been drawn.
But quite often we may wish to know whether or not the two given samples can arise from the same population when we know only the nature of the distribution of their populations without knowing them exactly, for some of the parameters which specify them completely may be unknown; that is, when our hypothesis regarding the populations is a "composite" one. Thus, in any particular case we may know that the given samples belong to a normal population (say) without their means or variances or both (the two parameters which completely specify a Normal Universe) being known.
It is evident that whether the nature of the distribution of the populations is known or not, in order to get information about them, we shall have to estimate some unknown parameters or their functions which would specify them (populations) completely, and obtain a test criterion which will enable us to say whether or not the two given samples belong statistically to the same population.
Fundamentally, therefore, the problem is one of estimation of the unknown parameters (or their functions) of the populations from the given samples and, according to established statistical practice, we shall assert that the two samples belong to the same population when the estimated values of the parameters of the population from the given samples do not differ significantly at pre-assigned levels of significance. These levels are in general determined by the amount of risk we are prepared to take.
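The decision rule just described can be illustrated by a minimal sketch. It assumes, purely for illustration, that both samples follow a Poisson law with an unknown mean and that the samples are large enough for the normal approximation; the function name `compare_poisson_means` and the default level `alpha=0.05` are hypothetical choices and are not taken from the thesis.

```python
from math import sqrt
from statistics import NormalDist


def compare_poisson_means(sample_a, sample_b, alpha=0.05):
    """Large-sample z test that two Poisson samples share one mean.

    Returns (z, critical_value, consistent), where `consistent` is True
    when the estimated means do not differ significantly at level alpha.
    """
    n_a, n_b = len(sample_a), len(sample_b)
    mean_a = sum(sample_a) / n_a  # estimate of the parameter from sample A
    mean_b = sum(sample_b) / n_b  # estimate of the parameter from sample B
    # Under the hypothesis of a common population, pool the two estimates.
    pooled = (sum(sample_a) + sum(sample_b)) / (n_a + n_b)
    if pooled == 0:
        raise ValueError("all observations are zero; the test is undefined")
    # For a Poisson law the variance equals the mean (assumed model).
    std_err = sqrt(pooled * (1 / n_a + 1 / n_b))
    z = (mean_a - mean_b) / std_err
    # Two-sided critical value fixed by the pre-assigned significance level.
    critical = NormalDist().inv_cdf(1 - alpha / 2)
    return z, critical, abs(z) <= critical
```

A smaller `alpha` corresponds to accepting less risk of wrongly rejecting the hypothesis that the samples share a population, at the cost of detecting real differences less often.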
The importance of this type of problem cannot be over-estimated. It may be used to study a variety of problems of great practical value, e.g. "the qualities and quantities of manufactured products, yield of agricultural techniques, results of different medical treatments, effects of suggested educational methods and the like". Thus, to take a concrete case, we may have two samples of finished goods of the same kind classified into the same groups according to certain characteristics, which can be measured numerically; the question arises whether or not the two samples are from an identical source of production, i.e. whether or not the processes of manufacture of both samples can be assumed to be identical.
In view of the importance of the above types of problems, we discuss here an equally useful and important problem of allied nature, namely: "Given two independent sets of frequencies classified into the same K frequency classes, to develop a test of the hypothesis that the two samples may be said to belong to the same population, it being assumed that the samples are large and the law of distribution of the population is known except for certain unspecified parameters."
It is of course inherent in the above problem that even when the samples belong to different populations, the nature of their distributions, i.e. their mathematical form, remains the same; e.g. if the law of distribution is known to be Poissonian (say), we assume that both the samples come from Poissonian populations.
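For orientation, the setting of two sets of frequencies over the same K classes can be sketched with the classical Pearson chi-square test of homogeneity. This is not the criterion developed in the thesis: the sketch makes no use of the known law of distribution or of its estimated parameters, and it is offered only as a familiar large-sample baseline. The function name `chi_square_homogeneity` is a hypothetical choice.

```python
def chi_square_homogeneity(freq_a, freq_b):
    """Pearson chi-square statistic for two frequency sets over K classes.

    Returns (statistic, degrees_of_freedom). Under the hypothesis that
    both samples come from one population, and for large samples, the
    statistic is approximately chi-square with K - 1 degrees of freedom.
    """
    if len(freq_a) != len(freq_b):
        raise ValueError("both samples must use the same K classes")
    n_a, n_b = sum(freq_a), sum(freq_b)
    total = n_a + n_b
    statistic = 0.0
    for a, b in zip(freq_a, freq_b):
        class_total = a + b
        if class_total == 0:
            raise ValueError("empty class; merge it with a neighbour")
        # Expected frequencies if both samples share one population.
        expected_a = n_a * class_total / total
        expected_b = n_b * class_total / total
        statistic += (a - expected_a) ** 2 / expected_a
        statistic += (b - expected_b) ** 2 / expected_b
    return statistic, len(freq_a) - 1
```

The statistic would then be compared with the tabulated chi-square value for the returned degrees of freedom at the pre-assigned level of significance.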
