Ratio-type estimators in stratified random sampling using auxiliary attribute
Table Of Contents
- <p> </p><p>TITLE PAGE<br>CERTIFICATION<br>DEDICATION<br>ACKNOWLEDGEMENTS<br>TABLE OF CONTENTS<br>LIST OF TABLES<br>ABBREVIATIONS/NOTATIONS<br>ABSTRACT<br>
Chapter ONE
INTRODUCTION
- 1.1 INTRODUCTION<br>
- 1.2 CENSUS VERSUS SAMPLE SURVEY<br>
- 1.3 RANDOM SAMPLING<br>
- 1.4 DEFINITION OF BASIC TERMS<br>
- 1.5 AIM AND OBJECTIVES<br>
- 1.6 SIGNIFICANCE OF THE STUDY<br>
- 1.7 SCOPE AND LIMITATION<br>
Chapter TWO
LITERATURE REVIEW
- 2.1 RATIO ESTIMATORS<br>
- 2.2 RANKED SET SAMPLING<br>
- 2.3 STRATIFIED RATIO ESTIMATOR<br>
Chapter THREE
MATERIALS AND METHODS
- 3.1 INTRODUCTION<br>
- 3.2 DATA USED FOR THE ANALYSIS<br>
- 3.3 SOFTWARE USED FOR THE ANALYSIS<br>
- 3.4 PROPOSED ESTIMATORS<br>
- 3.5 BIAS AND MEAN SQUARE ERROR (MSE) OF ESTIMATOR \(\hat{T}\)<br>
- 3.6 BIAS AND MEAN SQUARE ERROR OF THE PROPOSED ESTIMATORS \(\hat{T}_i\)<br>
- 3.7 EFFICIENCY COMPARISONS<br>
- 3.8 PROPERTIES OF THE PROPOSED ESTIMATORS<br>
- 3.9 DETERMINATION OF SAMPLE SIZE<br>
- 3.9.1 Constants of Proportionality for Fixed Cost<br>
- 3.9.2 Constants of Proportionality for Fixed Precision<br>
Chapter FOUR
EMPIRICAL STUDY
- 4.1 PREAMBLE<br>
- 4.2 RESULTS AND DISCUSSION<br>
Chapter FIVE
SUMMARY, CONCLUSION AND RECOMMENDATIONS
- 5.1 SUMMARY<br>
- 5.2 CONCLUSION<br>
- 5.3 RECOMMENDATION<br>REFERENCES<br>APPENDIX I: DATA<br>APPENDIX II: SOURCE CODE</p>
Project Abstract
<p>A limitation of existing ratio-type estimators in stratified sampling is that they use only non-attribute (variable) auxiliary information. In this study, some ratio-type estimators in stratified random sampling that use an attribute as auxiliary information are proposed. The sample mean of the study variable was transformed linearly, and the proportion of the auxiliary attribute was transformed using known parameters of the attribute. The biases and mean square errors (MSE) of these estimators were derived, and their MSEs were compared with the MSE of the traditional combined ratio estimator. The comparison shows that the proposed estimators are more efficient and less biased than the combined ratio estimator under all conditions. An empirical study was also conducted using students' height data from each faculty of Usmanu Danfodiyo University, Sokoto; its results also show that the proposed estimators are more efficient and less biased than the combined ratio estimator. In addition, formulae for determining the sample size when the proposed estimators are adopted under optimum, Neyman and proportional allocation, for a fixed cost and for a desired precision, were obtained.</p>
Project Overview
<p>
</p><p>INTRODUCTION<br>1.1 INTRODUCTION<br>Prior knowledge of the population mean of an auxiliary variable, together with its coefficient of variation, kurtosis and correlation with the study variable, is known to be very useful, particularly when ratio, product and regression estimators are used to estimate the population mean of a variable of interest. The use of auxiliary information can increase the precision of an estimator when the study variable is highly correlated with the auxiliary variable. Srivastava and Jhajj (1981) suggested a class of estimators of the population mean, provided that the mean and variance of the auxiliary variable are known. Singh and Tailor (2003) considered a modified ratio estimator that exploits the known value of the correlation coefficient between the study and auxiliary variables. Singh and Upadhyaya (1999) suggested two ratio-type estimators for the case where the coefficient of variation and the kurtosis of the auxiliary variable are known.<br>However, the fact that a known population proportion of an attribute provides similar information has not drawn as much attention. In several situations, instead of auxiliary variables there exist auxiliary attributes that are highly correlated with the study variable (Singh et al., 2008). Examples include the height of persons and their sex, the amount of milk produced and a particular breed of cow, and the yield of a wheat crop and a particular variety of wheat (Jhajj et al., 2006). In such situations, taking advantage of the point-biserial correlation between the study variable and the auxiliary attribute, estimators of the parameters of interest can be constructed using prior knowledge of the parameters of the auxiliary attribute.<br>It is often useful to incorporate auxiliary information about the population in a sampling procedure. In practice, auxiliary information can be obtained in different ways. 
For example, the sampling frames often used in producing official statistics may include auxiliary information on the population elements, or such data may be extracted from administrative registers and merged with the sampling frame elements. Aggregate-level auxiliary information can also be obtained from other sources, such as published official statistics. The use of auxiliary information in sampling and estimation can be very useful in constructing an efficient sampling design.<br>In the estimation of population parameters, auxiliary information is used to improve efficiency for the variable of interest. Whenever auxiliary information is available, the researcher wants to utilize it in the method of estimation to obtain the most efficient estimator.<br>In simple random sampling, the variance of the estimate (say, of the population mean \(\bar{Y}\)) depends, apart from the sample size, on the variability of the character \(y\) in the population. If the population is very heterogeneous and considerations of cost limit the size of the sample, it may be impossible to get a sufficiently precise estimate by taking a simple random sample from the entire population, and populations encountered in practice are generally very heterogeneous (Raj and Chandhok, 1998). In surveys of manufacturing establishments, for example, some establishments are very large, employing 1000 or more persons, while many others have only two or three persons on their rolls. Any estimate made from a direct random sample taken from the totality of such establishments would be subject to exceedingly large sampling fluctuations. But suppose it is possible to divide this population into parts, or strata, on the basis of, say, employment, thereby separating the very large ones, the medium-sized ones and the smaller ones. 
If a random sample of establishments is now taken from each stratum, it should be possible to make a better estimate of each stratum average, which in turn should help in producing a better estimate of the population average. Similarly, if a sample is selected with probability proportionate to \(x\) from the entire population, the variance of the estimate of the population total may be very high because the ratio of \(y\) to \(x\) varies considerably over the population. If a way can be found of subdividing the population so that the variation of the ratio of \(y\) to \(x\) is considerably reduced within the subdivisions or strata, a better estimate of the population total can be made. This is the basic consideration involved in the use of stratification for improving the precision of estimation (Raj and Chandhok, 1998).<br>1.2 CENSUS VERSUS SAMPLE SURVEY<br>Broadly speaking, information on a population may be collected in two ways. Either every unit in the population is enumerated (called complete enumeration, or census) or enumeration is limited to only a part or sample selected from the population (called sample enumeration, or sample survey). A sample survey will usually be less costly than a complete census because the expense of covering all units is greater than that of covering only a sample fraction. Also, it takes less time to collect and process data from a sample than from a census. But economy is not the only consideration; the most important point is whether the accuracy of the results would be adequate for the end in view. It is a curious fact that the results of a carefully planned and well-executed sample survey can be expected to be more accurate (closer to the quantities the study aims at) than those of a complete enumeration. A complete census ordinarily requires a huge and unwieldy organization, and therefore many types of errors creep in that cannot be controlled adequately. 
In a sample survey the volume of work is reduced considerably, and it becomes possible to employ persons of higher caliber, train them suitably, and supervise their work adequately. In a properly designed sample survey it is also possible to make a valid estimate of the margin of error and hence decide whether the results are sufficiently accurate. A complete census does not by itself reveal the margin of uncertainty to which it is subject. But there is not always a choice of one versus the other. For example, if data are required for every small administrative area in a country, no sample survey of reasonable size will be able to deliver the desired information; only a complete census can do this (Raj and Chandhok, 1998).<br>1.3 RANDOM SAMPLING<br>Simple random sampling is a method of selecting \(n\) units out of the \(N\) such that every one of the \(\binom{N}{n}\) distinct samples has an equal chance of being drawn. In practice a simple random sample is drawn unit by unit. The units in the population are numbered from 1 to \(N\). A series of random numbers between 1 and \(N\) is then drawn, either by means of a table of random numbers or by means of a computer program that produces such a table. At any draw the process used must give an equal chance of selection to any number in the population not already drawn. The units that bear these \(n\) numbers constitute the sample.<br>It is easily verified that all \(\binom{N}{n}\) distinct samples have an equal chance of being selected by this method. Consider one distinct sample, that is, one set of \(n\) specified units. At the first draw the probability that some one of the \(n\) specified units is selected is \(n/N\). At the second draw the probability that some one of the remaining \((n-1)\) specified units is drawn is \((n-1)/(N-1)\), and so on. Hence the probability that all \(n\) specified units are selected in \(n\) draws is<br>\[ \frac{n}{N}\cdot\frac{n-1}{N-1}\cdot\frac{n-2}{N-2}\cdots\frac{1}{N-n+1} = \frac{n!\,(N-n)!}{N!} = \frac{1}{\binom{N}{n}}. \]<br>Since a number that has been drawn is removed from the population for all subsequent draws, this method is also called random sampling without replacement. Random sampling with replacement is entirely feasible; at any draw, all \(N\) members of the population are given an equal chance of being drawn, no matter how often they have already been drawn. The formulas for the variances and estimated variances of estimates made from the sample are often simpler when sampling is with replacement than when it is without replacement. For this reason sampling with replacement is sometimes used in the more complex sampling plans (Cochran, 1977).<br>1.4 DEFINITION OF BASIC TERMS<br>Sample:- A sample is a group of units selected from a larger group (the population). By studying the sample, it is hoped to draw valid conclusions about the larger group. A sample is generally selected for study because the population is too large to study in its entirety. The sample should be representative of the general population; this is often best achieved by random sampling. Also, before collecting the sample, it is important that the researcher carefully and completely defines the population, including a description of the members to be included (Cochran, 1977).<br>Parameter:- A parameter is a value, usually unknown (and which therefore has to be estimated), used to represent a certain population characteristic. Within a population, a parameter is a fixed value which does not vary. Parameters are often denoted by Greek letters (Cochran, 1977).<br>Statistic:- A statistic is a quantity that is calculated from sample data. It is used to give information about unknown values in the corresponding population. It is possible to draw more than one sample from the same population, and the value of a statistic will in general vary from sample to sample. 
Therefore, a statistic is a random variable (Cochran, 1977).<br>Estimator:- An estimator is a rule for calculating an estimate of a given quantity based on observed data. There are point and interval estimators. A point estimator yields a single-valued result, although this includes the possibility of a single vector-valued result or a result that can be expressed as a single function. This is in contrast to an interval estimator, whose result is a range of plausible values (or vectors or functions). An estimator is a statistic (that is, a function of the data) that is used to infer the value of an unknown parameter in a statistical model. The parameter being estimated is sometimes called the estimand. It can be either finite-dimensional (in parametric and semiparametric models) or infinite-dimensional (in nonparametric and semi-nonparametric models). If the parameter is denoted by \(\theta\), then the estimator is typically written as \(\hat{\theta}\). Being a function of the data, the estimator is a random variable (Cochran, 1977).<br>Bias:- The bias of an estimator is the difference between the estimator's expected value and the true value of the parameter being estimated. An estimator with zero bias is called unbiased; otherwise the estimator is said to be biased. Suppose we have a statistical model parameterized by \(\theta\), giving rise to a probability distribution \(p(x \mid \theta)\) for the observed data, and a statistic \(\hat{\theta}\) which serves as an estimator of \(\theta\) based on any observed data \(x\). 
That is, we assume that our data follow some unknown distribution \(p(x \mid \theta)\) (where \(\theta\) is a fixed constant that is part of this distribution, but is unknown), and then we construct some estimator \(\hat{\theta}\) that maps observed data to values that we hope are close to \(\theta\). Then the bias of this estimator is defined to be<br>\[ \operatorname{Bias}(\hat{\theta}) = E(\hat{\theta}) - \theta \]<br>(Cochran, 1977).<br>Mean Square Error:- The MSE of an estimator is one of many ways to quantify the difference between the values implied by an estimator and the true value of the quantity being estimated. MSE is a risk function, corresponding to the expected value of the squared error loss, or quadratic loss; it measures the average of the squares of the errors. The error is the amount by which the value implied by the estimator differs from the quantity to be estimated. The difference occurs because of randomness or because the estimator does not account for information that could produce a more accurate estimate. The MSE is the second moment (about the origin) of the error, and thus incorporates both the variance of the estimator and its bias. For an unbiased estimator, the MSE is the variance. The MSE of an estimator \(\hat{\theta}\) with respect to the estimand \(\theta\) is defined mathematically as<br>\[ \operatorname{MSE}(\hat{\theta}) = E\big[(\hat{\theta}-\theta)^2\big] = \operatorname{Var}(\hat{\theta}) + \big[\operatorname{Bias}(\hat{\theta})\big]^2. \]<br>The MSE thus assesses the quality of an estimator in terms of its variation and unbiasedness (Cochran, 1977).<br>Kurtosis:- Kurtosis is any measure of the peakedness of the probability distribution of a real-valued random variable; it is a descriptor of the shape of a probability distribution. One common measure of kurtosis, originated by Pearson, is based on a scaled version of the fourth moment of the data or population. For this measure, higher kurtosis means that more of the variance is the result of infrequent extreme deviations, as opposed to frequent modestly sized deviations. 
Distributions with negative or positive excess kurtosis \((\beta_2 - 3)\) are called platykurtic or leptokurtic, respectively. The fourth standardized moment is defined as<br>\[ \beta_2 = \frac{\mu_4}{\sigma^4}, \]<br>where \(\mu_4\) is the fourth moment about the mean and \(\sigma\) is the standard deviation (Cochran, 1977).<br>Point-biserial correlation coefficient:- The point-biserial correlation coefficient, denoted by \(\rho_{pb}\), is a correlation used when one variable (e.g. \(Y\)) is dichotomous; \(Y\) can either be naturally dichotomous, like gender, or an artificially dichotomized variable. Point-biserial correlation is mathematically equivalent to the Pearson product-moment correlation between a continuously measured variable \(X\) and a dichotomous variable \(Y\). This can be shown by assigning two distinct numerical values (say, 1 and 2) to the dichotomous variable. The point-biserial correlation coefficient is given as<br>\[ \rho_{pb} = \frac{M_1 - M_2}{s_n}\sqrt{\frac{n_1 n_2}{n^2}}, \]<br>where \(s_n = \sqrt{\frac{1}{n}\sum_{i=1}^{n}(x_i-\bar{x})^2}\) is the standard deviation of \(X\), groups 1 and 2 are the two categories of \(Y\), \(M_1\) and \(M_2\) are the mean values of the continuous variable for all data points in group 1 and group 2 respectively, \(n_1\) and \(n_2\) are the numbers of data points in group 1 and group 2, and \(n\) is the sample size (John, 2008).<br>Coefficient of Variation (CV):- The CV is a normalized measure of dispersion of a probability distribution. It is also known as unitized risk or the variation coefficient. The absolute value of the CV is sometimes known as the relative standard deviation (RSD), which is expressed as a percentage. The CV is defined as the ratio of the standard deviation to the mean,<br>\[ CV = \frac{\sigma}{\mu}, \]<br>which is the inverse of the signal-to-noise ratio. 
It shows the extent of variability in relation to the mean of the population (Cochran, 1977).<br>1.5 AIM AND OBJECTIVES<br>The aim of this research work is to develop some ratio-type estimators under the stratified random sampling scheme, using auxiliary attributes, that produce more precise estimates than the conventional estimator.<br>The above aim is achieved through the following objectives:<br>1. To linearly transform the sample mean of the variable of interest.<br>2. To transform the proportion of the auxiliary attribute using auxiliary parameters such as the kurtosis, the coefficient of variation and the point-biserial correlation coefficient.<br>3. To obtain the biases and mean square errors of the proposed estimators up to the first order of approximation using Taylor series expansion.<br>4. To obtain the conditions under which the proposed estimators are more efficient than the conventional estimator.<br>1.6 SIGNIFICANCE OF THE STUDY<br>Ratio estimators of population parameters are more precise than their simple random sampling counterparts when the study and auxiliary variables are highly positively correlated (Cochran, 1942). The mean square error of a ratio estimator can be reduced by applying transformations to the study and auxiliary variables (Chaudhuri and Adrikari, 1979). Situations arise in which the available auxiliary information is in the form of attributes instead of variables. For such situations, several researchers have proposed ratio-type estimators in simple random sampling, which regards the population units as homogeneous. However, population units may be heterogeneous as a whole but homogeneous within sub-populations (strata). 
In such situations, there is a need to develop estimators of the population parameters of interest that capture the variability within and between strata, with emphasis on bias reduction and efficiency improvement.<br>1.7 SCOPE AND LIMITATION<br>This research work primarily considers some ratio-type estimators in stratified sampling using an attribute as auxiliary information. The transformation of the study variable mean is linear, and the kurtosis, coefficient of variation and point-biserial correlation coefficient are the parameters of the auxiliary attribute used to transform the proportion of the auxiliary attribute. The data used for the empirical study were taken from the Students' Pre-medical Registration, Usmanu Danfodiyo University, Sokoto (2011/2012 Session). The results of the analysis are limited to the data used, the set of proposed estimators and the sample sizes taken from those data. In future research, efforts will be made toward modifying the proposed estimators to obtain unbiased or almost unbiased estimators with higher precision.</p><p> </p>
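Section 1.7 names the kurtosis, coefficient of variation and point-biserial correlation of the auxiliary attribute as the quantities used to transform its proportion. The sketch below computes them for a 0/1-coded attribute \(\phi\), using the definitions of Section 1.4 (\(\beta_2 = \mu_4/\sigma^4\) and the \(\rho_{pb}\) formula with divisor \(n\)). It is an illustration only, not the thesis's source code: the function name `attribute_parameters` is hypothetical, the divisor \(N-1\) used for \(S_\phi\) in \(C_p = S_\phi/P\) is an assumed convention, and the numbers in the usage example are made up, not the Usmanu Danfodiyo University data.

```python
import numpy as np


def attribute_parameters(y, phi):
    """Hypothetical helper: parameters of a binary auxiliary attribute phi.

    y:   values of the study variable (e.g. height)
    phi: attribute indicator, 1 if the unit possesses the attribute, else 0
    """
    y = np.asarray(y, dtype=float)
    phi = np.asarray(phi, dtype=float)
    if not np.isin(phi, (0.0, 1.0)).all():
        raise ValueError("phi must be coded 0/1")
    n = phi.size

    P = phi.mean()                     # proportion of units possessing the attribute
    S_phi = phi.std(ddof=1)            # assumed convention: divisor N - 1
    C_p = S_phi / P                    # coefficient of variation of phi

    dev = phi - P
    beta2 = np.mean(dev**4) / np.mean(dev**2) ** 2   # Pearson's beta_2 = mu_4 / sigma^4

    n1 = phi.sum()                     # group 1: attribute present
    n2 = n - n1                        # group 2: attribute absent
    M1 = y[phi == 1].mean()            # mean of y in group 1
    M2 = y[phi == 0].mean()            # mean of y in group 2
    s_n = y.std(ddof=0)                # s_n with divisor n, as in Section 1.4
    rho_pb = (M1 - M2) / s_n * np.sqrt(n1 * n2 / n**2)

    return {"P": P, "C_p": C_p, "beta2": beta2, "rho_pb": rho_pb}


# Made-up illustration: heights (cm) and sex (1 = male) of eight persons.
heights = [150, 155, 160, 165, 170, 175, 180, 185]
male = [0, 0, 0, 1, 0, 1, 1, 1]
params = attribute_parameters(heights, male)
print(params)
print(np.corrcoef(heights, male)[0, 1])   # same value as params["rho_pb"]
```

The last line checks the statement in Section 1.4 that the point-biserial coefficient coincides with Pearson's correlation once the dichotomy is coded numerically. For a 0/1 attribute, \(\beta_2 = (1 - 3PQ)/(PQ)\) with \(Q = 1 - P\), so with \(P = 0.5\), as here, it equals 1.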
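The definitions of bias and MSE in Section 1.4 can be made concrete with a small Monte Carlo experiment on the conventional estimator that the abstract uses as a benchmark. The sketch assumes the usual form of the combined ratio estimator with an auxiliary attribute, \(\hat{\bar{Y}}_{RC} = \bar{y}_{st}\,P/p_{st}\), with \(\bar{y}_{st} = \sum_h W_h \bar{y}_h\), \(p_{st} = \sum_h W_h p_h\) and \(W_h = N_h/N\). It repeatedly draws stratified simple random samples without replacement, approximates \(\operatorname{Bias}(\hat{\theta}) = E(\hat{\theta}) - \theta\) and \(\operatorname{MSE}(\hat{\theta}) = E[(\hat{\theta}-\theta)^2]\) by averaging over replicates, and reports the standard first-order approximation \(\sum_h W_h^2 (1/n_h - 1/N_h)(S_{yh}^2 + R^2 S_{\phi h}^2 - 2R S_{y\phi h})\) with \(R = \bar{Y}/P\). The function names and the synthetic population are hypothetical, and the proposed estimators of Chapter Three are not implemented here.

```python
import numpy as np


def combined_ratio_estimate(sample_y, sample_phi, N_h, P):
    """Combined ratio estimate ybar_st * P / p_st from per-stratum samples."""
    W = np.asarray(N_h, dtype=float) / np.sum(N_h)          # stratum weights W_h
    ybar_st = sum(w * np.mean(y) for w, y in zip(W, sample_y))
    p_st = sum(w * np.mean(f) for w, f in zip(W, sample_phi))
    return ybar_st * P / p_st                               # undefined if p_st == 0


def simulate_bias_mse(pop_y, pop_phi, n_h, reps=5000, seed=0):
    """Monte Carlo (bias, MSE) of the combined ratio estimator and of ybar_st
    under stratified SRSWOR with stratum sample sizes n_h."""
    pop_y = [np.asarray(y, dtype=float) for y in pop_y]
    pop_phi = [np.asarray(f, dtype=float) for f in pop_phi]
    rng = np.random.default_rng(seed)
    N_h = [y.size for y in pop_y]
    N = sum(N_h)
    Y_bar = sum(y.sum() for y in pop_y) / N                 # true population mean
    P = sum(f.sum() for f in pop_phi) / N                   # true population proportion
    W = np.array(N_h) / N
    est = {"ratio": np.empty(reps), "ybar_st": np.empty(reps)}
    for r in range(reps):
        sy, sf = [], []
        for y, f, n in zip(pop_y, pop_phi, n_h):
            idx = rng.choice(y.size, size=n, replace=False)  # SRSWOR within stratum h
            sy.append(y[idx])
            sf.append(f[idx])
        est["ratio"][r] = combined_ratio_estimate(sy, sf, N_h, P)
        est["ybar_st"][r] = sum(w * s.mean() for w, s in zip(W, sy))
    # Bias = E(T) - Ybar and MSE = E[(T - Ybar)^2], approximated by replicate averages
    return {k: (v.mean() - Y_bar, np.mean((v - Y_bar) ** 2)) for k, v in est.items()}


def first_order_mse(pop_y, pop_phi, n_h):
    """First-order MSE: sum_h W_h^2 (1/n_h - 1/N_h) S_h^2(y - R*phi), R = Ybar / P."""
    pop_y = [np.asarray(y, dtype=float) for y in pop_y]
    pop_phi = [np.asarray(f, dtype=float) for f in pop_phi]
    N_h = np.array([y.size for y in pop_y], dtype=float)
    W = N_h / N_h.sum()
    R = sum(y.sum() for y in pop_y) / sum(f.sum() for f in pop_phi)
    total = 0.0
    for w, y, f, n, Nh in zip(W, pop_y, pop_phi, n_h, N_h):
        # S^2 of (y - R*phi) equals S_yh^2 + R^2 S_phih^2 - 2 R S_yphih
        total += w**2 * (1.0 / n - 1.0 / Nh) * np.var(y - R * f, ddof=1)
    return total


# Synthetic population with made-up parameters: three strata, phi = 1 for males.
rng = np.random.default_rng(1)
pop_phi = [rng.binomial(1, p, size=N) for p, N in ((0.3, 200), (0.5, 300), (0.7, 250))]
pop_y = [150 + 15 * f + rng.normal(0, 5, size=f.size) for f in pop_phi]
n_h = [20, 30, 25]
print(simulate_bias_mse(pop_y, pop_phi, n_h, reps=2000))
print(first_order_mse(pop_y, pop_phi, n_h))
```

Each printed pair is (bias, MSE); the simulated MSE of the ratio estimator should be broadly in line with the first-order value. Because the made-up heights are far from proportional to \(\phi\) (\(R = \bar{Y}/P\) is large), the combined ratio estimator is likely to be less efficient than \(\bar{y}_{st}\) in this example; that is a property of the illustrative population, not a result of the thesis.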