Evaluating Likelihood Estimation Methods in Multilevel Analysis of Clustered Survey Data
Project Abstract
<p>
</p><div>Abstract.</div><div>Public health researchers often place little or no emphasis on the multilevel structure of clustered data or on the likelihood estimation techniques suited to it. This has led to improper inferences. The aim of this research is to evaluate traditional methods and the different multilevel likelihood estimation procedures so as to compare their computational efficiencies.</div><div>Key words: Clustered survey; Likelihood; Adaptive Gaussian Quadrature; Penalized quasi-likelihood; Modern contraception; Akaike’s information criterion.</div>
<br><p></p>
Project Overview
<p>
</p><div>1. Introduction</div><div>Likelihood plays an important role in parameter estimation and is closely related to probability. It is defined as a function of the parameters included in a statistical model. That is, the likelihood of a set of parameter values θ given the observed outcome y equals the probability of that outcome given those parameter values, l(θ | y) = p(y | θ). Likelihood is one of the tools used in estimating the parameters of multilevel models, including multilevel binary logistic models.</div>
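<div>The relation between likelihood and probability described above can be illustrated with a minimal Python sketch. It evaluates a binomial likelihood over a grid of parameter values and picks the value that maximizes it. The data (7 successes in 10 trials), the grid size and the function names are assumptions made purely for illustration; they are not taken from this study, and the sketch covers a single-level model only.</div>

```python
import math


def binomial_likelihood(theta, successes, trials):
    """Likelihood of theta given the data, i.e. p(data | theta) for a binomial outcome."""
    return (
        math.comb(trials, successes)
        * theta ** successes
        * (1 - theta) ** (trials - successes)
    )


def grid_mle(successes, trials, grid_size=1001):
    """Return the grid value of theta in [0, 1] with the highest likelihood."""
    grid = [i / (grid_size - 1) for i in range(grid_size)]
    return max(grid, key=lambda t: binomial_likelihood(t, successes, trials))


# Hypothetical data (an assumption): 7 of 10 respondents report the outcome.
theta_hat = grid_mle(successes=7, trials=10)
print(theta_hat)
```

<div>For a binomial outcome the maximizer coincides with the sample proportion, so the grid search here is only a way of making the definition concrete. Multilevel logistic models have no such closed form, which is why approximate likelihood methods are needed.</div>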
<div>A multilevel model is a statistical model whose parameters vary at more than one level (Leyland & Goldstein (2001); Sampson et al. (1997)). Such models can be seen as generalizations of the linear model, although they also extend to nonlinear models. Multilevel models are ideal for research designs in which the data are collected from study participants who are organized at two or more levels (Maas & Hox (2005); Srikanthan & Reid (2008)), with one level nested in the other. Usually, the units of analysis are individuals (at the lower level) who are nested within an aggregate unit (at the higher level) (Klotz et al. (1969); Li et al. (2011)). A multilevel (hierarchical) data structure causes correlation among observations within the same cluster (Li et al. (2011)). Multilevel models present alternative analysis procedures to the well-known univariate and multivariate analyses of measures that are collected repeatedly from the same individuals. Over the years, the use of multilevel analysis to investigate public health problems has gained significant prominence (DiezRouz & Mair (2011); Leyland & Goldstein (2001)). This growth can be attributed to the need to understand how individuals are related to each other within groups and the importance of these relationships in understanding the distribution of health outcomes (DiezRouz & Mair (2011); Oye-Adeniran et al. (2004)). The growth has also been aided by the increased use of multilevel methods in statistical practice and their applicability to a broad range of scenarios that involve multilevel data. 
However, its use has not been fully embraced in most public health research (Bingenheimer & Raudenbush (2004)). Both the percentage of total variance in the individual-level health outcome and the cluster effects, which represent unobserved cluster characteristics that can potentially affect individual outcomes, could be large (Li et al. (2011)). This must be viewed in light of the fact that the relevant "levels" are generally grossly mis-specified. So far, existing methods of parameter estimation have raised several problems regarding how best to carry out multilevel analysis, including underestimation of parameters and biased estimates (John et al. (2012)). In this study, different methods of estimating multilevel binary logistic model parameters were considered and the best method was determined.</div><div>Cluster sampling, whereby samples are drawn not at random from the entire population but from clusters, often introduces multilevel dependency and correlation among measurements taken from individuals within the same cluster, which could substantially affect parameter estimates. Clustered survey data are usually nested in structure and can be analysed using multilevel techniques. Challenges are often encountered when multistage sampling is used in data collection without the use of multilevel analysis. Most of "the theoretical and methodological challenges facing contextual analysis" have been described by Blalock (1984). The dependence among observations in multistage-clustered samples often comes from several levels of the hierarchy (Maas & Hox (2005)). In this case, the use of single-level statistical models is no longer valid or reasonable (Leyland & Goldstein (2001); Li et al. (2011)). 
The traditional standard logistic regression, that is, single-level logistic regression, usually requires independence among the observations conditional on the independent variables, together with uncorrelated residual errors. To ensure that appropriate inferences and reliable conclusions are drawn from clustered survey data, it has therefore become necessary to use more effective and more involved modeling techniques such as multilevel modeling. Also, the underlying assumptions of ordinary logistic regression are violated when analyzing nested data; hence the best option is multilevel logistic regression analysis (Maas & Hox (2005); Srikanthan & Reid (2008)). This is because it accounts for the variation due to the multilevel structure of the data and allows the simultaneous assessment of effects at different levels in the data used in this study. The number of levels, the variance of the random effects and the size of the correlation between random effects may affect the performance of the parameter estimation method. Some methods of estimation could be biased. Therefore, there is a need to evaluate these methods and determine the best one. The most common methods are Penalized Quasi-Likelihood (PQL), Non-Adaptive Gaussian Quadrature (NAGQ), Adaptive Gaussian Quadrature (AGQ) and Maximum Likelihood Estimation (MLE). Early methodological work on the multilevel logit model includes the use of data from 15 World Fertility Surveys (Goldstein (2003); Hox, J. J. (2002)). 
Further documentation on multilevel models, especially on the types of data they allow, sampling, outliers, repeated measures, institutional performance, and spatial analysis, has been provided (Leyland & Goldstein (2001)).</div><div>The robustness, sample sizes and statistical power of multilevel modeling for both categorical and continuous outcome variables have been studied earlier (Bingenheimer & Raudenbush (2004); Goldstein (2003); Li et al. (2011); Maas & Hox (2005); Portnoy (1971)). Monte Carlo simulation was used by Austin (2005) to "assess the impact of misspecification of the distribution of random effects on estimation of and inference about both the fixed effects and the random effects in multilevel logistic regression models". That study concluded that inferences about the fixed-effects estimates were not affected by misspecification of the random-effects distribution, whereas inferences about the random-effects estimates were influenced by it. Simulation studies indicated that increasing the number of higher-level units (groups) yields better estimates than increasing the number of individuals per unit (Goldstein (2003); Goldstein & Rasbash (1996); Mason et al. (1983)). It was concluded in these studies that when the sample of second-level units is small (<30), the estimates of the regression coefficients are unbiased, but the standard errors and the variance components are sometimes underestimated (Maas & Hox (2004)). This is not expected to be a concern in the current study since we are using a large dataset.</div><div>The use of these statistical methods allows public health researchers to correctly identify factors and causes of disease at different levels. 
The approach provides an opportunity and serves as a tool to investigate disease causation in complex settings.</div><div>Contraceptive Use in Nigeria</div><div>In 1988, the Nigeria Federal Ministry of Health adopted the "National Policy on Population for Development, Unity, Progress and Self-Reliance" (Essien et al. (2010)). It subsequently adopted a revised policy in 2004.</div>
<br><p></p>