Assessing the performance of penalized regression methods and the classical least squares method

 

Table Of Contents


Chapter ONE

INTRODUCTION

  • 1.1 Introduction
  • 1.2 Background of Study
  • 1.3 Problem Statement
  • 1.4 Objective of Study
  • 1.5 Limitation of Study
  • 1.6 Scope of Study
  • 1.7 Significance of Study
  • 1.8 Structure of the Research
  • 1.9 Definition of Terms

Chapter TWO

LITERATURE REVIEW

  • 2.1 Overview of Penalized Regression Methods
  • 2.2 Classical Least Squares Method
  • 2.3 Comparison of Penalized Regression and Least Squares
  • 2.4 Applications of Penalized Regression Methods
  • 2.5 Advantages of Penalized Regression Methods
  • 2.6 Disadvantages of Penalized Regression Methods
  • 2.7 Recent Developments in Penalized Regression
  • 2.8 Criticisms of Classical Least Squares Method
  • 2.9 Case Studies on Penalized Regression Methods
  • 2.10 Future Trends in Penalized Regression Research

Chapter THREE

SYSTEM DESIGN AND IMPLEMENTATION

  • 3.1 Research Methodology Overview
  • 3.2 Research Design and Approach
  • 3.3 Data Collection Methods
  • 3.4 Sampling Techniques
  • 3.5 Variables and Measures
  • 3.6 Data Analysis Techniques
  • 3.7 Ethical Considerations
  • 3.8 Validity and Reliability

Chapter FOUR

SYSTEM TESTING AND EVALUATION

  • 4.1 Data Analysis and Interpretation
  • 4.2 Comparison of Penalized Regression Results
  • 4.3 Evaluation of Model Performance
  • 4.4 Discussion on Variable Selection
  • 4.5 Impact of Regularization Parameters
  • 4.6 Visualization of Results
  • 4.7 Addressing Assumptions of Penalized Regression
  • 4.8 Practical Implications of Findings

Chapter FIVE

SUMMARY, CONCLUSION AND RECOMMENDATIONS

  • 5.1 Conclusion and Summary
  • 5.2 Summary of Findings
  • 5.3 Contributions to Knowledge
  • 5.4 Recommendations for Future Research
  • 5.5 Conclusion and Final Remarks

Project Abstract

<p>Regression is one of the most useful statistical methods for data analysis. Multicollinearity poses a challenge to regression analysis: it inflates the standard errors of the estimators, which makes the model less predictive and harder to interpret. Penalized regression methods, which also perform variable selection, were developed specifically to address multicollinearity and to improve on the prediction accuracy of ordinary least squares (OLS) regression. This thesis presents a numerical study of three penalized methods: the least absolute shrinkage and selection operator (LASSO), the elastic net and the newly introduced correlation adjusted elastic net (CAEN). A diabetes dataset that was shown to exhibit multicollinearity was obtained from previous literature to compare these techniques. Ten-fold cross-validation (CV) within the glmnet package was used to search for the optimal λ. The whole path of solutions (in λ) for the LASSO, elastic net and CAEN models was computed using the pathwise cyclic coordinate descent (CCD) algorithm in the glmnet package in R, a computationally efficient technique for solving these convex optimization problems. Regularization profile plots of the coefficient paths for the three methods are also shown. Predictive accuracy was assessed using the mean squared error (MSE); the penalized regression models were feasible and efficient, and they captured the linear structure in the data better than the ordinary least squares model. The correlation adjusted elastic net was observed to produce a less complex model with the lowest MSE.</p>
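The abstract reports that the coefficient paths were computed with the cyclic coordinate descent routine in the R package glmnet. The block below is not that code. It is a minimal pure-Python sketch of the same idea, assuming the predictors and the response are already centered (so no intercept is fitted) and using the elastic net objective in its usual glmnet form, where alpha = 1 gives the LASSO. The function names `soft_threshold`, `elastic_net_cd` and `mse`, and the toy data, are hypothetical and chosen only for illustration. CAEN is not implemented here.

```python
def soft_threshold(z, gamma):
    """Soft-thresholding operator: sign(z) * max(|z| - gamma, 0)."""
    if z > gamma:
        return z - gamma
    if z < -gamma:
        return z + gamma
    return 0.0


def elastic_net_cd(X, y, lam, alpha=1.0, max_iter=1000, tol=1e-10):
    """Cyclic coordinate descent for the objective
        (1/(2n)) * ||y - X b||^2 + lam * (alpha * ||b||_1 + (1 - alpha)/2 * ||b||_2^2).
    X is a list of rows; X and y are assumed centered (assumption of this sketch).
    """
    n, p = len(X), len(X[0])
    beta = [0.0] * p
    residual = list(y)  # equals y - X beta while beta is all zeros
    # Mean square of each column, (1/n) * sum_i x_ij^2.
    col_ms = [sum(row[j] ** 2 for row in X) / n for j in range(p)]
    for _ in range(max_iter):
        max_change = 0.0
        for j in range(p):
            if col_ms[j] == 0.0:
                continue  # an all-zero column cannot explain anything
            # Average product of column j with the partial residual
            # (the residual with predictor j's current contribution added back).
            rho = sum(X[i][j] * residual[i] for i in range(n)) / n + col_ms[j] * beta[j]
            new_bj = soft_threshold(rho, lam * alpha) / (col_ms[j] + lam * (1.0 - alpha))
            delta = new_bj - beta[j]
            if delta != 0.0:
                # Keep the full residual in sync with the updated coefficient.
                for i in range(n):
                    residual[i] -= X[i][j] * delta
                beta[j] = new_bj
            max_change = max(max_change, abs(delta))
        if max_change < tol:
            break
    return beta


def mse(X, y, beta):
    """Mean squared error of the linear predictor X beta."""
    n = len(X)
    fitted = [sum(x * b for x, b in zip(row, beta)) for row in X]
    return sum((yi - fi) ** 2 for yi, fi in zip(y, fitted)) / n


# Toy data (made up): two centered, mutually orthogonal predictors.
X_toy = [[1.0, 1.0], [1.0, -1.0], [-1.0, 1.0], [-1.0, -1.0]]
y_toy = [2.1, 1.9, -1.9, -2.1]  # equals 2 * x1 + 0.1 * x2

ols_like = elastic_net_cd(X_toy, y_toy, lam=0.0)          # no penalty
lasso = elastic_net_cd(X_toy, y_toy, lam=0.5, alpha=1.0)  # LASSO penalty
print("unpenalized:", ols_like, "MSE:", mse(X_toy, y_toy, ols_like))
print("LASSO:      ", lasso, "MSE:", mse(X_toy, y_toy, lasso))
```

On this toy design the LASSO fit sets the coefficient of the weak second predictor exactly to zero, which is the variable selection behaviour the abstract refers to. In practice λ would be chosen by cross-validation over a grid rather than fixed by hand.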

Project Overview

<p>GENERAL INTRODUCTION<br>1.1 Background of the study<br>In multiple linear regression analysis, variable selection becomes an important issue when a large number of predictor variables are introduced into a model to reduce possible modeling biases, or when there is serious concern about multicollinearity among the predictor variables. Regression is one of the most useful statistical methods for data analysis. However, many practical problems and computational issues, such as multicollinearity and high dimensionality, pose a challenge to regression analysis. To deal with these challenges, variable selection and shrinkage estimation are becoming important and popular. Traditional automatic selection procedures (such as forward selection, backward elimination and stepwise selection) and best subset selection are often computationally expensive and may not necessarily produce the best model. The method of penalized least squares (PLS), which is equivalent to penalized maximum likelihood under normally distributed errors, helps to deal with multicollinearity by putting constraints on the values of the estimated parameters. A useful consequence is that the entries of the variance-covariance matrix are reduced significantly.<br>Suppose multicollinearity is detected and the predictor variables that cause it are identified. As discussed by Ryan (2009), multicollinearity may not be a problem if the goal is to use the linear regression model for prediction. However, it is a problem if the model is used for description or control. Multicollinearity implies that the predictor variables form groups, and within each group the predictor variables are highly correlated. One solution is to remove one or more of the predictor variables within the same group, but deciding which ones to eliminate tends to be a difficult technical task. 
A major consequence of multicollinearity is that the parameter estimates and their variances tend to be large, so inference on the response is highly variable.<br>To deal with the challenges mentioned above, penalized regression approaches, also called shrinkage or regularization methods, have been developed. Although shrinking some of the regression coefficients toward zero may result in biased estimates, these estimates will have smaller variance. This can improve prediction accuracy because the mean squared error is smaller (Hastie et al., 2009). Regression coefficients are shrunk by imposing a penalty on their size, which is done by adding a penalty function to the least squares criterion. Moreover, some of these procedures, such as the least absolute shrinkage and selection operator (LASSO), perform variable selection, so that only the important predictor variables stay in the model (Szymczak et al., 2009).<br>1.2 Statement of the Problem<br>When perfect multicollinearity exists in a model, the parameter estimates of the multiple linear regression model are not unique; under near-perfect multicollinearity they are unique but highly unstable. In practice, perfect collinearity rarely occurs; what we usually encounter is near-perfect collinearity, which arises when there are strong linear relationships among two or more predictor variables. This happens when two or more predictor variables measure more or less the same characteristic of the subjects. In recent years, alternative methods have been introduced to deal with multicollinearity. In particular, penalization methods, also known as simultaneous shrinkage and variable selection methods, have become popular and useful. 
The purpose of this study is to assess the statistical performance of the LASSO, elastic net and the newly introduced correlation adjusted elastic net (CAEN) regression methods.<br>1.3 Research Motivation<br>The motivation for using penalized regression is that, in the presence of perfect multicollinearity, the ordinary least squares estimates are not unique, and under near-perfect multicollinearity they are numerically unstable. With penalized least squares, the estimates can become unique when appropriate tuning parameters are chosen. Similarly, without penalization, the ordinary least squares estimators are subject to high variability when multicollinearity exists. With penalization, the variances of the estimators are controlled. Most comparisons by other researchers were between the LASSO and the elastic net. This research compares the LASSO, the elastic net and the newly introduced correlation adjusted elastic net, and also assesses the advantages of these methods over the classical least squares technique. It attempts to highlight some of these differences using numerical results.<br>1.4 Aim and objectives of the study<br>The main aim of this research is to assess the performance and advantages of the LASSO, elastic net and CAEN methods over the classical regression methods. We hope to achieve this aim through the following objectives:<br>i. Applying penalized regression methods to eliminate multicollinearity,<br>ii. Identifying the variables that exhibit multicollinearity using the Variance Inflation Factor, and<br>iii. Identifying the number of variables selected by each of the penalized regression methods and by the classical least squares method.<br>1.5 Significance of the study<br>The significance of this study lies in detecting variables that exhibit multicollinearity in a regression model. 
It also shows why penalized methods are preferred over the classical least squares technique when multicollinearity is present. To achieve this, we explored and compared three penalized methods used to eliminate multicollinearity. This work also aims to help researchers decide which technique to use when they encounter multicollinearity.<br>1.6 Scope and limitations of the study<br>This research is limited to the use of the leave-one-out cross-validation (LOOCV) criterion to determine the number of variables selected by each method under study, and to the mean squared error as the measure of predictive accuracy. The research also gives an overview of each procedure in order to highlight the similarities and differences among the three penalized methods with respect to variable selection.<br>1.7 Multicollinearity<br>Multicollinearity is another important issue in multiple regression. Collinearity means that a linear relationship exists between two or more predictor variables, while multicollinearity refers to a situation in which two or more predictor variables are highly linearly correlated. The most extreme case is perfect collinearity (or multicollinearity), where the linear correlation between two predictor variables is either -1 or 1. This happens, for example, when two predictor variables X<sub>1</sub> and X<sub>2</sub> satisfy X<sub>1</sub> = a + bX<sub>2</sub> for two real numbers a and b, with b nonzero.<br>In the presence of perfect multicollinearity, parameter estimates of the population multiple linear regression model are not unique. In practice, perfect collinearity rarely occurs. However, we quite often face multicollinearity when there are strong linear relationships among two or more predictor variables. This happens when two or more predictor variables measure more or less the same characteristic of the subjects. 
For a matrix A, let A<sup>T</sup> be its transpose and A<sup>-1</sup> be its inverse, if it exists. When predictor variables are highly linearly correlated, the most significant consequence is that the entries of (X<sup>T</sup>X)<sup>-1</sup>, where X denotes the design matrix, tend to be large, because the predictor variables contribute overlapping and redundant information. Other consequences of multicollinearity are that some predictor variables may not be statistically significant even though the model is significant overall, and that the usual interpretation of coefficient estimates fails. Furthermore, the parameter estimators are highly variable, because the estimated variance-covariance matrix has large diagonal entries. Several methods for detecting multicollinearity exist. These include checking for a significant change in a parameter estimate when its corresponding predictor variable is added to or removed from the model, checking for insignificance of individual estimators while the model is significant overall, calculating the Variance Inflation Factor (VIF) and carrying out formal multicollinearity tests. There are several remedies for multicollinearity. One is to select a collection of predictor variables that are minimally correlated with each other. This avoids overfitting the regression model and can normally be done with statistical software. However, information from the other predictor variables is often lost. Furthermore, there is no clear way of selecting the collection of predictor variables that forms the best subset.<br>Since omitting predictor variables may result in loss of information, another method is to include interaction terms in the model to account for high linear correlation among the predictor variables. There are several problems with this approach. One is that the form of the interaction is not unique and must be carefully determined. 
Another problem is that the model becomes much more complex, with many terms that reduce the degrees of freedom available for inference on the response and hence reduce the power for predicting and estimating the response. In recent years, alternative methods have been introduced to deal with multicollinearity. In particular, penalization methods, also known as simultaneous shrinkage and variable selection methods, have become popular and useful.</p>
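Section 1.7 names the Variance Inflation Factor (VIF) as one way to detect multicollinearity. The block below is a minimal pure-Python sketch of that computation on made-up data; it is not the code used in the thesis. It relies on the standard identity that the VIF of predictor j equals the j-th diagonal entry of the inverse of the predictors' correlation matrix, which is the same quantity as 1 / (1 - R_j^2) from regressing predictor j on the others. The function names `correlation_matrix`, `invert` and `vif`, and the toy matrix, are assumptions for illustration.

```python
from math import sqrt


def correlation_matrix(X):
    """Pearson correlation matrix of the columns of X (X is a list of rows)."""
    n, p = len(X), len(X[0])
    means = [sum(row[j] for row in X) / n for j in range(p)]
    centered = [[row[j] - means[j] for j in range(p)] for row in X]
    norms = [sqrt(sum(r[j] ** 2 for r in centered)) for j in range(p)]
    return [
        [sum(r[j] * r[k] for r in centered) / (norms[j] * norms[k]) for k in range(p)]
        for j in range(p)
    ]


def invert(A):
    """Matrix inverse by Gauss-Jordan elimination with partial pivoting."""
    p = len(A)
    # Augment A with the identity matrix: [A | I].
    aug = [list(A[i]) + [1.0 if i == j else 0.0 for j in range(p)] for i in range(p)]
    for col in range(p):
        pivot = max(range(col, p), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("singular matrix: perfect multicollinearity")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        pv = aug[col][col]
        aug[col] = [v / pv for v in aug[col]]
        for r in range(p):
            if r != col:
                f = aug[r][col]
                aug[r] = [a - f * b for a, b in zip(aug[r], aug[col])]
    return [row[p:] for row in aug]


def vif(X):
    """VIF of each predictor: diagonal of the inverse correlation matrix."""
    inv = invert(correlation_matrix(X))
    return [inv[j][j] for j in range(len(inv))]


# Toy data (made up): the second column is roughly twice the first,
# while the third column is unrelated to either.
X_toy = [
    [1.0, 2.1, 5.0],
    [2.0, 3.9, 3.0],
    [3.0, 6.2, 8.0],
    [4.0, 7.8, 1.0],
    [5.0, 10.1, 4.0],
]
for name, value in zip(["x1", "x2", "x3"], vif(X_toy)):
    print(name, "VIF =", round(value, 2))
```

A common rule of thumb treats VIF values above about 10 as a sign of serious multicollinearity. That threshold is a convention, not a result from this study.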
