STATPLUS MULTIPLE REGRESSION HOW TO

It's not a very involved project, but I am new to statistics and am having trouble with the linear regression test in Excel with StatPlus when analyzing the data from my survey. I downloaded the StatPlus zip file from the website to a USB drive. When I run the linear regression, it keeps saying it can't analyze the data and to check my data. Maybe I'm not selecting the right columns or rows? My independent variable is the "Code" column (which tells us how sustainable the person is) and my dependent variables are the "Store" rows. Again, I'm not really sure how to go about determining the correlation between sustainability practices and where people buy their food. I am attaching my survey results in case more info is needed to help answer my question.

Regression Models with Multiple Parameters

The formula for a multiple linear regression is:

y = b0 + b1x1 + b2x2 + ... + bnxn + e

where:
y = the predicted value of the dependent variable
b0 = the y-intercept (the value of y when all other parameters are set to 0)
b1 = the regression coefficient of the first independent variable x1 (a.k.a. the effect that increasing the value of the independent variable has on the predicted y value), and likewise for the remaining independent variables
e = model error

Question: Why do we use gradient descent in linear regression?

In some machine learning classes I took recently, I covered gradient descent to find the best-fit line for linear regression. In some statistics classes, I have learned that we can compute this line using statistical analysis, using the mean and standard deviation - this page covers that approach in detail. Why is this seemingly simpler technique not used in machine learning? My question is: is gradient descent the preferred method for fitting linear models? If so, why? Or did the professor simply use gradient descent in a simpler setting to introduce the class to the technique?

Answer: The example you gave is one-dimensional, which is not usually the case in machine learning, where you have multiple input features. In that case, you need to invert a matrix to use the simple closed-form approach, which can be hard or ill-conditioned. Usually the problem is formulated as a least squares problem, which is slightly easier. There are standard least squares solvers which could be used instead of gradient descent (and often are). If the number of data points is very high, using a standard least squares solver might be too expensive, and (stochastic) gradient descent might give you a solution that is as good in terms of test-set error as a more precise solution, with a run time that is orders of magnitude smaller (see the great chapter by Léon Bottou on this). If your problem is small enough that it can be efficiently solved by an off-the-shelf least squares solver, you should probably not use gradient descent.

Basically, the gradient descent algorithm is a general optimization technique that can be used to optimize any cost function. It is often used when the optimum point cannot be estimated in a closed-form solution. Say we want to minimize a cost function: in gradient descent we start from some random initial point and try to move in the direction of the negative gradient in order to decrease the cost function. We move step by step until there is no decrease in the cost function, at which point we have reached the minimum. To make this easier to picture, imagine a bowl and a ball: if we drop the ball from some initial point on the bowl, it will roll until it settles at the bottom.

As gradient descent is a general algorithm, it can be applied to any problem that requires optimizing a cost function. In the regression problem, the cost function that is often used is the mean squared error (MSE). Finding a closed-form solution requires inverting a matrix that is, most of the time, ill-conditioned (its determinant is very close to zero, so it does not give a robust inverse). To circumvent this problem, people often take the gradient descent approach, which does not suffer from the ill-conditioning problem.
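The loop described above - start from an initial point, step against the gradient of the MSE, stop when the cost no longer decreases - can be sketched in a few lines. This is a minimal illustration for one-dimensional linear regression (the function name, learning rate, and toy data are my own, not from the original thread):

```python
import numpy as np

def gradient_descent_fit(x, y, lr=0.01, max_iters=10_000, tol=1e-12):
    """Fit y ~ b0 + b1*x by minimizing the MSE with gradient descent."""
    b0, b1 = 0.0, 0.0                  # initial point
    prev_cost = np.inf
    for _ in range(max_iters):
        err = (b0 + b1 * x) - y
        cost = np.mean(err ** 2)       # mean squared error
        if prev_cost - cost < tol:     # no further decrease: we are at the minimum
            break
        prev_cost = cost
        g0 = 2 * np.mean(err)          # gradient of the MSE w.r.t. b0
        g1 = 2 * np.mean(err * x)      # gradient of the MSE w.r.t. b1
        b0 -= lr * g0                  # step in the negative gradient direction
        b1 -= lr * g1
    return b0, b1

x = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
y = 3.0 + 2.0 * x                      # exact line y = 3 + 2x, no noise
b0, b1 = gradient_descent_fit(x, y)
print(round(b0, 3), round(b1, 3))      # prints 3.0 2.0
```

With noiseless data the iterates settle very close to the true intercept and slope, like the ball coming to rest at the bottom of the bowl.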
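For contrast, the closed-form route the answer mentions - inverting a matrix via the normal equations, versus calling an off-the-shelf least squares solver - looks like this on the same kind of toy data (a sketch assuming NumPy; the solver avoids explicitly forming the possibly ill-conditioned inverse):

```python
import numpy as np

x = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
y = 3.0 + 2.0 * x

# Design matrix with a column of ones for the intercept b0
X = np.column_stack([np.ones_like(x), x])

# Explicit normal equations: beta = (X^T X)^{-1} X^T y.
# This is the inversion the answer warns about; np.linalg.cond(X.T @ X)
# reports how ill-conditioned that matrix is.
beta_normal = np.linalg.inv(X.T @ X) @ X.T @ y

# A standard least squares solver (numerically more robust)
beta_lstsq, *_ = np.linalg.lstsq(X, y, rcond=None)

print(beta_normal)   # approximately [3, 2]
print(beta_lstsq)    # approximately [3, 2]
```

On a small, well-conditioned problem like this, both routes agree; the difference only matters when X^T X is nearly singular or the data set is very large.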