Newbie question: linear regression



Thanks for the replies on the linear regression problem. I've followed
the steps as described and obtained the correct answer for linear
regression.

I tried to take this a step further. I wanted to find values for
coefficients (a and b) of the linear combination of two independent
variables (x and w) for the expression:

y = a * x + b * w

I seem to get results, though my tiny little mathematical mind isn't up
to the task of checking whether they're correct. Could anyone comment on
whether I'm doing this right? Note: I don't actually need the results
(though I think they're interesting); I'm more interested in being able
to use Maxima to solve problems like this one.

First, I define the function to calculate the sum of the squared
residuals for a set of data (x, w, and y), and values for a and b.

(C1) sqdiff( data_x, data_w, data_y, n, a, b ) :=
        SUM( (data_y[i] - (a * data_x[i] + b * data_w[i]))^2, i, 0, n - 1 );

I then declare sum to be linear (I presume I can do this at any time),
and set display2d:false (had I done this earlier, I could have cut and
pasted Maxima's result for the last step).

I now expand the function.

(C4) sqdiff( data_x, data_w, data_y, n, a, b );

(D4) 'SUM((data_y[i]-a*data_x[i]-b*data_w[i])^2,i,0,n-1)
(C5) expand( % );

(D5) 'SUM(data_y[i]^2,i,0,n-1)-2*a*'SUM(data_x[i]*data_y[i],i,0,n-1)
                              -2*b*'SUM(data_w[i]*data_y[i],i,0,n-1)
                              +a^2*'SUM(data_x[i]^2,i,0,n-1)
                              +2*a*b*'SUM(data_w[i]*data_x[i],i,0,n-1)
                              +b^2*'SUM(data_w[i]^2,i,0,n-1)

I then take the partial derivatives of this with respect to a and b.

(C6) [diff(%,a), diff( %, b ) ];

(D6) [-2*'SUM(data_x[i]*data_y[i],i,0,n-1)+2*a*'SUM(data_x[i]^2,i,0,n-1)
                                           +2*b*'SUM(data_w[i]*data_x[i],i,0,n-1),
      -2*'SUM(data_w[i]*data_y[i],i,0,n-1)+2*a*'SUM(data_w[i]*data_x[i],i,0,n-1)
                                          +2*b*'SUM(data_w[i]^2,i,0,n-1)]

Finally, I solve for the a and b at which both derivatives are zero
(algsys treats each expression as being equal to zero). These are the
estimates that minimise the sum of the squared residuals for the data.

(C7) algsys( d6, [a,b] );

(D7) [[a = (('SUM(data_w[i]^2,i,0,n-1))*'SUM(data_x[i]*data_y[i],i,0,n-1)
         -('SUM(data_w[i]*data_x[i],i,0,n-1))
          *'SUM(data_w[i]*data_y[i],i,0,n-1))
         /(('SUM(data_w[i]^2,i,0,n-1))*'SUM(data_x[i]^2,i,0,n-1)
          -('SUM(data_w[i]*data_x[i],i,0,n-1))^2),
       b = -(('SUM(data_w[i]*data_x[i],i,0,n-1))
         *'SUM(data_x[i]*data_y[i],i,0,n-1)
         -('SUM(data_x[i]^2,i,0,n-1))*'SUM(data_w[i]*data_y[i],i,0,n-1))
         /(('SUM(data_w[i]^2,i,0,n-1))*'SUM(data_x[i]^2,i,0,n-1)
          -('SUM(data_w[i]*data_x[i],i,0,n-1))^2)]]
(C8)
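
One way to check a result like (D7) without redoing the algebra by hand
might be to plug in data whose answer is known in advance. The following
is only a sketch: the lists xs, ws and ys and the names Sxx, Sww, Swx,
Sxy, Swy, den, a_est and b_est are made up for illustration, and it is
written in the input syntax of current Maxima rather than as a copy of
the session above.

```maxima
/* Hypothetical data, generated exactly from y = 2*x + 3*w,
   so a correct formula should recover a = 2 and b = 3. */
xs: [1, 2, 3, 4]$
ws: [1, 0, 2, 1]$
ys: 2*xs + 3*ws$

/* Maxima lists are indexed from 1, so the sums run over 1..n
   rather than 0..n-1 as in the session above. */
n: length(xs)$
Sxx: sum(xs[i]^2, i, 1, n)$
Sww: sum(ws[i]^2, i, 1, n)$
Swx: sum(ws[i]*xs[i], i, 1, n)$
Sxy: sum(xs[i]*ys[i], i, 1, n)$
Swy: sum(ws[i]*ys[i], i, 1, n)$

/* The same expressions as in (D7), with each sum abbreviated. */
den: Sww*Sxx - Swx^2$
a_est: (Sww*Sxy - Swx*Swy) / den;
b_est: -(Swx*Sxy - Sxx*Swy) / den;
```

For this data the sums are Sxx = 30, Sww = 6, Swx = 11, Sxy = 93 and
Swy = 40, so a_est = 118/59 = 2 and b_est = 177/59 = 3, which is what
the data was built from. That is only one data set, not a proof, but it
would catch most typos in the formula.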

This is for two coefficients (and two independent variables). Is this
correct (or correct +/- typos)? Could I not repeat this process (from
the start) for any number of independent variables?
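
As far as I can tell, repeating the process for more variables amounts
to what is usually written as the normal equations, (X^T X) c = X^T y,
where each row of X holds one observation of the independent variables
and c is the vector of coefficients. A minimal sketch in Maxima, with
made-up data and the hypothetical names X, y and est, might look like
this:

```maxima
/* Hypothetical data: each row of X is one observation (x, w);
   y holds the responses, here exactly 2*x + 3*w. */
X: matrix([1, 1], [2, 0], [3, 2], [4, 1])$
y: matrix([5], [4], [12], [11])$

/* Solve the normal equations (X^T X) est = X^T y. */
est: invert(transpose(X) . X) . (transpose(X) . y);
```

Here est comes out as the column matrix([2], [3]), i.e. a = 2 and b = 3.
Adding a third independent variable would just mean adding a third
column to X. Inverting X^T X explicitly is fine for a small example like
this, though I gather it is not the numerically preferred approach for
larger problems.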

Thanks in anticipation,

Ross-c