Newbie question: linear regression



If there is interest in fitting curves to data in Maxima, is it possible to
integrate Maxima with tools that already exist rather than implement them
from scratch in Maxima?

It seems to me that R (www.r-project.org) already has terrific tools for
fitting curves to data, and indeed for fitting functions of multiple inputs
to data.  I don't think R has a peer for sheer number of different modeling
technologies (quantity doesn't trump quality, but most of the tools are of
high quality too!).  I say "modeling" because people might not want to
simply fit curves with one input; they might also want to fit a surface with
two inputs or a function with many inputs.  Why redo this work if it's
possible to link to R?  R is GPL, so there's no worry about license
conflicts.

I say this with some trepidation, since I don't want to direct enthusiasm
away from Maxima.  However, a great aspect of open source is that it allows
developers to do something once, do it right, and then let other packages
build on it.  (Granted, Gnome and KDE show that such centralization doesn't
always happen....)  For instance, there is essentially one tool for burning
CDs in Linux, cdrecord, but there's a tremendous variety of, and competition
among, graphical front-ends to cdrecord.  cdrecord, with its command-line
interface, offers a single place to perfect the underlying method without
worrying about user interfaces; others can then develop graphical user
interfaces to it without having to worry about the underlying method.
Similarly, let's keep curve-fitting ("modeling") in one place and continue
to improve it, to the benefit of all packages that link to it; then Maxima
developers can focus on what Maxima does uniquely well.  My vote for a
curve-fitting package is R, but whatever it is, let's not reinvent the
wheel.

For anyone wanting specifics on curve-fitting in R: base R (what you get
when you first install it) provides linear regression (lm), logistic
regression (glm), and the function "smooth.spline", an all-around good
performer for one input.  The standard installation also includes package
nnet, which is great if you know what you're doing.  More tools are found in
contributed packages; to see the listing of official R packages, go from the
web site above to your nearest CRAN mirror, then click on "package sources".
Package mgcv provides highly automated multiple-input model-fitting (where
model components contribute additively), and locfit is worth looking into.
This just scratches the surface.

I would caution that some of these tools require hand-holding and a bit of
subject-matter knowledge.  Others, however, are amenable to automated
model-selection logic that yields good performance most of the time, though
with more care one can do a bit better.  In fact, I would be willing to
contribute R functions that wrap R modeling functions with some automated
model-selection methods, if anyone is interested in tackling the linkage to
Maxima.

Jim Garrett
Becton Dickinson Diagnostic Systems


----- Forwarded by Jim Garrett/BALT/BDX on 12/03/2003 12:19 PM -----

From:     Barton Willis <willisb@unk.edu>
Sent by:  maxima-admin@math.utexas.edu
Date:     12/03/2003 11:28 AM
To:       maxima@www.ma.utexas.edu
cc:
Subject:  Re: [Maxima]  Newbie question: linear regression

Furuya Gosei showed how to get Maxima to solve the linear regression
equations.  Let me add two comments:

1. Getting Maxima to do this calculation requires either luck or
adroitness; the first time I tried, I discovered a bug in simpsum:

(C1) sum(a+x[i],i,1,n),simpsum;
(D1)                                  a n
(C2) sum(1+sqrt(i),i,1,n), simpsum;
(D2)                                   n
(C3)

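The correct results are a n + sum(x[i],i,1,n) and n + sum(sqrt(i),i,1,n);
simpsum silently drops the remaining sums.  Maybe this bug has already been
reported and fixed?  (The culprit may be the function sumsum.)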

2. As Maxima presents them, the formulae for the slope and the intercept are
poorly suited for numerical evaluation.  (They involve differences of
numbers that might be close together, so most of the significant digits can
cancel.)  To fix this, you'll need to do some algebraic tricks that would be
far easier to do by hand than with Maxima.  In the end, it isn't worth the
trouble; there are lots of software tools (a garden-variety spreadsheet or
gnuplot are two that I have used) that will do linear regression accurately
without the hassle.
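To make the cancellation problem in comment 2 concrete, here is a minimal Python sketch (an illustration only, not Maxima code; the function names naive_slope and centered_fit are hypothetical). It compares the textbook one-pass slope formula (n*Sxy - Sx*Sy)/(n*Sxx - Sx^2) with a two-pass version that subtracts the means before forming the sums. Centering is one standard rearrangement; the original message does not say which "algebraic tricks" it has in mind. The data lie exactly on a line but share a large offset in x, which is where the one-pass formula breaks down in double precision.

```python
def naive_slope(xs, ys):
    """Textbook one-pass formula: (n*Sxy - Sx*Sy) / (n*Sxx - Sx^2).

    Both differences subtract large, nearly equal numbers when the x values
    share a big common offset, so most significant digits cancel.
    """
    n = len(xs)
    sx = sy = sxx = sxy = 0.0
    for x, y in zip(xs, ys):
        sx += x
        sy += y
        sxx += x * x
        sxy += x * y
    return (n * sxy - sx * sy) / (n * sxx - sx * sx)


def centered_fit(xs, ys):
    """Two-pass formula: subtract the means first, then form the sums.

    Returns (slope, intercept) of the least-squares line y = slope*x + intercept.
    """
    n = len(xs)
    xbar = sum(xs) / n
    ybar = sum(ys) / n
    sxx = 0.0
    sxy = 0.0
    for x, y in zip(xs, ys):
        dx = x - xbar          # small, exactly representable deviations
        sxx += dx * dx
        sxy += dx * (y - ybar)
    slope = sxy / sxx
    intercept = ybar - slope * xbar
    return slope, intercept


# Data lying exactly on y = 2*(x - 1e8) + 1, with a large offset in x.
xs = [1e8 + i for i in range(5)]
ys = [2.0 * i + 1.0 for i in range(5)]
print(centered_fit(xs, ys))  # (2.0, -199999999.0)
print(naive_slope(xs, ys))   # noticeably different from 2.0
```

Both functions give the same answer on small, well-scaled data; the difference only shows up when the sums become large relative to the spread of the data.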

Currently, Maxima lacks the tools to build a good linear regression
function; to do a decent job, we'd at least need good numerical LU and SVD
functions.  It seems that there is interest in adding tools for fitting
curves to data; for now, Maxima isn't up to the task.
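As a rough sketch of the kind of numerical routine meant here, the Python below solves a least-squares problem A x ~= b for several inputs at once. Note that it swaps in a different standard technique from the LU and SVD named above: QR factorization by modified Gram-Schmidt, chosen because it is short in plain Python and avoids forming the normal equations A^T A x = A^T b, whose condition number is the square of A's. The name lstsq_mgs is hypothetical, full column rank is assumed, and this is not an existing Maxima or R function.

```python
import math


def lstsq_mgs(A, b):
    """Least-squares solution of A x ~= b via modified Gram-Schmidt QR.

    A is a list of m rows with n entries each (m >= n, full column rank
    assumed). Returns x as a list of n floats.
    """
    m, n = len(A), len(A[0])
    # Store A column-wise; the columns are turned into Q in place.
    q = [[float(A[i][j]) for i in range(m)] for j in range(n)]
    r = [[0.0] * n for _ in range(n)]
    for k in range(n):
        norm = math.sqrt(sum(v * v for v in q[k]))
        if norm == 0.0:
            raise ValueError("columns of A are linearly dependent")
        r[k][k] = norm
        q[k] = [v / norm for v in q[k]]
        # Remove the q_k component from every later column right away;
        # this ordering is what makes the Gram-Schmidt "modified".
        for j in range(k + 1, n):
            r[k][j] = sum(qi * aj for qi, aj in zip(q[k], q[j]))
            q[j] = [aj - r[k][j] * qi for qi, aj in zip(q[k], q[j])]
    # Form Q^T b by projecting b sequentially, as if it were an extra column.
    y = [float(v) for v in b]
    qtb = [0.0] * n
    for k in range(n):
        qtb[k] = sum(qi * yi for qi, yi in zip(q[k], y))
        y = [yi - qtb[k] * qi for qi, yi in zip(q[k], y)]
    # Back-substitute the upper-triangular system R x = Q^T b.
    x = [0.0] * n
    for k in range(n - 1, -1, -1):
        s = qtb[k] - sum(r[k][j] * x[j] for j in range(k + 1, n))
        x[k] = s / r[k][k]
    return x


# Two-input example: z = 1 + 2u - 3v sampled on a 3x3 grid.
rows, zs = [], []
for u in range(3):
    for v in range(3):
        rows.append([1.0, float(u), float(v)])
        zs.append(1.0 + 2.0 * u - 3.0 * v)
print(lstsq_mgs(rows, zs))  # approximately [1.0, 2.0, -3.0]
```

An SVD-based solver would additionally handle rank-deficient or nearly rank-deficient A gracefully, which this sketch does not attempt.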

Barton

_______________________________________________
Maxima mailing list
Maxima@www.math.utexas.edu
http://www.math.utexas.edu/mailman/listinfo/maxima


