Re: proposal to include functions to read data from files



Many thanks to Barton Willis for posting the io.lisp code
for reading & writing matrices and so on. I went back through
the mailing list discussion circa Sept 2003 about reading
data from files and identified the following issues. I've
listed them below along with my assessment of each.

In general, I am in favor of (1) allowing a variety of input 
types, and (2) having one object per file. Specific proposals
follow.

I've recently returned from a lengthy sojourn in the world 
of data mining, which may explain some of my strange opinions.

Regards,
Robert Dodier

-------------------------- >8 --------------------------
Notes on matrix/array/list input/output
Robert Dodier

Topics which came up in discussions on Maxima mailing list 
circa Sept 2003:

 1 reproduce commercial Macsyma functions?

 2 accommodate commas and other separators?

 3 infer structure from file layout or impose structure post hoc?

    3.1 specify size explicitly?

    3.2 accommodate ragged arrays?
        (i.e., different number of elements per line)

 4 partial read?

 5 nonnumeric data?


Topics not yet discussed:

 6 accommodate keys and field names? (i.e., row and column identifiers)

    6.1 accommodate input to hash table?


Proposed answers:

 1 No. The commercial Macsyma functions for reading numerical data are 
   limited -- values must be integers or floats, the size must be
   specified, separators can be spaces or tabs only, and data can be
   loaded into a list, matrix, or array only. I don't see a good
   reason to emulate them.

 2 Yes, accommodate commas; the one other separator that is common
   (aside from spaces and tabs) is the vertical bar (pipe symbol).

 3 (i) Have a function to read everything as a flat list -- this allows
   the caller to impose other structure later. (ii) Also have functions
   to infer structure for common special cases -- matrix, array, 
   nested list.

    3.1 No, don't let the caller specify the object size to the read
        functions; let the user resize after the fact if they want.

    3.2 Yes, allow input file to contain ragged arrays.

 4 No, don't allow partial read; read functions attempt to load
   entire file into an object. Let user split off parts of loaded
   object after the fact. 

 5 Yes, allow structures such as rationals and complex numbers, and
   also nonnumeric data such as strings and atoms. (Anything the Lisp
   read function can handle is OK; enforcing rules about data types is
   unnecessarily limiting.)

 6 Yes (eventually), allow row and column identifiers.

    6.1 Yes (eventually), allow input to hash table indexed by
        row or column identifier.
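
To make the proposed answers concrete, here is a rough sketch of the
behavior in Python. It is illustration only, not a proposed
implementation; the real functions would be written in Lisp or Maxima.
All names below (SEPARATORS, parse_token, read_nested_list,
read_flat_list, read_keyed) are hypothetical. The sketch takes the file
contents as a string, collapses runs of separators (so empty fields are
dropped), and assumes the first field of each row is the key for the
hash-table case. None of those choices come from the discussion.

```python
import re
from fractions import Fraction

# Assumed separator set from item 2: whitespace, comma, vertical bar.
# Runs of separators are collapsed, so empty fields are not preserved.
SEPARATORS = re.compile(r"[,|\s]+")


def parse_token(token):
    """Item 5: try integer, float, rational, then complex; else keep the string."""
    for convert in (int, float, Fraction, complex):
        try:
            return convert(token)
        except (ValueError, ZeroDivisionError):
            continue
    return token


def read_nested_list(text):
    """Item 3 (ii) and 3.2: one inner list per nonblank line; rows may be ragged."""
    rows = []
    for line in text.splitlines():
        tokens = [t for t in SEPARATORS.split(line.strip()) if t]
        if tokens:
            rows.append([parse_token(t) for t in tokens])
    return rows


def read_flat_list(text):
    """Item 3 (i): read everything as one flat list; the caller imposes structure."""
    return [item for row in read_nested_list(text) for item in row]


def read_keyed(text):
    """Item 6.1: hash table indexed by row identifier (assumed to be the first field)."""
    return {row[0]: row[1:] for row in read_nested_list(text)}


sample = "1, 2.5, 1/2\nfoo | 3\n\n4 5 6 7\n"
nested = read_nested_list(sample)
flat = read_flat_list(sample)
keyed = read_keyed("a 1 2\nb 3\n")
```

Per items 3.1 and 4, nothing here takes a size or a range argument: the
whole input is read, and the caller slices or reshapes the result
afterwards.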
-------------------------- 8< --------------------------
