Again, I forgot to include the Maxima mailing list. I think I must have
sometimers...
-------- Original Message --------
Subject: Re: Universal read_data function
Date: Thu, 02 Jun 2011 08:57:11 -0700
From: Paul Bowyer <pbowyer at olynet.com>
To: Edwin Woollett <woollett at charter.net>
On 06/01/2011 11:05 AM, Edwin Woollett wrote:
> On May 31, Paul Bowyer wrote:
> ------------------------
> My reason for thinking I needed to use Windows text file standards in
> the data files was because I copy/pasted them from your email
> messages. If I were creating them from scratch on a Linux box, I'd opt
> for the default LF that is standard for Linux text files.
>
> When I try to write utility functions, I try to make them robust so
> they don't fail when things aren't absolutely perfect. It made sense
> to me to handle the case where Windows text file standards were used
> since you were working on a Windows machine. I wasn't trying to be a
> nuisance by continually marking up your code and I hope I didn't upset
> you, but please forgive me if I did.
>
> Paul
> -------------------------------------
> Hi Paul,
>
> I never get upset, and can only be flattered by your interest in
> my faltering efforts at Maxima code.
>
> The current version of read_data (which has changed: see
> below) cares not a whit about end of line chars, so that should
> never be the issue here. The important thing is that the
> file to be read does not contain spurious extra end of
> line chars, and that is why I advise looking at the file with
> a utility such as notepad2, which clearly shows up the
> locations and types of end of line chars (shift+control+9)
> (which is a toggle).
>
> (By the way, when you write data to a stream opened
> with openw, using printf as in the manual examples,
> the end of line chars are LF (unix).)
>
> The NEW version allows the 'data-sep-string' to be "text",
> (which is a hack), in which case all lines are read
> in as strings without splitting, as is appropriate for
> a purely text file which contains spaces and punctuation
> marks.
>
> A related change: if the four-arg version is
> used, supplying a list of line numbers such as
> [2,4], then each listed line is read whole into
> its own sublist as a single string, with no
> splitting.
>
> ---------------------------------------
> The present complete syntax and code are then:
> -------------------------------------------------------------
> /*********** read_data ****************************/
> /* if only a file name is given, then the
> data separators can be an arbitrary mixture
> of spaces and commas, but the commas are
> converted to spaces, so strings with spaces
> will choke the code if you only provide the
> filename, or you provide (filename," ").
>
>
>
> syntax: read_data(filename,data-sep-string,mult,line-list)
>
> with ";" for example in second slot,
> and false in third slot.
> (mult is set to true by default.)
>
> The data separator string can be anything
> recognised by split, and the boolean parameter
> mult is used by split.
>
>
> In addition, the data-sep-string can be "text",
> in which case *all* lines of the stream are read
> in as individual strings.
>
> Thus the syntax read_data(filename,"text") does
> no line splitting.
>
> The most complicated four arg syntax has the
> form
> read_data (filename, " ", true, [2,4] )
>
> for example, where for split line data items,
> (ie., not lines 2 and 4) space is being used
> as the data separator, but lines 2 and 4 should
> be read into separate sublists as a whole as
> one string for the whole line, doing no splitting
> for lines 2 and 4.
> */
>
>
> /* new 5-29 */
>
> read_data([%v]) :=
> block ([%s,%r,%l,%filename,%dsep,%mult:true,
> %mix:false, %whole:[],%ln],
>
> %filename : part (%v,1),
>
> if not stringp (%filename)
> then ( disp (" file name must be a Maxima string "),
> return (false)),
>
> if not file_search (%filename) then
> (disp (" file not found "),return (false)),
>
> if length (%v) = 1 then %mix : true
> else if length(%v) = 2 then %dsep : part (%v,2)
> else if length (%v) = 3
> then (%dsep : part (%v,2), %mult : part (%v,3))
> else
> (%dsep : part (%v,2), %mult : part (%v,3),%whole : part(%v,4)),
>
>
>
> %s : openr (%filename),
> %r : [],
> %ln : 0,
>
> while (%l : readline(%s)) # false do
> ( %ln : %ln + 1,
> if %dsep = "text" then
> %r : cons (%l,%r)
> else if not lfreeof (%whole,%ln) then
> %r : cons (%l,%r)
> else if %mix then
> %r : cons (map(parse_string, split(ssubst (" ",",",%l))), %r)
> else %r : cons (map(parse_string, split(%l,%dsep,%mult)), %r)),
>
> close (%s),
> reverse (%r))$
> ------------------------------------------------
>
> Ted
>
>
Hi Ted:
I tried your latest code shown above on "ndata2.dat" which I re-copied
from your email (using Thunderbird) of "05/29/2011 12:53 PM" into kwrite
and filed without modifications. Because of the way printfile listed the
data, there was a blank line between the two data lines, and because of
the way I copy/pasted, there was only a single LF char at the end of the
file.
(%i3) printfile ("ndata2.dat")$
2 , 4.8, -3/4, "xyz", -2.8e-9
3 22.2 7/8 "abc" 4.4e10
By the way, the CRs that I had in my copies of the data files that I
used for my previous testing had to be manually entered using Okteta,
because they weren't present in the copy/paste data for ndata2.dat. I
must have gotten my facts turned around when I stated that the CRs
showed up as a result of the copy/paste operation.
Anyway, using your code shown above and running:
printfile("/home/pfb/ndata2.dat")$
trace( parse_string );
read_data("/home/pfb/ndata2.dat");
untrace( parse_string );
results in this output:
2 , 4.8, -3/4, "xyz", -2.8e-9
3 22.2 7/8 "abc" 4.4e10
(%o36) [parse_string]
1" Enter "parse_string["2"]
1" Exit "parse_string2
1" Enter "parse_string["4.8"]
1" Exit "parse_string4.8
1" Enter "parse_string["-3/4"]
1" Exit "parse_string(-3)/4
1" Enter "parse_string[""xyz""]
1" Exit "parse_string"xyz"
1" Enter "parse_string["-2.8e-9"]
1" Exit "parse_string-2.8*10^-9
1" Enter "parse_string[]
stdin:1:incorrect syntax: Premature termination of input at $.
(%o38) [parse_string]
Including one or two extra lines of code in your function
gives some protection against malformed input, such as the
blank lines that creep in through copy/paste or hand-typed entry.
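A minimal sketch of that kind of guard, tried in isolation before it goes
into the function. The name clean_line is hypothetical and is not part of
read_data; it uses only strim, ssubst and ascii from stringproc, and it
assumes that trimming spaces as well as CRs is acceptable for the data.

```maxima
/* clean_line is a hypothetical helper, not part of read_data.
   It replaces every CR (ascii 13) with a space and then trims
   spaces from both ends of the line. */
clean_line (l) := strim (" ", ssubst (" ", ascii (13), l))$

/* A data line with a trailing CR, a CR-only line, and an empty line. */
clean_line (sconcat ("3 22.2 7/8", ascii (13)));
clean_line (ascii (13));
clean_line ("");
```

The intent is that the last two calls give back an empty string, so a
single test of the form "if %l # "" then ..." inside the read loop can
skip both kinds of line.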
If I were writing this function, I'd do it this way:
------------------------------------------------------------------------------
read_data([%v]) :=
  block ([%s,%r,%l,%filename,%dsep,%mult:true,
          %mix:false, %whole:[],%ln],
    %filename : part (%v,1),
    if not stringp (%filename)
      then ( disp (" file name must be a Maxima string "),
             return (false)),
    if not file_search (%filename) then
      (disp (" file not found "),return (false)),
    if length (%v) = 1 then %mix : true
    else if length(%v) = 2 then %dsep : part (%v,2)
    else if length (%v) = 3
      then (%dsep : part (%v,2), %mult : part (%v,3))
    else
      (%dsep : part (%v,2), %mult : part (%v,3),%whole : part(%v,4)),
    %s : openr (%filename),
    %r : [],
    %ln : 0,
    while (%l : readline(%s)) # false do
      ( %ln : %ln + 1,
        /* Added the following and the enclosing parens.
           Including these eliminates problems with
           blank lines and CRs in the file. */
        /* Add this line if you're concerned about CRs in line ends.
        %l : strim(" ", ssubst(" ", ascii(13), %l ) ), */
        if %l # "" then   /* Check for blank line */
          (
            if %dsep = "text" then
              %r : cons (%l,%r)
            else if not lfreeof (%whole,%ln) then
              %r : cons (%l,%r)
            else if %mix then
              %r : cons (map(parse_string, split(ssubst (" ",",",%l))), %r)
            else %r : cons (map(parse_string, split(%l,%dsep,%mult)), %r)
          )
      ),
    close (%s),
    reverse (%r));
------------------------------------------------------------------------------
and again running:
printfile("/home/pfb/ndata2.dat")$
trace( parse_string );
/*read_data("/home/pfb/ndata1.dat"," ",true,[4]);*/
read_data("/home/pfb/ndata2.dat");
untrace( parse_string );
results in this output:
2 , 4.8, -3/4, "xyz", -2.8e-9
3 22.2 7/8 "abc" 4.4e10
(%o45) [parse_string]
1" Enter "parse_string["2"]
1" Exit "parse_string2
1" Enter "parse_string["4.8"]
1" Exit "parse_string4.8
1" Enter "parse_string["-3/4"]
1" Exit "parse_string(-3)/4
1" Enter "parse_string[""xyz""]
1" Exit "parse_string"xyz"
1" Enter "parse_string["-2.8e-9"]
1" Exit "parse_string-2.8*10^-9
1" Enter "parse_string["3"]
1" Exit "parse_string3
1" Enter "parse_string["22.2"]
1" Exit "parse_string22.2
1" Enter "parse_string["7/8"]
1" Exit "parse_string7/8
1" Enter "parse_string[""abc""]
1" Exit "parse_string"abc"
1" Enter "parse_string["4.4e10"]
1" Exit "parse_string4.4*10^10
(%o46) [[2,4.8,-3/4,"xyz",-2.8*10^-9],[3,22.2,7/8,"abc",4.4*10^10]]
(%o47) [parse_string]
I didn't do any testing on other data files at this time.
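For anyone who wants to repeat the blank-line test without copy/pasting
from email, here is a rough sketch that builds a small test file with
openw and printf. The file name is hypothetical, and the data lines are
just shortened versions of the ones in ndata2.dat.

```maxima
/* Hypothetical scratch file; adjust the path for your system. */
fname : "/tmp/blank_test.dat"$

s : openw (fname)$
printf (s, "2 , 4.8, -3/4~%")$   /* first data line         */
printf (s, "~%")$                /* deliberately blank line */
printf (s, "3 22.2 7/8~%")$      /* second data line        */
close (s)$

/* Show the file, then read it with whichever read_data is loaded. */
printfile (fname)$
read_data (fname);
```

With the blank-line check in place, read_data should skip the empty line
and return one sublist for each of the two data lines.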
Paul