Texinfo / parse-info stuff



On Sun, 05-05-2013 at 18:14 +0100, Rupert Swarbrick wrote:
> Mario Rodriguez <biomates at telefonica.net> writes:
> > As far as I know, latin1 and utf-8 encode non-special (plain ASCII)
> > characters identically. I think that it doesn't matter which of these
> > encodings a file is saved in, as long as you don't use special
> > characters. In fact, some time ago we had some files in info/es saved
> > in latin1 while others were in utf-8. Now, all of them are saved in
> > utf-8.
> 
> I think doing that is incorrect. The build system (even before I started
> hacking on it!) carefully assumes that the contents of info/es are
> encoded as latin1, then it has an explicit transcoding step that copies
> them to info/es.utf8 and converts to UTF-8 as it goes.
> 
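
A minimal sketch of the kind of transcoding step described above, for illustration only. It is not the actual Maxima build rule; the function name and the "*.texi" pattern are assumptions, and it does not touch any @documentencoding line inside the files.

```python
from pathlib import Path


def transcode_latin1_to_utf8(src_dir, dst_dir, pattern="*.texi"):
    """Copy matching files from src_dir to dst_dir, re-encoding latin1 as UTF-8.

    Hypothetical helper: the real build system may do this differently.
    """
    src_dir = Path(src_dir)
    dst_dir = Path(dst_dir)
    dst_dir.mkdir(parents=True, exist_ok=True)
    written = []
    for src in sorted(src_dir.glob(pattern)):
        # Every byte value is a valid latin1 character, so this decode never
        # fails. It silently produces garbage if the file was really UTF-8,
        # which is why the originals must be in the encoding they claim.
        text = src.read_bytes().decode("latin-1")
        dst = dst_dir / src.name
        dst.write_bytes(text.encode("utf-8"))
        written.append(dst)
    return written
```

Called as transcode_latin1_to_utf8("info/es", "info/es.utf8"), this would mirror the directory layout mentioned above.
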
> As far as I could tell, from a pretty careful reading of how things
> worked, UTF-8 encoded files in info/es are just wrong. So I converted
> the UTF-8 special characters in info/{es,de,pt,pt_BR} to latin1. To find
> the list of files to edit, I basically called the chardet program on
> each texinfo file and, if it didn't reckon the result was ASCII, I
> opened it up in Emacs (which impressively always guessed the encoding
> correctly) and resaved it as latin1.
> 
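
A rough sketch of how such a scan could be done with only the Python standard library. This is not chardet and not the procedure actually used; it relies on the assumption that latin1 text containing accented letters is almost never valid UTF-8. The function name and labels are made up for illustration.

```python
def classify_encoding(data: bytes) -> str:
    """Guess whether raw file contents are ASCII, UTF-8, or probably latin1.

    Heuristic only: anything that is neither pure ASCII nor valid UTF-8 is
    reported as 'latin1?' and should be checked by hand.
    """
    try:
        data.decode("ascii")
        return "ascii"
    except UnicodeDecodeError:
        pass
    try:
        data.decode("utf-8")
        return "utf-8"
    except UnicodeDecodeError:
        return "latin1?"
```

Files reported as "utf-8" would be the ones to resave as latin1; files reported as "ascii" need no change, since their bytes are the same in both encodings.
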
> In case you're thinking what I'm thinking... I agree that storing latin1
> copies of everything is more than a little crazy, but (as Ray pointed
> out to me) we need to do this for documentation to work if you have a
> non-Unicode Lisp (e.g. gcl) and a latin1 terminal. Since utf-8 can
> represent a strict superset of latin1, it makes more sense to have the
> originals written in latin1 and then transcoded automatically to utf-8
> than the other way around.
> 
> > The error reported above is due to a typo when writing the word
> > 'parámetros', it should be 'par@'ametros'. For the same reason, the
> > German word 'Teilausdrücke' should be fixed as 'Teilausdr@"ucke'.
> > Sometimes we forget that we are writing a texinfo document.
> >
> > I am not an expert in character encoding, but I suspect that using only
> > non-special characters is safer, at least for European languages.
> 
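
To illustrate the 7-bit spelling suggested above, here is a small sketch that rewrites accented letters as Texinfo accent commands. The helper name and the accent table are assumptions for illustration; the table is deliberately incomplete, and characters it does not cover (such as c-cedilla or sharp s) are passed through unchanged.

```python
import unicodedata

# Unicode combining mark -> Texinfo accent command (small, incomplete table).
TEXINFO_ACCENTS = {
    "\u0301": "@'",  # acute
    "\u0300": "@`",  # grave
    "\u0308": '@"',  # umlaut / diaeresis
    "\u0302": "@^",  # circumflex
    "\u0303": "@~",  # tilde
}


def to_texinfo_accents(text: str) -> str:
    """Replace letters carrying one known accent with '@<accent><letter>'."""
    out = []
    for ch in text:
        # NFD splits an accented letter into base letter + combining mark.
        decomposed = unicodedata.normalize("NFD", ch)
        if len(decomposed) == 2 and decomposed[1] in TEXINFO_ACCENTS:
            out.append(TEXINFO_ACCENTS[decomposed[1]] + decomposed[0])
        else:
            out.append(ch)
    return "".join(out)
```

With this table, the two words from the message come out as par@'ametros and Teilausdr@"ucke, and the result contains only 7-bit characters, so it reads the same in latin1 and utf-8.
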
> Yep, I agree with that, but there's no problem as long as we use the
> encoding we claim we're using :-) (note that the info/es/maxima.texi
> file has "@documentencoding ISO-8859-1" as its second non-comment
> line...)
> 
> Most files didn't have any non-7-bit characters, which is why the
> documentation hasn't been obviously massively broken.
> 
> Rupert

Thanks for the clarifications.

--
Mario