Nächste: Functions and Variables for descriptive statistics, Vorige: Introduction to descriptive, Nach oben: Package descriptive [Inhalt][Index]
Builds a sample from a table of absolute frequencies. The input table can be a matrix or a list of lists, all of them of equal size. The number of columns or the length of the lists must be greater than 1. The last element of each row or list is interpreted as the absolute frequency. The output is always a sample in matrix form.
Examples:
Univariate frequency table.
(%i1) load ("descriptive")$ (%i2) sam1: build_sample([[6,1], [j,2], [2,1]]); [ 6 ] [ ] [ j ] (%o2) [ ] [ j ] [ ] [ 2 ] (%i3) mean(sam1); 2 j + 8 (%o3) [-------] 4 (%i4) barsplot(sam1) $
Multivariate frequency table.
(%i1) load ("descriptive")$ (%i2) sam2: build_sample([[6,3,1], [5,6,2], [u,2,1],[6,8,2]]) ; [ 6 3 ] [ ] [ 5 6 ] [ ] [ 5 6 ] (%o2) [ ] [ u 2 ] [ ] [ 6 8 ] [ ] [ 6 8 ] (%i3) cov(sam2); [ 2 2 ] [ u + 158 (u + 28) 2 u + 174 11 (u + 28) ] [ -------- - --------- --------- - ----------- ] (%o3) [ 6 36 6 12 ] [ ] [ 2 u + 174 11 (u + 28) 21 ] [ --------- - ----------- -- ] [ 6 12 4 ] (%i4) barsplot(sam2, grouping=stacked) $
The argument of continuous_freq
must be a list of numbers.
Divides the range in intervals and counts how many values
are inside them. The second argument is optional and either
equals the number of classes we want, 10
by default, or
equals a list containing the class limits and the number of
classes we want, or a list containing only the limits.
Argument list must be a list of (2 or 3) real numbers.
If sample values are all equal, this function returns only
one class of amplitude 2.
Examples:
Optional argument indicates the number of classes we want.
The first list in the output contains the interval limits, and
the second the corresponding counts: there are 16 digits inside
the interval [0, 1.8]
, 24 digits in (1.8, 3.6]
, and so on.
(%i1) load ("descriptive")$ (%i2) s1 : read_list (file_search ("pidigits.data"))$ (%i3) continuous_freq (s1, 5); (%o3) [[0, 1.8, 3.6, 5.4, 7.2, 9.0], [16, 24, 18, 17, 25]]
Optional argument indicates we want 7 classes with limits
-2
and 12
:
(%i1) load ("descriptive")$ (%i2) s1 : read_list (file_search ("pidigits.data"))$ (%i3) continuous_freq (s1, [-2,12,7]); (%o3) [[- 2, 0, 2, 4, 6, 8, 10, 12], [8, 20, 22, 17, 20, 13, 0]]
Optional argument indicates we want the default number of classes with limits
-2
and 12
:
(%i1) load ("descriptive")$ (%i2) s1 : read_list (file_search ("pidigits.data"))$ (%i3) continuous_freq (s1, [-2,12]);
3 4 11 18 32 39 46 53 (%o3) [[- 2, - -, -, --, --, 5, --, --, --, --, 12], 5 5 5 5 5 5 5 5 [0, 8, 20, 12, 18, 9, 8, 25, 0, 0]]
Counts absolute frequencies in discrete samples, both numeric and categorical. Its unique argument is a list,
(%i1) load ("descriptive")$ (%i2) s1 : read_list (file_search ("pidigits.data"))$ (%i3) discrete_freq (s1);
(%o3) [[0, 1, 2, 3, 4, 5, 6, 7, 8, 9], [8, 8, 12, 12, 10, 8, 9, 8, 12, 13]]
The first list gives the sample values and the second their absolute
frequencies. Commands ? col
and ? transpose
should help you to
understand the last input.
This is a sort of variant of the Maxima submatrix
function. The first
argument is the data matrix, the second is a predicate function and optional
additional arguments are the numbers of the columns to be taken. Its behaviour
is better understood with examples.
These are multivariate records in which the wind speed in the first
meteorological station were greater than 18. See that in the lambda expression
the i-th component is refered to as v[i]
.
(%i1) load ("descriptive")$ (%i2) s2 : read_matrix (file_search ("wind.data"))$ (%i3) subsample (s2, lambda([v], v[1] > 18));
[ 19.38 15.37 15.12 23.09 25.25 ] [ ] [ 18.29 18.66 19.08 26.08 27.63 ] (%o3) [ ] [ 20.25 21.46 19.95 27.71 23.38 ] [ ] [ 18.79 18.96 14.46 26.38 21.84 ]
In the following example, we request only the first, second and fifth components of those records with wind speeds greater or equal than 16 in station number 1 and less than 25 knots in station number 4. The sample contains only data from stations 1, 2 and 5. In this case, the predicate function is defined as an ordinary Maxima function.
(%i1) load ("descriptive")$ (%i2) s2 : read_matrix (file_search ("wind.data"))$ (%i3) g(x):= x[1] >= 16 and x[4] < 25$ (%i4) subsample (s2, g, 1, 2, 5);
[ 19.38 15.37 25.25 ] [ ] [ 17.33 14.67 19.58 ] (%o4) [ ] [ 16.92 13.21 21.21 ] [ ] [ 17.25 18.46 23.87 ]
Here is an example with the categorical variables of biomed.data
.
We want the records corresponding to those patients in group B
who are older than 38 years.
(%i1) load ("descriptive")$ (%i2) s3 : read_matrix (file_search ("biomed.data"))$ (%i3) h(u):= u[1] = B and u[2] > 38 $ (%i4) subsample (s3, h);
[ B 39 28.0 102.3 17.1 146 ] [ ] [ B 39 21.0 92.4 10.3 197 ] [ ] [ B 39 23.0 111.5 10.0 133 ] [ ] [ B 39 26.0 92.6 12.3 196 ] (%o4) [ ] [ B 39 25.0 98.7 10.0 174 ] [ ] [ B 39 21.0 93.2 5.9 181 ] [ ] [ B 39 18.0 95.0 11.3 66 ] [ ] [ B 39 39.0 88.5 7.6 168 ]
Probably, the statistical analysis will involve only the blood measures,
(%i1) load ("descriptive")$ (%i2) s3 : read_matrix (file_search ("biomed.data"))$ (%i3) subsample (s3, lambda([v], v[1] = B and v[2] > 38), 3, 4, 5, 6);
[ 28.0 102.3 17.1 146 ] [ ] [ 21.0 92.4 10.3 197 ] [ ] [ 23.0 111.5 10.0 133 ] [ ] [ 26.0 92.6 12.3 196 ] (%o3) [ ] [ 25.0 98.7 10.0 174 ] [ ] [ 21.0 93.2 5.9 181 ] [ ] [ 18.0 95.0 11.3 66 ] [ ] [ 39.0 88.5 7.6 168 ]
This is the multivariate mean of s3
,
(%i1) load ("descriptive")$ (%i2) s3 : read_matrix (file_search ("biomed.data"))$ (%i3) mean (s3);
65 B + 35 A 317 6 NA + 8144.999999999999 (%o3) [-----------, ---, 87.178, ------------------------, 100 10 100 3 NA + 19587 18.123, ------------] 100
Here, the first component is meaningless, since A
and B
are
categorical, the second component is the mean age of individuals in rational
form, and the fourth and last values exhibit some strange behaviour. This is
because symbol NA
is used here to indicate non available data,
and the two means are nonsense. A possible solution would be to take out from
the matrix those rows with NA
symbols, although this deserves some
loss of information.
(%i1) load ("descriptive")$ (%i2) s3 : read_matrix (file_search ("biomed.data"))$ (%i3) g(v):= v[4] # NA and v[6] # NA $ (%i4) mean (subsample (s3, g, 3, 4, 5, 6));
(%o4) [79.4923076923077, 86.2032967032967, 16.93186813186813, 2514 ----] 13
Nächste: Functions and Variables for descriptive statistics, Vorige: Introduction to descriptive, Nach oben: Package descriptive [Inhalt][Index]