@c Language: Brazilian Portuguese, Encoding: iso-8859-1
@c /descriptive.texi/1.8/Mon Jul 24 10:59:45 2006//
@menu
* address@hidden@~ao ao pacote descriptive::
* address@hidden@~oes para address@hidden@~ao da dados::
* address@hidden@~oes para estatistica descritiva::
* address@hidden@~oes para specific multivariate descriptive statistics::
* address@hidden@~oes para statistical graphs::
@end menu

@node address@hidden@~ao ao pacote descriptive, address@hidden@~oes para address@hidden@~ao da dados, descriptive, descriptive
@section Introduction to package descriptive

Package @code{descriptive} contains a set of functions for making descriptive statistical computations and graphing. Together with the source code there are three data sets in your Maxima tree: @code{pidigits.data}, @code{wind.data} and @code{biomed.data}. They can also be downloaded from the web site @code{www.biomates.net}.

Any statistics manual can be used as a reference to the functions in package @code{descriptive}.

For comments, bugs or suggestions, please contact me at @var{'mario AT edu DOT xunta DOT es'}.

Here is a simple example of how the functions in @code{descriptive} work, depending on the nature of their arguments, lists or matrices,

@c ===beg===
@c load (descriptive)$
@c /* univariate sample */   mean ([a, b, c]);
@c matrix ([a, b], [c, d], [e, f]);
@c /* multivariate sample */ mean (%);
@c ===end===
@example
(%i1) load (descriptive)$
(%i2) /* univariate sample */   mean ([a, b, c]);
                            c + b + a
(%o2)                       ---------
                                3
(%i3) matrix ([a, b], [c, d], [e, f]);
                            [ a  b ]
                            [      ]
(%o3)                       [ c  d ]
                            [      ]
                            [ e  f ]
(%i4) /* multivariate sample */ mean (%);
                      e + c + a  f + d + b
(%o4)                [---------, ---------]
                          3          3
@end example

Note that in multivariate samples the mean is calculated for each column.
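
The column-wise behaviour can be checked directly. The following sketch (input only; the matrix @code{m} is a made-up example) compares @code{mean} applied to a whole matrix with @code{mean} applied to each column separately, using @code{transpose (m)[j]} to extract column @var{j} as a list. Both inputs should return the same list of column means.

@example
load (descriptive)$
m : matrix ([1, 2], [3, 4], [5, 6])$
/* one mean per column of the matrix */
mean (m);
/* the same means, column by column: transpose (m)[j] is column j as a list */
makelist (mean (transpose (m)[j]), j, 1, 2);
@end example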

In case of several samples with possibly different sizes, the Maxima function @code{map} can be used to get the desired results for each sample,

@c ===beg===
@c load (descriptive)$
@c map (mean, [[a, b, c], [d, e]]);
@c ===end===
@example
(%i1) load (descriptive)$
(%i2) map (mean, [[a, b, c], [d, e]]);
                        c + b + a  e + d
(%o2)                  [---------, -----]
                            3        2
@end example

In this case, two samples of sizes 3 and 2 were stored in a list.

Univariate samples must be stored in lists like

@c ===beg===
@c s1 : [3, 1, 4, 1, 5, 9, 2, 6, 5, 3, 5];
@c ===end===
@example
(%i1) s1 : [3, 1, 4, 1, 5, 9, 2, 6, 5, 3, 5];
(%o1)               [3, 1, 4, 1, 5, 9, 2, 6, 5, 3, 5]
@end example

and multivariate samples in matrices as in

@c ===beg===
@c s2 : matrix ([13.17, 9.29], [14.71, 16.88], [18.50, 16.88],
@c              [10.58, 6.63], [13.33, 13.25], [13.21, 8.12]);
@c ===end===
@example
(%i1) s2 : matrix ([13.17, 9.29], [14.71, 16.88], [18.50, 16.88],
             [10.58, 6.63], [13.33, 13.25], [13.21, 8.12]);
                        [ 13.17  9.29  ]
                        [              ]
                        [ 14.71  16.88 ]
                        [              ]
                        [ 18.5   16.88 ]
(%o1)                   [              ]
                        [ 10.58  6.63  ]
                        [              ]
                        [ 13.33  13.25 ]
                        [              ]
                        [ 13.21  8.12  ]
@end example

In this case, the number of columns equals the random variable dimension and the number of rows is the sample size.

Data can be entered by hand, but big samples are usually stored in plain text files. For example, file @code{pidigits.data} contains the first 100 digits of number @code{%pi}:
@example
      3
      1
      4
      1
      5
      9
      2
      6
      5
      3 ...
@end example

In order to load these digits in Maxima,

@c ===beg===
@c load (numericalio)$
@c s1 : read_list (file_search ("pidigits.data"))$
@c length (s1);
@c ===end===
@example
(%i1) load (numericalio)$
(%i2) s1 : read_list (file_search ("pidigits.data"))$
(%i3) length (s1);
(%o3)                          100
@end example

On the other hand, file @code{wind.data} contains daily average wind speeds at 5 meteorological stations in the Republic of Ireland (this is part of a data set taken at 12 meteorological stations.
The original file is freely downloadable from the StatLib Data Repository and its analysis is discussed in Haslett, J., Raftery, A. E. (1989) @var{Space-time Modelling with Long-memory Dependence: Assessing Ireland's Wind Power Resource, with Discussion}. Applied Statistics 38, 1-50). This loads the data:

@c ===beg===
@c load (numericalio)$
@c s2 : read_matrix (file_search ("wind.data"))$
@c length (s2);
@c s2 [%]; /* last record */
@c ===end===
@example
(%i1) load (numericalio)$
(%i2) s2 : read_matrix (file_search ("wind.data"))$
(%i3) length (s2);
(%o3)                          100
(%i4) s2 [%]; /* last record */
(%o4)            [3.58, 6.0, 4.58, 7.62, 11.25]
@end example

Some samples contain non-numeric data. As an example, file @code{biomed.data} (which is part of a bigger one downloaded from the StatLib Data Repository) contains four blood measurements taken from two groups of patients, @code{A} and @code{B}, of different ages,

@c ===beg===
@c load (numericalio)$
@c s3 : read_matrix (file_search ("biomed.data"))$
@c length (s3);
@c s3 [1]; /* first record */
@c ===end===
@example
(%i1) load (numericalio)$
(%i2) s3 : read_matrix (file_search ("biomed.data"))$
(%i3) length (s3);
(%o3)                          100
(%i4) s3 [1]; /* first record */
(%o4)            [A, 30, 167.0, 89.0, 25.6, 364]
@end example

The first individual belongs to group @code{A}, is 30 years old and his/her blood measurements were 167.0, 89.0, 25.6 and 364.

One must take care when working with categorical data.
In the next example, symbol @code{a} has been assigned a value at some earlier point, and then a sample with categorical value @code{a} is taken,

@c ===beg===
@c a : 1$
@c matrix ([a, 3], [b, 5]);
@c ===end===
@example
(%i1) a : 1$
(%i2) matrix ([a, 3], [b, 5]);
                            [ 1  3 ]
(%o2)                       [      ]
                            [ b  5 ]
@end example

Because @code{a} evaluates to 1, the categorical value @code{a} has been silently replaced by the number 1 in the sample.

@node address@hidden@~oes para address@hidden@~ao da dados, address@hidden@~oes para estatistica descritiva, address@hidden@~ao ao pacote descriptive, descriptive
@section Definitions for data manipulation

@deffn {Function} continuous_freq (@var{list})
@deffnx {Function} continuous_freq (@var{list}, @var{m})

The argument of @code{continuous_freq} must be a list of numbers, which are grouped into intervals; the function then counts how many of them belong to each interval. Optionally, @code{continuous_freq} admits a second argument indicating the number of classes; the default is 10,

@c ===beg===
@c load (numericalio)$
@c load (descriptive)$
@c s1 : read_list (file_search ("pidigits.data"))$
@c continuous_freq (s1, 5);
@c ===end===
@example
(%i1) load (numericalio)$
(%i2) load (descriptive)$
(%i3) s1 : read_list (file_search ("pidigits.data"))$
(%i4) continuous_freq (s1, 5);
(%o4) [[0, 1.8, 3.6, 5.4, 7.2, 9.0], [16, 24, 18, 17, 25]]
@end example

The first list contains the interval limits and the second the corresponding counts: there are 16 digits inside the interval @code{[0, 1.8]}, that is 0's and 1's, 24 digits in @code{(1.8, 3.6]}, that is 2's and 3's, and so on.

@end deffn

@deffn {Function} discrete_freq (@var{list})
Counts absolute frequencies in discrete samples, both numeric and categorical.
Its only argument is a list,

@c ===beg===
@c load (descriptive)$
@c load (numericalio)$
@c s1 : read_list (file_search ("pidigits.data"));
@c discrete_freq (s1);
@c ===end===
@example
(%i1) load (descriptive)$
(%i2) load (numericalio)$
(%i3) s1 : read_list (file_search ("pidigits.data"));
(%o3) [3, 1, 4, 1, 5, 9, 2, 6, 5, 3, 5, 8, 9, 7, 9, 3, 2, 3, 8, 4, 6, 2, 6, 4, 3,
3, 8, 3, 2, 7, 9, 5, 0, 2, 8, 8, 4, 1, 9, 7, 1, 6, 9, 3, 9, 9, 3, 7, 5, 1,
0, 5, 8, 2, 0, 9, 7, 4, 9, 4, 4, 5, 9, 2, 3, 0, 7, 8, 1, 6, 4, 0, 6, 2, 8,
6, 2, 0, 8, 9, 9, 8, 6, 2, 8, 0, 3, 4, 8, 2, 5, 3, 4, 2, 1, 1, 7, 0, 6, 7]
(%i4) discrete_freq (s1);
(%o4) [[0, 1, 2, 3, 4, 5, 6, 7, 8, 9],
                                      [8, 8, 12, 12, 10, 8, 9, 8, 12, 13]]
@end example

The first list gives the sample values and the second their absolute frequencies. To apply this function to a column of a data matrix, that column must first be extracted as a list; commands @code{? col} and @code{? transpose} describe functions that help with this.

@end deffn

@deffn {Function} subsample (@var{data_matrix}, @var{logical_expression})
@deffnx {Function} subsample (@var{data_matrix}, @var{logical_expression}, @var{col_num}, @var{col_num}, ...)

This is a sort of variation of the Maxima @code{submatrix} function. The first argument is the name of the data matrix, the second is a quoted logical expression and the optional additional arguments are the numbers of the columns to be taken. Its behaviour is better understood with examples,

@c ===beg===
@c load (descriptive)$
@c load (numericalio)$
@c s2 : read_matrix (file_search ("wind.data"))$
@c subsample (s2, '(%c[1] > 18));
@c ===end===
@example
(%i1) load (descriptive)$
(%i2) load (numericalio)$
(%i3) s2 : read_matrix (file_search ("wind.data"))$
(%i4) subsample (s2, '(%c[1] > 18));
              [ 19.38  15.37  15.12  23.09  25.25 ]
              [                                   ]
              [ 18.29  18.66  19.08  26.08  27.63 ]
(%o4)         [                                   ]
              [ 20.25  21.46  19.95  27.71  23.38 ]
              [                                   ]
              [ 18.79  18.96  14.46  26.38  21.84 ]
@end example

These are multivariate records in which the wind speeds at the first meteorological station were greater than 18.
Note that in the quoted logical expression the @var{i}-th component is referred to as @code{%c[i]}. Since symbol @code{%c[i]} is used internally by function @code{subsample}, it must not be used as a categorical value in the sample, or Maxima gets confused.

In the following example, we request only the first, second and fifth components of those records with wind speeds greater than or equal to 16 knots at station number 1 and less than 25 knots at station number 4,

@c ===beg===
@c load (descriptive)$
@c load (numericalio)$
@c s2 : read_matrix (file_search ("wind.data"))$
@c subsample (s2, '(%c[1] >= 16 and %c[4] < 25), 1, 2, 5);
@c ===end===
@example
(%i1) load (descriptive)$
(%i2) load (numericalio)$
(%i3) s2 : read_matrix (file_search ("wind.data"))$
(%i4) subsample (s2, '(%c[1] >= 16 and %c[4] < 25), 1, 2, 5);
                     [ 19.38  15.37  25.25 ]
                     [                     ]
                     [ 17.33  14.67  19.58 ]
(%o4)                [                     ]
                     [ 16.92  13.21  21.21 ]
                     [                     ]
                     [ 17.25  18.46  23.87 ]
@end example

Here is an example with the categorical variables of @code{biomed.data}. We want the records corresponding to those patients in group @code{B} who are older than 38 years,

@c ===beg===
@c load (descriptive)$
@c load (numericalio)$
@c s3 : read_matrix (file_search ("biomed.data"))$
@c subsample (s3, '(%c[1] = B and %c[2] > 38));
@c ===end===
@example
(%i1) load (descriptive)$
(%i2) load (numericalio)$
(%i3) s3 : read_matrix (file_search ("biomed.data"))$
(%i4) subsample (s3, '(%c[1] = B and %c[2] > 38));
             [ B  39  28.0  102.3  17.1  146 ]
             [                               ]
             [ B  39  21.0  92.4   10.3  197 ]
             [                               ]
             [ B  39  23.0  111.5  10.0  133 ]
             [                               ]
             [ B  39  26.0  92.6   12.3  196 ]
(%o4)        [                               ]
             [ B  39  25.0  98.7   10.0  174 ]
             [                               ]
             [ B  39  21.0  93.2   5.9   181 ]
             [                               ]
             [ B  39  18.0  95.0   11.3  66  ]
             [                               ]
             [ B  39  39.0  88.5   7.6   168 ]
@end example

The statistical analysis will probably involve only the blood measurements,

@c ===beg===
@c load (descriptive)$
@c load (numericalio)$
@c s3 : read_matrix (file_search ("biomed.data"))$
@c subsample (s3, '(%c[1] = B and %c[2] > 38), 3, 4, 5, 6);
@c ===end===
@example
(%i1) load (descriptive)$
(%i2) load (numericalio)$
(%i3) s3 : read_matrix (file_search ("biomed.data"))$
(%i4) subsample (s3, '(%c[1] = B and %c[2] > 38), 3, 4, 5, 6);
                   [ 28.0  102.3  17.1  146 ]
                   [                        ]
                   [ 21.0  92.4   10.3  197 ]
                   [                        ]
                   [ 23.0  111.5  10.0  133 ]
                   [                        ]
                   [ 26.0  92.6   12.3  196 ]
(%o4)              [                        ]
                   [ 25.0  98.7   10.0  174 ]
                   [                        ]
                   [ 21.0  93.2   5.9   181 ]
                   [                        ]
                   [ 18.0  95.0   11.3  66  ]
                   [                        ]
                   [ 39.0  88.5   7.6   168 ]
@end example

This is the multivariate mean of @code{s3},

@c ===beg===
@c load (descriptive)$
@c load (numericalio)$
@c s3 : read_matrix (file_search ("biomed.data"))$
@c mean (s3);
@c ===end===
@example
(%i1) load (descriptive)$
(%i2) load (numericalio)$
(%i3) s3 : read_matrix (file_search ("biomed.data"))$
(%i4) mean (s3);
       65 B + 35 A  317          6 NA + 8145.0
(%o4) [-----------, ---, 87.178, -------------, 18.123,
           100      10                100
                                                     3 NA + 19587
                                                     ------------]
                                                         100
@end example

Here, the first component is meaningless, since @code{A} and @code{B} are categorical; the second component is the mean age of the individuals in rational form; and the fourth and last values exhibit some strange behaviour. This is because symbol @code{NA} is used here to indicate @var{not available} data, so these two means are of course nonsense.
A possible solution would be to take out from the matrix those rows with @code{NA} symbols, although this entails some loss of information,

@c ===beg===
@c load (descriptive)$
@c load (numericalio)$
@c s3 : read_matrix (file_search ("biomed.data"))$
@c mean (subsample (s3, '(%c[4] # NA and %c[6] # NA), 3, 4, 5, 6));
@c ===end===
@example
(%i1) load (descriptive)$
(%i2) load (numericalio)$
(%i3) s3 : read_matrix (file_search ("biomed.data"))$
(%i4) mean (subsample (s3, '(%c[4] # NA and %c[6] # NA), 3, 4, 5, 6));
(%o4) [79.4923076923077, 86.2032967032967, 16.93186813186813,
                                                            2514
                                                            ----]
                                                             13
@end example

@end deffn

@node address@hidden@~oes para estatistica descritiva, address@hidden@~oes para specific multivariate descriptive statistics, address@hidden@~oes para address@hidden@~ao da dados, descriptive
@section Definitions for descriptive statistics

@deffn {Function} mean (@var{list})
@deffnx {Function} mean (@var{matrix})
This is the sample mean, defined as
@ifhtml
@example
                       n
                     ====
             _   1   \
             x = -    >    x
                 n   /      i
                     ====
                     i = 1
@end example
@end ifhtml
@ifinfo
@example
                       n
                     ====
             _   1   \
             x = -    >    x
                 n   /      i
                     ====
                     i = 1
@end example
@end ifinfo
@tex
$${\bar{x}={1\over{n}}{\sum_{i=1}^{n}{x_{i}}}}$$
@end tex

Example:

@c ===beg===
@c load (descriptive)$
@c load (numericalio)$
@c s1 : read_list (file_search ("pidigits.data"))$
@c mean (s1);
@c %, numer;
@c s2 : read_matrix (file_search ("wind.data"))$
@c mean (s2);
@c ===end===
@example
(%i1) load (descriptive)$
(%i2) load (numericalio)$
(%i3) s1 : read_list (file_search ("pidigits.data"))$
(%i4) mean (s1);
                               471
(%o4)                          ---
                               100
(%i5) %, numer;
(%o5)                          4.71
(%i6) s2 : read_matrix (file_search ("wind.data"))$
(%i7) mean (s2);
(%o7)     [9.9485, 10.1607, 10.8685, 15.7166, 14.8441]
@end example

@end deffn

@deffn {Function} var (@var{list})
@deffnx {Function} var (@var{matrix})
This is the sample variance with denominator @math{n}, defined as
@ifhtml
@example
                     n
                   ====
           2   1   \          _ 2
          s  = -    >    (x - x)
               n   /       i
                   ====
                   i = 1
@end example
@end ifhtml
@ifinfo
@example
                     n
                   ====
           2   1   \          _ 2
          s  = -    >    (x - x)
               n   /       i
                   ====
                   i = 1
@end example
@end ifinfo
@tex
$${s^{2}={1\over{n}}{\sum_{i=1}^{n}{(x_{i}-\bar{x})^2}}}$$
@end tex

Example:

@c ===beg===
@c load (descriptive)$
@c load (numericalio)$
@c s1 : read_list (file_search ("pidigits.data"))$
@c var (s1), numer;
@c ===end===
@example
(%i1) load (descriptive)$
(%i2) load (numericalio)$
(%i3) s1 : read_list (file_search ("pidigits.data"))$
(%i4) var (s1), numer;
(%o4)                   8.425899999999999
@end example

See also function @code{var1}.

@end deffn

@deffn {Function} var1 (@var{list})
@deffnx {Function} var1 (@var{matrix})
This is the sample variance with denominator @math{n-1}, defined as
@ifhtml
@example
                    n
                  ====
           1      \          _ 2
          ---      >    (x - x)
          n-1     /       i
                  ====
                  i = 1
@end example
@end ifhtml
@ifinfo
@example
                    n
                  ====
           1      \          _ 2
          ---      >    (x - x)
          n-1     /       i
                  ====
                  i = 1
@end example
@end ifinfo
@tex
$${{1\over{n-1}}{\sum_{i=1}^{n}{(x_{i}-\bar{x})^2}}}$$
@end tex

Example:

@c ===beg===
@c load (descriptive)$
@c load (numericalio)$
@c s1 : read_list (file_search ("pidigits.data"))$
@c var1 (s1), numer;
@c s2 : read_matrix (file_search ("wind.data"))$
@c var1 (s2);
@c ===end===
@example
(%i1) load (descriptive)$
(%i2) load (numericalio)$
(%i3) s1 : read_list (file_search ("pidigits.data"))$
(%i4) var1 (s1), numer;
(%o4)                    8.5110101010101
(%i5) s2 : read_matrix (file_search ("wind.data"))$
(%i6) var1 (s2);
(%o6) [17.39586540404041, 15.13912778787879, 15.63204924242424,
                                        32.50152569696971, 24.66977392929294]
@end example

See also function @code{var}.

@end deffn

@deffn {Function} std (@var{list})
@deffnx {Function} std (@var{matrix})
This is the square root of function @code{var}, the variance with denominator @math{n}.
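
As a sketch of the definitions of @code{var} and @code{std}, the variance can be computed by hand with denominator @math{n} and compared with the package functions. The list @code{x} is a made-up sample chosen so that its mean is 5 and its variance with denominator @math{n} is 4.

@example
load (descriptive)$
x : [2, 4, 4, 4, 5, 5, 7, 9]$
n : length (x)$
xbar : apply ("+", x) / n$              /* sample mean */
v : apply ("+", (x - xbar)^2) / n$      /* variance with denominator n */
[v, var (x)];
[sqrt (v), std (x)];
@end example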
Example:

@c ===beg===
@c load (descriptive)$
@c load (numericalio)$
@c s1 : read_list (file_search ("pidigits.data"))$
@c std (s1), numer;
@c s2 : read_matrix (file_search ("wind.data"))$
@c std (s2);
@c ===end===
@example
(%i1) load (descriptive)$
(%i2) load (numericalio)$
(%i3) s1 : read_list (file_search ("pidigits.data"))$
(%i4) std (s1), numer;
(%o4)                   2.902740084816414
(%i5) s2 : read_matrix (file_search ("wind.data"))$
(%i6) std (s2);
(%o6) [4.149928523480858, 3.871399812729241, 3.933920277534866,
                                        5.672434260526957, 4.941970881136392]
@end example

See also functions @code{var} and @code{std1}.

@end deffn

@deffn {Function} std1 (@var{list})
@deffnx {Function} std1 (@var{matrix})
This is the square root of function @code{var1}, the variance with denominator @math{n-1}.

Example:

@c ===beg===
@c load (descriptive)$
@c load (numericalio)$
@c s1 : read_list (file_search ("pidigits.data"))$
@c std1 (s1), numer;
@c s2 : read_matrix (file_search ("wind.data"))$
@c std1 (s2);
@c ===end===
@example
(%i1) load (descriptive)$
(%i2) load (numericalio)$
(%i3) s1 : read_list (file_search ("pidigits.data"))$
(%i4) std1 (s1), numer;
(%o4)                   2.917363553109228
(%i5) s2 : read_matrix (file_search ("wind.data"))$
(%i6) std1 (s2);
(%o6) [4.17083509672109, 3.89090320978032, 3.953738641137555,
                                         5.701010936401517, 4.966867617451963]
@end example

See also functions @code{var1} and @code{std}.
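
Since @code{var} divides by @math{n} and @code{var1} by @math{n-1}, the two should differ only by the factor @math{n/(n-1)}, and @code{std1} should be the square root of @code{var1}. A minimal check on a made-up list @code{x}:

@example
load (descriptive)$
x : [2, 4, 4, 4, 5, 5, 7, 9]$
n : length (x)$
/* should be zero, up to rounding */
var1 (x) - var (x) * n / (n - 1);
/* std1 is the square root of var1: also zero, up to rounding */
std1 (x) - sqrt (var1 (x));
@end example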
@end deffn

@deffn {Function} noncentral_moment (@var{list}, @var{k})
@deffnx {Function} noncentral_moment (@var{matrix}, @var{k})
The non central moment of order @math{k}, defined as
@ifhtml
@example
                     n
                   ====
               1   \      k
               -    >    x
               n   /      i
                   ====
                   i = 1
@end example
@end ifhtml
@ifinfo
@example
                     n
                   ====
               1   \      k
               -    >    x
               n   /      i
                   ====
                   i = 1
@end example
@end ifinfo
@tex
$${{1\over{n}}{\sum_{i=1}^{n}{x_{i}^k}}}$$
@end tex

Example:

@c ===beg===
@c load (descriptive)$
@c load (numericalio)$
@c s1 : read_list (file_search ("pidigits.data"))$
@c noncentral_moment (s1, 1), numer; /* the mean */
@c s2 : read_matrix (file_search ("wind.data"))$
@c noncentral_moment (s2, 5);
@c ===end===
@example
(%i1) load (descriptive)$
(%i2) load (numericalio)$
(%i3) s1 : read_list (file_search ("pidigits.data"))$
(%i4) noncentral_moment (s1, 1), numer; /* the mean */
(%o4)                         4.71
(%i5) s2 : read_matrix (file_search ("wind.data"))$
(%i6) noncentral_moment (s2, 5);
(%o6) [319793.8724761506, 320532.1923892463, 391249.5621381556,
                                        2502278.205988911, 1691881.797742255]
@end example

See also function @code{central_moment}.
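
Central and non central moments are related; for instance, the second central moment equals the second non central moment minus the squared mean. A sketch with the made-up list @code{x}, comparing the three equivalent expressions:

@example
load (descriptive)$
x : [2, 4, 4, 4, 5, 5, 7, 9]$
m1 : noncentral_moment (x, 1)$    /* the mean */
m2 : noncentral_moment (x, 2)$
/* second central moment = m2 - m1^2, which is also var (x) */
[m2 - m1^2, central_moment (x, 2), var (x)];
@end example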
@end deffn

@deffn {Function} central_moment (@var{list}, @var{k})
@deffnx {Function} central_moment (@var{matrix}, @var{k})
The central moment of order @math{k}, defined as
@ifhtml
@example
                     n
                   ====
               1   \          _ k
               -    >    (x - x)
               n   /       i
                   ====
                   i = 1
@end example
@end ifhtml
@ifinfo
@example
                     n
                   ====
               1   \          _ k
               -    >    (x - x)
               n   /       i
                   ====
                   i = 1
@end example
@end ifinfo
@tex
$${{1\over{n}}{\sum_{i=1}^{n}{(x_{i}-\bar{x})^k}}}$$
@end tex

Example:

@c ===beg===
@c load (descriptive)$
@c load (numericalio)$
@c s1 : read_list (file_search ("pidigits.data"))$
@c central_moment (s1, 2), numer; /* the variance */
@c s2 : read_matrix (file_search ("wind.data"))$
@c central_moment (s2, 3);
@c ===end===
@example
(%i1) load (descriptive)$
(%i2) load (numericalio)$
(%i3) s1 : read_list (file_search ("pidigits.data"))$
(%i4) central_moment (s1, 2), numer; /* the variance */
(%o4)                   8.425899999999999
(%i5) s2 : read_matrix (file_search ("wind.data"))$
(%i6) central_moment (s2, 3);
(%o6) [11.29584771375004, 16.97988248298583, 5.626661952750102,
                                         37.5986572057918, 25.85981904394192]
@end example

See also functions @code{noncentral_moment} and @code{mean}.

@end deffn

@deffn {Function} cv (@var{list})
@deffnx {Function} cv (@var{matrix})
The coefficient of variation is the quotient between the sample standard deviation (@code{std1}) and the @code{mean},

@c ===beg===
@c load (descriptive)$
@c load (numericalio)$
@c s1 : read_list (file_search ("pidigits.data"))$
@c cv (s1), numer;
@c s2 : read_matrix (file_search ("wind.data"))$
@c cv (s2);
@c ===end===
@example
(%i1) load (descriptive)$
(%i2) load (numericalio)$
(%i3) s1 : read_list (file_search ("pidigits.data"))$
(%i4) cv (s1), numer;
(%o4)                   .6193977819764815
(%i5) s2 : read_matrix (file_search ("wind.data"))$
(%i6) cv (s2);
(%o6) [.4192426091090204, .3829365309260502, 0.363779605385983,
                                        .3627381836021478, .3346021393989506]
@end example

See also functions @code{std1} and @code{mean}.
@end deffn

@deffn {Function} mini (@var{list})
@deffnx {Function} mini (@var{matrix})
This is the minimum value of the sample @var{list}; for a matrix, one minimum per column is returned,

@c ===beg===
@c load (descriptive)$
@c load (numericalio)$
@c s1 : read_list (file_search ("pidigits.data"))$
@c mini (s1);
@c s2 : read_matrix (file_search ("wind.data"))$
@c mini (s2);
@c ===end===
@example
(%i1) load (descriptive)$
(%i2) load (numericalio)$
(%i3) s1 : read_list (file_search ("pidigits.data"))$
(%i4) mini (s1);
(%o4)                           0
(%i5) s2 : read_matrix (file_search ("wind.data"))$
(%i6) mini (s2);
(%o6)              [0.58, 0.5, 2.67, 5.25, 5.17]
@end example

See also function @code{maxi}.

@end deffn

@deffn {Function} maxi (@var{list})
@deffnx {Function} maxi (@var{matrix})
This is the maximum value of the sample @var{list}; for a matrix, one maximum per column is returned,

@c ===beg===
@c load (descriptive)$
@c load (numericalio)$
@c s1 : read_list (file_search ("pidigits.data"))$
@c maxi (s1);
@c s2 : read_matrix (file_search ("wind.data"))$
@c maxi (s2);
@c ===end===
@example
(%i1) load (descriptive)$
(%i2) load (numericalio)$
(%i3) s1 : read_list (file_search ("pidigits.data"))$
(%i4) maxi (s1);
(%o4)                           9
(%i5) s2 : read_matrix (file_search ("wind.data"))$
(%i6) maxi (s2);
(%o6)           [20.25, 21.46, 20.04, 29.63, 27.63]
@end example

See also function @code{mini}.

@end deffn

@deffn {Function} range (@var{list})
@deffnx {Function} range (@var{matrix})
The range is the difference between the extreme values.
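
A one-line check of this definition on a made-up list, whose extreme values are 9 and 2:

@example
load (descriptive)$
x : [2, 4, 4, 4, 5, 5, 7, 9]$
/* both should be 9 - 2 = 7 */
[range (x), maxi (x) - mini (x)];
@end example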
Example:

@c ===beg===
@c load (descriptive)$
@c load (numericalio)$
@c s1 : read_list (file_search ("pidigits.data"))$
@c range (s1);
@c s2 : read_matrix (file_search ("wind.data"))$
@c range (s2);
@c ===end===
@example
(%i1) load (descriptive)$
(%i2) load (numericalio)$
(%i3) s1 : read_list (file_search ("pidigits.data"))$
(%i4) range (s1);
(%o4)                           9
(%i5) s2 : read_matrix (file_search ("wind.data"))$
(%i6) range (s2);
(%o6)           [19.67, 20.96, 17.37, 24.38, 22.46]
@end example

@end deffn

@deffn {Function} quantile (@var{list}, @var{p})
@deffnx {Function} quantile (@var{matrix}, @var{p})
This is the @var{p}-quantile, with @var{p} a number in @math{[0, 1]}, of the sample @var{list}. Although there are several definitions for the sample quantile (Hyndman, R. J., Fan, Y. (1996) @var{Sample quantiles in statistical packages}. American Statistician, 50, 361-365), the one based on linear interpolation is implemented in package @code{descriptive}.

Example:

@c ===beg===
@c load (descriptive)$
@c load (numericalio)$
@c s1 : read_list (file_search ("pidigits.data"))$
@c /* 1st and 3rd quartiles */ [quantile (s1, 1/4), quantile (s1, 3/4)], numer;
@c s2 : read_matrix (file_search ("wind.data"))$
@c quantile (s2, 1/4);
@c ===end===
@example
(%i1) load (descriptive)$
(%i2) load (numericalio)$
(%i3) s1 : read_list (file_search ("pidigits.data"))$
(%i4) /* 1st and 3rd quartiles */ [quantile (s1, 1/4), quantile (s1, 3/4)], numer;
(%o4)                      [2.0, 7.25]
(%i5) s2 : read_matrix (file_search ("wind.data"))$
(%i6) quantile (s2, 1/4);
(%o6)    [7.2575, 7.477500000000001, 7.82, 11.28, 11.48]
@end example

@end deffn

@deffn {Function} median (@var{list})
@deffnx {Function} median (@var{matrix})
Once the sample is ordered, if the sample size is odd the median is the central value, otherwise it is the mean of the two central values.
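
The rule above can be sketched by hand with @code{sort}. The list @code{y} is a made-up sample of even size, so the two central values are averaged; the result is compared with @code{median}.

@example
load (descriptive)$
y : [3, 1, 4, 1, 5, 9]$
ys : sort (y)$                   /* [1, 1, 3, 4, 5, 9] */
n : length (ys)$
/* odd n: central value; even n: mean of the two central values */
med : if oddp (n) then ys[(n + 1)/2] else (ys[n/2] + ys[n/2 + 1]) / 2$
[med, median (y)];
@end example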
Example:

@c ===beg===
@c load (descriptive)$
@c load (numericalio)$
@c s1 : read_list (file_search ("pidigits.data"))$
@c median (s1);
@c s2 : read_matrix (file_search ("wind.data"))$
@c median (s2);
@c ===end===
@example
(%i1) load (descriptive)$
(%i2) load (numericalio)$
(%i3) s1 : read_list (file_search ("pidigits.data"))$
(%i4) median (s1);
                                9
(%o4)                           -
                                2
(%i5) s2 : read_matrix (file_search ("wind.data"))$
(%i6) median (s2);
(%o6)         [10.06, 9.855, 10.73, 15.48, 14.105]
@end example

The median is the 1/2-quantile.

See also function @code{quantile}.

@end deffn

@deffn {Function} qrange (@var{list})
@deffnx {Function} qrange (@var{matrix})
The interquartile range is the difference between the third and first quartiles, @code{quantile(list,3/4) - quantile(list,1/4)},

@c ===beg===
@c load (descriptive)$
@c load (numericalio)$
@c s1 : read_list (file_search ("pidigits.data"))$
@c qrange (s1);
@c s2 : read_matrix (file_search ("wind.data"))$
@c qrange (s2);
@c ===end===
@example
(%i1) load (descriptive)$
(%i2) load (numericalio)$
(%i3) s1 : read_list (file_search ("pidigits.data"))$
(%i4) qrange (s1);
                               21
(%o4)                          --
                               4
(%i5) s2 : read_matrix (file_search ("wind.data"))$
(%i6) qrange (s2);
(%o6) [5.385, 5.572499999999998, 6.0225, 8.729999999999999,
                                                        6.650000000000002]
@end example

See also function @code{quantile}.
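
Following the definition above, @code{qrange} can be checked against two calls to @code{quantile}. The list @code{x} is made up for illustration; no particular value is claimed, only that both expressions agree.

@example
load (descriptive)$
x : [2, 4, 4, 4, 5, 5, 7, 9]$
/* third quartile minus first quartile */
[qrange (x), quantile (x, 3/4) - quantile (x, 1/4)];
@end example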
@end deffn

@deffn {Function} mean_deviation (@var{list})
@deffnx {Function} mean_deviation (@var{matrix})
The mean deviation, defined as
@ifhtml
@example
                     n
                   ====
               1   \          _
               -    >    |x - x|
               n   /       i
                   ====
                   i = 1
@end example
@end ifhtml
@ifinfo
@example
                     n
                   ====
               1   \          _
               -    >    |x - x|
               n   /       i
                   ====
                   i = 1
@end example
@end ifinfo
@tex
$${{1\over{n}}{\sum_{i=1}^{n}{|x_{i}-\bar{x}|}}}$$
@end tex

Example:

@c ===beg===
@c load (descriptive)$
@c load (numericalio)$
@c s1 : read_list (file_search ("pidigits.data"))$
@c mean_deviation (s1);
@c s2 : read_matrix (file_search ("wind.data"))$
@c mean_deviation (s2);
@c ===end===
@example
(%i1) load (descriptive)$
(%i2) load (numericalio)$
(%i3) s1 : read_list (file_search ("pidigits.data"))$
(%i4) mean_deviation (s1);
                               51
(%o4)                          --
                               20
(%i5) s2 : read_matrix (file_search ("wind.data"))$
(%i6) mean_deviation (s2);
(%o6) [3.287959999999999, 3.075342, 3.23907, 4.715664000000001,
                                                        4.028546000000002]
@end example

See also function @code{mean}.

@end deffn

@deffn {Function} median_deviation (@var{list})
@deffnx {Function} median_deviation (@var{matrix})
The median deviation, defined as the median of the absolute deviations from the median,
@ifhtml
@example
          median (|x  - med|, ..., |x  - med|)
                    1                n
@end example
@end ifhtml
@ifinfo
@example
          median (|x  - med|, ..., |x  - med|)
                    1                n
@end example
@end ifinfo
@tex
$${{\rm median}\left(|x_{1}-med|,\ldots,|x_{n}-med|\right)}$$
@end tex
where @code{med} is the median of @var{list}.

Example:

@c ===beg===
@c load (descriptive)$
@c load (numericalio)$
@c s1 : read_list (file_search ("pidigits.data"))$
@c median_deviation (s1);
@c s2 : read_matrix (file_search ("wind.data"))$
@c median_deviation (s2);
@c ===end===
@example
(%i1) load (descriptive)$
(%i2) load (numericalio)$
(%i3) s1 : read_list (file_search ("pidigits.data"))$
(%i4) median_deviation (s1);
                                5
(%o4)                           -
                                2
(%i5) s2 : read_matrix (file_search ("wind.data"))$
(%i6) median_deviation (s2);
(%o6)           [2.75, 2.755, 3.08, 4.315, 3.31]
@end example

See also functions @code{median} and @code{mean_deviation}.
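
The example output for @code{s1} (5/2, while the mean absolute deviation from the median of these digits is 51/20) matches the median, rather than the mean, of the absolute deviations from the median. A minimal sketch of that computation on a made-up list @code{x}, to compare with the package function:

@example
load (descriptive)$
x : [2, 4, 4, 4, 5, 5, 7, 9]$
med : median (x)$
/* median of the absolute deviations |x[i] - med| */
[median (map (abs, x - med)), median_deviation (x)];
@end example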
@end deffn

@deffn {Function} harmonic_mean (@var{list})
@deffnx {Function} harmonic_mean (@var{matrix})
The harmonic mean, defined as
@ifhtml
@example
                      n
                   --------
                     n
                   ====
                   \     1
                    >    --
                   /     x
                   ====   i
                   i = 1
@end example
@end ifhtml
@ifinfo
@example
                      n
                   --------
                     n
                   ====
                   \     1
                    >    --
                   /     x
                   ====   i
                   i = 1
@end example
@end ifinfo
@tex
$${{n}\over{\sum_{i=1}^{n}{{{1}\over{x_{i}}}}}}$$
@end tex

Example:

@c ===beg===
@c load (descriptive)$
@c load (numericalio)$
@c y : [5, 7, 2, 5, 9, 5, 6, 4, 9, 2, 4, 2, 5]$
@c harmonic_mean (y), numer;
@c s2 : read_matrix (file_search ("wind.data"))$
@c harmonic_mean (s2);
@c ===end===
@example
(%i1) load (descriptive)$
(%i2) load (numericalio)$
(%i3) y : [5, 7, 2, 5, 9, 5, 6, 4, 9, 2, 4, 2, 5]$
(%i4) harmonic_mean (y), numer;
(%o4)                   3.901858027632205
(%i5) s2 : read_matrix (file_search ("wind.data"))$
(%i6) harmonic_mean (s2);
(%o6) [6.948015590052786, 7.391967752360356, 9.055658197151745,
                                        13.44199028193692, 13.01439145898509]
@end example

See also functions @code{mean} and @code{geometric_mean}.

@end deffn

@deffn {Function} geometric_mean (@var{list})
@deffnx {Function} geometric_mean (@var{matrix})
The geometric mean, defined as
@ifhtml
@example
                 /  n      \ 1/n
                 | /===\   |
                 |  ! !    |
                 |  ! !  x |
                 |  ! !   i|
                 | i = 1   |
                 \         /
@end example
@end ifhtml
@ifinfo
@example
                 /  n      \ 1/n
                 | /===\   |
                 |  ! !    |
                 |  ! !  x |
                 |  ! !   i|
                 | i = 1   |
                 \         /
@end example
@end ifinfo
@tex
$$\left(\prod_{i=1}^{n}{x_{i}}\right)^{{{1}\over{n}}}$$
@end tex

Example:

@c ===beg===
@c load (descriptive)$
@c load (numericalio)$
@c y : [5, 7, 2, 5, 9, 5, 6, 4, 9, 2, 4, 2, 5]$
@c geometric_mean (y), numer;
@c s2 : read_matrix (file_search ("wind.data"))$
@c geometric_mean (s2);
@c ===end===
@example
(%i1) load (descriptive)$
(%i2) load (numericalio)$
(%i3) y : [5, 7, 2, 5, 9, 5, 6, 4, 9, 2, 4, 2, 5]$
(%i4) geometric_mean (y), numer;
(%o4)                   4.454845412337012
(%i5) s2 : read_matrix (file_search ("wind.data"))$
(%i6) geometric_mean (s2);
(%o6) [8.82476274347979, 9.22652604739361, 10.0442675714889,
                                        14.61274126349021, 13.96184163444275]
@end example

See also functions @code{mean} and @code{harmonic_mean}.

@end deffn

@deffn {Function} kurtosis (@var{list})
@deffnx {Function} kurtosis (@var{matrix})
The kurtosis coefficient, defined as
@ifhtml
@example
                     n
                   ====
             1     \          _ 4
            ----    >    (x - x)  - 3
               4   /       i
            n s    ====
                   i = 1
@end example
@end ifhtml
@ifinfo
@example
                     n
                   ====
             1     \          _ 4
            ----    >    (x - x)  - 3
               4   /       i
            n s    ====
                   i = 1
@end example
@end ifinfo
@tex
$${{1\over{n s^4}}{\sum_{i=1}^{n}{(x_{i}-\bar{x})^4}}-3}$$
@end tex

Example:

@c ===beg===
@c load (descriptive)$
@c load (numericalio)$
@c s1 : read_list (file_search ("pidigits.data"))$
@c kurtosis (s1), numer;
@c s2 : read_matrix (file_search ("wind.data"))$
@c kurtosis (s2);
@c ===end===
@example
(%i1) load (descriptive)$
(%i2) load (numericalio)$
(%i3) s1 : read_list (file_search ("pidigits.data"))$
(%i4) kurtosis (s1), numer;
(%o4)                  - 1.273247946514421
(%i5) s2 : read_matrix (file_search ("wind.data"))$
(%i6) kurtosis (s2);
(%o6) [- .2715445622195385, 0.119998784429451, - .4275233490482866,
                                      - .6405361979019522, - .4952382132352935]
@end example

See also functions @code{mean}, @code{var} and @code{skewness}.
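
Taking @math{s^2} as @code{var} (denominator @math{n}), which is consistent with the value shown above for @code{s1}, the coefficient is the fourth central moment divided by @code{var} squared, minus 3. A sketch on a made-up list:

@example
load (descriptive)$
x : [2, 4, 4, 4, 5, 5, 7, 9]$
/* fourth central moment over var squared, minus 3 */
k : central_moment (x, 4) / var (x)^2 - 3$
[k, kurtosis (x)];
@end example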
@end deffn

@deffn {Function} skewness (@var{list})
@deffnx {Function} skewness (@var{matrix})
The skewness coefficient, defined as
@ifhtml
@example
                     n
                   ====
             1     \          _ 3
            ----    >    (x - x)
               3   /       i
            n s    ====
                   i = 1
@end example
@end ifhtml
@ifinfo
@example
                     n
                   ====
             1     \          _ 3
            ----    >    (x - x)
               3   /       i
            n s    ====
                   i = 1
@end example
@end ifinfo
@tex
$${{1\over{n s^3}}{\sum_{i=1}^{n}{(x_{i}-\bar{x})^3}}}$$
@end tex

Example:

@c ===beg===
@c load (descriptive)$
@c load (numericalio)$
@c s1 : read_list (file_search ("pidigits.data"))$
@c skewness (s1), numer;
@c s2 : read_matrix (file_search ("wind.data"))$
@c skewness (s2);
@c ===end===
@example
(%i1) load (descriptive)$
(%i2) load (numericalio)$
(%i3) s1 : read_list (file_search ("pidigits.data"))$
(%i4) skewness (s1), numer;
(%o4)                  .009196180476450306
(%i5) s2 : read_matrix (file_search ("wind.data"))$
(%i6) skewness (s2);
(%o6) [.1580509020000979, .2926379232061854, .09242174416107717,
                                        .2059984348148687, .2142520248890832]
@end example

See also functions @code{mean}, @code{var} and @code{kurtosis}.

@end deffn

@deffn {Function} pearson_skewness (@var{list})
@deffnx {Function} pearson_skewness (@var{matrix})
Pearson's skewness coefficient, defined as
@ifhtml
@example
                     _
                  3 (x - med)
                  -----------
                       s
@end example
@end ifhtml
@ifinfo
@example
                     _
                  3 (x - med)
                  -----------
                       s
@end example
@end ifinfo
@tex
$${{3\,\left(\bar{x}-med\right)}\over{s}}$$
@end tex
where @var{med} is the median of @var{list}.
Example:

@c ===beg===
@c load (descriptive)$
@c load (numericalio)$
@c s1 : read_list (file_search ("pidigits.data"))$
@c pearson_skewness (s1), numer;
@c s2 : read_matrix (file_search ("wind.data"))$
@c pearson_skewness (s2);
@c ===end===
@example
(%i1) load (descriptive)$
(%i2) load (numericalio)$
(%i3) s1 : read_list (file_search ("pidigits.data"))$
(%i4) pearson_skewness (s1), numer;
(%o4)                   .2159484029093895
(%i5) s2 : read_matrix (file_search ("wind.data"))$
(%i6) pearson_skewness (s2);
(%o6) [- .08019976629211892, .2357036272952649, .1050904062491204,
                                         .1245042340592368, .4464181795804519]
@end example

See also functions @code{mean}, @code{var} and @code{median}.

@end deffn

@deffn {Function} quartile_skewness (@var{list})
@deffnx {Function} quartile_skewness (@var{matrix})
The quartile skewness coefficient, defined as
@ifhtml
@example
                 c    - 2 c    + c
                  3/4      1/2    1/4
                 --------------------
                     c    - c
                      3/4    1/4
@end example
@end ifhtml
@ifinfo
@example
                 c    - 2 c    + c
                  3/4      1/2    1/4
                 --------------------
                     c    - c
                      3/4    1/4
@end example
@end ifinfo
@tex
$${{c_{{{3}\over{4}}}-2\,c_{{{1}\over{2}}}+c_{{{1}\over{4}}}}\over{c _{{{3}\over{4}}}-c_{{{1}\over{4}}}}}$$
@end tex
where @math{c_p} is the @var{p}-quantile of sample @var{list}.

Example:

@c ===beg===
@c load (descriptive)$
@c load (numericalio)$
@c s1 : read_list (file_search ("pidigits.data"))$
@c quartile_skewness (s1), numer;
@c s2 : read_matrix (file_search ("wind.data"))$
@c quartile_skewness (s2);
@c ===end===
@example
(%i1) load (descriptive)$
(%i2) load (numericalio)$
(%i3) s1 : read_list (file_search ("pidigits.data"))$
(%i4) quartile_skewness (s1), numer;
(%o4)                  .04761904761904762
(%i5) s2 : read_matrix (file_search ("wind.data"))$
(%i6) quartile_skewness (s2);
(%o6) [- 0.0408542246982353, .1467025572005382, 0.0336239103362392,
                                        .03780068728522298, 0.210526315789474]
@end example

See also function @code{quantile}.
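
The formula can be evaluated directly from @code{quantile}. In the sketch below, @code{q} is a hypothetical helper defined only for readability, and @code{x} is made-up data; both expressions should agree.

@example
load (descriptive)$
x : [2, 4, 4, 4, 5, 5, 7, 9]$
q (p) := quantile (x, p)$      /* hypothetical helper: the p-quantile of x */
qs : (q (3/4) - 2 * q (1/2) + q (1/4)) / (q (3/4) - q (1/4))$
[qs, quartile_skewness (x)];
@end example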
@end deffn


@node address@hidden@~oes para specific multivariate descriptive statistics, address@hidden@~oes para statistical graphs, address@hidden@~oes para estatistica descritiva, descriptive
@section Definitions for specific multivariate descriptive statistics

@deffn {Function} cov (@var{matrix})
The covariance matrix of the multivariate sample, defined as

@ifhtml
@example
                   n
                  ====
              1   \         _        _
        S  =  -    >  (X  - X) (X  - X)'
              n   /     j        j
                  ====
                  j = 1
@end example
@end ifhtml
@ifinfo
@example
                   n
                  ====
              1   \         _        _
        S  =  -    >  (X  - X) (X  - X)'
              n   /     j        j
                  ====
                  j = 1
@end example
@end ifinfo
@tex
$${S={1\over{n}}{\sum_{j=1}^{n}{\left(X_{j}-\bar{X}\right)\,\left(X_{j}-\bar{X}\right)'}}}$$
@end tex

where @math{X_j} is the @math{j}-th row of the sample matrix.

Example:

@c ===beg===
@c load (descriptive)$
@c load (numericalio)$
@c s2 : read_matrix (file_search ("wind.data"))$
@c fpprintprec : 7$ /* change precision for pretty output */
@c cov (s2);
@c ===end===
@example
(%i1) load (descriptive)$
(%i2) load (numericalio)$
(%i3) s2 : read_matrix (file_search ("wind.data"))$
(%i4) fpprintprec : 7$ /* change precision for pretty output */
(%i5) cov (s2);
      [ 17.22191  13.61811  14.37217  19.39624  15.42162 ]
      [                                                  ]
      [ 13.61811  14.98774  13.30448  15.15834  14.9711  ]
      [                                                  ]
(%o5) [ 14.37217  13.30448  15.47573  17.32544  16.18171 ]
      [                                                  ]
      [ 19.39624  15.15834  17.32544  32.17651  20.44685 ]
      [                                                  ]
      [ 15.42162  14.9711   16.18171  20.44685  24.42308 ]
@end example

See also function @code{cov1}.

@end deffn

@deffn {Function} cov1 (@var{matrix})
The covariance matrix of the multivariate sample, defined as

@ifhtml
@example
                    n
                   ====
                1  \         _        _
        S   =  ---  >  (X  - X) (X  - X)'
         1     n-1 /     j        j
                   ====
                   j = 1
@end example
@end ifhtml
@ifinfo
@example
                    n
                   ====
                1  \         _        _
        S   =  ---  >  (X  - X) (X  - X)'
         1     n-1 /     j        j
                   ====
                   j = 1
@end example
@end ifinfo
@tex
$${S_1={1\over{n-1}}{\sum_{j=1}^{n}{\left(X_{j}-\bar{X}\right)\,\left(X_{j}-\bar{X}\right)'}}}$$
@end tex

where @math{X_j} is the @math{j}-th row of the sample matrix.
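
Since the two definitions differ only in the divisor, @code{cov1} equals @code{cov} multiplied by @math{n/(n-1)}. The following sketch checks this relation on a small hypothetical sample; the last expression should reduce to a zero matrix (up to rounding, if the data are floating point numbers).

@example
load (descriptive)$
/* hypothetical sample: n = 4 rows (observations), 2 columns */
x : matrix ([1, 2], [3, 5], [4, 4], [8, 9])$
n : length (x)$    /* number of rows = sample size */
S : cov (x)$       /* divisor n     */
S1 : cov1 (x)$     /* divisor n - 1 */
S1 - n/(n - 1) * S;
@end example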
Example:

@c ===beg===
@c load (descriptive)$
@c load (numericalio)$
@c s2 : read_matrix (file_search ("wind.data"))$
@c fpprintprec : 7$ /* change precision for pretty output */
@c cov1 (s2);
@c ===end===
@example
(%i1) load (descriptive)$
(%i2) load (numericalio)$
(%i3) s2 : read_matrix (file_search ("wind.data"))$
(%i4) fpprintprec : 7$ /* change precision for pretty output */
(%i5) cov1 (s2);
      [ 17.39587  13.75567  14.51734  19.59216  15.5774  ]
      [                                                  ]
      [ 13.75567  15.13913  13.43887  15.31145  15.12232 ]
      [                                                  ]
(%o5) [ 14.51734  13.43887  15.63205  17.50044  16.34516 ]
      [                                                  ]
      [ 19.59216  15.31145  17.50044  32.50153  20.65338 ]
      [                                                  ]
      [ 15.5774   15.12232  16.34516  20.65338  24.66977 ]
@end example

See also function @code{cov}.

@end deffn

@deffn {Function} global_variances (@var{matrix})
@deffnx {Function} global_variances (@var{matrix}, @var{logical_value})
Function @code{global_variances} returns a list of global variance measures:

@itemize @bullet
@item @var{total variance}: @code{trace(S_1)},
@item @var{mean variance}: @code{trace(S_1)/p},
@item @var{generalized variance}: @code{determinant(S_1)},
@item @var{generalized standard deviation}: @code{sqrt(determinant(S_1))},
@item @var{effective variance}: @code{determinant(S_1)^(1/p)} (defined in: Pe@~na, D. (2002) @var{An@'alisis de datos multivariantes}; McGraw-Hill, Madrid),
@item @var{effective standard deviation}: @code{determinant(S_1)^(1/(2*p))}.
@end itemize

where @var{p} is the dimension of the multivariate random variable and @math{S_1} the covariance matrix returned by @code{cov1}.
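
The six measures can be recomputed directly from the definitions in the list above. This sketch uses a hypothetical 5 by 3 sample and sums the diagonal of @math{S_1} by hand to get the trace; the two lists printed at the end should agree.

@example
load (descriptive)$
/* hypothetical sample: 5 observations of a 3-dimensional variable */
x : matrix ([1, 2, 0], [3, 5, 1], [4, 4, 7], [8, 9, 2], [2, 6, 3])$
p : length (x[1])$             /* dimension = number of columns */
S1 : cov1 (x)$
tv : sum (S1[i, i], i, 1, p)$  /* total variance, trace(S_1) */
gv : determinant (S1)$         /* generalized variance */
by_hand : [tv, tv/p, gv, sqrt (gv), gv^(1/p), gv^(1/(2*p))]$
float (by_hand);
float (global_variances (x));
@end example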
Example:

@c ===beg===
@c load (descriptive)$
@c load (numericalio)$
@c s2 : read_matrix (file_search ("wind.data"))$
@c global_variances (s2);
@c ===end===
@example
(%i1) load (descriptive)$
(%i2) load (numericalio)$
(%i3) s2 : read_matrix (file_search ("wind.data"))$
(%i4) global_variances (s2);
(%o4) [105.338342060606, 21.06766841212119, 12874.34690469686, 
      113.4651792608502, 6.636590811800794, 2.576158149609762]
@end example

Function @code{global_variances} has an optional logical argument: @code{global_variances(x,true)} tells Maxima that @code{x} is the data matrix, which is equivalent to @code{global_variances(x)}. On the other hand, @code{global_variances(x,false)} means that @code{x} is not the data matrix but its covariance matrix, as returned by @code{cov1}, so that it is not recalculated:

@c ===beg===
@c load (descriptive)$
@c load (numericalio)$
@c s2 : read_matrix (file_search ("wind.data"))$
@c s : cov1 (s2)$
@c global_variances (s, false);
@c ===end===
@example
(%i1) load (descriptive)$
(%i2) load (numericalio)$
(%i3) s2 : read_matrix (file_search ("wind.data"))$
(%i4) s : cov1 (s2)$
(%i5) global_variances (s, false);
(%o5) [105.338342060606, 21.06766841212119, 12874.34690469686, 
      113.4651792608502, 6.636590811800794, 2.576158149609762]
@end example

See also @code{cov} and @code{cov1}.

@end deffn

@deffn {Function} cor (@var{matrix})
@deffnx {Function} cor (@var{matrix}, @var{logical_value})
The correlation matrix of the multivariate sample.
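
Assuming @code{cor} returns the usual Pearson correlation matrix, with entries @math{r_ij = s_ij / sqrt(s_ii s_jj)}, it can be rebuilt from @code{cov1} with @code{genmatrix}; the divisor (@var{n} or @math{n-1}) cancels in the quotient. The sample below is hypothetical.

@example
load (descriptive)$
x : matrix ([1, 2, 0], [3, 5, 1], [4, 4, 7], [8, 9, 2], [2, 6, 3])$  /* hypothetical */
p : length (x[1])$    /* number of variables */
S1 : cov1 (x)$
/* r_ij = s_ij / sqrt (s_ii s_jj) */
R : genmatrix (lambda ([i, j], S1[i, j] / sqrt (S1[i, i] * S1[j, j])), p, p)$
float (R - cor (x));  /* should be (numerically) a zero matrix */
@end example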
Example:

@c ===beg===
@c load (descriptive)$
@c load (numericalio)$
@c fpprintprec:7$
@c s2 : read_matrix (file_search ("wind.data"))$
@c cor (s2);
@c ===end===
@example
(%i1) load (descriptive)$
(%i2) load (numericalio)$
(%i3) fpprintprec:7$
(%i4) s2 : read_matrix (file_search ("wind.data"))$
(%i5) cor (s2);
      [   1.0     .8476339  .8803515  .8239624  .7519506 ]
      [                                                  ]
      [ .8476339    1.0     .8735834  .6902622  0.782502 ]
      [                                                  ]
(%o5) [ .8803515  .8735834    1.0     .7764065  .8323358 ]
      [                                                  ]
      [ .8239624  .6902622  .7764065    1.0     .7293848 ]
      [                                                  ]
      [ .7519506  0.782502  .8323358  .7293848    1.0    ]
@end example

Function @code{cor} has an optional logical argument: @code{cor(x,true)} tells Maxima that @code{x} is the data matrix, which is equivalent to @code{cor(x)}. On the other hand, @code{cor(x,false)} means that @code{x} is not the data matrix but its covariance matrix, so that it is not recalculated:

@c ===beg===
@c load (descriptive)$
@c load (numericalio)$
@c fpprintprec:7$
@c s2 : read_matrix (file_search ("wind.data"))$
@c s : cov1 (s2)$
@c cor (s, false); /* this is faster */
@c ===end===
@example
(%i1) load (descriptive)$
(%i2) load (numericalio)$
(%i3) fpprintprec:7$
(%i4) s2 : read_matrix (file_search ("wind.data"))$
(%i5) s : cov1 (s2)$
(%i6) cor (s, false); /* this is faster */
      [   1.0     .8476339  .8803515  .8239624  .7519506 ]
      [                                                  ]
      [ .8476339    1.0     .8735834  .6902622  0.782502 ]
      [                                                  ]
(%o6) [ .8803515  .8735834    1.0     .7764065  .8323358 ]
      [                                                  ]
      [ .8239624  .6902622  .7764065    1.0     .7293848 ]
      [                                                  ]
      [ .7519506  0.782502  .8323358  .7293848    1.0    ]
@end example

See also @code{cov} and @code{cov1}.
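
When several of these measures are needed for the same data, the covariance matrix can be computed once and passed with the second argument set to @code{false}. In this sketch, on a hypothetical sample, @code{cov1} is called a single time and its result is reused by @code{global_variances}, @code{cor} and @code{list_correlations}; the differences printed at the end should be (numerically) zero.

@example
load (descriptive)$
x : matrix ([1, 2, 0], [3, 5, 1], [4, 4, 7], [8, 9, 2], [2, 6, 3])$  /* hypothetical */
S1 : cov1 (x)$                       /* computed only once */
gv : global_variances (S1, false)$   /* S1 is a covariance matrix, not data */
R : cor (S1, false)$
lc : list_correlations (S1, false)$
float (lc[2]);                       /* multiple correlation vector */
/* same results as starting from the data matrix */
float (gv - global_variances (x));
float (R - cor (x));
@end example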
@end deffn

@deffn {Function} list_correlations (@var{matrix})
@deffnx {Function} list_correlations (@var{matrix}, @var{logical_value})
Function @code{list_correlations} returns a list of correlation measures:

@itemize @bullet

@item
@var{precision matrix}: the inverse of the covariance matrix @math{S_1},
@ifhtml
@example
         -1      ij
        S    = (s  )
         1          i,j = 1,2,...,p
@end example
@end ifhtml
@ifinfo
@example
         -1      ij
        S    = (s  )
         1          i,j = 1,2,...,p
@end example
@end ifinfo
@tex
$${S_{1}^{-1}}={\left(s^{ij}\right)_{i,j=1,2,\ldots, p}}$$
@end tex

@item
@var{multiple correlation vector}: @math{(R_1^2, R_2^2, ..., R_p^2)}, with
@ifhtml
@example
         2           1
        R  =  1 - -------
         i         ii
                  s   s
                       ii
@end example
@end ifhtml
@ifinfo
@example
         2           1
        R  =  1 - -------
         i         ii
                  s   s
                       ii
@end example
@end ifinfo
@tex
$${R_{i}^{2}}={1-{{1}\over{s^{ii}s_{ii}}}}$$
@end tex
which indicates the goodness of fit of the multivariate linear regression model of @math{X_i} on the remaining variables, used as regressors.

@item
@var{partial correlation matrix}: the matrix whose element @math{(i, j)} is
@ifhtml
@example
                           ij
                          s
        r        = - -------------
         ij.rest     / ii  jj\ 1/2
                     |s   s  |
                     \       /
@end example
@end ifhtml
@ifinfo
@example
                           ij
                          s
        r        = - -------------
         ij.rest     / ii  jj\ 1/2
                     |s   s  |
                     \       /
@end example
@end ifinfo
@tex
$${r_{ij.rest}}={-{{s^{ij}}\over \sqrt{s^{ii}s^{jj}}}}$$
@end tex

@end itemize

Example:

@c ===beg===
@c load (descriptive)$
@c load (numericalio)$
@c s2 : read_matrix (file_search ("wind.data"))$
@c z : list_correlations (s2)$
@c fpprintprec : 5$ /* for pretty output */
@c z[1]; /* precision matrix */
@c z[2]; /* multiple correlation vector */
@c z[3]; /* partial correlation matrix */
@c ===end===
@example
(%i1) load (descriptive)$
(%i2) load (numericalio)$
(%i3) s2 : read_matrix (file_search ("wind.data"))$
(%i4) z : list_correlations (s2)$
(%i5) fpprintprec : 5$ /* for pretty output */
(%i6) z[1]; /* precision matrix */
      [ .38486     - .13856   - .15626   - .10239   .031179   ]
      [                                                       ]
      [ - .13856   .34107     - .15233   .038447    - .052842 ]
      [                                                       ]
(%o6) [ - .15626   - .15233   .47296     - .024816  - .10054  ]
      [                                                       ]
      [ - .10239   .038447    - .024816  .10937     - .034033 ]
      [                                                       ]
      [ .031179    - .052842  - .10054   - .034033  .14834    ]
(%i7) z[2]; /* multiple correlation vector */
(%o7)             [.85063, .80634, .86474, .71867, .72675]
(%i8) z[3]; /* partial correlation matrix */
      [ - 1.0     .38244    .36627    .49908    - .13049 ]
      [                                                  ]
      [ .38244    - 1.0     .37927    - .19907  .23492   ]
      [                                                  ]
(%o8) [ .36627    .37927    - 1.0     .10911    .37956   ]
      [                                                  ]
      [ .49908    - .19907  .10911    - 1.0     .26719   ]
      [                                                  ]
      [ - .13049  .23492    .37956    .26719    - 1.0    ]
@end example

Function @code{list_correlations} also has an optional logical argument: @code{list_correlations(x,true)} tells Maxima that @code{x} is the data matrix, which is equivalent to @code{list_correlations(x)}. On the other hand, @code{list_correlations(x,false)} means that @code{x} is not the data matrix but its covariance matrix, so that it is not recalculated.

See also @code{cov} and @code{cov1}.

@end deffn


@node address@hidden@~oes para statistical graphs, , address@hidden@~oes para specific multivariate descriptive statistics, descriptive
@section Definitions for statistical graphs

@deffn {Function} dataplot (@var{list})
@deffnx {Function} dataplot (@var{list}, @var{option_1}, @var{option_2}, ...)
@deffnx {Function} dataplot (@var{matrix})
@deffnx {Function} dataplot (@var{matrix}, @var{option_1}, @var{option_2}, ...)
Function @code{dataplot} permits direct visualization of sample data, both univariate (@var{list}) and multivariate (@var{matrix}). The following @var{options} control some aspects of the plot:

@itemize @bullet

@item @code{'outputdev}, default @code{"x"}, indicates the output device; correct values are @code{"x"}, @code{"eps"} and @code{"png"}, for the screen, PostScript and PNG format files, respectively.

@item @code{'maintitle}, default @code{""}, is the main title, enclosed in double quotes.

@item @code{'axisnames}, default @code{["x","y","z"]}, is a list with the names of axes @code{x}, @code{y} and @code{z}.

@item @code{'joined}, default @code{false}, a logical value that selects whether points in 2D are joined or isolated.

@item @code{'picturescales}, default @code{[1.0, 1.0]}, scaling factors for the size of the plot.

@item @code{'threedim}, default @code{true}, tells Maxima whether to plot a three-column matrix as a 3D diagram or as a multivariate scatterplot. See the examples below.

@item @code{'axisrot}, default @code{[60, 30]}, changes the point of view when @code{'threedim} is set to @code{true} and the data are stored in a three-column matrix. The first number is the rotation angle of the @var{x}-axis, and the second number is the rotation angle of the @var{z}-axis, both measured in degrees.

@item @code{'nclasses}, default @code{10}, is the number of classes for the histograms on the diagonal of multivariate scatterplots.

@item @code{'pointstyle}, default @code{1}, is an integer that indicates how sample points are displayed.

@end itemize

For example, the following input requests a simple plot of the first twenty digits of @code{%pi}, drawn with point style 3. Since @code{'outputdev} is not given, the plot is shown on the screen; adding @code{'outputdev = "eps"} stores it in an eps file instead.

@c ===beg===
@c load (descriptive)$
@c load (numericalio)$
@c s1 : read_list (file_search ("pidigits.data"))$
@c dataplot (makelist (s1[k], k, 1, 20), 'pointstyle = 3)$
@c ===end===
@example
(%i1) load (descriptive)$
(%i2) load (numericalio)$
(%i3) s1 : read_list (file_search ("pidigits.data"))$
(%i4) dataplot (makelist (s1[k], k, 1, 20), 'pointstyle = 3)$
@end example

Note that one-dimensional data are plotted as a time series.
In the next example, more digits are plotted with different settings:

@c ===beg===
@c load (descriptive)$
@c load (numericalio)$
@c s1 : read_list (file_search ("pidigits.data"))$
@c dataplot (makelist (s1[k], k, 1, 50), 'maintitle = "First pi digits",
@c    'axisnames = ["digit order", "digit value"], 'pointstyle = 2,
@c    'joined = true)$
@c ===end===
@example
(%i1) load (descriptive)$
(%i2) load (numericalio)$
(%i3) s1 : read_list (file_search ("pidigits.data"))$
(%i4) dataplot (makelist (s1[k], k, 1, 50), 'maintitle = "First pi digits",
         'axisnames = ["digit order", "digit value"], 'pointstyle = 2,
         'joined = true)$
@end example

Function @code{dataplot} can also be used to plot points in the plane. The next example is a scatterplot of the pairs of wind speeds measured at the first and fifth meteorological stations:

@c ===beg===
@c load (descriptive)$
@c load (numericalio)$
@c s2 : read_matrix (file_search ("wind.data"))$
@c dataplot (submatrix (s2, 2, 3, 4), 'pointstyle = 2,
@c    'maintitle = "Pairs of wind speeds measured in knots",
@c    'axisnames = ["Wind speed in A", "Wind speed in E"])$
@c ===end===
@example
(%i1) load (descriptive)$
(%i2) load (numericalio)$
(%i3) s2 : read_matrix (file_search ("wind.data"))$
(%i4) dataplot (submatrix (s2, 2, 3, 4), 'pointstyle = 2,
         'maintitle = "Pairs of wind speeds measured in knots",
         'axisnames = ["Wind speed in A", "Wind speed in E"])$
@end example

If the points are stored in a two-column matrix, @code{dataplot} can plot them directly; if they are given as a list of pairs, they must first be transformed into a matrix, as in the following example.

@c ===beg===
@c load (descriptive)$
@c x : [[-1, 2], [5, 7], [5, -3], [-6, -9], [-4, 6]]$
@c dataplot (apply ('matrix, x), 'maintitle = "Points",
@c    'joined = true, 'axisnames = ["", ""], 'picturescales = [0.5, 1.0])$
@c ===end===
@example
(%i1) load (descriptive)$
(%i2) x : [[-1, 2], [5, 7], [5, -3], [-6, -9], [-4, 6]]$
(%i3) dataplot (apply ('matrix, x), 'maintitle = "Points",
         'joined = true, 'axisnames = ["", ""], 'picturescales = [0.5, 1.0])$
@end example

Points in three-dimensional space can be seen as a projection onto the plane. In this example, plots of the wind speeds measured at three meteorological stations are requested, first as a 3D plot and then as a multivariate scatterplot.

@c ===beg===
@c load (descriptive)$
@c load (numericalio)$
@c s2 : read_matrix (file_search ("wind.data"))$
@c /* 3D plot */ dataplot (submatrix (s2, 4, 5), 'pointstyle = 2,
@c    'maintitle = "Pairs of wind speeds measured in knots",
@c    'axisnames = ["Station A", "Station B", "Station C"])$
@c /* Multivariate scatterplot */ dataplot (submatrix (s2, 4, 5),
@c    'nclasses = 6, 'threedim = false)$
@c ===end===
@example
(%i1) load (descriptive)$
(%i2) load (numericalio)$
(%i3) s2 : read_matrix (file_search ("wind.data"))$
(%i4) /* 3D plot */ dataplot (submatrix (s2, 4, 5), 'pointstyle = 2,
         'maintitle = "Pairs of wind speeds measured in knots",
         'axisnames = ["Station A", "Station B", "Station C"])$
(%i5) /* Multivariate scatterplot */ dataplot (submatrix (s2, 4, 5),
         'nclasses = 6, 'threedim = false)$
@end example

Note that in the last example the number of classes in the diagonal histograms is set to 6, and option @code{'threedim} is set to @code{false}.
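
In the wind speed examples above, stations are selected with Maxima's @code{submatrix}: indices written after the matrix are columns to delete, and indices written before it are rows to delete. Thus @code{submatrix (s2, 2, 3, 4)} keeps stations 1 and 5, and @code{submatrix (s2, 4, 5)} keeps stations 1 to 3. A small hypothetical matrix makes this visible:

@example
m : matrix ([11, 12, 13, 14, 15],
            [21, 22, 23, 24, 25])$
submatrix (m, 2, 3, 4);   /* deletes columns 2, 3 and 4: keeps 1 and 5 */
submatrix (m, 4, 5);      /* deletes columns 4 and 5: keeps 1, 2 and 3 */
submatrix (1, m, 4, 5);   /* additionally deletes row 1 */
@end example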
For more than three dimensions, only multivariate scatterplots are possible, as in

@c ===beg===
@c load (descriptive)$
@c load (numericalio)$
@c s2 : read_matrix (file_search ("wind.data"))$
@c dataplot (s2)$
@c ===end===
@example
(%i1) load (descriptive)$
(%i2) load (numericalio)$
(%i3) s2 : read_matrix (file_search ("wind.data"))$
(%i4) dataplot (s2)$
@end example

@end deffn

@deffn {Function} histogram (@var{list})
@deffnx {Function} histogram (@var{list}, @var{option_1}, @var{option_2}, ...)
@deffnx {Function} histogram (@var{one_column_matrix})
@deffnx {Function} histogram (@var{one_column_matrix}, @var{option_1}, @var{option_2}, ...)
This function plots a histogram. Sample data must be stored in a list of numbers or in a one-column matrix. The following @var{options} control some aspects of the plot:

@itemize @bullet

@item @code{'outputdev}, default @code{"x"}, indicates the output device; correct values are @code{"x"}, @code{"eps"} and @code{"png"}, for the screen, PostScript and PNG format files, respectively.

@item @code{'maintitle}, default @code{""}, is the main title, enclosed in double quotes.

@item @code{'axisnames}, default @code{["x", "Fr."]}, is a list with the names of axes @code{x} and @code{y}.

@item @code{'picturescales}, default @code{[1.0, 1.0]}, scaling factors for the size of the plot.

@item @code{'nclasses}, default @code{10}, is the number of classes or bars.

@item @code{'relbarwidth}, default @code{0.9}, a decimal number between 0 and 1 that controls the bar width.

@item @code{'barcolor}, default @code{1}, an integer that indicates the bar color.

@item @code{'colorintensity}, default @code{1}, a decimal number between 0 and 1 that sets the color intensity.

@end itemize

In the next two examples, histograms are requested for the first 100 digits of number @code{%pi} and for the wind speeds at the third meteorological station.

@c ===beg===
@c load (descriptive)$
@c load (numericalio)$
@c s1 : read_list (file_search ("pidigits.data"))$
@c histogram (s1, 'maintitle = "pi digits", 'axisnames = ["", "Absolute frequency"],
@c    'relbarwidth = 0.2, 'barcolor = 3, 'colorintensity = 0.6)$
@c s2 : read_matrix (file_search ("wind.data"))$
@c histogram (col (s2, 3), 'colorintensity = 0.3)$
@c ===end===
@example
(%i1) load (descriptive)$
(%i2) load (numericalio)$
(%i3) s1 : read_list (file_search ("pidigits.data"))$
(%i4) histogram (s1, 'maintitle = "pi digits", 'axisnames = ["", "Absolute frequency"],
         'relbarwidth = 0.2, 'barcolor = 3, 'colorintensity = 0.6)$
(%i5) s2 : read_matrix (file_search ("wind.data"))$
(%i6) histogram (col (s2, 3), 'colorintensity = 0.3)$
@end example

Note that in the first case @code{s1} is a list, while in the second example @code{col(s2,3)} is a one-column matrix.

See also function @code{barsplot}.

@end deffn

@deffn {Function} barsplot (@var{list})
@deffnx {Function} barsplot (@var{list}, @var{option_1}, @var{option_2}, ...)
@deffnx {Function} barsplot (@var{one_column_matrix})
@deffnx {Function} barsplot (@var{one_column_matrix}, @var{option_1}, @var{option_2}, ...)
Similar to @code{histogram}, but for discrete statistical variables, either numeric or categorical. These are the options:

@itemize @bullet

@item @code{'outputdev}, default @code{"x"}, indicates the output device; correct values are @code{"x"}, @code{"eps"} and @code{"png"}, for the screen, PostScript and PNG format files, respectively.

@item @code{'maintitle}, default @code{""}, is the main title, enclosed in double quotes.

@item @code{'axisnames}, default @code{["x", "Fr."]}, is a list with the names of axes @code{x} and @code{y}.

@item @code{'picturescales}, default @code{[1.0, 1.0]}, scaling factors for the size of the plot.

@item @code{'relbarwidth}, default @code{0.9}, a decimal number between 0 and 1 that controls the bar width.

@item @code{'barcolor}, default @code{1}, an integer that indicates the bar color.

@item @code{'colorintensity}, default @code{1}, a decimal number between 0 and 1 that sets the color intensity.

@end itemize

This example plots the bar chart for groups @code{A} and @code{B} of patients in sample @code{s3}:

@c ===beg===
@c load (descriptive)$
@c load (numericalio)$
@c s3 : read_matrix (file_search ("biomed.data"))$
@c barsplot (col (s3, 1), 'maintitle = "Groups of patients",
@c    'axisnames = ["Group", "# of individuals"], 'colorintensity = 0.2)$
@c ===end===
@example
(%i1) load (descriptive)$
(%i2) load (numericalio)$
(%i3) s3 : read_matrix (file_search ("biomed.data"))$
(%i4) barsplot (col (s3, 1), 'maintitle = "Groups of patients",
         'axisnames = ["Group", "# of individuals"], 'colorintensity = 0.2)$
@end example

The first column in sample @code{s3} stores the categorical values @code{A} and @code{B}, sometimes also known as factors. On the other hand, the positive integers in the second column are ages in years, a discrete variable, so we can plot the absolute frequencies of these values:

@c ===beg===
@c load (descriptive)$
@c load (numericalio)$
@c s3 : read_matrix (file_search ("biomed.data"))$
@c barsplot (col (s3, 2), 'maintitle = "Ages",
@c    'axisnames = ["Years", "# of individuals"], 'colorintensity = 0.2,
@c    'relbarwidth = 0.6)$
@c ===end===
@example
(%i1) load (descriptive)$
(%i2) load (numericalio)$
(%i3) s3 : read_matrix (file_search ("biomed.data"))$
(%i4) barsplot (col (s3, 2), 'maintitle = "Ages",
         'axisnames = ["Years", "# of individuals"], 'colorintensity = 0.2,
         'relbarwidth = 0.6)$
@end example

See also function @code{histogram}.

@end deffn

@deffn {Function} boxplot (@var{data})
@deffnx {Function} boxplot (@var{data}, @var{option_1}, @var{option_2}, ...)
This function plots box diagrams. Argument @var{data} can be a list, which is of little interest since these diagrams are mainly used to compare different samples, or a matrix, so that two or more components of a multivariate statistical variable can be compared.
Argument @var{data} can also be a list of samples with possibly different sizes; in fact, this is the only function in package @code{descriptive} that accepts this kind of data structure. See the example below. These are the options:

@itemize @bullet

@item @code{'outputdev}, default @code{"x"}, indicates the output device; correct values are @code{"x"}, @code{"eps"} and @code{"png"}, for the screen, PostScript and PNG format files, respectively.

@item @code{'maintitle}, default @code{""}, is the main title, enclosed in double quotes.

@item @code{'axisnames}, default @code{["sample", "y"]}, is a list with the names of axes @code{x} and @code{y}.

@item @code{'picturescales}, default @code{[1.0, 1.0]}, scaling factors for the size of the plot.

@end itemize

Examples:

@c ===beg===
@c load (descriptive)$
@c load (numericalio)$
@c s2 : read_matrix (file_search ("wind.data"))$
@c boxplot (s2, 'maintitle = "Windspeed in knots",
@c    'axisnames = ["Stations", ""])$
@c A :
@c  [[6, 4, 6, 2, 4, 8, 6, 4, 6, 4, 3, 2],
@c   [8, 10, 7, 9, 12, 8, 10],
@c   [16, 13, 17, 12, 11, 18, 13, 18, 14, 12]]$
@c boxplot (A)$
@c ===end===
@example
(%i1) load (descriptive)$
(%i2) load (numericalio)$
(%i3) s2 : read_matrix (file_search ("wind.data"))$
(%i4) boxplot (s2, 'maintitle = "Windspeed in knots",
         'axisnames = ["Stations", ""])$
(%i5) A :
       [[6, 4, 6, 2, 4, 8, 6, 4, 6, 4, 3, 2],
        [8, 10, 7, 9, 12, 8, 10],
        [16, 13, 17, 12, 11, 18, 13, 18, 14, 12]]$
(%i6) boxplot (A)$
@end example

@end deffn
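
Because @code{boxplot} accepts a list of samples of different sizes, a data matrix with a grouping column, laid out like @code{biomed.data} (group in the first column, age in the second), can be split by group before plotting. The following sketch uses a small hypothetical matrix in that layout; @code{ages_of} is a hypothetical helper, not part of package @code{descriptive}. The group labels are quoted so that they stay symbols even if, as in the example above, @code{A} has been assigned a value.

@example
load (descriptive)$
/* hypothetical data: group label and age of six patients */
d : matrix (['A, 30], ['B, 45], ['A, 28], ['B, 51], ['B, 39], ['A, 33])$
obs : args (d)$    /* the rows of d, as a list of lists */
/* hypothetical helper: ages of the patients in group g */
ages_of (g) := map (lambda ([r], r[2]),
                    sublist (obs, lambda ([r], is (r[1] = g))))$
samples : [ages_of ('A), ages_of ('B)]$   /* two samples of size 3 */
boxplot (samples, 'axisnames = ["Group", "Age"])$
@end example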