Re: R: PSPP-BUG: Median Bug?
From: Ben Pfaff
Subject: Re: R: PSPP-BUG: Median Bug?
Date: Fri, 11 Mar 2011 22:13:11 -0800
User-agent: Gnus/5.13 (Gnus v5.13) Emacs/23.2 (gnu/linux)
"Fabio Bordignon" <address@hidden> writes:
> Thank You for your fast reply.
> Consider this simple variable distribution. The correct median should be 2
> ("scuola media inferiore") but PSPP says 3 ("diploma superiore").
>
> I'm using pspp 0.7.5-g70514b.
Here's a fix. It passes all of the existing test cases plus one
based on the data that you supplied.
John, please let me know if you see a problem here. It looked
"obviously correct" to me, so I've pushed it already.
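For anyone who wants to check the reasoning, here is a minimal standalone C sketch that puts the old and new conditions from the patch below side by side. It is not the PSPP code: `struct freq_row` and `percentile_sketch` are hypothetical names, and both the target rank `tp = (W - 1) * p` and the interpolation by the fractional part of `tp` are assumptions, since the patch shows neither.

```c
#include <assert.h>
#include <stddef.h>

/* One row of a frequency table: a distinct value and its weighted count.
   Hypothetical type, used only for this sketch. */
struct freq_row
  {
    double value;
    double count;
  };

/* Returns percentile P (0 <= P <= 1) of the N rows in ROWS, which must be
   sorted by ascending value.  A nonzero USE_OLD_TEST selects the pre-patch
   condition, zero selects the patched one. */
static double
percentile_sketch (const struct freq_row *rows, size_t n, double p,
                   int use_old_test)
{
  double W = 0.0;      /* Total weight of all rows. */
  double rank = 0.0;   /* Cumulative weight through the current row. */
  double tp;
  size_t i;

  assert (n > 0);
  assert (p >= 0.0 && p <= 1.0);

  for (i = 0; i < n; i++)
    W += rows[i].count;

  /* ASSUMPTION: target rank formula; it does not appear in the patch. */
  tp = (W - 1.0) * p;

  for (i = 0; i < n; i++)
    {
      int last = i + 1 >= n;
      int direct;

      rank += rows[i].count;
      if (rank <= tp)
        continue;

      if (use_old_test)
        direct = (rows[i].count > 1
                  && (rank - (rows[i].count - 1) > tp || last));
      else
        direct = tp + 1 < rank || last;

      /* The extra LAST check keeps the sketch from reading past the end
         of ROWS when the old test does not select the direct case. */
      if (direct || last)
        return rows[i].value;

      /* ASSUMPTION: interpolate toward the next value by the fractional
         part of TP (truncation toward zero). */
      return (rows[i].value
              + (rows[i + 1].value - rows[i].value) * (tp - (double) (long) tp));
    }

  /* Reached only in degenerate cases such as zero total weight. */
  return rows[n - 1].value;
}
```

With the rows {1, 7}, {2, 16}, {3, 12}, {4, 5} from the new test case and p = 0.5, the sketch gives 2 under the patched condition and 2.5 under the old one. In this data, cases 8 through 23 all have value 2, so ranks `tp` and `tp + 1` both fall inside that group and there is nothing to interpolate.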
Thanks,
Ben.
--8<--------------------------cut here-------------------------->8--
From: Ben Pfaff <address@hidden>
Date: Fri, 11 Mar 2011 22:10:54 -0800
Subject: [PATCH] FREQUENCIES: Fix percentiles calculation.
The condition for using a variate directly instead of interpolating was
just wrong. It would interpolate in cases where it clearly should not,
which produced incorrect results in many cases.
Thanks to Fabio Bordignon <address@hidden> for reporting the problem
and supplying a simple test case.
---
src/language/stats/frequencies.q | 5 +--
tests/language/dictionary/weight.at | 2 +-
tests/language/stats/frequencies.at | 41 +++++++++++++++++++++++++++++++++++
3 files changed, 44 insertions(+), 4 deletions(-)
diff --git a/src/language/stats/frequencies.q b/src/language/stats/frequencies.q
index ecaefdf..adc4f16 100644
--- a/src/language/stats/frequencies.q
+++ b/src/language/stats/frequencies.q
@@ -1,5 +1,5 @@
/* PSPP - a program for statistical analysis.
- Copyright (C) 1997-9, 2000, 2007, 2009, 2010 Free Software Foundation, Inc.
+ Copyright (C) 1997-9, 2000, 2007, 2009, 2010, 2011 Free Software Foundation, Inc.
This program is free software: you can redistribute it and/or modify
it under the terms of the GNU General Public License as published by
@@ -922,8 +922,7 @@ calc_percentiles (const struct frq_proc *frq, const struct var_freqs *vf)
if (rank <= tp)
break;
- if (f->count > 1
- && (rank - (f->count - 1) > tp || f + 1 >= ft->missing))
+ if (tp + 1 < rank || f + 1 >= ft->missing)
pc->value = f->value.f;
else
pc->value = calc_percentile (pc->p, W, f->value.f, f[1].value.f);
diff --git a/tests/language/dictionary/weight.at b/tests/language/dictionary/weight.at
index 0eb55bc..4022692 100644
--- a/tests/language/dictionary/weight.at
+++ b/tests/language/dictionary/weight.at
@@ -146,6 +146,6 @@ Range,,76.000
Minimum,,18.000
Maximum,,94.000
Sum,,23006.00
-Percentiles,50 (Median),29
+Percentiles,50 (Median),28
])
AT_CLEANUP
diff --git a/tests/language/stats/frequencies.at b/tests/language/stats/frequencies.at
index cfd992a..8aaba06 100644
--- a/tests/language/stats/frequencies.at
+++ b/tests/language/stats/frequencies.at
@@ -419,6 +419,47 @@ Percentiles,0,1.00
])
AT_CLEANUP
+dnl Data for this test case from Fabio Bordignon <address@hidden>.
+AT_SETUP([FREQUENCIES enhanced percentiles, weighted (3)])
+AT_DATA([frequencies.sps],
+ [DATA LIST LIST notable /X * F *.
+BEGIN DATA.
+1 7
+2 16
+3 12
+4 5
+END DATA.
+
+WEIGHT BY f.
+
+FREQUENCIES
+ VAR=x
+ /PERCENTILES = 0 25 50 75 100.
+])
+AT_CHECK([pspp -O format=csv frequencies.sps], [0], [dnl
+Table: X
+Value Label,Value,Frequency,Percent,Valid Percent,Cum Percent
+,1.00,7.00,17.50,17.50,17.50
+,2.00,16.00,40.00,40.00,57.50
+,3.00,12.00,30.00,30.00,87.50
+,4.00,5.00,12.50,12.50,100.00
+Total,,40.00,100.0,100.0,
+
+Table: X
+N,Valid,40.00
+,Missing,.00
+Mean,,2.38
+Std Dev,,.93
+Minimum,,1.00
+Maximum,,4.00
+Percentiles,0,1.00
+,25,2.00
+,50 (Median),2.00
+,75,3.00
+,100,4.00
+])
+AT_CLEANUP
+
AT_SETUP([FREQUENCIES enhanced percentiles, weighted, missing values])
AT_DATA([frequencies.sps],
[DATA LIST LIST notable /X * F *.
--
1.7.2.3
--
Ben Pfaff
http://benpfaff.org