Date Posted: Mon Aug 31 12:47:36 US/Eastern 1998
Posted by: Farid Hady
(fhsizwoyo@yahoo.com)
Date posted: Mon May 9 14:31:02 US/Eastern 2005
Question for your website
I'm a student at a university in Indonesia. My name is Farid Hady, and I'm writing a thesis about the chi-squared test for goodness of fit. I have a problem with how the data are binned. Which references do you use for binning the data in the example (normality test) on your website (The Chi Square: A Large-Sample Goodness of Fit Test, Volume 10, No. 4)?
Thanks for your attention.
Reply Posted by: Jorge Romeu
(jromeu@alionscience.com)
Date Posted: Mon May 9 14:33:00 US/Eastern 2005
Message: You posted an item in the RAC Forum about the Chi Square Goodness of Fit test and your understanding of our START sheet on this issue. You state that you are doing a thesis on this topic and have some questions about how to bin the data for this test. You also request a reference on the topic. Let me start with the last item.
1. I would suggest you find a textbook such as Probability and Statistics for Engineers and Scientists. Walpole, Myers and Myers. Sixth Edition. Prentice Hall. NJ. 2001. If you cannot find this book in your university library, look up any other engineering or computer science statistics book at the beginning graduate level (e.g., MS) that they may be using there.
2. Regarding the binning of the data, bear in mind three things. First, you need the minimum number of bins that allows you to implement the test. In the case of the Normal distribution, for example, if you are estimating its two parameters, you need at least four bins: two bins for the two estimations, one for the test, and an extra one because you cannot have zero degrees of freedom and you are subtracting three (degrees of freedom = number of bins - 1 - number of estimated parameters, so four bins leave 4 - 3 = 1 DF). Second, you must have at least FIVE expected observations in each bin, which means that with four bins you need at least twenty data points (4 x 5) as a bare minimum to conduct this analysis. Third, all bins should be approximately equal in “area” or probability. You must find the bin limits that accomplish this (some software packages do this for you, but most likely you will have to use your experience and trial and error). The bin limits partition your data domain into segments such that, under your proposed distribution, each has equal probability. For four bins, this means that each bin should contain approximately 25% of the hypothesized distribution. Finally, I would not conduct such a fit test with fewer than five or six bins, because the Chi Square with one degree of freedom yields very poor results; you should strive for a Chi Square with at least two or three DF (and the more, the better). If you have lots of data, you should balance the number of bins against the number of points contained in each, but this also comes with experience.
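As a rough illustration of these binning rules (a sketch, not the procedure from the START sheet), here is a minimal Python example using only the standard library. It assumes a Normal hypothesis whose mean and standard deviation are both estimated from the sample (using the n - 1 sample standard deviation is an assumption of this sketch). It places the bin limits at the Normal quantiles i/k, so each of the k bins has probability 1/k, checks the degrees-of-freedom and five-expected-per-bin rules, and returns the chi-square statistic. The function names are hypothetical. For the Normal, the inverse CDF gives the equal-probability limits directly, so no trial and error is needed in this particular case.

```python
from bisect import bisect_right
from statistics import NormalDist, mean, stdev


def equiprobable_normal_edges(mu, sigma, k):
    """Interior bin limits that split N(mu, sigma) into k bins of probability 1/k each."""
    dist = NormalDist(mu, sigma)
    return [dist.inv_cdf(i / k) for i in range(1, k)]


def degrees_of_freedom(k, n_estimated):
    """Chi-square DF: number of bins, minus one, minus one per estimated parameter."""
    return k - 1 - n_estimated


def chi_square_normal_fit(data, k):
    """Chi-square goodness-of-fit statistic for a Normal with estimated mean and SD."""
    n = len(data)
    n_estimated = 2  # mean and standard deviation are both estimated from the data
    df = degrees_of_freedom(k, n_estimated)
    if df < 1:
        raise ValueError(f"{k} bins leave {df} DF; need at least {n_estimated + 2} bins")
    expected = n / k  # equal-probability bins share the sample equally
    if expected < 5:
        raise ValueError(f"{expected:.2f} expected per bin is below 5; need n >= {5 * k}")
    edges = equiprobable_normal_edges(mean(data), stdev(data), k)
    observed = [0] * k
    for x in data:
        # bisect_right counts the limits <= x, which is the 0-based bin index
        observed[bisect_right(edges, x)] += 1
    stat = sum((o - expected) ** 2 / expected for o in observed)
    return stat, df, observed, edges
```

For example, with 30 data points and k = 6 bins, each bin expects 5 observations and the test has 6 - 1 - 2 = 3 DF; the returned statistic would then be compared with a chi-square table value for 3 DF at the chosen significance level.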
I hope this discussion is helpful to you. We have two other RAC START sheets on Goodness of Fit: one for the Kolmogorov-Smirnov test and another for the Anderson-Darling test. You may want to read these too, even though the tests are different, because the general line of thought is similar.
Thank you for your inquiry.
