Chi-Squared Binning
The chi-squared test is a method of determining goodness of fit for an estimated distribution, and is an alternative to the Kolmogorov-Smirnov (K-S) and Anderson-Darling (A-D) tests. All three statistics are used to analyze the quality of a fit. However, unlike the K-S and A-D tests, the chi-squared test depends on how the data are binned, so appropriate categorization is key.
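To illustrate why binning matters, the following minimal sketch computes the Pearson chi-squared statistic (the sum over bins of (observed - expected)^2 / expected) for the same sample under two different bin layouts. The sample values, the Normal(50, 10) fitted distribution, and the function name chi_squared_statistic are all hypothetical, chosen only for illustration; this is not a description of how the software computes the statistic internally.

```python
from statistics import NormalDist


def chi_squared_statistic(data, edges, cdf):
    """Pearson chi-squared statistic for half-open bins [edges[i], edges[i+1])."""
    n = len(data)
    stat = 0.0
    for lo, hi in zip(edges[:-1], edges[1:]):
        observed = sum(1 for x in data if lo <= x < hi)
        expected = n * (cdf(hi) - cdf(lo))  # expected count under the fitted distribution
        stat += (observed - expected) ** 2 / expected
    return stat


# Hypothetical sample and hypothetical fitted distribution.
data = [31, 38, 42, 44, 45, 47, 48, 49, 50, 50,
        51, 52, 53, 55, 56, 58, 61, 63, 67, 72]
fitted = NormalDist(mu=50, sigma=10)

inf = float("inf")
edges_a = [-inf, 40, 50, 60, inf]  # four bins
edges_b = [-inf, 45, 55, inf]      # three bins

stat_a = chi_squared_statistic(data, edges_a, fitted.cdf)
stat_b = chi_squared_statistic(data, edges_b, fitted.cdf)
print(stat_a, stat_b)  # same data and same fit, but different statistics
```

The data and the fitted distribution are identical in both calls; only the bin edges change, and the statistic changes with them.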
If unsure about the number of bins for the data set, choose 'Auto' for the 'Number of Bins' and set the 'Bin Arrangement' to 'Equal Probabilities'.
Chi-Sq Binning Configuration
Figure 1 - Fit Distribution to Data - Chi-Sq Binning Tab
The Chi-Sq Binning tab of the Fit Distributions to Data window (Figure 1, right) contains the configuration options for the chi-squared tests that are run during the fitting process. The tab includes two sections:
Bin Arrangement
Bin arrangement determines how the data values will be categorized for the chi-squared test. The values in the data set can be binned in the following ways:
The 'Number of Bins' defaults to a value of 'Auto'; to input a different value, double-click in the field and type the number of bins that should be used in testing.
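As a rough illustration of the 'Equal Probabilities' arrangement, the sketch below places bin edges at evenly spaced quantiles of a fitted distribution so that every bin has the same expected probability. This assumes that "equal probabilities" means equal probability under the fitted distribution; the function name equal_probability_edges, the Normal(50, 10) distribution, and the choice of five bins are hypothetical, and the rule the software uses for 'Auto' is not shown here.

```python
from statistics import NormalDist


def equal_probability_edges(dist, num_bins):
    """Return num_bins + 1 edges; each bin has probability 1 / num_bins under dist."""
    # Interior edges sit at the quantiles 1/k, 2/k, ..., (k-1)/k.
    interior = [dist.inv_cdf(i / num_bins) for i in range(1, num_bins)]
    # The outer bins are open-ended so that the probabilities sum to 1.
    return [float("-inf")] + interior + [float("inf")]


fitted = NormalDist(mu=50, sigma=10)  # hypothetical fitted distribution
edges = equal_probability_edges(fitted, 5)
bin_probabilities = [fitted.cdf(hi) - fitted.cdf(lo)
                     for lo, hi in zip(edges[:-1], edges[1:])]
print(edges)
print(bin_probabilities)
```

Because every bin has the same expected count, no bin ends up with a very small expected frequency, which is one reason this arrangement is a reasonable default when the right number of bins is unclear.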
Equal Interval Binning Options
If 'Equal Intervals' is chosen for the bin arrangement, this section becomes active. Any (or all) of the following options can be set to configure Equal Interval bins:
Regardless of whether 'Automatic Minimum and Maximum Based on Input Data' is checked or manual minimum and maximum values are set, the first and last bins are still configured according to the 'Extend First Bin From Minimum to -Infinity' and 'Extend Last Bin From Maximum to +Infinity' settings.
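The interaction between these options can be sketched as follows. The function equal_interval_edges and its parameter names are hypothetical stand-ins for the settings described above, and the sample data are invented; the sketch assumes that the minimum and maximum are first divided into equal-width bins, and that the outer edges are then replaced by infinities when the corresponding "extend" option is set.

```python
def equal_interval_edges(data, num_bins, minimum=None, maximum=None,
                         extend_first=False, extend_last=False):
    """Return num_bins + 1 edges of equal-width bins between minimum and maximum."""
    # "Automatic" behavior: fall back to the data range when no manual value is given.
    lo = min(data) if minimum is None else minimum
    hi = max(data) if maximum is None else maximum
    width = (hi - lo) / num_bins
    edges = [lo + i * width for i in range(num_bins + 1)]
    # The extension options apply whether the range was automatic or manual.
    if extend_first:
        edges[0] = float("-inf")
    if extend_last:
        edges[-1] = float("inf")
    return edges


data = [0, 3, 4, 7, 10]  # hypothetical sample

auto_edges = equal_interval_edges(data, 5)
extended_edges = equal_interval_edges(data, 5, extend_first=True, extend_last=True)
manual_edges = equal_interval_edges(data, 4, minimum=0, maximum=100, extend_last=True)
print(auto_edges)
print(extended_edges)
print(manual_edges)
```

Note that extending the outer bins does not move the interior edges; it only changes where the first bin starts and where the last bin ends.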
Custom Bins
Figure 2 - Custom Bins
In situations where bins should be of unequal lengths (e.g., where natural groupings exist), Custom Bins can be configured (Figure 2, right). When configuring the first bin, enter the starting value for that bin; for each subsequent bin, enter the ending value. As values are entered into the Bin Limits fields, the Bin Description will populate with the logical formula for that bin.
Similar to Equal Interval bins, the first and last bins can be configured to extend to minus and plus infinity when working with a data set with unknown bounds. For example, Figure 2 illustrates a bin configuration where the first bin extends from 'minus infinity to less than 10', and the last bin extends from 'greater than or equal to 70 to plus infinity'.
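A custom bin layout with open-ended outer bins can be sketched as below. Only the outer limits 10 and 70 come from the example above; the interior limits 30 and 50, the sample values, the helper names describe_bins and bin_index, and the exact wording of the generated descriptions are hypothetical and may differ from what the Bin Description field displays.

```python
from bisect import bisect_right


def describe_bins(limits):
    """Describe the bins defined by sorted limits, with open-ended outer bins."""
    descriptions = [f"x < {limits[0]}"]
    for lo, hi in zip(limits[:-1], limits[1:]):
        descriptions.append(f"{lo} <= x < {hi}")
    descriptions.append(f"x >= {limits[-1]}")
    return descriptions


def bin_index(limits, x):
    """Index of the bin containing x; a value equal to a limit falls in the upper bin."""
    return bisect_right(limits, x)


limits = [10, 30, 50, 70]  # 30 and 50 are hypothetical interior limits
descriptions = describe_bins(limits)

sample = [4, 10, 29, 55, 70, 120]  # hypothetical data
counts = [0] * len(descriptions)
for x in sample:
    counts[bin_index(limits, x)] += 1

for text, count in zip(descriptions, counts):
    print(f"{text}: {count}")
```

Each limit acts as the inclusive lower bound of the bin above it, which matches the "less than 10" and "greater than or equal to 70" wording of the outer bins.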