Chi-Squared Binning

The chi-squared test is a method of determining goodness of fit for an estimated distribution and is an alternative to the Kolmogorov-Smirnov (K-S) and Anderson-Darling (A-D) tests. All of these statistics are used to analyze the quality of a fit. However, unlike the K-S and A-D tests, the chi-squared test depends on how the data is binned, so appropriate categorization is key.
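To illustrate why binning matters, here is a minimal Python sketch of the Pearson chi-squared statistic, the sum over bins of (O - E)^2 / E, where O is the observed count in a bin and E is the count expected under the fitted distribution. It assumes a normal distribution fitted with the standard library's `statistics.NormalDist`. The helper names and the sample data are hypothetical, and this is not @RISK's internal implementation.

```python
import math
from bisect import bisect_right
from statistics import NormalDist

def chi_squared_statistic(observed, expected):
    """Pearson chi-squared statistic: sum over bins of (O - E)^2 / E."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))

def observed_counts(data, edges):
    """Count data values in half-open bins [edges[i], edges[i+1])."""
    counts = [0] * (len(edges) - 1)
    for x in data:
        i = bisect_right(edges, x) - 1
        if 0 <= i < len(counts):
            counts[i] += 1
    return counts

def expected_counts(dist, n, edges):
    """Expected count per bin: n times the fitted probability of each bin."""
    def cdf(x):
        # Handle infinite edges explicitly rather than relying on dist.cdf(inf).
        if x == -math.inf:
            return 0.0
        if x == math.inf:
            return 1.0
        return dist.cdf(x)
    return [n * (cdf(hi) - cdf(lo)) for lo, hi in zip(edges, edges[1:])]

# Hypothetical sample data and a normal distribution fitted to it.
data = [4.1, 5.3, 5.9, 6.2, 6.8, 7.0, 7.4, 8.1, 8.8, 9.5, 10.2, 11.7]
fitted = NormalDist.from_samples(data)

# Same data and same fit, evaluated with two different binnings.
for edges in ([-math.inf, 6.0, 8.0, math.inf],
              [-math.inf, 5.0, 7.0, 9.0, math.inf]):
    obs = observed_counts(data, edges)
    exp = expected_counts(fitted, len(data), edges)
    print(len(edges) - 1, "bins:", round(chi_squared_statistic(obs, exp), 3))
```

The two binnings generally produce different statistics for the same data and fit, which is the binning dependence that sets the chi-squared test apart from K-S and A-D.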

If unsure about the number of bins for the data set, choose 'Auto' for the 'Number of Bins' and set the 'Bin Arrangement' to 'Equal Probabilities'.

Chi-Sq Binning Configuration

Figure 1 - Fit Distribution to Data - Chi-Sq Binning Tab

The Chi-Sq Binning tab of the Fit Distributions to Data window (Figure 1, right) contains the configuration options for the chi-squared tests run during the fitting process. The tab includes two sections:

  • Bin Arrangement - Configures how the data should be binned, or categorized.
  • Equal Interval Binning Options - These options are only enabled if 'Equal Intervals' is selected in the Bin Arrangement section.

Bin Arrangement

Bin arrangement determines how the data values are categorized for the chi-squared test. The values in the data set can be binned in the following ways:

  • Equal Intervals - Specifies that the bins will be of equal length across the entire data set; selecting this option enables the 'Equal Interval Binning Options' section. See below for more details.
  • Equal Probabilities - Specifies that the bins will be made at equal-probability intervals across the fitted distribution. This normally results in bins of unequal width in terms of actual data values, because @RISK adjusts the bin limits so that every bin has the same probability. For continuous distributions this is relatively straightforward; for discrete distributions, whose probabilities come in discrete steps, @RISK can only make the bin probabilities approximately equal.
  • Custom - Set each bin's starting value manually; see Custom Bins, below, for more information.
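For a continuous fitted distribution, equal-probability bin limits are the quantiles at 1/k, 2/k, ..., (k-1)/k, so each of the k bins expects n/k of the n observations. The sketch below is a minimal illustration of that idea, assuming a fitted normal distribution represented by Python's `statistics.NormalDist`, whose `inv_cdf` method returns a quantile. The function name is hypothetical and this is not @RISK's internal code; discrete distributions are deliberately not handled, since their cumulative probabilities jump in steps and a limit may not land exactly on i/k.

```python
import math
from statistics import NormalDist

def equal_probability_edges(dist, n_bins):
    """Bin limits giving each of n_bins bins probability 1/n_bins.

    Assumes a continuous distribution with an inv_cdf method (e.g. NormalDist).
    The outer bins extend to -inf and +inf so the whole distribution is covered.
    """
    if n_bins < 1:
        raise ValueError("n_bins must be at least 1")
    inner = [dist.inv_cdf(i / n_bins) for i in range(1, n_bins)]
    return [-math.inf] + inner + [math.inf]

# Hypothetical fitted distribution: Normal(mean=50, sd=10), split into 4 bins.
edges = equal_probability_edges(NormalDist(50, 10), 4)
print(edges)  # inner limits are the quartiles of the fitted distribution
```

The two middle bins are narrow while the outer bins are unbounded, yet each bin carries a probability of 0.25, which is why equal-probability bins usually have unequal widths.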

The 'Number of Bins' defaults to a value of 'Auto'; to input a different value, double-click in the field and type the number of bins that should be used in testing.

Equal Interval Binning Options

If 'Equal Intervals' is chosen for the bin arrangement, this section becomes active. Any or all of the following options can be set to configure Equal Interval bins:

  • Automatic Minimum and Maximum Based on Input Data - Set the minimum and maximum values for bins to the minimum and maximum values for the data set. Uncheck this box to enter different Minimum and Maximum values; this allows for a specific range to be used for binning without regard to the actual minimum and maximum of the data set.
    • Minimum - Manually set the minimum value used for binning.
    • Maximum - Manually set the maximum value used for binning.
  • Extend First Bin From Minimum to -Infinity - Specifies that the first bin used will extend from minus infinity to the specified minimum. All other bins will be of equal length. In some situations, this improves the fitting for data sets with unknown lower bounds.
  • Extend Last Bin From Maximum to +Infinity - Specifies that the last bin used will extend from the specified maximum to plus infinity. All other bins will be of equal length. In certain situations, this improves fitting for data sets with unknown upper bounds.

Whether the minimum and maximum come from the data ('Automatic Minimum and Maximum Based on Input Data' checked) or are entered manually, the 'Extend First Bin From Minimum to -Infinity' and 'Extend Last Bin From Maximum to +Infinity' settings still determine how the first and last bins are configured.
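As an illustration of how these options combine, the Python sketch below builds equal-interval bin limits. It assumes, without confirmation from @RISK, that any extended bins count toward the requested number of bins and that the remaining bins split the range from minimum to maximum evenly. Passing `minimum=None` and `maximum=None` stands in for 'Automatic Minimum and Maximum Based on Input Data'; the function and parameter names are hypothetical.

```python
import math

def equal_interval_edges(data, n_bins, minimum=None, maximum=None,
                         extend_first=False, extend_last=False):
    """Hypothetical equal-interval bin limits with optional infinite end bins.

    Assumption: extended bins count toward n_bins, and the remaining bins
    divide [minimum, maximum] into intervals of equal width.
    """
    # None means "use the data's own minimum/maximum" (the automatic option).
    lo = min(data) if minimum is None else minimum
    hi = max(data) if maximum is None else maximum
    if hi <= lo:
        raise ValueError("maximum must be greater than minimum")
    n_finite = n_bins - int(extend_first) - int(extend_last)
    if n_finite < 1:
        raise ValueError("need at least one finite bin between minimum and maximum")
    width = (hi - lo) / n_finite
    # Compute the final edge directly so rounding cannot shift it off hi.
    edges = [lo + i * width for i in range(n_finite)] + [float(hi)]
    if extend_first:
        edges.insert(0, -math.inf)  # first bin: -infinity to minimum
    if extend_last:
        edges.append(math.inf)      # last bin: maximum to +infinity
    return edges

# Hypothetical data; 4 bins, manual range 20..80, first bin extended.
print(equal_interval_edges([23, 35, 41, 58, 77], 4, minimum=20, maximum=80,
                           extend_first=True))  # [-inf, 20.0, 40.0, 60.0, 80.0]
```

Under this assumption, extending the first bin leaves three equal 20-unit bins between the manual minimum and maximum.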

Custom Bins

Figure 2 - Custom Bins

In situations where bins should be of unequal lengths (for example, where natural groupings exist), Custom Bins can be configured (Figure 2, right). When configuring the first bin, enter the starting value for that bin; for each subsequent bin, enter the ending value. As values are entered into the Bin Limits fields, the Bin Description will populate with the logical formula for that bin.

Similar to Equal Interval bins, the first and last bins can be configured to extend to minus and plus infinity when working with a data set with unknown bounds. For example, Figure 2 illustrates a bin configuration where the first bin extends from 'minus infinity to less than 10', and the last bin extends from 'greater than or equal to 70 to plus infinity'.
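The following minimal Python sketch turns a list of bin limits into bin descriptions and assigns values to bins. It assumes half-open bins that include their lower limit and exclude their upper limit, which is consistent with the 'less than 10' and 'greater than or equal to 70' wording in Figure 2. The intermediate limits 30 and 50, the function names, and the exact description wording are hypothetical and only approximate what the Bin Description column shows.

```python
import math
from bisect import bisect_right

def custom_bins(limits, extend_first=False, extend_last=False):
    """Return (lower, upper, description) for half-open bins [lower, upper)."""
    if any(a >= b for a, b in zip(limits, limits[1:])):
        raise ValueError("bin limits must be strictly increasing")
    edges = list(limits)
    if extend_first:
        edges.insert(0, -math.inf)
    if extend_last:
        edges.append(math.inf)
    bins = []
    for lo, hi in zip(edges, edges[1:]):
        left = "minus infinity" if lo == -math.inf else f"greater than or equal to {lo}"
        right = "plus infinity" if hi == math.inf else f"less than {hi}"
        bins.append((lo, hi, f"{left} to {right}"))
    return bins

def bin_index(value, bins):
    """Index of the bin containing value, or None if it falls outside every bin."""
    lowers = [lo for lo, _, _ in bins]
    i = bisect_right(lowers, value) - 1
    if i >= 0 and value < bins[i][1]:
        return i
    return None

# Hypothetical limits; both end bins extended, as in the Figure 2 example.
bins = custom_bins([10, 30, 50, 70], extend_first=True, extend_last=True)
for lo, hi, text in bins:
    print(text)
```

With both ends extended, every value falls into some bin; without the extensions, values below the first limit or at or above the last limit fall outside the bins entirely.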