Defining Fit Data

Once a data set has been selected, it can be fit to various distributions to determine which one would most likely produce that data set. The first step in the process is to define the characteristics of the data set (or sets) on the Data tab of the Fit Distributions to Data window.

If a single data set is to be fitted, click the Fit button and select 'Fit'.

If multiple data sets are to be fitted simultaneously, click the Fit button and select 'Batch Fit'. Note that all data sets must use the same fitting configuration when run in batch mode, including which distributions will be tested.

Figure 1 - Fit Distributions to Data - Data Tab

Data Tab

The Data tab consists of the following primary sections:

  1. Data Set - Details of the data to be included in the fit, including a Name that identifies the fit in the Fit Manager window. See below for more information.
  2. Data Type - Specifies the type of the selected data set.
  3. Filter - Filters the data set values using either absolute or relative limits.

Data Set

The Data Set panel is the primary definition of the data to be included in the Fit.

  • Name - It is highly recommended to give the data set a name; the Name value will appear in the Name column of the Fit Manager window. Additionally, the Name of the Fit will be used in any RiskFit function that links an input distribution to the results of the fit.
  • Range - Defines the range of cells to be included in the fitting process. It uses a standard Excel range reference, which must be a single contiguous range. Enter the range manually, or use the 'Select an Excel Range' button to select the cells.
  • Individual values within a contiguous range can be excluded from the fit by using a filter (see Filter, below). It is not necessary to remove those cells from the Range.

  • Values are Dates - Check this box if the data values are formatted as Dates.
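Excel stores a date as a serial day number and only displays it as a date. Assuming the fit is performed on those underlying serial numbers (this page does not say so explicitly), the Python sketch below shows how calendar dates map to serial numbers in Excel's default 1900 date system. The helper names are hypothetical, and the 1904 date system is not covered.

```python
from datetime import date, timedelta

# Epoch for Excel's default 1900 date system. Counting from 1899-12-30 gives
# the correct serial number for every date on or after 1900-03-01 (Excel's
# fictitious 1900-02-29 shifts earlier dates by one).
EXCEL_EPOCH = date(1899, 12, 30)
FIRST_SUPPORTED = date(1900, 3, 1)  # serial number 61

def to_excel_serial(d: date) -> int:
    """Convert a calendar date to an Excel serial day number (hypothetical helper)."""
    if d < FIRST_SUPPORTED:
        raise ValueError("dates before 1900-03-01 are not handled in this sketch")
    return (d - EXCEL_EPOCH).days

def from_excel_serial(serial: int) -> date:
    """Convert an Excel serial day number back to a calendar date (hypothetical helper)."""
    if serial < 61:
        raise ValueError("serial numbers below 61 are not handled in this sketch")
    return EXCEL_EPOCH + timedelta(days=serial)

# Hypothetical dates to be fitted, and the serial numbers Excel stores for them.
observed = [date(2024, 1, 1), date(2024, 1, 15), date(2024, 2, 1)]
serials = [to_excel_serial(d) for d in observed]
print(serials)  # [45292, 45306, 45323]
```

Under this assumption, a fitted distribution over dates is a distribution over these day counts, which is why the box needs to be checked for results to be reported back as dates.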

Data Type

This section specifies the type of data set used for the fit. The type of data (e.g. continuous or discrete) largely determines which distributions are available to test.

The Data Type must be set correctly for the fit to be meaningful. It designates whether the data set is a random sample (continuous or discrete), a set of points on a density curve (density), or a set of points on a cumulative curve (cumulative). This choice is fundamental to the fitting process.
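The three layouts differ in what each value means. As a rough illustration, and not @RISK's own validation logic, the Python sketch below shows hypothetical data in each layout, with density and cumulative data assumed to be (x, y) pairs. It also checks that cumulative points are consistent with a cumulative curve: probabilities in [0, 1] that never decrease as x increases. Requiring strictly increasing x values is an assumption of this sketch.

```python
from typing import List, Tuple

# Hypothetical examples of the three layouts.
sample = [3.1, 2.7, 4.0, 3.6, 2.9]                        # random sample: each value is one observation
density_points = [(0.0, 0.1), (1.0, 0.4), (2.0, 0.2)]     # (x, y) points on a density curve
cumulative_points = [(0.0, 0.0), (1.0, 0.5), (2.0, 1.0)]  # (x, p) points on a cumulative curve

def check_cumulative(points: List[Tuple[float, float]]) -> None:
    """Raise ValueError if (x, p) points cannot lie on a cumulative curve."""
    xs = [x for x, _ in points]
    ps = [p for _, p in points]
    if any(b <= a for a, b in zip(xs, xs[1:])):
        raise ValueError("x values must be strictly increasing")  # assumed requirement
    if any(p < 0.0 or p > 1.0 for p in ps):
        raise ValueError("cumulative probabilities must lie in [0, 1]")
    if any(b < a for a, b in zip(ps, ps[1:])):
        raise ValueError("cumulative probabilities must not decrease")

check_cumulative(cumulative_points)  # passes silently
```

The same column of numbers can be valid in one layout and invalid in another, which is why the Data Type must be chosen before fitting rather than inferred afterwards.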

See Fit Data Data Types for more information.

Filter

Figure 2 - Absolute Filter

Figure 3 - Relative Filter

Filtering settings allow certain values to be excluded from the fitting process without changing the designated Excel range. This is particularly useful when fitting large data sets.

There are three options for Filtering:

  • None - Do not filter the data set.
  • Absolute - Specifies a minimum value, a maximum value, or both, for the X-values (Figure 2, right). Values below the minimum or above the maximum are filtered out; if only one bound is entered, only that bound is applied.
  • Relative - Specifies a number of standard deviations from the mean of the data set (Figure 3, right); only values farther from the mean than that number of standard deviations are filtered out.
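The two filter types can be sketched in Python as follows. This is an illustration of the rules described above, not @RISK's implementation. Several details are assumptions: bounds are treated as inclusive, the relative filter uses the sample standard deviation, and the mean and standard deviation are computed from the unfiltered data.

```python
import statistics
from typing import List, Optional, Sequence

def absolute_filter(values: Sequence[float],
                    minimum: Optional[float] = None,
                    maximum: Optional[float] = None) -> List[float]:
    """Keep values inside [minimum, maximum]; an omitted bound is not applied."""
    kept = []
    for x in values:
        if minimum is not None and x < minimum:
            continue  # below the minimum: filtered out
        if maximum is not None and x > maximum:
            continue  # above the maximum: filtered out
        kept.append(x)
    return kept

def relative_filter(values: Sequence[float], num_sd: float) -> List[float]:
    """Keep values within num_sd standard deviations of the mean."""
    mean = statistics.mean(values)
    sd = statistics.stdev(values)  # sample standard deviation (an assumption)
    return [x for x in values if abs(x - mean) <= num_sd * sd]

data = [2, 4, 4, 4, 5, 5, 7, 9, 50]  # hypothetical data with one outlier
print(absolute_filter(data, maximum=10))  # [2, 4, 4, 4, 5, 5, 7, 9]
print(relative_filter(data, num_sd=2))    # [2, 4, 4, 4, 5, 5, 7, 9]
```

Here the mean is 10 and the sample standard deviation is about 15.1, so a 2-standard-deviation filter keeps values within roughly 30.3 of the mean and drops only 50. Note that a large outlier also inflates the standard deviation it is measured against, so a relative filter may need a smaller multiple than expected to exclude it.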