# Determining the Statistical Power of the Kolmogorov-Smirnov and Anderson-Darling Goodness-of-Fit Tests via Monte Carlo Simulation

## Introduction

This paper presents and analyzes the results of a Monte Carlo simulation that estimates the statistical power of two common goodness-of-fit (GoF) tests. Several distribution types are considered, including the normal, lognormal, and exponential distributions. The results give the required sample size as a function of statistical power, so the presented data can be used to determine the minimum sample size needed for a desired level of power. The simulation methodology can be adapted to estimate statistical power for the same distributions with different parameters or for other distribution types.

## Goodness-of-Fit Testing

Goodness-of-fit (GoF) testing is a technique used to determine how well a statistical model fits a data set. Single-sample GoF tests weigh a null hypothesis against an alternative hypothesis to assess whether a sample could have been drawn from a population with a particular distribution. Multi-sample GoF tests assess whether the samples could have been drawn from populations with the same distribution. GoF tests are therefore useful for validating whether simulation output is similar to real-world data, and for comparing the performance of a new system to that of a previous generation. Two such tests, Kolmogorov-Smirnov (KS) and Anderson-Darling (AD), are the subject of this paper, and their behavior in terms of statistical power is analyzed and presented. Determining statistical power is important for test design because it enables the designer to choose the minimum sample size required to detect a difference between samples; if the sample is smaller than required, the GoF result may be too unreliable to act on.
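For reference, the quantity being estimated can be written out explicitly. The notation below is introduced here for illustration and is not taken from the study: $\alpha$ is the significance level, $\beta$ the probability of a Type II error, $N$ the number of Monte Carlo replications, and $p_i$ the p-value obtained in replication $i$ when the samples are generated under the alternative hypothesis.

$$
\text{power} = 1 - \beta = P(\text{reject } H_0 \mid H_1 \text{ is true})
$$

$$
\widehat{\text{power}} = \frac{1}{N}\sum_{i=1}^{N} \mathbf{1}\left[\,p_i < \alpha\,\right],
\qquad
\operatorname{SE}\left(\widehat{\text{power}}\right) \approx \sqrt{\frac{\widehat{\text{power}}\,\bigl(1-\widehat{\text{power}}\bigr)}{N}}
$$

The standard error follows from treating each replication as an independent Bernoulli trial, so it indicates roughly how many replications are needed before differences between estimated power values become meaningful.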

## Two-Sample KS and AD Tests

The two-sample KS and AD tests are GoF tests used to infer whether two samples were drawn from populations with the same distribution. In both tests, the empirical distribution function (EDF) of each sample is used to calculate the test statistic. If the value of the test statistic is larger than the critical value for a given significance level, or equivalently if the p-value is less than that significance level, the null hypothesis is rejected and one can infer that the samples were drawn from populations with different distributions. Both tests accommodate equal or unequal sample sizes.
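The decision rule described above can be sketched in a few lines of Python for the KS case. This is a minimal illustration and not the simulation code used in the study: the function names are hypothetical, the critical value uses the standard large-sample approximation rather than exact tables, and the normal distributions, shift, sample sizes, and number of trials in `estimate_power` are assumptions chosen only to show the mechanics.

```python
import math
import random
from bisect import bisect_right


def ks_2samp_statistic(x, y):
    """Largest absolute gap between the EDFs of the two samples."""
    xs, ys = sorted(x), sorted(y)
    n, m = len(xs), len(ys)
    d = 0.0
    # Both EDFs are step functions that only change at observed values,
    # so it is enough to compare them at every observation.
    for v in xs + ys:
        f_x = bisect_right(xs, v) / n  # fraction of x values <= v
        f_y = bisect_right(ys, v) / m  # fraction of y values <= v
        d = max(d, abs(f_x - f_y))
    return d


def ks_critical_value(n, m, alpha=0.05):
    """Large-sample approximation to the two-sample KS critical value."""
    c = math.sqrt(-0.5 * math.log(alpha / 2.0))
    return c * math.sqrt((n + m) / (n * m))


def estimate_power(n, m, shift, trials=500, alpha=0.05, seed=0):
    """Fraction of simulated sample pairs for which H0 is rejected.

    Assumption for illustration: x ~ Normal(0, 1) and y ~ Normal(shift, 1).
    """
    rng = random.Random(seed)
    crit = ks_critical_value(n, m, alpha)
    rejections = 0
    for _ in range(trials):
        x = [rng.gauss(0.0, 1.0) for _ in range(n)]
        y = [rng.gauss(shift, 1.0) for _ in range(m)]
        if ks_2samp_statistic(x, y) > crit:
            rejections += 1
    return rejections / trials


if __name__ == "__main__":
    # Unequal sample sizes are handled without any special casing.
    print("power, shift=1.0:", estimate_power(n=30, m=40, shift=1.0))
    # With shift=0 the null hypothesis is true, so this is the Type I error rate.
    print("rejection rate, shift=0.0:", estimate_power(n=30, m=40, shift=0.0))
```

Calling `estimate_power` with `shift=0.0` gives a quick sanity check, since the rejection rate under a true null hypothesis should not exceed the chosen significance level by much. In practice, library routines such as `scipy.stats.ks_2samp` and `scipy.stats.anderson_ksamp` are likely a better choice than a hand-written statistic, particularly for small samples where the large-sample critical value is only approximate.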