## Friday, 17 March 2023

### 2.5% AQL: How it works - with Python Code

I will take a case study and work through it using two approaches.

Case:

A vendor has offered 671 sarees for inspection, some of which are defective.

a. How many sarees need to be inspected for a 2.5% AQL level?

b. What is a 2.5% AQL level?

c. Suppose I took a sample of 26 sarees and 15 of them are defective. Should I reject the whole lot?

========
Solution 1. Using Hypothesis Testing Approach
========

To determine whether you should reject the whole lot of sarees, you need to conduct a hypothesis test using the sample data you have collected.

Here is how you can approach it:

1. Define the null and alternative hypotheses:

Null Hypothesis (H0): The proportion of defective sarees in the entire lot is less than or equal to a specified value p0.

Alternative Hypothesis (Ha): The proportion of defective sarees in the entire lot is greater than p0.

2. Set the significance level of the test. This is the probability of rejecting the null hypothesis when it is actually true. Let's say you choose a significance level of 0.05.

3. Calculate the test statistic. For this situation, you can use a one-tailed z-test for proportions. The formula is

z = (phat-p0)/sqrt(p0(1-p0)/n)

Where phat is the sample proportion of defective sarees, n is the sample size, and sqrt() denotes the square root function.

Plugging in the values from your sample, you get:

z = (15/26-p0)/sqrt(p0(1-p0)/26)

4. Determine the critical value or p-value. The critical value can be found from a z-table for your chosen significance level.

Alternatively, you can use the p-value approach, which is to find the probability of getting a test statistic as extreme or more extreme than the observed one, assuming the null hypothesis is true.

5. Decide. If the test statistic exceeds the critical value, or the p-value is less than the significance level, you reject the null hypothesis and conclude that the proportion of defective sarees in the entire lot is greater than p0. Otherwise, you fail to reject the null hypothesis.

Assuming that p0 = 0.05 and alpha = 0.05, the test statistic will be

z = (15/26-0.05)/sqrt(0.05(1-0.05)/26) = 12.33

The critical value for alpha = 0.05 is 1.645. As 12.33 is more than that, we reject the null hypothesis and conclude that the proportion of defective sarees in the entire lot is greater than 0.05. Therefore you should reject the whole lot of sarees.
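
The arithmetic of steps 3 to 5 can be checked with a few lines of Python. This is a minimal sketch rather than a definitive implementation: the helper name one_sided_z_test is made up for illustration, and it uses the standard library's statistics.NormalDist in place of a z-table.

```python
import math
from statistics import NormalDist

def one_sided_z_test(defectives, n, p0, alpha=0.05):
    """One-tailed z-test for a proportion (H0: p <= p0, Ha: p > p0).

    Hypothetical helper written for this example.
    """
    phat = defectives / n                       # sample proportion
    std_err = math.sqrt(p0 * (1 - p0) / n)      # standard error under H0
    z = (phat - p0) / std_err                   # test statistic
    z_crit = NormalDist().inv_cdf(1 - alpha)    # about 1.645 for alpha = 0.05
    p_value = 1 - NormalDist().cdf(z)           # upper-tail probability
    return z, z_crit, p_value

# Values from the case: 15 defectives in a sample of 26, with p0 = 0.05
z, z_crit, p_value = one_sided_z_test(defectives=15, n=26, p0=0.05)
print(f"z = {z:.2f}, critical value = {z_crit:.3f}, p-value = {p_value:.3g}")
print("Reject H0" if z > z_crit else "Fail to reject H0")
```

Running it prints the test statistic next to the critical value, so the decision in step 5 can be read off directly.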

You can use the following Python code to achieve it. Here the maximum defective rate is assumed to be 2.5%, instead of the 5% (0.05) used above.

============================
import statsmodels.stats.proportion as smprop

# Lot size (kept for reference; the z-test itself uses only the sample)
N = 671

# Sample size: number of sarees actually inspected
n = 26

# Number of defective sarees in the sample
defectives = 15

# Calculate the sample proportion of defectives
p_sample = defectives / n

# Null hypothesis: p <= 0.025 (defective rate is at most 2.5%)
# Alternative hypothesis: p > 0.025 (defective rate is higher than 2.5%)

# Perform one-tailed z-test with alpha = 0.05.
# prop_var=0.025 makes the variance use p0, matching the formula above;
# by default statsmodels uses the sample proportion instead.
z_score, p_value = smprop.proportions_ztest(defectives, n, value=0.025, alternative='larger', prop_var=0.025)
print("z-score:", z_score)
print("p-value:", p_value)

if p_value <= 0.05:
    print("Reject null hypothesis")
else:
    print("Fail to reject null hypothesis")
=====================================

In this code, we first calculate the sample proportion of defectives by dividing the number of defective sarees by the sample size. We then set up the null and alternative hypotheses as before, and perform a one-tailed z-test with the proportions_ztest() function from the statsmodels.stats.proportion module. The proportions_ztest() function takes the following arguments:

count: the number of successes (defective sarees) in the sample.
nobs: the sample size (the 26 sarees inspected, not the lot size).
value: the hypothesized proportion under the null hypothesis (which was 2.5% in this case).
alternative: the alternative hypothesis, which is 'larger' in this case since we are testing for a higher defective rate.
prop_var: the proportion used to compute the variance. Setting it to 0.025 matches the formula above; if it is left out, the sample proportion is used.

The proportions_ztest() function returns the z-score and p-value of the test. We compare the p-value to the significance level (alpha = 0.05) and make a decision to either reject or fail to reject the null hypothesis.

When you run this code, it will output the z-score and p-value of the test, and the decision to either reject or fail to reject the null hypothesis.
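
One detail worth knowing: by default proportions_ztest() estimates the variance from the sample proportion (its prop_var argument is False), whereas the formula given earlier uses p0. The sketch below computes both versions by hand for the case's sample of 26 sarees with 15 defectives, so it needs no extra packages. The variable names are illustrative only.

```python
import math

n = 26            # sarees inspected
defectives = 15   # defectives found in the sample
p0 = 0.025        # hypothesized maximum defective rate

phat = defectives / n

# Variance taken from the null value p0, as in z = (phat-p0)/sqrt(p0(1-p0)/n)
z_null_var = (phat - p0) / math.sqrt(p0 * (1 - p0) / n)

# Variance taken from the sample proportion (the statsmodels default, prop_var=False)
z_sample_var = (phat - p0) / math.sqrt(phat * (1 - phat) / n)

print(f"z using p0 variance:   {z_null_var:.2f}")
print(f"z using phat variance: {z_sample_var:.2f}")
```

Both values are far beyond the 1.645 critical value, so the decision is the same here, but for borderline samples the two conventions can disagree.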

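You can run the same test with the exact binomial distribution, which avoids the normal approximation and is a useful check when the sample is as small as 26.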

======================================

import scipy.stats as stats

# Lot size (kept for reference; the test itself uses only the sample)
N = 671

# Sample size: number of sarees actually inspected
n = 26

# Null hypothesis: p <= 0.025 (defective rate is at most 2.5%)
# Alternative hypothesis: p > 0.025 (defective rate is higher than 2.5%)

# Number of defective sarees in the sample
defectives = 15

# Perform one-tailed exact binomial test with alpha = 0.05.
# binomtest() replaces binom_test(), which was deprecated and then
# removed in recent SciPy releases.
result = stats.binomtest(defectives, n=n, p=0.025, alternative='greater')
p_value = result.pvalue
print("p-value:", p_value)

if p_value <= 0.05:
    print("Reject null hypothesis")
else:
    print("Fail to reject null hypothesis")

=======================================

In this code, the null hypothesis is that the defective rate p is at most 2.5% (i.e., p <= 0.025), and the alternative hypothesis is that p is higher than 2.5% (i.e., p > 0.025). We perform a one-tailed binomial test with the binomtest() function from the scipy.stats module, with the following arguments:

k (passed as defectives): the number of defective sarees in the sample (which was 15 in this case).
n: the sample size (the 26 sarees inspected, not the lot size).
p: the hypothesized defective rate under the null hypothesis (which was 2.5% in this case).
alternative: the alternative hypothesis, which is 'greater' in this case since we are testing for a higher defective rate.

The binomtest() function returns a result object whose pvalue attribute holds the p-value of the test. We compare the p-value to the significance level (alpha = 0.05) and make a decision to either reject or fail to reject the null hypothesis.

When you run this code, it will output the p-value of the test and the decision to either reject or fail to reject the null hypothesis.
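
To see what the exact test computes, the upper-tail probability can be summed directly from the binomial formula. This is a minimal standard-library sketch under the case's numbers (a sample of 26 sarees with 15 defectives); binomial_upper_tail is a made-up helper, not a SciPy function.

```python
from math import comb

def binomial_upper_tail(k, n, p):
    """P(X >= k) for X ~ Binomial(n, p). Hypothetical helper for this example."""
    return sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k, n + 1))

n = 26            # sarees inspected
defectives = 15   # defectives found in the sample
p0 = 0.025        # hypothesized maximum defective rate

# Probability of seeing 15 or more defectives if the true rate were exactly 2.5%
p_value = binomial_upper_tail(defectives, n, p0)
print(f"Exact one-sided p-value: {p_value:.3e}")
print("Reject null hypothesis" if p_value <= 0.05 else "Fail to reject null hypothesis")
```

For a 'greater' alternative, the p-value of an exact binomial test is this tail probability.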

========
Solution 2. Using AQL Approach
========

You can also use the AQL (Acceptable Quality Level) approach to solve this problem. AQL underlies commonly used sampling plans in quality control and specifies the maximum acceptable percentage of defects. The AQL value is expressed as a percentage, and it represents the level of quality that is deemed acceptable by the customer or the manufacturer.

To use the AQL approach, you need to determine the sample size and the maximum allowable number of defects based on the AQL value and the lot size. The formula for calculating the sample size is:

n = (Z_{α/2})^2 * p * (1-p) / d^2

where Z_{α/2} is the critical value of the standard normal distribution corresponding to the desired level of confidence (e.g., Z_{α/2} = 1.96 for a 95% confidence level), p is the expected proportion of defects in the lot, d is the maximum allowable difference between the sample proportion and the lot proportion (i.e., the AQL value), and n is the sample size.

For this problem, let's assume that the AQL value is 2.5%, and we want to be 95% confident that the sample proportion is within 2.5% of the lot proportion. We can set p = 15/671, the proportion of defective sarees in the lot estimated from the defectives found so far, and d = 2.5% = 0.025. The lot size is already given as 671.

Using the formula, we get:

n = (1.96)^2 * (15/671) * (1 - 15/671) / (0.025)^2 ≈ 134.3, which rounds up to 135

This means that we need to randomly sample 135 sarees from the lot to determine whether the lot meets the AQL requirement of 2.5% defects. The allowable number of defectives is 2.5% of 135, about 3.4, which is rounded up to 4. If the number of defective sarees in the sample is less than or equal to 4, we accept the lot. If it is greater than 4, we reject the lot.

In this case, 15 defective sarees have already been found, which is greater than 4, so we would reject the lot based on the AQL approach as well.

You can use the following Python code to achieve it. The sample_size function accepts the significance level alpha as a parameter and calculates Z using the scipy.stats.norm.ppf() function, which returns the critical value of the standard normal distribution corresponding to a given percentile, so the Z value adapts to the desired confidence level.

========================
import math
from scipy.stats import norm

# Lot size
N = 671

# Sample size formula
def sample_size(AQL, p, alpha):
    Z = norm.ppf(1 - alpha/2)  # Critical value for two-tailed test
    d = AQL  # Maximum allowable difference
    n = ((Z**2) * p * (1 - p)) / (d**2)
    return math.ceil(n)

# Calculate sample size for AQL = 2.5%, p = 15/671, and alpha = 0.05 (95% confidence level)
n = sample_size(0.025, 15/671, 0.05)
print("Sample size:", n)

# Number of defective sarees found
defectives = 15

# Check if the lot meets the AQL requirement at alpha = 0.05
AQL_defectives = math.ceil(n * 0.025)  # Maximum allowable defects based on AQL
if defectives <= AQL_defectives:
    print("Lot accepted")
else:
    print("Lot rejected")

# Check if the lot meets the AQL requirement at alpha = 0.01 (99% confidence level)
n = sample_size(0.025, 15/671, 0.01)
print("Sample size:", n)
AQL_defectives = math.ceil(n * 0.025)
if defectives <= AQL_defectives:
    print("Lot accepted")
else:
    print("Lot rejected")
=========================================

In this code, the alpha parameter represents the significance level (1 - confidence level), which is used to calculate the critical value of Z. The norm.ppf() function takes a percentile (in this case, 1 - alpha/2 for a two-tailed test) and returns the corresponding critical value of the standard normal distribution.

When you run this code, it will output the sample size and the lot acceptance/rejection decision for both a 95% confidence level (alpha = 0.05) and a 99% confidence level (alpha = 0.01). The Z value will be different for each confidence level, and will be calculated using the norm.ppf() function.
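
To see how the confidence level drives the result, the sketch below tabulates Z and the resulting sample size for a few levels. It mirrors the sample_size logic above but uses statistics.NormalDist().inv_cdf from the standard library, which plays the same role as norm.ppf(). The function name required_sample_size is hypothetical.

```python
import math
from statistics import NormalDist

def required_sample_size(d, p, alpha):
    """Return (Z, n) with n = Z^2 * p * (1 - p) / d^2 rounded up.

    Hypothetical helper written for this example.
    """
    z = NormalDist().inv_cdf(1 - alpha / 2)   # two-tailed critical value
    return z, math.ceil(z**2 * p * (1 - p) / d**2)

p = 15 / 671   # assumed defect proportion, as in the text
d = 0.025      # allowable difference, set to the AQL value

results = {}
for confidence in (0.90, 0.95, 0.99):
    alpha = 1 - confidence
    z, n = required_sample_size(d, p, alpha)
    results[confidence] = (z, n)
    print(f"confidence {confidence:.0%}: Z = {z:.3f}, sample size = {n}")
```

A higher confidence level gives a larger Z and therefore a larger required sample.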