Yet another practical example of conducting a statistical analysis in Python
My second project at Flatiron School required gathering information from a real-world database and using my knowledge of statistical analysis and hypothesis testing to generate analytical insights. We were working with the famous Northwind database – a free, open-source dataset created by Microsoft containing data from a fictional company and widely used for learning SQL. After already conducting both a Welch's t-test and an independent t-test, this blog covers a one-way ANOVA. The following blogs deal with a two-way ANOVA and a Chi-Squared Test for Independence.
The complete Jupyter Notebook can be found here.
The question to answer and analyze is:
Does discount level have a statistically significant effect on the
quantity of product in an order? If so, at what level(s)?
Here’s what’s covered in this section:
- Obtaining the data
- Data Cleaning & Feature Engineering
- The Hypothesis
- The Analysis
- Conclusions
Obtaining the data
This section was already discussed extensively in the first blog regarding statistical analysis.
Data Cleaning & Feature Engineering
This time it’s very easy, as we only need one table:
- “OrderDetail”
We’re using the pd.read_sql_query() method:
df3 = pd.read_sql_query('''SELECT * FROM OrderDetail;''', engine)
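For completeness, `engine` here is the SQLAlchemy connection set up in the first blog; a minimal sketch of how it might be created (the database filename is an assumption – adjust it to your local copy):

```python
from sqlalchemy import create_engine

# path to the local SQLite copy of Northwind -- an assumed filename,
# adjust it to wherever your database file lives
engine = create_engine('sqlite:///Northwind_small.sqlite')
```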
Let’s check the newly created dataframe:
df3.head()
How many unique discount levels do we have?
df3.Discount.value_counts()
Output:
0.00    1317
0.05     185
0.10     173
0.20     161
0.15     157
0.25     154
0.03       3
0.02       2
0.01       1
0.04       1
0.06       1
Oh, some discount levels occur only a handful of times. Let’s drop these and work only with the no-discount group and the 5 groups with substantial discount levels:
crit = (df3.Discount == 0.00) | (df3.Discount == 0.05) | (df3.Discount == 0.10) | (df3.Discount == 0.15) | (df3.Discount == 0.20) | (df3.Discount == 0.25)
df3_test = df3.loc[crit]
df3_test.Discount.value_counts()
Output:
0.00    1317
0.05     185
0.10     173
0.20     161
0.15     157
0.25     154
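As a side note, the same filter can be written more compactly with `pandas.Series.isin`, which avoids chaining six comparisons. A small sketch on toy data (the real criterion would of course use `df3` from above):

```python
import pandas as pd

# toy stand-in for the OrderDetail table
orders = pd.DataFrame({'Discount': [0.00, 0.05, 0.03, 0.10, 0.01, 0.25],
                       'Quantity': [10, 12, 5, 7, 3, 20]})

# keep only the no-discount group and the five substantial discount levels
keep = [0.00, 0.05, 0.10, 0.15, 0.20, 0.25]
orders_test = orders[orders.Discount.isin(keep)]
print(orders_test.Discount.tolist())
```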
Much better. Let’s also check if any of our discount levels have substantial outliers:
df3_test.boxplot('Quantity', by='Discount', figsize=(10, 5));
Great! There are no outliers in any group with respect to the quantity of products.
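If you prefer a numeric check over eyeballing the boxplot, the 1.5 × IQR rule that the boxplot whiskers are based on can be computed per group. A sketch on toy data (the real check would group `df3_test` by `Discount`):

```python
import pandas as pd

# toy stand-in: two "discount" groups without extreme values
df = pd.DataFrame({'Discount': [0.0] * 5 + [0.05] * 5,
                   'Quantity': [10, 12, 11, 13, 12, 20, 22, 21, 23, 22]})

def count_outliers(s):
    """Count points outside the 1.5 * IQR whiskers."""
    q1, q3 = s.quantile(0.25), s.quantile(0.75)
    iqr = q3 - q1
    return int(((s < q1 - 1.5 * iqr) | (s > q3 + 1.5 * iqr)).sum())

outliers_per_group = df.groupby('Discount')['Quantity'].apply(count_outliers)
print(outliers_per_group)
```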
The Hypothesis
Now it’s time to define the hypotheses – the null and the alternative one – and set the alpha level:
H0: The average quantity of ordered products in an order is equal for all levels of discount:
H0: μ1 = μ2 = μ3 = … = μi = … = μk
H1: The average quantity of ordered products differs between at least two discount levels:
H1: μi ≠ μj (for at least one pair i ≠ j)
The significance level is set to the value of 5%:
α = 0.05
The Analysis
The analysis of variance, or one-way ANOVA, is used to compare the differences between the means of two or more groups. It can be thought of as an extension of the t-test, which compares the means of exactly two groups.
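This relationship can be checked numerically: with exactly two groups, a one-way ANOVA and an independent t-test (with pooled variances) give identical p-values, and the test statistics satisfy F = t². A quick sketch with `scipy` on simulated data:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)
a = rng.normal(10, 2, size=50)   # group 1
b = rng.normal(11, 2, size=50)   # group 2

t, p_t = stats.ttest_ind(a, b)   # independent t-test (equal variances)
F, p_f = stats.f_oneway(a, b)    # one-way ANOVA on the same two groups

print(F, t ** 2)   # the F-statistic equals the squared t-statistic
print(p_t, p_f)    # the p-values coincide
```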
ANOVA is an omnibus test, meaning it tests the data as a whole: it tells us whether there is a difference in the mean somewhere in the model, but not where that difference is (if there is one). To find out where the difference between the groups lies, we have to conduct post-hoc tests.
Although it can be thought of as an extension of the t-test in terms of when to use it, mathematically speaking it’s more of a regression model and is considered a generalized linear model (GLM).
Because of the F-distribution’s shape, the significance level is always placed in the right tail (no one-tailed or two-tailed specification is necessary).
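Concretely, we reject H0 whenever the F-statistic exceeds the critical value in the right tail. With α = 0.05, 6 groups, and 2,147 observations (so 5 between-group and 2,141 within-group degrees of freedom), `scipy.stats.f` gives the cutoff:

```python
from scipy import stats

alpha = 0.05
df_between = 6 - 1       # k - 1 for k = 6 groups
df_within = 2147 - 6     # N - k for N = 2147 observations

# right-tail critical value: reject H0 if the F-statistic exceeds this
f_crit = stats.f.ppf(1 - alpha, df_between, df_within)
print(f_crit)
```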
There are three assumptions that need to be met for the results of an ANOVA test to be considered accurate and trustworthy. It’s important to note that two of the assumptions apply to the residuals and not the variables themselves. The ANOVA assumptions are the same as those for linear regression, and are:
- Normality
  The caveat to this is that if group sizes are equal, the F-statistic is robust to violations of normality.
- Homogeneity of variance
  Same caveat as above; if group sizes are equal, the F-statistic is robust to this violation.
- Independent observations
  The independence of observations can be assumed!
If possible, it is best to have equal-sized groups so corrections to the data do not need to be made. Here the five discount groups are of roughly equal size, although the no-discount group is considerably larger.
As two of the assumptions apply to the residuals, they are tested after conducting the ANOVA. Both goodness-of-fit tests are left out here, but can be looked up in detail in my notebook.
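For reference, the two residual checks omitted here are typically a normality test on the model residuals (e.g. Shapiro–Wilk) and a homogeneity-of-variance test across groups (e.g. Levene). A sketch on simulated data – the notebook applies these to the fitted statsmodels residuals instead:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
# stand-in for the six discount groups
groups = [rng.normal(20, 5, size=150) for _ in range(6)]
residuals = np.concatenate([g - g.mean() for g in groups])

# normality of residuals (H0: residuals are normally distributed)
_, p_norm = stats.shapiro(residuals)

# homogeneity of variance across groups (H0: equal variances)
_, p_levene = stats.levene(*groups)

print(p_norm, p_levene)
```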
The ANOVA itself is very simple to code:
import statsmodels.api as sm
from statsmodels.formula.api import ols
# syntax for defining the ANOVA model
anova = ols('Quantity ~ Discount', data=df3_test).fit()
table = sm.stats.anova_lm(anova, typ=2)
table
On the left, we have our discount factor, and on the right, the probability that this factor is indeed influential.
Values < .05 (i.e., our α) indicate rejection of the null hypothesis. We can see the discount factor appears to be highly influential.
One can use a few different effect-size measures for an ANOVA: eta-squared (η²) and omega-squared (ω²). Omega-squared is considered a better measure of effect size than eta-squared because it is unbiased in its calculation.
Something to note: for some reason, R² is called eta-squared within the ANOVA framework, but they are the same thing. R² is a measure of how much variance is explained by the model, and is calculated by dividing the explained variance by the total variance. The following code uses the ANOVA table produced by statsmodels and appends the effect-size measures eta-squared (η²) and omega-squared (ω²).
def anova_effect_size(aov):
    """This function takes the ANOVA table produced by
    statsmodels and appends two effect-size measures:
    eta-squared and omega-squared."""
    # mean squares
    aov['mean_sq'] = aov['sum_sq'] / aov['df']
    # eta-squared (η²)
    aov['eta_sq'] = aov[:-1]['sum_sq'] / sum(aov['sum_sq'])
    # omega-squared (ω²)
    num = aov[:-1]['sum_sq'] - (aov[:-1]['df'] * aov['mean_sq'].iloc[-1])
    denom = sum(aov['sum_sq']) + aov['mean_sq'].iloc[-1]
    aov['omega_sq'] = num / denom
    return aov
anova_effect_size(table)
According to both measures (around 0.016), the discount has only a small effect.
The overall model was significant; now we want to know which groups actually differ. This is called post-hoc testing. Tukey’s HSD is one method that can be used:
from statsmodels.stats.multicomp import pairwise_tukeyhsd
from statsmodels.stats.multicomp import MultiComparison
mc = MultiComparison(df3_test['Quantity'], df3_test['Discount'])
mc_results = mc.tukeyhsd()
Instead of printing a table, let’s use visualizations – it’s just so much easier to spot the differences.
The base discount group to compare against is color-coded BLUE, all groups that are significantly different from this base are color-coded RED, and all insignificant groups GRAY:
# let's compare all levels with discount level 0.0 (blue)
# all groups that are significantly different from 0.0 are color coded red, all insignificant groups gray
mc_results.plot_simultaneous(comparison_name=0.0, figsize=(6,4), ylabel='Discount Level\n', xlabel='\nAverage Number of Products in Order');
- All discount levels significantly differ from NO DISCOUNT – except the 0.10 discount level.
Let’s have a look at all of the other comparisons:
- Some levels seem to have an unwanted effect, though – e.g. 0.20: the number of ordered products is even lower than at 0.15 or 0.05!
Conclusions
A one-way ANOVA at α = 0.05 was used to analyze the effect of discount levels on the quantity of a product in an order:
- There was a statistically significant difference between the discount levels, with p = 2.991487e-09.
- Therefore, I may reject the null hypothesis in support of the alternative: the average quantity of ordered products differs between at least two discount levels.
- Post-hoc testing revealed interesting details about which discount-level groups actually differ. Further investigation may be needed, but there is evidence to suggest that a discount of 0.10 has less of an effect than one of 0.15 or 0.25 (double digits).
- However, the effect size is small, pointing to low practical significance.
The results could be used to further investigate – and maybe test – the effect of different discount levels as revenue boosters.
Thank you for reading!
If you want to read more about statistical testing, check out the other posts. The complete Jupyter Notebook can be found here.
I hope you enjoyed reading this article, and I am always happy to get critical and friendly feedback, or suggestions for improvement!