Another practical example for conducting a statistical analysis in Python
My second project at Flatiron School required gathering information from a real-world database and using my knowledge of statistical analysis and hypothesis testing to generate analytical insights. We were working with the famous Northwind database – a free, open-source dataset created by Microsoft that contains data from a fictional company and is widely used for learning SQL. Having already covered a Welch's t-test, an independent t-test and a one-way ANOVA, this blog covers a two-way ANOVA. The next and last blog of this series covers a Chi-Squared Test for Independence.
The complete Jupyter Notebook can be found here.
The question to answer and analyze is:
How do gender and product influence the amount of sales? Do they interact in that a specific gender responds well to specific categories?
Here’s what’s covered in this section:
- Obtaining the data
- Data Cleaning & Feature Engineering
- The Hypothesis
- The Analysis
- Conclusions
Obtaining the data
This section was already discussed extensively in the first blog regarding statistical analysis.
Data Cleaning & Feature Engineering
We will need the following five tables from our SQL database:
- “Order”
- “OrderDetail”
- “Customer”
- “Product”
- “Category”
We’re using the pd.read_sql_query() method:
df4 = pd.read_sql_query('''
SELECT c.Id customer_id, c.ContactName contact_name, c.ContactTitle title, o.Id order_id, od.Quantity quantity, od.UnitPrice unit_price, od.Discount discount, cat.Id cat_id, cat.CategoryName cat_name
FROM OrderDetail od
JOIN [Order] o ON od.OrderId = o.Id
JOIN Customer c ON o.CustomerId = c.Id
JOIN Product p ON od.ProductId = p.Id
JOIN Category cat ON p.categoryId = cat.Id
;''', engine)
Let’s check the newly created dataframe:
df4.head()
How many and what product categories are we up against?
print(df4.cat_name.unique())
Output:
['Dairy Products' 'Grains/Cereals' 'Produce' 'Seafood' 'Condiments' 'Confections' 'Beverages' 'Meat/Poultry']
Also: How many names are we up against?
df4.contact_name.unique()
Output:
array(['Paul Henriot', 'Renate Messner', (......) 'Dominique Perrier', 'Daniel Tonini'], dtype=object)
Okay, how do we define who’s male and who’s female? Simply by using good old copy-and-paste and defining all male names. A Google search can help us out …:
# define the gender for male customers:
male_names = ['Paul Henriot','Mario Pontes', 'Pascale Cartrain',
'Yang Wang', 'Michael Holz', 'Carlos Hernández',
'Roland Mendel', 'Francisco Chang', 'Bernardo Batista',
'Frédérique Citeaux', 'Pirkko Koskitalo', 'Peter Franken', 'Manuel Pereira',
'Karl Jablonski', 'Art Braunschweiger', 'Horst Kloss', 'Giovanni Rovelli',
'Miguel Angel Paolino', 'Alexander Feuer', 'Carlos González', 'Maurizio Moroni',
'Pedro Afonso', 'José Pedro Freyre', 'Rene Phillips', 'Fran Wilson', 'Guillermo Fernández',
'Philip Cramer', 'Jose Pavarotti','Martín Sommer', 'Lino Rodriguez', 'Laurence Lebihan',
'Jean Fresnière', 'Georg Pipps', 'Thomas Hardy', 'Hari Kumar', 'Sven Ottlieb',
'Eduardo Saavedra', 'Palle Ibsen', 'Zbyszek Piestrzeniewicz', 'Yoshi Latimer',
'Jonas Bergulfsen', 'Felipe Izquierdo', 'Paolo Accorti', 'André Fonseca', 'Sergio Gutiérrez',
'John Steel', 'Yoshi Tannamuri', 'Simon Crowther','Patricio Simpson', 'Howard Snyder',
'Helvetius Nagy', 'Jaime Yorres', 'Matti Karttunen', 'Dominique Perrier', 'Daniel Tonini']
# create a new column "gender"
# when contact name is found in the list of male names, label the gender 'm', else: 'f'
df4['gender'] = np.where(df4['contact_name'].isin(male_names), 'm', 'f')
# multiplying quantity and unit price to get a column with gross spending
df4['spend_gross'] = np.multiply(df4['quantity'], df4['unit_price'])
# the Discount column holds a fractional rate (e.g. 0.15),
# so net spending is gross spending reduced by that fraction
df4['spend_net'] = df4['spend_gross'] * (1 - df4['discount'])
df4.head()
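Before moving on to the hypotheses, it can help to eyeball the group means with a pivot table. The snippet below is a sketch on a small, made-up sample with the same columns as df4 (the values are hypothetical, not from Northwind):

```python
import pandas as pd

# Hypothetical mini-sample with the same columns as df4
sample = pd.DataFrame({
    'gender':    ['m', 'f', 'm', 'f', 'm', 'f'],
    'cat_name':  ['Seafood', 'Seafood', 'Beverages',
                  'Beverages', 'Seafood', 'Beverages'],
    'spend_net': [100.0, 80.0, 50.0, 60.0, 120.0, 70.0],
})

# Mean net spending per gender x category cell
cell_means = sample.pivot_table(values='spend_net',
                                index='cat_name',
                                columns='gender',
                                aggfunc='mean')
print(cell_means)
```

Running the same pivot on df4 gives a first impression of which gender × category cells drive the differences the ANOVA will test.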
The Hypothesis
Now it’s time to define the hypotheses – the null and the alternative one – and set the alpha level. This time we have to define a set of hypotheses for each factor and for the interaction we expect:
Main effect of factor A (product category):
Ho: The mean sales are equal for all 8 product categories.
H1: The mean sales are different for at least one product category.
Main effect of factor B (gender):
Ho: The mean sales are equal for both women and men.
H1: The mean sales between women and men are different.
Factor A x factor B interactions:
Ho: There is no interaction between product and gender.
H1: There is an interaction between product and gender.
The significance level is set to 5%.
The Analysis
The two-way ANOVA is an extension of the one-way ANOVA that examines the influence of two different categorical independent variables on one continuous dependent variable. The two-way ANOVA not only aims to assess the main effect of each independent variable but also to see if there is any interaction between them.
Because of the F-distribution’s shape, the significance level is always placed in the right tail (no one-tailed or two-tailed specification is necessary).
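To illustrate, the right-tail critical value of the F-distribution for a given significance level can be computed with scipy (a sketch; the degrees of freedom below are placeholders, not taken from this analysis):

```python
from scipy import stats

alpha = 0.05
df_num, df_den = 7, 2062   # example numerator/denominator degrees of freedom

# Critical value: H0 is rejected when the observed F-statistic exceeds this
f_crit = stats.f.ppf(1 - alpha, df_num, df_den)
print(round(f_crit, 3))
```

Because the F-statistic can only be positive, all of alpha sits in that single right tail.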
A two-way ANOVA rests on the same 3 assumptions as the one-way ANOVA:
1. Normality. The caveat to this is that if group sizes are equal, the F-statistic is robust to violations of normality.
2. Homogeneity of variance. Same caveat as above: if group sizes are equal, the F-statistic is robust to this violation.
3. Independent observations. This can be assumed.
All goodness-of-fit tests are left out here but can be looked up in detail in my notebook.
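As a rough sketch of what such checks could look like – here with synthetic, normally distributed data rather than the Northwind tables – scipy offers the Shapiro-Wilk test for normality and Levene’s test for homogeneity of variance:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)

# Synthetic spending values for two example groups
group_a = rng.normal(loc=50, scale=10, size=100)
group_b = rng.normal(loc=55, scale=10, size=100)

# Shapiro-Wilk test (H0: the sample is normally distributed)
w_stat, p_norm = stats.shapiro(group_a)

# Levene's test (H0: the groups have equal variances)
l_stat, p_var = stats.levene(group_a, group_b)

print(f"Shapiro p = {p_norm:.3f}, Levene p = {p_var:.3f}")
```

A p-value above alpha means the respective null hypothesis – normality or equal variances – is not rejected, so the assumption is considered satisfied.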
The syntax for a two-way ANOVA is quite similar to that of a one-way ANOVA:
import statsmodels.api as sm
from statsmodels.formula.api import ols
# Fit the model with the interaction term
# This will also automatically include the main effects for each factor
model = ols('spend_net ~ C(gender)*C(cat_name)', df4).fit()
# Seeing if the OVERALL model is significant
print(f"Overall model F({model.df_model: .0f},{model.df_resid: .0f}) = {model.fvalue: .3f}, p = {model.f_pvalue: .4f}")
Output:
Overall model F( 15, 2062) = 4.103, p = 0.0000
Excellent, the overall model is significant.
Next let’s create the ANOVA table to check each factor and the interaction term:
# Creating the ANOVA table
result = sm.stats.anova_lm(model, typ=2)
result
The interaction term is not significant (p = 0.918), indicating that there is no interaction effect between gender and product category on mean spending. Hence, the interaction term is removed from the model, and the model is rerun so we can look at the main effects of each factor independently.
# Fit the model again
model2 = ols('spend_net ~ C(gender) + C(cat_name)', df4).fit()
print(f"Overall model F({model2.df_model: .0f},{model2.df_resid: .0f}) = {model2.fvalue: .3f}, p = {model2.f_pvalue: .4f}")
Output:
Overall model F( 8, 2069) = 7.382, p = 0.0000
Excellent, the model is still significant. Let’s create the ANOVA table a second time:
# Create the ANOVA table
result2 = sm.stats.anova_lm(model2, typ=2)
result2
Each factor, gender and product category, independently has a significant effect on the mean net spending.
One can use a few different effect sizes for an ANOVA: eta-squared (η²) and omega-squared (ω²). Omega-squared is considered a better measure of effect size than eta-squared because it is unbiased in its calculation.
+++ Something to note: for some reason, R² is called eta-squared within the ANOVA framework, but they are the same thing. R² is a measure of how much variance is explained by the model, and is calculated by dividing the explained variance by the total variance. +++
The following code takes the ANOVA table produced by statsmodels and appends the effect size measures eta-squared (η²) and omega-squared (ω²).
def anova_effect_size(aov):
    """ This function takes the ANOVA table produced by
    statsmodels and appends two effect size measures: eta-squared
    and omega-squared. """
    # mean squares
    aov['mean_sq'] = aov['sum_sq'] / aov['df']
    # eta-squared (η²)
    aov['eta_sq'] = aov[:-1]['sum_sq'] / sum(aov['sum_sq'])
    # omega-squared (ω²); the last row of the table is the residual
    num = aov[:-1]['sum_sq'] - (aov[:-1]['df'] * aov['mean_sq'].iloc[-1])
    denom = sum(aov['sum_sq']) + aov['mean_sq'].iloc[-1]
    aov['omega_sq'] = num / denom
    return aov
anova_effect_size(result2)
Each factor, gender and product category, has a small effect on the mean net spending.
The overall model was significant, so now we want to find the differences between groups within each factor – this is called post-hoc testing. Tukey’s HSD is one method that can be used to test for differences within each factor separately.
Let’s start with the first factor “gender”:
import statsmodels.stats.multicomp
mc = statsmodels.stats.multicomp.MultiComparison(df4['spend_net'], df4['gender'])
mc_results = mc.tukeyhsd()
print('\n', mc_results, '\n')
The output’s explanation:
- The meandiff column is the difference in means of the two groups, calculated as group2 – group1.
- The lower/upper columns are the lower/upper boundaries of the 95% confidence interval.
- The reject column states whether or not the null hypothesis should be rejected.
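For filtering or further processing, the Tukey summary table can be converted into a DataFrame. The sketch below uses three synthetic groups rather than the Northwind data:

```python
import numpy as np
import pandas as pd
from statsmodels.stats.multicomp import MultiComparison

# Synthetic example: spending values for three hypothetical groups
rng = np.random.default_rng(0)
values = np.concatenate([rng.normal(50, 5, 30),
                         rng.normal(55, 5, 30),
                         rng.normal(70, 5, 30)])
groups = np.repeat(['A', 'B', 'C'], 30)

mc = MultiComparison(values, groups)
res = mc.tukeyhsd()

# Convert the summary table to a DataFrame for easier filtering
table = res.summary()
df_tukey = pd.DataFrame(table.data[1:], columns=table.data[0])
significant = df_tukey[df_tukey['reject'] == True]
print(significant)
```

With one row per pairwise comparison, filtering on the reject column immediately lists the group pairs whose means differ significantly.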
There is a statistically significant difference in the mean net spending between women and men: men’s spending rates are significantly higher than women’s.
Now the second factor, “product category”. Let’s use the visual form for this factor as it’s much easier to spot differences. The standard of comparison is color coded BLUE, the groups that differ significantly are RED and all insignificant groups are GRAY.
The code for the first comparison is provided:
mc = statsmodels.stats.multicomp.MultiComparison(df4['spend_net'], df4['cat_name'])
mc_results = mc.tukeyhsd()
mc_results.plot_simultaneous(comparison_name='Meat/Poultry', figsize=(6,4));
In sum: meat and poultry revenues were the highest, whilst those for seafood were the lowest.
Conclusions
A twoway ANOVA with 𝛼 = 0.05 was used to analyze the effect of the factors gender and product category on net spending:

Each factor, gender and product category, has a statistically significant effect on the mean net spending: a) between gender and spending with p = 9.079449e-03, and b) between product category and spending with p = 5.902098e-09.

The interaction term, though, was not significant (p = 0.918), indicating no interaction effect between the factors gender and product category on mean spending.

Therefore I may partially reject the null hypothesis in support of the alternative: the mean sales between women and men are different, as are the mean sales between some of the product categories. However, with regard to the interaction effect, I support the null hypothesis.

Post-hoc testing revealed interesting details about which groups actually differ: 1. Men’s spending rates are significantly higher than women’s. 2. Meat and poultry outperform most of the other product categories.
However, the effect size is small, pointing to low practical significance.
The results may be used to focus more on a male target group or draft marketing strategies to increase women’s spending. Moreover, the shop could diversify its meat and poultry products.
. . . . . . . . . . . . . .
Thank you for reading!
If you want to read more about statistical testing, check out the other posts. The complete Jupyter Notebook can be found here.
I hope you enjoyed reading this article, and am always happy to get critical and friendly feedback, or suggestions for improvement!