Data Analysis Tools Week-4
Hello guys, I am writing this blog as part of the week-4 assignment for the Coursera course Data Analysis Tools, which is part of the Data Analysis and Interpretation specialisation. Each week's assignment is to write one blog post presenting the research work done during that week.
So, in week 4 the assignment is about running a statistical interaction test. Here, we check whether a third variable, the moderator, changes the relationship between our explanatory variable and our response variable. In this post the explanatory variable is the latitude bin of a crater (EQUATOR or POLES), the response variable is its longitude (LONGITUDE_CIRCLE_IMAGE), and the moderator is its ejecta morphology (MORPHOLOGY_EJECTA_1).
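For reference, a moderation effect is often written as a regression with an interaction term. In the sketch below, Y is the response (longitude), X is the explanatory variable coded 0/1 (for example POLES = 0, EQUATOR = 1) and M is a single 0/1-coded moderator level; these symbols are my own labels, and with three morphology types there would be one such interaction term per non-reference level. The post itself takes the stratified route instead: it runs a separate ANOVA of Y on X within each morphology and compares the results.

```latex
\[
Y = \beta_0 + \beta_1 X + \beta_2 M + \beta_3 X M + \varepsilon
\]
\[
\mathbb{E}[Y \mid X = 1, M] - \mathbb{E}[Y \mid X = 0, M] = \beta_1 + \beta_3 M
\]
% The effect of X is beta_1 when M = 0 and beta_1 + beta_3 when M = 1,
% so M moderates the X -> Y relationship exactly when beta_3 != 0.
```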
STEP 1: Syntax used to run the statistical interaction test
import numpy
import pandas
import seaborn
import matplotlib.pyplot as plt
import statsmodels.formula.api as smf

# Assumes `data` already holds the crater dataset used in the course (loading step not shown)

def georegion(x):
    # Latitudes in (-30, 30] are labelled EQUATOR; everything else is POLES
    if x <= -30:
        return 'POLES'
    elif x <= 30:
        return 'EQUATOR'
    else:
        return 'POLES'

data['LATITUDE_BIN'] = data['LATITUDE_CIRCLE_IMAGE'].apply(georegion)
data['LATITUDE_BIN'] = data['LATITUDE_BIN'].astype('category')

print('Let us now look at data with only the top 3 morphology types present')
morphofinterest = ['Rd', 'SLEPS', 'SLERS']
data = data.loc[data['MORPHOLOGY_EJECTA_1'].isin(morphofinterest)]

# Explanatory variable (latitude bin), moderator (morphology) and response (longitude)
latitude = numpy.array(data['LATITUDE_BIN'])
morphology = numpy.array(data['MORPHOLOGY_EJECTA_1'])
longitude = numpy.array(data['LONGITUDE_CIRCLE_IMAGE'])
data2 = pandas.DataFrame({'LATITUDE_BIN': latitude,
                          'MORPHOLOGY_EJECTA_1': morphology,
                          'LONGITUDE_CIRCLE_IMAGE': longitude}).dropna()

# Mean and standard deviation of longitude for each latitude bin x morphology cell
data3 = data2.groupby(['LATITUDE_BIN', 'MORPHOLOGY_EJECTA_1']).mean()
data3.rename(columns={'LONGITUDE_CIRCLE_IMAGE': 'LONGITUDE_CIRCLE_IMAGE_MEAN'}, inplace=True)
data4 = data2.groupby(['LATITUDE_BIN', 'MORPHOLOGY_EJECTA_1']).std()
data4.rename(columns={'LONGITUDE_CIRCLE_IMAGE': 'LONGITUDE_CIRCLE_IMAGE_STDEV'}, inplace=True)
data5 = pandas.concat([data3, data4], axis=1)
print(data5)

# Bar plot of mean longitude by latitude bin, one panel per morphology
# (seaborn.factorplot was renamed catplot in newer seaborn releases)
gplot = seaborn.catplot(x='LATITUDE_BIN', y='LONGITUDE_CIRCLE_IMAGE', data=data2,
                        col='MORPHOLOGY_EJECTA_1', kind='bar')
plt.show()

# One ANOVA of longitude on latitude bin within each level of the moderator
for a0 in morphofinterest:
    tempdata = data2.loc[data2['MORPHOLOGY_EJECTA_1'] == a0]
    tempmodel = smf.ols(formula='LONGITUDE_CIRCLE_IMAGE ~ C(LATITUDE_BIN)', data=tempdata)
    tempresults = tempmodel.fit()
    print('ANOVA STUDY: ' + a0)
    print(tempresults.summary())
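Each `tempresults.summary()` table reports an `F-statistic` and a `Prob (F-statistic)`. With a single categorical predictor such as `C(LATITUDE_BIN)`, that model F-test is the one-way ANOVA F-test. A small hand computation with pandas, on made-up numbers, shows where the F value comes from; the helper name `one_way_anova_f` and the toy values are hypothetical, not part of statsmodels or the course data.

```python
import pandas as pd

def one_way_anova_f(df, group_col, value_col):
    """Return (F, df_between, df_within) for a one-way ANOVA of value_col on group_col."""
    grand_mean = df[value_col].mean()
    groups = df.groupby(group_col)[value_col]
    # Between-group sum of squares: group size times squared distance of group mean from grand mean
    ss_between = (groups.count() * (groups.mean() - grand_mean) ** 2).sum()
    # Within-group sum of squares: squared distance of each value from its own group mean
    ss_within = ((df[value_col] - groups.transform('mean')) ** 2).sum()
    k = groups.ngroups
    n = len(df)
    df_between, df_within = k - 1, n - k
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within

# Hypothetical toy data: three craters per latitude bin
toy = pd.DataFrame({'LATITUDE_BIN': ['EQUATOR'] * 3 + ['POLES'] * 3,
                    'LONGITUDE_CIRCLE_IMAGE': [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]})
f_stat, df_between, df_within = one_way_anova_f(toy, 'LATITUDE_BIN', 'LONGITUDE_CIRCLE_IMAGE')
print(f_stat, df_between, df_within)  # 13.5 1 4
```

The p-value in the summary is the upper tail of the F distribution with `(df_between, df_within)` degrees of freedom; if SciPy is installed, `scipy.stats.f.sf(f_stat, df_between, df_within)` returns it.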
STEP 2: Corresponding output
Let us now look at data with only the top 3 morphology types present
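The printed message says "top 3", but the three labels are typed in by hand in `morphofinterest`. A minimal sketch of deriving them from the counts instead, on a made-up toy table (the name `toy` and its counts are hypothetical); on the real data you would run the same `value_counts()` line on `data['MORPHOLOGY_EJECTA_1']` before filtering and check that it returns Rd, SLEPS and SLERS.

```python
import pandas as pd

# Hypothetical stand-in for the crater table: only the morphology column matters here
toy = pd.DataFrame({'MORPHOLOGY_EJECTA_1':
                    ['Rd'] * 5 + ['SLERS'] * 4 + ['SLEPS'] * 3 + ['DLERS'] * 2 + ['Pd']})

# Rank morphology labels by frequency and keep the three most common ones
top3 = toy['MORPHOLOGY_EJECTA_1'].value_counts().nlargest(3).index.tolist()
subset = toy.loc[toy['MORPHOLOGY_EJECTA_1'].isin(top3)]
print(top3)         # ['Rd', 'SLERS', 'SLEPS']
print(len(subset))  # 12
```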
STEP 3: A few sentences of interpretation
The output shows that the Rd morphology (MORPHOLOGY_EJECTA_1 = Rd) is concentrated around the equator, on the eastern side in longitude; the SLERS morphology is concentrated around the poles, on the western side in longitude; and the SLEPS morphology is spread about equally across the equator and the poles.
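The interpretation compares mean longitude between the two latitude bins within each morphology. A sketch that lays those cell means side by side (they are also the bar heights, since seaborn's bar plot shows the mean by default) and adds an equator-minus-poles difference per morphology. The `data2` frame here is a hypothetical toy with made-up values; differences that change in size or sign across morphologies are only a descriptive hint of moderation, while the per-morphology ANOVAs are what test it.

```python
import pandas as pd

# Hypothetical toy version of data2 (values are made up)
data2 = pd.DataFrame({
    'LATITUDE_BIN': ['EQUATOR', 'EQUATOR', 'POLES', 'POLES'] * 2,
    'MORPHOLOGY_EJECTA_1': ['Rd'] * 4 + ['SLERS'] * 4,
    'LONGITUDE_CIRCLE_IMAGE': [10.0, 30.0, -20.0, 0.0, -40.0, -60.0, -100.0, -80.0],
})

# Mean longitude per morphology (rows) and latitude bin (columns)
means = data2.pivot_table(index='MORPHOLOGY_EJECTA_1', columns='LATITUDE_BIN',
                          values='LONGITUDE_CIRCLE_IMAGE', aggfunc='mean')
# Gap between the equator and poles means within each morphology
means['EQUATOR_MINUS_POLES'] = means['EQUATOR'] - means['POLES']
print(means)
```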