Data Analysis Tools Week-4

Nov 5, 2020

Hello everyone. I am writing this blog post as part of the week-4 assignment for the Coursera course Data Analysis Tools, which is part of the Data Analysis and Interpretation specialisation. Each week's assignment is to write one blog post presenting the research work done during that week.

In week 4, the assignment is to run a test of statistical interaction. Here, we need to analyze and interpret post hoc paired comparisons in cases where the original statistical test was significant and we were examining more than two groups (i.e. more than two levels of a categorical, explanatory variable).

STEP 1 : Syntax used to run statistical interactions test

import numpy
import pandas
import seaborn
import statsmodels.formula.api as smf

# `data` is assumed to be the pandas DataFrame loaded earlier in the analysis.

def georegion(x):
    # Bin latitude: within 30 degrees of the equator is 'EQUATOR', otherwise 'POLES'
    if x <= -30:
        return 'POLES'
    elif x <= 30:
        return 'EQUATOR'
    else:
        return 'POLES'

data['LATITUDE_BIN'] = data['LATITUDE_CIRCLE_IMAGE'].apply(georegion)
data['LATITUDE_BIN'] = data['LATITUDE_BIN'].astype('category')

print('Let us now look at data with only the top 3 morphology types present')
morphofinterest = ['Rd', 'SLEPS', 'SLERS']
data = data.loc[data['MORPHOLOGY_EJECTA_1'].isin(morphofinterest)]

# Build a clean frame with only the three variables of interest
latitude = numpy.array(data['LATITUDE_BIN'])
morphology = numpy.array(data['MORPHOLOGY_EJECTA_1'])
longitude = numpy.array(data['LONGITUDE_CIRCLE_IMAGE'])
data2 = pandas.DataFrame({'LATITUDE_BIN': latitude, 'MORPHOLOGY_EJECTA_1': morphology, 'LONGITUDE_CIRCLE_IMAGE': longitude}).dropna()

# Mean and standard deviation of longitude for each latitude bin and morphology type
data3 = data2.groupby(['LATITUDE_BIN', 'MORPHOLOGY_EJECTA_1']).mean()
data3.rename(columns={'LONGITUDE_CIRCLE_IMAGE': 'LONGITUDE_CIRCLE_IMAGE_MEAN'}, inplace=True)
data4 = data2.groupby(['LATITUDE_BIN', 'MORPHOLOGY_EJECTA_1']).std()
data4.rename(columns={'LONGITUDE_CIRCLE_IMAGE': 'LONGITUDE_CIRCLE_IMAGE_STDEV'}, inplace=True)
data5 = pandas.concat([data3, data4], axis=1)
data5

# One bar chart per morphology type (factorplot was renamed to catplot in seaborn 0.9)
gplot = seaborn.catplot(x='LATITUDE_BIN', y='LONGITUDE_CIRCLE_IMAGE', data=data2, col='MORPHOLOGY_EJECTA_1', kind='bar')
gplot

# Run the ANOVA separately for each morphology type (the moderator)
for a0 in morphofinterest:
    tempdata = data2.loc[data2['MORPHOLOGY_EJECTA_1'] == a0]
    tempmodel = smf.ols(formula='LONGITUDE_CIRCLE_IMAGE ~ C(LATITUDE_BIN)', data=tempdata)
    tempresults = tempmodel.fit()
    print('ANOVA STUDY :' + a0)
    print(tempresults.summary())
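
To make the idea behind the loop above more concrete, here is a minimal sketch of what an ANOVA per morphology type measures. It computes the one-way ANOVA F statistic by hand for each subgroup. The numbers are made up purely for illustration and are not taken from the dataset, the helper name `one_way_f` is hypothetical, and the sketch stops at the F statistic (it does not compute a p-value the way the statsmodels summary does).

```python
from statistics import mean

def one_way_f(groups):
    """Return the one-way ANOVA F statistic for a dict of {label: values}."""
    all_values = [v for values in groups.values() for v in values]
    grand_mean = mean(all_values)
    k = len(groups)        # number of groups (latitude bins)
    n = len(all_values)    # total number of observations
    # Between-group sum of squares: how far each group mean is from the grand mean
    ss_between = sum(len(vals) * (mean(vals) - grand_mean) ** 2
                     for vals in groups.values())
    # Within-group sum of squares: spread of values around their own group mean
    ss_within = sum((v - mean(vals)) ** 2
                    for vals in groups.values() for v in vals)
    return (ss_between / (k - 1)) / (ss_within / (n - k))

# Made-up longitudes, for illustration only: {morphology: {latitude bin: values}}
toy = {
    "Rd": {"EQUATOR": [10.0, 12.0, 14.0], "POLES": [-10.0, -12.0, -14.0]},
    "SLEPS": {"EQUATOR": [1.0, 2.0, 3.0], "POLES": [1.0, 2.0, 3.0]},
}

# Same test, run once per level of the moderator
f_by_morphology = {m: one_way_f(bins) for m, bins in toy.items()}
for m, f in f_by_morphology.items():
    print(f"{m}: F = {f:.2f}")
```

In this toy data the latitude bins differ strongly for one morphology type (large F) and not at all for the other (F of zero). That pattern, where the relationship depends on the third variable, is what a statistical interaction looks like.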

STEP 2 : Corresponding output

Let us now look at data with only the top 3 morphology types present

STEP 3 : A few sentences of interpretation

The output shows that observations with MORPHOLOGY_EJECTA_1 = Rd are spread more around the equator, on the eastern side of the longitude range, while those with MORPHOLOGY_EJECTA_1 = SLERS are spread more around the poles, on the western side, and those with MORPHOLOGY_EJECTA_1 = SLEPS are spread equally across the equator and the poles.
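
One rough way to put numbers on an interpretation like this is to compare the gap between the equator and pole means for each morphology type. The sketch below does that on hypothetical rows that are not from the dataset. The variable names `rows`, `cells` and `gaps` are assumptions made only for this example.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical (morphology, latitude bin, longitude) rows, for illustration only
rows = [
    ("Rd", "EQUATOR", 40.0), ("Rd", "EQUATOR", 60.0),
    ("Rd", "POLES", 10.0), ("Rd", "POLES", 20.0),
    ("SLERS", "EQUATOR", -20.0), ("SLERS", "EQUATOR", -30.0),
    ("SLERS", "POLES", -70.0), ("SLERS", "POLES", -90.0),
]

# Collect the longitudes that fall in each (morphology, latitude bin) cell
cells = defaultdict(list)
for morph, lat_bin, lon in rows:
    cells[(morph, lat_bin)].append(lon)

# Equator-minus-poles difference in mean longitude, per morphology type
gaps = {}
for morph in sorted({m for m, _ in cells}):
    gaps[morph] = mean(cells[(morph, "EQUATOR")]) - mean(cells[(morph, "POLES")])
    print(f"{morph}: equator mean minus poles mean = {gaps[morph]:.1f}")
```

If the gaps differ noticeably between morphology types, that is consistent with morphology moderating the relationship, although the ANOVA results are still needed to judge significance.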

Written by Aanshi Patwari