Data Analysis Tools Week-2

Aanshi Patwari
2 min read · Nov 5, 2020

Hello guys, I am writing this blog as part of the week-2 assignment for the Coursera course Data Analysis Tools, which is part of the Data Analysis and Interpretation specialisation. Each week's assignment is to write one blog post presenting the research work done during that week.

In week 2, the assignment is to run a chi-square test of independence. When the original test is significant and the categorical explanatory variable has more than two levels (that is, more than two groups are being compared), we also need to run and interpret post hoc paired comparisons.
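To make the post hoc step concrete: with k levels there are k(k-1)/2 possible paired comparisons, and one common way to guard against false positives is the Bonferroni correction, which divides the significance level by the number of comparisons. The sketch below assumes a 0.05 significance level and uses made-up level names; `bonferroni_alpha` is a hypothetical helper, not part of any library.

```python
from itertools import combinations


def bonferroni_alpha(levels, alpha=0.05):
    """Return all pairs of levels and the Bonferroni-adjusted threshold."""
    pairs = list(combinations(levels, 2))
    if not pairs:
        raise ValueError("need at least two levels to compare")
    return pairs, alpha / len(pairs)


# Hypothetical level names, for illustration only
pairs, adjusted_alpha = bonferroni_alpha(["Q1", "Q2", "Q3", "Q4"])
print(len(pairs), "comparisons, adjusted alpha =", round(adjusted_alpha, 4))
```

A paired comparison would then count as significant only if its p value falls below the adjusted threshold.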

As I had only one categorical variable, I had to bin the other (quantitative) variables into categories in order to conduct the chi-square test of independence.
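As a small illustration of this kind of binning, here is a sketch that turns a quantitative latitude into a two-level categorical variable using the same 30 degree thresholds as the code in this post. The latitude values are made up, and `numpy.where` is a vectorized alternative to the `apply` approach used in this post, not the method the post itself uses.

```python
import numpy as np
import pandas as pd

# Toy latitudes in degrees (made-up values, not the crater data)
lat = pd.Series([-75.0, -30.0, -10.0, 0.0, 30.0, 45.0, 80.0])

# EQUATOR if -30 < latitude <= 30, otherwise POLE
is_equatorial = (lat > -30) & (lat <= 30)
lat_bin = pd.Series(np.where(is_equatorial, "EQUATOR", "POLE"), index=lat.index)

print(lat_bin.value_counts())
```

Once both variables are categorical like this, they can be cross-tabulated and passed to a chi-square test.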

STEP 1 : Syntax used to run Chi-square test

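import itertools

import matplotlib.pyplot as plt
import numpy
import pandas
import scipy.stats
import seaborn

# `data` is assumed to be the crater DataFrame, already loaded (for example with pandas.read_csv)

# Treat blank ejecta morphology entries as missing values
data['MORPHOLOGY_EJECTA_1'] = data['MORPHOLOGY_EJECTA_1'].replace(' ', numpy.nan)

def georegion(x):
    # Latitudes in (-30, 30] are equatorial, everything else is polar
    if x <= -30:
        return 'POLE'
    elif x <= 30:
        return 'EQUATOR'
    else:
        return 'POLE'

def georegion1(x):
    # Split longitude at 0 degrees
    if x <= 0:
        return 'LEFT'
    else:
        return 'RIGHT'

data['LATITUDE_BIN'] = data['LATITUDE_CIRCLE_IMAGE'].apply(georegion)
data['LONGITUDE_BIN'] = data['LONGITUDE_CIRCLE_IMAGE'].apply(georegion1)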

#data['LONGITUDE_CIRCLE_BIN'] = pandas.qcut(data['LONGITUDE_CIRCLE_IMAGE'],4)
#data['LONGITUDE_CIRCLE_BIN'] = data['LONGITUDE_CIRCLE_BIN'].astype('category')

# Recode POLE and EQUATOR to 0 and 1
recodedict = {'POLE': 0, 'EQUATOR': 1}
data['LATITUDE_BIN_RECODE'] = data['LATITUDE_BIN'].map(recodedict)

# Recode LEFT and RIGHT to 0 and 1
recodedict1 = {'LEFT': 0, 'RIGHT': 1}
data['LONGITUDE_BIN_RECODE'] = data['LONGITUDE_BIN'].map(recodedict1)

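# Contingency table of observed counts: latitude bin (rows) by longitude bin (columns)
ct1 = pandas.crosstab(data['LATITUDE_BIN'], data['LONGITUDE_BIN'])
print(ct1)

# Column percentages
colsum = ct1.sum(axis=0)
colpct = ct1 / colsum
print(colpct)

# Chi-square test of independence
print('chi-square value, p value, degrees of freedom, expected counts')
cs1 = scipy.stats.chi2_contingency(ct1)
print(cs1)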

# Post hoc analysis: rerun the chi-square test on every pair of columns
p1 = itertools.combinations(range(len(colpct.columns)), 2)
list1 = []
list2 = []
for a in p1:
    # a is a tuple such as (0, 1); .iloc needs a list to select columns by
    # position, and passing the tuple directly raises an indexing error
    cols = list(a)
    colpct2 = colpct.iloc[:, cols]
    ct2 = ct1.iloc[:, cols]
    cs2 = scipy.stats.chi2_contingency(ct2)
    print('')
    print(colpct2)
    print('')
    print(ct2)
    print('')
    print(cs2)
    templist = list(cs2)
    list1.append(colpct2.columns[0] + ',' + colpct2.columns[1])
    list2.append(templist[1])  # p value for this pair

newdataframe = pandas.DataFrame({'LONGITUDE_CIRCLE_BIN COMPARISON': list1, 'P VALUES': list2})
print(newdataframe)

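# Bar chart of the mean of the 0/1 latitude recode (the proportion of equatorial craters) per longitude bin.
# factorplot was renamed catplot; errorbar=None needs seaborn >= 0.12 (use ci=None on older versions).
seaborn.catplot(x='LONGITUDE_BIN_RECODE', y='LATITUDE_BIN_RECODE', kind='bar', data=data, errorbar=None)
plt.xlabel('LONGITUDE_CIRCLE_BIN')
plt.ylabel('MEAN OF EQUATORIAL/POLAR CRATERS')
plt.xticks(rotation='vertical')
plt.show()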

STEP 2 : Corresponding output

I got an indexing error when running this code, so I was not able to get the results or generate the plot.
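One plausible source of an indexing error in a pairwise loop like this is handing the tuple produced by `itertools.combinations` straight to `.iloc`, which expects a list (or slice) for positional column selection. The sketch below runs the pairwise comparisons on a toy contingency table; the counts and the column names A, B and C are made up for illustration and are not results from the crater data.

```python
import itertools

import pandas as pd
from scipy.stats import chi2_contingency

# Toy counts (made up): rows are response levels, columns are explanatory levels
ct = pd.DataFrame(
    {"A": [30, 10], "B": [20, 20], "C": [10, 30]},
    index=["EQUATOR", "POLE"],
)

results = {}
for pair in itertools.combinations(range(ct.shape[1]), 2):
    # pair is a tuple such as (0, 1); convert it to a list before using .iloc
    sub = ct.iloc[:, list(pair)]
    chi2, p, dof, expected = chi2_contingency(sub)
    results[",".join(sub.columns)] = p

summary = pd.Series(results, name="P VALUES")
print(summary)
```

Each entry in `summary` is the p value for one pair of columns, which is the quantity the post hoc table in STEP 1 is meant to collect.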

STEP 3 : A few sentences of interpretation

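Without output I cannot report a chi-square value or a p value. From theory, I expect the variables to follow the alternate hypothesis, that is, to be dependent, but this still has to be confirmed by the test. When the overall test is significant and the explanatory variable has more than two levels, post hoc analysis is conducted to find out which levels differ from each other.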
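For reference, this is roughly how the decision between the two hypotheses would be read off the test output. It is a minimal sketch on a made-up 2x2 table, assuming a 0.05 significance level; the counts are not results from the crater data.

```python
import pandas as pd
from scipy.stats import chi2_contingency

# Toy 2x2 table of observed counts (made up for illustration)
ct = pd.DataFrame(
    {"LEFT": [40, 60], "RIGHT": [60, 40]},
    index=["EQUATOR", "POLE"],
)

# For a 2x2 table SciPy applies Yates' continuity correction by default
chi2, p, dof, expected = chi2_contingency(ct)

alpha = 0.05  # assumed significance level
if p < alpha:
    decision = "reject the null hypothesis: the variables appear dependent"
else:
    decision = "fail to reject the null hypothesis of independence"

print(f"chi-square = {chi2:.2f}, p = {p:.4f}, dof = {dof}")
print(decision)
```

With only two levels on each variable, as in this toy table, the overall test is already a single paired comparison, so no further post hoc step is needed.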
