Data Analysis Tools Week-3

Aanshi Patwari
3 min read · Nov 5, 2020

Hello guys, I am writing this blog post as part of the week-3 assignment for the Coursera course Data Analysis Tools, which is part of the Data Analysis and Interpretation specialisation. Each week's assignment is to write one blog post presenting the research work done during that week.

In week 3, the assignment is to run a Pearson correlation test. Here, we need to generate a correlation coefficient for two quantitative variables and interpret the strength, direction and statistical significance of their linear relationship.
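For reference, Pearson's r is the covariance of the two variables divided by the product of their standard deviations. Here is a minimal sketch that computes r from this definition and checks it against scipy.stats.pearsonr. The numbers are made-up toy values, not the crater data.

```python
import numpy
import scipy.stats

# Hypothetical toy values, not taken from the Mars crater data set
x = numpy.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = numpy.array([2.1, 3.9, 6.2, 7.8, 10.1])

# Deviations of each value from its mean
dx = x - x.mean()
dy = y - y.mean()

# r = sum(dx * dy) / sqrt(sum(dx^2) * sum(dy^2))
r_manual = (dx * dy).sum() / numpy.sqrt((dx ** 2).sum() * (dy ** 2).sum())

# The same coefficient, plus its two-sided p-value, from SciPy
r_scipy, p_value = scipy.stats.pearsonr(x, y)
print(r_manual, r_scipy, p_value)
```

Both values of r should agree up to floating-point rounding.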

STEP 1 : Syntax used to run Pearson correlation test

import numpy
import pandas
import scipy.stats
import seaborn
import matplotlib.pyplot as plt

# `data` is assumed to be the Mars crater DataFrame, already loaded with pandas

print('First we will take the naive solution and look at the correlation between Mars Crater Latitude and Mars Crater Longitude')
print(scipy.stats.pearsonr(data['LATITUDE_CIRCLE_IMAGE'], data['LONGITUDE_CIRCLE_IMAGE']))

print('We then look at the correlation assuming the inverse relationship')
print(scipy.stats.pearsonr(data['LONGITUDE_CIRCLE_IMAGE'], data['LATITUDE_CIRCLE_IMAGE']))

print('This plots out the naive plot of looking purely at the Crater latitude vs. Crater Longitude without filtering any data.')
# Scatter plot with a fitted regression line, all craters
seaborn.lmplot(x='LATITUDE_CIRCLE_IMAGE', y='LONGITUDE_CIRCLE_IMAGE', data=data, hue=None)
plt.xlabel('CRATER LATITUDE (Degrees)')
plt.ylabel('CRATER LONGITUDE (Degrees)')
plt.title('MARS CRATER DATA: LATITUDE vs. LONGITUDE')

print('We now look at the correlations assuming just the 3 most recurring types of craters given ejecta morphology.')

# Keep only the craters whose ejecta morphology is one of the three of interest
morphofinterest = ['Rd', 'SLEPS', 'SLERS']
data2 = data.loc[data['MORPHOLOGY_EJECTA_1'].isin(morphofinterest)]

# Build a smaller data frame with just the three columns we need
longitude = numpy.array(data2['LONGITUDE_CIRCLE_IMAGE'])
latitude = numpy.array(data2['LATITUDE_CIRCLE_IMAGE'])
morphology = numpy.array(data2['MORPHOLOGY_EJECTA_1'])
data3 = pandas.DataFrame({'LATITUDE': latitude, 'LONGITUDE': longitude, 'MORPHOLOGY_EJECTA_1': morphology})

# Same plot as before, with one regression line per morphology
seaborn.lmplot(x='LATITUDE', y='LONGITUDE', data=data3, hue='MORPHOLOGY_EJECTA_1')
plt.xlabel('CRATER LATITUDE (Degrees)')
plt.ylabel('CRATER LONGITUDE (Degrees)')
plt.title('MARS CRATER DATA: LATITUDE vs. LONGITUDE')

print('We look at the correlation for just the craters with the three morphologies we were interested in.')
print(scipy.stats.pearsonr(data3['LATITUDE'], data3['LONGITUDE']))

# Pearson correlation for the different crater types
summarycorrelations = pandas.DataFrame(columns=('MORPHOLOGY_EJECTA_1', 'R', 'R**2', 'P-VALUE'))

# Each loop iteration creates one row of the newly made data frame
for a0 in morphofinterest:
    # Keep only the craters with this morphology
    tempdata = data3.loc[data3['MORPHOLOGY_EJECTA_1'].isin([a0])]
    # pearsonr returns the coefficient r and its two-sided p-value
    r, pvalue = scipy.stats.pearsonr(tempdata['LATITUDE'], tempdata['LONGITUDE'])
    # Row layout: morphology, r, r squared, p-value
    templist = [a0, r, r**2, pvalue]
    summarycorrelations.loc[morphofinterest.index(a0)] = templist

summarycorrelations
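The per-morphology summary can also be sketched with pandas groupby instead of filtering inside a loop. This is only an alternative sketch, not the code that produced the results in this post. The `toy` data frame is a hypothetical stand-in that reuses the column names of data3, with invented values.

```python
import pandas
import scipy.stats

# Hypothetical stand-in for data3 (same column names, invented values)
toy = pandas.DataFrame({
    'LATITUDE': [10.0, 12.0, 15.0, 20.0, -5.0, -7.0, -2.0, 1.0],
    'LONGITUDE': [30.0, 33.0, 31.0, 40.0, 100.0, 90.0, 95.0, 80.0],
    'MORPHOLOGY_EJECTA_1': ['Rd', 'Rd', 'Rd', 'Rd',
                            'SLEPS', 'SLEPS', 'SLEPS', 'SLEPS'],
})

rows = []
# groupby yields one (key, sub-frame) pair per morphology, keys in sorted order
for morph, group in toy.groupby('MORPHOLOGY_EJECTA_1'):
    r, p = scipy.stats.pearsonr(group['LATITUDE'], group['LONGITUDE'])
    rows.append({'MORPHOLOGY_EJECTA_1': morph, 'R': r, 'R**2': r ** 2, 'P-VALUE': p})

summary = pandas.DataFrame(rows)
print(summary)
```

Note that groupby visits every morphology present in the frame, in sorted order, rather than following the order of a list such as morphofinterest.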

STEP 2 : Corresponding output

First we will take the naive solution and look at the correlation between Mars Crater Latitude and Mars Crater Longitude

(0.06415867182664797, 0.0)

We then look at the correlation assuming the inverse relationship

(0.06415867182664797, 0.0)
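The two outputs above are identical because Pearson's r is symmetric: swapping the two variables does not change the coefficient or its p-value. A small sketch on synthetic data (the arrays `a` and `b` are invented for illustration and have nothing to do with the crater data):

```python
import numpy
import scipy.stats

# Synthetic, reproducible data: b depends weakly on a plus noise
rng = numpy.random.default_rng(0)
a = rng.normal(size=500)
b = 0.3 * a + rng.normal(size=500)

# Correlation in both argument orders
r_ab, p_ab = scipy.stats.pearsonr(a, b)
r_ba, p_ba = scipy.stats.pearsonr(b, a)
print(r_ab, r_ba)
print(p_ab, p_ba)
```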

This plots out the naive plot of looking purely at the Crater latitude vs. Crater Longitude without filtering any data.

[Figure: longitude_latitude, the plot of crater longitude against crater latitude for all craters]

We now look at the correlations assuming just the 3 most recurring types of craters given ejecta morphology.

We look at the correlation for just the craters with the three morphologies we were interested in.

(0.03015638049379891, 1.9534259670067063e-08)

[Figure: longitude_latitude_morphology_ejecta_1, the plot of crater longitude against crater latitude, coloured by MORPHOLOGY_EJECTA_1]

STEP 3 : A few sentences of interpretation

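The Pearson test shows that crater latitude and longitude are only very weakly correlated. Across all craters r ≈ 0.064 (r² ≈ 0.004, so one variable accounts for about 0.4% of the variance in the other), and for the three morphologies of interest r ≈ 0.030 (p ≈ 1.95e-08). Swapping the order of the two variables gives the same result, since Pearson's r is symmetric. The p-values are very small, so the relationship is statistically significant, but it is too weak for latitude to be of practical use in predicting longitude, or the other way round. On the other hand, MORPHOLOGY_EJECTA_1 appears to be related to LONGITUDE_CIRCLE_IMAGE and LATITUDE_CIRCLE_IMAGE, which is useful for describing the spread of Mars craters better.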
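One way to see how a correlation as small as 0.064 can still come with a p-value that prints as 0.0 is to look at how the p-value depends on the number of observations. The sketch below uses the standard t-test for a correlation coefficient, with n - 2 degrees of freedom. The helper name `pearson_p_value` and the sample sizes are hypothetical and chosen only for illustration. They are not the number of craters in the data set.

```python
import math
import scipy.stats

def pearson_p_value(r, n):
    """Two-sided p-value for a Pearson r observed on n pairs (t-test, n - 2 df)."""
    t = r * math.sqrt((n - 2) / (1.0 - r ** 2))
    return 2.0 * scipy.stats.t.sf(abs(t), df=n - 2)

r = 0.064  # rounded from the first correlation reported in STEP 2

# Hypothetical sample sizes: the same r becomes "significant" as n grows
for n in (100, 10_000, 100_000):
    print(n, pearson_p_value(r, n), r ** 2)
```

The value of r² stays the same in every row, which is why the size of the effect and its significance need to be read separately.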
