Data Analysis Tools Week-3
Hello guys, I am writing this blog as part of the week-3 assignment for the Coursera course Data Analysis Tools, which belongs to the Data Analysis and Interpretation specialisation. Each week's assignment is to write one blog post presenting the research work done during that week.
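So, in week 3 the assignment is about running a Pearson correlation test. The Pearson correlation coefficient r measures the strength and direction of the linear relationship between two quantitative variables. It ranges from -1 to 1, and r² gives the fraction of the variability in one variable that can be explained by the other. Here I test whether the latitude and longitude of Mars craters are correlated, first across all craters and then within the three most common ejecta morphologies.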
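To make the definition concrete, here is a minimal sketch that computes Pearson's r directly from its formula (the sum of the products of the deviations from the mean, divided by the square root of the product of the summed squared deviations) and compares it with `scipy.stats.pearsonr`. The helper name `pearson_r` and the toy numbers are hypothetical and are not part of the crater dataset.

```python
import numpy as np
from scipy import stats


def pearson_r(x, y):
    """Pearson's r computed from its definition (hypothetical helper)."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    dx = x - x.mean()  # deviations of x from its mean
    dy = y - y.mean()  # deviations of y from its mean
    return float(np.sum(dx * dy) / np.sqrt(np.sum(dx**2) * np.sum(dy**2)))


# Hypothetical toy data, not taken from the Mars crater dataset
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.0, 4.1, 5.9, 8.2, 9.9]

r_manual = pearson_r(x, y)
r_scipy, p_value = stats.pearsonr(x, y)  # scipy returns (r, p-value)
print(r_manual, r_scipy, p_value)
```

Both values of r should agree to floating-point precision; scipy additionally returns the two-sided p-value for the null hypothesis that the true correlation is zero.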
STEP 1 : Syntax used to run the Pearson correlation test
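import numpy
import pandas
import scipy.stats
import seaborn
import matplotlib.pyplot as plt
# Assumes `data` is a pandas DataFrame already loaded with the Mars crater dataset
print('First we will take the naive solution and look at the correlation between Mars Crater Latitude and Mars Crater Longitude')
print(scipy.stats.pearsonr(data['LATITUDE_CIRCLE_IMAGE'], data['LONGITUDE_CIRCLE_IMAGE']))
# Pearson r is symmetric, so swapping the two arguments should give the same result
print('We then look at the correlation assuming the inverse relationship')
print(scipy.stats.pearsonr(data['LONGITUDE_CIRCLE_IMAGE'], data['LATITUDE_CIRCLE_IMAGE']))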
print('This plots out the naive plot of looking purely at the Crater latitude vs. Crater Longitude without filtering any data.')
# Scatter plot of all craters with a fitted linear regression line
seaborn.lmplot(x='LATITUDE_CIRCLE_IMAGE', y='LONGITUDE_CIRCLE_IMAGE', data=data)
plt.xlabel('CRATER LATITUDE (Degrees)')
plt.ylabel('CRATER LONGITUDE (Degrees)')
plt.title('MARS CRATER DATA: LATITUDE vs. LONGITUDE')
print('We now look at the correlations assuming just the 3 most recurring types of craters given ejecta morphology.')
# Keep only craters whose first ejecta morphology is one of the three of interest
morphofinterest = ['Rd', 'SLEPS', 'SLERS']
data2 = data.loc[data['MORPHOLOGY_EJECTA_1'].isin(morphofinterest)]
# Copy the three columns into a new, freshly indexed DataFrame with shorter names
longitude = numpy.array(data2['LONGITUDE_CIRCLE_IMAGE'])
latitude = numpy.array(data2['LATITUDE_CIRCLE_IMAGE'])
morphology = numpy.array(data2['MORPHOLOGY_EJECTA_1'])
data3 = pandas.DataFrame({'LATITUDE': latitude, 'LONGITUDE': longitude, 'MORPHOLOGY_EJECTA_1': morphology})
# Scatter plot with one colour and one regression line per morphology
seaborn.lmplot(x='LATITUDE', y='LONGITUDE', data=data3, hue='MORPHOLOGY_EJECTA_1')
plt.xlabel('CRATER LATITUDE (Degrees)')
plt.ylabel('CRATER LONGITUDE (Degrees)')
plt.title('MARS CRATER DATA: LATITUDE vs. LONGITUDE')
print('We look at the correlation for just the craters with the three morphologies we were interested in.')
print(scipy.stats.pearsonr(data3['LATITUDE'], data3['LONGITUDE']))
# Pearson correlation for each crater morphology type
summarycorrelations = pandas.DataFrame(columns=('MORPHOLOGY_EJECTA_1', 'R', 'R**2', 'P-VALUE'))
# Each loop iteration adds one row to the summary data frame
for a0 in morphofinterest:
    templist = []
    templist.append(a0)
    tempdata = data3.loc[data3['MORPHOLOGY_EJECTA_1'].isin([a0])]
    tempcor = scipy.stats.pearsonr(tempdata['LATITUDE'], tempdata['LONGITUDE'])
    tempcorlist = list(tempcor)
    templist.append(tempcorlist[0])      # r
    templist.append(tempcorlist[0]**2)   # r squared
    templist.append(tempcorlist[1])      # p-value
    summarycorrelations.loc[morphofinterest.index(a0)] = templist
print(summarycorrelations)
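The loop above builds the summary table one row at a time with `.loc`. A more compact alternative, sketched here on the assumption that a `groupby`-based pandas idiom is acceptable, computes one row per morphology in a single pass. The helper name `correlation_by_group` and the random demo data are hypothetical stand-ins for `data3`, not results from the crater dataset.

```python
import numpy as np
import pandas as pd
from scipy import stats


def correlation_by_group(df, group_col, x_col, y_col):
    """Return one row per group with Pearson r, r**2 and the p-value (hypothetical helper)."""
    rows = []
    for name, group in df.groupby(group_col):
        r, p = stats.pearsonr(group[x_col], group[y_col])
        rows.append({group_col: name, "R": r, "R**2": r**2, "P-VALUE": p})
    return pd.DataFrame(rows)


# Hypothetical synthetic data standing in for data3
rng = np.random.default_rng(0)
n = 300
demo = pd.DataFrame({
    "LATITUDE": rng.uniform(-90, 90, n),
    "LONGITUDE": rng.uniform(-180, 180, n),
    "MORPHOLOGY_EJECTA_1": rng.choice(["Rd", "SLEPS", "SLERS"], n),
})

summary = correlation_by_group(demo, "MORPHOLOGY_EJECTA_1", "LATITUDE", "LONGITUDE")
print(summary)
```

Because `groupby` visits every category present in the data, there is no need to maintain the list of morphologies and the row index by hand.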
STEP 2 : Corresponding output
First we will take the naive solution and look at the correlation between Mars Crater Latitude and Mars Crater Longitude
(0.06415867182664797, 0.0)
We then look at the correlation assuming the inverse relationship
(0.06415867182664797, 0.0)
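Both calls print the same pair because Pearson's r is symmetric in its two arguments, so the "inverse relationship" is the same test. The p-value of 0.0 does not mean the probability is exactly zero: the true value is simply too small to be represented as a floating-point number. The sketch below, on made-up synthetic data, checks the symmetry and recomputes the p-value from the standard t-statistic for Pearson's r, t = r·sqrt((n − 2)/(1 − r²)) with n − 2 degrees of freedom. The variable names are illustrative only.

```python
import numpy as np
from scipy import stats

# Hypothetical synthetic data: a linear relationship plus noise
rng = np.random.default_rng(42)
n = 50
x = rng.normal(size=n)
y = 0.5 * x + rng.normal(size=n)

r_xy, p_xy = stats.pearsonr(x, y)
r_yx, p_yx = stats.pearsonr(y, x)  # arguments swapped

# Two-sided t-test for H0: true correlation = 0, with n - 2 degrees of freedom
t = r_xy * np.sqrt((n - 2) / (1 - r_xy**2))
p_from_t = 2 * stats.t.sf(abs(t), df=n - 2)

print(r_xy, r_yx)
print(p_xy, p_from_t)
```

With a very large sample, t grows roughly like r·sqrt(n), so even a small r pushes the tail probability below the smallest representable double and it prints as 0.0.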
This plots out the naive plot of looking purely at the Crater latitude vs. Crater Longitude without filtering any data.
We now look at the correlations assuming just the 3 most recurring types of craters given ejecta morphology.
We look at the correlation for just the craters with the three morphologies we were interested in.
(0.03015638049379891, 1.9534259670067063e-08)
STEP 3 : A few sentences of interpretation
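Across all craters, the correlation between crater latitude and crater longitude is very weak: r ≈ 0.064, so r² ≈ 0.004, meaning latitude explains only about 0.4% of the variability in longitude. The p-value printed as 0.0 is too small to represent rather than exactly zero, so the association is statistically significant, but with so little shared variance it has no practical value for predicting where a crater lies. Restricting the data to the three ejecta morphologies (Rd, SLEPS and SLERS) weakens the correlation further (r ≈ 0.030, r² ≈ 0.0009, p ≈ 1.95e-08). Because MORPHOLOGY_EJECTA_1 is a categorical variable, it cannot itself enter a Pearson correlation; it can only be used to split the craters into groups, and the per-morphology table built by the loop shows whether the latitude/longitude relationship differs between those groups.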
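The gap between "statistically significant" and "practically meaningful" can be reproduced with a short simulation. This sketch uses hypothetical synthetic data with a true correlation of roughly 0.05, similar in size to the crater results, and a large sample; the sample size and seed are arbitrary choices for illustration.

```python
import numpy as np
from scipy import stats

# Hypothetical synthetic data: a very weak true correlation (about 0.05)
rng = np.random.default_rng(7)
n = 200_000
x = rng.normal(size=n)
y = 0.05 * x + rng.normal(size=n)

r, p = stats.pearsonr(x, y)
# r**2 is the share of the variance in y explained by x
print(f"r = {r:.4f}, r^2 = {r**2:.5f}, p = {p:.3g}")
```

The p-value comes out extremely small even though r² stays well under 1%, which is the same pattern as in the crater data: the tiny p-values reflect the large number of craters, while r² shows how little the two coordinates actually share.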