Robust correlation coefficient based on robust scale and location estimator
The correlation coefficient is a common statistical tool for measuring the relationship between two variables. The most frequently used correlation coefficient is the Pearson correlation coefficient. This coefficient is powerful when the assumptions of linearity between two...
Main Author: | Nur Amira, Zakaria |
---|---|
Format: | Thesis |
Language: | English |
Published: | 2018 |
Subjects: | QA273-280 Probabilities. Mathematical statistics |
Online Access: | https://etd.uum.edu.my/9137/1/s818475_01.pdf https://etd.uum.edu.my/9137/2/s818475_02.pdf https://etd.uum.edu.my/9137/3/s818475_references.docx https://etd.uum.edu.my/9137/ |
id |
my.uum.etd.9137 |
---|---|
record_format |
eprints |
spelling |
my.uum.etd.9137 2022-03-28T00:41:33Z https://etd.uum.edu.my/9137/ Nur Amira, Zakaria (2018) Robust correlation coefficient based on robust scale and location estimator. Masters thesis, Universiti Utara Malaysia. Thesis, NonPeerReviewed. |
institution |
Universiti Utara Malaysia |
building |
UUM Library |
collection |
Institutional Repository |
continent |
Asia |
country |
Malaysia |
content_provider |
Universiti Utara Malaysia |
content_source |
UUM Electronic Theses |
url_provider |
http://etd.uum.edu.my/ |
language |
English |
topic |
QA273-280 Probabilities. Mathematical statistics |
spellingShingle |
QA273-280 Probabilities. Mathematical statistics Nur Amira, Zakaria Robust correlation coefficient based on robust scale and location estimator |
description |
The correlation coefficient is a common statistical tool for measuring the relationship between two variables. The most frequently used correlation coefficient is the Pearson correlation coefficient. This coefficient is powerful when the assumptions of linearity between the two variables and normality of the distribution are fulfilled. However, it does not perform well in the presence of outliers, because its calculation uses the mean, which is known to be very sensitive to outliers. The Spearman rank correlation coefficient and Kendall’s tau correlation coefficient are alternatives that address this problem, but because they use ranks instead of the original observations, useful information is lost. For that reason, this study focuses on a robust correlation approach based on the median. The existing median-based correlation coefficient uses the median absolute deviation (MAD) as its scale estimator. Nevertheless, the MAD has low efficiency under the Gaussian distribution and is suited only to measuring dispersion in symmetric distributions. Thus, this study modified the median-based correlation using two approaches. First, keeping the same median-based correlation, the study proposed alternative robust scale estimators, namely MADn, Sn, and Qn. Second, the study replaced the median-based correlation with a Hodges-Lehmann-based correlation and employed all of the robust scale estimators, namely median, MAD, MADn, Sn, and Qn. The performance of the proposed procedures was evaluated under two conditions of simulated data: perfect and contaminated data. Three indicators were used to evaluate performance: the correlation coefficient value, the average bias, and the standard error. The proposed procedures were also validated using a real dataset. The simulation results show that the Qn correlation coefficient and the Hodges-Lehmann-Qn correlation coefficient performed better under contaminated data than the Pearson correlation coefficient and other existing robust correlation coefficients. In conclusion, the Qn correlation coefficient and the Hodges-Lehmann-Qn correlation coefficient are good alternatives to the Pearson correlation coefficient when outliers are present in the data. |
format |
Thesis |
author |
Nur Amira, Zakaria |
author_facet |
Nur Amira, Zakaria |
author_sort |
Nur Amira, Zakaria |
title |
Robust correlation coefficient based on robust scale and location estimator |
title_short |
Robust correlation coefficient based on robust scale and location estimator |
title_full |
Robust correlation coefficient based on robust scale and location estimator |
title_fullStr |
Robust correlation coefficient based on robust scale and location estimator |
title_full_unstemmed |
Robust correlation coefficient based on robust scale and location estimator |
title_sort |
robust correlation coefficient based on robust scale and location estimator |
publishDate |
2018 |
url |
https://etd.uum.edu.my/9137/1/s818475_01.pdf https://etd.uum.edu.my/9137/2/s818475_02.pdf https://etd.uum.edu.my/9137/3/s818475_references.docx https://etd.uum.edu.my/9137/ |
_version_ |
1729706558272569344 |
score |
13.251813 |
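
This record does not reproduce the thesis's formulas, so the following is only a minimal Python sketch of the kind of estimators the description names, under stated assumptions. The function names `madn`, `qn`, `hodges_lehmann`, and `robust_corr` are hypothetical. The correlation itself uses a standard construction from the robust statistics literature (the ratio of squared robust scales of the standardized sum and difference), swapped in for illustration; it is not necessarily the formula the thesis uses. The constants 1.4826 and 2.2219 are the commonly quoted Gaussian consistency factors, and finite-sample corrections are omitted.

```python
import numpy as np


def madn(x):
    """Normalized MAD: 1.4826 * median(|x - median(x)|)."""
    x = np.asarray(x, dtype=float)
    return 1.4826 * np.median(np.abs(x - np.median(x)))


def qn(x):
    """Naive O(n^2) Qn: scaled k-th smallest pairwise distance.

    k = C(h, 2) with h = floor(n / 2) + 1; no finite-sample correction.
    """
    x = np.asarray(x, dtype=float)
    n = len(x)
    h = n // 2 + 1
    k = h * (h - 1) // 2
    i, j = np.triu_indices(n, k=1)          # all pairs with i < j
    d = np.sort(np.abs(x[i] - x[j]))
    return 2.2219 * d[k - 1]


def hodges_lehmann(x):
    """Hodges-Lehmann location: median of Walsh averages (x_i + x_j) / 2, i <= j."""
    x = np.asarray(x, dtype=float)
    i, j = np.triu_indices(len(x), k=0)     # all pairs with i <= j
    return np.median((x[i] + x[j]) / 2.0)


def robust_corr(x, y, location=np.median, scale=madn):
    """Illustrative robust correlation (an assumption, not the thesis's formula).

    Standardize x and y with a robust location and scale, then compare the
    robust scale of their sum with that of their difference.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    zx = (x - location(x)) / scale(x)
    zy = (y - location(y)) / scale(y)
    su2 = scale(zx + zy) ** 2
    sv2 = scale(zx - zy) ** 2
    return (su2 - sv2) / (su2 + sv2)


# Simulated example: linearly related data with a few gross outliers in y.
rng = np.random.default_rng(0)
x = rng.normal(size=200)
y = 0.8 * x + 0.6 * rng.normal(size=200)
y_contaminated = y.copy()
y_contaminated[:10] = 50.0                   # inject 5% contamination

print("Pearson (contaminated):", np.corrcoef(x, y_contaminated)[0, 1])
print("Median/Qn (contaminated):", robust_corr(x, y_contaminated, scale=qn))
print("Hodges-Lehmann/Qn (contaminated):",
      robust_corr(x, y_contaminated, location=hodges_lehmann, scale=qn))
```

Passing the location and scale estimators as arguments keeps the sketch close to the study's design, in which the same correlation form is paired with different location (median, Hodges-Lehmann) and scale (MAD, MADn, Sn, Qn) estimators. An Sn implementation is left out here for brevity.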