Synthetic multivariate data generation procedure with various outlier scenarios using R programming language

A synthetic data generation procedure generates data from either a statistical or a mathematical model. Such procedures have been used in simulation studies to compare the performance of statistical methods or to propose a new statistical method for a specific distribution. A synt...

Bibliographic Details
Main Authors: Sharifah Sakinah, Syed Abd Mutalib; Siti Zanariah, Satari; Wan Nur Syahidah, Wan Yusoff
Format: Article
Language: English
Published: Penerbit Universiti Teknologi Malaysia 2022
Subjects: QA Mathematics
Online Access: http://umpir.ump.edu.my/id/eprint/33887/1/2022%20Mutalib%20et%20al%20jurnal%20teknologi.pdf
http://umpir.ump.edu.my/id/eprint/33887/
https://doi.org/10.11113/jurnalteknologi.v84.17900
id my.ump.umpir.33887
record_format eprints
spelling my.ump.umpir.33887 2022-04-27T08:02:42Z http://umpir.ump.edu.my/id/eprint/33887/ QA Mathematics
Penerbit Universiti Teknologi Malaysia 2022-04-20 Article PeerReviewed pdf en http://umpir.ump.edu.my/id/eprint/33887/1/2022%20Mutalib%20et%20al%20jurnal%20teknologi.pdf Sharifah Sakinah, Syed Abd Mutalib and Siti Zanariah, Satari and Wan Nur Syahidah, Wan Yusoff (2022) Synthetic multivariate data generation procedure with various outlier scenarios using R programming language. Jurnal Teknologi (Sciences & Engineering), 84 (3). pp. 89-101. ISSN 2180–3722 https://doi.org/10.11113/jurnalteknologi.v84.17900
institution Universiti Malaysia Pahang
building UMP Library
collection Institutional Repository
continent Asia
country Malaysia
content_provider Universiti Malaysia Pahang
content_source UMP Institutional Repository
url_provider http://umpir.ump.edu.my/
language English
topic QA Mathematics
description A synthetic data generation procedure generates data from either a statistical or a mathematical model. Such procedures have been used in simulation studies to compare the performance of statistical methods or to propose a new statistical method for a specific distribution. This study formulates a synthetic multivariate data generation procedure with various outlier scenarios using R. An outlier generating model is used to generate multivariate data that contain outliers, and the data generation procedures for the various outlier scenarios in R are explained. Three outlier scenarios are produced, and graphical representations of these scenarios using 3D scatterplots and Chernoff faces are shown. The graphical representations show that in Outlier Scenario 1, as the distance between outliers and inliers increases through shifting the mean, the outliers and inliers become completely separated. The same pattern is seen in Outlier Scenario 2 when the distance between outliers and inliers increases through shifting the covariance. In Outlier Scenario 3, when both the mean shift and the covariance shift increase, the separation of outliers and inliers becomes more apparent. The data generation procedure in this study will be used in further applications, such as identifying outliers with a clustering method.
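The three outlier scenarios described above can be illustrated with a small sketch. It is written in Python with the standard library only and is not the authors' R code. The model is an assumption made for illustration: inliers are drawn from a standard multivariate normal distribution with identity covariance, and outliers from a normal distribution whose mean is shifted by `mu` in every coordinate and whose covariance is scaled by `lam`. The function name `generate_with_outliers` and the parameters `eps`, `mu`, and `lam` are hypothetical and do not come from the record.

```python
import random


def generate_with_outliers(n, p, eps, mu=0.0, lam=1.0, seed=None):
    """Return (data, labels) for n points in p dimensions; label 1 marks an outlier.

    Assumed model (not taken from the paper):
      inliers  ~ N_p(0, I)
      outliers ~ N_p(mu * 1, lam * I)
    with round(eps * n) outliers. Because both covariances are diagonal,
    each coordinate can be drawn independently.
    """
    rng = random.Random(seed)
    n_out = round(eps * n)
    n_in = n - n_out
    sd_out = lam ** 0.5  # standard deviation of each outlier coordinate

    data, labels = [], []
    for _ in range(n_in):
        data.append([rng.gauss(0.0, 1.0) for _ in range(p)])
        labels.append(0)
    for _ in range(n_out):
        data.append([rng.gauss(mu, sd_out) for _ in range(p)])
        labels.append(1)
    return data, labels


# Scenario 1: shift the mean only.
scenario_1 = generate_with_outliers(n=100, p=3, eps=0.1, mu=5.0, lam=1.0, seed=1)
# Scenario 2: inflate the covariance only.
scenario_2 = generate_with_outliers(n=100, p=3, eps=0.1, mu=0.0, lam=9.0, seed=1)
# Scenario 3: shift the mean and inflate the covariance together.
scenario_3 = generate_with_outliers(n=100, p=3, eps=0.1, mu=5.0, lam=9.0, seed=1)
```

Increasing `mu` or `lam` in this sketch moves the outlier group further from the inlier group, which is the qualitative behaviour the description reports for the three scenarios.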
format Article
author Sharifah Sakinah, Syed Abd Mutalib
Siti Zanariah, Satari
Wan Nur Syahidah, Wan Yusoff
title Synthetic multivariate data generation procedure with various outlier scenarios using R programming language
publisher Penerbit Universiti Teknologi Malaysia
publishDate 2022
url http://umpir.ump.edu.my/id/eprint/33887/1/2022%20Mutalib%20et%20al%20jurnal%20teknologi.pdf
http://umpir.ump.edu.my/id/eprint/33887/
https://doi.org/10.11113/jurnalteknologi.v84.17900