Raw data is often difficult to interpret, so it needs to be preprocessed before any information can be extracted from it. Data transformation is an essential preprocessing technique that converts raw data into a format suited to data mining, so that the patterns found in the data are easier to understand.
Data transformation is the process of converting the format, structure, or values of data from one form to another.
The following preprocessing options can be used to transform data in data mining:
- range: Rescale values so that they fall between 0 and 1
- center: Subtract the mean
- scale: Divide by the standard deviation
- BoxCox: Reduce skewness so the data is closer to a normal distribution. Values must be > 0
- YeoJohnson: Like BoxCox, but works for negative values.
- expoTrans: Exponential transformation, works for negative values.
- pca: Replace with principal components
- ica: Replace with independent components
- spatialSign: Project the data onto a unit sphere (a unit circle in two dimensions)
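
To make the simpler options in the list concrete, here is a minimal sketch in plain Python. It is only an illustration under stated assumptions: the function names (`range_scale`, `center_scale`, `box_cox`, `yeo_johnson`, `spatial_sign`) are hypothetical and do not come from any particular library, the power parameter `lam` is passed in by hand although in practice it is usually estimated from the data, and pca and ica are left out because they need a linear algebra library.

```python
import math
import statistics


def range_scale(values):
    """range: rescale values linearly to the interval [0, 1]."""
    lo, hi = min(values), max(values)
    if hi == lo:
        raise ValueError("range scaling is undefined for constant data")
    return [(v - lo) / (hi - lo) for v in values]


def center_scale(values):
    """center + scale: subtract the mean, divide by the sample standard deviation."""
    mean = statistics.mean(values)
    sd = statistics.stdev(values)  # sample SD (n - 1 in the denominator)
    return [(v - mean) / sd for v in values]


def box_cox(values, lam):
    """BoxCox: power transform, defined only for strictly positive values."""
    if any(v <= 0 for v in values):
        raise ValueError("Box-Cox requires values > 0")
    if lam == 0:
        return [math.log(v) for v in values]
    return [(v ** lam - 1) / lam for v in values]


def yeo_johnson(values, lam):
    """YeoJohnson: power transform that also accepts zero and negative values."""
    out = []
    for v in values:
        if v >= 0:
            out.append(math.log1p(v) if lam == 0
                       else ((v + 1) ** lam - 1) / lam)
        else:
            out.append(-math.log1p(-v) if lam == 2
                       else -((1 - v) ** (2 - lam) - 1) / (2 - lam))
    return out


def spatial_sign(row):
    """spatialSign: divide one observation (a row) by its Euclidean norm."""
    norm = math.sqrt(sum(x * x for x in row))
    # An all-zero row has no direction, so it is returned unchanged.
    return [x / norm for x in row] if norm > 0 else list(row)


if __name__ == "__main__":
    data = [2.0, 4.0, 6.0, 8.0]          # made-up sample values
    print(range_scale(data))
    print(center_scale(data))
    print(box_cox([1.0, 4.0, 9.0], 0.5))
    print(yeo_johnson([-1.0, 0.0, 1.0], 0.5))
    print(spatial_sign([3.0, 4.0]))
```

Note that range, center, scale, and the two power transforms work on one column (variable) at a time, while the spatial sign works on one row (observation) at a time, so it is normally applied after the columns have been centered and scaled.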
This tutorial gave a brief overview of the preprocessing options for transforming data in data mining. I hope you enjoyed it. For updates, like the Facebook page https://www.facebook.com/LearningBigDataAnalytics and stay connected.