Data cleaning and preprocessing in R involve tasks such as handling missing values, removing duplicates, transforming variables, and scaling data. Here are common R snippets for each task:
1. Handling Missing Values:
- Removing rows with missing values:
data <- na.omit(data)
- Imputing missing values with mean:
data$variable <- ifelse(is.na(data$variable), mean(data$variable, na.rm = TRUE), data$variable)
- Imputing missing values with median:
data$variable <- ifelse(is.na(data$variable), median(data$variable, na.rm = TRUE), data$variable)
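To see how these options differ, here is a minimal sketch on a toy data frame. The column names (id, score) and the values are made up for illustration; only base R functions are used.

```r
# Toy data frame; the column names and values are hypothetical
data <- data.frame(
  id    = 1:5,
  score = c(10, NA, 30, NA, 50)
)

# Count missing values per column before choosing a strategy
na_counts <- colSums(is.na(data))

# Option 1: drop every row that contains at least one NA
complete_rows <- na.omit(data)

# Option 2: replace NAs with the mean of the observed values
mean_imputed <- data
mean_imputed$score[is.na(mean_imputed$score)] <- mean(data$score, na.rm = TRUE)

# Option 3: replace NAs with the median of the observed values
median_imputed <- data
median_imputed$score[is.na(median_imputed$score)] <- median(data$score, na.rm = TRUE)
```

Dropping rows shrinks the data set, while imputation keeps every row but pulls the filled-in values toward the center of the distribution. The median is usually the safer choice when the variable is skewed or has outliers.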
2. Removing Duplicates:
- Removing duplicate rows from a data frame:
data <- unique(data)
- Removing duplicate rows based on specific columns:
data <- data[!duplicated(data[, c("col1", "col2")]), ]
- Removing columns with duplicated names (this compares column names, not column contents):
data <- data[, !duplicated(colnames(data)), drop = FALSE]
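The following sketch shows what each approach keeps, using toy data. The values and the second data frame (wide) are hypothetical. Note that data.frame() makes column names unique by default, so check.names = FALSE is set here only to produce a duplicated name for the demonstration.

```r
# Toy data frame; row 5 is an exact copy of row 4,
# and row 2 matches row 1 on col1 and col2 only
data <- data.frame(
  col1 = c("a", "a", "b", "b", "b"),
  col2 = c(1, 1, 2, 3, 3),
  col3 = c("x", "y", "z", "z", "z")
)

# Drop rows that are identical across all columns
distinct_rows <- unique(data)

# Drop rows that repeat an earlier (col1, col2) combination;
# the first occurrence of each combination is kept
distinct_keys <- data[!duplicated(data[, c("col1", "col2")]), ]

# A data frame with a repeated column name (hypothetical example)
wide <- data.frame(a = 1:2, b = 3:4, a = 5:6, check.names = FALSE)

# Keep only the first column for each name;
# drop = FALSE keeps the result a data frame even if one column remains
deduped_cols <- wide[, !duplicated(colnames(wide)), drop = FALSE]
```

Because duplicated() flags only the second and later occurrences, the order of the rows decides which copy survives. Sort the data first if a particular copy (for example, the most recent record) should be kept.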
3. Transforming Variables:
- Log transformation:
data$log_variable <…