21 Strategies for Data Cleaning and Preprocessing in R

btd
3 min read · Nov 17, 2023

Data cleaning and preprocessing in R involve tasks such as handling missing values, removing duplicates, transforming variables, and scaling data. Here’s a list of common R snippets for these tasks:

1. Handling Missing Values:

  • Removing rows with missing values:
data <- na.omit(data)
  • Imputing missing values with mean:
data$variable <- ifelse(is.na(data$variable), mean(data$variable, na.rm = TRUE), data$variable)
  • Imputing missing values with median:
data$variable <- ifelse(is.na(data$variable), median(data$variable, na.rm = TRUE), data$variable)
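
To see how the three options above differ, here is a minimal sketch on a toy data frame. The column names (id, score) and the values are made up for illustration; only base R functions are used. It also counts missing values first, which is often worth doing before choosing between dropping and imputing.

```r
# Toy data frame; the column names and values are hypothetical
data <- data.frame(
  id    = 1:5,
  score = c(10, NA, 30, NA, 80)
)

# Count missing values per column before deciding how to handle them
na_counts <- colSums(is.na(data))

# Option 1: drop every row that contains at least one NA
complete_rows <- na.omit(data)

# Option 2: replace NAs with the mean of the observed values
mean_imputed <- data
mean_imputed$score[is.na(mean_imputed$score)] <- mean(data$score, na.rm = TRUE)

# Option 3: replace NAs with the median of the observed values
median_imputed <- data
median_imputed$score[is.na(median_imputed$score)] <- median(data$score, na.rm = TRUE)

print(na_counts)
print(complete_rows)
print(mean_imputed)
print(median_imputed)
```

Dropping rows shrinks the data set, while imputation keeps every row but pulls the filled values toward the center of the distribution. The median is usually the safer choice when the column has outliers.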

2. Removing Duplicates:

  • Removing duplicate rows from a data frame:
data <- unique(data)
  • Removing duplicate rows based on specific columns:
data <- data[!duplicated(data[, c("col1", "col2")]), ]
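  • Removing columns with duplicate names (this compares column names, not column contents; drop = FALSE keeps the result a data frame even if only one column remains):
data <- data[, !duplicated(colnames(data)), drop = FALSE]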
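
The duplicate-handling snippets above can be tried on a small example. This is a sketch under stated assumptions: the data frame, its column names (col1, col2, col3, col2_copy), and the values are hypothetical. The last step uses duplicated() on as.list(data), which is one possible way to drop columns whose contents are identical; it is not part of the original list.

```r
# Toy data frame; names and values are hypothetical
data <- data.frame(
  col1 = c("a", "a", "b", "b"),
  col2 = c(1, 1, 2, 2),
  col3 = c(10, 10, 20, 25)
)

# How many rows are exact copies of an earlier row?
full_dupes <- sum(duplicated(data))

# Drop rows that are identical across all columns
no_full_dupes <- unique(data)

# Drop rows that repeat an earlier (col1, col2) combination,
# keeping the first occurrence of each combination
no_key_dupes <- data[!duplicated(data[, c("col1", "col2")]), ]

# Assumption: to drop columns with identical contents (not just names),
# compare the columns themselves by treating the data frame as a list
data$col2_copy <- data$col2
no_dup_cols <- data[, !duplicated(as.list(data)), drop = FALSE]

print(full_dupes)
print(no_full_dupes)
print(no_key_dupes)
print(names(no_dup_cols))
```

Note that duplicated() always keeps the first occurrence, so the row order matters when deduplicating on a subset of columns: sort the data first if a specific row should survive.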

3. Transforming Variables:

  • Log transformation:
data$log_variable <…
