
Mastering Data Science: 100 Quick Python One-liner Codes for Data Cleaning

btd
5 min read · Nov 28, 2023


Photo by Zak Neilson on Unsplash

Data cleaning is a crucial step in the data preprocessing pipeline: it makes sure the dataset is accurate and ready for analysis or modeling. Here’s a list of 100 concise Python code snippets for common data cleaning tasks, each written against a pandas DataFrame named df. The one-liners cover handling missing values, removing duplicates, converting data types, extracting information from datetime columns, and transforming text data. Each line is a quick solution to one cleaning problem, so you can prepare your data for exploration and analysis with less boilerplate.

  1. Remove duplicate rows: df = df.drop_duplicates()
  2. Fill missing numeric values with the column mean: df = df.fillna(df.mean(numeric_only=True))
  3. Remove rows with missing values: df = df.dropna()
  4. Replace missing values with zero: df = df.fillna(0)
  5. Rename columns: df = df.rename(columns={'old_name': 'new_name'})
  6. Strip leading and trailing whitespace from column names: df.columns = df.columns.str.strip()
  7. Convert data type of a column: df['column'] = df['column'].astype('new_type')
  8. Remove special characters from strings: df['column'] = df['column'].str.replace(r'[^a-zA-Z0-9]', '', regex=True)
  9. Convert string to lowercase: df['column'] = df['column'].str.lower()
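
To see how these one-liners behave together, here is a minimal sketch that chains several of them on a tiny DataFrame. The data and the column names (" Name ", "score ") are made up for illustration, and the order of the steps is just one reasonable choice, not a fixed recipe. It assumes a recent pandas version, where str.replace needs regex=True to treat the pattern as a regular expression.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data, invented for illustration only.
df = pd.DataFrame({
    " Name ": ["Alice!", "BOB", "BOB", None],
    "score ": [10.0, np.nan, np.nan, 30.0],
})

df.columns = df.columns.str.strip()          # item 6: trim column names
df = df.rename(columns={"Name": "name"})     # item 5: rename a column
df = df.drop_duplicates()                    # item 1: drop the repeated BOB row
df = df.fillna(df.mean(numeric_only=True))   # item 2: fill numeric gaps with the mean
df = df.dropna()                             # item 3: drop rows that still have gaps

# item 8: keep only letters and digits; regex=True makes the pattern a regex
df["name"] = df["name"].str.replace(r"[^a-zA-Z0-9]", "", regex=True)
df["name"] = df["name"].str.lower()          # item 9: lowercase the text
df["score"] = df["score"].astype("int64")    # item 7: convert the column type

print(df)
```

On this toy data the result has two rows: alice with a score of 10 and bob with a score of 20. Note that most of these methods return a new DataFrame rather than modifying df in place, which is why each result is assigned back.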
