Data cleaning is a crucial step in the data preprocessing pipeline: it makes sure the dataset is accurate and ready for analysis or modeling. Here's a list of 100 concise Python snippets for common data cleaning tasks, written with pandas and assuming your data lives in a DataFrame named df. The one-liners cover handling missing values, removing duplicates, converting data types, extracting information from datetime columns, and transforming text data. Each one is a quick fix for a common data cleaning problem, so you can get your data ready for exploration and analysis faster.
- Remove duplicate rows:
df = df.drop_duplicates()
- Fill missing values with the column mean (numeric columns only):
df = df.fillna(df.mean(numeric_only=True))
- Remove rows with missing values:
df = df.dropna()
- Replace missing values with zero:
df = df.fillna(0)
- Rename columns:
df = df.rename(columns={'old_name': 'new_name'})
- Strip leading and trailing whitespace from column names:
df.columns = df.columns.str.strip()
- Convert data type of a column:
df['column'] = df['column'].astype('new_type')
- Remove special characters from strings:
df['column'] = df['column'].str.replace('[^a-zA-Z0-9]', '', regex=True)
- Convert strings to lowercase:
df['column'] = df['column'].str.lower()
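To see how these one-liners fit together, here is a minimal sketch that chains several of them on a tiny made-up DataFrame. The data, the column names, and the clean helper are hypothetical and only there for illustration. Most pandas methods return a new DataFrame, so each step assigns its result back.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data: padded column names, one duplicate row,
# a missing age, and special characters in the names.
raw = pd.DataFrame({
    " name ": ["Alice!", "Alice!", "Bob#", "Cara Lee"],
    "age": [30.0, 30.0, np.nan, 41.0],
    " score": ["10", "10", "20", "30"],
})


def clean(df: pd.DataFrame) -> pd.DataFrame:
    """Apply a few of the one-liners above, in order (illustrative helper)."""
    df = df.copy()                                 # leave the input untouched
    df.columns = df.columns.str.strip()            # strip whitespace from column names
    df = df.drop_duplicates()                      # remove duplicate rows
    df = df.fillna(df.mean(numeric_only=True))     # fill numeric gaps with the column mean
    df["score"] = df["score"].astype("int64")      # convert a column's data type
    df["name"] = (
        df["name"]
        .str.replace("[^a-zA-Z0-9]", "", regex=True)  # drop special characters (and spaces)
        .str.lower()                                  # lowercase the text
    )
    df = df.rename(columns={"name": "customer"})   # rename a column
    return df.reset_index(drop=True)


cleaned = clean(raw)
print(cleaned)
```

Order matters here: the duplicate row is dropped before the mean is computed, so it does not count twice toward the average that fills the missing age.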