Categorical to Numerical: 12 Data Encoding Techniques for Effective Feature Engineering

btd
9 min read · Nov 9, 2023

Encoding is the process of converting categorical or other non-numeric data into a numerical format that analysis methods and machine learning models can use. Several encoding techniques exist, and the right choice depends on the nature of your data and the requirements of your analysis or model. Here are common methods to encode data:

1. Label Encoding:

  • Label encoding assigns a unique integer to each category within a categorical variable.
  • It is suitable for ordinal data, where the categories have a meaningful order.
  • The primary goal is to convert categorical data into a format that can be provided as input to machine learning algorithms, which often require numerical input.
  • Because integers are ordered, a model may read the encoded values as an ordinal relationship between the categories. This is appropriate when the categories have an inherent order. For example, if the categories are “low,” “medium,” and “high,” the integers should be assigned so that they reflect this order (such as 0, 1, and 2).
  • Be cautious when applying label encoding to nominal (non-ordinal) data, where there is no inherent order among categories. For such data, label encoding can introduce a false sense of ordinality…
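To make the points above concrete, here is a minimal sketch of label encoding in plain Python. The helper `label_encode`, the sample lists, and the category order are all hypothetical names and values chosen for illustration, not part of any library. The order for the ordinal feature is supplied by hand, on the assumption that “low” < “medium” < “high” is the intended ranking.

```python
def label_encode(values, order):
    """Map each category to its integer position in `order` (hypothetical helper)."""
    mapping = {category: index for index, category in enumerate(order)}
    return [mapping[value] for value in values]

# Ordinal feature: the order is meaningful, so we state it explicitly.
ORDER = ["low", "medium", "high"]  # assumed ranking, lowest first
sizes = ["medium", "low", "high", "low"]
encoded = label_encode(sizes, ORDER)
print(encoded)  # [1, 0, 2, 0]

# Nominal feature: there is no real order, so the integers are arbitrary.
# Here they simply follow alphabetical order (blue=0, green=1, red=2).
colors = ["red", "green", "blue", "green"]
color_codes = label_encode(colors, sorted(set(colors)))
print(color_codes)  # [2, 1, 0, 1]
```

In the second case the codes suggest that blue < green < red, which has no meaning for colors. Encoders that assign integers automatically, such as scikit-learn's `LabelEncoder`, number categories by sorted value rather than by meaning, so “high” would come before “low”. For ordinal data it is therefore safer to pass the order explicitly, as done here.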
