【Data Science Project】 Explainable AI: Scene Classification with ResNet-18 and Grad-CAM Visualization

Introduction
Scene Classification is a distinct task in Computer Vision. Unlike Object Classification, which focuses on classifying prominent objects in the foreground, Scene Classification uses the layout of objects within the scene, together with the ambient context, for classification (King et al., 2017). In practice, this kind of model could be used to detect the type of scenery in satellite images, which in turn can support solutions to challenges relating to climate, agriculture, water, and biodiversity.
Explainable AI (XAI) is an emerging field in machine learning that aims to open up the black-box decisions of AI systems, i.e. to explain to humans how an AI system reached a decision. XAI is used to describe an AI model, its expected impact, and its potential biases (IBM, n.d.). Explainability can help developers ensure that a system is working as expected; it may be necessary to meet regulatory standards; and it can be important in allowing those affected by a decision to challenge or change that outcome (IBM, n.d.).
There are many approaches to explaining CNN outputs, such as Activations Visualization, Vanilla Gradients, Occlusion Sensitivity, CNN Fixations, Class Activation Mapping (CAM), and Gradient-Weighted Class Activation Mapping (Grad-CAM). In this project, we use Grad-CAM.
Problem Statement
In this project, we will build and train a deep Convolutional Neural Network (CNN) with residual blocks to detect the type of scenery in an image. In addition, we will use a technique known as Gradient-Weighted Class Activation Mapping (Grad-CAM) to visualize the regions of the input that drive the prediction, which helps us explain how the CNN makes its decisions. A minimal sketch of this idea is shown below.
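Conceptually, Grad-CAM weights the feature maps of the last convolutional layer by the gradient of the target class score, producing a coarse heatmap of the image regions that contributed most to the prediction. Below is a minimal sketch in PyTorch, assuming a torchvision ResNet-18 whose last convolutional stage is `layer4`; it is an illustration of the technique, not the project's exact code.

```python
import torch
import torch.nn.functional as F
from torchvision import models

def grad_cam(model, image, target_class=None):
    """Return a Grad-CAM heatmap (H, W) for a single image tensor of shape (1, 3, H, W)."""
    features = {}

    def forward_hook(module, inp, out):
        features["value"] = out  # feature maps of the last conv stage

    handle = model.layer4.register_forward_hook(forward_hook)
    model.eval()
    logits = model(image)
    handle.remove()

    if target_class is None:
        target_class = logits.argmax(dim=1).item()

    # Gradient of the target-class score w.r.t. the last conv feature maps.
    grads = torch.autograd.grad(logits[0, target_class], features["value"])[0]

    # Global-average-pool the gradients to get per-channel weights,
    # then form a weighted sum of the feature maps and apply ReLU.
    weights = grads.mean(dim=(2, 3), keepdim=True)            # (1, C, 1, 1)
    cam = F.relu((weights * features["value"]).sum(dim=1))    # (1, h, w)

    # Upsample to the input resolution and normalize to [0, 1] for overlaying.
    cam = F.interpolate(cam.unsqueeze(1), size=image.shape[2:],
                        mode="bilinear", align_corners=False)[0, 0]
    cam = (cam - cam.min()) / (cam.max() - cam.min() + 1e-8)
    return cam.detach()

if __name__ == "__main__":
    model = models.resnet18(weights=None, num_classes=6)
    dummy = torch.randn(1, 3, 96, 96)   # placeholder for a preprocessed scene image
    heatmap = grad_cam(model, dummy)
    print(heatmap.shape)                # torch.Size([96, 96])
```

The resulting heatmap can be overlaid on the original image (e.g. with matplotlib) to show which parts of the scene the network attended to when assigning a class.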
Methodology

We are going to feed the image (96×96) into a ResNet-18 model to classify the scene as one of six classes: Building, Forest, Glacier, Mountain, Sea, and Street.
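To make this concrete, here is a minimal sketch of such a setup, assuming PyTorch and a recent torchvision (the article's own code may differ): images are resized to 96×96 and the ImageNet head of ResNet-18 is replaced with a 6-class scene classifier.

```python
import torch
import torch.nn as nn
from torchvision import models, transforms

SCENE_CLASSES = ["Building", "Forest", "Glacier", "Mountain", "Sea", "Street"]

# Resize every image to 96x96 and normalize; the ImageNet statistics used here
# are a common default, assumed for illustration.
preprocess = transforms.Compose([
    transforms.Resize((96, 96)),
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406],
                         std=[0.229, 0.224, 0.225]),
])

# ResNet-18 with its 1000-way ImageNet head swapped for a 6-way scene head.
model = models.resnet18(weights=models.ResNet18_Weights.DEFAULT)
model.fc = nn.Linear(model.fc.in_features, len(SCENE_CLASSES))

# Quick shape check with a dummy batch of four 96x96 RGB images.
dummy = torch.randn(4, 3, 96, 96)
logits = model(dummy)
print(logits.shape)  # torch.Size([4, 6])
```

Because ResNet-18 ends in an adaptive average-pooling layer, it accepts 96×96 inputs without architectural changes; only the final fully connected layer needs to be replaced to match the six scene classes.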