March, 2023

Brain Tumor Classification & Grad-CAM

An approach to explain Convolutional Neural Networks to clinical decision makers

Convolutional neural networks have proven to be effective at classifying images in the medical domain. However, the lack of interpretability of these models raises concerns for clinicians who rely on explanations for clinical decision making. This private project aimed to develop an MVP solution that incorporates multiple visualization techniques to improve the interpretability of the model.

Model

The project used transfer learning to compare several pre-trained models, including VGG16, EfficientNetV2B3, InceptionNet, and GoogLeNet. EfficientNetV2B3 was selected for its high accuracy of 97% on the test set and its relatively small size. Model size was an important criterion because the model was version-controlled with MLflow, which imposed a limit on upload size.
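As a rough sketch of the transfer-learning setup in tf.keras: the backbone is loaded without its ImageNet head, frozen, and topped with a small classification head. The number of classes and the 300x300 input size are illustrative assumptions, not details taken from the project.

    import tensorflow as tf

    NUM_CLASSES = 4  # assumption: many public brain-MRI datasets have four classes

    # Pre-trained backbone without its ImageNet classification head.
    base = tf.keras.applications.EfficientNetV2B3(
        include_top=False, weights="imagenet", input_shape=(300, 300, 3)
    )
    base.trainable = False  # freeze the backbone for the initial training round

    model = tf.keras.Sequential([
        base,
        tf.keras.layers.GlobalAveragePooling2D(),
        tf.keras.layers.Dropout(0.3),
        tf.keras.layers.Dense(NUM_CLASSES, activation="softmax"),
    ])
    model.compile(
        optimizer="adam",
        loss="categorical_crossentropy",
        metrics=["accuracy"],
    )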

Explainability

The project incorporated several visualization techniques to improve the interpretability of the model, including Grad-CAM, activation visualizations, vanilla gradients, and occlusion sensitivity maps. Among these techniques, Grad-CAM provided the most interpretable results.
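In tf-explain, each of these techniques is a small explainer class with an explain method. A minimal sketch, assuming a trained model and a preprocessed scan img from the pipeline above; the class index, the layer name "top_conv", and the occlusion patch size are illustrative:

    import numpy as np
    from tf_explain.core.grad_cam import GradCAM
    from tf_explain.core.occlusion_sensitivity import OcclusionSensitivity
    from tf_explain.core.vanilla_gradients import VanillaGradients

    # tf-explain expects a (images, labels) tuple; the labels may be None.
    data = (np.expand_dims(img, axis=0), None)

    grad_cam = GradCAM().explain(data, model, class_index=0, layer_name="top_conv")
    vanilla = VanillaGradients().explain(data, model, class_index=0)
    occlusion = OcclusionSensitivity().explain(data, model, class_index=0, patch_size=20)

    # Each call returns an image grid that can be written to disk.
    GradCAM().save(grad_cam, ".", "grad_cam.png")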

Grad-CAM is a technique for understanding how a convolutional neural network (CNN) arrives at a prediction: it highlights the areas of an image that were most important for that prediction. It operates on the last convolutional layer of the CNN, whose output is a set of feature maps, each responding to different aspects of the image. Grad-CAM computes the gradient of the score for a specific class with respect to each feature map; this gradient indicates how much each feature map contributed to the prediction. Averaging these gradients yields one importance weight per feature map, and a weighted sum of the feature maps, passed through a ReLU, produces a heatmap that highlights the most important areas of the image for that prediction.
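The computation is compact enough to sketch from scratch with a tf.GradientTape. This is a minimal illustration of the steps described above, not the project's exact code; it assumes the named convolutional layer is part of the model passed in:

    import tensorflow as tf

    def grad_cam_heatmap(model, image, class_index, conv_layer_name):
        # A model that exposes both the conv layer's feature maps and the predictions.
        grad_model = tf.keras.Model(
            model.inputs,
            [model.get_layer(conv_layer_name).output, model.output],
        )
        with tf.GradientTape() as tape:
            feature_maps, predictions = grad_model(image[tf.newaxis, ...])
            class_score = predictions[:, class_index]
        # Gradient of the class score with respect to every feature-map activation.
        grads = tape.gradient(class_score, feature_maps)
        # Global-average-pool the gradients: one importance weight per feature map.
        weights = tf.reduce_mean(grads, axis=(0, 1, 2))
        # Weighted sum of the feature maps; ReLU keeps only positive evidence.
        heatmap = tf.nn.relu(tf.reduce_sum(feature_maps[0] * weights, axis=-1))
        # Normalize to [0, 1] so the map can be overlaid on the original scan.
        return (heatmap / (tf.reduce_max(heatmap) + 1e-8)).numpy()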

Features Implemented:

  • Loading and preprocessing data
  • Initiating and training the model
  • Evaluating and predicting with the model
  • Returning Grad-CAM explainability images of predictions, along with other visualization techniques
  • Saving the model to MLflow or Google Cloud
  • Loading the model from MLflow or Google Cloud
  • An API to upload images from the frontend and return predictions and Grad-CAM images (sketched below)
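A minimal sketch of the upload-and-predict endpoint with FastAPI. The model path, input size, and response fields are assumptions; returning a Grad-CAM overlay would reuse the heatmap computation shown earlier:

    import io

    import numpy as np
    import tensorflow as tf
    from fastapi import FastAPI, File, UploadFile
    from PIL import Image

    app = FastAPI()
    model = tf.keras.models.load_model("model")  # hypothetical local path

    @app.post("/predict")
    async def predict(file: UploadFile = File(...)):
        # Decode the uploaded scan and resize it to the model's input size.
        image = Image.open(io.BytesIO(await file.read())).convert("RGB")
        batch = np.asarray(image.resize((300, 300)), dtype=np.float32)[np.newaxis, ...]
        probs = model.predict(batch)[0]
        return {
            "class_index": int(np.argmax(probs)),
            "probabilities": probs.tolist(),
        }

During development, such an app can be served with "uvicorn main:app --reload" (the module name is assumed).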

Tech Stack:

  • TensorFlow (tf.keras) for the deep learning model
  • tf-explain for the visualization techniques
  • FastAPI and Uvicorn for the API
  • MLflow for version-controlling the model (see the sketch after this list)
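A minimal sketch of logging and reloading the model with MLflow's TensorFlow flavor, assuming the trained model from the Model section; the tracking URI is a placeholder, and the test accuracy is the figure reported above:

    import mlflow
    import mlflow.tensorflow

    mlflow.set_tracking_uri("http://localhost:5000")  # placeholder tracking server

    with mlflow.start_run():
        mlflow.log_metric("test_accuracy", 0.97)
        # Log the trained Keras model as a versioned MLflow artifact.
        model_info = mlflow.tensorflow.log_model(model, artifact_path="model")

    # Reload the exact logged version later, e.g. in the API process.
    loaded = mlflow.tensorflow.load_model(model_info.model_uri)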

Conclusion:

This private project demonstrates the use of multiple visualization techniques to improve the interpretability of convolutional neural networks for clinical decision making. EfficientNetV2B3 offered the best trade-off between accuracy and model size, and Grad-CAM provided the most interpretable results. The project also demonstrates the use of modern tooling such as FastAPI, Uvicorn, and MLflow for serving and version-controlling deep learning models.

Open Source:

This project is public; feel free to inspect the corresponding Git repository.