Research — Shubham Gajjar

03 · Research

Selected research.

2026

1 paper

In Progress · Research Capstone · April 2026

MorphoCLIP: Cell Microscopy and Text Contrastive Learning

Shubham Gajjar (Team Lead, 3-person team)

MorphoCLIP is a contrastive learning framework that embeds Cell Painting microscopy images and natural-language perturbation descriptions in a shared 512-dimensional space. The image branch uses a frozen DINOv3 ViT-L/16 backbone with a CrossChannelFormer aggregator across five fluorescence channels; the text branch uses BioClinical ModernBERT with a trainable projection head. Trained with Continuously Weighted Contrastive Loss (CWCL) on the CPJUMP1 benchmark (51 plates, 817 perturbations), it reaches a text-to-image Recall@10 of 24.3%, roughly 200× the 0.12% random baseline and 3.3× that of CellCLIP, with 185× fewer trainable parameters (8M vs 1.48B). It is the first such model trained jointly on compounds, CRISPR knockouts, and ORF overexpressions.

Vision-Language · Contrastive Learning · Cell Painting · DINOv3 · Drug Discovery · Microscopy
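
For readers unfamiliar with the retrieval metric quoted above, here is a minimal sketch of how a text-to-image Recall@K could be computed from paired embeddings. It is not the MorphoCLIP evaluation code: the function names, the toy 2-dimensional embeddings, and the rule that a query counts as a hit when any top-K image shares its perturbation label are all assumptions made for illustration.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def recall_at_k(text_embs, text_labels, image_embs, image_labels, k):
    """Fraction of text queries whose top-k images include a matching label.

    Assumption: a hit means at least one of the k nearest images carries
    the same perturbation label as the text query.
    """
    hits = 0
    for query, label in zip(text_embs, text_labels):
        # Rank all images by similarity to this text query, best first.
        ranked = sorted(
            range(len(image_embs)),
            key=lambda i: cosine(query, image_embs[i]),
            reverse=True,
        )
        if any(image_labels[i] == label for i in ranked[:k]):
            hits += 1
    return hits / len(text_embs)


# Toy data (hypothetical): two text queries, three images.
text_embs = [[1.0, 0.0], [0.0, 1.0]]
text_labels = ["A", "B"]
image_embs = [[0.9, 0.1], [0.2, 0.8], [0.1, 0.9]]
image_labels = ["A", "B", "A"]

r1 = recall_at_k(text_embs, text_labels, image_embs, image_labels, k=1)
r2 = recall_at_k(text_embs, text_labels, image_embs, image_labels, k=2)
print(r1, r2)
```

In the toy data the nearest image to query "B" is mislabeled relative to it, so the query misses at K=1 and only scores a hit once K grows to 2.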

2025

3 papers

Published · 4th IEEE World Conference on Applied Intelligence and Computing (AIC 2025)

A Hybrid ResNet-ViT Architecture for Skin Cancer Classification

Shubham Gajjar, Om Rathod, Deep Joshi, Harshal Joshi, Vishal Barot

Hybrid architecture pairing a frozen ResNet50 feature extractor with Vision Transformer blocks (four-head self-attention) and Global Average Pooling to model global dependencies for seven-class skin lesion classification (melanoma, nevus, basal cell carcinoma, benign keratosis, dermatofibroma, actinic keratosis, vascular lesions). Achieves 96.3% test accuracy, macro F1 0.961, and AUC ~1.00 across all seven classes on HAM10000. Applies stratified data augmentation (rotation ±20°, horizontal/vertical flips, brightness ±25, contrast ±10%) to address class imbalance, expanding 10,015 source images to ~74,353. Trained with the Nadam optimizer (lr 0.001) and sparse categorical crossentropy on a 70/15/15 stratified split (7,010 / 1,503 / 1,502).

Computer Vision · Deep Learning · Medical AI · Skin Cancer · ResNet · Vision Transformer

IEEE Xplore → · GitHub → · DOI: 10.1109/AIC66080.2025.11212073
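
The 70/15/15 stratified split mentioned above can be illustrated with a short standard-library sketch. This is not the paper's code: the function name, the seed, and the toy class counts are hypothetical, and rounding per class means real totals may differ by an image or two from an exact 70/15/15 division.

```python
import random
from collections import defaultdict


def stratified_split(labels, fractions=(0.70, 0.15, 0.15), seed=0):
    """Return (train, val, test) index lists that keep per-class proportions."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)

    train, val, test = [], [], []
    for indices in by_class.values():
        rng.shuffle(indices)
        n_train = round(fractions[0] * len(indices))
        n_val = round(fractions[1] * len(indices))
        train += indices[:n_train]
        val += indices[n_train:n_train + n_val]
        # Whatever remains in the class goes to the test set.
        test += indices[n_train + n_val:]
    return train, val, test


# Toy imbalanced dataset (hypothetical counts, not HAM10000's real ones).
labels = ["nevus"] * 100 + ["melanoma"] * 20
train, val, test = stratified_split(labels)
print(len(train), len(val), len(test))
```

Splitting inside each class, rather than over the whole dataset, keeps rare classes such as melanoma represented at the same rate in all three partitions.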

Under Review · Manuscript · July 2025

VGG16-MCA UNet: A Hybrid Deep Learning Approach for Enhanced Brain Tumor Segmentation in FLAIR MRI

Shubham Gajjar, Deep Joshi, Avi Poptani, Vishal Barot

Hybrid segmentation framework integrating a pretrained VGG16 encoder with a Multi-Channel Attention decoder for brain tumor segmentation in FLAIR MRI. Applies Focal Tversky Loss to address severe class imbalance (tumor regions ~2–5% of total image area), with ensemble learning over multiple checkpoints and architectural configurations. Skip connections, batch normalization, and dropout regularize against overfitting. Achieves 99.59% accuracy and 99.71% specificity on the LGG Brain MRI Segmentation dataset (110 low-grade glioma patients, TCGA). Trained with Adam (lr 0.05), ReduceLROnPlateau, and EarlyStopping; 35 systematic experiments compare against standard UNet, Attention UNet, and Scaler Attention UNet. Preprocessing pipeline: skull stripping, intensity normalization, 256×256 resize. Improves Dice and IoU with enhanced boundary delineation.

Computer Vision · Deep Learning · Medical AI · Brain Tumor · UNet · FLAIR MRI
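
As a rough illustration of why a Focal Tversky loss suits small tumor regions, the sketch below computes it on flattened masks. The values alpha=0.7, beta=0.3, and gamma=0.75 are common choices assumed here, not values reported for this manuscript, and published variants differ in how they write the exponent.

```python
def tversky_index(pred, target, alpha=0.7, beta=0.3, eps=1e-7):
    """Tversky index for flattened soft predictions and binary targets.

    alpha weights false negatives and beta weights false positives
    (assumed values; alpha > beta penalizes missed tumor pixels more).
    """
    tp = sum(p * t for p, t in zip(pred, target))
    fn = sum((1 - p) * t for p, t in zip(pred, target))
    fp = sum(p * (1 - t) for p, t in zip(pred, target))
    return (tp + eps) / (tp + alpha * fn + beta * fp + eps)


def focal_tversky_loss(pred, target, alpha=0.7, beta=0.3, gamma=0.75):
    """Focal Tversky loss: (1 - TI) ** gamma, clamped at zero for safety."""
    ti = tversky_index(pred, target, alpha, beta)
    return max(0.0, 1.0 - ti) ** gamma


def dice(pred, target, eps=1e-7):
    """Dice coefficient, which equals the Tversky index at alpha = beta = 0.5."""
    return tversky_index(pred, target, 0.5, 0.5, eps)


# Toy 10-pixel mask with 20% tumor area (hypothetical).
target = [1, 1, 0, 0, 0, 0, 0, 0, 0, 0]
perfect = [1, 1, 0, 0, 0, 0, 0, 0, 0, 0]
missed_pixel = [1, 0, 0, 0, 0, 0, 0, 0, 0, 0]   # one false negative
extra_pixel = [1, 1, 1, 0, 0, 0, 0, 0, 0, 0]    # one false positive

loss_perfect = focal_tversky_loss(perfect, target)
loss_missed = focal_tversky_loss(missed_pixel, target)
loss_extra = focal_tversky_loss(extra_pixel, target)
print(loss_perfect, loss_missed, loss_extra)
```

With these assumed weights a single missed tumor pixel costs more than a single spurious one, which is the behavior wanted when tumor pixels are only a few percent of the image.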

Under Review · Manuscript · 2025

Extended ResNet50: Inverse Soft Mask Attention for Skin Cancer Classification

Shubham Gajjar, Om Rathod, Deep Joshi, Harshal Joshi, Vishal Barot

Two-stage pipeline integrating a U-Net++ hair segmentation model with an Extended ResNet50 classifier featuring a novel Inverse Soft Mask Attention mechanism. Dense residual blocks and Squeeze-and-Excitation modules with learnable weighted feature aggregation combine hair-occluded and unoccluded image regions. Achieves 97.89% test accuracy, 99.67% train, and 97.74% validation at epoch 22 on HAM10000 (10,015 dermoscopic images, seven classes). Trained with Nadam + Cosine Decay Restarts and sparse categorical crossentropy. Systematic experimentation across 21 architectural trials covering Vision Transformers, hybrid models, and custom attention mechanisms. Outperforms SCCNet (95.20%), VCCINet (93.18%), and SPCB-Net (97.10%).

Computer Vision · Medical AI · Skin Cancer · Attention Mechanisms · ResNet · Dermatoscopic
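
The cosine decay with restarts schedule named above can be sketched in a few lines. The base learning rate, first cycle length, and cycle multiplier below are placeholders, since the entry does not state the values used, and the function is a simplified stand-in rather than any particular framework's implementation.

```python
import math


def cosine_decay_restarts(step, base_lr=1e-3, first_cycle=10, t_mul=2.0, min_lr=0.0):
    """Learning rate at `step` under cosine annealing with warm restarts.

    All hyperparameters are hypothetical defaults; t_mul should be >= 1
    so that each cycle is at least as long as the previous one.
    """
    cycle_len = first_cycle
    start = 0
    # Walk forward through cycles until we find the one containing `step`.
    while step >= start + cycle_len:
        start += cycle_len
        cycle_len = int(cycle_len * t_mul)
    progress = (step - start) / cycle_len
    return min_lr + 0.5 * (base_lr - min_lr) * (1 + math.cos(math.pi * progress))


# The rate decays within a cycle, then jumps back up at the restart (step 10).
schedule = [cosine_decay_restarts(s) for s in range(12)]
print(schedule)
```

Each restart returns the rate to its starting value, which can help the optimizer leave a poor local region before annealing again over a longer cycle.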
