Mohit Joshi

Researcher at IIT Bombay KCDH & UCSF Department of Medicine

I got into machine learning research through the medical domain: starting with medical reasoning in language models, then moving into uncertainty quantification for clinical NLP, and eventually into computer vision for stroke segmentation, where I designed a novel mathematical loss function that became my undergraduate thesis.

Alongside this, I started working with Dr. Vivek Rudrapatna at UCSF on applying reinforcement learning to sparse medical data, specifically drug repurposing from electronic health records. That work is pushing me toward what I'm most interested in now: non-rewardable RL for language models and studying consciousness in recursive language models.

My research interests sit at the intersection of Domain Adaptation, Mathematics, and Machine Learning. I'm drawn to problems such as implicit rewards (non-rewardable RL) measured through Epiplexity, and the statistical distributions underlying reasoning in Recursive Language Models.

I also explore other areas for the sake of serendipity in my research. Currently these are Scientific Computing and Mathematical Biology (Dynamical Modelling).

Research Highlights

Non-Rewardable Reinforcement Learning (active work)

Offline RL and causal inference for drug repurposing environments
Implicit rewards and Epiplexity in recursive language models for domain-agnostic reasoning

Medical AI & Computer Vision

Size Penalty Loss: a novel size-stratified volume regularization for ischemic stroke segmentation (Bachelor of Technology thesis, patent anticipated)
End-to-end multimodal stroke prognostication framework integrating CT, CTA, DWI/ADC, and FLAIR imaging
Uncertainty-aware X-ray report generation using Bayesian region detection with Monte Carlo Dropout

Reinforcement Learning & Drug Discovery

Offline RL for drug repurposing: modeling Markov decision processes over EHR data with latent state representations (UCSF, ongoing)
Non-rewardable RL framework for recursive language models: implicit rewards via Epiplexity (epistemic complexity)

Computational Biology

Mechanistic deep learning for neurodegeneration using PINNs, neural ODE/SDE, and multi-agent biological game formulations
Protein–DNA affinity interface analysis pipeline (Snakemake + Docker, IIT Jodhpur)
Post-COVID immune signature analysis via scRNA-seq (Seurat, SCTransform)

Research Appointments

University of California San Francisco (UCSF) Apr 2026 – Present

Visiting Researcher (Collaboration)

Advisor: Dr. Vivek Rudrapatna | Department of Medicine

Offline RL and causal inference for drug repurposing using EHR data. Building emulation frameworks to evaluate drug-repurposing candidates using LINCS L1000.

Indian Institute of Technology Bombay (IITB) Aug 2025 – Present

Visiting Researcher (Internship) & Exchange Semester

Advisor: Dr. Kshitij Jadhav | Koita Centre for Digital Health, Dept. of CS

Designed the Size Penalty Loss for stroke segmentation, built multimodal imaging pipelines, and worked on uncertainty-aware medical reasoning and mechanistic deep learning for neurodegeneration. Coursework in Computer Vision and Biophysics.

Indian Institute of Technology Jodhpur (IITJ) May 2025 – Jun 2025

Visiting Researcher (Internship)

Advisor: Dr. Sucharita Dey | Biosciences & Bioengineering Department

Developed an automated protein–DNA interface analysis pipeline in Snakemake. Containerized the workflow with Docker, reducing setup time by 90%+.

Education

🎓

Institute of Advanced Research

Bachelor of Technology in Biotechnology

Jul 2022 – Jun 2026 (anticipated) · CGPA: 8.42/10.0 · Top 10% of class

🎓

Indian Institute of Technology Bombay

Semester Exchange Researcher, Koita Centre for Digital Health, Dept. of CS

Courses: Computer Vision (medical imaging), Game Theory in Evolutionary Dynamics (Prof. Supreet Saini), and Biophysics

Jan 2026 – Jun 2026 · Advisor: Dr. Kshitij Jadhav

Research Projects

Undergraduate Thesis

Size Penalty Loss ("What Dice Misses"): Size-Stratified Volume Regularization for Ischemic Stroke Lesion Prognostication

Mohit Joshi

Supervised by Dr. Kshitij Jadhav (IIT Bombay) · May 2026 · Patent anticipated

Small ischemic stroke lesions are difficult targets for deep segmentation models because the positive class can occupy a tiny fraction of the image volume, and voxel-overlap objectives can remain numerically acceptable while clinically important small lesions are missed. This thesis introduces Size Penalty Loss, a continuous size-stratified auxiliary loss for medical image segmentation. The proposed term penalizes the relative error between the predicted soft lesion volume and the ground-truth volume, with an exponential weight that gives stronger optimization pressure to smaller lesions. The method is evaluated with a 3D Attention U-Net using ADC, DWI, and an ADC-DWI mismatch channel on two ischemic stroke MRI datasets. Size Penalty Loss improves missed-lesion count, detection, RVE, RQ, PQ, and ASSD, with the clearest gain in sub-milliliter lesions.
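To make the idea concrete, here is a minimal NumPy sketch of a size-stratified volume penalty. It is an illustration only, not the formula from the thesis: the weight exp(-V/tau), the decay scale tau_ml, the voxel volume voxel_ml, and the function name size_penalty are all assumptions chosen for the example.

```python
import numpy as np

def size_penalty(prob, target, voxel_ml=0.001, tau_ml=1.0, eps=1e-6):
    """Illustrative size-stratified volume penalty (assumed form, not the thesis formula).

    prob     : predicted foreground probabilities in [0, 1]
    target   : binary ground-truth mask of the same shape
    voxel_ml : volume of one voxel in millilitres (assumed value)
    tau_ml   : hypothetical decay scale; smaller lesions get weights closer to 1
    """
    v_pred = prob.sum() * voxel_ml            # soft predicted lesion volume
    v_true = target.sum() * voxel_ml          # ground-truth lesion volume
    rel_err = abs(v_pred - v_true) / (v_true + eps)
    weight = np.exp(-v_true / tau_ml)         # exponential emphasis on small lesions
    return float(weight * rel_err)

# Two lesions whose volume is underestimated by the same relative amount (50%)
small = np.zeros((10, 10, 10))
small[:2, :2, :2] = 1                         # 8 voxels, about 0.008 mL
large = np.zeros((20, 20, 20))
large[:10] = 1                                # 4000 voxels, about 4 mL

small_pen = size_penalty(0.5 * small, small)
large_pen = size_penalty(0.5 * large, large)
```

With these assumed settings the small lesion receives a much larger penalty than the large one for the same relative volume error, which is the behaviour the abstract describes. In training, a term like this would be added to an overlap loss such as Dice with some weighting factor.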

Thesis PDF Slides Project Page

Ongoing

Drug Repurposing and EHR Environment Emulation via Offline Reinforcement Learning

Mohit Joshi

Advisor: Dr. Vivek Rudrapatna · UCSF Department of Medicine · Apr 2026 – Present

Modeling Markov decision processes over EHR data with latent state representations to address sparsity, including Q-value gradient methods. Building an offline RL emulation framework to evaluate drug-repurposing candidates using public resources including LINCS L1000. The goal is to identify new drug targets from observational clinical data without running new trials.
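As a rough illustration of what "offline" means here, the sketch below runs tabular Q-iteration over a fixed log of transitions, with no further interaction with any environment. It is a toy under stated assumptions and not the project's method: the three discrete states stand in for hypothetical latent patient states, and the transitions, rewards, and function name batch_q_iteration are invented for the example.

```python
import numpy as np

def batch_q_iteration(transitions, n_states, n_actions, gamma=0.9, sweeps=50):
    """Tabular Q-iteration over a fixed log of (s, a, r, s_next, done) tuples.

    Every update reuses the same logged data. State-action pairs that never
    appear in the log keep their initial value of 0.
    """
    q = np.zeros((n_states, n_actions))
    for _ in range(sweeps):
        targets = {}
        for s, a, r, s_next, done in transitions:
            y = r if done else r + gamma * q[s_next].max()
            targets.setdefault((s, a), []).append(y)
        for (s, a), ys in targets.items():
            q[s, a] = np.mean(ys)             # average over logged outcomes
    return q

# Hypothetical log: state 2 is terminal, actions 0 and 1 are two treatments
log = [
    (0, 0, 0.0, 1, False),
    (0, 1, 0.2, 2, True),
    (1, 0, 1.0, 2, True),
    (1, 1, 0.0, 2, True),
]
q = batch_q_iteration(log, n_states=3, n_actions=2)
greedy_action_at_start = int(q[0].argmax())
```

Real EHR data adds problems this toy ignores, in particular unmeasured confounding and the risk of overestimating actions that rarely or never appear in the log.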

Ongoing · Patent Anticipated

Acute Ischemic Stroke Prognostication: End-to-End Multimodal Framework

Mohit Joshi

Advisor: Dr. Kshitij Jadhav · IIT Bombay KCDH · Jan 2025 – Present

Designed a multimodal medical imaging model using CT, CTA, DWI/ADC, and FLAIR for stroke lesion segmentation and outcome prediction, integrating volumetric features, vascular topology, and clinical variables in a U-Net architecture. Building an end-to-end framework combining 3D segmentation, vessel-level occlusion modeling, and state-space learning to estimate clot burden and predict 90-day outcomes (mRS) within 24h of admission.

IIT Bombay

Uncertainty-Aware Medical Reasoning in Language Models

Mohit Joshi

Advisor: Dr. Kshitij Jadhav · IIT Bombay KCDH · Aug – Dec 2025

Developed a Bayesian region-detection model combining Monte Carlo Dropout for epistemic uncertainty with direct prediction of aleatoric variance, enabling robust, uncertainty-aware X-ray report generation. Implemented a Gaussian NLL objective that predicts bounding-box coordinates together with their inherent σ, allowing the system to flag spatially ambiguous regions. Built an uncertainty-aware gating mechanism that suppresses auto-generated sentences for low-confidence regions and prioritizes radiologist review.
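The three pieces described above (Gaussian NLL, Monte Carlo Dropout, and gating) can be sketched in NumPy as follows. This is a simplified illustration, not the project code: the arrays stand in for the outputs of T stochastic forward passes of a detector, and the threshold value and the function names are hypothetical.

```python
import numpy as np

def gaussian_nll(mu, log_var, target):
    """Per-coordinate Gaussian negative log-likelihood (constant term dropped)."""
    return 0.5 * (log_var + (target - mu) ** 2 / np.exp(log_var))

def decompose_uncertainty(mc_mu, mc_log_var):
    """Split uncertainty from T dropout passes; inputs have shape (T, 4).

    Columns are box coordinates (x1, y1, x2, y2).
    """
    epistemic = mc_mu.var(axis=0)                 # spread of the means across passes
    aleatoric = np.exp(mc_log_var).mean(axis=0)   # average predicted variance
    return mc_mu.mean(axis=0), epistemic, aleatoric

def needs_review(epistemic, aleatoric, threshold=25.0):
    """Hypothetical gate: flag the region if total variance exceeds the threshold."""
    return bool((epistemic + aleatoric).max() > threshold)

# Stand-in outputs of three dropout passes for two regions
stable_mu = np.array([[10, 10, 50, 50], [11, 10, 50, 51], [10, 11, 49, 50]], dtype=float)
ambiguous_mu = np.array([[10, 10, 50, 50], [30, 12, 70, 48], [0, 9, 40, 52]], dtype=float)
log_var = np.zeros((3, 4))                        # predicted variance of 1 per coordinate

_, epi_s, ale_s = decompose_uncertainty(stable_mu, log_var)
_, epi_a, ale_a = decompose_uncertainty(ambiguous_mu, log_var)
stable_flag = needs_review(epi_s, ale_s)
ambiguous_flag = needs_review(epi_a, ale_a)
```

In this toy setup the region whose predicted boxes disagree strongly across passes is routed to review, while the stable region is not. A flagged region would have its auto-generated sentence suppressed.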

IIT Bombay

Mechanistic Deep Learning for Predicting Neurodegenerative Disease Progression

Mohit Joshi

Advisor: Dr. Kshitij Jadhav · IIT Bombay KCDH · Aug – Dec 2025

Developed a BLIP-2-inspired architecture interfacing frozen foundation models (Swin-UNETR for 3D MRI, scGPT for omics) with trainable Q-Formers to distill ADNI data into compact latent representations. Explored Amyloid-β and Tau pathologies as a multi-agent biological game, using cross-modal attention to identify mechanistic drivers. Prototyped continuous-time forecasting using neural ODE/SDE with adversarial training via the Gillespie algorithm.

IIT Jodhpur

Protein–DNA Affinity Interface Analysis Tool

Mohit Joshi

Advisor: Dr. Sucharita Dey · IIT Jodhpur · May – Jun 2025

Developed an automated protein–DNA interface analysis pipeline in Snakemake to process multichain PDB files, integrating Python, Fortran, and shell scripts with Naccess, HBPLUS, and FreeSASA for comprehensive aggregation. Containerized with Docker, reducing setup time by 90%+. Submitted a detailed report covering objectives, protocol, and findings.

GitHub

Self-Supervised

Immune Signature Analysis in Post-Acute COVID-19 Lung Sequelae

Mohit Joshi

Jan – Apr 2025

Engineered a scalable scRNA-seq pipeline in R (Seurat, SCTransform) analyzing post-COVID lung T cells, identifying persistent pro-inflammatory signatures (IL32, CCL5, CD8A, NKG7). Reactome and DAVID pathway analysis revealed sustained T-cell cytotoxicity and IFN-γ signaling. Proposed an ODE model to translate static gene signatures into dynamic hypotheses of post-acute lung damage.

GitHub

Exploratory Projects

Self-Supervised

Physics-Informed Neural Networks (PINNs) for Partial Differential Equations

Mohit Joshi

Exploratory Project

Implemented Physics-Informed Neural Networks to solve Burgers' equation and the Laplace equation for electrostatic potential. The project explores how physical laws can be built into neural network loss functions, with a view to data-driven discovery of partial differential equations.
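The "physics" term of such a loss is the mean squared residual of the PDE. The sketch below evaluates that residual for the viscous Burgers' equation, u_t + u u_x = nu u_xx, on a grid. Note the substitution: derivatives are taken with finite differences here so the example needs only NumPy, whereas a PINN would obtain them by automatic differentiation of the network output. The grid sizes and parameter values are assumptions for the example.

```python
import numpy as np

def burgers_residual(u, x, t, nu):
    """Residual u_t + u*u_x - nu*u_xx on a grid; u has shape (len(t), len(x)).

    Finite differences stand in for the automatic differentiation a PINN uses.
    """
    u_t = np.gradient(u, t, axis=0)
    u_x = np.gradient(u, x, axis=1)
    u_xx = np.gradient(u_x, x, axis=1)
    return u_t + u * u_x - nu * u_xx

def physics_loss(u, x, t, nu):
    """Mean squared PDE residual, the physics term of a PINN-style loss."""
    return float(np.mean(burgers_residual(u, x, t, nu) ** 2))

nu, a, c = 0.1, 1.0, 0.5
x = np.linspace(-2.0, 2.0, 401)
t = np.linspace(0.0, 1.0, 201)
T, X = np.meshgrid(t, x, indexing="ij")

# Travelling-wave solution of the viscous Burgers' equation
u_exact = c - a * np.tanh(a * (X - c * T) / (2 * nu))
# A field that does not satisfy the equation
u_wrong = np.sin(np.pi * X) * np.exp(-T)

loss_exact = physics_loss(u_exact, x, t, nu)
loss_wrong = physics_loss(u_wrong, x, t, nu)
```

The exact solution gives a residual close to zero (limited by discretization error), while the arbitrary field does not. A PINN is trained to drive this term down together with boundary and initial-condition terms.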

GitHub

Self-Supervised

Symbolic Regression & Sparse Identification for SIR Epidemic Modeling

Mohit Joshi

Exploratory Project

Conducted a comparative analysis between PySR (Symbolic Regression) and SINDy (Sparse Identification of Nonlinear Dynamics) to model epidemic spread using the SIR mathematical framework. This work examines the interpretability and predictive power of equation discovery methods.
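For readers unfamiliar with SINDy, its core is a sparse regression of time derivatives onto a library of candidate terms. The sketch below hand-rolls sequentially thresholded least squares on simulated SIR data. It does not use the PySR or PySINDy APIs, and it makes simplifying assumptions: the parameter values, the Euler simulation, the four-term library, and the use of exact derivatives from the known model (in practice they would be estimated from noisy data).

```python
import numpy as np

def sir_rhs(s, i, beta, gamma):
    """Right-hand side of the normalized SIR model for S and I."""
    return -beta * s * i, beta * s * i - gamma * i

def simulate(beta=0.5, gamma=0.2, s0=0.99, i0=0.01, dt=0.1, steps=600):
    """Forward-Euler trajectory of (S, I); accuracy is not critical here."""
    s, i = s0, i0
    states = []
    for _ in range(steps):
        states.append((s, i))
        ds, di = sir_rhs(s, i, beta, gamma)
        s, i = s + dt * ds, i + dt * di
    return np.array(states)

def stlsq(theta, dxdt, threshold=0.05, iters=10):
    """Sequentially thresholded least squares: fit, zero small terms, refit."""
    xi = np.linalg.lstsq(theta, dxdt, rcond=None)[0]
    for _ in range(iters):
        small = np.abs(xi) < threshold
        xi[small] = 0.0
        for k in range(dxdt.shape[1]):
            big = ~small[:, k]
            if big.any():
                xi[big, k] = np.linalg.lstsq(theta[:, big], dxdt[:, k], rcond=None)[0]
    return xi

states = simulate()
S, I = states[:, 0], states[:, 1]
dxdt = np.column_stack(sir_rhs(S, I, 0.5, 0.2))          # exact derivatives (assumption)
theta = np.column_stack([np.ones_like(S), S, I, S * I])  # library: 1, S, I, S*I
xi = stlsq(theta, dxdt)                                  # columns: dS/dt, dI/dt
```

Each column of xi holds the coefficients of one recovered equation over the library terms 1, S, I, and S*I. Symbolic regression searches over expression structures instead of a fixed library, which is the main contrast the comparison examines.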

GitHub

Ideas & Future Directions

Primary research direction · In development

Domain-Agnostic Implicit Rewards in Generative Models Decouple Reasoning Quality from Quantity of Knowledge

I'm building a framework for systematic guardrails in Recursive Language Models (RLMs) through non-rewardable reinforcement learning. The central claim is that these guardrails can be controlled through implicit rewards rather than hard-coded constraints, and that we can measure them using a new quantity called Epiplexity (epistemic complexity).

This work challenges a fundamental assumption in learning theory for language models. Domain fluency, in both humans and machines, has long been mistaken for intelligence. What we aim to show is that real adaptive capacity is not solely about how much a system knows, but about what path it takes in its reasoning, how it differentiates between what it knows and what it doesn't, and, crucially, whether it can recognize, without external feedback, when that navigation is epistemically productive.

The precise point is that taking the "right directional steps" matters as much as the "sole intelligence" of the model.

We are trying to put forward universal implicit rewards that govern internal reasoning in language models and emerge regardless of domain, size, or training, showing that the architecture of how a system monitors its own reasoning, not the scale of its knowledge, is what separates domain-agnostic adaptive intelligence from narrow expertise.

Ongoing with Dr. Vivek Rudrapatna · UCSF

Combining EHR Drug Trial Emulation and Drug Repurposing

Applying offline RL to sparse, noisy electronic health record data to find new drug targets. Medical data is high-stakes: the reward signal is delayed, sparse, and confounded by unmeasured variables. We're modeling Markov decision processes over EHR data with latent state representations, and building emulation frameworks using LINCS L1000 to evaluate drug-repurposing candidates from observational data without running new clinical trials.

The thread connecting everything

Extending RL to Domains Where Rewards Don't Exist

From penalizing what Dice misses in stroke segmentation, to discovering implicit rewards in language model reasoning chains, to emulating drug environments from health records, the common thread across my work is extending reinforcement learning into spaces where the reward signal isn't given and needs to be found. The most interesting problems in AI right now live at exactly this boundary: where explicit supervision ends and implicit structure begins.

News

May 2026	Undergraduate thesis defended ("What Dice Misses": Size Penalty Loss)
Apr 2026	Began visiting research at UCSF, USA with Dr. Vivek Rudrapatna
Jan 2026	Exchange semester at IIT Bombay, Koita Centre for Digital Health
Aug 2025	Started research at IIT Bombay with Dr. Kshitij Jadhav
May 2025	Research internship at IIT Jodhpur with Dr. Sucharita Dey
Feb 2025	Best Poster Presenter: CME Immunology, Institute of Advanced Research
Mar 2024	Best Poster Presenter: Annual Research and Innovation Conclave (ARIC)
2024	3rd place: Gujarat Government Healthcare Hackathon

Awards & Honors

IITB Research Internship Award 2025–26: selected from 3,000 candidates for an exchange semester at IIT Bombay
Best Poster Presenter: Enhancer Hijacking in Medulloblastoma, CME Immunology, IAR (Feb 2025)
Best Poster Presenter: Dairy Wastewater Treatment Research, ARIC, IAR (Mar 2024)
3rd Place: Solutions for Healthcare Challenges, Gujarat Government Statewide Hackathon (2024)
Project Funding: secured INR 100,000 for data-driven innovation projects (Bubble Labs & Curiosy, 2022–2024)

Get in Touch

Interested in collaborating, have research questions, or just want to chat? Drop me a message.

mpjoshi2425@gmail.com GitHub CV / Resume

.-- . / .-- .. -. -.-.-- -.-.-- -.-.--