Vidhi Jain

Machine Learning Researcher | Carnegie Mellon University

I am a Machine Learning graduate student at CMU, working under Professor Aarti Singh (NSF AI-SDM). My research focuses on AI-driven decision-making, cognitive alignment, and multimodal systems.

My research lies at the intersection of AI-driven decision-making, cognition, and psychology. I design frameworks that bridge the gap between how humans think and how AI reasons, adapts, and perceives.

I'm fascinated by how we can embed human traits such as memory, emotion, perception, and social reasoning into AI systems. I focus on Theory of Mind, mental models, and emotional cues, bridging neuroscience, psychology, and LLMs.

RESEARCH INTERESTS:

Human-AI Co-evolution, Cognitive Alignment, Multimodal Learning, Decision-Making, Human Perception & AI, Theory of Mind in AI, AI Reasoning

Publications & Research

Mind the Gap: Bridging AI-Human Cognitive Misalignment Through Psychological Personas

V. Jain* et al., AAAI [in review]

We present the Integrated Cognitive-Persona Alignment Framework (ICPAF), which aligns LLM reasoning with human cognitive patterns through three stages: (1) Psychological Feature Extraction and Persona Discovery, (2) Cognitive Alignment Analysis, and (3) Cognitive-Persona Integration. Our framework discovers latent psychological personas using unsupervised learning and applies theory-grounded cognitive steering based on dual-process theory and established psychological frameworks.

Paper (In Review)

Contextual Bandits with Online Arm Generation

J. Xu*, V. Jain*, NeurIPS [in review]

Explores contextual bandits with online arm generation, presenting novel approaches to dynamic decision-making in machine learning systems.

Paper (In Review)

Ambient Intelligence-based multimodal human action recognition for autonomous systems

V. Jain* et al., ISA Transactions [Paper]

Presents a method that combines Bi-Convolutional Recurrent Neural Network (Bi-CRNN) feature extraction with Random Forest classification for human action recognition in autonomous robots.

Read Paper

Maternal Health Agent

Research Project

Designed safety-critical RAG pipelines with HNSW indexing, hybrid semantic chunking, and multi-stage reranking, achieving +18% retrieval relevance and –25% latency using vLLM. Integrated an emergency-aware reasoning framework with symptom triage, intent classification, and medical guardrails, with knowledge distillation of open-source LLMs (MedAlpaca, LLaMA).

View Project

Can Deep Neural Networks Mimic Human Perception?

Research Project

Advanced human visual attention modeling using PredNet, achieving a 23% improvement over baseline saliency predictions. Investigated gaps between AI and human visual perception in self-supervised learning systems (SimCLR, BYOL, etc.), developing StyleGAN-based data augmentation techniques that improved model robustness by 18%, mitigating perceptual phenomena.

View Code

Behavior Analysis in Songbirds

Research Project

Developed a scalable behavioral analytics pipeline to segment and cluster over 400 hours of songbird vocalizations using spectrogram analysis, quantifying the impact of motion and sleep deprivation on cognitive performance patterns. Daily syllable "crystallization" followed a gradient descent-like process, mapping shifts between exploratory and exploitative behavior.

View Code

Selected Projects

Neurascribe 2025

A voice-to-voice, knowledge-graph-based journaling and wellness platform that integrates memory weaving and episodic memory recall.

Website

● A voice-to-voice journaling and wellness platform that integrates memory weaving and episodic memory recall, leveraging hybrid retrieval from vector embeddings (Pinecone) and knowledge graphs and reducing latency by 40% using vLLM.
● Implemented a multi-stage reranking pipeline with emotional inference and fine-tuned LLMs using LoRA adaptation.

Knowledge Graphs, vLLM, Pinecone, Memory Weaving, Episodic Memory Recall, Emotional Inference, LoRA

Cognitive Multimodal Speech Recognition 2025

Multimodal lip-reading + self-supervised audio with knowledge distillation.

GitHub

Built a multimodal pipeline combining vision (Transformer lip-reading, ViT-B/16) and self-supervised audio (HuBERT, wav2vec 2.0). Applied knowledge distillation to compress large multimodal models, reducing size by 60% with minimal WER/CER drop.

PyTorch, Transformers, Computer Vision, Audio Pipeline, Knowledge Distillation

Needle 2024

A full-stack deep learning library built from scratch with CUDA-accelerated operations and automatic differentiation.

GitHub

Designed and implemented a full-stack deep learning library from scratch with CUDA-accelerated operations and automatic differentiation. Developed modular components, including parameterized layers, loss functions, optimizers, and data loaders. Built state-of-the-art models (CNNs, RNNs, and Transformers) for tasks such as image classification and language modeling.

Python, CUDA, Automatic Differentiation, Parameterized Layers, Loss Functions, Optimizers, Data Loaders

Hybrid Attention Multi-modal Learning for Perceived Mental Workload Classification 2022

Hybrid attention fusion across modalities for workload prediction.

Details

HAMML Workload Classification

The project explores the applicability of brain-computer interfaces, using multimodal biosignals and attention networks to quantify cognitive load in well-validated problem-solving tasks. The proposed approach first models each modality with LSTM and convolution-based self-attention. It then applies bi-modal attention to pairs of modalities to learn which features contribute most across them. It achieves an accuracy of 92.3%.

LSTM, Convolution, Bi-modal Attention, PyTorch, Self-attention

Agri AI 2021

Intelligent crop monitoring system using computer vision and ML to predict yields, detect diseases, and optimize farming practices.

GitHub

Deployed an end-to-end web application that helps farmers monitor crop quality, suggests crop prices, and predicts yield from the soil and crop type, all at the click of a button. Used a CatBoost classifier, a Gaussian Naïve Bayes model, MobileNet, a Random Forest regressor, and a deep bidirectional LSTM, one model for each feature in the project. The aggregate accuracy achieved by the five models is 91%.

CatBoost, TensorFlow, OpenCV, IoT, MobileNet

Covi Home Bot 2021

Home triage assistant for COVID-era symptom checks and resources.

Details

AI-powered chatbot providing real-time COVID-19 symptom assessment, resource recommendations, and emergency guidance during the pandemic.

Python, NLP, Dialogflow, Flask, BERT, Elasticsearch
