Publications
My research spans large language models, multilingual NLP, evaluation, and applications in scientific domains. All preprints are available on arXiv.
Published
We present DNIPRO, a novel longitudinal, multinational, and multilingual corpus containing 246K+ news articles covering the Russo-Ukrainian War. The dataset enables analysis of how geopolitical narratives evolve over time and differ across national and linguistic boundaries, providing insights into information dynamics and framing strategies.
LREC 2026 · arXiv:2601.16309
Investigating how incorporating author-level context can address ecological fallacy issues in large language models, particularly when models are trained on aggregate-level data but used to make individual-level predictions.
arXiv:2603.05928
A comprehensive overview of how big data and omics technologies are transforming fisheries biology, covering new computational approaches and their impact on understanding aquatic ecosystems.
ResearchGate
Under Review
We introduce PolyBench, a comprehensive benchmark with 125K+ tasks for evaluating compositional reasoning in polymer design and synthesis. The benchmark includes knowledge-augmented reasoning traces and diagnostic evaluations that reveal specific skill gaps in how LLMs handle multi-constraint reasoning problems. Our analysis provides insights into improving LLM capabilities for scientific applications.
arXiv:2601.16312
In Preparation
How do composite reward signals (combining validity, novelty, and diversity) shape LLM post-training on open-ended polymer design tasks, and what does this reveal about building models capable of generating scientifically feasible hypotheses under competing objectives?
Developing benchmarks to evaluate how well large language models can incorporate and reason about human context, including individual differences, cultural backgrounds, and personal characteristics in their responses.
A large-scale corpus of human language data designed to support research on human-context-aware language models and individual-level language understanding.
Exploring fundamental human traits that can be identified and measured through language patterns, combining computational linguistics with psychometric analysis to discover language-based factors.
Developing benchmarks for evaluating how well LLMs can extract structured information about battery electrolyte properties from scientific literature, with applications to materials science research acceleration.
Experience
My journey through academia and industry has shaped how I approach research: balancing theoretical rigor with practical impact.
Conducting research on large language models with a focus on compositional reasoning, retrieval-augmented generation, and evaluation. Working on developing benchmarks and techniques to improve LLM capabilities in scientific domains. Collaborating with interdisciplinary teams on projects funded by DARPA and NSF.
Key achievements: Two papers under review at top-tier venues (ACL, LREC), contributor to DARPA SciFy program, recipient of SUNY RF Academic Fellowship.
Led and mentored a team of data scientists delivering production ML platforms for tax and financial analytics supporting 40+ enterprise clients. Owned system architecture across modeling, scalable ML APIs, deployment, and MLOps in highly regulated environments.
Designed human-in-the-loop ML systems with feedback-driven metrics and active learning, driving sustained cost reduction and operational efficiency. Built end-to-end pipelines from data ingestion to model deployment, ensuring compliance with enterprise security and governance standards.
Recognition: EY Ovation Award (FY2022-23, Top 5% performer), EY Bravo Awards (FY2020-23), AI Challenge Winner (FY2020, FastAI Kaggle competition).
Built predictive models to estimate early seller success and risk for automated lending and loan allocation decisions in Amazon Seller Lending. Developed behavioral features and comprehensive experimentation pipelines including offline validation, A/B testing frameworks, and feature attribution analysis.
Engineered features that ranked in the top decile among ~30K existing features, demonstrating significant predictive power for seller success metrics.
Built end-to-end ML/NLP pipelines for client applications, spanning data ingestion, modeling, evaluation, and API deployment. Developed clinical and pharmaceutical NLP systems for text classification and domain language understanding using large scientific corpora.
Open Source & Research Artifacts
Research datasets, benchmarks, and tools shared with the community. Committed to open science and reproducible research.
PolyBench / PolyLM
Under Review · ACL 2026
A 125K+ task benchmark for multi-constraint polymer design and synthesis, including knowledge-augmented reasoning traces and diagnostic evaluation to analyze compositional reasoning and skill gaps in LLMs.
DNIPRO
Published · LREC 2026
A longitudinal, multinational, and multilingual corpus of 246K+ news articles analyzing geopolitical narratives and framing shifts across countries during the Russo-Ukrainian War.
LHLC: Large Human Language Data Corpus
Technical Report To Be Submitted
A large-scale corpus of human language data designed to support research on human-context-aware language models and individual-level language understanding.
Research Tools & Utilities
Collection of scripts and utilities for ML research workflows: data preprocessing, evaluation metrics, experiment tracking, and visualization tools.
Selected Projects
Research projects exploring LLM capabilities, alignment, multimodal learning, and systems design.
LLM Alignment & Catastrophic Forgetting
Research Project
Studied sequential fine-tuning effects in GPT-2 (SFT, RLHF), quantifying catastrophic forgetting and impacts on commonsense reasoning.
Viral Humor Analysis
Research Project
Characterized humor on Reddit in terms of measurable attributes and analyzed how these signals drive audience engagement, using transformer-based NLP on 500K+ posts.
LLaVA 1.5 Adaptive Pruning
Research Project
Implemented adaptive token pruning strategies in a 7B Vision-Language Transformer and benchmarked efficiency tradeoffs on VQA-v2, TextVQA, and POPE.
Commodity Price Forecasting
Research Project
Built multimodal time-series + NLP models using government reports to predict oil and gas prices and generate economic explanations.
Delulu: RISC-V Processor
Systems Project
Designed and implemented an RV64IM 5-stage pipelined processor in SystemVerilog with caching, forwarding, branch prediction, and modular verification.