Projects — Big Data and AI Trends Market (Spring 2026)


Team 1: Clinical Note Intelligence: An Agentic Hybrid Retrieval Framework Combining Structured Search and Retrieval-Augmented Generation

Members: Ethan Armstrong, Ankit (Ziqi) Cao, Ko Jung Hsu, Cole Johnson, Mashhood Khan, Wenyu Zhong

Abstract: An AI chatbot for querying large-scale clinical datasets with natural language while keeping answers grounded in real patient data. It combines structured cohort/statistical retrieval with a RAG pipeline (e.g., LangChain/LlamaIndex) to surface trends like readmission patterns and treatment outcomes in an interactive experience.

Github | Flyer


Team 2: Karen – AI Complaint Assistant

Members: Mohameddeq Ali, Cora Goodwin, Midori Neaton, Raja Sori, Xupei Ye, Kyle Zhu

Abstract: An AI assistant that turns high-volume financial complaint narratives into prioritized, analyzable insights. It structures complaint text into topics (e.g., BERTopic) and ranks them with a custom priority score, then provides a natural-language interface for querying metrics and generating visualizations.

Github | Flyer


Team 3: PotholeVision – Automating Pothole Detection and GeoMapping

Members: Chunfang Wang, James Pashek, Joseph Sheehan, Madhu Damani, Moses Effah Akoto, Tao Fang

Abstract: A big data pipeline that detects potholes from video/images using a CNN model and publishes positive detections to an ArcGIS dashboard. City planners can see pothole counts, exact locations, and traffic context to prioritize repairs and reduce safety risk.

Github | Flyer


Team 4: From Clicks to Actions: Spark-Powered Funnel Analysis with LLM-Driven Recommendations

Members: Shang Chi Hsu, Xiang Li, Ashwini Manokar, Meenakshi Narendra, Isabel O'Grady

Abstract: A Spark-based clickstream analytics pipeline that measures conversion funnels (view → cart → purchase) and pinpoints drop-off drivers like repeated views or cart abandonment. It adds prioritization to focus on high-impact products/behaviors, and can translate findings into structured recommendations via an LLM.

Github | Flyer
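
The funnel measurement in the Team 4 abstract (view → cart → purchase) can be illustrated with a minimal sketch. It is plain Python standing in for the team's Spark pipeline, not their implementation; the event tuple layout, the function names `funnel_counts` and `drop_off_rates`, and the sample events are all assumptions made for illustration.

```python
from collections import defaultdict

# Ordered funnel stages, as named in the abstract.
STAGES = ["view", "cart", "purchase"]


def funnel_counts(events):
    """Count (user, product) pairs that reached each stage in order.

    `events` is an iterable of (user_id, product_id, event_type) tuples;
    this layout is a hypothetical schema, not the team's.
    """
    seen = defaultdict(set)
    for user_id, product_id, event_type in events:
        seen[(user_id, product_id)].add(event_type)

    counts = {stage: 0 for stage in STAGES}
    for reached in seen.values():
        # A pair counts toward a stage only if it hit every earlier stage.
        for stage in STAGES:
            if stage not in reached:
                break
            counts[stage] += 1
    return counts


def drop_off_rates(counts):
    """Fraction of pairs lost between each consecutive pair of stages."""
    rates = {}
    for prev, nxt in zip(STAGES, STAGES[1:]):
        lost = 1 - counts[nxt] / counts[prev] if counts[prev] else 0.0
        rates[f"{prev}->{nxt}"] = lost
    return rates


# Made-up sample events; u2 views the product twice before adding to cart.
events = [
    ("u1", "p1", "view"), ("u1", "p1", "cart"), ("u1", "p1", "purchase"),
    ("u2", "p1", "view"), ("u2", "p1", "view"), ("u2", "p1", "cart"),
    ("u3", "p1", "view"),
    ("u4", "p1", "view"),
]
counts = funnel_counts(events)
rates = drop_off_rates(counts)
```

Repeated views collapse into a single stage hit here because each pair's events are stored as a set; a real pipeline would likely keep the repeat count as a separate drop-off signal.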


Team 5: PersonaPath: Personalized Travel & Dining Recommendation Engine (Behavioral Profiling)

Members: Esther Baumgartner, Hsin Kuei Chang, Saloni Jain, Fu Lee, Dhairya Lunia

Abstract: A Yelp-based recommendation prototype that builds behavior-driven user and business profiles from review text. Topic modeling and contextual tags create “personas” and experience clusters; a RAG-enabled LLM then retrieves and ranks recommendations with explanations grounded in reviewer language.

Github | Flyer


Team 6: Data Quality Remediation Assistant: AI-Driven Anomaly Detection & ETL Fix Generation at Scale

Members: Sean Cabaniss, Yung Hsuan Hsieh, Ching-Fen Hung, Yonghui Kim, Omkar Thombare

Abstract: A scalable assistant that detects data quality issues (e.g., null anomalies and statistical outliers) with Spark, then uses a two-step LLM agent flow to diagnose root causes and generate runnable PySpark remediation code. A human-in-the-loop Streamlit UI reviews fixes before execution, with auditability and before/after tracking.

Github | Flyer
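
The two kinds of checks named in the Team 6 abstract, null anomalies and statistical outliers, can be sketched roughly as follows. This is plain Python rather than Spark, and the abstract does not say which outlier test the team uses; the interquartile-range rule, the 20% null threshold, the helper names, and the sample rows are all assumptions.

```python
import statistics


def null_rates(rows, columns):
    """Fraction of rows in which each column is missing (None)."""
    total = len(rows)
    return {
        col: sum(1 for row in rows if row.get(col) is None) / total
        for col in columns
    }


def iqr_outliers(values, k=1.5):
    """Values outside [Q1 - k*IQR, Q3 + k*IQR] (assumed rule, not the team's)."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < low or v > high]


# Made-up rows with a hypothetical schema.
rows = [
    {"amount": 10, "region": "N"},
    {"amount": 12, "region": None},
    {"amount": 11, "region": "S"},
    {"amount": 13, "region": None},
    {"amount": 12, "region": "E"},
    {"amount": 11, "region": None},
    {"amount": 95, "region": "W"},
    {"amount": None, "region": "N"},
]

rates = null_rates(rows, ["amount", "region"])
# Assumed threshold: flag columns with more than 20% nulls.
flagged_columns = [col for col, rate in rates.items() if rate > 0.2]

amounts = [row["amount"] for row in rows if row["amount"] is not None]
outliers = iqr_outliers(amounts)
```

Findings like `flagged_columns` and `outliers` are the sort of structured input a diagnosis step could receive before any remediation code is proposed for human review.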


Team 7: Demand Sense: An AI-Backed Driver Nudge System for Demand-Aware Repositioning

Members: Davey Johnson, Hengrui Li, Huiguo Liu, Mansi Malpani, Mounika Polamreddy

Abstract: A demand analytics pipeline over NYC TLC trips that identifies high-opportunity zones by time window using Spark-based aggregation and normalized scoring. An LLM layer converts the structured demand signals into concise, explainable “nudges” that tell drivers where demand is strong and why, with grounding and confidence filtering.

Github | Flyer
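
The aggregation and normalized scoring described in the Team 7 abstract might look something like the sketch below. It uses plain Python in place of Spark, and min-max normalization within each hour is an assumed choice since the abstract does not name the scoring formula; the zone names, trip tuples, and function names are made up for illustration.

```python
from collections import Counter, defaultdict


def demand_scores(trips):
    """Min-max normalize pickup counts per zone within each hour window.

    `trips` is an iterable of (zone, hour) tuples, one per pickup
    (a hypothetical layout, not the TLC schema).
    """
    counts = Counter(trips)  # (zone, hour) -> number of pickups
    by_hour = defaultdict(dict)
    for (zone, hour), n in counts.items():
        by_hour[hour][zone] = n

    scores = {}
    for hour, zone_counts in by_hour.items():
        lo = min(zone_counts.values())
        hi = max(zone_counts.values())
        span = hi - lo
        # If every zone has the same count, score them all 1.0.
        scores[hour] = {
            zone: (n - lo) / span if span else 1.0
            for zone, n in zone_counts.items()
        }
    return scores


def top_zones(scores, hour, k=2):
    """Highest-scoring zones for one hour window, best first."""
    ranked = sorted(scores[hour].items(), key=lambda kv: kv[1], reverse=True)
    return ranked[:k]


# Made-up pickups: (zone, hour of day).
trips = (
    [("Midtown", 8)] * 5
    + [("Harlem", 8)] * 2
    + [("Astoria", 8)] * 1
    + [("Midtown", 22)] * 1
    + [("Astoria", 22)] * 3
)
scores = demand_scores(trips)
morning_top = top_zones(scores, hour=8)
```

Normalizing within each window keeps a busy rush hour from drowning out quieter windows, so the ranked zones passed to a nudge-writing layer are comparable across times of day.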


Team 8: NFL Contract Prediction and Evaluation with LLM-Based Recommendations

Members: Adam Getzkin, Mallika Kommera, Jay Pederson, Ariel Zhan, Zhen Zhang

Abstract: A predictive analytics system that estimates NFL contract value from performance and historical contract data, then uses an LLM-driven recommendation agent to answer natural-language questions like “undervalued RBs under $6M.” The agent rewrites queries into structured filters, compares predicted vs. market values, and returns ranked recommendations with grounded explanations.

Github | Flyer
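
The Team 8 abstract describes turning a question like “undervalued RBs under $6M” into structured filters and comparing predicted with market value. A minimal sketch of that last step is shown here, assuming the query has already been rewritten into a position and a market-value cap; the `Contract` fields, the `undervalued` helper, and the placeholder players and dollar figures are hypothetical, not the team's data or model output.

```python
from dataclasses import dataclass


@dataclass
class Contract:
    player: str
    position: str
    market_value: float     # current contract value, in $M (made up)
    predicted_value: float  # model estimate, in $M (made up)


def undervalued(contracts, position, max_market_value):
    """Players at `position` under the cap whose predicted value exceeds
    market value, ranked by the size of that gap (largest first)."""
    matches = [
        c for c in contracts
        if c.position == position
        and c.market_value < max_market_value
        and c.predicted_value > c.market_value
    ]
    return sorted(
        matches,
        key=lambda c: c.predicted_value - c.market_value,
        reverse=True,
    )


# Placeholder records for illustration only.
contracts = [
    Contract("Player A", "RB", 4.0, 6.5),
    Contract("Player B", "RB", 5.5, 5.0),
    Contract("Player C", "RB", 8.0, 11.0),
    Contract("Player D", "WR", 3.0, 7.0),
    Contract("Player E", "RB", 2.0, 3.0),
]

# Structured form of "undervalued RBs under $6M".
ranked = undervalued(contracts, position="RB", max_market_value=6.0)
```

Keeping the filter and ranking in ordinary code, with the LLM only producing the filter arguments, is one way to keep the returned recommendations traceable to the underlying values.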


Team 9: InsideInsight: Agentic AI for Airbnb Pricing Strategy and Performance Optimization

Members: Bhavisha Chafekar, Jyothirmai Sri Peesapati, Phoenix Ferrari, Stephen Weiler, Tzu-Yu Chen

Abstract: An end-to-end big data pipeline on Inside Airbnb data to analyze pricing, availability, and guest feedback at scale. It produces structured insights (e.g., neighborhood-level drivers of occupancy and satisfaction) and uses an agentic AI layer to convert analytics into actionable recommendations for hosts and property managers.

Github | Flyer


Team 10: An AI Copilot for Detecting Delayed Market Reactions to Corporate Disclosures

Members: Kristina Dennise Paraiso, Evelyn Lai, Zhichen Yang, Parul Chaudhary, Shivanshu Dagur

Abstract: A disclosure intelligence system that ingests SEC filings, uses a grounded LLM to score importance, and analyzes delayed stock price reactions to identify “underinterpreted” events. It surfaces a prioritized watchlist with traceable explanations so analysts can focus on disclosures the market may be slow to price in.

Github | Flyer


Team 11: Detect Hidden Drug Safety Risks Faster with AI — FDA FAERS Analytics

Members: Amogha Yalgi, Austin Ganje, Hannah Huang, Hayden Herstrom, Rachel Le

Abstract: An analyst-facing application that cleans and standardizes the FDA FAERS dataset at scale to surface emerging high-risk drug–event patterns. It enables smarter search, risk detection, and LLM-assisted summarization so users can review safety signals quickly and drill into trends interactively.

Github | Flyer


Team 12: TheaterIQ: AI-Driven Scheduling and Promotional Intelligence for Movie Theater Operations

Members: Sam Benson Devine, Jack Halverson, Tobias Knight, Qiqi Li, Yehan Wang

Abstract: A scheduling and promotions copilot for independent theaters that scores each film × audience × timeslot with a Match Score, then proposes weekly schedules and targeted promotional briefs. It combines large-scale data processing with a grounded agent workflow so recommendations are explainable and auditable before a human approves.

Github | Flyer