Projects — Big Data and AI Trends Market (Spring 2026)

Team 1: Clinical Note Intelligence: An Agentic Hybrid Retrieval Framework Combining Structured Search and Retrieval-Augmented Generation
Team 2: Karen – AI Complaint Assistant
Team 3: PotholeVision – Automating Pothole Detection and GeoMapping
Team 4: From Clicks to Actions: Spark-Powered Funnel Analysis with LLM-Driven Recommendations
Team 5: PersonaPath: Personalized Travel & Dining Recommendation Engine (Behavioral Profiling)
Team 6: Data Quality Remediation Assistant: AI-Driven Anomaly Detection & ETL Fix Generation at Scale
Team 7: Demand Sense: An AI-Backed Driver Nudge System for Demand-Aware Repositioning
Team 8: NFL Contract Prediction and Evaluation with LLM-Based Recommendations
Team 9: InsideInsight: Agentic AI for Airbnb Pricing Strategy and Performance Optimization
Team 10: An AI Copilot for Detecting Delayed Market Reactions to Corporate Disclosures
Team 11: Detect Hidden Drug Safety Risks Faster with AI — FDA FAERS Analytics
Team 12: TheaterIQ: AI-Driven Scheduling and Promotional Intelligence for Movie Theater Operations

Team 1: Clinical Note Intelligence: An Agentic Hybrid Retrieval Framework Combining Structured Search and Retrieval-Augmented Generation

Members: Ethan Armstrong, Ankit (Ziqi) Cao, Ko Jung Hsu, Cole Johnson, Mashhood Khan, Wenyu Zhong

Abstract: An AI chatbot for querying large-scale clinical datasets with natural language while keeping answers grounded in real patient data. It combines structured cohort/statistical retrieval with a RAG pipeline (e.g., LangChain/LlamaIndex) to surface trends like readmission patterns and treatment outcomes in an interactive experience.

GitHub | Flyer

Team 2: Karen – AI Complaint Assistant

Members: Mohameddeq Ali, Cora Goodwin, Midori Neaton, Raja Sori, Xupei Ye, Kyle Zhu

Abstract: An AI assistant that turns high-volume financial complaint narratives into prioritized, analyzable insights. It structures complaint text into topics (e.g., BERTopic) and ranks them with a custom priority score, then provides a natural-language interface for querying metrics and generating visualizations.

GitHub | Flyer

Team 3: PotholeVision – Automating Pothole Detection and GeoMapping

Members: Chunfang Wang, James Pashek, Joseph Sheehan, Madhu Damani, Moses Effah Akoto, Tao Fang

Abstract: A big data pipeline that detects potholes from video/images using a CNN model and publishes positive detections to an ArcGIS dashboard. City planners can see pothole counts, exact locations, and traffic context to prioritize repairs and reduce safety risk.

GitHub | Flyer

Team 4: From Clicks to Actions: Spark-Powered Funnel Analysis with LLM-Driven Recommendations

Members: Shang Chi Hsu, Xiang Li, Ashwini Manokar, Meenakshi Narendra, Isabel O'Grady

Abstract: A Spark-based clickstream analytics pipeline that measures conversion funnels (view → cart → purchase) and pinpoints drop-off drivers such as repeated views or cart abandonment. It prioritizes high-impact products and behaviors, and can translate findings into structured recommendations via an LLM.
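
Illustrative sketch (not the team's code): the funnel measurement described above can be shown in plain Python on a handful of events. The event layout `(user_id, product_id, event_type)`, the stage labels, and the function names are assumptions made for this example; the project itself runs on Spark, where the same counts would come from grouped aggregations.

```python
from collections import defaultdict

# Hypothetical stage labels, in funnel order.
STAGES = ["view", "cart", "purchase"]


def funnel_counts(events):
    """Count distinct users who reached each stage and every stage before it."""
    seen = defaultdict(set)  # user_id -> set of event types observed
    for user_id, _product_id, event_type in events:
        seen[user_id].add(event_type)

    counts = {}
    for i, stage in enumerate(STAGES):
        required = set(STAGES[: i + 1])
        counts[stage] = sum(1 for stages in seen.values() if required <= stages)
    return counts


def step_conversion(counts):
    """Conversion rate from each stage to the next (None if the base is zero)."""
    rates = {}
    for prev, nxt in zip(STAGES, STAGES[1:]):
        base = counts[prev]
        rates[f"{prev}->{nxt}"] = counts[nxt] / base if base else None
    return rates


# Toy events, invented for illustration only.
events = [
    ("u1", "p1", "view"), ("u1", "p1", "cart"), ("u1", "p1", "purchase"),
    ("u2", "p1", "view"), ("u2", "p1", "cart"),   # cart abandonment
    ("u3", "p2", "view"), ("u3", "p2", "view"),   # repeated views, no cart
    ("u4", "p2", "view"),
]
counts = funnel_counts(events)
rates = step_conversion(counts)
print(counts)
print(rates)
```

In this toy data, user `u2` stands for cart abandonment and `u3` for repeated views, the two drop-off patterns the abstract names.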

GitHub | Flyer

Team 5: PersonaPath: Personalized Travel & Dining Recommendation Engine (Behavioral Profiling)

Members: Esther Baumgartner, Hsin Kuei Chang, Saloni Jain, Fu Lee, Dhairya Lunia

Abstract: A Yelp-based recommendation prototype that builds behavior-driven user and business profiles from review text. Topic modeling and contextual tags create “personas” and experience clusters; a RAG-enabled LLM then retrieves and ranks recommendations with explanations grounded in reviewer language.

GitHub | Flyer

Team 6: Data Quality Remediation Assistant: AI-Driven Anomaly Detection & ETL Fix Generation at Scale

Members: Sean Cabaniss, Yung Hsuan Hsieh, Ching-Fen Hung, Yonghui Kim, Omkar Thombare

Abstract: A scalable assistant that detects data quality issues (e.g., null anomalies and statistical outliers) with Spark, then uses a two-step LLM agent flow to diagnose root causes and generate runnable PySpark remediation code. A human-in-the-loop Streamlit UI lets a reviewer approve fixes before execution, with auditability and before/after tracking.
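
Illustrative sketch (not the team's code): the two kinds of checks the abstract mentions, null anomalies and statistical outliers, can be shown on a single column in plain Python. The 10% null-rate threshold, the IQR rule with `k=1.5`, and all function names are assumptions chosen for this example; the abstract does not say which tests the project applies, and the project runs them in Spark.

```python
import statistics


def null_rate(values):
    """Fraction of missing (None) entries in a column."""
    return sum(v is None for v in values) / len(values) if values else 0.0


def iqr_outliers(values, k=1.5):
    """Non-null values outside [Q1 - k*IQR, Q3 + k*IQR]."""
    present = [v for v in values if v is not None]
    if len(present) < 4:
        return []  # too few points for meaningful quartiles
    q1, _median, q3 = statistics.quantiles(present, n=4)
    spread = q3 - q1
    low, high = q1 - k * spread, q3 + k * spread
    return [v for v in present if v < low or v > high]


def check_column(values, max_null_rate=0.1):
    """Summarize issues for one column; the threshold is a hypothetical default."""
    rate = null_rate(values)
    return {
        "null_rate": rate,
        "null_anomaly": rate > max_null_rate,
        "outliers": iqr_outliers(values),
    }


# Toy column, invented for illustration only.
column = [10, 12, 11, None, 13, 12, 500, 11, None, 12]
report = check_column(column)
print(report)
```

A report like this is the sort of structured finding that a later diagnosis step could take as input.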

GitHub | Flyer

Team 7: Demand Sense: An AI-Backed Driver Nudge System for Demand-Aware Repositioning

Members: Davey Johnson, Hengrui Li, Huiguo Liu, Mansi Malpani, Mounika Polamreddy

Abstract: A demand analytics pipeline over NYC TLC trips that identifies high-opportunity zones by time window using Spark-based aggregation and normalized scoring. An LLM layer converts the structured demand signals into concise, explainable “nudges” that tell drivers where demand is strong and why, with grounding and confidence filtering.
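
Illustrative sketch (not the team's code): one way to read "aggregation and normalized scoring" is to count trips per zone within each time window and min-max scale those counts, so the busiest zone in a window scores 1.0. The `(hour, zone)` record layout, the choice of min-max scaling, and the `min_score` cutoff are assumptions for this example; the abstract does not specify the scoring formula.

```python
from collections import defaultdict


def normalized_zone_scores(trips):
    """Count trips per (hour, zone), then min-max scale the counts within each hour."""
    counts = defaultdict(lambda: defaultdict(int))
    for hour, zone in trips:
        counts[hour][zone] += 1

    scores = {}
    for hour, by_zone in counts.items():
        lo, hi = min(by_zone.values()), max(by_zone.values())
        span = hi - lo
        scores[hour] = {
            # If every zone has the same count, treat them all as top-scoring.
            zone: (n - lo) / span if span else 1.0
            for zone, n in by_zone.items()
        }
    return scores


def top_zones(scores, hour, min_score=0.5):
    """Zones at or above a hypothetical score cutoff, strongest first."""
    ranked = sorted(scores.get(hour, {}).items(), key=lambda kv: kv[1], reverse=True)
    return [zone for zone, score in ranked if score >= min_score]


# Toy pickups as (hour, zone) pairs, invented for illustration only.
trips = [(8, "A")] * 4 + [(8, "B")] * 2 + [(8, "C")] + [(9, "B")] * 3
scores = normalized_zone_scores(trips)
print(top_zones(scores, 8))
```

Scaling within each hour keeps a quiet window from being drowned out by rush-hour volumes, which matters when the signal is meant to say where demand is strong at a given time.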

GitHub | Flyer

Team 8: NFL Contract Prediction and Evaluation with LLM-Based Recommendations

Members: Adam Getzkin, Mallika Kommera, Jay Pederson, Ariel Zhan, Zhen Zhang

Abstract: A predictive analytics system that estimates NFL contract value from performance and historical contract data, then uses an LLM-driven recommendation agent to answer natural-language questions like “undervalued RBs under $6M.” The agent rewrites queries into structured filters, compares predicted vs. market values, and returns ranked recommendations with grounded explanations.
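
Illustrative sketch (not the team's code): once a question such as "undervalued RBs under $6M" has been rewritten into structured filters, the comparison of predicted and market value reduces to a filter and a sort. The `Player` fields, the definition of "undervalued" as predicted value above market value, and the sample rows are assumptions invented for this example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Player:
    name: str
    position: str
    market_value: float     # current contract value in USD (assumed unit)
    predicted_value: float  # model estimate in the same unit


def undervalued(players, position, max_market_value):
    """Players at a position, under a market-value cap, whose predicted value
    exceeds market value; largest gap first."""
    matches = [
        p for p in players
        if p.position == position
        and p.market_value < max_market_value
        and p.predicted_value > p.market_value
    ]
    return sorted(
        matches,
        key=lambda p: p.predicted_value - p.market_value,
        reverse=True,
    )


# Placeholder rows, not real players or contracts.
players = [
    Player("Player A", "RB", 4_000_000, 6_500_000),
    Player("Player B", "RB", 5_500_000, 5_000_000),   # predicted below market
    Player("Player C", "RB", 8_000_000, 12_000_000),  # over the cap
    Player("Player D", "WR", 3_000_000, 5_000_000),   # wrong position
    Player("Player E", "RB", 2_000_000, 3_000_000),
]

# Structured form of "undervalued RBs under $6M".
results = undervalued(players, position="RB", max_market_value=6_000_000)
print([p.name for p in results])
```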

GitHub | Flyer

Team 9: InsideInsight: Agentic AI for Airbnb Pricing Strategy and Performance Optimization

Members: Bhavisha Chafekar, Jyothirmai Sri Peesapati, Phoenix Ferrari, Stephen Weiler, Tzu-Yu Chen

Abstract: An end-to-end big data pipeline on Inside Airbnb data to analyze pricing, availability, and guest feedback at scale. It produces structured insights (e.g., neighborhood-level drivers of occupancy and satisfaction) and uses an agentic AI layer to convert analytics into actionable recommendations for hosts and property managers.

GitHub | Flyer

Team 10: An AI Copilot for Detecting Delayed Market Reactions to Corporate Disclosures

Members: Kristina Dennise Paraiso, Evelyn Lai, Zhichen Yang, Parul Chaudhary, Shivanshu Dagur

Abstract: A disclosure intelligence system that ingests SEC filings, uses a grounded LLM to score importance, and analyzes delayed stock price reactions to identify “underinterpreted” events. It surfaces a prioritized watchlist with traceable explanations so analysts can focus on disclosures the market may be slow to price in.

GitHub | Flyer

Team 11: Detect Hidden Drug Safety Risks Faster with AI — FDA FAERS Analytics

Members: Amogha Yalgi, Austin Ganje, Hannah Huang, Hayden Herstrom, Rachel Le

Abstract: An analyst-facing application that cleans and standardizes the FDA FAERS dataset at scale to surface emerging high-risk drug–event patterns. It supports search, risk detection, and LLM-assisted summarization so users can review safety signals quickly and drill into trends interactively.

GitHub | Flyer

Team 12: TheaterIQ: AI-Driven Scheduling and Promotional Intelligence for Movie Theater Operations

Members: Sam Benson Devine, Jack Halverson, Tobias Knight, Qiqi Li, Yehan Wang

Abstract: A scheduling and promotions copilot for independent theaters that scores each film × audience × timeslot with a Match Score, then proposes weekly schedules and targeted promotional briefs. It combines large-scale data processing with a grounded agent workflow so recommendations are explainable and auditable before a human approves.

GitHub | Flyer