MATHew: Agentic AI Math Tutoring System

MATHew is an AI-powered math tutoring system designed to help students learn concepts through guided reasoning rather than simply receiving answers. The system aligns tutoring sessions with the Massachusetts state curriculum and provides step-by-step explanations, practice questions, and structured feedback that reinforce classroom learning.

As generative AI tools become more capable, students can easily obtain full solutions to homework problems. While convenient, this can bypass the reasoning process that mathematics relies on. MATHew explores how AI can instead act as a learning partner by encouraging students to think through problems, practice concepts, and build mastery over time.

By combining large language models with retrieval-based curriculum grounding, memory systems, and autonomous intervention agents, the system functions as an interactive tutor that adapts to student progress and supports deeper understanding of mathematical concepts.

  • One of the core challenges in building an AI tutor is ensuring explanations remain accurate, relevant, and appropriate for a student’s grade level. To address this, MATHew uses a retrieval-augmented generation (RAG) system that grounds responses in a curated knowledge base built from Massachusetts mathematics standards, grade-level learning objectives, and instructional examples.

    When a student asks a question or works through a problem, the system retrieves relevant curriculum material and uses it to guide the model’s response. This helps ensure explanations stay aligned with what students are expected to learn in school and prevents the system from generating unsupported or overly advanced solutions.

    Grounding responses in official curriculum materials allows the tutor to function as a classroom-aligned learning assistant rather than a generic problem solver.
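
    The retrieval step described above can be sketched roughly as follows. This is a minimal illustration, not the project's implementation: the knowledge-base entries, the `chunk_id` values, the bag-of-words cosine scoring, and the "at or below the student's grade" filter are all assumptions made for the example. A production system would more likely use embeddings and a vector store.

```python
import math
import re
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class CurriculumChunk:
    chunk_id: str  # hypothetical identifier, not a real standard code
    grade: int
    text: str


# Placeholder entries; a real knowledge base would be built from the curated
# standards, learning objectives, and instructional examples.
KNOWLEDGE_BASE = [
    CurriculumChunk("g6-ratios", 6, "Understand ratio concepts and use ratio "
                    "language to describe a relationship between two quantities."),
    CurriculumChunk("g6-fractions", 6, "Divide fractions by fractions using "
                    "visual fraction models and equations."),
    CurriculumChunk("g8-linear", 8, "Solve linear equations in one variable "
                    "and interpret slope as a rate of change."),
]


def tokenize(text: str) -> Counter:
    """Lowercased word counts; a stand-in for a real embedding model."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0


def retrieve(question: str, grade: int, k: int = 2) -> list[CurriculumChunk]:
    """Return up to k chunks at or below the student's grade that match."""
    query = tokenize(question)
    scored = [(cosine(query, tokenize(c.text)), c)
              for c in KNOWLEDGE_BASE if c.grade <= grade]
    scored = [(s, c) for s, c in scored if s > 0]  # drop unrelated material
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [c for _, c in scored[:k]]


def build_prompt(question: str, grade: int) -> str:
    """Assemble a grounded prompt; the wording is illustrative only."""
    chunks = retrieve(question, grade)
    context = "\n".join(f"[{c.chunk_id}] {c.text}" for c in chunks)
    return (f"Use only the curriculum context below (grade {grade}).\n"
            f"{context or '(no matching curriculum material)'}\n"
            f"Student question: {question}")
```

    Filtering by grade before scoring is one simple way to keep advanced material out of the context; when nothing matches, the prompt says so explicitly instead of letting the model answer ungrounded.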

  • Beyond answering questions, MATHew is designed as an agentic system that actively monitors student progress and intervenes when needed. The system tracks mastery of individual math standards across sessions using long-term memory and analyzes patterns in student mistakes.

    If the system detects sustained difficulty with a concept, it automatically triggers an intervention workflow. The tutor generates a targeted lesson plan, schedules a review session using the Google Calendar API, and sends both the student and their parents a calendar invitation with the lesson plan attached. A progress update email is also sent using the Gmail API.

    This creates a full decision loop within the tutoring system: monitor student performance, diagnose learning gaps, determine the appropriate intervention, and take action. Instead of passively responding to questions, the tutor actively supports the student’s learning trajectory.
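
    One way to picture the monitor, diagnose, decide, act loop is the sketch below. The window size, thresholds, and lesson-plan fields are assumed values chosen for illustration, and the calendar and email steps are represented by injected callables rather than real Google Calendar or Gmail API calls.

```python
from collections import defaultdict, deque
from dataclasses import dataclass, field
from typing import Callable

WINDOW = 5           # assumed: recent attempts kept per standard
MIN_ATTEMPTS = 4     # assumed: evidence required before intervening
MASTERY_FLOOR = 0.4  # assumed: accuracy below this signals sustained difficulty


def _new_history() -> deque:
    return deque(maxlen=WINDOW)


@dataclass
class StudentMemory:
    """Long-term record of recent correctness per math standard."""
    attempts: defaultdict = field(
        default_factory=lambda: defaultdict(_new_history))

    def record(self, standard: str, correct: bool) -> None:
        self.attempts[standard].append(correct)


def diagnose(memory: StudentMemory) -> list[str]:
    """Return standards whose recent accuracy stays below the floor."""
    return [
        standard
        for standard, history in memory.attempts.items()
        if len(history) >= MIN_ATTEMPTS
        and sum(history) / len(history) < MASTERY_FLOOR
    ]


def plan_intervention(standard: str) -> dict:
    """Placeholder lesson plan; a real one would come from the tutor model."""
    return {"standard": standard,
            "lesson": f"Targeted review of {standard}",
            "minutes": 30}


def run_intervention_loop(
    memory: StudentMemory,
    actions: list[Callable[[dict], None]],
) -> list[dict]:
    """Diagnose gaps, plan an intervention for each, then run every action."""
    plans = [plan_intervention(s) for s in diagnose(memory)]
    for plan in plans:
        for act in actions:  # e.g. schedule a session, send an update email
            act(plan)
    return plans
```

    Passing the side effects in as callables keeps the decision logic testable without network access; in a deployed system those callables would wrap the calendar and email integrations.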

  • To support a safe and effective learning environment, MATHew incorporates several safeguards designed to promote learning rather than shortcut it.

    The tutor provides hints and guided explanations before revealing answers, encouraging students to work through problems step by step. Responses are restricted to curriculum-aligned material retrieved through the RAG system, helping maintain grade-level appropriateness.

    The system also includes a secondary validator bot that checks quiz logic and verifies whether student answers are evaluated correctly. This additional verification layer helps reduce the risk of incorrect feedback and improves the reliability of the tutoring experience.

    Through structured evaluation and user testing, the project explored how generative AI can be designed to support education responsibly while maintaining accuracy, transparency, and pedagogical value.
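
    The two safeguards above, hints before answers and a secondary check on grading, might look something like this in miniature. The function names and the exact-fraction comparison are assumptions for the sketch; the actual validator bot may work quite differently, for example by calling a second model.

```python
from fractions import Fraction
from typing import Optional


def next_response(hints: list[str], answer: str, hints_used: int) -> str:
    """Reveal the next hint; give the answer only once hints are exhausted."""
    if hints_used < len(hints):
        return hints[hints_used]
    return answer


def parse_number(text: str) -> Optional[Fraction]:
    """Parse '1/2', '0.5', or '3' into an exact fraction, else None."""
    try:
        return Fraction(text.strip())
    except (ValueError, ZeroDivisionError):
        return None


def validate_grading(expected: str, student_answer: str,
                     graded_correct: bool) -> bool:
    """Return True when the primary grader's verdict survives a recheck.

    The recheck compares exact numeric values, so equivalent forms such as
    '1/2' and '0.5' count as the same answer.
    """
    expected_value = parse_number(expected)
    student_value = parse_number(student_answer)
    recheck = (expected_value is not None
               and student_value is not None
               and expected_value == student_value)
    return recheck == graded_correct


# Usage: a disagreement means the feedback should be held back and rechecked.
if not validate_grading("1/2", "2/4", graded_correct=False):
    print("Validator disagrees with the grader; hold the feedback.")
```

    Comparing exact fractions rather than floats avoids marking equivalent answers wrong because of rounding, which is one common source of incorrect quiz feedback.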

Keystone Homes: Student Housing Demand & Rent Analysis

Student housing markets rarely move in isolation. Shifts in enrollment, changes in ownership, and neighborhood-level signals all interact in ways that are difficult to see without pulling data together. This project began as part of market and customer research for Keystone Homes, with the goal of understanding how student population trends, especially international enrollment, show up in rent dynamics and housing conditions across student-heavy neighborhoods.

Focusing on Boston and surrounding areas, this project treats housing data as a living system rather than a static snapshot. By combining enrollment data, rental trends, property characteristics, and housing complaint signals, Scout surfaces patterns that help explain where demand translates into pricing pressure, where quality risks emerge, and how these signals can inform housing strategy and product positioning.

  • The core challenge behind Scout was not simply understanding whether rents were rising, but identifying where demand pressure translates into pricing power and where it creates risk for tenant experience. Housing operators, investors, and product teams often have access to fragmented data but lack a clear way to synthesize it into localized, decision-ready insight.

    This project frames housing data as an input to decision-making. The goal was to move from broad trends, such as enrollment growth, to more granular questions around neighborhood behavior, ownership patterns, and maintenance signals that directly affect pricing, acquisition, and product differentiation.

  • To explore these questions, I integrated multiple real-world datasets, including student enrollment figures, rental pricing data, property characteristics, and 311 housing-related complaints. The analysis began with exploratory data analysis to surface relationships and anomalies, followed by modeling approaches aligned to specific questions.

    Linear regression was used to examine the relationship between international student enrollment and average rent over time. Random Forest models were applied to predict building condition and maintenance categories using features such as owner type, unit counts, assessed values, and complaint frequency. K-means clustering was used to identify neighborhoods with similar complaint profiles, revealing patterns that were not visible at the individual property level.

    The emphasis throughout was on using models as tools to test assumptions and surface signals, rather than optimizing for predictive performance alone.
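
    To make two of these techniques concrete, here is a small dependency-free sketch of an ordinary least-squares line fit and k-means clustering. The numbers are toy values invented for the example, not project data, and the first-k-points initialization is a simplification. The project itself may well have used a library such as scikit-learn; this only illustrates the mechanics.

```python
from statistics import fmean


def fit_line(x: list[float], y: list[float]) -> tuple[float, float]:
    """Ordinary least squares for y = slope * x + intercept."""
    mean_x, mean_y = fmean(x), fmean(y)
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def kmeans(points: list[tuple[float, ...]], k: int, iterations: int = 20):
    """Lloyd's algorithm, seeded with the first k points (a simplification)."""
    centroids = [tuple(p) for p in points[:k]]
    labels: list[int] = []
    for _ in range(iterations):
        # Assign each point to its nearest centroid (squared distance).
        new_labels = [
            min(range(k),
                key=lambda j: sum((a - b) ** 2
                                  for a, b in zip(p, centroids[j])))
            for p in points
        ]
        if new_labels == labels:  # assignments stable, so we have converged
            break
        labels = new_labels
        # Move each centroid to the mean of its members.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # keep the old centroid if a cluster empties out
                centroids[j] = tuple(fmean(dim) for dim in zip(*members))
    return labels, centroids


# Toy inputs: an enrollment index against a rent index over five periods.
slope, intercept = fit_line([1, 2, 3, 4, 5], [3, 5, 7, 9, 11])

# Toy complaint profiles per neighborhood: (share heating, share pests).
profiles = [(0.9, 0.1), (0.1, 0.9), (0.8, 0.2), (0.2, 0.8)]
labels, centroids = kmeans(profiles, k=2)
```

    In the spirit of the paragraph above, the fitted slope and the cluster assignments are best read as prompts for further questions, for example why two neighborhoods share a complaint profile, rather than as final answers.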

  • The analysis showed that enrollment trends alone do not explain rent increases, but when combined with ownership structure and neighborhood-level indicators, they provide meaningful insight into pricing dynamics and maintenance risk. Certain neighborhoods exhibited strong demand signals alongside elevated complaint activity, suggesting opportunities for differentiated offerings or proactive investment in housing quality.

    From a product and strategy perspective, these insights can inform pricing decisions, market segmentation, and acquisition strategy for student housing products. Scout demonstrates how analytical outputs can be translated into practical guidance, helping stakeholders move from raw data to clearer, more confident decisions.

    Product & Technical Applications

    • Pricing and market segmentation for student housing products

    • Identifying neighborhoods with unmet demand or quality risk

    • Supporting acquisition and investment decisions

    • Informing differentiated offerings for international vs domestic students

Fraud Review Prioritization Using Bayesian Networks

As AI-powered products scale, fraud detection becomes a core product responsibility, directly tied to customer trust, platform integrity, and risk management. Many fraud systems struggle not because they lack data, but because they force uncertain situations into rigid yes-or-no decisions that can harm both users and operations. This project explores how probabilistic reasoning can support fraud review workflows by surfacing confidence levels, enabling smarter prioritization, and keeping humans in the loop where judgment matters most.

By implementing Bayesian network inference, I examined how evolving evidence can be used to update risk assessments over time, helping review teams decide which cases warrant escalation rather than automating irreversible decisions. The project reframes fraud detection as a prioritization problem, focusing on how uncertainty can be surfaced and managed in trust-critical systems.

  • Fraud review is fundamentally a decision-making problem under uncertainty. Automated systems must surface risk signals without overwhelming review teams or damaging user trust through excessive false positives. At the same time, missed fraud can carry significant financial and reputational consequences.

    This project frames fraud detection not as a binary classification task, but as a prioritization and escalation problem. The goal is to design a system that helps human reviewers decide where to focus attention, making uncertainty visible rather than hiding it behind rigid thresholds. From a product perspective, this mirrors challenges faced by trust and safety teams, compliance operations, and platform integrity functions.

  • To explore this problem, I implemented Bayesian network inference from scratch, modeling relationships between latent variables and observed evidence. The system uses probabilistic dependencies to update beliefs as new signals are introduced, producing confidence-aware risk scores rather than deterministic outputs.

    The modeling approach emphasizes transparency and interpretability. Each probability update reflects explicit assumptions encoded in the network structure, making it easier to reason about why a case is flagged and how additional evidence would change the outcome. Rather than optimizing for prediction accuracy alone, the focus was on understanding how uncertainty propagates through the system and how it can support human-in-the-loop workflows.
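
    As an illustration of how beliefs update as evidence arrives, the sketch below runs exact inference by enumeration on a three-node network. The node names, structure, and probability tables are invented for the example and are not the project's actual model.

```python
from itertools import product

# Each node maps to (parents, CPT), where the CPT gives P(node=True) for each
# combination of parent values. All numbers here are made up for illustration.
NETWORK = {
    "fraud": ((), {(): 0.02}),
    "new_device": (("fraud",), {(True,): 0.7, (False,): 0.1}),
    "high_amount": (("fraud",), {(True,): 0.6, (False,): 0.05}),
}


def joint(assignment: dict) -> float:
    """Probability of one full assignment, via the chain rule over the graph."""
    p = 1.0
    for node, (parents, cpt) in NETWORK.items():
        p_true = cpt[tuple(assignment[parent] for parent in parents)]
        p *= p_true if assignment[node] else 1.0 - p_true
    return p


def posterior(query: str, evidence: dict) -> float:
    """P(query=True | evidence), summing out every unobserved variable."""
    hidden = [n for n in NETWORK if n != query and n not in evidence]
    totals = {True: 0.0, False: 0.0}
    for query_value in (True, False):
        for values in product((True, False), repeat=len(hidden)):
            assignment = {**evidence, query: query_value,
                          **dict(zip(hidden, values))}
            totals[query_value] += joint(assignment)
    return totals[True] / (totals[True] + totals[False])


# Belief in fraud as signals arrive one at a time.
prior = posterior("fraud", {})
after_device = posterior("fraud", {"new_device": True})
after_both = posterior("fraud", {"new_device": True, "high_amount": True})
```

    With these made-up tables, the belief in fraud moves from about 0.02 with no evidence, to 0.125 after a new device is observed, to roughly 0.63 once a high amount is also seen. Because every number traces back to an explicit table entry, a reviewer can see which assumption drove the change. Enumeration is exponential in the number of hidden variables, so it suits small, hand-built networks like this one.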

  • This project highlights how probabilistic models can improve fraud review operations by shifting focus from automated decisions to informed prioritization. By ranking cases based on evolving confidence levels, review teams can allocate attention more effectively, reducing unnecessary interventions while maintaining strong fraud coverage.

    From a product perspective, this approach supports scalable trust systems that respect both operational constraints and user experience. It demonstrates how explainable, uncertainty-aware models can serve as decision support tools rather than opaque gatekeepers, reinforcing trust in automated systems while keeping humans in control where it matters most.
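
    A review queue built on such scores could be as simple as the following sketch. The `Case` fields and the fixed reviewer capacity are assumptions for illustration; the point is only that cases are ranked and shown with their probabilities, not auto-decided.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Case:
    case_id: str               # hypothetical identifier
    fraud_probability: float   # current posterior from the model


def prioritize(cases: list[Case], capacity: int) -> tuple[list[Case], list[Case]]:
    """Split cases into (review now, backlog), highest probability first.

    Nothing is approved or blocked here; the score only orders the queue.
    """
    ranked = sorted(cases, key=lambda c: c.fraud_probability, reverse=True)
    return ranked[:capacity], ranked[capacity:]


# Usage: re-run whenever new evidence changes a case's probability.
queue = [Case("A", 0.125), Case("B", 0.63), Case("C", 0.02)]
review_now, backlog = prioritize(queue, capacity=2)
for case in review_now:
    print(f"{case.case_id}: P(fraud) = {case.fraud_probability:.2f}")
```

    Keeping the probability visible next to each case lets reviewers judge borderline items themselves instead of inheriting a hidden threshold.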

    Product Applications

    • Fraud detection and review queue prioritization

    • Trust & safety escalation workflows

    • Risk scoring and compliance review systems

    • Human-in-the-loop decision support tools