Project

CUDA Mixture of Experts

GPU-accelerated tennis simulation with MoE routing and CUDA Monte Carlo.

Project Summary

A GPU-accelerated tennis Monte Carlo simulator written in CUDA C++. It uses Mixture of Experts routing with softmax expert blending and includes a CPU vs GPU benchmark comparison.

Status: Independent technical project
Role: CUDA simulation + modeling + benchmark analysis
Stack: CUDA C++, cuRAND, Mixture of Experts, Monte Carlo simulation
Code: Not listed

I. Overview

CUDA Mixture of Experts is a CUDA C++ tennis match simulator that estimates win probability, average total games, and over/under probabilities with Monte Carlo simulation. The project combines matchup modeling with GPU systems work: many independent simulated matches are run in parallel, then aggregated into readable betting analytics outputs.
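The aggregation step can be illustrated with a small host-side C++ sketch. It is not the project's code: the names TrialOutcome, Summary, and summarize are hypothetical, and treating "over" as strictly more total games than the line (so a total equal to the line counts as "under") is an assumption made only so the two probabilities sum to 100%.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical per-trial record: who won and how many games were played.
struct TrialOutcome {
    bool player_a_won;
    int total_games;
};

// Hypothetical summary of a batch of trials.
struct Summary {
    double win_probability_a;
    double average_total_games;
    double median_total_games;
    double over_probability;
    double under_probability;
};

// Reduce a batch of independent trials to the headline estimates.
// Assumption: "over" means total_games > line; everything else is "under".
Summary summarize(const std::vector<TrialOutcome>& trials, double line) {
    assert(!trials.empty());
    const double n = static_cast<double>(trials.size());

    std::size_t wins_a = 0;
    std::size_t overs = 0;
    long long games_sum = 0;
    std::vector<int> games;
    games.reserve(trials.size());

    for (const TrialOutcome& t : trials) {
        wins_a += t.player_a_won ? 1 : 0;
        overs += (t.total_games > line) ? 1 : 0;
        games_sum += t.total_games;
        games.push_back(t.total_games);
    }

    // Median: middle element, or the mean of the two middle elements.
    std::sort(games.begin(), games.end());
    const std::size_t mid = games.size() / 2;
    const double median = (games.size() % 2 == 1)
                              ? games[mid]
                              : 0.5 * (games[mid - 1] + games[mid]);

    const double over = overs / n;
    return {wins_a / n, games_sum / n, median, over, 1.0 - over};
}
```

Calling summarize(trials, 30.0) on a vector of trial outcomes returns the win probability, average and median total games, and the over/under split for a 30-game line. On the GPU the same sums would more likely be accumulated with atomic adds or a parallel reduction than with a host-side loop.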

Interactive Systems View

The embedded visual shows MoE routing as a weighted expert mixture, CUDA Monte Carlo simulation as parallel match trials, and a benchmark view that can switch between best-of-3 and best-of-5 test runs.

Jannik Sinner vs Carlos Alcaraz Test Run

The following values are sample benchmark results from a generated test run, not universal performance claims.

Best of 3

Over 30 games: 15.94%
Under 30 games: 84.06%
Average total games: 24.11
Median total games: 24.00
CUDA GPU runtime: 12.10 ms
CPU runtime: 1200.00 ms
Speedup: 99.2x

Best of 5

Sinner win probability: 57.87%
Alcaraz win probability: 42.13%
Over 30 games: 82.26%
Under 30 games: 17.74%
Average total games: 39.75
Median total games: 40.00
CUDA GPU runtime: 1.77 ms
CPU runtime: 371.62 ms
Speedup: 209.8x

Runtime depends on simulation count, hardware, match format, and benchmark settings.

This project is intended for educational analytics and systems visualization.

II. How the MoE Model Works

The Mixture of Experts router combines specialized expert signals such as surface performance, fatigue, head-to-head history, and recent form. Rather than forcing one global score to explain every matchup, the router assigns weights to the experts and blends their outputs into simulation parameters for the tennis match model.
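The blending step described above can be sketched in a few lines of C++. This is a minimal illustration, not the project's implementation: the function names softmax and blend_experts are hypothetical, the router scores (logits) are assumed to be given, and each expert is assumed to output a single estimate of the same quantity, for example a serve-point win probability.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Numerically stable softmax: subtracting the largest logit before exp()
// avoids overflow and does not change the resulting weights.
std::vector<double> softmax(const std::vector<double>& logits) {
    assert(!logits.empty());
    const double max_logit = *std::max_element(logits.begin(), logits.end());

    std::vector<double> weights(logits.size());
    double sum = 0.0;
    for (std::size_t i = 0; i < logits.size(); ++i) {
        weights[i] = std::exp(logits[i] - max_logit);
        sum += weights[i];
    }
    for (double& w : weights) {
        w /= sum;  // normalise so the weights sum to 1
    }
    return weights;
}

// Weighted average of per-expert estimates (hypothetical helper).
// Assumption: expert_estimates[i] is expert i's estimate of one simulation
// parameter, e.g. surface, fatigue, head-to-head, and recent-form experts.
double blend_experts(const std::vector<double>& weights,
                     const std::vector<double>& expert_estimates) {
    assert(weights.size() == expert_estimates.size());
    double blended = 0.0;
    for (std::size_t i = 0; i < weights.size(); ++i) {
        blended += weights[i] * expert_estimates[i];
    }
    return blended;
}
```

A call such as blend_experts(softmax(logits), estimates) yields one blended parameter. Because softmax weights are non-negative and sum to 1, the result always lies between the smallest and largest expert estimate, which keeps a blended probability inside a valid range.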

III. CUDA Monte Carlo Simulation

CUDA threads simulate independent matches in parallel. Each thread uses cuRAND for per-thread randomness, runs a stochastic tennis match trial, and contributes results to aggregate estimates for Player A win probability, average total games, and over/under outcomes.
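To make the per-thread work concrete, here is a host-side C++ sketch of a single match trial, roughly what one CUDA thread would run. It makes several assumptions that are not from the project: the names MatchResult, server_wins_game, and simulate_match are hypothetical, the match model uses only two serve-point win probabilities, and a set at 6-6 is decided by one ordinary service game standing in for a tiebreak. The standard library generator replaces cuRAND here; on the device, each thread would typically initialise its own state with curand_init, passing a shared seed and its thread index as the sequence number, which the (seed, trial_index) pair below imitates.

```cpp
#include <cassert>
#include <cstdint>
#include <random>

// Hypothetical result of one simulated match.
struct MatchResult {
    bool player_a_won;
    int total_games;
};

// Play one service game point by point; true if the server wins it.
// A game needs at least 4 points and a 2-point lead (deuce rule).
bool server_wins_game(double p_point, std::mt19937_64& rng) {
    std::bernoulli_distribution point(p_point);
    int server = 0;
    int receiver = 0;
    while (true) {
        if (point(rng)) ++server; else ++receiver;
        if (server >= 4 && server - receiver >= 2) return true;
        if (receiver >= 4 && receiver - server >= 2) return false;
    }
}

// One independent match trial. trial_index plays the role of the thread
// index so that every trial draws from its own random stream.
// sets_to_win is 2 for best-of-3 and 3 for best-of-5.
MatchResult simulate_match(double p_a_serve, double p_b_serve, int sets_to_win,
                           std::uint64_t seed, std::uint64_t trial_index) {
    assert(sets_to_win == 2 || sets_to_win == 3);
    // Note: std::seed_seq only uses the low 32 bits of each value.
    std::seed_seq seq{seed, trial_index};
    std::mt19937_64 rng(seq);

    int sets_a = 0;
    int sets_b = 0;
    int total_games = 0;
    bool a_serves = true;

    while (sets_a < sets_to_win && sets_b < sets_to_win) {
        int games_a = 0;
        int games_b = 0;
        while (true) {
            const double p = a_serves ? p_a_serve : p_b_serve;
            const bool server_won = server_wins_game(p, rng);
            const bool a_won_game = (server_won == a_serves);
            if (a_won_game) ++games_a; else ++games_b;
            ++total_games;
            a_serves = !a_serves;

            // A set ends at 6+ games with a 2-game lead, or at 7 games
            // (7-5, or 7-6 where game 13 is the simplified tiebreak).
            if ((games_a >= 6 && games_a - games_b >= 2) || games_a == 7) {
                ++sets_a;
                break;
            }
            if ((games_b >= 6 && games_b - games_a >= 2) || games_b == 7) {
                ++sets_b;
                break;
            }
        }
    }
    return {sets_a == sets_to_win, total_games};
}
```

Because each trial depends only on its inputs and its own random stream, trials can run in any order or fully in parallel, which is the property that lets one CUDA thread own one match.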

IV. CPU vs GPU Performance

The project includes a CPU baseline, enabled with the --compare-cpu option. The benchmark path makes GPU acceleration measurable by running the same simulation math and the same workload under serial CPU execution and massively parallel CUDA execution. The speedup comes from CUDA parallelism, not from changing the Monte Carlo model.
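The speedup figure is simply the ratio of the two runtimes. The sketch below shows one way the CPU side of such a comparison could be timed in C++; the helper names time_ms and speedup are hypothetical, and how the project measures its GPU runtime is not stated in the source (CUDA events via cudaEventRecord and cudaEventElapsedTime are a common choice).

```cpp
#include <cassert>
#include <chrono>

// Wall-clock time of a single call to `work`, in milliseconds.
// steady_clock is used because it never jumps backwards.
template <typename Work>
double time_ms(Work&& work) {
    const auto start = std::chrono::steady_clock::now();
    work();
    const auto stop = std::chrono::steady_clock::now();
    return std::chrono::duration<double, std::milli>(stop - start).count();
}

// Speedup of the GPU path over the CPU baseline for the same workload.
double speedup(double cpu_ms, double gpu_ms) {
    assert(cpu_ms >= 0.0 && gpu_ms > 0.0);
    return cpu_ms / gpu_ms;
}
```

With the best-of-3 sample figures from the test run, speedup(1200.00, 12.10) is about 99.2, matching the reported value. A fair comparison also needs both paths to run the same number of simulations and to agree on whether GPU setup and memory transfers are included in the timed region.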

V. Tech Stack

Core technologies include CUDA C++, cuRAND for GPU-side random number generation, Monte Carlo simulation, Mixture of Experts routing, softmax expert blending, and CPU/GPU benchmark tooling for performance comparison.