Project detail · technical deep-dive
Trained a neural network in Node to predict UCL 21/22 goals from FIFA 22 attributes, then shipped the results as a full-stack dashboard. End-to-end ML in JavaScript: data pipeline, MLP regression, Express API, React UI.
Role
Solo / ML + full-stack engineering
Timeline
2026
Tech
Node 22+, TensorFlow.js, SQLite, Express, React 18, Vite 6
Product showcase

Dashboard surface: ranking of players whose UCL 21/22 goals outpaced what the model predicted from their FIFA 22 attributes.
Status
Shipped
Focus
ML + full-stack
Val. MAE
0.18 goals/match
Data pipeline
Five-stage ingestion written as idempotent Node scripts. xlsx → SQLite for 19k FIFA 22 players, eight CSVs merged into a unified UCL 21/22 table, fuzzy matching across both datasets, feature engineering, training, prediction — runnable end-to-end with one npm command.
ML model
Shallow MLP regressor (33 → 32 → 16 → 1) trained in TensorFlow.js. The target is goals per match, so the model regresses a continuous rate; each predicted rate is then scaled by the player's real match count to reconstruct tournament-level expected goals.
Position filter insight
Training on all 562 matched players flattened predictions toward zero because the MSE loss was dominated by non-scorers. Restricting the training cohort to offensive positions (forwards plus attacking and wide midfielders) raised Benzema's expected goals from 4.4 to 9.2. The predictions only became believable once the input distribution was honest about who scores.
Full-stack integration
SQLite predictions exposed through a thin Express API; React front end consumes three endpoints (over / under / stats) and filters client-side. Vite dev proxy plus concurrently orchestrate frontend and API in a single npm run.
Truth Scout is an end-to-end machine-learning project I built to answer one question: which footballers massively overperformed — or underperformed — what their FIFA 22 attributes said they should do in the UCL 21/22? A neural network trained on real data learns the FIFA-to-UCL mapping, and the gap between prediction and reality becomes the product.
The whole thing is written in JavaScript end-to-end — no Python sidecar, no model-server dance, no serialization between languages. TensorFlow.js trains the regression in Node during the pipeline, SQLite stores the predictions, Express serves them, and a hand-built React dashboard lets users click through the results across three languages.
Two players with 5 goals aren't comparable if one played 3 matches and the other 13. The model learns a goals-per-match rate; the prediction step multiplies that rate back by each player's real match count to reconstruct tournament totals.
// target = goals / matches_played (a continuous rate in [0, ~1.5])
const targetVal = (row) => row.goals / row.matches_played;
The 33 input features cover composite ratings, attacking detail, skill, movement, power, mentality, and age. Normalization constants are computed on the training set only and reapplied identically at inference to avoid data leakage.
const FEATURE_COLS = [
  "overall", "potential", "age",
  "pace", "shooting", "passing", "dribbling", "defending", "physic",
  "attacking_finishing", "attacking_heading_accuracy", "attacking_volleys",
  "attacking_short_passing", "attacking_crossing",
  "skill_dribbling", "skill_curve", "skill_fk_accuracy",
  "skill_long_passing", "skill_ball_control",
  "movement_acceleration", "movement_sprint_speed", "movement_agility",
  "movement_reactions", "movement_balance",
  "power_shot_power", "power_stamina", "power_strength", "power_long_shots",
  "mentality_aggression", "mentality_positioning", "mentality_vision",
  "mentality_penalties", "mentality_composure",
];
// Per-feature mean and population standard deviation, used for z-score scaling.
const means = {}, stds = {};
for (const col of FEATURE_COLS) {
  const vals = clean.map((r) => r[col]);
  const m = vals.reduce((a, b) => a + b, 0) / vals.length;
  const v = vals.reduce((a, b) => a + (b - m) ** 2, 0) / vals.length;
  means[col] = m;
  stds[col] = Math.sqrt(v) || 1; // guards against constant columns
}
// Apply the same constants to any row, at training or inference time.
const featureVec = (row) =>
  FEATURE_COLS.map((c) => (row[c] - means[c]) / stds[c]);
1,633 parameters total. With only 216 rows in the dataset (173 of them used for training), anything deeper memorizes within 30 epochs regardless of regularization. L2 on the dense kernels plus dropout 0.2 between layers keeps the generalization gap tight.
import * as tf from "@tensorflow/tfjs";
const model = tf.sequential({
  layers: [
    tf.layers.dense({
      inputShape: [FEATURE_COLS.length],
      units: 32,
      activation: "relu",
      kernelRegularizer: tf.regularizers.l2({ l2: 0.01 }),
    }),
    tf.layers.dropout({ rate: 0.2 }),
    tf.layers.dense({
      units: 16,
      activation: "relu",
      kernelRegularizer: tf.regularizers.l2({ l2: 0.01 }),
    }),
    tf.layers.dropout({ rate: 0.2 }),
    tf.layers.dense({ units: 1, activation: "linear" }),
  ],
});
model.compile({
  optimizer: tf.train.adam(0.003),
  loss: "meanSquaredError",
  metrics: ["mae"],
});
Deterministic shuffle → 80/20 train/val split → 200 epochs, with loss and validation MAE logged every 20 epochs to catch divergence early. Validation MAE stabilizes around 0.18 goals/match by epoch 40 and stays there.
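The shuffle and split themselves are not shown in the source. A minimal sketch of one way to get a reproducible 80/20 split, assuming a small seeded PRNG (mulberry32) and a Fisher-Yates shuffle; the helper names `mulberry32`, `seededShuffle`, `trainValSplit` and the seed value 42 are illustrative assumptions, not the project's actual code:

```javascript
// Seeded PRNG (mulberry32 variant): same seed, same sequence of numbers in [0, 1).
function mulberry32(seed) {
  let a = seed >>> 0;
  return function () {
    a = (a + 0x6d2b79f5) >>> 0;
    let t = a;
    t = Math.imul(t ^ (t >>> 15), t | 1);
    t ^= t + Math.imul(t ^ (t >>> 7), t | 61);
    return ((t ^ (t >>> 14)) >>> 0) / 4294967296;
  };
}

// Fisher-Yates shuffle driven by the seeded PRNG; returns a shuffled copy.
function seededShuffle(rows, seed) {
  const rand = mulberry32(seed);
  const out = rows.slice();
  for (let i = out.length - 1; i > 0; i--) {
    const j = Math.floor(rand() * (i + 1));
    const tmp = out[i];
    out[i] = out[j];
    out[j] = tmp;
  }
  return out;
}

// Hold out the last `valFraction` of the shuffled rows for validation.
function trainValSplit(rows, valFraction = 0.2, seed = 42) {
  const shuffled = seededShuffle(rows, seed);
  const nVal = Math.floor(shuffled.length * valFraction);
  const cut = shuffled.length - nVal;
  return { train: shuffled.slice(0, cut), val: shuffled.slice(cut) };
}
```

With 216 rows this yields 173 training rows and 43 validation rows, and repeated runs with the same seed produce the same partition, which keeps validation MAE comparable between runs.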
await model.fit(xTrain, yTrain, {
  epochs: 200,
  batchSize: 32,
  validationData: [xVal, yVal],
  shuffle: true,
  verbose: 0,
  callbacks: {
    // Log a progress line every 20 epochs.
    onEpochEnd: (epoch, logs) => {
      if (epoch % 20 === 0) {
        console.log(
          `epoch ${epoch}: loss=${logs.loss.toFixed(4)} ` +
          `val_mae=${logs.val_mae.toFixed(4)}`
        );
      }
    },
  },
});
The single most impactful decision in the pipeline wasn't a hyperparameter. The first training run included all 562 matched players: defenders, goalkeepers, everyone. The MSE loss dragged predictions toward the mean of a defender-heavy cohort, and Benzema came out with an expected 4.4 goals. Restricting the training SQL to offensive positions pushed it to 9.2 without touching a single line of model code.
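The pull toward zero follows from a basic property of squared error: the single value that minimizes MSE over a set of targets is their mean. A small sketch with made-up goals-per-match rates (illustrative numbers only, not project data; `mean` and `mse` are hypothetical helpers) shows how padding the cohort with non-scorers lowers that optimum:

```javascript
// Arithmetic mean of an array of numbers.
function mean(xs) {
  return xs.reduce((a, b) => a + b, 0) / xs.length;
}

// Mean squared error of predicting the constant `pred` for every target.
function mse(targets, pred) {
  return mean(targets.map((t) => (t - pred) ** 2));
}

// Hypothetical per-match scoring rates for six attackers.
const attackerRates = [0.9, 0.6, 0.4, 0.3, 0.2, 0.1];
// The same attackers plus 18 hypothetical players who never scored.
const allRates = attackerRates.concat(new Array(18).fill(0));

console.log(mean(attackerRates).toFixed(3)); // 0.417
console.log(mean(allRates).toFixed(3));      // 0.104
```

A network is not a constant predictor, but with only a few thousand weights and a target that is mostly zeros, its outputs are drawn toward that lower baseline. Filtering the cohort changes the baseline instead of asking the model to fight it.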
const ATTACKING_POSITIONS = [
  "ST", "CF", "LS", "RS", "LF", "RF", // strikers
  "LW", "RW",                         // wingers
  "CAM", "LAM", "RAM",                // attacking mids
  "LM", "RM",                         // wide mids that arrive in the box
];
const joinSql = `
  SELECT m.ucl_id, m.sofifa_id, u.matches_played, u.goals,
         ${FEATURE_COLS.map((c) => `f.${c}`).join(", ")}
  FROM player_matches m
  JOIN ucl_players u ON u.id = m.ucl_id
  JOIN fifa22_players f ON f.sofifa_id = m.sofifa_id
  WHERE u.matches_played >= 2
    AND f.primary_position IN (${ATTACKING_POSITIONS.map((p) => `'${p}'`).join(", ")})
`;
Model output is a per-match rate, clamped at zero (can't score negative goals). Multiplied by the player's real match count, it becomes the "expected goals in tournament" number shown on each card, and the difference against real goals becomes the dashboard's story.
const predArr = (await model.predict(xAll).array()).map((a) => a[0]);
for (let i = 0; i < allRows.length; i++) {
  const row = allRows[i];
  const perGamePred = Math.max(0, predArr[i]);
  const expected = perGamePred * row.matches_played;
  const delta = row.goals - expected;
  const deltaPct = expected > 0.01 ? (delta / expected) * 100 : 0;
  // → persisted into SQLite, served via Express, rendered in React
}
| Validation MAE | 0.18 goals/match (~1 goal over a 6-match group stage) |
| Dataset | 216 offensive players (173 train / 43 val) |
| Training time | ~4 seconds in Node |
| Parameters | 1,633 |
| Benzema delta | +5.8 goals (expected 9.2, real 15) |
| Haller delta | +5.2 goals (expected 5.8, real 11) |
| Nkunku delta | +6.4 goals (expected 0.6, real 7) |
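The 1,633 parameter figure can be checked by hand from the 33 → 32 → 16 → 1 layer sizes: each dense layer contributes inputs × units weights plus one bias per unit, and dropout layers add nothing. A short sketch of that arithmetic (`countDenseParams` is a hypothetical helper, not part of the project):

```javascript
// Count trainable parameters in a stack of fully connected layers.
// layerSizes lists the input width followed by each layer's unit count.
function countDenseParams(layerSizes) {
  let total = 0;
  for (let i = 1; i < layerSizes.length; i++) {
    const weights = layerSizes[i - 1] * layerSizes[i];
    const biases = layerSizes[i];
    total += weights + biases;
  }
  return total;
}

// 33*32 + 32 = 1088, 32*16 + 16 = 528, 16*1 + 1 = 17
console.log(countDenseParams([33, 32, 16, 1])); // 1633
```

Against 173 training rows, that is roughly nine parameters per example, which is why the regularization described earlier matters.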
better-sqlite3 for synchronous reads, service layer that isolates SQL from routing. Three REST endpoints do everything the frontend needs.
The Vite dev proxy forwards /api to the Express server so there's no CORS dance or environment flag juggling.
TensorFlow.js (@tensorflow/tfjs, runs in any Node without native bindings)
better-sqlite3, xlsx parsing, CSV streaming, fastest-levenshtein for fuzzy matching
Express with cors, module-type ESM
concurrently + nodemon orchestrate Vite and Express in one npm command
Key pipeline commands. Each script is idempotent, so they can be rerun in any order to rebuild downstream state:
npm run ingest:fifa - parse the FIFA 22 xlsx into SQLite (19,239 rows)
npm run ingest:ucl - merge 8 UCL CSVs into one unified player table
npm run match - fuzzy-join FIFA × UCL with Levenshtein scoring
npm run train - train the MLP, generate predictions, persist deltas
npm run pipeline - run all four in sequence
npm run dev:all - Vite (5173) + Express (3001) side by side
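The matching script itself is not shown here. The sketch below illustrates the general idea of Levenshtein-scored name matching under stated assumptions: it uses a hand-written distance function to stay dependency-free (the project lists fastest-levenshtein for this), and the normalization rules, the similarity formula, the 0.8 threshold, and the helper names are all illustrative rather than the project's actual logic.

```javascript
// Levenshtein edit distance via dynamic programming, keeping only two rows.
function levenshtein(a, b) {
  if (a === b) return 0;
  if (a.length === 0) return b.length;
  if (b.length === 0) return a.length;
  let prev = Array.from({ length: b.length + 1 }, (_, j) => j);
  for (let i = 1; i <= a.length; i++) {
    const curr = [i];
    for (let j = 1; j <= b.length; j++) {
      const cost = a[i - 1] === b[j - 1] ? 0 : 1;
      curr[j] = Math.min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost);
    }
    prev = curr;
  }
  return prev[b.length];
}

// Lowercase, strip diacritics and punctuation, collapse whitespace.
function normalizeName(name) {
  return name
    .normalize("NFD")
    .replace(/[\u0300-\u036f]/g, "")
    .toLowerCase()
    .replace(/[^a-z0-9 ]/g, " ")
    .replace(/\s+/g, " ")
    .trim();
}

// Similarity in [0, 1]: 1 means identical after normalization.
function similarity(a, b) {
  const x = normalizeName(a);
  const y = normalizeName(b);
  const maxLen = Math.max(x.length, y.length);
  return maxLen === 0 ? 1 : 1 - levenshtein(x, y) / maxLen;
}

// Return the best-scoring candidate at or above the threshold, or null.
function bestMatch(name, candidates, threshold = 0.8) {
  let best = null;
  for (const candidate of candidates) {
    const score = similarity(name, candidate);
    if (score >= threshold && (best === null || score > best.score)) {
      best = { candidate, score };
    }
  }
  return best;
}
```

For example, `bestMatch("Sebastien Haller", ["Sébastien Haller", "Karim Benzema"])` picks the accented spelling with a score of 1, while a bare surname such as "Benzema" falls below the 0.8 threshold against "Karim Benzema" and returns null. Real datasets usually need extra rules for short names and nicknames.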