Horse Race Data Modelling

Lewismc21 Member Posts: 1 Newbie
Hi there,

I have 14 files from the past 7 years with horse race results and horses (I'll note the details of those files at the bottom).

How could I use this data in RapidMiner to predict the outcome of future races, with a win probability (%) for each horse?

races_* columns description:

rid - Race id;
course - Course of the race, country code in brackets, AW means All Weather, no brackets means UK;
time - Time of the race in hh:mm format, London TZ;
date - Date of the race;
title - Title of the race;
rclass - Race class;
band - Band;
ages - Ages allowed;
distance - Distance;
condition - Surface condition;
hurdles - Hurdles, their type and amount;
prizes - Places prizes;
winningTime - Best time shown;
prize - Prizes total (sum of prizes column);
metric - Distance in meters;
countryCode - Country of the race;
ncond - condition type (created from condition feature);
class - class type (created from rclass feature).

horses_* columns description:

rid - Race id;
horseName - Horse name;
age - Horse age;
saddle - Saddle # where horse starts;
decimalPrice - 1/Decimal price;
isFav - Was horse favorite before start? Can be more than one fav in a race;
trainerName - Trainer name;
jockeyName - Jockey name;
position - Finishing position, 40 if horse didn't finish;
positionL - how far a horse finished behind its predecessor;
dist - how far a horse finished behind the winner;
weightSt - Horse weight in St;
weightLb - Horse weight in Lb;
overWeight - Overweight code;
outHandicap - Handicap;
headGear - Head gear code;
RPR - RP Rating;
TR - Topspeed;
OR - Official Rating;
father - Horse's Father name;
mother - Horse's Mother name;
gfather - Horse's Grandfather name;
runners - Runners total;
margin - Sum of decimalPrices for the race;
weight - Horse weight in kg;
reswin - Horse won or not;
resplace - Horse placed or not.


    Telcontar120 Moderator, RapidMiner Certified Analyst, RapidMiner Certified Expert, Member Posts: 1,635 Unicorn
    Interesting problem. There are multiple ways to approach this, and it partly depends on how you want to formulate the problem: are you trying to take a given race and predict the horse with the highest probability of winning? Or are you trying to take a given horse and predict its probability of winning a particular race? Basically, you have to organize your dataset into rows that correspond to the level of the data you are trying to predict. I would imagine that will involve some characteristics of races and some of horses, so you are going to need to use Join to bring these two datasets together. You also have to decide whether you are trying to do simultaneous selection (e.g., take a race with a given set of 20 horses and pick the single horse that wins) or independent selection (e.g., take 20 horses and separately predict each one's probability of winning a given race). That will further dictate the final shape of your dataset. You may need to pivot your data, or you may need to create interaction variables as well. There isn't just one correct answer to this problem, in my view.
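
To make the data shape concrete, here is a minimal sketch in plain Python rather than a RapidMiner process. It joins horses to races on rid to get one row per horse per race, then rescales an independent per-horse score so the values within each race sum to 1, which is one simple way to turn independent predictions into per-race win probabilities. The sample rows are made up, the helper names join_on_rid and normalise_within_race are hypothetical, and decimalPrice is used only as a stand-in for whatever score a trained model would produce.

```python
from collections import defaultdict

# Made-up rows standing in for the races_* and horses_* files.
races = [
    {"rid": 1, "course": "Ascot", "metric": 1609.0},
    {"rid": 2, "course": "Kempton (AW)", "metric": 2012.0},
]
horses = [
    {"rid": 1, "horseName": "A", "decimalPrice": 0.50, "reswin": 1},
    {"rid": 1, "horseName": "B", "decimalPrice": 0.40, "reswin": 0},
    {"rid": 1, "horseName": "C", "decimalPrice": 0.25, "reswin": 0},
    {"rid": 2, "horseName": "D", "decimalPrice": 0.60, "reswin": 0},
    {"rid": 2, "horseName": "E", "decimalPrice": 0.60, "reswin": 1},
]


def join_on_rid(races, horses):
    """Inner join: one output row per horse, carrying its race's columns."""
    race_by_id = {race["rid"]: race for race in races}
    return [
        {**race_by_id[horse["rid"]], **horse}
        for horse in horses
        if horse["rid"] in race_by_id
    ]


def normalise_within_race(rows, score_key):
    """Rescale an independent score so it sums to 1 inside each race."""
    totals = defaultdict(float)
    for row in rows:
        totals[row["rid"]] += row[score_key]
    return [{**row, "p_win": row[score_key] / totals[row["rid"]]} for row in rows]


# Assumption: decimalPrice (1 / decimal odds) plays the role of a model score.
rows = normalise_within_race(join_on_rid(races, horses), "decimalPrice")

for row in rows:
    print(row["rid"], row["horseName"], round(row["p_win"], 3))
```

With decimalPrice as the score, the per-race total is the same quantity as the margin column, so p_win here is just the bookmaker-implied probability. That makes it a reasonable baseline to compare a model against, with reswin as the label for training and evaluation.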
    Brian T.
    Lindon Ventures 
    Data Science Consulting from Certified RapidMiner Experts