# Predict Sports Outcome

I am a total newbie in RapidMiner. I tried this in Python, but the coding is a bit intense. I want to predict the outcome ("Classification") for teams that I can specify as "Home Team" and "Away Team", e.g. Liverpool playing against Burnley: big win, goals scored; and FC Schalke playing against Borussia Dortmund: draw, goals scored. So far I have got distinct responses in the Simulator tab, and I would like to use/test those responses. Ultimately, I would like to run machine learning to predict these outputs. How can I cluster my data? Where do I start? Please help. I am really excited about this!

Attachment: MyDream. (for some reason I am unable to remove attachment)


## Comments

2,225RM Data ScientistDortmund, Germany

15 Contributor I

What's the point of having data and the ability to manipulate it if you cannot have fun, right?

Well, it's all fun and games until you start doing the actual work (here I mean building the correct model, getting the correct dependencies and dataset, and then getting and testing the results).

Then, when it comes to betting, the outcome itself is not what interests me most, i.e. whether there is a big win, a big draw, or a big loss.

Just these headers are enough for me to feed into other algorithms. They can write an article or drive an animated interaction on video. So many possibilities. But to get there, I feel that getting the statements involving the "Classification" column right is important. That's why I am so invested in getting the model correct.

Would you be ready to help me?

I would do all the work. All I need is proper guidance.

So far I have got it down to this:

Now I understand that the Regression model is binary and that I have used it incorrectly here. However, how do I make it non-binary? What would be the correct format for the input file, and the correct parameters for Auto Model or for a designed (non-Auto) model?

Thanks,

Harshad

15 Contributor I

443 Unicorn

15 Contributor I

THANK YOU!

Thank you for helping me in my quest. Please find the preliminary conditions for the exercise below, broken down into phases.

Phase 1:

We should be able to predict based on past performance data alone.

- Event date "Date Time"
- Competition "Competition of the event", e.g. Premier League, Champions League, etc. "Polynominal"
- Team playing at home "Home Team" "Polynominal"
- Home team score "Team Score" "Integer"
- Team playing away "Away Team" "Polynominal"
- Away team score "Team Score" "Integer"
- Score of the event "Score", e.g. 1-0, 0-6 "Polynominal"
- Classification, which describes the event in a more descriptive manner, i.e. Big Win, Big Loss, Small Win, Small Loss, Big Draw, etc. It is defined in the 'Classification No Dupes' sheet. "Classification" "Polynominal"

I want to predict this data for now. (For an event, for the home team vs the away team, the expected result is the "Classification".)

Phase 2:

We can add these data points for every event and make the algorithm better.

FYI, I am pretty comfortable with Alteryx, so the learning curve with RapidMiner should not be too steep.

Thank you!

Harshad Barge
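For anyone following along in Python, here is a rough sketch of how one Phase 1 row could be represented and labelled. The real label definitions live in the 'Classification No Dupes' sheet, which is not shown in this thread, so the thresholds `BIG_MARGIN` and `BIG_DRAW_GOALS`, the "Small Draw" label, and the choice to label from the home team's point of view are all assumptions made only for illustration. The example score is made up as well.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class Match:
    """One Phase 1 row, using the columns listed in the post."""
    event_date: datetime
    competition: str
    home_team: str
    home_score: int
    away_team: str
    away_score: int

    @property
    def score(self) -> str:
        # The "Score" column, e.g. "1-0" or "0-6".
        return f"{self.home_score}-{self.away_score}"


# Hypothetical thresholds: NOT taken from the 'Classification No Dupes' sheet.
BIG_MARGIN = 3      # goal difference that counts as a "Big" win or loss
BIG_DRAW_GOALS = 2  # goals per side that make a draw a "Big Draw"


def classify(match: Match) -> str:
    """Label a match from the home team's point of view (an assumption)."""
    margin = match.home_score - match.away_score
    if margin == 0:
        # "Small Draw" is an assumed label; the post only names "Big Draw".
        return "Big Draw" if match.home_score >= BIG_DRAW_GOALS else "Small Draw"
    size = "Big" if abs(margin) >= BIG_MARGIN else "Small"
    outcome = "Win" if margin > 0 else "Loss"
    return f"{size} {outcome}"


# Made-up result, for illustration only.
example = Match(datetime(2020, 1, 1), "Premier League", "Liverpool", 4, "Burnley", 0)
print(example.score, classify(example))
```

Because "Classification" takes more than two values, this is a multi-class target, so whatever model is used later needs to handle a nominal label with several classes rather than a yes/no label.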

15 Contributor I

I am ready to put in the work. I need a worthy team.

2,225RM Data ScientistDortmund, Germany

1,007 Unicorn

Some pointers from my work on a sports analytics project.

1. A lot of literature review needs to be done on the event statistics that will be used in model building.

2. You don't need to consider all of them, but you need to focus on features that are important from the soccer perspective.

3. Adopt some feature selection techniques and see if the features you are trying to use are useful in model building or not.

4. Start with simple models like Decision Tree and GLM.

5. Try to build different models for different leagues and see how it goes.

6. Validate your model well. This means not just machine learning validation but also domain validation, which is very important. If your model predicts based on an attribute that has statistical relevance but no domain relevance, it is not a good, generalizable model even if it is highly accurate.
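To make the "start simple" and "validate well" pointers concrete, here is a minimal Python sketch of a baseline. It is not a Decision Tree or GLM; it is an even simpler stand-in that predicts the most frequent past "Classification" for a fixture, and it gives a floor that any real model should beat. The function names and the `(home_team, away_team, classification)` row layout are assumptions for illustration. The holdout trains on earlier matches and tests on later ones, which assumes the rows are already sorted by event date.

```python
from collections import Counter, defaultdict


def fit_baseline(rows):
    """Count past labels. rows: iterable of (home_team, away_team, classification)."""
    by_pair = defaultdict(Counter)   # labels seen for this exact fixture
    by_home = defaultdict(Counter)   # labels seen for this home team
    overall = Counter()              # labels seen across all matches
    for home, away, label in rows:
        by_pair[(home, away)][label] += 1
        by_home[home][label] += 1
        overall[label] += 1
    return by_pair, by_home, overall


def predict(model, home, away):
    """Most frequent label for the fixture, else for the home team, else overall."""
    by_pair, by_home, overall = model
    for counts in (by_pair.get((home, away)), by_home.get(home), overall):
        if counts:
            return counts.most_common(1)[0][0]
    return None  # nothing to go on (empty training data)


def holdout_accuracy(rows, train_fraction=0.8):
    """Chronological holdout: rows must already be sorted by event date."""
    cut = int(len(rows) * train_fraction)
    train, test = rows[:cut], rows[cut:]
    if not test:
        return float("nan")
    model = fit_baseline(train)
    hits = sum(predict(model, home, away) == label for home, away, label in test)
    return hits / len(test)
```

Splitting by date instead of shuffling keeps future matches out of the training data, which is the machine learning half of the validation. The domain half still means checking whether the attributes a model relies on make sense from a soccer perspective.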

Just my 2c

Varun

https://www.varunmandalapu.com/

1 Newbie

15 Contributor I

RapidMiner could help, but I am a bit rusty here. You can check it out!