How to Use/Model "Time Series" Data for College Athletics Finance Data
S_R_Webster
Member Posts: 3 Learner I
in Help
Greetings all:
I am a student and new to the community, so please take it easy on me for this first go-around. I have read some of the questions and responses to other time series questions but am still not finding an answer, or maybe just not understanding the answers given, or both. I don't have a model to share yet because that is where I am stuck to begin with. I would like to have three separate models to use for a predictive analysis project I have for a data science class. We are only concerned with training and prediction, not testing. We briefly learned how to run a simple linear regression, a decision tree, and a logistic regression model, and I thought the data I had for my project could be used for all three. However, that did not account for the fact that my data is a time series, and what we learned did not include that element. My data covers roughly 77 different universities over about 13 years. The goal is to train on the first 12 years and then use the 13th year's features to run the prediction and determine the target (in our case, amount of profit/loss for linear regression, and profitable yes/no for the decision tree and logistic regression). I am not sure what the best way to attack this problem is. Anything helps at this point. Thank you everyone in advance!
Best Answer

MarcoBarradas RapidMiner Certified Analyst, Member Posts: 137 Unicorn
@S_R_Webster Great, so you are going to build a model that predicts whether a university will lose or earn money in a given year, taking into account the outcomes of previous years.
You have some interesting data to work with. You may want to experiment with binning some of your attributes. Please keep in mind that, in order to work, a model should only use information that is available at the time of the prediction.
So to train your model, you'll need to use only information that would have been available when predicting 2018. Once you have trained and optimized your model, you can create a dataset that lets it predict the 2019 outcome.
This means that instead of using 2017_Total Ticket Sales, you'll work with something like PreviousYear_Total Ticket Sales, and you'll do all of that in the ETL. You can also create attributes like Previous_Year_Profit/Loss and 2yearbefore_Profit/Loss, and maybe those attributes could capture the university's trend.
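The renaming idea above can be sketched outside RapidMiner in plain Python. This is only an illustration under assumptions: the column names (`ticket_sales`, `profit_loss`), the values, and the helper `add_lag_features` are all hypothetical, and in RapidMiner the same step would be built with ETL operators instead.

```python
# Hypothetical long-format records: one row per university per year.
# Column names and values are made up for illustration.
rows = [
    {"university": "A", "year": 2016, "ticket_sales": 10.0, "profit_loss": 1.0},
    {"university": "A", "year": 2017, "ticket_sales": 12.0, "profit_loss": -2.0},
    {"university": "A", "year": 2018, "ticket_sales": 11.0, "profit_loss": 3.0},
    {"university": "B", "year": 2016, "ticket_sales": 20.0, "profit_loss": -1.0},
    {"university": "B", "year": 2017, "ticket_sales": 21.0, "profit_loss": 0.5},
    {"university": "B", "year": 2018, "ticket_sales": 19.0, "profit_loss": 2.0},
]


def add_lag_features(rows, columns, lags=(1, 2)):
    """Build examples whose features come only from earlier years.

    A row is kept only if every requested lag year exists for that university.
    """
    by_key = {(r["university"], r["year"]): r for r in rows}
    examples = []
    for r in rows:
        example = {
            "university": r["university"],
            "year": r["year"],
            "profit_loss": r["profit_loss"],  # label for this year
        }
        for lag in lags:
            previous = by_key.get((r["university"], r["year"] - lag))
            if previous is None:
                break  # not enough history for this row
            for col in columns:
                example[f"{col}_lag{lag}"] = previous[col]
        else:
            # Reached only when no lag year was missing.
            examples.append(example)
    return examples


lagged = add_lag_features(rows, ["ticket_sales", "profit_loss"], lags=(1,))

# Train on every year before the last one; score the last year.
last_year = max(e["year"] for e in lagged)
train = [e for e in lagged if e["year"] < last_year]
score = [e for e in lagged if e["year"] == last_year]
```

Each example carries only prior-year values as features, so the same feature layout can later be filled with the latest known year to predict the following one.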
Answers
After reading your file, go to File > Print/Export Image to obtain an image of your data set.
The main thing you'll need to do is the ETL that converts your data into an example set a model can use to predict your label (cost or yes/no). Time data can be transformed into attributes such as First Date, Age, TimeSinceX, or Time between Y and X, but without a sample of your data it is difficult to suggest specific ideas.
For time series, you could take this short course:
https://academy.rapidminer.com/learn/course/timeseriesanalytics/timeseriesanalytics/datapreparationandanalysis
I am not sure I can share the data set, since I don't know how the rights of use would apply to sharing. One database is cafidatabase and the other data comes from sportsreference. We have University, University ID (self-made), and Year (2005-2018). For each school and each year we also have NCAA Football Win Percentage, NCAA Football Simple Rating System (a strength-of-schedule measure), Total Football Spending, and Total Football Coaching Pay. The remaining attributes are aggregates for all athletics, men and women, again per school per year: Total Recruiting Expenditures, Total Facility/Equipment Expenditures, Total Ticket Sales, Total Revenue, Total Expenditures, and Profit/Loss (our target/label), which is determined by subtracting Total Expenditures from Total Revenue. Does this information help? Thank you for the link to the tutorial as well. I will check that out as soon as I have a chance. I don't necessarily want anyone to do the work for me, just lead me to the water so I can drink.
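As a small illustration of how the two targets described above relate, here is a sketch in plain Python. The field names and the helper `add_labels` are hypothetical, and treating an exact break-even as "no" is an assumption, not something stated in the thread.

```python
def add_labels(record):
    """Return a copy of the record with both targets attached."""
    # Profit/Loss = Total Revenue - Total Expenditures, as described in the post.
    profit_loss = record["total_revenue"] - record["total_expenditures"]
    return {
        **record,
        "profit_loss": profit_loss,  # numeric target for linear regression
        # Assumption: break-even (0) counts as not profitable.
        "profitable": "yes" if profit_loss > 0 else "no",
    }


example = add_labels(
    {"university": "A", "year": 2018, "total_revenue": 120.0, "total_expenditures": 125.5}
)
```

Because the label is computed directly from same-year Total Revenue and Total Expenditures, those two same-year columns would give the answer away if they were used as features.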
You already have your first data set, which you could work on with Automodel. I think you could remove College ID and University from the equation, and you'll have a simple data set with Profit/Loss as the label to predict.
That way you'll have a first model that takes all your numeric data into consideration.
I don't know whether you have explored your data yet, but that needs to be the first step. Read your data and explore the statistics and graphs for each attribute.
 Do you have outliers in your data?
 What happens when you use the Year value as a color?
 Make a scatter plot with at least two of your attributes and see what you notice.
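For the outlier question, one common rule of thumb (not the only one, and not something specific to RapidMiner) flags values more than 1.5 interquartile ranges outside the quartiles. A minimal sketch in plain Python, with made-up values and a hypothetical helper name `find_outliers`:

```python
from statistics import quantiles


def find_outliers(values, k=1.5):
    """Return values lying more than k * IQR outside the first/third quartile."""
    q1, _, q3 = quantiles(values, n=4)  # three cut points: Q1, median, Q3
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < low or v > high]


# Made-up ticket sales figures for illustration only.
ticket_sales = [10.0, 12.0, 11.0, 13.0, 12.0, 95.0]
suspects = find_outliers(ticket_sales)
```

A flagged value is only a candidate for a closer look; in this kind of data a very large athletics program could be a legitimate extreme rather than an error.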
After doing all of this, we can move on and create an ETL process that transforms all your data into a single example per university, in order to try to predict what may happen in 2019 from all the previous data. I don't think you have true time series data, since you'll only have 13 years of aggregates, and that doesn't seem like enough data for a good time series analysis.
Best regards, and I hope this helps you.
Automodel video https://academy.rapidminer.com/learn/video/automodelclassification
Turbo Prep video https://academy.rapidminer.com/learn/video/turboprepintroduction