
How to Use/Model "Time Series" Data for College Athletics Finance Data

S_R_Webster Member Posts: 3 Learner I
Greetings all:

I am a student and new to the community, so please take it easy on me for this first go-around. I have read some of the questions and responses in other time series threads but am still not finding an answer, or maybe just not understanding the answers given, or both. I don't have a model to share yet because that is where I am stuck.

I would like to build three separate models for a predictive analysis project in a data science class. We are only concerned with training and prediction, not testing. We briefly learned how to run simple linear regression, decision tree, and logistic regression models, and I thought my project data could be used for all three. However, that did not account for the fact that my data is a time series, and what we learned did not cover that element.

My data covers roughly 77 universities over about 13 years. The goal is to train on the first 12 years and then use the 13th-year features to predict the target (in our case, the amount of profit/loss for linear regression, and profitable yes/no for the decision tree and logistic regression). I am not sure what the best way to attack this problem is. Anything helps at this point. Thank you everyone in advance!

Answers

  • MarcoBarradas Administrator, Employee, RapidMiner Certified Analyst, Member Posts: 272 Unicorn
    @S_R_Webster Hi, I'll try to help. Could you send us an image of the dataset you are using, or name the columns you have?
    After reading your file, go to File --> Print/Export_Image to obtain an image of your dataset.

    The main thing you'll need to do is the ETL that converts your data into an example set from which a model can predict your label (profit/loss or yes/no). Time data can be transformed into attributes such as First Date, Age, TimeSinceX, or Time between Y and X, but without a sample of your data it is difficult to suggest anything more specific.
    For time series, you could take this short course:
    https://academy.rapidminer.com/learn/course/time-series-analytics/time-series-analytics/data-preparation-and-analysis
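
The idea of turning time into ordinary attributes can be sketched outside RapidMiner as well. The following is a minimal pandas illustration, not the course's method; the table and its column names (`university`, `year`, `profit`) are made up for the example.

```python
import pandas as pd

# Hypothetical long-format data: one row per university per year.
# Column names and values are placeholders, not the poster's real data.
df = pd.DataFrame({
    "university": ["A", "A", "A", "B", "B", "B"],
    "year": [2005, 2006, 2007, 2005, 2006, 2007],
    "profit": [1.0, 1.5, 0.5, -2.0, -1.0, 0.0],
})

df = df.sort_values(["university", "year"]).reset_index(drop=True)

# "Time since X": years elapsed since each university's first record.
first_year = df.groupby("university")["year"].transform("min")
df["years_since_first"] = df["year"] - first_year

# Previous year's value as a plain attribute (NaN for the first year).
df["profit_prev_year"] = df.groupby("university")["profit"].shift(1)

# Year-over-year change, computed within each university.
df["profit_change"] = df["profit"] - df["profit_prev_year"]

print(df)
```

Grouping by university before shifting keeps one school's last year from leaking into the next school's first row.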

  • S_R_Webster Member Posts: 3 Learner I
    Thank you for responding!

    I am not sure I can share the data set, since I don't know how the usage rights would work. One data source is cafidatabase and the other data comes from sports-reference.

    We have University, University ID (self-made), and Year (2005-2018). For each school and year we also have NCAA Football Win Percentage, NCAA Football Simple Rating System (a strength-of-schedule measure), Total Football Spending, and Total Football Coaching Pay. The remaining columns are aggregates for all athletics (men and women) for each school and year: Total Recruiting Expenditures, Total Facility/Equipment Expenditures, Total Ticket Sales, Total Revenue, Total Expenditures, and Profit/Loss, which is our target/label and is determined by subtracting Total Expenditures from Total Revenue.

    Does this information help? Thank you for the link to the tutorial as well. I will check it out as soon as I have a chance. I don't necessarily want anyone to do the work for me, just lead me to the water so I can drink.
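
The two labels described here (a numeric profit/loss and a profitable yes/no) can both be derived from the revenue and expenditure columns. A minimal pandas sketch, with invented values and hypothetical column names; treating break-even as "no" is an assumption, not something stated in the thread.

```python
import pandas as pd

# Made-up rows; column names are placeholders for the poster's columns.
df = pd.DataFrame({
    "university": ["A", "B", "C"],
    "year": [2018, 2018, 2018],
    "total_revenue": [120.0, 80.0, 95.0],
    "total_expenditures": [100.0, 90.0, 95.0],
})

# Numeric label for regression: revenue minus expenditures.
df["profit_loss"] = df["total_revenue"] - df["total_expenditures"]

# Binary label for classification.
# Assumption: exactly breaking even counts as "no".
df["profitable"] = (df["profit_loss"] > 0).map({True: "yes", False: "no"})

print(df[["university", "profit_loss", "profitable"]])
```

Because the label is computed directly from these two columns, a model given both as predictors could reproduce it by arithmetic, so it is worth deciding which of them belong in the feature set.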
  • MarcoBarradas Administrator, Employee, RapidMiner Certified Analyst, Member Posts: 272 Unicorn
    @S_R_Webster OK, based on your description we could do a few things.
    You already have a first data set that you could work on with Auto Model. I think you could remove University ID and University from the equation, and you'll have a simple data set that uses Profit/Loss as the label to predict.
    That way you'll have a first model that takes all your numeric data into consideration.
    I don't know if you have explored your data yet, but that needs to be the first step. Read your data and explore the statistics and graphs for each attribute.

    1. Do you have outliers in your data?
    2. What happens when you use the Year value as a color?
    3. Make a scatter plot with at least two of your attributes and see what you notice.
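
For the outlier question, one common rule of thumb can be checked in a few lines. This is a sketch using the 1.5 x IQR rule in pandas, which is one possible definition of "outlier" and not necessarily what RapidMiner's statistics view uses; the column name `coaching_pay` and the values are invented.

```python
import pandas as pd

# Invented values; the last one is deliberately extreme.
df = pd.DataFrame({
    "year": [2005, 2006, 2007, 2008, 2009, 2010],
    "coaching_pay": [2.0, 2.1, 2.3, 2.2, 2.4, 9.5],
})

def iqr_outliers(series, k=1.5):
    """Flag values more than k * IQR below Q1 or above Q3."""
    q1 = series.quantile(0.25)
    q3 = series.quantile(0.75)
    iqr = q3 - q1
    return (series < q1 - k * iqr) | (series > q3 + k * iqr)

df["coaching_pay_outlier"] = iqr_outliers(df["coaching_pay"])

# Summary statistics for a quick first look at each numeric column.
print(df.describe())
print(df[df["coaching_pay_outlier"]])
```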
    After doing all of this, we can move on and create an ETL process that transforms your data into a single example per university, in order to try to predict what may happen in 2019 from all the previous data. I don't think you have time series data, since you only have 13 years of aggregates, and that doesn't seem like enough data for a good time series analysis.
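
The "single example per university" transformation can be pictured as a long-to-wide pivot. A minimal pandas sketch under assumed column names (`university`, `year`, `win_pct`, `profit_loss`); it illustrates the reshaping idea only and is not the RapidMiner process itself.

```python
import pandas as pd

# Invented long-format data: one row per university per year.
long_df = pd.DataFrame({
    "university": ["A", "A", "B", "B"],
    "year": [2017, 2018, 2017, 2018],
    "win_pct": [0.50, 0.75, 0.25, 0.40],
    "profit_loss": [10.0, 12.0, -5.0, -3.0],
})

# One row per university, one column per (measure, year) pair.
wide = long_df.pivot(
    index="university",
    columns="year",
    values=["win_pct", "profit_loss"],
)

# Flatten the two-level column index into names like "win_pct_2017".
wide.columns = [f"{name}_{year}" for name, year in wide.columns]
wide = wide.reset_index()

print(wide)
```

With 13 years and about ten measures this yields well over a hundred columns for roughly 77 rows, which is a trade-off to keep in mind when choosing between the long and wide layouts.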

    Best regards and hope this helps you. 
    Automodel video https://academy.rapidminer.com/learn/video/auto-model-classification
    Turbo Prep video https://academy.rapidminer.com/learn/video/turbo-prep-introduction

  • S_R_Webster Member Posts: 3 Learner I
    MarcoBarradas I am still in the process of cleaning the data, fixing/estimating what little missing data I have, and looking for outliers and whatnot. To be honest, I was not sure whether this was actually a time series matter; I just knew I had 13 years of data and wanted to train on the first 12 years for each university, then take each university's feature values for the 13th year to predict its 13th-year target/label. I will check out the two links you provided later this afternoon when I get off work. Thanks again for following up and assisting with this, my deepest gratitude!
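
The plan described here (train on every year but the last, then predict the last year from its features) amounts to splitting on the year column. A minimal sketch with invented data and hypothetical column names; ordinary least squares via `numpy.linalg.lstsq` stands in for whichever linear regression operator is actually used, and the same split would apply to a decision tree or logistic regression on the yes/no label.

```python
import numpy as np
import pandas as pd

# Invented data: 2 universities x 4 years. Names and values are placeholders.
df = pd.DataFrame({
    "university": ["A"] * 4 + ["B"] * 4,
    "year": [2015, 2016, 2017, 2018] * 2,
    "win_pct": [0.5, 0.6, 0.4, 0.7, 0.2, 0.3, 0.5, 0.4],
    "coaching_pay": [2.0, 2.5, 3.0, 3.5, 1.0, 1.2, 1.1, 1.5],
    "profit_loss": [4.0, 6.0, 5.0, 9.0, -1.0, 0.4, 2.2, 2.0],
})
features = ["win_pct", "coaching_pay"]

# Split on time: every year before the last one is training data.
last_year = df["year"].max()
train = df[df["year"] < last_year]
holdout = df[df["year"] == last_year]

def design_matrix(frame):
    """Feature matrix with a leading column of ones for the intercept."""
    return np.column_stack([np.ones(len(frame)), frame[features].to_numpy()])

# Fit ordinary least squares on the training years only.
coef, *_ = np.linalg.lstsq(
    design_matrix(train), train["profit_loss"].to_numpy(), rcond=None
)

# Predict the final year from its feature values alone.
predictions = holdout[["university", "year"]].copy()
predictions["predicted_profit_loss"] = design_matrix(holdout) @ coef

print(predictions)
```

The final-year labels are never seen during fitting, so comparing them with the predictions afterwards gives a rough check even when the assignment does not call for a test set.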