Is what I'd like to do even possible with RapidMiner?
I'm new to machine learning, and I wonder whether what I'd like to do is even possible with RapidMiner. I'd welcome suggestions!
1. My data set will be about 30 years of monthly economic data for Canada and will contain about 30 variables for each time period -- things like gross national product, size of the workforce, the money supply, the interest rate, etc.
2. It would be an unsupervised task since we want the algorithm to determine the relationship between the variables.
3. I understand that RNN with LSTM is the state-of-the-art for this type of problem.
4. After the model is up and running, I'd like to test different values of certain independent variables for future time periods. For example, if the government sets the interest rate at x% and sets public spending at $y for the coming 12 months, we want the model to predict all the other variables.
5. How would I proceed to set up this type of task? Are there any particular techniques I should know about to accomplish this?
Thanks for your suggestions!
Answers
Indeed, this is possible. However, 30 years of economic data might seem like a lot.
To recommend techniques and so on, I would need to see part of the data (not the values themselves, but the shape: what your predictive label is, and what kinds of categorical, numerical or date data you have, etc.). Is that feasible?
All the best,
Rod.
Thanks for your reply! Let me describe the data set for Canada, and if you'd like to see the actual data, please send me your email address.
Description of the Excel spreadsheet I'm preparing: there are currently 293 rows, one per month of economic data, starting in Jan 1997 and running until May 2021.
Below are the titles of the 33 columns.
I look forward to your thoughts!
Thanks,
Hal Segal
Period column - starts in Jan 1997. This is the first period available from StatCan.
GDP - Gross Domestic Product. Table number 36100434 (3790031). This is in 2012 constant dollars, seasonally adjusted, for the entire Canadian economy. All statistics are in Canadian dollars.
Number of people employed
Number of people with part-time employment
Number of people unemployed
Canada CPI - Consumer Price Index
US CPI
Canada PPI - Producer Price Index
US PPI
Canada Consumer Confidence
US Consumer Confidence
Canada Business Climate Indicator
US Business Confidence
Canada Stock Market
US Stock Market
Retail Sales
Consumer Spending
Producer Spending
Personal Savings
Consumer Credit
Consumer Disposable Income
Households Debt to GDP
Households Debt to Income
Building Permits
Imports
Exports
Canada Inflation Rate
US Inflation Rate
Government Spending
Leading Economic Index
QE bond purchases
Foreign Direct Investment
Tourist Arrivals
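The 293-row count is consistent with the stated range: 24 full years (1997 to 2020) × 12 = 288 months, plus 5 months of 2021, gives 293. Before modelling, it may be worth confirming that the sheet really has one row per month with no gaps. Below is a minimal standard-library sketch; the helper names `month_range` and `missing_months` are hypothetical, and reading the Excel file itself is left out (you would pass the Period column in as (year, month) pairs).

```python
from datetime import date


def month_range(start: date, end: date) -> list[tuple[int, int]]:
    """Return every (year, month) pair from start to end, both inclusive."""
    months = []
    y, m = start.year, start.month
    while (y, m) <= (end.year, end.month):
        months.append((y, m))
        # Roll over to January of the next year after December.
        y, m = (y + 1, 1) if m == 12 else (y, m + 1)
    return months


def missing_months(periods, start: date, end: date) -> list[tuple[int, int]]:
    """List the expected (year, month) pairs that do not appear in `periods`."""
    present = set(periods)
    return [p for p in month_range(start, end) if p not in present]


expected = month_range(date(1997, 1, 1), date(2021, 5, 1))
print(len(expected))  # 293
```

An empty result from `missing_months` means every month between Jan 1997 and May 2021 has a row.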
BTW, neural networks (including the RNN/LSTM you mentioned) are supervised.
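For a monthly series like this one, a common way (not spelled out in the thread) to get the labels a supervised model needs is a sliding window: the previous few months of every variable become the input, and the values some months later become the label. The sketch below assumes numpy; `make_supervised` is a hypothetical helper name, and the random array is placeholder data standing in for the 32 numeric columns (33 minus Period), not real statistics.

```python
import numpy as np


def make_supervised(series: np.ndarray, n_lags: int, horizon: int = 1):
    """Turn a (time, variables) array into (inputs, labels) pairs.

    Each input row is the previous `n_lags` months of every variable, flattened;
    each label row holds all variables `horizon` months after the window ends.
    """
    if series.ndim != 2:
        raise ValueError("series must be 2-D: (time, variables)")
    inputs, labels = [], []
    for t in range(n_lags, len(series) - horizon + 1):
        inputs.append(series[t - n_lags:t].ravel())
        labels.append(series[t + horizon - 1])
    return np.array(inputs), np.array(labels)


# Placeholder array standing in for 293 months x 32 numeric columns (NOT real data).
series = np.random.default_rng(0).normal(size=(293, 32))
X, y = make_supervised(series, n_lags=12)
print(X.shape, y.shape)  # (281, 384) (281, 32)

# An LSTM expects (samples, time steps, variables) rather than flat rows.
X_lstm = X.reshape(-1, 12, 32)
```

The flattened form suits regression-style learners; the reshaped form is the layout recurrent networks usually expect.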
- Create a new repository to structure your study. I'll suggest a specific structure, but feel free to ignore it and use your own.
- Import your raw data into RapidMiner Studio.
- In one process, perform the basic checks: mean, median and standard deviation.
- In another process, compute a correlation matrix, so you'll see possible influences between variables.
- Create separate processes for several clustering algorithms that might reveal patterns. Unsupervised machine learning algorithms will give you garbage results if you don't set them up properly, so both the correlation matrix and the basic checks will help you judge whether a given clustering algorithm makes sense. On another note: many people think that unsupervised algorithms will do the job for you, but that isn't the case. You still need to interpret the clusters, and a simple way to do that is to train a decision tree (or another classification algorithm) with the cluster as the label. You will spend some time here until you get a good classification.
- Use the classification to build linear regressions around your correlated attributes. A regression algorithm fits a mathematical function that describes how one variable depends on others. Since you want to vary any of the parameters, you'll need at least one linear regression per parameter you want to evaluate, which is rather tedious, but to my knowledge there isn't a way to predict two, three or four columns at once from the other 29.
- If you feel adventurous, you can swap the linear regression algorithm for a neural network, a deep learning extension or something along those lines.
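The steps above are meant to be built from RapidMiner processes, not code. Purely to make the sequence concrete, here is a minimal numpy sketch of the same pipeline: basic statistics, a correlation matrix, k-means (one clustering choice among many), one ordinary-least-squares regression per target column, and then a what-if prediction. Assumptions: all function names are hypothetical, the four column names are borrowed from Hal's list, and the random array is placeholder data, not real Canadian statistics. The decision-tree interpretation step is left out to keep the sketch dependency-free.

```python
import numpy as np


def describe(X, names):
    """Basic checks: (mean, median, sample standard deviation) per column."""
    return {
        name: (float(np.mean(col)), float(np.median(col)), float(np.std(col, ddof=1)))
        for name, col in zip(names, X.T)
    }


def correlation_matrix(X):
    """Pearson correlation between every pair of columns."""
    return np.corrcoef(X, rowvar=False)


def kmeans(X, k, n_iter=100, seed=0):
    """Plain k-means on standardized columns; returns one cluster label per row."""
    rng = np.random.default_rng(seed)
    Z = (X - X.mean(axis=0)) / X.std(axis=0)
    centers = Z[rng.choice(len(Z), size=k, replace=False)]
    for _ in range(n_iter):
        dists = ((Z[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = dists.argmin(axis=1)
        # Keep the old center if a cluster ends up empty.
        new_centers = np.array([
            Z[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
            for j in range(k)
        ])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return labels


def fit_one_regression_per_target(X, names, targets):
    """One least-squares model per target, using all other columns as inputs."""
    models = {}
    for target in targets:
        j = names.index(target)
        inputs = [i for i in range(len(names)) if i != j]
        A = np.column_stack([np.ones(len(X)), X[:, inputs]])  # intercept + inputs
        coef, *_ = np.linalg.lstsq(A, X[:, j], rcond=None)
        models[target] = (inputs, coef)
    return models


def predict(models, target, row):
    """Predict `target` from one full row of values."""
    inputs, coef = models[target]
    return float(coef[0] + row[inputs] @ coef[1:])


# Synthetic placeholder data (NOT real statistics): 293 months x 4 columns.
names = ["GDP", "Retail Sales", "Government Spending", "Canada Inflation Rate"]
X = np.random.default_rng(0).normal(size=(293, len(names)))

stats = describe(X, names)
corr = correlation_matrix(X)
clusters = kmeans(X, k=3)
models = fit_one_regression_per_target(X, names, targets=["GDP", "Retail Sales"])

# What-if: copy the latest month, raise Government Spending, re-predict GDP.
scenario = X[-1].copy()
scenario[names.index("Government Spending")] += 1.0
print(predict(models, "GDP", scenario))
```

Each per-target model treats the other columns of the scenario row as known, so changing Government Spending here does not ripple through to Retail Sales before GDP is predicted.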
Now, you'll need a lot of creativity at each step, but tbh it's not the software's job to give you the answers you are looking for; its job is to lead you to those answers based on the results it gives you. Sorry for the late reply.