From a newbie - Neural Net prediction
Hi all,
As stated in the subject, I'm just starting to learn RapidMiner. I want to predict the amount of traffic (number of cars) passing a point at certain times of the day (e.g. every 15 minutes), using the history (data) for the past year.
I have tried using the Neural Net operator in RapidMiner with "vehicle_count" as the label (target), but the predicted count is way off the mark when I apply the model.
What am I missing? Any help would be very much appreciated.
Thanks,
Leo
Answers
If the only data you have is a time series of vehicle counts with a timestamp, then you would need to window the data so that a given vehicle count at a moment in time is accompanied by regular attributes corresponding to vehicle counts in the past. The model would then try to predict the vehicle count now based on the previous vehicle counts. The Windowing operator is the one to use for this.
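To make the windowing idea concrete, here is a minimal Python sketch, not RapidMiner's actual implementation. It assumes a single location's counts in time order; the function name `window_series` and the sample counts are hypothetical. Each example pairs the previous `window_size` counts (the regular attributes) with the current count (the label), which is the kind of table the Windowing operator builds, although its attribute naming and options differ.

```python
def window_series(counts, window_size):
    """Turn a time-ordered list of counts into (lag attributes, label) examples.

    Hypothetical helper: each example uses the previous `window_size` counts
    as regular attributes and the current count as the label.
    """
    examples = []
    for t in range(window_size, len(counts)):
        lags = counts[t - window_size:t]  # past values, oldest first
        examples.append((lags, counts[t]))
    return examples


# Hypothetical 15-minute vehicle counts for one location.
counts = [12, 15, 20, 18, 25, 30]
for lags, label in window_series(counts, window_size=3):
    print(lags, "->", label)
```

The first `window_size` counts never become labels because they have no complete history, so a window of 3 over 6 counts yields 3 training examples.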
You will probably also want to add other features to the data set, such as the hour of the day and the day of the week. You can do this by using the Generate Attributes operator to extract the required detail from the timestamp.
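As an illustration of the kind of attributes Generate Attributes would produce, the following Python sketch derives calendar features from a timestamp string. It assumes the mm/dd/yyyy hh:mm format Leo describes later in the thread and a 24-hour clock; the function name `time_features` and the chosen feature set are hypothetical.

```python
from datetime import datetime


def time_features(start_time):
    """Derive calendar attributes from a 'mm/dd/yyyy hh:mm' timestamp string.

    Assumes a 24-hour clock (e.g. '17:30', not '05:30 PM').
    """
    ts = datetime.strptime(start_time, "%m/%d/%Y %H:%M")
    return {
        "hour": ts.hour,
        "minute": ts.minute,
        "day_of_week": ts.weekday(),          # Monday = 0 ... Sunday = 6
        "is_weekend": int(ts.weekday() >= 5),  # Saturday or Sunday
        "month": ts.month,
    }


print(time_features("01/15/2016 08:45"))
```

Hour and day of week often carry much of the pattern in traffic counts (rush hours, weekday vs. weekend), which is why they are a natural first addition.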
Hello awchisholm,
Thanks for your quick response. I am not good with coding, so I use the RapidMiner operators "out of the box".
I checked the available operators and there's no "Windowing" operator, so could you help me understand this a bit more?
Also, my original data has several attributes, including station_ID, Start_time (mm/dd/yyyy hh:mm format), Road_Lane, Traffic_direction and Traffic_count. See my attached file for the first 20 rows of the collected data.
How would you suggest I proceed to use a neural net and the given attributes to predict traffic for a future date and time?
Hopefully my questions make sense.
Thanks.
Hello
The Windowing operator is in the Series extension, so you would need to download and install it. I don't know your data, so windowing might not be appropriate if you have data from many different locations rather than one.
Given that you have some additional attributes, you could certainly try to use them. An important step is to set the role of any attribute you want to exclude from the analysis. For example, station_ID should probably be an id, and starttime should be given a role that is not regular; you can type a custom role name such as "starttime". Use the Set Role operator to change the role of attributes. I would suggest that RoadLane and TrafficDirection be regular attributes and TrafficCount be the label. I don't know your data, but it seems likely that you won't get good results without additional features. This is where Generate Attributes comes in: it lets you create new regular attributes such as Hour, Day and so on from the starttime.
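The effect of attribute roles can be sketched in plain Python: only regular attributes become model inputs, the label is the prediction target, and id or custom roles are carried along but ignored by the learner. The role mapping below follows the suggestion above using the column names from Leo's file; the helper `split_by_role` and the sample row are hypothetical and do not reflect how RapidMiner stores roles internally.

```python
# Role assignment mirroring the suggestion above (column names from the thread).
ROLES = {
    "station_ID": "id",
    "Start_time": "starttime",   # custom non-regular role: not a model input
    "Road_Lane": "regular",
    "Traffic_direction": "regular",
    "Traffic_count": "label",
}


def split_by_role(row, roles):
    """Return (regular attributes, label) for one example.

    Attributes with any other role (id, starttime, ...) are dropped from the
    inputs, which is what a learner does with non-regular attributes.
    """
    features = {name: value for name, value in row.items()
                if roles.get(name) == "regular"}
    label = next(value for name, value in row.items()
                 if roles.get(name) == "label")
    return features, label


# Hypothetical example row.
row = {"station_ID": "S01", "Start_time": "01/15/2016 08:45",
       "Road_Lane": 1, "Traffic_direction": "N", "Traffic_count": 42}
print(split_by_role(row, ROLES))
```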
regards
Andrew
Hi Leo,
Prior to joining RapidMiner I was in the transportation world, mostly trains, but I dealt with auto traffic counts occasionally.
What Andrew is suggesting (Windowing) will likely be the way you analyze the data set going forward, but I think you might want to do some aggregation beforehand.
From your screenshot, the timestamps appear to be in 15-minute increments and in two directions. I'm not sure whether you want to aggregate to the hourly level, but if you do, use the Aggregate operator first.
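Here is a minimal Python sketch of the hourly aggregation, assuming each record is a (station, timestamp, direction, count) tuple with the mm/dd/yyyy hh:mm format and that both directions should be summed together. The function name `aggregate_hourly` and the sample records are hypothetical; in RapidMiner the equivalent would be grouping by station and an hour attribute in the Aggregate operator with a sum of the count.

```python
from collections import defaultdict
from datetime import datetime


def aggregate_hourly(records):
    """Sum 15-minute counts into hourly totals per station, across directions.

    `records` is a list of (station_id, 'mm/dd/yyyy hh:mm', direction, count).
    Returns {(station_id, hour_start_datetime): total_count}.
    """
    totals = defaultdict(int)
    for station, start_time, _direction, count in records:
        ts = datetime.strptime(start_time, "%m/%d/%Y %H:%M")
        hour_start = ts.replace(minute=0)  # truncate to the start of the hour
        totals[(station, hour_start)] += count
    return dict(totals)


# Hypothetical 15-minute records for one station in both directions.
records = [
    ("S01", "01/15/2016 08:00", "N", 10),
    ("S01", "01/15/2016 08:00", "S", 7),
    ("S01", "01/15/2016 08:15", "N", 12),
    ("S01", "01/15/2016 08:45", "S", 9),
    ("S01", "01/15/2016 09:00", "N", 11),
]
for key, total in sorted(aggregate_hourly(records).items()):
    print(key, total)
```

Dropping the direction from the grouping key is a design choice; keep it in the key if the two directions should be modeled separately.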
I wrote up an explanation of using the Windowing operator here; check that out first.
Is there any specific reason why you want to use a Neural Net algorithm in this case? They can be hard to train.
Hi TBone and Andrew,
Thank you for your help.
I have gone through the first part of the challenge, i.e. the neural net training, and I realized I was using too many different data points (different lanes, directions, measuring stations, etc.), which made it difficult to get proper results. But when I used data from one measuring point (station), with the same direction, the same lane and the same time on different days (e.g. 10:00 a.m. every day for one month), it seems to work well.
My next question: now that I have trained and tested the model successfully, how do I use it to predict the traffic volume at that same point for the next month? The model was trained on January traffic data. How do I predict traffic with the trained model for other months (February, March, April) or even January of the next year?
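The subsetting Leo describes (one station, one direction, one lane, a fixed time of day) can be sketched in Python as below; in RapidMiner, the Filter Examples operator performs this kind of row selection. The helper `select_series`, the dictionary-per-row layout and the sample rows are hypothetical, and the timestamps are assumed to use the mm/dd/yyyy hh:mm format with a 24-hour clock.

```python
from datetime import datetime, time


def select_series(records, station, direction, lane, time_of_day):
    """Keep only rows for one station/direction/lane at a fixed time of day."""
    selected = []
    for r in records:
        ts = datetime.strptime(r["Start_time"], "%m/%d/%Y %H:%M")
        if (r["station_ID"] == station
                and r["Traffic_direction"] == direction
                and r["Road_Lane"] == lane
                and ts.time() == time_of_day):
            selected.append(r)
    return selected


# Hypothetical rows in the layout described earlier in the thread.
records = [
    {"station_ID": "S01", "Start_time": "01/15/2016 10:00", "Road_Lane": 1,
     "Traffic_direction": "N", "Traffic_count": 40},
    {"station_ID": "S01", "Start_time": "01/16/2016 10:00", "Road_Lane": 1,
     "Traffic_direction": "N", "Traffic_count": 35},
    {"station_ID": "S01", "Start_time": "01/16/2016 10:15", "Road_Lane": 1,
     "Traffic_direction": "N", "Traffic_count": 38},
    {"station_ID": "S02", "Start_time": "01/16/2016 10:00", "Road_Lane": 1,
     "Traffic_direction": "N", "Traffic_count": 50},
]
subset = select_series(records, "S01", "N", 1, time(10, 0))
print([r["Traffic_count"] for r in subset])
```

Narrowing the data this way removes variation the model would otherwise have to explain from station, lane and direction, which fits the improvement Leo observed.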
Best regards,
-Leo