Predicting geographic events

tomtom Member Posts: 4 Contributor I
edited November 2018 in Help



I'm new to RapidMiner and predictive analytics. I'm trying to build a model that predicts the likelihood of an event occurring within an area, based on several years of previous data.


I have several years of data on these events, including their lat/long and numerous characteristics of each event. All of these events occurred in and around one city. Ultimately I'd like to lay a grid over a map that divides the city into zones, and then use the model to predict the likelihood of an event occurring within each grid square for a given day of the week, time of day, weather, etc.


I began by choosing an initial area to focus on (the main metro area) and filtered out all the data that was not within that "box". I then used Discretize by Binning on both the X (longitude) and Y (latitude) coordinates to divide that area into zones, and merged the two binned attributes into one zone attribute.
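For illustration, here is a minimal Python sketch of the same gridding step outside RapidMiner. It assumes equal-width bins, and the bounding box coordinates and the 10 x 10 grid size are hypothetical placeholders, not values from this thread.

```python
# Minimal sketch: equal-width binning of lat/long into a grid, then merging
# the two bin indices into one zone ID (like binning X and Y, then merging).
# The bounding box and grid size are hypothetical placeholders.

MIN_LAT, MAX_LAT = 40.70, 40.80    # hypothetical "box" around the metro area
MIN_LON, MAX_LON = -74.02, -73.92
N_BINS = 10                        # 10 x 10 grid


def bin_index(value, lo, hi, n_bins):
    """Return the equal-width bin (0 .. n_bins - 1) that value falls into."""
    if not lo <= value <= hi:
        raise ValueError(f"{value} is outside [{lo}, {hi}]")
    idx = int((value - lo) / (hi - lo) * n_bins)
    return min(idx, n_bins - 1)    # a value exactly on the upper edge goes into the last bin


def zone_id(lat, lon):
    """Merge the Y (latitude) bin and X (longitude) bin into one zone label."""
    y = bin_index(lat, MIN_LAT, MAX_LAT, N_BINS)
    x = bin_index(lon, MIN_LON, MAX_LON, N_BINS)
    return f"y{y}_x{x}"


print(zone_id(40.755, -73.985))  # → y5_x3
```

Keeping the Y and X indices inside the merged label makes it easy to find neighbouring zones later.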


I'm not sure whether I'm on the right track, but at this point I'm stuck on what to do next.


Could anyone point me in the right direction or provide me with a previous tutorial/example for predicting the location of events?







    MartinLiebig Administrator, Moderator, Employee, RapidMiner Certified Analyst, RapidMiner Certified Expert, University Professor Posts: 3,517 RM Data Scientist

    Dear Tom,


    thanks for choosing RapidMiner! I think you are on a very good track. The key thing is to create a profile per event. If you can get some meta information, for example the part of the city or whether the location is on a main street, that would be great. So you need to create attributes describing the place and the time (like "Is it a weekday?"; try Date to Numerical for this).
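As an illustration only, a small Python sketch of the kind of time attributes meant here. The attribute names and the part-of-day cut-offs are hypothetical choices; inside RapidMiner, Date to Numerical is the operator to look at for this.

```python
from datetime import datetime


def part_of_day(hour: int) -> str:
    """Coarse time-of-day bucket (hypothetical cut-offs)."""
    if hour < 6:
        return "night"
    if hour < 12:
        return "morning"
    if hour < 18:
        return "afternoon"
    return "evening"


def time_features(timestamp: str) -> dict:
    """Derive simple time attributes from an ISO-8601 timestamp string."""
    dt = datetime.fromisoformat(timestamp)
    return {
        "day_of_week": dt.isoweekday(),   # 1 = Monday ... 7 = Sunday
        "is_weekday": dt.weekday() < 5,   # Monday to Friday
        "hour": dt.hour,
        "part_of_day": part_of_day(dt.hour),
        "month": dt.month,
    }


print(time_features("2018-11-03T22:15:00"))
# → {'day_of_week': 6, 'is_weekday': False, 'hour': 22, 'part_of_day': 'evening', 'month': 11}
```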

    Afterwards I think it turns into a time series problem: you want to predict how many events there will be tomorrow (or in the next hour). A standard technique for this is windowing: you look at the values of the past X days to predict the next one.
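A minimal sketch of windowing in plain Python. It assumes the events have already been aggregated into one daily count per box. Each training example holds the previous `window_size` days, and the value to predict is the following day. The counts are made up for illustration.

```python
def make_windows(series, window_size):
    """Turn a sequence into (past window, next value) training pairs."""
    examples = []
    for end in range(window_size, len(series)):
        past = series[end - window_size:end]  # the previous window_size values
        target = series[end]                  # the value to predict
        examples.append((past, target))
    return examples


daily_counts = [0, 2, 1, 0, 3, 1]  # hypothetical events per day in one box
for past, target in make_windows(daily_counts, window_size=3):
    print(past, "->", target)
# [0, 2, 1] -> 0
# [2, 1, 0] -> 3
# [1, 0, 3] -> 1
```

Each pair becomes one example: the window values are the attributes and the next value is the label.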


    @BalazsBarany has some experience with the geographic part, I think.




    - Sr. Director Data Solutions, Altair RapidMiner -
    Dortmund, Germany
    BalazsBarany Administrator, Moderator, Employee, RapidMiner Certified Analyst, RapidMiner Certified Expert Posts: 955 Unicorn



    I think you're on the right track. One aspect I'd recommend checking is how similar "boxes" near each other are.

    So if the box you're looking at didn't have a traffic accident in the last month, but the boxes directly to the north and the east did, you could use a value of e.g. 0.2 instead of 0 in your accidents attribute.
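One way to encode that idea, sketched in Python. A box with its own events gets 1.0. Otherwise, each of its four direct neighbours (north, south, east, west) that had events adds a small weight. The weight of 0.1 per neighbour is a hypothetical choice that reproduces the 0.2 above for two active neighbours. Boxes are keyed by hypothetical (y_bin, x_bin) pairs, with y growing northwards and x eastwards.

```python
NEIGHBOR_WEIGHT = 0.1  # hypothetical weight per neighbouring box with events


def smoothed_activity(counts, y, x, weight=NEIGHBOR_WEIGHT):
    """1.0 if the box itself had events; otherwise a small value that grows
    with the number of directly adjacent boxes (N, S, E, W) that had events."""
    if counts.get((y, x), 0) > 0:
        return 1.0
    neighbors = [(y + 1, x), (y - 1, x), (y, x + 1), (y, x - 1)]
    active = sum(1 for cell in neighbors if counts.get(cell, 0) > 0)
    return min(1.0, weight * active)


# Hypothetical event counts for last month, keyed by (y_bin, x_bin).
last_month = {(5, 3): 0, (6, 3): 2, (5, 4): 1}
print(smoothed_activity(last_month, 5, 3))  # → 0.2 (north and east had events)
```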


    The geographic aspect could help insofar as you could assign additional attributes to your boxes based on geographic features, if you have access to them: e.g. the number and length of street segments, the area of parks and other natural areas, the city district, population density by district, etc. A lot of this data can be extracted from OpenStreetMap, for example.


    Here's my post about geographic operations in RapidMiner:


    tomtom Member Posts: 4 Contributor I
    Thanks for your help so far. I'm starting to get a reasonable grasp of my problem and of RapidMiner.

    However, I'm stuck on what is probably a data preparation problem. I have CSV files filled with these events: date, day, time, location, weather and lots of other attributes. But I'm struggling with how to present this information, given that I don't have a label attribute. I only have records of when an event occurred, not of when it didn't.

    My ultimate question is to find out the likelihood of an event occurring within one of the boxes given the day, date, time, weather, nearby events etc.

    How do I transform this data into something that can be fed into a learning algorithm? Do I add to this data set a sort of "blank example" for each time an event didn't occur in a box? I.e. it would have date, time and weather, then an "event occurred" = N attribute, and all the other attributes would be blank?
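The "blank example" idea can be sketched in Python. Enumerate every box and hourly time slot, and label each slot "Y" if at least one recorded event fell into it, otherwise "N". Each row still carries the context of its slot (day of week, hour), and in a real data set the weather for that slot would be joined in from a separate table, which is only marked by a comment here. The zone IDs, the hourly slot length and the example events are assumptions for illustration.

```python
from datetime import datetime, timedelta
from itertools import product


def build_labeled_grid(events, zones, start, hours):
    """One row per (zone, hourly slot); label "Y" if at least one event fell in it.

    events: iterable of (zone_id, datetime) pairs for the recorded events.
    """
    # Truncate each event time to the start of its hour to find its slot.
    occurred = {(zone, ts.replace(minute=0, second=0, microsecond=0))
                for zone, ts in events}
    rows = []
    for zone, h in product(zones, range(hours)):
        slot = start + timedelta(hours=h)
        rows.append({
            "zone": zone,
            "slot": slot,
            "day_of_week": slot.isoweekday(),
            "hour": slot.hour,
            # weather etc. for this slot would be joined in here (not shown)
            "event_occurred": "Y" if (zone, slot) in occurred else "N",
        })
    return rows


events = [("y5_x3", datetime(2018, 11, 3, 22, 15)),
          ("y6_x3", datetime(2018, 11, 3, 23, 40))]
rows = build_labeled_grid(events, ["y5_x3", "y6_x3"],
                          start=datetime(2018, 11, 3, 22), hours=2)
for row in rows:
    print(row["zone"], row["slot"].strftime("%H:%M"), row["event_occurred"])
# y5_x3 22:00 Y
# y5_x3 23:00 N
# y6_x3 22:00 N
# y6_x3 23:00 Y
```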


    Thomas_Ott RapidMiner Certified Analyst, RapidMiner Certified Expert, Member Posts: 1,761 Unicorn

    You can create a label in many ways, but from what you describe I suggest the Generate Attributes operator.


    In the Generate Attributes operator you can create a new column called "label", then in the expression field write an expression like if(!missing(event_column),"event occurred","no event"). This creates a new column based on your event column. Then use Set Role to give the new column the label role, and use Select Attributes to remove event_column (if you want to).


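    See the example below, which uses the col-adj attribute of the Labor-Negotiations sample data as a stand-in for the event column.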

    <?xml version="1.0" encoding="UTF-8"?><process version="7.2.000">
      <operator activated="true" class="process" compatibility="7.2.000" expanded="true" name="Process">
        <process expanded="true">
          <operator activated="true" class="retrieve" compatibility="7.2.000" expanded="true" height="68" name="Retrieve Labor-Negotiations" width="90" x="45" y="34">
            <parameter key="repository_entry" value="//Samples/data/Labor-Negotiations"/>
          </operator>
          <!-- New nominal attribute "label": "Event" where col-adj has a value, "No Event" where it is missing -->
          <operator activated="true" class="generate_attributes" compatibility="7.2.000" expanded="true" height="82" name="Generate Attributes" width="90" x="246" y="34">
            <list key="function_descriptions">
              <parameter key="label" value="if(!missing([col-adj]),&quot;Event&quot;,&quot;No Event&quot;)"/>
            </list>
          </operator>
          <connect from_op="Retrieve Labor-Negotiations" from_port="output" to_op="Generate Attributes" to_port="example set input"/>
          <connect from_op="Generate Attributes" from_port="example set output" to_port="result 1"/>
          <portSpacing port="source_input 1" spacing="0"/>
          <portSpacing port="sink_result 1" spacing="0"/>
          <portSpacing port="sink_result 2" spacing="0"/>
        </process>
      </operator>
    </process>