Predicting Influenza cases with historical data on weather and influenza cases

annicchiarico_a Member Posts: 6 Learner I

Dear RapidMiner Community,

For a university class we have to build a prediction model with RapidMiner. We have managed so far, but we are now stuck and can't seem to find a solution.

This is our current process. We have prepared and joined two data sets with historical data. Data set 1 contains laboratory-confirmed influenza cases; data set 2 contains weather data (maximum temperature, minimum temperature and precipitation). Both data sets are from NYC.

For both data sets we restricted the period to December through March, since those are the known influenza months. We would now like to predict future influenza counts from the weather in that period and in the week before each influenza count. The following XML file shows how we prepared the data. We are not sure whether we made a mistake while preparing it or in our modelling. If you need the original data sets, or an earlier step, we have those too!

We have tried linear regression, polynomial regression, logistic regression and a neural net, but none of our modelling attempts were successful and we are kind of stuck.
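As a rough illustration of the December-to-March restriction described above, here is a minimal Python sketch using only the standard library. The helper names `in_flu_season` and `season_label` are hypothetical and are not part of the RapidMiner process; the sketch only shows that the season spans a year boundary, so December has to be grouped with the following January to March.

```python
from datetime import date

# December through March, as described in the post.
FLU_SEASON_MONTHS = {12, 1, 2, 3}


def in_flu_season(day: date) -> bool:
    """Return True when the date falls in December, January, February or March."""
    return day.month in FLU_SEASON_MONTHS


def season_label(day: date) -> str:
    """Label an in-season date with the season it belongs to.

    December 2018 and February 2019 both belong to season '2018/2019'.
    """
    if not in_flu_season(day):
        raise ValueError(f"{day} is outside the December-March window")
    start_year = day.year if day.month == 12 else day.year - 1
    return f"{start_year}/{start_year + 1}"
```

Labelling rows by season rather than by calendar year can help later if the data are split into training and test sets season by season.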


  • annicchiarico_a Member Posts: 6 Learner I
    edited January 2019

    <?xml version="1.0" encoding="UTF-8"?><process version="9.1.000">
      <operator activated="true" class="process" compatibility="9.1.000" expanded="true" name="Process">
        <parameter key="logverbosity" value="init"/>
        <parameter key="random_seed" value="2001"/>
        <parameter key="send_mail" value="never"/>
        <parameter key="notification_email" value=""/>
        <parameter key="process_duration_for_mail" value="30"/>
        <parameter key="encoding" value="SYSTEM"/>
        <process expanded="true">
          <operator activated="true" class="productivity:execute_process" compatibility="9.1.000" expanded="true" height="68" name="Execute 7 Only Influenza A" width="90" x="45" y="34">
            <parameter key="process_location" value="//Data Management/Processes/7 Only Influenza A"/>
            <parameter key="use_input" value="true"/>
            <parameter key="store_output" value="false"/>
            <parameter key="propagate_metadata_recursively" value="true"/>
            <parameter key="cache_process" value="true"/>
            <list key="macros"/>
            <parameter key="fail_for_unknown_macros" value="true"/>
          <portSpacing port="source_input 1" spacing="0"/>
          <portSpacing port="sink_result 1" spacing="0"/>
  • annicchiarico_a Member Posts: 6 Learner I
    Somehow I'm not able to post the XML code. Could someone please help?
  • lionelderkrikor Moderator, RapidMiner Certified Analyst, Member Posts: 1,195 Unicorn
  • sgenzer Administrator, Moderator, Employee, RapidMiner Certified Analyst, Community Manager, Member, University Professor, PM Moderator Posts: 2,959 Community Manager
    hi @annicchiarico_a just curious...what university and class is this?
  • annicchiarico_a Member Posts: 6 Learner I
    Thank you for your help, Lionel, it should have worked now!

    Hi @sgenzer, we are doing a Master of Arts in Digital Management. This is the first and only class we have in data management, so we are far from experts. The goal of our post is not to get a ready-made answer that saves us the work, but to understand where the problem is so that we know what to change to make it work.
  • annicchiarico_a Member Posts: 6 Learner I
    Basically, what we have done so far is remove missing values, remove attributes we don't need, and limit the period to December through March. When we first joined the two data sets, we only had the weather info for the dates on which we had influenza counts. However, if you get sick today, that's not because it's raining today, right? So we tried it differently with union, but that leaves us with a hell of a lot of missing values, which is probably why none of the modelling is working. It's pretty frustrating at this stage.
  • varunm1 Moderator, Member Posts: 1,207 Unicorn
    Hi @annicchiarico_a

    If possible, can you post your data set and the XML code showing how you are processing it? That would help us understand the problem and help you. Sorry, I got lost somewhere in your post and couldn't understand what you are trying to predict. Are you trying to find the variables that influence the output (influenza)?


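The join problem raised earlier in the thread (a plain join keeps only same-day weather, while a union leaves many missing values) could be approached by attaching the previous week's average weather to each influenza count. The following is a minimal Python sketch of that idea, not the RapidMiner process itself. It assumes weekly case counts keyed by the first day of the week and daily weather records; the function names, the keys `tmax`, `tmin` and `precip`, and all numbers are made up for illustration.

```python
from datetime import date, timedelta
from statistics import mean


def lagged_weather_features(week_start, daily_weather, lag_days=7):
    """Average the weather over the `lag_days` days before `week_start`.

    `daily_weather` maps a date to a dict with the (hypothetical) keys
    'tmax', 'tmin' and 'precip'. Returns None when any day in the window
    is missing, so the caller decides how to handle gaps.
    """
    window = [week_start - timedelta(days=offset) for offset in range(1, lag_days + 1)]
    if any(day not in daily_weather for day in window):
        return None
    return {
        f"{key}_prev_week": mean(daily_weather[day][key] for day in window)
        for key in ("tmax", "tmin", "precip")
    }


def build_training_rows(weekly_cases, daily_weather):
    """Join each weekly case count with the previous week's average weather."""
    rows = []
    for week_start, cases in sorted(weekly_cases.items()):
        features = lagged_weather_features(week_start, daily_weather)
        if features is not None:  # skip weeks without a complete weather window
            rows.append({"week_start": week_start, "cases": cases, **features})
    return rows


# Made-up values, purely to show the shape of the data.
daily_weather = {
    date(2018, 12, 1) + timedelta(days=i): {"tmax": 5.0 + i, "tmin": -1.0, "precip": 0.0}
    for i in range(14)
}
weekly_cases = {date(2018, 12, 8): 40, date(2018, 12, 15): 55}
rows = build_training_rows(weekly_cases, daily_weather)
```

Each resulting row has one case count plus three lagged weather attributes and no missing values, which is the shape most regression learners expect. Whether a one-week lag is the right choice is itself an assumption worth testing.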
