Using Poisson distribution in RapidMiner

Dave0408 Member Posts: 8 Contributor I
edited November 2018 in Help

Hey there,

I'm writing my master's thesis on predictive analytics and text mining, which is how I came across RapidMiner. Getting started with the tool and trying things out was quite easy. Now I would like to use the Poisson distribution to calculate the probabilities of different events, but I couldn't find any operator that supports it. Is there one?

So I installed the R extension and thought I could do this in an R script. Unfortunately, it seems R and Python are not supported in RapidMiner 7.1? I get an error message on every startup.

 

Any ideas or hints on how I could do the Poisson calculations?

My current workaround is to extract the values I need with RapidMiner, export them to an Excel file, and compute the Poisson probabilities manually with an Excel function. Then I read those results back in another process. I imagine there is a handier way to do this.

Thanks.

Kind regards from Germany

Answers

  • Thomas_Ott RapidMiner Certified Analyst, RapidMiner Certified Expert, Member Posts: 1,761 Unicorn

    Hi Dave,

     

    R and Python are supported in RapidMiner 7.1. I'm not sure what kind of error messages you are getting, so please share them.

     

    I believe @mschmitz wrote a Poisson distribution operator for RapidMiner. I would touch base with him.

     

     

  • Dave0408 Member Posts: 8 Contributor I

    Hey @Thomas_Ott,

     

    Thanks for your reply. I'll contact @mschmitz for further information.

     

    As for my error:

    After installing the R Scripting extension and restarting the application, I get an "incompatible extension" warning.

     

    R Scripting 
    Version 7.0.0
    Release date Jan 22, 2016
    File size 89 kB
    License RM_EULA
    Python Scripting 
    Version 7.0.0
    Release date Jan 22, 2016
    File size 78 kB
    License RM_EULA

    rm_incompatible_extension.jpg

     

    I'm using RapidMiner 7.1.001 on my:

    Windows 10 Pro
    Intel Core i5-5200 CPU @2.20Ghz
    16GB RAM
    64-Bit

     

    If you need any other information, let me know.

     

    Dave

     

    P.S.: Other extensions work fine.

     

  • Thomas_Ott RapidMiner Certified Analyst, RapidMiner Certified Expert, Member Posts: 1,761 Unicorn

    This is a silly question, but do you have Python and R installed and configured in the Preferences?

  • Dave0408 Member Posts: 8 Contributor I

    @Thomas_Ott wrote:

    This is a silly question, but do you have Python and R installed and configured in the Preferences?


    Oh, I should have added that info.

    Yes, I did.

    For example, I installed and configured R-3.2.5.

  • MartinLiebig Administrator, Moderator, Employee, RapidMiner Certified Analyst, RapidMiner Certified Expert, University Professor Posts: 3,503 RM Data Scientist

    Hi Dave,

     

    What I built quite a while ago is a Naive Bayes that uses a Poisson distribution instead of a Gaussian one. But that was mainly to learn how to write an operator :).

     

    What do you want to do with Poisson? It seems like something very easy to build as an operator if you can write some Java.


    ~Martin

    - Sr. Director Data Solutions, Altair RapidMiner -
    Dortmund, Germany
  • Dave0408 Member Posts: 8 Contributor I

    Hey,

     

    What I'm trying to do:

     

    I'm trying to predict the outcomes of soccer matches in the German Bundesliga using a lot of historical data (such as Shots on Target, Full Time Goals, ...).

    Once the offensive and defensive strength of each team has been calculated, it should be possible to predict results using the Poisson distribution.

    The result should be the probability of each score: 0:0, 0:1, 1:0, 1:1, and so on...

    For example, something like this, but using RapidMiner instead of Excel.

    The reason I'm not just using Excel is that I would like to combine multiple strategies (Poisson, Team Ranking, Text Mining (RSS feeds), ...), and I think RapidMiner would be perfect for aggregating that data and giving me a final result.
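
A minimal Python sketch of the scoreline idea described above, for illustration only. It assumes that expected goals are the product of one team's attack strength, the opponent's defence strength, and a league average, and that home and away goals are independent Poisson variables. Neither assumption is stated in the thread, and every number and name below is hypothetical rather than real Bundesliga data.

```python
import math


def poisson_pmf(k, lam):
    """P(X = k) for a Poisson variable with mean lam."""
    return math.exp(-lam) * lam ** k / math.factorial(k)


def scoreline_grid(home_xg, away_xg, max_goals=5):
    """Probability of each (home, away) score, treating both goal counts as independent."""
    return {
        (h, a): poisson_pmf(h, home_xg) * poisson_pmf(a, away_xg)
        for h in range(max_goals + 1)
        for a in range(max_goals + 1)
    }


# Hypothetical inputs, not real league figures.
league_avg_home_goals = 1.6
league_avg_away_goals = 1.2
home_attack, home_defence = 1.2, 0.9
away_attack, away_defence = 0.8, 1.1

# Assumed combination rule: attack strength x opposing defence strength x league average.
home_xg = home_attack * away_defence * league_avg_home_goals
away_xg = away_attack * home_defence * league_avg_away_goals

grid = scoreline_grid(home_xg, away_xg)
for home, away in [(0, 0), (0, 1), (1, 0), (1, 1)]:
    print(f"{home}:{away}  {grid[(home, away)]:.3f}")
```

The grid is truncated at `max_goals`, so its probabilities sum to slightly less than 1. The remainder is the chance that either side scores more than that.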

     

  • MartinLiebig Administrator, Moderator, Employee, RapidMiner Certified Analyst, RapidMiner Certified Expert, University Professor Posts: 3,503 RM Data Scientist

    Dave,

     

    I think you can do all of that inside RM. You need to calculate avg(#League) and evaluate the Poisson distribution. While Poisson is not included in GenA (Generate Attributes), it should be easy to calculate by copy/pasting the right formula. Since you do not expect values above 10, there shouldn't be any problem.
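
For reference, the formula in question is presumably the standard Poisson probability mass function, with \lambda the expected number of goals and k the goal count being evaluated. The worked value uses \lambda = 1.5, a made-up figure chosen only for illustration:

```latex
P(X = k) = \frac{\lambda^{k}\, e^{-\lambda}}{k!},
\qquad
P(X = 2 \mid \lambda = 1.5) = \frac{1.5^{2}\, e^{-1.5}}{2!} \approx 0.251
```

With k no larger than 10, the factorial is at most 10! = 3,628,800, so the expression stays well within normal numeric range.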

     

    Otherwise you can easily use R/Python/JavaScript to build the new column. It might be nicer to use Poisson from SciPy or similar.

     

    Please be a bit careful with this article. It makes a data science mistake: it takes the average over the whole season to calculate the value. Technically, you should only use the values up to the match date, because otherwise you leak label information into the features.
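
One way to picture the difference, as a rough sketch rather than a recommended pipeline: for each match, average only the goals from matches played before it. The function name and the goal counts below are hypothetical.

```python
def prior_averages(goals):
    """For each match, the mean goals over all earlier matches (None for the first match)."""
    averages = []
    running_total = 0
    for played, scored in enumerate(goals):
        # 'played' is the number of matches completed before this one.
        averages.append(running_total / played if played else None)
        running_total += scored
    return averages


# Hypothetical goals scored by one team, in match-date order.
goals = [2, 0, 3, 1]

print(prior_averages(goals))    # per-match feature built from past matches only
print(sum(goals) / len(goals))  # whole-season mean, which also uses future matches
```

The first list can be used as a feature for each match because it never looks ahead. The whole-season mean already contains the result that is being predicted.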

     

    ~Martin
