RapidMiner

RAPIDMINER DATA SCIENCE COMPETITION: FARMING ON "MARS" – SEPTEMBER 12 TO OCTOBER 13, 2017

Community Manager

Hello all community members -

Welcome to the 2nd RapidMiner Data Science Competition: Farming on "Mars"!  

 

We and our sponsor are excited to bring this open competition to our 270,000+ users, and we hope that you have a great time exploring this unique use case.  Below is a brief summary of the competition and its rules; complete documentation can be found in the attachments below.  PLEASE READ all the attached documentation before beginning the competition, and let the best model win!

 

Summary

 

One of the major challenges of the human colonization of “Mars” is the introduction of Earth-independent food production facilities, i.e. farming. A key element to farming on “Mars” will be the fertilization of available soil, which in its current state is not farmable due to a lack of nutrients.  In order to address this, an experimental setup has been created under “Martian” environmental conditions to produce bio-fertilizer made from algae and measure the usable yield after each production run.  This yield varies based on the exact quantities of certain base nutrients and the optional addition of one of two possible additional nutrients, α or β, inserted into the bio-fertilizer at some time t during the production run.  The research facility has already done 1653 production runs, each one lasting 36 hours with 41 sensors recording data every hour, and recorded the potential yield of each one.  These are your data to work with during this challenge.

 

Challenge

 

The goal of the challenge is to build a model that will classify which additional nutrient, α or β, and at what time t, will be most likely to boost yield during a production run.  The metric to be optimized is the cumulative score value of the same 178 production runs in the test set; the baseline example above has a cumulative score value of 1000.
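
As a rough illustration of the metric, the sketch below sums per-run scores and compares the total with the baseline of 1000 quoted above. The function names and the sample scores are hypothetical, and how each per-run score is derived is defined in the attached documentation, not here.

```python
BASELINE_SCORE = 1000.0  # cumulative score of the baseline example, per the post


def cumulative_score(per_run_scores):
    """Sum the scores of all production runs in a submission."""
    return sum(per_run_scores)


def meets_baseline(per_run_scores, baseline=BASELINE_SCORE):
    """Return True if the cumulative score is at least the baseline."""
    return cumulative_score(per_run_scores) >= baseline


# Hypothetical scores for three runs (the real test set has 178 runs).
sample_scores = [62.5, -100.0, 54.0]
print(cumulative_score(sample_scores))  # 16.5
print(meets_baseline(sample_scores))    # False
```

A wrong nutrient choice on a single run costs far more than a well-timed insertion gains, so the total is sensitive to misclassifications.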

 

Submission and Evaluation

 

All submissions in this competition need to be posted in this thread with the entire XML of the process and the score. This includes the finished models, as well as the entire training process and all pre-processing steps.  The deadline for submissions is October 13, 2017 at 23:59:59 UTC.

 

RapidMiner Server Instance

 

In order to increase the efficiency of model training and to demonstrate RapidMiner’s powerful parallel processing capabilities with its new SaaS on Amazon AWS EC2, RapidMiner has agreed to provide a free Server EC2 instance for all participants for the duration of this competition. This server instance can be used by any participant free of charge, as often as desired, as long as all use is restricted to this competition only. Participants wishing to use this server must send @sgenzer a private message to register and obtain the relevant connection details.  The instance URL is https://competitions.rapidminer.com and will be online only for the duration of the competition.

 

Winner and Prizes

 

The winner of the competition will be the submission with the highest aggregate score (which must be ≥ 1000) over the 178 test production runs, after the submitted models are applied to the test dataset. All submissions will be validated by RapidMiner and the competition’s sponsor within 72 hours after their submission. The winners of this RapidMiner Data Science Challenge will be announced by October 17, 2017 in the competition’s thread.

 

RapidMiner and the competition sponsor will award the following prizes to the winners:

 

1st place:     US$1000

2nd place:    US$250

3rd place:     US$100

 

PLUS, all participants who submit a valid entry in the thread prior to the deadline will be eligible to win one or more amazing RapidMiner “swag” items.  Supplies are limited and will be awarded on a first-come, first-served basis.

 

Restrictions

 

All participants of the RapidMiner Data Science Competitions must be registered users in good standing of the RapidMiner User Community and age 18 or older at the time of entry.  Employees, directors, consultants, and any other persons affiliated with RapidMiner, Inc. are not eligible to participate in this competition.

 

Good luck everyone and reply to this thread with questions and your models!

 

Scott

 

Links:
Training Data Set
Test Data Set
Annotated Data Set Example

Scott Genzer
Senior Community Manager
RapidMiner, Inc.
59 REPLIES
Contributor I bigD

Hi Scott,

Looks like an interesting problem :)  It appears that 'run 1341' in the test dataset may be corrupted.

Cheers

Dan

Community Manager

Hi Dan -

 

Hmm.  I just downloaded the zip from the link above and I see no problems with the files.

Download again?

 

https://rapidminer-my.sharepoint.com/personal/sgenzer_rapidminer_com/_layouts/15/guestaccess.aspx?do...

 

Scott

Scott Genzer
Senior Community Manager
RapidMiner, Inc.
Contributor I bigD

I guess it does have run 1341 but it also has a corrupted fragment at the bottom of the list.  I'll just delete it.

D.

 

Learner III 16B543J

Hi,

 

I would like to clarify a few points on the explanation given.

"These are the production yield increases for the production run at each hour of production.  For this example, all yield increases for nutrient A (column AS) will be scored as invalid (-100) because it was shown later that nutrient B was needed (see cell C10).  For column AT, the score is determined by which hour nutrient B was inserted: if nutrient B was inserted at t=0, score = 62.5  If nutrient B was inserted at t=5 hours, score = 59.5.  If nutrient A was inserted at t = 24 hours, score = 54.3"

 

1. If nutrient B was inserted at t=5 hours, score = 59.5. It should be 59.9.

2. If nutrient A was inserted at t = 24 hours, score = 54.3. This statement is true only when the Label is equal to "A".

 

Pls clarify. Thank you.

 

Community Manager

hello @16B543J - thanks for your questions.  I am assuming you are referring to the annotated training set 1?  Here are my answers.

 

"These are the production yield increases for the production run at each hour of production.  For this example, all yield increases for nutrient A (column AS) will be scored as invalid (-100) because it was shown later that nutrient B was needed (see cell C10)."

 

1. Yes that is correct.

 

"For column AT, the score is determined by which hour nutrient B was inserted: if nutrient B was inserted at t=0, score = 62.5  If nutrient B was inserted at t=5 hours, score = 59.5.  If nutrient A was inserted at t = 24 hours, score = 54.3"

 

1. If nutrient B was inserted at t=5 hours, score = 59.5. It should be 59.9.

2. If nutrient A was inserted at t = 24 hours, score = 54.3. This statement is true only when the Label is equal to "A".

 

2. I'm not really sure what your question is.  For the annotated training set 1, if nutrient B was inserted at t=5, the score would be 59.9.  And if nutrient B was inserted at t=24 hrs, the score would be 54.3.  If nutrient A is inserted at any time, score = -100.
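
The rule described in this reply can be sketched as a small lookup. This is an illustration only: the function and variable names are made up, and the table holds just the three hourly values quoted in this thread for annotated training set 1.

```python
INVALID_SCORE = -100.0  # score quoted in the thread for choosing the wrong nutrient


def score_prediction(required_nutrient, yield_by_hour, chosen_nutrient, t):
    """Score the insertion of chosen_nutrient at hour t for a single run.

    yield_by_hour maps an insertion hour to the yield increase of the
    nutrient the run actually needed (hypothetical data layout).
    """
    if chosen_nutrient != required_nutrient:
        return INVALID_SCORE
    return yield_by_hour[t]


# Annotated training set 1 needs nutrient B; only the hours quoted above are listed.
example_yield_b = {0: 62.5, 5: 59.9, 24: 54.3}

print(score_prediction("B", example_yield_b, "B", 5))   # 59.9
print(score_prediction("B", example_yield_b, "B", 24))  # 54.3
print(score_prediction("B", example_yield_b, "A", 24))  # -100.0
```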

 

Thanks and good luck!

 

Scott

Scott Genzer
Senior Community Manager
RapidMiner, Inc.
Learner III 16B543J

Thanks Scott for the clarification.

RM Certified Expert

Hello Scott

 

I noticed that around 7% of the rows contain missing values for the attributes sensor41, yieldIncreaseA and yieldIncreaseB. For example, trainingset 1001 shows this. Is this intentional?
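
One way to check a figure like this is sketched below with the Python standard library. The column names come from this post; everything else is an assumption made for illustration: that each run is a CSV file with a header row, that an `hour` column exists, and that missing values appear as an empty cell, `?` or `NaN`.

```python
import csv
import io

COLUMNS = ("sensor41", "yieldIncreaseA", "yieldIncreaseB")
MISSING_TOKENS = {"", "?", "NaN"}  # assumed encodings of a missing value


def missing_row_fraction(rows, columns=COLUMNS):
    """Return the fraction of rows with a missing value in any given column."""
    rows = list(rows)
    if not rows:
        return 0.0
    incomplete = sum(
        1
        for row in rows
        if any((row.get(col) or "").strip() in MISSING_TOKENS for col in columns)
    )
    return incomplete / len(rows)


# Tiny made-up sample standing in for one training file.
sample_csv = """hour,sensor41,yieldIncreaseA,yieldIncreaseB
0,1.2,3.4,5.6
1,,,
2,0.9,3.1,5.0
3,1.0,3.0,4.9
"""
fraction = missing_row_fraction(csv.DictReader(io.StringIO(sample_csv)))
print(f"{fraction:.0%} of rows have a missing value")  # 25%
```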

 

Andrew

RM Certified Expert

Hello Scott

 

Could you change the annotation in cell AS:5 in the worked example to match your reply, to avoid confusing later readers?

 

regards

 

Andrew

Community Manager

Hello @Andrew - thank you for the feedback.  I finally got the aha moment about what @16B543J was referring to yesterday, i.e. the text explanation in the pink boxes.  I think I have looked at that so many times that I glanced over it completely.  My apologies.  I will update the file in a few minutes.

 

As for your question about missing values, yes, there are many.  These are actually real data from our sponsor and hence there are all sorts of wonky things in them.  :)

 

Scott

 

Scott Genzer
Senior Community Manager
RapidMiner, Inc.