Noob question? Finding the maximum of a subset of data
Hi,
I have a data source (CSV) with TestRun,Time,Result.
TestRun identifies when a test was run.
Time is time since the start of the test.
Result is the measured value at that time.
I have a large number of unique TestRuns. I'd like to perform some calculations, but for the life of me I can't figure out how any of the loops etc. work. I've also tried the tutorials, the time series extension etc., and tried cutting and pasting XML from other answers in this forum. Nothing seems to work. (BTW, there is a smiley face in one of the XML examples, which breaks it.)
The calculations I'd like to perform are:
What is the maximum value of Result for each TestRun? (From this result, I can find the earliest time at which the maximum occurs.)
I can think of a number of ways to do this with a programming language, but just can't get my head around doing it within RapidMiner.
Any help would be appreciated.
Answers
Hi,
I think Aggregate does the trick if you only need the maximum. For more complex calculations with a "group by" on TestRun, I would use Loop Values. I attach two processes to demonstrate the two ideas.
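For readers who think in code first, here is a rough Python sketch of what a "group by TestRun, maximum of Result" aggregation computes. It is not a RapidMiner process and not the attached one; the sample rows and the function name `max_result_per_run` are made up for illustration, and only the column names come from the question.

```python
import csv
import io

# Hypothetical sample data in the TestRun,Time,Result layout from the question.
SAMPLE_CSV = """TestRun,Time,Result
A,0,1.0
A,1,3.5
A,2,3.5
B,0,2.0
B,1,1.5
"""


def max_result_per_run(csv_text):
    """Return {TestRun: maximum Result}, i.e. a group-by with a max aggregate."""
    maxima = {}
    for row in csv.DictReader(io.StringIO(csv_text)):
        run = row["TestRun"]
        result = float(row["Result"])
        # Keep the largest Result seen so far for this TestRun.
        if run not in maxima or result > maxima[run]:
            maxima[run] = result
    return maxima


print(max_result_per_run(SAMPLE_CSV))
```

The point of the sketch is that the grouping happens in a single pass over the data, with no explicit loop per TestRun, which is roughly why an aggregation tends to be faster than looping over each value.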
Best,
Martin
Easy Aggregation
Loop Values to generate average(cumulative sum) per class
Dortmund, Germany
Thanks for the fast response Martin. Aggregate is what I was looking for - it is much faster than using the loop function, and I'm starting to see how the loop function works.
I'm guessing that my next problem will require the loop function. I now have a maximum value, and I want to find the corresponding time at which the maximum occurred. If there are several maximum events, I only want to see the first one.
So, I think I need to multiply my original example set so I can keep the data through the aggregation, then join that example set with the example set containing the aggregated maximums, then loop by TestRun with a filter on testrun=%{loop_value} and Result=maximum(result). That should give me a resulting example set that includes the maximum, and the time of the maximum, for each TestRun.
Or am I overthinking it? It sounds way too complicated for what I'm trying to do...
I think you are overthinking it a bit. Or rather, you are thinking a bit too much in a programming world and not in an ETL/SQL world.
Why don't you join the max per TestRun onto the original data and take the first match (with Filter Examples to remove missings, and Remove Duplicates to take the first)? That should be way faster and easier to build.
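As a rough illustration of the join-then-deduplicate idea, here is a Python sketch of one possible reading of those steps. It is not a RapidMiner process; the sample rows and the function name `first_time_of_max` are hypothetical, and the mapping of each step to an operator in the comments is an assumption.

```python
import csv
import io

# Hypothetical sample data in the TestRun,Time,Result layout from the question.
SAMPLE_CSV = """TestRun,Time,Result
A,0,1.0
A,1,3.5
A,2,3.5
B,0,2.0
B,1,1.5
"""


def first_time_of_max(csv_text):
    """Return {TestRun: (earliest Time of the maximum, maximum Result)}."""
    rows = [
        {"TestRun": r["TestRun"], "Time": float(r["Time"]), "Result": float(r["Result"])}
        for r in csv.DictReader(io.StringIO(csv_text))
    ]

    # Step 1 (roughly Aggregate): maximum Result per TestRun.
    maxima = {}
    for r in rows:
        maxima[r["TestRun"]] = max(r["Result"], maxima.get(r["TestRun"], r["Result"]))

    # Step 2 (roughly Join + Filter Examples): keep only the rows whose Result
    # equals the maximum of their TestRun; all other rows have no match.
    at_max = [r for r in rows if r["Result"] == maxima[r["TestRun"]]]

    # Step 3 (roughly Remove Duplicates): sort by Time and keep the first row
    # per TestRun, so ties on the maximum resolve to the earliest Time.
    first = {}
    for r in sorted(at_max, key=lambda r: r["Time"]):
        first.setdefault(r["TestRun"], (r["Time"], r["Result"]))
    return first


print(first_time_of_max(SAMPLE_CSV))
```

In the sample, TestRun A reaches its maximum of 3.5 at both Time 1 and Time 2, and only the row at Time 1 survives. The sketch sorts by Time explicitly; whether the data needs sorting before the duplicates are removed depends on the order of the original CSV.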
~Martin