
"Time series"

DocMusher Member Posts: 329   Unicorn
edited June 2019 in Help
Hi,
I am very much attracted to the way stock market specialists use data mining with time series.
The data I use has multiple attributes. Each attribute value comes with a timestamp and the ID of the patient from whom it was recorded.
The problem is that the timestamps run 24/7 without any pattern, so if I want to compare patients I need to shift them as if they happened in the same timeframe.
Which template or approach best suits this task, in order to discriminate between 2 groups with a label: survived, non-survived?
I suppose I will be able to find the weights of the different attributes, but I am also interested in finding the threshold at which an attribute or a combination of attributes results in another label (e.g. from survival to non-survival). These findings could help answer which attributes are important, which attribute values determine the outcome (label), and when (a combination of attributes, values, and timestamp or timeframe) the label switches from survivor to non-survivor or the reverse.
Who is willing to give me some feedback?
Cheers
Sven

Answers

  • mschmitz Administrator, Moderator, Employee, RapidMiner Certified Analyst, RapidMiner Certified Expert, University Professor Posts: 3,127  RM Data Scientist
    Hi Sven,

    I think I proposed a similar idea in another thread of yours.
    What you might do is refine the problem a bit into: Does the patient survive the next X hours?

    For the non-survivors you take a window of X hours before the time of death and analyse this timeframe (extract mean, std, peaks etc. of the different time series). For the survivors you take a random point on the time axis and again look X hours back to compute the values.
    Then you can do the usual supervised learning on this structured data. The resulting confidence can then be used much like an ECG.

    Cheers,
    Martin
    - Head of Data Science Services at RapidMiner -
    Dortmund, Germany
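
The windowing idea described above (look X hours back from the time of death, or from a random point for survivors, and summarise each series) could be sketched roughly as follows. This is a plain-Python illustration, not a RapidMiner process: the function names `window_features` and `reference_time`, the choice of population standard deviation, and the heart-rate numbers are all assumptions made for the example.

```python
import random
from datetime import datetime, timedelta
from statistics import mean, pstdev


def window_features(readings, end_time, hours):
    """Summarise the readings in the `hours` before `end_time`.

    readings: list of (timestamp, value) pairs for one patient and one attribute.
    Returns None when the window contains no readings.
    """
    start = end_time - timedelta(hours=hours)
    values = [v for t, v in readings if start <= t <= end_time]
    if not values:
        return None
    return {
        "mean": mean(values),
        "std": pstdev(values),  # population std; an assumption for this sketch
        "peak": max(values),
        "n": len(values),
    }


def reference_time(readings, death_time, rng):
    """Non-survivors: the time of death. Survivors: a randomly chosen reading time."""
    if death_time is not None:
        return death_time
    return rng.choice([t for t, _ in readings])


# Made-up heart-rate readings for one hypothetical patient.
t0 = datetime(2005, 1, 1, 0, 0)
readings = [(t0 + timedelta(hours=h), v)
            for h, v in [(0, 80), (5, 90), (9, 100), (11, 120)]]

# Non-survivor who died 12 hours after t0, with X = 6 hours.
end = reference_time(readings, t0 + timedelta(hours=12), random.Random(0))
features = window_features(readings, end_time=end, hours=6)
print(features)
```

One row of such features per patient, together with the survival label, gives the structured table on which ordinary supervised learning can run.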
  • JEdward RapidMiner Certified Analyst, RapidMiner Certified Expert, Member Posts: 574   Unicorn
    Alongside the windowing of X hours before the time of death, I'd also be tempted to add attribute flags for some variables as RFM (Recency, Frequency, Medical) over longer periods of time: hours, days, weeks, months. For example, if someone has suffered a cardiac arrest in the last week, I would wager that their odds of survival are lower than those of someone whose cardiac arrest was within the last year.

    For example:
    PatientID | HeartAttack_LastWeek | HeartAttack_Over2yearsAgo | Survival
    1 | 0 | 2 | yes
    2 | 1 | 0 | no

    Which attributes are important will probably take some time to discover. Whilst heart attack is a yes/no flag, other attributes would just be high, low, StdDev etc. readings for a time period.
    This will help you determine not only which attributes are important indicators, but also what the important time period for treatment is. If intervening a few hours earlier or later on an indicator can boost a patient's survival rate, that would be brilliant.
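
The look-back flags described above could be computed along these lines. This is a minimal sketch in plain Python, assuming event timestamps are available per patient; the function name `event_counts`, the window boundaries, and the dates are hypothetical, chosen so that the output reproduces the two rows of the small table above.

```python
from datetime import datetime, timedelta


def event_counts(event_times, reference_time, windows):
    """Count events per look-back window before `reference_time`.

    windows: dict of column name -> (min_age, max_age) as timedeltas.
    An event is counted in a window when min_age <= age < max_age.
    """
    counts = {name: 0 for name in windows}
    for t in event_times:
        age = reference_time - t
        if age < timedelta(0):
            continue  # ignore events after the reference time
        for name, (min_age, max_age) in windows.items():
            if min_age <= age < max_age:
                counts[name] += 1
    return counts


# Window boundaries are assumptions for this sketch.
WINDOWS = {
    "HeartAttack_LastWeek": (timedelta(0), timedelta(days=7)),
    "HeartAttack_Over2yearsAgo": (timedelta(days=730), timedelta.max),
}

ref = datetime(2008, 1, 1)  # hypothetical reference time
patient_1 = event_counts([datetime(2004, 3, 1), datetime(2005, 6, 1)], ref, WINDOWS)
patient_2 = event_counts([datetime(2007, 12, 29)], ref, WINDOWS)
print(patient_1)
print(patient_2)
```

Adding further windows (last day, last month, last year) is only a matter of extending the `WINDOWS` dictionary, which makes it easy to test which time horizon carries the signal.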
  • DocMusher Member Posts: 329   Unicorn
    Hi,
    Thank you both for your replies. In fact the database I use provides for every patient a value related to an attribute at a particular timestamp.
    ID  Att  Value  Time
    1  1      200   
    This generates a multivariate time series for each patient and, by averaging over patients, an average multivariate time series for all patients. Because the data I use is historical (2001-2008), each patient already has a label for survival.

    The aim of the analysis is to 1/ quantify differences in time series of different patient populations 2/ find correlations between attributes and 3/ find predictive changes of one or more attribute ending in the labeled outcome.

    Cheers
    Sven
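
The long format above (ID, Att, Value, Time) can be regrouped into one series per patient and attribute, with time re-expressed relative to a per-patient reference point so that patients recorded at different calendar times become comparable. The sketch below is plain Python and makes several assumptions: timestamps are plain numbers in hours, the reference point is each patient's first reading (admission or time of death would work the same way), and the function name `to_relative_series` and all values are invented for illustration.

```python
from collections import defaultdict


def to_relative_series(rows):
    """Regroup long-format rows into per-patient, per-attribute series.

    rows: iterable of (patient_id, attribute, value, time_in_hours).
    Returns {patient_id: {attribute: [(hours_since_first_reading, value), ...]}}.
    """
    rows = list(rows)

    # Reference point per patient: the earliest timestamp seen (an assumption).
    first = {}
    for pid, _, _, t in rows:
        first[pid] = min(t, first.get(pid, t))

    series = defaultdict(lambda: defaultdict(list))
    for pid, att, value, t in rows:
        series[pid][att].append((t - first[pid], value))

    # Sort each series by relative time.
    for per_patient in series.values():
        for points in per_patient.values():
            points.sort()
    return {pid: dict(atts) for pid, atts in series.items()}


# Made-up rows: (ID, Att, Value, Time in hours).
rows = [
    (1, 1, 200, 1000),
    (1, 2, 37.5, 1002),
    (1, 1, 180, 1004),
    (2, 1, 150, 50),
    (2, 1, 160, 53),
]
aligned = to_relative_series(rows)
print(aligned)
```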
  • DocMusher Member Posts: 329   Unicorn
    Hi,
    How could the following chart be generated using RM? [image]
    As I understand it, this tool can play a Scatter Multiple chart over time, so that you can observe whether correlations are fixed or vary over time.
    Cheers
    Sven
  • mschmitz Administrator, Moderator, Employee, RapidMiner Certified Analyst, RapidMiner Certified Expert, University Professor Posts: 3,127  RM Data Scientist
    Hey,

    What about a scatter matrix?

    Cheers,
    Martin
    - Head of Data Science Services at RapidMiner -
    Dortmund, Germany
  • DocMusher Member Posts: 329   Unicorn
    Hi,
    This gives an additional way to present correlation. Let's say there are 50 attributes with values, which give a scatter matrix at one moment (T1). At T2 and T3, the correlation may or may not change. If it does not change, is the relation between 2 attributes time independent?
    Sven
  • JEdward RapidMiner Certified Analyst, RapidMiner Certified Expert, Member Posts: 574   Unicorn
    If it's not available I think it could be useful to work out how much effort it will be to add it. 
    I'll add it to the user stories I'm collecting for an extension we're planning with improved visualisation options. 

    One way to almost generate it would be to use a Loop operator to produce multiple outputs for different time values as a collection.  You can then click through the objects in the collection and view them in the Scatter Matrix plot. 
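
Outside RapidMiner, the same loop-over-time idea can be illustrated numerically: compute the correlation between two attributes separately for each time slice and compare the results. The sketch below is plain Python with a hand-written Pearson correlation; the function names, the attribute names `hr` and `temp`, and all values are made up for the example and do not come from the thread's data.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation of two equally long sequences (NaN if either is constant)."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return float("nan")
    return cov / sqrt(var_x * var_y)


def correlation_by_time(slices, a, b):
    """slices: {time_label: [one dict per patient, attribute -> value]}.

    Returns {time_label: correlation between attributes a and b in that slice}.
    """
    return {
        label: pearson([row[a] for row in rows], [row[b] for row in rows])
        for label, rows in slices.items()
    }


# Made-up values for three hypothetical patients at two time points.
slices = {
    "T1": [{"hr": 60, "temp": 36}, {"hr": 70, "temp": 37}, {"hr": 80, "temp": 38}],
    "T2": [{"hr": 60, "temp": 38}, {"hr": 70, "temp": 37}, {"hr": 80, "temp": 36}],
}
by_time = correlation_by_time(slices, "hr", "temp")
print(by_time)
```

If the per-slice values stay close to each other, the relation between the two attributes looks stable over time; if they drift or flip sign, as in this toy data, it is time dependent.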