"Time series"

DocMusher · June 2015

Hi,
I am very much attracted to the way stock market specialists use data mining with time series.
The data I use has multiple attributes. Each attribute has a value, a timestamp and the patient (ID) from which the value was generated at that moment.
The problem here is that the timestamps run 24/7 without any pattern, so if I want to compare patients I need to shift them as if they happened in the same timeframe.
Which template or approach best suits this task, in order to discriminate between 2 groups having a label: survived, non-survived?
I suppose I will be able to find the weights of the different attributes, but I am also interested in finding the threshold where an attribute or a combination of attributes results in another label (e.g. from survival to non-survival). These findings could facilitate the search for which attributes are important, which values of attributes determine the outcome (label) and when (a combination of attributes, values, timestamp or frame) the label switches from survivor to non-survivor or the reverse.
Who is willing to give me some feedback?
Cheers
Sven
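
One possible reading of "shifting patients into the same timeframe" is to replace each absolute timestamp with the hours elapsed since that patient's first record. The sketch below assumes a long-format table; the column names (`patient_id`, `attribute`, `value`, `time`), the helper `align_to_first_record` and the sample values are all hypothetical, not taken from the actual database.

```python
import pandas as pd

# Hypothetical long-format records: one row per (patient, attribute, timestamp).
records = pd.DataFrame({
    "patient_id": [1, 1, 1, 2, 2],
    "attribute": ["hr", "hr", "hr", "hr", "hr"],
    "value": [80.0, 95.0, 110.0, 70.0, 72.0],
    "time": pd.to_datetime([
        "2005-03-01 22:00", "2005-03-02 01:00", "2005-03-02 04:00",
        "2007-11-15 09:30", "2007-11-15 12:30",
    ]),
})

def align_to_first_record(df):
    """Add a column with the hours elapsed since each patient's first timestamp."""
    out = df.copy()
    # Earliest timestamp per patient, broadcast back to every row of that patient.
    first_seen = out.groupby("patient_id")["time"].transform("min")
    out["hours_since_start"] = (out["time"] - first_seen).dt.total_seconds() / 3600.0
    return out

aligned = align_to_first_record(records)
print(aligned[["patient_id", "hours_since_start", "value"]])
```

Aligning on the first record is only one choice of reference point; aligning on admission time or on the outcome time may suit the comparison better.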

MartinLiebig · June 2015

Hi Sven,

I think I proposed a similar idea in another thread of yours.
What you might do is refine the problem a bit into: Does the patient survive the next X hours?

For the non-survivors you take a window of X hours before the time of death and analyse this timeframe (extract mean, std, peaks etc. of the different time series). For the survivors you take a random point on the time axis and again look X hours back to compute the values.
Then you can do the usual supervised learning on this structured data. The resulting confidence can be used much like an ECG.

Cheers,
Martin
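
The windowing idea above could be sketched outside RapidMiner roughly as follows. This is a minimal pandas illustration, not a RapidMiner process: the function `window_features`, the column names and the sample values are assumptions, and the survivor's anchor time is fixed here for reproducibility where the post suggests a random point.

```python
import pandas as pd

def window_features(df, anchor_times, window_hours):
    """Summarise each patient's values in the window_hours before an anchor time.

    df: long-format frame with columns patient_id, attribute, value, time.
    anchor_times: dict patient_id -> pd.Timestamp (time of death for
        non-survivors, a chosen reference point for survivors).
    """
    anchors = df["patient_id"].map(anchor_times)
    start = anchors - pd.Timedelta(hours=window_hours)
    # Keep only the measurements that fall inside each patient's window.
    in_window = df[(df["time"] > start) & (df["time"] <= anchors)]
    stats = in_window.groupby(["patient_id", "attribute"])["value"].agg(["mean", "std", "max"])
    # One row per patient, one column per (attribute, statistic) pair.
    wide = stats.unstack("attribute")
    wide.columns = [f"{attr}_{stat}" for stat, attr in wide.columns]
    return wide

# Hypothetical data: patient 1 did not survive, patient 2 survived.
records = pd.DataFrame({
    "patient_id": [1, 1, 1, 2, 2, 2],
    "attribute": ["hr"] * 6,
    "value": [80.0, 100.0, 120.0, 70.0, 72.0, 74.0],
    "time": pd.to_datetime([
        "2005-03-02 00:00", "2005-03-02 02:00", "2005-03-02 04:00",
        "2007-01-01 10:00", "2007-01-01 11:00", "2007-01-01 12:00",
    ]),
})
anchor_times = {
    1: pd.Timestamp("2005-03-02 04:00"),  # time of death
    2: pd.Timestamp("2007-01-01 12:00"),  # reference point for the survivor
}

features = window_features(records, anchor_times, window_hours=3)
features["survived"] = [False, True]  # labels for patients 1 and 2
print(features)
```

The resulting table has one row per patient and can be passed to any standard classifier.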

JEdward · June 2015

Alongside the windowing of X hours before time of death, I'd also be tempted to look into adding attribute flags of some variables as RFM (Recency, Frequency, Medical) over a longer period of time: hours + days + weeks + months. For example, if someone has suffered a cardiac arrest in the last week, I would wager that their odds of survival are lower than those of someone who has had a cardiac arrest in the last year.

For example:
PatientID | HeartAttack_LastWeek | HeartAttack_Over2yearsAgo | Survival
1 | 0 | 2 | yes
2 | 1 | 0 | no

Which attributes are important or not will probably take some time to discover. Whilst HeartAttack is a yes/no flag, other attributes would just be high, low, StdDev etc. readings for a time period.
This will help you determine not only which attributes are important indicators, but also what the important time period for treatment is. If intervening a few hours earlier or later on an indicator can boost a patient's survival rate, that would be brilliant.
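
The recency columns in the example table could be derived from an event log along these lines. This is a sketch under assumptions: the event table, the helper `event_recency_counts`, the reference times and the 730-day cut-off for "over 2 years" are all illustrative, and the events frame is assumed to contain heart attack events only.

```python
import pandas as pd

def event_recency_counts(events, reference_times):
    """Count each patient's events in two recency buckets relative to a reference time.

    events: frame with columns patient_id and time (one row per event).
    reference_times: dict patient_id -> pd.Timestamp to measure recency from.
    """
    ref = events["patient_id"].map(reference_times)
    age = ref - events["time"]  # how long before the reference time the event happened
    flags = pd.DataFrame({
        "patient_id": events["patient_id"],
        "HeartAttack_LastWeek": (
            (age >= pd.Timedelta(0)) & (age <= pd.Timedelta(days=7))
        ).astype(int),
        "HeartAttack_Over2yearsAgo": (age > pd.Timedelta(days=730)).astype(int),
    })
    # Patients without any event do not appear in the result.
    return flags.groupby("patient_id").sum()

# Hypothetical events chosen to reproduce the two rows of the example table.
events = pd.DataFrame({
    "patient_id": [1, 1, 2],
    "time": pd.to_datetime(["2002-01-10", "2003-06-01", "2007-12-28"]),
})
reference_times = {1: pd.Timestamp("2008-01-01"), 2: pd.Timestamp("2008-01-01")}

counts = event_recency_counts(events, reference_times)
print(counts)
```

Further buckets (hours, days, months) can be added as extra columns in the same way.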

DocMusher · June 2015

Hi,
Thank you both for your replies. In fact the database I use provides for every patient a value related to an attribute at a particular timestamp.
ID Att Value Time
1 1 200
This generates a multivariate time series for each patient and, based on averaged values, a multivariate time series for all patients. Because the data I use is historical (2001-2008), each patient already has a label for survival.

The aim of the analysis is to 1/ quantify differences in time series of different patient populations 2/ find correlations between attributes and 3/ find predictive changes of one or more attribute ending in the labeled outcome.

Cheers
Sven
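
A long table of ID / Att / Value / Time rows can be reshaped into one multivariate series per patient by pivoting the attribute column. The sketch below uses pandas; the sample rows, including their timestamps, are invented for illustration and are not taken from the database described above.

```python
import pandas as pd

# Hypothetical rows in the ID / Att / Value / Time layout.
long_table = pd.DataFrame({
    "ID": [1, 1, 1, 1],
    "Att": [1, 2, 1, 2],
    "Value": [200.0, 36.5, 180.0, 37.0],
    "Time": pd.to_datetime([
        "2005-03-01 10:00", "2005-03-01 10:00",
        "2005-03-01 11:00", "2005-03-01 11:00",
    ]),
})

# One row per (patient, timestamp), one column per attribute.
# aggfunc="mean" averages duplicate readings of the same attribute at the same time.
wide = long_table.pivot_table(index=["ID", "Time"], columns="Att", values="Value", aggfunc="mean")
wide.columns = [f"att_{a}" for a in wide.columns]
print(wide)
```

With irregular timestamps most cells will be empty, so in practice the timestamps would probably need to be binned (for example per hour) before pivoting.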

DocMusher · June 2015

Hi,
How could the next chart be generated using RM?

As I understand it, this tool is able to play a Scatter Multiple chart over time, in order to observe whether correlations are fixed or variable over time.
Cheers
Sven

MartinLiebig · June 2015

Hey,

What about a scatter matrix?

Cheers,
Martin

DocMusher · June 2015

Hi,
This gives an additional way to present correlation. Let's say there are 50 attributes with values, which can give a scatter matrix at one moment (T1). At T2 and T3, the correlation can change or not. In the latter case, is the relation between 2 attributes time independent?
Sven

JEdward · June 2015

If it's not available I think it could be useful to work out how much effort it will be to add it.
I'll add it to the user stories I'm collecting for an extension we're planning with improved visualisation options.

One way to almost generate it would be to use a Loop operator to produce multiple outputs for different time values as a collection. You can then click through the objects in the collection and view them in the Scatter Matrix plot.
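
The same loop-over-time idea can be illustrated outside RapidMiner with pandas: compute one correlation matrix per time slice and compare them. This is not the Loop operator itself, only a sketch; the function `correlation_by_time_bin`, the `time_bin` column and the sample values are hypothetical.

```python
import pandas as pd

def correlation_by_time_bin(wide, time_col="time_bin"):
    """Return {time_bin: correlation matrix} computed over the attribute columns."""
    attr_cols = [c for c in wide.columns if c != time_col]
    return {
        bin_label: group[attr_cols].corr()
        for bin_label, group in wide.groupby(time_col)
    }

# Hypothetical data: attributes a and b move together at T1 and in opposite directions at T2.
wide = pd.DataFrame({
    "time_bin": ["T1", "T1", "T1", "T2", "T2", "T2"],
    "a": [1.0, 2.0, 3.0, 1.0, 2.0, 3.0],
    "b": [2.0, 4.0, 6.0, 6.0, 4.0, 2.0],
})

corr = correlation_by_time_bin(wide)
for bin_label, matrix in corr.items():
    print(bin_label, round(matrix.loc["a", "b"], 3))
```

If the coefficient for a pair of attributes stays roughly the same across slices, that suggests their relation does not depend on time; a changing coefficient suggests it does.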
