Classifying? / Labelling? Time series examples

Andryeh Member Posts: 8 Learner I
Hi. I'm new to this, but I think I can muddle my way through most of it. The one thing I'm struggling with is that I have a collection of time series examples that I basically want to organise into two groups, True and False, so that I can use the Deep Learning extension on them. What's the best way of applying a label for the DL operator to predict? Is there any way I can just sort them into two separate groups? "Tag" the example sets? Or do I just add an extra row at the bottom of each one with True or False in it and get the operator to predict that?

Thanks.

Answers

  • BalazsBarany Administrator, Moderator, Employee, RapidMiner Certified Analyst, RapidMiner Certified Expert Posts: 955 Unicorn
    Hi!

    For classification the data have to be in table format, with one row representing one example and having a label as an attribute.

    This means that you'll need to transform your time series into this format. For example you would extract minima, maxima, percentiles by time range (if they aren't for the same time span) and so on. Transpose and Pivot are operators you'll probably need.

    Regards,
    Balázs
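A minimal sketch of the transformation described above, written in plain Python rather than with RapidMiner operators: each time series is collapsed into a few summary statistics, and the True/False class is stored in a `label` column, giving one row per example. The function names, the choice of statistics, and the toy data are all illustrative assumptions, not anything from the thread.

```python
import csv
import io
import statistics


def summarize_series(values):
    """Collapse one time series into a fixed set of summary attributes."""
    values = [float(v) for v in values]
    return {
        "minimum": min(values),
        "maximum": max(values),
        "mean": statistics.mean(values),
        "median": statistics.median(values),
        "stdev": statistics.pstdev(values),  # population standard deviation
    }


def build_rows(series_list, labels):
    """Return one dict per series: its summary attributes plus a 'label' entry."""
    if len(series_list) != len(labels):
        raise ValueError("need exactly one label per series")
    rows = []
    for series, label in zip(series_list, labels):
        row = summarize_series(series)
        row["label"] = label
        rows.append(row)
    return rows


def rows_to_csv(rows):
    """Serialise the rows as CSV text, e.g. for import into another tool."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=list(rows[0].keys()))
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


# Hypothetical toy data: the series do not need to have the same length.
series_list = [[1.0, 2.0, 3.0, 4.0], [10.0, 9.0, 8.0], [0.5, 0.5, 0.7, 0.9, 1.1]]
labels = [True, False, True]

rows = build_rows(series_list, labels)
csv_text = rows_to_csv(rows)
print(csv_text)
```

Because every series is reduced to the same attributes, series of different lengths still produce rows of the same width, which is what a classifier expects.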
  • Andryeh Member Posts: 8 Learner I
    Thanks, I can do that but I'm not sure how that affects the Deep Learning operator. Does the Deep Learning operator look at the data any differently when it's in a time series layout than it would an example in a single row? It's all the same data just in a different layout. Does DL care?
  • BalazsBarany Administrator, Moderator, Employee, RapidMiner Certified Analyst, RapidMiner Certified Expert Posts: 955 Unicorn
    Hi!

    In RapidMiner you always set up a table with rows as the examples and a label attribute for classification and regression.

    There are also time series forecasting methods that take a time series and can predict future values of the series. But that's not what you're doing here. These are completely different operations, requiring data in different layouts.

    Look at the Deep Learning operator after putting it into your process. Its input is an Example Set, and the help tells you that it needs a "labeled ExampleSet". That's a table with a label attribute. It can't directly take a time series as the input. You can create an example set from your time series and use operators like Windowing to create attributes for the predictions, and then set up the original time series attribute as the label. But that's one regression model for one series.

    Regards,
    Balázs
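To make the windowing idea above concrete, here is a rough plain-Python sketch of what such a transformation produces: each row holds the previous few values of one series as attributes, and the value that follows them becomes the label. This is only an illustration of the concept; the function name and the `window_size` and `horizon` parameters are assumptions and do not reproduce the Windowing operator's actual interface.

```python
def window_series(values, window_size, horizon=1):
    """Turn one series into rows of lagged attributes plus a label.

    Each row contains `window_size` consecutive values; the label is the
    value `horizon` steps after the end of that window.
    """
    rows = []
    last_start = len(values) - window_size - horizon
    for start in range(last_start + 1):
        window = values[start:start + window_size]
        target = values[start + window_size + horizon - 1]
        # lag_1 is the most recent value in the window, lag_2 the one before it, ...
        row = {f"lag_{window_size - i}": v for i, v in enumerate(window)}
        row["label"] = target
        rows.append(row)
    return rows


# Hypothetical toy series.
windowed = window_series([1, 2, 3, 4, 5, 6], window_size=3)
for row in windowed:
    print(row)
```

Note that the label here is a numeric value from the same series, so this layout feeds a regression model for that one series. It does not assign a True/False class to a whole series.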
  • Andryeh Member Posts: 8 Learner I
    Thanks, that makes sense. I need to clarify that I'm actually using tensors, which I understand are better compared with objects in a programming sense.

    I wasn't actually expecting this to be difficult. I thought there would be some way of "tagging" the tensor. How does it work with training using images? They can be presented as tensors from what I've read. For example, you're training the model to tell if there's a dog or a cat in the picture, during training how would you tell the model it's looking at a dog or a cat? Could it be that RapidMiner is just not the tool I need?

    Again, apologies if I'm saying anything eye-rollingly stupid.
  • BalazsBarany Administrator, Moderator, Employee, RapidMiner Certified Analyst, RapidMiner Certified Expert Posts: 955 Unicorn
    Hi,

    please check out this in-depth introduction to the Deep Learning extension:
    https://community.rapidminer.com/discussion/52670/a-more-native-deep-learning-solution
    It enables you to build a custom neural network with different neuron node types and handle it as a model.
    This can also be used for image classification.

    How are you building tensors? You probably do have the underlying data in a not-a-tensor format or you can extract the raw data from the tensors again.

    What are you using the tensors for? This is a use case where you want to compare properties of these time series and classify them. This means that you need to deduce these properties and put them into an example set suitable for classification. The way I described is how it is done in RapidMiner, and also in R and Python.

    There is no way to put a bunch of tensor objects into an operator and expect it to classify them. 

    Data preprocessing is 70-80 % of the project time in most cases. In this case you have data in a format not suitable for classification and you want to have them in this format. I described the ways you can process the time series data in RapidMiner to achieve this. 

    Regards,
    Balázs
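On the "tagging" question, one way to picture the layout discussed in this thread: the label is not stored inside the tensor but kept next to it, one label per example. The sketch below is an assumption-laden illustration in plain Python, with nested lists standing in for tensors and hypothetical attribute names `att_1`, `att_2`, and so on. It extracts the raw values from each tensor into a flat row and appends the class as a `label` entry.

```python
def flatten(tensor):
    """Recursively flatten a nested list (a stand-in for a tensor) into floats."""
    if isinstance(tensor, (list, tuple)):
        flat = []
        for item in tensor:
            flat.extend(flatten(item))
        return flat
    return [float(tensor)]


def tensors_to_rows(tensors, labels):
    """Pair each tensor with its label and emit one flat row per example."""
    if len(tensors) != len(labels):
        raise ValueError("need exactly one label per tensor")
    rows = []
    width = None
    for tensor, label in zip(tensors, labels):
        values = flatten(tensor)
        if width is None:
            width = len(values)
        elif len(values) != width:
            raise ValueError("all tensors must flatten to the same length")
        row = {f"att_{i}": v for i, v in enumerate(values, start=1)}
        row["label"] = label
        rows.append(row)
    return rows


# Hypothetical toy data: two 2x2 "tensors" (time steps x channels) and their classes.
tensors = [
    [[0.1, 0.2], [0.3, 0.4]],
    [[0.9, 0.8], [0.7, 0.6]],
]
labels = [True, False]

labelled_rows = tensors_to_rows(tensors, labels)
for row in labelled_rows:
    print(row)
```

Flattening only works directly when every tensor has the same shape. For series of different lengths, summary attributes computed per series are one alternative.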