Analysis and normalization of instantaneous data

student_compute Member Posts: 49 Contributor I
Hello friends,
I have a sensor that gives me a reading every 10 milliseconds, e.g. x and y values. I have thousands of these x, y readings. I know clustering and classification in RapidMiner.
I would like to ask experienced friends:
What suggestions do you have for this data?
How can I predict x and y?
And how can I analyze the data?
Please help me.
Thanks for the very good RapidMiner.


  • Telcontar120 Moderator, RapidMiner Certified Analyst, RapidMiner Certified Expert, Member Posts: 916   Unicorn
    You are going to want to look at some kind of feature reduction.  I would recommend one of the dimensionality reduction techniques, like PCA.  Data that is sampled so frequently will contain a lot of redundant, overlapping information.
    What is it that you are trying to do with this data---predict some outcome?
    Brian T.
    Lindon Ventures 
    Data Science Consulting from Certified RapidMiner Experts
  • student_compute Member Posts: 49 Contributor I
    Thank you so much for your help.
    My data is arranged like this:
    the time is in one column,
    x is in one column and y is in another column,
    like this:

      time    x     y
        21   45     8
        35   52    12

    Now I do not know how to normalize it.
    Can I predict the values of x and y at a later time?
    Or how should I analyze the data?
    If anyone has experience with this,
    please help.
  • Telcontar120 Moderator, RapidMiner Certified Analyst, RapidMiner Certified Expert, Member Posts: 916   Unicorn
    Take a look at the new Time Series operators, they are part of the standard Studio operator set.
    There is an operator for Normalizing time series data.  There are also operators for forecasting time series data such as ARIMA or Holt-Winters.  I would probably start with ARIMA.

  • student_compute Member Posts: 49 Contributor I
    I am a beginner in this area.
    Could you give me more guidance?
    Is there a tutorial that would introduce me to it?
    Thank you so much
  • hughesfleming68 Member Posts: 114   Unicorn
    Take a look at the ARIMA examples, specifically the ARIMA model for Lake Huron. To better understand ARIMA, do a search on Rob Hyndman. He wrote the forecast package for R, and there are a lot of examples that you could duplicate in RapidMiner. You will have to understand normalization and what it means for your time series to be stationary. Don't take this part lightly, as it can make or break your forecast.
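To make the link between stationarity and forecasting concrete, here is a minimal sketch in plain Python. It is not a full ARIMA implementation and not how RapidMiner's operator works internally: it differences the series once and fits an AR(1) model to the differences by ordinary least squares, which only roughly corresponds to an ARIMA(1,1,0). The function names `fit_ar1` and `forecast` are hypothetical.

```python
from typing import List, Tuple


def fit_ar1(values: List[float]) -> Tuple[float, float]:
    """Least-squares fit of values[t] = c + phi * values[t-1]."""
    prev, curr = values[:-1], values[1:]
    n = len(prev)
    mean_prev = sum(prev) / n
    mean_curr = sum(curr) / n
    var_prev = sum((p - mean_prev) ** 2 for p in prev)
    if var_prev == 0:
        # Constant differences: no autoregressive part can be estimated.
        return mean_curr, 0.0
    cov = sum((p - mean_prev) * (q - mean_curr) for p, q in zip(prev, curr))
    phi = cov / var_prev
    return mean_curr - phi * mean_prev, phi


def forecast(series: List[float], steps: int) -> List[float]:
    """Forecast `steps` future values from an AR(1) fit on first differences."""
    if len(series) < 3:
        raise ValueError("need at least 3 observations")
    # First-order differencing removes a linear trend from the series.
    diffs = [b - a for a, b in zip(series, series[1:])]
    c, phi = fit_ar1(diffs)
    last_value, last_diff = series[-1], diffs[-1]
    predictions = []
    for _ in range(steps):
        last_diff = c + phi * last_diff   # predict the next difference
        last_value = last_value + last_diff  # undo the differencing
        predictions.append(last_value)
    return predictions


if __name__ == "__main__":
    print(forecast([1.0, 2.0, 3.0, 4.0, 5.0, 6.0], 3))
```

With two sensor channels, the same function would be called once for the x column and once for the y column, under the assumption that the readings are equally spaced in time.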
  • student_compute Member Posts: 49 Contributor I
    Thanks for your help.
    I have searched a lot about time series,
    but it is still unclear to me.
    My data is as below.

    I do not know how to normalize the data in RapidMiner. Or does it not need normalization at all?
    How do I stack the series?
    How do I use ARIMA so that I can predict the x and y values at a later time?
    Please help me.
    Best regards
  • hughesfleming68 Member Posts: 114   Unicorn
    edited January 8
    Did you look at the operators to see what they do?  @student_compute, I have read a lot of your posts and you seem quite lost. Unfortunately,  there are no shortcuts. You have to put in the time to learn the material. Is this for school? There are already standard ARIMA examples. It would be helpful if you could be more specific about what exactly you are having difficulty with. Do you understand what normalization means? Do you understand why a time series might need to be de-trended? When your question is so broad, it is hard to figure out where to begin. Post a process. That is the best way to get help. It is much quicker to solve problems that way.
  • Maerkli Member Posts: 73   Unicorn
    Take 15 minutes to watch this training video:
    How to normalize data in RapidMiner by Markus Hofmann.
    I also enclose an example that was posted some months ago in the RM Forum:

    <?xml version="1.0" encoding="UTF-8"?><process version="9.1.000">
      <operator activated="true" class="process" compatibility="9.1.000" expanded="true" name="Process">
        <parameter key="logverbosity" value="init"/>
        <parameter key="random_seed" value="2001"/>
        <parameter key="send_mail" value="never"/>
        <parameter key="notification_email" value=""/>
        <parameter key="process_duration_for_mail" value="30"/>
        <parameter key="encoding" value="SYSTEM"/>
        <process expanded="true">
          <operator activated="true" class="retrieve" compatibility="9.1.000" expanded="true" height="68" name="Retrieve Sonar" width="90" x="45" y="85">
            <parameter key="repository_entry" value="//Samples/data/Sonar"/>
          </operator>
          <operator activated="true" class="set_role" compatibility="9.1.000" expanded="true" height="82" name="Set Role" width="90" x="179" y="85">
            <parameter key="attribute_name" value="class"/>
            <parameter key="target_role" value="label"/>
            <list key="set_additional_roles"/>
          </operator>
          <operator activated="true" class="concurrency:cross_validation" compatibility="9.1.000" expanded="true" height="145" name="Cross Validation" width="90" x="380" y="85">
            <parameter key="split_on_batch_attribute" value="false"/>
            <parameter key="leave_one_out" value="false"/>
            <parameter key="number_of_folds" value="10"/>
            <parameter key="sampling_type" value="automatic"/>
            <parameter key="use_local_random_seed" value="false"/>
            <parameter key="local_random_seed" value="1992"/>
            <parameter key="enable_parallel_execution" value="true"/>
            <process expanded="true">
              <operator activated="true" class="normalize" compatibility="9.1.000" expanded="true" height="103" name="Normalize" width="90" x="112" y="136">
                <parameter key="return_preprocessing_model" value="false"/>
                <parameter key="create_view" value="false"/>
                <parameter key="attribute_filter_type" value="all"/>
                <parameter key="attribute" value=""/>
                <parameter key="attributes" value=""/>
                <parameter key="use_except_expression" value="false"/>
                <parameter key="value_type" value="numeric"/>
                <parameter key="use_value_type_exception" value="false"/>
                <parameter key="except_value_type" value="real"/>
                <parameter key="block_type" value="value_series"/>
                <parameter key="use_block_type_exception" value="false"/>
                <parameter key="except_block_type" value="value_series_end"/>
                <parameter key="invert_selection" value="false"/>
                <parameter key="include_special_attributes" value="false"/>
                <parameter key="method" value="Z-transformation"/>
                <parameter key="min" value="0.0"/>
                <parameter key="max" value="1.0"/>
                <parameter key="allow_negative_values" value="false"/>
              </operator>
              <operator activated="true" class="h2o:logistic_regression" compatibility="9.0.000" expanded="true" height="124" name="Logistic Regression" width="90" x="246" y="34">
                <parameter key="solver" value="AUTO"/>
                <parameter key="reproducible" value="false"/>
                <parameter key="maximum_number_of_threads" value="4"/>
                <parameter key="use_regularization" value="false"/>
                <parameter key="lambda_search" value="false"/>
                <parameter key="number_of_lambdas" value="0"/>
                <parameter key="lambda_min_ratio" value="0.0"/>
                <parameter key="early_stopping" value="true"/>
                <parameter key="stopping_rounds" value="3"/>
                <parameter key="stopping_tolerance" value="0.001"/>
                <parameter key="standardize" value="true"/>
                <parameter key="non-negative_coefficients" value="false"/>
                <parameter key="add_intercept" value="true"/>
                <parameter key="compute_p-values" value="true"/>
                <parameter key="remove_collinear_columns" value="true"/>
                <parameter key="missing_values_handling" value="MeanImputation"/>
                <parameter key="max_iterations" value="0"/>
                <parameter key="max_runtime_seconds" value="0"/>
              </operator>
              <connect from_port="training set" to_op="Normalize" to_port="example set input"/>
              <connect from_op="Normalize" from_port="example set output" to_op="Logistic Regression" to_port="training set"/>
              <connect from_op="Normalize" from_port="preprocessing model" to_port="through 1"/>
              <connect from_op="Logistic Regression" from_port="model" to_port="model"/>
              <portSpacing port="source_training set" spacing="0"/>
              <portSpacing port="sink_model" spacing="0"/>
              <portSpacing port="sink_through 1" spacing="0"/>
              <portSpacing port="sink_through 2" spacing="0"/>
            </process>
            <process expanded="true">
              <operator activated="true" class="apply_model" compatibility="9.1.000" expanded="true" height="82" name="Apply Model" width="90" x="112" y="85">
                <list key="application_parameters"/>
                <parameter key="create_view" value="false"/>
              </operator>
              <operator activated="true" class="apply_model" compatibility="9.1.000" expanded="true" height="82" name="Apply Model (2)" width="90" x="246" y="34">
                <list key="application_parameters"/>
                <parameter key="create_view" value="false"/>
              </operator>
              <operator activated="true" class="performance" compatibility="9.1.000" expanded="true" height="82" name="Performance" width="90" x="380" y="34">
                <parameter key="use_example_weights" value="true"/>
              </operator>
              <connect from_port="model" to_op="Apply Model (2)" to_port="model"/>
              <connect from_port="test set" to_op="Apply Model" to_port="unlabelled data"/>
              <connect from_port="through 1" to_op="Apply Model" to_port="model"/>
              <connect from_op="Apply Model" from_port="labelled data" to_op="Apply Model (2)" to_port="unlabelled data"/>
              <connect from_op="Apply Model (2)" from_port="labelled data" to_op="Performance" to_port="labelled data"/>
              <connect from_op="Performance" from_port="performance" to_port="performance 1"/>
              <connect from_op="Performance" from_port="example set" to_port="test set results"/>
              <portSpacing port="source_model" spacing="0"/>
              <portSpacing port="source_test set" spacing="0"/>
              <portSpacing port="source_through 1" spacing="0"/>
              <portSpacing port="source_through 2" spacing="0"/>
              <portSpacing port="sink_test set results" spacing="0"/>
              <portSpacing port="sink_performance 1" spacing="0"/>
              <portSpacing port="sink_performance 2" spacing="0"/>
            </process>
          </operator>
          <connect from_op="Retrieve Sonar" from_port="output" to_op="Set Role" to_port="example set input"/>
          <connect from_op="Set Role" from_port="example set output" to_op="Cross Validation" to_port="example set"/>
          <connect from_op="Cross Validation" from_port="model" to_port="result 3"/>
          <connect from_op="Cross Validation" from_port="test result set" to_port="result 1"/>
          <connect from_op="Cross Validation" from_port="performance 1" to_port="result 2"/>
          <portSpacing port="source_input 1" spacing="0"/>
          <portSpacing port="sink_result 1" spacing="0"/>
          <portSpacing port="sink_result 2" spacing="0"/>
          <portSpacing port="sink_result 3" spacing="0"/>
          <portSpacing port="sink_result 4" spacing="0"/>
        </process>
      </operator>
    </process>
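In the process above, Normalize (Z-transformation) sits on the training side of Cross Validation, and its preprocessing model is passed through the "through 1" port so that the first Apply Model can normalize the test data with the training statistics. The same idea can be sketched in plain Python. This is only an illustration: the function names are hypothetical, and the use of the population standard deviation is an assumption, so RapidMiner's exact numbers may differ.

```python
from statistics import mean, pstdev
from typing import List, Tuple


def fit_z_transform(train: List[float]) -> Tuple[float, float]:
    """Learn mean and standard deviation from the training values only."""
    mu = mean(train)
    sigma = pstdev(train)  # assumption: population standard deviation
    if sigma == 0:
        sigma = 1.0  # avoid division by zero for a constant column
    return mu, sigma


def apply_z_transform(values: List[float], params: Tuple[float, float]) -> List[float]:
    """Normalize any values with previously learned parameters."""
    mu, sigma = params
    return [(v - mu) / sigma for v in values]


if __name__ == "__main__":
    train, test = [2.0, 4.0, 6.0], [4.0, 8.0]
    params = fit_z_transform(train)          # like the Normalize operator
    print(apply_z_transform(train, params))  # normalized training data
    print(apply_z_transform(test, params))   # like Apply Model on the test set
```

Learning the parameters on the training data alone keeps information from the test data out of the model, which is why the process does not simply normalize the whole data set before the Cross Validation.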

    Good luck,

  • hughesfleming68 Member Posts: 114   Unicorn
    edited January 8
    In addition to the video and process kindly posted by @Maerkli, with time series data, you will need to know what first order differencing is and why you might need to use a moving average to de-trend your data. You will have to understand your data first so plot it out and take a look.
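The two ideas mentioned here, first-order differencing and de-trending with a moving average, can be sketched in a few lines of plain Python. The function names and the trailing window are assumptions made for illustration; they are not RapidMiner operators.

```python
from typing import List, Optional


def first_difference(series: List[float]) -> List[float]:
    """d[t] = series[t+1] - series[t]; the result is one element shorter."""
    return [b - a for a, b in zip(series, series[1:])]


def moving_average(series: List[float], window: int) -> List[Optional[float]]:
    """Trailing moving average; positions without a full window are None."""
    if window < 1:
        raise ValueError("window must be at least 1")
    result: List[Optional[float]] = []
    for i in range(len(series)):
        if i + 1 < window:
            result.append(None)
        else:
            result.append(sum(series[i - window + 1:i + 1]) / window)
    return result


def detrend(series: List[float], window: int) -> List[float]:
    """Subtract the moving average (the trend estimate) from each value."""
    trend = moving_average(series, window)
    return [v - m for v, m in zip(series, trend) if m is not None]


if __name__ == "__main__":
    x = [45.0, 52.0, 50.0, 58.0, 61.0]  # made-up values for illustration
    print(first_difference(x))
    print(detrend(x, 3))
```

Plotting the original series next to the differenced or de-trended one is a quick way to see whether the trend has actually been removed.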

  • Maerkli Member Posts: 73   Unicorn
    Hello Hughes,
    Thanks for the link.
    PS. That is heavy stuff.

  • student_compute Member Posts: 49 Contributor I
    Hello to all
    Thank you very much for your help, my dear friends <3
    I am a beginner in time series.
    I have studied the basic concepts,
    but it is difficult to understand them and to carry the theory over into practice.
    My data comes from a sensor and is received at different times.
    I want to predict future values from this data,
    but I do not know where to start :(
    So I asked experienced friends on the forum for help.
    I will definitely try to create a process so that friends can guide me.
    Thank you all
    Good day
  • Maerkli Member Posts: 73   Unicorn
    Hello student_compute,
    If your data are not confidential, share them with the RapidMiner community and explain exactly what you want. I am sure that many people are going to help you out. CSV format is very convenient.
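As a small illustration of the CSV suggestion, the following sketch writes time, x and y readings to CSV text with Python's standard `csv` module. The column names and the two rows are taken from the table posted earlier in this thread; the function name `to_csv` is hypothetical.

```python
import csv
import io
from typing import Iterable, Tuple


def to_csv(rows: Iterable[Tuple[int, int, int]]) -> str:
    """Return CSV text with a header line followed by one line per reading."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(["time", "x", "y"])
    writer.writerows(rows)
    return buffer.getvalue()


if __name__ == "__main__":
    # The two rows shown earlier in the thread.
    print(to_csv([(21, 45, 8), (35, 52, 12)]))
```

The resulting text can be saved to a file and attached to a forum post.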