Multiple date attributes to adjust/change

benicsakpbenicsakp Member Posts: 3 Contributor I
edited December 2018 in Help

Hi Fellas,

I am new to RapidMiner.

The very first task I am trying to do requires identification of the earliest date from a large dataset containing 8 date attributes and few thousands of examples. Due to low quality data, some date attributes missing, some mistyped.

 

So far, I created a process, that is identifying the earliest date from the 8 and generates a new attribute called ’Earliest date’. Now, I have 8 date attributes, but will use only the 9th as nominal value for further transformation of the dataset.

Before doing that, I would need to filter the wrong values from the 8 attributes such as -1 or dates in the future like 2071 and so on. Without proper cleansing the 9th attribute is wrong is few cases returning 1899 and similar.

 

Is there any way to filter out the dates preferably without repeating the same operator 8 times? Outliers for dates or something close to that? I am not familiar with macros yet, but perhaps that is the missing piece…

 

Thanks!

Answers

  • MartinLiebigMartinLiebig Administrator, Moderator, Employee, RapidMiner Certified Analyst, RapidMiner Certified Expert, University Professor Posts: 3,503 RM Data Scientist

    Hi,

     

    have you tried Loop Attributes?

     

    Best,

    Martin

    - Sr. Director Data Solutions, Altair RapidMiner -
    Dortmund, Germany
  • sgenzersgenzer Administrator, Moderator, Employee, RapidMiner Certified Analyst, Community Manager, Member, University Professor, PM Moderator Posts: 2,959 Community Manager

    hello @benicsakp - welcome to the community.  Yes as @mschmitz said, Loop Attributes is likely what you need.  Please post your XML process (see "Read Before Posting" on right when you reply) so we can see exactly what your process looks like.


    Scott

     

     

  • kypexinkypexin Moderator, RapidMiner Certified Analyst, Member Posts: 291 Unicorn

    Hi @benicsakp

     


     

     Outliers for dates or something close to that? 

     


     

    I think a solution for this certain part would depend on what exactly you want to do with wrong dates (skip these examples, replace these dates with certain value etc etc). For example, if you expect all the dates to be in a certain range (like, between 1/1/2012 and today), you may try use here DECLARE MISSING VALUES operator with corresponding expression filter which would turn all values out of the pre-defined range into missing values, which you could then treat accordingly (replace or whatever).  

  • benicsakpbenicsakp Member Posts: 3 Contributor I

    Exactly. I filtered the really bad examples (all dates missing) and now, I would like to change the not too bad dates in a subprocess. 

    For example:

    IF dateattribute <  01.01.2017, THEN dateattribute = 01.01.2017
    and

    IF dateattribute > 01.01.2018 THEN dateattribute = 31.12.2018

     

    Must be very easy to do for sevearal attributes, but could not find the solution yet.

    - I checked Loop attributes as  @mschmitz suggested, but no clue how to change the subset of attributes.
    - Declare missing values is interesting, but I had the same issue as with loop attributes. Identification is easy, but the cannot make the changes. 

    - Following @sgenzer suggestion, try to post the XML here.

    <?xml version="1.0" encoding="UTF-8"?><process version="8.0.001">
    <context>
    <input/>
    <output/>
    <macros/>
    </context>
    <operator activated="true" class="process" compatibility="8.0.001" expanded="true" name="Process">
    <parameter key="logverbosity" value="error"/>
    <process expanded="true">
    <operator activated="true" class="read_excel" compatibility="8.0.001" expanded="true" height="68" name="Read Excel" width="90" x="45" y="34">
    <parameter key="excel_file" value="C:\Users\benicsak\Dropbox\CTBTO\DBX LSA\Analysis PHY\Better Excel (All fields) (CTBTO IRS) OLD format.xls"/>
    <parameter key="imported_cell_range" value="A1:EA2137"/>
    <parameter key="first_row_as_names" value="false"/>
    <list key="annotations">
    <parameter key="0" value="Name"/>
    </list>
    <parameter key="date_format" value="MM/yyyy"/>
    <list key="data_set_meta_data_information">
    <parameter key="0" value="Project.true.polynominal.attribute"/>
    <parameter key="1" value="Key.true.polynominal.attribute"/>
    <parameter key="2" value="Summary.true.polynominal.attribute"/>
    <parameter key="3" value="T.true.polynominal.attribute"/>
    <parameter key="4" value="Status.true.polynominal.attribute"/>
    <parameter key="5" value="P.true.integer.attribute"/>
    <parameter key="6" value="Resolution.true.polynominal.attribute"/>
    <parameter key="7" value="Assignee.true.polynominal.attribute"/>
    <parameter key="8" value="Reporter.true.polynominal.attribute"/>
    <parameter key="9" value="Creator.true.polynominal.attribute"/>
    <parameter key="10" value="Created.true.date_time.attribute"/>
    <parameter key="11" value="Last Viewed.true.polynominal.attribute"/>
    <parameter key="12" value="Updated.true.date_time.attribute"/>
    <parameter key="13" value="Resolved.true.date_time.attribute"/>
    <parameter key="14" value="Affects Version/s.true.attribute_value.attribute"/>
    <parameter key="15" value="Fix Version/s.true.attribute_value.attribute"/>
    <parameter key="16" value="Components.true.attribute_value.attribute"/>
    <parameter key="17" value="Due.true.attribute_value.attribute"/>
    <parameter key="18" value="Watchers.true.integer.attribute"/>
    <parameter key="19" value="Images.true.polynominal.attribute"/>
    <parameter key="20" value="Original Estimate.true.attribute_value.attribute"/>
    <parameter key="21" value="Remaining Estimate.true.attribute_value.attribute"/>
    <parameter key="22" value="Time Spent.true.attribute_value.attribute"/>
    <parameter key="23" value="Work Ratio.true.attribute_value.attribute"/>
    <parameter key="24" value="Sub-Tasks.true.attribute_value.attribute"/>
    <parameter key="25" value="Links.true.polynominal.attribute"/>
    <parameter key="26" value="Environment.true.attribute_value.attribute"/>
    <parameter key="27" value="Description.true.polynominal.attribute"/>
    <parameter key="28" value="Security.true.polynominal.attribute"/>
    <parameter key="29" value="Progress.true.attribute_value.attribute"/>
    <parameter key="30" value="Σ Progress.true.attribute_value.attribute"/>
    <parameter key="31" value="Σ Time Spent.true.attribute_value.attribute"/>
    <parameter key="32" value="Σ Remaining Estimate.true.attribute_value.attribute"/>
    <parameter key="33" value="Σ Original Estimate.true.attribute_value.attribute"/>
    <parameter key="34" value="Actions Taken.true.polynominal.attribute"/>
    <parameter key="35" value="CC.true.polynominal.attribute"/>
    <parameter key="36" value="CC type.true.polynominal.attribute"/>
    <parameter key="37" value="CCN Reference.true.polynominal.attribute"/>
    <parameter key="38" value="CCR Reference.true.polynominal.attribute"/>
    <parameter key="39" value="Categorization.true.polynominal.attribute"/>
    <parameter key="40" value="Closed.true.date_time.attribute"/>
    <parameter key="41" value="Closed by.true.polynominal.attribute"/>
    <parameter key="42" value="Configuration Changed.true.polynominal.attribute"/>
    <parameter key="43" value="Customer Request Type.true.attribute_value.attribute"/>
    <parameter key="44" value="Data Availability.true.attribute_value.attribute"/>
    <parameter key="45" value="Data Availability Affected.true.polynominal.attribute"/>
    <parameter key="46" value="Data Backfilled.true.polynominal.attribute"/>
    <parameter key="47" value="Data Quality.true.polynominal.attribute"/>
    <parameter key="48" value="Diagnosis Ended.true.date_time.attribute"/>
    <parameter key="49" value="Diagnosis Started.true.date_time.attribute"/>
    <parameter key="50" value="Effect of Failure.true.polynominal.attribute"/>
    <parameter key="51" value="Email Others.true.polynominal.attribute"/>
    <parameter key="52" value="End Date.true.date_time.attribute"/>
    <parameter key="53" value="End Date Text Field.true.polynominal.attribute"/>
    <parameter key="54" value="Equipment Affected.true.polynominal.attribute"/>
    <parameter key="55" value="Equipment Installed.true.attribute_value.attribute"/>
    <parameter key="56" value="Equipment Removed.true.attribute_value.attribute"/>
    <parameter key="57" value="Equipment to be installed.true.attribute_value.attribute"/>
    <parameter key="58" value="Equipment to be removed.true.attribute_value.attribute"/>
    <parameter key="59" value="Equipment/Software Affected.true.attribute_value.attribute"/>
    <parameter key="60" value="External issue ID.true.attribute_value.attribute"/>
    <parameter key="61" value="FMECA Codes.true.polynominal.attribute"/>
    <parameter key="62" value="Failure Comments.true.polynominal.attribute"/>
    <parameter key="63" value="Failure Details.true.attribute_value.attribute"/>
    <parameter key="64" value="Filter.true.attribute_value.attribute"/>
    <parameter key="65" value="Filters.true.polynominal.attribute"/>
    <parameter key="66" value="Heading.true.polynominal.attribute"/>
    <parameter key="67" value="Immediate Incident.true.polynominal.attribute"/>
    <parameter key="68" value="IsLaboratory.true.polynominal.attribute"/>
    <parameter key="69" value="Iteration.true.attribute_value.attribute"/>
    <parameter key="70" value="LEGACY SUBSYSTEM AFFECTED.true.attribute_value.attribute"/>
    <parameter key="71" value="LEGACY SYSTEM AFFECTED.true.attribute_value.attribute"/>
    <parameter key="72" value="LTR Date.true.attribute_value.attribute"/>
    <parameter key="73" value="LTR Exp\. Recovery Date.true.attribute_value.attribute"/>
    <parameter key="74" value="LTR Mission Capable.true.attribute_value.attribute"/>
    <parameter key="75" value="LTR Pending Action.true.attribute_value.attribute"/>
    <parameter key="76" value="Laboratory Code.true.attribute_value.attribute"/>
    <parameter key="77" value="Laboratory Reference.true.attribute_value.attribute"/>
    <parameter key="78" value="Last Request Delay.true.attribute_value.attribute"/>
    <parameter key="79" value="Last update by.true.polynominal.attribute"/>
    <parameter key="80" value="Legacy Actions Details.true.attribute_value.attribute"/>
    <parameter key="81" value="Legacy Actions Taken.true.attribute_value.attribute"/>
    <parameter key="82" value="Legacy Cause of Failure.true.attribute_value.attribute"/>
    <parameter key="83" value="Legacy Effect of Failure.true.attribute_value.attribute"/>
    <parameter key="84" value="Legacy Equip/Soft Affected.true.attribute_value.attribute"/>
    <parameter key="85" value="Legacy Failure Details.true.attribute_value.attribute"/>
    <parameter key="86" value="Legacy Sub-Cause.true.attribute_value.attribute"/>
    <parameter key="87" value="Legacy Sub-System Affected.true.attribute_value.attribute"/>
    <parameter key="88" value="Legacy System Affected.true.attribute_value.attribute"/>
    <parameter key="89" value="Mission Capability.true.attribute_value.attribute"/>
    <parameter key="90" value="Mission Capable.true.polynominal.attribute"/>
    <parameter key="91" value="Next Action Date.true.attribute_value.attribute"/>
    <parameter key="92" value="Object Class.true.attribute_value.attribute"/>
    <parameter key="93" value="Object Name.true.attribute_value.attribute"/>
    <parameter key="94" value="Opened By.true.attribute_value.attribute"/>
    <parameter key="95" value="Organizations.true.attribute_value.attribute"/>
    <parameter key="96" value="Other stations affected.true.polynominal.attribute"/>
    <parameter key="97" value="Outage ID.true.polynominal.attribute"/>
    <parameter key="98" value="POC.true.polynominal.attribute"/>
    <parameter key="99" value="POC type.true.polynominal.attribute"/>
    <parameter key="100" value="Parameter List.true.attribute_value.attribute"/>
    <parameter key="101" value="Period Covered.true.attribute_value.attribute"/>
    <parameter key="102" value="Point of Contact update required.true.polynominal.attribute"/>
    <parameter key="103" value="RISM.true.polynominal.attribute"/>
    <parameter key="104" value="RISM string.true.attribute_value.attribute"/>
    <parameter key="105" value="RISMs.true.attribute_value.attribute"/>
    <parameter key="106" value="Report.true.polynominal.attribute"/>
    <parameter key="107" value="Report Summary.true.polynominal.attribute"/>
    <parameter key="108" value="Request participants.true.attribute_value.attribute"/>
    <parameter key="109" value="Resolved by.true.polynominal.attribute"/>
    <parameter key="110" value="Reviewed.true.attribute_value.attribute"/>
    <parameter key="111" value="Reviewed by.true.attribute_value.attribute"/>
    <parameter key="112" value="Rule description.true.attribute_value.attribute"/>
    <parameter key="113" value="Rule name.true.attribute_value.attribute"/>
    <parameter key="114" value="Satisfaction.true.polynominal.attribute"/>
    <parameter key="115" value="Sites.true.polynominal.attribute"/>
    <parameter key="116" value="Spare Parts.true.polynominal.attribute"/>
    <parameter key="117" value="Spare Parts Comments.true.polynominal.attribute"/>
    <parameter key="118" value="Start Date.true.date_time.attribute"/>
    <parameter key="119" value="Start Date Text Field.true.polynominal.attribute"/>
    <parameter key="120" value="Station Sub-System.true.attribute_value.attribute"/>
    <parameter key="121" value="Station reference.true.polynominal.attribute"/>
    <parameter key="122" value="Stations.true.polynominal.attribute"/>
    <parameter key="123" value="Submitted By.true.polynominal.attribute"/>
    <parameter key="124" value="Technology.true.polynominal.attribute"/>
    <parameter key="125" value="Time to first response.true.attribute_value.attribute"/>
    <parameter key="126" value="Time to resolution.true.attribute_value.attribute"/>
    <parameter key="127" value="Treaty number.true.polynominal.attribute"/>
    <parameter key="128" value="Urgency.true.attribute_value.attribute"/>
    <parameter key="129" value="nfeedTestFiled.true.attribute_value.attribute"/>
    <parameter key="130" value="prtool_group.true.polynominal.attribute"/>
    </list>
    </operator>
    <operator activated="true" class="subprocess" compatibility="8.0.001" expanded="true" height="82" name="Select earliest start date" width="90" x="179" y="34">
    <process expanded="true">
    <operator activated="false" class="declare_missing_value" compatibility="8.0.001" expanded="true" height="82" name="Declare Missing Value" width="90" x="648" y="34">
    <parameter key="attribute_filter_type" value="subset"/>
    <parameter key="attributes" value="Closed|Created|Diagnosis Ended|Diagnosis Started|End Date|Resolved|Start Date|Updated|Earliest_nominal|Earliest"/>
    <parameter key="mode" value="nominal"/>
    </operator>
    <operator activated="false" class="concurrency:loop_attributes" compatibility="8.0.001" expanded="true" height="68" name="Loop Attributes" width="90" x="45" y="34">
    <parameter key="attribute_filter_type" value="subset"/>
    <parameter key="attributes" value="Closed|Start Date|Diagnosis Started|End Date|Diagnosis Ended|Resolved|Updated|Created"/>
    <process expanded="true">
    <portSpacing port="source_input 1" spacing="0"/>
    <portSpacing port="sink_output 1" spacing="0"/>
    </process>
    </operator>
    <operator activated="true" class="filter_examples" compatibility="8.0.001" expanded="true" height="103" name="Filter Examples (3)" width="90" x="246" y="34">
    <parameter key="invert_filter" value="true"/>
    <list key="filters_list">
    <parameter key="filters_entry_key" value="Created.is_missing."/>
    <parameter key="filters_entry_key" value="Updated.is_missing."/>
    <parameter key="filters_entry_key" value="Resolved.is_missing."/>
    <parameter key="filters_entry_key" value="Closed.is_missing."/>
    <parameter key="filters_entry_key" value="Diagnosis Ended.is_missing."/>
    <parameter key="filters_entry_key" value="Diagnosis Started.is_missing."/>
    <parameter key="filters_entry_key" value="End Date.is_missing."/>
    <parameter key="filters_entry_key" value="Start Date.is_missing."/>
    </list>
    <description align="center" color="yellow" colored="true" width="126">Filter crappy examples</description>
    </operator>
    <operator activated="true" class="generate_aggregation" compatibility="8.0.001" expanded="true" height="82" name="Generate Aggregation" width="90" x="380" y="34">
    <parameter key="attribute_name" value="Earliest"/>
    <parameter key="attribute_filter_type" value="subset"/>
    <parameter key="attributes" value="Closed|Created|Diagnosis Ended|Diagnosis Started|Updated|Resolved|Start Date|End Date"/>
    <parameter key="aggregation_function" value="minimum"/>
    <parameter key="ignore_missing_attributes" value="true"/>
    <description align="center" color="yellow" colored="true" width="126">Generate earliest date</description>
    </operator>
    <operator activated="true" class="date_to_nominal" compatibility="8.0.001" expanded="true" height="82" name="Date to Nominal" width="90" x="514" y="34">
    <parameter key="attribute_name" value="Earliest"/>
    <parameter key="date_format" value="yyyy/MM"/>
    <parameter key="locale" value="English (United Kingdom)"/>
    <parameter key="keep_old_attribute" value="true"/>
    <description align="center" color="yellow" colored="true" width="126">Convert earliest date</description>
    </operator>
    <connect from_port="in 1" to_op="Filter Examples (3)" to_port="example set input"/>
    <connect from_op="Filter Examples (3)" from_port="example set output" to_op="Generate Aggregation" to_port="example set input"/>
    <connect from_op="Generate Aggregation" from_port="example set output" to_op="Date to Nominal" to_port="example set input"/>
    <connect from_op="Date to Nominal" from_port="example set output" to_port="out 1"/>
    <portSpacing port="source_in 1" spacing="0"/>
    <portSpacing port="source_in 2" spacing="0"/>
    <portSpacing port="sink_out 1" spacing="0"/>
    <portSpacing port="sink_out 2" spacing="0"/>
    </process>
    </operator>
    <operator activated="true" class="aggregate" compatibility="8.0.001" expanded="true" height="82" name="Aggregate" width="90" x="313" y="34">
    <list key="aggregation_attributes">
    <parameter key="Legacy Cause of Failure" value="count"/>
    </list>
    <parameter key="group_by_attributes" value="Legacy Cause of Failure|Legacy Equip/Soft Affected|Earliest_nominal"/>
    <parameter key="ignore_missings" value="false"/>
    </operator>
    <operator activated="true" class="select_attributes" compatibility="8.0.001" expanded="true" height="82" name="Select Attributes" width="90" x="514" y="34">
    <parameter key="attribute" value="Earliest_nominal"/>
    </operator>
    <connect from_port="input 1" to_op="Read Excel" to_port="file"/>
    <connect from_op="Read Excel" from_port="output" to_op="Select earliest start date" to_port="in 1"/>
    <connect from_op="Select earliest start date" from_port="out 1" to_op="Aggregate" to_port="example set input"/>
    <connect from_op="Aggregate" from_port="example set output" to_op="Select Attributes" to_port="example set input"/>
    <connect from_op="Select Attributes" from_port="example set output" to_port="result 1"/>
    <portSpacing port="source_input 1" spacing="0"/>
    <portSpacing port="source_input 2" spacing="0"/>
    <portSpacing port="sink_result 1" spacing="0"/>
    <portSpacing port="sink_result 2" spacing="0"/>
    </process>
    </operator>
    </process>
  • Edin_KlapicEdin_Klapic Moderator, Employee, RMResearcher, Member Posts: 299 RM Data Scientist

    Hi @benicsakp,

     

    This following example process shows the usage of Loop Attributes and contains a Generate Attribute function where your IF - ELSE statement is included.

    More details on the available functions within Generate Attributes and how they are used can be found here

    image.png

     

     

     

     

    Best regards,

    Edin

     

    <?xml version="1.0" encoding="UTF-8"?><process version="8.0.001">
    <context>
    <input/>
    <output/>
    <macros/>
    </context>
    <operator activated="true" class="process" compatibility="8.0.001" expanded="true" name="Process">
    <process expanded="true">
    <operator activated="true" class="generate_sales_data" compatibility="8.0.001" expanded="true" height="68" name="Generate Sales Data" width="90" x="112" y="34"/>
    <operator activated="true" class="generate_attributes" compatibility="8.0.001" expanded="true" height="82" name="Generate Attributes" width="90" x="246" y="34">
    <list key="function_descriptions">
    <parameter key="Date 2" value="date_add(date,+transaction_id,DATE_UNIT_MONTH)"/>
    <parameter key="date" value="date"/>
    </list>
    </operator>
    <operator activated="true" class="select_attributes" compatibility="8.0.001" expanded="true" height="82" name="Select Attributes" width="90" x="380" y="34">
    <parameter key="attribute_filter_type" value="subset"/>
    <parameter key="attributes" value="Date 2|date"/>
    <parameter key="include_special_attributes" value="true"/>
    </operator>
    <operator activated="true" class="concurrency:loop_attributes" compatibility="8.0.001" expanded="true" height="82" name="Loop Attributes" width="90" x="514" y="34">
    <parameter key="attribute_filter_type" value="subset"/>
    <parameter key="attributes" value="date|Date 2"/>
    <parameter key="reuse_results" value="true"/>
    <process expanded="true">
    <operator activated="true" class="generate_attributes" compatibility="8.0.001" expanded="true" height="82" name="Generate Attributes (3)" width="90" x="313" y="34">
    <list key="function_descriptions">
    <parameter key="new %{loop_attribute}" value="if (&#10;&#9;date_before(#{loop_attribute},date_parse_custom(&quot;01/01/2007&quot;,&quot;dd/MM/yyyy&quot;)),&#10;&#9;date_parse_custom(&quot;01/01/2000&quot;,&quot;dd/MM/yyyy&quot;),&#10;&#9;if(&#10;&#9;&#9;date_after(#{loop_attribute},date_parse_custom(&quot;01/01/2009&quot;,&quot;dd/MM/yyyy&quot;)),&#10;&#9;&#9;date_parse_custom(&quot;31/12/2999&quot;,&quot;dd/MM/yyyy&quot;),&#10;&#9;&#9;#{loop_attribute}&#10;&#9;)&#10;)"/>
    </list>
    </operator>
    <connect from_port="input 1" to_op="Generate Attributes (3)" to_port="example set input"/>
    <connect from_op="Generate Attributes (3)" from_port="example set output" to_port="output 1"/>
    <portSpacing port="source_input 1" spacing="0"/>
    <portSpacing port="source_input 2" spacing="0"/>
    <portSpacing port="sink_output 1" spacing="0"/>
    <portSpacing port="sink_output 2" spacing="0"/>
    </process>
    </operator>
    <connect from_op="Generate Sales Data" from_port="output" to_op="Generate Attributes" to_port="example set input"/>
    <connect from_op="Generate Attributes" from_port="example set output" to_op="Select Attributes" to_port="example set input"/>
    <connect from_op="Select Attributes" from_port="example set output" to_op="Loop Attributes" to_port="input 1"/>
    <connect from_op="Loop Attributes" from_port="output 1" to_port="result 1"/>
    <portSpacing port="source_input 1" spacing="0"/>
    <portSpacing port="sink_result 1" spacing="0"/>
    <portSpacing port="sink_result 2" spacing="0"/>
    </process>
    </operator>
    </process>
Sign In or Register to comment.