Importing Adobe Audience Manager files

robin Member Posts: 100 Guru
edited November 2018 in Help

Hi

I am receiving a file from Adobe Audience Manager; it is a fixed-width file. For security reasons I cannot access the server directly to extract the information, so the file is dropped into my folder. Which method of reading the file would you recommend?

The file contains customer demographic information, which I need in order to determine which marketing collateral will have the highest response rate for each individual.

Thanks

Best Answer

  • JEdward RapidMiner Certified Analyst, RapidMiner Certified Expert, Member Posts: 578 Unicorn
    Solution Accepted

    I opened your file in Notepad++ so I could look at how the separators are encoded. It uses the control characters SOH, STX & ETX as separators, so I simply used the Read CSV operator and added these separators as a RegEx in its column separators parameter.

    Here's a handy list of the control character codes. To use one in a regex, write \c followed by its letter: in this case \cA, \cB & \cC (SOH = 0x01, STX = 0x02, ETX = 0x03), or, combined into a single expression, \cA|\cB|\cC.

    https://en.wikipedia.org/wiki/C0_and_C1_control_codes#SOH 
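
RapidMiner runs on Java, and Java's java.util.regex documents \cx as "the control character corresponding to x"; in practice that amounts to flipping bit 0x40 of the letter's code, which is how \cA ends up as SOH (0x01). The minimal Python sketch below reproduces that mapping so you can check which byte a given escape stands for. The helper name control_char is hypothetical, and it deliberately accepts only uppercase A-Z.

```python
def control_char(letter: str) -> str:
    """Return the control character denoted by the regex escape \\c<letter>.

    Hypothetical helper, limited to uppercase A-Z: flips bit 0x40 of the
    letter's code, so 'A' (0x41) becomes SOH (0x01).
    """
    if len(letter) != 1 or not "A" <= letter <= "Z":
        raise ValueError("expected a single uppercase letter A-Z")
    return chr(ord(letter) ^ 0x40)


if __name__ == "__main__":
    # The three separators used in the Audience Manager export.
    for letter in "ABC":
        print(f"\\c{letter} -> 0x{ord(control_char(letter)):02X}")
    # prints \cA -> 0x01, \cB -> 0x02, \cC -> 0x03 (one per line)
```

Reading the table the other way, 0x03 (the byte spotted in the hex view) is ETX, so its escape is \cC.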

    See the process below as an example.

    <?xml version="1.0" encoding="UTF-8"?><process version="7.4.000">
    <context>
    <input/>
    <output/>
    <macros/>
    </context>
    <operator activated="true" class="process" compatibility="7.4.000" expanded="true" name="Process">
    <process expanded="true">
    <operator activated="true" class="read_csv" compatibility="7.4.000" expanded="true" height="68" name="Read CSV" width="90" x="246" y="187">
    <parameter key="csv_file" value="C:\Users\think\Downloads\sample_set.csv"/>
    <parameter key="column_separators" value="\cA|\cB|\cC"/>
    <parameter key="use_quotes" value="false"/>
    <parameter key="first_row_as_names" value="false"/>
    <list key="annotations"/>
    <list key="data_set_meta_data_information">
    <parameter key="0" value="att1.true.polynominal.attribute"/>
    <parameter key="1" value="att2.true.real.attribute"/>
    <parameter key="2" value="att3.true.attribute_value.attribute"/>
    <parameter key="3" value="att4.true.integer.attribute"/>
    <parameter key="4" value="att5.true.integer.attribute"/>
    <parameter key="5" value="att6.true.integer.attribute"/>
    <parameter key="6" value="att7.true.integer.attribute"/>
    <parameter key="7" value="att8.true.integer.attribute"/>
    <parameter key="8" value="att9.true.integer.attribute"/>
    <parameter key="9" value="att10.true.integer.attribute"/>
    <parameter key="10" value="att11.true.integer.attribute"/>
    <parameter key="11" value="att12.true.integer.attribute"/>
    <parameter key="12" value="att13.true.integer.attribute"/>
    <parameter key="13" value="att14.true.attribute_value.attribute"/>
    <parameter key="14" value="att15.true.polynominal.attribute"/>
    <parameter key="15" value="att16.true.real.attribute"/>
    <parameter key="16" value="att17.true.polynominal.attribute"/>
    <parameter key="17" value="att18.true.polynominal.attribute"/>
    <parameter key="18" value="att19.true.polynominal.attribute"/>
    <parameter key="19" value="att20.true.polynominal.attribute"/>
    <parameter key="20" value="att21.true.polynominal.attribute"/>
    <parameter key="21" value="att22.true.integer.attribute"/>
    <parameter key="22" value="att23.true.polynominal.attribute"/>
    <parameter key="23" value="att24.true.integer.attribute"/>
    <parameter key="24" value="att25.true.polynominal.attribute"/>
    <parameter key="25" value="att26.true.polynominal.attribute"/>
    <parameter key="26" value="att27.true.polynominal.attribute"/>
    <parameter key="27" value="att28.true.polynominal.attribute"/>
    </list>
    </operator>
    <connect from_op="Read CSV" from_port="output" to_port="result 1"/>
    <portSpacing port="source_input 1" spacing="0"/>
    <portSpacing port="sink_result 1" spacing="0"/>
    <portSpacing port="sink_result 2" spacing="0"/>
    </process>
    </operator>
    </process>
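
If the file ever needs to be read outside RapidMiner, the Read CSV settings above correspond roughly to splitting each line on any of the three control characters, with no quote handling (use_quotes is false). This is a minimal Python sketch under stated assumptions: one record per line, and latin-1 decoding (an assumption chosen because it never fails on arbitrary bytes; switch to the export's real encoding if you know it). Note that Python's re module rejects the \c escape, so the separators are spelled \x01, \x02 and \x03 instead. The function names are hypothetical.

```python
import re
from pathlib import Path

# Same alternation as the Read CSV column separators regex \cA|\cB|\cC,
# spelled with \x escapes because Python's re module rejects \c.
SEPARATORS = re.compile(r"\x01|\x02|\x03")


def split_record(line: str) -> list[str]:
    """Split one record on SOH, STX or ETX (no quote handling)."""
    return SEPARATORS.split(line.rstrip("\r\n"))


def read_records(path: Path) -> list[list[str]]:
    """Read every non-empty line of the file as a list of fields.

    Assumptions: one record per line; latin-1 decodes any byte without errors.
    """
    with path.open(encoding="latin-1") as handle:
        return [split_record(line) for line in handle if line.strip("\r\n")]


if __name__ == "__main__":
    sample = "123\x01US\x02F\x0335"
    print(split_record(sample))  # ['123', 'US', 'F', '35']
```

As with the regex in the process, two adjacent separators produce an empty field between them rather than being merged into one.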

Answers

  • Thomas_Ott RapidMiner Certified Analyst, RapidMiner Certified Expert, Member Posts: 1,761 Unicorn

    Hi Robin,

    We don't currently have an operator that extracts tables and similar content from PDFs. There is a prototype, but I'm not sure whether our developers want to release it. Maybe someone from the dev team can chime in here.

  • robin Member Posts: 100 Guru

    Thanks, Thomas.

    This is not a PDF but a fixed-length flat file; I have attached an example.

    There are no standard delimiters; however, looking at the file in hex, 03 seems to act as a field separator.

    PS: I added the .csv extension so that the file could be uploaded; the file usually arrives without an extension.
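
The hex-viewer check described above can also be scripted. This Python sketch (script and function names are hypothetical) counts every C0 control byte below 0x20 except tab, line feed and carriage return, which are normal in text files; for an export like this one, SOH (0x01), STX (0x02) and ETX (0x03) would be expected to show up with high counts, but the sketch only reports what it finds.

```python
import sys
from collections import Counter
from pathlib import Path

# Control bytes that occur in ordinary text files and are therefore
# unlikely to be custom field separators.
ORDINARY = {0x09, 0x0A, 0x0D}  # tab, line feed, carriage return


def control_byte_counts(data: bytes) -> Counter:
    """Count C0 control bytes (0x00-0x1F) other than tab, LF and CR."""
    return Counter(b for b in data if b < 0x20 and b not in ORDINARY)


def report(path: str) -> None:
    """Print each candidate separator byte and how often it occurs."""
    counts = control_byte_counts(Path(path).read_bytes())
    for byte, n in counts.most_common():
        print(f"0x{byte:02X}: {n}")


if __name__ == "__main__":
    # Usage: python count_control_bytes.py <exported file>
    if len(sys.argv) > 1:
        report(sys.argv[1])
```

Reading raw bytes rather than text avoids any decoding step, so the counts reflect exactly what is in the file regardless of its encoding.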

     

     

  • ey1 Employee, RapidMiner Certified Analyst, RapidMiner Certified Expert, RMResearcher, Member Posts: 21 RM Research

    Hi,

    Looking at your file, I see some control characters and combinations of them, which you could use as delimiters in the Split operator. I've attached the process I tried. I'm not sure if that's what you need, but once you have an example set, you can filter for the examples of interest.

    Regards,

    Edwin

  • robin Member Posts: 100 Guru

    @JEdward wrote:

    I opened your file in Notepad++ so I could look at how the separators are encoded. It uses the control characters SOH, STX & ETX as separators, so I simply used the Read CSV operator and added these separators as a RegEx in its column separators parameter.

    Thank you! This is the solution I was looking for.
