RapidMiner

Importing Adobe Audience Manager files

SOLVED
Contributor II

Hi

I am receiving a file from Adobe Audience Manager; it is a fixed-width file. For security reasons I cannot access the server directly to extract the information, so the file is dropped in my folder. Which method of reading the file would you recommend?

 

The file contains customer demographic information which I need to access to be able to determine which marketing collateral will have the highest response rate for each individual. 

 

Thanks

5 REPLIES
Community Manager

Re: Importing Adobe Audience Manager files

Hi Robin,

 

We don't have an operator available that extracts tables and similar content from PDFs. There is a prototype, but I'm not sure whether our developers want to release it. Maybe someone from Dev will chime in here.

 

 

 

 

Regards,
T-Bone
Twitter: @neuralmarket
Contributor II

Re: Importing Adobe Audience Manager files

Thanks Thomas

 

This is not a PDF but a fixed-length flat file; I have attached an example.

 

There are no standard delimiters; however, looking at the file in hex, the byte 0x03 seems to act as a field separator.

 

PS: I added the CSV file extension only so that I could upload the file; it usually arrives without an extension.
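
For anyone who wants to confirm which control bytes actually occur in a file like this before choosing separators, here is a minimal Python sketch. It is only an illustration: the helper name `count_control_bytes` and the sample record are made up, and the real file layout may differ.

```python
from collections import Counter


def count_control_bytes(data: bytes) -> Counter:
    """Count C0 control bytes (0x00-0x1F), ignoring tab, LF and CR."""
    ordinary = {0x09, 0x0A, 0x0D}
    return Counter(b for b in data if b < 0x20 and b not in ordinary)


# Hypothetical sample record; for a real file, read the bytes with
# open(path, "rb").read() instead.
sample = b"id123\x01seg\x0242\x037\x039\n"

for byte, n in sorted(count_control_bytes(sample).items()):
    print(f"0x{byte:02X}: {n}")
```

Any byte that shows up with a regular count per line is a good candidate for a separator.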

 

 

Contributor

Re: Importing Adobe Audience Manager files

Hi,

 

Looking at your file, I see some control characters and combinations of them, which you could use as delimiters in the Split operator. I have attached the process I tried. I'm not sure if that's what you need, but once you have an example set, you can filter for the examples of interest.

 

Regards,

Edwin

Elite III

Re: Importing Adobe Audience Manager files

I opened your file in Notepad++ so I could look at how the separators are encoded. It uses the control characters SOH, STX and ETX as separators, so I simply used the Read CSV operator and entered these separators as a regular expression.

 

Here's a handy list of the control character codes. To use them in a regex, write \c followed by the letter from the caret notation: ^A (SOH), ^B (STX) and ^C (ETX) become \cA, \cB and \cC, or combined as alternatives, \cA|\cB|\cC

https://en.wikipedia.org/wiki/C0_and_C1_control_codes#SOH 

See the process below for an example.

 

<?xml version="1.0" encoding="UTF-8"?>
<process version="7.4.000">
  <context>
    <input/>
    <output/>
    <macros/>
  </context>
  <operator activated="true" class="process" compatibility="7.4.000" expanded="true" name="Process">
    <process expanded="true">
      <operator activated="true" class="read_csv" compatibility="7.4.000" expanded="true" height="68" name="Read CSV" width="90" x="246" y="187">
        <parameter key="csv_file" value="C:\Users\think\Downloads\sample_set.csv"/>
        <parameter key="column_separators" value="\cA|\cB|\cC"/>
        <parameter key="use_quotes" value="false"/>
        <parameter key="first_row_as_names" value="false"/>
        <list key="annotations"/>
        <list key="data_set_meta_data_information">
          <parameter key="0" value="att1.true.polynominal.attribute"/>
          <parameter key="1" value="att2.true.real.attribute"/>
          <parameter key="2" value="att3.true.attribute_value.attribute"/>
          <parameter key="3" value="att4.true.integer.attribute"/>
          <parameter key="4" value="att5.true.integer.attribute"/>
          <parameter key="5" value="att6.true.integer.attribute"/>
          <parameter key="6" value="att7.true.integer.attribute"/>
          <parameter key="7" value="att8.true.integer.attribute"/>
          <parameter key="8" value="att9.true.integer.attribute"/>
          <parameter key="9" value="att10.true.integer.attribute"/>
          <parameter key="10" value="att11.true.integer.attribute"/>
          <parameter key="11" value="att12.true.integer.attribute"/>
          <parameter key="12" value="att13.true.integer.attribute"/>
          <parameter key="13" value="att14.true.attribute_value.attribute"/>
          <parameter key="14" value="att15.true.polynominal.attribute"/>
          <parameter key="15" value="att16.true.real.attribute"/>
          <parameter key="16" value="att17.true.polynominal.attribute"/>
          <parameter key="17" value="att18.true.polynominal.attribute"/>
          <parameter key="18" value="att19.true.polynominal.attribute"/>
          <parameter key="19" value="att20.true.polynominal.attribute"/>
          <parameter key="20" value="att21.true.polynominal.attribute"/>
          <parameter key="21" value="att22.true.integer.attribute"/>
          <parameter key="22" value="att23.true.polynominal.attribute"/>
          <parameter key="23" value="att24.true.integer.attribute"/>
          <parameter key="24" value="att25.true.polynominal.attribute"/>
          <parameter key="25" value="att26.true.polynominal.attribute"/>
          <parameter key="26" value="att27.true.polynominal.attribute"/>
          <parameter key="27" value="att28.true.polynominal.attribute"/>
        </list>
      </operator>
      <connect from_op="Read CSV" from_port="output" to_port="result 1"/>
      <portSpacing port="source_input 1" spacing="0"/>
      <portSpacing port="sink_result 1" spacing="0"/>
      <portSpacing port="sink_result 2" spacing="0"/>
    </process>
  </operator>
</process>
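
To sanity-check the same split outside RapidMiner, here is a minimal Python sketch. Treat it as an illustration under assumptions: as far as I know, Python's `re` module does not accept the `\cA` notation, so the sketch writes SOH, STX and ETX as `\x01`, `\x02` and `\x03` instead. The helper name `split_record` and the sample line are made up.

```python
import re

# SOH (0x01), STX (0x02) and ETX (0x03): the same three separators that
# the process above passes to Read CSV as \cA|\cB|\cC.
SEPARATORS = re.compile(r"[\x01\x02\x03]")


def split_record(line: str) -> list:
    """Split one line into fields on any of the three control characters."""
    return SEPARATORS.split(line.rstrip("\r\n"))


# Hypothetical record; the real field layout is not known here.
sample = "id123\x01segA\x0242\x037\n"
print(split_record(sample))
```

Two separators in a row produce an empty field rather than being merged, so the column positions stay stable.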
-- Training, Consulting, Sales in China, Hong Kong & Taiwan --
www.RapidMinerChina.com
Contributor II

Re: Importing Adobe Audience Manager files


JEdward wrote:

I opened your file in Notepad++ so I could look at how the separators are encoded. It uses the control characters SOH, STX and ETX as separators, so I simply used the Read CSV operator and entered these separators as a regular expression.

 

Thank you! This is the solution I was looking for.