Importing Adobe Audience Manager files
Hi
I am receiving a fixed-width file from Adobe Audience Manager. For security reasons I cannot access the server directly to extract the information; the file is dropped in my folder. Which method of reading the file would you recommend?
The file contains customer demographic information which I need to access to be able to determine which marketing collateral will have the highest response rate for each individual.
Thanks
Best Answer
JEdward RapidMiner Certified Analyst, RapidMiner Certified Expert, Member Posts: 578 Unicorn
I opened your file in Notepad++ so I could look at how the separators are encoded. It uses the control characters SOH, STX and ETX as separators, so I simply used the Read CSV operator and entered these separators as a regular expression.
Here's a handy list of the control character codes. To use one in a regex, put \c followed by its letter: in this case \cA, \cB and \cC, or in full notation \cA|\cB|\cC
https://en.wikipedia.org/wiki/C0_and_C1_control_codes#SOH
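If you want to sanity-check the same split outside RapidMiner, here is a minimal Python sketch. It assumes the three separators are the single bytes 0x01, 0x02 and 0x03 and that each record sits on its own line. The function name split_record and the sample record are made up for illustration and are not taken from the real file. Python's re module does not accept the \cA notation, so hex escapes are used instead.

```python
import re

# SOH, STX and ETX are the C0 control characters 0x01, 0x02 and 0x03.
# Python's re module has no \cA notation, so hex escapes are used instead.
SEPARATORS = re.compile(r"[\x01\x02\x03]")


def split_record(line):
    """Split one record on any of the three control characters."""
    return SEPARATORS.split(line.rstrip("\r\n"))


# Made-up sample record, only for illustration (not taken from the real file).
sample = "abc123\x0142.5\x02segment_a\x037\n"
fields = split_record(sample)
print(fields)
```

The character class matches the same three characters as the alternation \cA|\cB|\cC used in the Read CSV operator.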
See the process below as an example.
<?xml version="1.0" encoding="UTF-8"?><process version="7.4.000">
<context>
<input/>
<output/>
<macros/>
</context>
<operator activated="true" class="process" compatibility="7.4.000" expanded="true" name="Process">
<process expanded="true">
<operator activated="true" class="read_csv" compatibility="7.4.000" expanded="true" height="68" name="Read CSV" width="90" x="246" y="187">
<parameter key="csv_file" value="C:\Users\think\Downloads\sample_set.csv"/>
<parameter key="column_separators" value="\cA|\cB|\cC"/>
<parameter key="use_quotes" value="false"/>
<parameter key="first_row_as_names" value="false"/>
<list key="annotations"/>
<list key="data_set_meta_data_information">
<parameter key="0" value="att1.true.polynominal.attribute"/>
<parameter key="1" value="att2.true.real.attribute"/>
<parameter key="2" value="att3.true.attribute_value.attribute"/>
<parameter key="3" value="att4.true.integer.attribute"/>
<parameter key="4" value="att5.true.integer.attribute"/>
<parameter key="5" value="att6.true.integer.attribute"/>
<parameter key="6" value="att7.true.integer.attribute"/>
<parameter key="7" value="att8.true.integer.attribute"/>
<parameter key="8" value="att9.true.integer.attribute"/>
<parameter key="9" value="att10.true.integer.attribute"/>
<parameter key="10" value="att11.true.integer.attribute"/>
<parameter key="11" value="att12.true.integer.attribute"/>
<parameter key="12" value="att13.true.integer.attribute"/>
<parameter key="13" value="att14.true.attribute_value.attribute"/>
<parameter key="14" value="att15.true.polynominal.attribute"/>
<parameter key="15" value="att16.true.real.attribute"/>
<parameter key="16" value="att17.true.polynominal.attribute"/>
<parameter key="17" value="att18.true.polynominal.attribute"/>
<parameter key="18" value="att19.true.polynominal.attribute"/>
<parameter key="19" value="att20.true.polynominal.attribute"/>
<parameter key="20" value="att21.true.polynominal.attribute"/>
<parameter key="21" value="att22.true.integer.attribute"/>
<parameter key="22" value="att23.true.polynominal.attribute"/>
<parameter key="23" value="att24.true.integer.attribute"/>
<parameter key="24" value="att25.true.polynominal.attribute"/>
<parameter key="25" value="att26.true.polynominal.attribute"/>
<parameter key="26" value="att27.true.polynominal.attribute"/>
<parameter key="27" value="att28.true.polynominal.attribute"/>
</list>
</operator>
<connect from_op="Read CSV" from_port="output" to_port="result 1"/>
<portSpacing port="source_input 1" spacing="0"/>
<portSpacing port="sink_result 1" spacing="0"/>
<portSpacing port="sink_result 2" spacing="0"/>
</process>
</operator>
</process>
Answers
Hi Robin,
We don't have an operator available that extracts tables and similar content from PDFs. There is a prototype, but I'm not sure whether our developers want to release it. Maybe someone from Dev will chime in here.
Thanks Thomas
This is not a PDF but a fixed-length flat file. I have attached an example.
There are no standard delimiters; however, looking at the file in hex, byte 03 seems to be a field separator.
PS: I have added the CSV file extension to enable the upload of the file. Usually it does not come through with an extension.
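One way to confirm which bytes act as separators is to count the control bytes in the raw file rather than reading the hex by eye. Below is a minimal Python sketch of that check. The helper name count_control_bytes and the sample bytes are hypothetical; for a real file you would pass in the result of open(path, "rb").read().

```python
from collections import Counter


def count_control_bytes(data):
    """Count C0 control bytes (0x00-0x1F) other than tab, LF and CR."""
    ignored = {0x09, 0x0A, 0x0D}
    return Counter(b for b in data if b < 0x20 and b not in ignored)


# Made-up bytes for illustration; for a real file use open(path, "rb").read().
sample = b"abc123\x0342.5\x03segment_a\x037\n"
for byte, n in sorted(count_control_bytes(sample).items()):
    print(f"0x{byte:02X}: {n}")
```

Any byte that shows up many times per record is a likely field separator and can then be used as the column separator when reading the file.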
Hi,
Looking at your file, I see some control characters and combinations of them, which you could use as delimiters in the Split operator. I attach the process that I tried. I'm not sure if that's what you need, but once you have an example set, you can filter the examples of interest.
Regards,
Edwin