Corrupted import data

ForestG Member Posts: 4

Contributor I

April 2014 edited November 2018 in Help

Hello there,

My problem is that whenever I try to apply the learned Naive Bayes model, the original data from export.csv shows corrupt records. I have linked a picture of the problem, with the specific blocks and results.
It is always the same column that goes bad (UserReplyTime, which is used as the label), and the question marks and wrong numbers always appear at the same positions. I am using windows-1250 encoding. Thank you for your time,

https://www.dropbox.com/s/y7syzl5xti4ey4q/rapidminer.png
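On the windows-1250 encoding mentioned above: this may well be unrelated to the corrupted label column, but it is easy to check separately. The attribute list in the process XML posted further down this thread contains pairs such as "alább|alĂˇbb" and "bútor|bĂştor", which is the pattern that appears when UTF-8 text is decoded as windows-1250. A minimal sketch of that effect, where the function names are made up for illustration and are not part of RapidMiner:

```python
def misread_utf8_as_cp1250(text):
    """Simulate reading UTF-8 bytes with a windows-1250 decoder."""
    return text.encode("utf-8").decode("cp1250")


def repair_cp1250_mojibake(text):
    """Undo the misreading above, assuming nothing else touched the text."""
    return text.encode("cp1250").decode("utf-8")


if __name__ == "__main__":
    for word in ["alább", "bútor", "ajánlját"]:
        garbled = misread_utf8_as_cp1250(word)
        print(word, "->", garbled, "->", repair_cp1250_mojibake(garbled))
```

If some of the source files are UTF-8 and others windows-1250, reading them all with one setting would produce both spellings of the same word as separate attributes, as in that list.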

update: Narrowing down the problem: if I manually copy the ReadCSV "export.csv" object and connect the copy, instead of the original's Multiply, to the "unl" node, the original export.csv "goes back to normal". However, anything I connect to that node produces the same problem, with even more "?" in the connected data... so I am basically still in the same place.

update2: the same problem happens if I use, for example, the k-NN model. The same column has bad values, but in this case the question marks disappear; instead, some of the 0s change to random numbers like 3 or 4, and some bigger numbers, like 303933, change to 0. Please, somebody help me with this, I am really stuck.

update3: I've changed the picture (by accident).

Forest

Answers

fras Member Posts: 93 Contributor II

April 2014

As far as I can see from your pictures, you take one sample of the data to train the model and
a different sample to apply it, so you have no way of knowing what your algorithm has actually learned.
Applying such a model may well give you results like these. I would strongly suggest using a validation operator
together with a performance operator to check whether you are on the right track. You may also post your process
as XML next time.
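To illustrate what a validation operator combined with a performance operator does, here is a minimal k-fold cross-validation sketch in plain Python. It is not RapidMiner's implementation: the function names are invented for this example, and a trivial majority-class learner stands in for Naive Bayes so the block stays short.

```python
import random
from collections import Counter


def k_fold_indices(n, k, seed=0):
    """Shuffle the indices 0..n-1 and deal them into k folds."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]


def train_majority(rows, labels):
    """Stand-in learner (assumption): remember the most frequent label."""
    return Counter(labels).most_common(1)[0][0]


def predict_majority(model, row):
    """The stand-in model ignores the row and returns the stored label."""
    return model


def cross_validate(rows, labels, train, predict, k=10, seed=0):
    """Train on k-1 folds, test on the held-out fold, return fold accuracies."""
    accuracies = []
    for test_idx in k_fold_indices(len(rows), k, seed):
        if not test_idx:  # can happen when there are fewer rows than folds
            continue
        held_out = set(test_idx)
        train_idx = [i for i in range(len(rows)) if i not in held_out]
        model = train([rows[i] for i in train_idx],
                      [labels[i] for i in train_idx])
        hits = sum(predict(model, rows[i]) == labels[i] for i in test_idx)
        accuracies.append(hits / len(test_idx))
    return accuracies


if __name__ == "__main__":
    rows = list(range(10))
    labels = ["a"] * 8 + ["b"] * 2
    scores = cross_validate(rows, labels, train_majority, predict_majority, k=5)
    print(sum(scores) / len(scores))
```

Because every row is used for testing exactly once and never in the same round it was trained on, the averaged accuracy says something about what the model has learned, which a single apply step on an unrelated sample does not.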

ForestG Member Posts: 4 Contributor I

April 2014

Dear fras,

Thank you for your reply! I did as you said and now use a Validation process. However, the same problem persists: if I connect the store to the Validation process, my original data appears corrupted at the "Eredeti tábla" ("original table") Multiply operator. It is odd that only the label attribute gets corrupted. (Inside the Validation I use Naive Bayes, without Laplace correction.)

https://www.dropbox.com/s/y7syzl5xti4ey4q/rapidminer.png

xml:
<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<process version="5.3.015">
<context>
<input/>
<output/>
<macros/>
</context>
<operator activated="true" class="process" compatibility="5.3.015" expanded="true" name="Process">
<process expanded="true">
<operator activated="true" class="retrieve" compatibility="5.3.015" expanded="true" height="60" name="Retrieve mails_saját" width="90" x="45" y="75">
<parameter key="repository_entry" value="//mail01/mails_saját"/>
</operator>
<operator activated="true" class="multiply" compatibility="5.3.015" expanded="true" height="94" name="Eredeti tábla" width="90" x="179" y="75"/>
<operator activated="true" class="discretize_by_user_specification" compatibility="5.3.015" expanded="true" height="94" name="ReplyTimeDiszk" width="90" x="45" y="255">
<parameter key="create_view" value="true"/>
<parameter key="attribute_filter_type" value="regular_expression"/>
<parameter key="regular_expression" value="UserReplyTime"/>
<parameter key="include_special_attributes" value="true"/>
<list key="classes">
<parameter key="1( x > 1hét)" value="Infinity"/>
<parameter key="2(1hét> x > 1nap)" value="604800.0"/>
<parameter key="3(1nap > x > 12 óra)" value="86400.0"/>
<parameter key="4(12óra > x > 1óra)" value="43200.0"/>
<parameter key="5(1óra > x)" value="3600.0"/>
<parameter key="6(nincsvalasz)" value="0.0"/>
</list>
</operator>
<operator activated="true" class="discretize_by_user_specification" compatibility="5.3.015" expanded="true" height="94" name="SzóSzámDiszk" width="90" x="179" y="255">
<parameter key="create_view" value="true"/>
<parameter key="attribute_filter_type" value="regular_expression"/>
<parameter key="regular_expression" value=".*"/>
<parameter key="use_except_expression" value="true"/>
<parameter key="except_regular_expression" value="|ID|UserReplyTime|ThreadID|MailCount|UserMalilNumber|ThreadStartYear|ThreadStartDate|"/>
<list key="classes">
<parameter key="Van" value="Infinity"/>
<parameter key="Nincs" value="0.5"/>
</list>
</operator>
<operator activated="true" class="select_attributes" compatibility="5.3.015" expanded="true" height="76" name="Select Szavak" width="90" x="313" y="255">
<description>Szavak Szelekt
</description>
<parameter key="attribute_filter_type" value="subset"/>
<parameter key="attributes" value="ID|UserReplyTime|1|10|102ben|1111|13|13an|13ig|16|1800tól|2|2000kor|2359kor|23ai|290|3|36|390|3an|4ig|55000ft|56|5ig|70|703|710|812|814ben|8kor|9é|ad|adn|ahoy|ajánlan|ajĂˇnljĂˇt|akar|alapjĂˇ|alkal|alább|alĂˇbb|am|amenny|aminĂ©l|amúgy|amĂşgy|and|andrás|ann|asztal|attil|auth|autó|baj|bajesés|basszus|bejáratú|bejĂˇratĂş|bekért|berendezet|beszer|beszélt|beírhatna|bizottság|biztos|bocs|boll|btw|bul|bérlet|bútor|bĂ©rlet|bĂştor|bĹ‘vĂtĂ©s|bővítés|c|castl|cirkó|cirkĂł|commit|corequotepathfals|corv|csapatĂˇ|csoportvezető|cucc|d|diffmnemonicprefixfals|dolg|dolog|don|dáv|díj|dĂj|egyed|egyelĹ‘|egyelő|elkurvázv|elszámolás|elszĂˇmolĂˇs|elutazás|elég|elĹ‘|ember|emberer|említett|epic|esetleg|est|eszlelt|ezer|feküdn|fel|felpofozn|felszerelv|feltölten|feltĂ¶lten|fizess|fiú|fiĂş|flott|fogadja|fogadjĂˇ|fogyasztás|fogyasztĂˇs|for|fores|forwardol|from|fáj|fájl|fájlkorl|fél|félév|föl|fürdőszob|fĂĽrdĹ‘szobĂˇ|fĂˇjlbĂłl|fĂˇjlkorlĂˇ|fĂˇjlt|fĂˇjt|fĹ±tĂ©st|fűtés|garantál|generált|generĂˇlt|gi|gitsch|glass|gond|gondol|gondoltál|goretity|gyer|gyert|gyűlés|gács|gárd|gárdánkivüli|gép|gĂ©pek|gĂ©pem|h|hagyját|hal|halihĂł|hall|hallgat|hallgatn|halottas|hangos|hangosabb|hav|haver|hell|helló|hellĂł|hely|helyett|hi|hib|hibatl|hibaüzenet|hibaĂĽzenet|hibĂˇ|hiszt|https|httpsdocsgooglecomfiled0b9tqocohdlw8amzmcfu4btv0eveed|httpsnek|httpswwwyoutubecomwatchv3f1aalubx7|hátmasszás|ház|hétvég|hí|hĂ©ten|hĂ©tvĂ©gĂ©n|id|idĹ‘pon|idősebb|igaz|igényes|igĂ©nyes|inbox|infó|ingy|inn|javat|javítot|jelentkezet|jelentkezz|jelentkezés|jelentkezĂ©sĂĽ|jelszó|jelszĂł|jános|jó|jönn|jĂ¶het|jĂł|jĂłval|jĂˇtszan|kap|kapcsol|kb|ked|kedvet|kellen|keres|keresĂĽ|kezdőd|kiadás|kiadó|kiadĂł|kiadĂˇs|kics|kifel|kihagy|kihajozas|kimar|kitalál|klónozn|klĂłnozn|kocs|kok|kolleg|kollégium|kopy|kor|kozm|kulcs|kurva|kurvajó|kuth|kén|kér|kért|készíten|kéthetent|költség|kösz|közösség|kül|kĂ©ne|kĂ©t|kĂ¶ltsĂ©g|kĂ¶szi|kĂ¶vetkezĹ‘|kĂĽlĂ¶n|lakás|lakĂˇs|lanosch|laptop|legfeljebb|legkésőbb|lehetĹ‘sĂ©g|lehetőleg|lehetőség|le
n|lenézn|lenézés|lesz|letudnĂˇt|levándorol|lezár|lill|lovag|lovagter|létrehozt|lógn|lĂ©trehozt|mad|madworl|marh|max|mb|meccs|meghallgat|megtesz|megváltozot|megy|mek|meleg|menn|mennĂ©|metróállomás|metrĂłĂˇllomĂˇstĂłl|miat|minden|minĹ‘sĂ©gĹ±|minőségű|mondanivaló|mondt|monsteres|mulv|munka|munkĂˇ|más|méret|mérőóra|mĂ©g|mĂ©ret|mĂ©rĹ‘ĂłrĂˇk|mĂˇtĂ©|mĹ±kĂ¶di|működ|nap|negy|nek|nekt|nemes|nemesmĂˇtĂ©|nemsajnos|november|nyuszilist|nyuszitábor|nĂ©lkĂĽl|of|osz|pattan|perc|piciny|pillanat|pisztáciás|pls|plusz|pon|pont|ppt|pptt|pr|privát|privĂˇt|profil|publikus|pupp|push|pár|pénz|pĂ©ntek|pĂłtolnĂˇt|pĂˇndi|pĂˇr|redbull|remél|rendezn|rendezĹ‘|rendezĹ‘gĂˇrd|repository|rezs|robbants|rozcsĂł|rákócz|ráér|rég|rész|rĂˇ|sajnos|sajĂˇ|saturday|sch|seggmasszás|semm|sen|senior|sikeres|simony|simonyis|srác|sshnál|sshnĂˇl|sti|stílus|szamitan|szcstől|szedelény|szerd|szerencs|szeret|szeretn|szeretnét|szeretnĂ©|szeretsz|szerint|szi|sziaszt|szigorú|szintĂ©|szkén|szkĂ©nĂ©|szo|szociológi|szomabt|szomb|szombat|szomsze|sztárvendég|szép|szólt|szĂ©p|tamĂˇs|tanulmány|tar|taz|teambobc|teljes|természetes|tervezet|tesz|tett|thai|thre|ti|tipp|tisztelet|****|tom|tovĂˇbbiakrĂłl|tranz|tud|tudt|tábor|tábordíj|tágas|többi|tökéletes|töltsét|történ|túrázn|tüdő|tĂ¶bb|tĂ¶rtĂ©ni|tĂˇbor|tĂˇgas|tőcsd|ugyanit|utc|utcĂˇ|v|valam|vasút|vel|velet|venn|visszacsatolt|visz|viz|von|várl|wcpump|wi|worl|z|zen|zendí|állít|álló|ár|érdekel|érdekesség|írj|írt|összlealjasodás|üdv|üdvözl|Ă©n|Ă©rdekel|Ă©rdekessĂ©g|Ă©rni|Ă©s|Ăr|Ărj|Ărju|ĂĽdv|Ăśdv|ĂśdvĂ¶zlet|ĂˇllĂtja|ĂˇllĂł|Ăˇr|Ă‰n"/>
</operator>
<operator activated="true" class="set_role" compatibility="5.3.015" expanded="true" height="76" name="LabelBeállítás" width="90" x="447" y="120">
<parameter key="attribute_name" value="UserReplyTime"/>
<parameter key="target_role" value="label"/>
<list key="set_additional_roles"/>
</operator>
<operator activated="true" class="x_validation" compatibility="5.1.002" expanded="true" height="112" name="Validation" width="90" x="581" y="120">
<description>A cross validation including a linear regression.</description>
<process expanded="true">
<operator activated="true" class="naive_bayes" compatibility="5.3.015" expanded="true" height="76" name="Naive Bayes" width="90" x="179" y="30">
<parameter key="laplace_correction" value="false"/>
</operator>
<connect from_port="training" to_op="Naive Bayes" to_port="training set"/>
<connect from_op="Naive Bayes" from_port="model" to_port="model"/>
<portSpacing port="source_training" spacing="0"/>
<portSpacing port="sink_model" spacing="0"/>
<portSpacing port="sink_through 1" spacing="0"/>
</process>
<process expanded="true">
<operator activated="true" class="apply_model" compatibility="5.3.015" expanded="true" height="76" name="Apply Model" width="90" x="45" y="30">
<list key="application_parameters"/>
</operator>
<operator activated="true" class="performance" compatibility="5.3.015" expanded="true" height="76" name="Performance" width="90" x="179" y="30"/>
<connect from_port="model" to_op="Apply Model" to_port="model"/>
<connect from_port="test set" to_op="Apply Model" to_port="unlabelled data"/>
<connect from_op="Apply Model" from_port="labelled data" to_op="Performance" to_port="labelled data"/>
<connect from_op="Performance" from_port="performance" to_port="averagable 1"/>
<portSpacing port="source_model" spacing="0"/>
<portSpacing port="source_test set" spacing="0"/>
<portSpacing port="source_through 1" spacing="0"/>
<portSpacing port="sink_averagable 1" spacing="0"/>
<portSpacing port="sink_averagable 2" spacing="0"/>
</process>
</operator>
<connect from_op="Retrieve mails_saját" from_port="output" to_op="Eredeti tábla" to_port="input"/>
<connect from_op="Eredeti tábla" from_port="output 1" to_port="result 1"/>
<connect from_op="Eredeti tábla" from_port="output 2" to_op="ReplyTimeDiszk" to_port="example set input"/>
<connect from_op="ReplyTimeDiszk" from_port="example set output" to_op="SzóSzámDiszk" to_port="example set input"/>
<connect from_op="SzóSzámDiszk" from_port="example set output" to_op="Select Szavak" to_port="example set input"/>
<connect from_op="Select Szavak" from_port="example set output" to_op="LabelBeállítás" to_port="example set input"/>
<connect from_op="LabelBeállítás" from_port="example set output" to_op="Validation" to_port="training"/>
<connect from_op="Validation" from_port="model" to_port="result 2"/>
<connect from_op="Validation" from_port="averagable 1" to_port="result 3"/>
<portSpacing port="source_input 1" spacing="0"/>
<portSpacing port="sink_result 1" spacing="0"/>
<portSpacing port="sink_result 2" spacing="0"/>
<portSpacing port="sink_result 3" spacing="0"/>
<portSpacing port="sink_result 4" spacing="0"/>
</process>
</operator>
</process>

Any idea?
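For readers following the process above, the "ReplyTimeDiszk" operator turns the numeric UserReplyTime into six named classes. The sketch below reproduces that mapping in plain Python, using the limits and class names from the XML. It assumes that a value goes to the class with the smallest upper limit at or above it, and that the values are seconds (604800 is one week in seconds); neither assumption is confirmed in the thread, and the function name is made up.

```python
from bisect import bisect_left

# Upper limits and class names copied from the "classes" list of the
# ReplyTimeDiszk operator, sorted ascending by limit.
# Hungarian terms: hét = week, nap = day, óra = hour, nincsvalasz = no reply.
LIMITS = [0.0, 3600.0, 43200.0, 86400.0, 604800.0, float("inf")]
NAMES = [
    "6(nincsvalasz)",
    "5(1óra > x)",
    "4(12óra > x > 1óra)",
    "3(1nap > x > 12 óra)",
    "2(1hét> x > 1nap)",
    "1( x > 1hét)",
]


def reply_time_class(seconds):
    """Return the class name for a reply time.

    Assumption: a value belongs to the first class whose upper limit is
    greater than or equal to it (upper limit inclusive).
    """
    return NAMES[bisect_left(LIMITS, seconds)]


if __name__ == "__main__":
    for value in [0, 3600, 3601, 303933, 700000]:
        print(value, "->", reply_time_class(value))
```

Under these assumptions the value 303933 from the earlier update falls into the "2(1hét> x > 1nap)" class, and after this step the label is a nominal attribute rather than a number.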

ForestG Member Posts: 4 Contributor I

April 2014

+1 thing: What I discovered is that if I set any of my columns as the "Label", it produces the same error (and the original then shows correct values).
