"Problems with special characters in server logs"
Hey community,
I have an issue with a log file that I want to read in with the LogFileSource operator.
Here is the XML code of the operator:
<operator name="xxx" class="LogFileSource">
  <parameter key="config_file" value="/Users/xxx/ConfigurationFile.xml"/>
  <parameter key="log_dir" value="/Users/xxx"/>
  <parameter key="filetype_filter" value="ico|gif|jpg|jpeg|css|js|GIF|JPG|png|PNG|flash|xml|Xml|DropIT|Default|Login|axd|404|edit|robots|util|css|NotFound|Util|PlugIn|Sites|admin|Templates|templates|bmp|pdf"/>
  <parameter key="only_HTTP_200" value="true"/>
  <list key="browser_matcher">
  </list>
  <list key="os_matcher">
  </list>
  <list key="language_matcher">
  </list>
</operator>
Here is the line in the file that causes the problem:
2010-03-09 00:37:48 141.76.45.35 - W3SVC9 SEAREWS002 192.168.97.8 80 GET /NotFound.aspx 404;http://www.are360.com/sv/upplevelser/Skidakning-Alpint/Are/[glow=yellow,2,300]ctl00_ã≤∂Êr,w+HÕèæ30ûfi®8Ø?∏95?.ÜCMÖ^iâGy∞uˇÜTõd¡´•´∑˚Q‡⁄¬Ñ7ù˜Æ∆}∆ü,fm[/glow] 500 0 1596 281 16 HTTP/1.1 www.are360.com - - -
As you can see, the line contains many special characters. Since the URI is supposed to be valid (at least the highlighted part starting at ctl00_), these characters should not be allowed there.
When reading the file, the message viewer always shows the message "WARNING: could not read line".
Can anyone help?
Regards,
Are
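To track down lines like this one before pointing LogFileSource at the directory, a small pre-check can help. The following is a minimal Python sketch, not part of RapidMiner, and the function names are hypothetical. It assumes the files are IIS W3C extended logs with a #Fields: directive naming the columns and single spaces between fields, and it treats any character outside the RFC 3986 set (ASCII letters, digits, -._~, the reserved delimiters and %) as invalid in the cs-uri-stem and cs-uri-query fields. It also flags lines whose field count does not match the header, because an unexpected separator inside one field shifts every column after it.

```python
import string

# Characters RFC 3986 allows in a URI: unreserved (letters, digits, -._~),
# gen-delims, sub-delims, and '%' for percent-encoding. All of them are ASCII.
# Simplification: '%' is accepted without checking the two hex digits after it.
URI_ALLOWED = set(
    string.ascii_letters + string.digits + "-._~" + ":/?#[]@" + "!$&'()*+,;=" + "%"
)


def bad_uri_chars(text):
    """Return the sorted distinct characters of text that RFC 3986 does not allow."""
    return sorted({ch for ch in text if ch not in URI_ALLOWED})


def find_suspicious_lines(lines, uri_fields=("cs-uri-stem", "cs-uri-query")):
    """Yield (line_number, field, bad_chars) for suspicious W3C log lines.

    Assumes a '#Fields:' directive names the columns and that fields are
    separated by single spaces, as in IIS W3C extended logs.
    """
    columns = []
    for number, line in enumerate(lines, start=1):
        line = line.rstrip("\r\n")
        if line.startswith("#Fields:"):
            columns = line.split()[1:]  # column names after the directive
            continue
        if not line or line.startswith("#") or not columns:
            continue  # skip blank lines, other directives, data before a header
        values = line.split(" ")
        if len(values) != len(columns):
            # A stray separator inside a field shifts all following columns.
            yield number, None, ["<field count mismatch>"]
            continue
        record = dict(zip(columns, values))
        for field in uri_fields:
            bad = bad_uri_chars(record.get(field, ""))
            if bad:
                yield number, field, bad
```

To run it on a file whose encoding is uncertain, open it with errors="replace", for example open(path, encoding="utf-8", errors="replace"), and iterate over the result. Undecodable bytes then become U+FFFD, which the check reports as an invalid character instead of aborting.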
Answers
If I understand you correctly, the problem is already in the log files themselves? Then I would try opening them in a text editor. If the characters are still there, try opening the file with a different encoding. The log file might be stored as UTF-16 while you are reading it as ANSI.
Greetings,
Sebastian
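Sebastian's encoding hypothesis can also be checked programmatically. The following minimal Python sketch (the function name is hypothetical) guesses a file's encoding from its raw bytes: it looks for a byte-order mark, then uses the share of NUL bytes as a hint for UTF-16 without a BOM, then tries strict ASCII and UTF-8 decoding, and finally falls back to cp1252, the code page Windows usually calls "ANSI" on Western European systems. It is a heuristic: a clean decode only shows that the bytes are consistent with an encoding, not that the guess is right.

```python
import codecs

# Byte-order marks; UTF-32 LE is checked before UTF-16 LE because its BOM
# starts with the same two bytes (FF FE).
BOMS = [
    (codecs.BOM_UTF32_LE, "utf-32"),
    (codecs.BOM_UTF32_BE, "utf-32"),
    (codecs.BOM_UTF8, "utf-8-sig"),
    (codecs.BOM_UTF16_LE, "utf-16"),
    (codecs.BOM_UTF16_BE, "utf-16"),
]


def sniff_encoding(data):
    """Guess the encoding of raw log bytes (heuristic, not a guarantee)."""
    for bom, name in BOMS:
        if data.startswith(bom):
            return name  # these codecs strip the BOM when decoding
    # ASCII text stored as UTF-16 without a BOM is roughly half NUL bytes.
    if data and data.count(b"\x00") > len(data) // 4:
        return "utf-16-le" if data[1:2] == b"\x00" else "utf-16-be"
    for name in ("ascii", "utf-8"):
        try:
            data.decode(name)  # strict: raises on the first invalid byte
            return name
        except UnicodeDecodeError:
            pass
    # Last resort: the Western European Windows "ANSI" code page.
    return "cp1252"
```

For example, read the file with open(path, "rb"), pass the bytes to sniff_encoding, and reopen the file in text mode with the returned encoding. Checking only the first part of a very large log is usually enough for the BOM and NUL tests, but the strict decode can then fail on a multi-byte character cut at the chunk boundary.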
Thanks for your help.
Unfortunately, that did not solve my problem.
However, since these entries make up less than 0.01% of all entries, we decided to simply leave them out.
On closer inspection, they all turned out to have status 404, which makes them irrelevant for our analyses anyway.
Best,
Edin
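The workaround described above (leaving out the affected entries) can also be applied before LogFileSource reads the files, which keeps the warnings out of the message viewer. This is a minimal Python sketch under stated assumptions: valid lines contain only printable ASCII, fields are separated by spaces rather than tabs, and UTF-8 is only an assumed default input encoding. The function names are hypothetical. It copies a log line by line, drops every line with a character outside printable ASCII, and returns the kept and dropped counts so the dropped share can be compared with the 0.01% mentioned above.

```python
def is_clean(line):
    """True if the line (ignoring its line ending) is printable ASCII only.

    Tabs and other control characters count as unclean, as does U+FFFD,
    the replacement character produced by errors="replace" below.
    """
    return all(" " <= ch <= "~" for ch in line.rstrip("\r\n"))


def clean_log_file(src_path, dst_path, encoding="utf-8"):
    """Copy src_path to dst_path without unclean lines; return (kept, dropped)."""
    kept = dropped = 0
    # newline="" keeps the original line endings (e.g. CRLF) unchanged.
    with open(src_path, encoding=encoding, errors="replace", newline="") as src, \
            open(dst_path, "w", encoding="ascii", newline="") as dst:
        for line in src:
            if is_clean(line):
                dst.write(line)
                kept += 1
            else:
                dropped += 1
    return kept, dropped
```

For example, with a hypothetical file name, clean_log_file("/Users/xxx/access.log", "/Users/xxx/clean/access.log") writes the filtered copy, and the clean directory can then be used as log_dir. As in the thread, the dropped entries are excluded from the analysis entirely.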