"Problems with special characters in server logs"

AreAre Member Posts: 5 Contributor II
edited May 2019 in Help
Hey community,

I have an issue with a log file that I want to read with the LogFileSource operator.

Here is the XML code of the operator:

    <operator name="xxx" class="LogFileSource">
        <parameter key="config_file" value="/Users/xxx/ConfigurationFile.xml"/>
        <parameter key="log_dir" value="/Users/xxx"/>
        <parameter key="filetype_filter" value="ico|gif|jpg|jpeg|css|js|GIF|JPG|png|PNG|flash|xml|Xml|DropIT|Default|Login|axd|404|edit|robots|util|css|NotFound|Util|PlugIn|Sites|admin|Templates|templates|bmp|pdf"/>
        <parameter key="only_HTTP_200" value="true"/>
        <list key="browser_matcher">
        </list>
        <list key="os_matcher">
        </list>
        <list key="language_matcher">
        </list>
    </operator>




Here is the line in the file that causes the problem:

2010-03-09 00:37:48 141.76.45.35    - W3SVC9 SEAREWS002 192.168.97.8 80 GET /NotFound.aspx 404;http://www.are360.com/sv/upplevelser/Skidakning-Alpint/Are/[glow=yellow,2,300]ctl00_ã≤&#6;∂Êr,&#26;w+HÕè&#21;&#127;æ30ûfi®&#21;8Ø?∏95&#29;?.ÜCMÖ^iâGy&#28;∞uˇÜTõd¡´•´&#6;∑˚Q‡⁄¬Ñ7ù˜Æ∆}∆&#3;&#23;ü,f&#17;m[/glow] 500 0 1596 281 16 HTTP/1.1 www.are360.com - - -




As you can see, there are many special characters in this log line. For the URI to be valid (at least the underlined part), these characters should not be allowed.
When reading the file, the message viewer always writes the "WARNING: could not read line" message.

Can anyone help?

Regards,
Are

Answers

  • landland RapidMiner Certified Analyst, RapidMiner Certified Expert, Member Posts: 2,531 Unicorn
    Hi,
    if I understand you correctly, the special characters are already in the log files themselves? Then I would open the files in a text editor and, if the characters are still there, try opening them with another encoding. The log file might be stored as UTF-16 while you are reading it as ANSI.

    Greetings,
    Sebastian
  • AreAre Member Posts: 5 Contributor II
    Hi Sebastian,

    thanks for your help.

    Unfortunately, that was not the solution to my problem :(

    However, since these entries make up less than 0.01% of all entries, we decided to simply leave them out.

    On closer inspection, they all turned out to have status 404, which makes them uninteresting for our analyses anyway.

    Best,

    Edin
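
For anyone who wants to apply the same workaround before the file reaches LogFileSource, here is a minimal sketch of a pre-filter in Python. It is not part of RapidMiner; the names `is_clean` and `filter_log` are made up for illustration, and it assumes that every valid entry consists only of printable ASCII characters and tabs, which may not hold for every server configuration. The file is decoded as latin-1 purely so that every byte maps to some character and reading never fails.

```python
import re

# Assumption: a valid log entry contains only tabs and printable ASCII.
# Anything else (control characters, non-ASCII) marks the line as malformed.
_BAD_CHARS = re.compile(r"[^\x09\x20-\x7e]")


def is_clean(line: str) -> bool:
    """Return True if the line, without its line ending, has no suspicious characters."""
    return _BAD_CHARS.search(line.rstrip("\r\n")) is None


def filter_log(src_path: str, dst_path: str, encoding: str = "latin-1"):
    """Copy src_path to dst_path, dropping malformed lines.

    Returns a (kept, dropped) tuple so the share of skipped lines can be checked.
    """
    kept = dropped = 0
    # newline="" keeps the original line endings untouched.
    with open(src_path, encoding=encoding, newline="") as src, \
         open(dst_path, "w", encoding=encoding, newline="") as dst:
        for line in src:
            if is_clean(line):
                dst.write(line)
                kept += 1
            else:
                dropped += 1
    return kept, dropped
```

Calling `filter_log("in.log", "clean.log")` (both paths are placeholders) writes a cleaned copy and returns the counts, which makes it easy to confirm that only a small share of the entries was dropped before pointing `log_dir` at the cleaned file.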