RapidMiner

Learner III ccricha

Unable to read file from disk using Execute Python operator

Hello, I am trying to get a better understanding of how RM Server can interact with its environment. I used an Execute Python operator within RM to write a test log file. I am now trying to use a different Execute Python operator to read the log file from disk (Linux) and then use a Store operator to store this data in the remote repository. All of this is running on the Linux RM Server.

 

What ends up happening is that RM writes an empty dataset. When I look in the server.log file, I see multiple lines like this:

 

WARNING [com.rapidminer.operator.Operator] (scheduledprocess_1503585018370) Read CSV: Could not parse line 0 in input: com.rapidminer.tools.CSVParseException: Value quotes not closed at position 0. Last characters read: ,"

 

Here is my overall process:

[Screenshots: Overall process; Python code]

Is the data frame not being constructed properly? It appears that the Execute Python process is writing a temporary CSV file somewhere that RM is trying to read and is failing to do so.
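For readers hitting the same problem, here is a minimal sketch of one way to hand a log file back from Execute Python as a data frame with one row per log line. It is not the code from the screenshot above, and whether it avoids the CSVParseException has not been verified. `LOG_PATH`, `read_log_lines`, and the column name `line` are hypothetical names chosen for illustration; `rm_main` is the entry point the operator calls.

```python
# Hypothetical location of the log file written by the first operator.
LOG_PATH = "/tmp/test.log"


def read_log_lines(path):
    """Return the non-blank lines of a text file without line endings."""
    with open(path, "r", encoding="utf-8", errors="replace") as handle:
        return [line.rstrip("\r\n") for line in handle if line.strip()]


def rm_main():
    # pandas is imported here so the helper above stays usable without it.
    import pandas as pd

    # One string column, one row per log line; no parsing of the line content.
    return pd.DataFrame({"line": read_log_lines(LOG_PATH)})
```

Keeping each raw line in a single string column leaves any splitting into timestamp, level, and message to later operators.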

3 REPLIES
Community Manager

Re: Unable to read file from disk using Execute Python operator

Hi @ccricha - good to have you here.  I guess my first question is why are you using Python scripts to read/write log files?  There are very nice, easy-to-use operators built into RapidMiner that will do this for you:

 

[Screenshot: Screen Shot 2017-08-24 at 1.35.31 PM.png]

 

I have used these operators in RM processes running on an Ubuntu server running RM Server with no problems at all.  Give it a try?

 

Scott

Scott Genzer
Senior Community Manager
RapidMiner, Inc.
Learner III ccricha
Solution

Re: Unable to read file from disk using Execute Python operator

Hello Scott, thanks for your reply. I am asking mainly in case I need a more detailed Python process for more complex data transformations (for example, reading from and writing to a database within the script) and want to use Python's logging module to log to disk. There are some cases where detailed logging is necessary and RM is not going to be a good tool for that. I know that I can successfully log to disk from a script in an Execute Python operator, and I was finally able to read the file using the "Read Document" operator and then store it in the repository. It still seems to me that the original approach should work, since the script returns a DataFrame object, but instead it throws a CSVParseException. Anyway, I will use "Read Document" for reading and analyzing log files in the future.
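A minimal sketch of the kind of disk logging described here, using only Python's standard `logging` module. The function name `build_file_logger`, the logger name `rm_script`, and the message format are assumptions for illustration, not taken from the original script.

```python
import logging


def build_file_logger(path, name="rm_script"):
    """Return a logger that appends timestamped records to the file at `path`."""
    logger = logging.getLogger(name)
    logger.setLevel(logging.INFO)
    # Attach the file handler only once, in case this runs more than once
    # in the same interpreter; otherwise every record would be duplicated.
    if not logger.handlers:
        handler = logging.FileHandler(path, encoding="utf-8")
        handler.setFormatter(
            logging.Formatter("%(asctime)s %(levelname)s %(message)s")
        )
        logger.addHandler(handler)
    return logger
```

Inside a script this could be used as `log = build_file_logger("/tmp/test.log")` followed by `log.info("transformation started")`, where the path is again a placeholder.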

 

Thanks

 

Community Manager

Re: Unable to read file from disk using Execute Python operator

Hi @ccricha - OK, that makes sense, and yes, Read Document will do a much better job: it simply reads your text file as-is, whereas CSV parsing expects delimited, structured rows.  Good luck.

 

Scott
