
Exporting results from the RapidMiner text mining extension to Excel

HansGiebenrath Member Posts: 2 Contributor I
edited November 2018 in Help
Hi everyone!

Unfortunately, I’m a complete novice at text mining with RapidMiner 6.0.008, and I’m under some time pressure to finish my text mining task. That’s why I need help even with a supposedly easy step that comes after the actual text mining process.

Here is a brief description of the simple process I’ve built so far.
In the main process, I first added the “Process Documents from Files” operator. By double-clicking that operator, I defined its “Vector Creation” subprocess, which in my case contains only the “Tokenize” and “Filter Stopwords” operators. Then I ran the whole process.
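For reference, “Process Documents from Files” already delivers its result as an ExampleSet (its “exa” output port): one row per document and one column per word that survives tokenizing and stopword filtering. The Python snippet below is only a sketch of that table’s shape, not RapidMiner’s implementation. Its stopword list is a tiny hypothetical one, it lowercases tokens for simplicity (the process described above has no case-transformation step), and it fills the cells with raw counts rather than the weighting selected in the operator’s vector creation setting.

```python
import re
from collections import Counter

# Hypothetical, deliberately tiny stopword list (RapidMiner ships its own lists)
STOPWORDS = {"a", "an", "and", "in", "is", "of", "the", "to"}


def tokenize(text):
    """Split text on runs of non-letter characters (ASCII letters only, a
    simplification) and lowercase the resulting tokens."""
    return [token.lower() for token in re.split(r"[^A-Za-z]+", text) if token]


def document_term_table(docs):
    """Build a document-term table from a {document name: text} mapping.

    Returns (terms, rows): terms is the sorted list of column names and each
    row is [document name, count of term 1, count of term 2, ...].
    """
    counts = {
        name: Counter(t for t in tokenize(text) if t not in STOPWORDS)
        for name, text in docs.items()
    }
    # Columns = every term that survives stopword filtering in any document
    terms = sorted(set().union(*counts.values()))
    # Counter returns 0 for terms that do not occur in a document
    rows = [[name] + [counts[name][t] for t in terms] for name in docs]
    return terms, rows


if __name__ == "__main__":
    # Hypothetical stand-ins for the text extracted from two PDF files
    docs = {
        "doc1.pdf": "The union and the company agreed to a wage increase.",
        "doc2.pdf": "Wage talks in the company stalled.",
    }
    terms, rows = document_term_table(docs)
    print(["document"] + terms)
    for row in rows:
        print(row)
```

Running it prints the header ['document', 'agreed', 'company', 'increase', 'stalled', 'talks', 'union', 'wage'] followed by the rows ['doc1.pdf', 1, 1, 1, 0, 0, 1, 1] and ['doc2.pdf', 0, 1, 0, 1, 1, 0, 1].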

Now, in the results view, the only thing left to do is to export the results to an Excel spreadsheet so that I can carry out further analysis there.

So I’d be grateful if somebody could tell me how to export these results to an Excel spreadsheet.

Answers

  • duygu Member Posts: 12 Contributor II
    Hello,

    there is a "Write Excel" operator, which writes an ExampleSet to an Excel spreadsheet file. You can even specify the file format.

    I hope it helps :)

    Duygu.
  • HansGiebenrath Member Posts: 2 Contributor I
    Hi.

    Thanks for your reply, Duygu.

    However, it only partly solves my problem, because I didn’t explain the whole problem properly.

    So, here’s a more detailed description.

    According to the tutorial section “Writing the Labor-Negotiations data set into an Excel file” in the help window for the “Write Excel” operator (it opens in the design view when you click the operator), you have to connect the input port of the “Write Excel” operator to the output port of the “Retrieve” operator in order to write the data set coming from “Retrieve” to an Excel file.
    Once these operators are connected, the “Write Excel” operator receives its data from the “Retrieve” operator, which in turn loads its data from a specified repository location.

    This is where my problem lies. RapidMiner refuses to write the data set coming from the “Retrieve” operator to an Excel file, because it says that the data set is not actual data but a process. The process in question works on a collection of PDF documents that I had already used in the “Process Documents from Files” operator for a text mining task. Since RapidMiner does not seem to let me import PDF documents into an existing repository and store them as data (a ‘format’ the “Retrieve” operator can read), I can’t convert the PDF documents into data (e.g., data in Excel format) that the “Retrieve” operator can read and process. So there seems to be no way for me to write the results of my text mining process to an Excel file using the “Retrieve” and “Write Excel” operators.

    So, does anyone know how to convert this process built on PDF documents into a format that RapidMiner can import into an existing repository and that the “Retrieve” operator can read, so that writing an Excel file becomes possible?

    I’m stuck at this point and don’t know how to proceed. :'(
  • Marco_Boeck Administrator, Moderator, Employee, Member, University Professor Posts: 1,995 RM Engineering
    Hi,

    a couple of things:

    1) You can store/retrieve any file via the "Open File" and "Write File" operators. Just change the resource type from file to repository blob entry. This can be handy in case you want to store a snapshot of some volatile data in your repository for future use.
    2) If your process ends with documents coming out of an operator (a port called 'doc'), you can convert them to an ExampleSet with the "Documents to Data" operator, and vice versa.
    3) The "Store" and "Retrieve" operators can store/retrieve any data that Studio produces (e.g. example sets, documents, models, ...) in the repository.
    4) Your error description sounds like you did not store the results of your process in your repository, but rather the process itself. If you want to store some results, just add a "Store" operator right before the end of the process. If you already have the results in the results perspective, you can right-click the result header and select "Store XYZ in repository" without running the process again.

    Once you have an example set (for example via the "Documents to Data" operator), you can write it to an Excel file with the "Write Excel" operator. Hope this helps!
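    If the table only needs to end up in Excel, a plain CSV file is another route (inside Studio, the "Write CSV" operator does a similar job). The Python snippet below is a minimal sketch outside RapidMiner, not part of it: it writes a header row plus data rows as CSV with a UTF-8 byte-order mark, which helps Excel detect the encoding. The column names, the rows and the file name text_mining_results.csv are hypothetical.

```python
import csv
import os
import tempfile


def write_table_for_excel(path, header, rows):
    """Write a header row followed by data rows as a CSV file.

    The 'utf-8-sig' encoding prepends a UTF-8 byte-order mark, which helps
    Excel recognise non-ASCII characters when the file is opened directly.
    """
    # newline="" lets the csv module control line endings, as the csv docs recommend
    with open(path, "w", newline="", encoding="utf-8-sig") as f:
        writer = csv.writer(f)
        writer.writerow(header)
        writer.writerows(rows)


if __name__ == "__main__":
    # Hypothetical columns and rows, loosely shaped like a "Documents to Data" result
    header = ["document", "text"]
    rows = [
        ["doc1.pdf", "union company agreed wage increase"],
        ["doc2.pdf", "wage talks company stalled"],
    ]
    # Hypothetical output location: the system temp directory
    out_path = os.path.join(tempfile.gettempdir(), "text_mining_results.csv")
    write_table_for_excel(out_path, header, rows)
    print("Wrote", out_path)
```

    Depending on regional settings, Excel may expect a semicolon as the separator; passing delimiter=";" to csv.writer covers that case.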

    Regards,
    Marco