"No files showing up in

nascent Member Posts: 5 Contributor II
edited May 23 in Help
Hi All,

I recently started using RapidMiner 5 and want to process a set of .doc files. I have a collection of .doc and .txt files in a folder, so I added the "Process Documents from Files" operator. While adding the directory, I tried to look at the files inside it, but no files show up in the directory. I also tried a directory containing only a .txt file, and still no files show up. Please guide me.

Answers

  • nascent Member Posts: 5 Contributor II
    Am I doing this right? Is there a solution?
  • awchisholm RapidMiner Certified Expert, Member Posts: 458 Unicorn
    Hello

    Could you post the XML of the process?

    regards

    Andrew
  • colo Member Posts: 236 Guru
    Hi nascent,

    the parameter "text directories" allows just selecting directories, not single files. So you either have to organize the important files in directories in a reasonable way or you have to use the "file pattern" parameter to read only certain documents from the specified directory. If you leave the default value as it is (*), all documents will be imported.

    Regards
    Matthias
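
    As a rough sketch of the second option: assuming the "file pattern" parameter is stored in the process XML under the key file_pattern (an assumption, since no XML in this thread shows it), restricting the import to .txt files might look like the fragment below. The class name "docs" and the directory c:\temp\docs are hypothetical placeholders.

    ```xml
    <operator activated="true" class="text:process_document_from_file" compatibility="5.1.002" expanded="true" name="Process Documents from Files">
      <list key="text_directories">
        <!-- hypothetical class name and directory; a directory is selected, never a single file -->
        <parameter key="docs" value="c:\temp\docs"/>
      </list>
      <!-- assumed key name for the "file pattern" parameter; the default value is * -->
      <parameter key="file_pattern" value="*.txt"/>
      <process expanded="true">
        <!-- pass each document straight through the inner process -->
        <connect from_port="document" to_port="document 1"/>
        <portSpacing port="source_document" spacing="0"/>
        <portSpacing port="sink_document 1" spacing="0"/>
        <portSpacing port="sink_document 2" spacing="0"/>
      </process>
    </operator>
    ```

    With the pattern left at *, every file in the directory would be imported.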
  • nascent Member Posts: 5 Contributor II
    Thanks for the response!

    @awchisholm
    The following is the XML of my process:

    <?xml version="1.0" encoding="UTF-8" standalone="no"?>
    <process version="5.1.009">
      <context>
        <input/>
        <output/>
        <macros/>
      </context>
      <operator activated="true" class="process" compatibility="5.1.009" expanded="true" name="Process">
        <process expanded="true" height="-20" width="-50">
          <operator activated="true" class="text:process_document_from_file" compatibility="5.1.002" expanded="true" height="76" name="Process Documents from Files" width="90" x="121" y="116">
            <list key="text_directories"/>
            <process expanded="true">
              <portSpacing port="source_document" spacing="0"/>
              <portSpacing port="sink_document 1" spacing="0"/>
            </process>
          </operator>
          <connect from_op="Process Documents from Files" from_port="example set" to_port="result 1"/>
          <connect from_op="Process Documents from Files" from_port="word list" to_port="result 2"/>
          <portSpacing port="source_input 1" spacing="0"/>
          <portSpacing port="sink_result 1" spacing="0"/>
          <portSpacing port="sink_result 2" spacing="0"/>
          <portSpacing port="sink_result 3" spacing="0"/>
        </process>
      </operator>
    </process>

    @colo
    I'm not trying to access a single file. For a single file, I tried Read Document and was able to read the content. My case is reading all the documents inside a folder. The folder contains .doc, .xls, .xlsx and .txt files.
    I even tried a folder containing a single .txt file. In that case the default value (*) should work, but the file inside the folder is still not recognized.
  • awchisholm RapidMiner Certified Expert, Member Posts: 458 Unicorn
    Hello

    I made a few changes - it works for me - you will need to create two folders c:\temp\class1 and c:\temp\class2 to put your files in.


    <?xml version="1.0" encoding="UTF-8" standalone="no"?>
    <process version="5.1.008">
      <context>
        <input/>
        <output/>
        <macros/>
      </context>
      <operator activated="true" class="process" compatibility="5.1.008" expanded="true" name="Process">
        <process expanded="true" height="206" width="212">
          <operator activated="true" class="text:process_document_from_file" compatibility="5.1.002" expanded="true" height="76" name="Process Documents from Files" width="90" x="112" y="30">
            <list key="text_directories">
              <parameter key="class1" value="c:\temp\class1"/>
              <parameter key="class2" value="c:\temp\class2"/>
            </list>
            <parameter key="extract_text_only" value="false"/>
            <process expanded="true" height="809" width="852">
              <operator activated="true" class="text:tokenize" compatibility="5.1.002" expanded="true" height="60" name="Tokenize" width="90" x="253" y="30"/>
              <connect from_port="document" to_op="Tokenize" to_port="document"/>
              <connect from_op="Tokenize" from_port="document" to_port="document 1"/>
              <portSpacing port="source_document" spacing="0"/>
              <portSpacing port="sink_document 1" spacing="0"/>
              <portSpacing port="sink_document 2" spacing="0"/>
            </process>
          </operator>
          <connect from_op="Process Documents from Files" from_port="example set" to_port="result 1"/>
          <connect from_op="Process Documents from Files" from_port="word list" to_port="result 2"/>
          <portSpacing port="source_input 1" spacing="0"/>
          <portSpacing port="sink_result 1" spacing="0"/>
          <portSpacing port="sink_result 2" spacing="0"/>
          <portSpacing port="sink_result 3" spacing="0"/>
        </process>
      </operator>
    </process>

    regards

    Andrew
  • nascent Member Posts: 5 Contributor II
    Thanks Andrew, but things are still not working for me. I'm sorry if I'm doing something wrong in the setup. Here's what I did:

    I created two folders, each containing one .doc file, and added the "Process Documents from Files" operator. I copied your XML and changed the key and value of the parameter tags. I executed the process but did not get any result.

    I get the following problem/warning message at the bottom:
    Mandatory input missing at port Process Documents from Files.document1

    2 fix options:
    Connect to Process Documents from Files.document
    Insert operator generating Document...

    Location:
    Process Documents from Files.document1

    The following is the log:
    Sep 2, 2011 10:21:21 AM INFO: No filename given for result file, using stdout for logging results!
    Sep 2, 2011 10:21:21 AM INFO: Process starts
    Sep 2, 2011 10:21:22 AM INFO: Loading initial data.
    Sep 2, 2011 10:21:22 AM INFO: Saving results.
    Sep 2, 2011 10:21:22 AM INFO: Process finished successfully after 0 s

    What still surprises me is why I don't see any files inside the folders. I clicked on "text directories", entered a "class name" and selected a directory. At this stage, if I browse further into the folder, I see no files!
  • nascent Member Posts: 5 Contributor II
    Matthias and Andrew, I've got the text miner running. Out of the suggested fixes, I've tried "Connect to Process Documents from Files.document" and it worked. Thanks a lot for your responses. Your responses gave me the hope to proceed.
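
    For anyone reading later: in the process XML, that fix appears to correspond to a direct connection inside the operator's inner process. This is only a sketch, pieced together from the port names in the XML Andrew posted above, not an export from my own process:

    ```xml
    <process expanded="true">
      <!-- wire the inner source port "document" straight to the sink port "document 1" -->
      <connect from_port="document" to_port="document 1"/>
      <portSpacing port="source_document" spacing="0"/>
      <portSpacing port="sink_document 1" spacing="0"/>
      <portSpacing port="sink_document 2" spacing="0"/>
    </process>
    ```

    Andrew's version does the same thing, except that it routes the document through a Tokenize operator on the way.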
  • restuar Member Posts: 8 Contributor II
    Sorry to resurrect an old problem, but I have experienced the same issue and have tried all the fixes suggested, and it still does not work. I noticed that it works on some machines and not on others. The issue is that the location of the files to be read does not stay in the edit list permanently: if you add operators and go back, the edit list is empty again. Could this be an installation problem?
  • pierrot Member Posts: 1 Contributor I
    I have the same problem as restuar. (Actually, everything is fine on a Windows machine, but it does not work on a Mac.) Do you have any idea how to solve the problem restuar described?

    Sorry, this was a silly remark. The workaround of editing the XML directly (see the post from awchisholm) works! I had only forgotten to validate my XML modifications.

    Once again, sorry!