
"newbie: Excel to text"

shilaski Member Posts: 8 Contributor II
edited May 2019 in Help
Here is the scope of my project. I have an Excel spreadsheet of warranty claims with around 9,100 entries. One of its columns holds comments, where a technician writes what was wrong with the vehicle. These comments are what I want to text mine.

I have figured out how to load the sheet and run it through a filter so that I am only working with the data I am interested in. Now I am guessing that I need to use the Text plugin to create word vectors (please tell me if I am wrong). It appears that the TextInput operator expects to read its input from files in a directory rather than from an example set. My question is how to correctly feed my data into the TextInput operator. Of course, I could be completely wrong... maybe there is a better way to do this?

Here is what I have so far:
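To make "word vectors" concrete: with a term-occurrence setting (the second process in this thread uses vector_creation = TermOccurrences), each comment presumably becomes one row and each distinct term one column holding how often that term appears. The Python sketch below only illustrates that idea and is not RapidMiner's implementation; the two warranty comments are invented, and the letters-only, lowercasing tokenizer is an assumption of the sketch.

```python
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    # Split on every run of non-letters and lowercase the pieces.
    return [t for t in re.split(r"[^a-z]+", text.lower()) if t]


def term_occurrence_vectors(comments: list[str]) -> tuple[list[str], list[list[int]]]:
    """Return (sorted vocabulary, one row of term counts per comment)."""
    counts = [Counter(tokenize(c)) for c in comments]
    # Iterating a Counter yields its keys, so this collects every distinct term.
    vocabulary = sorted(set().union(*counts))
    rows = [[c[term] for term in vocabulary] for c in counts]
    return vocabulary, rows


if __name__ == "__main__":
    # Invented example comments, standing in for the spreadsheet's comment column.
    comments = [
        "Wiper motor inoperative, replaced motor",
        "Replaced wiper blade",
    ]
    vocab, rows = term_occurrence_vectors(comments)
    print(vocab)
    for row in rows:
        print(row)
```

Each row has the same length as the vocabulary, which is what lets a learner or weighting operator treat the comments as ordinary numeric attributes.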

<operator name="Root" class="Process" expanded="yes">
    <operator name="ExcelExampleSource" class="ExcelExampleSource">
        <parameter key="excel_file" value="C:\Documents and Settings\shilaski.TRICO29\Desktop\50percentWarranty\Conceptsheet_Claims_short.xls"/>
        <parameter key="first_row_as_names" value="true"/>
    </operator>
    <operator name="AttributeFilter" class="AttributeFilter">
        <parameter key="condition_class" value="attribute_name_filter"/>
        <parameter key="parameter_string" value="comments"/>
    </operator>
    <operator name="Nominal2String" class="Nominal2String">
    </operator>
    <operator name="TextInput" class="TextInput" expanded="yes">
        <parameter key="create_text_visualizer" value="true"/>
        <parameter key="id_attribute_type" value="long"/>
        <parameter key="use_content_attributes" value="true"/>
        <operator name="StringTokenizer" class="StringTokenizer">
        </operator>
    </operator>
</operator>
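The question comes down to where the documents come from: a file-based text input reads one document per file in a directory, while the alternative is to take one document per row from a string column that is already loaded (here, the filtered "comments" column). A minimal Python sketch of the two ideas follows; the function names, the `*.txt` file pattern, and the sample texts are hypothetical and say nothing about how either RapidMiner operator is actually implemented.

```python
import tempfile
from pathlib import Path


def texts_from_directory(directory: str, pattern: str = "*.txt") -> list[str]:
    """One document per matching file, read in sorted file-name order."""
    return [p.read_text(encoding="utf-8") for p in sorted(Path(directory).glob(pattern))]


def texts_from_column(rows: list[dict[str, str]], column: str) -> list[str]:
    """One document per row, taken from a string column of already-loaded data."""
    return [row[column] for row in rows]


if __name__ == "__main__":
    # File-based: the texts must first be written out as separate files.
    with tempfile.TemporaryDirectory() as tmp:
        Path(tmp, "claim1.txt").write_text("Wiper motor inoperative", encoding="utf-8")
        Path(tmp, "claim2.txt").write_text("Replaced wiper blade", encoding="utf-8")
        print(texts_from_directory(tmp))

    # Column-based: the texts are already rows of the loaded spreadsheet.
    rows = [{"comments": "Wiper motor inoperative"}, {"comments": "Replaced wiper blade"}]
    print(texts_from_column(rows, "comments"))
```

For spreadsheet data the column-based route avoids exporting 9,100 comments to individual files.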

Answers

  • TobiasMalbrecht Moderator, Employee, Member Posts: 295 RM Product Management
    Hi Stacy,

    in principle you are right. You simply have to use the [tt]StringTextInput[/tt] operator instead of [tt]TextInput[/tt]. The former loads the texts from string attributes of an existing example set; the latter loads the texts directly from files.

    Hope that helps,
    regards,
    Tobias
  • shilaski Member Posts: 8 Contributor II
    Alright... here is where I am now:

    <operator name="Root" class="Process" expanded="yes">
        <description text="#ylt#h3#ygt#Finding important terms#ylt#/h3#ygt##ylt#p#ygt#This experiments shows how to find terms that are characteristic for a set of texts#ylt#/p#ygt#. #ylt#p#ygt##ylt#b#ygt#Hint:#ylt#/b#ygt#In the interactive keyword selection, click on weight to sort the terms by their relevance to the class specified in the CorpusBasedWeighting operator.#ylt#/p#ygt#"/>
        <operator name="ExcelExampleSource" class="ExcelExampleSource">
            <parameter key="datamanagement" value="long_array"/>
            <parameter key="excel_file" value="C:\Documents and Settings\shilaski.TRICO29\Desktop\50percentWarranty\Conceptsheet_Claims_short.xls"/>
        </operator>
        <operator name="AttributeFilter" class="AttributeFilter">
            <parameter key="condition_class" value="attribute_name_filter"/>
            <parameter key="parameter_string" value="comments"/>
        </operator>
        <operator name="Nominal2String" class="Nominal2String">
        </operator>
        <operator name="StringTextInput" class="StringTextInput" expanded="yes">
            <parameter key="default_content_language" value="english"/>
            <parameter key="vector_creation" value="TermOccurrences"/>
            <operator name="StringTokenizer" class="StringTokenizer">
            </operator>
            <operator name="EnglishStopwordFilter" class="EnglishStopwordFilter">
            </operator>
            <operator name="TokenLengthFilter" class="TokenLengthFilter">
                <parameter key="min_chars" value="3"/>
            </operator>
            <operator name="PorterStemmer" class="PorterStemmer">
            </operator>
        </operator>
        <operator name="CorpusBasedWeighting" class="CorpusBasedWeighting">
            <parameter key="class_to_characterize" value="graphics"/>
        </operator>
        <operator name="InteractiveAttributeWeighting" class="InteractiveAttributeWeighting">
        </operator>
    </operator>

    The problem now is that I keep getting this error:

    Error in: StringTextInput (StringTextInput) The input example set does not contain any attributes with value type string. Some operators require example sets with attributes of a specific value type. Please refer to the documentation of the used operators for further details.
  • shilaski Member Posts: 8 Contributor II
    Figured it out. Somehow I had left out the parameter that selects the column I wanted. I had it set before, but I suppose I should have troubleshot it before posting to the forums.

    Thanks
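The operator chain inside StringTextInput in the second process (StringTokenizer, EnglishStopwordFilter, TokenLengthFilter with min_chars = 3, PorterStemmer) can be approximated in plain Python to see what each step does to a single comment. This is a sketch under stated assumptions: the stopword list is a tiny hypothetical subset, lowercasing is a choice of this sketch, min_chars is assumed to mean "at least this many characters", and Porter stemming is left out rather than imitated.

```python
import re

# Tiny illustrative stopword list (an assumption); the real EnglishStopwordFilter uses its own, longer list.
STOPWORDS = {"a", "and", "for", "is", "not", "of", "on", "the", "to", "was", "with"}


def tokenize(text: str) -> list[str]:
    # Roughly a simple string tokenizer: split on runs of non-letters, lowercase everything.
    return [t for t in re.split(r"[^a-z]+", text.lower()) if t]


def remove_stopwords(tokens: list[str]) -> list[str]:
    return [t for t in tokens if t not in STOPWORDS]


def filter_length(tokens: list[str], min_chars: int = 3) -> list[str]:
    # Keep tokens with at least min_chars characters (mirrors min_chars = 3 in the process).
    return [t for t in tokens if len(t) >= min_chars]


def preprocess(text: str, min_chars: int = 3) -> list[str]:
    # Same order as the operators in the process; stemming is deliberately omitted.
    return filter_length(remove_stopwords(tokenize(text)), min_chars)


if __name__ == "__main__":
    # Invented comment; prints ['fan', 'working', 'arrival', 'replaced', 'fan', 'relay']
    print(preprocess("The AC fan was not working on arrival; replaced the fan relay"))
```

Note that the length filter drops short but meaningful tokens such as "ac"; with warranty comments full of abbreviations, min_chars may be worth tuning.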
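Comparing the two processes suggests one plausible reading of the fix: the ExcelExampleSource in the first process sets first_row_as_names to true, while the one in the second process does not. Without the header row as names, an attribute filter looking for "comments" would match nothing, leaving the text step with no string attribute, which fits the StringTextInput error in the thread. The Python sketch models only that reasoning; the fallback names (att1, att2, ...), the claim_id column, the sample rows, and matching the filter string as a regular expression are all assumptions.

```python
import re

Sheet = list[list[str]]


def load_sheet(sheet: Sheet, first_row_as_names: bool) -> tuple[list[str], Sheet]:
    """Return (attribute names, data rows) for a sheet given as a list of rows."""
    if first_row_as_names:
        return sheet[0], sheet[1:]
    # Hypothetical generic names; the header row is then treated as ordinary data.
    return [f"att{i + 1}" for i in range(len(sheet[0]))], sheet


def comment_texts(sheet: Sheet, first_row_as_names: bool, name_pattern: str = "comments") -> list[str]:
    names, rows = load_sheet(sheet, first_row_as_names)
    # Keep only attributes whose whole name matches the pattern.
    kept = [i for i, name in enumerate(names) if re.fullmatch(name_pattern, name)]
    if not kept:
        # Nothing survives the filter, so there is no string attribute for the text step.
        raise ValueError(f"no attribute matches {name_pattern!r}, so there is no string attribute to read")
    return [row[kept[0]] for row in rows]


if __name__ == "__main__":
    sheet = [
        ["claim_id", "comments"],
        ["1001", "Wiper motor inoperative"],
        ["1002", "Replaced wiper blade"],
    ]
    print(comment_texts(sheet, first_row_as_names=True))
    try:
        comment_texts(sheet, first_row_as_names=False)
    except ValueError as err:
        print("error:", err)
```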