
citation parsing

neda65 Member Posts: 4 Contributor I
edited November 2018 in Help

hi

I would like to do citation analysis and extract the author name, title, date, etc. from each citation string.
But I do not know which operators to use or how to use them.
I used the Extract Information operator, but only the first string is extracted from the file.
Please help me.

Answers

  • MartinLiebig Administrator, Moderator, Employee, RapidMiner Certified Analyst, RapidMiner Certified Expert, University Professor Posts: 3,527 RM Data Scientist

    Hi neda,

     

    can you get the citation as bibtex and use Read BibTex?

     

    ~Martin

    - Sr. Director Data Solutions, Altair RapidMiner -
    Dortmund, Germany
  • neda65 Member Posts: 4 Contributor I

    I have a file with this content

    (<author> A. Cau, R. Kuiper, and W.-P. de Roever. </author> <title> Formalising Dijkstra's development strategy within Stark's formalism. </title> <editor> In C. B. Jones, R. C. Shaw, and T. Denvir, editors, </editor> <booktitle> Proc. 5th. BCS-FACS Refinement Workshop, </booktitle> <date> 1992. </date><author> M. Kitsuregawa, H. Tanaka, and T. Moto-oka. )

    Of course, the real file is large, and I have to use SVM and CRF to evaluate the file and compare the two methods.
    But I do not know how to do that!

  • MartinLiebig Administrator, Moderator, Employee, RapidMiner Certified Analyst, RapidMiner Certified Expert, University Professor Posts: 3,527 RM Data Scientist

    Hi Neda,

     

    have you tried to read it in with Read XML?


    ~martin

  • neda65 Member Posts: 4 Contributor I

    hi

    My file is a .txt file and I don't know how to configure the Read XML operator.

    I converted my file to HTML and set an XPath expression for each attribute:

    //author

    //title

    //date

    and so on.

    But what should I enter as the XPath for the examples?

    And how do I then set up Read XML?

  • MartinLiebig Administrator, Moderator, Employee, RapidMiner Certified Analyst, RapidMiner Certified Expert, University Professor Posts: 3,527 RM Data Scientist

    Dear Neda,

     

    Your file format is very similar to XML. If you replace the first ( with <xml> and the last ) with </xml>, it might be possible to read it in. However, your posted file has the problem that it contains two <author> elements, and the second one is never closed. That is a bit strange for a single record.
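The wrapping idea can be tried outside RapidMiner first. The following is only a minimal Python sketch, not a RapidMiner process: `wrap_as_xml` is a hypothetical helper name, the sample strings are shortened from the posted file, and it assumes the parentheses to replace are the first "(" and the last ")" in the text.

```python
import xml.etree.ElementTree as ET


def wrap_as_xml(text):
    """Replace the first '(' with <xml> and the last ')' with </xml>."""
    start = text.index("(")
    end = text.rindex(")")
    return "<xml>" + text[start + 1:end] + "</xml>"


# Shortened, well-formed sample in the style of the posted file.
sample = "(<author> A. Cau, R. Kuiper, and W.-P. de Roever. </author> <date> 1992. </date>)"
root = ET.fromstring(wrap_as_xml(sample))
fields = [(child.tag, child.text.strip()) for child in root]

# The posted snippet ends with an <author> element that is never closed,
# so a strict XML parser rejects it.
broken = "(<date> 1992. </date><author> M. Kitsuregawa, H. Tanaka, and T. Moto-oka. )"
try:
    ET.fromstring(wrap_as_xml(broken))
    parsed_ok = True
except ET.ParseError:
    parsed_ok = False
```

If the real file is cut off in the same way as the posted snippet, a strict XML reader will probably fail on it too, which is one reason to consider the regular-expression route instead.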

     

    Another way to read it in would be to parse it in RM. Please have a look at the attached process. You can build similar things with the Process Documents from Files operator to parse all your files.

     

    ~Martin

    <?xml version="1.0" encoding="UTF-8" standalone="no"?>
    <process version="7.1.000">
    <context>
    <input/>
    <output/>
    <macros/>
    </context>
    <operator activated="true" class="process" compatibility="7.1.000" expanded="true" name="Process">
    <process expanded="true">
    <operator activated="true" class="text:create_document" compatibility="6.5.000" expanded="true" height="68" name="Create Document" width="90" x="112" y="85">
    <parameter key="text" value="(&lt;author&gt; A. Cau, R. Kuiper, and W.-P. de Roever. &lt;/author&gt; &lt;title&gt; Formalising Dijkstra's development strategy within Stark's formalism. &lt;/title&gt; &lt;editor&gt; In C. B. Jones, R. C. Shaw, and T. Denvir, editors, &lt;/editor&gt; &lt;booktitle&gt; Proc. 5th. BCS-FACS Refinement Workshop, &lt;/booktitle&gt; &lt;date&gt; 1992. &lt;/date&gt;&lt;author&gt; M. Kitsuregawa, H. Tanaka, and T. Moto-oka. )"/>
    </operator>
    <operator activated="true" class="text:extract_information" compatibility="6.5.000" expanded="true" height="68" name="Extract Information" width="90" x="246" y="85">
    <parameter key="query_type" value="Regular Expression"/>
    <list key="string_machting_queries"/>
    <list key="regular_expression_queries">
    <parameter key="date" value="&lt;date&gt;(.*)&lt;/date&gt;"/>
    </list>
    <list key="regular_region_queries">
    <parameter key="date" value="&lt;date&gt;.&lt;/date&gt;"/>
    </list>
    <list key="xpath_queries"/>
    <list key="namespaces"/>
    <list key="index_queries"/>
    <list key="jsonpath_queries"/>
    </operator>
    <operator activated="true" class="text:documents_to_data" compatibility="6.5.000" expanded="true" height="82" name="Documents to Data" width="90" x="447" y="85">
    <parameter key="text_attribute" value="text"/>
    </operator>
    <connect from_op="Create Document" from_port="output" to_op="Extract Information" to_port="document"/>
    <connect from_op="Extract Information" from_port="document" to_op="Documents to Data" to_port="documents 1"/>
    <connect from_op="Documents to Data" from_port="example set" to_port="result 1"/>
    <portSpacing port="source_input 1" spacing="0"/>
    <portSpacing port="sink_result 1" spacing="0"/>
    <portSpacing port="sink_result 2" spacing="0"/>
    </process>
    </operator>
    </process>
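The process above pulls out a single date with a greedy `(.*)` pattern. In a file that holds many citations, such a pattern may span from the first `<date>` to the last `</date>`, or return only one match. As a rough sketch of the same idea outside RapidMiner (plain Python, not an operator), a non-greedy pattern can collect every tagged field and group the fields into records. `parse_citations` is a hypothetical name, the grouping rule "a new citation starts at each `<author>`" is an assumption about the file, and the second record's title is a placeholder because the posted snippet is cut off there.

```python
import re

# Non-greedy match of <tag> value </tag>; \1 requires the same closing tag.
FIELD = re.compile(r"<(\w+)>\s*(.*?)\s*</\1>", re.DOTALL)


def parse_citations(text):
    """Group tagged fields into one dict per citation.

    Assumption: every citation starts with an <author> field.
    """
    records = []
    for tag, value in FIELD.findall(text):
        if tag == "author" or not records:
            records.append({})
        records[-1][tag] = value
    return records


text = (
    "<author> A. Cau, R. Kuiper, and W.-P. de Roever. </author> "
    "<title> Formalising Dijkstra's development strategy within Stark's formalism. </title> "
    "<date> 1992. </date>"
    "<author> M. Kitsuregawa, H. Tanaka, and T. Moto-oka. </author> "
    "<title> Placeholder title. </title>"  # placeholder, not from the posted file
)
citations = parse_citations(text)
```

Each dict in `citations` then corresponds to one row with columns such as author, title and date. An unclosed trailing field, like the one in the posted snippet, is simply skipped by this pattern rather than causing an error.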
  • neda65 Member Posts: 4 Contributor I

    hi

    I understand that for citation parsing I should use SVM^struct, but this operator isn't in the list of operators. Do you know how to add this algorithm to RapidMiner?

    Thanks

  • MartinLiebig Administrator, Moderator, Employee, RapidMiner Certified Analyst, RapidMiner Certified Expert, University Professor Posts: 3,527 RM Data Scientist

    Dear Neda,

     

    are you referring to a struct SVM: https://www.cs.cornell.edu/people/tj/svm_light/svm_struct.html ? If so, you are really bringing out the big guns first. The struct SVM is possibly the most complex solution you could think of.

     

    I know that Katharina Morik from the CS chair in Dortmund has had an eye on the topic, but I think there is no integration into RM yet. You would need to integrate it yourself using Java.
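Whichever learner ends up being used, a sequence model such as a CRF typically needs the tagged file turned into one label per token first. The following is a small Python sketch of that preparation step only, under the assumption that whitespace tokenization is good enough. `to_labeled_tokens` is a hypothetical helper, and no SVM or CRF library is involved here.

```python
import re

# Same tag pattern as used for field extraction: <tag> value </tag>.
FIELD = re.compile(r"<(\w+)>\s*(.*?)\s*</\1>", re.DOTALL)


def to_labeled_tokens(text):
    """Turn tagged citation text into (token, label) pairs."""
    pairs = []
    for tag, value in FIELD.findall(text):
        # Assumption: tokens are separated by whitespace.
        for token in value.split():
            pairs.append((token, tag))
    return pairs


sample = "<author> A. Cau, R. Kuiper, and W.-P. de Roever. </author> <date> 1992. </date>"
tokens = to_labeled_tokens(sample)
```

The resulting pairs could then be exported, for example one token and label per line, and fed to whichever SVM or CRF implementation is being compared.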

     

    ~Martin
