citation parsing

neda65 Member Posts: 4 Contributor I
edited November 2018 in Help

hi

I would like to do citation analysis: for each citation string, I want to extract the author name, title, date, etc.
But I do not know which operators to use or how to use them.
I tried the Extract Information operator, but only the first string is extracted from the file.
Please help me.

Answers

  • MartinLiebig Administrator, Moderator, Employee, RapidMiner Certified Analyst, RapidMiner Certified Expert, University Professor Posts: 3,503 RM Data Scientist

    Hi neda,

     

    Can you get the citations as BibTeX and use Read BibTex?

     

    ~Martin

    - Sr. Director Data Solutions, Altair RapidMiner -
    Dortmund, Germany
  • neda65 Member Posts: 4 Contributor I

    I have a file with this content:

    (<author> A. Cau, R. Kuiper, and W.-P. de Roever. </author> <title> Formalising Dijkstra's development strategy within Stark's formalism. </title> <editor> In C. B. Jones, R. C. Shaw, and T. Denvir, editors, </editor> <booktitle> Proc. 5th. BCS-FACS Refinement Workshop, </booktitle> <date> 1992. </date><author> M. Kitsuregawa, H. Tanaka, and T. Moto-oka. )

    Of course, this is a big file, and I have to use SVM and CRF to evaluate it and to compare the two methods.
    But I do not know how to do that!

  • MartinLiebig Administrator, Moderator, Employee, RapidMiner Certified Analyst, RapidMiner Certified Expert, University Professor Posts: 3,503 RM Data Scientist

    Hi Neda,

     

    Have you tried reading it in with Read XML?


    ~martin

  • neda65 Member Posts: 4 Contributor I

    hi

    My file is a .txt file, and I don't know how to set up the Read XML operator.

    I converted my file to HTML and set an XPath for each attribute:

    //author

    //title

    //date

    and so on.

    But what should I enter for "xpath for examples"?

    And how do I set up Read XML after that?

  • MartinLiebig Administrator, Moderator, Employee, RapidMiner Certified Analyst, RapidMiner Certified Expert, University Professor Posts: 3,503 RM Data Scientist

    Dear Neda,

     

    Your file format is very similar to XML. If you replace the first ( with <xml> and the last ) with </xml>, it might be possible to read it in. However, the file you posted has a problem: it contains a second <author> element that is never closed, so it is not well-formed XML. This is of course a bit strange.
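The wrapping idea can be tried outside RapidMiner first. The following is a minimal Python sketch, not part of the attached process: the helper name `to_xml` is made up for illustration, and cutting the text after the last closing tag is only one assumed way to deal with the unclosed second <author>. It also assumes the text contains no bare & or < characters, which would still break an XML parser.

```python
import re
import xml.etree.ElementTree as ET

# Sample text copied from the post above.
raw = (
    "(<author> A. Cau, R. Kuiper, and W.-P. de Roever. </author> "
    "<title> Formalising Dijkstra's development strategy within Stark's formalism. </title> "
    "<editor> In C. B. Jones, R. C. Shaw, and T. Denvir, editors, </editor> "
    "<booktitle> Proc. 5th. BCS-FACS Refinement Workshop, </booktitle> "
    "<date> 1992. </date><author> M. Kitsuregawa, H. Tanaka, and T. Moto-oka. )"
)


def to_xml(text, root="xml"):
    """Hypothetical helper: turn the pseudo-XML snippet into well-formed XML."""
    body = text.strip()
    # Drop the surrounding parentheses, if present.
    if body.startswith("("):
        body = body[1:]
    if body.endswith(")"):
        body = body[:-1]
    # Assumption: anything after the last closing tag is an incomplete
    # element (here the unclosed second <author>) and is discarded.
    closing_tags = list(re.finditer(r"</\w+>", body))
    if closing_tags:
        body = body[: closing_tags[-1].end()]
    return f"<{root}>{body}</{root}>"


tree = ET.fromstring(to_xml(raw))
# One (tag, text) pair per child element, in document order.
fields = [(el.tag, el.text.strip()) for el in tree]

for tag, value in fields:
    print(f"{tag}: {value}")
```

Note that this drops the truncated second author entirely; if that entry matters, the source file would need to be repaired rather than trimmed.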

     

    Another way to read it in would be to parse it in RapidMiner. Please have a look at the attached process. You can build similar things with the Process Documents from Files operator to parse all your files.

     

    ~Martin

    Spoiler
    <?xml version="1.0" encoding="UTF-8" standalone="no"?>
    <process version="7.1.000">
    <context>
    <input/>
    <output/>
    <macros/>
    </context>
    <operator activated="true" class="process" compatibility="7.1.000" expanded="true" name="Process">
    <process expanded="true">
    <operator activated="true" class="text:create_document" compatibility="6.5.000" expanded="true" height="68" name="Create Document" width="90" x="112" y="85">
    <parameter key="text" value="(&lt;author&gt; A. Cau, R. Kuiper, and W.-P. de Roever. &lt;/author&gt; &lt;title&gt; Formalising Dijkstra's development strategy within Stark's formalism. &lt;/title&gt; &lt;editor&gt; In C. B. Jones, R. C. Shaw, and T. Denvir, editors, &lt;/editor&gt; &lt;booktitle&gt; Proc. 5th. BCS-FACS Refinement Workshop, &lt;/booktitle&gt; &lt;date&gt; 1992. &lt;/date&gt;&lt;author&gt; M. Kitsuregawa, H. Tanaka, and T. Moto-oka. )"/>
    </operator>
    <operator activated="true" class="text:extract_information" compatibility="6.5.000" expanded="true" height="68" name="Extract Information" width="90" x="246" y="85">
    <parameter key="query_type" value="Regular Expression"/>
    <list key="string_machting_queries"/>
    <list key="regular_expression_queries">
    <parameter key="date" value="&lt;date&gt;(.*)&lt;/date&gt;"/>
    </list>
    <list key="regular_region_queries">
    <parameter key="date" value="&lt;date&gt;.&lt;/date&gt;"/>
    </list>
    <list key="xpath_queries"/>
    <list key="namespaces"/>
    <list key="index_queries"/>
    <list key="jsonpath_queries"/>
    </operator>
    <operator activated="true" class="text:documents_to_data" compatibility="6.5.000" expanded="true" height="82" name="Documents to Data" width="90" x="447" y="85">
    <parameter key="text_attribute" value="text"/>
    </operator>
    <connect from_op="Create Document" from_port="output" to_op="Extract Information" to_port="document"/>
    <connect from_op="Extract Information" from_port="document" to_op="Documents to Data" to_port="documents 1"/>
    <connect from_op="Documents to Data" from_port="example set" to_port="result 1"/>
    <portSpacing port="source_input 1" spacing="0"/>
    <portSpacing port="sink_result 1" spacing="0"/>
    <portSpacing port="sink_result 2" spacing="0"/>
    </process>
    </operator>
    </process>
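The process above pulls a single field out of one document with a regular expression. As a rough illustration of how the same idea extends to every field and to more than one citation, here is a small Python sketch. It is not RapidMiner code; the function name `parse_citations`, the rule "a new record starts at each <author>", and the placeholder second citation are all assumptions made for the example.

```python
import re

# Matches <tag> value </tag>. The value may not contain '<', so an
# element that is never closed is skipped rather than swallowing
# the text of the elements that follow it.
FIELD = re.compile(r"<(\w+)>\s*([^<]*?)\s*</\1>")


def parse_citations(text):
    """Hypothetical helper: group tagged fields into one dict per citation."""
    records = []
    for tag, value in FIELD.findall(text):
        # Assumption: every citation begins with its <author> element.
        if tag == "author" or not records:
            records.append({})
        records[-1][tag] = value
    return records


sample = (
    # First citation and the truncated author, copied from the thread.
    "(<author> A. Cau, R. Kuiper, and W.-P. de Roever. </author> "
    "<title> Formalising Dijkstra's development strategy within Stark's formalism. </title> "
    "<editor> In C. B. Jones, R. C. Shaw, and T. Denvir, editors, </editor> "
    "<booktitle> Proc. 5th. BCS-FACS Refinement Workshop, </booktitle> "
    "<date> 1992. </date><author> M. Kitsuregawa, H. Tanaka, and T. Moto-oka. ) "
    # Made-up placeholder citation, only here to show a second record.
    "<author> Placeholder Author. </author> <title> Placeholder title. </title> "
    "<date> 2000. </date>"
)

records = parse_citations(sample)
for record in records:
    print(record)
```

Using `findall` rather than a single match is what avoids the "only the first string is extracted" situation: every tagged field in the text is returned, not just the first hit.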
  • neda65 Member Posts: 4 Contributor I

    hi

    As I understand it, citation parsing should be done with SVM^struct, but this operator isn't in the list of operators. Do you know how to add this algorithm to RapidMiner?

    Thanks

  • MartinLiebig Administrator, Moderator, Employee, RapidMiner Certified Analyst, RapidMiner Certified Expert, University Professor Posts: 3,503 RM Data Scientist

    Dear Neda,

     

    Are you referring to a struct SVM: https://www.cs.cornell.edu/people/tj/svm_light/svm_struct.html ? If so, you are really bringing out the big guns first. The struct SVM is possibly the most complex solution you could think of.

     

    I know that Katharina Morik from the CS chair in Dortmund had an eye on the topic, but I think there is no integration into RapidMiner yet. You would need to integrate it yourself using Java.
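Whichever sequence model ends up being used (CRF, struct SVM, or something simpler), it will typically expect one label per token rather than tagged strings. As a hedged sketch of that preparation step, the Python snippet below turns the tagged text into (token, label) pairs. The function name `to_token_labels` and the whitespace tokenisation are assumptions for illustration only; no training is done here.

```python
import re

# Matches <tag>value</tag>; values containing '<' are not matched,
# so unclosed elements are ignored.
FIELD = re.compile(r"<(\w+)>([^<]*)</\1>")


def to_token_labels(text):
    """Hypothetical helper: label every whitespace-separated token with its tag."""
    pairs = []
    for tag, value in FIELD.findall(text):
        for token in value.split():
            pairs.append((token, tag))
    return pairs


# Shortened sample taken from the citation posted in this thread.
sample = (
    "<author> A. Cau, R. Kuiper, and W.-P. de Roever. </author> "
    "<date> 1992. </date>"
)

pairs = to_token_labels(sample)
for token, label in pairs:
    print(f"{token}\t{label}")
```

The resulting token and label columns are the kind of input that most CRF and structured SVM tools read, although the exact file format depends on the tool.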

     

    ~Martin
