
Labelling training cases in polynominal text classification task

User22883 Member Posts: 5 Contributor I
edited November 2019 in Help
Each case in my dataset contains multiple sentences as shown below.

"Criterion 4Writing General – language and grammar and referencing.UnacceptableSentence structure and grammar inadequate for clarity and/or incomplete referencing of sourced material.AcceptableSentence structure and grammar adequate, but errors cause distraction and/or errors in referencing.GoodSentence structure and grammar adequate, with minor errors that do not distract reader from the main message.Very GoodSentence structures and grammar are good with correct referencing of all sourced material.ExcellentEmploys words with fluency for ease of reading. Writing and references are essentially error free."

I would like to classify the cases according to their main focus. My labels are ["Information Literacy", "Written Communication", "Digital Literacy", ...], 8 in total.

When developing the training set, some cases clearly relate to one area, such as Information Literacy. In those instances my training data looks like this:

ID, Text, Label
01, "string", "Information Literacy"

However, some cases relate to multiple labels. 

My question is: how should these cases be documented in the training set?

Hope that makes sense.
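
For readers with the same question, one common way to document cases that carry several labels is to give each label its own 0/1 column instead of a single Label column. The sketch below is only an illustration under assumptions: the label list contains just the three names given in the post (the other five are not stated), the `to_indicator_row` and `write_training_set` helpers are hypothetical, and this is not necessarily the approach described in the accepted answer.

```python
import csv
import io

# Only three of the eight labels are named in the post; the rest are omitted
# here rather than guessed.
LABELS = ["Information Literacy", "Written Communication", "Digital Literacy"]


def to_indicator_row(case_id, text, case_labels):
    """Build one training row with a 0/1 column for every known label."""
    unknown = set(case_labels) - set(LABELS)
    if unknown:
        raise ValueError(f"unknown labels: {sorted(unknown)}")
    row = {"ID": case_id, "Text": text}
    for label in LABELS:
        row[label] = 1 if label in case_labels else 0
    return row


def write_training_set(cases):
    """Serialise (id, text, labels) tuples to CSV text with indicator columns."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=["ID", "Text"] + LABELS)
    writer.writeheader()
    for case_id, text, case_labels in cases:
        writer.writerow(to_indicator_row(case_id, text, case_labels))
    return buf.getvalue()


# Placeholder cases: the first relates to one label, the second to two.
cases = [
    ("01", "string", ["Information Literacy"]),
    ("02", "string", ["Information Literacy", "Written Communication"]),
]
print(write_training_set(cases))
```

With this layout, each label column can be treated as its own yes/no target, so a case that relates to several areas is simply a row with more than one 1.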

Best Answer


    User22883 Member Posts: 5 Contributor I
    Hi Rodrigo,

    I understand the methodology, thank you. Could you provide some guidance on how I might go about implementing the approach?
