Classifying English Articles Based on Difficulty
Hi, the Common European Framework of Reference for Languages (CEFR) categorises language difficulty into three main level groups, A, B, and C, and each group has two sublevels. The levels run from A1 (Beginner) and A2 (Elementary) up to C2 (Mastery).
I have thousands of documents that I need to group by difficulty level using RapidMiner or Python. One idea is to take a list of the most commonly used words (for example, the 1000 most common) and measure how many of the words in an article appear on that list. But this approach ignores grammatical difficulty. In addition to word difficulty, I need to add part-of-speech tagging for each article and the length of each sentence, and then find a way to combine these to decide whether the article is easy or difficult. It would be great if there were a ready-to-use library that can do this.
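To make the idea concrete, here is a rough sketch of the two simplest features (share of common words and average sentence length) using only the standard library. `COMMON_WORDS` is a tiny made-up stand-in for a real frequency list, the function names are my own, and the sentence splitting is deliberately naive. Part-of-speech features are not included; I assume they could come from a tagger such as NLTK's `nltk.pos_tag`.

```python
import re

# Hypothetical stand-in for a real "1000 most common words" list.
COMMON_WORDS = {"the", "a", "is", "cat", "on", "mat", "sat", "it", "and"}


def split_sentences(text):
    """Naively split on whitespace that follows '.', '!' or '?'."""
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    return [p for p in parts if p]


def tokenize(text):
    """Lowercase the text and keep runs of letters and apostrophes."""
    return re.findall(r"[a-z']+", text.lower())


def difficulty_features(text, common_words=COMMON_WORDS):
    """Return the share of common words and the mean sentence length in words."""
    sentences = split_sentences(text)
    words = tokenize(text)
    if not words:
        return {"common_ratio": 0.0, "avg_sentence_len": 0.0}
    common = sum(1 for w in words if w in common_words)
    return {
        "common_ratio": common / len(words),
        "avg_sentence_len": len(words) / len(sentences),
    }


if __name__ == "__main__":
    print(difficulty_features("The cat sat on the mat. It is quantum."))
```

A lower `common_ratio` and a higher `avg_sentence_len` would suggest a harder article, but I do not know how to turn these numbers into CEFR levels without labelled examples to train or calibrate on.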
What packages could help with this, and what process do you recommend?