  • Yazid21

    How to measure the difficulty of a text?

    I want to find a way to measure the difficulty level of some Chinese eBooks and arrange them from easiest to most difficult.

    Here is what I tried:

    1. I ran the text through word-segmentation software to identify each unique word.
    2. In the text, I replaced all HSK 1 words with "1", all HSK 2 words with "2", etc., and all words not in the HSK list with "7".
    3. I removed all remaining symbols. This resulted in a document containing only numbers, from 1 to 7.
    4. I then calculated the mean of all of the numbers in the document.
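
    The four steps above can be sketched roughly as follows. This is only an illustration: it assumes the text has already been segmented into a list of tokens, and the tiny HSK_LEVELS dictionary is a hypothetical stand-in (its level values are illustrative, not taken from the real HSK lists).

```python
from statistics import mean

# Hypothetical toy lookup: word -> level. The values are illustrative only;
# a real run would load the full HSK word lists instead.
HSK_LEVELS = {"我": 1, "喜欢": 1, "喝": 1, "咖啡": 2}

UNLISTED_LEVEL = 7  # score given to any word that is not in the lists


def mean_hsk_level(tokens, levels=HSK_LEVELS, unlisted=UNLISTED_LEVEL):
    """Average the per-word level scores of an already segmented text."""
    # Step 3: drop punctuation and other symbols (str.isalnum is True for
    # Chinese characters and False for marks such as "。" or "，").
    words = [tok for tok in tokens if tok.isalnum()]
    if not words:
        return 0.0
    # Step 2: map every word to its level, or to 7 if it is not listed.
    scores = [levels.get(word, unlisted) for word in words]
    # Step 4: take the mean of all the scores.
    return mean(scores)


easy = ["我", "喜欢", "喝", "咖啡", "。"]
harder = ["我", "喜欢", "喝", "拿铁", "。"]  # "拿铁" is not in the toy list
print(mean_hsk_level(easy))    # 1.25
print(mean_hsk_level(harder))  # 2.5
```

    Note that every token is counted, so very frequent low-level words are averaged together with the many unlisted words scored as 7.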

    I thought this would give a rough estimate of the difficulty level. However, the calculations all came out very close to 3.5. There was little difference in the numbers between a children's story that I can easily read and a challenging novel.

    Is there some way to improve this? Or a better process for determining the relative difficulty?

    4 years ago
  • Willson228

    My approach is to count the distinct (unknown) vocabulary items relative to the length of the text. This is a very inaccurate measure, but it does correlate somewhat with difficulty. I think that if you combine it with sentence/clause length, as Roddy suggests, you should get pretty decent results.

    It's not an exact science, however. Especially when sentence length and unknown vocabulary are unevenly distributed, reality may be quite different. For example, dialogue is generally relatively easy, while narrative is often a bit harder. Specialist subjects may contain a relatively small amount of rare, subject-specific vocabulary, etc.
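
    A minimal sketch of the two measures mentioned in this answer, under some assumptions: the text is already segmented into tokens, known_words is a hypothetical set of words the reader already knows, and sentence length is counted in characters between sentence-ending punctuation. How to weight the two numbers against each other is not specified here, so the sketch just reports both.

```python
import re


def unknown_word_density(tokens, known_words):
    """Distinct unknown words divided by the total number of words."""
    words = [tok for tok in tokens if tok.isalnum()]  # skip punctuation
    if not words:
        return 0.0
    unknown = {word for word in words if word not in known_words}
    return len(unknown) / len(words)


def mean_sentence_length(text):
    """Average sentence length in characters, splitting on 。！？ and !?"""
    sentences = [s.strip() for s in re.split(r"[。！？!?]+", text) if s.strip()]
    if not sentences:
        return 0.0
    return sum(len(s) for s in sentences) / len(sentences)


known = {"我", "喜欢", "喝"}  # hypothetical reader vocabulary
tokens = ["我", "喜欢", "喝", "拿铁", "。", "我", "喝", "拿铁"]
print(unknown_word_density(tokens, known))        # 1 distinct unknown word out of 7
print(mean_sentence_length("我喜欢喝咖啡。你呢？"))  # 4.0
```

    Computing these per chapter or per page rather than per book would show the uneven distribution described above.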

    4 years ago
  • NONTETHELELO

    I don't think the new HSK lists are appropriate for the process described in the OP.

    The lower levels contain too few words to be useful with native content; that's why it averages out at 3.5.

    But maybe they could work better with the various levels of the Chinese Breeze series?

    You'd need different lists for native content, I think. Perhaps try the old HSK lists.

    Edit: the new HSK lists don't have words such as 星期天! Useless.
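
    One way to check the point about the lower levels covering too little of a native text is to look at the share of words falling into each level, rather than a single average. This is only a sketch: it assumes pre-segmented tokens, and the levels dictionary passed in is a hypothetical word-to-level lookup, not the real lists.

```python
from collections import Counter


def level_shares(tokens, levels, unlisted=7):
    """Return {level: fraction of words at that level}, with 7 = not listed."""
    words = [tok for tok in tokens if tok.isalnum()]  # skip punctuation
    if not words:
        return {}
    counts = Counter(levels.get(word, unlisted) for word in words)
    return {level: counts[level] / len(words) for level in sorted(counts)}


toy_levels = {"我": 1, "喝": 1}  # illustrative values only
shares = level_shares(["我", "喝", "拿铁", "。"], toy_levels)
print(shares)  # two thirds of the words at level 1, one third unlisted
```

    If the unlisted share is large for every book, the word list is probably a poor fit for that kind of content.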

    4 years ago
  • NONTETHELELO

    Yes, there is an easier and better way :)

    Try Chinese Text Analyser: http://www.chinese-forums.com/index.php?/topic/44383-introducing-chinese-text-analyser/

    Hope this is the sort of thing you meant.

    4 years ago
