I want to find some way to measure the difficulty level of some Chinese eBooks and arrange them from easiest to most difficult.
Here is what I tried:
I thought this would give a rough estimate of the difficulty level.
However, the calculations all came out very close to 3.5. There was
little difference in the numbers between a children's story that I can
easily read and a challenging novel.
Is there some way to improve this? Or a better process for determining the relative difficulty?
My approach is to count the number of distinct (unknown) vocabulary items relative to the length of the text. The result is very inaccurate, but there is some correlation with difficulty. I think that if you combine it with sentence/clause length, as Roddy suggests, you should get pretty decent results.
It's not an exact science, however. Especially when sentence length and unknown vocabulary are unevenly distributed, reality may be quite different. For example, dialogue is generally relatively easy, while narrative is often a bit harder. Specialist subjects may contain a relatively small amount of rare, subject-specific vocabulary, etc.
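A minimal sketch of the two measures mentioned above, in case it helps. It assumes the text has already been split into words by some segmentation tool (punctuation removed) and that you have your own set of known words; the function names and the punctuation set are made up for illustration, not taken from any particular tool.

```python
import re

# Punctuation treated as clause boundaries (an assumed, non-exhaustive set).
CLAUSE_BREAKS = re.compile(r"[，。！？；：、,.!?;:\n]+")


def unknown_vocab_ratio(words, known_words):
    """Distinct unknown words divided by the total number of words.

    `words` is a list of already-segmented words (assumption: segmentation
    is done elsewhere); `known_words` is a set of words the reader knows.
    """
    if not words:
        return 0.0
    unknown = {w for w in words if w not in known_words}
    return len(unknown) / len(words)


def mean_clause_length(text):
    """Average number of characters per clause, splitting on punctuation."""
    clauses = [c.strip() for c in CLAUSE_BREAKS.split(text)]
    clauses = [c for c in clauses if c]  # drop empty pieces
    if not clauses:
        return 0.0
    return sum(len(c) for c in clauses) / len(clauses)
```

You could then sort your books by either number, or by some weighted mix of the two. The thread doesn't say how to weight them, so that part would need experimenting.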
I don't think the new HSK lists are appropriate for the process described in the OP.
The lower levels contain too few words to be useful with native content; that's why it averages out at 3.5.
But maybe they could work better with the various levels of the Chinese Breeze series?
You'd need different lists for native content, I think. Perhaps try the old HSK lists.
Edit: the new HSK lists don't have words such as 星期天! Useless.
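One rough way to check whether a given list fits your books is to see how much of the text it covers at all. This is only a sketch: it assumes the text is already segmented into words, and `list_coverage` is a hypothetical helper name, not part of any existing tool.

```python
def list_coverage(words, word_list):
    """Fraction of running words in the text that appear in word_list.

    `words` is a list of already-segmented words (assumption);
    `word_list` is a set, e.g. all the words from one set of HSK lists.
    """
    if not words:
        return 0.0
    covered = sum(1 for w in words if w in word_list)
    return covered / len(words)
```

If coverage is low for every book, an average level computed from that list probably won't tell the books apart, and a bigger list would be worth trying.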
Yes, there is an easier and better way.
Try Chinese Text Analyser here: http://www.chinese-forums.com/index.php?/topic/44383-introducing-chinese-text-analyser/
Hope this is the sort of thing you meant.