varnamproject-discuss

[Varnamproject-discuss] Round 2 testing


From: Kevin Martin
Subject: [Varnamproject-discuss] Round 2 testing
Date: Sat, 30 Aug 2014 22:29:43 +0530

I followed another testing method:

1. Created a word corpus from chapter 1 of ഒരുകുടുംബപുരാണം from sayahna.org.
2. Learned the word corpus (about 1000 words).
3. Tested transliteration on chapter 2 of ഒരുകുടുംബപുരാണം (about 1100 words).
4. Repeated the test with and without the stemmer.
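
For anyone who wants to reproduce the comparison, the accuracy measurement in steps 3 and 4 can be sketched roughly as below. This is only an illustration, not the script from the repository: the `accuracy` helper, the toy word pairs, and the two lookup tables standing in for the learned models are all made up, and no real Varnam call is used.

```ruby
# Fraction of test words whose top suggestion matches the expected word.
# `transliterate` is any callable that maps a romanized word to a list of
# suggestions; it is a stand-in here, not a real Varnam call.
def accuracy(test_pairs, transliterate)
  return 0.0 if test_pairs.empty?

  correct = test_pairs.count do |input, expected|
    transliterate.call(input).first == expected
  end
  correct.to_f / test_pairs.size
end

# Toy test set, invented for illustration only.
test_pairs = [
  ["malayalam", "മലയാളം"],
  ["veedu", "വീട്"],
  ["veettil", "വീട്ടിൽ"],
  ["amma", "അമ്മ"]
]

# Hypothetical lookup tables standing in for the two learned models.
without_stemmer = {
  "malayalam" => ["മലയാളം"],
  "veedu" => ["വീട്"],
  "amma" => ["അമ്മ"]
}
with_stemmer = without_stemmer.merge("veettil" => ["വീട്ടിൽ"])

baseline = accuracy(test_pairs, ->(w) { without_stemmer.fetch(w, []) })
stemmed  = accuracy(test_pairs, ->(w) { with_stemmer.fetch(w, []) })

puts format("without stemmer: %.1f%%", baseline * 100)
puts format("with stemmer:    %.1f%%", stemmed * 100)
```

Only the top suggestion is counted as a hit here; counting a match anywhere in the suggestion list would be an equally reasonable choice and would give higher numbers.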

I saw an improvement in accuracy of 1%! The idea was that since both sets are from similar sources, some words would overlap and some words would repeat with a different suffix. However, I think the improvement might be this small because I'm typing the Manglish incorrectly. I remember you mentioning some sort of "Manglish" standard. Is it available online somewhere?

The sh and Ruby scripts I used and the word corpus from the novel are all in my tools repository [1].

[1] https://github.com/lonesword/tools
