Originally posted by Fry-kun
Once you'd sieved out visually similar words, you'd have to sieve out homophones: sore, soar, saw, etc.; bow, bough... and not just in RP but in every existing dialect, and they'll be everywhere. I doubt there's even a comprehensive reference compiled anywhere! US dialects in particular, with their t/d homophones and spongy vowels, are littered with unlikely homophones: frog and fraud, cold and called, for example. Even if there were such a list, it'd be obsolete as soon as it was complete, as language is constantly evolving.
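For what it's worth, the "sieve out visually similar words" step could be sketched as a greedy filter using Levenshtein edit distance as a crude stand-in for "looks similar" (the homophone problem would need pronunciation data on top and isn't shown here; `sieve` and its threshold are my own illustration, not any real scheme's):

```python
# Levenshtein edit distance, classic DP over a rolling row.
def edit_distance(a: str, b: str) -> int:
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                  # deletion
                           cur[j - 1] + 1,               # insertion
                           prev[j - 1] + (ca != cb)))    # substitution
        prev = cur
    return prev[-1]

def sieve(words: list[str], min_dist: int = 2) -> list[str]:
    """Greedily keep only words at least min_dist edits from every kept word."""
    kept: list[str] = []
    for w in words:
        if all(edit_distance(w, k) >= min_dist for k in kept):
            kept.append(w)
    return kept

# "bore" is one edit from "sore", so it gets dropped; the rest survive.
print(sieve(["sore", "bore", "frog", "fraud", "cold", "called"]))
```

Note that this only catches spelling similarity: frog and fraud pass comfortably, which is exactly the homophone gap the post is complaining about.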
Anyway, once you'd finished that, assuming you ever did, you'd have to do the same for every existing dialect of every other language on earth! That's assuming you're not just relying on the words' lengths as a (shockingly poor) proxy for difference: I suspect comparing enormous strings of letters which supposedly happen to represent words in a foreign language would be worse than comparing the underlying hash. ...and then, of course, there are plenty of languages which don't even have 65k words.
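For scale, the "65k words" figure comes from packing 16 bits into each word: a 2**16-entry list lets a 256-bit digest read back as just 16 words. A minimal sketch of that encoding, with a fabricated placeholder list (the names `WORDS` and `digest_to_words` are mine, not from any real scheme):

```python
import hashlib

# Placeholder word list: a real scheme would need exactly 2**16 = 65,536
# carefully vetted, mutually dissimilar words; these are fabricated stand-ins.
WORDS = [f"w{i:05d}" for i in range(2 ** 16)]

def digest_to_words(data: bytes) -> list[str]:
    """Encode a SHA-256 digest 16 bits at a time, one word per 16-bit chunk."""
    digest = hashlib.sha256(data).digest()
    return [WORDS[int.from_bytes(digest[i:i + 2], "big")]
            for i in range(0, len(digest), 2)]

print(len(digest_to_words(b"example")))  # a 256-bit digest needs 16 words
```

A smaller list is always possible at the cost of more words per digest (an 8-bit list of 256 words, PGP-word-list style, doubles the word count), which is the obvious escape hatch for languages without 65k usable words.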
Once you'd sorted out that lot and allowed for future deviations in pronunciation, how would the source choose which language to encode to? You'd have to know the first language of every potential recipient... or publish the digest/"fingerprint" encoded into every known language - which would be rather unwieldy!
Encoding to a word list compiled from every language on earth, carefully sieved to remove all visual and phonetic similarities, could help resolve some of those problems... perhaps. The result might be quite good for visual comparison, but it'd be an absolute bastard to type or read over the phone!
I'm pretty certain the idea is unworkable.