Teaching quality, lack of resources, weak infrastructure, exhaustion of will: problems so perennial that we should make them into calendars. What if we decided we don’t need to teach and learn English anymore? How simple would life be?
The world of Star Trek is just around the corner with hand-held simultaneous translation devices. Ahead of the game is the Jibbigo application, which promises speech recognition of 40,000 words for ten languages in a smartphone app. Potentially, we could save ourselves thousands of hours of language learning and quickly be multilingual with a battery charge. Gone will be TESOL departments, private language schools and the memorizing of archaic idioms for tests. There will be a tremendous sigh of relief from the masses.
There is no need to emphasize how convenient this will be. How productive will we be when we have time to study something else? How interconnected will commerce be when we no longer get lost in translation? What cross-cultural romances will bloom when misunderstandings are wiped away?
What will change, of course, is the very way we communicate, and we will have to think again about how valuable that is. Is our communication clumsy and crude in a way that would make aliens laugh at us, or is our bumbling attempt at communicating the very essence of what makes us human? Would we really want multilingual romances to depend on headsets and special goggles in which love is translated into subtitles? First of all, how is it even possible?
Japan’s NTT DoCoMo and Microsoft have developed prototypes, so rather than speculating about science fiction, we can look at how the technology works. Microsoft’s model uses virtual neurons to replicate what the human brain does. Neural networks weigh the value of information collaboratively, not so different from what Wikipedia does. The next fairly inscrutable thing to understand is that these neural networks are arranged in layers through which information is sifted, also apparently consistent with the way the brain works. Microsoft does this with nine layers. The bottom one processes sound, and the higher levels sort information to determine the most likely intended meaning.
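For readers who want the idea in concrete form, the layered sifting described above can be sketched in a few lines of Python. This is a minimal illustration, not Microsoft’s actual model: the weights, layer sizes and input features are invented for demonstration, and only the overall shape (sound features entering at the bottom, nine stacked layers, candidate-meaning scores emerging at the top) follows the description.

```python
import math

def layer(inputs, weights):
    """One layer: each virtual neuron weighs all its inputs, then squashes the sum to 0..1."""
    return [
        1.0 / (1.0 + math.exp(-sum(w * x for w, x in zip(neuron, inputs))))
        for neuron in weights
    ]

def forward(sound_features, layers):
    """Sift the signal upward through every layer in turn, bottom to top."""
    signal = sound_features
    for weights in layers:
        signal = layer(signal, weights)
    return signal

# Nine tiny illustrative layers, echoing the nine-layer description;
# each layer here has two neurons with two made-up weights apiece.
layers = [[[0.5, -0.3], [0.2, 0.8]] for _ in range(9)]

# Two invented "sound features" go in; scores for candidate meanings come out.
meaning_scores = forward([0.9, 0.1], layers)
print(meaning_scores)
```

The top layer’s outputs play the role of scores for competing interpretations; a real system would then pick the highest-scoring meaning.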
For Google’s model, the rough edges are sanded down by crowdsourcing. Samples from smartphones are compared so that the most likely solution can be selected. In this way, even the strongest skeptic may be proven wrong as technology finds ways to handle nuance, humor, sarcasm, contextual and cultural references, dialects, accents, slang and just about any other tool we use to communicate as language-loving beings. Even for lovers, audio will most likely be able to replicate individual voiceprints, inflections, intonation and everything else needed not only to be understood but to be felt.
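The crowdsourced selection just described can also be sketched simply: among many candidate translations gathered from users, pick the one the crowd most often agrees on. This is an illustrative toy, not Google’s pipeline, and the sample phrases are invented.

```python
from collections import Counter

def most_likely(candidates):
    """Return the candidate translation with the most crowd votes."""
    counts = Counter(candidates)
    best, _ = counts.most_common(1)[0]
    return best

# Hypothetical translations of the same utterance, collected from many phones.
samples = [
    "Where is the station?",
    "Where is the station?",
    "Where station is?",
    "Where is the train station?",
]
print(most_likely(samples))  # -> "Where is the station?"
```

A production system would weigh far subtler signals than raw vote counts, but the principle of letting many samples outvote any one machine’s rough guess is the same.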
So at the end of the day, if there are no more Luddites in the room, what case can be made for rejecting this technology? One might be that it will make our brains lazy if our own layers of neural networks lack exercise. We might evolve out of legs, too, if we stop using them. Another might be that we will miss the fun of miscommunication. We won’t return home from foreign lands with amusing stories about taxi drivers and banana sellers, those delightful experiences when we find we can communicate through our own ingenuity, humor and magnanimity. Let’s wholeheartedly enjoy our fumbling, bu