(BTW, the chap in the cup is a SpinVox “Mobster”….)
My favourite VTT (voice-to-text) company, SpinVox, have raised an interesting thread for conversation in their blog (read it here). More accurately, James Whatley, their “chief of interweb wizardry and bloggery”, has written an insightful and thought-provoking piece on where “voice” (as in one of the ways we communicate) is headed – it had my two brain cells buzzing away for hours thinking about it – some thoughts below…
They say that people are predominantly either visual, aural, or kinaesthetic learners – that is, you best take in information by seeing, by hearing, or by feeling – so what happens when you can “see” your voice? As James neatly puts it, “I’m not talking about sound waves or pretty patterns on an oscilloscope, but your actual VOICE. The words you use to articulate your thoughts are similar and yet also completely different to those of the person sitting beside you.” Does someone who is an aural learner start to change the ways in which they represent their world? Could intelligent conversion of some of the many “data feeds” we experience cause a paradigm shift in our learning/understanding?
Another thought: the ways in which we are “authenticated” or “identified” in many areas of our lives are often visual (e.g. a photo). Occasionally, other techniques are used, such as textual ones (e.g. a signature). Where more security is required, some kind of psychometric information is sometimes used (e.g. a “fact” that only you might know). But with VTT, another form of authentication becomes possible. Granted, these sorts of advanced ID schemes have been available for some time to the likes of government agencies, but they have not been “mass-market”. With a good (or better still, excellent) voice conversion algorithm and pattern-matcher, you can search for phrases, quirks, or “tells” that can ID someone – yes, that next-gen voice analyser in 24 could become real!
Searching your voice – no, this isn’t about working out how you might hit that high “A” that Mariah manages on all her records but you can’t quite reach; it’s about a new kind of search, using your voice pattern as metadata. Just like the recent advances in image searching (you don’t type a text string, you supply image data), searching voice has the potential to be huge – it unlocks another kind of content that is plentiful, constantly being updated, and probably very valuable – essentially most of the things that make the interweb go round!
Wow, some deeper thoughts there on VTT – but, as James points out in his blog, “We’ve enabled the notion of ‘voice to content’ – now what does that empower you to do?”
Thoughts are most welcome!
(On a less high-brow note: there are 10 IntoMobile points up for grabs for the best caption to the picture above – I am going for “When Mobster was told for the 100th time he was a really sweet guy, he decided to find out how sweet….”)