I just started looking into the software Julius. It seems to me that if I speak constantly while using my computer, for instance dictating the content that I place on my blog, there is a one-to-one correspondence between the speech and a simple hash of a word dictionary plus a voice dictionary; the structure would be common to all people and differ only in the frequency output. It only makes sense that I could sit at my computer and speak while I type, correlating speech segments to words. At first glance it would seem to be a matter of simple dictionaries, with some nuance in emphasis or context to distinguish words that sound alike but have different meanings.
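A minimal sketch of that dictionary idea, with invented phoneme strings and bigram counts (not Julius's actual file formats or language model): a shared pronunciation dictionary maps a speech segment to candidate words, and a tiny per-user context model picks between homophones.

```python
# Hypothetical pronunciation dictionary: phoneme string -> candidate words.
# The structure is the same for everyone; only the counts below would differ.
PRONUNCIATIONS = {
    "T UW": ["two", "too", "to"],
    "B IY": ["be", "bee"],
}

# Hypothetical per-user bigram counts standing in for "frequency output".
BIGRAMS = {
    ("want", "to"): 40, ("want", "two"): 2, ("want", "too"): 3,
}

def pick_word(phonemes, previous_word):
    """Choose the most likely word for a phoneme segment given context."""
    candidates = PRONUNCIATIONS.get(phonemes, [])
    if not candidates:
        return None
    return max(candidates, key=lambda w: BIGRAMS.get((previous_word, w), 0))

print(pick_word("T UW", "want"))  # context favors "to"
```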
The same database structure would hold for anybody using the same language, so it would be possible to share an entire voice between users as easily as a dictionary.
It also seems reasonable that, in the case of video chat, if I were to create a Wavefront OBJ file of my face with its animation aspects (and textures) to correlate with that voice dictionary, along with some non-verbal communication tools, I could chat with a person face to face without sending video streams at all. The data rate would be extremely low. If I accept the vocal dictionary of a person whom I speak to often, then communication by IRC could be written on their interface and spoken on my end. It would also not require transferring video, except for frames representing complex images that could not be decomposed or were unique to the individual receiving them. It is even possible to exchange words and images that might irritate the person on the opposite end. There is no reason that I can't replace sh*t with poop, though I am not offended by such things and would rather hear it like it is (though it might be interesting to swap it out for Scheiße, govno, or merde).
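The word-swapping part is easy to sketch. Assuming each receiver publishes a substitution table (the words below are just examples), the sending side could filter the text before it is spoken:

```python
import re

# Hypothetical substitution table agreed on by the receiving end.
SUBSTITUTIONS = {"shit": "poop", "crap": "nonsense"}

def soften(text):
    """Replace words the receiver finds irritating, case-insensitively."""
    def swap(match):
        return SUBSTITUTIONS[match.group(0).lower()]
    pattern = r"\b(" + "|".join(map(re.escape, SUBSTITUTIONS)) + r")\b"
    return re.sub(pattern, swap, text, flags=re.IGNORECASE)

print(soften("That is shit."))  # -> "That is poop."
```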
There are translation programs, and I don't see why it would be necessary to translate a sentence more than once if the translation is correct. Given the amount of storage available locally and on the Internet, it hardly makes sense for any number of companies or individuals to duplicate the effort of producing a common, useful data set.
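The "translate once" idea is just memoization over a shared store. Here is a minimal sketch, where `translate_slow` is a hypothetical stand-in for a real translation backend:

```python
# Shared cache: (sentence, target language) -> translation.
cache = {}

def translate_slow(sentence, target):
    # Placeholder: a real system would call a translation engine here.
    return f"[{target}] {sentence}"

def translate(sentence, target="fr"):
    """Return a cached translation, computing it only on first request."""
    key = (sentence, target)
    if key not in cache:
        cache[key] = translate_slow(sentence, target)
    return cache[key]

translate("Hello, world.")  # computed once
translate("Hello, world.")  # served from the shared cache
print(len(cache))  # 1
```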
The use of HMMs (Hidden Markov Models) in Julius should be interesting to look at. I also have a new technique for voice recognition that I suspect others have not figured out yet, and Julius makes a convenient framework to test my hypothesis.
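For the mechanics of HMM decoding, here is a toy Viterbi decoder over a two-state model with made-up probabilities; Julius's real acoustic models are far larger, but the principle of picking the most likely hidden state path is the same:

```python
states = ["vowel", "consonant"]
start = {"vowel": 0.5, "consonant": 0.5}
trans = {
    "vowel": {"vowel": 0.3, "consonant": 0.7},
    "consonant": {"vowel": 0.6, "consonant": 0.4},
}
# Emission probabilities for two coarse (invented) acoustic observations.
emit = {
    "vowel": {"low_freq": 0.8, "high_freq": 0.2},
    "consonant": {"low_freq": 0.3, "high_freq": 0.7},
}

def viterbi(observations):
    """Return the most likely hidden state sequence for the observations."""
    v = [{s: start[s] * emit[s][observations[0]] for s in states}]
    back = []
    for obs in observations[1:]:
        col, ptr = {}, {}
        for s in states:
            prev = max(states, key=lambda p: v[-1][p] * trans[p][s])
            col[s] = v[-1][prev] * trans[prev][s] * emit[s][obs]
            ptr[s] = prev
        v.append(col)
        back.append(ptr)
    best = max(states, key=lambda s: v[-1][s])
    path = [best]
    for ptr in reversed(back):
        path.append(ptr[path[-1]])
    return list(reversed(path))

print(viterbi(["low_freq", "high_freq", "low_freq"]))
```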
It would seem that I could carry a flash drive with me, plug it into any machine, and have it recognize my voice. I think something beyond Fourier transforms or FFTs could be extracted from the speech stream, but I guess I will see as I proceed. I am also integrating this with some experiments in LAN Ethernet packet networking on CSMA systems. I think I may have already smurfed or pinged one of my local computers to death by accident while playing around.
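As a baseline for what Fourier analysis already pulls out of a speech frame, here is a naive DFT (an FFT computes the same thing faster) applied to a synthetic 100 Hz tone; the sample rate and frame length are made up for the example:

```python
import cmath
import math

SAMPLE_RATE = 800  # samples per second (small, for illustration)
N = 80             # one 100 ms frame

# Synthetic "speech" frame: a pure 100 Hz sine wave.
frame = [math.sin(2 * math.pi * 100 * n / SAMPLE_RATE) for n in range(N)]

def dft(samples):
    """Discrete Fourier transform, O(N^2); an FFT gives the same result."""
    n = len(samples)
    return [sum(samples[t] * cmath.exp(-2j * math.pi * k * t / n)
                for t in range(n))
            for k in range(n)]

spectrum = dft(frame)
# Dominant bin among the positive frequencies, converted back to Hz.
peak = max(range(1, N // 2), key=lambda k: abs(spectrum[k]))
print(peak * SAMPLE_RATE / N)  # -> 100.0
```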
Many other strange things are happening, more than I could ever blog about. Continuous-reading phased dimensional PCR-RFLP is a concept and method I am developing for observing changes in the local genome. I would assume that if I run PCR and RFLP continuously, like background noise detection, then a change in the RFLP pattern over time would indicate the presence of other insects, creatures, bacteria, viruses, or people. Like the background noise that pervades a city, I would guess it has tones and overtones that serve as a reference frame for change. I will see how that works out and report any significant results. I would also suspect that any attempt to destroy the normal flora (being too clean) would simply create an opportunity for a new species, like deforestation and recovery.
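The monitoring step of that idea can at least be sketched in software: treat each RFLP run as a set of fragment lengths (the band pattern) and flag fragments that appear in, or vanish from, a baseline "background" pattern. The fragment sizes here are invented for illustration, not real data.

```python
baseline = {120, 340, 560, 910}   # base-pair lengths seen in the normal flora

def pattern_change(current):
    """Return fragments new to, or missing from, the baseline pattern."""
    return {"appeared": current - baseline, "vanished": baseline - current}

todays_run = {120, 340, 560, 910, 475}
print(pattern_change(todays_run))  # a new 475 bp band appeared
```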