April 11, 2022 11:03 pm GMT

Add Speech Recognition to Your PC, Even to Your TV

Overview of My Submission

Most of our day-to-day computer use involves sound, so I thought it would be nice to connect the default audio output to voice recognition. That way every spoken word is recognized regardless of the software in use: Teams, YouTube, TikTok, Twitter, Edge, VLC, you name it (unfortunately on Windows only). And how far can we push it? As far as subtitles for cable TV.

Submission Category:

Accessibility Advocates

Link to Code on GitHub

bleakview / deepgramwinsys

Deepgram sound-to-text converter for all sounds emitted in Windows

What can you find in this repository?

How to get started with Deepgram in Windows Forms
A sample custom label control with borders in Windows Forms
How to capture the system-wide default audio output
How to record the captured audio as MP3
How to save and load system settings

Additional Resources / Info

To get system-wide voice recognition you basically need two components: a voice recognition service such as Deepgram, and some way to eavesdrop on the sounds the PC generates. I chose C# as the language because I connect directly to the system. To loop back (the technical term for capturing a system's own sound output) on Windows, you use WASAPI, the Windows Audio Session API. I chose the NAudio library to access it.
And here are some of the results.

It works on Teams

It works in the browser

While the application also has settings and transparency properties, the most critical part is capturing the audio and recognizing it.

private async void ConvertAndTranscript()
{
    // Enter credentials for Deepgram
    var credentials = new Credentials(textBoxApiKey.Text);
    // Create the export folder for the recorded sound and the CSV file
    var outputFolder = CreateRecordingFolder();
    // File settings
    var dateTimeNow = DateTime.Now;
    var fileName = $"{dateTimeNow.Year}_{dateTimeNow.Month}_{dateTimeNow.Day}#{dateTimeNow.Hour}_{dateTimeNow.Minute}_{dateTimeNow.Second}_record";
    var soundFileName = $"{fileName}.mp3";
    var csvFileName = $"{fileName}.csv";
    var outputSoundFilePath = Path.Combine(outputFolder, soundFileName);
    var outputCSVFilePath = Path.Combine(outputFolder, csvFileName);
    // Init Deepgram
    var deepgramClient = new DeepgramClient(credentials);
    // Init the loopback interface
    _WasapiLoopbackCapture = new WasapiLoopbackCapture();
    // Create the memory stream and the Deepgram live client
    using (var memoryStream = new MemoryStream())
    using (var deepgramLive = deepgramClient.CreateLiveTranscriptionClient())
    {
        // The format we send to Deepgram is 24 kHz, 16-bit, 2 channels
        var waveFormat = new WaveFormat(24000, 16, 2);
        var deepgramWriter = new WaveFileWriter(memoryStream, waveFormat);
        // MP3 writer, used only if we want to save the audio
        LameMP3FileWriter? mp3Writer = checkBoxSaveMP3.Checked ?
            new LameMP3FileWriter(outputSoundFilePath, _WasapiLoopbackCapture.WaveFormat, LAMEPreset.STANDARD_FAST) : null;
        // File writer, used only if we want to save the transcript as CSV
        StreamWriter? csvWriter = checkBoxSaveAsCSV.Checked ? File.CreateText(outputCSVFilePath) : null;
        // Deepgram options
        var options = new LiveTranscriptionOptions()
        {
            Punctuate = true,
            Diarize = true,
            Encoding = Deepgram.Common.AudioEncoding.Linear16,
            ProfanityFilter = checkBoxProfinityAllowed.Checked,
            Language = _SelectedLanguage.LanguageCode,
            Model = _SelectedModel.ModelCode,
        };
        // Connect
        await deepgramLive.StartConnectionAsync(options);
        // When we receive data from Deepgram; this is mostly taken from their samples
        deepgramLive.TranscriptReceived += (s, e) =>
        {
            try
            {
                if (e.Transcript.IsFinal &&
                   e.Transcript.Channel.Alternatives.First().Transcript.Length > 0)
                {
                    var transcript = e.Transcript;
                    var text = $"{transcript.Channel.Alternatives.First().Transcript}";
                    _CaptionForm?.captionLabel.BeginInvoke((Action)(() =>
                    {
                        csvWriter?.WriteLine($@"{DateTime.Now.ToString("yyyy-MM-dd HH:mm:ss \"GMT\"zzz")},""{text}""");
                        _CaptionForm.captionLabel.Text = text;
                        _CaptionForm?.captionLabel.Refresh();
                    }));
                }
            }
            catch (Exception ex)
            {
                // Errors while handling a transcript are ignored
            }
        };
        deepgramLive.ConnectionError += (s, e) =>
        {
            // Connection errors are ignored
        };
        // When Windows tells us that there is sound data ready to be processed;
        // better than polling
        _WasapiLoopbackCapture.DataAvailable += (s, a) =>
        {
            mp3Writer?.Write(a.Buffer, 0, a.BytesRecorded);
            var buffer = ToPCM16(a.Buffer, a.BytesRecorded, _WasapiLoopbackCapture.WaveFormat);
            deepgramWriter.Write(buffer, 0, buffer.Length);
            deepgramLive.SendData(memoryStream.ToArray());
            memoryStream.Position = 0;
        };
        // When recording has stopped, release and flush all file pointers
        _WasapiLoopbackCapture.RecordingStopped += (s, a) =>
        {
            if (mp3Writer != null)
            {
                mp3Writer.Dispose();
                mp3Writer = null;
            }
            if (csvWriter != null)
            {
                csvWriter.Dispose();
                csvWriter = null;
            }
            _WasapiLoopbackCapture.Dispose();
        };
        _WasapiLoopbackCapture.StartRecording();
        while (_WasapiLoopbackCapture.CaptureState != NAudio.CoreAudioApi.CaptureState.Stopped)
        {
            if (_CancellationTokenSource?.IsCancellationRequested == true)
            {
                _CancellationTokenSource?.Dispose();
                _CancellationTokenSource = null;
                return;
            }
            Thread.Sleep(500);
        }
    }
}

The rest of the code gets things ready to execute: showing and hiding forms, and so on.
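
The listing calls a ToPCM16 helper that is not shown here. As a rough idea of what such a conversion involves, here is a minimal sketch that assumes the captured buffer holds 32-bit IEEE float samples (a common loopback format, but an assumption on my side) and turns them into 16-bit little-endian PCM. The class and method names are hypothetical, and the sketch does not resample or change the channel count, so the helper in the repository may well do more.

```csharp
using System;

public static class PcmConverter
{
    // Hypothetical stand-in for the ToPCM16 helper used in the listing.
    // Assumption: input contains 32-bit IEEE float samples in the range [-1, 1].
    public static byte[] FloatToPcm16(byte[] input, int bytesRecorded)
    {
        int sampleCount = bytesRecorded / 4;       // 4 bytes per float sample
        var output = new byte[sampleCount * 2];    // 2 bytes per 16-bit sample

        for (int i = 0; i < sampleCount; i++)
        {
            float sample = BitConverter.ToSingle(input, i * 4);
            // Clamp so that out-of-range samples do not wrap around
            sample = Math.Clamp(sample, -1.0f, 1.0f);
            short pcm = (short)(sample * short.MaxValue);
            // Write the 16-bit value in little-endian byte order
            output[i * 2] = (byte)(pcm & 0xFF);
            output[i * 2 + 1] = (byte)((pcm >> 8) & 0xFF);
        }

        return output;
    }
}
```

It would be called from the DataAvailable handler as PcmConverter.FloatToPcm16(a.Buffer, a.BytesRecorded), using only the bytes that were actually recorded rather than the whole buffer.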

So, after all this, how do you get subtitles on a TV? You need to feed the TV signal into the PC somehow; I use a USB capture card for this. To process the capture card input I use OBS. Once I have the audio signal, the source makes no difference to me, since I process all output sound. Then I use the computer's HDMI output to send the signal to the TV. It makes no difference to the TV or the cable box.

PS: If you experience lag, check your network connection. There also seems to be a problem with the memory stream, which is not happy with my hacky solution. Any PR is welcome.
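
One thing that may be worth checking for the memory stream: MemoryStream.ToArray returns everything up to the stream's Length regardless of Position, so resetting only Position can leave stale bytes from an earlier, larger write in the next chunk. The sketch below shows one possible way to hand out only the newly written bytes, using a plain MemoryStream without NAudio. The ChunkBuffer name is hypothetical, and I have not verified that this resolves the problem in the actual application, where a WaveFileWriter wraps the stream.

```csharp
using System;
using System.IO;

public static class ChunkBuffer
{
    // Returns the bytes written since the last call and empties the stream,
    // so the next call only sees data written after this point.
    public static byte[] TakeAndReset(MemoryStream stream)
    {
        byte[] chunk = stream.ToArray(); // copies bytes 0..Length
        stream.SetLength(0);             // drop the buffered bytes
        stream.Position = 0;             // start the next write at the beginning
        return chunk;
    }
}
```

In the DataAvailable handler this would replace the ToArray call and the Position reset, for example deepgramLive.SendData(ChunkBuffer.TakeAndReset(memoryStream)).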

Original Link: https://dev.to/bleakview/add-speech-recognition-to-your-pc-even-to-your-tv-4j0n
