Add Speech Recognition to Your PC, Even to Your TV
Overview of My Submission
Most of our day-to-day computer use involves sound output, so I thought it would be nice to connect the default audio output to speech recognition. That way, every spoken word is recognized no matter which software you use: Teams, YouTube, TikTok, Twitter, Edge, VLC, you name it (unfortunately, Windows only). And how far can we push it? As far as subtitles for cable TV.
Submission Category:
Accessibility Advocates
Link to Code on GitHub
bleakview / deepgramwinsys
Deepgram sound-to-text converter for all sounds emitted in Windows
What can you find in this repository?
- How to get started with Deepgram in Windows Forms
- A sample custom label control with borders in Windows Forms
- How to get and capture the system-wide default audio output
- How to record captured audio as MP3
- How to save and retrieve system settings
Additional Resources / Info
To get system-wide voice recognition you basically need two components: a voice recognition service like Deepgram, and some way to eavesdrop on the sounds the PC generates. I chose C# as the language because I connect directly to the system. To capture in loopback (the technical term for recording the sound system's own output) on Windows, you use WASAPI, the Windows Audio Session API. I chose the NAudio library to access it.
And here are some of the results.
While the application also has settings and transparency properties, the most critical part is capturing the audio and recognizing it.

    private async void ConvertAndTranscript()
    {
        // Credentials for Deepgram
        var credentials = new Credentials(textBoxApiKey.Text);

        // Create the export folder for the recorded sound and the CSV file
        var outputFolder = CreateRecordingFolder();

        // File settings
        var dateTimeNow = DateTime.Now;
        var fileName = $"{dateTimeNow.Year}_{dateTimeNow.Month}_{dateTimeNow.Day}#{dateTimeNow.Hour}_{dateTimeNow.Minute}_{dateTimeNow.Second}_record";
        var soundFileName = $"{fileName}.mp3";
        var csvFileName = $"{fileName}.csv";
        var outputSoundFilePath = Path.Combine(outputFolder, soundFileName);
        var outputCSVFilePath = Path.Combine(outputFolder, csvFileName);

        // Init Deepgram
        var deepgramClient = new DeepgramClient(credentials);

        // Init the loopback interface
        _WasapiLoopbackCapture = new WasapiLoopbackCapture();

        // Create the memory stream and the Deepgram live client
        using (var memoryStream = new MemoryStream())
        using (var deepgramLive = deepgramClient.CreateLiveTranscriptionClient())
        {
            // The format we send to Deepgram is 24 kHz, 16 bit, 2 channels
            var waveFormat = new WaveFormat(24000, 16, 2);
            var deepgramWriter = new WaveFileWriter(memoryStream, waveFormat);

            // MP3 writer, if we want to save the audio
            LameMP3FileWriter? mp3Writer = checkBoxSaveMP3.Checked
                ? new LameMP3FileWriter(outputSoundFilePath, _WasapiLoopbackCapture.WaveFormat, LAMEPreset.STANDARD_FAST)
                : null;

            // File writer, if we want to save the transcript as CSV
            StreamWriter? csvWriter = checkBoxSaveAsCSV.Checked
                ? File.CreateText(outputCSVFilePath)
                : null;

            // Deepgram options
            var options = new LiveTranscriptionOptions()
            {
                Punctuate = true,
                Diarize = true,
                Encoding = Deepgram.Common.AudioEncoding.Linear16,
                ProfanityFilter = checkBoxProfinityAllowed.Checked,
                Language = _SelectedLanguage.LanguageCode,
                Model = _SelectedModel.ModelCode,
            };

            // Connect
            await deepgramLive.StartConnectionAsync(options);

            // Handle transcripts coming back from Deepgram; mostly taken from their samples
            deepgramLive.TranscriptReceived += (s, e) =>
            {
                try
                {
                    if (e.Transcript.IsFinal && e.Transcript.Channel.Alternatives.First().Transcript.Length > 0)
                    {
                        var transcript = e.Transcript;
                        var text = $"{transcript.Channel.Alternatives.First().Transcript}";
                        // Update the caption (and the CSV file) on the UI thread
                        _CaptionForm?.captionLabel.BeginInvoke((Action)(() =>
                        {
                            var timestamp = DateTime.Now.ToString("yyyy-MM-dd HH:mm:ss \"GMT\"zzz");
                            csvWriter?.WriteLine($"{timestamp},\"{text}\"");
                            _CaptionForm.captionLabel.Text = text;
                            _CaptionForm?.captionLabel.Refresh();
                        }));
                    }
                }
                catch (Exception ex)
                {
                    // Errors while handling a transcript are ignored
                }
            };
            deepgramLive.ConnectionError += (s, e) => { };

            // Windows tells us when sound data is ready to be processed,
            // which is better than polling
            _WasapiLoopbackCapture.DataAvailable += (s, a) =>
            {
                mp3Writer?.Write(a.Buffer, 0, a.BytesRecorded);
                var buffer = ToPCM16(a.Buffer, a.BytesRecorded, _WasapiLoopbackCapture.WaveFormat);
                deepgramWriter.Write(buffer, 0, buffer.Length);
                deepgramLive.SendData(memoryStream.ToArray());
                memoryStream.Position = 0;
            };

            // When recording stops, flush and release all file handles
            _WasapiLoopbackCapture.RecordingStopped += (s, a) =>
            {
                if (mp3Writer != null)
                {
                    mp3Writer.Dispose();
                    mp3Writer = null;
                }
                if (csvWriter != null)
                {
                    csvWriter.Dispose();
                    csvWriter = null;
                }
                _WasapiLoopbackCapture.Dispose();
            };

            _WasapiLoopbackCapture.StartRecording();

            // Keep the method alive until capture stops or cancellation is requested
            while (_WasapiLoopbackCapture.CaptureState != NAudio.CoreAudioApi.CaptureState.Stopped)
            {
                if (_CancellationTokenSource?.IsCancellationRequested == true)
                {
                    _CancellationTokenSource?.Dispose();
                    _CancellationTokenSource = null;
                    return;
                }
                Thread.Sleep(500);
            }
        }
    }

The rest of the code handles setup: getting things ready to execute, showing and hiding forms, and so on.
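The listing calls a `ToPCM16` helper that is not shown in this post. WASAPI loopback capture typically delivers 32-bit IEEE float samples, so a helper like that most likely converts them to 16-bit PCM before they go to Deepgram. Below is a minimal sketch of just that conversion step. The class name `PcmSketch` and the method `FloatToPcm16` are hypothetical and are not the repository's code. The sketch assumes little-endian float input and does no resampling or channel handling, which the real helper may well do, since it also receives the capture `WaveFormat`.

```csharp
using System;

public static class PcmSketch
{
    // Hypothetical stand-in for the post's ToPCM16 helper (assumption, not the
    // repository's implementation). Converts 32-bit IEEE float samples to
    // little-endian 16-bit PCM. No resampling is done.
    public static byte[] FloatToPcm16(byte[] input, int bytesRecorded)
    {
        int sampleCount = bytesRecorded / 4;      // 4 bytes per float sample
        var output = new byte[sampleCount * 2];   // 2 bytes per 16-bit sample

        for (int i = 0; i < sampleCount; i++)
        {
            // BitConverter uses the machine byte order (little-endian on Windows)
            float sample = BitConverter.ToSingle(input, i * 4);

            // Clamp to [-1, 1] so out-of-range samples do not wrap around
            sample = Math.Max(-1.0f, Math.Min(1.0f, sample));

            short pcm = (short)(sample * short.MaxValue);
            output[i * 2] = (byte)(pcm & 0xFF);             // low byte
            output[i * 2 + 1] = (byte)((pcm >> 8) & 0xFF);  // high byte
        }

        return output;
    }
}
```

Inside a `DataAvailable` handler this would be called as `PcmSketch.FloatToPcm16(a.Buffer, a.BytesRecorded)`. Only the first `bytesRecorded` bytes are read, because the capture buffer can be larger than the data it currently holds.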
So, after all this, how can you have subtitles on your TV? You need to get the TV signal into the PC somehow. I use a USB capture card for this, and OBS to process the capture card input. Once I have the audio signal, its source makes no difference, since I process all output sound. Then I use the computer's HDMI output to send the signal to the TV. It makes no difference to the TV or the cable box.
P.S. If you have issues with lag, check your network connection. There also seems to be a problem with the memory stream, which is not happy with my hacky solution. Any PR is welcome.
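On the memory stream remark: one possible contributor (a guess, not something confirmed here) is that setting `Position = 0` rewinds a `MemoryStream` without shortening it. A later `ToArray()` then still returns any older bytes that lie beyond the newly written ones. The sketch below shows the difference between rewinding and truncating with `SetLength(0)`, using only `System.IO`. The class and method names are made up for illustration.

```csharp
using System;
using System.IO;

public static class MemoryStreamSketch
{
    // Writes two chunks, rewinding with Position = 0 in between.
    // Bytes of the first chunk past the end of the second one survive.
    public static byte[] RewindOnly(byte[] first, byte[] second)
    {
        using (var stream = new MemoryStream())
        {
            stream.Write(first, 0, first.Length);
            stream.Position = 0;                    // rewinds, length unchanged
            stream.Write(second, 0, second.Length);
            return stream.ToArray();                // whole stream, not just the new chunk
        }
    }

    // Same as above, but truncates the stream before the second write.
    public static byte[] RewindAndTruncate(byte[] first, byte[] second)
    {
        using (var stream = new MemoryStream())
        {
            stream.Write(first, 0, first.Length);
            stream.SetLength(0);                    // drops old bytes, position becomes 0
            stream.Write(second, 0, second.Length);
            return stream.ToArray();                // only the new chunk
        }
    }
}
```

With `first = {1, 2, 3, 4}` and `second = {9, 9}`, `RewindOnly` returns four bytes (`9, 9, 3, 4`), while `RewindAndTruncate` returns just `9, 9`. If this does turn out to be the cause, sending the converted buffer directly, or truncating the stream after each send, would be options to try.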
Original Link: https://dev.to/bleakview/add-speech-recognition-to-your-pc-even-to-your-tv-4j0n