Cross Platform C#
Sez Who? Support for iOS 10 Speech Recognition in Xamarin
With Apple's update to its phone OS, we look at the support that Xamarin has for it by building a simple speech app.
On September 7, Apple introduced its newest phone operating system, iOS 10, along with gold master versions of the Xcode development environment and the iOS SDK. Xamarin followed up the next day with binding support for the iOS 10 APIs, and developers were off to the races. Apple supplied the final releases of iOS 10 and the developer tools the following week, and Xamarin had updates out in less than 24 hours.
Technology moves that quickly these days.
In this and future articles (depending on how you respond to this one), I'll highlight some of the new features in iOS 10 and Xamarin's support for them. We won't attempt to cover everything here; instead, I'd like to focus on what I think will be most useful right now, so this time we'll cover the speech recognition capabilities to get you started. Speech recognition is becoming more useful as an input option for apps whose users need to be less distracted (when driving, for example).
First, let's take a quick look at some of the new features, changes, and deprecations:
- True Tone Display. iOS 10 uses the ambient light sensors in an iOS device to adjust the color and intensity of the display based on the current lighting conditions.
- App Extensions. Apple has provided new app extension points.
- App Search Enhancements. Core Spotlight provides a number of new search enhancements.
- CallKit. This provides a mechanism for VoIP apps to integrate with the iPhone UI and provide a familiar user experience.
- Message App Integration. New functionality allows for the sending of stickers, media files, and interactive messages.
- Publisher Enhancements. Anyone can now sign up and deliver content to the Apple News app.
- Security and Privacy Enhancements. These enhancements improve the security of code and help ensure user privacy.
- SiriKit. This allows an app to integrate with Siri.
- Speech Recognition. This allows an app to support continuous speech recognition and transcribe speech.
- Video Subscriber Account. This provides a single sign-in experience for apps that present authenticated on-demand streaming.
- Wide Color. This extends support for extended-range pixel formats.
There are numerous other new features, updates to existing features, and deprecated APIs. Check the references at the end of this article for links.
Speech Recognition
While there are clearly a number of exciting new features in iOS 10, what I find to be most exciting is the integration with voice, which I'll cover here. There are several items to be aware of with speech recognition in iOS 10:
- A connection to the Internet and Apple's servers must be available. Speech recognition works by converting the audio to a digital format and sending it to Apple's servers; the textual result is returned to the application.
- The speech recognition is based on the same technology that powers Siri and Apple's keyboard dictation.
- Multiple interpretations of what the user said are returned (see the sketch after this list).
- Confidence levels for the transcriptions are returned.
- It requires iOS 10, but not a specific device.
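To make those last two points concrete, here's a minimal sketch of walking a recognition result's alternative interpretations and per-segment confidence values. The DumpTranscriptions method name is my own; the result parameter is the SFSpeechRecognitionResult delivered to a recognition task's callback, as you'll see in Listing 1 later in this article. Note that confidence values are typically only populated once a result is final:

void DumpTranscriptions(SFSpeechRecognitionResult result)
{
    // Each SFTranscription is one possible interpretation of the audio
    foreach (SFTranscription transcription in result.Transcriptions)
    {
        Console.WriteLine("Interpretation: {0}", transcription.FormattedString);

        // Confidence runs from 0.0 (low) to 1.0 (high), per segment
        foreach (SFTranscriptionSegment segment in transcription.Segments)
        {
            Console.WriteLine("  \"{0}\" ({1:F2})", segment.Substring, segment.Confidence);
        }
    }
}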
There are several permissions that must be granted. These include the NSSpeechRecognitionUsageDescription key in the Info.plist file (and, because our app captures live audio from the microphone, the NSMicrophoneUsageDescription key as well). Along with the Info.plist keys, the program should call the SFSpeechRecognizer.RequestAuthorization method, which lets the user allow or disallow speech recognition.
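For example, the relevant Info.plist entries might look like the following; the description strings here are placeholders you'd replace with your own wording:

<key>NSSpeechRecognitionUsageDescription</key>
<string>This app converts your speech to text.</string>
<key>NSMicrophoneUsageDescription</key>
<string>This app uses the microphone to capture your speech.</string>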
The general steps to use speech recognition in an iOS 10 app are:
- Provide a usage description in the Info.plist file, in the NSSpeechRecognitionUsageDescription key.
- Request authorization with the SFSpeechRecognizer.RequestAuthorization method.
- For an existing audio file, use the SFSpeechUrlRecognitionRequest class (a sketch follows this list).
- For live audio, use the SFSpeechAudioBufferRecognitionRequest class.
- Pass the recognition request to a speech recognizer to start the recognition process.
- The results are returned asynchronously, so when communicating back with the user interface, you'll need to talk to the UI on the correct thread.
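The sample app in this article uses live audio, so here's a minimal sketch of the file-based path for completeness. It assumes authorization has already been granted, and the file name recording.m4a is a placeholder for an audio file in the app bundle (note that Xamarin binds the class as SFSpeechUrlRecognitionRequest):

public void RecognizeFile()
{
    var recognizer = new SFSpeechRecognizer();
    if (!recognizer.Available)
        return; // No network connection, or recognition is unavailable

    // "recording.m4a" is a placeholder for a bundled audio file
    var url = NSUrl.FromFilename("recording.m4a");
    var request = new SFSpeechUrlRecognitionRequest(url);

    recognizer.GetRecognitionTask(request, (SFSpeechRecognitionResult result, NSError err) =>
    {
        if (err != null)
        {
            Console.WriteLine("Recognition error: {0}", err.LocalizedDescription);
            return;
        }

        if (result.Final)
            Console.WriteLine(result.BestTranscription.FormattedString);
    });
}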
Let's build a simple application. It presents the user with a UILabel that displays the transcribed text as it arrives, a button for starting speech recognition, and a button for stopping it (see Figure 1).
The code in Listing 1 performs the speech recognition. It is a modified version of a Xamarin example.
Listing 1: Recognizing Speech in an iOS App
using System;
using UIKit;
using Speech;
using Foundation;
using AVFoundation;

namespace SpeechRecognition
{
    public partial class ViewController : UIViewController
    {
        private AVAudioEngine AudioEngine = new AVAudioEngine();
        private SFSpeechRecognizer SpeechRecognizer = new SFSpeechRecognizer();
        private SFSpeechAudioBufferRecognitionRequest LiveSpeechRequest =
            new SFSpeechAudioBufferRecognitionRequest();
        private SFSpeechRecognitionTask RecognitionTask;

        protected ViewController(IntPtr handle) : base(handle)
        {
            // Note: this .ctor should not contain any initialization logic.
        }

        public override void ViewDidLoad()
        {
            base.ViewDidLoad();

            speechText.LineBreakMode = UILineBreakMode.WordWrap;
            speechText.Lines = 0;
            startRec.Enabled = false;

            // Request user authorization
            SFSpeechRecognizer.RequestAuthorization((SFSpeechRecognizerAuthorizationStatus status) =>
            {
                // Take action based on status
                switch (status)
                {
                    case SFSpeechRecognizerAuthorizationStatus.Authorized:
                        // User has approved speech recognition
                        BeginInvokeOnMainThread(() =>
                        {
                            startRec.Enabled = true;
                        });
                        break;
                    case SFSpeechRecognizerAuthorizationStatus.Denied:
                        // User has declined speech recognition
                        break;
                    case SFSpeechRecognizerAuthorizationStatus.NotDetermined:
                        // Waiting on approval
                        break;
                    case SFSpeechRecognizerAuthorizationStatus.Restricted:
                        // The device is not permitted
                        break;
                }
            });

            StopRec.TouchUpInside += StopRec_TouchUpInside;
            startRec.TouchUpInside += StartRec_TouchUpInside;
        }

        public override void ViewDidDisappear(bool animated)
        {
            base.ViewDidDisappear(animated);
            StopRecording();
        }

        public override void DidReceiveMemoryWarning()
        {
            base.DidReceiveMemoryWarning();
            // Release any cached data, images, etc. that aren't in use.
        }

        void StopRec_TouchUpInside(object sender, EventArgs e)
        {
            StopRecording();
        }

        void StartRec_TouchUpInside(object sender, EventArgs e)
        {
            StartRecording();
        }

        public void StartRecording()
        {
            // Set up the audio session
            var node = AudioEngine.InputNode;
            var recordingFormat = node.GetBusOutputFormat(0);
            node.InstallTapOnBus(0, 1024, recordingFormat, (AVAudioPcmBuffer buffer, AVAudioTime when) =>
            {
                // Append the buffer to the recognition request
                LiveSpeechRequest.Append(buffer);
            });

            // Start recording
            AudioEngine.Prepare();
            NSError error;
            AudioEngine.StartAndReturnError(out error);

            // Did recording start?
            if (error != null)
            {
                // Handle error and return
                return;
            }

            // Start recognition
            RecognitionTask = SpeechRecognizer.GetRecognitionTask(LiveSpeechRequest,
                (SFSpeechRecognitionResult result, NSError err) =>
            {
                // Was there an error?
                if (err != null)
                {
                    // Handle error
                }
                else
                {
                    var currentText = result.BestTranscription.FormattedString;
                    if (result.Final)
                    {
                        Console.WriteLine("You said \"{0}\".", currentText);
                    }
                    else
                    {
                        BeginInvokeOnMainThread(() =>
                        {
                            speechText.Text += (!String.IsNullOrEmpty(speechText.Text) ?
                                System.Environment.NewLine : String.Empty) + currentText;
                        });
                    }
                }
            });
        }

        public void StopRecording()
        {
            AudioEngine.Stop();
            LiveSpeechRequest.EndAudio();
        }

        public void CancelRecording()
        {
            AudioEngine.Stop();
            RecognitionTask.Cancel();
        }
    }
}
Let's go over what this code is doing:
- Add the appropriate keys to the Info.plist file.
- Request authorization. This is accomplished by calling the static RequestAuthorization method of the SFSpeechRecognizer class. There are several possible return values, provided by the SFSpeechRecognizerAuthorizationStatus enum; in this example, we're only concerned with the Authorized value.
- In the StartRecording method, an attempt is made to start recording, and any errors are handled at that time. A buffer is set up, since the person will likely be talking faster than the application can perform the speech recognition.
- A recognition task is started, and a reference to it is kept. Remember, the recognition is handled outside the UI thread, so writing back to the UI (as this example does) requires the application to marshal the call onto the UI thread. This is the reason for the BeginInvokeOnMainThread calls.
- If the user stops recognition, the audio engine and the recognition request should be notified, as in StopRecording.
- If a cancellation occurs, the app should call the Cancel method of the recognition task to free up memory and processing resources. A slightly fuller teardown is sketched below.
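As a refinement to Listing 1, a fuller teardown would also remove the tap installed on the input node so the audio engine can be reused for a later session. The sketch below is my own variation; TearDownRecording is a hypothetical name, and the fields are those from Listing 1:

public void TearDownRecording(bool canceled)
{
    AudioEngine.Stop();

    // Remove the tap installed in StartRecording so a new tap
    // can be installed for the next session
    AudioEngine.InputNode.RemoveTapOnBus(0);

    if (canceled)
    {
        // Discard pending results and free resources
        RecognitionTask?.Cancel();
    }
    else
    {
        // Signal the end of the audio so final results are delivered
        LiveSpeechRequest.EndAudio();
    }
}

Note, too, that a buffer request that has ended can't be reused; in practice, you'd create a new SFSpeechAudioBufferRecognitionRequest for each recording session.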
Be aware that speech recognition in iOS is not available on an unlimited basis. Some things to keep in mind:
- There are a limited number of recognitions that can be performed per day, per application.
- Apps are also limited on a global, per-day basis.
- Speech recognition requires a round trip to Apple's servers, so network connection speed will be a factor in its usability.
- Speech recognition costs battery power and network traffic, so Apple imposes an audio duration limit of one minute (a sketch for working within this limit follows).
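Given that last limit, it's worth stopping a live session yourself before the cutoff. Here's a small sketch, assuming the StartRecording and StopRecording methods from Listing 1; the recordingLimitTimer field and the 59-second figure are my own choices, not anything mandated by the API:

private NSTimer recordingLimitTimer;

public void StartTimedRecording()
{
    StartRecording();

    // Stop just before Apple's one-minute audio duration limit
    recordingLimitTimer = NSTimer.CreateScheduledTimer(59.0, (timer) =>
    {
        StopRecording();
    });
}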
Along with the technical issues, there are privacy issues. When an application is recording user speech, be sure to indicate that recording is occurring. And speech recognition should not be used for sensitive data: don't use it for passwords, health data, financial information, or the nuclear access codes (especially the nuclear access codes; nothing ruins your day more than a 5,000-degree sunburn). Finally, an application should give the user an opportunity to make changes to the transcribed text before acting on it.
We've just scratched the surface of the speech recognition features that are new in iOS 10. If you want more, write to the editor of this magazine and ask for it, and we'll go into depth on other features. Enjoy working with speech recognition, and good luck with your programming!
About the Author
Wallace (Wally) B. McClure has authored books on iPhone programming with Mono/MonoTouch, Android programming with Mono for Android, application architecture, ADO.NET, SQL Server and AJAX. He's a Microsoft MVP, an ASPInsider, and a partner at Scalable Development Inc. He maintains a blog and can be followed on Twitter.