Convert Audio Input to Text in Swift

2 min read · Aug 24, 2021

The capability of machine learning algorithms has advanced rapidly over the past couple of decades, giving us an array of tools for improving our workflows. The API covered in this tutorial is one such tool: it simplifies the task of transcribing audio to text.

All we need for this process is the audio file, which can be MP3 or WAV, and an API key. With those in hand, we can call the speech recognition function with the following code:

import Foundation
#if canImport(FoundationNetworking)
import FoundationNetworking
#endif

// Blocks the script until the asynchronous request has finished.
let semaphore = DispatchSemaphore(value: 0)

let parameters = [
     [
          "key": "speechFile",
          "src": "/path/to/file",
          "type": "file"
     ]] as [[String: Any]]

let boundary = "Boundary-\(UUID().uuidString)"
// Build the body as Data: audio is binary and cannot be passed through a UTF-8 String.
var body = Data()
for param in parameters {
     if param["disabled"] == nil {
          let paramName = param["key"] as! String
          var header = "--\(boundary)\r\n"
          header += "Content-Disposition: form-data; name=\"\(paramName)\""
          let paramType = param["type"] as! String
          if paramType == "text" {
               if let contentType = param["contentType"] as? String {
                    header += "\r\nContent-Type: \(contentType)"
               }
               let paramValue = param["value"] as! String
               body.append(Data("\(header)\r\n\r\n\(paramValue)\r\n".utf8))
          } else {
               let paramSrc = param["src"] as! String
               let fileURL = URL(fileURLWithPath: paramSrc)
               let fileData = try Data(contentsOf: fileURL)
               // Generic binary type unless the parameter names a more specific one.
               let contentType = param["contentType"] as? String ?? "application/octet-stream"
               header += "; filename=\"\(fileURL.lastPathComponent)\"\r\n"
               header += "Content-Type: \(contentType)\r\n\r\n"
               body.append(Data(header.utf8))
               body.append(fileData)
               body.append(Data("\r\n".utf8))
          }
     }
}
body.append(Data("--\(boundary)--\r\n".utf8))

var request = URLRequest(url: URL(string: "https://api.cloudmersive.com/speech/recognize/file")!, timeoutInterval: Double.infinity)
request.addValue("YOUR-API-KEY-HERE", forHTTPHeaderField: "Apikey")
// Set Content-Type once, including the boundary the server needs to split the parts.
request.setValue("multipart/form-data; boundary=\(boundary)", forHTTPHeaderField: "Content-Type")
request.httpMethod = "POST"
request.httpBody = body

let task = URLSession.shared.dataTask(with: request) { data, response, error in
     guard let data = data else {
          print(String(describing: error))
          semaphore.signal()
          return
     }
     print(String(decoding: data, as: UTF8.self))
     semaphore.signal()
}

task.resume()
semaphore.wait()
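
The file part of the request carries its own Content-Type line. Since the input can be MP3 or WAV, one option is to pick that value from the file extension. The sketch below is only an illustration: `audioMimeType(forPath:)` is a hypothetical helper name, not part of the Cloudmersive API, and the fallback to `application/octet-stream` is an assumption rather than documented behavior.

```swift
import Foundation

// Hypothetical helper (not part of the Cloudmersive API): maps an audio file's
// extension to a MIME type suitable for the multipart file part.
func audioMimeType(forPath path: String) -> String {
    let ext = URL(fileURLWithPath: path).pathExtension.lowercased()
    switch ext {
    case "mp3":
        return "audio/mpeg"
    case "wav":
        return "audio/wav"
    default:
        // Assumption: fall back to a generic binary type for anything else.
        return "application/octet-stream"
    }
}

print(audioMimeType(forPath: "/path/to/recording.MP3")) // prints "audio/mpeg"
```

The returned string could be stored under the `"contentType"` key of the file parameter, or interpolated directly into the part's Content-Type line.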

This will allow you to optimize the accessibility of your applications and websites by providing a text accompaniment to any audio files. If you need to retrieve an API key, you can do so by registering for a free account on the Cloudmersive website; this provides 800 monthly calls across our entire API library.
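
The completion handler in the snippet prints whatever comes back, so a rejected key or a transport failure can be hard to tell apart from a transcription. A minimal sketch of a more explicit check follows. `describeResult` is a hypothetical helper, and it assumes only that success is signalled by a 2xx status; how the service reports specific failures is not covered here.

```swift
import Foundation
#if canImport(FoundationNetworking)
import FoundationNetworking
#endif

// Hypothetical helper: summarizes the three values passed to a dataTask
// completion handler, separating transport errors, non-2xx statuses, and
// successful response bodies.
func describeResult(data: Data?, response: URLResponse?, error: Error?) -> String {
    if let error = error {
        return "Request failed: \(error.localizedDescription)"
    }
    if let http = response as? HTTPURLResponse, !(200...299).contains(http.statusCode) {
        let detail = data.map { String(decoding: $0, as: UTF8.self) } ?? ""
        return "HTTP \(http.statusCode): \(detail)"
    }
    guard let data = data else {
        return "Empty response"
    }
    // Assumption: the body is text (for example JSON) and can be shown as UTF-8.
    return String(decoding: data, as: UTF8.self)
}
```

Inside the completion handler, this would be called as `print(describeResult(data: data, response: response, error: error))` before signalling the semaphore.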

Written by Cloudmersive