Convert Audio Input to Text in Swift

Cloudmersive
2 min readAug 24, 2021

The capability of machine learning algorithms has advanced at a rapid pace over the past couple decades, providing us with an array of tools to improve our workflows. The API solution that we will be discussing in this tutorial is one such tool that can be leveraged to simplify a task, and in this case, that task is transcribing audio to text.

All we will need for this process is the audio file, which can be MP3 or WAV, and our API key to call the speech recognition function with the following code:

import Foundation
#if canImport(FoundationNetworking)
import FoundationNetworking
#endif
var semaphore = DispatchSemaphore (value: 0)let parameters = [
[
"key": "speechFile",
"src": "/path/to/file",
"type": "file"
]] as [[String : Any]]
let boundary = "Boundary-\(UUID().uuidString)"
var body = ""
var error: Error? = nil
for param in parameters {
if param["disabled"] == nil {
let paramName = param["key"]!
body += "--\(boundary)\r\n"
body += "Content-Disposition:form-data; name=\"\(paramName)\""
if param["contentType"] != nil {
body += "\r\nContent-Type: \(param["contentType"] as! String)"
}
let paramType = param["type"] as! String
if paramType == "text" {
let paramValue = param["value"] as! String
body += "\r\n\r\n\(paramValue)\r\n"
} else {
let paramSrc = param["src"] as! String
let fileData = try NSData(contentsOfFile:paramSrc, options:[]) as Data
let fileContent = String(data: fileData, encoding: .utf8)!
body += "; filename=\"\(paramSrc)\"\r\n"
+ "Content-Type: \"content-type header\"\r\n\r\n\(fileContent)\r\n"
}
}
}
body += "--\(boundary)--\r\n";
let postData = body.data(using: .utf8)
var request = URLRequest(url: URL(string: "https://api.cloudmersive.com/speech/recognize/file")!,timeoutInterval: Double.infinity)
request.addValue("multipart/form-data", forHTTPHeaderField: "Content-Type")
request.addValue("YOUR-API-KEY-HERE", forHTTPHeaderField: "Apikey")
request.addValue("multipart/form-data; boundary=\(boundary)", forHTTPHeaderField: "Content-Type")
request.httpMethod = "POST"
request.httpBody = postData
let task = URLSession.shared.dataTask(with: request) { data, response, error in
guard let data = data else {
print(String(describing: error))
semaphore.signal()
return
}
print(String(data: data, encoding: .utf8)!)
semaphore.signal()
}
task.resume()
semaphore.wait()

This will allow you to optimize the accessibility of your applications and websites by providing a text accompaniment to any audio files. If you need to retrieve an API key, you can do so by registering for a free account on the Cloudmersive website; this provides 800 monthly calls across our entire API library.

--

--

Cloudmersive

There’s an API for that. Cloudmersive is a leader in Highly Scalable Cloud APIs.