How to generate an English text description of an image in Python using Deep Learning
If you have ever trained a deep learning model for a task, you know how much time and fiddling it takes. So how can we get those excellent results without all the aggravation? Simple: we use an API that already has Deep Learning as part of its repertoire. Let me show you how to use it.
First we need to install our API client, like so:
pip install cloudmersive-image-api-client
And now we can call the function, recognize_describe:
from __future__ import print_function
import time
import cloudmersive_image_api_client
from cloudmersive_image_api_client.rest import ApiException
from pprint import pprint

# Configure API key authorization: Apikey
configuration = cloudmersive_image_api_client.Configuration()
configuration.api_key['Apikey'] = 'YOUR_API_KEY'
# Uncomment below to setup prefix (e.g. Bearer) for API key, if needed
# configuration.api_key_prefix['Apikey'] = 'Bearer'

# create an instance of the API class
api_instance = cloudmersive_image_api_client.RecognizeApi(cloudmersive_image_api_client.ApiClient(configuration))
image_file = '/path/to/file.txt' # file | Image file to perform the operation on. Common file formats such as PNG, JPEG are supported.

try:
    # Describe an image in natural language
    api_response = api_instance.recognize_describe(image_file)
    pprint(api_response)
except ApiException as e:
    print("Exception when calling RecognizeApi->recognize_describe: %s\n" % e)
That wraps up our setup. Now let’s test it out on this image.
And here we have our response from the API.
{
    "Successful": true,
    "Highconfidence": true,
    "BestOutcome": {
        "ConfidenceScore": 0.365462526679039,
        "Description": "A woman sitting in front of a laptop computer."
    },
    "RunnerUpOutcome": {
        "ConfidenceScore": 0.20903514232486486,
        "Description": "A woman sitting at a table using a laptop computer."
    }
}
We have two guesses at the possible image contents, along with confidence scores for each. Not bad.
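If you want to do something with those guesses rather than just print them, here is a minimal sketch of one way to pull the descriptions and scores out of the response. It works on the JSON text shown above, pasted in as a string, so it runs without an API key. The helper name summarize_description is hypothetical and not part of the Cloudmersive client; the field names are taken from the response above.

```python
import json

# The JSON response from above, pasted in as a string for illustration.
RESPONSE_JSON = """
{
    "Successful": true,
    "Highconfidence": true,
    "BestOutcome": {
        "ConfidenceScore": 0.365462526679039,
        "Description": "A woman sitting in front of a laptop computer."
    },
    "RunnerUpOutcome": {
        "ConfidenceScore": 0.20903514232486486,
        "Description": "A woman sitting at a table using a laptop computer."
    }
}
"""


def summarize_description(response):
    """Hypothetical helper: return (description, score) pairs, best first.

    Expects a dict with the same keys as the JSON response shown above.
    Returns an empty list if the call was not marked successful.
    """
    if not response.get("Successful"):
        return []
    outcomes = [response.get("BestOutcome"), response.get("RunnerUpOutcome")]
    # Skip any outcome that is missing from the response.
    pairs = [(o["Description"], o["ConfidenceScore"]) for o in outcomes if o]
    # Sort by confidence score, highest first.
    return sorted(pairs, key=lambda pair: pair[1], reverse=True)


guesses = summarize_description(json.loads(RESPONSE_JSON))
for description, score in guesses:
    # Show each score as a percentage next to its description.
    print("%.1f%%  %s" % (score * 100, description))
```

Checking the Successful flag first means a failed call gives you an empty list instead of a KeyError further down.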