How to use Azure Cognitive Services, Amazon Rekognition and Google Vision AI libraries in Typescript
Image recognition in the Cloud
Tuesday, February 05, 2019

During one of the Azure academies we held for Overnet Education, our training partner, we covered the subject of image recognition, which generated a lot of interest among the students. I believe a comparison between different cloud providers on this topic, using Node.js and Typescript, could be interesting.

Foreword

All major Cloud service providers offer tools for image analysis based on artificial intelligence. You only need to provide an image (or even a video, in the case of Amazon Rekognition) and the service can identify objects, people, text, scenes and activities, as well as potentially inappropriate content.

In this article, we will use the Amazon, Azure and Google Javascript SDKs in a Node project written in Typescript, to analyze locally stored images. Each Cloud differs from the others in how it is used, in the services offered and in the output returned. For this reason, I will only cover the analyses that can be performed with all three SDKs.

For Amazon, the main requirement is the creation of an AWS account and an IAM user. A complete guide showing the whole procedure is available at this link. If you don’t have an Amazon account yet, you can sign up through the AWS free program.

For Azure, the procedure to create a free Computer Vision account is available at this address. If you don’t have an account on the Azure cloud, you can sign up through the initial free program.

The documentation to start working with Google Vision on Google Cloud is available at this address.

Setup of the project Node.js

Access to the AI services requires calls to REST endpoints. We will use Node.js to do that, a natural platform for this kind of task. We create a Node.js project from scratch and add all the needed dependencies later.

The prerequisites to install are as follows:

  1. Node.js from https://nodejs.org (in the LTS version)
  2. Typescript with the command npm install -g typescript

To create the project, we open a command line in an empty folder and type the command npm init. We answer the wizard questions and, at the end of the process, we will find in the folder a package.json file, which contains all the needed settings. Usually, in the same folder, I also run git init to initialize git and git remote add to configure the remote repository of the code.

Now we should modify the package.json file. First of all, in the scripts section, we add an entry "build": "tsc" to indicate that we will use the Typescript compiler during the build stage.
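For example, the relevant part of package.json (the other fields generated by npm init are omitted here) looks like this:

{
    "scripts": {
        "build": "tsc"
    }
}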

We need then some dependencies for the development stage:

  • npm install @types/node --save-dev, which provides the Typescript type definitions for Node;
  • npm install ts-node --save-dev, which we will use to compile and run the code in Node;
  • npm install typescript --save-dev;
  • npm install tslint --save-dev, a linter for the Typescript language.

Since we are using Typescript, it’s necessary to add another configuration file, called tsconfig.json, in the root directory. We will use only the bare minimum: the compiler options and the path of the source files (with extension .ts).
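A minimal sketch could be the following (the specific compiler options are an assumption, a common starting point for a Node project):

{
    "compilerOptions": {
        "target": "es2017",
        "module": "commonjs",
        "sourceMap": true,
        "strict": true
    },
    "include": [
        "src/**/*.ts"
    ]
}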

Let’s now add the configuration needed to debug the code inside Visual Studio Code. It’s possible to manually create this configuration by adding a .vscode folder and a launch.json file inside it.
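A possible launch.json is sketched below (the entry point path src/index.ts is an assumption about the project layout):

{
    "version": "0.2.0",
    "configurations": [
        {
            "type": "node",
            "request": "launch",
            "name": "Debug with ts-node",
            "runtimeArgs": ["-r", "ts-node/register"],
            "args": ["${workspaceFolder}/src/index.ts"]
        }
    ]
}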

In this file, runtimeArgs tells Node that ts-node must be loaded before the code runs.

Now verify that the configuration works by writing some code in index.ts and running a debug session in VS Code.

Finish the configuration by installing:

  • npm install aws-sdk --save, thanks to which we will be able to use the Rekognition functions;
  • npm install @google-cloud/vision, for the Google SDK;
  • npm install --save request request-promise, to execute REST calls directly (in the case of Azure);
  • npm install --save-dev @types/request, to have the request types in Typescript;
  • npm install --save-dev @types/request-promise, to have the request-promise types in Typescript.

Configuration and use of Azure Computer Vision

From the Azure portal, it’s possible to extract the service endpoint and the two secret keys. To simplify their use, we can put them in a configuration file:

export let config = {
    azureVisionConfig: {
        azureEndPoint: "https://westeurope.api.cognitive.microsoft.com/vision/v1.0/",
        azureKey1: "yourkey1",
        azureKey2: "yourkey2",
    },
};

The official documentation of the API shows the form of the HTTP calls to run.
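For example, with the westeurope endpoint configured above, an analysis request looks roughly like this (the exact query string depends on the requested features):

POST https://westeurope.api.cognitive.microsoft.com/vision/v1.0/analyze?visualFeatures=Faces,ImageType&language=en
Ocp-Apim-Subscription-Key: yourkey1
Content-Type: application/octet-stream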

We then need an efficient way to build the query string with which we query the service. We create an interface, and a small helper class, to build the request parameters.
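The original listing is not reproduced here; as a minimal sketch, the IAzureRequestOptions interface and the toQueryString method below are assumptions, chosen to match the way the class is instantiated later in this article:

export interface IAzureRequestOptions {
    language?: string;
    details?: string[];
    visualFeatures?: string[];
}

export class AzureRequestParameters {

    constructor(private options: IAzureRequestOptions) { }

    // Build the query string appended to the "analyze" endpoint,
    // e.g. ?visualFeatures=Faces,ImageType&language=en
    public toQueryString(): string {
        const parts: string[] = [];
        if (this.options.visualFeatures && this.options.visualFeatures.length > 0) {
            parts.push("visualFeatures=" + this.options.visualFeatures.join(","));
        }
        if (this.options.details && this.options.details.length > 0) {
            parts.push("details=" + this.options.details.join(","));
        }
        if (this.options.language) {
            parts.push("language=" + this.options.language);
        }
        return parts.length > 0 ? "?" + parts.join("&") : "";
    }
}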

The documentation also shows the JSON returned by a request. We can convert this result into a set of interfaces to type the response:

export interface ICategory {
    name: string;
    score: number;
}
export interface IImageType {
    clipArtType: number;
    lineDrawingType: number;
}
export interface ITag {
    name: string;
    confidence: number;
}
export interface ICaption {
    text: string;
    confidence: number;
}
export interface IDescription {
    tags: string[];
    captions: ICaption[];
}
export interface IMetadata {
    width: number;
    height: number;
    format: string;
}
export interface IAzureCognitiveServiceResponse {
    categories: ICategory[];
    imageType: IImageType;
    tags: ITag[];
    description: IDescription;
    faces: any[];
    requestId: string;
    metadata: IMetadata;
}

Then create a single method in a class that takes as input the path of a local file and an object of type AzureRequestParameters, and returns an IAzureCognitiveServiceResponse. The request-promise library allows us to create an async method that returns a typed Promise.
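The original listing is not reproduced here; a minimal sketch, assuming the class is called AzureVision, that the parameters class exposes the toQueryString helper sketched above, that the first key is used for authentication and that the import paths match the project layout, could be:

import * as fs from "fs";
import * as request from "request-promise";
// The following import paths are assumptions about how the project is laid out
import { config } from "./configuration/config";
import { AzureRequestParameters } from "./azure-request-parameters";
import { IAzureCognitiveServiceResponse } from "./azure-response";

export class AzureVision {

    // POST the bytes of a local image to the "analyze" endpoint and
    // parse the JSON answer into the typed response interface
    public async AnalyzeImage(filePath: string, parameters: AzureRequestParameters):
        Promise<IAzureCognitiveServiceResponse> {
        const imageBuffer = fs.readFileSync(filePath);
        const body = await request.post({
            body: imageBuffer,
            headers: {
                "Content-Type": "application/octet-stream",
                "Ocp-Apim-Subscription-Key": config.azureVisionConfig.azureKey1,
            },
            url: config.azureVisionConfig.azureEndPoint + "analyze" + parameters.toQueryString(),
        });
        return JSON.parse(body) as IAzureCognitiveServiceResponse;
    }
}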

A simple example of call is:

const responseAzure1 = helperAzure.AnalyzeImage("../images/laurie.jpg", new AzureRequestParameters({
    language: "en",
    visualFeatures: ["Faces", "ImageType"],
})).then((data: IAzureCognitiveServiceResponse) => {
    data.faces.forEach( (face) => {
        console.log(face);
    });
});

The image passed as input is shown below:

This is the analysis result:

You only need to slightly modify the request in order to try to identify a celebrity in the image.
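A possible call is sketched below: the details parameter requests the Celebrities domain model, and the celebrity names are read from the detail field of each category, which is not declared in the ICategory interface above, hence the cast to any:

const responseAzure2 = helperAzure.AnalyzeImage("../images/laurie.jpg", new AzureRequestParameters({
    details: ["Celebrities"],
    language: "en",
    visualFeatures: ["Categories"],
})).then((data: IAzureCognitiveServiceResponse) => {
    data.categories.forEach((category) => {
        const detail = (category as any).detail;
        if (detail && detail.celebrities) {
            detail.celebrities.forEach((celebrity: any) => {
                console.log(celebrity.name + " " + celebrity.confidence);
            });
        }
    });
});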

The answer is that, with a confidence level over 99%, the picture shows the actor Hugh Laurie.

As a further example, let's consider the following image:

The following search parameters return a description of the elements in the picture.
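A sketch of such a request (assuming the dinner image is saved locally as cena.jpg) asks for the Description and Tags features and prints the captions and tags with their confidence:

const responseAzure3 = helperAzure.AnalyzeImage("../images/cena.jpg", new AzureRequestParameters({
    language: "en",
    visualFeatures: ["Description", "Tags"],
})).then((data: IAzureCognitiveServiceResponse) => {
    data.description.captions.forEach((caption) => {
        console.log(caption.text + ": " + caption.confidence);
    });
    data.tags.forEach((tag) => {
        console.log(tag.name + ": " + tag.confidence);
    });
});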

Configuration and use of Amazon Rekognition

From the Amazon portal it is possible to extract the Access Key and the Secret Access Key that will be used to authenticate the calls to the Rekognition library. The complete documentation is available at this address.

It’s possible to install these credentials on our computer, in a file globally available to all our projects, or to create a file containing the keys in the single project. In this article, we will use the latter option.

Create a Typescript class (called AWSRekognition) that contains the authentication code (pay attention to specify the IdentityPoolId, as required by the Rekognition documentation) and all the methods to analyze images.

import * as AWS from "aws-sdk";
import { PromiseResult } from "aws-sdk/lib/request";

export class AWSRekognition {

    private rekognition: AWS.Rekognition;

    constructor() {
        AWS.config.region = "eu-west-1"; // Region
        AWS.config.credentials = new AWS.CognitoIdentityCredentials({
            IdentityPoolId: "eu-west-YOURPOOLID",
        });
        // Load the Access Key and Secret Access Key from a local json file
        AWS.config.loadFromPath("./src/configuration/credentials.json");
        this.rekognition = new AWS.Rekognition();
    }
}

The first method we are going to add to this class analyzes a local image file and tries to extract, through Rekognition, information about the faces it may contain. First of all, we read the file and convert it into a base64 string. Then we convert this string into a Buffer, that is, a byte array, which is the format Rekognition accepts for input images. We then create a helper function that performs the conversion.

import * as fs from "fs";

export function readImage(path: string): Buffer {
    // Read the file, encode it as base64 and turn it back into a byte Buffer
    const fileData = fs.readFileSync(path).toString("base64");
    return Buffer.from(fileData, "base64");
}

The method that calls Rekognition is the following:

public DetectFacesOnLocalImage(filePath: string):
    Promise<PromiseResult<AWS.Rekognition.DetectFacesResponse, AWS.AWSError>> {
    const buffer = readImage(filePath);
    const params = {
        Attributes: [
            "ALL",
        ],
        Image: {
            Bytes: buffer,
        },
    };
    return this.rekognition.detectFaces(params).promise();
}

The method returns a typed Promise that we can await in index.ts. The information returned is extremely detailed; for example, the maximum estimated age is 52 years.

import {AWSRekognition} from "./aws-rekognition";
 
const helperAWS: AWSRekognition = new AWSRekognition();
const response = helperAWS.DetectFacesOnLocalImage("./images/laurie.jpg").then(
    (data) => {
        console.log(data.FaceDetails[0].AgeRange.High);
}).catch((err) => {
    console.error(err);
});

The same method can be simplified if the image has been uploaded to an S3 bucket. The screenshot, taken from the AWS web console, shows the content of a bucket called immaginisalvatore. Simply pass the bucket name and the image key to the detectFaces method.

public DetectFacesOnS3Images(bucketName: string, key: string):
       Promise<PromiseResult<AWS.Rekognition.DetectFacesResponse, AWS.AWSError>> {
    const params = {
        Attributes: [
            "ALL",
        ],
        Image: {
            S3Object: {
            Bucket: bucketName,
            Name: key,
            },
        },
    };
    return this.rekognition.detectFaces(params).promise();
}

We can also ask Rekognition whether it is able to associate the face with a celebrity:

public DetectCelebritiesOnS3Image(bucketName: string, key: string) {
    const params = {
        Image: {
            S3Object: {
                Bucket: bucketName,
                Name: key,
            },
        },
    };
    return this.rekognition.recognizeCelebrities(params).promise();
}

const response = helperAWS.DetectCelebritiesOnS3Image("immaginisalvatore", "laurie.jpg").then(
    (data) => {
        console.log(data.CelebrityFaces[0].Name + " " + data.CelebrityFaces[0].MatchConfidence);
}).catch((err) => {
    console.error(err);
});

The answer is clear: the image shows the actor Hugh Laurie with a confidence of 99.99%.

This is the method we can invoke to analyze the image of the table set for dinner:

public DetectLabelsOnS3Image(bucketName: string, key: string):
    Promise<PromiseResult<AWS.Rekognition.DetectLabelsResponse, AWS.AWSError>> {
    const params = {
        Image: {
            S3Object: {
                Bucket: bucketName,
                Name: key,
            },
        },
    };
    return this.rekognition.detectLabels(params).promise();
}

The identified elements can be printed with the following call:

const response = helperAWS.DetectLabelsOnS3Image("immaginisalvatore", "cena.jpg").then(
    (data) => {
        data.Labels.forEach((label) => {
            console.log(label.Name + ": " + label.Confidence);
        });
}).catch((err) => {
    console.error(err);
});

Configuration and use of Google Vision

The Google Vision API requires:

  • the creation of a Google Cloud project;
  • enabling the Vision API in the project;
  • the creation of a json file containing the security keys and the projectId.

Unfortunately, type definition files for the SDK we installed in the setup stage do not exist yet. The class that uses the API is shown below.
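Since no typings are available, the client is loaded through require and typed as any. This is a minimal sketch, assuming the json key file is saved under ./src/configuration/ and using the faceDetection and labelDetection helpers exposed by the ImageAnnotatorClient:

// tslint:disable-next-line:no-var-requires
const vision = require("@google-cloud/vision");

export class GoogleVision {

    private client: any;

    constructor() {
        // The json key file exported from the Google Cloud console
        // (the path and file name are assumptions for this sketch)
        this.client = new vision.ImageAnnotatorClient({
            keyFilename: "./src/configuration/google-credentials.json",
        });
    }

    // Detect faces on a local image file
    public DetectFacesOnLocalImage(filePath: string): Promise<any> {
        return this.client.faceDetection(filePath);
    }

    // Detect labels (objects, scenes) on a local image file
    public DetectLabelsOnLocalImage(filePath: string): Promise<any> {
        return this.client.labelDetection(filePath);
    }
}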

The code to analyze the image of Hugh Laurie is the following:
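A possible call, printing some attributes of each face found in results[0].faceAnnotations (the helperGoogle instance is assumed to be created from the class sketched above), is:

const helperGoogle = new GoogleVision();
helperGoogle.DetectFacesOnLocalImage("./images/laurie.jpg").then((results: any) => {
    results[0].faceAnnotations.forEach((face: any) => {
        console.log("Joy: " + face.joyLikelihood);
        console.log("Anger: " + face.angerLikelihood);
        console.log("Detection confidence: " + face.detectionConfidence);
    });
}).catch((err: any) => {
    console.error(err);
});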

The result is as follows (including the details of the face elements):

The call to analyze the table for dinner is:
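A sketch of this call, assuming the dinner picture is saved locally as cena.jpg, iterates over results[0].labelAnnotations:

helperGoogle.DetectLabelsOnLocalImage("./images/cena.jpg").then((results: any) => {
    results[0].labelAnnotations.forEach((label: any) => {
        console.log(label.description + ": " + label.score);
    });
}).catch((err: any) => {
    console.error(err);
});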

And this is the output detail:

Conclusion

We tested Typescript code on Node.js that uses the Javascript AI SDKs provided by the main Cloud providers. Their ease of use and the abundance of methods and information returned make them tools with enormous potential: for example, in an Angular (or React) client application, or in a serverless scenario where the execution of a cloud function is triggered by the upload of an image to a storage service, and this function uses AI to moderate the image itself. We would also like to remind you that the Cloud offering for analyzing text inside images is equally rich.

You can find the code here: https://github.com/sorrentmutie/image-recognition