Introduction

Overview of some core capabilities of artificial intelligence (AI)

Common AI workloads

Generative AI and agents
Natural language processing (NLP) and text analytics
Speech
Computer vision
Information extraction

Generative AI

Generative AI is a branch of AI that enables software applications to generate new content, often natural language dialogs, but also images, video, code, and other formats. The ability to generate content is based on a language model that has been trained on huge volumes of data.

Language models come in two broad sizes: large language models (LLMs) and small language models (SLMs). The difference lies in the volume of training data and the number of parameters (learned variables) in the model.

LLMs are powerful and generalize well, but can be more costly to train and use.
SLMs tend to work well in scenarios that are more focused on specific topic areas or that require easily deployed small models for local applications and agents on devices.
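
To make the size difference concrete, the rough sketch below estimates the memory needed just to hold a model's weights. It assumes each parameter is stored in 2 bytes (16-bit precision), and the parameter counts are hypothetical values chosen only for illustration, not figures for any specific model.

```python
def weight_memory_gb(num_parameters: int, bytes_per_parameter: int = 2) -> float:
    """Approximate memory for the model weights alone, in gigabytes (10^9 bytes)."""
    return num_parameters * bytes_per_parameter / 1e9

# Hypothetical parameter counts, chosen only for illustration.
small_model = 3_000_000_000    # 3 billion parameters
large_model = 70_000_000_000   # 70 billion parameters

print(weight_memory_gb(small_model))  # 6.0
print(weight_memory_gb(large_model))  # 140.0
```

Under these assumptions the smaller model could fit on a single device, while the larger one would need far more memory, which is one reason SLMs suit local and on-device scenarios.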

Natural language processing (NLP)

NLP enables computers to understand, interpret, and generate human language (text or speech) in a meaningful way.

Ex: Analyzing documents and calls to identify important information; answering frequently asked questions.

Computer Vision

Computer vision is the area of artificial intelligence that deals with the analysis of visual input, such as photographs, videos, and live camera feeds. Computer vision models are trained on large numbers of images.

Speech recognition & Speech synthesis

Speech recognition is the ability of AI to “hear” and interpret speech. Usually this capability takes the form of speech-to-text (where the audio signal for the speech is transcribed into text).

Speech synthesis is the ability of AI to vocalize words as spoken language. Usually this capability takes the form of text-to-speech in which information in text format is converted into an audible signal.

Ex: Automated transcription of calls or meetings (speech recognition).
Ex: Automated audio descriptions of video or text (speech synthesis).

AI Applications

An AI application is a software solution that uses AI techniques—such as computer vision, speech, and information extraction—to perform tasks that typically require human-like intelligence.

Components of an AI application

Data Layer: Collection, storage, and management of data used for training, inference, and decision-making. Ex: Azure SQL, Azure Cosmos DB, or a data lake.
Model Layer: Selection, training, and deployment of machine learning or AI models. Ex: Azure OpenAI, custom-built models in Azure Machine Learning, etc.
Compute Layer: AI applications require compute resources to train and run models. Ex: Azure Functions, Azure Kubernetes Service, Azure App Service, etc.
Integration & Orchestration Layer: Connects models and data with business logic and user interfaces. Ex: SDKs and APIs for integrating AI capabilities into applications.
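
The layers above can be sketched in a few lines of code. This is a minimal illustration only: the class names (InMemoryDataStore, KeywordSentimentModel, FeedbackService) are hypothetical, the "model" is a trivial keyword rule rather than a trained model, and no Azure service is called. The compute layer is simply wherever this code runs.

```python
from dataclasses import dataclass

# Data layer: hypothetical in-memory stand-in for a database or data lake.
class InMemoryDataStore:
    def __init__(self):
        self._records = {"order-1": "The delivery was late and the box was damaged."}

    def get_text(self, record_id: str) -> str:
        return self._records[record_id]

# Model layer: hypothetical stand-in for a deployed model (a keyword rule, not a trained model).
class KeywordSentimentModel:
    NEGATIVE_WORDS = {"late", "damaged", "broken"}

    def predict(self, text: str) -> str:
        words = {w.strip(".,!?").lower() for w in text.split()}
        return "negative" if words & self.NEGATIVE_WORDS else "neutral"

# Integration and orchestration layer: connects data and model to business logic.
@dataclass
class FeedbackService:
    store: InMemoryDataStore
    model: KeywordSentimentModel

    def triage(self, record_id: str) -> str:
        text = self.store.get_text(record_id)
        label = self.model.predict(text)
        return "escalate" if label == "negative" else "archive"

service = FeedbackService(InMemoryDataStore(), KeywordSentimentModel())
print(service.triage("order-1"))  # escalate
```

Keeping the layers behind separate classes means the data store or the model could be swapped for a real service without changing the business logic in triage.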

AI applications are systems designed to perform tasks that typically require human intelligence.

Key AI workloads:

Generative AI
Agents and automation
Speech
Text analysis
Computer Vision
Information Extraction

All these workloads are built on the foundation of machine learning.

AI applications are:

Model-powered: They use trained models to process inputs and generate outputs, such as text, images, or decisions.
Dynamic: Unlike static programs, AI apps can improve over time through retraining or fine-tuning.

Machine Learning

Machine learning (ML) enables systems to learn from data and improve over time without being explicitly programmed. ML is the primary method used to achieve AI and is made possible by data-driven algorithms.

Types of ML:

Supervised and Unsupervised Learning: Supervised learning trains on labeled data, for example regression for predicting prices and classification for spam detection. Unsupervised learning finds structure in unlabeled data, for example clustering for customer segmentation.
Deep Learning: A specialized branch of ML using neural networks with multiple layers for tasks like image recognition and speech synthesis. Deep learning provides the foundation through neural networks that learn complex patterns from massive datasets.
Generative AI: uses deep learning capabilities to create new content—text, images, audio, code—rather than just classify or predict outcomes.
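
The difference between supervised and unsupervised learning can be shown with a small sketch using only the Python standard library. The regression part fits a line to labeled examples by ordinary least squares, and the clustering part is a one-dimensional k-means with two clusters. The function names and all data values are made up for illustration; real projects would normally use a library such as scikit-learn.

```python
from statistics import mean

# Supervised regression: learn y = slope * x + intercept from labeled pairs.
def fit_line(xs, ys):
    x_bar, y_bar = mean(xs), mean(ys)
    covariance = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    variance = sum((x - x_bar) ** 2 for x in xs)
    slope = covariance / variance
    return slope, y_bar - slope * x_bar

# Unsupervised clustering: 1-D k-means with two clusters, no labels needed.
def two_means(values, iterations=10):
    centers = [min(values), max(values)]
    for _ in range(iterations):
        groups = ([], [])
        for v in values:
            nearest = 0 if abs(v - centers[0]) <= abs(v - centers[1]) else 1
            groups[nearest].append(v)
        # Keep the old center if a group ends up empty.
        centers = [mean(g) if g else c for g, c in zip(groups, centers)]
    return centers

# Made-up labeled data: floor area -> price (here price is exactly 3 * area).
areas = [50.0, 60.0, 80.0, 100.0]
prices = [150.0, 180.0, 240.0, 300.0]
slope, intercept = fit_line(areas, prices)
print(slope * 70.0 + intercept)  # 210.0

# Made-up unlabeled data: monthly spend per customer.
spend = [10.0, 12.0, 11.0, 95.0, 100.0, 105.0]
print(two_means(spend))  # [11.0, 100.0]
```

The regression needs the known prices to learn from, while the clustering discovers the low-spend and high-spend segments without being told they exist.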

Information extraction

The basis for most document analysis solutions is a computer vision technology called optical character recognition (OCR), which can detect and read text in an image and identify its location. OCR is often combined with an analytical model that can interpret individual values in the document and extract specific fields.

Ex: Processing an expense claim; identifying key points and follow-up actions from meeting transcripts or recordings.
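
The second step, turning OCR text into specific fields, can be sketched as follows. This is an illustration under stated assumptions: the OCR output is a hypothetical hard-coded string, the receipt layout is invented, and simple regular expressions stand in for the analytical model that a real document analysis service would use.

```python
import re

# Hypothetical OCR output for a scanned receipt. A real OCR service would also
# return the location (bounding box) of each piece of text.
ocr_text = """Contoso Cafe
Date: 2024-03-15
Latte 4.50
Sandwich 7.25
Total: 11.75"""

def extract_fields(text: str) -> dict:
    """Pull specific fields out of raw OCR text with simple patterns."""
    date = re.search(r"Date:\s*(\d{4}-\d{2}-\d{2})", text)
    total = re.search(r"Total:\s*(\d+\.\d{2})", text)
    return {
        "vendor": text.splitlines()[0].strip(),  # assumes the vendor is on the first line
        "date": date.group(1) if date else None,
        "total": float(total.group(1)) if total else None,
    }

print(extract_fields(ocr_text))
# {'vendor': 'Contoso Cafe', 'date': '2024-03-15', 'total': 11.75}
```

Fixed patterns like these break as soon as the layout changes, which is why production solutions pair OCR with a trained model rather than hand-written rules.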