AI glossary

Stay sharp on the AI and tech curve with clear definitions of terms used on the site.

A

Active learning

A subset of machine learning (ML) in which an AI model interactively queries a human annotator to label the data points most useful for it to learn from; because the model chooses the data that best serves its learning goals, it can reach better accuracy with less training data
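
One common way for a model to choose which data points to send to an annotator is uncertainty sampling. The sketch below is illustrative only: the function name `least_confident` and the sample probabilities are assumptions, not part of any specific product.

```python
def least_confident(probabilities, batch_size):
    """Return indices of the unlabeled items the model is least sure about.

    `probabilities` holds one list of class probabilities per unlabeled item
    (hypothetical classifier output).
    """
    # Confidence is the probability of the top predicted class.
    confidence = [max(p) for p in probabilities]
    # Rank item indices from least to most confident and keep the first few.
    ranked = sorted(range(len(probabilities)), key=lambda i: confidence[i])
    return ranked[:batch_size]


pool_probs = [
    [0.98, 0.02],  # item 0: model is confident
    [0.55, 0.45],  # item 1: model is unsure
    [0.70, 0.30],  # item 2
    [0.51, 0.49],  # item 3: model is very unsure
]
to_label = least_confident(pool_probs, batch_size=2)
print(to_label)  # [3, 1]
```

Items 3 and 1 would go to the human annotator first, since labeling them is likely to teach the model the most.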

Agentic AI

An AI system composed of multiple AI agents operating within an agentic workflow, where each agent completes a subtask autonomously to collectively mimic human decision-making in real-time and achieve the overall goal

AI workflow

A work process that incorporates AI-driven technologies to streamline and orchestrate the steps needed to build, deploy, and maintain a functioning AI system for business operations

Algorithm

A set of instructions enabling artificial intelligence systems such as machine learning models, rule-based engines, or computer vision models to complete a task by processing data, recognizing patterns, and making decisions

Artificial intelligence (AI)

A branch of computer science that builds systems simulating human intelligence, using hardware, software, data, and rule-based systems to accomplish learning, problem-solving, or decision-making tasks

Augmented reality

A technology that overlays computer-generated elements onto real-world environments to enhance what users see and experience; augmented reality can be improved by integrating AI subfields such as computer vision, machine learning, generative AI, and natural language processing

Autonomous system

A system that can make decisions in real time without human intervention, often using a models-in-the-loop (MIL) setup where AI models are actively involved in the process—for example, in self-driving vehicles

Autophagous loop

A self-referential cycle in which generative AI models are trained on data produced by previous generations of the model

B

BigCode

An open scientific collaboration that focuses on the responsible development and use of large language models (LLM) for coding purposes

Black box

A part of the AI system where users are unable to know how its internal processes work to deliver results, despite having visibility into the system's input and output

Bounding box

A rectangular area used to define an object within an image or a space, typically used in computer vision models for object detection
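
A minimal sketch of how a bounding box might be represented in code. The corner-coordinate convention `(x_min, y_min, x_max, y_max)` and the `BoundingBox` class are assumptions for illustration; annotation tools differ. Intersection over union (IoU) is included because it is a common way to compare a predicted box with an annotated one.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BoundingBox:
    x_min: float
    y_min: float
    x_max: float
    y_max: float

    def area(self):
        # Clamp at zero so an inverted (empty) box has no area.
        return max(0.0, self.x_max - self.x_min) * max(0.0, self.y_max - self.y_min)

    def iou(self, other):
        # The overlap of two axis-aligned boxes is itself a box (possibly empty).
        overlap = BoundingBox(
            max(self.x_min, other.x_min),
            max(self.y_min, other.y_min),
            min(self.x_max, other.x_max),
            min(self.y_max, other.y_max),
        )
        intersection = overlap.area()
        union = self.area() + other.area() - intersection
        return intersection / union if union > 0 else 0.0


annotated = BoundingBox(10, 10, 50, 50)
predicted = BoundingBox(30, 30, 70, 70)
print(annotated.area())          # 1600
print(annotated.iou(predicted))  # 400 / 2800, about 0.143
```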

C

Chemin Logic

A proprietary holistic AI solution encompassing data, model, workflow design, and skill stacks for seamless AI adoption

Classification

A data annotation task that assigns each object a label from a predefined set of categories

Claude

A family of multimodal models excelling at natural language processing (NLP) to accomplish tasks such as summarizing documents, long-form text generation, or code writing

Clustering

An unsupervised learning technique that enables AI systems to categorize and effectively interpret information without prior training, in which algorithms use structures in datasets to identify data patterns and relationships

CodeLlama

A large language model (LLM) derived from Llama 2, it is fine-tuned for programming tasks through extensive training on code-specific datasets, allowing it to generate both code and its natural language descriptions

Codestral

A generative AI model specifically built for code generation tasks and trained on more than 80 programming languages, enabling developers to work across diverse coding environments and projects

Computer vision

A field of AI that enables computer systems to interpret and understand objects in digital images and videos in order to perform tasks requiring visual perception

Copilot

A virtual assistant that partners with human operators in AI workflows to facilitate decision-making, provide supporting information, or bridge gaps to improve operational efficiency

D

DALL-E

A generative AI model that creates images from text descriptions by leveraging deep learning techniques to translate textual prompts into visual representations

Data annotation

A tag for an item in a dataset, also called an AI annotation, typically created by human annotators using tools like bounding boxes, labels, or text descriptions

Data augmentation

A process of expanding a dataset by adding new synthetic data or modifying existing data points to diversify training data available to machine learning models, improving the model's ability to generalize to new scenarios

Data labeling

A process of assigning names to data following a defined schema, enabling AI models to identify and learn patterns that are key to delivering the desired output

Data pipeline

A set of processes that transforms raw data into a desired format, including stages of data collection, data processing by annotators, and validation by industry experts, before the data is ingested by the machine learning (ML) model

Data validation

A process to ensure the dataset meets the pre-defined criteria, is consistent, and stays true to context, which can be carried out via human-led assessments or system assisted validation

Dataset

A collection of data with unique characteristics, specific to an AI use case for training AI models to perform based on their intended purposes

DeepSeek

A company that develops large language models (LLM) with availability in Chinese and English

Docker container

A standardized unit that packages an application's code and all its dependencies so the application runs consistently across diverse computing environments

DynamoDB

A fully managed NoSQL database service from Amazon Web Services (AWS) that scales by automatically distributing data and traffic across its servers

E

Edge case

A scenario that is outside a system's normal parameters; in the context of artificial intelligence (AI), edge cases usually refer to situations that the AI model did not encounter before and will require human intervention

Entity recognition

A part of natural language processing (NLP), it is also known as named entity recognition (NER) and involves extracting information from large chunks of text by detecting and categorizing pre-defined crucial information within the text

EvalPlus

An open-source AI tool made to improve the evaluation of large language models, especially in code-related tasks

Expert-generated data

A type of dataset with data points curated by industry-vetted experts for training and testing of machine learning (ML) models

F

Fine-tuning

A method of improving an existing AI model's performance on a specific task by adjusting its internal parameters using new or more relevant data, with experts utilizing techniques such as parameter-efficient fine-tuning (PEFT), supervised fine-tuning (SFT), or verbalization fine-tuning (VLT)

First-token accuracy

A performance metric used in sequence generation tasks that measures the time an AI model takes to process a prompt and generate the first token of its output, more commonly called time to first token (TTFT)

Five-shot learning

A variant of few-shot learning in which the model is given five examples of a task to learn from

Fresh data loop

A concept that involves a continuous cycle of data collection, processing, and learning by machine learning (ML) models to act on updated data

Fully synthetic loop

A concept that involves AI models learning from data that is artificially generated, such as through algorithms and simulations, that mirror real-world scenarios

G

Gemini Flash

A member of the Google Gemini 2.0 series of natively multimodal models with the capability to understand input from text, images, audio, and video files to generate content-related outputs

Gemma

A family of lightweight, open-source small language models derived from Google's Gemini technology, designed for efficient text-based AI tasks across devices, with specialized variants for coding, data retrieval, vision-language processing, and recurrent generation

Generative AI

A class of AI systems capable of creating new and original content such as text, videos, or images, based on patterns learned from specific training data

Generative design

A design process powered by generative algorithms and AI to create visual solutions based on a set of input goals, constraints, and parameters

Generative pre-trained transformers (GPT)

A family of AI models based on a transformer architecture capable of understanding human language to generate human-like responses

Ground truth

A source of information that is certified to be true, supported by direct observation and measurement, as opposed to inference-led labels

H

HITL feedback loop

A method of training AI models, known as human-in-the-loop (HITL) feedback, that integrates human insight, experience, and judgement within the processes of machine learning (ML) systems to ensure AI outputs are sound and accurate

Hyper-parameter optimization (HPO)

A process of finding the configuration settings that let an AI model achieve peak performance when trained; because these settings cannot be estimated from the data, hyper-parameter optimization requires expert knowledge and experimentation

J

JavaScript

A high-level programming language known for use in creating interactive and dynamic content on web pages

JSON

A text-based data interchange format derived from JavaScript syntax; JSON (JavaScript Object Notation) is language-independent, making it an ideal choice for exchanging data across diverse programming languages and platforms
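
As a small illustration of that language independence, the sketch below uses Python's standard `json` module to turn a record into JSON text and back. The record fields (`image`, `label`, `bbox`) are made up for the example.

```python
import json

# A hypothetical annotation record.
record = {"image": "cat_001.jpg", "label": "cat", "bbox": [10, 10, 50, 50]}

text = json.dumps(record)    # Python object -> JSON text
restored = json.loads(text)  # JSON text -> Python object

print(text)
print(restored == record)  # True
```

The same JSON text could be read by a program written in JavaScript, Ruby, Rust, or any other language with a JSON parser.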

L

Labeled data

Data points that have been assigned specific names, based on context and using annotation tools, for the purpose of effective model training

Large language models (LLM)

A type of language model, LLMs are trained on vast amounts of text and conversations to generate human-like responses, often using techniques like retrieval-augmented generation (RAG) to access relevant external information in real time

LiveCodeBench

An evaluation benchmark of LLMs for code in areas such as code generation, self-repair, code execution, and test output prediction

Llama

A family of LLMs by Meta; Llama (Large Language Model Meta AI) includes multimodal models able to understand and analyze text and images to generate text-based responses according to specific use cases

M

Machine learning

A subfield of AI that applies algorithms to enable computers to improve performance on a task through learning by analyzing patterns in data without explicit programming instructions

MaLLaM

A large language model (LLM) that can process Bahasa Melayu and its nuances such as slang, colloquialisms, and diverse dialects across states

Metadata

A unit of information about data describing relationships between multiple components within a dataset

Mistral

A company that develops large language models (LLMs) for a range of purposes, including general-purpose text-based models, specialized models tailored to specific domains, and fully open-source research models that users can fine-tune for their own needs

ML model benchmarking

A process to assess an AI model's performance using standardized methods, metrics, and datasets to determine its accuracy, reliability, and efficiency

Model autophagy disorder (MAD)

A situation where generative AI trained on its own generated data, in the absence of fresh real-world data, becomes problematic due to the AI model echoing earlier mistakes and delivering flawed outputs

Model bias

A systematic and repeatable error in machine learning (ML) models that produces skewed or unfair outcomes, often arising from factors such as selection bias, algorithmic bias, or prejudicial bias

Model evaluation

A validation process based on use cases to ensure the AI model behaves as expected, in terms of output adherence, correctness, and safety

Model training

A process to teach AI models to deliver specific outcomes through data labeling and human-in-the-loop (HITL) feedback

Model-assisted labeling

A collaborative exercise between human annotators and pre-trained machine learning (ML) models to expedite the data labeling task

Multi-model comparison

A process of comparing different AI models side-by-side using Chemin's proprietary evaluation tool, with use case-specific metrics to determine the model best suited for the task

Multimodal

A characteristic of AI systems where a single machine learning (ML) model can process varying types of data including text, image, audio, video, and interpret it to produce a specified output

N

Natural language processing

An area of AI that studies interactions between humans and computers, enabling machine learning (ML) models to understand, interpret, and respond to human conversations in a natural and meaningful way

O

Off-the-shelf datasets

A set of ready datasets curated by Chemin's experts to accelerate the model training process for quick deployment

OpenAI

A venture that develops AI technologies and is behind popular models such as GPT, ChatGPT, Whisper, DALL·E, and Sora

Optical character recognition (OCR)

A technique to identify characters within images or documents and convert them into machine-friendly text formats

P

pass@k metric

A metric to check if an AI model is able to produce at least one correct code solution out of k tries (number of successful outcomes in a specific number of trials) for a given problem
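
The definition above does not say how the metric is computed. A widely used approach, assumed here, generates n samples per problem, counts the c samples that pass the tests, and estimates pass@k as 1 - C(n - c, k) / C(n, k). The function name `pass_at_k` is illustrative.

```python
from math import comb


def pass_at_k(n, c, k):
    """Estimate the chance that at least one of k samples is correct.

    n: total samples generated for the problem
    c: number of those samples that passed the tests
    k: number of tries being scored (1 <= k <= n)
    """
    if not 0 <= c <= n or not 1 <= k <= n:
        raise ValueError("require 0 <= c <= n and 1 <= k <= n")
    # comb(n - c, k) counts the ways to draw k samples that all fail;
    # it is 0 when fewer than k failing samples exist.
    return 1.0 - comb(n - c, k) / comb(n, k)


# 10 samples generated, 3 of them correct.
print(round(pass_at_k(10, 3, 1), 3))  # 0.3
print(round(pass_at_k(10, 3, 5), 3))  # 0.917
```

A benchmark score is then typically the average of this value over all problems.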

Pixtral

A large language model (LLM) developed by Mistral AI, it is able to understand both natural images and text-based documents to deliver tasks including answering questions, following instructions, understanding figures and charts, and multimodal reasoning

Polygon

An annotation method of outlining objects in images using a series of connected points to create a multi-sided shape, it is useful in capturing irregular shapes for computer vision models

Pre-curated data

A component in the data curation process that involves datasets that have been organized, processed, and validated to be ready for training machine learning (ML) models

Pre-trained model

A model trained on vast amounts of data for a certain use case and can be fine-tuned to perform specific tasks

Programmatic labeling

A data labeling method where human experts create labeling functions and utilize computational models to auto-label large datasets
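
A minimal sketch of the idea, assuming a simple majority vote over hand-written labeling functions. The rules, label names, and the `auto_label` helper are all hypothetical; production systems often combine votes with a statistical model rather than a plain majority.

```python
from collections import Counter

ABSTAIN = None  # a labeling function returns this when its rule does not apply


def lf_mentions_refund(text):
    return "complaint" if "refund" in text.lower() else ABSTAIN


def lf_says_thanks(text):
    return "praise" if "thank" in text.lower() else ABSTAIN


def lf_many_exclamations(text):
    return "complaint" if text.count("!") >= 2 else ABSTAIN


LABELING_FUNCTIONS = [lf_mentions_refund, lf_says_thanks, lf_many_exclamations]


def auto_label(text):
    """Apply every labeling function and return the most common vote."""
    votes = [lf(text) for lf in LABELING_FUNCTIONS]
    votes = [v for v in votes if v is not ABSTAIN]
    if not votes:
        return ABSTAIN  # no rule fired; leave the item for a human
    return Counter(votes).most_common(1)[0][0]


print(auto_label("I want a refund now!!"))         # complaint
print(auto_label("Thank you for the quick help"))  # praise
print(auto_label("Where is my order?"))            # None
```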

Prompt engineering

A process of using language to create inputs that guide AI models to generate desired responses

Python

A programming language popular for its simplicity and readability, leveraged by AI professionals for areas such as machine learning (ML), data analysis, and natural language processing (NLP)

Q

Qwen

A family of generative AI models by Alibaba Group catering to a diverse range of domains and tasks including language, vision-language, coding, mathematics, and audio-related tasks

R

Ready-made models

AI models designed for a domain's use case that can be further fine-tuned to fit a more specific task within the domain

Red-teaming

A practice in which experts assess security protocols, often by testing AI infrastructures against simulated real-world attacks, to provide insights into security readiness and improve the organization's resilience against potential threats

Reinforcement learning

A subfield of machine learning (ML) that focuses on enabling AI agents to make sequential decisions by continuously exploring their environment and receiving rewards for correct actions, thereby reinforcing desired behaviors

Reinforcement learning from AI feedback (RLAIF)

A learning process in which an AI model refines its outputs using feedback generated by another AI model, rather than by human reviewers, as the reward signal

Reinforcement learning with human feedback (RLHF)

A technique for training AI models that incorporates human feedback and insights to guide the learning algorithms

Responsible AI

An umbrella term for AI governance comprising the areas of AI risk management, AI safety, as well as ethics and societal impact at every stage of AI development and deployment

Ruby

A general-purpose programming language that is object-oriented and dynamic, designed with the intention to make it act as a buffer between human programmers and the underlying computing machinery

Rust

A programming language known for being fast and user-friendly that can be used to build a wide range of applications, from web servers to game engines

S

SahabatAI

A collection of open-source large language models (LLMs) with a focus on Bahasa Indonesia and its regional languages

Sailor

A family of open language models tailored for Southeast Asian (SEA) languages that are pre-trained from Qwen language models to understand and generate text in various SEA languages including Thai, Indonesian, Vietnamese, Lao, and Malay

Sandbox training

A virtual space for training annotators on data annotation tasks without the risk of affecting actual systems

SEA-LION

A family of open-source large language models (LLMs), Southeast Asian Languages in One Network (SEA-LION) develops models focused on the understanding of Southeast Asia's diverse languages, cultures, and contexts

SeaLLMs

A family of large language models (LLMs) optimized for languages used in Southeast Asia regions including Chinese, Indonesian, Vietnamese, Thai, Tagalog, Malay, Burmese, Khmer, Lao, Tamil, and Javanese

Semantic segmentation

A process of assigning class labels to pixels in images; it is one of the sub-categories of image segmentation and enables computer vision models to better understand and interpret visual information

State-of-the-art (SOTA)

A broad term for the most advanced level of development in a field; in AI, SOTA refers to models that set the benchmark in performance, often validated through peer-reviewed research or demonstrated in machine learning (ML) competitions, as they excel in key metrics such as speed, accuracy, and resource efficiency

Supervised fine-tuning

A process that takes a pre-trained model and continues its training on a new labeled dataset so that the model adapts to a specific task; supervised fine-tuning is often used to improve the efficiency of AI models for specialized applications

Supervised learning

A technique in machine learning (ML) where a model trains on labeled data to make predictions or decisions when given new input data

Supervised spot checking

A form of quality assurance where supervised machine learning (ML) models are evaluated with human reviewers validating a sample of the model's predictions

Synthetic augmentation loop

A training cycle in which each new iteration of an AI model is trained on a dataset made up of both synthetic data and a fixed set of real data

Synthetic data

Artificially generated data used to train models when real data is limited or unavailable—often created using methods like generative adversarial networks (GANs)

Synthetic labeling

A data labeling method where AI models generate new data labels from pre-existing datasets

T

Temperature

A parameter in AI that controls the model's randomness of outputs, wherein a lower temperature produces predictable outputs while higher temperatures lead to more creative outputs
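
A minimal sketch of the usual mechanism, assuming the model produces raw scores (logits) for each candidate token: the scores are divided by the temperature before being converted to probabilities with softmax. The function name and the sample logits are illustrative.

```python
import math


def softmax_with_temperature(logits, temperature):
    """Convert raw scores to probabilities, scaled by temperature."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    scaled = [x / temperature for x in logits]
    # Subtract the maximum before exponentiating for numerical stability.
    peak = max(scaled)
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


logits = [2.0, 1.0, 0.1]  # hypothetical scores for three candidate tokens
cold = softmax_with_temperature(logits, 0.5)
hot = softmax_with_temperature(logits, 2.0)

# At low temperature the top token dominates; at high temperature the
# probabilities are more even, so sampling is more varied.
print([round(p, 2) for p in cold])
print([round(p, 2) for p in hot])
```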

Throughput

A general term for the amount of material passing through a system; in AI, throughput typically refers to the amount of data or number of tasks a model can handle in a given period of time

Tokens

A fundamental unit of data that an AI system can process, usually a word, part of a word, or a character

Training datasets

A type of dataset used as a foundational material to shape AI systems' capacity to recognize patterns, interpret data, and make decisions

Transfer learning

A machine learning (ML) technique that uses a pre-trained model and adapts it to a new but related task, transferring knowledge from one context to another

TypeScript

An open-source programming language based on JavaScript with optional static type definitions, commonly used for developing large-scale web applications

U

Unfiltered data

A description of data in free-form formats that has not been identified, filtered, or organized—in other words, raw data

Unlabeled data

A description of data referring to information without proper identifiers which can impact the performance of AI models

Unsupervised learning

A machine learning (ML) method where AI models identify patterns, structures, or relationships in unlabeled data without explicit output instructions

V

Visual Learning Models (VLM)

A group of multimodal models, also known as vision language models (VLMs), that learn from images and text and generate text outputs; some can also capture spatial properties in an image, output bounding boxes when prompted, localize different entities, or answer questions regarding their relative positions

Z

Zero-shot prompt

A prompt technique where the AI model receives no prior examples, thereby testing its ability to perform solely based on the input prompt—a demonstration of zero-shot learning capabilities
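
To make the contrast with few-shot prompting concrete, the sketch below builds both kinds of prompt as plain text. The `build_prompt` helper and the `Input:`/`Output:` layout are assumptions for illustration; no particular model or API is implied.

```python
def build_prompt(instruction, query, examples=()):
    """Assemble a text prompt; with no examples it is a zero-shot prompt."""
    lines = [instruction, ""]
    for example_input, example_output in examples:
        lines += [f"Input: {example_input}", f"Output: {example_output}", ""]
    lines += [f"Input: {query}", "Output:"]
    return "\n".join(lines)


instruction = "Classify the sentiment as positive or negative."
query = "The battery died after a day."

# Zero-shot: the model sees only the instruction and the new input.
zero_shot = build_prompt(instruction, query)

# Few-shot: the same request preceded by worked examples.
few_shot = build_prompt(
    instruction,
    query,
    examples=[("Great screen, fast delivery.", "positive"),
              ("Stopped working in a week.", "negative")],
)

print(zero_shot)
```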
