Posts by Tags

Adadelta

Optimizers

3 minute read

Published:

Optimizers update model parameters using gradients. The optimizer matters, but it is only one part of the recipe: initialization, normalization, batch size, learning-rate schedule, warmup, gradient clipping, weight decay, and data quality often matter just as much.
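As a minimal sketch of what "update model parameters using gradients" means, the snippet below runs plain gradient descent on a one-parameter toy loss. The loss function, learning rate, and step count are illustrative assumptions, and this is not Adadelta or any other specific optimizer from the tags on this page.

```python
# Toy loss: f(w) = (w - 3)^2, minimized at w = 3 (an assumed example, not from the post).
def grad(w):
    # Analytic gradient of (w - 3)^2 with respect to w.
    return 2.0 * (w - 3.0)

w = 0.0              # initial parameter value (assumption)
learning_rate = 0.1  # step size (assumption)

for _ in range(50):
    # Core update rule: move the parameter against the gradient.
    w -= learning_rate * grad(w)

print(w)  # ends up very close to 3.0
```

The other ingredients named in the excerpt (schedule, warmup, clipping, weight decay) would each change how `learning_rate` or the gradient is computed at every step, which is why they can matter as much as the update rule itself.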

Adafactor

Optimizers

3 minute read

Published:

Optimizers update model parameters using gradients. The optimizer matters, but it is only one part of the recipe: initialization, normalization, batch size, learning-rate schedule, warmup, gradient clipping, weight decay, and data quality often matter just as much.

Adagrad

Optimizers

3 minute read

Published:

Optimizers update model parameters using gradients. The optimizer matters, but it is only one part of the recipe: initialization, normalization, batch size, learning-rate schedule, warmup, gradient clipping, weight decay, and data quality often matter just as much.

Adam

Optimizers

3 minute read

Published:

Optimizers update model parameters using gradients. The optimizer matters, but it is only one part of the recipe: initialization, normalization, batch size, learning-rate schedule, warmup, gradient clipping, weight decay, and data quality often matter just as much.

AdamW

Optimizers

3 minute read

Published:

Optimizers update model parameters using gradients. The optimizer matters, but it is only one part of the recipe: initialization, normalization, batch size, learning-rate schedule, warmup, gradient clipping, weight decay, and data quality often matter just as much.

BatchNorm

ML: Normalization

2 minute read

Published:

Normalization makes optimization easier by controlling the scale and distribution of activations, features, weights, or gradients. The right normalization depends on the architecture and batch regime.
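To make "controlling the scale and distribution of activations" concrete, here is a minimal sketch that standardizes a single activation vector to roughly zero mean and unit variance, in the style of per-sample normalization. The function name `normalize`, the `eps` value, and the inputs are assumptions for illustration; it omits the learnable scale and shift and is not an implementation of BatchNorm, which uses statistics computed across the batch.

```python
import math

def normalize(x, eps=1e-5):
    # Subtract the mean and divide by the standard deviation of this vector.
    mean = sum(x) / len(x)
    var = sum((v - mean) ** 2 for v in x) / len(x)
    return [(v - mean) / math.sqrt(var + eps) for v in x]

small = [1.0, 2.0, 3.0, 4.0]
large = [1000.0 * v for v in small]  # same pattern, 1000x the scale

# Both inputs map to nearly the same normalized values.
print(normalize(small))
print(normalize(large))
```

The point of the sketch is only that the output no longer depends much on the input scale, which is the property that tends to make optimization easier.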

CBOW

Embeddings

3 minute read

Published:

Machine learning models do not understand raw text directly. They need text to be converted into numeric vectors. An embedding is a learned vector representation for a token, word, sentence, document, image, or other object.
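A minimal sketch of the idea, assuming a hypothetical three-word vocabulary and made-up two-dimensional vectors (in a real model these values would be learned): an embedding is a row lookup in a table, which gives the same result as multiplying a one-hot vector by that table.

```python
# Hypothetical vocabulary and embedding table; the numbers are illustrative, not learned.
vocab = {"cat": 0, "dog": 1, "car": 2}
table = [
    [0.9, 0.1],   # cat
    [0.8, 0.2],   # dog
    [-0.5, 0.7],  # car
]

def one_hot(index, size):
    # Sparse representation: all zeros except a single 1 at the token's index.
    return [1.0 if i == index else 0.0 for i in range(size)]

def embed_by_lookup(token):
    # Dense representation: read the token's row directly.
    return table[vocab[token]]

def embed_by_matmul(token):
    # Equivalent computation: one-hot vector times the embedding table.
    oh = one_hot(vocab[token], len(vocab))
    dim = len(table[0])
    return [sum(oh[j] * table[j][k] for j in range(len(vocab))) for k in range(dim)]

print(embed_by_lookup("dog"))
print(embed_by_matmul("dog"))
```

Frameworks use the lookup form because it avoids materializing the one-hot vector, which would be as long as the vocabulary.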

CatBoost

Ensemble Methods

2 minute read

Published:

Ensembles combine multiple models to improve generalization. The main idea is to reduce variance, bias, or both by combining the predictions of many base learners, which are often individually weak.

Decision Tree

Ensemble Methods

2 minute read

Published:

Ensembles combine multiple models to improve generalization. The main idea is to reduce variance, bias, or both by combining the predictions of many base learners, which are often individually weak.

Dropout

Regularization

3 minute read

Published:

Regularization is any training choice that helps a model generalize instead of only memorizing the training set. It can be a penalty in the loss, noise during training, constraints on parameters, better data augmentation, or a validation-based stopping rule.

Early Stopping

Regularization

3 minute read

Published:

Regularization is any training choice that helps a model generalize instead of only memorizing the training set. It can be a penalty in the loss, noise during training, constraints on parameters, better data augmentation, or a validation-based stopping rule.

Evaluation

Modern ML Practice in 2026

3 minute read

Published:

This is a short snapshot of practical machine learning work as of 2026. The biggest shift is that many projects now start from pretrained foundation models, but the hard work is still data, evaluation, latency, cost, reliability, and deployment.

FastText

Embeddings

3 minute read

Published:

Machine learning models do not understand raw text directly. They need text to be converted into numeric vectors. An embedding is a learned vector representation for a token, word, sentence, document, image, or other object.

Fine-tuning

Modern ML Practice in 2026

3 minute read

Published:

This is a short snapshot of practical machine learning work as of 2026. The biggest shift is that many projects now start from pretrained foundation models, but the hard work is still data, evaluation, latency, cost, reliability, and deployment.

GloVe

Embeddings

3 minute read

Published:

Machine learning models do not understand raw text directly. They need text to be converted into numeric vectors. An embedding is a learned vector representation for a token, word, sentence, document, image, or other object.

Gradient Boosting

Ensemble Methods

2 minute read

Published:

Ensembles combine multiple models to improve generalization. The main idea is to reduce variance, bias, or both by combining the predictions of many base learners, which are often individually weak.

GroupNorm

ML: Normalization

2 minute read

Published:

Normalization makes optimization easier by controlling the scale and distribution of activations, features, weights, or gradients. The right normalization depends on the architecture and batch regime.

InPlace-ABN

ML: Normalization

2 minute read

Published:

Normalization makes optimization easier by controlling the scale and distribution of activations, features, weights, or gradients. The right normalization depends on the architecture and batch regime.

L1

Regularization

3 minute read

Published:

Regularization is any training choice that helps a model generalize instead of only memorizing the training set. It can be a penalty in the loss, noise during training, constraints on parameters, better data augmentation, or a validation-based stopping rule.

L2

Regularization

3 minute read

Published:

Regularization is any training choice that helps a model generalize instead of only memorizing the training set. It can be a penalty in the loss, noise during training, constraints on parameters, better data augmentation, or a validation-based stopping rule.

LLM

Modern ML Practice in 2026

3 minute read

Published:

This is a short snapshot of practical machine learning work as of 2026. The biggest shift is that many projects now start from pretrained foundation models, but the hard work is still data, evaluation, latency, cost, reliability, and deployment.

Label Smoothing

Regularization

3 minute read

Published:

Regularization is any training choice that helps a model generalize instead of only memorizing the training set. It can be a penalty in the loss, noise during training, constraints on parameters, better data augmentation, or a validation-based stopping rule.

LayerNorm

ML: Normalization

2 minute read

Published:

Normalization makes optimization easier by controlling the scale and distribution of activations, features, weights, or gradients. The right normalization depends on the architecture and batch regime.

LightGBM

Ensemble Methods

2 minute read

Published:

Ensembles combine multiple models to improve generalization. The main idea is to reduce variance, bias, or both by combining the predictions of many base learners, which are often individually weak.

Multimodal

Modern ML Practice in 2026

3 minute read

Published:

This is a short snapshot of practical machine learning work as of 2026. The biggest shift is that many projects now start from pretrained foundation models, but the hard work is still data, evaluation, latency, cost, reliability, and deployment.

One-hot Vectors

Embeddings

3 minute read

Published:

Machine learning models do not understand raw text directly. They need text to be converted into numeric vectors. An embedding is a learned vector representation for a token, word, sentence, document, image, or other object.

PEFT

Modern ML Practice in 2026

3 minute read

Published:

This is a short snapshot of practical machine learning work as of 2026. The biggest shift is that many projects now start from pretrained foundation models, but the hard work is still data, evaluation, latency, cost, reliability, and deployment.

RAG

Modern ML Practice in 2026

3 minute read

Published:

This is a short snapshot of practical machine learning work as of 2026. The biggest shift is that many projects now start from pretrained foundation models, but the hard work is still data, evaluation, latency, cost, reliability, and deployment.

Embeddings

3 minute read

Published:

Machine learning models do not understand raw text directly. They need text to be converted into numeric vectors. An embedding is a learned vector representation for a token, word, sentence, document, image, or other object.

RMSNorm

ML: Normalization

2 minute read

Published:

Normalization makes optimization easier by controlling the scale and distribution of activations, features, weights, or gradients. The right normalization depends on the architecture and batch regime.

RMSprop

Optimizers

3 minute read

Published:

Optimizers update model parameters using gradients. The optimizer matters, but it is only one part of the recipe: initialization, normalization, batch size, learning-rate schedule, warmup, gradient clipping, weight decay, and data quality often matter just as much.

Random Forest

Ensemble Methods

2 minute read

Published:

Ensembles combine multiple models to improve generalization. The main idea is to reduce variance, bias, or both by combining the predictions of many base learners, which are often individually weak.

SGD

Optimizers

3 minute read

Published:

Optimizers update model parameters using gradients. The optimizer matters, but it is only one part of the recipe: initialization, normalization, batch size, learning-rate schedule, warmup, gradient clipping, weight decay, and data quality often matter just as much.

Skip-gram

Embeddings

3 minute read

Published:

Machine learning models do not understand raw text directly. They need text to be converted into numeric vectors. An embedding is a learned vector representation for a token, word, sentence, document, image, or other object.

SparseAdam

Optimizers

3 minute read

Published:

Optimizers update model parameters using gradients. The optimizer matters, but it is only one part of the recipe: initialization, normalization, batch size, learning-rate schedule, warmup, gradient clipping, weight decay, and data quality often matter just as much.

SpectralNorm

ML: Normalization

2 minute read

Published:

Normalization makes optimization easier by controlling the scale and distribution of activations, features, weights, or gradients. The right normalization depends on the architecture and batch regime.

Transformers

Embeddings

3 minute read

Published:

Machine learning models do not understand raw text directly. They need text to be converted into numeric vectors. An embedding is a learned vector representation for a token, word, sentence, document, image, or other object.

Weight Decay

Regularization

3 minute read

Published:

Regularization is any training choice that helps a model generalize instead of only memorizing the training set. It can be a penalty in the loss, noise during training, constraints on parameters, better data augmentation, or a validation-based stopping rule.

Word2Vec

Embeddings

3 minute read

Published:

Machine learning models do not understand raw text directly. They need text to be converted into numeric vectors. An embedding is a learned vector representation for a token, word, sentence, document, image, or other object.

XGBoost

Ensemble Methods

2 minute read

Published:

Ensembles combine multiple models to improve generalization. The main idea is to reduce variance, bias, or both by combining the predictions of many base learners, which are often individually weak.

asyncio

Python in Detail

3 minute read

Published:

This note reviews Python details that matter in day-to-day ML engineering: the data model, dictionaries, memory management, and concurrency choices.

dictionary

Python in Detail

3 minute read

Published:

This note reviews Python details that matter in day-to-day ML engineering: the data model, dictionaries, memory management, and concurrency choices.

elu

Activation Functions

1 minute read

Published:

Activation functions introduce nonlinearity. Without them, a deep network is still equivalent to one linear transformation.
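The claim that a network without activations collapses to one linear transformation can be checked directly. The sketch below uses two small weight matrices with made-up values (biases are omitted for brevity) and compares three computations: two stacked linear layers, the single merged layer `W2 @ W1`, and the same two layers with a ReLU in between.

```python
def matvec(W, x):
    # Matrix-vector product for a list-of-rows matrix.
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]

def matmul(A, B):
    # Matrix-matrix product A @ B.
    cols = list(zip(*B))
    return [[sum(a * b for a, b in zip(row, col)) for col in cols] for row in A]

def relu(v):
    return [max(0.0, t) for t in v]

# Illustrative weights and input (assumptions, not from the post).
W1 = [[1.0, -2.0], [0.5, 1.0]]
W2 = [[2.0, 1.0], [-1.0, 3.0]]
x = [1.0, 1.0]

two_linear_layers = matvec(W2, matvec(W1, x))   # no activation between layers
one_merged_layer = matvec(matmul(W2, W1), x)    # a single equivalent layer
with_relu = matvec(W2, relu(matvec(W1, x)))     # nonlinearity in between

print(two_linear_layers)  # [-0.5, 5.5]
print(one_merged_layer)   # [-0.5, 5.5]
print(with_relu)          # [1.5, 4.5]
```

The first two results agree because matrix multiplication is associative, so the extra layer adds no expressive power. Inserting the ReLU breaks that equivalence, which is what lets depth represent functions a single linear layer cannot.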

gc

Python in Detail

3 minute read

Published:

This note reviews Python details that matter in day-to-day ML engineering: the data model, dictionaries, memory management, and concurrency choices.

gelu

Activation Functions

1 minute read

Published:

Activation functions introduce nonlinearity. Without them, a deep network is still equivalent to one linear transformation.

leaky-relu

Activation Functions

1 minute read

Published:

Activation functions introduce nonlinearity. Without them, a deep network is still equivalent to one linear transformation.

multiprocessing

Python in Detail

3 minute read

Published:

This note reviews Python details that matter in day-to-day ML engineering: the data model, dictionaries, memory management, and concurrency choices.

python

Python in Detail

3 minute read

Published:

This note reviews Python details that matter in day-to-day ML engineering: the data model, dictionaries, memory management, and concurrency choices.

relu

Activation Functions

1 minute read

Published:

Activation functions introduce nonlinearity. Without them, a deep network is still equivalent to one linear transformation.

sigmoid

Activation Functions

1 minute read

Published:

Activation functions introduce nonlinearity. Without them, a deep network is still equivalent to one linear transformation.

silu

Activation Functions

1 minute read

Published:

Activation functions introduce nonlinearity. Without them, a deep network is still equivalent to one linear transformation.

softmax

Activation Functions

1 minute read

Published:

Activation functions introduce nonlinearity. Without them, a deep network is still equivalent to one linear transformation.

swish

Activation Functions

1 minute read

Published:

Activation functions introduce nonlinearity. Without them, a deep network is still equivalent to one linear transformation.

tanh

Activation Functions

1 minute read

Published:

Activation functions introduce nonlinearity. Without them, a deep network is still equivalent to one linear transformation.

threading

Python in Detail

3 minute read

Published:

This note reviews Python details that matter in day-to-day ML engineering: the data model, dictionaries, memory management, and concurrency choices.