Posts by Tags

Optimizers

3 minute read

Published: November 20, 2023

Optimizers update model parameters using gradients. The optimizer matters, but it is only one part of the recipe: initialization, normalization, batch size, learning-rate schedule, warmup, gradient clipping, weight decay, and data quality often matter just as much.
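
As a minimal sketch of "updating parameters using gradients", the snippet below runs plain gradient descent on a toy quadratic loss. The function names, the loss, and the learning rate are illustrative assumptions, not taken from the post.

```python
def sgd_step(params, grads, lr=0.1):
    """Plain gradient descent: move each parameter against its gradient."""
    return [p - lr * g for p, g in zip(params, grads)]


def grad_loss(params):
    # Gradient of the toy loss sum((p - 3)^2) with respect to each parameter.
    return [2.0 * (p - 3.0) for p in params]


params = [0.0, 10.0]
for _ in range(100):
    params = sgd_step(params, grad_loss(params))

print(params)  # both values end up very close to 3.0, the minimum of the toy loss
```

Optimizers such as Adam or RMSprop change how the step is computed from the gradient history, but the loop around them keeps this shape.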

ML: Normalization

2 minute read

Published: November 20, 2023

Normalization makes optimization easier by controlling the scale and distribution of activations, features, weights, or gradients. The right normalization depends on the architecture and batch regime.
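
To make "controlling the scale and distribution of activations" concrete, here is a small sketch that standardizes one vector of activations to zero mean and unit variance. It resembles the core of LayerNorm without the learned scale and shift; the function name and the `eps` value are assumptions for illustration.

```python
import math


def normalize(values, eps=1e-5):
    """Shift to zero mean and scale to unit variance (no learned scale or shift)."""
    mean = sum(values) / len(values)
    var = sum((v - mean) ** 2 for v in values) / len(values)
    # eps keeps the division stable when the variance is close to zero.
    return [(v - mean) / math.sqrt(var + eps) for v in values]


activations = [100.0, 200.0, 300.0, 400.0]
normed = normalize(activations)
print(normed)  # values are centered on zero with roughly unit variance
```

Which axis the mean and variance are computed over (batch, features, groups of channels) is what mainly distinguishes BatchNorm, LayerNorm, and GroupNorm.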

Embeddings

3 minute read

Published: December 06, 2023

Machine learning models cannot operate on raw text directly; the text must first be converted into numeric vectors. An embedding is a learned vector representation of a token, word, sentence, document, image, or other object.
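
A rough sketch of the idea: an embedding is a lookup from an object (here, a token) to a vector, and related objects should end up with similar vectors. The vocabulary and the two-dimensional vectors below are invented for illustration; real embeddings are learned during training and have many more dimensions.

```python
import math

# Hypothetical vocabulary and hand-picked vectors, for illustration only.
vocab = {"cat": 0, "dog": 1, "car": 2}
embedding_table = [
    [0.9, 0.1],  # cat
    [0.8, 0.2],  # dog
    [0.1, 0.9],  # car
]


def embed(token):
    """Look up the vector for a token by its index in the vocabulary."""
    return embedding_table[vocab[token]]


def cosine(u, v):
    """Cosine similarity between two vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.hypot(*u) * math.hypot(*v))


print(cosine(embed("cat"), embed("dog")))  # high similarity
print(cosine(embed("cat"), embed("car")))  # much lower similarity
```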

Ensemble Methods

2 minute read

Published: November 20, 2023

Ensembles combine multiple models to improve generalization. The main idea is to reduce variance, bias, or both by combining the predictions of many models, which are often individually weak learners.
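
One simple way to combine predictions is majority voting. The sketch below uses three hypothetical classifiers whose predictions are hard-coded so that each is wrong on a different example; the ensemble then recovers every label. This only works because the errors do not overlap, which is an assumption of the toy data.

```python
from collections import Counter


def majority_vote(predictions):
    """Return the most common label among the models' predictions for one input."""
    return Counter(predictions).most_common(1)[0][0]


labels = [1, 0, 1, 1, 0]
model_a = [1, 0, 1, 0, 0]  # wrong on example 3
model_b = [1, 1, 1, 1, 0]  # wrong on example 1
model_c = [0, 0, 1, 1, 0]  # wrong on example 0

# Vote per example across the three models.
ensemble = [majority_vote(votes) for votes in zip(model_a, model_b, model_c)]
print(ensemble)  # [1, 0, 1, 1, 0]
```

Bagging methods such as Random Forest average many models to cut variance, while boosting methods such as XGBoost, LightGBM, and CatBoost fit models sequentially to reduce bias.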

Regularization

3 minute read

Published: December 06, 2023

Regularization is any training choice that helps a model generalize rather than merely memorize the training set. It can be a penalty in the loss, noise during training, constraints on parameters, data augmentation, or a validation-based stopping rule.

Modern ML Practice in 2026

3 minute read

Published: May 24, 2026

This is a short snapshot of practical machine learning work as of 2026. The biggest shift is that many projects now start from pretrained foundation models, but the hard work is still data, evaluation, latency, cost, reliability, and deployment.

Python in Detail

3 minute read

Published: November 20, 2023

This note reviews Python details that matter in day-to-day ML engineering: the data model, dictionaries, memory management, and concurrency choices.

Activation Functions

1 minute read

Published: November 20, 2023

Activation functions introduce nonlinearity. Without them, a stack of linear layers collapses into a single linear (affine) transformation, no matter how deep the network is.
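
The collapse can be checked numerically. In this sketch, two bias-free linear layers applied in sequence give the same output as the single matrix `W2 @ W1`, while inserting a ReLU between them changes the result for this input. The weights and input are arbitrary values chosen for illustration.

```python
def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def relu(m):
    """Apply ReLU element-wise."""
    return [[max(0.0, v) for v in row] for row in m]


W1 = [[1.0, -2.0], [3.0, 1.0]]
W2 = [[2.0, 0.0], [-1.0, 1.0]]
x = [[1.0], [2.0]]  # column vector

stacked = matmul(W2, matmul(W1, x))          # two linear layers, no activation
collapsed = matmul(matmul(W2, W1), x)        # one equivalent linear layer
with_relu = matmul(W2, relu(matmul(W1, x)))  # ReLU between the layers

print(stacked, collapsed)  # [[-6.0], [8.0]] [[-6.0], [8.0]]
print(with_relu)           # [[0.0], [5.0]]
```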

Rinat

Posts by Tags

Adadelta

Adafactor

Adagrad

Adam

AdamW

BatchNorm

CBOW

CatBoost

Decision Tree

Dropout

Early Stopping

Evaluation

FastText

Fine-tuning

GloVe

Gradient Boosting

GroupNorm

InPlace-ABN

L1

L2

LLM

Label Smoothing

LayerNorm

LightGBM

Multimodal

One-hot Vectors

PEFT

RAG

RMSNorm

RMSprop

Random Forest

SGD

Skip-gram

SparseAdam

SpectralNorm

Transformers

Weight Decay

Word2Vec

XGBoost

asyncio

dictionary

elu

gc

gelu

leaky-relu

multiprocessing

python

relu

sigmoid

silu

softmax

swish

tanh

threading