Python in Detail

Published: November 20, 2023

This note reviews Python details that matter in day-to-day ML engineering: the data model, dictionaries, memory management, and concurrency choices.

Updated: May 24, 2026.

Data Model

Everything in Python is an object: integers, functions, classes, modules, and exceptions. Names are references to objects, not boxes containing values.
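A minimal sketch of what "names are references" means in practice. The variable names are illustrative only.

```python
# Two names bound to the same list object: a change made through one
# name is visible through the other.
a = [1, 2, 3]
b = a
b.append(4)
same_object = a is b          # both names refer to one object

# Rebinding a name does not touch the object the other name refers to.
b = [0]

# "Changing" an immutable int creates a new object and rebinds the name.
x = 10
y = x
y += 1                        # x is unaffected
```

After this runs, a holds the appended item even though only b was used to append, while x keeps its value because integers cannot be modified in place.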

Practical consequences:

Mutable objects such as list, dict, and set can be changed in place.
Immutable objects such as int, float, str, tuple, and frozenset cannot be changed in place, although a tuple can still hold mutable items.
Default argument values are evaluated once, when the function is defined, so a mutable default is shared across calls. Use None as the default and create the object inside the function.
Most Python APIs rely on protocols such as iteration, context management, and descriptors, which are implemented through data model methods like __len__, __iter__, and __getitem__.
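The default-argument pitfall and the protocol idea from the list above can be sketched as follows. The function names and the Batch class are hypothetical, chosen only for illustration.

```python
def append_shared(item, bucket=[]):      # pitfall: one list shared by every call
    bucket.append(item)
    return bucket


def append_fresh(item, bucket=None):     # fix: create the list inside the call
    if bucket is None:
        bucket = []
    bucket.append(item)
    return bucket


class Batch:
    """Hypothetical container that supports len(), indexing, and iteration."""

    def __init__(self, rows):
        self._rows = list(rows)

    def __len__(self):
        return len(self._rows)

    def __getitem__(self, index):
        return self._rows[index]


first = append_shared("a")
second = append_shared("b")   # same list object as `first`
batch = Batch([10, 20, 30])
```

Batch defines no __iter__, yet a for loop over it still works: Python falls back to calling __getitem__ with 0, 1, 2, and so on until IndexError is raised.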

Dictionary

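Python dictionaries are hash tables. Keys must be hashable: the object needs a __hash__ whose value never changes during its lifetime and an __eq__ that is consistent with it. In practice this usually means immutable built-ins such as str, int, and tuples of hashable items.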
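A short sketch of which objects can serve as keys. FeatureKey is a hypothetical type introduced for this example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FeatureKey:
    """Hypothetical key type; frozen=True makes instances hashable."""
    name: str
    bucket: int


lookup = {
    ("user", 42): "a tuple of hashable items works as a key",
    FeatureKey("age", 3): "a frozen dataclass works as a key",
}

try:
    lookup[["user", 42]] = "lists are mutable and unhashable"
except TypeError as exc:
    error_message = str(exc)   # the exact wording varies by Python version
```

Two FeatureKey instances with equal fields hash and compare equal, so either one retrieves the same entry.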

Important properties:

Average lookup, insert, and delete are O(1).
Since Python 3.7, insertion order is part of the language guarantee.
Dictionary views such as dict.keys() are dynamic views, not copied lists.
Do not add or remove keys while iterating over a dictionary; doing so may raise RuntimeError or skip entries. If you need to mutate, iterate over a copied list of keys such as list(d).
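The view and iteration rules above can be sketched as follows. The scores dictionary is made-up data.

```python
scores = {"a": 1, "b": 0, "c": 3}

keys_view = scores.keys()          # live view, not a snapshot
scores["d"] = 0
view_sees_new_key = "d" in keys_view

# Deleting keys while iterating the dict itself raises RuntimeError.
probe = dict(scores)               # throwaway copy for the failing case
raised = False
try:
    for key in probe:
        if probe[key] == 0:
            del probe[key]
except RuntimeError:
    raised = True

# Safe version: iterate over a copied list of keys, mutate the original.
for key in list(scores):
    if scores[key] == 0:
        del scores[key]
```

When the goal is only to filter, building a new dictionary with a comprehension avoids the question entirely.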

Dictionaries are fast, but performance can degrade with many hash collisions. For ML workloads, dictionaries are often used for vocabularies, feature maps, metadata, caching, and JSON-like records.
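As one example of the vocabulary use case, here is a minimal sketch of a token-to-id mapping. build_vocab is a hypothetical helper, not taken from any library.

```python
def build_vocab(tokens):
    """Map each distinct token to an integer id in first-seen order."""
    vocab = {}
    for token in tokens:
        # setdefault inserts only when the token is new; len(vocab) is
        # evaluated before the insert, so ids run 0, 1, 2, ...
        vocab.setdefault(token, len(vocab))
    return vocab


vocab = build_vocab(["the", "cat", "sat", "on", "the", "mat"])
ids = [vocab[t] for t in ["the", "mat"]]
```

Because insertion order is guaranteed, iterating over vocab yields tokens in id order without a separate sort.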

Memory Management

CPython primarily uses reference counting. When an object’s reference count reaches zero, it is deallocated immediately. CPython also has a cyclic garbage collector that reclaims reference cycles, which reference counting alone can never free.
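The two mechanisms can be observed with weak references. This sketch assumes CPython; other implementations may free objects at different times. Node is a hypothetical class used only to watch deallocation.

```python
import gc
import weakref


class Node:
    """Hypothetical object used only to observe deallocation."""

    def __init__(self):
        self.other = None


gc.disable()                      # keep the demo deterministic
try:
    # No cycle: dropping the last reference frees the object at once.
    solo = Node()
    solo_ref = weakref.ref(solo)
    del solo
    freed_by_refcount = solo_ref() is None

    # Cycle: the two nodes keep each other's count above zero.
    a, b = Node(), Node()
    a.other, b.other = b, a
    cycle_ref = weakref.ref(a)
    del a, b
    alive_before_collect = cycle_ref() is not None
    gc.collect()                  # the cycle collector reclaims both nodes
    freed_by_collector = cycle_ref() is None
finally:
    gc.enable()
```

In normal code the collector runs automatically; calling gc.collect() here only makes the timing visible.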

Useful habits:

Use context managers for files, locks, database connections, and GPU-related resources.
Prefer generators or streaming datasets when full materialization is unnecessary.
Watch accidental references in global caches, closures, notebooks, and logging.
Use tracemalloc, profilers, and small reproducible scripts for memory leaks.
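To make the generator and tracemalloc habits concrete, here is a small sketch that compares peak allocation for a materialized list and a generator. peak_bytes and the two total_* functions are hypothetical helpers, and the absolute numbers depend on the interpreter and platform.

```python
import tracemalloc


def peak_bytes(fn):
    """Return the peak traced allocation, in bytes, while fn() runs."""
    tracemalloc.start()
    try:
        fn()
        _, peak = tracemalloc.get_traced_memory()
    finally:
        tracemalloc.stop()
    return peak


def total_from_list(n=200_000):
    return sum([i * i for i in range(n)])    # materializes every value


def total_from_generator(n=200_000):
    return sum(i * i for i in range(n))      # holds one value at a time


list_peak = peak_bytes(total_from_list)
generator_peak = peak_bytes(total_from_generator)
```

Both functions return the same total; only the peak memory differs. For hunting a real leak, tracemalloc.take_snapshot() and comparing snapshots is usually the more useful tool.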

In ML code, memory pressure often comes from arrays, tensors, dataloaders, pinned memory, cached batches, or keeping computation graphs alive by storing tensors that still require gradients, for example accumulating a loss tensor across steps instead of its detached scalar value.

Threading, Multiprocessing, Asyncio

Threading is useful for I/O-bound work and for libraries that release the GIL. It is not usually the best path for pure Python CPU-bound loops.
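A minimal sketch of threads overlapping I/O waits. fake_download is a hypothetical stand-in that sleeps instead of making a network call.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def fake_download(item_id, delay=0.05):
    """Stand-in for a network call; time.sleep releases the GIL."""
    time.sleep(delay)
    return item_id * 2


def download_all(item_ids, workers=8):
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # map() returns results in input order, not completion order.
        return list(pool.map(fake_download, item_ids))


results = download_all(range(8))
```

With eight workers the eight sleeps overlap, so the wall time is close to one delay rather than eight. Replacing the sleep with a pure Python loop would remove most of that benefit on a GIL build.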

Multiprocessing uses separate processes. It is better for CPU-bound work, but it has serialization and startup costs. For model training, process-based parallelism is common in data loading and distributed training.
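For CPU-bound work, a process pool can be sketched as follows. count_primes is a deliberately naive hypothetical workload, and the limits are arbitrary.

```python
from concurrent.futures import ProcessPoolExecutor


def count_primes(limit):
    """CPU-bound pure Python: count primes below limit by trial division."""
    count = 0
    for n in range(2, limit):
        if all(n % d for d in range(2, int(n ** 0.5) + 1)):
            count += 1
    return count


def count_primes_parallel(limits, workers=2):
    # Arguments and results are pickled to cross the process boundary.
    with ProcessPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(count_primes, limits))


if __name__ == "__main__":        # required when workers are spawned
    print(count_primes_parallel([10_000, 20_000]))
```

The worker function lives at module level so that it can be pickled by reference. For tiny tasks, process startup and serialization can cost more than the work itself, so it helps to send each worker a sizeable chunk.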

Asyncio is cooperative concurrency for many waiting tasks, such as network calls, streaming APIs, or service orchestration. It does not make CPU-bound Python code faster by itself.
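A minimal sketch of cooperative concurrency. fake_request is a hypothetical coroutine that sleeps in place of a real network call.

```python
import asyncio


async def fake_request(name, delay):
    """Stand-in for a network call; await hands control back to the loop."""
    await asyncio.sleep(delay)
    return f"{name} done"


async def main():
    # The three waits overlap, so total time is close to the longest delay.
    return await asyncio.gather(
        fake_request("a", 0.03),
        fake_request("b", 0.02),
        fake_request("c", 0.01),
    )


results = asyncio.run(main())
```

gather returns results in the order the awaitables were passed, regardless of which finished first. A blocking call such as time.sleep inside a coroutine would stall every other task on the loop.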

Current Python Notes

As of 2026, Python 3.14 is the current feature line. The biggest trend is optional free-threaded CPython, which can run without the traditional GIL in supported builds. This is promising for CPU parallelism, but it does not remove the need for correct locking, and extension libraries must be compatible.
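The following sketch checks whether the running interpreter has the GIL enabled and shows a lock protecting a shared counter. sys._is_gil_enabled is an underscore-prefixed function available on CPython 3.13 and later; the code assumes it is absent on older versions. Counter is a hypothetical class.

```python
import sys
import threading


def gil_enabled():
    """Report whether the GIL is active; older versions always have it."""
    check = getattr(sys, "_is_gil_enabled", None)   # present on CPython 3.13+
    return True if check is None else check()


class Counter:
    """Hypothetical shared counter; the lock keeps += correct in any build."""

    def __init__(self):
        self.value = 0
        self._lock = threading.Lock()

    def add(self, times):
        for _ in range(times):
            with self._lock:
                self.value += 1


counter = Counter()
threads = [threading.Thread(target=counter.add, args=(10_000,)) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()
```

The read-modify-write in += is not atomic with or without the GIL, so the lock is what guarantees the final count, not the interpreter build.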

Python 3.14 also adds standard-library support for multiple interpreters through the concurrent.interpreters module, improved tooling, and new syntax such as template strings (t-strings). For production ML work, the conservative choice is still to match the Python version supported by PyTorch, TensorFlow, CUDA wheels, and deployment infrastructure.

Rinat