Python in Detail
This note reviews Python details that matter in day-to-day ML engineering: the data model, dictionaries, memory management, and concurrency choices.
Updated: May 24, 2026.
Data Model
Everything in Python is an object: integers, functions, classes, modules, and exceptions. Names are references to objects, not boxes containing values.
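The reference semantics described above can be seen in a few lines. This is a minimal sketch with illustrative variable names, not code from any particular project:

```python
# Two names bound to the same list object: a change made through one
# name is visible through the other.
a = [1, 2, 3]
b = a                    # binds a second name; no copy is made
b.append(4)
same_object = a is b     # identity check: both names refer to one object

# Rebinding a name does not touch the object the other name refers to.
x = 10
y = x
y += 1                   # int is immutable, so y is rebound to a new object

# An explicit shallow copy creates a distinct list object.
c = list(a)
c.append(5)
```

After this runs, a and b both show the appended 4, x is unchanged, and the copy c has diverged from a.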
Practical consequences:
- Mutable objects such as list, dict, and set can be changed in place.
- Immutable objects such as int, float, str, tuple, and frozenset cannot be changed in place.
- Default mutable function arguments are shared across calls, so use None and create the object inside the function.
- Most Python APIs rely on protocols such as iteration, context management, descriptors, and data model methods such as __len__, __iter__, and __getitem__.
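Two of the points above lend themselves to a short illustration: the shared mutable default and protocol-based APIs. The function names and the Batch class below are hypothetical, chosen only for this sketch:

```python
def append_shared(item, bucket=[]):
    # Pitfall: the default list is created once, when the function is defined,
    # and then reused by every call that omits the argument.
    bucket.append(item)
    return bucket


def append_fresh(item, bucket=None):
    # Fix: use None as a sentinel and build a new list on each call.
    if bucket is None:
        bucket = []
    bucket.append(item)
    return bucket


shared_first = append_shared("a")
shared_second = append_shared("b")   # same list object as shared_first

fresh_first = append_fresh("a")
fresh_second = append_fresh("b")     # independent list


class Batch:
    """Hypothetical container: __len__ and __getitem__ are enough for
    len(), indexing, and iteration via the sequence protocol."""

    def __init__(self, rows):
        self._rows = list(rows)

    def __len__(self):
        return len(self._rows)

    def __getitem__(self, index):
        return self._rows[index]


batch = Batch(["x0", "x1", "x2"])
batch_items = list(batch)            # iteration falls back to __getitem__
```

Defining __iter__ explicitly is usually clearer for new classes; the fallback through __getitem__ is shown here only to illustrate that built-ins talk to objects through these methods.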
Dictionary
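Python dictionaries are hash tables. Keys must be hashable: they need a __hash__ value that does not change during the object's lifetime and an __eq__ consistent with it. In practice this usually means immutable types.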
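A small sketch of what hashability means in practice. The FeatureKey class and the example data are hypothetical, introduced only for illustration:

```python
from dataclasses import dataclass

# Tuples and frozensets of hashable items are hashable, so they work as keys.
ngram_counts = {("new", "york"): 3}
ngram_counts[frozenset({"a", "b"})] = 1

# A list is mutable and unhashable, so using it as a key raises TypeError.
try:
    ngram_counts[["new", "york"]] = 1
    list_key_error = None
except TypeError as exc:
    list_key_error = exc


@dataclass(frozen=True)
class FeatureKey:
    # frozen=True together with the default eq=True makes the dataclass
    # generate __hash__ from its fields.
    name: str
    bucket: int


weights = {FeatureKey("age", 3): 0.25}
# A second instance with equal fields hashes and compares equal,
# so it finds the same entry.
lookup = weights[FeatureKey("age", 3)]
```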
Important properties:
- Average lookup, insert, and delete are O(1).
- Since Python 3.7, insertion order is part of the language guarantee.
- Dictionary views such as dict.keys() are dynamic views, not copied lists.
- Do not mutate a dictionary while iterating over it unless you iterate over a copied list of keys.
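The view and iteration rules above can be checked directly. The dictionaries below are illustrative data only:

```python
scores = {"a": 1, "b": 2}
keys_view = scores.keys()
scores["c"] = 3                        # the existing view reflects the new key
keys_after_insert = list(keys_view)    # insertion order is preserved

# Changing a dict's size while iterating over it raises RuntimeError.
unsafe = {"a": 1, "b": 2, "c": 3}
try:
    for key in unsafe:
        del unsafe[key]
    mutation_error = None
except RuntimeError as exc:
    mutation_error = exc

# Safe variant: iterate over a copied list of keys, then delete freely.
for key in list(scores):
    if scores[key] < 2:
        del scores[key]

keys_after_delete = list(keys_view)    # the same view now shows the deletion
```

Building a new dictionary with a comprehension is often a simpler alternative to deleting in place.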
Dictionaries are fast, but performance can degrade with many hash collisions. For ML workloads, dictionaries are often used for vocabularies, feature maps, metadata, caching, and JSON-like records.
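As one example of the vocabulary use case, here is a minimal sketch of a token-to-id mapping. The function names and the special tokens are assumptions made for illustration, not a standard API:

```python
def build_vocab(tokens, specials=("<pad>", "<unk>")):
    """Assign each distinct token the next free integer id, in first-seen order."""
    vocab = {tok: i for i, tok in enumerate(specials)}
    for tok in tokens:
        # setdefault inserts only if the token is new; len(vocab) is the next id.
        vocab.setdefault(tok, len(vocab))
    return vocab


def encode(tokens, vocab, unk="<unk>"):
    """Map tokens to ids, falling back to the unknown-token id."""
    unk_id = vocab[unk]
    return [vocab.get(tok, unk_id) for tok in tokens]


vocab = build_vocab("the cat sat on the mat".split())
ids = encode("the dog sat".split(), vocab)
```

Because insertion order is guaranteed, the ids follow first appearance in the corpus, which keeps the mapping reproducible for a fixed input order.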
Memory Management
CPython primarily uses reference counting. When an object’s reference count reaches zero, it is deallocated immediately. Reference counting alone cannot reclaim objects that reference each other in a cycle, so CPython also runs a cyclic garbage collector.
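The difference between the two mechanisms can be observed with weak references. This sketch assumes CPython; other implementations may free objects at different times, and the Node class is hypothetical:

```python
import gc
import weakref


class Node:
    def __init__(self):
        self.other = None


gc.disable()  # keep the cyclic collector from running on its own during the demo
try:
    # No cycle: dropping the last reference frees the object right away.
    single = Node()
    single_ref = weakref.ref(single)
    del single
    freed_by_refcount = single_ref() is None

    # Cycle: each node keeps the other's reference count above zero.
    a, b = Node(), Node()
    a.other, b.other = b, a
    cycle_ref = weakref.ref(a)
    del a, b
    alive_before_collect = cycle_ref() is not None

    gc.collect()  # the cyclic collector finds and frees the unreachable pair
    freed_by_gc = cycle_ref() is None
finally:
    gc.enable()
```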
Useful habits:
- Use context managers for files, locks, database connections, and GPU-related resources.
- Prefer generators or streaming datasets when full materialization is unnecessary.
- Watch accidental references in global caches, closures, notebooks, and logging.
- Use tracemalloc, profilers, and small reproducible scripts to track down memory leaks.
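As a rough sketch of the tracemalloc and generator habits together, the helper below measures peak traced memory for a full list versus a generator. The helper name peak_bytes and the size N are arbitrary choices for this example:

```python
import tracemalloc

N = 200_000


def peak_bytes(func):
    """Run func under tracemalloc and return the peak traced allocation in bytes."""
    tracemalloc.start()
    try:
        func()
        _, peak = tracemalloc.get_traced_memory()
    finally:
        tracemalloc.stop()
    return peak


def sum_with_list():
    # Materializes every element before summing.
    return sum([i * i for i in range(N)])


def sum_with_generator():
    # Produces one element at a time; nothing is kept after it is consumed.
    return sum(i * i for i in range(N))


list_peak = peak_bytes(sum_with_list)
generator_peak = peak_bytes(sum_with_generator)
```

tracemalloc only sees allocations made through Python's memory allocators, so memory held by native libraries or on a GPU needs other tools.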
In ML code, memory pressure often comes from arrays, tensors, dataloaders, pinned memory, cached batches, or keeping computation graphs alive by storing tensors that still require gradients.
Threading, Multiprocessing, Asyncio
Threading is useful for I/O-bound work and for libraries that release the GIL. It is not usually the best path for pure Python CPU-bound loops.
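A minimal sketch of threads overlapping I/O waits. Here fake_download is a hypothetical stand-in for a blocking network or disk call, simulated with time.sleep:

```python
import time
from concurrent.futures import ThreadPoolExecutor


def fake_download(item_id):
    # Stand-in for blocking I/O; time.sleep releases the GIL while waiting,
    # so other threads can run in the meantime.
    time.sleep(0.05)
    return item_id * 2


item_ids = list(range(8))

# With eight workers, the eight waits overlap instead of running back to back.
with ThreadPoolExecutor(max_workers=8) as pool:
    results = list(pool.map(fake_download, item_ids))  # results keep input order
```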
Multiprocessing uses separate processes. It is better for CPU-bound work, but it has serialization and startup costs. For model training, process-based parallelism is common in data loading and distributed training.
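A process pool for CPU-bound work might look like the sketch below, intended to be run as a script file. The prime-counting function is a deliberately simple, hypothetical workload:

```python
import math
from concurrent.futures import ProcessPoolExecutor


def count_primes_below(limit):
    """Pure-Python CPU-bound work: count primes below limit by trial division."""
    count = 0
    for n in range(2, limit):
        if all(n % d for d in range(2, math.isqrt(n) + 1)):
            count += 1
    return count


def main():
    limits = [20_000, 30_000, 40_000, 50_000]
    # Arguments and results are pickled to cross the process boundary,
    # which is the serialization cost mentioned above.
    with ProcessPoolExecutor(max_workers=4) as pool:
        counts = list(pool.map(count_primes_below, limits))
    for limit, count in zip(limits, counts):
        print(f"primes below {limit}: {count}")


if __name__ == "__main__":
    # The guard matters: with the spawn and forkserver start methods,
    # worker processes re-import this module.
    main()
```

For tasks this small, startup and pickling can outweigh the gain; the approach pays off when each task does substantially more work than it costs to ship.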
Asyncio is cooperative concurrency for many waiting tasks, such as network calls, streaming APIs, or service orchestration. It does not make CPU-bound Python code faster by itself.
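The cooperative model can be sketched with a few simulated requests. Here fake_request is a hypothetical coroutine that stands in for a network call, using asyncio.sleep as the wait:

```python
import asyncio


async def fake_request(name, delay):
    # await hands control back to the event loop while this task waits.
    await asyncio.sleep(delay)
    return f"{name}: done"


async def fetch_all():
    # gather runs the coroutines concurrently and returns results in argument order.
    return await asyncio.gather(
        fake_request("users", 0.05),
        fake_request("orders", 0.03),
        fake_request("stock", 0.01),
    )


responses = asyncio.run(fetch_all())
```

A CPU-heavy function called inside a coroutine would block the event loop for its whole duration, which is why asyncio alone does not speed up CPU-bound code.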
Current Python Notes
As of 2026, Python 3.14 is the current feature line. The biggest trend is optional free-threaded CPython, which can run without the traditional GIL in supported builds. This is promising for CPU parallelism, but it does not remove the need for correct locking, and extension libraries must be compatible.
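One way to check which kind of build is running, together with a reminder that shared state still needs a lock, is sketched below. It relies on sys._is_gil_enabled(), which exists only on Python 3.13 and later, so the code falls back to assuming the GIL is enabled on older versions. The SharedCounter class is hypothetical:

```python
import sys
import sysconfig
import threading

# Build-time flag: truthy on free-threaded builds, 0 or None otherwise.
free_threaded_build = bool(sysconfig.get_config_var("Py_GIL_DISABLED"))

# Runtime check (Python 3.13+); assume the GIL is enabled when it is missing.
gil_check = getattr(sys, "_is_gil_enabled", None)
gil_enabled = gil_check() if gil_check is not None else True


class SharedCounter:
    def __init__(self):
        self.value = 0
        self._lock = threading.Lock()

    def add(self, times):
        for _ in range(times):
            # A read-modify-write is not atomic, with or without the GIL,
            # so the lock is needed in both kinds of build.
            with self._lock:
                self.value += 1


counter = SharedCounter()
threads = [threading.Thread(target=counter.add, args=(10_000,)) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()
```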
Python 3.14 also adds standard-library support for multiple interpreters (the concurrent.interpreters module), improved tooling, and new syntax such as template strings. For production ML work, the conservative choice is still to match the Python version supported by PyTorch, TensorFlow, CUDA wheels, and deployment infrastructure.