More on this book
Community
Kindle Notes & Highlights
Started reading
May 16, 2024
Premature abstraction is as bad as premature optimization.
The iceberg is called the Python Data Model, and it is the API that we use to make our own objects play well with the most idiomatic language features.
You can think of the data model as a description of Python as a framework. It formalizes the interfaces of the building blocks of the language itself, such as sequences, functions, iterators, coroutines, classes, context managers, and so on.
The special method names are always written with leading and trailing double underscores.
The term magic method is slang for special method,
“Dunder” is a shortcut for “double underscore before and after.” That’s why the special methods are also known as dunder methods.
“Any use of __*__ names, in any context, that does not follow explicitly documented use, is subject to breakage without warning.”
We use namedtuple to build classes of objects that are just bundles of attributes with no custom methods, like a database record.
The first thing to know about special methods is that they are meant to be called by the Python interpreter, and not by you.
Python variable-sized collections written in C include a struct2 called PyVarObject, which has an ob_size field holding the number of items in the collection.
The __repr__ special method is called by the repr built-in to get the string representation of the object for inspection. Without a custom __repr__, Python’s console would display a Vector instance <Vector object at 0x10e100070>.
The string returned by __repr__ should be unambiguous and, if possible, match the source code necessary to re-create the represented object.
In contrast, __str__ is called by the str() built-in and implicitly used by the print function. It should return a string suitable for display to end users.
Programmers with prior experience in languages with a toString method tend to implement __str__ and not __repr__. If you only implement one of these special methods in Python, choose __repr__.
By default, instances of user-defined classes are considered truthy, unless either __bool__ or __len__ is implemented. Basically, bool(x) calls x.__bool__() and uses the result. If __bool__ is not implemented, Python tries to invoke x.__len__(), and if that returns zero, bool returns False. Otherwise bool returns True.
Iterable to support for, unpacking, and other forms of iteration
Sized to support the len built-in function
Container to support the ...
This highlight has been truncated due to consecutive passage length restrictions.
Python does not require concrete classes to actually inherit from any of these ABCs. Any class that implements __len...
This highlight has been truncated due to consecutive passage length restrictions.
Sequence, formalizing the interface of built-ins like list and str
Mapping, implemented by dict, collections.defaultdict, etc.
Set, the interface of the set and frozenset...
This highlight has been truncated due to consecutive passage length restrictions.
Since Python 3.7, the dict type is officially “ordered,” but that only means that the key insertion order is preserved. You cannot rearrange the keys in a dict however you like.
By implementing special methods, your objects can behave like the built-in types, enabling the expressive coding style the community considers Pythonic.
In contrast, consider Go. Some objects in that language have features that are magic, in the sense that we cannot emulate them in our own user-defined types. For example, Go arrays, strings, and maps support the use brackets for item access, as in a[i]. But there’s no way to make the [] notation work with a new collection type that you define.
Even worse, Go has no user-level concept of an iterable interface or an iterator object, therefore its for/range syntax is limited to supporting five “magic” built-in types, including arrays, strings, and maps.
But I mention it because the term metaobject protocol is useful to think about the Python Data Model and similar features in other languages. The metaobject part refers to the objects that are the building blocks of the language itself. In this context, protocol is a synonym of interface. So a metaobject protocol is a fancy synonym for object model: an API for core language constructs.
A container sequence holds references to the objects it contains, which may be of any type, while a flat sequence stores the value of its contents in its own memory space, not as distinct Python objects.
flat sequences are more compact, but they are limited to holding primitive machine values like bytes, integers, and floats.
Every Python object in memory has a header with metadata. The simplest Python object, a float, has a value field and two metadata fields: ob_refcnt: the object’s reference count ob_type: a pointer to the object’s type ob_fval: a C double holding the value of the float
On a 64-bit Python build, each of those fields takes 8 bytes. That’s why an array of floats is much more compact than a tuple of floats: the array is a single object holding the raw values of the floats, while the tuple consists of several...
This highlight has been truncated due to consecutive passage length restrictions.
The built-in concrete sequence types do not actually subclass the Sequence and MutableSequence abstract base classes (ABCs), but they are virtual subclasses registered with those ABCs—as
you could also start from a listcomp, but a genexp (generator expression) saves memory because it yields items one by one using the iterator protocol instead of building a whole list just to feed another constructor.
If the two lists used in the Cartesian product had a thousand items each, using a generator expression would save the cost of building a list with a million items just to feed the for loop.
Tuples hold records: each item in the tuple holds the data for one field, and the position of the item gives its meaning.
The Python interpreter and standard library make extensive use of tuples as immutable lists, and so should you.
be aware that the immutability of a tuple only applies to the references contained in it. References in a tuple cannot be deleted or replaced. But if one of those references points to a mutable object, and that object is changed, then the value of the tuple changes.
an object is only hashable if its value cannot ever change. An unhashable tuple cannot be inserted as a dict key, or a set element.
If you want to determine explicitly if a tuple (or any object) has a fixed value, you can use the hash built-in
To evaluate a tuple literal, the Python compiler generates bytecode for a tuple constant in one operation; but for a list literal, the generated bytecode pushes each element as a separate constant to the data stack, and then builds the list.
Because of its fixed length, a tuple instance is allocated the exact memory space it needs. Instances of list, on the other hand, are allocated with room to spare, to amortize the cost of future appends.
The references to the items in a tuple are stored in an array in the tuple struct, while a list holds a pointer to an array of references stored elsewhere. The indirection is necessary because when a list grows beyond the space currently allocated, Python needs to reallocate the array of references to make room. The extra indirection makes CPU caches less effective.
The most visible form of unpacking is parallel assignment; that is, assigning items from an iterable to a tuple of variables,
The expression after the match keyword is the subject. The subject is the data that Python will try to match to the patterns in each case clause.
On the surface, match/case may look like the switch/case statement from the C language—but that’s only half the story.4 One key improvement of match over switch is destructuring—a more advanced form of unpacking.
Sequence patterns may be written as tuples or lists or any combination of nested tuples and lists, but it makes no difference which syntax you use: in a sequence pattern, square brackets and parentheses mean the same thing.
A sequence pattern can match instances of most actual or virtual subclasses of collections.abc.Sequence, with the exception of str, bytes, and bytearray.
Unlike unpacking, patterns don’t destructure iterables that are not sequences (such as iterators).
The _ symbol is special in patterns: it matches any single item in that position, but it is never bound to the value of the matched item.
You can bind any part of a pattern with a variable using the as keyword: case [name, _, _, (lat, lon) as coord]: