More on this book
Community
Kindle Notes & Highlights
Started reading
May 16, 2024
Attributes with a single _ prefix are called “protected” in some corners of the Python documentation.
if you define a class attribute named __slots__ holding a sequence of attribute names, Python uses an alternative storage model for the instance attributes: the attributes named in __slots__ are stored in a hidden array or references that use less memory than a dict.
If you declare __slots__ = () (an empty tuple), then the instances of the subclass will have no __dict__ and will only accept the attributes named in the __slots__ of the base class.
Careless optimization is worse than premature optimization: you add complexity but may not get any benefit.
An object should be as simple as the requirements dictate—and not a parade of language features. If the code is for an application, then it should focus on what is needed to support the end users, not more. If the code is for a library for other programmers to use, then it’s reasonable to implement special methods supporting behaviors that Pythonistas expect. For example, __eq__ may not be necessary to support a business requirement, but it makes the class easier to test.
String/bytes representation methods: __repr__, __str__, __format__, and __bytes__
Methods for reducing an object to a number: __abs__, __bool__, and __hash__
The __eq__ operator, to support testing and hashing (al...
This highlight has been truncated due to consecutive passage length restrictions.
N-dimensional vectors (with large values of N) are widely used in information retrieval, where documents and text queries are represented as vectors, with one dimension per word. This is called the Vector space model. In this model, a key relevance metric is the cosine similarity (i.e., the cosine of the angle between the vector representing the query and the vector representing the document). As the angle decreases, the cosine approaches the maximum value of 1, and so does the relevance of the document to the query.
In the context of object-oriented programming, a protocol is an informal interface, defined only in documentation and not in code. For example, the sequence protocol in Python entails just the __len__ and __getitem__ methods.
In other words, indices exposes the tricky logic that’s implemented in the built-in sequences to gracefully handle missing or negative indices and slices that are longer than the original sequence. This method produces “normalized” tuples of nonnegative start, stop, and stride integers tailored to a sequence of the given length.
The key idea is to reduce a series of values to a single value. The first argument to reduce() is a two-argument function, and the second argument is an iterable. Let’s say we have a two-argument function fn and a list lst. When you call reduce(fn, lst), fn will be applied to the first pair of elements—fn(lst[0], lst[1])—producing a first result, r1. Then fn is applied to r1 and the next element—fn(r1, lst[2])—producing a second result, r2. Now fn(r2, lst[3]) is called to produce r3 … and so on until the last element, when a single result, rN, is returned.
When using reduce, it’s good practice to provide the third argument, reduce(function, iterable, initializer), to prevent this exception: TypeError: reduce() of empty sequence with no initial value (excellent message: explains the problem and how to fix it). The initializer is the value returned if the sequence is empty and is used as the first argument in the reducing loop, so it should be the identity value of the operation. As examples, for +, |, ^ the initializer should be 0, but for *, & it should be 1.
The mapping step produces one hash for each component, and the reduce step aggregates all hashes with the xor operator.
The solution with map would be less efficient in Python 2, where the map function builds a new list with the results. But in Python 3, map is lazy: it creates a generator that yields the results on demand, thus saving memory—just
The zip function is named after the zipper fastener because the physical device works by interlocking pairs of teeth taken from both zipper sides, a good visual analogy for what zip(left, right) does.
One of them is the zip built-in, which makes it easy to iterate in parallel over two or more iterables by returning tuples that you can unpack into variables, one for each item in the parallel inputs.
That’s why I believe “informal interface” is a reasonable short explanation for “protocol”
When implementing a class that emulates any built-in type, it is important that the emulation only be implemented to the degree that it makes sense for the object being modeled. For example, some sequences may work well with retrieval of individual elements, but extracting a slice may not make sense.
One thing I know: “idiomatic” does not mean using the most obscure language features.
If you want the sum of a list of items, you should write it in a way that looks like “the sum of a list of items,” not in a way that looks like “loop over these items, maintain another variable t, perform a sequence of additions.” Why do we have high-level languages if not to express our intentions at a higher level and let the language worry about what low-level operations are needed to implement it?
Program to an interface, not an implementation.
The best approach to understanding a type in Python is knowing the methods it provides—its interface—as
Duck typing Python’s default approach to typing from the beginning.
Goose typing The approach supported by abstract base classes (ABCs) since Python 2.6, which relies on runtime checks of objects against ABCs.
Static typing The traditional approach of statically-typed languages like C and Java; supported since Python 3.5 by the typing module, and enforced by external type checkers compliant with PEP 484—Type Hints.
Static duck typing An approach made popular by the Go language; supported by subclasses of typing.Protocol—new in Python 3.8—also enforced by external type checkers.
object protocol specifies methods which an object must provide to fulfill a role.
Dynamic protocol The informal protocols Python always had. Dynamic protocols are implicit, defined by convention, and described in the documentation. Python’s most important dynamic protocols are supported by the interpreter itself, and are documented in the “Data Model” chapter of The Python Language Reference.
An object may implement only part of a dynamic protocol and still be useful; but to fulfill a static protocol, the object must provide every method declared in the protocol class, even if your program doesn’t need them all.
Static protocols can be verified by static type checkers, but dynamic protocols can’t.
The philosophy of the Python Data Model is to cooperate with essential dynamic protocols as much as possible.
Most ABCs in the collections.abc module exist to formalize interfaces that are implemented by built-in objects and are implicitly supported by the interpreter—both of which predate the ABCs themselves. The ABCs are useful as starting points for new classes, and to support explicit type checking at runtime (a.k.a. goose typing) as well as type hints for static type checkers.
In summary, given the importance of sequence-like data structures, Python manages to make iteration and the in operator work by invoking __getitem__ when __iter__ and __contains__ are unavailable.
Monkey patching is dynamically changing a module, class, or function at runtime, to add features or fix bugs.
When you follow established protocols, you improve your chances of leveraging existing standard library and third-party code, thanks to duck typing.
Many bugs cannot be caught except at runtime—even in mainstream statically typed languages.3 In a dynamically typed language, “fail fast” is excellent advice for safer and easier-to-maintain programs. Failing fast means raising runtime errors as soon as possible, for example, rejecting invalid arguments right a the beginning of a function body.
An abstract class represents an interface. Bjarne Stroustrup,
Python doesn’t have an interface keyword. We use abstract base classes (ABCs) to define interfaces for explicit type checking at runtime—also supported by static type checkers.
Abstract base classes complement duck typing by providing a way to define interfaces when other techniques like hasattr() would be clumsy or subtly wrong (for example, with magic methods). ABCs introduce virtual subclasses, which are classes that don’t inherit from a class but are still recognized by isinstance() and issubclass();
What goose typing means is: isinstance(obj, cls) is now just fine…as long as cls is an abstract base class—in other words, cls’s metaclass is abc.ABCMeta.
And, don’t define custom ABCs (or metaclasses) in production code. If you feel the urge to do so, I’d bet it’s likely to be a case of the “all problems look like a nail”–syndrome for somebody who just got a shiny new hammer—you (and future maintainers of your code) will be much happier sticking with straightforward and simple code,
goose typing entails: Subclassing from ABCs to make it explict that you are implementing a previously defined interface. Runtime type checking using ABCs instead of concrete classes as the second argument for isinstance and issubclass.
even with ABCs, you should beware that excessive use of isinstance checks may be a code smell—a symptom of bad OO design.
It’s usually not OK to have a chain of if/elif/elif with isinstance checks performing different actions depending on the type of object: you should be using polymorphism for that—i.e., design your classes so that the interpreter dispatches calls to the proper methods, instead of you hardcoding the dispatch logic in if/elif/elif blocks.
On the other hand, it’s OK to perform an isinstance check against an ABC if you must enforce an API contract: “Dude, you have to im...
This highlight has been truncated due to consecutive passage length restrictions.
ABCs are meant to encapsulate very general concepts, abstractions, introduced by a framework—things like “a sequence” and “an exact number.” [Readers] most likely don’t need to write any new ABCs, just use existing ones correctly, to get 99.9% of the benefits without serious risk of misdesign.
Python does not check for the implementation of the abstract methods at import time (when the frenchdeck2.py module is loaded and compiled), but only at runtime when we actually try to instantiate FrenchDeck2.
Iterable, Container, Sized Every collection should either inherit from these ABCs or implement compatible protocols. Iterable supports iteration with __iter__, Container supports the in operator with __contains__, and Sized supports len() with __len__.
Collection This ABC has no methods of its own, but was added in Python 3.6 to make it easier to subclass from Iterable, Container, and Sized.