More on this book
Community
Kindle Notes & Highlights
Started reading
May 16, 2024
We can make patterns more specific by adding type information.
The expressions str(name) and float(lat) look like constructor calls, which we’d use to convert name and lat to str and float. But in the context of a pattern, that syntax performs a runtime type check: the preceding pattern will match a four-item sequence in which item 0 must be a str, and item 3 must be a pair of floats.
The *_ matches any number of items, without binding them to a variable. Using *extra instead of *_ would bind the items to extra as a list with 0 or more items.
Pattern matching is an example of declarative programming: the code describes “what” you want to match, instead of “how” to match it.
A common feature of list, tuple, str, and all sequence types in Python is the support of slicing operations, which are more powerful than most people realize.
The notation a:b:c is only valid within [] when used as the indexing or subscript operator, and it produces a slice object: slice(a, b, c). As we will see in “How Slicing Works”, to evaluate the expression seq[start:stop:step], Python calls seq.__getitem__(slice(start, stop, step)).
to evaluate a[i, j], Python calls a.__getitem__((i, j)).
Except for memoryview, the built-in sequence types in Python are one-dimensional, so they support only one index or slice, and not a tuple of them.
The ellipsis—written with three full stops (...) and not … (Unicode U+2026)—is recognized as a token by the Python parser. It is an alias to the Ellipsis object, the single instance of the ellipsis class.
Slices are not just useful to extract information from sequences; they can also be used to change mutable sequences in place—that is, without rebuilding them from scratch.
When the target of the assignment is a slice, the righthand side must be an iterable object, even if it has just one item.
Both + and * always create a new object, and never change their operands.
Beware of expressions like a * n when a is a sequence containing mutable items, because the result may surprise you. For example, trying to initialize a list of lists as my_list = [[]] * 3 will result in a list with three references to the same inner list, which is probably not what you want.
The special method that makes += work is __iadd__ (for “in-place addition”). However, if __iadd__ is not implemented, Python falls back to calling __add__.
Repeated concatenation of immutable sequences is inefficient, because instead of just appending new items, the interpreter has to copy the whole target sequence to create a new one with the new items concatenated.
Avoid putting mutable items in tuples.
Augmented assignment is not an atomic operation—we just saw it throwing an exception after doing part of its job.
Inspecting Python bytecode is not too difficult, and can be helpful to see what is...
This highlight has been truncated due to consecutive passage length restrictions.
The list.sort method sorts a list in place—that is, without making a copy. It returns None to remind us that it changes the receiver11 and does not create a new list.
Python API convention: functions or methods that change an object in place should return None to make it clear to the caller that the receiver was changed, and no new object was created.
In contrast, the built-in function sorted creates a new list and returns it. It accepts any iterable object as an argument, including immutable sequences and generators
Regardless of the type of iterable given to sorted, it always returns a newly created list.
The list type is flexible and easy to use, but depending on specific requirements, there are better options. For example, an array saves a lot of memory when you need to handle millions of floating-point values. On the other hand, if you are constantly adding and removing items from opposite ends of a list, it’s good to know that a deque (double-ended queue) is a more efficient FIFO14 data structure.
If your code frequently checks whether an item is present in a collection (e.g., item in my_collection), consider using a set for my_collection, especially if it holds a large number of items. Sets are optimized for fast membership checking.
A Python array is as lean as a C array.
The built-in memoryview class is a shared-memory sequence type that lets you handle slices of arrays without copying bytes.
A memoryview is essentially a generalized NumPy array structure in Python itself (without the math). It allows you to share memory between data-structures (things like PIL images, SQLite databases, NumPy arrays, etc.) without first copying. This is very important for large data sets.
Using notation similar to the array module, the memoryview.cast method lets you change the way multiple bytes are read or written as units without moving bits around. memoryview.cast returns yet another memoryview object, always sharing the same memory.
if you are doing advanced numeric processing in arrays, you should be using the NumPy libraries.
For advanced array and matrix operations, NumPy is the reason why Python became mainstream in scientific computing applications. NumPy implements multi-dimensional, homogeneous arrays and matrix types that hold not only numbers but also user-defined records, and provides efficient element-wise operations.
SciPy is a library, written on top of NumPy, offering many scientific computing algorithms from linear algebra, numerical calculus, and statistics. SciPy is fast and reliable because it leverages the widely used C and Fortran codebase from the Netlib Repository. In other words, SciPy gives scientists the best of both worlds: an interactive prompt and high-level Python APIs, together with industrial-strength number-crunching functions optimized in C and Fortran.
Most NumPy and SciPy functions are implemented in C or C++, and can leverage all CPU cores because they release Python’s GIL (Global Interpreter Lock).
The .append and .pop methods make a list usable as a stack or a queue (if you use .append and .pop(0), you get FIFO behavior). But inserting and removing from the head of a list (the 0-index end) is costly because the entire list must be shifted in memory.
The class collections.deque is a thread-safe double-ended queue designed for fast inserting and removing from both ends. It is also the way to go if you need to keep a list of “last seen items” or something of that nature, because a deque can be bounded—i.e., created with a fixed maximum length.
But there is a hidden cost: removing items from the middle of a deque is not as fast. It is really optimized for appending and popping from the ends.
The append and popleft operations are atomic, so deque is safe to use as a FIFO queue in multithreaded applications without the need for locks.
Python sequences are often categorized as mutable or immutable, but it is also useful to consider a different axis: flat sequences and container sequences.
Tuples in Python play two roles: as records with unnamed fields and as immutable lists.
Calling hash(t) on a tuple is a quick way to assert that its value is fixed. A TypeError will be raised if t contains mutable items.
Some objects contain references to other objects; these are called containers.
It is also more efficient because the key function is invoked only once per item, while the two-argument comparison is called every time the sorting algorithm needs to compare two items. Of course, Python also has to compare the keys while sorting, but that comparison is done in optimized C code and not in a Python function that you wrote.
The sorting algorithm used in sorted and list.sort is Timsort, an adaptive algorithm that switches from insertion sort to merge sort strategies, depending on how ordered the data is.
Python is basically dicts wrapped in loads of syntactic sugar.
Class and instance attributes, module namespaces, and function keyword arguments are some of the core Python constructs represented by dictionaries in memory.
The __builtins__.__dict__ stores all built-in types, objects, and functions.
Hash tables are the engines behind Python’s high-performance dicts.
Other built-in types based on hash tables are set and frozenset.
Python 3.9 supports using | and |= to merge mappings. This makes sense, since these are also the set union operators.
In contrast with sequence patterns, mapping patterns succeed on partial matches. In the doctests, the b1 and b2 subjects include a 'title' key that does not appear in any 'book' pattern, yet they match.
There is no need to use **extra to match extra key-value pairs, but if you want to capture them as a dict, you can prefix one variable with **. It must be the last in the pattern, and **_ is forbidden because it would be redundant.