Fluent Python: Clear, Concise, and Effective Programming
Rate it:
Open Preview
9%
Flag icon
The automatic handling of missing keys is not triggered because pattern matching always uses the d.get(key, sentinel) method—where the default sentinel is a special marker value that cannot occur in user data.
9%
Flag icon
Therefore, they all share the limitation that the keys must be hashable (the values need not be hashable, only the keys).
9%
Flag icon
An object is hashable if it has a hash code which never changes during its lifetime (it needs a __hash__() method), and can be compared to other objects (it needs an __eq__() method). Hashable objects which compare equal must have the same hash code.
9%
Flag icon
Numeric types and flat immutable types str and bytes are all hashable. Container types are hashable if they are immutable and all contained objects are also hashable. A frozenset is always hashable, because every element it contains must be hashable by definition. A tuple is hashable only if all its items are hashable.
9%
Flag icon
User-defined types are hashable by default because their hash code is their id(), and the __eq__() method inherited from the object class simply compares the object IDs.
9%
Flag icon
The way d.update(m) handles its first argument m is a prime example of duck typing: it first checks whether m has a keys method and, if it does, assumes it is a mapping. Otherwise, update() falls back to iterating over m, assuming its items are (key, value) pairs.
10%
Flag icon
In other words, the end result of this line… my_dict.setdefault(key, []).append(new_value) …is the same as running… if key not in my_dict:     my_dict[key] = [] my_dict[key].append(new_value)
10%
Flag icon
Sometimes it is convenient to have mappings that return some made-up value when a missing key is searched. There are two main approaches to this: one is to use a defaultdict instead of a plain dict. The other is to subclass dict or any other mapping type and add a __missing__ method.
10%
Flag icon
Here is how it works: when instantiating a defaultdict, you provide a callable to produce a default value whenever __getitem__ is passed a nonexistent key argument.
10%
Flag icon
The callable that produces the default values is held in an instance attribute named default_factory.
10%
Flag icon
The default_factory of a defaultdict is only invoked to provide default values for __getitem__ calls, and not for the other methods. For example, if dd is a defaultdict, and k is a missing key, dd[k] will call the default_factory to create a default value, but dd.get(k) still returns None, and k in dd is False.
10%
Flag icon
This method is not defined in the base dict class, but dict is aware of it: if you subclass dict and provide a __missing__ method, the standard dict.__getitem__ will call it whenever a key is not found, instead of raising KeyError.
10%
Flag icon
A search like k in my_dict.keys() is efficient in Python 3 even for very large mappings because dict.keys() returns a view, which is similar to a set, as we’ll see in “Set Operations on dict Views”. However, remember that k in my_dict does the same job, and is faster because it avoids the attribute lookup to find the .keys method.
10%
Flag icon
Now that the built-in dict also keeps the keys ordered since Python 3.6, the most common reason to use OrderedDict is writing code that is backward compatible with earlier Python versions.
10%
Flag icon
The regular dict was designed to be very good at mapping operations. Tracking insertion order was secondary.
10%
Flag icon
OrderedDict was designed to be good at reordering operations. Space efficiency, iteration speed, and the performance of update operations were secondary.
10%
Flag icon
A ChainMap instance holds a list of mappings that can be searched as one. The lookup is performed on each input mapping in the order it appears in the constructor call, and succeeds as soon as the key is found in one of those mappings.
10%
Flag icon
Updates or insertions to a ChainMap only affect the first input mapping.
10%
Flag icon
It’s better to create a new mapping type by extending collections.UserDict rather than dict.
11%
Flag icon
The dict instance methods .keys(), .values(), and .items() return instances of classes called dict_keys, dict_values, and dict_items, respectively. These dictionary views are read-only projections of the internal data structures used in the dict implementation.
11%
Flag icon
A view object is a dynamic proxy. If the source dict is updated, you can immediately see the changes through an existing view.
11%
Flag icon
The dict_values class is the simplest dictionary view—it implements only the __len__, __iter__, and __reversed__ special methods.
11%
Flag icon
Keys must be hashable objects. They must implement proper __hash__ and __eq__ methods
11%
Flag icon
Item access by key is very fast. A dict may have millions of keys, but Python can locate a key directly by computing the hash code of the key and deriving an index offset into the hash table, with the possible overhead of a small number of tries to find a matching entry.
11%
Flag icon
Key ordering is preserved as a side effect of a more compact memory layout for dict in CPython 3.6, which became an official language feature in 3.7.
11%
Flag icon
Despite its new compact layout, dicts inevitably have a significant memory overhead. The most compact internal data structure for a container would be an array of pointers to the items.8 Compared to that, a hash table needs to store more data per entry, and Python needs to ...
This highlight has been truncated due to consecutive passage length restrictions.
11%
Flag icon
To save memory, avoid creating instance attributes outside of ...
This highlight has been truncated due to consecutive passage length restrictions.
11%
Flag icon
A set is a collection of unique objects.
11%
Flag icon
Set elements must be hashable. The set type is not hashable, so you can’t build a set with nested set instances. But frozenset is hashable, so you can have frozenset elements inside a set.
11%
Flag icon
Don’t forget that to create an empty set, you should use the constructor without an argument: set(). If you write {}, you’re creating an empty dict—this hasn’t changed in Python 3.
11%
Flag icon
Set elements must be hashable objects. They must implement proper __hash__ and __eq__ methods
11%
Flag icon
Membership testing is very efficient. A set may have millions of elements, but an element can be located directly by computing its hash code and deriving an index offset, with the possible overhead of a small number of tries to find a matching element or exhaust the search.
11%
Flag icon
Sets have a significant memory overhead, compared to a low-level array pointers to its elements—which would be more compact but also much slower to search beyond a handful of elements.
12%
Flag icon
The concept of “string” is simple enough: a string is a sequence of characters.
12%
Flag icon
The identity of a character—its code point—is a number from 0 to 1,114,111 (base 10), shown in the Unicode standard as 4 to 6 hex digits with a “U+” prefix, from U+0000 to U+10FFFF.
12%
Flag icon
The actual bytes that represent a character depend on the encoding in use. An encoding is an algorithm that converts code points to byte sequences and vice versa.
12%
Flag icon
Converting from code points to bytes is encoding; converting from bytes to code points is decoding.
13%
Flag icon
Creating a bytes or bytearray object from any buffer-like source will always copy the bytes. In contrast, memoryview objects let you share memory between binary data structures,
13%
Flag icon
When converting text to bytes, if a character is not defined in the target encoding, UnicodeEncodeError will be raised, unless special handling is provided by passing an errors argument to the encoding method or function.
13%
Flag icon
ASCII is a common subset to all the encodings that I know about, therefore encoding should always work if the text is made exclusively of ASCII characters.
13%
Flag icon
Not every byte holds a valid ASCII character, and not every byte sequence is valid UTF-8 or UTF-16; therefore, when you assume one of these encodings while converting a binary sequence to text, you will get a UnicodeDecodeError if unexpected bytes are found.
13%
Flag icon
UTF-8 is the default source encoding for Python 3, just as ASCII was the default for Python 2.
13%
Flag icon
How do you find the encoding of a byte sequence? Short answer: you can’t.
13%
Flag icon
The way UTF-8 was designed, it’s almost impossible for a random sequence of bytes, or even a nonrandom sequence of bytes coming from a non-UTF-8 encoding, to be decoded accidentally as garbage in UTF-8, instead of raising UnicodeDecodeError.
13%
Flag icon
To avoid confusion, the UTF-16 encoding prepends the text to be encoded with the special invisible character ZERO WIDTH NO-BREAK SPACE (U+FEFF). On a little-endian system, that is encoded as b'\xff\xfe' (decimal 255, 254).
13%
Flag icon
One big advantage of UTF-8 is that it produces the same byte sequence regardless of machine endianness, so no BOM is needed.
13%
Flag icon
This UTF-8 encoding with BOM is called UTF-8-SIG in Python’s codec registry. The character U+FEFF encoded in UTF-8-SIG is the three-byte sequence b'\xef\xbb\xbf'.
13%
Flag icon
Python scripts can be made executable in Unix systems if they start with the comment: #!/usr/bin/env python3.
13%
Flag icon
The best practice for handling text I/O is the “Unicode sandwich” (Figure 4-2).5 This means that bytes should be decoded to str as early as possible on input (e.g., when opening a file for reading). The “filling” of the sandwich is the business logic of your program, where text handling is done exclusively on str objects. You should never be encoding or decoding in the middle of other processing. On output, the str are encoded to bytes as late as possible.
13%
Flag icon
Code that has to run on multiple machines or on multiple occasions should never depend on encoding defaults. Always pass an explicit encoding= argument when opening text files, because the default may change from one machine to the next, or from one day to the next.