Fluent Python: Clear, Concise, and Effective Programming
14%
Do not open text files in binary mode unless you need to analyze the file contents to determine the encoding. Even then, you should be using Chardet instead of reinventing the wheel.
14%
Ordinary code should only use binary mode to open binary files, like raster images.
14%
locale.getpreferredencoding() is the most important setting.
14%
Text files use locale.getpreferredencoding() by default.
14%
If you omit the encoding argument when opening a file, the default is given by locale.getpreferredencoding().
14%
The encoding for standard I/O is UTF-8 for interactive I/O, or defined by locale.getpreferredencoding() if the output/input is redirected to/from a file.
14%
sys.getdefaultencoding() is used internally by Python in implicit conversions of binary data to/from str. Changing this setting is not supported.
14%
sys.getfilesystemencoding() is used to encode/decode filenames (not file contents). It is used when open() gets a str argument for the filename; if the filename is given as a byte...
14%
Therefore, the best advice about encoding defaults is: do not rely on them.
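As a minimal sketch of what "not relying on defaults" can look like, the snippet below passes encoding explicitly on both the write and the read. The temporary file name is only an illustration.

```python
import tempfile
from pathlib import Path

text = 'caf\u00e9'  # 'café' with a precomposed é

with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / 'cafe.txt'  # hypothetical file name
    # Pass the encoding explicitly instead of trusting the platform default.
    with open(path, 'w', encoding='utf-8') as fp:
        fp.write(text)
    with open(path, encoding='utf-8') as fp:
        round_trip = fp.read()
    # Reading the raw bytes shows that 'é' occupies two bytes in UTF-8.
    raw = path.read_bytes()

print(round_trip == text)  # True
print(raw)                 # b'caf\xc3\xa9'
```

With the encoding spelled out, the same code should behave identically regardless of what locale.getpreferredencoding() reports on a given machine.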
14%
String comparisons are complicated by the fact that Unicode has combining characters: diacritics and other marks that attach to the preceding character, appearing as one when printed.
14%
Normalization Form C (NFC) composes the code points to produce the shortest equivalent string, while NFD decomposes, expanding composed characters into base characters and separate combining characters.
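A short illustration of composing and decomposing with the standard unicodedata module, using two spellings of the same word:

```python
from unicodedata import normalize

s1 = 'caf\u00e9'   # 'é' as one precomposed code point
s2 = 'cafe\u0301'  # 'e' followed by COMBINING ACUTE ACCENT

print(s1 == s2)                   # False: different code point sequences
print(len(normalize('NFC', s2)))  # 4: composed into the shortest form
print(len(normalize('NFD', s1)))  # 5: decomposed into base + combining mark
print(normalize('NFC', s1) == normalize('NFC', s2))  # True
```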
14%
The other two normalization forms are NFKC and NFKD, where the letter K stands for “compatibility.” These are stronger forms of normalization, affecting the so-called “compatibility characters.”
14%
In the NFKC and NFKD forms, each compatibility character is replaced by a “compatibility decomposition” of one or more characters that are considered a “preferred” representation, even if there is some formatting loss—ideally, the formatting should be the responsibility of external markup, not part of Unicode.
14%
NFKC and NFKD normalization cause data loss and should be applied only in special cases like search and indexing, and not for permanent storage of text.
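The data loss is easy to see in a small sketch. The three sample characters below are chosen only for illustration:

```python
from unicodedata import normalize

half = '\u00bd'           # VULGAR FRACTION ONE HALF
four_squared = '4\u00b2'  # digit four followed by SUPERSCRIPT TWO
micro = '\u00b5'          # MICRO SIGN

# The fraction becomes three characters: '1', FRACTION SLASH, '2'.
print(normalize('NFKC', half) == '1\u20442')   # True
# The superscript is flattened, so "four squared" now reads as forty-two.
print(normalize('NFKC', four_squared))         # 42
# The micro sign becomes GREEK SMALL LETTER MU, a different code point.
print(normalize('NFKC', micro) == '\u03bc')    # True
```

The second case shows why these forms are a poor fit for storage: the meaning of the text changes, not just its appearance.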
14%
Case folding is essentially converting all text to lowercase, with some additional transformations. It is supported by the str.casefold() method.
14%
NFC is the best normalized form for most applications. str.casefold() is the way to go for case-insensitive comparisons.
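One way to combine the two recommendations is a pair of small helpers. The names nfc_equal and fold_equal are illustrative choices for this sketch, not standard library functions:

```python
from unicodedata import normalize


def nfc_equal(a: str, b: str) -> bool:
    """Case-sensitive comparison after NFC normalization."""
    return normalize('NFC', a) == normalize('NFC', b)


def fold_equal(a: str, b: str) -> bool:
    """Case-insensitive comparison: NFC normalization, then case folding."""
    return normalize('NFC', a).casefold() == normalize('NFC', b).casefold()


print(nfc_equal('caf\u00e9', 'cafe\u0301'))   # True
print(nfc_equal('A', 'a'))                    # False
print(fold_equal('Stra\u00dfe', 'strasse'))   # True: 'ß' folds to 'ss'
```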
15%
The Unicode standard provides an entire database—in the form of several structured text files—that includes not only the table mapping code points to character names, but also metadata about the individual characters and how they are related. For example, the Unicode database records whether a character is printable, is a letter, is a decimal digit, or is some other numeric symbol. That’s how the str methods isalpha, isprintable, isdecimal, and isnumeric work. str.casefold also uses information from a Unicode table.
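A brief sketch of querying that metadata through unicodedata and the str methods; the sample characters are arbitrary:

```python
import unicodedata

samples = ['A', '7', '\u00bd', '\u0be7']  # letter, ASCII digit, one half, Tamil digit one

rows = [
    (unicodedata.name(ch), ch.isalpha(), ch.isdecimal(), ch.isnumeric())
    for ch in samples
]
for row in rows:
    print(row)

# The one-half fraction is numeric but not a decimal digit.
print(unicodedata.numeric('\u00bd'))  # 0.5
```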
16%
If you build a regular expression with bytes, patterns such as \d and \w only match ASCII characters; in contrast, if these patterns are given as str, they match Unicode digits or letters beyond ASCII.
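The difference can be checked with the same pattern in both flavors. The sample text mixes Tamil digits with ASCII digits and is only an illustration:

```python
import re

text = '\u0be7\u0bed\u0be8\u0bef and 1729'  # Tamil digits for 1729, then ASCII

str_digits = re.findall(r'\d+', text)
str_words = re.findall(r'\w+', text)

data = text.encode('utf-8')
bytes_digits = re.findall(rb'\d+', data)
bytes_words = re.findall(rb'\w+', data)

print(str_digits)    # both the Tamil and the ASCII number
print(bytes_digits)  # [b'1729']
print(bytes_words)   # [b'and', b'1729']
```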
16%
As the world adopts Unicode, we need to keep the concept of text strings separated from the binary sequences that represent them in files, and Python 3 enforces this separation.
16%
Unicode provides multiple ways of representing some characters, so normalizing is a prerequisite for text matching.
16%
In memory, Python 3 stores each str as a sequence of code points using a fixed number of bytes per code point, to allow efficient direct access to any character or slice.
16%
Since Python 3.3, when creating a new str object, the interpreter checks the characters in it and chooses the most economical memory layout that is suitable for that particular str: if there are only characters in the latin1 range, that str will use just one byte per code point. Otherwise, two or four bytes per code point may be used, depending on the str.
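A rough way to observe this on CPython is to compare sys.getsizeof for strings of equal length. This is a sketch that assumes CPython; the exact byte counts are an implementation detail and vary by version, so only the ordering is checked:

```python
import sys

n = 1000
narrow = 'a' * n             # fits in one byte per code point
medium = '\u20ac' * n        # EURO SIGN needs two bytes per code point
wide = '\U0001f40d' * n      # SNAKE emoji needs four bytes per code point

sizes = [sys.getsizeof(s) for s in (narrow, medium, wide)]
print(sizes)

# Same number of code points, increasing memory use.
print(len(narrow) == len(medium) == len(wide))  # True
```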
16%
Data classes are like children. They are okay as a starting point, but to participate as a grownup object, they need to take some responsibility.
16%
Python offers a few ways to build a simple class that is just a collection of fields, with little or no extra functionality. That pattern is known as a “data class”—and dataclasses is one of the packages that supports this pattern.
17%
By default, @dataclass produces mutable classes. But the decorator accepts a keyword argument frozen—shown in Example 5-3. When frozen=True, the class will raise an exception if you try to assign a value to a field after the instance is initialized.
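A minimal sketch of the frozen behavior, using a hypothetical Coordinate class:

```python
from dataclasses import dataclass, FrozenInstanceError


@dataclass(frozen=True)
class Coordinate:  # illustrative class for this sketch
    lat: float
    lon: float


moscow = Coordinate(55.76, 37.62)
error = None
try:
    moscow.lat = 0.0  # assignment after initialization is rejected
except FrozenInstanceError as exc:
    error = exc

print(type(error).__name__)  # FrozenInstanceError
print(moscow.lat)            # 55.76, unchanged
```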
17%
Type hints—a.k.a. type annotations—are ways to declare the expected type of function arguments, return values, variables, and attributes.
17%
Think about Python type hints as “documentation that can be verified by IDEs and type checkers.” That’s because type hints have no impact on the runtime behavior of Python programs.
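To see that hints are not enforced at runtime, consider this small sketch with a hypothetical double function. A type checker would flag the call, but the interpreter runs it without complaint:

```python
def double(x: int) -> int:
    return x * 2


# The hint says int, yet passing a str works at runtime.
result = double('ab')
print(result)                  # abab
# The hints are merely stored as metadata on the function.
print(double.__annotations__)
```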
18%
This is normal Python behavior: regular instances can have their own attributes that don’t appear in the class.
18%
Class attributes are often used as default attribute values for instances, including in data classes.
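A small sketch of a class attribute acting as a default, with a hypothetical Bus class:

```python
class Bus:  # illustrative class for this sketch
    capacity = 40  # class attribute, shared default


b1 = Bus()
b2 = Bus()
b1.capacity = 20  # creates an instance attribute that shadows the class one

print('capacity' in vars(b1))  # True: b1 has its own attribute now
print('capacity' in vars(b2))  # False: b2 still reads the class attribute
print(b2.capacity, Bus.capacity)  # 40 40
```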
18%
The default_factory parameter lets you provide a function, class, or any other callable, which will be invoked with zero arguments to build a default value each time an instance of the data class is created.
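For instance, a list field can get a fresh list per instance like this (the ClubMember class here is a minimal sketch):

```python
from dataclasses import dataclass, field


@dataclass
class ClubMember:
    name: str
    # list is called with zero arguments for every new instance.
    guests: list = field(default_factory=list)


ann = ClubMember('Ann')
bob = ClubMember('Bob')
ann.guests.append('Carl')

print(ann.guests)  # ['Carl']
print(bob.guests)  # []: each instance has its own list
```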
18%
It’s good that @dataclass rejects class definitions with a list default value in a field. However, be aware that it is a partial solution that only applies to list, dict, and set. Other mutable values used as defaults will not be flagged by @dataclass. It’s up to you to understand the problem and remember to use a default factory to set mutable default values.
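Both halves of that warning can be sketched briefly. The Basket and Order classes are hypothetical, chosen only to show a mutable default that slips past the check:

```python
from dataclasses import dataclass

message = ''
try:
    @dataclass
    class Broken:
        guests: list = []  # rejected when the class is built
except ValueError as exc:
    message = str(exc)

print(message)


class Basket:  # hypothetical mutable helper, not a list, dict, or set
    def __init__(self):
        self.items = []


@dataclass
class Order:
    basket: Basket = Basket()  # accepted, but shared by all instances


o1 = Order()
o2 = Order()
o1.basket.items.append('apple')
print(o2.basket.items)  # ['apple']: the default object is shared
```

Replacing the default with field(default_factory=Basket) would give each Order its own basket.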
18%
Prior to Python 3.9, the built-in collections did not support generic type notation. As a temporary workaround, there are corresponding collection types in the typing module. If you need a parameterized list type hint in Python 3.8 or earlier, you must import the List type from typing and use it: List[str].
18%
Creating a hierarchy of data classes is usually a bad idea.
19%
The @dataclass decorator doesn’t care about the types in the annotations, except in two cases, and this is one of them: if the type is ClassVar, an instance field will not be generated for that attribute.
19%
The other case where the type of the field is relevant to @dataclass is when declaring init-only variables.
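The two special cases, ClassVar and InitVar, can be seen side by side in a sketch. The Account class and its field names are hypothetical:

```python
from dataclasses import InitVar, dataclass, field, fields
from typing import ClassVar


@dataclass
class Account:  # illustrative class for this sketch
    owner: str
    password: InitVar[str]          # init-only: passed to __init__, not stored
    min_length: ClassVar[int] = 8   # class attribute: no instance field
    password_ok: bool = field(init=False)

    def __post_init__(self, password: str) -> None:
        # The init-only value arrives here as an argument.
        self.password_ok = len(password) >= self.min_length


acct = Account('ann', 's3cret-phrase')
field_names = [f.name for f in fields(acct)]

print(field_names)                # ['owner', 'password_ok']
print(acct.password_ok)           # True
print(hasattr(acct, 'password'))  # False
```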
19%
These are classes that have fields, getting and setting methods for fields, and nothing else. Such classes are dumb data holders and are often being manipulated in far too much detail by other classes.
19%
A code smell is a surface indication that usually corresponds to a deeper problem in the system.
19%
The main idea of object-oriented programming is to place behavior and data together in the same code unit: a class. If a class is widely used but has no significant behavior of its own, it’s possible that code dealing with its instances is scattered (and even duplicated) in methods and functions throughout the system—a recipe for maintenance headaches.
20%
What makes City or any class work with positional patterns is the presence of a special class attribute named __match_args__, which the class builders in this chapter automatically create.
20%
You can combine keyword and positional arguments in a pattern. Some, but not all, of the instance attributes available for matching may be listed in __match_args__. Therefore, sometimes you may need to use keyword arguments in addition to positional arguments in a pattern.
20%
A basic principle of object-oriented programming: data and the functions that touch it should be together in the same class. Classes with no logic may be a sign of misplaced logic.
20%
PEP 526 syntax for annotating instance and class attributes reverses the established convention of class statements: previously, everything declared at the top level of a class block was a class attribute (methods are class attributes, too). With PEP 526 and @dataclass, any attribute declared at the top level with a type hint becomes an instance attribute.
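The reversal can be observed in a small sketch, where the only difference between the three attributes is the presence of a type hint and a default:

```python
from dataclasses import dataclass


@dataclass
class DemoDataClass:
    a: int          # instance field; no class attribute is created
    b: float = 1.1  # instance field; the default also lives on the class
    c = 'spam'      # no type hint: a plain class attribute, not a field


dc = DemoDataClass(9)

print(hasattr(DemoDataClass, 'a'))  # False
print(DemoDataClass.b)              # 1.1
print(vars(dc))                     # {'a': 9, 'b': 1.1}
print(dc.c)                         # spam, read from the class
```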
20%
Python variables are like reference variables in Java; a better metaphor is to think of variables as labels with names attached to objects.
20%
If you imagine variables are like boxes, you can’t make sense of assignment in Python; instead, think of variables as sticky notes.
20%
With reference variables, it makes much more sense to say that the variable is assigned to an object, and not the other way around. After all, the object is created before the assignment.
20%
Variables are bound to objects only after the objects are created.
20%
To understand an assignment in Python, read the righthand side first: that’s where the object is created or retrieved. After that, the variable on the left is bound to the object, like a label stuck to it. Just forget about the boxes.
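Two short experiments illustrate the label metaphor. The Gizmo class and the created list are hypothetical helpers for this sketch:

```python
# Two labels on one object: a change through one is visible through the other.
a = [1, 2, 3]
b = a
b.append(4)
print(a)       # [1, 2, 3, 4]
print(a is b)  # True

# The right-hand side runs first: the object exists before any binding.
created = []


class Gizmo:
    def __init__(self):
        created.append(id(self))


try:
    y = Gizmo() * 10  # Gizmo() succeeds, the multiplication raises TypeError
except TypeError:
    y_bound = False
else:
    y_bound = True

print(len(created))  # 1: a Gizmo was created
print(y_bound)       # False: the name y was never bound to anything
```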
20%
An object’s identity never changes once it has been created; you may think of it as the object’s address in memory. The is operator compares the identity of two objects; the id() function returns an integer representing an object’s identity.
20%
The == operator compares the values of objects (the data they hold), while is compares their identities.
20%
If you are comparing a variable to a singleton, then it makes sense to use is.
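A compact sketch of the difference between equality and identity, ending with the singleton case. The variable names and the describe helper are illustrative:

```python
charles = {'name': 'Charles L. Dodgson', 'born': 1832}
lewis = charles  # a second label for the same object
alex = {'name': 'Charles L. Dodgson', 'born': 1832}  # equal, but a new object

print(lewis is charles)            # True: same identity
print(id(lewis) == id(charles))    # True
print(alex == charles)             # True: same value
print(alex is charles)             # False: distinct objects


def describe(value=None):
    # None is a singleton, so an identity test is the idiomatic check.
    if value is None:
        return 'nothing given'
    return f'got {value!r}'


print(describe())   # nothing given
print(describe(0))  # got 0
```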