System Performance Tuning: Help for Unix Administrators
2%
two books by John Hennessy and David Patterson; they are titled Computer Organization and Design: The Hardware/Software Interface and Computer Architecture: A Quantitative Approach (both published by Morgan Kaufmann).
Bhaskar Chowdhury
Noted.
11%
NCA uses a kernel module to transparently cache static web content in a kernel memory buffer, and replies to HTTP document requests for documents in its cache without ever waking up the application web server.
Bhaskar Chowdhury
Sun's NCA
15%
processor performance doubles roughly every eighteen months, but memory performance doubles roughly every seven years.
Bhaskar Chowdhury
Kinda a bummer, but true.
16%
Caches are organized into equal-sized chunks called lines.
Bhaskar Chowdhury
hmm
17%
Linux addresses this issue by adopting an empirical rule related to the processor’s cache size: the larger the processor’s cache, the longer a process will wait for a chance to run on that processor.
Bhaskar Chowdhury
Yup.
18%
Every LWP has a kernel thread, but every kernel thread need not have an LWP:
Bhaskar Chowdhury
Unidirectional.
18%
Threads generally fall into five categories for scheduling: timesharing (ts), interactive (ia), kernel, real-time (rt), and interrupt.
Bhaskar Chowdhury
yup
20%
Buses implement either circuit-switched or packet-switched protocols.
Bhaskar Chowdhury
Righto!
22%
Spin locking is accomplished entirely within the processor’s cache, so that it does not cause excess bus traffic.
22%
On a system with many processors that is under load, the mutex locking itself can actually become a bottleneck.[16] If this is the case, adding additional processors will hinder, rather than help, performance.
Bhaskar Chowdhury
A classic case, which busts the myth.
22%
poor application design and implementation is a possible, indeed a likely, root cause of poor performance on multiprocessor systems.
Bhaskar Chowdhury
Caveat! Adding more processors doesn't help.
22%
larger processor caches often cause a great performance increase on multiprocessor systems.
Bhaskar Chowdhury
That's because of the lack of bus travel.
24%
PCI is a synchronous bus architecture, which means that all data transfers are performed relative to a system clock. The initial PCI specification permitted a maximum clock rate of 33 MHz, but the later Revision 2.1 specification extended this to 66 MHz.
25%
The high speed of the PCI bus (up to 528 MB/second, at 64-bit data paths and a 66 MHz clock rate) limits the number of expansion slots on a single bus to no more than three or four slots due to electrical concerns.
Bhaskar Chowdhury
PCI limits.
25%
Each PCI device includes a set of registers that contain configuration data. These registers define what the type of the card is (SCSI, Ethernet, a framebuffer, etc.), as well as who manufactured the card, what the interrupt level of the card is, and so on.
Bhaskar Chowdhury
PCI Config store.
25%
PCI supports both 5-volt and 3.3-volt signaling levels.
Bhaskar Chowdhury
PCI Voltage range.
25%
In general, interrupt priorities are assigned in decreasing order of IRQ; that is, the system timer (IRQ 0) has priority over all other IRQs.
Bhaskar Chowdhury
System IRQ
26%
If idle time is zero, as reported by vmstat, the first thing you should check is whether your system has I/O throughput problems.
32%
The difference between them is how each memory cell is designed. Dynamic cells are charge-based, where each bit is represented by a charge stored in a tiny capacitor. The charge leaks away in a short period of time, so the memory must be continually refreshed to prevent data loss. The act of reading a bit also serves to drain the capacitor, so it’s not possible to read that bit again until it has been refreshed. Static cells, however, are based on gates, and each bit is stored in four or six connected transistors. SRAM memories retain data as long as they have power; refreshing is not necessary.
32%
DRAM is cheaper and offers the highest densities of cells per chip; it is smaller, less power-intensive, and runs cooler. However, SRAM is as much as an order of magnitude faster, and therefore is used in high-performance environments.
32%
The first represents the amount of time required to read or write a given location in memory, and is called the memory access time.
Bhaskar Chowdhury
MAT
32%
second, the memory cycle time, describes how frequently you can repeat a memory reference.
Bhaskar Chowdhury
MCT
32%
(SDRAM) memory, which uses a clock to synchronize the input and output of signals. This clock is coordinated with the CPU clock, so the timings of all the components are synchronized. SDRAM also implements two memory banks on each module, which essentially doubles the memory throughput; it also allows multiple memory requests to be pending at once. A variation on SDRAM, called double-data rate SDRAM (DDR SDRAM) is able to read data on both the rising and falling edges of the clock, which doubles the data rate of the memory chip.
32%
The virtual memory system is responsible for managing the associations between the used portions of this virtual address space and physical memory.
34%
If a process tries to write to a shared page, it incurs a copy-on-write fault.[5
Bhaskar Chowdhury
COW
34%
kswapd’s behavior is controlled by three parameters, called tries_base, tries_min, and swap_cluster,
Bhaskar Chowdhury
kswapd
35%
A system that is paging is writing selected, infrequently used pages of memory to disk,
Bhaskar Chowdhury
Paging
35%
while a system that is swapping is writing entire processes from memory to disk.
Bhaskar Chowdhury
Swapping
35%
Paging is not necessarily indicative of a problem; it is the action of the page scanner to try and increase the size of the free list by moving inactive pages to disk.
Bhaskar Chowdhury
how paging works
36%
Memory is consumed by four things: the kernel, filesystem caches, processes, and intimately shared memory.
Bhaskar Chowdhury
Memory consumptions
69%
In 10 Mb/s Ethernet, the actual signals placed on the wire use a technique known as Manchester encoding, which allows the clock signal and the data to be transmitted in one logical parcel.
69%
This parcel, formally called a bit-symbol, includes the logical inverse of the encoded bit followed by the actual value of the encoded bit, so that there is always a signal transition in the middle of the bit-symbol. For example, the bit “0” would be encoded in Manchester as the bit-symbol “01.” This seems silly, since it appears to double the amount of work required to send a bit of data, but just like differential signaling, it is useful in long-distance communications. Its biggest disadvantage is that it generates signal changes on the wire twice as fast as the data rate, which makes the …
70%
The original standard used thick coaxial cable, and was known as 10BASE5. The 10 encodes the network data rate in Mb/s, the “BASE” refers to the use of a signaling method known as baseband, and the 5 describes the maximum segment length in 100 meter increments.
77%
The default values for tcp_conn_req_max_q0 and tcp_conn_req_max_q are 1,024 and 128, respectively.
77%
A certain type of denial-of-service attack, called SYN flooding, involves sending a large number of SYN packets with nonexistent source addresses. Because the second SYN is never acknowledged, the listen queue fills up and new connections get through only as old ones time out and are discarded from the queue. Whenever a dubious connection is discarded, the tcpHalfOpenDrop counter is incremented; a high value indicates that a SYN flood was likely attempted. If you observe this behavior, you can improve your protection by increasing tcp_conn_req_max_q0.
79%
Because NFS is stateless, the server and client need a mechanism to determine the other’s state in order to know when to reacquire a lock (e.g., when the server is rebooted) and when to invalidate a lock (e.g., when the client unmounts the filesystem); this is the role played by statd.
80%
A Version 2 mount will default to UDP, and a Version 3 mount will default to TCP.
87%
gethrtime(), a function call that returns the current time in nanoseconds, and directly accessing the TICK register.