Pro PHP Application Performance will help you understand all the technologies and components that play a role in how well your applications run. When seconds can mean the difference between retaining and losing a user, it is important for all of us to make optimization part of our project roadmap. But which components of your application should you analyze? How should you optimize? And how can you measure how well your application is performing? These are some of the questions this book answers. Along the way you will also learn the "why" of optimizing: why you should optimize a specific component, why selecting one function over another is beneficial, and how to find and use the optimization tools available to the open source community. You will also learn how to deploy caching software as well as web server software. Pro PHP Application Performance also teaches more advanced techniques, such as:
• Using Xdebug to profile functions that are not running as efficiently as possible
• Comparing the opcode executed by different PHP functions to narrow the search for functions that run efficiently
• Using strace to analyze Apache while your application is served to the user
Once you complete this book you will have a complete sense of where to start optimizing and, most importantly, the tools that allow you to keep optimizing other PHP applications going forward.
Pro PHP Application Performance
Chapter 1. Benchmarking Techniques
- The tools we'll use are Apache Benchmark (ab) and Siege.
- Benchmarking the PHP application stack gives us information about:
  • Total time a single request took to respond
  • Total response size from the server
  • Total number of requests a web server can handle per second
- Defining the Request/Response Lifecycle
Apache Benchmark
ab -n 1 http://www.example.com/   ( evaluates one request to the URL example.com )
Understanding the results:
Server Information :
Server Software:        Apache/2.2.3
Server Hostname:        www.ubershare.com
Server Port:            80
Document Information :
Document Path:          /
Document Length:        14363 bytes
Benchmark results :
Concurrency Level:      1
Time taken for tests:   1.970 seconds
Complete requests:      1
Failed requests:        0
Write errors:           0
Total transferred:      14911 bytes
HTML transferred:       14363 bytes
Requests per second:    0.51 [#/sec] (mean)
Time per request:       1970.338 [ms] (mean)
Time per request:       1970.338 [ms] (mean, across all concurrent requests)
Transfer rate:          7.39 [Kbytes/sec] received
The HTML transferred, Requests per second, and Time per request fields are the key ones for us. Throughout this book our goal is to lower HTML transferred, increase Requests per second, and lower Time per request.
Simulating concurrent users:
ab -c 10 -n 100 http://www.bzi.ro/
Requests per second: 0.86 [#/sec] (mean)
Connection Times (ms)
              min   mean [+/-sd]  median    max
Connect:      123    487   783.8     260   3457
Processing:  3461  10827  6145.4    9548  33340
Waiting:      640   1598  3112.2     985  19654
Total:       3653  11313  6208.1   10120  34533
Server Software:        Apache/2.2.3
Server Hostname:        www.ubershare.com
Server Port:            80
Document Path:          /
Document Length:        14388 bytes
Concurrency Level:      10
Time taken for tests:   9.451 seconds
Complete requests:      100
Failed requests:        0
Write errors:           0
Total transferred:      1493600 bytes
HTML transferred:       1438800 bytes
Requests per second:    10.58 [#/sec] (mean)
Time per request:       945.089 [ms] (mean)
Time per request:       94.509 [ms] (mean, across all concurrent requests)
Transfer rate:          154.33 [Kbytes/sec] received
Connection Times (ms)
              min   mean [+/-sd]  median    max
Connect:      165    196    43.1     178    386
Processing:   551    713   209.2     646   1530
Waiting:      205    317   197.6     253   1166
Total:        722    909   213.1     832   1695
Percentage of the requests served within a certain time (ms)
  50%    832
  66%    873
  75%    932
  80%    948
  90%   1331
  95%   1447
  98%   1653
  99%   1695
 100%   1695 (longest request)
Timed tests: ab -c 10 -t 20 http://www.example.com/ simulates ten simultaneous user visits to the site over a 20-second interval.
To simulate a request from a Chrome browser, you can pass the browser's user-agent string as a custom header with ab's -H flag:
ab -n 100 -c 5 -H "User-Agent: Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US) AppleWebKit/534.2 (KHTML, like Gecko) Chrome/6.0.447.0 Safari/534.2" http://www.example.com/
Siege
The second benchmarking tool we'll use is Siege. Like ab, Siege allows you to simulate user traffic to your web-hosted document, but unlike ab, Siege lets you run load simulations against a list of URLs you specify in a text file. It also allows each simulated user to sleep between requests, giving the feel of a user reading a document before moving on to another one on your web application (a sketch of both features follows the command below).
siege -c 5 -t10S http://www.bzi.ro/   ( simulates a load test with five concurrent users for ten seconds )
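A minimal sketch of the URL-file and delay features; the file name urls.txt and the URLs in it are placeholders, not from the book:
urls.txt - one URL per line:
http://www.example.com/
http://www.example.com/about.php
http://www.example.com/products.php
siege -f urls.txt -c 5 -d 3 -t 1M   ( -f points at the URL file, -d adds a random delay of up to 3 seconds between requests, -t 1M runs for one minute )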
- In other words, the smaller the data size requested by the user, the faster the response.
Chapter 2 - Improving Client Download and Rendering Performance
- Use Firebug
- Use YSlow
- Use Google's Page Speed
- JavaScript placement: by moving all JavaScript toward the bottom of your HTML file, you allow the HTML to render much sooner.
Minification Tools
YUI Compressor - http://developer.yahoo.com/yui/compre...
Usage: java -jar yuicompressor-2.4.2.jar -o js-mini.js js-to-minify.js
Closure Compiler - this tool takes a JavaScript file as input and "compiles from JavaScript to better JavaScript," as the official web site, http://code.google.com/closure/compiler, puts it. The Closure Compiler can be used in three ways:
• Using a Java jar file that can be downloaded to your local hard drive
• Via the online tool on the official web site
• Using its RESTful APIs
Usage: java -jar compiler.jar --js jquery.js --js_output_file jquery-min-cc.js
Chapter 3. PHP Code Optimization
PHP best practices:
- require vs. require_once => require is faster.
- Calculate loop length in advance instead of on every iteration.
- Accessing array elements with foreach vs. for vs. while: choose foreach.
- File access: file_get_contents is the fastest way to read small files; it uses memory-mapping techniques on systems that support them (a short sketch of two of these practices follows below).
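A minimal PHP sketch of two of the practices above (the file name data.txt is a placeholder):
<?php
// Calculate the loop length once instead of calling count() on every iteration.
$items = array(1, 2, 3, 4, 5);
$total = count($items);
for ($i = 0; $i < $total; $i++) {
    echo $items[$i], "\n";
}

// file_get_contents() reads a whole (small) file in a single call.
$contents = file_get_contents('data.txt'); // data.txt is a placeholder file name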
Chapter 4. Opcode Caching
Caching tools:
• Alternative PHP Cache (APC)
• XCache
• eAccelerator
The PHP Life Cycle:
⇒ Opcode caching removes three steps from every request: the lexical (lexicon) scan, the parse, and the create-opcode step; the cached opcode is executed directly.
Alternative PHP Cache
Alternative PHP Cache (APC) is a PECL extension that is available for both Unix and Windows servers. APC installs directly into the Zend Engine to provide a caching system that redirects a request to the cached opcode if it is present and has not expired. APC uses shared memory and a mapping table to fetch the opcode for a specific PHP script.
XCache
XCache is another opcode caching tool for PHP. Like APC, XCache uses shared memory to store opcode and serves requests for a PHP script from this cache. Like APC, XCache is available for both Unix-based systems and Windows. As of this writing, XCache 1.2.X is the most stable release, and XCache 1.2.2 is the version I will install and use to test opcode caching.
eAccelerator
The final opcode caching tool we will look at is eAccelerator (eA), which works much like APC and XCache. eA was created by Dmitry Stogov and was originally part of the Turck MMCache project. Like APC and XCache, eA stores cached content in shared memory, but it also offers a separate option to store cached data on disk.
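The opcode cache itself works transparently once the extension is enabled, but APC also exposes a user-data cache through apc_store()/apc_fetch(). A minimal sketch, assuming the apc extension is loaded; the key name, TTL, and build_site_config() function are placeholders:
<?php
// Try the user cache first; fall back to the expensive call and store the result.
$config = apc_fetch('site_config');          // 'site_config' is an arbitrary key
if ($config === false) {
    $config = build_site_config();           // assumed (hypothetical) expensive function
    apc_store('site_config', $config, 300);  // keep the result for 300 seconds
}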
Chapter 6. Choosing the Right Web Server
Choice criteria:
- Security and Stability => Apache
- Availability of Engineers with Detailed Knowledge => Apache
- Your Site Is Predominantly Static Content => lighttpd or Nginx
- You Are Hosting in a Managed Service => Apache
- You Are Using Unusual PHP Extensions => Apache
Web server types:
- Prefork: Process-based web server; for each incoming request, a forked process is used to satisfy the request.
- Threaded: Thread-based web server; for each incoming request, a thread is used to satisfy the request.
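To see which MPM your Apache build uses, you can ask the binary itself (the binary name varies by distribution; on Debian/Ubuntu it is typically apache2ctl rather than httpd):
httpd -V | grep -i mpm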
Chapter 7. Web Server and Delivery Optimization
- ApacheTop, a real-time access log file analyzer, is the main tool we will use to inspect the performance of our web server.
apachetop -f /var/log/apache2/access.log => displays the accumulated request rate since it was started for all 2xx, 3xx, 4xx, and 5xx status codes.
Using Round-Robin DNS The simplest way of distributing traffic between multiple web servers is to use "round-robin DNS." This involves setting multiple "A" records for the hostname associated with your cluster of machines, one for each web server. The DNS service will deliver this list of addresses in a random order to each client, allowing a simple distribution of requests among all of the members of the farm.
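For illustration, round-robin DNS in a BIND-style zone file is just multiple A records for the same name; the addresses below are placeholders from the documentation range, not from the book:
www    IN    A    192.0.2.10
www    IN    A    192.0.2.11
www    IN    A    192.0.2.12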
Using a Load Balancer A load balancer is a device that distributes requests among a set of servers operating as a cluster or farm. Its role is to make the farm of servers appear to be a single server from the viewpoint of the user’s browser.
- There are two types of load balancer: software and hardware.
Totally software-based solutions: Solutions such as the Linux Virtual Server project (www.linuxvirtualserver.org) allow you to operate a load balancing service directly on each server of your web farm.
Software solutions using a separate load balancing server: You can run a software load balancer on a separate machine that “fronts” your web farm. Products such as HAProxy, Squid, and Apache running with mod_proxy allow you to build your own load balancing appliances.
Physical load balancing appliances: An alternative to rolling your own load balancing appliance is to use a commercial device such as those supplied by F5, Coyote Point, Cisco, and Juniper. These devices often provide many other facilities, such as caching, SSL termination, and I/O optimization.
Load balancing services: Many cloud-based solutions provide load balancing services that allow you to map a single IP address to multiple web servers. The Amazon Elastic Load Balancer is a service that is provided to support balancing of requests between instances hosted in the Amazon EC2 cloud.
Using Direct Server Return As your traffic grows and you add more and more servers to your web server farm, another performance issue can surface that limits the rate at which you can deliver pages. A technique called direct server return (DSR) bypasses the load balancing system for web server responses and writes the response directly from the web server to the user's browser. This sleight of hand is done at the networking level, so your application is not aware of the difference. It means, however, that the load balancer deals only with the requests, which for the most part tend to be small compared to the responses.
Sharing Sessions Between Members of a Farm
When you distribute your application, you have to ensure that all web servers can access the same session data for each user. There are three main ways this can be achieved.
1. Memcache: Use a shared Memcache instance to store session data. When you install the Memcache extension using PECL, it will ask whether you wish to install session support. If you do, it will allow you to set your session.save_handler to "memcache" and it will maintain shared state.
2. Files in a shared directory: You can use the file-based session state store (session.save_handler="files") so long as you make sure that session.save_path is set to a directory that is shared between all of the machines. NFS is typically used to share a folder in these circumstances.
3. Database: You can create a user session handler to serialize data to and from your back-end database server, using the session ID as a unique key (a sketch follows below).
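A minimal sketch of option 3, a user session handler registered with session_set_save_handler(); the storage functions are stubs, and the table and column names in the comments are assumptions, not from the book:
<?php
// Register the callbacks before session_start(); each stub would talk to the database.
function sess_open($savePath, $sessionName) { return true; }
function sess_close()                        { return true; }
function sess_read($id) {
    // e.g. SELECT data FROM sessions WHERE id = :id   (assumed schema)
    return '';
}
function sess_write($id, $data) {
    // e.g. REPLACE INTO sessions (id, data, updated) VALUES (:id, :data, NOW())   (assumed schema)
    return true;
}
function sess_destroy($id)     { return true; }
function sess_gc($maxLifetime) { return true; }

session_set_save_handler('sess_open', 'sess_close', 'sess_read',
                         'sess_write', 'sess_destroy', 'sess_gc');
session_start();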
CDN
A content distribution network (CDN) is a hierarchically distributed network of caching proxy servers with a geographical load balancing capability built in. Some typical CDN systems include the following:
• Akamai: One of the best-known and most extensive CDN solutions; not really suitable for small to medium sites because of its costs.
• CDNetworks: Like Akamai, this content distribution network is designed for large-scale deployments.
• Limelight: Another well-known CDN system, Limelight also provides remote storage of assets as well as distribution.
• Amazon CloudFront: A simple CDN integrated with Amazon EC2/S3, notable for its contract-free, pay-as-you-go model; not quite as extensive as the previously mentioned solutions.
Monitoring Systems
Ganglia: A real-time monitoring system especially suitable for monitoring arrays or farms of servers, as well as providing performance statistics about individual servers.
Cacti: Another well-recommended real-time monitoring tool, notable for its very large number of available "probes" for monitoring every part of your application stack.
Nagios: The grandfather of open source monitoring systems, extremely good at system availability monitoring.
Chapter 8. Database Optimization
MyISAM: The Original Engine
- MyISAM is the original storage engine, developed alongside MySQL itself.
- For sites that are perhaps 95-100 percent read-based, it is without a doubt the best solution.
Pros :
- Fast unique key lookup times
- Supports full-text indexing
- SELECT count(*) is fast.
- Takes up less space on disk
Cons :
- Table-level locking; if your application spends more than 5 percent of its time writing to a table, then table locks are going to slow it down.
- Non-transactional; no start => commit/abort capability.
- Has durability issues; a table crash can require lengthy repair operations to bring it back online.
InnoDB: The Pro’s Choice
InnoDB is an ACID-compliant (atomicity, consistency, isolation, durability) storage engine, which includes versioning and log journaling, and has commit, rollback, and crash-recovery features to prevent data corruption. InnoDB also implements row-level locking and consistent non-locking reads, which can significantly increase multi-user concurrency and performance.
Pros :
- Transactional; queries can be abandoned and rolled back. Crashes don't result in damaged data.
- Has row-level locking; concurrent writes to different rows of the same table don't end up being serialized.
- Supports versioning for full ACID capability.
- Supports several strategies for online backup.
- Improves concurrency in high-load, high-connection applications.
Cons :
- SELECT count(*) queries are considerably slower.
- No full-text indexing.
- Auto-increment fields must be the first field in the table; can cause issues with migration.
- Takes up more disk space.
- Can be slower than MyISAM for some simpler query forms, but excels at complex multi-table queries.
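The storage engine is chosen per table, so an existing table can be switched with a single statement; the table name orders is a placeholder, and note that the statement rewrites the table, which can take a while on large data sets:
ALTER TABLE orders ENGINE=InnoDB;
New tables can specify the engine directly at creation time with an ENGINE=InnoDB (or ENGINE=MyISAM) clause on CREATE TABLE.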
Tuning Your Database Server’s Memory
a) top: The "top" display for this server tells us how much load is on the server, whether it is swapping, and how much of the total memory the mysqld process is using:
top - 09:38:13 up 235 days, 2:39, 1 user, load average: 0.62, 0.56, 0.44
Tasks: 69 total, 1 running, 65 sleeping, 0 stopped, 3 zombie
Cpu(s): 4.3%us, 0.0%sy, 0.0%ni, 86.5%id, 9.0%wa, 0.0%hi, 0.2%si, 0.0%st
Mem: 17927580k total, 17851776k used, 75804k free, 185020k buffers
Swap: 0k total, 0k used, 0k free, 4989364k cached
  PID USER   PR  NI  VIRT   RES   SHR  S %CPU %MEM   TIME+   COMMAND
16570 mysql  15   0  12.9g  11g   6032 S    9 66.4  3567:29  mysqld
    1 root   15   0  10304  800    672 S    0  0.0   0:07.93 init
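If top shows mysqld as effectively the only consumer of memory on a dedicated database host, most of the RAM can be handed to the storage-engine buffers in my.cnf. The values below are illustrative assumptions, not recommendations from the book:
[mysqld]
innodb_buffer_pool_size = 12G    # InnoDB data/index cache
key_buffer_size         = 256M   # MyISAM index cache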
b) iostat: We can check I/O performance using the iostat tool, which should be installable from your distribution's software repositories and available on all distributions. We will use it with the -d, -c, and -x options, which enable device, CPU, and extended stats:
iostat -d -c -x 2
Figure 8-5 shows the output that is produced when running iostat against our example server.
c) mysqltuner.pl
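mysqltuner.pl is a Perl script that connects to a running MySQL server, inspects its status and configuration variables, and prints tuning suggestions. A typical run, assuming the script has been downloaded to the current directory (it may prompt for an administrative MySQL login if it cannot connect automatically):
perl mysqltuner.pl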