How to control caching on your web server
to avoid lawsuits?
While I was working for one of the world’s top product companies, a Product Manager approached me one day. He had a legal notice served by one of our clients. The notice said that our company had to bear 25,000 Australian dollars in bandwidth costs incurred by the client. Interestingly, the notice included a technical explanation for the excessive bandwidth consumption. The client used 100+ licenses of my employer’s products, and each installed product has an updater client that downloads updates. The client used a Squid-like cache server to optimise bandwidth. Unfortunately, the update server had the caching flag switched off for a 1 GB update, so the cache server downloaded the file again every time it received a proxy request from an updater client on a different machine. The matter was eventually sorted out by enabling the flags and offering extensions to the product licenses. But it was a great lesson, and caching checks were added to the product managers’ and testing team’s checklists for future update releases.
Cache control helps in many ways, such as:
➊. Avoiding round trips, so web pages load faster
➋. Saving bandwidth, which saves money
➌. Avoiding unnecessary server load
When hosting your web site, you can use different approaches to control the browser cache:
1. Using meta tags
2. Setting HTTP headers via server-side scripts (e.g. PHP)
3. Through web server configuration (e.g. httpd.conf or .htaccess files in Apache)
4. Using CDNs (Content Delivery Networks)
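To make approach 2 concrete, here is a minimal sketch of a server-side script that sets the caching headers itself. It is written in Python as a bare WSGI application rather than PHP; the name `app`, the one-day lifetime and the response body are assumptions chosen for illustration, not values from any real product.

```python
import time
from email.utils import formatdate

ONE_DAY = 24 * 60 * 60  # seconds; hypothetical lifetime chosen for illustration


def app(environ, start_response):
    """Tiny WSGI app that sends explicit caching headers with its response."""
    body = b"<h1>Hello</h1>"
    headers = [
        ("Content-Type", "text/html; charset=utf-8"),
        ("Content-Length", str(len(body))),
        # Cache-Control is what HTTP/1.1 caches read; Expires is the older
        # absolute-date form kept for legacy clients.
        ("Cache-Control", f"public, max-age={ONE_DAY}"),
        ("Expires", formatdate(time.time() + ONE_DAY, usegmt=True)),
    ]
    start_response("200 OK", headers)
    return [body]
```

Any WSGI server can host this function; the same two headers are what a PHP script would emit with its own header calls.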
When hosting a web site, consider the following cache configuration rules to control web traffic:
1. Images such as the company logo and favicon aren’t going to change frequently. The same may apply to style sheets and JavaScript files that you think won’t change for weeks or months. Keep these files in a separate folder and control their cache headers through .htaccess files.
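# Place inside your .htaccess file (requires mod_expires)
Options FollowSymLinks MultiViews
# Apache 2.2 syntax; Apache 2.4 uses "Require all granted" instead
Allow from all
<IfModule mod_expires.c>
# Without ExpiresActive On, the ExpiresByType rules below are ignored
ExpiresActive On
ExpiresByType text/html "access plus 1 day"
ExpiresByType text/css "access plus 1 day"
ExpiresByType image/gif "access plus 1 month"
ExpiresByType image/jpeg "access plus 1 month"
ExpiresByType image/png "access plus 1 month"
ExpiresByType application/x-shockwave-flash "access plus 1 day"
</IfModule>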
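If you want to check what lifetimes these rules should produce, the sketch below recomputes them in Python. The table `RULES` and the functions `max_age_for` and `cache_control_for` are hypothetical helpers, and treating a month as 30 days is an assumption of this sketch; compare the result with the headers your own server actually sends.

```python
# Seconds per unit. Assumption: a "month" is treated as 30 days.
UNIT_SECONDS = {"day": 86400, "week": 7 * 86400, "month": 30 * 86400}

# The per-type lifetimes from the .htaccess rules above.
# image/jpeg is the registered MIME type for JPEG files.
RULES = {
    "text/html": (1, "day"),
    "text/css": (1, "day"),
    "image/gif": (1, "month"),
    "image/jpeg": (1, "month"),
    "image/png": (1, "month"),
    "application/x-shockwave-flash": (1, "day"),
}


def max_age_for(content_type):
    """Return the lifetime in seconds for a Content-Type, or None if no rule."""
    base = content_type.split(";")[0].strip().lower()  # drop "; charset=..."
    if base not in RULES:
        return None
    amount, unit = RULES[base]
    return amount * UNIT_SECONDS[unit]


def cache_control_for(content_type):
    """Return the expected Cache-Control value, or None if no rule applies."""
    age = max_age_for(content_type)
    return None if age is None else f"max-age={age}"
```

For example, `cache_control_for("image/png")` gives `max-age=2592000`, which is 30 days expressed in seconds.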
2. Special handling for the index page (e.g. index.html or index.php). The main page of the web site changes frequently, so you must consider special rules for it, probably by switching off caching for index pages.
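One way to express such a special rule in a server-side script is sketched below. It interprets "switching off" as sending `Cache-Control: no-cache`, which lets a cache keep a copy but forces it to revalidate with the server before every reuse. The function name `cache_control_for_path`, the set `INDEX_NAMES` and the 30-day lifetime for everything else are assumptions for illustration.

```python
import posixpath

# Index page names taken from the examples in the text.
INDEX_NAMES = {"index.html", "index.php"}

# Hypothetical lifetime for non-index files: 30 days, in seconds.
STATIC_MAX_AGE = 30 * 24 * 60 * 60


def cache_control_for_path(path):
    """Pick a Cache-Control value based on the request path."""
    name = posixpath.basename(path)
    # A path ending in "/" is a directory request, normally served by the index page.
    if path.endswith("/") or name in INDEX_NAMES:
        return "no-cache"  # always revalidate the frequently changing main page
    return f"max-age={STATIC_MAX_AGE}"
```

If the index page must never be stored at all, `no-store` is the stricter directive to use instead.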
3. Set ETags. ETag stands for entity tag, a unique identifier for the version of the resource being served by the web server. Typically it is a hash of the resource or is derived from its time stamp. When the browser revalidates a cached resource, it sends the cached ETag back to the server in an If-None-Match header. If the ETag still matches, the server answers 304 Not Modified and the cached copy is reused; if it differs, the browser downloads the new resource from the web server. ETags are typically generated by the web server or set by web developers in server-side code.
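The ETag exchange can be sketched in a few lines of Python. This is a simplified illustration, not how any particular web server implements it: `make_etag` and `respond` are hypothetical names, and using a truncated SHA-256 hash of the body as the tag is just one possible choice.

```python
import hashlib


def make_etag(body):
    """Build a quoted ETag from a hash of the response body (one possible scheme)."""
    return '"' + hashlib.sha256(body).hexdigest()[:16] + '"'


def respond(body, if_none_match=None):
    """Return (status, headers, payload) for a GET, honouring If-None-Match."""
    etag = make_etag(body)
    if if_none_match is not None:
        candidates = []
        for tag in if_none_match.split(","):
            tag = tag.strip()
            if tag.startswith("W/"):  # weak validators compare equal here
                tag = tag[2:]
            candidates.append(tag)
        if "*" in candidates or etag in candidates:
            # The client's copy is current: send headers only, no body.
            return 304, {"ETag": etag}, b""
    return 200, {"ETag": etag}, body
```

A first request gets status 200 plus the tag. Repeating the request with that tag in `if_none_match` gets 304 and an empty payload, so only a few hundred bytes of headers cross the wire instead of the whole file.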
4. Use consistent URLs. Images and CSS files are often referred to by many pages of the web site, and in different places different URLs end up pointing to the same file. Because a cache treats each distinct URL as a separate resource, make sure you refer to each file correctly and consistently. Also avoid using different versions of the same JavaScript library. This is the most common mistake I have seen among web developers: they integrate different jQuery plugins, each referring to a different version of jQuery.
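A quick way to spot inconsistent references is to resolve every URL a page uses and group the results by file path. The sketch below does that with the standard library; `find_inconsistent_references` is a hypothetical helper, and grouping by path alone (ignoring host and query string) is a heuristic that assumes all the hosts involved serve the same files.

```python
from collections import defaultdict
from urllib.parse import urljoin, urlsplit


def find_inconsistent_references(page_url, references):
    """Map each file path to the distinct absolute URLs used to reach it.

    Only paths reached through more than one URL are returned, since each
    distinct URL is cached (and downloaded) separately.
    """
    by_path = defaultdict(set)
    for ref in references:
        absolute = urljoin(page_url, ref)  # resolve relative references
        by_path[urlsplit(absolute).path].add(absolute)
    return {path: sorted(urls) for path, urls in by_path.items() if len(urls) > 1}


refs = [
    "/img/logo.png",
    "http://www.example.com/img/logo.png",
    "../img/logo.png?v=2",
    "/css/site.css",
]
report = find_inconsistent_references("http://example.com/blog/post.html", refs)
```

Here `report` flags `/img/logo.png`, which is reached through three different URLs, while `/css/site.css` is referenced consistently and is left out.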
5. Use Content Delivery Networks (CDNs). CDNs act as a buffer between your web server and the browser client. Since a CDN’s intermediate servers are hosted at various geographic locations, your content is served from a server near your client’s geo-IP location. If you deliver static content such as software installers or data files to users worldwide, a CDN will optimise content delivery and provide a better user experience.
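CDN edge servers and proxies such as Squid are shared caches, and they decide what to store from the same Cache-Control header that browsers read. The sketch below builds that header value; `build_cache_control` and its parameters are hypothetical names, and the lifetimes in the example are assumptions. The `public` directive allows shared caches to store the response, `private` restricts it to the browser's own cache, and `s-maxage` sets a separate lifetime for shared caches only.

```python
def build_cache_control(max_age, shared_max_age=None, allow_shared=True):
    """Build a Cache-Control value for browsers and shared caches (CDN, proxy)."""
    if not allow_shared:
        # Only the end user's browser may keep a copy.
        return f"private, max-age={max_age}"
    parts = ["public", f"max-age={max_age}"]
    if shared_max_age is not None:
        # Lifetime that applies to shared caches instead of max-age.
        parts.append(f"s-maxage={shared_max_age}")
    return ", ".join(parts)


# A large installer: browsers keep it for a day, shared caches for a week.
installer_header = build_cache_control(86400, shared_max_age=604800)
```

A large update file served with a header like `installer_header` can be fetched once by the shared cache and then handed to every machine behind it.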
Controlling cache flags on the server side can improve the performance of your web site and reduce bandwidth usage (and lower your bandwidth invoices).