How to configure Squid as a reverse Proxy Server (server accelerator)


1. What is Squid?

Squid is used by hundreds of Internet Providers world-wide to provide their users with the best possible web access. Squid optimizes the data flow between client and server to improve performance and caches frequently-used content to save bandwidth. Squid can also route content requests to servers in a wide variety of ways to build cache server hierarchies which optimize network throughput.

Squid is a proxy server and web cache daemon. It has a wide variety of uses, from speeding up a web server by caching repeated requests; to caching web, DNS and other computer network lookups for a group of people sharing network resources; to aiding security by filtering traffic. Although primarily used for HTTP and FTP, Squid includes limited support for several other protocols including TLS, SSL, Internet Gopher and HTTPS. Squid was originally designed to run on Unix-like systems. Released under the GNU General Public License, Squid is free software. As an example, Squid is used by the Wikimedia Foundation on Wikipedia.

It is good to read the FAQ section of the Squid documentation when some unexpected problem happens (or even better, before it happens!). Links to the FAQ are listed in section 10.

2. What is the Reverse Proxy (httpd-accelerator) mode?

Squid working in the Reverse Proxy (httpd-accelerator) mode caches incoming requests for outgoing data (i.e., that which you publish to the world). It takes load away from your HTTP server and internal network. Squid pulls the HTTP data from the "real" HTTP server (which means that only the accelerator needs to know where the real server is). The outside world sees no difference (apart from an increase in speed).

The Squid redirector can make one accelerator act as a single front-end for multiple servers. If you need to move parts of your filesystem from one server to another, or if separately administered HTTP servers should logically appear under a single URL hierarchy, the accelerator makes the right thing happen. Measurements of the Squid cache and its Harvest counterpart suggest an order of magnitude performance improvement over CERN or other widely available caching software. This order of magnitude performance improvement on hits suggests that the cache can serve as an httpd accelerator: a cache configured to act as a site's primary httpd server (on port 3128), forwarding references that miss to the site's real httpd (on port 80). The cache serves references to cacheable objects, such as HTML pages and GIFs, and the true httpd (on port 80) serves references to non-cacheable objects, such as queries and cgi-bin programs. If a site's usage characteristics tend toward cacheable objects, this configuration can dramatically reduce the site's web workload.

3. Installation

We will consider two different architectures.

In the first case we have a Proxy Server with two NICs, where we will install Squid listening on port 3128 (the default port). We assume the first NIC is connected to the router supplied by your ISP. For instance, you may have a router called "Livebox" if your ISP is Orange. Let us assume that the Proxy Server has the static IP 192.168.1.10 and the DNS name cosmos.linuxmaniac.net (from now on you will have to adapt this tutorial to your own parameters). In the /etc/hosts file of the Proxy Server, define 192.168.1.10 as cosmos.linuxmaniac.net. The second NIC is directly connected to a second computer working as a Web Server (a server such as Apache, listening on port 80, is assumed to be installed and working on this computer).

You have to change the configuration of your Router. In the router you have to forward the port 80 to the port 3128 of the computer 192.168.1.10 with DNS hostname cosmos.linuxmaniac.net. If you have a Livebox Router:

The DNS name is assigned in Equipment -> Display, and clicking on the Proxy server, in Device Settings, you can write the correct "DNS hostname".
The static IP is assigned in Livebox -> Advanced Configuration -> DHCP, in the section "Static IP address".
To forward the ports, go to Livebox -> Advanced Configuration -> NAT/PAT, and choose HTTP as Application/Service, External Port: 80, Internal Port:3128, Protocol: TCP, Device: cosmos.linuxmaniac.net.

The Web Server will be assumed to have an internal DNS hostname (configured in /etc/hosts in the Web Server) "hp hp.home" with static IP: 192.168.11.12 (adapt that data to your own setup).

In the internal network, it is convenient in the case of a small LAN, to assign the DNS hostname "cosmos.linuxmaniac.net" to the IP 192.168.1.10 to all computers having access to the Proxy Server, by means of the /etc/hosts file, as explained in section 15.
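As a minimal sketch of that step, the following shell function appends a mapping to a hosts file only if the name is not there yet. The function name add_host_entry is made up for this example, and the check is a simple whole-word search, not a full parser of the hosts format.

```shell
# Hypothetical helper: add "IP<TAB>name" to a hosts file unless the name
# already appears in it as a whole word.
# Arguments: hosts file, IP address, DNS hostname.
add_host_entry() {
    hosts_file=$1
    ip=$2
    name=$3
    # -F: treat the name as a fixed string, -w: whole word, -q: quiet
    if ! grep -qwF -- "$name" "$hosts_file"; then
        printf '%s\t%s\n' "$ip" "$name" >> "$hosts_file"
    fi
}
```

Run as root on each computer of the LAN, it could be called as: add_host_entry /etc/hosts 192.168.1.10 cosmos.linuxmaniac.net. Calling it a second time leaves the file unchanged.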

In the case where you have a single computer running both Squid and Apache, things are not essentially different. In this case you have a "virtual second computer" at the IP 127.0.0.1 (the loopback address, which every computer uses to refer to itself). You can check this by typing that IP in your favorite browser on that computer: Apache will serve your website. This means that 192.168.1.10 will be the IP of the Proxy Server and 127.0.0.1 will be the IP of your Web Server. Keep this in mind from now on.

Of course we install Squid3 in the proxy server using Debian's repositories (version 3.2) in Synaptic.

4. Configuration

4.1 Setting Squid in reverse proxy mode

The configuration file is /etc/squid3/squid.conf. Open it with your favorite editor so we can make the necessary changes. The first thing we do is tell Squid to operate in reverse proxy mode and set up a default host name, which will be useful when connecting to the reverse proxy by IP address or an alias.

Around line 1078, under the section NETWORK OPTIONS, add the following:

# Squid normally listens to port 3128
http_port 3128 accel defaultsite=cosmos.linuxmaniac.net vhost

where cosmos.linuxmaniac.net is the DNS hostname of the proxy server where we are installing Squid and 3128 is the port Squid listens on by default.

4.2 Defining the Web Server

Next we tell Squid where to find the Web Server and how to connect to it. In Squid, this is referred to as a "peer cache". Only our Web Server contains all the information we need, therefore Squid only needs to connect to it as a parent "peer" to get that "cached" data.

Around line 1826 under the section OPTIONS WHICH AFFECT THE NEIGHBOR SELECTION ALGORITHM go ahead and add the following:

# Choosing Web Server 192.168.11.12 listening at port 80 and name hp
cache_peer 192.168.11.12 parent 80 0 no-query originserver name=hp

We assume the server has IP 192.168.11.12 and 80 is the port where the Web Server is listening; the name is arbitrary and it is used for configuration purposes only.

If you use one single computer, write this instead:

cache_peer 127.0.0.1 parent 80 0 no-query originserver name=hp

4.3 Permission Assignment

Now we create some ACLs (Access Control Lists) giving users permission to access our web server through a specific host name.

Around line 646, under the ACL section, add in:

acl hp_users dstdomain cosmos.linuxmaniac.net

Replace "cosmos.linuxmaniac.net" with the DNS hostname you will use to access your Reverse Proxy Server over the Internet. The ACL name "hp_users" can be whatever you want. We will use this name to apply the ACL to the permissions that we will create.

If your site has several DNS hostnames, you can add them on the same line, separated by spaces:

acl hp_users dstdomain cosmos.linuxmaniac.net cosmos.linuxlover.net universe.linuxlover.net

Now let's set up general HTTP access permissions for all users on the proxy. Around line 784, under the ACL section, add in:

## INSERT YOUR OWN RULE(S) HERE TO ALLOW ACCESS FROM YOUR CLIENTS #
http_access allow hp_users

Make sure you replace "hp_users" with whatever ACL name you used.

Finally, allow the Proxy users access to the Web Server and deny everything else. Around line 1869 or under the "cache_peer_access" section, add in:

cache_peer_access hp allow hp_users
cache_peer_access hp deny all

Once again replace "hp" with the peer cache name you used and replace "hp_users" with the ACL name you used. Save the configuration file and restart Squid using the command line:

sudo /etc/init.d/squid3 restart

Now, cross your fingers.
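Rather than relying on luck alone, a quick check from the command line is possible. The sketch below assumes that curl is installed and that Squid adds its usual X-Cache header ("HIT from ..." or "MISS from ...") to the responses; the function names probe_proxy and cache_status are made up for this example.

```shell
# Hypothetical helper: request the front page through the proxy and print
# only the response headers.
# Arguments: proxy address as host:port, Host header to send.
probe_proxy() {
    curl -s -o /dev/null -D - -H "Host: $2" "http://$1/"
}

# Hypothetical helper: read response headers on stdin and print the first
# word of the X-Cache header (HIT or MISS), or nothing if it is absent.
cache_status() {
    tr -d '\r' | awk 'tolower($1) == "x-cache:" { print $2; exit }'
}
```

For instance, probe_proxy 192.168.1.10:3128 cosmos.linuxmaniac.net | cache_status can be run twice in a row; if the page is cacheable, the second run would be expected to report a HIT. It may also help to run sudo squid3 -k parse before restarting, which asks Squid to check the syntax of the configuration file.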

5. Memory Cache Options

Go to the section MEMORY CACHE OPTIONS, about line 1950. The parameter cache_mem does NOT specify the maximum size of the process. It only specifies how much memory to use for caching "hot" (very popular) replies. Squid uses memory for other things as well, and its actual memory usage depends very strongly on your incoming request load. The default size of cache_mem is 256 MB. You may want to increase this amount depending on your system. In my case:

cache_mem 500 MB

The parameter maximum_object_size_in_memory is quite self-explanatory: Squid will not attempt to keep objects larger than this size in the memory cache. It should be set high enough to keep frequently accessed objects in memory, which improves performance, whilst low enough to keep larger objects from hoarding cache_mem. The default size is 512 KB. You may change this parameter depending on your system. In my case:

maximum_object_size_in_memory 50000 KB

It is good to monitor the memory used by Squid and to adjust the parameters depending on the system behaviour.
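To get a feeling for how these two parameters interact, a little arithmetic helps: the sketch below computes how many objects of the maximum allowed size would fill cache_mem completely. The function name max_objects_in_mem is made up for this example, and it only illustrates the proportion between the two values; it says nothing about how Squid actually lays out its memory.

```shell
# Hypothetical helper: how many objects of the maximum in-memory size
# fit into cache_mem (integer division)?
# Arguments: cache_mem in MB, maximum_object_size_in_memory in KB.
max_objects_in_mem() {
    echo $(( $1 * 1024 / $2 ))
}
```

With the values used above, max_objects_in_mem 500 50000 prints 10, so about ten very large objects could occupy the whole memory cache. With the defaults, max_objects_in_mem 256 512 prints 512.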

6. Deny access to certain files

Let us assume that our web server contains .php files, but we wish to access them only directly from the internal network (you may forward port 80 on the Proxy Server to port 80 of the Web Server if you are using two computers). This means we wish to deny external access to those files through Squid. To do so, define an ACL as we did before, around line 646 under the ACL section:

# ACL name to deny access to .php files
acl php_ext urlpath_regex -i \.php$

where php_ext is an arbitrary name for the access list. In the section http_access, about line 769 we add:

# Deny access
http_access deny php_ext

to deny access to those files. It is possible to configure more sophisticated or specific ACLs. Check the comments on the ACL section in the configuration file to find out.
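Before relying on the regular expression, it can be tried out on a few sample paths with grep, whose extended regular expressions behave the same way for a simple pattern like this one. The function name matches_php_acl is made up for this example. As far as I know, Squid matches urlpath_regex against the path including any query string, so the last example below is worth keeping in mind; treat that as an assumption to verify on your own setup.

```shell
# Hypothetical helper: succeed if the given URL path matches the pattern
# used in the php_ext ACL (\.php$, case-insensitive because of -i).
matches_php_acl() {
    printf '%s\n' "$1" | grep -Eiq '\.php$'
}

# Print "deny" or "pass" for each sample path.
for path in /index.php /ADMIN/LOGIN.PHP /style.css '/index.php?id=1'; do
    if matches_php_acl "$path"; then
        echo "deny $path"
    else
        echo "pass $path"
    fi
done
```

The first two paths match and the last two do not. If requests with a query string should be denied as well, a pattern such as \.php(\?.*)?$ could be tried instead.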

7. Logging in Apache (In the web server, of course)

If Squid is successfully configured and running, you may check Apache's logs and you will see that the IPs of your clients are lost: all requests seem to come from the Proxy Server. To correct this, go to the end of the file /etc/apache2/apache2.conf, where logging is configured, follow the instructions there about replacing %h with %{X-Forwarded-For}i, and save the file. This allows correct logging of clients in the access log, but not in the error log.
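The replacement itself can be sketched with sed. The function below reads configuration lines on stdin and rewrites the first %h of every LogFormat line, leaving all other lines alone; the function name rewrite_logformat is made up for this example, and it is meant for previewing the change, not for editing the file in place.

```shell
# Hypothetical helper: in LogFormat lines only, replace the first %h
# (remote host) with the X-Forwarded-For request header.
rewrite_logformat() {
    sed '/^[[:space:]]*LogFormat/ s/%h/%{X-Forwarded-For}i/'
}
```

For example, the line LogFormat "%h %l %u %t \"%r\" %>s %O" common becomes LogFormat "%{X-Forwarded-For}i %l %u %t \"%r\" %>s %O" common when piped through rewrite_logformat.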

To get the right error logs, install the libapache2-mod-rpaf package. Then load the module rpaf:

sudo a2enmod rpaf

Edit rpaf.conf and change the proxy server IP (127.0.0.1 by default) to the IP of the Proxy Server. Do this only if the Proxy Server and the Web Server are on different computers; otherwise leave the IP unchanged. Restart Apache:

sudo /etc/init.d/apache2 restart

and check the changes in the logs.

8. Human readable logs for Squid

Default log format for Squid is defined at the end of the section LOGFILE OPTIONS, about line 2184:

# This defines the default log format "squid"
logformat squid %ts.%03tu %6tr %>a %Ss/%03>Hs %<st %rm %ru %un %Sh/%<A %mt

The meaning of every part is explained in the comments of the configuration file. I like having the access time in a human readable form and a bit more information about the Referer and User-Agent, so I use the following customized logformat:

# This defines the log format "squid"
logformat squid %{%d %H:%M:%S}tl %6tr %>a %Ss/%03>Hs %<st %rm %ru "%{Referer}>h" "%{User-Agent}>h" %mt

In the following section of the configuration file, the location of the log file is defined by means of the variable access_log:

# Default:
access_log /var/log/squid3/access.log squid

If you wish to change the log file, this is the place to do it.
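With the customized format, the date and the time are the first two whitespace-separated fields, so the Squid result code and HTTP status (%Ss/%03>Hs) end up in the fifth field. As a small sketch of what can be done with such a log, the function below counts the requests per result code; the function name count_results is made up for this example.

```shell
# Hypothetical helper: read access log lines in the customized format on
# stdin and print how many requests ended with each Squid result code
# (the part of field 5 before the slash, e.g. TCP_HIT or TCP_MISS).
count_results() {
    awk '{ split($5, parts, "/"); count[parts[1]]++ }
         END { for (code in count) print code, count[code] }' | sort
}
```

It could be used as count_results < /var/log/squid3/access.log to get a rough idea of how many requests are served from the cache.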

9. Sending Squid logs to a remote computer

We may want to receive the logs of the Proxy Server on another computer of our internal network to monitor the activity. This requires two main steps: first we must tell the Proxy Server to send the logs, then we must tell the remote computer to listen to the Proxy Server and write the logs to a certain file. Are you ready?

9.1 Configuration of the Proxy Server

First of all, we need Squid to send the logs to rsyslogd. rsyslogd is a basic program running in your system that takes care of the system logs; the r in rsyslogd means remote. rsyslogd is the program that will send the logs to the remote computer. We open the file /etc/squid3/squid.conf and go to the section where access_log is defined. By default you should have:

#Default:
access_log /var/log/squid3/access.log squid

This means that Squid is writing the logs directly to the file /var/log/squid3/access.log with the log format squid, which is defined in the previous section of the configuration file. Now, we add the following line in order to get Squid to send the logs to rsyslogd with the log format squid:

access_log syslog:local2.info squid

You need to restart Squid so the changes take effect:

sudo /etc/init.d/squid3 restart

Next, we edit the configuration file of rsyslogd: /etc/rsyslog.conf. In the first section MODULES the first two lines must be active:

#################
#### MODULES ####
#################

$ModLoad imuxsock # provides support for local system logging
$ModLoad imklog # provides kernel logging support (previously done by rklogd)

In the section RULES, add the following lines, before any other rules:

###############
#### RULES ####
###############

#
# Logging for my beloved Squid3
#

local2.* /var/log/squid3.log
local2.* @@192.168.1.12:514
local2.* @@192.168.1.22:514
& ~

As you can see, we are adding four lines. The first line tells rsyslogd to write the logs into the file /var/log/squid3.log. We do this to check that rsyslogd is receiving the logs from Squid. Later on we can delete this line. It is good to create the file /var/log/squid3.log in advance using touch:

sudo touch /var/log/squid3.log

in case rsyslogd is not able to create the file itself. The second and third lines mean that rsyslogd sends the logs to the remote computers with IP 192.168.1.12 and IP 192.168.1.22 using the standard port 514. @@ means we use the TCP protocol to send the logs (we could use UDP instead with one single @). Of course, change the IPs to those of your own remote computers. The last line makes sure that Squid logs are not written to other system log files. Finally we need to restart rsyslogd:

sudo service rsyslog restart
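The local rule can be checked without waiting for real traffic by sending a test message to the local2 facility with logger and then looking for it in the file. This is only a sketch: the function names send_marker and marker_logged and the tag squid-test are made up for this example.

```shell
# Hypothetical helper: send a test message to the local2 facility, the
# one selected by the rules above.
send_marker() {
    logger -p local2.info -t squid-test "$1"
}

# Hypothetical helper: succeed if the given text appears in the log file.
# Arguments: text to look for, log file.
marker_logged() {
    grep -qF -- "$1" "$2"
}
```

For instance, after send_marker "rsyslog check 42" and a short pause, marker_logged "rsyslog check 42" /var/log/squid3.log should succeed on the Proxy Server. Since the test message does not carry Squid's program name, it only exercises the local rule and may not pass a filter on the remote computer that selects messages by program name.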

The Proxy Server is done. Now we must configure the remote computer.

9.2 Configuration of the Remote Computer

We edit the configuration file of rsyslogd: /etc/rsyslog.conf. The first section, MODULES, must look like the following:

#################
#### MODULES ####
#################

$ModLoad imuxsock # provides support for local system logging
$ModLoad imklog # provides kernel logging support
$ModLoad immark # provides --MARK-- message capability

# provides UDP syslog reception
#$ModLoad imudp
#$UDPServerRun 514

# provides TCP syslog reception
$ModLoad imtcp
$InputTCPServerRun 514

Notice that we expect the incoming logs on port 514 using the TCP protocol. In the section RULES, add the following lines, before any other rules:

###############
#### RULES ####
###############

if $app-name == '(squid)' then /var/log/squid3.log
& ~

#if $fromhost-ip startswith '192.168.1.' then /var/log/squid3.log
#& ~

The first line is quite self-explanatory: it sends Squid logs to the file /var/log/squid3.log (which again should be created using touch). The second line makes sure that Squid logs do not appear in other system logs. The commented lines provide an alternative way to get Squid logs into a file. You can play with it if you want to.

Finally we need to restart rsyslogd:

sudo service rsyslog restart

Now check the logs and make sure the remote computer receives Squid activity reports.

10. Further reading

In the following links you will find more information:

http://wiki.squid-cache.org/SquidFaq/ReverseProxy
http://wiki.squid-cache.org/ConfigExamples/Reverse/BasicAccelerator
http://wiki.squid-cache.org/SquidFaq/SquidMemory
http://wiki.squid-cache.org/ConfigExamples/Intercept/DebianWithRedirectorAndReporting
http://www.comfsm.fm/computing/squid/FAQ-6.html
http://blog.spench.net/2010/02/24/tips-for-setting-up-squid-in-reverse-proxy-web-accelerator-accel-mode/
http://www.kaizou.org/2009/02/sample-apache-cache-configuration/
http://www.techstacks.com/howto/log-client-ip-and-xforwardedfor-ip-in-apache.html
http://www.mediawiki.org/wiki/Squid_Reverse_Proxy


Raconet Linux