The reverse proxy server
With the proliferation of cloud computing and single-board computers, the term "reverse proxy server" comes up frequently in the technical specifications that we encounter as developers or system implementation consultants.
Since I often find myself explaining the reverse proxy server, I decided to document what I know about it so that there is one place people can refer to when they are lost on the topic.
So, what is a reverse proxy server?
Getting to know the proxy server
To understand the reverse proxy server, it helps to first understand what a proxy server is. A proxy server is a process that sits between a client and a web server. It receives HTTP requests from the client and forwards them to the web server that the client wants to get HTTP responses from. When the web server returns the HTTP responses, the proxy server receives them and sends them back to the client.
In an organization setting, the clients and the proxy server usually belong in the same network. A proxy server is deployed to communicate with web servers outside the organization network on behalf of the clients in the organization.
Getting to know the reverse proxy server
If we reverse the path of communication, we will get a reverse proxy server. A reverse proxy server sits in front of other web servers (also known as upstream servers) to listen for HTTP requests coming from clients outside of the organization network.
A reverse proxy server inspects every HTTP request that it receives to decide which upstream server should process the request and generate the corresponding HTTP response. Once the reverse proxy server has decided, it forwards the HTTP request to that upstream server and waits for the upstream server to return an HTTP response. When it receives the HTTP response from the upstream server, it sends the HTTP response back to the client.
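The forward-and-return flow described above can be sketched with nothing but the Python standard library. This is only a minimal illustration, not how a production reverse proxy such as nginx is built: it handles GET requests only, talks to a single stand-in upstream server, and all the names (UpstreamHandler, make_proxy_handler, fetch_via_proxy) are made up for this example.

```python
import threading
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer

# Opener that ignores any system-wide proxy settings, so the demo stays local.
OPENER = urllib.request.build_opener(urllib.request.ProxyHandler({}))


class UpstreamHandler(BaseHTTPRequestHandler):
    """Stand-in upstream server that does the actual work."""

    def do_GET(self):
        body = f"upstream saw {self.path}".encode()
        self.send_response(200)
        self.send_header("Content-Type", "text/plain")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, *args):
        pass  # keep the demo quiet


def make_proxy_handler(upstream_base):
    """Build a handler class that forwards GET requests to upstream_base."""

    class ReverseProxyHandler(BaseHTTPRequestHandler):
        def do_GET(self):
            # Forward the request to the upstream server and wait for its response.
            with OPENER.open(upstream_base + self.path) as upstream_response:
                status = upstream_response.status
                content_type = upstream_response.headers.get(
                    "Content-Type", "application/octet-stream"
                )
                body = upstream_response.read()
            # Send the upstream response back to the client.
            self.send_response(status)
            self.send_header("Content-Type", content_type)
            self.send_header("Content-Length", str(len(body)))
            self.end_headers()
            self.wfile.write(body)

        def log_message(self, *args):
            pass

    return ReverseProxyHandler


def start(handler):
    """Start a server on a free local port in a background thread."""
    server = HTTPServer(("127.0.0.1", 0), handler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server


def fetch_via_proxy(path):
    """Send one GET request through the proxy and return the response body."""
    upstream = start(UpstreamHandler)
    upstream_port = upstream.server_address[1]
    proxy = start(make_proxy_handler(f"http://127.0.0.1:{upstream_port}"))
    proxy_port = proxy.server_address[1]
    try:
        with OPENER.open(f"http://127.0.0.1:{proxy_port}{path}") as response:
            return response.read().decode()
    finally:
        for server in (proxy, upstream):
            server.shutdown()
            server.server_close()
```

Calling fetch_via_proxy("/hello") sends the request to the proxy's port only. The client never learns the upstream server's port, which is the point of the arrangement.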
In an organization setting, a reverse proxy and the upstream servers usually belong in the same network. A reverse proxy server is deployed to listen for HTTP requests from clients outside the organization on behalf of upstream servers in the organization.
Why do we have to use a reverse proxy server?
Why can't the client just talk to the upstream server directly? Why must we introduce the hassle of putting a reverse proxy server between the client and the upstream server? While it is faster for the client to communicate with the upstream server directly, there are many good reasons to put a reverse proxy server in front of the servers that do the actual work of processing the HTTP requests from the client.
We only have one public IP address for clients to communicate with
These days, it is common for us to have a couple of computing devices that we wish to access from outside of our home network. Most Internet Service Providers provide a router and a modem to give subscribers the flexibility to access the internet with multiple computing devices.
The router requests a public IP address via the modem and maintains a private network for our home devices to connect to. Each phone or computer at home then requests a private IP address from the router by joining the private network.
Every home device that we want to make accessible to clients outside our home network is a server that is waiting to serve HTTP requests from clients.
However, since clients outside our home network can only see the public IP address that the home router has acquired, HTTP requests can only reach the router's network port with the public IP address.
If we have only one device that we wish to access from outside our home network, we only need to configure two network address translation rules to map ports 80 and 443 of the public IP address to the same ports on the private IP address of the home device.
However, if we wish to access our WordPress site on our Raspberry Pi 3, a Raspberry Pi 3 CCTV camera and another Raspberry Pi Zero W CCTV camera, we will need a reverse proxy server to help coordinate HTTP requests addressed to each of these devices.
We want to make it harder for hackers to get hold of our application data from outside our organization network
With a reverse proxy server, the upstream servers are hidden from the outside world. And since our upstream servers are the ones that interact with application data, we can make it difficult for hackers to get hold of our application data.
We want to have the flexibility to introduce more upstream servers to serve similar kinds of HTTP requests
Since the upstream servers are the ones that do the grunt work of generating HTTP responses for the HTTP requests, the reverse proxy server utilizes minimal computing power for each HTTP request. If the traffic demand increases for our application, we can utilize a reverse proxy server to balance the traffic load to different upstream servers that run the same application logic.
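As a rough illustration of balancing traffic across upstream servers that run the same application logic, here is a small Python sketch. It assumes a round-robin strategy, which is only one of several possible strategies, and the class name and private addresses are hypothetical.

```python
from itertools import cycle


class RoundRobinBalancer:
    """Hands out upstream servers in turn, one per incoming request."""

    def __init__(self, upstreams):
        upstreams = list(upstreams)
        if not upstreams:
            raise ValueError("at least one upstream server is required")
        self._upstreams = cycle(upstreams)

    def next_upstream(self):
        # Each call returns the next server, wrapping around at the end.
        return next(self._upstreams)


# Hypothetical private addresses of two servers running the same application.
balancer = RoundRobinBalancer(["192.168.1.11:8080", "192.168.1.12:8080"])
picks = [balancer.next_upstream() for _ in range(4)]
```

Here picks alternates between the two addresses, so adding a third upstream server to the list is all it takes to spread the load further.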
How to use a reverse proxy server?
To use a reverse proxy server, we need to be aware of a few things.
The IP address for clients to reach the reverse proxy server
The very first thing that we need is the IP address for clients to reach the reverse proxy server. If you run a virtual instance on a cloud provider, chances are you will be able to secure a public IP address that will not change. For example, if you create a droplet on DigitalOcean, your droplet will be assigned a public IP address. This is the IP address that you will use in the DNS configurations that resolve the domain names your reverse proxy server uses to proxy HTTP requests to the respective upstream servers.
If the reverse proxy server has to get its public IP address from an ISP, chances are its public IP address will change. In such a case, you will need to create Dynamic DNS configurations for resolving each domain name that your reverse proxy server uses to proxy HTTP requests to the respective upstream servers. The Dynamic DNS configurations will update the public IP address mapping to each domain name when your reverse proxy server gets a new public IP address.
For example, if you get your domain name from Namecheap, you can get a Raspberry Pi to use Namecheap Dynamic DNS to update your domain whenever your public IP address changes.
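The idea behind such a Dynamic DNS updater can be sketched as follows. This is not Namecheap's client or API: the sync_dns function is hypothetical, and the publish callback stands in for whatever call your DNS provider actually requires. The IP addresses are from the range reserved for documentation.

```python
def sync_dns(domains, last_published_ip, current_ip, publish):
    """Publish current_ip for every domain if it changed.

    Returns the IP address that is on record after the call.
    """
    if current_ip == last_published_ip:
        return last_published_ip  # nothing changed, so no update is needed
    for domain in domains:
        publish(domain, current_ip)
    return current_ip


# A fake publish function that only records what would have been sent.
sent_updates = []


def record_update(domain, ip):
    sent_updates.append((domain, ip))


domains = ["cctv.techcoil.com", "nas.techcoil.com"]
on_record = sync_dns(domains, "203.0.113.5", "203.0.113.5", record_update)
on_record = sync_dns(domains, on_record, "203.0.113.9", record_update)
```

The first call publishes nothing because the address is unchanged. The second call publishes the new address once for each domain name.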
The port numbers that the reverse proxy server will listen on for HTTP requests
Once we are sure that our reverse proxy server is reachable at the network layer of the TCP/IP stack, the next thing that we need to do is to decide on the port numbers that our reverse proxy server will listen on.
Usually we will configure the reverse proxy server to listen on port 80 for HTTP traffic and on port 443 for HTTPS traffic.
The domain names that the reverse proxy server will inspect to pass HTTP requests to different upstream servers
As an example, suppose that I wish to send HTTP traffic for cctv.techcoil.com to my Raspberry Pi 3 CCTV and nas.techcoil.com to my Synology DS216J NAS DiskStation. My reverse proxy server sits in the private network behind my home router and network address translation rules are created to map port 80 and port 443 of a public IP address to the private IP address of my reverse proxy server.
In this case, I will add two DNS A records with my DNS service provider to map the two domain names to that public IP address. As a result, HTTP requests for the two domain names will reach the reverse proxy server.
I then configure the reverse proxy server to map cctv.techcoil.com to the private IP address of my Raspberry Pi 3 CCTV and nas.techcoil.com to the private IP address of my Synology DS216J NAS DiskStation.
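The domain name mapping can be pictured as a lookup keyed on the Host header of each HTTP request. The sketch below is only an illustration: the private IP addresses and the function name are assumptions, and it expects a plain host name rather than an IPv6 literal.

```python
# Hypothetical private addresses of the two upstream devices.
UPSTREAMS_BY_HOST = {
    "cctv.techcoil.com": "192.168.1.20:80",  # Raspberry Pi 3 CCTV
    "nas.techcoil.com": "192.168.1.30:80",   # Synology DS216J NAS DiskStation
}


def pick_upstream_by_host(host_header):
    """Return the upstream address for a Host header, or None if unknown."""
    # Drop an optional ":port" suffix and normalize the case of the host name.
    hostname = host_header.split(":", 1)[0].strip().lower()
    return UPSTREAMS_BY_HOST.get(hostname)
```

A request with the Host header cctv.techcoil.com is sent to the CCTV's private address, while a request for a domain name that is not in the table gets no upstream server at all.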
The URI patterns that the reverse proxy server will inspect to pass HTTP requests to different upstream servers
Apart from differentiating HTTP requests based on domain names, a reverse proxy server can usually be configured to differentiate HTTP requests made to the same domain name by different URI patterns.
Suppose that I want to use WordPress to manage my blog content and a self-written Python 3 Flask application to manage my forum content. In such a case, I will configure my reverse proxy server to send HTTP requests for https://techcoil.com/blog/* to an instance of PHP-FPM (FastCGI Process Manager) with the WordPress source code and HTTP requests for https://techcoil.com/forum/* to my Python 3 Flask application.
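Routing by URI pattern boils down to matching the request path against a list of prefixes. Here is a minimal sketch of that idea. The protocol labels, loopback addresses and port numbers are assumptions made for illustration, not values from my actual setup.

```python
# (path prefix, (protocol, upstream address)) pairs, checked in order.
# The addresses and ports below are hypothetical.
ROUTES = [
    ("/blog/", ("fastcgi", "127.0.0.1:9000")),  # PHP-FPM serving WordPress
    ("/forum/", ("http", "127.0.0.1:5000")),    # Python 3 Flask application
]


def pick_upstream_by_path(path):
    """Return (protocol, address) for the first matching prefix, else None."""
    for prefix, upstream in ROUTES:
        if path.startswith(prefix):
            return upstream
    return None
```

Because the routes are checked in order, a more specific prefix should be listed before a more general one if the two overlap.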
The communication protocol for the reverse proxy server to communicate with the upstream server
A communication protocol is an agreement between two parties that defines how communication can happen. A good reverse proxy server is capable of speaking different protocols with upstream servers of different makes.
For example, nginx provides the following directives for passing requests to different types of upstream servers, each speaking a different protocol:
- proxy_pass passes a request to an HTTP server.
- fastcgi_pass passes a request to a FastCGI server.
- uwsgi_pass passes a request to a uwsgi server.
- scgi_pass passes a request to an SCGI server.
- memcached_pass passes a request to a memcached server.
Apart from the communication mechanism, the communication protocol also includes whether the communication channel between the reverse proxy server and the upstream server is encrypted or not.
You do not need to encrypt the communication channel between the reverse proxy server and the upstream server if:
- The reverse proxy server and the upstream server communicate via a private network which is not reachable by the public.
- A stranger cannot connect to your private network without entering some password. Physical access to your router is also protected by some lock mechanism.
The IP address and port number of the upstream server
Each reverse proxy configuration will include the IP address and port number of the upstream server. The IP address and port number of the upstream server are usually supplied together with the communication protocol.
The remote IP address of the connecting client
As the upstream server does not deal directly with the client, it will not know the remote IP address of the connecting client. If the upstream server needs the client IP address in order to generate the HTTP response, the reverse proxy server needs to include the remote IP address of the connecting client in the forwarded HTTP request. Such information is usually included as additional HTTP headers in the forwarded HTTP request.
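As a sketch of how those additional headers might be added, the function below uses X-Forwarded-For and X-Real-IP, two header names that are commonly used for this purpose. The function name is hypothetical, and for simplicity it assumes the header names in the incoming dictionary are spelled with exactly this capitalization.

```python
def add_forwarding_headers(headers, client_ip):
    """Return a copy of headers that also carries the client's IP address."""
    forwarded = dict(headers)  # leave the original request headers untouched
    existing = forwarded.get("X-Forwarded-For")
    # Append to the list if an earlier proxy has already added an entry.
    forwarded["X-Forwarded-For"] = (
        f"{existing}, {client_ip}" if existing else client_ip
    )
    forwarded["X-Real-IP"] = client_ip
    return forwarded


request_headers = {"Host": "techcoil.com"}
forwarded_headers = add_forwarding_headers(request_headers, "203.0.113.7")
```

The upstream server can then read the client IP address from these headers. It should only trust them when the request really came from the reverse proxy server, since any client can send headers with the same names.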