Transparent Caching, IP Address Access Lists, and Site Licenses for Web Content.
Transparent Caching and IP Address Access Lists
Update, January 2006: In the four-plus years since we wrote this document, it looks like our view has won out and IP-based authorization has become the common way of authorizing universities for electronic resources. The rest of this document is thus historic. (File with the MS system administrators we dealt with in 1998 who constantly explained that no one would be using Unix by 2000...)
At our web site we have restricted some files to subscribers. While we have issued passwords to a handful of individuals, 99% of users are at locations with site licenses. We use IP addresses to authorize these users. The advantage for us - minimal clerical effort - is also an advantage for the customer. We are aware that IP addresses can be forged, but we believe that is not likely to be a significant problem, given the nature of the content we are selling.
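A minimal sketch of the kind of check described above, assuming the licensed sites are kept as a table of network ranges. The names SITE_LICENSES and is_authorized are hypothetical, and the ranges are reserved documentation networks rather than real subscribers; this is not our actual server configuration.

```python
import ipaddress

# Hypothetical site-license table: subscriber name -> licensed network ranges.
# The ranges are reserved documentation networks, not real subscribers.
SITE_LICENSES = {
    "Example University": ["192.0.2.0/24"],
    "Example Institute": ["198.51.100.0/25", "203.0.113.16/28"],
}

# Parse the ranges once so each request only does membership tests.
_NETWORKS = [
    (name, ipaddress.ip_network(cidr))
    for name, cidrs in SITE_LICENSES.items()
    for cidr in cidrs
]


def is_authorized(remote_addr):
    """Return the subscriber name owning remote_addr, or None if unlicensed."""
    try:
        addr = ipaddress.ip_address(remote_addr)
    except ValueError:
        return None  # malformed address: treat as not authorized
    for name, network in _NETWORKS:
        if addr in network:
            return name
    return None
```

The point of the design is that adding a subscriber means adding one line to the table, with no per-user accounts to issue or maintain.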
Last week, I started to get complaints from European users that their IP addresses were being rejected. Eventually I tracked the problem down to a cache engine installed by *our* ISP that was caching our server pages and sending them out to unsuspecting clients. Since I didn't know anything about this device, the cache engine was not on the authorized IP list, and was being given 'not authorized' replies to all requests. These it was relaying to users. A look in my logs showed that 25% of all my hits were to a host named 'wccp-1.hck.idt.net', and there were no more hits from European sites. I had never heard of server side caching, and indeed an afternoon on Deja-news and Altavista turned up only a handful of mentions of the possibility of such a thing.
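The log check mentioned above can be sketched as follows, assuming an access log whose first whitespace-separated field is the requesting host (as in the Common Log Format). The function name hit_shares and the sample lines are made up for illustration; only the cache host name comes from my logs.

```python
from collections import Counter


def hit_shares(log_lines):
    """Return {host: fraction of all hits}, taking the first field as the host."""
    counts = Counter(line.split()[0] for line in log_lines if line.strip())
    total = sum(counts.values())
    if total == 0:
        return {}
    return {host: n / total for host, n in counts.items()}


# Made-up sample lines; a real log would be read from a file.
sample = [
    'wccp-1.hck.idt.net - - [date] "GET /papers/a.pdf HTTP/1.0" 403 210',
    'host1.example.edu - - [date] "GET /papers/a.pdf HTTP/1.0" 200 51234',
    'host2.example.edu - - [date] "GET /papers/b.pdf HTTP/1.0" 200 40112',
    'host1.example.edu - - [date] "GET /index.html HTTP/1.0" 200 3021',
]
shares = hit_shares(sample)
```

A single host accounting for a large share of hits, all of them refused, is the signature of an intermediary that is not on the access list.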
Several messages suggested marking pages as non-cacheable, or already expired. This does not work. Although the engine does not cache such pages, my server still sees the engine's IP address, rather than the customer's IP address. I also tried putting 'cgi-bin' and '?' in the URL with the same result.
Apparently some cache engines provide an 'X-Forwarded-For' header with the IP address of the original client. The header has the form 'X-Forwarded-For: ' followed by the client's address.
It is not clear how many cache engines provide this service. According to the Squid FAQ there is an option to replace the IP address with the string 'unknown', to enhance client side security. Since headers can be supplied by users, the level of security provided by depending upon this header is extremely low. Note that the IP address seen by the Web server is fairly trustworthy, since that is where replies are sent, and while it is easy to forge the source IP address, that doesn't do much good if you expect a reply.
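One way to act on this reasoning is to believe the header only when the connection itself comes from a cache engine we already know about. The sketch below assumes a hand-maintained list of such caches; TRUSTED_CACHES, effective_client_ip, and all addresses are hypothetical, and I have not run anything like this on our server.

```python
import ipaddress

# Hypothetical list of cache engines we have chosen to trust.
TRUSTED_CACHES = {ipaddress.ip_address("192.0.2.10")}


def effective_client_ip(peer_addr, headers):
    """Pick the address to authorize against.

    peer_addr: source address of the TCP connection as seen by the server.
    headers:   dict of request headers with lower-cased names.
    """
    peer = ipaddress.ip_address(peer_addr)
    if peer not in TRUSTED_CACHES:
        # Anyone can supply this header, so ignore it from unknown peers.
        return str(peer)
    forwarded = headers.get("x-forwarded-for", "")
    # A chain of proxies appends addresses; the last entry is the one
    # our trusted cache itself observed.
    last = forwarded.split(",")[-1].strip()
    try:
        return str(ipaddress.ip_address(last))
    except ValueError:
        # Covers a missing header and the 'unknown' placeholder.
        return str(peer)
```

For example, effective_client_ip("203.0.113.5", {"x-forwarded-for": "192.0.2.99"}) returns "203.0.113.5", because the peer is not a known cache and its header is ignored. This is only as good as the trust placed in the listed caches.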
At the moment, the only thing I can think of to do is to move the restricted portion of the site to a web server running on a port other than port 80. The cache engine only caches the standard HTTP port. This isn't ideal from the user's point of view because it adds some syntax to a URL I had kept simple enough to remember, but it should work. I haven't tested it yet.
Below is what my ISP offered me. It includes some comments from Cisco, the vendor of the cache engine, which are notable for their tone. Cisco ignores the use of IP addresses for site authorization, and accuses me of being a jerk for using IP addresses for session tracking. I don't do session tracking, and establishing a login-id and password for every economics student at every subscribing university is plainly impractical. Am I missing something? Is there a reason Cisco doesn't just say, 'Use another port'? Could cookies be used to implement a site license scheme?
The following is quoted directly from the ISP's message:
>As for the wccp-1.hck and wccp-2.hck caching servers, they are Cisco
>transparent caches, which means that they intercept web calls regardless
>of whether or not the web browser is configured for a proxy server.
>
>Servers of these types are slowly gaining popularity, as such more and
>more providers are beginning to install them network wide. This should
>serve as an early warning to you that your site authorization function
>will continue to have problems down the road.
>
>Here is what Cisco has to say about this. If you want to respond to their
>comments, feel free to send them along to me, and I will forward it to
>them for a response.
>
>>``The bottom line is that ANY IP based authentication will fail when there
>>is a Cache Engine (or any other Proxy web server) between the customer and
>>them. They will continue to have these problems ad infinitum until they
>>stop using client IP addresses. Our question to them is: WHY? Why use
>>client based IP's to define a session instead of cookies? That's why
>>cookies were invented, to identify a specific user and alter the content
>>you serve them based on their needs or history. The CE and all other
>>Proxies and caches all support Cookie passing while all breaking any
>>system that requires the original IP of the client. As far as I (and the
>>engineers here) know, Microsoft's ASPs do not default to using client IP
>>based session identifying, and must be intentionally configured that way.''
>>
>>``This is why we created the WCCP ACL's: to compensate for those people
>>out there who insist on using client IP-based authentication. For
>>security reasons, it is easily hacked and next to useless by itself
>>(although it's often used in conjunction with other security measures) and
>>for identification I don't see any benefit over cookies.''
>>
>>``So for now, these are the options:
>>1)Convince them to stop using client IP-based authentication and start
>>using cookies instead (not very likely to happen).
>>2)Keep the ACL on them.''
>>
>>``Long Term:
>>The only way we could ever even think of doing anything to correct this is
>>if their server can recognize that the same IP address is being used more
>>then once at the same time and send one of the standard HTTP header error
>>codes in it's response to our TCP SYN packet. In our 2.x releases (next
>>year), this will allow us to activate the planned auto-bypass (or
>>intelligent-bypass) feature causing the client to connect directly to the
>>site (bypassing us) without any human intervention needed. Without any
>>feedback from their server that something is wrong, we can never correct
>>this situation.''
Daniel Feenberg
feenberg of nber dot org
National Bureau of Economic Research
http://www.nber.org
617-588-0343 (voice)
617-868-2742 (fax)

I received this thoughtful comment.
>>>>>>
>>Date: Tue, 02 Jul 2002 14:23:16 -0500
>>From: Joe Cooper