NATIONAL BUREAU OF ECONOMIC RESEARCH

Transparent Caching, IP Address Access Lists, and Site Licenses for Web Content.

Transparent Caching and IP Address Access Lists

Update, January 2006: In the four-plus years since we wrote this document, it looks like our view has won out and IP-based authorization has become the common way of authorizing universities for electronic resources. The rest of this document is thus of historical interest only. (File this with the MS system administrators we dealt with in 1998 who constantly explained that no one would be using Unix by 2000...)

At our web site we have restricted some files to subscribers. While we have issued passwords to a handful of individuals, 99% of users are at locations with site licenses. We use IP addresses to authorize these users. The advantage for us, minimal clerical effort, is also an advantage for the customer. We are aware that IP addresses can be forged, but we believe that is not likely to be a significant problem, given the nature of the content we are selling.
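
As a rough illustration of the kind of check described above, here is a minimal Python sketch of authorizing a request by the connecting IP address. The subscriber names, the address ranges (reserved documentation ranges, not real subscribers), and the function name `licensed_site` are all assumptions made for the example; this is not the configuration we actually run.

```python
import ipaddress

# Hypothetical site-license table: subscriber name -> licensed networks.
# The ranges are reserved documentation networks, not real subscribers.
SITE_LICENSES = {
    "Example University": ["192.0.2.0/24"],
    "Sample Institute": ["198.51.100.0/25", "203.0.113.7/32"],
}

# Parse each CIDR string once, at startup.
LICENSED_NETWORKS = [
    (name, ipaddress.ip_network(cidr))
    for name, cidrs in SITE_LICENSES.items()
    for cidr in cidrs
]


def licensed_site(remote_addr):
    """Return the subscriber name for a connecting address, or None."""
    try:
        addr = ipaddress.ip_address(remote_addr)
    except ValueError:
        return None  # not a valid IP address at all
    for name, network in LICENSED_NETWORKS:
        if addr in network:
            return name
    return None
```

Adding a new site license then amounts to adding one entry to the table, which is the "minimal clerical effort" mentioned above.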

Last week, I started to get complaints from European users that their IP addresses were being rejected. Eventually I tracked the problem down to a cache engine installed by *our* ISP that was caching our server pages and sending them out to unsuspecting clients. Since I didn't know anything about this device, the cache engine was not on the authorized IP list, and was being given 'not authorized' replies to all requests. These it was relaying to users. A look in my logs showed that 25% of all my hits were from a host named 'wccp-1.hck.idt.net', and there were no more hits from European sites. I had never heard of server-side caching, and indeed an afternoon on Deja-news and Altavista turned up only a handful of mentions of the possibility of such a thing.

Several messages suggested marking pages as non-cacheable, or already expired. This does not work. Although the engine does not cache such pages, it still relays the request, so my server still sees the engine's IP address rather than the customer's IP address. I also tried putting 'cgi-bin' and '?' in the URL, with the same result.

Apparently some cache engines provide an 'X-Forwarded-For' header with the IP address of the original client. It looks like this:

X-Forwarded-For: 123.321.123.1

It is not clear how many cache engines provide this service. According to the Squid FAQ there is an option to replace the IP address with the string 'unknown', to enhance client side security. Since headers can be supplied by users, the level of security provided by depending upon this header is extremely low. Note that the IP address seen by the Web server is fairly trustworthy, since that is where replies are sent, and while it is easy to forge the source IP address, that doesn't do much good if you expect a reply.
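
One way to act on the reasoning above is to consult the header only when the connection itself comes from a cache engine we already know about, and to fall back to the connecting address otherwise. The following Python sketch assumes a hypothetical trusted-proxy list and a hypothetical function name `effective_client`; it illustrates the idea and is not a tested defense.

```python
import ipaddress

# Hypothetical list of cache engines whose headers we choose to believe.
TRUSTED_PROXIES = {ipaddress.ip_address("198.51.100.10")}

# Private (RFC 1918) ranges: meaningless for identifying a subscriber site.
PRIVATE_NETWORKS = [
    ipaddress.ip_network("10.0.0.0/8"),
    ipaddress.ip_network("172.16.0.0/12"),
    ipaddress.ip_network("192.168.0.0/16"),
]


def effective_client(peer_addr, x_forwarded_for=None):
    """Return the address (as a string) that should be checked for a license.

    peer_addr is the address the web server sees on the connection.
    The X-Forwarded-For value is used only if that peer is a trusted proxy.
    """
    peer = ipaddress.ip_address(peer_addr)
    if peer not in TRUSTED_PROXIES or not x_forwarded_for:
        return str(peer)  # anyone can send the header, so ignore it here
    # The header may hold a comma-separated chain; the last entry is the
    # one appended by the proxy that actually connected to us.
    last = x_forwarded_for.split(",")[-1].strip()
    try:
        client = ipaddress.ip_address(last)
    except ValueError:
        return str(peer)  # e.g. the string 'unknown'
    if any(client in net for net in PRIVATE_NETWORKS):
        return str(peer)
    return str(client)
```

When the header is missing, replaced by 'unknown', or carries a private address, the sketch returns the proxy's own address, which will fail the license check unless the proxy itself has been authorized.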

At the moment, the only thing I can think of to do is to move the restricted portion of the site to a web server running on a port other than port 80. The cache engine only caches the standard HTTP port. This isn't ideal from the user's point of view because it adds some syntax to a URL I had kept simple enough to remember, but it should work. I haven't tested it yet.
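
For concreteness, the alternate-port idea might look something like the following Python sketch, which uses only the standard library. The port number 8080, the allowed ranges, and the names `is_authorized`, `RestrictedHandler`, and `make_server` are assumptions for illustration. It presumes, as described above, that the cache engine intercepts only port 80; like the idea itself, it has not been tried against a real cache engine.

```python
import ipaddress
from http.server import BaseHTTPRequestHandler, HTTPServer

RESTRICTED_PORT = 8080  # hypothetical non-standard port

# Hypothetical authorized ranges; loopback is included for local testing.
ALLOWED_NETWORKS = [
    ipaddress.ip_network("192.0.2.0/24"),
    ipaddress.ip_network("127.0.0.0/8"),
]


def is_authorized(peer_addr):
    """True if the connecting address falls inside an allowed network."""
    peer = ipaddress.ip_address(peer_addr)
    return any(peer in net for net in ALLOWED_NETWORKS)


class RestrictedHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        # client_address[0] is the address of whoever opened the connection.
        if is_authorized(self.client_address[0]):
            status, body = 200, b"subscriber content\n"
        else:
            status, body = 403, b"not authorized\n"
        self.send_response(status)
        self.send_header("Content-Type", "text/plain")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)


def make_server(port=RESTRICTED_PORT, host=""):
    """Create (but do not start) the restricted server on the given port."""
    return HTTPServer((host, port), RestrictedHandler)
```

Calling `make_server().serve_forever()` would start it, and users would then reach the restricted files at a URL with `:8080` after the host name, which is the extra syntax mentioned above.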

Below is what my ISP offered me. It includes some comments from Cisco, the vendor of the cache engine, which are notable for their tone. Cisco ignores the use of IP addresses for site authorization, and accuses me of being a jerk for using IP addresses for session tracking. I don't do session tracking, and merely contemplating a login ID and password for every economics student at every subscribing university shows how impractical that alternative is. Am I missing something? Is there a reason Cisco doesn't just say, 'Use another port'? Could cookies be used to implement a site license scheme?

The following is quoted directly from the ISP's message:

>As for the wccp-1.hck and wccp-2.hck caching servers, they are Cisco
>transparent caches, which means that they intercept web calls regardless
>of whether or not the web browser is configured for a proxy server.
>
>Servers of these types are slowly gaining popularity, as such more and
>more providers are beginning to install them network wide.  This should
>serve as an early warning to you that your site authorization function
>will continue to have problems down the road.
>
>Here is what Cisco has to say about this.  If you want to respond to their
>comments, feel free to send them along to me, and I will forward it to
>them for a response.
>
>>``The bottom line is that ANY IP based authentication will fail when there
>>is a Cache Engine (or any other Proxy web server) between the customer and
>>them.  They will continue to have these problems ad infinitum until they
>>stop using client IP addresses.  Our question to them is: WHY? Why use
>>client based IP's to define a session instead of cookies? That's why
>>cookies were invented, to identify a specific user and alter the content
>>you serve them based on their needs or history.  The CE and all other
>>Proxies and caches all support Cookie passing while all breaking any
>>system that requires the original IP of the client.  As far as I (and the
>>engineers here) know, Microsoft's ASPs do not default to using client IP
>>based session identifying, and must be intentionally configured that way.''
>>
>>``This is why we created the WCCP ACL's: to compensate for those people
>>out there who insist on using client IP-based authentication.  For
>>security reasons, it is easily hacked and next to useless by itself
>>(although it's often used in conjunction with other security measures) and
>>for identification I don't see any benefit over cookies.''
>>
>>``So for now, these are the options:
>>1)Convince them to stop using client IP-based authentication and start
>>using cookies instead (not very likely to happen).
>>2)Keep the ACL on them.''
>>
>>``Long Term:
>>The only way we could ever even think of doing anything to correct this is
>>if their server can recognize that the same IP address is being used more
>>then once at the same time and send one of the standard HTTP header error
>>codes in it's response to our TCP SYN packet.  In our 2.x releases (next
>>year), this will allow us to activate the planned auto-bypass (or
>>intelligent-bypass) feature causing the client to connect directly to the
>>site (bypassing us) without any human intervention needed.  Without any
>>feedback from their server that something is wrong, we can never correct
>>this situation.''

Daniel Feenberg                            feenberg of nber dot org
National Bureau of Economic Research       http://www.nber.org
617-588-0343 (voice)                       617-868-2742 (fax)

I received this thoughtful comment.

>>
>>>>
>>Date: Tue, 02 Jul 2002 14:23:16 -0500
>>From: Joe Cooper 
>>To: feenberg of nber dot org
>>Subject: Regarding cache.html
>>
>>Hi Daniel,
>>
>>A client sent me a link to your site (http://www.nber.org/cache.html), 
>>after he was doing some reading up about similar issues.  I just thought 
>>I'd chime in a bit on the subject, since Cisco didn't answer all of your 
>>questions straightaway.
>>
>> > Am I missing something?
>>
>>Possibly.  I think more likely your ISP is missing something.  I can't 
>>think of a reason for your ISP to be using the cache transparently on 
>>outgoing connections.  This sounds like a misconfiguration, that would 
>>be easily corrected in most environments.  And in fact, it tickles my 
>>mind into thinking they're running an open proxy (running an open proxy 
>>is a major no-no for all kinds of reasons...it is the proxy 
>>administrators equivalent of running an open mail relay, with very 
>>similar end results).
>>
>> > Is there a reason Cisco doesn't just say, 'Use another port'?
>>
>>Because nearly any port can be proxied.  In many environments, a proxy 
>>is mandatory for security reasons.  In such cases, no matter what port 
>>you use, the proxy will be used for the connection.  You may also 
>>introduce other problems, like access control preventing your site from 
>>being visited, as some proxies limit which ports may be browsed.  Not 
>>very likely, but it could happen.  Proxies are becoming more common.  It 
>>is a fact of life (and a positive thing, in my humble opinion...it's not 
>>like us Squid nerds are making a killing in this business, so you can 
>>probably safely assume we have good intentions, even if you're 
>>suspicious of proprietary vendors).  For security, performance, and 
>>bandwidth conservation reasons.
>>
>> > Could cookies be used to implement a site license scheme?
>>
>>Not that I know of, unless you have an online signup procedure that ends 
>>with sending a cookie.  Even so, cookie filters, browser changes, etc. 
>>can lead to the cookie going away...so you are back to authenticating 
>>them traditionally to reset the cookie.
>>
>>However, Squid proxies (the most popular http proxy in the world) 
>>default to sending the X-Forwarded-For header, which Apache will let you 
>>get at.  It is very rarely altered, but when you unwrap it you may find 
>>private IPs inside (192.168.x.x, 172.16.x.x, 10.x.x.x).  Not much good 
>>for security either.
>>
>>Another thought, assuming your ISP changes their cache not to cache 
>>internal servers to outside clients, you can probably safely perform 
>>access control on the IP of the proxy at each site in question.  I get 
>>the impression the folks you deal with are at Universities...if that is 
>>the case, and you want to allow the whole school access, it is easiest 
>>to use their proxy IP.
>>
>>Finally, the simplest thing is probably a site-wide login:password.  It 
>>is easy to implement with htpasswd, and relatively secure.  Spoofing is 
>>no longer an issue, though plain text passwords are.  I don't see how it 
>>could be any more complex than your current job of finding the IP, and 
>>setting it in your htaccess file.  Just run htpasswd to add the new 
>>username and password when you get a new client, instead.
>>
>>So, anyway, you're not a jerk for using IP access controls, but it 
>>/will/ cause problems for a lot of reasons...not just transparent 
>>proxies.  NATted networks will end up not working correctly either (and 
>>it is becoming more and more common to have only one or two real IPs at 
>>a site and masqueraded private IPs behind them--our office is that way, 
>>we've got 1.5Mbits of ADSL on a single IP...getting more IPs would cost 
>>us a couple hundred dollars extra per month, as we'd need to get a 
>>business account with server privileges).
>>
>>Good luck!
>>-- 
>>Joe Cooper 
>>Web caching appliances and support.
>>http://www.swelltech.com
>>
>>
>>
 
National Bureau of Economic Research, 1050 Massachusetts Ave., Cambridge, MA 02138; 617-868-3900; email: info@nber.org
