Getting the Query String When You're Using SES URL Rewriting

I just finished up an administrative application (a process that's going to lead to a couple of posts in the near future), and I came across an issue that I hadn't seen before, so I thought I'd post about it.

If you're using Search Engine Safe (SES) URL rewriting in your application, the QUERY_STRING value in the CGI scope no longer holds your request parameters. You might want to capture them, as I do, for logging purposes: it's often useful to know which URL variables were passed to a given request for auditing and security, or for debugging a user problem. If your application rewrites all of its URLs to make them search engine safe, QUERY_STRING no longer gives you this information. For example, a typical URL in a Web application might look like:

Search engines haven't, historically, liked such URLs and have tended to stop right at the end of the "path" listed in the URL (e.g., http://www.myapp.com/) and ignore everything after the ? (the standard query string delimiter). That's changing, but most Web application developers will tell you to make such URLs "search engine safe," like this:

Search engines will typically follow that URL because it lacks a query string, which can mean more of your pages end up indexed.
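To make the difference concrete, here is a minimal sketch that splits both styles of URL into their parts. The two URLs are hypothetical stand-ins that I made up for illustration, and I've used Python only because it keeps the example short; the same observation holds for the CGI scope in ColdFusion.

```python
from urllib.parse import urlsplit

# Hypothetical URLs for illustration only; they are not taken from a real app.
CONVENTIONAL_URL = "http://www.myapp.com/index.cfm?event=showUser&id=42"
SES_URL = "http://www.myapp.com/index.cfm/event/showUser/id/42"


def query_of(url):
    """Return the query-string component of a URL ('' if it has none)."""
    return urlsplit(url).query


def path_of(url):
    """Return the path component of a URL."""
    return urlsplit(url).path


if __name__ == "__main__":
    for url in (CONVENTIONAL_URL, SES_URL):
        # repr() makes an empty query string visible in the output.
        print("path =", repr(path_of(url)), "query =", repr(query_of(url)))
```

In the SES form, everything that used to follow the ? has moved into the path, so the query component comes back empty.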

If you want to capture that query string, however, you can't rely on the old server CGI scope standby, QUERY_STRING. That's because in a SES URL there is no query string: there is no ? to delimit it. So what do you do?

The CGI specification indicates that a PATH_INFO value should be passed along with a number of other variables, including the QUERY_STRING value. The PATH_INFO value will contain "The extra path information, as given by the client. In other words, scripts can be accessed by their virtual pathname, followed by extra information at the end of this path. The extra information is sent as PATH_INFO." With SES URLs, PATH_INFO essentially takes the place of QUERY_STRING, and you can capture and parse it as you see fit for your logging/auditing/processing purposes.
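As a rough sketch of what "capture and parse" might look like, the function below turns a PATH_INFO value into name/value pairs. It assumes the extra path alternates between names and values (/name1/value1/name2/value2), which is a common SES layout but not something every framework guarantees. The function name parse_path_info is my own invention, and the example is in Python purely for illustration.

```python
from urllib.parse import unquote


def parse_path_info(path_info):
    """Turn '/name1/value1/name2/value2' into {'name1': 'value1', ...}.

    Assumes alternating name/value segments; adjust for your own URL scheme.
    """
    # Split first, then URL-decode each segment, so that an encoded slash
    # inside a value does not create an extra segment.
    segments = [unquote(s) for s in path_info.split("/") if s]

    # A trailing name with no value is mapped to an empty string.
    if len(segments) % 2:
        segments.append("")

    # Even positions are names, odd positions are their values.
    return dict(zip(segments[0::2], segments[1::2]))


if __name__ == "__main__":
    print(parse_path_info("/event/showUser/id/42"))
```

For pure logging you may not need to parse at all; storing the raw PATH_INFO string alongside the request is often enough.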

In my case, I'm using Mach-II as my MVC framework, and it handles the SES URL rewriting for me (and a whole lot of other things as well). Anything that comes after the /index.cfm in the URL path then populates the CGI.PATH_INFO value, and that's what I use to capture my "query string" for logging, manipulation, or anything else I see fit.
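Here is a hedged sketch of the logging idea: use QUERY_STRING when the request has one, and fall back to PATH_INFO when it doesn't. The cgi_vars dictionary stands in for the CGI scope, and loggable_params is a hypothetical helper name, not part of Mach-II or ColdFusion. The script-name stripping is a defensive assumption in case your server reports /index.cfm as part of PATH_INFO; check what your own setup actually returns. Again, Python is used only for illustration.

```python
def loggable_params(cgi_vars, script_name="/index.cfm"):
    """Return the request's parameters as a string suitable for logging.

    cgi_vars: a dict standing in for the CGI scope (hypothetical).
    script_name: the front controller path, assumed to be '/index.cfm'.
    """
    # A conventional URL still carries a real query string, so prefer it.
    query = cgi_vars.get("QUERY_STRING", "")
    if query:
        return query

    # With a SES URL the parameters live in the extra path instead.
    path_info = cgi_vars.get("PATH_INFO", "")

    # Defensive: drop a leading script name if the server included it.
    if path_info == script_name or path_info.startswith(script_name + "/"):
        path_info = path_info[len(script_name):]

    return path_info


if __name__ == "__main__":
    ses_request = {"QUERY_STRING": "", "PATH_INFO": "/event/showUser/id/42"}
    print(loggable_params(ses_request))
```

Checking QUERY_STRING first keeps the same logging code working for any pages that are not rewritten.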


Creative Commons License
The content on http://www.iterateme.com/ is licensed under a Creative Commons Attribution-Noncommercial-Share Alike 3.0 United States License.