URL Encoding with CFML

Some feedback about my Amazon Product Advertising API Signature Generator (yes, I know it needs a better name) has caused me to look at URL encoding in more detail than I ever have before, and it turns out it’s not as straightforward as you might think.

The first problem was that I’d forgotten I needed to URL encode the parameter values. No problem: ColdFusion’s built-in URLEncodedFormat() function should sort that out with minimal effort. At least that was the theory. The trouble is that, while it worked most of the time, I was occasionally getting SignatureDoesNotMatch errors.

A look at the API docs told me that Amazon expected the values to be encoded according to RFC 3986. The docs also helpfully noted that a couple of commonly used escaping methods (for Perl and Java) do not strictly follow RFC 3986. To see exactly what URLEncodedFormat() was doing, I threw together a quick test:

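<!--- Show each printable ASCII character (32-126) next to its URLEncodedFormat() result;
      htmleditformat() stops characters such as "<" and "&" being treated as markup --->
<cfloop from="32" to="126" index="i">
	<cfoutput>
		#i# - #htmleditformat(chr(i))# - #urlencodedformat(chr(i))#<hr>
	</cfoutput>
</cfloop>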

It turns out that URLEncodedFormat() encodes every character that is not in the ranges a-z, A-Z or 0-9. Unfortunately this includes four characters that RFC 3986 classes as “unreserved” and says should not be encoded: “.” (period/dot), “-” (hyphen/minus), “_” (underscore) and “~” (tilde). This doesn’t seem to matter much when you are just encoding a URL to make an HTTP request, but when the encoded message is used to calculate a digital signature, both parties need to use exactly the same encoding scheme or the signatures will not match.
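For reference, the rule RFC 3986 describes is simple to state in code: turn the text into bytes (UTF-8 is assumed here), leave the unreserved characters A-Z, a-z, 0-9, “-”, “.”, “_” and “~” alone, and percent-encode every other byte as %XX with uppercase hex. The Java sketch below only illustrates that rule; it is not part of Amazonsig, and the class name Rfc3986 is made up for the example.

```java
import java.nio.charset.StandardCharsets;

/** Minimal RFC 3986 percent-encoder: an illustrative sketch, not a library API. */
final class Rfc3986 {

    private static final char[] HEX = "0123456789ABCDEF".toCharArray();

    /** RFC 3986 section 2.3 unreserved set: ALPHA / DIGIT / "-" / "." / "_" / "~". */
    static boolean isUnreserved(int b) {
        return (b >= 'A' && b <= 'Z') || (b >= 'a' && b <= 'z') || (b >= '0' && b <= '9')
                || b == '-' || b == '.' || b == '_' || b == '~';
    }

    /** Percent-encodes every UTF-8 byte of the input that is not unreserved. */
    static String encode(String text) {
        StringBuilder out = new StringBuilder();
        for (byte raw : text.getBytes(StandardCharsets.UTF_8)) {
            int b = raw & 0xFF; // Java bytes are signed; treat this one as 0..255
            if (isUnreserved(b)) {
                out.append((char) b);
            } else {
                out.append('%').append(HEX[b >> 4]).append(HEX[b & 0x0F]);
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(encode("a-b.c_d~e f*g")); // a-b.c_d~e%20f%2Ag
    }
}
```

The `& 0xFF` matters once non-ASCII text is involved: without it, a byte such as 0xC3 would be negative and would index the hex table incorrectly.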

The obvious alternative for a CF developer, Java’s URLEncoder.encode(), does slightly better, with only three “mistakes” (as pointed out in the API docs): it turns spaces into “+” rather than %20, leaves “*” unencoded, and encodes “~” as %7E. To get RFC 3986 compliant encoding, I was going to have to correct the mistakes made by one of the functions available to me (or write my own from scratch, which I didn’t much fancy), but which one should I choose?
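Those three differences can be confirmed with a survey much like the CFML test loop above, this time run against URLEncoder itself. The Java sketch below compares URLEncoder’s output for each printable ASCII character with what RFC 3986 expects. It assumes Java 10 or later (for the URLEncoder.encode(String, Charset) overload), and the class name is hypothetical.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

final class UrlEncoderSurvey {

    /** RFC 3986 unreserved characters (section 2.3). */
    static boolean isUnreserved(char c) {
        return (c >= 'A' && c <= 'Z') || (c >= 'a' && c <= 'z') || (c >= '0' && c <= '9')
                || c == '-' || c == '.' || c == '_' || c == '~';
    }

    /** What RFC 3986 expects for a single printable ASCII character. */
    static String rfc3986(char c) {
        return isUnreserved(c) ? String.valueOf(c) : String.format("%%%02X", (int) c);
    }

    /** Printable ASCII characters (32..126) where URLEncoder disagrees with RFC 3986. */
    static List<Character> mismatches() {
        List<Character> result = new ArrayList<>();
        for (char c = 32; c <= 126; c++) {
            String javaEncoded = URLEncoder.encode(String.valueOf(c), StandardCharsets.UTF_8);
            if (!javaEncoded.equals(rfc3986(c))) {
                result.add(c);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        for (char c : mismatches()) {
            String s = String.valueOf(c);
            System.out.println("'" + s + "' -> " + URLEncoder.encode(s, StandardCharsets.UTF_8)
                    + " (RFC 3986 wants " + rfc3986(c) + ")");
        }
    }
}
```

Running main lists exactly three characters: space (URLEncoder gives “+”), “*” (left unencoded) and “~” (encoded as %7E).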

I wrote two different functions, each of which does the encoding, fixes the problems and returns an RFC 3986 encoded string.

Version 1:

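<cffunction name="rfc3986EncodedFormat1" returntype="string" output="false">

	<cfargument name="text" required="yes" type="string">

	<!--- URLEncodedFormat() escapes everything except a-z, A-Z and 0-9, so un-escape the
	      four unreserved characters that RFC 3986 says to leave alone. The explicit "utf-8"
	      charset keeps the output consistent with the URLEncoder version below. --->
	<cfreturn replacelist(urlencodedformat(arguments.text, "utf-8"), "%2D,%2E,%5F,%7E", "-,.,_,~")>

</cffunction>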

Version 2:

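<cffunction name="rfc3986EncodedFormat2" returntype="string" output="false">

	<cfargument name="text" required="yes" type="string">

	<cfset var lc = structnew()>

	<!--- URLEncoder does HTML form encoding, which differs from RFC 3986 in three ways:
	      a space becomes "+" (a literal "+" is already escaped as %2B), "*" is left
	      unescaped, and "~" is escaped as %7E. The chained Java String.replace() calls fix each one. --->
	<cfset lc.objNet = createObject("java","java.net.URLEncoder")>
	<cfset lc.encodedText = lc.objNet.encode(arguments.text, 'utf-8').replace("+", "%20").replace("*", "%2A").replace("%7E", "~")>

	<cfreturn lc.encodedText>

</cffunction>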
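Outside ColdFusion, the same post-processing idea translates directly to plain Java. The sketch below mirrors rfc3986EncodedFormat2; the class and method names are hypothetical, and it assumes Java 10 or later for the Charset overload of URLEncoder.encode, which avoids the checked UnsupportedEncodingException thrown by the String-charset version.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

final class AmazonStyleEncoder {

    /**
     * Run URLEncoder (HTML form encoding), then patch the three places
     * where its output differs from RFC 3986.
     */
    static String rfc3986Encode(String text) {
        return URLEncoder.encode(text, StandardCharsets.UTF_8)
                .replace("+", "%20")   // spaces come back as '+'; a literal '+' is already %2B
                .replace("*", "%2A")   // URLEncoder leaves '*' alone, RFC 3986 encodes it
                .replace("%7E", "~");  // URLEncoder encodes '~', RFC 3986 leaves it alone
    }

    public static void main(String[] args) {
        System.out.println(rfc3986Encode("Title=Ace of Spades*~")); // Title%3DAce%20of%20Spades%2A~
    }
}
```

The replacements are safe to apply blindly: URLEncoder escapes a literal “+” as %2B and a literal “%” as %25, so any “+” left in its output must have been a space, and a “%7E” in the output can only have come from a “~” in the input.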

Comparing the two, there was no perceptible difference in performance for single calls, but over 1000 calls in a loop there was a clear winner: the URLEncoder version consistently took between 1000 and 2000 milliseconds, while the URLEncodedFormat() version took between 2000 and 4000 milliseconds. The URLEncoder function (Version 2) has therefore made it into release 1.1 of Amazonsig, available from amazonsig.riaforge.org.

As a happy side effect of this, Amazonsig also now encodes extended characters correctly, allowing values like Motörhead to be passed.
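The post doesn’t dig into why extended characters now work, but a plausible explanation is that Version 2 passes “utf-8” to URLEncoder explicitly, and a signature only matches if both parties percent-encode the same bytes. This small Java sketch (class and method names hypothetical, Java 10 or later assumed) shows how much the charset matters for a single character:

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

final class CharsetDemo {

    // o-umlaut (U+00F6) is the single byte 0xF6 in ISO-8859-1 ...
    static String latin1(String text) {
        return URLEncoder.encode(text, StandardCharsets.ISO_8859_1);
    }

    // ... but the two bytes 0xC3 0xB6 in UTF-8
    static String utf8(String text) {
        return URLEncoder.encode(text, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String band = "Mot\u00F6rhead"; // Unicode escape keeps this source file ASCII-only
        System.out.println(latin1(band)); // Mot%F6rhead
        System.out.println(utf8(band));   // Mot%C3%B6rhead
    }
}
```

Both outputs are valid percent-encoding, but they are different strings, so a signature computed over one will never match a signature computed over the other.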
