What Is URL (Uniform Resource Locator) and Percent Encoding?

Definition

URL encoding is an encoding format used in URLs. It allows arbitrary data to appear inside a Uniform Resource Identifier (a URI; typically a URL) while using only a narrow set of US-ASCII characters. The encoding exists because URLs and HTTP request parameters often contain characters or other data that fall outside that limited set (e.g. spaces, control characters, or non-ASCII text).

Reserved and unreserved characters

In general, a URI can contain characters that are either reserved or unreserved. Unreserved characters are characters that have no special meaning; they can be displayed as-is and require no special handling. These include uppercase and lowercase letters (A-Z, a-z), decimal digits (0-9), hyphen (-), period (.), underscore (_), and tilde (~).

Reserved characters, on the other hand, are characters that may delimit the URI into sub-components: characters such as / # & and others. The following is the list of all reserved characters: ! # $ & ' ( ) * + , / : ; = ? @ [ ].
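
The two character classes can be sketched as a small lookup. This is only an illustration built from the lists given above; the names UNRESERVED, RESERVED, and classify are hypothetical and not part of any standard API.

```javascript
// Unreserved characters as listed in the text: letters, digits, - . _ ~
const UNRESERVED = /^[A-Za-z0-9\-._~]$/;

// Reserved characters as listed in the text.
const RESERVED = new Set("!#$&'()*+,/:;=?@[]");

// Classify a single character (hypothetical helper).
function classify(ch) {
  if (UNRESERVED.test(ch)) return "unreserved";
  if (RESERVED.has(ch)) return "reserved";
  return "other"; // e.g. space, %, control characters, non-ASCII text
}

console.log(classify("a")); // unreserved
console.log(classify("#")); // reserved
console.log(classify(" ")); // other
```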

We cannot use reserved characters as-is, because this would create ambiguous URIs. For instance, consider the URL http://example.com/foo#bar. Does this URL point to an anchor #bar inside resource /foo, or does it point to a resource /foo#bar, that is, a resource whose name contains the character #? Without URL encoding it would be impossible to tell.

We resolve such ambiguities by encoding reserved characters when they are used as data and leaving them as-is when they are used as delimiters.

Percent encoding

To encode reserved characters, we use the percent-encoding scheme. In percent-encoding, each byte is encoded as a character triplet that consists of the percent character % followed by the two hexadecimal digits that represent the byte's numeric value. For instance, %23 is the percent-encoding for the binary octet 00100011, which in US-ASCII corresponds to the character #. Strictly speaking, the percent character (%) isn't reserved, but it serves as the indicator for percent-encoded bytes and therefore requires special handling. Simply put: when % appears as data, it must itself be percent-encoded (as %25).
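
The triplet rule can be written down in a few lines. This is a minimal sketch; percentEncodeByte is a hypothetical helper name, not a built-in function.

```javascript
// Turn one byte (0-255) into its percent-encoded triplet.
function percentEncodeByte(byte) {
  if (!Number.isInteger(byte) || byte < 0 || byte > 255) {
    throw new RangeError("expected an integer between 0 and 255");
  }
  // Two uppercase hexadecimal digits, zero-padded on the left.
  return "%" + byte.toString(16).toUpperCase().padStart(2, "0");
}

console.log(percentEncodeByte(0b00100011));        // %23 (the character #)
console.log(percentEncodeByte("%".charCodeAt(0))); // %25 (the percent character itself)
```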

So with percent-encoding, we know that URL http://example.com/foo#bar points to an anchor bar inside resource /foo while http://example.com/foo%23bar points to resource /foo#bar where character # is encoded as %23.
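
One way to observe the difference is to parse both URLs with the standard URL class available in browsers and Node.js. The helper splitResource below is a hypothetical name used only for this sketch.

```javascript
// Split a URL into its decoded path and its fragment (anchor).
function splitResource(urlString) {
  const url = new URL(urlString);
  return {
    path: decodeURIComponent(url.pathname), // %23 in the path becomes a literal #
    fragment: url.hash.slice(1),            // drop the leading # delimiter
  };
}

console.log(splitResource("http://example.com/foo#bar"));
// path "/foo", fragment "bar"
console.log(splitResource("http://example.com/foo%23bar"));
// path "/foo#bar", empty fragment
```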

Other characters

Percent encoding is also used to represent other characters; characters that are neither reserved nor unreserved. As an example, imagine a GET request containing a non-ASCII string parameter, such as a search query zajec in jež which is Slovenian for a rabbit and a hedgehog.

In such cases, we first encode the string as UTF-8 and then percent-encode every resulting byte that is not an unreserved character. So if we send a GET request to the DuckDuckGo search engine containing the search query zajec in jež, we generate the following URL: https://duckduckgo.com/?q=zajec%20in%20je%C5%BE
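
The two steps (UTF-8 first, then percent-encoding) can be spelled out by hand. This is a sketch rather than a replacement for the built-in encodeURIComponent; percentEncodeUtf8 is a hypothetical name, and TextEncoder is assumed to be available, as it is in current browsers and Node.js.

```javascript
// Encode text as UTF-8, then percent-encode every byte that is not unreserved.
function percentEncodeUtf8(text) {
  const bytes = new TextEncoder().encode(text); // step 1: UTF-8 bytes
  let out = "";
  for (const byte of bytes) {
    const ch = String.fromCharCode(byte);
    if (byte < 0x80 && /[A-Za-z0-9\-._~]/.test(ch)) {
      out += ch; // unreserved ASCII characters pass through unchanged
    } else {
      out += "%" + byte.toString(16).toUpperCase().padStart(2, "0"); // step 2
    }
  }
  return out;
}

console.log(percentEncodeUtf8("zajec in jež")); // zajec%20in%20je%C5%BE
```

For this input the result matches encodeURIComponent("zajec in jež"). The built-in function differs slightly in general, because it leaves a few reserved characters such as ! ' ( ) * unencoded.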

Encoding the `space` character

You may have seen cases where the space character was encoded as the + character. Percent-encoding, however, suggests it should be encoded as %20 (in US-ASCII, the space character is 20 hexadecimal or 32 decimal). So what is going on?

Such encodings are typically created by HTML forms. When a user submits an HTML form, the data is URL-encoded using an early variant of the URI percent-encoding rules that includes several modifications, such as replacing spaces with +.

Note, however, that using + instead of %20 is valid only when encoding application/x-www-form-urlencoded content, such as the query part of a URL. To make this clearer, consider the following cases.

http://www.example.com/search+script.php?search+query=search+term

In this URL, the requested resource is search+script.php (the plus character is part of the filename), while the parameter name is search query and its value is search term. In the name and the value of the query parameter the + sign is converted to a space, while in the name of the resource, search+script.php, the + sign remains.

http://www.example.com/search+script.php?search%20query=search%20term

This case is equivalent to the example above. The difference (using %20 instead of the + sign in the parameter name and value) is only superficial. Both URLs point to the same resource, search+script.php, and they contain the same parameters.

http://www.example.com/search%20script.php?search%20query=search%20term

This example, however, is different. Here the resource name contains an encoded space character, so the name of the requested resource is search script.php; the request parameter names and values remain the same as above. Consequently, this URL is different from those above.
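
The three cases above can be checked with the standard URL and URLSearchParams classes, which decode the query part with the form rules (+ becomes a space) but leave + in the path alone. The helper describe is a hypothetical name used only for this sketch.

```javascript
// Report the decoded resource name and query parameters of a URL.
function describe(urlString) {
  const url = new URL(urlString);
  return {
    // Path decoding: only %XX triplets are decoded, + stays a plus sign.
    resource: decodeURIComponent(url.pathname.slice(1)),
    // Query decoding: searchParams applies the form rules, so + means space.
    params: Object.fromEntries(url.searchParams),
  };
}

const first = describe("http://www.example.com/search+script.php?search+query=search+term");
const second = describe("http://www.example.com/search+script.php?search%20query=search%20term");
const third = describe("http://www.example.com/search%20script.php?search%20query=search%20term");

console.log(first.resource, first.params);   // search+script.php, "search query" = "search term"
console.log(second.resource, second.params); // same resource and parameters as the first case
console.log(third.resource, third.params);   // search script.php, same parameters

// Form-style serialization produces + for spaces; encodeURIComponent produces %20.
console.log(new URLSearchParams({ "search query": "search term" }).toString()); // search+query=search+term
console.log(encodeURIComponent("search query"));                                // search%20query
```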

A URL encoder

The application below performs URL encoding and decoding on arbitrary strings. Feel free to test it out (HTML).

Input <br>
<input type="text" name="input" id="input"><br><br>

Output <br>
<input type="text" name="encoded" id="encoded">

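<script>
// References to the two text fields; assigned once the DOM is ready.
let input = null;
let encoded = null;

document.addEventListener("DOMContentLoaded", () => {
	input = document.querySelector("#input");
	input.onkeyup = encode;
	encoded = document.querySelector("#encoded");
	encoded.onkeyup = decode;
});

// Percent-encode the text in the input field and show it in the output field.
function encode() {
	encoded.value = encodeURIComponent(input.value);
}

// Decode the text in the output field back into the input field.
function decode() {
	try {
		input.value = decodeURIComponent(encoded.value);
	} catch (error) {
		// decodeURIComponent throws a URIError on malformed input, such as a lone %.
		input.value = "Invalid URI string";
	}
}
</script>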

Glossary

HTTP

Hypertext Transfer Protocol. The protocol that web browsers use to request content from web servers.

Encoding

The act of converting information into a format suitable for transfer or storage.
