about
A ‘short’ or ‘hash’ ID is a seemingly random sets of characters, popularised by services which need unique but readable (and typeable) URLs. For example, YouTube use short IDs for their video URLs:
http://www.youtube.com/watch?v=IfeyUGZt8nk
I found myself needing to generate short IDs for URLs in a recent project. These were the requirements:
I wasn’t looking at generating millions of IDs per day, all I needed was something simple.
Since I was using node.js to generate these IDs, the first port of call was npm, to see if there were any projects which already existed. My search yielded two main results:
These libraries didn’t really fit the requirements, however:
Under the guise of the KISS principle, I set about creating my own generator. Turns out, for my use case, it was rather straightforward.
I started by using JavaScript’s Math.random()
to generate an 8-character ‘random’ ID, using a pre-defined alphabet.
var ALPHABET = '0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ';
var ID_LENGTH = 8;
var generate = function() {
var rtn = '';
for (var i = 0; i < ID_LENGTH; i++) {
rtn += ALPHABET.charAt(Math.floor(Math.random() * ALPHABET.length));
}
return rtn;
}
Math.random
creates a number between 0
(inclusive) and 1
(exclusive), Math.random() * alphabet.length
creates a number anywhere between 0
and just less than alphabet.length
.Math.floor()
takes the floating point number and returns the nearest integer, thus ending up with an integer between 0
and alphabet.length - 1
(in this case, 62
).alphabet.charAt(index)
returns the character in the string alphabet
at the given index.This was a good start, and satisfied requirement 1) - it creates an 8 character ID which is easy to type.
Requirement 2) specifies it should be easy to speak over the phone. This will be useful in support situations - e.g. if a user needs assistance, they should be able to speak the ID to an operator without too many difficulties.
The first thing to trim is the lowercase/uppercase, which could cause confusion. IDs look nicer with lowercase, so the uppercase was removed.
var ALPHABET = '0123456789abcdefghijklmnopqrstuvwxyz';
Next up, numbers that look like letters. This is another source of confusion, and so the numbers 0
and 1
and the letters o
and l
were removed.
var ALPHABET = '23456789abcdefghijkmnpqrstuvwxyz';
Lastly, rude words, aka requirement 3). It could be quite embarrassing to generate an ID with a rude word. Sticking to English profanities, it turns out that most rude words are formed from the majority of the following characters: cfhistu
. These characters were also removed.
var ALPHABET = '23456789abdegjkmnpqrvwxyz';
Our final alphabet contains 25 characters.
Uniqueness is a difficult goal to achieve. I was quite relaxed about requirement 4) - my ID generator would not need to generate millions of IDs per day.
Since we are generating an 8-character ID, the theoretical probability of a clash using the alphabet above of 25 chars is 25^8 = 152,587,890,625 - over 152 billion to 1.
However, since Math.random()
is a pseudorandom generator, which means it is not guaranteed to always produce random numbers, the probability will drop slightly. This being said, it is good enough for my needs.
As a failsafe, a function can be added to check that the generated ID is indeed unique:
var UNIQUE_RETRIES = 9999;
var generateUnique = function(previous) {
previous = previous || [];
var retries = 0;
var id;
// Try to generate a unique ID,
// i.e. one that isn't in the previous.
while(!id && retries < UNIQUE_RETRIES) {
id = generate();
if(previous.indexOf(id) !== -1) {
id = null;
retries++;
}
}
return id;
};
For this to work, you should pass in an array of previously generated IDs.
If the list of IDs was stored in a database, you could generate the ID and then check for its existence in the DB before continuing.
Generating your own ‘short’ ID needn’t be complicated. Depending on your requirements, a simple solution can often suffice.
The full code for this generator can be found here.