Base36 accidental obscenity

Base36 encoding is often used in URL shorteners and similar applications to shorten numeric identifiers: the number is written in base 36, so the result consists of digits and letters of the Latin alphabet. But did you ever think that such an encoding could produce an accidental obscenity?
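To make the mechanics concrete, here is a minimal sketch of base36 encoding in Python. The function name `base36_encode` and the harmless sample word "oops" are my own choices for illustration; any lowercase word made of letters and digits is the base36 form of some integer.

```python
import string

# The 36 symbols used by base36: digits 0-9 followed by letters a-z.
ALPHABET = string.digits + string.ascii_lowercase


def base36_encode(number: int) -> str:
    """Encode a non-negative integer as a lowercase base36 string."""
    if number < 0:
        raise ValueError("only non-negative integers are supported")
    if number == 0:
        return ALPHABET[0]
    digits = []
    while number:
        # Peel off the least significant base36 digit on each pass.
        number, remainder = divmod(number, 36)
        digits.append(ALPHABET[remainder])
    return "".join(reversed(digits))


# The built-in int(text, 36) is the inverse, so every word has a numeric ID
# that encodes back to it.
word_id = int("oops", 36)
print(word_id, "->", base36_encode(word_id))
```

Since sequential identifiers eventually pass through every such integer, any short word will show up sooner or later.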

I did not. But recently I stumbled upon exactly this question on Russian Stack Overflow: "Is it possible for the base36 converted number to return an obscene word?"

Of course it's possible, so the real question is whether anybody has done anything to prevent obscenities from appearing in the encoded values.

I searched the Internet and found that it is indeed a problem for the anxious developer. There are several topics on Stack Overflow, for example "How can I filter out profanity in base36 IDs?", and the problem is also mentioned in "What are the options for generating user friendly alpha numeric IDs (like business id, SKU)".

There is also a base32 proposal which is intended to exclude some characters that can be ambiguous or produce an obscenity.
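As an illustration of the idea, here is a small sketch of encoding with a reduced alphabet. I am assuming the alphabet of Crockford's Base32, which drops the letters I, L, O and U; the proposal referred to above may differ in details, and the function name `encode_with_alphabet` is hypothetical.

```python
# Assumption: Crockford-style Base32 alphabet, without I, L, O and U.
# I, L and O are easily confused with 1 and 0; U is dropped to make
# accidental obscenities less likely.
CROCKFORD_ALPHABET = "0123456789ABCDEFGHJKMNPQRSTVWXYZ"


def encode_with_alphabet(number: int, alphabet: str = CROCKFORD_ALPHABET) -> str:
    """Encode a non-negative integer using the given alphabet as digits."""
    if number < 0:
        raise ValueError("only non-negative integers are supported")
    base = len(alphabet)
    if number == 0:
        return alphabet[0]
    digits = []
    while number:
        number, remainder = divmod(number, base)
        digits.append(alphabet[remainder])
    return "".join(reversed(digits))


print(encode_with_alphabet(1234567))
```

Removing a few letters only shrinks the set of words that can appear; plenty of offensive words can still be spelled with the remaining characters.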

The Hashids library also has an algorithm that tries to avoid producing obscenities.

But of course it's impossible to keep every word that could be considered offensive from appearing unless every newly created hash is first checked against a precompiled list of bad words.
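Such a check could look roughly like the sketch below. Everything here is an assumption for illustration: the `BLOCKLIST` contents are harmless placeholders, the helper names are hypothetical, and skipping to the next number is just one possible policy (it leaves gaps in the ID sequence).

```python
import string

ALPHABET = string.digits + string.ascii_lowercase

# Placeholder words; a real list would be much longer and language-specific.
BLOCKLIST = {"bad", "evil"}


def base36_encode(number: int) -> str:
    """Encode a non-negative integer as a lowercase base36 string."""
    if number == 0:
        return ALPHABET[0]
    digits = []
    while number:
        number, remainder = divmod(number, 36)
        digits.append(ALPHABET[remainder])
    return "".join(reversed(digits))


def is_clean(code: str, blocklist=BLOCKLIST) -> bool:
    """Return True if no blocklisted word occurs anywhere inside the code."""
    lowered = code.lower()
    return not any(word in lowered for word in blocklist)


def next_clean_id(start: int) -> tuple[int, str]:
    """Return the first number >= start whose base36 form passes the check."""
    number = start
    while not is_clean(base36_encode(number)):
        number += 1
    return number, base36_encode(number)


# The number that would encode to "bad" is skipped in favour of the next one.
print(next_clean_id(int("bad", 36)))
```

The substring test is deliberately strict, so it also rejects codes that merely contain a listed word; whether that is acceptable depends on how large the blocklist is and how short the IDs are.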

