Be careful when using random values

We really love using random values as identifiers. They make it hard for anyone to figure out the identifier of the next item, and they are simple to generate.

But what happens when you have used about half of the distinct values your identifier function can generate?

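Collisions, of course. In fact, they show up much earlier than that: by the birthday paradox, with N equally likely values, the chance of a repeat passes 50% after only about 1.18·√N identifiers. There are several ways to deal with this: expire identifiers after a short time, feed additional inputs into the identifier function, change the generation algorithm itself, and so on. Let's look at the issue I ran into.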

Suppose we have this code:

```python
import random

def _make_new_identifier():
    # Pick 20 distinct letters (no letter repeats) in random order and join them.
    return "".join(random.sample("abcdefghijklmnopqrstuvwxyz", 20))
```

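According to the Python documentation, random.sample returns a list of the requested length made of unique elements from the given sequence, in selection order. Since "".join keeps that order, each identifier is an ordered arrangement of 20 distinct letters (a 20-permutation), not a combination. That gives 26!/6! ≈ 5.6 × 10^23 possible identifiers; only if order were ignored would it shrink to C(26, 20) = 230,230. Even the larger number is finite, though, and by the birthday paradox a collision becomes more likely than not after about 8.8 × 10^11 identifiers.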
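These numbers can be checked with the standard library: math.perm counts ordered arrangements (which is what "".join(random.sample(...)) produces) and math.comb counts unordered sets; both exist since Python 3.8. The 50% threshold uses the usual birthday approximation n ≈ √(2N·ln 2). The helper name ids_for_even_odds is just for illustration.

```python
import math

ALPHABET = "abcdefghijklmnopqrstuvwxyz"
LENGTH = 20

# random.sample keeps selection order, so "".join(...) yields ordered
# arrangements of distinct letters: k-permutations, not combinations.
ordered = math.perm(len(ALPHABET), LENGTH)    # 26! / 6!
unordered = math.comb(len(ALPHABET), LENGTH)  # C(26, 20), if order were ignored

def ids_for_even_odds(space_size: int) -> float:
    """Approximate number of uniform random IDs after which a collision
    becomes more likely than not (birthday bound): sqrt(2 * N * ln 2)."""
    return math.sqrt(2 * space_size * math.log(2))

print(f"ordered identifiers:   {ordered:.3e}")
print(f"unordered (sets only): {unordered}")
print(f"50% collision chance after ~{ids_for_even_odds(ordered):.2e} identifiers")
```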

What if we add digits to the alphabet? That multiplies the number of possible identifiers by about 31,700. Is that enough for the whole population of Earth? To be sure that nobody ever gets someone else's identifier, we would need an infinitely large space, which is impossible. Sooner or later, a large enough number of users will make any randomized generator produce a collision, even with a huge alphabet and with repeated characters allowed.
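A short sketch of the effect of adding digits, and of why no finite space is ever collision-proof. It uses the standard birthday approximation p ≈ 1 − exp(−n(n−1)/(2N)); the helper names are illustrative, 8 billion is only a round stand-in for Earth's population, and the last part simulates a deliberately tiny space with a fixed seed.

```python
import math
import random

def space_size(alphabet_size: int, length: int = 20) -> int:
    # Number of distinct strings "".join(random.sample(alphabet, length)) can produce.
    return math.perm(alphabet_size, length)

def collision_probability(n: int, space: int) -> float:
    # Birthday approximation for n uniform random IDs drawn from `space` values.
    # expm1 keeps precision when the probability is extremely small.
    return -math.expm1(-n * (n - 1) / (2 * space))

def first_collision(alphabet: str, length: int, rng: random.Random) -> int:
    # Draw IDs until one repeats; return how many were drawn in total.
    seen = set()
    while True:
        new_id = "".join(rng.sample(alphabet, length))
        if new_id in seen:
            return len(seen) + 1
        seen.add(new_id)

letters, letters_digits = space_size(26), space_size(36)
print(f"growth from adding digits: x{letters_digits / letters:,.0f}")

for n in (10**6, 8 * 10**9):
    p = collision_probability(n, letters_digits)
    print(f"{n:>13,} IDs -> collision probability ~{p:.1e}")  # tiny, but not zero

# A tiny space: P(6, 3) = 120 possible IDs, so a repeat comes quickly.
print(first_collision("abcdef", 3, random.Random(42)))
```

The probabilities for the big alphabet are tiny, but they only reach zero when there is at most one identifier; the tiny space shows the same effect at a scale where you can watch it happen.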

So what can we do? One good option is not to use a random value as the main identifier. The best idea I have is to combine a random value with an incrementing number: the number protects against collisions, and the random part keeps the next identifier from being predictable.
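A minimal sketch of that idea, assuming a single process owns the counter (a real service would more likely use a database sequence). It uses Python's secrets module rather than random, because the random module's documentation says its generators should not be used for security purposes, and unpredictability is the whole point of the random part. The class name, ID format and parameters are hypothetical.

```python
import itertools
import secrets
import threading

class CounterPlusRandomIds:
    """Hypothetical generator producing IDs shaped like '<counter>-<random hex>'.

    The counter makes every ID unique within this generator; the random
    token makes the next ID hard to guess from the previous ones.
    """

    def __init__(self, start: int = 1, token_bytes: int = 8):
        self._counter = itertools.count(start)
        self._lock = threading.Lock()  # keep counter increments atomic across threads
        self._token_bytes = token_bytes

    def new_id(self) -> str:
        with self._lock:
            number = next(self._counter)
        # token_hex(n) returns 2 * n hex characters from a cryptographically strong source.
        return f"{number}-{secrets.token_hex(self._token_bytes)}"

ids = CounterPlusRandomIds()
print(ids.new_id())  # "1-" followed by 16 random hex characters
print(ids.new_id())  # "2-" followed by 16 random hex characters
```

Because uniqueness comes from the counter, the random part only has to be long enough to resist guessing, not to avoid collisions. If several processes issue IDs, the counter must come from a shared source.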

That's all, folks. Be careful with randomness 😄
