Generating IDs like YouTube or Bit.ly using PHP

Let’s suppose you want to develop your own URL shortener, like Bit.ly for example.

You can, of course, use the ID as an integer, like 1, 2, 3, etc. If you have 12,345 rows in your database table, you will need 5 digits, like http://example.com/12345. Large applications like YouTube have many more entries, so with plain numbers the URL would be very long, like http://youtube.com/watch?v=231268318276783.

Because of that, websites like YouTube, t.co, bitly.com or even vine.co use a generated ID made of uppercase letters, lowercase letters, digits and sometimes underscore (_) and hyphen (-). You can check that in any YouTube video URL, where you will find something like http://www.youtube.com/watch?v=2Z4m4lnjxkY. You can see they’re using “2Z4m4lnjxkY” as the ID.

The Math – Base 10

You have the ID in pure digits (base 10, or decimal), where you have 10 options for each “position”: 0, 1, 2, 3, 4, 5, 6, 7, 8 and 9. So, with 5 positions you can theoretically represent 10*10*10*10*10 (100,000) IDs. Of course you won’t represent the number 45 as 00045, or even have the 00000 ID, but they are mathematical options. 😀

Base 32

So, to increase the number of options per position you can add more characters, like lowercase letters. Basically you can use a base32 converter. Base32 converts a decimal number to a string using digits and the lowercase letters “a” through “v”, like u63j8d (base36 would use the whole alphabet, “a” through “z”):

<?php
// Converting the number 328743826 from base 10 to base 32
// (base_convert() expects the number as a string)
echo base_convert('328743826', 10, 32);

This produces the result “9pgesi”. Cool, that takes us from 9 characters down to 6. But we can do better.
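To make the saving concrete, here is a small sketch that compares the length of the same ID in bases 10, 32 and 36. It only uses PHP’s built-in base_convert() and strlen(); the variable names are just for illustration. Keep in mind that the PHP manual warns that base_convert() may lose precision on very large numbers.

```php
<?php
// base_convert() takes the number as a string and supports bases 2 through 36.
$id = '328743826';

$base32 = base_convert($id, 10, 32); // digits 0-9 plus letters a-v
$base36 = base_convert($id, 10, 36); // digits 0-9 plus letters a-z

echo strlen($id), " characters in base 10: $id\n";
echo strlen($base32), " characters in base 32: $base32\n";
echo strlen($base36), " characters in base 36: $base36\n";
```

For this particular number both base 32 and base 36 need 6 characters, so the larger alphabet only pays off as the IDs grow.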

Base 62

With base36 you have all digits and all lowercase letters. With base62 we also have uppercase letters, generating IDs like “8H9j8sD79”. So a base62 converter gives us many more options than base32 or base36. For example, with 5 positions we can have 62*62*62*62*62 = 916,132,832 options.
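If you want to check these counts yourself, the following sketch prints how many different strings fit in 5 positions for a few bases. The list of bases and the variable names are my own choice for illustration.

```php
<?php
// Number of distinct strings with exactly $length positions, per base.
$length = 5;
$counts = [];

foreach ([10, 32, 36, 62] as $base) {
    // base ** length, e.g. 62 * 62 * 62 * 62 * 62 for base 62
    $counts[$base] = $base ** $length;
    echo "base $base: ", number_format($counts[$base]), " options\n";
}
```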

With PHP you cannot use the base_convert function for this, because it only works from base 2 to base 36 and we need 62. Fortunately, there is already-written code that does that.

This class was sent to me by Taylor Otwell, the creator of the Laravel framework. It works very well and solves the problem of generating base62 strings. You can get the code here!
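If you just want to see the idea behind such a converter, here is a minimal sketch. It is not the Math class mentioned above: the function names toBase62() and fromBase62() and the BASE62_ALPHABET constant are hypothetical, and the alphabet order (digits, then lowercase, then uppercase) is an assumption, chosen so that 23435 encodes to “65Z”. It also assumes the values fit in a regular PHP integer.

```php
<?php
// Hypothetical helpers for illustration only (requires PHP 7 or newer).
const BASE62_ALPHABET = '0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ';

function toBase62(int $number): string
{
    if ($number < 0) {
        throw new InvalidArgumentException('Only non-negative integers are supported.');
    }
    if ($number === 0) {
        return '0';
    }

    $result = '';
    while ($number > 0) {
        // The remainder picks the next character, starting from the rightmost one.
        $result = BASE62_ALPHABET[$number % 62] . $result;
        $number = intdiv($number, 62);
    }

    return $result;
}

function fromBase62(string $encoded): int
{
    if ($encoded === '') {
        throw new InvalidArgumentException('The encoded string must not be empty.');
    }

    $number = 0;
    foreach (str_split($encoded) as $char) {
        $value = strpos(BASE62_ALPHABET, $char);
        if ($value === false) {
            throw new InvalidArgumentException("Invalid base62 character: $char");
        }
        // Shift the digits collected so far one position left, then add the new one.
        $number = $number * 62 + $value;
    }

    return $number;
}

echo toBase62(23435), "\n";   // 65Z
echo fromBase62('65Z'), "\n"; // 23435
```

Changing the order of the characters in the alphabet changes the generated strings, so encoder and decoder must always share the same alphabet.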

Example

This is how you can use this class to generate your base62 strings:

<?php 
$my_id = 23435; // get the ID from MySQL for example;
$base62 = Math::to_base($my_id, 62);

To get the decimal ID from the base62 string you have to do:

<?php
$base62 = 'b6H8Jk2';
$decimal = Math::to_base_10($base62, 62);

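You can add a new column to your database table called, for example, “base62”. Every time you insert a new item you get the ID, generate the base62 string and save it. The column is optional: because the conversion is reversible, you can also decode the string back to the integer ID whenever you need it.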
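The insert flow could look roughly like the sketch below. It is only an illustration under a few assumptions: it uses PDO with an in-memory SQLite database (so the pdo_sqlite extension is needed) instead of MySQL, the table name “links” and its columns are made up, and shortCode() is a hypothetical stand-in for whatever base62 converter you use.

```php
<?php
// Hypothetical stand-in for a base62 converter (digits, lowercase, uppercase).
function shortCode(int $id): string
{
    $alphabet = '0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ';
    $code = '';
    do {
        $code = $alphabet[$id % 62] . $code;
        $id = intdiv($id, 62);
    } while ($id > 0);

    return $code;
}

// In-memory SQLite keeps the example self-contained; table and columns are made up.
$pdo = new PDO('sqlite::memory:');
$pdo->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);
$pdo->exec('CREATE TABLE links (id INTEGER PRIMARY KEY AUTOINCREMENT, url TEXT NOT NULL, base62 TEXT UNIQUE)');

// 1. Insert the row and let the database assign the integer ID.
$insert = $pdo->prepare('INSERT INTO links (url) VALUES (:url)');
$insert->execute([':url' => 'http://example.com/a/very/long/path']);
$id = (int) $pdo->lastInsertId();

// 2. Generate the base62 string from that ID and save it on the same row.
$update = $pdo->prepare('UPDATE links SET base62 = :code WHERE id = :id');
$update->execute([':code' => shortCode($id), ':id' => $id]);

// 3. Later, find the row again using only the short string.
$select = $pdo->prepare('SELECT url FROM links WHERE base62 = :code');
$select->execute([':code' => shortCode($id)]);
$url = $select->fetchColumn();

echo shortCode($id), ' => ', $url, "\n";
```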

Be happy!

Published by

Junior Grossi

senior software engineer & stutterer conference speaker. happy husband & dad. maintains Corcel PHP, elePHPant.me and PHPMG. Engineering Manager @ Paddle

22 thoughts on “Generating IDs like YouTube or Bit.ly using PHP”

  1. Same problem here: the results of the class are too short. For example, when I input the int ‘247’, the resulting string is only ‘3Z’.

  2. Hi Junior,

    Thanks for the insightful post – it put me on the right track for an approach to a problem I had to solve.

    One note though – I think you meant Base36 instead of Base32. Base36 would include digits and all lowercase letters ‘a’ thru ‘z’, while Base32 only includes digits and lowercase letters ‘a’ thru ‘v’.

  3. Excellent article, thanks! It’s a shame the math class isn’t a package that could be added to multiple projects using Composer. It would save cutting and pasting it everywhere.

    I’d be happy to host it on my GitHub account but it’s Taylor Otwell’s baby 🙂

    1. Hi @stevergill! First, thank you for the comment! It’s nice to read that the post was useful for you.

      About your question, you can use the HashIDs project https://github.com/ivanakim… to get the same logic, but using Composer. HashIDs is a project available in many languages, including PHP. Maybe it can help you.

      See you. Cheers.

    1. Hi @James!

      Thanks for the comment. The main goal of using these IDs is to reduce the string length. Using base64_encode, for example, the ID 123 becomes the string MTIz, which is longer than the integer itself.

      []s

  4. Hi,
    You don’t need to add a “base62_slug” column in your database, as you gave the code to convert from base B to base 10 😉

    And I think it could be better to generate a random integer and store it (so that no one can get the item they want by forging an ID)

    - PunKeel

    1. Hi PunKeel!

      You are right! Your object does not need that column on the database for storage 😉
      And about forging an ID you can modify the order of your elements (like letters, numbers) to make a custom algorithm and generate “custom hashes”.

      Thank you for the contribution!

      Regards!

    2. When you have to add millions of entries per day into a database you might not want to use random integers… as a base. That would end in many entries with the same id.

      1. Hi Debugger!

        The goal here is to use the generated integer (auto increment) to generate the string-like ID.

        Thank you. []s

  5. Hey there…cool post, but your example doesn’t seem to work. Running 23435 through the to_base function produces ’65Z’, which is not very useful as an id. Do you not get the same results when using the ‘23435’ ?

    1. Hi Josh!

      The decimal 23435 in base 62 is “65Z”. This is right. The ids can be a combination (logic) of any uppercase or lowercase letter and decimal numbers, decreasing the number of chars you must use to represent a decimal number.

      Any questions just ask!

      Best regards.
