Let’s suppose you want to develop your own URL shortener, like Bit.ly for example.
You can, of course, use an integer as the ID, like 1, 2, 3, etc. If you have 12,345 rows in your database table, you will need 5 digits, like http://example.com/12345. Large applications like YouTube have many more entries, so with plain numbers the URL would be very long, like http://youtube.com/watch?v=231268318276783.
Because of that, websites like YouTube, t.co, bitly.com or even vine.co use a generated ID made of uppercase letters, lowercase letters, digits and sometimes underscore (_) and hyphen (-). You can check this in any YouTube video URL, where you will find something like http://www.youtube.com/watch?v=2Z4m4lnjxkY. Here they’re using “2Z4m4lnjxkY” as the ID.
The Math – Base 10
You have the ID in pure digits (base 10, or decimal), where you have 10 options for each “position”: 0, 1, 2, 3, 4, 5, 6, 7, 8 and 9. So, with 5 positions you can theoretically represent 10*10*10*10*10 (100,000) IDs. Of course you won’t represent the number 45 as 00045, or even have the ID 00000, but they are mathematically valid options.
Base 32
So, to increase the number of options you can add more characters, like lowercase letters. Basically you can use a base32 converter. Base32 converts a decimal number to a string using the digits and the lowercase letters a to v, like u63j8d:
<?php
// Converting the number 328743826 from base 10 to base 32
// (base_convert expects the number as a string)
echo base_convert('328743826', 10, 32);
This will produce the result “9pgesi”. Cool, we reduced the number of characters a lot. But we can do better.
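As a quick check of the conversion above, here is a small sketch that converts the number to base 32 and back again with base_convert. The variable names are only illustrative.

```php
<?php
// base_convert() takes the number as a string and returns a string.
$encoded = base_convert('328743826', 10, 32);
$decoded = base_convert($encoded, 32, 10);

echo $encoded, PHP_EOL; // 9pgesi
echo $decoded, PHP_EOL; // 328743826
```

The round trip matters for a URL shortener: the short string in the URL has to map back to exactly one database ID.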
Base 62
With base32 you have all the digits and the lowercase letters a to v (base36 would add w to z). With base62 we also have uppercase letters, generating IDs like “8H9j8sD79”. So a base62 converter gives us many more options than base32. For example, with 5 positions we can have 62*62*62*62*62 = 916,132,832 options.
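To make the comparison concrete, the sketch below counts how many IDs fit in 5 positions for a few bases. The helper name id_capacity is hypothetical and exists only for this illustration.

```php
<?php
// Hypothetical helper: number of distinct IDs that fit in $length positions.
// Only meant for small values that fit in a PHP integer.
function id_capacity(int $base, int $length): int
{
    return $base ** $length;
}

foreach ([10, 32, 36, 62] as $base) {
    printf(
        "base %d, 5 positions: %s IDs\n",
        $base,
        number_format(id_capacity($base, 5))
    );
}
```

With 5 positions this gives 100,000 IDs for base 10, 33,554,432 for base 32, 60,466,176 for base 36 and 916,132,832 for base 62.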
With PHP you cannot use the base_convert function for this, because it only works from base 2 to base 36 and we need 62. Fortunately, there is already code available to do that.
This class was sent to me by Taylor Otwell, the creator of the Laravel Framework. It works very well and solves the problem of generating base62 strings. You can get the code here!
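If you just want to see the idea, here is a minimal stand-alone sketch of a base62 conversion. It is not Taylor’s class: the function names base62_encode and base62_decode are hypothetical, and the alphabet order (digits, then lowercase, then uppercase) is an assumption. It only handles non-negative numbers that fit in a PHP integer.

```php
<?php
// Assumed alphabet order: digits, lowercase letters, uppercase letters.
const BASE62_ALPHABET = '0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ';

// Hypothetical helper: non-negative integer -> base62 string.
function base62_encode(int $number): string
{
    if ($number < 0) {
        throw new InvalidArgumentException('Only non-negative integers are supported.');
    }
    $alphabet = BASE62_ALPHABET;
    $result = '';
    do {
        // Take the lowest base62 digit and prepend its character.
        $result = $alphabet[$number % 62] . $result;
        $number = intdiv($number, 62);
    } while ($number > 0);

    return $result;
}

// Hypothetical helper: base62 string -> integer.
function base62_decode(string $encoded): int
{
    if ($encoded === '') {
        throw new InvalidArgumentException('Empty string cannot be decoded.');
    }
    $number = 0;
    foreach (str_split($encoded) as $char) {
        $digit = strpos(BASE62_ALPHABET, $char);
        if ($digit === false) {
            throw new InvalidArgumentException("Invalid base62 character: $char");
        }
        $number = $number * 62 + $digit;
    }

    return $number;
}

echo base62_encode(23435), PHP_EOL; // 65Z
echo base62_decode('65Z'), PHP_EOL; // 23435
```

Small numbers give short strings, which is expected: 23435 is 6*62*62 + 5*62 + 61, so it only needs three characters.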
Example
This is how you can use this class to generate your base62 strings:
<?php
$my_id = 23435; // get the ID from MySQL, for example
$base62 = Math::to_base($my_id, 62);
To get the decimal ID from the base62 string you have to do:
<?php
$base62 = 'b6H8Jk2';
$decimal = Math::to_base_10($base62, 62);
You can add a new column to your database table called, for example, “base62”. Then every time you insert a new item you get the ID, generate the base62 string and save it.
Be happy!
Same problem here: the result from the class is too short. For example, when I input the integer ‘247’, the resulting string is only ‘3Z’.
Yes, I agree. This is an old solution. IMHO, I would go for UUIDs in this case. They’re longer, but unique by context.
Can I encode an md5 string into short IDs?
And how do I decode it back when needed?
In this case I don’t think it’s a good idea. I think you are looking for UUIDs, aren’t you? https://github.com/ramsey/uuid
Hi Junior,
Thanks for the insightful post – it put me on the right track for an approach to a problem I had to solve.
One note though – I think you meant Base36 instead of Base32. Base36 would include digits and all lowercase letters ‘a’ thru ‘z’, while Base32 only includes digits and lowercase letters ‘a’ thru ‘v’.
Excellent article, thanks! It’s a shame the Math class isn’t a package that could be added to multiple projects using Composer. It would save copying and pasting it everywhere.
I’d be happy to host it on my GitHub account but it’s Taylor Otwell’s baby.
Hi @stevergill! First, thank you for the comment! It’s nice to read that the post was useful for you.
About your question, you can use the HashIDs project https://github.com/ivanakim… to get the same logic, but using Composer. HashIDs is a project available in many languages, including PHP. Maybe it can help you.
See you. Cheers.
Many thanks for the link – very useful!
Thanks! Welcome.
Out of interest, why wasn’t PHP’s base64 encoder used? That’s been available since PHP 4: http://php.net/manual/en/fu…
Hi @James!
Thanks for the comment. The main goal of using these IDs is to reduce the string length. Using base64_encode, for example, the ID 123 becomes the string MTIz, which is longer than the integer itself.
[]s
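A small sketch of the length comparison described in this reply. base64_encode works on the characters of the ID, not on its numeric value, so the result is always longer than the decimal digits. The variable names are illustrative.

```php
<?php
$id = 123;

// Encodes the three characters "1", "2", "3".
$asBase64 = base64_encode((string) $id);

// With a trailing line break in the input the result grows further.
$withLineBreak = base64_encode("123\r\n");

printf("%d -> %s (%d chars)\n", $id, $asBase64, strlen($asBase64));
printf("with CRLF -> %s (%d chars)\n", $withLineBreak, strlen($withLineBreak));
```

Base64 turns every 3 input bytes into 4 output characters, so it can never shorten an ID.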
I found a copy of the Math library — it seems to work
https://raw.githubusercontent.com/adamgoose/pastes/master/app/libraries/Math.php
Thanks Brad! I asked Taylor but got no answer. I created a gist with the content you sent me:
https://gist.github.com/jgr…
Thanks again!
The link for the Math class doesn’t work. Can you please update it?
Thank you.
Hi Wesam. I’ve contacted Taylor to get the Math class again. I will get back to you soon. Regards
Hi Wesam! Sorry for the late answer! @Brad sent me a copy of the Math class and I created a gist with that content. I’m updating the post now. https://gist.github.com/jgr…
Thanks.
Hi,
You don’t need to add a “base62_slug” column in your database, as you gave the code to convert from base B to base 10.
And I think it could be better to generate a random integer and store it (so that no one can get the item they want by forging an ID).
- PunKeel
Hi PunKeel!
You are right! Your object does not need that column in the database for storage.
As for forging an ID, you can change the order of the elements in your alphabet (letters, numbers) to make a custom algorithm and generate “custom hashes”.
Thank you for the contribution!
Regards!
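The reordering idea from this reply could look like the sketch below. The helper names encode_with_alphabet and decode_with_alphabet are hypothetical, and the reversed alphabet is just an example ordering. Note that reordering only makes IDs harder to guess at a glance; it is obfuscation, not real protection.

```php
<?php
// Hypothetical helper: encode using any alphabet of unique characters.
// The base is the alphabet length.
function encode_with_alphabet(int $number, string $alphabet): string
{
    if ($number < 0) {
        throw new InvalidArgumentException('Only non-negative integers are supported.');
    }
    $base = strlen($alphabet);
    $result = '';
    do {
        $result = $alphabet[$number % $base] . $result;
        $number = intdiv($number, $base);
    } while ($number > 0);

    return $result;
}

// Hypothetical helper: decode a string produced with the same alphabet.
function decode_with_alphabet(string $encoded, string $alphabet): int
{
    if ($encoded === '') {
        throw new InvalidArgumentException('Empty string cannot be decoded.');
    }
    $base = strlen($alphabet);
    $number = 0;
    foreach (str_split($encoded) as $char) {
        $digit = strpos($alphabet, $char);
        if ($digit === false) {
            throw new InvalidArgumentException("Character not in alphabet: $char");
        }
        $number = $number * $base + $digit;
    }

    return $number;
}

$standard = '0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ';
// Example custom ordering (an assumption for illustration): reversed alphabet.
$custom = strrev($standard);

echo encode_with_alphabet(23435, $standard), PHP_EOL; // 65Z
echo encode_with_alphabet(23435, $custom), PHP_EOL;   // TU0
echo decode_with_alphabet('TU0', $custom), PHP_EOL;   // 23435
```

The same alphabet must be used for encoding and decoding, so keep it fixed once IDs have been published.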
When you have to add millions of entries per day into a database you might not want to use random integers… as a base. That would end in many entries with the same id.
Hi Debugger!
The goal here is to use the generated integer (auto increment) to produce the string-like ID.
Thank you. []s
Hey there… cool post, but your example doesn’t seem to work. Running 23435 through the to_base function produces ‘65Z’, which is not very useful as an ID. Do you not get the same result when using ‘23435’?
Hi Josh!
The decimal 23435 in base 62 is “65Z”. This is right. The ids can be a combination (logic) of any uppercase or lowercase letter and decimal numbers, decreasing the number of chars you must use to represent a decimal number.
Any questions just ask!
Best regards.