Fetching random rows of MySQL efficiently

Saturday, June 7, 2008

After having a discussion with a friend about how inefficient the usual way of fetching random table rows is, I came to the result that he really had the better way. Hence I will present you this solution here. First let's see, what's the problem about the normal way is.

Usually, you would try to get random rows with something like the following query:
SELECT `column` FROM `table` ORDER BY RAND() LIMIT 5
So, you would get 5 random rows from the table and would be happy; but the database wouldn't. It had to assign a random number to every row in the table, to just fetch 5 of them. Well, while speaking about a table with about one hundred entries, this may be fine, but what about a table with one hundred thousand entries? There you will notice a huge performance slowdown.

So now you may ask, if there's a better solution; and yes, there is. It is quite simple, after you understood the principle. I assume at this point, that you have an auto-increment primary key, which should be the default for most tables. For the entire procedure you need three queries (when working with MySQL only; with a scripting language, you could do the second step on client-side, which I also suggest). The first one will determine the highest primary key in the table:
SELECT @max_id := MAX(`id`) FROM `table`;
Now we have the highest ID and can do some tricks on the client side. Let's first try to get a single random row. For this, we need a random number between 1 and the highest ID:
@rand_id := FLOOR(1 + (RAND() * (@max_id - 1)));
Now we will select a random row with this ID. as the row with this the generated random ID could be deleted yet, we have to some some trick:
SELECT `column` FROM `table` WHERE `id` >= @rand_id ORDER BY `id` ASC LIMIT 1
So, now we have a single random column selected. But what, if we want more columns like in our beginning example? Just raising the limit wouldn't be a good idea, since we would get the four following rows. For this result, we have to digg a bit deeper. There is sadly no perfect way for it yet, and it requires some client-side programming.

After fetching the highest ID, you generate ten times so many random IDs as you want to fetch from the table, and concetinate them with ",". So we can want to get 5 rows, you generate 50 random IDs. Now you can try the following query:
SELECT `column` FROM `table` WHERE `id` IN ($randIds) LIMIT 5
After fetching the result, you check, if you got the expected number of rows. If it doesn't match (which should very less the case), you can still fallback to the classic method with "ORDER BY RAND()".

Update 1
I have created a benchmark, to show you how huge the difference between those two methods is, click on the image to get it in full size:



Update 2
As many people don't know, I've also written a new article on devzone which gives a better random results, and also works well with multiple results.

Comments to this article

  • Avatar of Toppy Reply Toppy Wednesday, June 11, 2008 12:53 PM

    Nice article, short, clear and informative. I had no idea MySQL would assign random numbers to each field as stated in the standard rand statement. Looking forward to more juicy tidbits!

  • Avatar of notmessenger Reply notmessenger Thursday, June 12, 2008 7:58 PM

    Very informative - thank you!

  • Avatar of Danny Reply Danny Tuesday, April 28, 2009 6:40 PM

    Excellent. Thanks a lot for this simple and nice idea.
    All the other suggestions I saw were non scalable - just needed to see the >= as the hint to how to overcome the case of deleted rows.
    I am guessing there may be a way to do this using one query + subquery combination.
    (love the ascii art captcha... never seen one before :)

  • Avatar of Romeo Adrian Cioaba Reply Romeo Adrian Cioaba Tuesday, May 5, 2009 7:07 PM

    Excellent solution!

    I was wondering if there is a way to do such a random select using Zend_Framework?

    @Danny: the ascii art captcha is implementend using Zend_Text_Figlet.

  • Avatar of Sid K Reply Sid K Thursday, May 7, 2009 1:20 PM

    Just tried this out on a project I'm working on and the query time down to less than 5% of the old value, thanks so much. I have a question about the maths though.
    FLOOR(1 + (RAND() * (@max_id - 1)))
    Is that not the same as:
    CEIL(RAND() * (@max_id - 1))

  • Avatar of Ben Scholzen 'DASPRiD' Reply Ben Scholzen 'DASPRiD' Monday, May 11, 2009 10:14 PM

    Hey Sid, I'm glad that this helped you ot so much. About the formula: This is a common one I mostly use in SQL for generating random number. Say you want a random number between 2 and 5, then you would do:

    FLOOR(2 + (RAND() * (5 - 1)))

  • Avatar of Piotr Czachur Reply Piotr Czachur Tuesday, May 12, 2009 3:32 PM

    You also be interested in in-depth look on this subject: http://jan.kneschke.de/projects/mysql/order-by-rand

  • Avatar of David Harrison Reply David Harrison Monday, June 29, 2009 11:59 PM

    One problem with random results occurs when you wish to paginate them. It was important for us that our users could maintain the 'same' random order as they moved up and down the pages.

    Our solution was to run a cron job to randomise the random_id column every night. Each day we can just show the results in random_id order. The results paginate fine and there is never a delay with the query. The displayed items just appear reshuffled every day and that may be good enough for many.

  • Avatar of Laurence Reply Laurence Wednesday, October 14, 2009 3:18 AM

    It's not really a random row any more though. It assumes the IDs are spread evenly throughout the table. If there are any big gaps, the randomizer will possibly, or even probably, land in those gaps. The rows returned will be those that occur just after the gap.

    This isn't hypothetical! We use just such a table for caching heterogenous items that can appear in search results. So that we can map from the cached table back to their sources, we add an offset to their original IDs. For the first item type, we use 10000000, for the second 20000000, and so on. Typical IDs look like 10000834 and 20004395. You can see that a randomizer will almost always land in the gaps, and thus return the first of each item type.

  • Avatar of DASPRiD Reply DASPRiD Saturday, December 12, 2009 12:15 PM

    That's true Laurence. But when you look at the article at http://devzone.zend.com I wrote, you'll see a solution which skips that problem.

  • Avatar of polarthebear Reply polarthebear Wednesday, April 7, 2010 6:05 PM

    i liked it! its good tuts dude!

  • Avatar of nerkn Reply nerkn Monday, August 9, 2010 5:39 PM

    http://jan.kneschke.de/projects/mysql/order-by-rand/
    Uses another table. Worth to look. the principle behind is keep another table holeless. Using triger can be reasonable for big project.

    I think db should handle that stuff both random row fetching and last row fetching. I'm humbling for mysql, drizzle please hear!

    • Avatar of Ben Scholzen 'DASPRiD' Reply Ben Scholzen 'DASPRiD' Monday, August 27, 2012 12:04 PM

      I wrote a follow-up for this on devzone.zend.com, which does exactly the same with triggers.

  • Avatar of Pico Reply Pico Wednesday, December 8, 2010 5:47 PM

    Hi Ben,

    I found your solution very usable. While I do understand the original issue with the RAND() implementation and its load on the db server, my questions is:

    Does it pay to use your 2-query solution on a table that will have no more than 50 rows over its projected lifetime? Granted, the query will feed a jquery gallery on the homepage for a site so potentially there might be quite a few daily RAND() calls to the database.

    Cheers

    • Avatar of Ben Scholzen 'DASPRiD' Reply Ben Scholzen 'DASPRiD' Monday, August 27, 2012 12:04 PM

      As you can see in the graph, with 50 rows you won't have any real problem.

Leave a comment

Please note that your email address will not be shown, it is only used to fetch your avatar image from gravatar.com and for notifications.

                                  
  ___ _   _ _   _ _   ___   _____ 
 / __| | | | | | | | | \ \ / / _ \
| (__| |_| | |_| | |_| |\ V /  __/
 \___|\__, |\__,_|\__,_| \_/ \___|
      |___/