10000+ tables in one MySQL database


Once in a while I hear people talking about using thousands or tens of thousands of tables in one MySQL database and often how it “doesn’t work”. There are two things to say to them:

  • Are you nuts?!
  • Sure it works

I’ll elaborate a little on both …

Why’d you do that? Are you nuts?!

In most cases, when extraordinarily many tables come up, the “correct answer” is: fix your schema so it doesn’t duplicate the same table layout for each customer/user/site/…!

Once in a while, though, there are good reasons to have way too many tables. Even “takes too long to fix the application now” can be a good enough answer sometimes.

My use case was to use the many tables as an extra “index” for a situation that a regular index couldn’t cover. Tens of thousands of tables, here we come.

[Photo: Nova Cimangola Cement Plant]

Sure it works, just configure it right!

… which is easy!

  • In your startup script (/etc/init.d/mysql with the MySQL RPMs), add “ulimit -n 30000” somewhere near the top to allow mysqld to have 30000 open files (and sockets).

  • In your mysql configuration, add something like the following - adjust the number as appropriate.

    set-variable = table_cache=12000

This will let MySQL keep up to 12000 tables open. The default limit is much too low and the system will spend all its time closing and opening the files. The other few thousand handles are left free for database connections or whatever. You can surely tune the numbers further, but I haven’t needed to be more precise yet. For MyISAM, MySQL uses two filehandles per open table (other table types vary).
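To make the arithmetic concrete: at two filehandles per open MyISAM table, 12000 cached tables can use up to 24000 handles, which leaves roughly 6000 of the 30000 ulimit for connections, temporary tables and everything else. Here is a sketch of the two changes together (the numbers are illustrative, not a recommendation):

    # /etc/init.d/mysql -- near the top, before mysqld is started
    ulimit -n 30000

    # my.cnf -- the same setting as above; newer MySQL versions spell it
    # "table_cache = 12000" under [mysqld] (renamed table_open_cache in 5.1)
    set-variable = table_cache=12000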

Flushing the tables

One curious thing you’ll run into is that MySQL can take forever (read: hours!) flushing thousands of tables that have been changed if you do it with a simple “flush tables” or when you are shutting down mysql. That’s of course not “hmn, how curious” but rather insanely frustrating if you were going for a quick restart. This occasionally seems to happen with InnoDB tables too, but in particular with our large MyISAM system (we use fulltext indexes, hence MyISAM) this is a big issue.

With MyISAM tables, you also have a lot of cleaning up to do if the system crashes one way or another with thousands of tables open.
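One way to handle that cleanup (a sketch only; the path and flags are illustrative, and mysqld should be stopped first) is a myisamchk pass over the data directory:

    # Illustrative: check and recover MyISAM indexes after a crash
    myisamchk --silent --force --fast --update-state /var/lib/mysql/*/*.MYI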

There’s an easy-ish solution to both these problems, though! Just flush the tables individually before the shutdown command, and on a regular basis to mitigate the issue if the system crashes. Remember, the system or MySQL can crash for all sorts of reasons. Recently we had a motherboard with a BIOS or hardware bug that made it unstable after adding more than 16GB of memory. On another box one of the memory sticks went bad, so it suddenly came up with 4GB less memory than before. With MySQL carefully tuned to use all the memory (it was an InnoDB-only installation), it’d try to use all 24GB and get killed by the kernel when it got above 20GB! Yeah, that one took some head scratching before I figured it out.

Below is a small program that’ll go through the databases twice, flushing all tables in all of them, and then end with a regular “flush tables”. We make two passes just in case the first one took so long that a lot of tables got opened again. Two passes work for me; depending on your needs you might make the count an option. A fancier version would check how many tables are open at the end of the run and make another pass if too many are still open (and abort with an error if they open faster than they can get closed!). The final “flush tables” worked for me in getting everything closed just before the script exits (and the shutdown or whatever starts).

#!/usr/bin/perl
use strict;
use warnings;
use Getopt::Long qw(GetOptions);
use DBI;

my %args = (
    verbose  => 1,   # print each table as it's flushed; use --noverbose to silence
    user     => $ENV{USER},
    hostname => '127.0.0.1',
    port     => 3306,
);

GetOptions(\%args,
           'verbose!',
           'user=s',
           'password=s',
           'hostname=s',
           'port=i',
           'database=s@',
          ) or die "Could not parse command line options\n";

my $dbh = DBI->connect("dbi:mysql:hostname=$args{hostname};port=$args{port}",
                       $args{user}, $args{password})
  or die DBI->errstr;

# Flush only the databases given with --database, or all of them.
my @dbs = $args{database}
            ? @{ $args{database} }
            : @{ $dbh->selectcol_arrayref("show databases") };

# Two full passes, in case the first one took so long that a lot of
# tables got opened again behind us.
for my $pass (1 .. 2) {
  for my $db (@dbs) {
    next if uc($db) eq 'INFORMATION_SCHEMA';
    $dbh->do("use `$db`");
    my $tables = $dbh->selectcol_arrayref("show tables");
    for my $table (@$tables) {
      print "flushing $db.$table (pass $pass)\n" if $args{verbose};
      $dbh->do("flush table `$table`");
    }
  }
}

# A final global flush to close anything that slipped through.
print "flushing all\n" if $args{verbose};
$dbh->do("flush tables");

10 Comments

Don't forget the file system! Some file systems (e.g. ReiserFS) can handle thousands of files in the same directory without any problems. Others, such as Ext2/3, can really get bogged down once you reach a couple of thousand tables in the same database/directory.

I would say in a very large, very busy production environment, your approach would be unscalable, unmaintainable, unusable and unreliable. You should post your business requirement on why you need that many tables and see if there really is a different way to do what you want.

700,000 tables here, for a business intelligence application. I run hourly flushes -- via a cron -- similar to yours, but I gather a list of open tables that are not in use (i.e. "show open tables where in_use = 0") and then loop through that list and flush each table.

We have a detailed record of the processes that are running so in the event of a crash it is pretty easy to identify the tables that need to be checked/repaired.

Really, the two biggest challenges are querying information_schema (it can run out of memory if not done correctly) and backups (slow).
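For reference, here's a minimal sketch of that idle-table variant (the connection details are illustrative, and it assumes a MySQL version where SHOW OPEN TABLES accepts a WHERE clause):

    #!/usr/bin/perl
    # Sketch: flush only the open tables that no thread is currently using.
    use strict;
    use warnings;
    use DBI;

    # Illustrative connection details -- adjust as needed.
    my $dbh = DBI->connect("dbi:mysql:hostname=127.0.0.1;port=3306",
                           'root', 'secret', { RaiseError => 1 });

    # Columns returned: Database, Table, In_use, Name_locked.
    my $open = $dbh->selectall_arrayref("show open tables where In_use = 0");

    for my $row (@$open) {
        my ($db, $table) = @{$row}[0, 1];
        $dbh->do("flush table `$db`.`$table`");
    }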

Clever, Ask!

700,000 tables? holey shit, fix your schema man!

I'm toying with --delay-key-write=ALL so I needed to find a way to periodically force a flush of cached data. Unfortunately, FLUSH TABLE name appears to also block access to other tables (that are NOT being flushed). Other threads get stuck in the "Opening tables" state until the flush of that one table is complete... even though they're not attempting to access it.

Judging from the other comments this seems like an unusual situation; a production server shouldn't grind to a halt just because you want to flush a single table!

Any ideas on why this may be happening? MySQL 5.0.51a, FreeBSD 6.2-RELEASE 32 bit and 7.0-RELEASE 64 bit (happens on both)

Hi Rowan,

"FLUSH TABLES" does a database wide lock of sorts as it's also resetting some other caches and buffers (IIRC).

If you use the script from the post then it'll flush one table at a time and it won't lock the other tables as it's doing it.

You are right though, it'd be nice if there was a "just make sure everything is on disk" variant that didn't block everything (maybe it wouldn't even need to actually close the table).


- ask

I split my large database across a Windows server running 5 instances of MySQL. Each instance has 150,000 tables, but sometimes the SHOW TABLES command causes a lot of HDD reads/writes and many connections queue up. I flush tables and flush hosts every hour.

Any suggestion please?

Thank you.

That's the strange thing: FLUSH TABLE (i.e. flushing only a specific table) seems to lock other tables as well. The entire server becomes unresponsive to queries until the flush is complete. I stopped using --delay-key-write=ALL because of this issue. Even without that option, shutting down mysql still writes out an awful lot of data, sometimes taking several minutes. Hopefully I can figure out how to send it a TERM signal when the UPS has lost mains power for 30 seconds, rather than when it's running low on batteries; if the "graceful" shutdown only starts once the UPS signals it's time to die, FreeBSD will forcibly kill mysql after a timeout (from memory, 60 seconds) if it hasn't exited.

I've rewritten your script in PHP and I'm going to experiment with running it periodically. It's possible that the couple of times I tried FLUSH TABLE x / FLUSH TABLES there was so much cached that the response time seemed excessive.

I'm still learning, so be gentle. :)

About this Entry

This page contains a single entry by Ask Bjørn Hansen published on February 14, 2008 7:57 PM.
