Saturday, August 05, 2006

mysql partitioning

Table partitioning is a concept which allows data to be stored in multiple locations on one or more disks and access to the data is through SQL queries. The way data is stored and retrieved is invisible to the end user.

Mysql allows the user to select the rules based on which data would be spread over multiple partitions in a table. Mysql supports horizontab partitioning whereby the rows of a table are spread across multiple locations. Support for vertical partitioning is not there. So using mysql partitioning, you cannot assign different columns to different physical partitions of a table.

Benefits of partitioning a table :
1. Being able to store more data in one table than can be held on a single disk or filesystem partition. Though current operating systems allow extremely huge files on the disk.
2. Data that loses its usefulness can often be easily be removed from the table by dropping the partition containing only that data. And adding of new data can be facilitated by adding a new partition specially for that data.
3. Some queries can be greatly optimized in virtue of the fact that data satisfying a given WHERE clause can be stored only on one or more partitions, thereby excluding any remaining partitions from the search. Data can also be re-organized to move more frequently accessed data to one partition.
4. Queries involving aggregate functions such as SUM() and COUNT() can easily be parallelized.
5. Greater query throughput can be achieved by spreading data seeks over multiple disks.

I had used partitioning with MYIASM tables and found out that they were very useful. The complete logic of how rows were divided and stored and how they need to be accessed was invisible to me. What happened when i created some partitions for the table is that the same number of .MYI and .MYD files were created. Though the table definition remained in the single .frm file. A separate .PAR file was created for the table which i think was being used to define the logic of how database was being partitioned.

Moreover, an insert on the table locks all the partitions for a small time that is, the time needed by mysql to decide in which partition the record should be put.

The logic using which a table can be partitioned are listed as below:

RANGE partitioning -> Here table partitions are defined based on a range of data to be stored in each partition. For example -

CREATE TABLE abc (
name VARCHAR(100) NOT NULL,
email VARCHAR(40),
dob DATE NOT NULL
)
PARTITION BY RANGE( YEAR(dob) ) (
PARTITION p0 VALUES LESS THAN (1960),
PARTITION p1 VALUES LESS THAN (1970),
PARTITION p2 VALUES LESS THAN (1980),
PARTITION p3 VALUES LESS THAN (1990),
PARTITION p4 VALUES LESS THAN MAXVALUE
);

This would create 5 partitions in the table abc. Partition p0 will contain records whose dob is from 0 to 1959, partition p1 will contain records whose dob is from 1960 to 1969 and so on. In case a row is entered whose dob cannot be accomodated in any of the partitions, an error is thrown and the data is not inserted in the table.

LIST partitioning -> Here table partitions are defined based on a list of data for each partition. Rows which satisfy a criteria list is inserted in that partition.

CREATE TABLE abcs (
name VARCHAR(100),
dob DATE NOT NULL DEFAULT '1979-01-01'
age INT
)
PARTITION BY LIST(age) (
PARTITION p0 VALUES IN (3,5,6,9,17),
PARTITION p1 VALUES IN (1,2,10,11,19,20),
PARTITION p2 VALUES IN (4,12,13,14,18),
PARTITION p3 VALUES IN (7,8,15,16)
);

This would create 4 partitions in the table abcs. Partition p0 would contain records whose age would be 3,5,6,9 or 17. Again if you try to insert a record whose dob is not in any of the partition list created, an error is thrown and the record is not inserted.

LINEAR HASH partitioning -> Here table data is evenly divided between all partitions using some simple functions. for example

CREATE TABLE abc (
name VARCHAR(100),
dob DATE NOT NULL DEFAULT '1970-01-01',
age INT
)
PARTITION BY HASH( YEAR(dob) )
PARTITIONS 4;

Here 4 partitions are created and data is distributed using the following formula
Partition number = MOD(YEAR(dob),number_of_partitions);
So since my date of birth is 1980, my record will be in partition number 0.

LINEAR HASH partitioning -> This is almost similar to hash partitioning except for the fact that the algorithm used to divide data is different. The syntax is also almost same. We use PARTITION BY LINEAR HASH instead of PARTITION BY HASH.

The algorithm used is :

Given an expression expr, the partition in which the record is stored when linear hashing is used is partition number N from among num partitions, where N is derived according to the following algorithm:

  • Find the next power of 2 greater than num. We call this value V; it can be calculated as:
    V = POWER(2, CEILING(LOG(2, num)))

  • Set N = F(column_list) & (V - 1).

  • While N >= num
    {
    Set V = CEIL(V / 2)
    Set N = N & (V - 1)
    }



The advantage in partitioning by linear hash is that the adding, dropping, merging, and splitting of partitions is made much faster, which can be beneficial when dealing with tables containing extremely large amounts of data. The disadvantage is that data is less likely to be evenly distributed between partitions as compared with the distribution obtained using regular hash partitioning.

KEY partitioning ->
Partitioning by key is similar to partitioning by hash, except that where hash partitioning employs a user-defined expression, the hashing function for key partitioning is supplied by the MySQL server. The syntax used is PARTITION BY KEY instead of PARTITION BY HASH.

In most of the cases either a primary key or an unique key is used to create partitions.

Mysql also supports sub partitioning whereby a partition can be divided into further sub partitions of similar type.

I think i am creating a very long blog for mysql partitioning. However there are a list of syntaxes available on the mysql site to alter, analyze, optimize, rebuild, checking and repairing partitions. Please check http://dev.mysql.com/doc/refman/5.1/en/partitioning.html. It will give a more detailed idea of whatever ranting i have done over here...

I will be signing off from here now. Next blog may or may not contain more info about partitions...

Anyways will keep blogggging...

No comments: