Need data on disk drive management by OS: getting base I/O unit size, “sync” option, Direct Memory Access

I want to ensure I have done all I can to configure a system's disks for serious database use. The three areas I know of (any others?) to be concerned about are:

  1. I/O size: the database engine and disk's native size should either match, or the database's native I/O size should be a multiple of the disk's native I/O size.
  2. Disks that are capable of Direct Memory Access (eg. IDE) should be configured for it.
  3. When a disk says it has written data persistently, it must be so! No keeping it in cache and lying about it.

I have been looking for information on how to ensure these are so for CENTOS and Ubuntu, but can't seem to find anything at all!

I want to be able to check these things and change them if needed.

Any and all input appreciated.

PLEASE NOTE: The actual hardware involved is VERY modest. The point is to get the most out of what hardware we do have, even though it's "not very serious hardware" from a broader perspective.

MORE:

I appreciate the time taken to read and reply, but I'm hoping to get "answers" that aren't just good database / hardware advice but answers that actually address the specific things I asked about. Namely:

1) What's a good easy way to tell what the I/O unit size is that the OS wants to do? How can I change it? (IOW: If this exclusively a file-system-format issue, how can I tell what was used on an already-created file system? I know /etc/fstab will tell me the file system format... In this case, it's ext3.

2) How can I tell if a disk drive has DMA? If so, how can I turn it on? (I've been told that some drives have this capability, but now I want to follow up and ensure that if these drives have it, it's turned on.)

And, finally;

3) How can I tell if a drive is merely telling the writer that their material is written when it's actually still in cache? And, more importantly, how can I set the system to NOT use such features if / when they exist?

Thank you for your insights. RT

Answers


"serious database use" and you mention IDE in the same sentence?

SSDs or 15k SCSI in a many spindle RAID 1+0 array with separate arrays for data, log and backup. Consider a separate array for tempdb too.

You'd also switch the controller cache to 100% read too to avoid caching issues

Of course, if it's "serious" then you'd consider clustering etc: so a SAN comes in useful here but you may not be as quick as local spindles


1) Check /sys/block/sdX/queue/{max_hw_sectors_kb,max_sectors_kb}. The first is that max transfer size the hw allows, the other is the current maximum which can be set to any value <= max_hw_sectors_kb

2) hdparm -i /dev/sdX

3) Turn off write-back caching (hdparm can do it), or make sure that the filesystem issues barriers when synchronizing (as in fsync(), or journal commit).


You didn't include any info on filesystem or database, so here are some misc pointers.

It is inevitable that you will lose a disk eventually, so its equally important to put a good backup and recovery strategy in place, and mirror your transaction logs, so you can handle a disk failure or even full datafile loss.

1) If possible, put at least one copy of your transaction log on a fixed disk. Don't put your sole transaction log to an external storage subsystem. (Assuming you use a db that supports log mirroring).

2) I agree with gbn, in practice, don't use write caching. I've lost databases on RAID arrays with battery backup. Configure the storage controller card for write-through.

3) Raw devices provide guaranteed writes, but its not worth the hassle. Some filesystems provide synchronous write options too, use one if possible. I am partial to VxFS, but I'm from the Sun world. On Linux, btrfs is eminent at least, but for now, Ext3 works fine if you setup your db properly.


Need Your Help

Shortest path between 2 points in a weighted 2d array

java shortest-path

I'm having problems getting my shortest path algorithm to work correctly. I have a 2d array 20 x 20 that contains the edge weights representing roads between cities. I do not get the correct result...

Both DataSource and DataSourceID are defined on 'GridView1'. Remove one definition

c# asp.net sql

I'm using a gridview and SqlDataSource to bind the data table information to the gridview.