Building a RAID with mdraid – チラシの裏

The RAID I built a few years ago (500 GB RAID1 plus a spare) was getting cramped, so I upgraded to the same layout on 2 TB disks.

I bought three WD 2 TB drives, WD20EARX.^*1

First, partition the disks (/dev/sdc and /dev/sdd).

root@lachesis:~# fdisk -u /dev/sdc
Device contains neither a valid DOS partition table, nor Sun, SGI or OSF disklabel
Building a new DOS disklabel with disk identifier 0x592756ff.
Changes will remain in memory only, until you decide to write them.
After that, of course, the previous content won't be recoverable.

Warning: invalid flag 0x0000 of partition table 4 will be corrected by w(rite)

The device presents a logical sector size that is smaller than
the physical sector size. Aligning to a physical sector (or optimal
I/O) size boundary is recommended, or performance may be impacted.

WARNING: DOS-compatible mode is deprecated. It's strongly recommended to
switch off the mode (command 'c').

Command (m for help): p

Disk /dev/sdc: 2000.4 GB, 2000398934016 bytes
255 heads, 63 sectors/track, 243201 cylinders, total 3907029168 sectors
Units = sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 4096 bytes
I/O size (minimum/optimal): 4096 bytes / 4096 bytes
Disk identifier: 0x592756ff

Device Boot Start End Blocks Id System

Command (m for help): n
Command action
e extended
p primary partition (1-4)
p
Partition number (1-4): 1
First sector (63-3907029167, default 64): 64
Last sector, +sectors or +size{K,M,G} (64-3907029167, default 3907029167):
Using default value 3907029167

Command (m for help): p

Disk /dev/sdc: 2000.4 GB, 2000398934016 bytes
255 heads, 63 sectors/track, 243201 cylinders, total 3907029168 sectors
Units = sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 4096 bytes
I/O size (minimum/optimal): 4096 bytes / 4096 bytes
Disk identifier: 0x592756ff

Device Boot Start End Blocks Id System
/dev/sdc1 64 3907029167 1953514552 83 Linux

Command (m for help): w
The partition table has been altered!

Calling ioctl() to re-read partition table.
Syncing disks.
root@lachesis:~# fdisk -u /dev/sdd
Device contains neither a valid DOS partition table, nor Sun, SGI or OSF disklabel
Building a new DOS disklabel with disk identifier 0x594661ea.
Changes will remain in memory only, until you decide to write them.
After that, of course, the previous content won't be recoverable.

Warning: invalid flag 0x0000 of partition table 4 will be corrected by w(rite)

The device presents a logical sector size that is smaller than
the physical sector size. Aligning to a physical sector (or optimal
I/O) size boundary is recommended, or performance may be impacted.

WARNING: DOS-compatible mode is deprecated. It's strongly recommended to
switch off the mode (command 'c').

Command (m for help): p

Disk /dev/sdd: 2000.4 GB, 2000398934016 bytes
255 heads, 63 sectors/track, 243201 cylinders, total 3907029168 sectors
Units = sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 4096 bytes
I/O size (minimum/optimal): 4096 bytes / 4096 bytes
Disk identifier: 0x594661ea

Device Boot Start End Blocks Id System

Command (m for help): n
Command action
e extended
p primary partition (1-4)
p
Partition number (1-4): 1
First sector (63-3907029167, default 64):
Using default value 64
Last sector, +sectors or +size{K,M,G} (64-3907029167, default 3907029167):
Using default value 3907029167

Command (m for help): p

Disk /dev/sdd: 2000.4 GB, 2000398934016 bytes
255 heads, 63 sectors/track, 243201 cylinders, total 3907029168 sectors
Units = sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 4096 bytes
I/O size (minimum/optimal): 4096 bytes / 4096 bytes
Disk identifier: 0x594661ea

Device Boot Start End Blocks Id System
/dev/sdd1 64 3907029167 1953514552 83 Linux

Command (m for help): w
The partition table has been altered!

Calling ioctl() to re-read partition table.
Syncing disks.
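
The fdisk warning about the physical sector size is the reason the first sector matters here. A minimal sketch of the check, assuming the 512-byte logical and 4096-byte physical sector sizes reported above; `is_aligned` is a hypothetical helper name, not part of fdisk:

```shell
# Succeeds when a partition's first sector falls on a physical-sector boundary.
# Arguments: first sector, logical sector size, physical sector size (bytes).
is_aligned() {
    start_sector=$1 logical=$2 physical=$3
    [ $(( start_sector * logical % physical )) -eq 0 ]
}

# Compare the lowest offered sector (63) with the default that was used (64).
for s in 63 64; do
    if is_aligned "$s" 512 4096; then
        echo "sector $s: aligned"
    else
        echo "sector $s: not aligned"
    fi
done
```

Sector 64 starts at byte 32768, which is a multiple of 4096, so the default fdisk offered is fine for these drives; sector 63 would not be.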

Next, create the RAID.

root@lachesis:~# mdadm -C /dev/md1 -l1 -n2 /dev/sdc1 /dev/sdd1^*2

Then wait, and keep waiting, for the build to finish.^*3

root@lachesis:~# cat /proc/mdstat
Personalities : [linear] [raid0] [raid1] [raid10] [raid6] [raid5] [raid4] [multipath]
md1 : active raid1 sdd1[1] sdc1[0]
1953513392 blocks super 1.2 [2/2] [UU]
[>....................] resync = 0.1% (2376768/1953513392) finish=670.1min speed=48524K/sec
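
The finish estimate can be reproduced from the other numbers on that line. A rough sketch, assuming the block counts are in 1 KiB units and the speed is in KiB per second; `resync_eta_min` is a hypothetical helper, and the arithmetic is integer-only, so the result is truncated rather than rounded:

```shell
# Print the estimated minutes left, to one decimal place, from the three
# numbers on the resync line: blocks done, total blocks, speed in K/sec.
resync_eta_min() {
    done_blocks=$1 total_blocks=$2 speed=$3
    # Work in tenths of a minute so plain integer arithmetic is enough.
    tenths=$(( (total_blocks - done_blocks) * 10 / (speed * 60) ))
    echo "$(( tenths / 10 )).$(( tenths % 10 ))"
}

resync_eta_min 2376768 1953513392 48524   # prints 670.1
```

The speed is only a momentary figure, so the real finish time drifts as the resync proceeds.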

Twelve hours later I copied the original data over and rebooted... and what exactly are md126 and md127?

root@lachesis:/etc# cat /proc/mdstat
Personalities : [linear] [raid0] [raid1] [raid10] [raid6] [raid5] [raid4] [multipath]
md126 : inactive sdc1[0](S)
1953513528 blocks super 1.2

md127 : active (auto-read-only) raid1 sdd[1]
1953513424 blocks super 1.2 [2/1] [_U]

md0 : active raid1 sde1[0]
488383936 blocks [2/1] [U_]

unused devices: <none>
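
Spotting the arrays that are running short of members is easy to script. A small sketch that reads /proc/mdstat-style text on standard input and prints each array whose `[total/active]` counter shows fewer active members than expected; `degraded_arrays` is a hypothetical name, and inactive arrays such as md126 above have no counter, so they are not reported:

```shell
# Print the names of md arrays whose [n/m] status shows m < n.
# Reads mdstat-format text from standard input.
degraded_arrays() {
    awk '
        /^md[0-9]+ :/ { name = $1 }                 # remember the current array
        match($0, /\[[0-9]+\/[0-9]+\]/) {           # status such as [2/1]
            split(substr($0, RSTART + 1, RLENGTH - 2), n, "/")
            if (n[2] + 0 < n[1] + 0) print name
        }
    '
}

# Intended use on a Linux host with md arrays:
#   degraded_arrays < /proc/mdstat
```

Fed the output above, this would list md127 and md0.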

Searching turned up reports that "mdadm 3.1.4 has a bug".
My environment has... 3.1.5.

Running "slackpkg update ; slackpkg upgrade mdadm" brought it up to 3.2.5... but nothing changed.

I also tried appending the output of "mdadm --detail --scan" to "/etc/mdadm.conf", but it had no effect.
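
For reference, the append step can be sketched as below. This is only an illustration of that step (it did not fix the problem here); `append_new_arrays` is a hypothetical helper that copies ARRAY lines from standard input into a config file and skips lines that are already present, so repeated runs do not pile up duplicates:

```shell
# Append ARRAY lines read from stdin to the given config file,
# skipping any line that is already there verbatim.
append_new_arrays() {
    conf=$1
    touch "$conf"
    while IFS= read -r line; do
        case $line in
            ARRAY*) ;;            # keep array definitions
            *) continue ;;        # ignore anything else
        esac
        grep -qxF -- "$line" "$conf" || printf '%s\n' "$line" >> "$conf"
    done
}

# Intended use (needs root and at least one running array):
#   mdadm --detail --scan | append_new_arrays /etc/mdadm.conf
```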

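Thinking about what differs from the old RAID, it comes down to the metadata version and the capacity.
Nothing can be done about the capacity, so I rebuilt the array with metadata version 0.90 and... bingo.
0.90 apparently has limits^*4 on the number of volumes and on volume size, but that is not a problem this time.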

mdadm -C /dev/md1 -l1 -n2 --metadata=0.9 /dev/sdc1 /dev/sdd1
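
As a rough check against the size limit mentioned in footnote *4, the partition size can be compared with both readings of "2 terabytes". A sketch, assuming the Blocks column in the fdisk output counts 1 KiB units; `blocks_to_bytes` is a hypothetical helper:

```shell
# fdisk's "Blocks" column is assumed to count 1 KiB units.
blocks_to_bytes() { echo $(( $1 * 1024 )); }

part=$(blocks_to_bytes 1953514552)          # /dev/sdc1 from the fdisk output
tb2=$(( 2 * 1000 * 1000 * 1000 * 1000 ))    # 2 TB, decimal
tib2=$(( 2 * 1024 * 1024 * 1024 * 1024 ))   # 2 TiB, binary

echo "partition: $part bytes"
[ "$part" -gt "$tb2" ]  && echo "larger than 2 TB (decimal)"
[ "$part" -lt "$tib2" ] && echo "smaller than 2 TiB (binary)"
```

The partition comes out at 2000398901248 bytes, just above 2 TB and well below 2 TiB. That the 0.90 array was created anyway is at least consistent with the binary reading, though it does not prove it.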

After a reboot I confirmed that /dev/md1 was preserved, so it was another half-day wait for the resync to finish.

Finally, add the spare disk.

root@lachesis:~# mdadm /dev/md1 --add /dev/sdf1
mdadm: added /dev/sdf1

Check the RAID status, and done.

root@lachesis:~# cat /proc/mdstat
Personalities : [linear] [raid0] [raid1] [raid10] [raid6] [raid5] [raid4] [multipath]
md1 : active raid1 sdf1[2](S) sde1[1] sdc1[0]
1953513472 blocks [2/2] [UU]

unused devices: <none>

Addendum, 2012/09/11

After a reboot, /dev/md127 appeared again.

It seems the disk I added last as the spare has wandered off on its own.

Hmm...

Addendum, 2012/09/15

Turn the stray disk into a RAID1 on its own.

root@lachesis:~# mdadm -S /dev/md127
root@lachesis:~# mdadm -C /dev/md127 -l1 -n2 /dev/sde1 missing

After a reboot it comes up read-only.
Maybe because it was created with one half missing?

root@lachesis:~# cat /proc/mdstat
Personalities : [linear] [raid0] [raid1] [raid10] [raid6] [raid5] [raid4] [multipath]
md127 : active (auto-read-only) raid1 sde1[0]
1953382336 blocks super 1.2 [2/1] [U_]
md1 : active raid1 sdd1[1] sdc1[0]
1953513472 blocks [2/2] [UU]

Force it to read-write and try initializing it.

root@lachesis:~# mdadm --readwrite /dev/md127

Addendum, 2012/09/17

Pull one disk out of /dev/md1, add it to /dev/md127, and resync.

root@lachesis:~# mdadm /dev/md1 --fail /dev/sdd1
mdadm: set /dev/sdd1 faulty in /dev/md1
root@lachesis:~# cat /proc/mdstat
Personalities : [linear] [raid0] [raid1] [raid10] [raid6] [raid5] [raid4] [multipath]
md127 : active raid1 sde1[0]
1953382336 blocks super 1.2 [2/1] [U_]

md1 : active raid1 sdd1[2](F) sdc1[0]
1953513472 blocks [2/1] [U_]

unused devices: <none>
root@lachesis:~# mdadm /dev/md1 --remove /dev/sdd1
mdadm: hot removed /dev/sdd1 from /dev/md1
root@lachesis:~# cat /proc/mdstat
Personalities : [linear] [raid0] [raid1] [raid10] [raid6] [raid5] [raid4] [multipath]
md127 : active raid1 sde1[0]
1953382336 blocks super 1.2 [2/1] [U_]

md1 : active raid1 sdc1[0]
1953513472 blocks [2/1] [U_]

unused devices: <none>
root@lachesis:~# mdadm /dev/md127 --add /dev/sdd1
mdadm: added /dev/sdd1
root@lachesis:~# cat /proc/mdstat
Personalities : [linear] [raid0] [raid1] [raid10] [raid6] [raid5] [raid4] [multipath]
md127 : active raid1 sdd1[2] sde1[0]
1953382336 blocks super 1.2 [2/1] [U_]
[>....................] recovery = 0.0% (74496/1953382336) finish=2184.5min speed=14899K/sec

md1 : active raid1 sdc1[0]
1953513472 blocks [2/1] [U_]

unused devices: <none>
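
The fail, remove, add sequence above can be wrapped in one helper. A minimal sketch, defaulting to a dry run that only prints the commands; `move_member` and the `RUN` variable are hypothetical names introduced for illustration, and only the mdadm options already used above appear in it:

```shell
# Print (or, with RUN=1, execute) the three steps that move a member
# from one array to another: mark it failed, remove it, add it elsewhere.
# Arguments: source array, destination array, member device.
move_member() {
    src=$1 dst=$2 dev=$3
    for step in "$src --fail $dev" "$src --remove $dev" "$dst --add $dev"; do
        if [ "${RUN:-0}" = 1 ]; then
            # $step is left unquoted on purpose so it splits into arguments.
            mdadm $step || return 1
        else
            echo "mdadm $step"
        fi
    done
}

move_member /dev/md1 /dev/md127 /dev/sdd1   # dry run: only prints the commands
```

Stopping at the first failing step avoids adding a device to the new array while it is still attached to the old one.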

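After a reboot... no good.
/dev/sde1, which had been in /dev/md127 from the start, is no longer visible, and the array is running degraded and auto-read-only on /dev/sdd^*5 alone, so nothing has changed.
I put the pulled disk back and went back to puzzling over it.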

root@lachesis:~# cat /proc/mdstat
Personalities : [linear] [raid0] [raid1] [raid10] [raid6] [raid5] [raid4] [multipath]
md127 : active (auto-read-only) raid1 sdd[1]
1953513424 blocks super 1.2 [2/1] [_U]

md1 : active raid1 sdc1[0]
1953513472 blocks [2/1] [U_]

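*1 Price-wise this was around the level when prices started rising after the Thailand floods, roughly +2,000 over the lowest price.

*2 -l1 means RAID1, -n2 means two disks.

*3 As the output shows, syncing 2 TB takes 670 minutes... so 11 hours...

*4 It says "This format limits arrays to 28 component devices and limits component devices of levels 1 and greater to 2 terabytes", but I cannot tell whether "terabytes" means TiB or TB.

*5 It is odd in the first place that this is not /dev/sdd1.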
