It was a long weekend of watching tape restores and restarting them as necessary but it’s finally over and everything appears to be mostly hunky dory!
I did discovery yet more small misconfigurations and strange behaviour along the way:
- OpenLDAP’s syncrepl using “refereshAndPersist” wasn’t working how I expected it to, no new changes were replicating to the slave LDAP server! I changed the directive to “refreshOnly” and set a 10 minute interval. I made several changes and monitored the slave LDAP server. Changes propagated in about 10 minutes, every time.
- Despite iSCSI’s maturity and the maturity of QLogic’s HBAs I still noticed strange, unexplained target drop outs. Two HBAs per server, two controllers in the IBM DS3300 and just one target out of four was dropping. At first, I couldn’t figure out how to properly reconnect the target on a live system so I rebooted. Later, I discovered you can “disable” and then “enable” the specific target in SANsurfer or iscli, which worked to bring back the dropped target on a live system. Multipath picked up the “new” path right away, as expected.
- Always remember to leave free physical extents in any LVM Volume Group in which you are taking snapshots of the Logical Volumes. It’s freakin’ obvious but I forgot and when I went to do snapshot backups, the snapshots were failing. Now I’m growing some LUNs on the DS3300 so that my VGs have room for snapshots.
All in all, a good weekend that was mostly filled with success.