Enterprise computing solutions

So here’s the deal. I have 3 Dell M905 full-height blades, each with…

4 × six-core Opterons (24 CPUs)
128 GB RAM
2 × 15K 73 GB spindles

4 × 1 GbE copper NICs (Broadcom)
4 × 8 Gb/s Emulex Fibre Channel cards

The chassis is a Dell M1000e blade chassis with an 8-port Brocade Fibre Channel switch and an 8-port Cisco 3130 copper switch.

I am sharing the chassis with about 10 Dell M610 half-height, medium-load servers. My machines will be the only ones with access to the Fibre Channel cards; the rest use the copper interconnects. I can get a VLAN on the copper as needed, and most likely can ask for higher-priority traffic routing on the copper.

The SAN backend is an EMC Clariion with 2 trays of disks, 15 × 15K 300 GB spindles each.

Must be able to cluster the three machines for high availability. I would prefer an active/active/active (scalable) cluster config, but would consider failover if the right argument can be made.

I have the choice of using any HA/Clustering operating system that will install on these M905’s, so long as

  1. a support contract can be bought for the OS (the manager requires this)
  2. all devices are supported with minimal fuss by the installer (with minimal repackaging, as needed)
  3. it can export NFS shares and Samba (I would do straight NFS, but there are non-Windows 7 desktops still in use)
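
To be clear, requirement 3 is nothing exotic; on a Linux box, for example, it would look roughly like this (the path and subnet are made up, just to show the shape of it):

# NFS side: one line in /etc/exports, then re-export
echo '/export/data 10.10.0.0/16(rw,sync,no_subtree_check)' >> /etc/exports
exportfs -ra

# Samba side: a matching share for the desktops that can't speak NFS
cat >> /etc/samba/smb.conf <<'EOF'
[data]
   path = /export/data
   read only = no
EOF
service smb restart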

So far, I have

  1. one person recommending Red Hat Enterprise Linux 5.5 with GFS2, clvmd, and one big lump of SAN presented to the cluster for use.
  2. one person recommending Solaris 10 10/09 with ZFS, forcing a one-LUN-per-disk/JBOD presentation from the SAN (rough sketches of both approaches follow this list).
  3. A third, more “introspective” party told me to contract it out and have someone to blame when it does not work (not an option).
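
To make the first two options concrete, here is roughly what each looks like on the command line; device and cluster names below are placeholders, not anything the SAN has actually presented yet:

# option 1: one big SAN LUN, clustered LVM (clvmd), GFS2 with a journal per node
pvcreate /dev/mapper/bigsanlun
vgcreate -cy vg_share /dev/mapper/bigsanlun
lvcreate -l 100%FREE -n lv_share vg_share
mkfs.gfs2 -p lock_dlm -t mycluster:share -j 3 /dev/vg_share/lv_share

# option 2: one LUN per physical disk from the Clariion, let ZFS handle redundancy
zpool create pool1 raidz c7t0d0 c7t1d0 c7t2d0 c7t3d0 c7t4d0
zfs create pool1/share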

I need to support 1000+ users (> 250 concurrent on average) of NFS and Samba (it’s literally just an HA/Cluster for file sharing, no web servers or applications/DBs/etc.)


My SAN admin is a great guy and will accommodate my requests (even if it’s more work for him to do JBOD presentations).

What would you do with this setup to satisfy up to 1000+ users?

(BTW, a valid answer could include “hire a consultant to analyze the situation and give a report on how to proceed”)

250 concurrent connections really isn’t that much

Your biggest limitation is going to be disk I/O; the servers are serious overkill.

Why would you ever do 1 LUN per disk? That would kill performance

How much data are you storing?

I think I understood the first sentence. I’m not exactly sure what you’re doing with this, but if I can’t download ANSYS 13.0 when it comes out, I’m blaming you. :smiley:

I still don’t understand your JBOD theory

Depending on the amount of storage you need and your projected growth, you should design your LUN or LUNs accordingly. Also, if you’re storing video or one particular type of file, you can tune the stripe breadth for maximum performance.
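
Rough numbers on the stripe-breadth point, assuming the Clariion’s usual 64 KB stripe element: a 4+1 RAID 5 group has a full-stripe width of 4 × 64 KB = 256 KB, so large writes that land in aligned 256 KB chunks avoid the RAID 5 read-modify-write penalty. That kind of tuning pays off for big sequential files far more than for a grab bag of sizes.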

Jeff95TA, these are not external facing servers, no worries. But I can yell over the wall at the guy responsible! :slight_smile:


LZ, I don’t really know why I was given these three particular machines. I thought it was overkill too. I’d be better off using them for HPC solver testing or a DB middleware server than for file serving, but oh well. We literally had them lying around and they’re still under warranty, so what the heck.

And the idea of not letting the Clariion handle the striping was a baffling approach…so much so that I figured a post to the new neighborhood was in order to see if any SAN admins out there had their spidey-sense start tingling…apparently it did!

The data is binary and dense (.zip, .tar.gz, .rpm, Sun Flash archives, Satellite kickstarts, AutoYaST2 images, .iso, etc.). Individual file sizes range from 1 KB to 400 GB, with a large deviation in sizes.

Everything I am reading tells me that spreading the disks over 5-6 LUNs ranging from 500 GB to 800 GB in size would probably be a good compromise and leave a few spares in the array for expansion.
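
Quick sanity check on the raw numbers: 2 trays × 15 × 300 GB is about 9 TB raw, while 5-6 LUNs at 500-800 GB works out to roughly 2.5-4.8 TB, so that layout still leaves plenty of spindles for parity, hot spares, and later growth.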

The actual underlying data files will only grow about 10% a year, so by the time I need more disks these Clariions and the Blades will be out of warranty and I’ll get to start over in a few years with a new system.

I’m hoping solid state will come down in price and I can go with a tray of solid state drives.

I think this wins as one of the most technical questions asked on here.

Depending on the amount of data you need to store, you usually build some number of LUNs, stripe across the LUNs, and let the SAN handle all of that…also leaving room for hot spares.

Since you’re running a variety of file sizes and types there isn’t that much tuning you can do.

Thanks, I appreciate the help and input. I’m going to grab a few disks and config them in various ways, profiling the copying of variously sized files from many scripted virtual clients spread over a pretty large vSphere farm. Should give me somewhere to start.
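
The profiling itself doesn’t need to be fancy; something along these lines (Linux-flavoured, with a hypothetical mount point and arbitrary file sizes) run from each scripted client is the general idea:

# time copies of a few representative file sizes onto the new share
MNT=/mnt/newshare                      # hypothetical NFS mount of the new cluster
for SIZE in 1 64 1024; do              # test file sizes in MB
    dd if=/dev/urandom of=/tmp/test_${SIZE}M bs=1M count=$SIZE 2>/dev/null
    ( time cp /tmp/test_${SIZE}M "$MNT/" ) 2>> /tmp/copy_times.log
done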

Would solid-state flash-caches help with the I/O bottleneck I will ultimately run into at the Clariion?

I would go the Solars/ZFS/jbod route.

Use good SSDs for the ZIL and L2ARC and you’ll get stupidly good throughput on writes and, once the L2ARC gets populated, on common reads.
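
For the record, wiring that up is a one-liner each once the pool exists (device names made up):

zpool add pool1 log c9t0d0      # dedicated SSD for the ZIL (slog)
zpool add pool1 cache c9t1d0    # SSD for the L2ARC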

i have absolutely no clue what’s going on in this thread…

bing i think theyre making up words

Solars = Solaris (typo)

ZFS = incredibly nice filesystem/volume manager for Solaris. Give it a try if you haven’t yet; it’s a joy to use.

ZIL = ZFS Intent Log: use a dedicated device to log (synchronous) writes. Put it on an SSD for stupidly fast writes

ARC = ZFS in-memory cache

L2ARC = level 2 ARC (use a disk to expand the read cache past what’ll fit in memory). Put it on an SSD for stupidly fast reads
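
If you want to watch them during testing, Solaris exposes the ARC counters through kstat; for example:

kstat -p zfs:0:arcstats:size      # current ARC size in bytes
kstat -p zfs:0:arcstats:hits
kstat -p zfs:0:arcstats:misses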

lol

Test 1: a practical test of copying some files from the old NAS to the new SAN through Solaris 10 9/10 with raidz. Some of the burst speeds I got were:


cpu
 us sy wt id
 0 19  0 81
extended device statistics
   r/s    w/s   kr/s   kw/s wait actv wsvc_t asvc_t  %w  %b device
   0.0  998.0    0.0 113986.4  0.0  6.2    0.0    6.3   2  66 c7t6<LUN1>d0
   0.0 1737.0    0.0 201874.7  0.0  6.0    0.0    3.5   3  66 c7t6<LUN2>d0
   0.0  989.0    0.0 112701.9  0.0  6.2    0.0    6.3   2  66 c7t6<LUN3>d0
 553.0    0.0 13805.2    0.0  0.1  2.1    0.1    3.8   3  78 nas:/qaset
.
.
.
cpu
 us sy wt id
 0  5  0 95
extended device statistics
   r/s    w/s   kr/s   kw/s wait actv wsvc_t asvc_t  %w  %b device
   0.0 1373.9    0.0 157059.2  0.0  9.8    0.0    7.1   2 100 c7t6<LUN1>d0
   0.0  675.0    0.0 73268.7  0.0  2.1    0.0    3.2   1  23 c7t6<LUN2>d0
   0.0 1362.9    0.0 156475.7  0.0  9.8    0.0    7.2   2 100 c7t6<LUN3>d0
 755.9    0.0 20814.9    0.0  0.0  2.2    0.0    2.9   1  75 nas:/qaset

               capacity     operations    bandwidth
pool         used  avail   read  write   read  write
----------  -----  -----  -----  -----  -----  -----
pool1        237G   163G      0  3.87K      0   318M
.
.
.
pool1        238G   162G      0  4.35K      0   328M
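
For anyone wanting to grab the same kind of snapshots, these are just stock Solaris tools run alongside the copy; the interval is whatever you like:

iostat -xn 5
zpool iostat pool1 5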

Thoughts?

Our SAN guy was shaking his head in disbelief at this one, but he obliged:

Using a JBOD 12 drive/4 raidz stripe array and a 73 GB SSD ZIL…

rsync -aSH --delete <source> <target>


zpool iostat pool1 60
               capacity     operations    bandwidth
pool        alloc   free   read  write   read  write
----------  -----  -----  -----  -----  -----  -----
pool1        711G  2.43T      4    547  2.34K  39.9M
pool1        716G  2.43T      7  1.12K  4.00K  60.4M
pool1        721G  2.42T     12    911  7.88K  57.6M
pool1        724G  2.42T     11    836  9.48K  39.7M
pool1        731G  2.41T      4    930  4.23K  78.5M
pool1        736G  2.41T      6    743  6.43K  63.2M
pool1        742G  2.40T     12    908  14.0K  54.2M
pool1        747G  2.40T      9  1.14K  8.40K  68.7M

Yes, that just wrote nearly 7 GB to the SAN in one minute. I’ve never seen anything written to our CX4-480 that fast.

~7 GB/minute -> ~56 Gb/minute -> ~0.93 Gb/s <- it’s nearly saturating 1 Gb/s Ethernet line speed pulling from the crusty ol’ Celerra NAS we have.

Oh snap. I just hit a burst write speed of 422 MB/s (~3.3 Gb/s …the DAE back end has a 4.0 Gb/s limit)


# zpool iostat pool1 1
pool1        900G  2.25T      0  3.03K      0   353M
pool1        900G  2.25T      0  5.10K      0   308M
pool1        900G  2.25T      0  3.82K      0   422M

Glad that ZFS is working well for you.

Pity it’s now largely in Oracle’s hands. :frowning:

I can’t believe what a difference this setup makes. Even using the flash cache and RAID 5 LUNs, Linux with XFS/GFS/GFS2 and either Linux native or EMC multipathing can’t even touch this.

The hardcore Solaris geeks I talk to agree about the Oracle thing, but they also agree that the ability to dump a supported Solaris onto certified Dell x86 PC buckets is probably the one thing that will keep Solaris from going the way of the dinosaur, like XFS/SGI and HFS/HP-UX, and let’s not forget DEC.

Next step is to bond the 4 copper ports on the M905 and have 4 × 1 Gb/s front ends funnelling down into the system, then test reads. Once the 128 GB of RAM backing the ARC warms up, I expect even better things.
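
On the Solaris side that bonding is just a dladm link aggregation; roughly the following, with made-up interface names and address (the Cisco 3130 needs a matching LACP port-channel on its end):

dladm create-aggr -P L4 -l active -d bnx0 -d bnx1 -d bnx2 -d bnx3 1
ifconfig aggr1 plumb 10.10.0.50 netmask 255.255.0.0 up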

…on a related note…testing a Linux setup I got this: lol…dazed and confused.

lol

can you guys fix our iTrader problems on your lunch break?