Cray CS400-AC Operation and Maintenance Manual

CS400-AC Hardware Replacement Procedures
HR90-2009

Contents
About the CS400-AC Hardware Replacement Procedures
CCS Parts Repair and Replacement
    CS400-AC Part Numbers
Power Cords
PCI Card Brackets
GB522 Overview and Replacement Videos
GB522X Replacement Procedures
    GB522X Power Down
    GB522X Chassis Removal and Installation
    GB522X Cover Removal
    GB522X Cover Installation
    GB522X Exploded Views
    PCIe Card Installation
    HDD/SSD Replacement
        GB522X SSD Replacement (High-Mount Bracket)
        GB522X HDD Replacement (First Generation Chassis)
GB522M/N Replacement Procedures
    GB522MN Expansion Blade Removal
    GB522MN Cover Removal
    GB522MN Cover Installation
    GB522MN: Separate/Join Expansion and Compute Blades
    GB522M and GB522N Exploded Views
    GPU/Coprocessor Replacement
Motherboard, Processor and DIMM Replacement
    GB522X Motherboard Removal
    GB522X Motherboard Installation
    Processor and Heatsink Replacement
    DIMM Replacement
Subrack Parts Replacement Procedures
    CS400-AC Fan Assembly Replacement
    iSCB Module PCBA Replacement (CS400-AC)
    SR5110 Midplane and Power Backplane Replacement
    CS400 Subrack Support Angles
    SR Subrack Replacement
    SR Subrack Installation

About the CS400-AC Hardware Replacement Procedures
The CS400-AC Hardware Replacement Procedures lists and shows GreenBlade, Subrack, processor/DIMM, and
drive parts for the system. This manual includes procedures for replacing components in GB522 blades and
expansion blades. It also includes procedures for replacing parts in the subrack. Cray personnel or customers
who perform these replacement procedures must first complete Cray CCS hardware training and observe ESD
precautions when servicing this equipment.
Document Versions
HR90-2009: April 2016.
ESD Precautions
Observe electrostatic discharge (ESD) precautions during the entire removal and installation process. Required
apparel includes an ESD smock, ESD shoes, and an ESD wrist strap.
CAUTION: Observe all ESD precautions. Failure to do so can result in equipment damage.
ESD Smock
Wear a Cray-approved static-dissipative smock when you service or handle an ESD-sensitive device. Completely
button the smock and wear it as the outermost layer of clothing. You must have a portion of the smock’s sleeves
in direct contact with the skin of your arms. Skin contact is essential for a dissipative path-to-earth ground through
your wrist strap. Tuck hair that exceeds shoulder length inside the back of the smock.
ESD Shoes
Wear approved static-dissipative shoes or approved dissipative heel straps on both shoes when you service or
handle an ESD-sensitive device. When sensitive equipment is exposed to static discharge, ESD shoes provide a
backup to the wrist straps and grounding cords and help prevent an excessive charge from building up on you
when you contact the conductive flooring. Use dissipative footwear in addition to, not as an alternative to, a wrist
strap.
ESD Wrist Strap
Wear a Cray-approved wrist strap when you service or handle an ESD-sensitive device to eliminate possible ESD
damage to equipment. Connect the wrist strap cord directly to earth ground.
Feedback
Visit the Cray Publications Portal at https://pubs.cray.com and use the "Contact us" link in the upper-right
corner to make comments online. Your comments and suggestions are important to us. We will respond to them
within 24 hours.

CCS Parts Repair and Replacement
Cray Cluster Supercomputer (CCS) Support is available to customers of Cray CS300, CS400, and CS-Storm
systems. Four service tiers are available to CCS customers: Depot, Depot Rapid Exchange, Premium and
Premium Plus.
CCS customers should initiate a request for support by creating a case in the CrayPort portal. For Premium and
Premium Plus customers, support outside of local business hours can also be initiated by calling the after-hours
support numbers.
This section provides information on the following service-related topics:
● Requesting Part Repair or Replacement
● Spare Part Numbers
Requesting Part Repair or Replacement
Cray Cluster Supercomputer (CCS) customers and field support personnel must run appropriate diagnostics and
perform troubleshooting before requesting part repair or replacement. These prerequisites are for replacement or
repair of defective/suspect parts covered under a Cray Service Level Agreement (SLA). To streamline the return
process, these prerequisites are being aligned with those of Cray’s vendors. Cray Service will assist customers
with troubleshooting and collaborate with vendors as needed. Cray remains the sole authority in granting a Return
Materials Authorization (RMA). Cray may grant an RMA without diagnostic results for specific sites due to their
security policies, or by authorization from a Cray executive, or for certain "Critical" cases as defined in the Cray
Cluster Supercomputer Support Operations Handbook.
Troubleshooting Procedures and Requirements
Cray Service provides detailed troubleshooting requirements and procedures to diagnose and identify faulty
hardware. These procedures are unique to specific part categories. Diagnostic utilities may be used to generate
log files or output that must be sent to Cray Service when requested before an RMA request is processed. At a
minimum, customers must demonstrate that the fault follows the suspect part. Cray will not repair or replace any
returned part that does not comply with these troubleshooting requirements and procedures.
Return Materials Authorization Number
After an RMA is approved, Cray Service issues a unique RMA number for that part. A single RMA number is
generated for each batch of identical parts. Orders containing different parts are issued different RMA numbers.
For example, a replacement of 3 GPUs will share the same RMA number, but a replacement of 1 GPU and 1
HCA will be issued two RMA numbers. Cray will not repair or replace any returned part without an RMA number.
Parts received by Cray without an RMA number will be returned as is.
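The batching rule above (one RMA number per batch of identical parts) can be sketched in a few lines of Python. This is an illustration only, not a Cray tool: the function name, the "RMA-" number format, the starting value, and the serial numbers are all hypothetical, and the sketch assumes that "identical parts" means parts sharing the same part number.

```python
from collections import defaultdict
from itertools import count


def assign_rma_numbers(parts, start=1000):
    """Group returned parts into RMA batches.

    parts: iterable of (part_number, serial_number) tuples.
    Assumption: parts are "identical" when their part numbers match.
    The "RMA-<n>" format and the start value are hypothetical.
    Returns a dict mapping each RMA number to the serials it covers.
    """
    counter = count(start)
    rma_by_part = {}                 # part number -> RMA number
    batches = defaultdict(list)      # RMA number -> serial numbers
    for part_number, serial in parts:
        if part_number not in rma_by_part:
            # First time this part number is seen: open a new RMA.
            rma_by_part[part_number] = f"RMA-{next(counter)}"
        batches[rma_by_part[part_number]].append(serial)
    return dict(batches)


# Three identical GPUs share one RMA number.
three_gpus = assign_rma_numbers(
    [("100972402", "SN1"), ("100972402", "SN2"), ("100972402", "SN3")]
)

# One GPU and one HCA are issued two RMA numbers.
gpu_and_hca = assign_rma_numbers(
    [("100972402", "SN1"), ("101082500", "SN9")]
)
```

Keying the batches on part number mirrors the example in the text: the three GPUs land in a single batch, while the GPU and the HCA are split into two.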
Packaging and Shipping RMA Parts
All parts returned to Cray must be individually tagged and identified with the RMA number. They must be properly
packaged to prevent shipping and ESD damage. If possible, customers should save and reuse Cray shipping
boxes (in good condition) when returning parts. If this is not possible, customers should provide enough
packaging (antistatic bags and ESD foam) so there is at least 2 inches of three-dimensional clearance between
the part and the shipping box. Cray will not be responsible for any damage as a result of poor customer
packaging and may return the damaged part as is, without repair or replacement. Customers may consolidate
multiple RMA requests in a single CrayPort case. Similarly, customers may consolidate multiple RMAs in a single
box, provided it meets Cray packaging recommendations.
Customers must print the CrayPort case number visibly and legibly on the exterior of the box or on the shipping
address label.
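As a rough aid for choosing a box, the 2-inch clearance guideline can be turned into a quick size check. The sketch below is not part of any Cray procedure. It assumes the clearance applies on every side of the part (so each interior box dimension must exceed the part dimension by twice the clearance), and the part and box dimensions used are hypothetical.

```python
MIN_CLEARANCE_IN = 2.0  # from the text: at least 2 inches between part and box


def min_box_interior(part_dims_in, clearance_in=MIN_CLEARANCE_IN):
    """Smallest interior box dimensions (inches) for a part.

    Assumption: the clearance is required on both sides of each axis,
    so every dimension grows by 2 * clearance_in.
    """
    return tuple(d + 2 * clearance_in for d in part_dims_in)


def box_is_adequate(part_dims_in, box_dims_in, clearance_in=MIN_CLEARANCE_IN):
    """True if the part fits in the box, in some axis-aligned orientation,
    with the required clearance on all sides."""
    needed = sorted(min_box_interior(part_dims_in, clearance_in))
    available = sorted(box_dims_in)
    return all(a >= n for a, n in zip(available, needed))


# Hypothetical part measuring 10.5 x 4.5 x 1.5 inches.
part = (10.5, 4.5, 1.5)
required = min_box_interior(part)
fits_large = box_is_adequate(part, (16, 10, 6))
fits_small = box_is_adequate(part, (14, 10, 6))
```

Sorting both sets of dimensions before comparing lets the check ignore which way the part is laid in the box.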
CS400-AC Part Numbers
Part Number Description
GreenBlade/Subrack
101129600 SR5110 subrack, 5U 10X node. Includes L/R angle brackets and cable management bar
000-01601A SR8204 subrack, 8U 4X node. Includes L/R angle brackets and cable management bar
101129800 SR8116 subrack, 8U 16X node. Includes L/R angle brackets and cable management bar
101117701 Chassis, GB522XA, PCIe3 X16 Slot 1, KP
101210400 Chassis, GB522N expansion blade, up to 2X K20/X/K40
101210500 Chassis, GB522N expansion blade, up to 2X K80
101195200 Chassis, GB822XA, 6X 2.5" HDD bay
002-00372B Fan module assembly, air cooled systems, (for SR5110/SR8204/SR8116 subracks)
23101-0295-01 Fan module replacement dual fans: 12V, 3000 rpm with cable assembly
22302-0270-01 Fan module: fan interface board (FIB) for 2 fans
002-00374C iSCB module (SR5110/SR8204/SR8116 subracks)
22302-0313-01 iSCB module PCBA - Ext Board, GB2 IO, 10/100, RJ45, reset/LED
22309-0005-04 iSCB module PCBA - CPU, GB2 IMX253, 16 node mgr, hot-swap (plugs to midplane)
002-00316A Cable management bar assembly, GB2 (SR5110/SR8204/SR8116 subracks)
150-00143B Power supply, GreenBlade 1630W, 1U, 277V input, 133A/12V 6A/5Vsb (SR5110/SR8204/SR8116)
101139200 Riser card, 1U PCIe-3 x16, S2600KP/TP Slot 1
101243300 Subrack support angle assembly, left side, 25.945 ±0.75" (SR5110/SR8204/SR8116)
101243400 Subrack support angle assembly, right side, 25.945 ±0.75" (SR5110/SR8204/SR8116)
GPUs and Coprocessors
101219400 Cable Assembly, Twin-Ax, Slot 2, PCIEX16 GB522N/M
51010-3218-00 PCIe Cable Bracket, GB522MN (Used with twin-ax cable assy)
56000-0030-01 Gasket EMI, GreenBlade Expansion
22302-0412-00 GPU Node IFB, 4U MIC CoProcessor GB522M, 2 GPU Hot-Swap, KP
22302-0412-10 GPU Node IFB, 4U K20/40 GB522N, 2 GPU Hot-Swap, KP
22302-0412-20 GPU Node IFB, 4U K80 GB522N, 2 GPU Hot-Swap, KP
22302-0402-02 GPU Bridge Board/Riser, GB522 Expansion Blade, (MIC/PHI, K20/K40, K80)
22303-0082-01 GPU Front PCBA/LED/push button switch assembly (MIC/PHI, K20/K40, K80 expansion blades)
101210300 Chassis, GB522M Expansion Blade, with cover/cables/brackets, up to 2X MIC/PHI (not incl.)
101210400 Chassis, GB522N Expansion Blade, with cover/cables/brackets, up to 2X K20/K40 (not incl.)
101210500 Chassis, GB522N Expansion Blade, with cover/cables/brackets, up to 2X K80 (not incl.)
100972402 GPU Card, TESLA ATLAS K40 12GB 235W Passive
101142501 GPU Card, STELLA DUO K80 24GB 300W Passive
130-00149A MIC Card, XEON PHI 7120P Passive 61C 1.25GHZ 16GB 300W
Rackmount Servers (2U)
101137200 Chassis, 2828X, 2U S2600WT2 dual Xeon 8X 2.5" HDD, up to 2X 1100W (Intel: 936032, R2208WT2YS)
101125300 Chassis, 2828X, 2U S2600WTT dual Xeon 8X 3.5" HDD, up to 2X 1100W (Intel: 936035, R2308WTTYS)
101142900 Power supply, 1100W, 80+ platinum (2828X)
150-00145A Power supply, 1600W, 80+ platinum (2820XT)
190-00137A Rail kit, 1U/2U, full extension & tool-less, 609.6-762MM 800MM MAX (2828X)
100993600 RMM4 lite remote management module, KVM
Motherboards, DIMMs and Processors
101123200 Intel Kennedy Pass motherboard, dual Xeon Haswell, 8X DIMM, FDR IB (Connect-IB) on-board
101132800 Intel Kennedy Pass motherboard, dual Xeon Haswell, 8X DIMM, (No InfiniBand on-board)
101123800 Intel Taylor Pass motherboard, dual Xeon Haswell, 16X DIMM, QSFP FDR IB
101132900 Intel Taylor Pass motherboard, dual Xeon Haswell, 16X DIMM, (No InfiniBand on-board)
101181500 Heatsink, 1U Passive, Std - 91.5 x 91.5, 47 fins, AL base, heatpiped/copper fins
101181600 Heatsink, 1U Passive, Wide - 91.5 x 110, 57 fins AL base, heatpiped/copper fins
167-00509A Battery backup, RAID smart lithium-ion 1500 mAh
100944700 DIMM, 8GB DDR4-2133 Reg dual rank 2RX8
100961800 DIMM, 16GB DDR4-2133 Reg dual rank ECC 1.2V 1GX4
100961801 DIMM FRU,16GB DDR4-2133 REG DUAL RANK ECC 1.2V 1GX4 (HYNIX)
100961802 DIMM FRU,16GB DDR4-2133 REG DUAL RANK ECC 1.2V 1GX4 (SAMSUNG)
101107700 DIMM, 32GB DDR4-2133 Reg dual rank ECC 1.2V 1GX4
101159100 IC, processor, E5-2660 V3, 2.6 GHz 10C Haswell 105W
101169900 IC, processor, E5-2620 V3, 2.4 GHz 6C Haswell 85W
101187200 IC, processor, E5-2683 V3, 2.0 GHz 14C Haswell 120W
101245301 IC, processor, E5-2699 V3, 2.3 GHz 18C Haswell 145W, SR1XD
101278300 IC, processor, E5-2667 V3, 3.2 GHz 8C Haswell 135W
101078600 IC, processor, E5-2670 V3, 2.3 GHz 12C Haswell 120W, M1
101078700 IC, processor, E5-2680 V3, 2.5 GHz 12C Haswell 120W, M1
101078900 IC, processor, E5-2690 V3, 2.6 GHz 12C Haswell 135W, M1
101079000 IC, processor, E5-2695 V3, 2.3 GHz 14C Haswell 120W, C1
101079100 IC, processor, E5-2650 V3, 2.3 GHz 10C Haswell 105W, M1
101120400 IC, processor, E5-2698 V3, 2.3 GHz 16C Haswell 135W, 40M, C1
Disk Drives
101030500 Disk drive, SATA 600 GB SSD, 2.5" 6 Gb/s, 20nm, Intel S3500
101032700 Disk drive, SATA 480 GB SSD, 2.5" 6 Gb/s, 20nm, Intel S3500
101178400 Disk drive, SATA 120 GB SSD, 2.5" 6 Gb/s, 20nm, Intel S3500
101178500 Disk drive, SATA 160 GB SSD, 2.5" 6 Gb/s, 20nm, Intel S3500
101178600 Disk drive, SATA 240 GB SSD, 2.5" 6 Gb/s, 20nm, Intel S3500
101178700 Disk drive, SATA 300 GB SSD, 2.5" 6 Gb/s, 20nm, Intel S3500
101125800 Disk drive, SAS 300GB 10K 6 2.5" 6Gb/s 64MB, Savvio
101178800 Disk drive, SATA 800 GB SSD, 2.5" 6 Gb/s, 20nm, Intel S3500
101179000 Disk drive, SATA 100 GB SSD, 2.5" 6 Gb/s, 25nm, Intel S3700
101179900 Disk drive, SATA 200 GB SSD, 2.5" 6 Gb/s, 25nm, Intel S3700
101180400 Disk drive, SATA 800 GB SSD, 2.5" 6 Gb/s, 25nm, Intel S3700
101180700 Disk drive, MSATA 120 GB SSD, 2.5" 6 Gb/s, 25nm, Intel 525
101180800 Disk drive, MSATA 180 GB SSD, 2.5" 6 Gb/s, 25nm, Intel 525
101180900 Disk drive, MSATA 240 GB SSD, 2.5" 6 Gb/s, 25nm, Intel 525
101181000 Disk drive, MSATA 60 GB SSD, 2.5" 6 Gb/s, 25nm, Intel 525
101181100 Disk drive, 400 GB SSD, 1/2 Height PCIe 2.0, 25nm, Intel 910
101181200 Disk drive, 800 GB SSD, 1/2 Height PCIe 2.0, 25nm, Intel 910
101205700 Disk drive, SATA 180 GB SSD, 2.5" 6 Gb/s, 20nm, R540/W490, Intel Pro 2500
101206400 Disk drive, SATA 120 GB SSD, 2.5" 6 Gb/s, 20nm, R540/W490, Intel Pro 2500
101206600 Disk drive, SATA 240 GB SSD, 2.5" 6 Gb/s, 20nm, R540/W490, Intel Pro 2500
101206700 Disk drive, SATA 480 GB SSD, 2.5" 6 Gb/s, 20nm, R540/W490, Intel Pro 2500
101238200 Disk drive, SATA 2 TB HDD, 2.5" 6 Gb/s, 7200 RPM, 128 MB Cache, Seagate Enterprise
160-00184B Disk drive, SATA 500 GB HDD, 2.5" 6 Gb/s, 7200 RPM, 32 MB Cache, Seagate Constellation
160-00185C Disk drive, SATA 250 GB HDD, 2.5" 6 Gb/s, 7200 RPM, 64 MB Cache, Seagate Constellation
160-00228A Disk drive, SATA 1 TB HDD, 2.5" 6 Gb/s, 7200 RPM, 64 MB Cache, Seagate Constellation
160-00258A Disk drive, SATA3 400 GB SSD, 2.5" 6 Gb/s, Read-500Mb/s, Write-460Mb/s, Intel
PCIe Cards
100629400 PCIe, Gen2 dual port 1GBE [GB822x] RJ45, 10/100/1GE, Intel 82576, (2.5 Gbps) includes bracket
100882500 PCIe3.0, x8, 40 GbE, HBA, dual-port, ConnectX-3, QSFP (MCX314A-BCBT)
101082500 PCIe3.0, x8, HCA, Connect-IB, single-port QSFP FDR, 56 Gb/s (MCB191A-FCAT)
101196800 PCIe3.0, x8, 40 GbE, HBA, single-port QSFP (MCX313A-BCBT)
101203000 PCIe3.0, 800 GB SSD, 1/2 height, 20 nm, R2800/W1900, P3700
101264800 PCIe3.0, 1.6 TB SSD, 1/2 height, 20 nm MLC, HHHL, P3700
101267000 PCIe3.0, 2.0 TB SSD, 1/2 height, 20 nm MLC, HHHL, P3700
101267100 PCIe3.0, 400 GB SSD, 1/2 height, 20 nm MLC, HHHL, P3700
101267200 PCIe3.0, 2.0 TB SSD, 1/2 height, 20 nm MLC, HHHL, P3600
101267300 PCIe3.0, 1.6 TB SSD, 1/2 height, 20 nm MLC, HHHL, P3600
101268500 PCIe3.0 x16, HCA ConnectX-4 VPI single-port QSFP, EDR 100 Gb/s InfiniBand
101278400 PCIe3.0 x16, HCA ConnectX-4 VPI dual-port QSFP, EDR 100 Gb/s InfiniBand
101290000 PCIe3.0 x16, HCA ConnectX-4 single-port QSFP, EDR 100 Gb/s InfiniBand
101290100 PCIe3.0 x16, HCA ConnectX-4 dual-port QSFP, EDR 100 Gb/s InfiniBand
131-00153A PCIe, x8, RAID SAS 8P Ctrl 0/1/5/6/10/50/60
132-00110B PCIe2.0 x8, Intel True Scale Fabric InfiniBand, single-port, QSFP, QLE7300
132-00125A PCIe2.0 x8, Intel True Scale Fabric InfiniBand, dual-port, QSFP, QLE7300
132-00128A PCIe2.0 x8, Ethernet X520-DA2, dual-port, 10GB SFP+, low profile, full height
132-00137B PCIe3.0 x8, HCA ConnectX-3 VPI single-port QSFP, 40Gb/s QDR IB, 10GbE
132-00138B PCIe3.0 x8, HCA ConnectX-3 VPI dual-port QSFP, 40Gb/s QDR IB, 10GbE
132-00145A PCIe3.0 x8, HCA ConnectX-3 VPI single-port QSFP, 40Gb/s FDR IB, 40GbE
132-00150A PCIe3.0 x8, HCA ConnectX-3 EN dual-port SFP+, 10Gb/s, 40GbE
132-00157A PCIe3.0 x4, HCA ConnectX-3 EN single-port SFP+, 10Gb/s, 10GbE
132-00159A PCIe3.0 x16, HCA Connect-IB dual-port 56Gb/s FDR
Power Cords
085-00370B Power cord, LS14 to LS26, 10A-250/277V, 1.2 m (4 ft)
085-00374A Power cord, LS14 to LS26, 10A-250/277V, 1.5 m (5 ft)
085-00382A Power cord, LS25 to LS26, 250/277V, 1.2 m (4 ft)
085-00383A Power cord, LS25 to LS26, 250/277V, 1.5 m (5 ft)
085-00218A Power cord, C14 to C13, 17AWG, 10A-250V, 1.5 m (5 ft)
085-00251A Power cord, C14 to C13, 17AWG, 10A-250V, 1.2 m (4 ft)
085-00361A Power cord, C14 to C13, 17AWG, 10A-250V, 1.8 m (6 ft)
085-00360A Power cord, C14 to C13, 17AWG, 10A-250V, 3.0 m (10 ft)
085-00294A Power cord, C14 to C19, 14AWG, 15A-250V, 1.8 m (6 ft)
085-00384A Power cord, C14 to C19, 14AWG, 13A-250V, 3.0 m (10 ft)
085-00273A Power cord, C20 to C13, 10A-250V, 1.8 m (6 ft)
085-00351A Power cord, C20 to C13, 15A-250V, 1.5 m (5 ft)

Power Cords
The following power cords are used with CS400-AC systems.
● C20 to C13, 250 V: 1.8 m (6 ft), 10 A; 1.5 m (5 ft), 15 A
● C14 to C19, 250 V, 14 AWG: 1.8 m (6 ft), 15 A; 3.0 m (10 ft), 13 A
● C14 to C13, 250 V, 10 A, 17 AWG: 1.2 m (47 in.), 1.5 m (59 in.), 1.8 m (6 ft), or 3.0 m (10 ft)
● LS25 to LS26, 250-277 V: 1.2 m (47 in.) or 1.5 m (59 in.)
● LS14 to LS26, 250-277 V, 10 A: 1.2 m (47 in.) or 1.5 m (59 in.)