Subject: OSCER: Boomer Infiniband switch problems
From: Brandon George 
Date: Mon, 17 Mar 2014 12:33:49 -0500

OSCER users,

Boomer is currently experiencing severe congestion on the core
InfiniBand switch, which is disrupting many MPI jobs.  We have
an open case with the vendor and are working diligently toward
a resolution.  We are leaving the switch in its current state
while we pull logs and other debugging information to try to
determine the cause of the issue.

We expect MPI jobs to continue to be disrupted until we find
the cause.  If we don't have a resolution from the vendor by
mid-afternoon, we will take more drastic measures (including
rebooting the core switch).  We'll send another update at
that time.

Our apologies for the inconvenience,

-brandon

-- 
Brandon George, RHCE, CSSGB (bcg@ou.edu, 405.325.5113)
Manager of Operations
OU Supercomputing Center for Education & Research (OSCER)
University of Oklahoma Information Technology
