Recent Announcements
01/21/2026
IT Fabric
Group Responsible: IT Fabric
Affected Area: b725 datacenter
Maintenance Type: service interruption
Systems have been operating stably since the cutover to site chilled water. SCDF is closely monitoring and confirming functionality of major systems one by one. We will continue to run in the current state, with reduced capacities for some systems, until we have more information from F&O and are advised to return all systems to normal operations. We believe all major systems are otherwise functioning; please report any issues.
Submitted by: Matt Cowan <cowan@bnl.gov>
01/21/2026
IT Fabric
Group Responsible: IT Fabric
Affected Area: b725 datacenter
Maintenance Type: service interruption
A major leak in one of the cooling tower loops triggered a temperature rise in a number of compute racks before the backup BNL site chilled water feed kicked in. The main impact appears limited to low density compute rows 103 and 105, impacting sPHENIX and, to a smaller extent, the shared pool and ATLAS Tier1. All systems are currently believed to be functioning, although sPHENIX, spool and ATLAS T1 are running with reduced capacities. Please report any issues.
Submitted by: Matt Cowan <cowan@bnl.gov>
01/16/2026
IT Services
Group Responsible: IT Services
Affected Area: All Comanage Federated logins
Maintenance Type: scheduled downtime
Comanage Registry is undergoing an upgrade and will be unavailable for Federated Logins for 15 min at 6 am Central Time on Friday January 16th. This will affect a variety of services but will be transparent if you are already logged into those services.
Submitted by: Robert Hancock <hancock@bnl.gov>
01/06/2026
IT Services
Group Responsible: IT Services
Affected Area: Globus Access will be down
Maintenance Type: scheduled downtime
Globus Access (https://app.globus.org/) will be down for maintenance from 11AM-3PM today 01/06/26.
Submitted by: Joe Frith <jfrith@bnl.gov>
12/30/2025
Service & Tools
Group Responsible: Service & Tools
Affected Area: BNLBox
Maintenance Type: scheduled downtime
The BNLBox update has been completed. If there are any issues, please submit tickets to the RT Storage Management queue at RT-RACF-StorageManagement@bnl.gov.
Submitted by: Louis Pelosi <lpelosi@bnl.gov>
12/30/2025
Service & Tools
Group Responsible: Service & Tools
Affected Area: BNLBox
Maintenance Type: scheduled downtime
BNLBox will be unavailable starting at 11:00 AM EST on December 30, 2025, due to scheduled maintenance. During this time, the service will be inaccessible. Users should complete any work before the maintenance window begins. Normal operations are expected to resume once maintenance is complete, on or before 5:00 PM EST on December 30, 2025. Users may resume usage at that time. Thank you for your patience during this scheduled downtime.
Submitted by: Louis Pelosi <lpelosi@bnl.gov>
12/08/2025
IT Fabric
Group Responsible: IT Fabric
Affected Area: b725 Network and Tape rooms
Maintenance Type: transparent
FandO is performing inspection and remediation work on PDU3A4, which feeds the Network and Tape rooms in the b725 datacenter, today ~9:30am-3pm. All equipment is powered redundantly from another PDU, so no outage is expected, but systems will be in a more at-risk state than normal.
Submitted by: Matt Cowan <cowan@bnl.gov>
12/05/2025
Service & Tools
Group Responsible: Service & Tools
Affected Area: SCDF Mattermost
Maintenance Type: service interruption
The SCDF Mattermost service has been updated and service has resumed. Thank you for your patience. If there are any issues, please report them by opening an RT ticket in the Software queue 'RT-RACF-Software@bnl.gov'.
Submitted by: Louis Pelosi <lpelosi@bnl.gov>
12/05/2025
Service & Tools
Group Responsible: Service & Tools
Affected Area: SCDF Mattermost
Maintenance Type: service interruption
An emergency update needs to be applied to the SCDF Mattermost service. Users should not experience any interruption to usage; however, if there is an issue, the service will be down temporarily. This update will fix an issue where users are unable to retrieve group messages using the '+' button from the sidebar and will update the web UI. Apologies for the interruption and thank you for your patience.
Submitted by: Louis Pelosi <lpelosi@bnl.gov>
12/01/2025
Service & Tools
Group Responsible: Service & Tools
Affected Area: SCDF Mattermost
Maintenance Type: scheduled downtime
The scheduled maintenance of the SCDF Mattermost service has completed and service has resumed. Users should now be able to use the service; you may be prompted to log in again if you were in the middle of a session. If there are any issues, please open an RT ticket in the Software queue 'RT-RACF-Software@bnl.gov'. Thank you again for your patience.
Submitted by: Louis Pelosi <lpelosi@bnl.gov>
12/01/2025
Service & Tools
Group Responsible: Service & Tools
Affected Area: SCDF Mattermost
Maintenance Type: scheduled downtime
Planned maintenance of the SCDF Mattermost service will be performed today, 12/01/2025, between 5-6pm EST. The service will be unavailable during this time. Once the service is restored, users may resume using it normally. Thank you for your patience during this scheduled outage.
Submitted by: Louis Pelosi <lpelosi@bnl.gov>
11/26/2025
IT Fabric
Group Responsible: IT Fabric
Affected Area: sPHENIX and shared pool HTC Farm condor worker nodes
Maintenance Type: service interruption
FandO reported PDU1A3 is back up to 187F, but on the connections for all 3 phases this time. SCDF is shedding load from row 103 by draining jobs and powering off idle nodes to get to a stable safe state for the long holiday weekend. This only impacts sPHENIX and shared pool condor users/experiments.
Submitted by: Matt Cowan <cowan@bnl.gov>
10/21/2025
IT Services
Group Responsible: IT Services
Affected Area: OpenShift Virtual Environment
Maintenance Type: service interruption
An upgrade of OpenShift began today. Though we were informed that it would be transparent to all, we have found that some network interruptions are occurring. The upgrade cannot be interrupted once it has started. The main servers have completed the upgrade, and we do not expect any further interruptions. However, there is still a possibility of further interruption to services as the upgrade proceeds.
Submitted by: Joe Frith <jfrith@bnl.gov>
04/26/2025
IT Services
Group Responsible: IT Services
Affected Area: SCDF Services
Maintenance Type: Unknown
The SDCC recently unveiled its new website http://www.sdcc.bnl.gov that serves as the entry point to facility services and support. In addition, we have created a public Mattermost channel for feedback and recommendations on the new website. Users are welcome to join this channel (see link below) and participate.
https://chat.sdcc.bnl.gov/bnl/channels/sdcc-website-feedback
Please note that this channel is not meant to be used for support issues (broken links, missing documentation, requests for changes, etc.). Support requests must be made through the RT ticket system (go to http://www.sdcc.bnl.gov and select 'Get Help').
Submitted by: Tony Wong
04/26/2025
Network
Group Responsible: Network
Affected Area: SCDF Services
Maintenance Type: Unknown
Hi all,
The clusters have been back to normal operations since 12:30 pm.
The systems affected were the IC cluster, the Skylake cluster and the volta cluster.
Regards,
Costin
Submitted by: Costin Caramarcu
02/12/2024
Services & Tools
Group Responsible: Services & Tools
Affected Area: Star GPFS
Maintenance Type: Unknown
Access to Star GPFS (gpfs/gpfs01) has been restored.
End Date/Time: 2/12/2024 10:01 pm
Expected Impact: access is available
Submitted by: Test User
02/12/2024
IT Services
Group Responsible: IT Services
Affected Area: Star GPFS
Maintenance Type: Scheduled Downtime
The Star GPFS file system experienced serious errors over the weekend and is currently offline. We have opened a ticket with the vendor and are working to get this resolved. We will send updates as they become available.
End Date/Time: 2/13/2024 7:27 am
Expected Impact: access is unavailable until issue is resolved
Submitted by: Test User
01/22/2024
Experimental Support
Group Responsible: Experimental Support
Affected Area: US ATLAS dCache storage service
Maintenance Type: Service Interruption
The US ATLAS storage system is scheduled for an upgrade to the latest software release, dCache 9.2. As a result, the storage system will be temporarily inaccessible from 9:00 AM to 1:00 PM (EST) on January 22, 2024. We apologize in advance for any inconvenience this work may cause.
End Date/Time: 1/22/2024 1:00 pm
Expected Impact: US ATLAS dCache storage service interruption
Submitted by: Test User
01/05/2024
IT Fabric
Group Responsible: IT Fabric
Affected Area: EIC Lustre
Maintenance Type: Scheduled Downtime
eicoss02.sdcc.bnl.local is back online after outage.
End Date/Time: 1/5/2024 5:15 pm
Expected Impact: Lustre file system for EIC won't be available.
Submitted by: Test User