
Scientific Computing and Data Facility

Recent Announcements


12/08/2025

IT Fabric

Group Responsible: IT Fabric

Affected Area: b725 Network and Tape rooms

Maintenance Type: transparent

FandO is performing inspection and remediation work on PDU3A4, which feeds the Network and Tape rooms in the b725 datacenter, today from ~9:30am to 3pm. All equipment is powered redundantly from another PDU, so no outage is expected, but systems will be in a more at-risk state than normal.


Submitted by: Matt Cowan <cowan@bnl.gov>

12/05/2025

Service & Tools

Group Responsible: Service & Tools

Affected Area: SCDF Mattermost

Maintenance Type: service interruption

The SCDF Mattermost service has been updated and service has resumed. Thank you for your patience. If there are any issues, please report them by opening an RT ticket in the Software queue 'RT-RACF-Software@bnl.gov'.


Submitted by: Louis Pelosi <lpelosi@bnl.gov>

12/05/2025

Service & Tools

Group Responsible: Service & Tools

Affected Area: SCDF Mattermost

Maintenance Type: service interruption

An emergency update needs to be applied to the SCDF Mattermost service. Users should not experience any interruption to usage; however, if there is an issue, the service will be down temporarily. This update will fix an issue where users are unable to retrieve group messages using the '+' button in the sidebar, and it will update the web UI. Apologies for the interruption, and thank you for your patience.


Submitted by: Louis Pelosi <lpelosi@bnl.gov>

12/01/2025

Service & Tools

Group Responsible: Service & Tools

Affected Area: SCDF Mattermost

Maintenance Type: scheduled downtime

The SCDF Mattermost scheduled maintenance has completed and service has resumed. Users should now be able to use the service; you may be prompted to log in again if you were in the middle of a session. If there are any issues, please open an RT ticket in the Software queue 'RT-RACF-Software@bnl.gov'. Thank you again for your patience.


Submitted by: Louis Pelosi <lpelosi@bnl.gov>

12/01/2025

Service & Tools

Group Responsible: Service & Tools

Affected Area: SCDF Mattermost

Maintenance Type: scheduled downtime

Planned maintenance of the SCDF Mattermost service will be performed today, 12/01/2025, between 5 and 6 pm EST. The service will be unavailable during this time. Once service is restored, users may resume using it normally. Thank you for your patience during this scheduled outage.


Submitted by: Louis Pelosi <lpelosi@bnl.gov>

11/26/2025

IT Fabric

Group Responsible: IT Fabric

Affected Area: sPHENIX and shared pool HTC Farm condor worker nodes

Maintenance Type: service interruption

FandO reported that PDU1A3 is back up to 187F, but this time on the connections for all 3 phases. SCDF is shedding load from row 103 by draining jobs and powering off idle nodes to reach a stable, safe state for the long holiday weekend. This only impacts sPHENIX and shared pool condor users/experiments.


Submitted by: Matt Cowan <cowan@bnl.gov>

10/21/2025

IT Services

Group Responsible: IT Services

Affected Area: OpenShift Virtual Environment

Maintenance Type: service interruption

An upgrade of OpenShift began today. Though we were informed that it would be transparent to all, we have found that some network interruptions are occurring. The upgrade cannot be interrupted once it has started. The main servers have completed the upgrade, so we do not expect further interruptions, but some interruption to services remains possible as the upgrade proceeds.


Submitted by: Joe Frith <jfrith@bnl.gov>

04/26/2025

IT Services

Group Responsible: IT Services

Affected Area: SCDF Services

Maintenance Type: Unknown

The SDCC recently unveiled its new website http://www.sdcc.bnl.gov that serves as the entry point to facility services and support. In addition, we have created a public MatterMost channel for feedback and recommendations on the new website. Users are welcome to join this channel (see link below) and participate.

https://chat.sdcc.bnl.gov/bnl/channels/sdcc-website-feedback

Please note that this channel is not meant to be used for support issues (broken links, missing documentation, request for changes, etc). Support requests must be made through the RT ticket system (go to http://www.sdcc.bnl.gov and select 'Get Help').


Submitted by: Tony Wong

04/26/2025

Network

Group Responsible: Network

Affected Area: SCDF Services

Maintenance Type: Unknown

Hi all,

The clusters have been back to normal operations since 12:30 pm.

The systems affected were the IC cluster, the Skylake cluster, and the Volta cluster.

Regards,
Costin


Submitted by: Costin Caramarcu

04/26/2025

Services & Tools

Group Responsible: Services & Tools

Affected Area: SCDF Communications

Maintenance Type: service interruption

This is just testing the db write


Submitted by: Facility Staff <unknown@example.com>

02/12/2024

Services & Tools

Group Responsible: Services & Tools

Affected Area: Star GPFS

Maintenance Type: Unknown

Access to Star GPFS (gpfs/gpfs01) has been restored.

End Date/Time: 2/12/2024 10:01 pm

Expected Impact: access is available


Submitted by: Test User

02/12/2024

IT Services

Group Responsible: IT Services

Affected Area: Star GPFS

Maintenance Type: Scheduled Downtime

The Star GPFS file system experienced serious errors over the weekend and is currently offline. We have opened a ticket with the vendor and are working to get this resolved. We will send updates as they become available.

End Date/Time: 2/13/2024 7:27 am

Expected Impact: access is unavailable until issue is resolved


Submitted by: Test User

01/22/2024

Experimental Support

Group Responsible: Experimental Support

Affected Area: US ATLAS dCache storage service

Maintenance Type: Service Interruption

The US ATLAS storage system is scheduled for an upgrade to the latest software release, dCache 9.2. This maintenance will temporarily interrupt access to the storage system from 9:00 AM to 1:00 PM (EST) on January 22, 2024. We apologize in advance for any inconvenience this work may cause.

End Date/Time: 1/22/2024 1:00 pm

Expected Impact: US ATLAS dCache storage service interruption


Submitted by: Test User

01/05/2024

IT Fabric

Group Responsible: IT Fabric

Affected Area: EIC Lustre

Maintenance Type: Scheduled Downtime

eicoss02.sdcc.bnl.local is back online after outage.

End Date/Time: 1/5/2024 5:15 pm

Expected Impact: Lustre file system for EIC won't be available.


Submitted by: Test User

01/05/2024

Services & Tools

Group Responsible: Services & Tools

Affected Area: EIC Lustre

Maintenance Type: Scheduled Downtime

Due to hardware issues, eicoss02.sdcc.bnl.local needs to be brought offline.

End Date/Time: 2024-01-05T17:00

Expected Impact: Lustre file system for EIC won't be available.


Submitted by: value=

12/19/2023

Experimental Support

Group Responsible: Experimental Support

Affected Area: SCDF Services

Maintenance Type: Service Interruption

On Tuesday, December 19th, from 6:30 am EST to 9 pm EST, network connectivity (both WAN and Campus) to/from the SDCC Facility and Science DMZ will be unavailable due to scheduled maintenance. This event marks the final phase of the years-long migration of SDCC services from the old data center to the new one. We anticipate the full restoration of network connectivity by 9 pm EST on December 19th, concluding the outage.

During the intervention period, access to SDCC services, including computing resources (both interactive and batch), storage services (disk and tape), and collaborative tools (such as BNLBox, RCF email, MatterMost, and web services), will not be available.

To prepare for this network outage, the scheduling of new HTCondor and Slurm batch jobs will cease on Friday, December 15th, at 11 pm EST. This pause will facilitate the smooth draining and graceful termination of ongoing computing jobs. In addition, access to storage and data transfer services will be temporarily halted on Monday, December 18th, at 5 pm EST.

This outage does not affect BNL mail service (@bnl.gov domain), which will be available during this period. The SDCC will notify the community about the full restoration of services through its regular mailing lists.

End Date/Time: 12/19/2023 9:00 pm

Expected Impact: No access to SDCC services during downtime


Submitted by: Test User

12/18/2023

Network

Group Responsible: Network

Affected Area: SCDF Services

Maintenance Type: Service Interruption

This extended downtime is necessary for two key activities: upgrading user home directories beginning on Monday (for all programs but the NSLS) and a network maintenance scheduled for Tuesday (affecting all customers). This maintenance marks the final phase of the multi-year migration of SDCC services to a new data center.

Impact on Services:

- Storage and Data Transfer Services: Access will be suspended starting Monday, December 18th, at 5 PM EST to allow for a clone of the users' home directories. Special Note for NSLS Program Users: The NFS home directory work scheduled from Monday to Tuesday does not impact the NSLS program. NSLS program users should expect regular operation during this period (until the network maintenance begins on Tuesday).
- Computing Resources: Access to computing resources (interactive and batch), storage services (disk and tape), and collaborative tools (BNLBox, RCF email, MatterMost, web services) will be unavailable throughout the respective maintenance periods.
- Batch Job Scheduling: Scheduling of new HTCondor and Slurm batch jobs will cease on Friday, December 15th, at 11 PM EST. Any remaining HTCondor jobs will be terminated on Monday, December 18th, at 3 PM.
- Interactive Sessions: Open sessions through the facility SSH gateways will be terminated at 5 PM on December 18th to facilitate the cloning/copying of user home directories.
- Email Services: The @bnl.gov email services may experience interruptions. Alternatively, users are encouraged to use the sdcc-staff-l@lists.bnl.gov mailing list to contact SDCC staff.
- NX Service and SSH Gateway: The NX service and SSH Gateway will be unavailable. Existing cssh sessions initiated before the outage will continue to function.

Restoration of Services: Complete restoration of services is anticipated by 9 PM EST on December 19th. The SDCC will notify the community through its regular mailing lists.

We apologize for the inconvenience this may cause and thank you for your understanding and cooperation as we complete this essential upgrade to our infrastructure.

End Date/Time: 12/19/2023 9:00 pm

Expected Impact: No access to SDCC services during downtime


Submitted by: Test User

11/20/2023

Experimental Support

Group Responsible: Experimental Support

Affected Area: SCDF Mattermost

Maintenance Type: Service Interruption

The SDCC Mattermost service (chat.sdcc.bnl.gov) will be suspended on Monday, November 20, 2023, from 6 PM EST to 7 PM EST.

End Date/Time: 11/20/2023 7:00 pm

Expected Impact: Service will be suspended


Submitted by: Test User

10/13/2023

Network

Group Responsible: Network

Affected Area: SFTP Services

Maintenance Type: Service Interruption

Due to a security vulnerability in Red Hat Linux, the SFTP & CFTP servers will be updated and rebooted between 12:30 PM EST and 1:00 PM EST today. Existing connections will be disconnected during the reboots.

End Date/Time: 10/13/2023 1:30 pm

Expected Impact: access will be unavailable and current sessions will be ended


Submitted by: Test User

10/03/2023

Network

Group Responsible: Network

Affected Area: Globus

Maintenance Type: Service Interruption

DTN03/IC Globus maintenance will take place on 10/03/23. The server will be down for approximately 1-2 hours while it is relocated within the Data Center.

End Date/Time: 10/3/2023 3:30 pm

Expected Impact: access will be unavailable and current sessions will be ended


Submitted by: Test User