AT camera recovery#

Overview#

The camera is designed to enter the FAULT state whenever a monitored limit (temperature, voltage, current, etc.) goes out of tolerance, or when an unexpected failure occurs during camera operation. Limits typically have a warning range that is reached before a hard error occurs. Once the camera is in the FAULT state, the problem must be diagnosed and fixed, and the camera returned to the ENABLED state, before operations can resume. This document describes the general procedure for doing this, and will document any known common failure modes.

This article was triggered by OBS-97 - The LATISS camera got timeout from REB IN PROGRESS on 28 February 2023, but is more general than that specific incident.
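
The warning-before-fault behaviour of limits mentioned above can be illustrated with a minimal sketch. This is not the CCS implementation: the function name `classify_reading`, the `LimitStatus` enum, and the numeric ranges are all hypothetical, chosen only to show how a reading can pass through a warning band before it reaches a hard error.

```python
from enum import Enum


class LimitStatus(Enum):
    NOMINAL = "nominal"
    WARNING = "warning"
    FAULT = "fault"


def classify_reading(value, warn_range, fault_range):
    """Classify a reading against a warning band nested inside a fault band.

    Hypothetical helper for illustration only; CCS may evaluate limits differently.
    """
    warn_low, warn_high = warn_range
    fault_low, fault_high = fault_range
    # Outside the outer (fault) band: hard error.
    if not (fault_low <= value <= fault_high):
        return LimitStatus.FAULT
    # Inside the fault band but outside the inner (warning) band: warning only.
    if not (warn_low <= value <= warn_high):
        return LimitStatus.WARNING
    return LimitStatus.NOMINAL


if __name__ == "__main__":
    # Made-up temperature limits in degrees C, not real camera settings.
    warn = (-105.0, -95.0)
    fault = (-110.0, -90.0)
    for reading in (-100.0, -93.0, -85.0):
        print(reading, classify_reading(reading, warn, fault).value)
```

In this sketch a reading in the warning band would raise an alert without changing the camera state, while a reading outside the fault band corresponds to the condition that puts the camera into FAULT.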

Note

The instructions below assume:
  1. The ability to log in to the AuxTel CCS computers,

  2. Some familiarity with basic CCS commands/functionality.

We need a separate document to provide this background information since it will need to be referred to from multiple places.

Error diagnosis#

  • ATCamera goes to FAULT state.

Procedure Steps#

  1. Identify which CCS subsystem triggered the problem.

  2. Review the raised alerts and/or log files, and determine whether this was:
    1. a transitory problem which can be documented (via JIRA ticket) and reset,

    2. or something which requires a camera expert to diagnose.

  3. Clear the raised alerts in both the CCS subsystem which triggered the problem and the Master Control Module (MCM), which tracks the overall camera state.

  4. Clear the fault in the ocs-bridge, and switch it back to OFFLINE_AVAILABLE mode.

Note

In either case it is important that an OBS ticket be created so we can track how often specific problems occur, and whether software or hardware changes are needed to prevent future occurrences.

Specific CCS commands for performing these operations are documented below.

Tracking down a CCS problem#

In general there are two approaches to tracking down a CCS problem: using the ccs-shell command line tool, or using the ccs-console graphical interface. Currently we describe only the first approach.

Warning

Pending TODO: Simulate a fault and verify these commands are correct (perhaps on TTS) (plus highlight responses)

Important

The following commands are entered at the ccs> prompt.

  1. Identify which CCS subsystem triggered the problem:

    ats-mcm getRaisedAlertSummary
    
  2. Review the raised alerts and log files:

    ats-fp getRaisedAlertSummary
    
  3. Clear the alerts:

    ats-fp clearAllAlerts
    ats-fp getRaisedAlertSummary
    ats-mcm clearAllAlerts
    ats-mcm getRaisedAlertSummary
    
  4. Clear the fault in the ocs-bridge and switch it back to OFFLINE_AVAILABLE mode:

    ats-ocs-bridge clearFault
    ats-ocs-bridge setAvailable
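
The command sequence above can also be written out as a small checklist generator, which may be useful when the faulted subsystem is not ats-fp. This is a minimal sketch that only builds the command strings already listed in this section. It does not talk to CCS, and the function name `recovery_commands` and its parameter are hypothetical. It assumes that ats-fp in the steps above stands in for whichever subsystem step 1 identifies.

```python
MCM = "ats-mcm"
OCS_BRIDGE = "ats-ocs-bridge"


def recovery_commands(faulted_subsystem="ats-fp"):
    """Return the ccs-shell commands for the recovery steps, in order.

    Hypothetical helper: it only formats text for an operator to type at
    the ccs> prompt; nothing is sent to the camera.
    """
    # Step 1: ask the MCM which subsystem raised the alert.
    commands = [f"{MCM} getRaisedAlertSummary"]
    if faulted_subsystem != MCM:
        # Steps 2 and 3: review, clear, and re-check the faulted subsystem.
        commands += [
            f"{faulted_subsystem} getRaisedAlertSummary",
            f"{faulted_subsystem} clearAllAlerts",
            f"{faulted_subsystem} getRaisedAlertSummary",
        ]
    # Step 3 (continued): clear and re-check the MCM.
    commands += [
        f"{MCM} clearAllAlerts",
        f"{MCM} getRaisedAlertSummary",
        # Step 4: clear the ocs-bridge fault and make it available again.
        f"{OCS_BRIDGE} clearFault",
        f"{OCS_BRIDGE} setAvailable",
    ]
    return commands


if __name__ == "__main__":
    for command in recovery_commands():
        print(f"ccs> {command}")
```

Repeating getRaisedAlertSummary after each clearAllAlerts mirrors the steps above: it confirms that the alerts were actually cleared before moving on to the next subsystem.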
    

Post-Condition#

  • ATCamera can now be set to the ENABLED state.

Contingency#

If the procedure was not successful, report the issue in #summit_auxtel and/or activate the Out of hours support.