AT camera recovery#
Overview#
The camera is designed to go into the FAULT state whenever a limit (temperature, voltage, current, etc.) goes out of tolerance (limits typically have a warning range before a hard error occurs), or if some unexpected failure occurs during camera operation. Once the camera goes into the FAULT state, it is necessary to diagnose the problem, fix it, and then put the camera back into ENABLED mode before operations can resume. This document describes the general procedure for doing this, and will document any known common failure modes.
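The idea of a warning range sitting inside a hard-error range can be illustrated with a minimal sketch. This is not the CCS implementation: the function name, the state names, and every threshold value below are assumptions chosen only for illustration.

```python
from enum import Enum


class LimitState(Enum):
    """Illustrative states for a monitored quantity (names are assumptions)."""
    NOMINAL = "NOMINAL"
    WARNING = "WARNING"
    ALARM = "ALARM"


def classify(value, warn_low, warn_high, alarm_low, alarm_high):
    """Classify a reading against nested warning and alarm limits.

    The warning band [warn_low, warn_high] lies inside the alarm band
    [alarm_low, alarm_high], so a drifting value raises a warning
    before it reaches a hard error.
    """
    if value < alarm_low or value > alarm_high:
        return LimitState.ALARM
    if value < warn_low or value > warn_high:
        return LimitState.WARNING
    return LimitState.NOMINAL


if __name__ == "__main__":
    # Made-up temperature limits in degrees C, for illustration only.
    limits = dict(warn_low=-96.0, warn_high=-92.0, alarm_low=-100.0, alarm_high=-88.0)
    for reading in (-94.0, -91.0, -85.0):
        print(reading, classify(reading, **limits).value)
```

In this sketch only the ALARM outcome corresponds to the kind of out-of-tolerance condition that would put the camera into the FAULT state; the WARNING outcome corresponds to the warning range described above.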
This article was triggered by OBS-97 - The LATISS camera got timeout from REB IN PROGRESS on 28 February 2023, but is more general than that specific incident.
Note
- The instructions below assume:
The ability to login to the AuxTel CCS computers,
Some familiarity with basic CCS commands/functionality.
We need a separate document to provide this background information since it will need to be referred to from multiple places.
Error diagnosis#
ATCamera goes to the FAULT state.
Procedure Steps#
Identify which CCS subsystem triggered the problem.
- Review the raised alerts and/or log files, and determine whether:
This was a transitory problem that can be documented (via JIRA ticket) and reset,
or something that requires a camera expert to diagnose.
Clear the raised alerts in both the CCS subsystem which triggered the problem and the Master Control Module (MCM) which tracks the overall camera state.
Clear the fault in the ocs-bridge, and switch it back to OFFLINE_AVAILABLE mode.
Note
In either case it is important that an OBS ticket be created so we can track how often specific problems occur, and whether software or hardware changes are needed to prevent future occurrences.
Specific CCS commands for performing these operations are documented below.
Tracking down a CCS problem#
In general there are two approaches to tracking down a CCS problem: using the ccs-shell command line tool, or using the ccs-console graphical interface. Currently we describe only the first approach.
Warning
Pending TODO: Simulate a fault and verify these commands are correct (perhaps on TTS) (plus highlight responses)
Important
The following commands are entered at the ccs> prompt.
Identify which CCS subsystem triggered the problem:
ats-mcm getRaisedAlertSummary
Review the raised alerts and log files
ats-fp getRaisedAlertSummary
Clear the alerts
ats-fp clearAllAlerts
ats-fp getRaisedAlertSummary
ats-mcm clearAllAlerts
ats-mcm getRaisedAlertSummary
Clear the ocs-bridge
ats-ocs-bridge clearFault
ats-ocs-bridge setAvailable
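As a checklist aid, the command sequence above can be generated for whichever subsystem raised the alert. The following is a minimal sketch, not part of CCS: the helper name recovery_commands is hypothetical, it assumes the subsystem names used above (ats-fp, ats-mcm, ats-ocs-bridge), and it only prints the text to type at the ccs> prompt without connecting to or executing anything.

```python
def recovery_commands(faulted_subsystem="ats-fp",
                      mcm="ats-mcm",
                      bridge="ats-ocs-bridge"):
    """Return the recovery commands from this procedure, in order.

    Hypothetical helper for illustration: it builds strings only and
    does not talk to CCS. The defaults are the subsystem names used in
    this document; pass a different faulted_subsystem if the MCM alert
    summary points elsewhere.
    """
    return [
        # 1. Identify which subsystem triggered the problem.
        f"{mcm} getRaisedAlertSummary",
        # 2. Review the alerts raised by that subsystem.
        f"{faulted_subsystem} getRaisedAlertSummary",
        # 3. Clear the alerts in the subsystem, then in the MCM,
        #    re-checking the summary after each clear.
        f"{faulted_subsystem} clearAllAlerts",
        f"{faulted_subsystem} getRaisedAlertSummary",
        f"{mcm} clearAllAlerts",
        f"{mcm} getRaisedAlertSummary",
        # 4. Clear the ocs-bridge fault and make it available again.
        f"{bridge} clearFault",
        f"{bridge} setAvailable",
    ]


if __name__ == "__main__":
    for command in recovery_commands():
        print(f"ccs> {command}")
```

Keeping the order in one list makes it harder to skip the MCM clear or to reset the ocs-bridge before the underlying alerts are gone. The alerts should still be reviewed by a person before anything is cleared.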
Post-Condition#
AT Camera can now be set to the ENABLED state.
Contingency#
If the procedure was not successful, report the issue in #summit_auxtel and/or activate the Out of hours support.