Troubleshooting

Restarting a subsystem

When a subsystem fails, it is possible to restart the underlying software and controllers and reset all devices to a nominal state. This can be accomplished by calling GortDeviceSet.restart, for example:

>>> await g.ags.restart()
03:45:03 [DEBUG]   Deleting deployment lvmagcam.
03:45:06 [INFO]    Starting deployment from YAML file /home/sdss5/config/kube/actors/lvmagcam.yml.

This restart command is generally safe to use, but there may be simpler and faster troubleshooting steps that the user can try before resorting to it.

Restarting lvmtan

The Twice-As-Nice devices (K-mirrors, focusers, fibre selector) may occasionally hang. In this case you can try restarting the telescope subsystem with

await g.telescopes.restart()

Or do a more focused restart of the TAN system with

await g.telescopes.restart_lvmtan()

If this does not work you may need to use the GUI. In lvmweb, go to the Motor Controllers section. There should be eight elements in the interface (three K-mirrors, four focusers, one fibre selector). Each of them has a small circular “LED” that can be red (not working) or green (connected). Make sure all the devices have green circles. If some of them do not, try restarting the deployment

g.kubernetes.restart_deployment('lvmtan')

then reload the controllers page and wait until all the LEDs are green. You’ll also see some green checkmarks. If they are red crosses, the device is in a bad state. Try stopping and aborting the invalid devices and then homing them. When you home them you should see the motor numbers/degrees change and a progress bar. The progress bar and numbers must eventually stop (K-mirrors home at -135 degrees, the fibre selector at 0, and the focusers at 40).

It may require a few stop/abort/home cycles, and even several restarts of the controller, to get things working again.
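The stop/abort/home cycle above can be sketched as a small retry loop. This is only an illustration: the stop/abort/home callables and the position check stand in for whatever interface (the lvmweb GUI or actor commands) you actually use, and recover_device is a hypothetical helper name. The home positions are the nominal values quoted above.

```python
# Nominal home positions quoted above (degrees for K-mirrors,
# device units for the fibre selector and focusers).
HOME_POSITIONS = {"kmirror": -135.0, "fibsel": 0.0, "focuser": 40.0}

def recover_device(stop, abort, home, position, kind,
                   attempts=3, tolerance=0.5):
    """Run a few stop/abort/home cycles until the device reports its
    nominal home position. stop/abort/home are callables and position
    returns the current motor position (hypothetical interface)."""
    target = HOME_POSITIONS[kind]
    for _ in range(attempts):
        stop()
        abort()
        home()
        if abs(position() - target) < tolerance:
            return True
    return False
```

If the device still does not report its home position after a few attempts, restart the lvmtan deployment as described above and try again.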

Restarting a deployment

Most users will just want to restart a subsystem as shown above. For those wanting finer control over what software is running, gort provides a Kubernetes access point.

LVM actors and services run in a Kubernetes cluster as deployments. To restart an actor or service we must restart its deployment. gort provides a simple object to access the cluster API and perform some common tasks. For example, to restart the actor lvmguider:

>>> g = await Gort(verbosity='debug').init()
>>> g.kubernetes.restart_deployment('lvmguider')
21:28:18 [DEBUG]:   Deleting deployment lvmguider.
21:28:23 [INFO]:    Starting deployment from YAML file /home/sdss5/config/kube/actors/lvmguider.yml.

If the deployment was not running you may see a message indicating that the deployment is being recreated from a YAML file. You can see running deployments with

>>> g.kubernetes.list_deployments()
['local-path-provisioner',
 'rabbitmq',
 'lvmnps',
 'gort-websocket',
 'lvmieb',
 'lvmtelemetry',
 'restapi',
 'kubernetes-dashboard-metrics-scraper',
 'metrics-server',
 'kubernetes-dashboard-cert-manager-webhook',
 'kubernetes-dashboard-nginx-controller',
 'kubernetes-dashboard-metrics-server',
 'kubernetes-dashboard-api',
 'kubernetes-dashboard-web',
 'kubernetes-dashboard-cert-manager-cainjector',
 'kubernetes-dashboard-cert-manager',
 'coredns',
 'traefik',
 'lvm-spec-pressure-sp2',
 'lvm-spec-pressure-sp1',
 'lvm-spec-pressure-sp3',
 'lvm-jupyter',
 'lvmecp',
 'lvmscp',
 'lvmguider',
 'lvmagcam',
 'lvmpwi-sci',
 'lvmpwi-spec',
 'lvmpwi-skye',
 'lvmpwi-skyw',
 'lvmtan',
 'cerebro']
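When scripting restarts, it can help to guard against typos by checking the name against the running deployments first. A minimal sketch, assuming you already have the output of g.kubernetes.list_deployments() in hand (safe_restart and its injected restart callable are hypothetical, not part of gort):

```python
def safe_restart(name, deployments, restart):
    """Restart `name` only if it appears in the list of running
    deployments, so that typos fail loudly instead of silently."""
    if name not in deployments:
        raise ValueError(f"Unknown deployment {name!r}.")
    restart(name)
    return name
```

With gort this could be called as safe_restart('lvmguider', g.kubernetes.list_deployments(), g.kubernetes.restart_deployment).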

Warning

This feature requires running gort on a machine that has access to the Kubernetes cluster. While you can run gort locally (not recommended) and access the RabbitMQ exchange by forwarding its port, you won’t be able to do the same to reach the Kubernetes API.

Here is a list of deployments, what they do, and when it may be useful to restart them. Users should not try to restart deployments not listed in this table.

| Deployment            | Function                                                              | When to restart                                                         |
|-----------------------|-----------------------------------------------------------------------|-------------------------------------------------------------------------|
| lvmpwi-sci            | Science telescope PlaneWave mount control                             | Science mount not responding                                            |
| lvmpwi-spec           | Spec telescope PlaneWave mount control                                | Spec mount not responding                                               |
| lvmpwi-skye           | SkyE telescope PlaneWave mount control                                | SkyE mount not responding                                               |
| lvmpwi-skyw           | SkyW telescope PlaneWave mount control                                | SkyW mount not responding                                               |
| lvmtan                | K-mirror, focuser, and fibre mask controller                          | K-mirror, focuser, or mask not responding or stuck                      |
| lvmagcam              | Auto-guider camera control                                            | Cameras not exposing or not present                                     |
| lvmguider             | Guiding and focusing                                                  | Guider not working, guiders won’t change to idle, focus routine failing |
| lvmnps                | Network power switch control                                          | Calibration lamps not working                                           |
| lvmecp                | Enclosure control                                                     | Any enclosure-related issue (dome not moving, lights, etc.)             |
| lvmscp                | Spectrograph control                                                  | Spectrographs not taking exposures                                      |
| lvmieb                | Spectrograph motor controllers and electronics                        | Shutter or Hartmann doors not working                                   |
| cerebro               | Collects information from actors and services and stores it in InfluxDB | Data in InfluxDB/Grafana not updating                                 |
| lvm-jupyter           | Jupyter Lab server                                                    | Jupyter Lab notebooks not working                                       |
| lvmtelemetry          | Monitors optical table temperatures                                   | Focus-temperature relationship failing                                  |
| lvm-spec-pressure-sp1 | Spec 1 pressure reporting                                             | Spec 1 pressure not showing in Grafana                                  |
| lvm-spec-pressure-sp2 | Spec 2 pressure reporting                                             | Spec 2 pressure not showing in Grafana                                  |
| lvm-spec-pressure-sp3 | Spec 3 pressure reporting                                             | Spec 3 pressure not showing in Grafana                                  |
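The table above can be encoded as a small lookup so that a script can suggest which deployment to restart for a given symptom. This is just a convenience sketch derived from the table; the suggest_restart helper is hypothetical and the keyword matching is deliberately crude.

```python
# Deployment keyed by a keyword drawn from the "When to restart" column.
SYMPTOM_TO_DEPLOYMENT = {
    "dome": "lvmecp",
    "lamps": "lvmnps",
    "shutter": "lvmieb",
    "hartmann": "lvmieb",
    "guider": "lvmguider",
    "exposures": "lvmscp",
    "k-mirror": "lvmtan",
}

def suggest_restart(symptom):
    """Return the deployment whose keyword appears in the symptom
    description, or None if nothing in the table matches."""
    text = symptom.lower()
    for keyword, deployment in SYMPTOM_TO_DEPLOYMENT.items():
        if keyword in text:
            return deployment
    return None
```

The suggested name could then be passed to g.kubernetes.restart_deployment(), keeping in mind that deployments not listed in the table should not be restarted.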