Troubleshooting¶
Restarting a subsystem¶
When a subsystem fails it is possible to restart the underlying software and controllers and reset all devices to a nominal state. This can be accomplished by calling GortDeviceSet.restart
, for example
>>> await g.ags.restart()
03:45:03 [DEBUG] Deleting deployment lvmagcam.
03:45:06 [INFO] Starting deployment from YAML file /home/sdss5/config/kube/actors/lvmagcam.yml.
This restart command is generally safe to use, but there may be simpler and faster troubleshooting that the user can try before resorting it.
Restarting lvmtan¶
The Twice-As-Nice devices (K-mirrors, focusers, fibre selector) may hang up at times. In this case you can try restarting the telescope subsystem with
await g.telescopes.restart()
Or do a more focused restart of the TAN system with
await g.telescopes.restart_lvmtan()
If this does not work you may need to use the GUI. In lvmweb
go to the Motor Controllers secion. There should be eight elements in the interface (three K-mirrors, four focusers, on fibre selector). Each one of them has a small circular “LED” that can be red (not working) or green (connected). Make sure all the devices have green circles. If some of them do now, try restarting the deployment
g.kubernetes.restart_deployment('lvmtan')
then reload the controllers page and wait until all the LEDs are green. You’ll also see some green checkmarks. If they are red crosses that means that the device is in a bad state. Try stopping and aborting the invalid devices and then home them. When you home them you should see the motor numbers/degrees change and a progress bar. The progress bar and numbers must at some point stop (K-mirrors will home at -135 degrees, fibre selector at 0, focusers at 40).
It may requires a few stop/abort/home and even various restarts of the controller to get things to work again.
Restarting a deployment¶
Most users will just want to restart a subsystem as shown above. For those wanting a finer control of what software is running, the Kubernetes
access point provices
LVM actors and services run in a Kubernetes cluster as deployments. To restart an actor or service we must restart the deployment. gort
provides a simple object to access the cluster API and perform some usual tasks. For example, to restart the actor lvmguider
, for example
>>> g = await Gort(verbosity='debug').init()
>>> g.kubernetes.restart_deployment('lvmguider')
21:28:18 [DEBUG]: Deleting deployment lvmguider.
21:28:23 [INTO]: Starting deployment from YAML file /home/sdss5/config/kube/actors/lvmguider.yml.
If the deployment was not running you may see a message indicating that the deployment is being recreated from a YAML file. You can see running deployments with
>>> g.kubernetes.list_deployments()
['local-path-provisioner',
'rabbitmq',
'lvmnps',
'gort-websocket',
'lvmieb',
'lvmtelemetry',
'restapi',
'kubernetes-dashboard-metrics-scraper',
'metrics-server',
'kubernetes-dashboard-cert-manager-webhook',
'kubernetes-dashboard-nginx-controller',
'kubernetes-dashboard-metrics-server',
'kubernetes-dashboard-api',
'kubernetes-dashboard-web',
'kubernetes-dashboard-cert-manager-cainjector',
'kubernetes-dashboard-cert-manager',
'coredns',
'traefik',
'lvm-spec-pressure-sp2',
'lvm-spec-pressure-sp1',
'lvm-spec-pressure-sp3',
'lvm-jupyter',
'lvmecp',
'lvmscp',
'lvmguider',
'lvmagcam',
'lvmpwi-sci',
'lvmpwi-spec',
'lvmpwi-skye',
'lvmpwi-skyw',
'lvmtan',
'cerebro']
Warning
This feature requires running gort
in a machine that has access to the Kubernetes cluster. While you can (but is not recommended) to run gort
locally and access the RabbitMQ exchange by forwarding its access port, you won’t be able to do the same to access the Kubernetes API.
Here is a list of deployments, what they do, and when it may be useful to restart them. Users should not try to restart deployments not listed in this table.
Deployment |
Function |
When to restart |
---|---|---|
lvmpwi-sci |
Science telescope PlaneWave mount control |
Science mount not responding |
lvmpwi-spec |
Spec telescope PlaneWave mount control |
Spec mount not responding |
lvmpwi-skye |
SkyE telescope PlaneWave mount control |
SkyE mount not responding |
lvmpwi-skyw |
SkyW telescope PlaneWave mount control |
SkyW mount not responding |
lvmtan |
K-mirror, focuser, and fibre mask controller |
K-mirror, focuser, or mask not responding or stuck |
lvmagcam |
Auto-guider camera control |
Cameras not exposing or not present |
lvmguider |
Guiding and focusing |
Guider not working, guiders won’t change to idle, focus routine failing |
lvmnps |
Network power switches control |
Calibration lamps not working |
lvmecp |
Enclosure control |
Any enclosure related issue (dome not moving, lights, etc.) |
lvmscp |
Spectrograph control |
Spectrographs not taking exposures |
lvmieb |
Spectrograph motor controllers and electronics |
Shutter or hartmann doors not working |
cerebro |
Collects information from actors and services and stores it in InfluxDB |
Data in InfluxDB/Grafana not updating |
lvm-jupyter |
Jupyter Lab server |
Jupyter Lab notebooks not working |
lvmtelemetry |
Monitors optical table temperatures |
Focus-temperature relationship failing |
lvm-spec-pressure-sp1 |
Spec 1 pressure reporting |
Spec 1 pressure not showing in Grafana |
lvm-spec-pressure-sp2 |
Spec 2 pressure reporting |
Spec 2 pressure not showing in Grafana |
lvm-spec-pressure-sp3 |
Spec 3 pressure reporting |
Spec 3 pressure not showing in Grafana |