Post-Installation Setup
The monitoring application requires substantial manual configuration after installation.
Secrets And Tokens
Although the admin token can be supplied from an existing secret, you must run tokenmaker to generate two more tokens: one for the tasks (which run as cronjobs), and one that the telegraf and telegraf-ds scrapers need in order to send their data to the monitoring database.
Set up the Environment
1. Clone the rubin-influx-tools repository and change your current working directory to the clone location.
2. Create a new virtualenv to work in and activate it.
3. Run make init to install rubin-influx-tools into that virtualenv.
4. Export the following environment variables:
   - INFLUXDB_ORG should be set to the InfluxDB organization, typically square.
   - INFLUXDB_TOKEN should be set to the admin token value, which can be found in 1Password under the admin-token entry.
   - INFLUXDB_URL should be set to the URL of the InfluxDB v2 server (e.g. https://monitoring.lsst.cloud).
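Before running tokenmaker, it can help to confirm that the three variables are actually exported in the shell you will run it from. The following is a minimal sketch, not part of rubin-influx-tools; the function name find_problems is hypothetical. It reports unset variables and an INFLUXDB_URL without an http or https scheme, and it deliberately never prints the admin token.

```python
import os
import sys
from urllib.parse import urlparse

# The three variables exported in the environment setup steps.
REQUIRED_VARS = ("INFLUXDB_ORG", "INFLUXDB_TOKEN", "INFLUXDB_URL")


def find_problems(environ):
    """Return human-readable problems with the InfluxDB settings in `environ`."""
    problems = [f"{name} is not set" for name in REQUIRED_VARS if not environ.get(name)]
    url = environ.get("INFLUXDB_URL", "")
    # Expect a full URL with a scheme, e.g. https://monitoring.lsst.cloud
    if url and urlparse(url).scheme not in ("http", "https"):
        problems.append(f"INFLUXDB_URL has no http(s) scheme: {url!r}")
    return problems


if __name__ == "__main__":
    problems = find_problems(os.environ)
    for problem in problems:
        print(problem, file=sys.stderr)
    if not problems:
        # Avoid echoing INFLUXDB_TOKEN: it is the admin token.
        print(f"OK: org {os.environ['INFLUXDB_ORG']} at {os.environ['INFLUXDB_URL']}")
```

Run it with the virtualenv's python from the same shell in which you exported the variables; any line written to stderr names a variable to fix before continuing.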
Next, generate the tokens and store them:
1. Run tokenmaker. You will need both tokens it creates. If you are not using 1Password, save them somewhere safe. If you are using 1Password:
   a. Find the set of secrets for the Phalanx environment you’re working on.
   b. Open the monitoring secret.
   c. Edit the value of the influx-alert-token password, changing it to the value of “Token for task/alert creation” that was displayed when you ran tokenmaker.
   d. Edit the value of the telegraf-token password, changing it to the value of “Token for remote telegraf bucket writing” that was displayed when you ran tokenmaker.
   e. Save the secret.
2. Audit the secrets (see Audit secrets for an environment). This should show only influx-alert-token and telegraf-token with unexpected values in the monitoring app. If anything else is incorrect, fix that first before coming back here.
3. Sync the secrets (see Sync secrets for an environment).
4. Delete the monitoring secret from the monitoring namespace in your Phalanx environment; it will be recreated with the new values. This step is optional, but if you skip it you may have to wait up to 15 minutes for the secret to be updated.
This should suffice to get the “monitoring” application going.
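The optional deletion of the monitoring secret can be followed by a wait until the secret has been recreated. This is a minimal Python sketch under stated assumptions: kubectl is on your PATH, and its current context points at the cluster of the Phalanx environment you are configuring. The helper names secret_command and delete_and_wait are hypothetical.

```python
import subprocess
import time


def secret_command(verb, name, namespace):
    """Build a kubectl argv such as `kubectl delete secret NAME --namespace NS`."""
    return ["kubectl", verb, "secret", name, "--namespace", namespace]


def delete_and_wait(name="monitoring", namespace="monitoring", timeout=900, interval=15):
    """Delete a secret, then poll until it is recreated or `timeout` seconds pass.

    Returns True as soon as `kubectl get secret` succeeds again, False on timeout.
    The 900-second default is only a generous upper bound.
    """
    subprocess.run(secret_command("delete", name, namespace), check=True)
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        # `kubectl get` exits non-zero while the secret does not exist.
        result = subprocess.run(secret_command("get", name, namespace), capture_output=True)
        if result.returncode == 0:
            return True
        time.sleep(interval)
    return False
```

Calling delete_and_wait() with its defaults targets the monitoring secret in the monitoring namespace, as described in the last step above.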
Chronograf
It is very important that you be the first person to visit the Chronograf endpoint and authenticate (authentication goes through the local Gafaelfawr instance), because the first user to log in becomes a super-admin.
Initial Chronograf Configuration
1. Go to the /chronograf path on the general Phalanx instance endpoint.
2. Log in with OIDC. This will create your user and make you a super-admin.
3. You will see a screen with a Get Started button, which you should press.
To set up the connection to InfluxDB v2:
1. Switch the auth method to “InfluxDB v2 Auth” at the bottom left.
2. Set Connection URL to the InfluxDB endpoint (e.g. https://monitoring.lsst.cloud).
3. Set Organization to the InfluxDB org, usually square.
4. Paste the admin token into the Token field.
5. Set the default retention policy. 30d is typical; if you don’t have a strong opinion, use it.
6. Press the “Add Connection” button.
Next, skip dashboard creation (Skip at the bottom center of the screen).
Skip Kapacitor setup as well on the next screen. Press View All Connections.
Now you should be at the Chronograf main UI screen.
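If Chronograf rejects the connection, the same URL, organization, and token can be checked outside the UI. This sketch uses only the Python standard library to call the InfluxDB v2 HTTP API endpoint GET /api/v2/buckets with a "Token" authorization header. The function names are hypothetical, and the check says nothing about Chronograf itself.

```python
import json
import urllib.parse
import urllib.request


def buckets_request(url, org, token):
    """Build a GET /api/v2/buckets request authenticated with an InfluxDB v2 token."""
    query = urllib.parse.urlencode({"org": org})
    return urllib.request.Request(
        f"{url.rstrip('/')}/api/v2/buckets?{query}",
        headers={"Authorization": f"Token {token}"},
    )


def list_bucket_names(url, org, token, timeout=10):
    """Return bucket names visible to the token (first page only; the API paginates).

    Raises urllib.error.HTTPError if the server rejects the request.
    """
    with urllib.request.urlopen(buckets_request(url, org, token), timeout=timeout) as resp:
        payload = json.load(resp)
    return [bucket["name"] for bucket in payload.get("buckets", [])]
```

For example, list_bucket_names(os.environ["INFLUXDB_URL"], os.environ["INFLUXDB_ORG"], os.environ["INFLUXDB_TOKEN"]) reuses the variables exported during environment setup. An HTTPError with status 401 suggests the token is wrong.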
Chronograf User Policy
To decide whether new users should be super-admins by default, click the crown icon (Admin) on the left side of the screen, choose Chronograf, and then open the “All Users” tab.
Because there is no mapping from Gafaelfawr scopes to Chronograf permissions, you almost certainly do not want new users to be super-admins by default. Instead, grant admin powers to new users who should have them when they first log in.
Once there are more admins, any of them can grant these powers to new users as they come on board.
Load dashboards
Finally, load the Chronograf dashboards from lsst-sqre/rubin-influx-tools using the Import Dashboard button in the upper left of the Dashboards screen. That screen is accessible through the Dashboard (bunch-of-rectangles) icon on the left side of the main screen.
For each dashboard, take the default options for “Sources in Dashboard”.
Monitoring Agents
In every environment that feeds data to your new monitoring server, you will need to set the influx-token secret to the telegraf-token value so that the telegraf and telegraf-ds agents there can talk to the new server.
This is why it was convenient to save telegraf-token in the 1Password vault for the monitoring server’s environment: you can simply cut and paste it.
Sync the secrets (see Sync secrets for an environment). If you’re impatient, also delete the telegraf and telegraf-ds secrets in their respective namespaces.
After the secrets are synced, restart the agents; for telegraf that’s the deployment, and for telegraf-ds it’s the daemonset.
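In each agent environment, the restart can be scripted with kubectl rollout restart, which handles both deployments and daemonsets, followed by kubectl rollout status to wait for the new pods. This is a sketch under stated assumptions: kubectl’s current context points at the agent environment’s cluster, and both the namespaces and the workload names are telegraf and telegraf-ds. Those names are hypothetical, so check your cluster first.

```python
import subprocess

# Assumed (hypothetical) names: each agent runs in a namespace named after its
# Phalanx application, and its workload has the same name. Verify with
# `kubectl get deployments,daemonsets --all-namespaces` before relying on this.
AGENTS = (
    ("telegraf", "deployment"),
    ("telegraf-ds", "daemonset"),
)


def restart_commands(name, kind, namespace=None, timeout="5m"):
    """Return kubectl argv lists that restart one agent and wait for its rollout."""
    namespace = namespace or name
    target = f"{kind}/{name}"
    return [
        ["kubectl", "rollout", "restart", target, "--namespace", namespace],
        ["kubectl", "rollout", "status", target, "--namespace", namespace, f"--timeout={timeout}"],
    ]


def restart_agents():
    """Restart both agents in turn; run this only after the secrets have been synced."""
    for name, kind in AGENTS:
        for command in restart_commands(name, kind):
            subprocess.run(command, check=True)
```

Waiting on rollout status before moving to the next agent makes a failed restart (for example, a pod that cannot read its secret) stop the script instead of going unnoticed.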