Post-Installation Setup¶
The monitoring application requires substantial manual configuration after installation.
Secrets And Tokens¶
While the admin token can be supplied from an existing secret, you must run tokenmaker to generate two further tokens: the one that the tasks (which run as cronjobs) will require, and the one that the telegraf and telegraf-ds scrapers will need in order to send their data to the monitoring database.
Set up the Environment¶
1. Clone the rubin-influx-tools repository and change your current working directory to the clone location.
2. Create a new virtualenv to work in. Activate it.
3. Run make init to install rubin-influx-tools into that virtualenv.
4. Export the following environment variables:
   - INFLUXDB_ORG should be set to the influx organization, typically square.
   - INFLUXDB_TOKEN should be set to the admin token value, which can be found in 1Password under the admin-token entry.
   - INFLUXDB_URL should be set to the URL for the InfluxDBv2 server (e.g. https://monitoring.lsst.cloud).
5. Run tokenmaker. You will need the two tokens created. If you are not using 1Password, save them somewhere safe. If you are:
   1. In 1Password, find the set of secrets for the Phalanx environment you’re working on.
   2. Open the monitoring secret.
   3. Edit the influx-alert-token password’s value, and change it to the value of “Token for task/alert creation” that was displayed when you ran tokenmaker.
   4. Edit the telegraf-token password’s value, and change it to the value of “Token for remote telegraf bucket writing” that was displayed when you ran tokenmaker.
   5. Save the secret.
6. Audit the secrets (see “Audit secrets for an environment”). This should show only influx-alert-token and telegraf-token with unexpected values in the monitoring app. If anything else is incorrect, fix that first before coming back here.
7. Sync the secrets (see “Sync secrets for an environment”).
8. Delete the monitoring secret from the monitoring namespace in your Phalanx environment. It will be recreated with the new values. This step is optional, but if you skip it you may have to wait up to 15 minutes for the secret to be updated.
This should suffice to get the “monitoring” application going.
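The command-line parts of the steps above can be sketched roughly as follows. This is a minimal sketch, not part of rubin-influx-tools: the helper names check_influx_env and delete_monitoring_secret are hypothetical, the organization and URL are the example values from the text, and it assumes kubectl is already pointed at the cluster for your Phalanx environment.

```shell
# Example values taken from the text; adjust them for your environment.
# INFLUXDB_TOKEN is intentionally not set here: export the admin token
# by hand so that it never ends up in a file or in shell history.
export INFLUXDB_ORG="square"
export INFLUXDB_URL="https://monitoring.lsst.cloud"

# Hypothetical helper: succeed only if every variable listed in the
# steps above is set and non-empty, and name the ones that are missing.
check_influx_env() {
    missing=0
    for name in INFLUXDB_ORG INFLUXDB_TOKEN INFLUXDB_URL; do
        # Indirect lookup of the variable whose name is in $name.
        eval "value=\${$name:-}"
        if [ -z "$value" ]; then
            echo "missing: $name" >&2
            missing=1
        fi
    done
    return "$missing"
}

# Hypothetical helper: delete the monitoring secret from the monitoring
# namespace so that it is recreated with the newly synced values.
delete_monitoring_secret() {
    kubectl delete secret monitoring --namespace monitoring
}
```

A typical use would be to run check_influx_env before tokenmaker, and to call delete_monitoring_secret only after the secrets have been synced.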
Chronograf¶
Now it is very important that you be the first person to visit the Chronograf endpoint and authenticate (it will use the local Gafaelfawr instance to do so).
Initial Chronograf Configuration¶
1. The Chronograf endpoint will be at the /chronograf path on the general Phalanx instance endpoint.
2. Log in with OIDC. This will create your user and make you a super-admin.
3. You will see a screen with a Get Started button, which you should press.
4. To set up the connection to InfluxDBv2:
   - Switch the auth method to “InfluxDB v2 Auth” at the bottom left.
   - Set Connection URL to the InfluxDB endpoint (e.g. https://monitoring.lsst.cloud).
   - Set Organization to the InfluxDB org, usually square.
   - Paste the admin token into the Token field.
   - Set the default retention policy. 30d is typical, and if you don’t have a strong opinion, use it.
   - Press the “Add Connection” button.
5. Next, skip dashboard creation (Skip at the bottom center of the screen).
6. Skip Kapacitor setup as well on the next screen. Press View All Connections.
Now you should be at the Chronograf main UI screen.
Chronograf User Policy¶
By clicking the crown icon (Admin) on the left side of the screen, and then choosing Chronograf, and then the “All Users” tab, you can decide whether new users should be super-admins by default or not.
Since there is no mapping from Gafaelfawr scope to Chronograf abilities, you almost certainly do not want them to be, and you will have to give new users admin powers (if they should have them) when they first log in.
After there are more admins, of course, someone else can empower new users as they come onboard.
Load dashboards¶
Finally, load the Chronograf dashboards from lsst-sqre/rubin-influx-tools, using the Import Dashboard button in the upper left of the Dashboards screen, accessible through the Dashboard (bunch-of-rectangles) icon on the left side of the main screen.
For each dashboard, take the default options for “Sources in Dashboard”.
Monitoring Agents¶
You will need to update the influx-token secret in any environment that is feeding your new monitoring server, so that the telegraf and telegraf-ds agents are able to talk to it.
This is why it was convenient to save telegraf-token in the 1Password vault for the monitoring server’s environment, because you can trivially cut-and-paste it.
Sync the secrets (see “Sync secrets for an environment”), and delete the telegraf and telegraf-ds secrets in their respective namespaces if you’re impatient.
After the secrets are synced, restart the agents; for telegraf that’s the deployment, and for telegraf-ds it’s the daemonset.
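The secret deletion and agent restart could look roughly like the sketch below. It rests on assumptions not stated in the text: that each agent runs in a namespace named after itself, that the deployment is named telegraf and the daemonset telegraf-ds, and that kubectl is pointed at the cluster being monitored. The helper names are hypothetical. Check the real resource names with kubectl get before running anything.

```shell
# Hypothetical helper: delete the agent secrets so that they are
# recreated immediately from the newly synced values.
# Assumption: each secret lives in a namespace named after its agent.
delete_agent_secrets() {
    kubectl delete secret telegraf --namespace telegraf
    kubectl delete secret telegraf-ds --namespace telegraf-ds
}

# Hypothetical helper: restart both agents so that they pick up the
# new token. Assumption: the deployment and the daemonset are named
# after their applications.
restart_telegraf_agents() {
    kubectl rollout restart deployment/telegraf --namespace telegraf
    kubectl rollout restart daemonset/telegraf-ds --namespace telegraf-ds
}
```

Calling delete_agent_secrets is only needed if you do not want to wait for the sync to propagate; restart_telegraf_agents should run once the secrets hold the new token.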