Set up airflow variable defaults with descriptions automatically #4297

Automatically set up airflow variable defaults with descriptions.
madewithkode committed May 9, 2024
commit 5f1997b1283c9e92e812a30ad77df39fb4846bc6
14 changes: 11 additions & 3 deletions catalog/entrypoint.sh
@@ -97,12 +97,20 @@ fi
 # if the key doesn't already exist in the database i.e not found in
 # $existing_variables
 while IFS=$'\t' read -r column1 column2 column3; do
-  # skip the first meta row
-  if [[ $column3 == "description" ]] || [[ ${existing_variables[*]} =~ $column1 ]]; then
+  # skip the first meta row or a row with empty data
+  if [[ $column3 == "description" ]] || [[ -z $column2 ]]; then
     continue
   fi
 
-  if [ "$column1" != "Key" ]; then
+  # check if current key already exists
+  matched=false
+  for variable in "${existing_variables[@]}"; do
+    if [[ $variable == "$column1" ]]; then
+      matched=true
+    fi
+  done
+
+  if [ "$column1" != "Key" ] && ! $matched; then

Contributor

I am...confused about how this logic works 😅 We're checking if column3 is "description" above and skipping on that case, but then we're checking once again for "Key" here? And it looks like this would always be true even on the first line because the TSV uses the term key and not Key. I think this first predicate can be removed.

     airflow variables set --description "$column3" "$column1" "$column2"
   fi
 done <"variables.tsv"
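The reviewer's suggestion can be sketched as follows. This is a minimal, self-contained illustration, not the code that was merged. The `airflow` function is a stub that only records the key it was asked to set, the sample TSV rows and the contents of `existing_variables` are made up, and the `break` after a match is an addition that was not part of the commit. The `"Key"` predicate is dropped because the header row is already skipped by the `description` check.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical sample data written to a temp dir (not the real variables.tsv).
workdir=$(mktemp -d)
tsv="$workdir/variables.tsv"
printf 'key\tdefault\tdescription\n' >"$tsv"
printf 'SILENCED_SLACK_NOTIFICATIONS\t{}\tSilence Slack notifications\n' >>"$tsv"
printf 'CONFIGURATION_OVERRIDES\t{}\tDAG configuration overrides\n' >>"$tsv"

# Assumption: keys that already exist in the Airflow database.
existing_variables=("SILENCED_SLACK_NOTIFICATIONS")

# Stub standing in for the real CLI; it records the key instead of calling Airflow.
# Arguments arrive as: variables set --description <description> <key> <value>
set_keys=()
airflow() { set_keys+=("$5"); }

while IFS=$'\t' read -r column1 column2 column3; do
  # Skip the header row or a row with no default value.
  if [[ $column3 == "description" ]] || [[ -z $column2 ]]; then
    continue
  fi

  # Exact-match lookup; stop at the first hit.
  matched=false
  for variable in "${existing_variables[@]}"; do
    if [[ $variable == "$column1" ]]; then
      matched=true
      break
    fi
  done

  # The "Key" predicate is gone: the header row never reaches this point.
  if ! $matched; then
    airflow variables set --description "$column3" "$column1" "$column2"
  fi
done <"$tsv"

printf 'would set: %s\n' "${set_keys[@]}"
rm -rf "$workdir"
```

With this sample data only `CONFIGURATION_OVERRIDES` is passed to the stub, since the header row is skipped and `SILENCED_SLACK_NOTIFICATIONS` is treated as already present.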
1 change: 1 addition & 0 deletions catalog/variables.tsv
@@ -2,3 +2,4 @@ key default description
 SILENCED_SLACK_NOTIFICATIONS {} Configuration for a silencing Slack notifications from a DAG. Mapping of DAG ID to a list of dictionaries containing the following keys: "issue" (a link to a GitHub issue which describes why the notification is silenced and tracks resolving the problem), "predicate" (Slack notifications whose text or username contain the predicate will be silenced, matching is case-insensitive), and "task_id_pattern" (a regex pattern that matches the task_id of the task that triggered the notification, optional). Declaration: https://github.com/WordPress/openverse/blob/d500b7764c411f7d228ae12c57dce519c8709610/catalog/dags/common/slack.py#L72-L86 Example: { "finnish_museums_workflow": [ { "issue": "https://github.com/WordPress/openverse/issues/1605", "predicate": "AirflowTaskTimeout", "task_id_pattern": "clean_data" } ] }
 SKIPPED_INGESTION_ERRORS {} Configuration for silencing an ingestion error and preventing a Slack message from being sent. Mapping of DAG ID to a list of dictionaries containing the following keys: "issue" (a link to a GitHub issue which describes why the error is silenced and tracks resolving the problem), and "predicate" (errors whose classname or message contain the predicate will be skipped, matching is case-insensitive). Declaration: https://github.com/WordPress/openverse/blob/6636dcfbb57abca19ef32027975f78548e10411f/catalog/dags/providers/provider_api_scripts/provider_data_ingester.py#L53-L64 Example: { "science_museum_workflow": [ { "issue": "https://github.com/WordPress/openverse/issues/4013", "predicate": "Service unavailable for url" } ] }
 CONFIGURATION_OVERRIDES {} DAG configuration overrides for the provider ingestion workflows. Currently only supports overriding the execution timeout for certain tasks, but allows dynamic overrides at DAG run time. Mapping of DAG ID to a list of dictionaries containing the following keys: "task_id" (a regex pattern that matches the task_id of the task to be modified), and "timeout" (str in "%d:%H:%M:%S" format giving the amount of time the task may take, example: 6d:10h:30m). Declaration: https://github.com/WordPress/openverse/blob/2cffcb9f8da6961e84a00854a3cd472fd0f9dad8/catalog/dags/providers/provider_workflows.py#L42-L58 Example: { "brooklyn_museum_workflow": [ { "task_id_pattern": "pull_image_data", "timeout": "10h" } ] }
+TESTING_CONFIGURATION_OVERRIDES {} DAG configuration overrides for the provider ingestion workflows. Currently only supports overriding the execution timeout for certain tasks, but allows dynamic overrides at DAG run time. Mapping of DAG ID to a list of dictionaries containing the following keys: "task_id" (a regex pattern that matches the task_id of the task to be modified), and "timeout" (str in "%d:%H:%M:%S" format giving the amount of time the task may take, example: 6d:10h:30m). Declaration: https://github.com/WordPress/openverse/blob/2cffcb9f8da6961e84a00854a3cd472fd0f9dad8/catalog/dags/providers/provider_workflows.py#L42-L58 Example: { "brooklyn_museum_workflow": [ { "task_id_pattern": "pull_image_data", "timeout": "10h" } ] }
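The keys `CONFIGURATION_OVERRIDES` and `TESTING_CONFIGURATION_OVERRIDES` in the rows above make a handy test case for the change in `entrypoint.sh` from a `=~` check to an element-by-element comparison. The commit does not state its motivation, so treat this as an assumption: `=~` against the space-joined array matches substrings, so one key can be mistaken for the other. A small sketch, with a hypothetical `existing_variables` array:

```shell
#!/usr/bin/env bash

# Assumption: only the TESTING_ key exists in the database so far.
existing_variables=("TESTING_CONFIGURATION_OVERRIDES")
key="CONFIGURATION_OVERRIDES"

# Old check: regex match against the space-joined array, so any substring hits.
if [[ ${existing_variables[*]} =~ $key ]]; then
  regex_match=true
else
  regex_match=false
fi

# New check: compare each element for equality.
exact_match=false
for variable in "${existing_variables[@]}"; do
  if [[ $variable == "$key" ]]; then
    exact_match=true
    break
  fi
done

echo "regex_match=$regex_match exact_match=$exact_match"
# prints: regex_match=true exact_match=false
```

Under the old check `CONFIGURATION_OVERRIDES` would be treated as already present and skipped, while the exact comparison correctly reports it as missing.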