Chemical Analysis
Chemical networks are complex systems where the interplay between many elements often means that small changes in one aspect of the network can greatly effect the outcome in unexpected ways. Nevertheless, there are cases where a simple chemical explanation can be found for some observed behaviour in the model outputs. This tutorial demonstrates how to use some of the functionality of the UCLCHEM library to analyse model outputs and discover these explanations.
We do recommend caution when following the approach laid out in this tutorial. There are many pitfalls which we will try to point out as we go. Ultimately, a comprehensive view of how important a reaction is to the outcome of your model requires detailed statistical calculations such as the use of SHAP values to assign meaningful scores to how much various reactions in a network contribute to a species' abundance. Therefore, care must be taken by the user to ensure that the conclusions they draw from a simpler approach are sound.
We'll use an example from work that was recently published at the time of writing this tutorial to demonstrate the use of uclchem.analysis.analysis()
and how it can be used to draw conclusions about the most important reactions in a network for a given species/behaviour.
import uclchem
from glob import glob
from joblib import Parallel, delayed
import numpy as np
import pandas as pd
import os
H3O+ and SOβ
In a piece of inference work in which we measured the cosmic ray ionization rate (CRIR) in NGC 253 (Holdship et al. 2022). We found that both H3O+ and SO were sensitive to the ionization rate. Furthermore, since H3O+ was increased in abundance by increasing CRIR and SO was destroyed, their ratio was extremely sensitive to the rate.
In the work, we present the plot below which shows how the equilibrium abundance of each species changes with the CRIR as well as the ratio. We plot this for a range of temperatures to show that this behaviour is not particularly sensitive to the gas temperature.
In a sense, this is all the information we need. A complex array of reactions all compete and contribute to produce the outcome of the model. Whatever they are, we see the abundance of these species are very sensitive to the CR over a wide range of temperature and density. However, we can only trust this conclusion as far as we trust the entire chemical network since we don't know what exactly causes this behaviour.
If we use the analysis function, we may find all of this is driven by a small handful of reactions. The benefit would then be that our trust in our conclusions only depends on how much we trust the rates of those specific reactions. If it is very important, we can evaluate those reactions and readers of our work can make the same decisions as information from experiment changes.
1. Generate Test Casesβ
Ideally, we should run a huge grid of models and find some rigorous way to evaluate how important each reaction is to the outcome of the model. However, if we run a small grid representative of different conditions then any simple chemical explanation will be evident if it exists. If it no simple explanation is appropriate, then the analysis module is not appropriate for our task and we could consider more complex approaches.
Let's run a simple grid with all possible combinations of the following:
- A low CRIR (zeta=1) and high CRIR (zeta=1e4)
- A typical cloud density (n=1e4) and high density (n=1e6)
- The lower temperature bound of NGC 253 CMZ (75 K)* and a high temperature (250 K)
- The lower boundary is a bit lower, but the computational time of 50K models is a lot longer than 75K so we stick with a bit higher values for speed
and that will give us enough to work with for our analysis.
temperatures = [75, 250]
densities = [1e4, 1e6]
zetas = [1, 1e4]
parameterSpace = np.asarray(np.meshgrid(temperatures, densities, zetas)).reshape(3, -1)
model_table = pd.DataFrame(parameterSpace.T, columns=["temperature", "density", "zeta"])
model_table["outputFile"] = model_table.apply(
lambda row: f"../output/{row.temperature}_{row.density}_{row.zeta}.csv", axis=1
)
print(f"{model_table.shape[0]} models to run")
if not os.path.exists("../output"):
os.makedirs("../output")
def run_model(row):
# basic set of parameters we'll use for this grid.
ParameterDictionary = {
"baseAv": 10, # UV shielded gas in our model
"freefall": False,
"finalTime": 1e6,
"initialtemp": row["temperature"],
"initialdens": row["density"],
"zeta": row["zeta"],
"outputFile": row["outputFile"],
}
result = uclchem.model.cloud(param_dict=ParameterDictionary)
return result[0] # just the integer error code
results = Parallel(n_jobs=4, verbose=100)(
delayed(run_model)(row) for idx, row in model_table.iterrows()
)
8 models to run [Parallel(n_jobs=4)]: Using backend LokyBackend with 4 concurrent workers.
[Parallel(n_jobs=4)]: Done 1 tasks | elapsed: 2.2s
[Parallel(n_jobs=4)]: Done 2 out of 8 | elapsed: 2.9s remaining: 8.6s
[Parallel(n_jobs=4)]: Done 3 out of 8 | elapsed: 8.5s remaining: 14.2s
[Parallel(n_jobs=4)]: Done 4 out of 8 | elapsed: 11.5s remaining: 11.5s
[Parallel(n_jobs=4)]: Done 5 out of 8 | elapsed: 12.7s remaining: 7.6s
[Parallel(n_jobs=4)]: Done 6 out of 8 | elapsed: 13.1s remaining: 4.4s
[Parallel(n_jobs=4)]: Done 8 out of 8 | elapsed: 17.8s finished
2. Run the Analysisβ
The analysis
function is very simple. For every time step in your model output, it will use UCLCHEM to calculate the rate of every reaction which includes a chosen species. It then uses the abundances at that time step to calculate the total contribution of that reaction to the rate of change of the species' abundance.
For example, if we care about H3O+ and find that it is created by