Intermediate level tutorial - how to calibrate a mizer model

In this tutorial you will:

  • compare model output to the averaged ICES catch data and calibrate R_max per species

  • tune erepro so that each species has a realistic \(F_{msy}\)

  • adjust kappa and gamma to calibrate the growth curves

We are going to follow the same path as HTM1 and keep using the North Sea data set as an example.

We finished HTM1 with a manually parametrised model in which species coexist; now we want the model to reflect reality. To do so, we are going to compare the model output to data, matching catches, biomasses and/or growth rates depending on what is available.

In the plot below we use the function plotPredObsYield() to compare the catches predicted by the model with the averaged catches we extracted from the ICES database in the previous tutorial. This is the main comparison we will use to calibrate the model towards more realistic behaviour.
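For reference, the call looks like this (a sketch assuming plotPredObsYield() from the HTM1 helper scripts takes the simulation object and a vector of observed catches; the argument names are assumptions):

# compare modelled yields with the observed 2014-2019 average catches
plotPredObsYield(sim_guessed, dat = catchAvg$Catch_1419_tonnes)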


At the moment, the observed yields are higher than the predicted yields for all species. We want them to be equal, i.e. to lie on the diagonal.

Step 1. Calibrate the maximum recruitment

In this section you will:

  • use the optimParallel package to calibrate R_max for each species

R_max affects the relative biomass of each species (and, combined with the fishing parameters, the catches). We are going to vary R_max with the aim of minimising the error between observed and estimated catches or biomasses. erepro and kappa also affect the relative biomass of each species and could be used too (as in Jacobsen et al., 2016 and Spence et al., 2016), but for now we are going to keep them at their default values and focus on R_max only.
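For orientation, here is where these parameters live in the MizerParams object:

# inspect the current values of the parameters mentioned above
params_guessed@species_params[, c("species", "R_max", "erepro")]
params_guessed@resource_params$kappa # carrying capacity of the background spectrum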

First, let's set up a function that runs the model and outputs the difference between predicted catches (via getYield()) and observed catches (catchAvg). getError() returns the sum of squared errors between the two.
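The full helper is defined in the tutorial's setup scripts; a minimal sketch of the idea might look like this (the version used in the tutorial also projects the model towards steady state first, hence the convergence message printed below):

# sketch of getError(): project the model with candidate R_max values
# (log10 scale) and return the sum of squared errors between modelled
# and observed yields
getError <- function(vary, params, dat, effort = 1, timetorun = 10) {
  params@species_params$R_max <- 10^vary # back-transform from log10
  params <- setParams(params)
  sim <- project(params, effort = effort, t_max = timetorun,
                 progress_bar = FALSE)
  yield <- getYield(sim) # matrix of yields: time x species
  pred <- yield[dim(yield)[1], ] # yield in the final year
  sum((log(pred) - log(dat))^2, na.rm = TRUE) # error on the log scale
}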

# we need 12 Rmaxs, log10 scale
vary <- log10(params_guessed@species_params$R_max)
## test it
getError(vary = vary, params = params_guessed, dat = catchAvg$Catch_1419_tonnes)
## Convergence was achieved in 103.5 years.
## [1] 77.34717

Now, carry out the optimisation. There are several optimisation methods to choose from; here we use the R package optimParallel, which runs optim's L-BFGS-B method with parallel function evaluations. The procedure often needs to be repeated several times, but the parallel evaluations make it much faster than serial alternatives such as optimx.

# create a set of params for the optimisation process
params_optim <- params_guessed
vary <- log10(params_optim@species_params$R_max) # variable to optimise
params_optim <- setParams(params_optim)
# set up workers
noCores <- detectCores() - 1 # keep a spare core
cl <- makeCluster(noCores, setup_timeout = 0.5)
setDefaultCluster(cl = cl)
clusterExport(cl, as.list(ls()))
clusterEvalQ(cl, {
  library(mizerExperimental)
  library(optimParallel)
})
optim_result <- optimParallel::optimParallel(
  par = vary, fn = getError,
  params = params_optim, dat = catchAvg$Catch_1419_tonnes,
  method = "L-BFGS-B", lower = rep(3, 12), upper = rep(15, 12),
  parallel = list(loginfo = TRUE, forward = TRUE))
stopCluster(cl)
# saveRDS(optim_result, "optimParallel_Rmax1.RDS")
# if the previous block wasn't run, load the saved result:
# optim_result <- readRDS("optimParallel_Rmax1.RDS")

# here we use a saved result:
optim_result <- HTM2_optimParallel_Rmax1

# set up the param object again
params_optim <- params_guessed

# replace R_max with the optimised values:
params_optim@species_params$R_max <- 10^optim_result$par
params_optim <- setParams(params_optim)
sim_optim <- project(params_optim, effort = 1, t_max = 100, dt = 0.1,
                     initial_n = sim_guessed@n[100, , ],
                     initial_n_pp = sim_guessed@n_pp[100, ],
                     progress_bar = FALSE)
# saveRDS(sim_optim, "sim_optim1.RDS")
plotCalibration(sim_optim, catch_dat = catchAvg$Catch_1419_tonnes)

Calibrating for the best match between observed and predicted yields makes one species (Sprat) show signs of collapse. We need to look at other parameters to get the community to coexist again.

Step 2. Calibrate the recruitment with erepro

In this section you will:

  • Look at the effect of erepro on reproductive output

  • Check what impact erepro has on the \(F_{msy}\)

erepro represents all the losses that occur between the energy allocated to reproduction and recruitment (except those due to density dependence). These include the conversion efficiency from reproductive energy to eggs, the cost of spawning migration and of spawning itself (e.g. forgone feeding), and egg mortality. Biologically, lowering erepro means a higher egg mortality rate or more energy wasted on gonads. For example, erepro is currently set to 0.01, meaning that for every gram of mass allocated to reproduction, only 0.01 g ends up as recruiting spawn; the rest is lost. This coefficient acts before any density dependence. Let's use a shiny app to play around and see how varying erepro influences the ecosystem.
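You can also check how much density dependence is acting on top of erepro by comparing mizer's density-independent and realised recruitment rates:

# ratio of realised recruitment (getRDD, after the Beverton-Holt R_max cap)
# to density-independent egg production (getRDI); values well below 1 mean
# recruitment is limited mostly by R_max rather than by erepro
rdi <- getRDI(HTM2_sim_optim1@params)
rdd <- getRDD(HTM2_sim_optim1@params)
round(rdd / rdi, 3)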

shiny_erepro(input = HTM2_sim_optim1@params, dat = catchAvg$Catch_1419_tonnes)

A good indicator of a realistic erepro value is the shape of the curve of yield versus fishing mortality rate. Such a graph is used to determine the maximum sustainable yield when harvesting a species and is expected to be bell-shaped: too high a fishing mortality depletes the stock and reduces the yield, whereas too low a fishing mortality simply means a low yield. The \(F_{msy}\) sits in the middle, at the top of the bell. If a species is so efficient at reproduction that it can replenish its biomass even under high fishing mortality, it will tolerate an unrealistically high \(F_{msy}\), and its erepro is too high. Conversely, if even a low fishing mortality depletes the stock right away, the species' erepro is too low.

Let’s take a look at the current \(F_{msy}\) per species.

Note that plotFmsy() takes a long time to compute, especially at high effort resolution (the effortRes argument determines how many simulations are run per species).
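For reference, the call looks like this (plotFmsy() is a helper from these tutorials; a low effortRes keeps the run time manageable):

# yield versus fishing mortality for each species; the bell-shaped
# curves peak at Fmsy
plotFmsy(params_optim, effortRes = 10)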

Decreasing erepro moves the \(F_{msy}\) towards lower effort. Until now we have used the same erepro for all species, but we can input species-specific values to calibrate the recruitment. To do so, let's use another shiny app. This one lets you change erepro one species at a time and see the effect on that species' \(F_{msy}\).

shiny_fmsy(params = HTM2_sim_optim1@params, dat = HTM2_Fmsy2) 

Note: for speed, the shiny app only calculates the \(F_{msy}\) of one species at a time, so the final results may not be exactly the same as when calculating the \(F_{msy}\) for every species together (as shown below).

We have tweaked erepro per species using the shiny app; let's add these new values to the MizerParams object and run a simulation again. The values used in the block below are specific to the North Sea and will differ for any other ecosystem.

sim_optim <- HTM2_sim_optim1
params_optim <- sim_optim@params # in case the previous blocks were not run
params_optim@species_params$erepro <- 10^c(-1.7, -3.2, -4, -4, -3, -3.5,
                                           -3.5, -4.1, -2.7, -3, -2.5, -1.7)
params_optim2 <- setParams(params_optim)
sim_optim2 <- project(params_optim2, effort = 1, t_max = 100, progress_bar = FALSE)
# saveRDS(sim_optim2, file = "sim_optim2.RDS")
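Re-running the yield curves with these new values (using the same plotFmsy() helper as above) produces the plot described below:

# recompute yield versus effort with the tuned erepro values
plotFmsy(sim_optim2@params, effortRes = 10)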

Now all species' yields go down at high fishing mortality rates. This means the species are no longer unrealistically efficient at reproduction and do decrease in biomass under high fishing mortality.

Before spending more time hand-tweaking these species, we are going to run the R_max calibration again to see if we improved the yield match (as in step 1).

Iterating the R_max calibration several times can help find better R_max values, as the first run of optimParallel() may not find the best combination. Below, optimParallel() is run five times in a row, with progressively smaller error between the empirical and modelled catches. After each run, the result is used as the new input for the next iteration.

# loop the calibration to get more accurate results
sim_loop <- HTM2_sim_optim2
params_loop <- sim_loop@params
for (i in 1:5) {
  # save the last time step in the param object
  params_loop@initial_n <- sim_loop@n[dim(sim_loop@n)[1], , ]
  params_loop@initial_n_pp <- sim_loop@n_pp[dim(sim_loop@n_pp)[1], ]
  params_calibration <- params_loop
  vary <- log10(params_calibration@species_params$R_max)
  params_calibration <- setParams(params_calibration)
  noCores <- detectCores() - 1 # keep a spare core
  cl <- makeCluster(noCores, setup_timeout = 0.5)
  setDefaultCluster(cl = cl)
  clusterExport(cl, as.list(ls()))
  clusterEvalQ(cl, {
    library(mizerExperimental)
    library(optimParallel)
  })
  optim_loop <- optimParallel::optimParallel(
    par = vary, fn = getError,
    params = params_calibration, dat = catchAvg$Catch_1419_tonnes,
    method = "L-BFGS-B", lower = rep(3, 12), upper = rep(15, 12),
    parallel = list(loginfo = TRUE, forward = TRUE))
  stopCluster(cl)

  # replace R_max with the optimised values:
  params_loop@species_params$R_max <- 10^optim_loop$par
  params_loop <- setParams(params_loop)
  sim_loop <- project(params_loop, effort = 1, t_max = 100, dt = 0.1,
                      initial_n = params_loop@initial_n,
                      initial_n_pp = params_loop@initial_n_pp,
                      progress_bar = FALSE)
}
# saveRDS(optim_loop, "optimParallel_Rmax2.RDS")
# run the sim a bit longer before saving it
sim_loop <- project(params_loop, effort = 1, t_max = 300, progress_bar = FALSE)
params_loop@initial_n <- sim_loop@n[dim(sim_loop@n)[1], , ]
params_loop@initial_n_pp <- sim_loop@n_pp[dim(sim_loop@n_pp)[1], ]
sim_loop <- project(params_loop, effort = 1, progress_bar = FALSE)
# saveRDS(sim_loop, "sim_optim3.RDS")

The species are now spread around the y = x line, and the axis scale is three times smaller than in the previous plot. Only Sandeel remains an outlier.

Step 3. Calibrate the growth

In this section you will:

  • Look at the growth curves of each species

  • Tweak the kappa parameter to adjust the growth curves of all species at once

  • Tweak the gamma parameter to adjust the growth curve species by species

Most species have growth curves similar to the expected von Bertalanffy growth curves, apart from Sprat and Sandeel, which grow much more slowly than expected. kappa, the carrying capacity of the background spectrum, affects food availability and therefore growth rates (more food means faster growth). However, changing kappa affects all growth curves at once. If only a few species are off, we need to change the gamma parameter (search volume) species by species.
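The growth curves shown above can be reproduced with mizer's plotGrowthCurves(), which overlays the von Bertalanffy curves implied by the species parameters:

# modelled growth curves against the von Bertalanffy expectation,
# one panel per species
plotGrowthCurves(sim_loop, species_panel = TRUE)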

The feeding level plot above shows the ratio between consumption and maximum consumption; it therefore cannot exceed 1. The lower straight lines show the "critical feeding level", the feeding level needed just to cover metabolic costs and keep the fish from starving. Here, the species reaching the largest size classes have a feeding level close to one, meaning that they feed to satiation. Let's look at their diets to check what makes them so full.
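Both diagnostics are available as standard plots (the include_critical argument is an assumption about your mizer version; plotDiet() comes with mizerExperimental, which is already loaded):

# feeding level with the critical feeding level overlaid
plotFeedingLevel(sim_loop, include_critical = TRUE)
# diet composition by predator size, e.g. for Cod
plotDiet(sim_loop@params, species = "Cod")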

The diets look good: the fish feed on each other and do not rely too much on the background spectrum. Note, however, that many species rely heavily on Sprat for food, which is probably why Sprat struggles to survive in the model. We can reduce the carrying capacity of the background spectrum even more, which could lower the feeding level of the species. Let's do this using the shiny app.

shiny_kappa(param = HTM2_sim_optim3@params, dat = catchAvg$Catch_1419_tonnes)

Slightly decreasing kappa drives Sprat to extinction, probably because Sprat becomes the next food of choice when we shrink the background spectrum (as we saw in the diet plots). So how can we reduce the feeding level of the largest predators while keeping Sprat alive?

Reducing the search volume (gamma) of individual species stops them from growing too fast. Now let's help the ones growing too slowly. The most important part of the growth curve is the beginning, before species reach maturation size. Slowing down growth delays maturation and can lead to fluctuations in biomass: the feeding level gets close to critical at small sizes, creating these fluctuations (species compete too much for food). Note that changing the gamma of one species will affect others, especially Sprat and its predators, as Sprat makes up a large share of some diets.

shiny_gamma(param = HTM2_sim_optim3@params, dat = catchAvg$Catch_1419_tonnes)

Let's input the new gamma values into a new set of parameters.

newGamma <- c(1.100000e-10, 6.000000e-11, 9.910525e-11, 7.707098e-11,
              2.556902e-11, 4.717030e-11, 5.272039e-11, 1.009092e-10,
              4.565019e-11, 7.837794e-11, 4.000000e-10, 2.365167e-10)

params_growth <- sim_loop@params
params_growth@species_params$gamma <- newGamma
params_growth <- setParams(params_growth)
sim_growth <- project(params_growth, effort = 1, t_max = 100, progress_bar = FALSE)

plotCalibration(sim_growth, catch_dat = catchAvg$Catch_1419_tonnes)

There are small oscillations, but the species stabilise again.

The growth curves haven't changed much, but the feeding level has decreased at small sizes (even though we theoretically increased the feeding of the small species, they may now out-compete the other species and steal their food).

Note that changing the search volume of some species did not noticeably change the diets. This makes sense: gamma scales a predator's encounter rate with all prey equally, so it changes how much is eaten, not the relative diet composition.
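One way to check this directly is to compare the diet matrices before and after the gamma change:

# compare diet proportions before and after the gamma change
diet_before <- getDiet(sim_loop@params, proportion = TRUE)
diet_after <- getDiet(params_growth, proportion = TRUE)
max(abs(diet_after - diet_before), na.rm = TRUE) # small value = little change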

Question: What other parameters could be investigated here? Why are Sprat catches so much higher?

Summary

The approach above has some limitations, and there are many other possible approaches to model calibration and post-hoc exploration. It is also possible to use more data in both the calibration and evaluation steps. A couple of other approaches we have put together that you may want to explore on your own are:

  • tuneParams() in the ‘mizerExperimental’ package is a more comprehensive shiny app to explore, tune and help you understand mizer models, including the effects of different parameters.

  • HTM3 is a more advanced automated calibration tutorial that uses time-series catch data instead of time-averaged data. This has the advantage of using many more data points and enables simultaneous estimation of Rmax, erepro and/or other parameters of interest, without relying on assumptions about the shape of the single-species yield curves.

These are works in progress, but feel free to check them out and let us know what you think.

Finally we would love to have your feedback as we develop these course materials further. Please fill out our survey here:

https://forms.office.com/Pages/ResponsePage.aspx?id=VV3rFZEZvEaNp6slI03uCIO3rXPo2ElDlBTBXAspLLxUNDVCNDI4RUpEMFpKTzFCU01JV1BFVllSSC4u