Using ggplot2 and plotly with mizerSource:
In this tutorial we will use the ggplot2 package and the
plotly package for R to visualise the results from mizer simulations.
Mizer provides several functions for calculating summaries of the
mizer simulation results, see
?summary_functions. Many of
these functions have corresponding plotting functions, see
?plotting_functions. However it is easy to produce
customised plots directly using
plot_ly(). This gives more flexibility than the built-in
plotting functions. Also, you will occasionally want to look at
different quantities for which perhaps there is not built-in plotting
function. In those cases the examples you see below will provide a
Both ggplot2 and plotly works with data frames, and the convenient way to manipulate data frames is the dplyr package.
We create a simple simulation that we will use for our examples below.
params <- newMultispeciesParams(NS_species_params) sim <- project(params, t_max = 10, t_save = 0.5, effort = 0)
From arrays to data frames
Mizer likes to work with arrays indexed by species and time or size.
For example the built-in summary functions return such arrays. These
arrays need to be converted to data frames before they can be
conveniently plotted with either
plot_ly. This conversion is achieved by the function
For example, the function
getBiomass() returns a
two-dimensional array (matrix) with one dimension corresponding to the
time and the second dimension to the species.
biomass <- getBiomass(sim) str(biomass)
## num [1:21, 1:12] 8.77e+07 2.52e+09 3.21e+09 7.50e+08 2.09e+08 ... ## - attr(*, "dimnames")=List of 2 ## ..$ time: chr [1:21] "0" "0.5" "1" "1.5" ... ## ..$ sp : chr [1:12] "Sprat" "Sandeel" "N.pout" "Herring" ...
This array can be converted with the
melt() function to
a data frame that contains one row for each entry in the array.
## 'data.frame': 252 obs. of 3 variables: ## $ time : num 0 0.5 1 1.5 2 2.5 3 3.5 4 4.5 ... ## $ sp : Factor w/ 12 levels "Sprat","Sandeel",..: 1 1 1 1 1 1 1 1 1 1 ... ## $ value: num 8.77e+07 2.52e+09 3.21e+09 7.50e+08 2.09e+08 ...
ggplot2 or plotly
In this form the information can be handed to
and converted to a line plot:
We have specified that the time is plotted along the x axis, the value along the y axis and that the different species are represented by the colours of the lines. Note the American spelling “color” required by plotly.
Alternatively we can do the same thing with
Notice the different syntax for the ggplot2 and for plotly packages. The underlying ideas are similar: they are both implementations of the grammar of graphics. I recommend learning ggplot2 first, and switching to plotly only when it has clear advantages (in particular for animations, see below).
The main advantage of plotly, namely the interactivity of the
resulting figures, can be obtained also with the ggplot syntax by
running the result of
ggplot() through the function
Below we will always for each plot first give the ggplot code and then the plotly code.
We may want to add labels to the figure and to each of the axes. In
ggplot this is done with
pg + labs(title = "Biomass plot", x = "Time [years]", y = "Biomass [g]")
In plotly we use the
Filtering out data
We can use the filter function to filter out some of the data. For example we could select only the data for specific species:
Now if we plot this reduced data frame we get
In the above we first converted the array to a data frame with
melt() and then selected the data of interest with
filter(). We could alternatively have first selected only
the desired entries in the array and then created the data frame with
melt() from the resulting smaller array:
nfr <- melt(getBiomass(sim)[, c("Gurnard", "Herring")]) ggplot(nfr) + geom_line(aes(x = time, y = value, color = sp))
The result looks almost identical, except that the colours associated to the species have changed.
Specifying line colours
If we want to make sure the same species always has the same colour, we can use the colours specified by the MizerParams object
## Sprat Sandeel N.pout Herring Dab Whiting Sole ## "#815f00" "#6237e2" "#8da600" "#de53ff" "#0e4300" "#430079" "#6caa72" ## Gurnard Plaice Haddock Cod Saithe Resource Total ## "#ee0053" "#007957" "#b42979" "#142300" "#a08dfb" "green" "black" ## Background Fishing External ## "grey" "red" "grey"
To use these colours in ggplot we add
ggplot(biomass_frame) + geom_line(aes(x = time, y = value, color = sp)) + scale_colour_manual(values = getColours(params))
In plotly we add
colors = getColours(params)r to the
plot_ly(biomass_frame) %>% add_lines(x = ~time, y = ~value, color = ~sp, colors = getColours(params))
Again note the American spelling “colors” required by plotly.
Of course mizer has the
plotSpectra() function for
plotting size spectra. Again it is instructional to create such plots by
We can access the abundance spectra of the species via
N(sim). This is a three-dimensional array (time x species x
size). Let us first look at the abundance at the final time.
drop = FALSE means that the result will again be a 3
## num [1, 1:12, 1:100] 1.92e+06 4.81e+12 1.15e+14 3.86e+12 1.12e+11 ... ## - attr(*, "dimnames")=List of 3 ## ..$ time: chr "10" ## ..$ sp : chr [1:12] "Sprat" "Sandeel" "N.pout" "Herring" ... ## ..$ w : chr [1:100] "0.001" "0.00119" "0.00142" "0.0017" ...
We use the
melt() function to convert this array into a
nf <- melt(final_n)
This has created a data frame with 4 variables and one observation
for each of the 1200 entries in the
1 x 12 x
## 'data.frame': 1200 obs. of 4 variables: ## $ time : int 10 10 10 10 10 10 10 10 10 10 ... ## $ sp : Factor w/ 12 levels "Sprat","Sandeel",..: 1 2 3 4 5 6 7 8 9 10 ... ## $ w : num 0.001 0.001 0.001 0.001 0.001 0.001 0.001 0.001 0.001 0.001 ... ## $ value: num 1.92e+06 4.81e+12 1.15e+14 3.86e+12 1.12e+11 ...
The first three variables take their values from the dimension names
of the array. Of course the time variable is the same for all
observations, because we selected these before creating the data frame.
The fourth variable called
value is the value of the entry
of the array, so the abundance density in our case.
There are a lot of entries with value 0, which we are not really interested in, so it makes sense to remove them:
nf <- filter(nf, value > 0)
This leaves a data frame with only 945 observations.
We can send this data frame to ggplot and add a line for the spectrum for each species, with a different colour for each, and specify that we want both the x axis and the y axis to be on a logarithmic scale.
pg <- ggplot(nf) + geom_line(aes(x = w, y = value, color = sp)) + scale_x_log10() + scale_y_log10() pg
The corresponding syntax for plotly is
p <- plot_ly(nf) %>% add_lines(x = ~w, y = ~value, color = ~sp) %>% layout(xaxis = list(type = "log", exponentformat = "power"), yaxis = list(type = "log", exponentformat = "power")) p
In the above we used the pipe operator
feeds the return value of one function into the first argument of the
Including resource spectrum
We can include additional lines in the plot by merging several data frames. For example, we can include another line for the resource spectrum. We first convert also the resource abundance at the final time into a data frame and filter out the zero values
This data frame only contains three variables, because it does not
sp column specifying the species. We add this
column with the value “Resource”
nf_pp$sp <- "Resource"
Now this new data frame has the same columns as the data frame
nf and the two can be bound together
nf <- rbind(nf, nf_pp)
Using this extended data frame gives the following plot:
p <- ggplot(nf) + geom_line(aes(x = w, y = value, color = sp)) + scale_colour_manual(values = getColours(params)) + scale_x_log10() + scale_y_log10() p
Of course we could use the same data frame also with plotly.
Limiting the axes
We might want to zoom in on the part that includes the fish. There
are three ways to achieve this. The first is to use
filter() to filter out all the rows in the data frame that
have small w and then plot the resulting data frame as usual:
nf %>% filter(w > 10^-4) %>% ggplot() + geom_line(aes(x = w, y = value, color = sp)) + scale_colour_manual(values = getColours(params)) + scale_x_log10() + scale_y_log10()
The second method is to specify limits for the axes. In ggplot this
is done by adding
limits to the axis scales:
ggplot(nf) + geom_line(aes(x = w, y = value, color = sp)) + scale_colour_manual(values = getColours(params)) + scale_x_log10(limits = c(10^-4, NA)) + scale_y_log10(limits = c(NA, 10^20))
NA means that the existing limits are kept.
In plotly we specify the
range as follows:
plot_ly(nf) %>% add_lines(x = ~w, y = ~value, color = ~sp, colours = getColours(params)) %>% layout(xaxis = list(type = "log", exponentformat = "power", range = c(-4, 4)), yaxis = list(type = "log", exponentformat = "power", range = c(-14, 20)))
Note how in plotly the range is specified by giving the logarithm to base 10 of the limits.
Instead of picking out a specific time we can ask plotly to make an
animation showing the changing spectra over time. So we melt the entire
N(sim) array and use the time variable to specify the
frames with the
frame = ~time argument to
melt(N(sim)) %>% filter(value > 0) %>% plot_ly() %>% add_lines(x = ~w, y = ~value, color = ~sp, colors = getColours(params), frame = ~time, line = list(simplify = FALSE)) %>% layout(xaxis = list(type = "log", exponentformat = "power"), yaxis = list(type = "log", exponentformat = "power"))
Note how this produces a smooth animation in spite of the fact that
we saved the abundances only once a year. That interpolation is
facilitated by the
line = list(simplify = FALSE)
The ggplot package does not provide a similarly convenient way of creating animations. There is the gganimate package, but it is not nearly as convenient.
We may also want to make plots contrasting the results of two different simulations, for example with different fishing policies. To illustrate this we create two simulations with different fishing effort:
sim1 <- project(params, t_max = 10, t_save = 0.2, effort = 2) sim2 <- project(params, t_max = 10, t_save = 0.2, effort = 4)
Let us look at a plot of the fishing yield against time. This is
calculated by the
getYield() function, which returns an
array (time x species) that we can convert to a data frame
Let’s look at the plot of the yield from the first simulation:
To make the plot less cluttered, we keep only the 4 most important species
yield1 <- filter(yield1, sp %in% c("Saithe", "Cod", "Haddock", "N.pout")) yield2 <- filter(yield2, sp %in% c("Saithe", "Cod", "Haddock", "N.pout"))
and plot again
For simulation 2 the plot looks like this:
Comparison will be easier if we combine these two plots. For that we add an extra variable to the data frames that allow us to distinguish them and then we merge them together.
In ggplot we can now use
facet_grid() to put the plot
for each value of
ggplot(yield) + geom_line(aes(x = time, y = value, colour = sp)) + facet_grid(cols = vars(effort))
Or we can use the
linetype aesthetic to represent the
effort values by different line types:
plotly is less good at faceting, but it can put arbitrary plots
side-by-side (or arrange them into a grid) with
subplot(p1, p2, shareX = TRUE, shareY = TRUE)
Note how the
shareY = TRUE argument to
suplot() makes sure the two plots use the same scale on the
y axis, and similarly for
shareX = TRUE.
Also in plotly we can now tie the
effort variable to the
Arguably, ggplot2 does a nicer job in this case.