Autres extensions graphiques

GGally
ggpubr
ggdendro
circlize
Diagrammes de Sankey
DiagrammeR
epicontacts
highcharter

Pour trouver l’inspiration et des exemples de code, rien ne vaut l’excellent site https://www.r-graph-gallery.com/.

GGally

L’extension GGally, déjà abordée dans d’autres chapitres, fournit plusieurs fonctions graphiques d’exploration des résultats d’un modèle ou des relations entre variables.

reg <- lm(Sepal.Length ~ Sepal.Width + Petal.Length + Petal.Width, data = iris)
library(GGally)
ggcoef_model(reg)

data(tips, package = "reshape")
ggpairs(tips)

Plus d’information : https://ggobi.github.io/ggally/

ggpubr

L’extension ggpubr fournit plusieurs fonctions pour produire clés en main différents graphiques bivariés avec une mise en forme allégée.

library(ggpubr)
data("ToothGrowth")
df <- ToothGrowth
ggboxplot(df,
  x = "dose", y = "len",
  color = "dose", palette = c("#00AFBB", "#E7B800", "#FC4E07"),
  add = "jitter", shape = "dose"
)

data("mtcars")
dfm <- mtcars
# Convert the cyl variable to a factor
dfm$cyl <- as.factor(dfm$cyl)
# Add the name colums
dfm$name <- rownames(dfm)
# Calculate the z-score of the mpg data
dfm$mpg_z <- (dfm$mpg - mean(dfm$mpg)) / sd(dfm$mpg)
dfm$mpg_grp <- factor(ifelse(dfm$mpg_z < 0, "low", "high"),
  levels = c("low", "high")
)

ggbarplot(dfm,
  x = "name", y = "mpg_z",
  fill = "mpg_grp", # change fill color by mpg_level
  color = "white", # Set bar border colors to white
  palette = "jco", # jco journal color palett. see ?ggpar
  sort.val = "asc", # Sort the value in ascending order
  sort.by.groups = FALSE, # Don't sort inside each group
  x.text.angle = 90, # Rotate vertically x axis texts
  ylab = "MPG z-score",
  xlab = FALSE,
  legend.title = "MPG Group"
)

ggdotchart(dfm,
  x = "name", y = "mpg_z",
  color = "cyl", # Color by groups
  palette = c("#00AFBB", "#E7B800", "#FC4E07"), # Custom color palette
  sorting = "descending", # Sort value in descending order
  add = "segments", # Add segments from y = 0 to dots
  add.params = list(color = "lightgray", size = 2), # Change segment color and size
  group = "cyl", # Order by groups
  dot.size = 6, # Large dot size
  label = round(dfm$mpg_z, 1), # Add mpg values as dot labels
  font.label = list(
    color = "white", size = 9,
    vjust = 0.5
  ), # Adjust label parameters
  ggtheme = theme_pubr() # ggplot2 theme
) +
  geom_hline(yintercept = 0, linetype = 2, color = "lightgray")

Plus d’informations : https://rpkgs.datanovia.com/ggpubr/

ggdendro

L’extension ggendro avec sa fonction ggdendrogram permet de représenter facilement des dendrogrammes avec ggplot2.

library(ggplot2)
library(ggdendro)
hc <- hclust(dist(USArrests), "ave")
hcdata <- dendro_data(hc, type = "rectangle")
ggplot() +
  geom_segment(data = segment(hcdata), aes(x = x, y = y, xend = xend, yend = yend)) +
  geom_text(data = label(hcdata), aes(x = x, y = y, label = label, hjust = 0), size = 3) +
  coord_flip() +
  scale_y_reverse(expand = c(0.2, 0))

### demonstrate plotting directly from object class hclust
ggdendrogram(hc)

ggdendrogram(hc, rotate = TRUE)

Plus d’informations : https://cran.r-project.org/web/packages/ggdendro/vignettes/ggdendro.html

circlize

L’extension circlize est l’extension de référence quand il s’agit de représentations circulaires. Un ouvrage entier lui est dédié : https://jokergoo.github.io/circlize_book/book/.

Voici un exemple issu de https://www.data-to-viz.com/story/AdjacencyMatrix.html.

library(tidyverse)

── Attaching packages ─────────────────── tidyverse 1.3.2 ──
✔ tibble  3.1.8     ✔ purrr   1.0.1
✔ readr   2.1.3     ✔ forcats 1.0.0
── Conflicts ────────────────────── tidyverse_conflicts() ──
✖ lubridate::as.difftime() masks base::as.difftime()
✖ dplyr::between()         masks data.table::between()
✖ lubridate::date()        masks base::date()
✖ dplyr::filter()          masks stats::filter()
✖ dplyr::first()           masks data.table::first()
✖ lubridate::hour()        masks data.table::hour()
✖ lubridate::intersect()   masks base::intersect()
✖ lubridate::isoweek()     masks data.table::isoweek()
✖ dplyr::lag()             masks stats::lag()
✖ dplyr::last()            masks data.table::last()
✖ lubridate::mday()        masks data.table::mday()
✖ lubridate::minute()      masks data.table::minute()
✖ lubridate::month()       masks data.table::month()
✖ lubridate::quarter()     masks data.table::quarter()
✖ lubridate::second()      masks data.table::second()
✖ lubridate::setdiff()     masks base::setdiff()
✖ purrr::transpose()       masks data.table::transpose()
✖ lubridate::union()       masks base::union()
✖ lubridate::wday()        masks data.table::wday()
✖ lubridate::week()        masks data.table::week()
✖ lubridate::yday()        masks data.table::yday()
✖ lubridate::year()        masks data.table::year()

# Load data
data <- read.table("https://raw.githubusercontent.com/holtzy/data_to_viz/master/Example_dataset/13_AdjacencyDirectedWeighted.csv", header = TRUE)
# short names
colnames(data) <- c("Africa", "East Asia", "Europe", "Latin Ame.", "North Ame.", "Oceania", "South Asia", "South East Asia", "Soviet Union", "West.Asia")
rownames(data) <- colnames(data)

# I need a long format
data_long <- data %>%
  rownames_to_column() %>%
  gather(key = "key", value = "value", -rowname)


library(circlize)

========================================
circlize version 0.4.15
CRAN page: https://cran.r-project.org/package=circlize
Github page: https://github.com/jokergoo/circlize
Documentation: https://jokergoo.github.io/circlize_book/book/

If you use it in published research, please cite:
Gu, Z. circlize implements and enhances circular visualization
  in R. Bioinformatics 2014.

This message can be suppressed by:
  suppressPackageStartupMessages(library(circlize))
========================================

# parameters
circos.clear()
circos.par(start.degree = 90, gap.degree = 4, track.margin = c(-0.1, 0.1), points.overflow.warning = FALSE)
par(mar = rep(0, 4))

# color palette
library(viridis)

Le chargement a nécessité le package : viridisLite

mycolor <- viridis(10, alpha = 1, begin = 0, end = 1, option = "D")
mycolor <- mycolor[sample(1:10)]

# Base plot
chordDiagram(
  x = data_long,
  grid.col = mycolor,
  transparency = 0.25,
  directional = 1,
  direction.type = c("arrows", "diffHeight"),
  diffHeight = -0.04,
  annotationTrack = "grid",
  annotationTrackHeight = c(0.05, 0.1),
  link.arr.type = "big.arrow",
  link.sort = TRUE,
  link.largest.ontop = TRUE
)

# Add text and axis
circos.trackPlotRegion(
  track.index = 1,
  bg.border = NA,
  panel.fun = function(x, y) {
    xlim <- get.cell.meta.data("xlim")
    sector.index <- get.cell.meta.data("sector.index")

    # Add names to the sector.
    circos.text(
      x = mean(xlim),
      y = 3.2,
      labels = sector.index,
      facing = "bending",
      cex = 0.8
    )

    # Add graduation on axis
    circos.axis(
      h = "top",
      major.at = seq(from = 0, to = xlim[2], by = ifelse(test = xlim[2] > 10, yes = 2, no = 1)),
      minor.ticks = 1,
      major.tick.length = 0.5,
      labels.niceFacing = FALSE
    )
  }
)

Diagrammes de Sankey

Les diagrammes de Sankey sont un type alternatif de représentation de flux. Voici un premier exemple, qui reprend les données utilisées pour le diagramme circulaire précédent, avec la fonction sankeyNetwork de l’extension sankeyNetwork.

# Package
library(networkD3)

# I need a long format
data_long <- data %>%
  rownames_to_column() %>%
  gather(key = "key", value = "value", -rowname) %>%
  filter(value > 0)
colnames(data_long) <- c("source", "target", "value")
data_long$target <- paste(data_long$target, " ", sep = "")

# From these flows we need to create a node data frame: it lists every entities involved in the flow
nodes <- data.frame(name = c(as.character(data_long$source), as.character(data_long$target)) %>% unique())

# With networkD3, connection must be provided using id, not using real name like in the links dataframe.. So we need to reformat it.
data_long$IDsource <- match(data_long$source, nodes$name) - 1
data_long$IDtarget <- match(data_long$target, nodes$name) - 1

# prepare colour scale
ColourScal <- 'd3.scaleOrdinal() .range(["#FDE725FF","#B4DE2CFF","#6DCD59FF","#35B779FF","#1F9E89FF","#26828EFF","#31688EFF","#3E4A89FF","#482878FF","#440154FF"])'

# Make the Network
sankeyNetwork(
  Links = data_long, Nodes = nodes,
  Source = "IDsource", Target = "IDtarget",
  Value = "value", NodeID = "name",
  sinksRight = FALSE, colourScale = ColourScal, nodeWidth = 40, fontSize = 13, nodePadding = 20
)

Une alternative possible est fournie par l’extension ggalluvial et ses géométries geom_alluvium et geom_stratum.

library(ggalluvial)
ggplot(data = as.data.frame(Titanic)) +
  aes(axis1 = Class, axis2 = Sex, axis3 = Age, y = Freq) +
  scale_x_discrete(limits = c("Class", "Sex", "Age"), expand = c(.1, .05)) +
  xlab("Demographic") +
  geom_alluvium(aes(fill = Survived)) +
  geom_stratum() +
  geom_text(stat = "stratum", infer.label = TRUE) +
  theme_minimal()

Warning: The parameter `infer.label` is deprecated.
Use `aes(label = after_stat(stratum))`.

Mentionnons également l’extension riverplot pour la création de diagrammes de Sankey.

DiagrammeR

DiagrammeR est dédiée à la réalisation de diagrammes en ayant recours à la syntaxe Graphviz (via la fonction grViz) ou encore à la syntaxe Mermaid (via la fonction mermaid).

library(DiagrammeR)
grViz("
digraph boxes_and_circles {

  # a 'graph' statement
  graph [overlap = true, fontsize = 10]

  # several 'node' statements
  node [shape = box,
        fontname = Helvetica]
  A; B; C; D; E; F

  node [shape = circle,
        fixedsize = true,
        width = 0.9] // sets as circles
  1; 2; 3; 4; 5; 6; 7; 8

  # several 'edge' statements
  A->1 B->2 B->3 B->4 C->A
  1->D E->A 2->4 1->5 1->F
  E->6 4->6 5->7 6->7 3->8
}
")

mermaid("
graph LR
A(Rounded)-->B[Rectangular]
B-->C{A Rhombus}
C-->D[Rectangle One]
C-->E[Rectangle Two]
")

mermaid("
sequenceDiagram
  customer->>ticket seller: ask ticket
  ticket seller->>database: seats
  alt tickets available
    database->>ticket seller: ok
    ticket seller->>customer: confirm
    customer->>ticket seller: ok
    ticket seller->>database: book a seat
    ticket seller->>printer: print ticket
  else sold out
    database->>ticket seller: none left
    ticket seller->>customer: sorry
  end
")

mermaid("
gantt
       dateFormat  YYYY-MM-DD
       title Adding GANTT diagram functionality to mermaid

       section A section
       Completed task            :done,    des1, 2014-01-06,2014-01-08
       Active task               :active,  des2, 2014-01-09, 3d
       Future task               :         des3, after des2, 5d
       Future task2              :         des4, after des3, 5d

       section Critical tasks
       Completed task in the critical line :crit, done, 2014-01-06,24h
       Implement parser and jison          :crit, done, after des1, 2d
       Create tests for parser             :crit, active, 3d
       Future task in critical line        :crit, 5d
       Create tests for renderer           :2d
       Add to mermaid                      :1d

       section Documentation
       Describe gantt syntax               :active, a1, after des1, 3d
       Add gantt diagram to demo page      :after a1  , 20h
       Add another diagram to demo page    :doc1, after a1  , 48h

       section Last section
       Describe gantt syntax               :after doc1, 3d
       Add gantt diagram to demo page      :20h
       Add another diagram to demo page    :48h
")

Plus d’informations : https://rich-iannone.github.io/DiagrammeR/

epicontacts

L’extension epicontacts permets de représenter des chaînes de transmission épidémiques.

Pour aller plus loin, on pourra se référer (en anglais), au chapitre dédié du Epidemiologist R Handbook :

highcharter

L’extension highcharter permet de réaliser des graphiques HTML utilisant la librairie Javascript Highcharts.js.

library("highcharter")
data(diamonds, mpg, package = "ggplot2")

hchart(mpg, "scatter", hcaes(x = displ, y = hwy, group = class))

library(tidyverse)
library(highcharter)
mpgman3 <- mpg %>%
  group_by(manufacturer) %>%
  dplyr::summarise(n = n(), unique = length(unique(model))) %>%
  arrange(-n, -unique)

hchart(mpgman3, "treemap", hcaes(x = manufacturer, value = n, color = unique))

data(unemployment)

hcmap("countries/us/us-all-all",
  data = unemployment,
  name = "Unemployment", value = "value", joinBy = c("hc-key", "code"),
  borderColor = "transparent"
) %>%
  hc_colorAxis(dataClasses = color_classes(c(seq(0, 10, by = 2), 50))) %>%
  hc_legend(
    layout = "vertical", align = "right",
    floating = TRUE, valueDecimals = 0, valueSuffix = "%"
  )

Plus d’informations : http://jkunst.com/highcharter/.