Contents

Introduction

This tutorial is a guide on how to run and use the COSMONET R package. It is available from http://bioinfo.na.iac.cnr.it/cosmonet/.

COSMONET depends on different R packages such as APML0, survival, ggpurb, igraph, pheamap.

Installation

COSMONET can be installed through the following commands in the R console.

# ## From the command line
# ### 1. Open a terminal window
# ### 2. Go to the directory that contains your package directory.
# ### 3. Install the package
# R CMD INSTALL COSMONET_0.1.tar.gz
#
# ## From local source file
# ### Go to the link http://bioinfo.na.iac.cnr.it/cosmonet/ and download the package from the Repository - R package
# install.packages("COSMONET_0.1.0.tar.gz", repos = NULL, type="source")

Using COSMONET to investigate cancer data

The aim of this section is to provide the users with a general view of the package, including the main functions, and to describe the package’s outputs. First, we upload the COSMONET R package.

library(COSMONET)

For illustration purposes, we use two examples of cancer data that are included in our package. The data processed by COSMONET are microarray from GEO database and RNA-seq data from GDC data portal. The aim of the study is to identify the genes and pathways that play a crucial role in the severity of the disease related to the survival of cancer patients.

Users can download the normalized data and survival information directly from the Repository of the website (Repository/Data/Download Data).

Example 1

For microarray data, we consider expression measurements of two different breast cancer datasets from GEO database: * GSE2034 dataset as training set (Affymetrix Human Genome U133A Array); * GSE2990 dataset as testing set (Affymetrix Human Genome U133A Array).

RMA (Robust Multichip Average) function from affy R package for within arrays normalization and normalizeBetweenArrays function from limma R package for between arrays normalization are used for the normalization of each single dataset. The annotation is made by using biomart R package after removing redundant probe sets, i.e. probe sets that measure different regions of the same gene. Generally, the pre-processing of the data are done by the users. The summary data is showed in the table below:

Summary of the breast data used for our analysis
GEO Accession Platform Nr. of genes Nr. of samples Survival data
GSE2034 Affymetrix HG-U133A 13229 286 Relapse Free Survival (RFS)
GSE2990 Affymetrix HG-U133A 13229 187 Relapse Free Survival (RFS)

In this study, the survival information is relapse-free survival, i.e., the length of time after primary treatment for a cancer ends that the patient survives without any signs or symptoms of that cancer. In other words, measure the relapse-free survival means to see how well a new treatment works. Also called DFS disease-free survival, and RFS.

To investigate the datasets, we first load the data in an R session and then, we perform the normalization between the training and testing set by using NormalizeBetweenData() function.

# Load the normalized data
load("data/BreastData_Normalized.RData")

# Load survival information: RFS - time (months) and status (alive=0 or death=1)
load("data/BreastData_Survival.RData")

Functional Linkage Matrix and Co-Expression Matrix

To incorporate the a-priori biological information in the penalized regression model, we need to create the network matrix. In our analysis, to construct the molecular graph, we use CreateNetworkMatrix() function. The disease code associated with breast cancer is (DOID:1612.

# Load the tissue specific data
library(tictoc)

tic("Run time")
load("data/BreastData_TissueSpecificEdge.rda")
NetworkMatrix <- CreateNetworkMatrix(
  x1.norm,
  repositoryDisease,
  diseaseID = "DOID:1612",
  topGenes = 500,
  tissueSpecificEdge,
  output = TRUE,
  message = TRUE
)
## [1] "Your list contains 500 genes. 500 out of 500 are used in the network."
## [1] "There are 43865 edges left."
toc()
## Run time: 297.62 sec elapsed
# Network matrix
Omega <- as.matrix(NetworkMatrix$Omega)
dim(Omega)
## [1] 13229 13229

Variable screenings for Cox model

In this section we present the application of the three screening techniques. We use the function ScreeningMethod() to identify the screened genes.

  1. BioMedical Driven (BMD)-screening
# Selection of the BMD-screened genes
cancer.genes <- NetworkMatrix$DiseaseGenes[,2]
screenVars.BMD <- ScreeningMethod(
  x1.norm,
  cancer.genes,
  screening="bmd",
  ranking=NULL,
  thresh=NULL
)
## [1] "Number of BMD-screened genes: 437"
  1. DAta-driven (DAD-screening)
# Ranking of genes that are highly correlated with the patient's survival
load("data/BreastData_RankingGenes.RData")
head(rankingGenes)
##   original.index  symbol marg.coeff       beta        pvals
## 1           2449  CHRNA9  10.437125 -10.437125 2.030443e-02
## 2           3012   CXCL6   9.836375  -9.836375 2.122943e-02
## 3           9444 RACGAP1   8.915556   8.915556 1.484894e-07
## 4           7834   NR2E3   8.896399  -8.896399 8.931928e-05
## 5           2494   CKAP2   8.294733   8.294733 5.502541e-07
## 6           7414    MZB1   8.282109  -8.282109 2.372354e-04

The ranking of genes is obtained applying MarginalCoxRanking() function. Users can download the file of the ranked genes directly from the website (Repository/Data/Download Data).

# Plots
par(mfrow = c(1, 2))
plot(rankingGenes$marg.coeff, rankingGenes$pvals, col = "blue", pch = 20, cex.lab=1.5, cex.axis=1.5, cex.main=1.5, cex.sub=1.5)
# Add the threshold line related to the 50th marginal coefficient (equal to 6.851106)
abline(v=rankingGenes[floor(dim(x1.norm)[1]/log(dim(x1.norm)[1])),3], col="red",lwd=2, lty=2)
plot(rankingGenes$pvals, 1:dim(x1.norm)[2], col = "blue", pch = 20, cex.lab=1.5, cex.axis=1.5, cex.main=1.5, cex.sub=1.5)
# Add the threshold line equal to 50 (i.e., the first 50 genes)
abline(h=floor(dim(x1.norm)[1]/log(dim(x1.norm)[1])), col="red",lwd=2, lty=2)
Plots based on the $p$-values and marginal coefficients of the ranked cancer genes.

Figure 1: Plots based on the \(p\)-values and marginal coefficients of the ranked cancer genes

These plots give an idea of the number of screened genes that can be involved in the analysis to give significance to the results (in the first plot the significant genes are positioned to the right of the dashed red line; in the second one the genes are are below the dashed red line). In particular, we set the threshold equal to \(\lfloor n/ log n \rfloor=50\), i.e., the 50 first genes as DAD-screened genes.

# Selection of the DAD-screened genes
screenVars.DAD <- ScreeningMethod(
  x1.norm,
  cancer.genes,
  screening="dad",
  ranking=rankingGenes,
  thresh=floor(dim(x1.norm)[1]/log(dim(x1.norm)[1]))
)
## [1] "Number of DAD-screened genes: 50"
  1. BioMedical Driven + DAta Driven Screening (BMD+DAD-screening)
# Union of BMD- and DAD-screeened genes
screenVars.BMD.DAD <- ScreeningMethod(
  x1.norm,
  cancer.genes,
  screening="bmd+dad",
  ranking=rankingGenes,
  thresh=floor(dim(x1.norm)[1]/log(dim(x1.norm)[1]))
)
## [1] "Number of BMD+DAD-screened genes: 549"

We consider as resulting set the union of the BMD and DAD-screened genes.

Network-Regularized Cox Regression

Now we are able to apply the penalized methods using the subset of screened variables in order to identify the active regression coefficients and the optimal cutoff (based on the prognostic index) on the training set. We can get reproducible results by fixing \(alpha\) (network influence) and the elements of each folds in the 5-fold cross validation. We perform the complete procedure applying CosmonetTraining() and SelectOptimalCutoff() functions.

# Set alpha and lambda parameters
alpha <- 0.5
# lambda <- 10^(seq(0,-2,-0.05))
# For reproducibility of the results
library(caret)
set.seed(2021)
flds <- createFolds(y1$time, k = 5, list = TRUE, returnTrain = FALSE)
foldids <- rep(1,dim(y1)[1])
foldids[flds$Fold2] = 2
foldids[flds$Fold3] = 3
foldids[flds$Fold4] = 4
foldids[flds$Fold5] = 5
# BMD-screening + variable selection
# pdf(file="results/Train_KM_BMD_median.pdf",width=15, height=7)
tic("Run time")
fitTraining.BMD <- CosmonetTraining(
  10,
  x1.norm,
  y1,
  screenVars.BMD,
  family="Cox",
  penalty="Net",
  Omega=Omega,
  alpha=alpha,
  nlambda = 50,
  foldid = foldids,
  selOptLambda="min",
  optCutpoint="median"
)
Training Phase (BMD-screening): cross-validation plot , Kaplan-Meier curves and distribution plot.

Figure 2: Training Phase (BMD-screening): cross-validation plot , Kaplan-Meier curves and distribution plot

Training Phase (BMD-screening): cross-validation plot , Kaplan-Meier curves and distribution plot.

Figure 3: Training Phase (BMD-screening): cross-validation plot , Kaplan-Meier curves and distribution plot

toc()
## Run time: 11.303 sec elapsed
# dev.off()
# Regression coefficients
beta.BMD <- fitTraining.BMD$beta
# Number of non-zero beta
sum(beta.BMD!=0)
## [1] 40
# write.table(beta.BMD, file = "results/beta_BMD_breast.txt", row.names = T, col.names = F)

# Optimal parameters
pars.BMD <- data.frame(alpha=0.5,opt.lambda=fitTraining.BMD$opt.lambda,opt.cutoff=fitTraining.BMD$opt.cutoff)
pars.BMD
##   alpha opt.lambda opt.cutoff
## 1   0.5    0.11899    -7.3697
# Summary tables
df.train.BMD <- fitTraining.BMD$df
head(df.train.BMD)
##     sample        PI time status 50%
## 1 GSM37048 -8.358512   87      0   2
## 2 GSM36945 -8.352637   92      0   2
## 3 GSM36984 -8.342822  138      0   2
## 4 GSM36852 -8.267668  107      0   2
## 5 GSM36883 -8.254200   88      0   2
## 6 GSM36845 -8.223623   54      0   2
summary.BMD <- fitTraining.BMD$summary
summary.BMD
##     Low.Risk High.Risk  cutoff  p.value
## 50%      143       143 -7.3697 2.82e-14
# write.table(summary.BMD, file = "results/summary_BMD_breast.txt", row.names = T, col.names = T)
# DAD-screening + variable selection
# pdf(file="results/Train_KM_DAD_survCutpoint.pdf",width=15, height=7)
tic("Run time")
fitTraining.DAD <- CosmonetTraining(
  10,
  x1.norm,
  y1,
  screenVars.DAD,
  family="Cox",
  penalty="Net",
  Omega=Omega,
  alpha=alpha,
  nlambda = 50,
  foldid = foldids,
  selOptLambda="min",
  optCutpoint="survCutpoint"
)
Training Phase (DAD-screening): cross-validation plot , Kaplan-Meier curves and distribution plot.

Figure 4: Training Phase (DAD-screening): cross-validation plot , Kaplan-Meier curves and distribution plot

Training Phase (DAD-screening): cross-validation plot , Kaplan-Meier curves and distribution plot.

Figure 5: Training Phase (DAD-screening): cross-validation plot , Kaplan-Meier curves and distribution plot

toc()
## Run time: 1.624 sec elapsed
# dev.off()
# Regression coefficients
beta.DAD <- fitTraining.DAD$beta
# Number of non-zero beta
sum(beta.DAD!=0)
## [1] 32
# write.table(beta.DAD, file = "results/beta_DAD_breast.txt", row.names = T, col.names = F)

# Optimal parameters
pars.DAD <- data.frame(alpha=0.5,opt.lambda=fitTraining.DAD$opt.lambda,opt.cutoff=fitTraining.DAD$opt.cutoff)
pars.DAD
##   alpha opt.lambda opt.cutoff
## 1   0.5 0.04740321  -10.15184
# Summary tables
df.train.DAD <- fitTraining.DAD$df
head(df.train.DAD)
##     sample        PI time status groupRisk id
## 1 GSM36933 -13.83397  110      0         2  1
## 2 GSM36817 -13.68776  113      0         2  2
## 3 GSM36851 -13.37413  100      0         2  3
## 4 GSM36915 -13.31251   97      0         2  4
## 5 GSM36863 -13.23986   98      0         2  5
## 6 GSM36794 -13.04705   87      0         2  6
summary.DAD <- fitTraining.DAD$summary
summary.DAD
##     cutpoint statistic
## PI -10.15184  10.04115
# write.table(summary.DAD, file = "results/summary_DAD_breast.txt", row.names = T, col.names = T)
# BMD+DAD-screening + variable selection
# pdf(file="results/Train_KM_BMD_DAD_median.pdf",width=15, height=7)
tic("Run time")
fitTraining.BMD.DAD <- CosmonetTraining(
  10,
  x1.norm,
  y1,
  screenVars.BMD.DAD,
  family="Cox",
  penalty="Net",
  Omega=Omega,
  alpha=alpha,
  nlambda = 50,
  foldid = foldids,
  selOptLambda="min",
  optCutpoint="median"
)
Training Phase (BMD+DAD-screening): cross-validation plot , Kaplan-Meier curves and distribution plot.

Figure 6: Training Phase (BMD+DAD-screening): cross-validation plot , Kaplan-Meier curves and distribution plot

Training Phase (BMD+DAD-screening): cross-validation plot , Kaplan-Meier curves and distribution plot.

Figure 7: Training Phase (BMD+DAD-screening): cross-validation plot , Kaplan-Meier curves and distribution plot

toc()
## Run time: 10.647 sec elapsed
# dev.off()
# Regression coefficients
beta.BMD.DAD <- fitTraining.BMD.DAD$beta
sum(beta.BMD.DAD!=0)
## [1] 67
# write.table(beta.BMD.DAD, file = "results/beta_BMD_DAD_breast.txt", row.names = T, col.names = F)

# Optimal parameters
pars.BMD.DAD <- data.frame(alpha=0.5,opt.lambda=fitTraining.BMD.DAD$opt.lambda,opt.cutoff=fitTraining.BMD.DAD$opt.cutoff)
pars.BMD.DAD
##   alpha opt.lambda opt.cutoff
## 1   0.5 0.08331165  -23.89154
# Summary tables
df.train.BMD.DAD <- fitTraining.BMD.DAD$df
head(df.train.BMD.DAD)
##     sample        PI time status 50%
## 1 GSM36868 -26.13637  123      0   2
## 2 GSM36863 -26.03310   98      0   2
## 3 GSM37059 -25.95335   83      0   2
## 4 GSM36984 -25.90174  138      0   2
## 5 GSM36780 -25.86187   84      0   2
## 6 GSM36821 -25.83046   88      0   2
summary.BMD.DAD <- fitTraining.BMD.DAD$summary
summary.BMD.DAD
##     Low.Risk High.Risk    cutoff p.value
## 50%      143       143 -23.89154       0
# write.table(summary.BMD.DAD, file = "results/summary_BMD_DAD_breast.txt", row.names = T, col.names = T)

Survival analysis

We perform the testing phase by using CosmonetTesting() and ValidationTest() functions to validate the survival model.

# Validation test for BMD-screening + variable selection
opt.cutoff.BMD <- fitTraining.BMD$opt.cutoff
# pdf(file="results/Test_KM_BMD_breast.pdf",width=15, height=7)
tic("Run time")
fitTesting.BMD <- CosmonetTesting(
  x2.norm,
  y2,
  screenVars.BMD,
  beta.BMD,
  opt.cutoff.BMD
)
Testing Phese (BMD-screening): (A) Kaplan-Maier curves and risk table and (B) distribution of the risk score PI.

Figure 8: Testing Phese (BMD-screening): (A) Kaplan-Maier curves and risk table and (B) distribution of the risk score PI

toc()
## Run time: 0.812 sec elapsed
# dev.off()

# Summary table
df.test.BMD <- fitTesting.BMD$df
head(df.test.BMD)
##       sample        PI       time status groupRisk
## 44  GSM65361 -8.539579  88.273973      0         2
## 182 GSM65875 -8.457235 107.243836      0         2
## 169 GSM65862 -8.424892 142.323288      0         2
## 41  GSM65358 -8.384166  66.969863      0         2
## 25  GSM65342 -8.350312   2.169863      0         2
## 185 GSM65878 -8.348892   8.810959      1         2
# Validation test for DAD-screening + variable selection
opt.cutoff.DAD <- fitTraining.DAD$opt.cutoff
# pdf(file="results/Test_KM_DAD_breast.pdf",width=15, height=7)
tic("Run time")
fitTesting.DAD <- CosmonetTesting(
  x2.norm,
  y2,
  screenVars.DAD,
  beta.DAD,
  opt.cutoff.DAD
)
Testing Phase (DAD-screening): (A) Kaplan-Maier curves and risk table and (B) distribution of the risk score PI.

Figure 9: Testing Phase (DAD-screening): (A) Kaplan-Maier curves and risk table and (B) distribution of the risk score PI

toc()
## Run time: 0.79 sec elapsed
# dev.off()

df.test.DAD <- fitTesting.DAD$df
head(df.test.DAD)
##       sample        PI      time status groupRisk
## 102 GSM65794 -12.51275 103.95616      0         2
## 107 GSM65799 -12.40898  87.94521      0         2
## 172 GSM65865 -12.40538 144.72329      0         2
## 150 GSM65843 -12.29438  84.23014      1         2
## 14  GSM65331 -11.85027 114.01644      0         2
## 178 GSM65871 -11.82383 104.84384      0         2
# Validation test for BMD+DAD-screening + variable selection
opt.cutoff.BMD.DAD <- fitTraining.BMD.DAD$opt.cutoff
# pdf(file="results/Test_KM_BMD_DAD_breast.pdf",width=15, height=7)
tic("Run time")
fitTesting.BMD.DAD <- CosmonetTesting(
  x2.norm,
  y2,
  screenVars.BMD.DAD,
  beta.BMD.DAD,
  opt.cutoff.BMD.DAD
)
Testing Phase (BMD+DAD-screening): (A) Kaplan-Maier curves and risk table and (B) distribution of the risk score PI.

Figure 10: Testing Phase (BMD+DAD-screening): (A) Kaplan-Maier curves and risk table and (B) distribution of the risk score PI

toc()
## Run time: 0.775 sec elapsed
# dev.off()

df.test.BMD.DAD <- fitTesting.BMD.DAD$df
head(df.test.BMD.DAD)
##       sample        PI     time status groupRisk
## 169 GSM65862 -26.11466 142.3233      0         2
## 172 GSM65865 -25.99728 144.7233      0         2
## 178 GSM65871 -25.71374 104.8438      0         2
## 171 GSM65864 -25.70033 120.5260      1         2
## 29  GSM65346 -25.55507  18.6411      0         2
## 161 GSM65854 -25.54970 147.1890      0         2

Venn diagram plotting

We use venn diagrams to show the relation between low-risk groups (or high-risk groups) of each screening method by considering the testing set. We use VennPlot() function.

# Low-risk groups
lowRisk.BMD <- as.character(df.test.BMD$sample[which(df.test.BMD$groupRisk==2)])
lowRisk.DAD <- as.character(df.test.DAD$sample[which(df.test.DAD$groupRisk==2)])
lowRisk.BMD.DAD <- as.character(df.test.BMD.DAD$sample[which(df.test.BMD.DAD$groupRisk==2)])

# High-risk groups
highRisk.BMD <- as.character(df.test.BMD$sample[which(df.test.BMD$groupRisk==1)])
highRisk.DAD <- as.character(df.test.DAD$sample[which(df.test.DAD$groupRisk==1)])
highRisk.BMD.DAD <- as.character(df.test.BMD.DAD$sample[which(df.test.BMD.DAD$groupRisk==1)])
# Display Venn Diagram
vennPlot1 <- VennPlot(lowRisk.BMD, lowRisk.DAD, lowRisk.BMD.DAD, "LR-Group")
Venn diagramm of low-risk groups on testing set

Figure 11: Venn diagramm of low-risk groups on testing set

## NULL
# Display Venn Diagram
vennPlot2 <- VennPlot(highRisk.BMD, highRisk.DAD, highRisk.BMD.DAD, "HR-Group")
Venn diagramm of high-risk groups on testing set

Figure 12: Venn diagramm of high-risk groups on testing set

## NULL

Heat-map visualization

We use HeatmapSurvGroup() function to visualize the heat-map of the gene expression matrix on the training and testing set, respectively, by taking into account the screened genes and the prognostic information (high- and low-risk group).

Heat-maps of the BMD-genes on training and testing set

Figure 13: Heat-maps of the BMD-genes on training and testing set

Heat-maps of the DAD-genes on training and testing set

Figure 14: Heat-maps of the DAD-genes on training and testing set

 Heat-maps of the BMD+DAD-genes on training and testing set

Figure 15: Heat-maps of the BMD+DAD-genes on training and testing set

Figure 13-15 show the Z-score expression matrix of the BMD-, DAD- and BMD+DAD-genes in the training and testing, respectively. The genes in the horizontal direction are clustered in the same order in both sets. Z-scores of the gene expression measurements are used. Z-scores larger than 3.5 were set to 3.5 and Z-scores smaller than -3.5 were set to -3.5. Dark pink color indicates a high level of expression in breast cancer and green color indicates a low level of expression. The patients are divided in high-risk and low-risk groups (i.e., patients with bad and good prognosis).

Correlation between PIs

We perform a correlation analysis on the training and testing set to further investigate the relation between prognostic indices PIs. We use corrPlot() function.

# Renames columns
colnames(df.train.BMD)[5] <- "groupRisk.BMD"
colnames(df.train.DAD)[5] <- "groupRisk.DAD"
colnames(df.train.BMD.DAD)[5] <- "groupRisk.BMD.DAD"

# Correlation plots of prognostic index PI
corrPlot1 <- CorrPlot(df1 = df.train.BMD,
                      df2 = df.train.DAD,
                      groupRisk1 = "groupRisk.BMD",
                      groupRisk2 = "groupRisk.DAD",
                      method="spearman",
                      name1 = "BMD",
                      name2 = "DAD",
                      set = "Training set"
                      )
corrPlot2 <- CorrPlot(df1 = df.train.BMD,
                      df2 = df.train.BMD.DAD,
                      groupRisk1 = "groupRisk.BMD",
                      groupRisk2 = "groupRisk.BMD.DAD",
                      method="spearman",
                      name1 = "BMD",
                      name2 = "BMD.DAD",
                      set = "Training set"
                      )

# Visualization
ggarrange(corrPlot1,corrPlot2, nrow = 1, ncol = 2, common.legend = T, font.label=list(color="black",size=14))
PI-correlation plots on training set

Figure 16: PI-correlation plots on training set

# ggsave(file="results/corr_train_breast.pdf",width=15, height=7)
# Rename columns
colnames(df.test.BMD)[5] <- "groupRisk.BMD"
colnames(df.test.DAD)[5] <- "groupRisk.DAD"
colnames(df.test.BMD.DAD)[5] <- "groupRisk.BMD.DAD"

# Correlation plots of prognostic index PI
corrPlot1 <- CorrPlot(df1 = df.test.BMD,
                      df2 = df.test.DAD,
                      groupRisk1 = "groupRisk.BMD",
                      groupRisk2 = "groupRisk.DAD",
                      method="spearman",
                      name1 = "BMD",
                      name2 = "DAD",
                      set = "Testing set"
                      )
corrPlot2 <- CorrPlot(df1 = df.test.BMD,
                      df2 = df.test.BMD.DAD,
                      groupRisk1 = "groupRisk.BMD",
                      groupRisk2 = "groupRisk.BMD.DAD",
                      method="spearman",
                      name1 = "BMD",
                      name2 = "BMD.DAD",
                      set = "Testing set"
                      )

# Visualization
ggarrange(corrPlot1,corrPlot2, nrow = 1, ncol = 2, common.legend = T, font.label=list(color="black",size=14))
PI-correlation plots on testing set

Figure 17: PI-correlation plots on testing set

# ggsave(file="results/corr_test_breast.pdf",width=15, height=7)

Pathway Analysis

COSMONET includes an RMarkdown script to generate an interactive dashboard of KEGG pathways networks. We report the network created running the proposed model using the BMD+DAD-screened genes. The dashboard contains two tabs, one for the not-isolated genes and one for the full network. Each node corresponds to a gene and the edges represent the KEGG pathways shared by the linked genes. Different node colors are used to show a correlation measure between the genes and the disease under investigation as follows:

  • Red nodes represent the top 500 tissue-specific genes according to the HumanBase tool (mapped up genes);
  • Blue nodes represent the bottom 501 tissue-specific genes according to the HumanBase tool (mapped down genes);
  • White nodes represent the genes that have not been yet identified as tissue-specific in the HumanBase database (not mapped genes).

Therefore, red nodes represent the BMD contribution to the signature and blue and white nodes the DAD contribution, not yet captured in the BMD list. Note that some of the red genes might also be retrieved from the data-driven screening. However, in this context, we want to underline and make sense of the novel information not yet considered.

We implemented the GenerateNetwork() function to visualize the interactive networks. The colours of the nodes and the ID of the disease to investigate can be changed by the user passing the new values as function’s parameters in RUNCosmonetDashboard.Rmd loaded in the same directory as the vignetta.

# Build a data frame composed by two columns: gene names and beta coefficients
ListOfGenes <- data.frame(Symbol=rownames(beta.BMD.DAD), coefficients=beta.BMD.DAD[,1])
rownames(ListOfGenes) <-1:dim(ListOfGenes)[1]
# write.table(ListOfGenes, "ListOfGenes.txt", sep = "\t", row.names = F, col.names = T)

tic("Run time")
networkBreast <- GenerateNetwork(
  ListOfGenes = ListOfGenes,
  header = TRUE,
  diseaseID = "DOID:1612",
  nodesCols = c('#EFCFD4','#D9E3FC','#FFFFFF')
)
toc()

The function will automatically open a new html page to visualise the dashboard. An example of the final dashboard is shown in the figure below (and it can be accessed at the following link).

knitr::include_graphics("COSMONET_NETWORK.pdf")