help

Q1 How to cite GRNdb?

1. Fang et al. GRNdb: decoding the gene regulatory networks in diverse human and mouse conditions, Nucleic Acids Research, 2020 (DOI: 10.1093/nar/gkaa995)
2. Li et al. Single-cell transcriptomic analysis reveals dynamic alternative splicing and gene regulatory networks among pancreatic islets, Science China Life Sciences, 2020 (DOI: 10.1007/s11427-020-1711-x)

Q2 How many conditions are available for human and mouse in GRNdb currently?

At present, GRNdb contains 72 human single-cell conditions (332,920 cells) and 71 human bulk conditions (involving 27,748 samples for 33 cancers of TCGA and 27 normal tissues of GTEx), as well as 41 single-cell conditions (300,150 cells) from different mouse tissues. To ensure the accuracy of gene regulatory network inference, we removed the TCGA (The Cancer Genome Atlas) and GTEx (Genotype-Tissue Expression) datasets that have fewer than 30 samples. More detailed information can be found on the Browse and Statistics webpages. We will collect and add more datasets in the next update of GRNdb.

Q3 Which tool/pipeline was employed to infer gene regulatory networks in GRNdb?

We employed the SCENIC pipeline (Aibar et al., Nature Methods, 2017) to infer the gene regulatory networks (GRNs) from RNA-seq data, together with the known TF-target relationships and corresponding motifs from the RcisTarget database. First, SCENIC uses GENIE3 to detect the gene sets coexpressed with transcription factors (TFs). Then, RcisTarget infers the putative direct-binding targets of the TFs based on the motif-TF annotation databases. Finally, the GRNs were identified by following the online SCENIC pipeline step by step. Only the best TF binding motif predicted by SCENIC for each TF-target pair is shown in GRNdb.

Q4 Where do the motifs for TF-target pairs come from?

The motifs for TF-target pairs in a given condition were identified with the SCENIC pipeline (Aibar et al., 2017, Nature Methods), and only the best motif for each TF-target pair is used in GRNdb. SCENIC employs RcisTarget to identify the transcription factor binding site (TFBS) motifs that are over-represented in a given gene set. In this step, SCENIC uses a database that contains the scores (rankings) of each motif around the transcription start site (TSS) of the genes in the organism. The motif score for each gene is calculated over a search space around the TSS. For this analysis, SCENIC uses two databases: i) one scoring the motifs in the 500 bp region upstream of the TSS, and ii) one scoring the motifs in the 10 kb region around the TSS. By default, motifs with a Normalized Enrichment Score (NES) > 3.0 are considered significantly enriched in the corresponding TF module.

Q5 What is the meaning of NES for each TF-target pair?

In gene regulatory network analysis, the SCENIC pipeline normalizes the AUC value of each motif into a Normalized Enrichment Score (NES). A high NES denotes a motif that recovers a large proportion of the input genes near the top of its ranking. To identify significantly enriched motifs, the SCENIC pipeline uses a threshold of 3.0, which corresponds to a False Discovery Rate (FDR) between 3% and 9%. The significant motifs are then linked back to transcription factors (TFs) using the annotation databases of RcisTarget in the SCENIC pipeline.
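
The normalization and thresholding described above can be illustrated with a minimal Python sketch. It assumes the usual z-score form, NES = (AUC - mean of all motif AUCs) / standard deviation of all motif AUCs. The function names and the example AUC values are hypothetical and are not part of SCENIC or GRNdb.

```python
from statistics import mean, stdev


def normalized_enrichment_scores(auc_by_motif):
    """Turn per-motif AUC values into NES values.

    Assumption: NES is a z-score of the AUC across all motifs,
    using the sample standard deviation.
    """
    values = list(auc_by_motif.values())
    mu = mean(values)
    sigma = stdev(values)
    if sigma == 0:
        raise ValueError("all AUC values are identical; NES is undefined")
    return {motif: (auc - mu) / sigma for motif, auc in auc_by_motif.items()}


def significant_motifs(nes_by_motif, threshold=3.0):
    """Return the motifs whose NES exceeds the threshold (3.0 by default)."""
    return sorted(m for m, nes in nes_by_motif.items() if nes > threshold)


# Illustrative input: 19 background motifs and one strongly enriched motif.
aucs = {f"background_{i}": 0.01 for i in range(19)}
aucs["motif_hit"] = 0.10
nes = normalized_enrichment_scores(aucs)
print(significant_motifs(nes))
```

Because the NES is relative to the whole motif collection, the same AUC can give a different NES when the set of scored motifs changes.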

Q6 What is the meaning of high- or low-confidence for TF-target pairs?

In gene regulatory network inference, the SCENIC pipeline uses two types of motif annotations provided by the cisTarget databases. The first type is annotated directly in the original cisTarget database or inferred by orthology. The second type is inferred by motif similarity. If the TF motif comes from the first type, the TF-target pair is annotated as high-confidence. Otherwise, it is denoted as low-confidence.

Q7 What is the meaning of GENIE3 Weight for each TF-target pair?

SCENIC employs the GENIE3 model to infer gene regulatory networks. The GENIE3 Weights denote the weights of the links directed from TFs to target genes. Higher weights correspond to more likely regulatory links. However, the weights do not have any statistical meaning; they only provide a way to rank the regulatory links. Caution must therefore be taken when choosing a cutoff, since there is no standard threshold value. More details can be found on the GENIE3 website.
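
Since the weights are only meaningful as a ranking, one cautious way to use them is to keep the top-ranked links rather than apply an absolute cutoff. The sketch below shows this in Python. The function name, the (TF, target, weight) tuple layout, and the example values are assumptions for illustration only, not the GRNdb download format.

```python
def top_links(links, k):
    """Return the k links with the largest GENIE3 weights.

    `links` is a list of (tf, target, weight) tuples. Only the order of
    the weights is used; their absolute values are not interpreted.
    """
    ranked = sorted(links, key=lambda link: link[2], reverse=True)
    return ranked[:k]


# Illustrative links with made-up weights.
links = [
    ("TF_A", "gene_1", 0.8),
    ("TF_A", "gene_2", 12.5),
    ("TF_B", "gene_1", 3.1),
    ("TF_B", "gene_3", 0.2),
]
for tf, target, weight in top_links(links, 2):
    print(tf, target, weight)
```

The value of k is itself a judgment call, so it is worth checking that conclusions do not change much for nearby values.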

Q8 What is the meaning of the suffix '_extended' in the regulon name?

The main regulons inferred by the SCENIC pipeline use only the “high confidence” annotations of the cisTarget database, which by default are “direct annotation” and “inferred by orthology”. The suffix '_extended' in the regulon name denotes that lower-confidence annotations inferred by motif similarity were also used.
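
When processing regulon names in a script, the suffix can be separated from the TF name as in the following minimal Python sketch. It assumes the regulon name is the TF symbol optionally followed by '_extended' and nothing else; the function name is hypothetical.

```python
SUFFIX = "_extended"


def parse_regulon_name(name):
    """Split a regulon name into (tf_name, is_extended).

    Assumption: the name is the TF symbol, optionally followed by the
    '_extended' suffix, with no further decoration.
    """
    if name.endswith(SUFFIX):
        return name[: -len(SUFFIX)], True
    return name, False


print(parse_regulon_name("SOX2_extended"))
print(parse_regulon_name("SOX2"))
```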

Q9 Where are the single-cell or bulk datasets used in GRNdb from?

We collected the single-cell RNA-seq datasets from the public databases Gene Expression Omnibus (GEO) and ArrayExpress. The bulk datasets of diverse cancers and normal tissues were downloaded from UCSC Xena (for TCGA) and GTEx, respectively. The accession ID of the original data for each condition can be found on the Statistics webpage.

Q10 How can I download all the TF-target pairs for a specific condition?

All the TF-target pairs for various conditions of human and mouse can be freely obtained on the Download webpage.

Q11 Which software was used to define the clusters for single-cell datasets?

We employed Seurat (Stuart et al. 2019, Cell) with the standard pipeline to define the clusters for those single-cell datasets without detailed cell-type/cluster information in the published papers of the original studies. If the cell type information can be obtained from the original paper, we used the annotation directly.

Q12 How are the markers of clusters identified for a specific single-cell dataset?

The markers for each cluster of different single-cell conditions were identified with Seurat (Stuart et al. 2019, Cell) using the standard pipeline. Users can freely download the markers for all clusters of a specific condition provided in GRNdb for further analysis.

Q13 How can I know the specific function of a gene searched in GRNdb?

In GRNdb, human genes are linked to GeneCards and mouse genes are linked to the MGI database. Clicking a gene name opens the corresponding entry in the relevant database, which shows detailed information about the gene and its function.

Q14 How many genes can be searched on the Expression webpage?

There is no limit on the number of input genes on the Expression webpage. For performance reasons, however, we recommend entering no more than 30 genes.

Q15 Which tool was used to conduct the survival analysis?

We used the Python package lifelines to perform the survival analysis. The median expression of the input gene is used to stratify the patients of a specific cancer into two groups. We did not set any limit on the number of input genes on the Survival webpage, but for performance reasons it is better to keep the number of genes below 30.
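
The median-based stratification described above can be sketched in Python as follows. This is an illustration under stated assumptions, not the GRNdb implementation: the function names and patient IDs are hypothetical, patients with expression exactly equal to the median are assigned to the low group (the source does not say how ties are handled), and the log-rank comparison is only one plausible way to compare the two groups with lifelines.

```python
from statistics import median


def stratify_by_median(expression):
    """Split patients into (high, low) groups by median expression.

    `expression` maps patient ID to the expression of the input gene.
    Assumption: values equal to the median go to the low group.
    """
    cutoff = median(expression.values())
    high = sorted(p for p, value in expression.items() if value > cutoff)
    low = sorted(p for p, value in expression.items() if value <= cutoff)
    return high, low


def logrank_p_value(high, low, durations, events):
    """Compare the two groups with a log-rank test (requires lifelines).

    `durations` maps patient ID to follow-up time and `events` maps
    patient ID to 1 (event observed) or 0 (censored).
    """
    # Imported here so the stratification above works without lifelines.
    from lifelines.statistics import logrank_test

    result = logrank_test(
        [durations[p] for p in high],
        [durations[p] for p in low],
        event_observed_A=[events[p] for p in high],
        event_observed_B=[events[p] for p in low],
    )
    return result.p_value


# Illustrative expression values for four hypothetical patients.
expression = {"p1": 1.0, "p2": 2.0, "p3": 3.0, "p4": 4.0}
high, low = stratify_by_median(expression)
print(high, low)
```

With the two groups in hand, each group's survival curve could then be estimated separately, for example with the Kaplan-Meier estimator provided by lifelines.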

Q16 Will new data be added into the GRNdb?

Yes, we will continue to collect more datasets and add the inferred gene regulatory networks into GRNdb in the future.

Q17 How can I browse all TF-target pairs for a certain tissue/condition?

Go to the Browse webpage and find the tissue/condition of interest for a specific organism. Once you click the selected tissue/condition, all the inferred TF-target pairs will be shown.

Q18 How can I download the figures generated in GRNdb?

Click the download icon of the figure you are interested in, and the figure will be downloaded automatically.

Q19 How often does GRNdb update?

We are planning to continuously update GRNdb in the future. Once we finish the processing and analysis of newly collected datasets, we will add the related gene regulatory network information into GRNdb. The updated information will be shown on the Home webpage.
