Title: | Analysis and Visualization of Multi-Omics Data |
---|---|
Description: | A tool for comprehensive transcriptomic data analysis, with a focus on transcript-level data preprocessing, expression profiling, differential expression analysis, and functional enrichment. It enables researchers to identify key biological processes, disease biomarkers, and gene regulatory mechanisms. 'TransProR' is aimed at researchers and bioinformaticians working with RNA-Seq data, providing an intuitive framework for in-depth analysis and visualization of transcriptomic datasets. The package includes comprehensive documentation and usage examples to guide users through the entire analysis pipeline. The differential expression analysis methods incorporated in the package include 'limma' (Ritchie et al., 2015, <doi:10.1093/nar/gkv007>; Smyth, 2005, <doi:10.1007/0-387-29362-0_23>), 'edgeR' (Robinson et al., 2010, <doi:10.1093/bioinformatics/btp616>), 'DESeq2' (Love et al., 2014, <doi:10.1186/s13059-014-0550-8>), and Wilcoxon tests (Li et al., 2022, <doi:10.1186/s13059-022-02648-4>), providing flexible and robust approaches to RNA-Seq data analysis. For more information, refer to the package vignettes and related publications. |
Authors: | Dongyue Yu [aut, cre, cph] |
Maintainer: | Dongyue Yu <[email protected]> |
License: | MIT + file LICENSE |
Version: | 0.0.6 |
Built: | 2024-12-14 03:10:10 UTC |
Source: | https://github.com/sssydysss/transpror |
This function adds a boxplot layer to an existing ‘ggtree' plot object using ggtreeExtra’s geom_fruit for boxplots. It is primarily used to display statistical summaries of the data related to gene expressions or other metrics.
add_boxplot( p, data, fill_color = "#f28131", alpha = 0.6, offset = 0.22, pwidth = 0.5 )
add_boxplot( p, data, fill_color = "#f28131", alpha = 0.6, offset = 0.22, pwidth = 0.5 )
p |
An existing ggtree plot object. |
data |
A data frame containing the data to be plotted. Expected to have columns for 'Sample' and 'value'. |
fill_color |
A character string specifying the fill color for the boxplots. Default is "#f28131". |
alpha |
Numeric value for the transparency of the boxplots. Default is 0.6. |
offset |
Numeric value, the position of the boxplot on the x-axis relative to its gene name. Default is 0.22. |
pwidth |
Numeric value, the width of the boxplot. Default is 0.5. |
A 'ggtree' plot object with the added boxplot layer.
# Check and load required packages if (requireNamespace("ggtreeExtra", quietly = TRUE) && requireNamespace("ggplot2", quietly = TRUE)) { library(ggtreeExtra) library(ggplot2) file_path <- system.file("extdata", "p_tree_test.rds", package = "TransProR") p <- readRDS(file_path) # Create boxplot data frame boxplot_data <- data.frame( Sample = rep(c("Species_A", "Species_B", "Species_C", "Species_D"), each = 30), value = c( rnorm(30, mean = 5, sd = 1), # Data for Species_A rnorm(30, mean = 7, sd = 1.5), # Data for Species_B rnorm(30, mean = 6, sd = 1.2), # Data for Species_C rnorm(30, mean = 8, sd = 1.3) # Data for Species_D ) ) # Call add_boxplot function to add boxplot layer p_with_boxplot <- add_boxplot(p, boxplot_data) } else { message("Required packages 'ggtreeExtra' and 'ggplot2' are not installed.") }
# Check and load required packages if (requireNamespace("ggtreeExtra", quietly = TRUE) && requireNamespace("ggplot2", quietly = TRUE)) { library(ggtreeExtra) library(ggplot2) file_path <- system.file("extdata", "p_tree_test.rds", package = "TransProR") p <- readRDS(file_path) # Create boxplot data frame boxplot_data <- data.frame( Sample = rep(c("Species_A", "Species_B", "Species_C", "Species_D"), each = 30), value = c( rnorm(30, mean = 5, sd = 1), # Data for Species_A rnorm(30, mean = 7, sd = 1.5), # Data for Species_B rnorm(30, mean = 6, sd = 1.2), # Data for Species_C rnorm(30, mean = 8, sd = 1.3) # Data for Species_D ) ) # Call add_boxplot function to add boxplot layer p_with_boxplot <- add_boxplot(p, boxplot_data) } else { message("Required packages 'ggtreeExtra' and 'ggplot2' are not installed.") }
This function adds a new tile layer to an existing 'ggtree' plot object, allowing for separate scales for fill and alpha transparency. This is useful when you want to add additional data layers without interfering with the existing scales in the plot. It utilizes the 'ggnewscale' package to reset scales for new layers.
add_new_tile_layer( p, data, gene_colors, gene_label, alpha_value = c(0.3, 0.9), offset = 0.02, pwidth = 2 )
add_new_tile_layer( p, data, gene_colors, gene_label, alpha_value = c(0.3, 0.9), offset = 0.02, pwidth = 2 )
p |
An existing ggtree plot object. |
data |
A data frame containing the data to be plotted. Expected to have columns for 'Sample', 'Gene', and 'value'. |
gene_colors |
A named vector of colors for genes. |
gene_label |
A character string used as a label in the legend for the genes. |
alpha_value |
A numeric or named vector for setting the alpha scale based on values. |
offset |
Numeric value, the position of the tile on the x-axis relative to its gene name. Default is 0.02. |
pwidth |
Numeric value, the width of the tile. Default is 2. |
A 'ggtree' plot object with the added tile layer and new scales.
# Check and load required packages if (requireNamespace("ggtreeExtra", quietly = TRUE) && requireNamespace("ggplot2", quietly = TRUE) && requireNamespace("ggnewscale", quietly = TRUE)) { library(ggtreeExtra) library(ggplot2) library(ggnewscale) file_path <- system.file("extdata", "p_tree_test.rds", package = "TransProR") p <- readRDS(file_path) # Create new expression data new_expression_data <- data.frame( Sample = rep(c("Species_A", "Species_B", "Species_C", "Species_D"), each = 3), Gene = rep(c("Gene6", "Gene7", "Gene8"), times = 4), Value = runif(12, min = 0, max = 1) # Randomly generate expression values between 0 and 1 ) # Define new gene colors new_gene_colors <- c( Gene6 = "#0b5f63", Gene7 = "#074d41", Gene8 = "#1f5e27" ) # Define gene label and alpha values gene_label <- "New Genes" alpha_value <- c(0.3, 0.9) # Add new tile layer p_with_new_layer <- add_new_tile_layer( p, new_expression_data, new_gene_colors, gene_label, alpha_value, offset = 0.02, pwidth = 2 ) } else { message("Required packages 'ggtreeExtra', 'ggplot2', and 'ggnewscale' are not installed.") }
# Check and load required packages if (requireNamespace("ggtreeExtra", quietly = TRUE) && requireNamespace("ggplot2", quietly = TRUE) && requireNamespace("ggnewscale", quietly = TRUE)) { library(ggtreeExtra) library(ggplot2) library(ggnewscale) file_path <- system.file("extdata", "p_tree_test.rds", package = "TransProR") p <- readRDS(file_path) # Create new expression data new_expression_data <- data.frame( Sample = rep(c("Species_A", "Species_B", "Species_C", "Species_D"), each = 3), Gene = rep(c("Gene6", "Gene7", "Gene8"), times = 4), Value = runif(12, min = 0, max = 1) # Randomly generate expression values between 0 and 1 ) # Define new gene colors new_gene_colors <- c( Gene6 = "#0b5f63", Gene7 = "#074d41", Gene8 = "#1f5e27" ) # Define gene label and alpha values gene_label <- "New Genes" alpha_value <- c(0.3, 0.9) # Add new tile layer p_with_new_layer <- add_new_tile_layer( p, new_expression_data, new_gene_colors, gene_label, alpha_value, offset = 0.02, pwidth = 2 ) } else { message("Required packages 'ggtreeExtra', 'ggplot2', and 'ggnewscale' are not installed.") }
This function dynamically adjusts the transparency scale for visualizations, especially useful when the range of data values varies significantly across different sources. It modifies the transparency scale based on the range of values present in the data, ensuring that the visualization accurately reflects variations within the data.
adjust_alpha_scale(data, name, range = c(0.2, 0.8))
adjust_alpha_scale(data, name, range = c(0.2, 0.8))
data |
A data frame containing the values for which the alpha scale is to be adjusted. |
name |
Character string that will be used as the title of the legend in the plot. |
range |
Numeric vector of length 2 specifying the range of alpha values, defaults to c(0.2, 0.8). |
A ggplot2 alpha scale adjustment layer.
# Assuming 'data' is a DataFrame with a 'value' column plot_data <- data.frame(value = c(10, 20, 30, 40, 50)) ggplot2::ggplot(plot_data, ggplot2::aes(x = 1:nrow(plot_data), y = value)) + ggplot2::geom_point(ggplot2::aes(alpha = value)) + adjust_alpha_scale(plot_data, "Transparency Scale")
# Assuming 'data' is a DataFrame with a 'value' column plot_data <- data.frame(value = c(10, 20, 30, 40, 50)) ggplot2::ggplot(plot_data, ggplot2::aes(x = 1:nrow(plot_data), y = value)) + ggplot2::geom_point(ggplot2::aes(alpha = value)) + adjust_alpha_scale(plot_data, "Transparency Scale")
This function adjusts the saturation and luminance of a given color. It works by converting the color from RGB to Luv color space, applying the scaling factors to the saturation and luminance, and then converting it back to RGB.
adjust_color_tone(color, saturation_scale, luminance_scale)
adjust_color_tone(color, saturation_scale, luminance_scale)
color |
A color in hexadecimal format (e.g., "#FF0000") or a valid R color name. |
saturation_scale |
Numeric, the scaling factor for saturation (values < 1 decrease saturation, values > 1 increase saturation). |
luminance_scale |
Numeric, the scaling factor for luminance (values < 1 darken the color, values > 1 lighten the color). |
Returns a color in hexadecimal format adjusted according to the provided scales.
adjusted_color <- adjust_color_tone("#FF0000", saturation_scale = 0.8, luminance_scale = 1.2) print(adjusted_color)
adjusted_color <- adjust_color_tone("#FF0000", saturation_scale = 0.8, luminance_scale = 1.2) print(adjusted_color)
This function processes a dataframe containing fgsea results. It adjusts pathway names by removing underscores, converting to lowercase, then capitalizing the first letter, and joining the components with spaces. It selects and merges the top upregulated and downregulated pathways based on enrichment score (ES) and p-value.
adjust_export_pathway(fgseaRes, nTop = 10)
adjust_export_pathway(fgseaRes, nTop = 10)
fgseaRes |
Dataframe containing fgsea results with columns 'pathway', 'ES', and 'pval'. |
nTop |
Integer, number of top pathways to select based on the p-value. |
A vector containing combined top upregulated and downregulated pathways.
# Create a synthetic fgseaRes dataframe fgseaRes <- data.frame( pathway = c("KEGG_APOPTOSIS", "GO_CELL_CYCLE", "REACTOME_DNA_REPAIR", "KEGG_METABOLISM", "GO_TRANSPORT"), ES = c(0.45, -0.22, 0.56, -0.35, 0.33), pval = c(0.001, 0.02, 0.0003, 0.05, 0.01) ) # Run the function to get top pathways result <- adjust_export_pathway(fgseaRes = fgseaRes, nTop = 2)
# Create a synthetic fgseaRes dataframe fgseaRes <- data.frame( pathway = c("KEGG_APOPTOSIS", "GO_CELL_CYCLE", "REACTOME_DNA_REPAIR", "KEGG_METABOLISM", "GO_TRANSPORT"), ES = c(0.45, -0.22, 0.56, -0.35, 0.33), pval = c(0.001, 0.02, 0.0003, 0.05, 0.01) ) # Run the function to get top pathways result <- adjust_export_pathway(fgseaRes = fgseaRes, nTop = 2)
A dataset containing the differentially expressed genes (DEGs) from four different statistical analysis methods: DESeq2, edgeR, limma, and Wilcoxon test. This dataset is used for generating Venn diagrams to compare the overlap of DEGs identified by different methods.
data(all_degs_venn)
data(all_degs_venn)
A list with the following components:
A vector of gene IDs or gene symbols identified as DEGs by the DESeq2 method.
A vector of gene IDs or gene symbols identified as DEGs by the edgeR method.
A vector of gene IDs or gene symbols identified as DEGs by the limma method.
A vector of gene IDs or gene symbols identified as DEGs by the Wilcoxon test method.
The data was derived from differential expression analyses performed on a gene expression dataset using four commonly used statistical methods (DESeq2, edgeR, limma, and Wilcoxon test).
data(all_degs_venn) # Example of plotting a Venn diagram using the dataset edge_colors <- c("#1b62bb","#13822e","#332c3a","#9e2d39") name_color <- c("#1b64bb","#13828e","#337c3a","#9e9d39") fill_colors <- c("#e3f2fa", "#0288d1") Contrast_degs_venn <- Contrast_Venn(all_degs_venn, edge_colors, name_color, fill_colors)
data(all_degs_venn) # Example of plotting a Venn diagram using the dataset edge_colors <- c("#1b62bb","#13822e","#332c3a","#9e2d39") name_color <- c("#1b64bb","#13828e","#337c3a","#9e9d39") fill_colors <- c("#e3f2fa", "#0288d1") Contrast_degs_venn <- Contrast_Venn(all_degs_venn, edge_colors, name_color, fill_colors)
This function sequentially adds multiple layers to a 'ggtree' plot, including gene expression data, boxplots for statistical summaries, and additional tile layers for pathway enrichment scores from SSGSEA and GSVA analyses. It utilizes separate functions for adding each type of layer and allows for the specification of gene colors as well as adjustments in aesthetics for each layer. The function is designed to work with specific data structures and assumes all functions for adding layers are defined and available.
circos_fruits( p, long_format_HeatdataDeseq, ssgsea_kegg_HeatdataDeseq, gsva_kegg_HeatdataDeseq, gene_colors )
circos_fruits( p, long_format_HeatdataDeseq, ssgsea_kegg_HeatdataDeseq, gsva_kegg_HeatdataDeseq, gene_colors )
p |
A 'ggtree' plot object to which the data and layers will be added. |
long_format_HeatdataDeseq |
A data frame containing gene expression data with columns for 'Samples', 'Genes', and 'Values'. |
ssgsea_kegg_HeatdataDeseq |
A data frame containing SSGSEA analysis results with columns for 'Samples', 'Genes', and 'Values'. |
gsva_kegg_HeatdataDeseq |
A data frame containing GSVA analysis results with columns for 'Samples', 'Genes', and 'Values'. |
gene_colors |
A named vector of colors for genes, used for coloring tiles in different layers. |
A 'ggtree' plot object with multiple layers added for comprehensive visualization.
# Check and load required packages if (requireNamespace("ggtreeExtra", quietly = TRUE) && requireNamespace("ggplot2", quietly = TRUE)) { library(ggtreeExtra) library(ggplot2) # Example data for gene expression, SSGSEA, and GSVA file_path <- system.file("extdata", "p_tree_test.rds", package = "TransProR") p <- readRDS(file_path) # Create gene expression data frame (long_format_HeatdataDeseq) long_format_HeatdataDeseq <- data.frame( Sample = rep(c("Species_A", "Species_B", "Species_C", "Species_D"), each = 5), Genes = rep(paste0("Gene", 1:5), times = 4), Value = runif(20, min = 0, max = 1) # Randomly generate expression values between 0 and 1 ) # Create SSGSEA analysis results data frame (ssgsea_kegg_HeatdataDeseq) ssgsea_kegg_HeatdataDeseq <- data.frame( Sample = rep(c("Species_A", "Species_B", "Species_C", "Species_D"), each = 3), Genes = rep(c("Pathway1", "Pathway2", "Pathway3"), times = 4), Value = runif(12, min = 0, max = 1) # Randomly generate enrichment scores between 0 and 1 ) # Create GSVA analysis results data frame (gsva_kegg_HeatdataDeseq) gsva_kegg_HeatdataDeseq <- data.frame( Sample = rep(c("Species_A", "Species_B", "Species_C", "Species_D"), each = 4), Genes = rep(c("PathwayA", "PathwayB", "PathwayC", "PathwayD"), times = 4), Value = runif(16, min = 0, max = 1) # Randomly generate enrichment scores between 0 and 1 ) # Define gene and pathway colors (named vector), including all genes and pathways gene_colors <- c( # Genes for gene expression Gene1 = "#491588", Gene2 = "#301b8d", Gene3 = "#1a237a", Gene4 = "#11479c", Gene5 = "#0a5797", # Pathways for SSGSEA Pathway1 = "#0b5f63", Pathway2 = "#074d41", Pathway3 = "#1f5e27", # Pathways for GSVA PathwayA = "#366928", PathwayB = "#827729", PathwayC = "#a1d99b", PathwayD = "#c7e9c0" ) # Call circos_fruits function to add multiple layers final_plot <- circos_fruits( p, long_format_HeatdataDeseq, ssgsea_kegg_HeatdataDeseq, gsva_kegg_HeatdataDeseq, gene_colors ) } else { message("Required packages 'ggtreeExtra' and 'ggplot2' are not installed.") }
# Check and load required packages if (requireNamespace("ggtreeExtra", quietly = TRUE) && requireNamespace("ggplot2", quietly = TRUE)) { library(ggtreeExtra) library(ggplot2) # Example data for gene expression, SSGSEA, and GSVA file_path <- system.file("extdata", "p_tree_test.rds", package = "TransProR") p <- readRDS(file_path) # Create gene expression data frame (long_format_HeatdataDeseq) long_format_HeatdataDeseq <- data.frame( Sample = rep(c("Species_A", "Species_B", "Species_C", "Species_D"), each = 5), Genes = rep(paste0("Gene", 1:5), times = 4), Value = runif(20, min = 0, max = 1) # Randomly generate expression values between 0 and 1 ) # Create SSGSEA analysis results data frame (ssgsea_kegg_HeatdataDeseq) ssgsea_kegg_HeatdataDeseq <- data.frame( Sample = rep(c("Species_A", "Species_B", "Species_C", "Species_D"), each = 3), Genes = rep(c("Pathway1", "Pathway2", "Pathway3"), times = 4), Value = runif(12, min = 0, max = 1) # Randomly generate enrichment scores between 0 and 1 ) # Create GSVA analysis results data frame (gsva_kegg_HeatdataDeseq) gsva_kegg_HeatdataDeseq <- data.frame( Sample = rep(c("Species_A", "Species_B", "Species_C", "Species_D"), each = 4), Genes = rep(c("PathwayA", "PathwayB", "PathwayC", "PathwayD"), times = 4), Value = runif(16, min = 0, max = 1) # Randomly generate enrichment scores between 0 and 1 ) # Define gene and pathway colors (named vector), including all genes and pathways gene_colors <- c( # Genes for gene expression Gene1 = "#491588", Gene2 = "#301b8d", Gene3 = "#1a237a", Gene4 = "#11479c", Gene5 = "#0a5797", # Pathways for SSGSEA Pathway1 = "#0b5f63", Pathway2 = "#074d41", Pathway3 = "#1f5e27", # Pathways for GSVA PathwayA = "#366928", PathwayB = "#827729", PathwayC = "#a1d99b", PathwayD = "#c7e9c0" ) # Call circos_fruits function to add multiple layers final_plot <- circos_fruits( p, long_format_HeatdataDeseq, ssgsea_kegg_HeatdataDeseq, gsva_kegg_HeatdataDeseq, gene_colors ) } else { message("Required packages 'ggtreeExtra' and 'ggplot2' are not installed.") }
The function first extracts histological types from the provided TCGA's normal tissue data set. After displaying these types, the user is prompted to input specific types to retain. The data is then filtered based on this input. The GTEX and TCGA's normal tissue datasets are then combined and batch corrected.
Note: This function assumes that TCGA's normal samples and GTEX samples represent different batches.
Combat_Normal( TCGA_normal_data_path, gtex_data_path, CombatNormal_output_path, auto_mode = FALSE, default_input = "11,12" )
Combat_Normal( TCGA_normal_data_path, gtex_data_path, CombatNormal_output_path, auto_mode = FALSE, default_input = "11,12" )
TCGA_normal_data_path |
The path to the tumor data stored in an RDS file. |
gtex_data_path |
The path to the GTEX data stored in an RDS file. |
CombatNormal_output_path |
A character string specifying the path where the output RDS file will be saved. |
auto_mode |
Logical. If set to TRUE, the function will not prompt the user for input and will instead use the values provided in default_input. Default is FALSE. |
default_input |
Character string. When auto_mode is TRUE, this parameter specifies the default TGCA's normal tissue types to be retained. It should be provided as a comma-separated string (e.g., "11,12"). |
This function takes a TCGA's normal tissue data set and a pre-saved GTEX data set, asks the user for specific TCGA's normal tissue types to retain, then merges the two datasets. The merged dataset is then corrected for batch effects using the ComBat_seq function from the 'sva' package.
The ComBat_seq function from the 'sva' package is used to correct batch effects. The function requires the 'sva' package to be installed and loaded externally.
The example code uses 'tempfile()' to generate temporary paths dynamically during execution. These paths are valid during the 'R CMD check' process, even if no actual files exist, because ‘tempfile()' generates a unique file path that does not depend on the user’s file system. Using 'tempfile()' ensures that the example code does not rely on specific external files and avoids errors during 'R CMD check'. CRAN review checks for documentation correctness and syntax parsing, not the existence of actual files, as long as the example code is syntactically valid.
A data.frame with corrected values after the ComBat_seq adjustment. Note that this function also saves the combat_count_df data as an RDS file at the specified output path.
TCGA_normal_file <- system.file("extdata", "SKCM_Skin_TCGA_exp_normal_test.rds", package = "TransProR") gtex_file <- system.file("extdata", "Skin_SKCM_Gtex_test.rds", package = "TransProR") output_file <- file.path(tempdir(), "SKCM_Skin_Combat_Normal_TCGA_GTEX_count.rds") SKCM_Skin_Combat_Normal_TCGA_GTEX_count <- Combat_Normal( TCGA_normal_data_path = TCGA_normal_file, gtex_data_path = gtex_file, CombatNormal_output_path = output_file, auto_mode = TRUE, default_input = "skip" ) head(SKCM_Skin_Combat_Normal_TCGA_GTEX_count)[1:5, 1:5]
TCGA_normal_file <- system.file("extdata", "SKCM_Skin_TCGA_exp_normal_test.rds", package = "TransProR") gtex_file <- system.file("extdata", "Skin_SKCM_Gtex_test.rds", package = "TransProR") output_file <- file.path(tempdir(), "SKCM_Skin_Combat_Normal_TCGA_GTEX_count.rds") SKCM_Skin_Combat_Normal_TCGA_GTEX_count <- Combat_Normal( TCGA_normal_data_path = TCGA_normal_file, gtex_data_path = gtex_file, CombatNormal_output_path = output_file, auto_mode = TRUE, default_input = "skip" ) head(SKCM_Skin_Combat_Normal_TCGA_GTEX_count)[1:5, 1:5]
The function first extracts histological types from the provided tumor data set. After displaying these types, the user is prompted to input specific types to retain. The data is then filtered based on this input.
Note: This example assumes that different tumor types represent different batches in a general sense. Users need to adjust the batch and group vectors based on real-life scenarios.
combat_tumor( tumor_data_path, CombatTumor_output_path, auto_mode = FALSE, default_input = "01,06" )
combat_tumor( tumor_data_path, CombatTumor_output_path, auto_mode = FALSE, default_input = "01,06" )
tumor_data_path |
The path to the tumor data stored in an RDS file. |
CombatTumor_output_path |
A character string specifying the path where the output RDS file will be saved. |
auto_mode |
Logical. If set to TRUE, the function will not prompt the user for input and will instead use the values provided in default_input. Default is FALSE. |
default_input |
Character string. When auto_mode is TRUE, this parameter specifies the default tumor types to be retained. It should be provided as a comma-separated string (e.g., "01,06"). |
This function takes a tumor data set, asks the user for specific tumor types to retain, and then corrects for batch effects using the ComBat_seq function from the 'sva' package.
The ComBat_seq function from the sva package is used to correct batch effects. The function requires the 'sva' package to be installed and loaded externally.
A data.frame with corrected values after the ComBat_seq adjustment. Note that this function also saves the combat_count_df data as an RDS file at the specified output path.
tumor_file <- system.file("extdata", "SKCM_Skin_TCGA_exp_tumor_test.rds", package = "TransProR") output_file <- file.path(tempdir(), "SKCM_combat_count.rds") SKCM_combat_count <- combat_tumor( tumor_data_path = tumor_file, CombatTumor_output_path = output_file, auto_mode = TRUE, default_input = "01,06" ) head(SKCM_combat_count)[1:5, 1:5]
tumor_file <- system.file("extdata", "SKCM_Skin_TCGA_exp_tumor_test.rds", package = "TransProR") output_file <- file.path(tempdir(), "SKCM_combat_count.rds") SKCM_combat_count <- combat_tumor( tumor_data_path = tumor_file, CombatTumor_output_path = output_file, auto_mode = TRUE, default_input = "01,06" ) head(SKCM_combat_count)[1:5, 1:5]
This function takes two DEG data frames, inner joins them by a specified gene column, checks if a specified column is identical across both data frames, and merges them if they are. The resulting data frame will have a merged column named after the compared column.
compare_merge(df1, df2, by_gene, compare_col, suffixes, df_name)
compare_merge(df1, df2, by_gene, compare_col, suffixes, df_name)
df1 |
First data frame. |
df2 |
Second data frame. |
by_gene |
Column name by which to join the data frames, typically "Gene". |
compare_col |
Column to compare for identity, which will also be the name of the merged column. |
suffixes |
Suffixes to use for non-identical column names in the joined data frame. |
df_name |
Name to assign to the resulting data frame for identification. |
A data frame with processed columns.
# Create simulated DESeq2 data DEG_deseq2 <- data.frame( Gene = c("Gene1", "Gene2", "Gene3", "Gene4", "Gene5"), change = c("up", "down", "no_change", "up", "down"), log2FoldChange = c(2.5, -3.2, 0.1, 1.8, -2.5), pvalue = c(0.01, 0.05, 0.9, 0.02, 0.03) ) # Display the first 5 rows of the DESeq2 data head(DEG_deseq2, 5) # Create simulated edgeR data DEG_edgeR <- data.frame( Gene = c("Gene1", "Gene2", "Gene3", "Gene4", "Gene5"), change = c("up", "down", "no_change", "no_change", "up"), log2FoldChange = c(2.3, -3.1, 0.2, 0.1, 2.7), pvalue = c(0.02, 0.04, 0.8, 0.6, 0.01) ) # Display the first 5 rows of the edgeR data head(DEG_edgeR, 5) # Merge the DESeq2 and edgeR data deseq2_edgeR <- compare_merge( df1 = DEG_deseq2, df2 = DEG_edgeR, by_gene = "Gene", compare_col = "change", suffixes = c("_1", "_2"), df_name = "deseq2_edgeR" )
# Create simulated DESeq2 data DEG_deseq2 <- data.frame( Gene = c("Gene1", "Gene2", "Gene3", "Gene4", "Gene5"), change = c("up", "down", "no_change", "up", "down"), log2FoldChange = c(2.5, -3.2, 0.1, 1.8, -2.5), pvalue = c(0.01, 0.05, 0.9, 0.02, 0.03) ) # Display the first 5 rows of the DESeq2 data head(DEG_deseq2, 5) # Create simulated edgeR data DEG_edgeR <- data.frame( Gene = c("Gene1", "Gene2", "Gene3", "Gene4", "Gene5"), change = c("up", "down", "no_change", "no_change", "up"), log2FoldChange = c(2.3, -3.1, 0.2, 0.1, 2.7), pvalue = c(0.02, 0.04, 0.8, 0.6, 0.01) ) # Display the first 5 rows of the edgeR data head(DEG_edgeR, 5) # Merge the DESeq2 and edgeR data deseq2_edgeR <- compare_merge( df1 = DEG_deseq2, df2 = DEG_edgeR, by_gene = "Gene", compare_col = "change", suffixes = c("_1", "_2"), df_name = "deseq2_edgeR" )
This function creates a Venn Diagram using the 'ggVennDiagram' package. It allows customization of various aesthetic elements of the diagram, including colors.
Contrast_Venn( all_degs_venn, edge_colors, name_color, fill_colors, label_size = 4, edge_size = 3 )
Contrast_Venn( all_degs_venn, edge_colors, name_color, fill_colors, label_size = 4, edge_size = 3 )
all_degs_venn |
A list of DEG sets for Venn Diagram creation. |
edge_colors |
A vector of colors for the edges of the Venn Diagram sets. |
name_color |
A vector of colors for the names of the sets in the Venn Diagram. |
fill_colors |
A vector of two colors for the gradient fill of the Venn Diagram. |
label_size |
The size of the labels showing the number of elements in each set (default is 4). |
edge_size |
The size of the edges of the Venn Diagram sets (default is 3). |
A 'ggplot' object representing the Venn Diagram.
data("all_degs_venn", package = "TransProR") edge_colors <- c("#1b62bb","#13822e","#332c3a","#9e2d39") name_color <- c("#1b64bb","#13828e","#337c3a","#9e9d39") fill_colors <- c("#e3f2fa", "#0288d1") Contrast_degs_venn <- Contrast_Venn(all_degs_venn, edge_colors, name_color, fill_colors)
data("all_degs_venn", package = "TransProR") edge_colors <- c("#1b62bb","#13822e","#332c3a","#9e2d39") name_color <- c("#1b64bb","#13828e","#337c3a","#9e9d39") fill_colors <- c("#e3f2fa", "#0288d1") Contrast_degs_venn <- Contrast_Venn(all_degs_venn, edge_colors, name_color, fill_colors)
This function creates a base plot using 'ggtree' and 'ggtreeExtra' libraries, adding gene expression data as colored tiles to the plot. It allows for dynamic coloring of the genes and includes adjustments for alpha transparency based on the expression value.
create_base_plot(p, data, gene_colors, gene_label = "Gene")
create_base_plot(p, data, gene_colors, gene_label = "Gene")
p |
A ggtree plot object to which the data will be added. |
data |
A data frame containing gene expression data with columns for Samples, Genes, and Values. |
gene_colors |
A named vector of colors for genes. |
gene_label |
A character string used as a label in the legend for the genes. Default is "Gene". |
A 'ggtree' plot object with the gene expression data added.
# Check and load required packages if (requireNamespace("ggtreeExtra", quietly = TRUE) && requireNamespace("ggplot2", quietly = TRUE)) { library(ggtreeExtra) library(ggplot2) file_path <- system.file("extdata", "p_tree_test.rds", package = "TransProR") p <- readRDS(file_path) # Create gene expression data frame expression_data <- data.frame( Sample = rep(c("Species_A", "Species_B", "Species_C", "Species_D"), each = 5), Gene = rep(paste0("Gene", 1:5), times = 4), Value = runif(20, min = 0, max = 1) # Randomly generate expression values between 0 and 1 ) # Define gene colors (named vector) gene_colors <- c( Gene1 = "#491588", Gene2 = "#301b8d", Gene3 = "#1a237a", Gene4 = "#11479c", Gene5 = "#0a5797" ) # Call create_base_plot function to add gene expression data p <- create_base_plot(p, expression_data, gene_colors) } else { message("Required packages 'ggtreeExtra' and 'ggplot2' are not installed.") }
# Check and load required packages if (requireNamespace("ggtreeExtra", quietly = TRUE) && requireNamespace("ggplot2", quietly = TRUE)) { library(ggtreeExtra) library(ggplot2) file_path <- system.file("extdata", "p_tree_test.rds", package = "TransProR") p <- readRDS(file_path) # Create gene expression data frame expression_data <- data.frame( Sample = rep(c("Species_A", "Species_B", "Species_C", "Species_D"), each = 5), Gene = rep(paste0("Gene", 1:5), times = 4), Value = runif(20, min = 0, max = 1) # Randomly generate expression values between 0 and 1 ) # Define gene colors (named vector) gene_colors <- c( Gene1 = "#491588", Gene2 = "#301b8d", Gene3 = "#1a237a", Gene4 = "#11479c", Gene5 = "#0a5797" ) # Call create_base_plot function to add gene expression data p <- create_base_plot(p, expression_data, gene_colors) } else { message("Required packages 'ggtreeExtra' and 'ggplot2' are not installed.") }
This function filters out genes based on their expression change status. It returns the names of genes which are not "stable".
deg_filter(df)
deg_filter(df)
df |
A data frame containing gene expression data. |
A vector of gene names that are differentially expressed.
DEG_deseq2_file <- system.file("extdata", "DEG_deseq2.rds", package = "TransProR") DEG_deseq2 <- readRDS(DEG_deseq2_file) DEG_deseq2_test <- deg_filter(DEG_deseq2)
DEG_deseq2_file <- system.file("extdata", "DEG_deseq2.rds", package = "TransProR") DEG_deseq2 <- readRDS(DEG_deseq2_file) DEG_deseq2_test <- deg_filter(DEG_deseq2)
'DESeq2': Differential gene expression analysis based on the negative binomial distribution. This function utilizes the 'DESeq2' package to conduct differential gene expression analysis. It processes tumor and normal expression data, applies DESeq2 analysis, and outputs the results along with information on gene expression changes.
DESeq2_analyze( tumor_file, normal_file, output_file, logFC = 2.5, p_value = 0.01 )
DESeq2_analyze( tumor_file, normal_file, output_file, logFC = 2.5, p_value = 0.01 )
tumor_file |
Path to the tumor data file (RDS format). |
normal_file |
Path to the normal data file (RDS format). |
output_file |
Path to save the output DEG data (RDS format). |
logFC |
Threshold for log fold change. |
p_value |
Threshold for p-value. |
The DESeq2 methodology is based on modeling count data using a negative binomial distribution, which allows for handling the variability observed in gene expression data, especially in small sample sizes. This approach is well-suited for RNA-Seq data analysis.
A data frame of differential expression results.
DESeq2:Differential gene expression analysis based on the negative binomial distribution. For more information, visit the page: https://docs.gdc.cancer.gov/Data/Bioinformatics_Pipelines/Expression_mRNA_Pipeline/
# Define file paths for tumor and normal data from the data folder tumor_file <- system.file("extdata", "removebatch_SKCM_Skin_TCGA_exp_tumor_test.rds", package = "TransProR") normal_file <- system.file("extdata", "removebatch_SKCM_Skin_Normal_TCGA_GTEX_count_test.rds", package = "TransProR") output_file <- file.path(tempdir(), "DEG_DESeq2.rds") DEG_DESeq2 <- DESeq2_analyze( tumor_file = tumor_file, normal_file = normal_file, output_file = output_file, 2.5, 0.01 ) # View the top 5 rows of the result head(DEG_DESeq2, 5)
# Define file paths for tumor and normal data from the data folder tumor_file <- system.file("extdata", "removebatch_SKCM_Skin_TCGA_exp_tumor_test.rds", package = "TransProR") normal_file <- system.file("extdata", "removebatch_SKCM_Skin_Normal_TCGA_GTEX_count_test.rds", package = "TransProR") output_file <- file.path(tempdir(), "DEG_DESeq2.rds") DEG_DESeq2 <- DESeq2_analyze( tumor_file = tumor_file, normal_file = normal_file, output_file = output_file, 2.5, 0.01 ) # View the top 5 rows of the result head(DEG_DESeq2, 5)
This function creates two sets of legends, one on the left and one on the right side of a plot. It displays color-coded legends with labels corresponding to different data categories. Each legend entry consists of a colored rectangle and a text label. The left side legend has text aligned to the right of the color block, while the right side legend has text aligned to the left of the color block.
drawLegends( labels, colors, legend_width, x_positions, y_position, just_positions, text_alignments, font_size )
drawLegends( labels, colors, legend_width, x_positions, y_position, just_positions, text_alignments, font_size )
labels |
Vector of labels for the legends. |
colors |
Vector of colors corresponding to the labels. |
legend_width |
The width of each legend viewport expressed in grid units. |
x_positions |
Numeric vector of length 2 specifying the x-positions of the left and right legends. |
y_position |
The y-position common for both legends, expressed as a fraction of the plot height. |
just_positions |
List of two vectors, each specifying the horizontal and vertical justification for the legends. |
text_alignments |
List of two character strings specifying text alignments for the legends ('left' or 'right'). |
font_size |
Numeric value specifying the font size for the legend labels. |
Invisible. This function is called for its side effects of drawing legends on a plot.
labels <- c("Label1", "Label2", "Label3", "Label4", "Label5", "Label6") colors <- c("#ff0000", "#00ff00", "#0000ff", "#ffff00", "#ff00ff", "#00ffff") # Convert to 'unit' objects for grid grid::grid.roundrect( x = grid::unit(0.5, "npc"), # "npc" stands for normalized parent coordinates y = grid::unit(0.5, "npc"), width = grid::unit(0.1, "npc"), height = grid::unit(0.05, "npc"), gp = grid::gpar(fill = "red"), r = grid::unit(0.1, "npc") # rounding radius ) # Example of drawing legends with specific labels and colors drawLegends(labels, colors, grid::unit(2, "cm"), c(0.225, 0.75), 0.5, list(c("left", "center"), c("right", "center")), list("right", "left"), 10)
labels <- c("Label1", "Label2", "Label3", "Label4", "Label5", "Label6") colors <- c("#ff0000", "#00ff00", "#0000ff", "#ffff00", "#ff00ff", "#00ffff") # Convert to 'unit' objects for grid grid::grid.roundrect( x = grid::unit(0.5, "npc"), # "npc" stands for normalized parent coordinates y = grid::unit(0.5, "npc"), width = grid::unit(0.1, "npc"), height = grid::unit(0.05, "npc"), gp = grid::gpar(fill = "red"), r = grid::unit(0.1, "npc") # rounding radius ) # Example of drawing legends with specific labels and colors drawLegends(labels, colors, grid::unit(2, "cm"), c(0.225, 0.75), 0.5, list(c("left", "center"), c("right", "center")), list("right", "left"), 10)
This function performs differential gene expression analysis using the 'edgeR' package. It reads tumor and normal expression data, merges them, filters low-expressed genes, normalizes the data, performs edgeR analysis, and outputs the results along with information on gene expression changes.
edgeR_analyze( tumor_file, normal_file, output_file, logFC_threshold = 2.5, p_value_threshold = 0.01 )
edgeR_analyze( tumor_file, normal_file, output_file, logFC_threshold = 2.5, p_value_threshold = 0.01 )
tumor_file |
Path to the tumor data file (RDS format). |
normal_file |
Path to the normal data file (RDS format). |
output_file |
Path to save the output DEG data (RDS format). |
logFC_threshold |
Threshold for log fold change for marking up/down-regulated genes. |
p_value_threshold |
Threshold for p-value for filtering significant genes. |
A data frame of differential expression results.
edgeR: Differential analysis of sequence read count data. For more information, visit the edgeR Bioconductor page: https://www.bioconductor.org/packages/release/bioc/vignettes/edgeR/inst/doc/edgeRUsersGuide.pdf
# Define file paths for tumor and normal data from the data folder tumor_file <- system.file("extdata", "removebatch_SKCM_Skin_TCGA_exp_tumor_test.rds", package = "TransProR") normal_file <- system.file("extdata", "removebatch_SKCM_Skin_Normal_TCGA_GTEX_count_test.rds", package = "TransProR") output_file <- file.path(tempdir(), "DEG_edgeR.rds") DEG_edgeR <- edgeR_analyze( tumor_file = tumor_file, normal_file = normal_file, output_file = output_file, 2.5, 0.01 ) # View the top 5 rows of the result head(DEG_edgeR, 5)
# Define file paths for tumor and normal data from the data folder tumor_file <- system.file("extdata", "removebatch_SKCM_Skin_TCGA_exp_tumor_test.rds", package = "TransProR") normal_file <- system.file("extdata", "removebatch_SKCM_Skin_Normal_TCGA_GTEX_count_test.rds", package = "TransProR") output_file <- file.path(tempdir(), "DEG_edgeR.rds") DEG_edgeR <- edgeR_analyze( tumor_file = tumor_file, normal_file = normal_file, output_file = output_file, 2.5, 0.01 ) # View the top 5 rows of the result head(DEG_edgeR, 5)
This function combines multiple data frames, arranges them, and visualizes the combined data in a Circular Bar Chart using the 'ggplot2' and 'ggalluvial' packages.
enrich_circo_bar(data_list)
enrich_circo_bar(data_list)
data_list |
A list of data frames to be combined. |
A 'ggplot' object representing the Circular Bar Chart.
# Create sample data frames for each enrichment category # 1. Biological Process (BP) filtered_data_BP <- data.frame( Description = c( "immune response", "cell proliferation", "signal transduction", "apoptotic process", "metabolic process" ), Count = c(120, 85, 150, 60, 95), color = c( "#1f77b4", # blue "#ff7f0e", # orange "#2ca02c", # green "#d62728", # red "#9467bd" # purple ), stringsAsFactors = FALSE ) # 2. Cellular Component (CC) filtered_data_CC <- data.frame( Description = c( "nucleus", "cytoplasm", "membrane", "mitochondrion", "extracellular space" ), Count = c(90, 110, 75, 65, 80), color = c( "#1f77b4", "#ff7f0e", "#2ca02c", "#d62728", "#9467bd" ), stringsAsFactors = FALSE ) # 3. Molecular Function (MF) filtered_data_MF <- data.frame( Description = c( "protein binding", "DNA binding", "enzyme activity", "transporter activity", "receptor activity" ), Count = c(140, 130, 100, 70, 90), color = c( "#1f77b4", "#ff7f0e", "#2ca02c", "#d62728", "#9467bd" ), stringsAsFactors = FALSE ) # 4. Disease Ontology (DO) filtered_data_DO <- data.frame( Description = c( "cancer", "cardiovascular disease", "neurological disorder", "metabolic disease", "infectious disease" ), Count = c(200, 150, 120, 90, 160), color = c( "#1f77b4", "#ff7f0e", "#2ca02c", "#d62728", "#9467bd" ), stringsAsFactors = FALSE ) # 5. Reactome Pathways filtered_data_Reactome <- data.frame( Description = c( "Cell Cycle", "Apoptosis", "DNA Repair", "Signal Transduction", "Metabolism of Proteins" ), Count = c(110, 95, 80, 130, 85), color = c( "#1f77b4", "#ff7f0e", "#2ca02c", "#d62728", "#9467bd" ), stringsAsFactors = FALSE ) # 6. KEGG Pathways filtered_data_kegg <- data.frame( Description = c( "PI3K-Akt signaling pathway", "MAPK signaling pathway", "NF-kappa B signaling pathway", "JAK-STAT signaling pathway", "Toll-like receptor signaling pathway" ), Count = c(175, 160, 145, 130, 155), color = c( "#1f77b4", "#ff7f0e", "#2ca02c", "#d62728", "#9467bd" ), stringsAsFactors = FALSE ) # Combine all filtered data frames into a list data_list <- list( BP = filtered_data_BP, CC = filtered_data_CC, MF = filtered_data_MF, DO = filtered_data_DO, Reactome = filtered_data_Reactome, KEGG = filtered_data_kegg ) # Create the Circular Bar Chart combined_and_visualized_data <- enrich_circo_bar(data_list)
# Create sample data frames for each enrichment category # 1. Biological Process (BP) filtered_data_BP <- data.frame( Description = c( "immune response", "cell proliferation", "signal transduction", "apoptotic process", "metabolic process" ), Count = c(120, 85, 150, 60, 95), color = c( "#1f77b4", # blue "#ff7f0e", # orange "#2ca02c", # green "#d62728", # red "#9467bd" # purple ), stringsAsFactors = FALSE ) # 2. Cellular Component (CC) filtered_data_CC <- data.frame( Description = c( "nucleus", "cytoplasm", "membrane", "mitochondrion", "extracellular space" ), Count = c(90, 110, 75, 65, 80), color = c( "#1f77b4", "#ff7f0e", "#2ca02c", "#d62728", "#9467bd" ), stringsAsFactors = FALSE ) # 3. Molecular Function (MF) filtered_data_MF <- data.frame( Description = c( "protein binding", "DNA binding", "enzyme activity", "transporter activity", "receptor activity" ), Count = c(140, 130, 100, 70, 90), color = c( "#1f77b4", "#ff7f0e", "#2ca02c", "#d62728", "#9467bd" ), stringsAsFactors = FALSE ) # 4. Disease Ontology (DO) filtered_data_DO <- data.frame( Description = c( "cancer", "cardiovascular disease", "neurological disorder", "metabolic disease", "infectious disease" ), Count = c(200, 150, 120, 90, 160), color = c( "#1f77b4", "#ff7f0e", "#2ca02c", "#d62728", "#9467bd" ), stringsAsFactors = FALSE ) # 5. Reactome Pathways filtered_data_Reactome <- data.frame( Description = c( "Cell Cycle", "Apoptosis", "DNA Repair", "Signal Transduction", "Metabolism of Proteins" ), Count = c(110, 95, 80, 130, 85), color = c( "#1f77b4", "#ff7f0e", "#2ca02c", "#d62728", "#9467bd" ), stringsAsFactors = FALSE ) # 6. KEGG Pathways filtered_data_kegg <- data.frame( Description = c( "PI3K-Akt signaling pathway", "MAPK signaling pathway", "NF-kappa B signaling pathway", "JAK-STAT signaling pathway", "Toll-like receptor signaling pathway" ), Count = c(175, 160, 145, 130, 155), color = c( "#1f77b4", "#ff7f0e", "#2ca02c", "#d62728", "#9467bd" ), stringsAsFactors = FALSE ) # Combine all filtered data frames into a list data_list <- list( BP = filtered_data_BP, CC = filtered_data_CC, MF = filtered_data_MF, DO = filtered_data_DO, Reactome = filtered_data_Reactome, KEGG = filtered_data_kegg ) # Create the Circular Bar Chart combined_and_visualized_data <- enrich_circo_bar(data_list)
This function creates a polar bubble plot using 'ggplot2'. It is designed to visually represent data with methods and positional metrics integrated, highlighting specific IDs if necessary.
enrich_polar_bubble(final_combined_df_with_id_and_position, pal, highlight_ids)
enrich_polar_bubble(final_combined_df_with_id_and_position, pal, highlight_ids)
final_combined_df_with_id_and_position |
A data frame containing 'id', 'Count', 'method', 'Description', 'point_position', 'test_color'. |
pal |
A named vector of colors corresponding to the 'method' values. |
highlight_ids |
A vector of IDs to highlight. |
A 'ggplot' object representing the enriched polar bubble plot.
final_df <- data.frame(id = 1:10, Count = c(10, 20, 30, 40, 50, 60, 70, 80, 90, 100), method = rep("Method1", 10), Description = LETTERS[1:10], point_position = seq(10, 100, 10), test_color = sample(c("red", "blue"), 10, replace = TRUE)) pal <- c("Method1" = "blue") highlight_ids <- c(1, 5, 9) enrich_polar_bubble(final_df, pal, highlight_ids)
final_df <- data.frame(id = 1:10, Count = c(10, 20, 30, 40, 50, 60, 70, 80, 90, 100), method = rep("Method1", 10), Description = LETTERS[1:10], point_position = seq(10, 100, 10), test_color = sample(c("red", "blue"), 10, replace = TRUE)) pal <- c("Method1" = "blue") highlight_ids <- c(1, 5, 9) enrich_polar_bubble(final_df, pal, highlight_ids)
This function creates a chord diagram from a specified dataframe and draws two sets of legends for it. It adjusts the track height of the chord diagram to optimize space and uses specified colors for the grid. Legends are drawn at specified positions with configurable text alignments and font sizes.
enrichment_circlize( all_combined_df, original_colors, labels, colors, labels2, colors2, font_size = 10 )
enrichment_circlize( all_combined_df, original_colors, labels, colors, labels2, colors2, font_size = 10 )
all_combined_df |
A dataframe containing the matrix for the chord diagram. |
original_colors |
A vector of colors for the grid columns of the chord diagram. |
labels |
A vector of labels for the first legend. |
colors |
A vector of colors corresponding to the first legend's labels. |
labels2 |
A vector of labels for the second legend. |
colors2 |
A vector of colors corresponding to the second legend's labels. |
font_size |
The font size used for legend texts, defaults to 10. |
Invisible, primarily used for its side effects of drawing on a graphics device.
# Sample Chord Diagram Matrix all_combined_df <- data.frame( A = c(10, 20, 30), B = c(15, 25, 35), C = c(5, 10, 15) ) rownames(all_combined_df) <- c("A", "B", "C") # Colors for the grid of the chord diagram (corresponding to columns of the matrix) original_colors <- c("red", "green", "blue") # Name the colors according to the sectors (A, B, C) names(original_colors) <- colnames(all_combined_df) # Labels and Colors for the First Legend labels <- c("Label 1", "Label 2", "Label 3") colors <- c("yellow", "purple", "cyan") # Labels and Colors for the Second Legend labels2 <- c("Label A", "Label B", "Label C") colors2 <- c("orange", "pink", "brown") # Font size for the legend texts (optional, default is 10) font_size <- 10 # Call the enrichment_circlize function with the sample data # This is just an example; the plot will be rendered in an appropriate graphics context # such as RStudio's plot pane or an external plotting window. plot1 <- enrichment_circlize(all_combined_df, original_colors, labels, colors, labels2, colors2, font_size )
# Sample Chord Diagram Matrix all_combined_df <- data.frame( A = c(10, 20, 30), B = c(15, 25, 35), C = c(5, 10, 15) ) rownames(all_combined_df) <- c("A", "B", "C") # Colors for the grid of the chord diagram (corresponding to columns of the matrix) original_colors <- c("red", "green", "blue") # Name the colors according to the sectors (A, B, C) names(original_colors) <- colnames(all_combined_df) # Labels and Colors for the First Legend labels <- c("Label 1", "Label 2", "Label 3") colors <- c("yellow", "purple", "cyan") # Labels and Colors for the Second Legend labels2 <- c("Label A", "Label B", "Label C") colors2 <- c("orange", "pink", "brown") # Font size for the legend texts (optional, default is 10) font_size <- 10 # Call the enrichment_circlize function with the sample data # This is just an example; the plot will be rendered in an appropriate graphics context # such as RStudio's plot pane or an external plotting window. plot1 <- enrichment_circlize(all_combined_df, original_colors, labels, colors, labels2, colors2, font_size )
This function initializes a spiral plot, adds tracks for pathways and samples, and generates legends based on the sample and pathway information in the provided data frame. It uses 'spiralize' for the spiral plot and 'ComplexHeatmap' for handling legends.
enrichment_spiral_plots(results)
enrichment_spiral_plots(results)
results |
A data frame containing 'Pathway', 'Sample', 'Value', 'PathwayColor', and 'SampleColor' columns. |
No return value, called for side effects. This function generates spiral plots and adds legends based on sample and pathway information.
# Example: Creating enrichment spiral plots with legends # Define the results data frame results <- data.frame( Pathway = c("Pathway1", "Pathway1", "Pathway2", "Pathway2", "Pathway3"), Sample = c("Sample1", "Sample1", "Sample2", "Sample2", "Sample3"), Value = c(20, 30, 15, 35, 25), PathwayColor = c("red", "red", "blue", "blue", "orange"), SampleColor = c("green", "green", "purple", "purple", "cyan"), stringsAsFactors = FALSE ) # Create the enrichment spiral plots with legends enrichment_spiral_plots(results)
# Example: Creating enrichment spiral plots with legends # Define the results data frame results <- data.frame( Pathway = c("Pathway1", "Pathway1", "Pathway2", "Pathway2", "Pathway3"), Sample = c("Sample1", "Sample1", "Sample2", "Sample2", "Sample3"), Value = c(20, 30, 15, 35, 25), PathwayColor = c("red", "red", "blue", "blue", "orange"), SampleColor = c("green", "green", "purple", "purple", "cyan"), stringsAsFactors = FALSE ) # Create the enrichment spiral plots with legends enrichment_spiral_plots(results)
This function filters a data frame for specified descriptions, selects the 'Description' and 'Count' columns, and adds a new column with a specified color.
extract_descriptions_counts(df, descriptions, color)
extract_descriptions_counts(df, descriptions, color)
df |
A data frame containing at least 'Description' and 'Count' columns. |
descriptions |
A vector of descriptions to filter in the data frame. |
color |
A character string specifying the color to be added as a new column. |
A data frame filtered by descriptions, containing 'Description', 'Count', and a new 'color' column.
# Generate Sample Input Data for extract_descriptions_counts Function # Create a sample data frame with 'Description' and 'Count' columns data <- data.frame( Description = c( "immunoglobulin production", "B cell mediated immunity", "T cell activation", "antigen processing and presentation", "cytokine signaling", "natural killer cell activity", "phagocytosis", "complement activation", "antibody-dependent cellular cytotoxicity", "regulatory T cell function" ), Count = c( 150, # immunoglobulin production 200, # B cell mediated immunity 175, # T cell activation 125, # antigen processing and presentation 190, # cytokine signaling 160, # natural killer cell activity 140, # phagocytosis 180, # complement activation 130, # antibody-dependent cellular cytotoxicity 170 # regulatory T cell function ), stringsAsFactors = FALSE # Ensure that strings are not converted to factors ) descriptions_to_filter <- c("immunoglobulin production", "B cell mediated immunity") specified_color <- "red" # You can specify any color you desire filtered_data_with_color <- extract_descriptions_counts( data, descriptions_to_filter, specified_color) print(filtered_data_with_color)
# Generate Sample Input Data for extract_descriptions_counts Function # Create a sample data frame with 'Description' and 'Count' columns data <- data.frame( Description = c( "immunoglobulin production", "B cell mediated immunity", "T cell activation", "antigen processing and presentation", "cytokine signaling", "natural killer cell activity", "phagocytosis", "complement activation", "antibody-dependent cellular cytotoxicity", "regulatory T cell function" ), Count = c( 150, # immunoglobulin production 200, # B cell mediated immunity 175, # T cell activation 125, # antigen processing and presentation 190, # cytokine signaling 160, # natural killer cell activity 140, # phagocytosis 180, # complement activation 130, # antibody-dependent cellular cytotoxicity 170 # regulatory T cell function ), stringsAsFactors = FALSE # Ensure that strings are not converted to factors ) descriptions_to_filter <- c("immunoglobulin production", "B cell mediated immunity") specified_color <- "red" # You can specify any color you desire filtered_data_with_color <- extract_descriptions_counts( data, descriptions_to_filter, specified_color) print(filtered_data_with_color)
This function processes a dataframe containing SSGSEA KEGG results. It allows specifying the number of top pathways to extract for each sample based on their scores, and stores these in a new dataframe with sample names and pathway scores.
extract_ntop_pathways(ssgsea_kegg, nTop = 5)
extract_ntop_pathways(ssgsea_kegg, nTop = 5)
ssgsea_kegg |
Dataframe containing SSGSEA KEGG results with samples as columns and pathways as rows. |
nTop |
Integer, number of top pathways to select for each sample. |
A dataframe with columns 'Pathway', 'Sample', and 'Value' representing the top pathways for each sample.
# Example: Generating input data for the extract_ntop_pathways function # Define example pathways pathways <- c("Pathway_A", "Pathway_B", "Pathway_C", "Pathway_D", "Pathway_E", "Pathway_F", "Pathway_G", "Pathway_H", "Pathway_I", "Pathway_J") # Define example samples samples <- c("Sample_1", "Sample_2", "Sample_3") # Generate random SSGSEA KEGG scores between 0 and 1 set.seed(123) # For reproducibility ssgsea_scores <- matrix(runif(length(pathways) * length(samples), min = 0, max = 1), nrow = length(pathways), ncol = length(samples), dimnames = list(pathways, samples)) # Convert to a data frame ssgsea_kegg <- as.data.frame(ssgsea_scores) # Extract the top 3 pathways for each sample top_pathways <- extract_ntop_pathways(ssgsea_kegg, nTop = 3)
# Example: Generating input data for the extract_ntop_pathways function # Define example pathways pathways <- c("Pathway_A", "Pathway_B", "Pathway_C", "Pathway_D", "Pathway_E", "Pathway_F", "Pathway_G", "Pathway_H", "Pathway_I", "Pathway_J") # Define example samples samples <- c("Sample_1", "Sample_2", "Sample_3") # Generate random SSGSEA KEGG scores between 0 and 1 set.seed(123) # For reproducibility ssgsea_scores <- matrix(runif(length(pathways) * length(samples), min = 0, max = 1), nrow = length(pathways), ncol = length(samples), dimnames = list(pathways, samples)) # Convert to a data frame ssgsea_kegg <- as.data.frame(ssgsea_scores) # Extract the top 3 pathways for each sample top_pathways <- extract_ntop_pathways(ssgsea_kegg, nTop = 3)
This function processes the results of SSGSEA, specifically focusing on KEGG pathways. It extracts pathways with positive values from each sample and randomly selects a subset of them.
extract_positive_pathways(ssgsea_kegg, max_paths_per_sample = 5)
extract_positive_pathways(ssgsea_kegg, max_paths_per_sample = 5)
ssgsea_kegg |
A matrix or data frame with pathways as rows and samples as columns. |
max_paths_per_sample |
Integer, maximum number of pathways to select per sample. |
A data frame with selected pathways, samples, and their corresponding values.
# Example: Generating input data for the extract_positive_pathways function # Define example pathways pathways <- c("Pathway_1", "Pathway_2", "Pathway_3", "Pathway_4", "Pathway_5", "Pathway_6", "Pathway_7", "Pathway_8", "Pathway_9", "Pathway_10") # Define example samples samples <- c("Sample_A", "Sample_B", "Sample_C") # Generate random SSGSEA KEGG scores including both positive and negative values set.seed(456) # For reproducibility ssgsea_scores <- matrix(rnorm(length(pathways) * length(samples), mean = 0, sd = 1), nrow = length(pathways), ncol = length(samples), dimnames = list(pathways, samples)) # Convert to a data frame ssgsea_kegg <- as.data.frame(ssgsea_scores) # Use the extract_positive_pathways function to extract up to 3 positive pathways per sample selected_positive_pathways <- extract_positive_pathways(ssgsea_kegg, max_paths_per_sample = 3)
# Example: Generating input data for the extract_positive_pathways function # Define example pathways pathways <- c("Pathway_1", "Pathway_2", "Pathway_3", "Pathway_4", "Pathway_5", "Pathway_6", "Pathway_7", "Pathway_8", "Pathway_9", "Pathway_10") # Define example samples samples <- c("Sample_A", "Sample_B", "Sample_C") # Generate random SSGSEA KEGG scores including both positive and negative values set.seed(456) # For reproducibility ssgsea_scores <- matrix(rnorm(length(pathways) * length(samples), mean = 0, sd = 1), nrow = length(pathways), ncol = length(samples), dimnames = list(pathways, samples)) # Convert to a data frame ssgsea_kegg <- as.data.frame(ssgsea_scores) # Use the extract_positive_pathways function to extract up to 3 positive pathways per sample selected_positive_pathways <- extract_positive_pathways(ssgsea_kegg, max_paths_per_sample = 3)
This function creates faceted high-density region plots using ggdensity for adding optional density rug and density contours, and scatter points. It also adds a regression line and Pearson correlation label. The plot is faceted by a grouping variable.
facet_density_foldchange( data, x_var, y_var, group_var, facet_var, palette, show_points = FALSE, show_density = TRUE, point_size = 2.5, point_alpha = 0.1, line_size = 1.6, cor_method = "pearson", cor_label_pos = c("left", 0.97), cor_vjust = NULL )
facet_density_foldchange( data, x_var, y_var, group_var, facet_var, palette, show_points = FALSE, show_density = TRUE, point_size = 2.5, point_alpha = 0.1, line_size = 1.6, cor_method = "pearson", cor_label_pos = c("left", 0.97), cor_vjust = NULL )
data |
Data frame containing variables for plotting. |
x_var |
Name of the x-axis variable as a string. |
y_var |
Name of the y-axis variable as a string. |
group_var |
Name of the grouping variable for color mapping as a string. |
facet_var |
Name of the faceting variable. |
palette |
Color palette for the plot as a character vector. |
show_points |
Logical, if TRUE adds scatter points to the plot. |
show_density |
Logical, if TRUE adds filled density contours to the plot. |
point_size |
Size of the points, relevant if show_points is TRUE. |
point_alpha |
Transparency level of the points, relevant if show_points is TRUE. |
line_size |
Size of the regression line. |
cor_method |
Method to calculate correlation ("pearson" or "spearman"). |
cor_label_pos |
Vector of length 2 indicating the position of the correlation label (x and y). |
cor_vjust |
Vertical justification for correlation label, default is NULL. |
A 'ggplot' object representing the high-density region plot.
combined_df_file <- system.file("extdata", "combined_df.rds", package = "TransProR") combined_df <- readRDS(combined_df_file) pal2 = c("#2787e0","#1a9ae0","#1dabbf","#00897b","#43a047","#7cb342") all_facet_density_foldchange_name1 <- facet_density_foldchange( data = combined_df, x_var = "log2FoldChange_1", y_var = "log2FoldChange_2", group_var = "name", facet_var = "name", palette = pal2, show_points = TRUE, show_density = FALSE, point_size = 2, point_alpha = 0.1, line_size = 1.6, cor_method = "pearson", cor_label_pos = c("left", "top"), cor_vjust = 1 )
combined_df_file <- system.file("extdata", "combined_df.rds", package = "TransProR") combined_df <- readRDS(combined_df_file) pal2 = c("#2787e0","#1a9ae0","#1dabbf","#00897b","#43a047","#7cb342") all_facet_density_foldchange_name1 <- facet_density_foldchange( data = combined_df, x_var = "log2FoldChange_1", y_var = "log2FoldChange_2", group_var = "name", facet_var = "name", palette = pal2, show_points = TRUE, show_density = FALSE, point_size = 2, point_alpha = 0.1, line_size = 1.6, cor_method = "pearson", cor_label_pos = c("left", "top"), cor_vjust = 1 )
This function filters a data frame to identify genes with significant differential expression based on specified thresholds for p-values and log fold change. It allows for flexible input of column names for p-values and log fold change.
filter_diff_genes( data, p_val_col = "adj.P.Val", log_fc_col = "logFC", p_val_threshold = 0.05, log_fc_threshold = 1 )
filter_diff_genes( data, p_val_col = "adj.P.Val", log_fc_col = "logFC", p_val_threshold = 0.05, log_fc_threshold = 1 )
data |
A data frame containing gene expression data. |
p_val_col |
Character string indicating the column name for p-values. Default is "adj.P.Val". |
log_fc_col |
Character string indicating the column name for log fold change. Default is "logFC". |
p_val_threshold |
Numeric threshold for filtering p-values. Default is 0.05. |
log_fc_threshold |
Numeric threshold for filtering log fold change. Default is 1.0. |
A data frame with genes filtered by the specified criteria.
# Create a sample data frame with p-values and log fold changes sample_data <- data.frame( adj.P.Val = c(0.03, 0.06, 0.02, 0.07), logFC = c(1.5, 0.8, -1.2, 1.1), gene = c("Gene1", "Gene2", "Gene3", "Gene4") ) # Use the filter_diff_genes function to filter significant genes filtered_genes <- filter_diff_genes(sample_data) print(filtered_genes)
# Create a sample data frame with p-values and log fold changes sample_data <- data.frame( adj.P.Val = c(0.03, 0.06, 0.02, 0.07), logFC = c(1.5, 0.8, -1.2, 1.1), gene = c("Gene1", "Gene2", "Gene3", "Gene4") ) # Use the filter_diff_genes function to filter significant genes filtered_genes <- filter_diff_genes(sample_data) print(filtered_genes)
This function creates a Venn Diagram using the ggVennDiagram package. It allows customization of various aesthetic elements of the diagram.
four_degs_venn(degs_list)
four_degs_venn(degs_list)
degs_list |
A list of DEG sets for Venn Diagram creation. |
A ggplot object representing the Venn Diagram.
data("all_degs_venn", package = "TransProR") four_degs_venn <- four_degs_venn(all_degs_venn)
data("all_degs_venn", package = "TransProR") four_degs_venn <- four_degs_venn(all_degs_venn)
Gather graph edge from data frame Please note that this function is from the 'ggraph' package and has not been altered in functionality, but it has been optimized and iterated. It is not original content of 'TransProR'. However, since 'ggraph' caused frequent GitHub Action errors during the creation of 'TransProR', the author directly referenced the involved functions in 'TransProR'. This is not the author's original creation. All users please be aware!
gather_graph_edge(df, index = NULL, root = NULL)
gather_graph_edge(df, index = NULL, root = NULL)
df |
A data frame |
index |
A vector of column names to group by |
root |
Root name for the root node connections, optional |
A tibble of graph edges
# Example taxonomic hierarchy data frame OTU <- tibble::tibble( p = c("Firmicutes", "Firmicutes", "Bacteroidetes", "Bacteroidetes", "Proteobacteria"), c = c("Bacilli", "Clostridia", "Bacteroidia", "Bacteroidia", "Gammaproteobacteria"), o = c("Lactobacillales", "Clostridiales", "Bacteroidales", "Bacteroidales", "Enterobacterales"), abundance = c(100, 150, 200, 50, 300) # Abundance or some other metric ) # Gathering graph edges by specifying hierarchical taxonomic levels edges <- gather_graph_edge(OTU, index = c("p", "c", "o")) # Adding a root node to the graph edges_with_root <- gather_graph_edge(OTU, index = c("p", "c", "o"), root = "Root")
# Example taxonomic hierarchy data frame OTU <- tibble::tibble( p = c("Firmicutes", "Firmicutes", "Bacteroidetes", "Bacteroidetes", "Proteobacteria"), c = c("Bacilli", "Clostridia", "Bacteroidia", "Bacteroidia", "Gammaproteobacteria"), o = c("Lactobacillales", "Clostridiales", "Bacteroidales", "Bacteroidales", "Enterobacterales"), abundance = c(100, 150, 200, 50, 300) # Abundance or some other metric ) # Gathering graph edges by specifying hierarchical taxonomic levels edges <- gather_graph_edge(OTU, index = c("p", "c", "o")) # Adding a root node to the graph edges_with_root <- gather_graph_edge(OTU, index = c("p", "c", "o"), root = "Root")
Gather graph nodes from a data frame Please note that this function is from the 'ggraph' package and has not been altered in functionality, but it has been optimized and iterated. It is not original content of 'TransProR'. However, since 'ggraph' caused frequent GitHub Action errors during the creation of 'TransProR', the author directly referenced the involved functions in 'TransProR'. This is not the author's original creation. All users please be aware!
gather_graph_node( df, index = NULL, value = utils::tail(colnames(df), 1), root = NULL )
gather_graph_node( df, index = NULL, value = utils::tail(colnames(df), 1), root = NULL )
df |
A data frame |
index |
A vector of column names to group by |
value |
Column name used for summarizing node size, defaults to the last column |
root |
Root name for the root node connections, optional |
a tibble of graph nodes
# Example taxonomic hierarchy data frame (OTU table) OTU <- tibble::tibble( p = c("Firmicutes", "Firmicutes", "Bacteroidetes", "Bacteroidetes", "Proteobacteria"), c = c("Bacilli", "Clostridia", "Bacteroidia", "Bacteroidia", "Gammaproteobacteria"), o = c("Lactobacillales", "Clostridiales", "Bacteroidales", "Bacteroidales", "Enterobacterales"), abundance = c(100, 150, 200, 50, 300) # Numeric values for node size ) # Gathering graph nodes by specifying hierarchical taxonomic levels nodes <- gather_graph_node(OTU, index = c("p", "c", "o"))
# Example taxonomic hierarchy data frame (OTU table) OTU <- tibble::tibble( p = c("Firmicutes", "Firmicutes", "Bacteroidetes", "Bacteroidetes", "Proteobacteria"), c = c("Bacilli", "Clostridia", "Bacteroidia", "Bacteroidia", "Gammaproteobacteria"), o = c("Lactobacillales", "Clostridiales", "Bacteroidales", "Bacteroidales", "Enterobacterales"), abundance = c(100, 150, 200, 50, 300) # Numeric values for node size ) # Gathering graph nodes by specifying hierarchical taxonomic levels nodes <- gather_graph_node(OTU, index = c("p", "c", "o"))
This function merges selected genes with differential expression data and adds a color column based on up/down regulation.
gene_color(selected_genes, DEG_deseq2, up_color, down_color)
gene_color(selected_genes, DEG_deseq2, up_color, down_color)
selected_genes |
A data frame containing selected genes with a column named "Symble". |
DEG_deseq2 |
A data frame containing differential expression data with a column named "Symble" and a column named "change" indicating up/down regulation. |
up_color |
The color to assign to genes with up-regulated expression. |
down_color |
The color to assign to genes with down-regulated expression. |
A data frame containing merged genes with an additional color column.
selected_genes_deseq2_file <- system.file("extdata", "selected_genes_deseq2.rds", package = "TransProR") selected_genes_deseq2 <- readRDS(selected_genes_deseq2_file) Diff_deseq2_file <- system.file("extdata", "Diff_deseq2.rds", package = "TransProR") Diff_deseq2 <- readRDS(Diff_deseq2_file) result_deseq2 <- gene_color(selected_genes_deseq2, Diff_deseq2, "#0000EE", "#fc4746")
selected_genes_deseq2_file <- system.file("extdata", "selected_genes_deseq2.rds", package = "TransProR") selected_genes_deseq2 <- readRDS(selected_genes_deseq2_file) Diff_deseq2_file <- system.file("extdata", "Diff_deseq2.rds", package = "TransProR") Diff_deseq2 <- readRDS(Diff_deseq2_file) result_deseq2 <- gene_color(selected_genes_deseq2, Diff_deseq2, "#0000EE", "#fc4746")
This function enhances a 'ggtree' plot by adding highlights for specific genes. It adds both a semi-transparent fan-shaped highlight and a point at the node corresponding to each gene. Colors for each gene can be customized.
gene_highlights(ggtree_obj, genes_to_highlight, hilight_extend = 18)
gene_highlights(ggtree_obj, genes_to_highlight, hilight_extend = 18)
ggtree_obj |
A ggtree object to which the highlights will be added. |
genes_to_highlight |
A data frame containing genes and their corresponding colors. |
hilight_extend |
Integer, the extension of the highlight fan in degrees. |
A ggtree object with added gene highlights.
data("gtree", package = "TransProR") # Define genes and their colors genes_df <- data.frame(Symble = c("t5", "t9"), color = c("#FF0000", "#0000FF")) # Add highlights gtree <- gene_highlights(gtree, genes_to_highlight = genes_df)
data("gtree", package = "TransProR") # Define genes and their colors genes_df <- data.frame(Symble = c("t5", "t9"), color = c("#FF0000", "#0000FF")) # Add highlights gtree <- gene_highlights(gtree, genes_to_highlight = genes_df)
This function takes multiple data frames and pathway IDs, merging them into a new data frame. Each data frame represents a type of analysis (e.g., BP, KEGG, MF, etc.).
gene_map_pathway( BP_dataframe, BP_ids, KEGG_dataframe, KEGG_ids, MF_dataframe, MF_ids, REACTOME_dataframe, REACTOME_ids, CC_dataframe, CC_ids, DO_dataframe, DO_ids )
gene_map_pathway( BP_dataframe, BP_ids, KEGG_dataframe, KEGG_ids, MF_dataframe, MF_ids, REACTOME_dataframe, REACTOME_ids, CC_dataframe, CC_ids, DO_dataframe, DO_ids )
BP_dataframe |
Data frame for Biological Process analysis |
BP_ids |
Selected pathway IDs for Biological Process analysis |
KEGG_dataframe |
Data frame for KEGG analysis |
KEGG_ids |
Selected pathway IDs for KEGG analysis |
MF_dataframe |
Data frame for Molecular Function analysis |
MF_ids |
Selected pathway IDs for Molecular Function analysis |
REACTOME_dataframe |
Data frame for REACTOME analysis |
REACTOME_ids |
Selected pathway IDs for REACTOME analysis |
CC_dataframe |
Data frame for Cellular Component analysis |
CC_ids |
Selected pathway IDs for Cellular Component analysis |
DO_dataframe |
Data frame for Disease Ontology analysis |
DO_ids |
Selected pathway IDs for Disease Ontology analysis |
A new data frame that includes pathways, gene, type, and value columns
# Simulating data for different analysis types # Simulate Biological Process (BP) data frame BP_df <- data.frame( ID = c("GO:0002376", "GO:0019724"), geneID = c("GENE1/GENE2", "GENE3/GENE4"), Description = c("Immune response", "Glycosylation process") ) # Simulate KEGG data frame KEGG_df <- data.frame( ID = c("12345", "67890"), geneID = c("GENE5/GENE6", "GENE7/GENE8"), Description = c("Pathway 1", "Pathway 2") ) # Simulate Molecular Function (MF) data frame MF_df <- data.frame( ID = c("ABC123", "DEF456"), geneID = c("GENE9/GENE10", "GENE11/GENE12"), Description = c("Molecular function A", "Molecular function B") ) # Simulate REACTOME data frame REACTOME_df <- data.frame( ID = c("R-HSA-12345", "R-HSA-67890"), geneID = c("GENE13/GENE14", "GENE15/GENE16"), Description = c("Pathway in Reactome 1", "Pathway in Reactome 2") ) # Simulate Cellular Component (CC) data frame CC_df <- data.frame( ID = c("GO:0005575", "GO:0005634"), geneID = c("GENE17/GENE18", "GENE19/GENE20"), Description = c("Cellular component A", "Cellular component B") ) # Simulate Disease Ontology (DO) data frame DO_df <- data.frame( ID = c("DOID:123", "DOID:456"), geneID = c("GENE21/GENE22", "GENE23/GENE24"), Description = c("Disease A", "Disease B") ) # Example pathway IDs for each analysis BP_ids <- c("GO:0002376", "GO:0019724") KEGG_ids <- c("12345", "67890") MF_ids <- c("ABC123", "DEF456") REACTOME_ids <- c("R-HSA-12345", "R-HSA-67890") CC_ids <- c("GO:0005575", "GO:0005634") DO_ids <- c("DOID:123", "DOID:456") # Generate the pathway-gene map using the gene_map_pathway function pathway_gene_map <- gene_map_pathway( BP_dataframe = BP_df, BP_ids = BP_ids, KEGG_dataframe = KEGG_df, KEGG_ids = KEGG_ids, MF_dataframe = MF_df, MF_ids = MF_ids, REACTOME_dataframe = REACTOME_df, REACTOME_ids = REACTOME_ids, CC_dataframe = CC_df, CC_ids = CC_ids, DO_dataframe = DO_df, DO_ids = DO_ids ) # Display the resulting pathway-gene mapping data frame print(pathway_gene_map)
# Simulating data for different analysis types # Simulate Biological Process (BP) data frame BP_df <- data.frame( ID = c("GO:0002376", "GO:0019724"), geneID = c("GENE1/GENE2", "GENE3/GENE4"), Description = c("Immune response", "Glycosylation process") ) # Simulate KEGG data frame KEGG_df <- data.frame( ID = c("12345", "67890"), geneID = c("GENE5/GENE6", "GENE7/GENE8"), Description = c("Pathway 1", "Pathway 2") ) # Simulate Molecular Function (MF) data frame MF_df <- data.frame( ID = c("ABC123", "DEF456"), geneID = c("GENE9/GENE10", "GENE11/GENE12"), Description = c("Molecular function A", "Molecular function B") ) # Simulate REACTOME data frame REACTOME_df <- data.frame( ID = c("R-HSA-12345", "R-HSA-67890"), geneID = c("GENE13/GENE14", "GENE15/GENE16"), Description = c("Pathway in Reactome 1", "Pathway in Reactome 2") ) # Simulate Cellular Component (CC) data frame CC_df <- data.frame( ID = c("GO:0005575", "GO:0005634"), geneID = c("GENE17/GENE18", "GENE19/GENE20"), Description = c("Cellular component A", "Cellular component B") ) # Simulate Disease Ontology (DO) data frame DO_df <- data.frame( ID = c("DOID:123", "DOID:456"), geneID = c("GENE21/GENE22", "GENE23/GENE24"), Description = c("Disease A", "Disease B") ) # Example pathway IDs for each analysis BP_ids <- c("GO:0002376", "GO:0019724") KEGG_ids <- c("12345", "67890") MF_ids <- c("ABC123", "DEF456") REACTOME_ids <- c("R-HSA-12345", "R-HSA-67890") CC_ids <- c("GO:0005575", "GO:0005634") DO_ids <- c("DOID:123", "DOID:456") # Generate the pathway-gene map using the gene_map_pathway function pathway_gene_map <- gene_map_pathway( BP_dataframe = BP_df, BP_ids = BP_ids, KEGG_dataframe = KEGG_df, KEGG_ids = KEGG_ids, MF_dataframe = MF_df, MF_ids = MF_ids, REACTOME_dataframe = REACTOME_df, REACTOME_ids = REACTOME_ids, CC_dataframe = CC_df, CC_ids = CC_ids, DO_dataframe = DO_df, DO_ids = DO_ids ) # Display the resulting pathway-gene mapping data frame print(pathway_gene_map)
This function retrieves gene expression data from the GTEx project that is specific to a certain organ. It performs various checks and processing steps to ensure that the data is consistent and relevant to the specified organ. The filtered and cleaned data is saved as an RDS file for further analysis.
get_gtex_exp( organ_specific, file_path, probe_map_path, pheno_path, output_path )
get_gtex_exp( organ_specific, file_path, probe_map_path, pheno_path, output_path )
organ_specific |
A character string specifying the organ to filter the gene expression data by. |
file_path |
A character string specifying the path to the GTEx gene expression data file. |
probe_map_path |
A character string specifying the path to the gtex_probeMap_gencode data file. |
pheno_path |
A character string specifying the path to the GTEx phenotype data file. |
output_path |
A character string specifying the path where the output RDS file will be saved. |
The function begins by checking if the gene expression and phenotype data files exist at the specified paths. It then loads these data files and processes them by setting appropriate row names, modifying column names for clarity, and filtering samples based on the specified organ. The function ensures that only samples present in both datasets are retained for consistency. It also removes any duplicate gene entries to prevent redundancy. Finally, the processed data is saved as an RDS file.
A data frame containing gene expression data for the specified organ. Rows represent genes, and columns represent samples. Note that this function also saves the organ-specific GTEx data as an RDS file at the specified output path.
The function will stop and throw an error if the input files do not exist, or if no samples are found for the specified organ.
CRITICAL: The 'output_path' parameter must end with '.rds' to be properly recognized by the function. It is also highly recommended that the path includes specific identifiers related to the target samples. Please structure the 'output_path' following this pattern: './your_directory/your_sample_type.gtex.rds'.
counts_file <- system.file("extdata", "gtex_gene_expected_count_test", package = "TransProR") probe_map_file <- system.file("extdata", "gtex_probeMap_gencode.v23.annotation.gene.probemap_test", package = "TransProR") phenotype_file <- system.file("extdata", "GTEX_phenotype_test", package = "TransProR") ouput_file <- file.path(tempdir(), "skcm_gtex.rds") SKCM_gtex <- get_gtex_exp( organ_specific = "Skin", file_path = counts_file, probe_map_path = probe_map_file, pheno_path = phenotype_file, output_path = ouput_file ) head(SKCM_gtex[1:3, 1:3])
counts_file <- system.file("extdata", "gtex_gene_expected_count_test", package = "TransProR") probe_map_file <- system.file("extdata", "gtex_probeMap_gencode.v23.annotation.gene.probemap_test", package = "TransProR") phenotype_file <- system.file("extdata", "GTEX_phenotype_test", package = "TransProR") ouput_file <- file.path(tempdir(), "skcm_gtex.rds") SKCM_gtex <- get_gtex_exp( organ_specific = "Skin", file_path = counts_file, probe_map_path = probe_map_file, pheno_path = phenotype_file, output_path = ouput_file ) head(SKCM_gtex[1:3, 1:3])
This function processes expression data and phenotype information, separates tumor and normal samples, and saves the results into different files. It's specifically designed for data obtained from TCGA.
get_tcga_exp( counts_file_path, gene_probes_file_path, phenotype_file_path, output_file_path )
get_tcga_exp( counts_file_path, gene_probes_file_path, phenotype_file_path, output_file_path )
counts_file_path |
File path to the counts data (usually in the form of a large matrix with gene expression data). |
gene_probes_file_path |
File path containing the gene probes data. |
phenotype_file_path |
File path to the phenotype data, which includes various sample attributes. |
output_file_path |
Path where the output files, distinguished between tumor and normal, will be saved. |
A list containing matrices for tumor and normal expression data.
IMPORTANT: This function assumes that the input files follow a specific format and structure, typically found in TCGA data releases. Users should verify their data's compatibility. Additionally, the function does not perform error checking on the data's content, which users should handle through proper preprocessing.
CRITICAL: The 'output_file_path' parameter must end with '.rds' to be properly recognized by the function. It is also highly recommended that the path includes specific identifiers related to the target samples, as the function will create further subdivisions in the specified path for tumor or normal tissues. Please structure the 'output_file_path' following this pattern: './your_directory/your_sample_type.exp.rds'.
Dongyue Yu
counts_file <- system.file("extdata", "TCGA-SKCM.htseq_counts_test.tsv", package = "TransProR") gene_probes_file <- system.file("extdata", "TCGA_gencode.v22.annotation.gene.probeMap_test", package = "TransProR") phenotype_file <- system.file("extdata", "TCGA-SKCM.GDC_phenotype_test.tsv", package = "TransProR") ouput_file <- file.path(tempdir(), "SKCM_Skin_TCGA_exp_test.rds") SKCM_exp <- get_tcga_exp( counts_file_path = counts_file, gene_probes_file_path = gene_probes_file, phenotype_file_path = phenotype_file, output_file_path = ouput_file ) head(SKCM_exp[["tumor_tcga_data"]])[1:5, 1:5] head(SKCM_exp[["normal_tcga_data"]], n = 10) # Because there is only one column.
counts_file <- system.file("extdata", "TCGA-SKCM.htseq_counts_test.tsv", package = "TransProR") gene_probes_file <- system.file("extdata", "TCGA_gencode.v22.annotation.gene.probeMap_test", package = "TransProR") phenotype_file <- system.file("extdata", "TCGA-SKCM.GDC_phenotype_test.tsv", package = "TransProR") ouput_file <- file.path(tempdir(), "SKCM_Skin_TCGA_exp_test.rds") SKCM_exp <- get_tcga_exp( counts_file_path = counts_file, gene_probes_file_path = gene_probes_file, phenotype_file_path = phenotype_file, output_file_path = ouput_file ) head(SKCM_exp[["tumor_tcga_data"]])[1:5, 1:5] head(SKCM_exp[["normal_tcga_data"]], n = 10) # Because there is only one column.
A dataset containing a phylogenetic tree object created using the 'ggtree' package. This tree represents the evolutionary relationships among a set of species or genes.
data(gtree)
data(gtree)
A 'ggtree' object.
The phylogenetic tree was constructed based on sequence alignment data obtained from [Data Source, e.g., NCBI database, specific study, etc.].
This function adds highlights to specific nodes in a phylogenetic tree represented by a 'ggtree' object. Users can specify the nodes to highlight along with custom fill colors, transparency, and extension options.
highlight_by_node( ggtree_object, nodes, fill_colors, alpha_values, extend_values )
highlight_by_node( ggtree_object, nodes, fill_colors, alpha_values, extend_values )
ggtree_object |
A 'ggtree' object representing the phylogenetic tree. |
nodes |
A character vector specifying the nodes to highlight. |
fill_colors |
A character vector specifying the fill colors for the highlighted nodes. |
alpha_values |
A numeric vector specifying the transparency values for the highlighted nodes (between 0 and 1). |
extend_values |
A logical vector specifying whether to extend the highlight to the whole clade below each node. |
A modified 'ggtree' object with the specified nodes highlighted.
plot_file <- system.file("extdata", "tree_plot.rds", package = "TransProR") p2_plot <- readRDS(plot_file) # Please replace the following vectors with your specific values nodes <- c(117, 129, 125, 127, 119, 123, 139, 166, 124, 131, 217) # x-values of the nodes you want to highlight fill_colors <- c("#CD6600", "#CD6600", "#CD6600", "#CD6600", "#009933", "#009933", "#009933", "#009933", "#9B30FF", "#9B30FF", "#9B30FF") # Fill colors alpha_values <- c(0.3, 0.3, 0.3, 0.3, 0.2, 0.3, 0.3, 0.3, 0.3, 0.3, 0.3) # Transparency values extend_values <- c(25, 24, 24, 25, 25, 25, 24, 24, 25, 24, 24) # Values for the 'extend' parameter p2 <- highlight_by_node( p2_plot, nodes, fill_colors, alpha_values, extend_values )
plot_file <- system.file("extdata", "tree_plot.rds", package = "TransProR") p2_plot <- readRDS(plot_file) # Please replace the following vectors with your specific values nodes <- c(117, 129, 125, 127, 119, 123, 139, 166, 124, 131, 217) # x-values of the nodes you want to highlight fill_colors <- c("#CD6600", "#CD6600", "#CD6600", "#CD6600", "#009933", "#009933", "#009933", "#009933", "#9B30FF", "#9B30FF", "#9B30FF") # Fill colors alpha_values <- c(0.3, 0.3, 0.3, 0.3, 0.2, 0.3, 0.3, 0.3, 0.3, 0.3, 0.3) # Transparency values extend_values <- c(25, 24, 24, 25, 25, 25, 24, 24, 25, 24, 24) # Values for the 'extend' parameter p2 <- highlight_by_node( p2_plot, nodes, fill_colors, alpha_values, extend_values )
This function adds highlights for specified genes on a phylogenetic tree object.
highlight_genes(ggtree_obj, genes_to_highlight, hilight_extend = 18)
highlight_genes(ggtree_obj, genes_to_highlight, hilight_extend = 18)
ggtree_obj |
A ggtree object representing the phylogenetic tree. |
genes_to_highlight |
A data frame containing gene names and corresponding colors to highlight. |
hilight_extend |
Numeric value indicating the extension length for highlights. |
A 'ggtree' object with added highlights for specified genes.
plot_file <- system.file("extdata", "tree_plot.rds", package = "TransProR") p2_plot <- readRDS(plot_file) selected_genes_deseq2_file <- system.file("extdata", "selected_genes_deseq2.rds", package = "TransProR") selected_genes_deseq2 <- readRDS(selected_genes_deseq2_file) Diff_deseq2_file <- system.file("extdata", "Diff_deseq2.rds", package = "TransProR") Diff_deseq2 <- readRDS(Diff_deseq2_file) result_deseq2 <- gene_color(selected_genes_deseq2, Diff_deseq2, "#0000EE", "#fc4746") add_gene_highlights_p3 <- highlight_genes(p2_plot, result_deseq2, hilight_extend = 26)
plot_file <- system.file("extdata", "tree_plot.rds", package = "TransProR") p2_plot <- readRDS(plot_file) selected_genes_deseq2_file <- system.file("extdata", "selected_genes_deseq2.rds", package = "TransProR") selected_genes_deseq2 <- readRDS(selected_genes_deseq2_file) Diff_deseq2_file <- system.file("extdata", "Diff_deseq2.rds", package = "TransProR") Diff_deseq2 <- readRDS(Diff_deseq2_file) result_deseq2 <- gene_color(selected_genes_deseq2, Diff_deseq2, "#0000EE", "#fc4746") add_gene_highlights_p3 <- highlight_genes(p2_plot, result_deseq2, hilight_extend = 26)
This function performs differential gene expression analysis using the 'limma' package with voom normalization. It reads tumor and normal expression data, merges them, filters low-expressed genes, normalizes the data, performs limma analysis, and outputs the results along with information on gene expression changes.
limma_analyze( tumor_file, normal_file, output_file, logFC_threshold = 2.5, p_value_threshold = 0.01 )
limma_analyze( tumor_file, normal_file, output_file, logFC_threshold = 2.5, p_value_threshold = 0.01 )
tumor_file |
Path to the tumor data file (RDS format). |
normal_file |
Path to the normal data file (RDS format). |
output_file |
Path to save the output DEG data (RDS format). |
logFC_threshold |
Threshold for log fold change for marking up/down-regulated genes. |
p_value_threshold |
Threshold for p-value for filtering significant genes. |
A data frame of differential expression results.
limma:Linear Models for Microarray and RNA-Seq Data User’s Guide. For more information, visit the page: https://www.bioconductor.org/packages/release/bioc/vignettes/limma/inst/doc/usersguide.pdf
# Define file paths for tumor and normal data from the data folder tumor_file <- system.file("extdata", "removebatch_SKCM_Skin_TCGA_exp_tumor_test.rds", package = "TransProR") normal_file <- system.file("extdata", "removebatch_SKCM_Skin_Normal_TCGA_GTEX_count_test.rds", package = "TransProR") output_file <- file.path(tempdir(), "DEG_limma_voom.rds") DEG_limma_voom <- limma_analyze( tumor_file = tumor_file, normal_file = normal_file, output_file = output_file, logFC_threshold = 2.5, p_value_threshold = 0.01 ) # View the top 5 rows of the result head(DEG_limma_voom, 5)
# Define file paths for tumor and normal data from the data folder tumor_file <- system.file("extdata", "removebatch_SKCM_Skin_TCGA_exp_tumor_test.rds", package = "TransProR") normal_file <- system.file("extdata", "removebatch_SKCM_Skin_Normal_TCGA_GTEX_count_test.rds", package = "TransProR") output_file <- file.path(tempdir(), "DEG_limma_voom.rds") DEG_limma_voom <- limma_analyze( tumor_file = tumor_file, normal_file = normal_file, output_file = output_file, logFC_threshold = 2.5, p_value_threshold = 0.01 ) # View the top 5 rows of the result head(DEG_limma_voom, 5)
This function evaluates the need for a log transformation based on a set of criteria and applies a log2 transformation if necessary.
log_transform(data)
log_transform(data)
data |
A numeric matrix or data frame. |
The original data or the data transformed with log2.
Dongyue Yu
file_path <- system.file("extdata", "all_count_exp_test.csv", package = "TransProR") your_data <- read.csv(file_path, row.names = 1) # Assuming first column is row names (e.g., gene names) TransformedData <- log_transform(data = your_data)
file_path <- system.file("extdata", "all_count_exp_test.csv", package = "TransProR") your_data <- read.csv(file_path, row.names = 1) # Assuming first column is row names (e.g., gene names) TransformedData <- log_transform(data = your_data)
This function creates a high-density region plot using hdr methods to add density rug and filled contours. It also adds a regression line and Pearson correlation label. Points can be added to the plot optionally.
merge_density_foldchange( data, x_var, y_var, group_var, palette = c("#3949ab", "#1e88e5", "#039be5", "#00897b", "#43a047", "#7cb342"), show_points = FALSE, point_size = 2.5, point_alpha = 0.2, x_lim = c(0, 20), y_lim = c(0, 20), cor_method = "pearson", line_size = 1.6, cor_label_pos = c("left", 0.97) )
merge_density_foldchange( data, x_var, y_var, group_var, palette = c("#3949ab", "#1e88e5", "#039be5", "#00897b", "#43a047", "#7cb342"), show_points = FALSE, point_size = 2.5, point_alpha = 0.2, x_lim = c(0, 20), y_lim = c(0, 20), cor_method = "pearson", line_size = 1.6, cor_label_pos = c("left", 0.97) )
data |
Data frame containing variables for plotting. |
x_var |
Name of the x-axis variable as a string. |
y_var |
Name of the y-axis variable as a string. |
group_var |
Name of the grouping variable for color mapping as a string. |
palette |
Color palette for the plot as a character vector. |
show_points |
Logical, if TRUE adds points to the plot. |
point_size |
Size of the points, relevant if show_points is TRUE. |
point_alpha |
Transparency level of the points, relevant if show_points is TRUE. |
x_lim |
Numeric vector of length 2, giving the x-axis limits. |
y_lim |
Numeric vector of length 2, giving the y-axis limits. |
cor_method |
Method to calculate correlation ("pearson" or "spearman"). |
line_size |
Size of the smoothing line. |
cor_label_pos |
Vector of length 2 indicating the position of the correlation label (x and y). |
A ggplot object representing the high-density region plot.
combined_df_file <- system.file("extdata", "combined_df.rds", package = "TransProR") combined_df <- readRDS(combined_df_file) pal1 = c("#3949ab","#1e88e5","#039be5","#00897b","#43a047","#7cb342") all_density_foldchange_name1 <- merge_density_foldchange( data = combined_df, x_var = "log2FoldChange_1", y_var = "log2FoldChange_2", group_var = "name", palette = pal1, show_points = FALSE, point_size = 2.5, point_alpha = 0.1, x_lim = c(0, 20), y_lim = c(0, 20), cor_method = "pearson", line_size = 1.6, cor_label_pos = c("left", "top") )
combined_df_file <- system.file("extdata", "combined_df.rds", package = "TransProR") combined_df <- readRDS(combined_df_file) pal1 = c("#3949ab","#1e88e5","#039be5","#00897b","#43a047","#7cb342") all_density_foldchange_name1 <- merge_density_foldchange( data = combined_df, x_var = "log2FoldChange_1", y_var = "log2FoldChange_2", group_var = "name", palette = pal1, show_points = FALSE, point_size = 2.5, point_alpha = 0.1, x_lim = c(0, 20), y_lim = c(0, 20), cor_method = "pearson", line_size = 1.6, cor_label_pos = c("left", "top") )
This function merges gene expression data obtained from the GTEx (Genotype-Tissue Expression) and TCGA (The Cancer Genome Atlas) datasets. It is assumed that both datasets are in '.rds' format and have genes as row names. The merged dataset is saved as an RDS file at the specified output path.
merge_gtex_tcga( gtex_data_path, tcga_exp_path, output_path = "./merged_gtex_tcga_data.rds" )
merge_gtex_tcga( gtex_data_path, tcga_exp_path, output_path = "./merged_gtex_tcga_data.rds" )
gtex_data_path |
A string that specifies the file path to the GTEx data saved in RDS format. |
tcga_exp_path |
A string that specifies the file path to the TCGA expression data saved in RDS format. This should be a data.frame with rows as genes and columns as samples. |
output_path |
A string that specifies the path where the merged dataset should be saved. The file is saved in '.rds' format. The default path is "./merged_gtex_tcga_data.rds". |
It is assumed that both datasets are in '.rds' format and have genes as row names.
A data frame where rows represent genes and columns represent samples. The data frame contains expression values from both GTEx and TCGA datasets. It saves the merged dataset to the path specified by 'output_path'.
CRITICAL: The 'output_path' parameter must end with '.rds' to be properly recognized by the function. It is also highly recommended that the path includes specific identifiers related to the target samples. Please structure the 'output_path' following this pattern: './your_directory/merged.your_sample_type.gtex.tcga.data.rds'.
tumor_file <- system.file("extdata", "removebatch_SKCM_Skin_TCGA_exp_tumor_test.rds", package = "TransProR") Normal_file <- system.file("extdata", "removebatch_SKCM_Skin_Normal_TCGA_GTEX_count_test.rds", package = "TransProR") ouput_file <- file.path(tempdir(), "all_data.rds") all_data <- merge_gtex_tcga(gtex_data_path = tumor_file, tcga_exp_path = Normal_file, output_path = ouput_file)
tumor_file <- system.file("extdata", "removebatch_SKCM_Skin_TCGA_exp_tumor_test.rds", package = "TransProR") Normal_file <- system.file("extdata", "removebatch_SKCM_Skin_Normal_TCGA_GTEX_count_test.rds", package = "TransProR") ouput_file <- file.path(tempdir(), "all_data.rds") all_data <- merge_gtex_tcga(gtex_data_path = tumor_file, tcga_exp_path = Normal_file, output_path = ouput_file)
This function merges a list of data frames based on common row names. It adds an 'id' column to track the row order and a 'point_position' column calculated based on the maximum 'Count' value across all data frames. It filters data frames to include only common rows, sorts rows by the length of the 'Description' in descending order, and then merges them by rows.
merge_id_position(df_list)
merge_id_position(df_list)
df_list |
A list of data frames, each with a 'Description' and 'Count' column and set row names. |
A single data frame merged from the list, with additional 'id' and 'point_position' columns.
df1 <- data.frame(Description = c("DataA", "DataB"), Count = c(10, 20), row.names = c("R1", "R2")) df2 <- data.frame(Description = c("DataC", "DataD"), Count = c(30, 40), row.names = c("R1", "R3")) df_list <- list(df1, df2) combined_df_test <- merge_id_position(df_list)
df1 <- data.frame(Description = c("DataA", "DataB"), Count = c(10, 20), row.names = c("R1", "R2")) df2 <- data.frame(Description = c("DataC", "DataD"), Count = c(30, 40), row.names = c("R1", "R3")) df_list <- list(df1, df2) combined_df_test <- merge_id_position(df_list)
This function takes a list of data frames, a method name, and a list of colors. It adds a 'method' column and a 'test_color' column to each data frame, then merges all data frames by rows. It ensures that the color list length matches the list of data frames.
merge_method_color(df_list, method_name, color_list)
merge_method_color(df_list, method_name, color_list)
df_list |
A list of data frames, each containing at least 'Description' and 'Count' columns. |
method_name |
A string representing the method name to be added to each data frame. |
color_list |
A list of colors corresponding to each data frame for the 'test_color' column. |
A single data frame merged from the list, with each originally provided data frame now having a 'method' and a 'test_color' column.
df1 <- data.frame(Description = c("A", "B"), Count = c(10, 20)) df2 <- data.frame(Description = c("C", "D"), Count = c(30, 40)) df_list <- list(df1, df2) method_name <- "Method1" color_list <- c("Red", "Blue") combined_df_test <- merge_method_color(df_list, method_name, color_list)
df1 <- data.frame(Description = c("A", "B"), Count = c(10, 20)) df2 <- data.frame(Description = c("C", "D"), Count = c(30, 40)) df_list <- list(df1, df2) method_name <- "Method1" color_list <- c("Red", "Blue") combined_df_test <- merge_method_color(df_list, method_name, color_list)
This function merges multiple gene-pathway related dataframes, processes them for graph creation, and visualizes the relationships in a dendrogram layout using the provided node and edge gathering functions from the 'ggraph' package.
new_ggraph( BP_dataframe, BP_ids, KEGG_dataframe, KEGG_ids, MF_dataframe, MF_ids, REACTOME_dataframe, REACTOME_ids, CC_dataframe, CC_ids, DO_dataframe, DO_ids )
new_ggraph( BP_dataframe, BP_ids, KEGG_dataframe, KEGG_ids, MF_dataframe, MF_ids, REACTOME_dataframe, REACTOME_ids, CC_dataframe, CC_ids, DO_dataframe, DO_ids )
BP_dataframe |
Dataframe for Biological Process. |
BP_ids |
IDs for Biological Process. |
KEGG_dataframe |
Dataframe for KEGG pathways. |
KEGG_ids |
IDs for KEGG pathways. |
MF_dataframe |
Dataframe for Molecular Function. |
MF_ids |
IDs for Molecular Function. |
REACTOME_dataframe |
Dataframe for REACTOME pathways. |
REACTOME_ids |
IDs for REACTOME pathways. |
CC_dataframe |
Dataframe for Cellular Component. |
CC_ids |
IDs for Cellular Component. |
DO_dataframe |
Dataframe for Disease Ontology. |
DO_ids |
IDs for Disease Ontology. |
A 'ggraph' object representing the pathway gene map visualization.
# Example Biological Process (BP) DataFrame BP_dataframe <- data.frame( ID = c("BP1", "BP2"), Description = c("Biological Process 1", "Biological Process 2"), geneID = c("GeneA/GeneB/GeneC", "GeneD/GeneE"), stringsAsFactors = FALSE ) # Example KEGG DataFrame KEGG_dataframe <- data.frame( ID = c("KEGG1", "KEGG2"), Description = c("KEGG Pathway 1", "KEGG Pathway 2"), geneID = c("GeneA/GeneD", "GeneB/GeneF"), stringsAsFactors = FALSE ) # Example Molecular Function (MF) DataFrame MF_dataframe <- data.frame( ID = c("MF1", "MF2"), Description = c("Molecular Function 1", "Molecular Function 2"), geneID = c("GeneC/GeneE", "GeneF/GeneG"), stringsAsFactors = FALSE ) # Example Reactome DataFrame REACTOME_dataframe <- data.frame( ID = c("REACTOME1", "REACTOME2"), Description = c("Reactome Pathway 1", "Reactome Pathway 2"), geneID = c("GeneA/GeneF", "GeneB/GeneG"), stringsAsFactors = FALSE ) # Example Cellular Component (CC) DataFrame CC_dataframe <- data.frame( ID = c("CC1", "CC2"), Description = c("Cellular Component 1", "Cellular Component 2"), geneID = c("GeneC/GeneD", "GeneE/GeneH"), stringsAsFactors = FALSE ) # Example Disease Ontology (DO) DataFrame DO_dataframe <- data.frame( ID = c("DO1", "DO2"), Description = c("Disease Ontology 1", "Disease Ontology 2"), geneID = c("GeneF/GeneH", "GeneA/GeneI"), stringsAsFactors = FALSE ) # Example IDs vectors for each category BP_ids <- c("BP1", "BP2") KEGG_ids <- c("KEGG1", "KEGG2") MF_ids <- c("MF1", "MF2") REACTOME_ids <- c("REACTOME1", "REACTOME2") CC_ids <- c("CC1", "CC2") DO_ids <- c("DO1", "DO2") # Running the `new_ggraph` function to plot a graph plot <- new_ggraph( BP_dataframe = BP_dataframe, BP_ids = BP_ids, KEGG_dataframe = KEGG_dataframe, KEGG_ids = KEGG_ids, MF_dataframe = MF_dataframe, MF_ids = MF_ids, REACTOME_dataframe = REACTOME_dataframe, REACTOME_ids = REACTOME_ids, CC_dataframe = CC_dataframe, CC_ids = CC_ids, DO_dataframe = DO_dataframe, DO_ids = DO_ids )
# Example Biological Process (BP) DataFrame BP_dataframe <- data.frame( ID = c("BP1", "BP2"), Description = c("Biological Process 1", "Biological Process 2"), geneID = c("GeneA/GeneB/GeneC", "GeneD/GeneE"), stringsAsFactors = FALSE ) # Example KEGG DataFrame KEGG_dataframe <- data.frame( ID = c("KEGG1", "KEGG2"), Description = c("KEGG Pathway 1", "KEGG Pathway 2"), geneID = c("GeneA/GeneD", "GeneB/GeneF"), stringsAsFactors = FALSE ) # Example Molecular Function (MF) DataFrame MF_dataframe <- data.frame( ID = c("MF1", "MF2"), Description = c("Molecular Function 1", "Molecular Function 2"), geneID = c("GeneC/GeneE", "GeneF/GeneG"), stringsAsFactors = FALSE ) # Example Reactome DataFrame REACTOME_dataframe <- data.frame( ID = c("REACTOME1", "REACTOME2"), Description = c("Reactome Pathway 1", "Reactome Pathway 2"), geneID = c("GeneA/GeneF", "GeneB/GeneG"), stringsAsFactors = FALSE ) # Example Cellular Component (CC) DataFrame CC_dataframe <- data.frame( ID = c("CC1", "CC2"), Description = c("Cellular Component 1", "Cellular Component 2"), geneID = c("GeneC/GeneD", "GeneE/GeneH"), stringsAsFactors = FALSE ) # Example Disease Ontology (DO) DataFrame DO_dataframe <- data.frame( ID = c("DO1", "DO2"), Description = c("Disease Ontology 1", "Disease Ontology 2"), geneID = c("GeneF/GeneH", "GeneA/GeneI"), stringsAsFactors = FALSE ) # Example IDs vectors for each category BP_ids <- c("BP1", "BP2") KEGG_ids <- c("KEGG1", "KEGG2") MF_ids <- c("MF1", "MF2") REACTOME_ids <- c("REACTOME1", "REACTOME2") CC_ids <- c("CC1", "CC2") DO_ids <- c("DO1", "DO2") # Running the `new_ggraph` function to plot a graph plot <- new_ggraph( BP_dataframe = BP_dataframe, BP_ids = BP_ids, KEGG_dataframe = KEGG_dataframe, KEGG_ids = KEGG_ids, MF_dataframe = MF_dataframe, MF_ids = MF_ids, REACTOME_dataframe = REACTOME_dataframe, REACTOME_ids = REACTOME_ids, CC_dataframe = CC_dataframe, CC_ids = CC_ids, DO_dataframe = DO_dataframe, DO_ids = DO_ids )
This function filters pathways that meet a count threshold and then counts the presence of specified genes in those pathways.
pathway_count(GO, count_threshold, enrich_data)
pathway_count(GO, count_threshold, enrich_data)
GO |
A character vector of gene symbols. |
count_threshold |
An integer specifying the count threshold for selecting pathways. |
enrich_data |
A data frame containing pathway enrichment analysis results. |
A data frame with columns "Symble" (gene symbol), "Description" (pathway description), and "Exists" (1 if gene is present, 0 otherwise).
# Simulated gene list GO <- c("Gene1", "Gene2", "Gene3", "Gene4", "Gene5") # Simulated enrichment analysis data enrich_data <- data.frame( ID = c("GO:0001", "GO:0002", "GO:0003"), Description = c("Pathway A", "Pathway B", "Pathway C"), Count = c(10, 4, 6), geneID = c("Gene1/Gene2/Gene3", "Gene4/Gene5", "Gene2/Gene6/Gene7") ) # Example usage count_threshold <- 5 result_df <- pathway_count(GO, count_threshold, enrich_data)
# Simulated gene list GO <- c("Gene1", "Gene2", "Gene3", "Gene4", "Gene5") # Simulated enrichment analysis data enrich_data <- data.frame( ID = c("GO:0001", "GO:0002", "GO:0003"), Description = c("Pathway A", "Pathway B", "Pathway C"), Count = c(10, 4, 6), geneID = c("Gene1/Gene2/Gene3", "Gene4/Gene5", "Gene2/Gene6/Gene7") ) # Example usage count_threshold <- 5 result_df <- pathway_count(GO, count_threshold, enrich_data)
This function identifies genes present in selected pathways based on gene enrichment analysis results.
pathway_description(GO, selected_pathways_names, enrich_data)
pathway_description(GO, selected_pathways_names, enrich_data)
GO |
A character vector of gene symbols. |
selected_pathways_names |
A character vector specifying the names of selected pathways. |
enrich_data |
A data frame containing pathway enrichment analysis results. |
A data frame with columns "Symble" (gene symbol), "Description" (pathway description), and "Exists" (1 if gene is present, 0 otherwise).
GO <- c("Gene1", "Gene2", "Gene3", "Gene4", "Gene5") # Simulated enrichment analysis data enrich_data <- data.frame( ID = c("Pathway1", "Pathway2", "Pathway3", "Pathway4"), Description = c("Apoptosis", "Cell Cycle", "Signal Transduction", "Metabolism"), geneID = c("Gene1/Gene3", "Gene2/Gene4", "Gene1/Gene2/Gene3", "Gene5"), Count = c(2, 2, 3, 1), stringsAsFactors = FALSE ) # Example usage result <- pathway_description(GO, selected_pathways_names="Apoptosis", enrich_data)
GO <- c("Gene1", "Gene2", "Gene3", "Gene4", "Gene5") # Simulated enrichment analysis data enrich_data <- data.frame( ID = c("Pathway1", "Pathway2", "Pathway3", "Pathway4"), Description = c("Apoptosis", "Cell Cycle", "Signal Transduction", "Metabolism"), geneID = c("Gene1/Gene3", "Gene2/Gene4", "Gene1/Gene2/Gene3", "Gene5"), Count = c(2, 2, 3, 1), stringsAsFactors = FALSE ) # Example usage result <- pathway_description(GO, selected_pathways_names="Apoptosis", enrich_data)
This function reads a DESeq2 DEG data frame from an RDS file, filters it, adjusts the log2FoldChange to absolute values, adds a pseudo-count to pvalues, and transforms pvalues for plotting. The final data frame is returned and optionally saved to a new RDS file.
prep_deseq2(input_path, output_name = NULL)
prep_deseq2(input_path, output_name = NULL)
input_path |
Path to the RDS file containing the DESeq2 DEG data frame. |
output_name |
Name for the processed data frame, also used as the RDS file name. |
A data frame with processed DESeq2 DEG data.
deseq2_file <- system.file("extdata", "DEG_deseq2_test.rds", package = "TransProR") deseq2 <- prep_deseq2(deseq2_file)
deseq2_file <- system.file("extdata", "DEG_deseq2_test.rds", package = "TransProR") deseq2 <- prep_deseq2(deseq2_file)
This function reads an edgeR DEG data frame from an RDS file, filters it using
deg_filter
function, adjusts the logFC to absolute values, adds a pseudo-count to PValue,
and transforms PValue for plotting. The final data frame is returned and
optionally saved to a new RDS file.
prep_edgeR(input_path, output_name = NULL)
prep_edgeR(input_path, output_name = NULL)
input_path |
Path to the RDS file containing the edgeR DEG data frame. |
output_name |
Name for the processed data frame, also used as the RDS file name. |
A data frame with processed edgeR DEG data.
edgeR_file <- system.file("extdata", "DEG_edgeR_test.rds", package = "TransProR") edgeR <- prep_edgeR(edgeR_file)
edgeR_file <- system.file("extdata", "DEG_edgeR_test.rds", package = "TransProR") edgeR <- prep_edgeR(edgeR_file)
This function reads a limma-voom DEG data frame from an RDS file, filters it using
deg_filter
function, adjusts the logFC to absolute values, adds a pseudo-count to P.Value,
and transforms P.Value for plotting. The final data frame is returned and
optionally saved to a new RDS file.
prep_limma(input_path, output_name = NULL)
prep_limma(input_path, output_name = NULL)
input_path |
Path to the RDS file containing the limma-voom DEG data frame. |
output_name |
Name for the processed data frame, also used as the RDS file name. |
A data frame with processed limma-voom DEG data.
limma_file <- system.file("extdata", "DEG_limma_voom_test.rds", package = "TransProR") limma <- prep_limma(limma_file)
limma_file <- system.file("extdata", "DEG_limma_voom_test.rds", package = "TransProR") limma <- prep_limma(limma_file)
This function reads a Wilcoxon DEG data frame from an RDS file, filters it using
deg_filter
function, adjusts the log2foldChange to absolute values, adds a pseudo-count to pValues,
and transforms pValues for plotting. The final data frame is returned and
optionally saved to a new RDS file.
prep_wilcoxon(input_path, output_name = NULL)
prep_wilcoxon(input_path, output_name = NULL)
input_path |
Path to the RDS file containing the Wilcoxon DEG data frame. |
output_name |
Optional; name for the processed data frame, also used as the RDS file name. If not provided, the data frame will not be saved to file. |
A data frame with processed Wilcoxon DEG data.
wilcoxon_file <- system.file("extdata", "Wilcoxon_rank_sum_testoutRst_test.rds", package = "TransProR") Wilcoxon <- prep_wilcoxon(wilcoxon_file)
wilcoxon_file <- system.file("extdata", "Wilcoxon_rank_sum_testoutRst_test.rds", package = "TransProR") Wilcoxon <- prep_wilcoxon(wilcoxon_file)
This function processes heatmap data ('heatdata') based on a given selection option. It allows customization of column names, selection of specific columns per group, or averaging columns based on a common prefix.
process_heatdata( heatdata, selection = 1, custom_names = NULL, num_names_per_group = NULL, prefix_length = 4 )
process_heatdata( heatdata, selection = 1, custom_names = NULL, num_names_per_group = NULL, prefix_length = 4 )
heatdata |
A data frame containing the heatmap data. |
selection |
An integer specifying the processing method: - 1: Use custom names for columns. - 2: Select a given number of columns per group based on a prefix. - 3: Calculate the average of columns per group based on a prefix. |
custom_names |
A character vector of custom names for columns (used when 'selection = 1'). The length of this vector must match the number of columns in 'heatdata'. |
num_names_per_group |
An integer specifying the number of columns to select per group (used when 'selection = 2'). |
prefix_length |
An integer specifying the length of the prefix for grouping columns (used when 'selection = 2' or 'selection = 3'). Default is 4. |
A processed data frame based on the specified selection option.
# Example heatmap data frame heatdata <- data.frame( groupA_1 = c(1, 2, 3), groupA_2 = c(4, 5, 6), groupB_1 = c(7, 8, 9), groupB_2 = c(10, 11, 12) ) # Selection 1: Use custom names for columns custom_names <- c("Sample1", "Sample2", "Sample3", "Sample4") processed_data1 <- process_heatdata(heatdata, selection = 1, custom_names = custom_names) # Selection 2: Select a given number of columns per group based on a prefix processed_data2 <- process_heatdata(heatdata, selection = 2, num_names_per_group = 1) # Selection 3: Calculate the average of columns per group based on a prefix processed_data3 <- process_heatdata(heatdata, selection = 3, prefix_length = 6)
# Example heatmap data frame heatdata <- data.frame( groupA_1 = c(1, 2, 3), groupA_2 = c(4, 5, 6), groupB_1 = c(7, 8, 9), groupB_2 = c(10, 11, 12) ) # Selection 1: Use custom names for columns custom_names <- c("Sample1", "Sample2", "Sample3", "Sample4") processed_data1 <- process_heatdata(heatdata, selection = 1, custom_names = custom_names) # Selection 2: Select a given number of columns per group based on a prefix processed_data2 <- process_heatdata(heatdata, selection = 2, num_names_per_group = 1) # Selection 3: Calculate the average of columns per group based on a prefix processed_data3 <- process_heatdata(heatdata, selection = 3, prefix_length = 6)
This function reads the GTEX phenotype data from a specified path, renames its columns for better readability, and then returns a table of primary site counts.
seek_gtex_organ(path = "./download_data/GTEX_phenotype")
seek_gtex_organ(path = "./download_data/GTEX_phenotype")
path |
The path to the GTEX phenotype data file. Default is "./download_data/GTEX_phenotype". |
A table representing the count of samples per primary site.
# Get the file path to the example data in the package path <- system.file("extdata", "GTEX_phenotype_test", package = "TransProR") # Call the `seek_gtex_organ` function with the path and print the result SeekGtexOrgan <- seek_gtex_organ(path = path)
# Get the file path to the example data in the package path <- system.file("extdata", "GTEX_phenotype_test", package = "TransProR") # Call the `seek_gtex_organ` function with the path and print the result SeekGtexOrgan <- seek_gtex_organ(path = path)
This function randomly selects a specified number of pathways from a given list, ensuring that each selected pathway name does not exceed a specified number of words. It filters out pathways with names longer than the specified word limit before making the selection.
selectPathways(pathways, max_words = 10, num_select = 10)
selectPathways(pathways, max_words = 10, num_select = 10)
pathways |
Character vector of pathways. |
max_words |
Integer, maximum number of words allowed in the pathway name. |
num_select |
Integer, number of pathways to randomly select. |
A character vector of selected pathways.
pathway_list <- c("pathway_one response to stimulus", "pathway_two cell growth and death", "pathway_three regulation of cellular process", "pathway_four metabolic process") selected_pathways <- selectPathways(pathway_list, max_words = 5, num_select = 2)
pathway_list <- c("pathway_one response to stimulus", "pathway_two cell growth and death", "pathway_three regulation of cellular process", "pathway_four metabolic process") selected_pathways <- selectPathways(pathway_list, max_words = 5, num_select = 2)
This function creates a spiral plot for visualizing sequential data in a compact and visually appealing way. It uses run-length encoding to represent the lengths and colors of sequences in the spiral.
spiral_newrle(x, samples, values, colors, labels = FALSE)
spiral_newrle(x, samples, values, colors, labels = FALSE)
x |
A vector representing categories or segments. |
samples |
A vector indicating the sample each segment belongs to. |
values |
Numeric vector indicating the lengths of each segment. |
colors |
Character vector specifying the colors for each segment. |
labels |
Logical, whether to add labels to each segment. |
No return value, called for side effects. This function generates a spiral plot and optionally adds labels.
# Example: Creating a spiral plot using the spiral_newrle function # Define example data x <- c("A", "A", "B", "C") samples <- c("Sample1", "Sample1", "Sample2", "Sample2") values <- c(20, 30, 15, 35) colors <- c("red", "blue", "green", "purple") labels <- TRUE # Initialize the spiral plot, setting the x-axis range and scaling spiralize::spiral_initialize(xlim = c(0, sum(values)), scale_by = "curve_length", vp_param = list(x = grid::unit(0, "npc"), just = "left")) # Create a track for the spiral plot spiralize::spiral_track(height = 0.5) # Add segments to the spiral plot using run-length encoding spiral_newrle(x, samples, values, colors, labels)
# Example: Creating a spiral plot using the spiral_newrle function # Define example data x <- c("A", "A", "B", "C") samples <- c("Sample1", "Sample1", "Sample2", "Sample2") values <- c(20, 30, 15, 35) colors <- c("red", "blue", "green", "purple") labels <- TRUE # Initialize the spiral plot, setting the x-axis range and scaling spiralize::spiral_initialize(xlim = c(0, sum(values)), scale_by = "curve_length", vp_param = list(x = grid::unit(0, "npc"), just = "left")) # Create a track for the spiral plot spiralize::spiral_track(height = 0.5) # Add segments to the spiral plot using run-length encoding spiral_newrle(x, samples, values, colors, labels)
This function performs differential gene expression analysis using Wilcoxon rank-sum tests. It reads tumor and normal expression data, performs TMM normalization using 'edgeR', and uses Wilcoxon rank-sum tests to identify differentially expressed genes.
Wilcoxon_analyze( tumor_file, normal_file, output_file, logFC_threshold = 2.5, fdr_threshold = 0.05 )
Wilcoxon_analyze( tumor_file, normal_file, output_file, logFC_threshold = 2.5, fdr_threshold = 0.05 )
tumor_file |
Path to the tumor data file (RDS format). |
normal_file |
Path to the normal data file (RDS format). |
output_file |
Path to save the output DEG data (RDS format). |
logFC_threshold |
Threshold for log fold change for marking up/down-regulated genes. |
fdr_threshold |
Threshold for FDR for filtering significant genes. |
A data frame of differential expression results.
Li, Y., Ge, X., Peng, F., Li, W., & Li, J. J. (2022). Exaggerated False Positives by Popular Differential Expression Methods When Analyzing Human Population Samples. Genome Biology, 23(1), 79. DOI: https://doi.org/10.1186/s13059-022-02648-4.
# Define file paths for tumor and normal data from the data folder tumor_file <- system.file("extdata", "removebatch_SKCM_Skin_TCGA_exp_tumor_test.rds", package = "TransProR") normal_file <- system.file("extdata", "removebatch_SKCM_Skin_Normal_TCGA_GTEX_count_test.rds", package = "TransProR") output_file <- file.path(tempdir(), "Wilcoxon_rank_sum_testoutRst.rds") # Run the Wilcoxon rank sum test outRst <- Wilcoxon_analyze( tumor_file = tumor_file, normal_file = normal_file, output_file = output_file, logFC_threshold = 2.5, fdr_threshold = 0.01 ) # View the top 5 rows of the result head(outRst, 5)
# Define file paths for tumor and normal data from the data folder tumor_file <- system.file("extdata", "removebatch_SKCM_Skin_TCGA_exp_tumor_test.rds", package = "TransProR") normal_file <- system.file("extdata", "removebatch_SKCM_Skin_Normal_TCGA_GTEX_count_test.rds", package = "TransProR") output_file <- file.path(tempdir(), "Wilcoxon_rank_sum_testoutRst.rds") # Run the Wilcoxon rank sum test outRst <- Wilcoxon_analyze( tumor_file = tumor_file, normal_file = normal_file, output_file = output_file, logFC_threshold = 2.5, fdr_threshold = 0.01 ) # View the top 5 rows of the result head(outRst, 5)