
Classes & Data Structures
Classes used for results objects
# Required packages for the class definitions
library(readr)
library(Biostrings)
library(treeio)

# Define the necessary classes
# Class for an expanded OG, containing all the different types of data available for it
setClass("expanded_OG", slots = list(blast_table = "tbl_df",
                                     add_OG_analysis = "list"))

# Class for a hypothesis
setClass("hypothesis", slots = list(description = "character",
                                    number = "character",
                                    expanded_in = "character",
                                    compared_to = "character",
                                    expanded_OGs = "list",
                                    species_tree = "phylo"))

# Class for the additional OGs analysis
setClass("add_OG_set", slots = list(genes = "spec_tbl_df",
                                    msa = "AAStringSet",
                                    tree = "phylo"))
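As a minimal sketch of how S4 classes like these behave, the following base-R example defines a simplified stand-in for the hypothesis class and shows that new() enforces the declared slot types. The class name toy_hypothesis and all slot values are invented for illustration, and the species_tree slot is left out so that the example runs without additional packages.

```r
library(methods)

# Hypothetical stand-in for the "hypothesis" class above; the species_tree
# slot ("phylo") is omitted so the sketch needs no extra packages.
setClass("toy_hypothesis", slots = list(description  = "character",
                                        number       = "character",
                                        expanded_in  = "character",
                                        compared_to  = "character",
                                        expanded_OGs = "list"))

# Create an instance; all values are made up for illustration.
h <- new("toy_hypothesis",
         description  = "Expanded in species A compared to species B",
         number       = "1",
         expanded_in  = "species_A",
         compared_to  = "species_B",
         expanded_OGs = list())

# Slots are read with @; slotNames() lists the slots an object provides.
slot_names <- slotNames(h)
expanded_species <- h@expanded_in

# A value of the wrong type is rejected when the object is created:
# the "number" slot is declared as character, so a numeric value fails.
bad <- tryCatch(new("toy_hypothesis", number = 1),
                error = function(e) "rejected")
```

The same mechanics apply to the real classes: slots are accessed with @, and assigning a value of the wrong class to a slot raises an error.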
A2TEA.Workflow result object
If you intend to create further visualizations or perform other analyses with the underlying data, this section describes the classes and objects you need to work with the A2TEA.Workflow results.
The final output of the workflow is a single .RData file that can be loaded into an active R session with the load() command. It provides several separate objects that together contain all results in compact form:
- HYPOTHESES.a2tea - List object with one S4 object per hypothesis. Each S4 object contains several layers of nested information. For example, HYPOTHESES.a2tea$hypothesis_2@expanded_OGs$N0.HOG0001225 refers to a specific expanded OG, which is itself an S4 object that contains:
  - blast_table (complete BLAST/DIAMOND results for OG genes and extended hits)
  - add_OG_analysis (includes the multiple sequence alignment (MSA), phylogenetic tree, and gene info for the expanded OG and for additional OGs based on best BLAST/DIAMOND hits)
- HOG_level_list - List object with one tibble per hypothesis. Information includes OG, number of genes per species, boolean expansion info, number of significant DEGs, and more. The last list element is a non-redundant superset of all species analyzed over all formulated hypotheses. This makes it easy to create a comparison set, e.g. the conserved OGs of all species, to which the hypothesis subset can then be compared.
- HOG_DE.a2tea - Tibble of DESeq2 results for all genes, plus additional columns.
- A2TEA.fa.seqs - Non-redundant list object containing the amino acid FASTA sequences of all genes/transcripts in the final analysis (this includes those of expanded OGs, plus those in additional BLAST hits and additional OGs based on user-chosen parameters).
- SFA/SFA_OG_level - Gene/transcript level tables that contain functional predictions (human-readable descriptions and GO terms inferred by AHRD).
- hypotheses - A copy of the user-defined hypothesis definitions for the underlying workflow run.
- all_speciesTree - Phylogenetic tree of all species in the workflow run (a non-redundant superset of the hypotheses) as inferred by OrthoFinder/STAG/STRIDE.
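To illustrate the access pattern described above, here is a hedged base-R sketch. It builds a small mock of the result layout, saves it to a temporary .RData file, loads that file into a separate environment, and walks from the hypothesis list down to one expanded OG. The mock_* class names, the temporary file, the BLAST column names, and all values are assumptions made for this example; the real objects use the classes defined earlier and contain far more data.

```r
library(methods)

# Hypothetical, simplified stand-ins for the workflow classes.
setClass("mock_expanded_OG", slots = list(blast_table     = "data.frame",
                                          add_OG_analysis = "list"))
setClass("mock_hypothesis",  slots = list(description  = "character",
                                          expanded_OGs = "list"))

# One mock expanded OG with an invented two-row BLAST table.
og <- new("mock_expanded_OG",
          blast_table = data.frame(qseqid = c("geneA", "geneA"),
                                   sseqid = c("geneB", "geneC"),
                                   evalue = c(1e-50, 1e-20)),
          add_OG_analysis = list())

# Mimic the layout of HYPOTHESES.a2tea: a named list of S4 objects.
HYPOTHESES.a2tea <- list(
  hypothesis_1 = new("mock_hypothesis",
                     description  = "mock hypothesis",
                     expanded_OGs = list(N0.HOG0001225 = og)))

# Save the mock object to a temporary .RData file and remove the original.
rdata_file <- tempfile(fileext = ".RData")
save(HYPOTHESES.a2tea, file = rdata_file)
rm(HYPOTHESES.a2tea)

# Load into a dedicated environment; load() returns the restored object names.
results <- new.env()
loaded <- load(rdata_file, envir = results)

# List elements are reached with $, S4 slots with @.
hyp    <- results$HYPOTHESES.a2tea$hypothesis_1
og_ids <- names(hyp@expanded_OGs)
blast  <- hyp@expanded_OGs$N0.HOG0001225@blast_table
n_hits <- nrow(blast)
```

Loading into a separate environment is a design choice of this sketch: it keeps the restored objects from overwriting anything of the same name in the global workspace. Calling load() with only the file path restores the objects directly into the current environment.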