Tutorial

What files do I need to provide?

A PEBBA analysis takes 2 input files: a DEG file and a GMT file. For your convenience we make 2 common GMT files available, so you only need to upload a GMT file if you would like to use your own custom one. Attention: if the genes described in your DEG file are not present in the GMT file, or if you pick the wrong column, the analysis won't work!

Parameter choice

There are 3 obligatory parameters that PEBBA needs in order to read your DEG file correctly. These are the names of the gene ID column (which will be matched against the GMT file), the log_2 FC column and the P-value column.
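Before uploading, it can help to confirm that the column names exist and that the gene IDs actually overlap with the GMT file. The following is a minimal sketch of such a check, not part of PEBBA itself: the function names `read_gmt` and `check_deg_columns` are hypothetical, the DEG file is assumed to be tab-delimited with a header row, and the GMT file is assumed to follow the usual layout of one gene set per line (name, description, then genes, all tab-separated).

```python
import csv
import io


def read_gmt(handle):
    """Parse GMT text into {gene_set_name: set_of_genes}."""
    gene_sets = {}
    for line in handle:
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 3:
            continue  # skip blank or malformed lines
        # fields[0] is the set name, fields[1] a description, the rest are genes
        gene_sets[fields[0]] = {g for g in fields[2:] if g}
    return gene_sets


def check_deg_columns(handle, gene_col, logfc_col, pvalue_col, gene_sets):
    """Return (genes found in the GMT file, total genes in the DEG file).

    Assumes a tab-delimited DEG file with a header row (an assumption of
    this sketch). Raises ValueError if a column name is missing.
    """
    reader = csv.DictReader(handle, delimiter="\t")
    header = reader.fieldnames or []
    missing = [c for c in (gene_col, logfc_col, pvalue_col) if c not in header]
    if missing:
        raise ValueError(f"columns not found in DEG file: {missing}")
    deg_genes = {row[gene_col] for row in reader if row[gene_col]}
    gmt_genes = set().union(*gene_sets.values())
    return len(deg_genes & gmt_genes), len(deg_genes)


# Tiny in-memory example; real files would be opened with open(path).
example_gmt = io.StringIO("PATHWAY_A\tna\tTP53\tBRCA1\nPATHWAY_B\tna\tEGFR\n")
example_deg = io.StringIO("gene\tlogFC\tP.Value\nTP53\t1.2\t0.001\nXYZ\t-0.5\t0.2\n")
matched, total = check_deg_columns(
    example_deg, "gene", "logFC", "P.Value", read_gmt(example_gmt)
)
print(f"{matched} of {total} DEG genes are present in the GMT file")
```

If the number of matched genes is zero or very small, the gene ID column is probably the wrong one, or the DEG and GMT files use different identifier types.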

If you wish to customize your PEBBA analysis, you may configure 4 additional parameters:

  • Min and max genes: These 2 parameters control the range of the analysis. PEBBA starts by ranking the genes by their calculated pi-values and running the ORA on the top "min genes" genes; the result generates one row of the final heatmap. PEBBA then repeats this process, adding 50 genes to the analysis each time, until it reaches the "max genes" size.
  • Bar graphs significance threshold: This parameter is the threshold used to create the barplots. Each bar counts the number of times the ORA reached a significance value higher than the threshold for a given row (ranking size) or column (pathway).
  • Minimum significance score: It is not uncommon for a meta-analysis such as the one done in PEBBA to arrive at many insignificant results. To make the results easier to read, pathways whose significance values are near 0 in all rows are cut from the heatmap (these results are not excluded from the csv file, only from the html heatmap file). This parameter is provided for users who would like to raise or lower this threshold.
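The min/max genes loop described above can be sketched as follows. This is an illustrative approximation under stated assumptions, not PEBBA's actual implementation: the pi-value is assumed to be the log_2 FC multiplied by -log_10 of the P-value, the ORA is assumed to be a one-sided hypergeometric test, the gene universe is assumed to be all genes in the DEG file, and only the "up" ranking is shown. All function names are hypothetical.

```python
import math


def pi_value(log2_fc, p_value):
    # Assumed definition: fold change weighted by significance.
    # P-values are clamped to avoid log10(0).
    return log2_fc * -math.log10(max(p_value, 1e-300))


def ora_p_value(top_genes, pathway, universe_size):
    """Upper-tail hypergeometric P-value for the overlap of two gene sets."""
    n, K = len(top_genes), len(pathway)
    k = len(top_genes & pathway)
    total = math.comb(universe_size, n)
    tail = sum(
        math.comb(K, i) * math.comb(universe_size - K, n - i)
        for i in range(k, min(K, n) + 1)
    )
    return tail / total


def scan_cutoffs(deg, gene_sets, min_genes, max_genes, step=50):
    """deg maps gene -> (log2_fc, p_value).

    Returns {n_genes: {pathway: -log10(ORA P-value)}}, one entry per
    heatmap row, for the "up" direction (largest pi-values first).
    """
    ranked = sorted(deg, key=lambda g: pi_value(*deg[g]), reverse=True)
    universe = set(ranked)
    rows = {}
    for n in range(min_genes, min(max_genes, len(ranked)) + 1, step):
        top = set(ranked[:n])
        rows[n] = {}
        for name, genes in gene_sets.items():
            p = ora_p_value(top, genes & universe, len(universe))
            rows[n][name] = -math.log10(max(p, 1e-300))
    return rows
```

Each key of the returned dictionary corresponds to one ranking size, and each inner value is the score that would colour one square of the heatmap. A "down" ranking could be obtained by sorting in ascending order instead, and an "any" ranking by sorting on the absolute pi-value; both are likewise assumptions of this sketch.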

Interpreting the results

The raw results of the PEBBA analysis are provided as a simple csv file. Alongside it we provide 3 html files with plots of this data. The 3 html files represent "up" (ranking by increase in gene expression), "down" (ranking by decrease in gene expression) and "any" (ranking by magnitude of change in gene expression, be it an increase or a decrease).

Each html file is composed of 3 interactive plots: one heatmap and two barplots. Each column of the heatmap represents a different pathway/group of genes described in the GMT file, while each row represents the number NG of genes used in an analysis. The hue of a square represents the -log_10 p-value of the ORA for a given pathway and NG value. The 2 barplots serve as a visual aid for interpreting the heatmap: given a threshold value, they count how many times the ORA score surpassed this threshold in a given row or column.
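To make the relationship between the heatmap, the barplots and the two threshold parameters concrete, here is a small sketch. It assumes the heatmap is held as a nested dictionary of -log_10 p-values keyed by NG and then by pathway; the function names are hypothetical, and the exact comparison PEBBA uses for the minimum significance score (here: keep a pathway if any row reaches the score) is an assumption.

```python
def bar_counts(heatmap, threshold):
    """Count scores above the threshold per row (NG) and per column (pathway)."""
    row_counts = {
        ng: sum(score > threshold for score in row.values())
        for ng, row in heatmap.items()
    }
    col_counts = {}
    for row in heatmap.values():
        for pathway, score in row.items():
            col_counts[pathway] = col_counts.get(pathway, 0) + int(score > threshold)
    return row_counts, col_counts


def drop_flat_pathways(heatmap, min_score):
    """Remove pathways that never reach min_score in any row."""
    keep = {
        pathway
        for row in heatmap.values()
        for pathway, score in row.items()
        if score >= min_score
    }
    return {
        ng: {p: s for p, s in row.items() if p in keep}
        for ng, row in heatmap.items()
    }


# Toy heatmap: two ranking sizes, two pathways.
heatmap = {
    100: {"PATHWAY_A": 3.0, "PATHWAY_B": 0.1},
    150: {"PATHWAY_A": 0.5, "PATHWAY_B": 0.2},
}
rows, cols = bar_counts(heatmap, threshold=1.0)
print(rows)  # {100: 1, 150: 0}
print(cols)  # {'PATHWAY_A': 1, 'PATHWAY_B': 0}
print(drop_flat_pathways(heatmap, min_score=1.0))
```

In this toy example PATHWAY_B never exceeds the score of 1.0, so it would be hidden from the heatmap while remaining available in the csv results.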

For more details about the inner workings of PEBBA, you may refer to our paper or to its open source implementation.