scRNA-seq: how to create the Ranked list? #50
Comments
Asma, I would recommend checking the documentation on ranking metrics in the Broad implementation of GSEA: https://software.broadinstitute.org/gsea/doc/GSEAUserGuideFrame.html With many clusters, the comparisons used to compute the metric have to depend on your dataset and your clusters (cell types, cell states). Also, in my opinion, negative enrichment in a cell type in scRNA-seq data is fairly meaningless, because it is most likely driven by another cell type in the dataset (apart from the logical exercise of explaining it).
Hi @asmariyaz23,
In the paper https://www.ahajournals.org/doi/pdf/10.1161/CIRCRESAHA.118.312804 (figure 7) I was checking the consistency of pathway enrichment between bulk RNA-seq and scRNA-seq. In my case, though, differential expression was done for one cluster vs. another (and not one cluster vs. all the others, as seems to be your case) and compared to the same populations in bulk RNA-seq. I tried several techniques to accomplish this, but the most robust was simply to use avg_logFC to rank genes. Another approach would be to use
My code was similar to this:
library(Seurat)
library(fgsea)
# whole is a Seurat object, I was comparing cluster 4 vs cluster 1
clustering <- whole@ident
cellsIn <- names(clustering[clustering == 4])
cellsOut <- names(clustering[clustering == 1])
# zero thresholds so that no genes are filtered out before ranking
markers <- FindMarkers(whole, cellsIn, cellsOut, logfc.threshold = 0, min.pct = 0, test.use = "MAST")
# named vector: values are avg_logFC, names are gene names
ranks <- markers$avg_logFC
names(ranks) <- rownames(markers)
# allPathways is predefined list of signatures
fgseaRes <- fgsea(pathways = allPathways,
                  stats = ranks,
                  minSize = 10,
                  maxSize = 500,
                  nperm = 1000000)
The important thing in this snippet is to set the thresholds to 0 so that genes are not artificially excluded from the analysis. Running FindMarkers like this (with 0 thresholds) is the closest you can get to classical DE in bulk RNA-seq (however, genes that are not expressed in either group will be excluded anyway, and this is good). In the case of one cluster vs. all other clusters, I must agree with @ToledoEM: in most cases negative enrichment won't tell you much about this particular cluster, but for positive enrichment you can use the snippet above. Cheers,
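To make the shape of the ranked list concrete, here is a minimal base-R sketch. The `markers` data frame is a hypothetical stand-in for FindMarkers() output with made-up values, and the 0.25 cutoff is only an illustrative threshold, not a recommendation.

```r
# Hypothetical stand-in for FindMarkers() output (values are made up).
markers <- data.frame(
  avg_logFC = c(1.2, -0.8, 0.05, -0.01, 2.3),
  p_val_adj = c(1e-10, 1e-4, 1, 1, 1e-30),
  row.names = c("GeneA", "GeneB", "GeneC", "GeneD", "GeneE")
)

# The ranked list is a named numeric vector:
# values are the ranking metric, names are the gene identifiers.
ranks <- setNames(markers$avg_logFC, rownames(markers))

# Sorting just makes the ranking easy to inspect.
ranks <- sort(ranks, decreasing = TRUE)
print(ranks)

# For comparison: an illustrative logFC cutoff of 0.25 would drop the
# weakly changed genes from the middle of the ranking.
filtered <- ranks[abs(ranks) >= 0.25]
print(names(filtered))
```

With the cutoff applied, GeneC and GeneD disappear from the list, which is the kind of artificial exclusion the zero thresholds are meant to avoid.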
Thank you for your replies. I am a newbie to this kind of analysis, so please excuse my naive questions. I have curated a disease-related gene list based on mutations (purely literature-based), and in parallel I ran Seurat on single-cell data (from a region related to this disease). Now I want to see whether these clusters are enriched for the curated gene list. I tried Fisher's exact test, but the odds ratio doesn't look correct, which led me to look for a package that does gene set enrichment. My idea was to supply these clusters of DE genes as the pathway list to fgsea, but I don't know how I should fit the curated gene list into this equation. Is this something I can do with this package?
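For reference, a minimal sketch of how the Fisher's exact test mentioned above could be set up in base R. All gene names and set sizes here are hypothetical. Two things that can make the odds ratio look wrong are the choice of background (universe) and the orientation of the 2x2 table, so both are made explicit; whether either applies to the case above is an assumption.

```r
# Hypothetical gene sets (made-up identifiers).
universe      <- paste0("G", 1:100)           # all genes tested in the dataset
cluster_genes <- paste0("G", 1:20)            # DE genes of one cluster
curated       <- paste0("G", c(1:8, 51:54))   # curated disease gene list

in_cluster <- universe %in% cluster_genes
in_curated <- universe %in% curated

# Fix the level order so the "TRUE/TRUE" cell is the top-left corner;
# otherwise the reported odds ratio may refer to the opposite comparison.
tab <- table(
  curated = factor(in_curated, levels = c(TRUE, FALSE)),
  cluster = factor(in_cluster, levels = c(TRUE, FALSE))
)
print(tab)

# One-sided test for over-representation of curated genes in the cluster.
res <- fisher.test(tab, alternative = "greater")
print(res$estimate)
print(res$p.value)
```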
Reopening, as this question arises regularly
Unless you are using
Why not use the formula for signal-to-noise across single cell transcriptomes from both groups?
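As a rough illustration of the signal-to-noise idea, here is a base-R sketch using the usual form of the metric, (mean_A - mean_B) / (sd_A + sd_B), computed per gene. The expression matrix and the helper name `signal_to_noise` are hypothetical. Note that with sparse single-cell data the standard deviations can be zero, which this simple version does not guard against.

```r
# Hypothetical genes-by-cells expression matrix (made-up values).
expr <- rbind(
  Gene1 = c(5, 6, 7, 8, 1, 2, 1, 2),
  Gene2 = c(3, 4, 3, 4, 3, 4, 4, 3),
  Gene3 = c(1, 2, 1, 2, 6, 7, 8, 7)
)
colnames(expr) <- paste0("Cell", 1:8)
group <- rep(c("A", "B"), each = 4)

# Hypothetical helper: per-gene signal-to-noise between groups a and b.
signal_to_noise <- function(expr, group, a, b) {
  xa <- expr[, group == a, drop = FALSE]
  xb <- expr[, group == b, drop = FALSE]
  (rowMeans(xa) - rowMeans(xb)) / (apply(xa, 1, sd) + apply(xb, 1, sd))
}

# Named, sorted vector that could serve as a ranked list.
ranks <- sort(signal_to_noise(expr, group, "A", "B"), decreasing = TRUE)
print(ranks)
```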
@Close-your-eyes I guess you can use that as well if it works for you. Although I would say that logFC can be estimated more directly by the differential expression method.
Hello. What I'm doing is actually comparing post-treatment samples to pre-treatment. I'm separating positive and negative differentially expressed genes (DEGs) and running fgsea on each. For the DEGs with positive log2FC, I take the GO terms with positive NES. Thanks a lot.
Dear all, By doing Don't you think it would be better to set some min.pct, for example 0.01 (the default in Seurat5) or 0.1, to remove from the analysis genes that are not expressed in either of the two groups of cells? As far as I know, we should remove these genes before using MAST. Moreover, if we use
As you can see, for Xkr4, for example, it seems to me that the avg_log2FC is wrong, since there is no expression in any of the groups:
For fgsea, if we use the FindMarkers strategy and then the -sign(avg_logFC) * log10(p.adj) method to rank the genes, then I suppose the effect will be minimal, since log10(1) is zero and the whole value becomes zero:
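A small base-R sketch of that point, with a hypothetical `markers` data frame and made-up values: genes with an adjusted p-value of 1 all receive a score of exactly zero, whatever their fold change, so they end up tied in the middle of the ranking.

```r
# Hypothetical stand-in for FindMarkers() output (values are made up).
markers <- data.frame(
  avg_log2FC = c(1.5, -2.0, 0.3, -0.4),
  p_val_adj  = c(1e-8, 1e-5, 1, 1),
  row.names  = c("GeneA", "GeneB", "GeneC", "GeneD")
)

# Signed significance score, as written in the comment above.
score <- setNames(
  -sign(markers$avg_log2FC) * log10(markers$p_val_adj),
  rownames(markers)
)
print(score)

# Genes with p.adj == 1 collapse to zero and cannot be ordered
# relative to each other.
tied <- names(score)[score == 0]
print(tied)
```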
Any updates, please, on how to rank genes from Seurat's FindMarkers() for disease compared to control? Thank you.
@ToledoEM thank you. I meant to ask how to rank: is log2FC sufficient to rank the genes, without taking the p-value or adjusted p-value into consideration? Seurat has changed the way FindMarkers() computes DEGs, and it now returns more genes than before.
@Sirin24 I think @tmontserrat's suggestion to add a filter for
Thanks @assaron, I gave "-sign(avg_logFC) * log10(p.adj)" a try and it led to infinite values, which would be problematic, I think. Maybe I should try to modify pct... By pct, do you mean pct.1 only?
No, you should be able to use
Yes, it's not great, as the ranking can't differentiate these genes. Technically, though, it should be OK: fgsea should replace the infinite values with finite ones, producing a warning.
@assaron unfortunately I do not think fgsea can handle infinite values. msigdbr_7.5.1 fgsea_1.30.0
I replaced the infinite values with min and max values, and only 1 pathway passed at an adjusted p-value of 0.05 (I ranked my gene list by -sign(avg_logFC) * log10(p.adj)). I am not sure whether this approach works after FindMarkers() with MAST.
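One possible way to do that replacement is sketched below in base R, with hypothetical values. The infinite scores come from adjusted p-values that underflow to exactly 0; flooring them at the smallest positive double before taking the logarithm keeps every score finite. This is only one option and an assumption about what "min and max values" meant above; genes that share the floored p-value and the same sign remain tied.

```r
# Hypothetical per-gene statistics (made-up values); p.adj of 0 mimics underflow.
lfc  <- c(GeneA = 2.1, GeneB = -1.7, GeneC = 0.9, GeneD = -0.2)
padj <- c(GeneA = 0,   GeneB = 0,    GeneC = 1e-12, GeneD = 0.5)

# Naive score: log10(0) is -Inf, so GeneA and GeneB become +Inf / -Inf.
score <- -sign(lfc) * log10(padj)
print(score)

# Floor the p-values at the smallest positive normalized double.
padj_floor  <- pmax(padj, .Machine$double.xmin)
score_floor <- -sign(lfc) * log10(padj_floor)
print(score_floor)
```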
Hello,
I have some clusters with differentially expressed genes generated using Seurat. I also have a highly curated gene list, and I wish to perform enrichment analysis with it on each cluster, picking the top X genes. I understand that the differentially expressed genes in each cluster can be input to your algorithm as a pathway. But I am confused about what the ranked list should be. Is it the curated gene list? If yes, could you explain which values should go into this list?
Thank you,
Asma