Why do I have a different number of genes in my estimated gene counts?

2 years ago

#13959

pythonbeginner

I am analyzing bulk RNA-seq data. I used kallisto to pseudoalign my reads to the transcriptome, and then used tximport to assign Ensembl gene names to my counts. I am comparing my current results with data that were processed 4 years ago, and I noticed that the older analysis produced an estimated gene counts table with ~50,000 genes, while I now get about half that number. Is it possible to see which version of the gene annotation I am using? Could the difference in the total number of genes be caused by an update to the Ensembl dataset I am using?

I am loading the Ensembl dataset with the code below:

mart <- biomaRt::useMart("ensembl", dataset = "hsapiens_gene_ensembl", host = "uswest.ensembl.org", ensemblRedirect = FALSE)

I also noticed that the estimated gene counts from 4 years ago contain thousands of gene names similar to AC253536.2 (they all start with AC), but the version I am using now does not output any gene names like this. Does anyone know why those were removed?

Thank you

bioinformatics

bioconductor

rna-seq

biomart

0 Answers
