Hi,
I'm very new to this, so apologies if I haven't given enough information.
I'm trying to measure the change in ERV expression before and after treating human cell lines with a drug. I have been using the gEVE database, which provides a GTF annotation for the human genome (http://geve.med.u-tokai.ac.jp/download/).
My pipeline is: generate genome indices using STAR --> trim reads (FASTQ files) using Trimmomatic --> sort, remove blacklisted regions --> align trimmed reads to the genome using STAR again --> count reads using featureCounts.
Everything seems fine except the read counting: I get a very low percentage of assigned reads (usually 0.1%), and the unassigned reads are mainly reported as "Unassigned_NoFeatures". This is my input for featureCounts:
library(Rsubread)
fc_ERV <- featureCounts(files = filenames,
                        annot.ext = "Hsap38.geve.v1.gtf",
                        isGTFAnnotationFile = TRUE,
                        GTF.featureType = "CDS",
                        countMultiMappingReads = TRUE,
                        genome = "Homo_sapiens.GRCh38.dna.primary_assembly.fa",
                        isPairedEnd = TRUE,
                        nthreads = 20)
I have used GTF.featureType="CDS" because the GTF from gEVE has no "exon" entries in its feature-type column (column 3), only "CDS".
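One check that may help narrow this down is to tabulate what the GTF actually contains. The following is a minimal sketch in base R, not a definitive diagnostic: the function name summarise_gtf and the two-line demo GTF are made up for illustration, and the real file would be passed in place of the temporary one. It lists the feature types in column 3 and the sequence names in column 1.

```r
# Summarise a GTF: feature types (column 3) and sequence names (column 1).
# summarise_gtf is a hypothetical helper name, not part of Rsubread.
summarise_gtf <- function(gtf_path) {
  lines <- readLines(gtf_path)
  # Drop empty lines and header/comment lines starting with "#"
  lines <- lines[nzchar(lines) & !startsWith(lines, "#")]
  fields <- strsplit(lines, "\t", fixed = TRUE)
  seqname <- vapply(fields, `[`, character(1), 1)
  feature <- vapply(fields, `[`, character(1), 3)
  list(feature_types = table(feature),
       seqnames = unique(seqname))
}

# Tiny made-up GTF so the sketch runs without the real annotation file
demo_lines <- c(
  "chr1\tdemo\tCDS\t100\t400\t.\t+\t0\tgene_id \"demo1\"; transcript_id \"demo1\";",
  "chr2\tdemo\tCDS\t500\t900\t.\t-\t0\tgene_id \"demo2\"; transcript_id \"demo2\";"
)
demo_gtf <- tempfile(fileext = ".gtf")
writeLines(demo_lines, demo_gtf)

demo_summary <- summarise_gtf(demo_gtf)
print(demo_summary$feature_types)
print(demo_summary$seqnames)
```

The sequence names can then be compared with the header names in the genome FASTA used for alignment. A mismatch in chromosome naming (for example chr1 in the annotation versus 1 in the FASTA) is a common reason for reads being reported as Unassigned_NoFeatures, although whether that applies here is an assumption that would need checking against the real files.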
Any ideas on what I'm doing wrong? I am new to bioinformatics, so any help would be much appreciated.