I am designing software to extract pathogen reads from RNA-seq or WGS data from human samples. The pipeline is centred on the NCBI Taxonomy database and NCBI pathogen reference genomes. I am now adding a component that uses featureCounts to process individual BAM files mapped to pathogen genomes, so that when RNA-seq reads are the input I can determine expression levels of, for example, HPV transcripts in cervical cancer.
For this I need annotation files in GTF or SAF format, as described here: http://bioinf.wehi.edu.au/featureCounts/
I could write a Perl script to convert the GFF files available for viruses into GTF or SAF, but I wanted to find out whether NCBI GTF or SAF files for viruses and bacteria are already available somewhere before spending the time and computational resources to convert 7,000 GFF files. Also, if anyone has already run into this problem, for this or any other reason, and has a useful script, that would be most helpful.
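To make the conversion concrete, here is a minimal sketch of what I have in mind, written in Python rather than Perl. It assumes the input is GFF3 with nine tab-separated columns, that counting at the `gene` feature level is what is wanted, and that a usable identifier can be taken from the `locus_tag`, `Name` or `ID` attribute (in that order). The function names `parse_attributes` and `gff_to_saf` are hypothetical, and defaulting an unknown strand to `+` is my own assumption, not something featureCounts prescribes.

```python
def parse_attributes(field):
    """Parse a GFF3 attribute column ('key=value;key=value') into a dict."""
    attrs = {}
    for pair in field.strip().split(";"):
        if "=" in pair:
            key, value = pair.split("=", 1)
            attrs[key.strip()] = value.strip()
    return attrs


def gff_to_saf(gff_lines, feature_type="gene"):
    """Convert GFF3 lines to SAF rows (GeneID, Chr, Start, End, Strand).

    Only records whose type column equals `feature_type` are kept.
    Coordinates are copied unchanged: GFF3 and SAF are both 1-based, inclusive.
    """
    rows = [("GeneID", "Chr", "Start", "End", "Strand")]
    for line in gff_lines:
        if line.startswith("##FASTA"):
            break  # an embedded sequence section follows; no more annotation
        if line.startswith("#") or not line.strip():
            continue  # skip comments, directives and blank lines
        cols = line.rstrip("\n").split("\t")
        if len(cols) != 9 or cols[2] != feature_type:
            continue
        attrs = parse_attributes(cols[8])
        # Assumption: the first of these attributes that is present is the ID.
        gene_id = attrs.get("locus_tag") or attrs.get("Name") or attrs.get("ID")
        if gene_id is None:
            continue
        # Assumption: treat an unknown strand ('.' or '?') as '+'.
        strand = cols[6] if cols[6] in ("+", "-") else "+"
        rows.append((gene_id, cols[0], cols[3], cols[4], strand))
    return rows
```

Each returned row can be joined with tabs and written to a `.saf` file, one file per genome. The part I am least sure about is which feature type and attribute are appropriate across all viral and bacterial annotations, which is why I would prefer ready-made files if they exist.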
Many thanks.