Hi pkerrwall your answer to this question:
The problem with this script is that it returns the reference DNA sequence (from the genome fasta file used for mapping) for the transcripts and not the assembly from the mapped reads.
Does anyone know of a tool that can give the assembled transcript sequence from the RNA-seq data? Or can someone help me if I'm misunderstanding GFFRead?
Here is the process that I am using:
samtools mpileup -uf ref.fa accepted_hits.bam | bcftools view -cg - | vcfutils.pl vcf2fq | fq2fa.pl > new_ref.fa
gffread -w transcripts.fa -g new_ref.fa transcripts.gtf
where fq2fa.pl is a bioperl script to convert from fq to fasta
I also have an email into the cufflinks developers to see if there is a way that the gffread utility can be enhanced to get the consensus sequence from the bam file and not the reference genome.
Could you make the bioperl script: fq2fa.pl and vcfutils.pl and vcf2fq available.
Lizex
The problem with this script is that it returns the reference DNA sequence (from the genome fasta file used for mapping) for the transcripts and not the assembly from the mapped reads.
Does anyone know of a tool that can give the assembled transcript sequence from the RNA-seq data? Or can someone help me if I'm misunderstanding GFFRead?
Here is the process that I am using:
samtools mpileup -uf ref.fa accepted_hits.bam | bcftools view -cg - | vcfutils.pl vcf2fq | fq2fa.pl > new_ref.fa
gffread -w transcripts.fa -g new_ref.fa transcripts.gtf
where fq2fa.pl is a bioperl script to convert from fq to fasta
I also have an email into the cufflinks developers to see if there is a way that the gffread utility can be enhanced to get the consensus sequence from the bam file and not the reference genome.
Could you make the bioperl script: fq2fa.pl and vcfutils.pl and vcf2fq available.
Lizex