My colleagues and I are assembling transcriptomes from RNASeq data. Our pipeline can (loosely) be outlined as follows:
1. De novo assembly using all RNASeq read data
2. Identification of potential coding DNA sequences (CDS)
3. Alignment of individual RNASeq libraries against CDS sequences to create BAM files
4. Phasing of BAM files into individual haplotypes
5. Generation of consensus sequences from phased BAM files
I have been trying to phase the BAM files from a small dataset with only three putative CDS. Looking at the data in Geneious, there are what appear to be very obvious SNPs and heterozygous sites. However, Samtools does not sort the reads carrying the different alleles into separate BAM files. Rather, each output file seems to contain a random mixture of reads.
Any suggestions as to why this might be the case?
NB: BAM files were created in Bowtie2 version 2.2.5
NB: We are using Samtools version 1.3
NB: A picture of one of our BAM files is below. There are clearly T/G and C/A polymorphisms, and the G allele seems to be linked with the A allele. However, Samtools did not separate the reads into separate BAM files.
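To make the linkage claim concrete, here is a minimal sketch of how the allele pairs could be tallied on reads that span both polymorphic sites. It is only an illustration: the function name `haplotype_counts`, the positions, and the toy reads are all hypothetical, and it assumes ungapped alignments (no indels or soft clipping). Real reads would have to be pulled from the BAM file first.

```python
from collections import Counter


def haplotype_counts(reads, pos_a, pos_b):
    """Count the allele pairs observed on reads covering both positions.

    reads: iterable of (start, sequence) tuples, where start is the
    0-based reference position of the first base. Assumes ungapped
    alignments, which is a simplification for illustration only.
    """
    counts = Counter()
    for start, seq in reads:
        end = start + len(seq)
        # Only reads spanning both sites are informative for phasing.
        if start <= pos_a < end and start <= pos_b < end:
            counts[(seq[pos_a - start], seq[pos_b - start])] += 1
    return counts


# Hypothetical toy reads, NOT taken from our data.
toy_reads = [
    (0, "ACGTGACA"),  # covers both sites: T at 3, C at 6
    (0, "ACGGGAAA"),  # covers both sites: G at 3, A at 6
    (2, "GGGAAA"),    # covers both sites: G at 3, A at 6
    (1, "CGTGAC"),    # covers both sites: T at 3, C at 6
    (4, "GACA"),      # starts after site 3, so it is skipped
]

for pair, n in sorted(haplotype_counts(toy_reads, 3, 6).items()):
    print(pair, n)
```

If the two sites really are in phase, almost all spanning reads should fall into two allele pairs (here T/C and G/A), with the other two combinations rare or absent. Running the same tally on each phased output file would show whether the files truly contain a mixture of both haplotypes.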