Hi,
I have 8 FASTQ files with small RNA sequencing data and I would like to align the reads to a reference (.fa), discarding every read that does not align perfectly to a reference sequence. In addition, a perfectly aligned read should only be kept if it has the same length as the reference sequence it aligned to. Example (not the real data, just to show what I mean):
Ref sequence (.fa):
CAATCGATCGATGCTAGTC
sample sequences (.fastq):
GCAATCGATCGATGC
CAATCGATCGATGCTAGTC
AATCGATCGATGCTAGTC
GTACCATCGACT
Expected output from bowtie2 (.sam):
CAATCGATCGATGCTAGTC
This is the command I used:
bowtie2 -L 6 -i S,0,0.5 --rdg 1,6 --rfg 1,6 --norc --score-min C,0,-1 -p 8 -x "INDEX" (input_file.fastq) > (output_file.sam)
The command runs and produces output, but not exactly what I expected.
Bowtie2 output from command:
CAATCGATCGATGCTAGTC
AATCGATCGATGCTAGTC
How can I change the bowtie2 command so that perfectly aligned reads that are shorter or longer than the reference sequence are removed? Or can I use samtools to remove the shorter/longer reads afterwards?
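To make the filtering I have in mind concrete, here is a rough sketch of a post-processing step in plain awk, not a tested solution. It assumes the SAM file still contains its @SQ header lines (used to look up each reference length) and that each alignment carries the NM:i edit-distance tag, which bowtie2 writes for aligned reads. The function name filter_full_length is just a placeholder I made up.

```shell
# Sketch: keep only alignments where the read covers the whole reference exactly.
# A read is kept when it starts at reference position 1, its CIGAR is "<ref length>M",
# its sequence is as long as the reference, and the edit distance (NM:i) is 0.
# Usage (file names are placeholders): filter_full_length output_file.sam > filtered.sam
filter_full_length() {
  awk -F'\t' '
    /^@/ {
      # Header line: remember each reference length from @SQ (SN:name, LN:length).
      if ($1 == "@SQ") {
        name = ""; len = 0
        for (i = 2; i <= NF; i++) {
          if ($i ~ /^SN:/) name = substr($i, 4)
          if ($i ~ /^LN:/) len = substr($i, 4) + 0
        }
        reflen[name] = len
      }
      print   # pass the header through unchanged
      next
    }
    # Alignment line: $3 = reference name, $4 = 1-based start, $6 = CIGAR, $10 = read.
    # Unaligned reads have "*" as reference name and are dropped here as well.
    ($3 in reflen) && $4 == 1 && $6 == (reflen[$3] "M") &&
      length($10) == reflen[$3] && /\tNM:i:0(\t|$)/
  ' "$1"
}
```

With the example above, this should keep CAATCGATCGATGCTAGTC (19M starting at position 1) and drop AATCGATCGATGCTAGTC (18M starting at position 2). Since I run with --norc, reverse-strand alignments are not handled.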
Thanks in advance!