In STACKS, I ran process_radtags and got close to 20 million retained reads. They are paired-end 150 bp reads.
/home/srile14/stacks-1.48/process_radtags -p /work/srile14/FastqFiles_MS2_044_Kelly_OysterRADseq/ \
--paired \
-i gzfastq \
-b /work/srile14/demulti --inline_inline \
-o /work/srile14/test-out \
-c -q -r -t 140 -w 0.15 -s 10 \
--renz_1 xbaI \
--renz_2 ecoRI \
--adapter_mm 2 \
--adapter_1 AGATCGGAAGAGCACACGTCTGAACTCCAGTCAC \
--adapter_2 AGATCGGAAGAGCGTCGTGTAGGGAAAGAGTGT
20272558 total sequences
142938 reads contained adapter sequence (0.7%)
356902 ambiguous barcode drops (1.8%)
0 low quality read drops (0.0%)
19436 ambiguous RAD-Tag drops (0.1%)
19753282 retained reads (97.4%)
When I go to align those reads to the genome I created with bowtie2-build, I get only 14,641 reads.
bowtie2 -q -x /work/srile14/virginica_genome/virginica_genome \
-1 Sample_96.1.fq,Sample_95.1.fq,Sample_94.1.fq,....Sample_1.1.fq \
-2 Sample_96.2.fq,Sample_95.2.fq,Sample_94.2.fq,....Sample_1.2.fq \
-S /work/srile14/stdout
14641 reads; of these:
14641 (100.00%) were paired; of these:
4171 (28.49%) aligned concordantly 0 times
5341 (36.48%) aligned concordantly exactly 1 time
5129 (35.03%) aligned concordantly >1 times
----
4171 pairs aligned concordantly 0 times; of these:
88 (2.11%) aligned discordantly 1 time
----
4083 pairs aligned 0 times concordantly or discordantly; of these:
8166 mates make up the pairs; of these:
5678 (69.53%) aligned 0 times
1199 (14.68%) aligned exactly 1 time
1289 (15.78%) aligned >1 times
80.61% overall alignment rate
The overall alignment rate is high, but the total number of reads bowtie2 processed seems extremely low (14,641 out of about 20 million). Am I missing something, or is this common for longer reads?
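For reference, here is a rough check of how the 80.61% figure follows from the counts in the bowtie2 summary above. This is only a sketch of the arithmetic, and the variable names are my own, not anything bowtie2 defines. It counts both mates of every pair that aligned concordantly or discordantly, plus the mates that aligned on their own, and divides by the total number of mates. The rate therefore describes only the 14,641 pairs bowtie2 actually read, not the full process_radtags output.

```python
# Counts copied from the bowtie2 summary in this post.
pairs_total = 14641
concordant_once = 5341
concordant_multi = 5129
discordant_once = 88
single_mates_once = 1199    # mates from pairs that did not align as a pair
single_mates_multi = 1289

# Each aligned pair contributes two aligned mates.
aligned_pairs = concordant_once + concordant_multi + discordant_once
aligned_mates = 2 * aligned_pairs + single_mates_once + single_mates_multi

total_mates = 2 * pairs_total
overall_rate = 100.0 * aligned_mates / total_mates

print(f"{aligned_mates} of {total_mates} mates aligned ({overall_rate:.2f}%)")
```

So the high percentage and the low read count are not in conflict: the percentage is computed only over the pairs that made it into the alignment run.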
Thanks in advance,
Scott