04-10-2015, 09:52 PM   #1
Alun3.1
Junior Member
 
Location: alberta

Join Date: Feb 2015
Posts: 8
Basic question: mapping reads/contigs (Bowtie2)

Hi,

I am learning bioinformatics and have a basic question about bowtie2.

I have two different sets of single-end sequencing reads:
- set #1: 10,000,000 reads
- set #2: 12,000,000 reads

I concatenated the two datasets (22,000,000 reads in total) and assembled them with Trinity, which produced 100,000 contigs.

Now I am trying to work out which contigs come from which set of data.

For that, I built a bowtie2 index from the contigs and ran bowtie2 on it with the reads of the first set, then did the same with the second set separately:
- indexed contigs VS set #1
- indexed contigs VS set #2

I used the bowtie2 --end-to-end option, plus the --al option in order to output the contig sequences that aligned to the reads, and it returned:
- overall alignment rate of contigs VS set #1 = 80% (8,000,000 sequences)
- aligned contig file set #1: 15,000,000 sequences

- overall alignment rate of contigs VS set #2 = 78% (9,360,000 sequences)
- aligned contig file set #2: 19,000,000 sequences
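
For reference, the two runs described above can be sketched roughly as follows. This is only an illustration written as Python argument lists: the file names (contigs.fasta, set1.fastq, set2.fastq), the index prefix and the helper function are placeholders, not my exact command lines. The note on --al is taken from the bowtie2 manual.

```python
def bowtie2_commands(contigs_fasta, index_prefix, reads_fastq, out_prefix):
    """Return argv lists for building the contig index and aligning one read set.

    All paths are hypothetical placeholders.
    """
    build = ["bowtie2-build", contigs_fasta, index_prefix]
    align = [
        "bowtie2", "--end-to-end",
        "-x", index_prefix,            # index built from the Trinity contigs
        "-U", reads_fastq,             # unpaired (single-end) reads
        "-S", out_prefix + ".sam",     # alignments in SAM format
        # bowtie2 manual: --al writes the unpaired reads that aligned at least once
        "--al", out_prefix + ".aligned.fastq",
    ]
    return build, align


for name in ("set1", "set2"):
    build, align = bowtie2_commands("contigs.fasta", "contigs_idx", name + ".fastq", name)
    print(" ".join(build))
    print(" ".join(align))
```

The lists can be passed to subprocess.run() on a machine where bowtie2 is installed; here they are only printed.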

How can I end up with more contigs than Trinity produced, and more than the number of reads?

I am clearly doing something wrong.
How can I obtain the sequences of the contigs from set #1 and the sequences of the contigs from set #2?
Is --al the right option?
Should I start digging into the SAM file instead?
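
If the SAM file is the way to go, this is a minimal sketch of what I have in mind, assuming the standard SAM layout (tab-separated, FLAG in column 2 with bit 0x4 meaning "unmapped", reference name in column 3). The function name and the example records are made up for illustration; with real data the lines would come from the SAM file of each run.

```python
def contigs_with_hits(sam_lines):
    """Collect the names of reference sequences (contigs) with at least one aligned read."""
    contigs = set()
    for line in sam_lines:
        if line.startswith("@"):          # header lines (@HD, @SQ, @PG, ...)
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 3:               # skip blank or malformed lines
            continue
        flag = int(fields[1])
        rname = fields[2]
        if flag & 0x4 or rname == "*":    # read did not align
            continue
        contigs.add(rname)
    return contigs


# Made-up records standing in for the SAM output of each run.
sam_set1 = [
    "@SQ\tSN:contig_1\tLN:500",
    "@SQ\tSN:contig_2\tLN:300",
    "read_a\t0\tcontig_1\t10\t42\t4M\t*\t0\t0\tACGT\tIIII",
    "read_b\t4\t*\t0\t0\t*\t*\t0\t0\tACGT\tIIII",
]
sam_set2 = [
    "@SQ\tSN:contig_1\tLN:500",
    "@SQ\tSN:contig_2\tLN:300",
    "read_x\t16\tcontig_1\t20\t42\t4M\t*\t0\t0\tACGT\tIIII",
    "read_y\t0\tcontig_2\t5\t42\t4M\t*\t0\t0\tACGT\tIIII",
]

hits1 = contigs_with_hits(sam_set1)
hits2 = contigs_with_hits(sam_set2)
print("only set #1:", sorted(hits1 - hits2))
print("only set #2:", sorted(hits2 - hits1))
print("both sets:  ", sorted(hits1 & hits2))
```

For a real file, an open file handle can be passed directly, e.g. contigs_with_hits(open("set1.sam")), and the resulting names used to pull the matching sequences out of the Trinity FASTA.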

Thanks in advance for your help!