SEQanswers

Alphabets 03-28-2016 07:38 PM

Three reads with the same name in the BAM file
 
Hi all,

I am working with a paired-end BAM file and get many warnings like this:

Code:

WARNING: Could not find pair for HWI-ST430:177:2:1:4979:15503#0
WARNING: Could not find pair for HWI-ST430:177:2:1:5127:13427#0
WARNING: Could not find pair for HWI-ST430:177:2:1:6521:21452#0

I checked the reads named in the warnings in the BAM file and found that every one of them has three records with the same name. For example:

Code:

HWI-ST430:177:2:1:4979:15503#0        65        chr32        26100696        60        79M21S        chr5        36697147        0        ACTTTGCAATTTAAGTTTTACTTACTTTTTAACTAATATACATGCCTAAAATTTACAAAAACAATAATAAAAACAACAGAACACTGGAAACATTTTTAAA        >;=<>=<<=======<====;===;=======<=>>>>>><=>>==>>>>=>>>>==>?>=<<==>?>>>?>?==><=?>><=<>>>?>?=>??>?===>        BD:Z:[email protected][email protected]@@[email protected]@        MD:Z:79        PG:Z:MarkDuplicates        RG:Z:Basenji        BI:Z:FFIECHGIHFEAFEEHEAAFFHDFFHDAAAFEEIHFGGHGGGHHGHHHFBBGFBGGGHBBBFGHGGFGGFBBBGHIGHJGHGHFKJJJJEIKLJGHBGFB        NM:i:0        AS:i:79        XS:i:19
HWI-ST430:177:2:1:4979:15503#0        129        chr5        36697147        60        72M28S        chr32        26100696        0        ATTTGCCCCTGGGCTATTTTTTTCCTNCCATGTAAGATTCCGTTTTAAAAATGTTTCCAGTGTTCTGTTGTTTTTATTATTGTTTTTGTAAATTTTAGGC        ===<=<<<<====<=>========<<!<<<=><<=>>>>>=5=>>>>>>>>>>=>>>==>=>=>>>>=?>=>>>>>>>>=?>=>>>?>>>??>??>;<=>        SA:Z:chr32,26100739,-,36M64S,60,0;        BD:Z:[email protected][email protected]@@EGGEGGGFHAAAHGJHBJJDDEHHI        MD:Z:26T37T7        PG:Z:MarkDuplicates        RG:Z:Basenji        BI:Z:FFFBHHHFFHGGDGHGGEAAAAADFGEEEIHHGHFFFGFEGHHFBBGFBBBGHGFBEGIIIFGFEFHGFHHGCCCHIGHIGHHGDDDIIKIFKJGHGHGH        NM:i:2        AS:i:65        XS:i:21
HWI-ST430:177:2:1:4979:15503#0        401        chr32        26100739        60        36M64H        =        26100696        -79        GCCTAAAATTTACAAAAACAATAATAAAAACAACAG        ===<=>>=>>===>===<=>===========>;===        SA:Z:chr5,36697147,+,72M28S,60,2;        BD:Z:[email protected]@AHHIJFIFF        MD:Z:36        PG:Z:MarkDuplicates        RG:Z:Basenji        BI:Z:HGHGBBFFAEGFFAAAEFFEGFEGFABBFGHGGHFF        NM:i:0        AS:i:36        XS:i:22
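
The check described above (counting how many records share each read name) can be scripted. This is a minimal Python sketch, assuming the records are available as SAM text, for example from `samtools view`, with whitespace-separated columns so that the first column is the read name (QNAME). The function names are hypothetical, not part of any tool mentioned in this thread.

```python
from collections import Counter
from typing import Iterable, List


def records_per_name(sam_lines: Iterable[str]) -> Counter:
    """Count how many alignment records share each read name (QNAME, column 1)."""
    counts: Counter = Counter()
    for line in sam_lines:
        if not line.strip() or line.startswith("@"):
            continue  # skip blank lines and SAM header lines
        counts[line.split()[0]] += 1
    return counts


def names_with_extra_records(sam_lines: Iterable[str]) -> List[str]:
    """Return read names with more than two records, i.e. more than one per mate."""
    return sorted(name for name, n in records_per_name(sam_lines).items() if n > 2)


if __name__ == "__main__":
    # Abbreviated copies of the three records above (QNAME, FLAG, RNAME, POS only).
    example = [
        "HWI-ST430:177:2:1:4979:15503#0\t65\tchr32\t26100696",
        "HWI-ST430:177:2:1:4979:15503#0\t129\tchr5\t36697147",
        "HWI-ST430:177:2:1:4979:15503#0\t401\tchr32\t26100739",
    ]
    print(names_with_extra_records(example))  # ['HWI-ST430:177:2:1:4979:15503#0']
```

On a whole file, the same functions could be fed the lines of `samtools view file.bam` output.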

The BAM file contains HiSeq reads aligned to the reference genome with bwa; Picard was used to remove duplicates, and realignment was then done with GATK.


My questions are:
1. Why are there three reads with the same name that seem to have no relation to each other?
2. Maybe the first two are treated as a pair and the third as a single read. If so, can I just ignore it?

Could anyone help me? Many thanks for your help!

Richard Finney 03-28-2016 08:05 PM

The 3rd read's flag value, 401, has the "not primary alignment" bit (0x100) set.
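
To see which bits are set, a FLAG value can be decoded bit by bit. Below is a minimal Python sketch using the bit meanings from the SAM specification (the short labels are informal wording, and `decode_flag` is a hypothetical helper). It shows 65 as paired + first in pair, 129 as paired + second in pair, and 401 (0x191) as paired + reverse strand + second in pair + secondary. Neither 65 nor 129 has the proper-pair bit (0x2) set, which fits mates mapped to different chromosomes (chr32 and chr5). A split hit carrying the secondary bit (0x100) rather than the supplementary bit (0x800) may mean the aligner was run in a mode that marks split hits as secondary, as BWA-MEM does with its -M option; that is an assumption, not something the thread confirms.

```python
# SAM FLAG bits (SAM specification, FLAG column); labels are informal.
SAM_FLAG_BITS = {
    0x1: "paired",
    0x2: "proper pair",
    0x4: "unmapped",
    0x8: "mate unmapped",
    0x10: "reverse strand",
    0x20: "mate reverse strand",
    0x40: "first in pair",
    0x80: "second in pair",
    0x100: "secondary",
    0x200: "fails QC",
    0x400: "duplicate",
    0x800: "supplementary",
}


def decode_flag(flag: int) -> list:
    """Return the labels of all bits set in a SAM FLAG value, lowest bit first."""
    return [label for bit, label in SAM_FLAG_BITS.items() if flag & bit]


if __name__ == "__main__":
    # FLAG values of the three records above.
    for flag in (65, 129, 401):
        print(flag, decode_flag(flag))
```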

The 2nd read has an "SA" tag, which the SAM specification defines as:
SA: Other canonical alignments in a chimeric alignment, formatted as a semicolon-delimited list: (rname,pos,strand,CIGAR,mapQ,NM;)+. Each element in the list represents a part of the chimeric alignment. Conventionally, at a supplementary line, the first element points to the primary line.

It points to the 3rd read via its location (chr32:26100739).

So it looks like your aligner supports reads whose parts map to different locations.
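
The SA tag is straightforward to take apart. Below is a minimal Python sketch (the `SAEntry` and `parse_sa` names are hypothetical) that splits an SA:Z value into its rname, pos, strand, CIGAR, mapQ and NM fields. Applied to the 2nd read's tag, it yields one entry at chr32:26100739 on the minus strand, matching the RNAME and POS of the 3rd record; the 3rd record's own SA tag points back to chr5:36697147, the position of the 2nd record.

```python
from typing import List, NamedTuple


class SAEntry(NamedTuple):
    rname: str
    pos: int     # 1-based leftmost mapping position, as in SAM column 4
    strand: str  # "+" or "-"
    cigar: str
    mapq: int
    nm: int


def parse_sa(tag: str) -> List[SAEntry]:
    """Parse an SA tag (with or without the 'SA:Z:' prefix) into its entries.

    The value is a list of 'rname,pos,strand,CIGAR,mapQ,NM;' elements.
    """
    value = tag[len("SA:Z:"):] if tag.startswith("SA:Z:") else tag
    entries = []
    for element in value.split(";"):
        if not element:
            continue  # the value ends with ';', which leaves an empty last piece
        rname, pos, strand, cigar, mapq, nm = element.split(",")
        entries.append(SAEntry(rname, int(pos), strand, cigar, int(mapq), int(nm)))
    return entries


if __name__ == "__main__":
    # SA tags copied from the 2nd and 3rd records above.
    print(parse_sa("SA:Z:chr32,26100739,-,36M64S,60,0;"))
    print(parse_sa("SA:Z:chr5,36697147,+,72M28S,60,2;"))
```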

Alphabets 03-28-2016 08:30 PM

Thank you for your reply!

I read your reply carefully, but I have some difficulty understanding it.

Could you explain the three reads in simpler terms? Or, how can I resolve warnings like "Could not find pair for HWI-ST430:177:2:1:4979:15503#0"?

Thank you very much!

Richard Finney 03-28-2016 09:57 PM

What is your goal?

What program is reporting the warning?

Check the manual for your alignment software, especially the notes on when it produces an "SA" tag.

Read one is one mate of the pair.
The next two represent the other mate as two entries; that is, it is a "chimeric" read [I think].

Ignoring it could be the thing to do, depending on your goals.
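
If you decide to ignore the extra record, one way is to drop every non-primary alignment (secondary 0x100 or supplementary 0x800) before running the downstream tool, which leaves one record per mate. Below is a minimal Python sketch on SAM text; the function names are hypothetical. With samtools, a comparable filter is `samtools view -b -F 0x900 in.bam > primary.bam`. Keep in mind that this throws away the split-alignment information, and whether lobSTR then runs without these warnings has not been checked here.

```python
from collections import defaultdict
from typing import Iterable, Iterator, List

NOT_PRIMARY = 0x100 | 0x800  # secondary and supplementary bits


def keep_primary(sam_lines: Iterable[str]) -> Iterator[str]:
    """Yield header lines and primary alignment lines; drop everything else."""
    for line in sam_lines:
        if line.startswith("@"):
            yield line  # SAM header lines pass through unchanged
            continue
        fields = line.split()
        if len(fields) < 2:
            continue  # blank or truncated line
        if (int(fields[1]) & NOT_PRIMARY) == 0:
            yield line


def unpaired_names(sam_lines: Iterable[str]) -> List[str]:
    """Return read names that, after filtering, lack exactly one read 1 and one read 2."""
    mates = defaultdict(list)
    for line in keep_primary(sam_lines):
        if line.startswith("@"):
            continue
        qname, flag = line.split()[:2]
        flag = int(flag)
        if flag & 0x40:
            mates[qname].append("R1")
        elif flag & 0x80:
            mates[qname].append("R2")
        else:
            mates[qname].append("unpaired")
    return sorted(name for name, ends in mates.items() if sorted(ends) != ["R1", "R2"])


if __name__ == "__main__":
    records = [
        "HWI-ST430:177:2:1:4979:15503#0\t65\tchr32\t26100696",
        "HWI-ST430:177:2:1:4979:15503#0\t129\tchr5\t36697147",
        "HWI-ST430:177:2:1:4979:15503#0\t401\tchr32\t26100739",
    ]
    print(len(list(keep_primary(records))))  # 2: the flag-401 record is dropped
    print(unpaired_names(records))           # []
```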

If you are looking for chimeric reads or possible errors in the reference, then you have struck gold.

Alphabets 03-28-2016 11:22 PM

Quote:

Originally Posted by Richard Finney (Post 191466)
What is your goal?

What program is reporting the warning?


I want to call STRs from this BAM file with lobSTR.

When I run lobSTR on the paired-end BAM file, it produces many warnings like the ones above.

I downloaded the BAM file from the web and don't know much more about it.

When I run lobSTR treating it as a single-end BAM file, there are no warnings. However, lobSTR is run with different parameters for paired-end and single-end BAM files.

So, any other suggestions? Thanks!


All times are GMT -8.