
**Zigster** · 12-29-2009, 08:59 AM

Do you have any data regarding the number of multiple hits/ambiguous alignments you are seeing? You say you are taking unique best hits, but what if the next best one (e.g. with one mismatch) is where it should be relative to its pair mate? How many unmated pairs are you seeing (one read aligns but its mate does not, at default bowtie parameters)?

Have you tried doing a paired-end alignment using Bowtie and just subtracting the pairs that align from the pool before doing your analysis?
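Both questions above reduce to set arithmetic on read IDs. Below is a minimal sketch, assuming you already have the set of all pair IDs, the IDs that aligned in each single-end run, and the IDs that aligned concordantly in a paired-end run (all with any /1 or /2 suffix stripped). The function names are hypothetical helpers, not part of Bowtie.

```python
"""Sketch: count unmated pairs and shrink the pool before SV analysis.

All inputs are sets of pair IDs with any /1 or /2 suffix already removed.
summarize_pairs and remaining_pool are hypothetical helper names.
"""
from typing import Dict, Set


def summarize_pairs(all_ids: Set[str], mate1_aligned: Set[str],
                    mate2_aligned: Set[str]) -> Dict[str, int]:
    """Count pairs by how many of their mates aligned in the single-end runs."""
    both = mate1_aligned & mate2_aligned
    # Unmated: exactly one mate aligned (symmetric difference of the two sets)
    unmated = mate1_aligned ^ mate2_aligned
    neither = all_ids - (mate1_aligned | mate2_aligned)
    return {"both": len(both), "unmated": len(unmated), "neither": len(neither)}


def remaining_pool(all_ids: Set[str], concordant_ids: Set[str]) -> Set[str]:
    """Drop pairs that already aligned concordantly in paired-end mode."""
    return all_ids - concordant_ids


if __name__ == "__main__":
    all_ids = {"r1", "r2", "r3", "r4"}
    m1 = {"r1", "r2", "r3"}
    m2 = {"r1", "r3"}
    print(summarize_pairs(all_ids, m1, m2))
    print(sorted(remaining_pool(all_ids, {"r1"})))
```

Only the pairs left in the remaining pool would then be examined for translocations.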

Have you tried this against RefSeq sequences instead of the genome?

**lh3** · 12-29-2009, 10:43 AM

I bet most of these translocations are misalignments. To find SVs, I would suggest two-phase alignment:

1) Fast alignment: align PE reads with bowtie/bwa in the paired-end mode.

2) Accurate alignment: align aberrant read pairs and singletons with a more accurate aligner such as novoalign. The aligner in use should be able to produce mapping quality.
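To make the split between the two phases concrete, here is a minimal sketch of how pairs from the fast paired-end pass might be triaged. It assumes the standard Illumina forward/reverse pair layout, a known read length and a maximum insert size of 1000 (matching the -X value used later in this thread). `Hit`, `classify_pair` and `needs_phase_two` are hypothetical names; real aligners apply their own pairing rules.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Hit:
    chrom: str
    pos: int     # leftmost aligned position of the read
    strand: str  # "+" or "-"


def classify_pair(h1: Optional[Hit], h2: Optional[Hit], read_len: int,
                  max_insert: int = 1000) -> str:
    """Return 'unmapped', 'singleton', 'concordant' or 'discordant'.

    Assumes the leftmost mate lies on the forward strand and its mate on
    the reverse strand, both reads being read_len bases long.
    """
    if h1 is None and h2 is None:
        return "unmapped"
    if h1 is None or h2 is None:
        return "singleton"
    if h1.chrom != h2.chrom:
        return "discordant"  # inter-chromosomal: translocation candidate
    left, right = (h1, h2) if h1.pos <= h2.pos else (h2, h1)
    insert = right.pos + read_len - left.pos  # outer length of the fragment
    if left.strand == "+" and right.strand == "-" and insert <= max_insert:
        return "concordant"
    return "discordant"


def needs_phase_two(h1: Optional[Hit], h2: Optional[Hit], read_len: int) -> bool:
    """Aberrant pairs and singletons go to the slower, more accurate aligner."""
    return classify_pair(h1, h2, read_len) in {"discordant", "singleton"}


if __name__ == "__main__":
    print(classify_pair(Hit("chr1", 100, "+"), Hit("chr1", 400, "-"), 76))
    print(classify_pair(Hit("chr1", 100, "+"), Hit("chr5", 400, "-"), 76))
```

Only the (usually small) set flagged by `needs_phase_two` is realigned, which keeps the expensive aligner's workload manageable.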

If you are mainly interested in translocations where both ends mapped to unique regions, you should set a high threshold on mapping quality (e.g. 35-40). I am not sure how people handle cases where repeats are involved. See this figure for why mapping quality helps to greatly reduce false alignments.
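The key point of the mapping-quality advice is that the cutoff applies to both ends of a pair. A minimal sketch, assuming each realigned pair has been summarized into a record holding both mates' chromosome, position and mapping quality; `PairAln` and `translocation_candidates` are hypothetical names, and 35 is simply the low end of the range suggested above.

```python
from typing import Iterable, Iterator, NamedTuple


class PairAln(NamedTuple):
    name: str
    chrom1: str
    pos1: int
    mapq1: int
    chrom2: str
    pos2: int
    mapq2: int


def translocation_candidates(pairs: Iterable[PairAln],
                             min_mapq: int = 35) -> Iterator[PairAln]:
    """Yield inter-chromosomal pairs where BOTH ends pass the MAPQ cutoff."""
    for p in pairs:
        # The weaker of the two ends decides whether the pair is trusted
        if p.chrom1 != p.chrom2 and min(p.mapq1, p.mapq2) >= min_mapq:
            yield p


if __name__ == "__main__":
    pairs = [
        PairAln("a", "chr1", 10, 60, "chr5", 20, 40),   # both ends confident
        PairAln("b", "chr1", 10, 60, "chr5", 20, 12),   # one end ambiguous
        PairAln("c", "chr2", 10, 60, "chr2", 900, 60),  # same chromosome
    ]
    print([p.name for p in translocation_candidates(pairs)])
```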

**quinlana** · 12-29-2009, 04:19 PM

I second lh3's suggestion. This is nearly identical to the approach I use. One further caveat I should mention is that even after using BWA and Novoalign, there can remain pairs that appear to be aberrant owing to misalignment or chimeric molecules. To mitigate the latter, I cluster aberrant pairs (say, requiring two or more supporting pairs) under the assumption that chimeras occur randomly. I then realign the supporting pairs in all clusters with megablast or something similar (using ridiculously sensitive settings).
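Aaron does not describe his exact clustering method, so the following is only a simplified stand-in that captures the idea: random chimeras rarely produce two pairs supporting the same junction. It groups discordant pairs by chromosome pair and fixed-size position bins, then keeps clusters with at least two supporting pairs. The bin size of 1000 and the function name are assumptions; fixed bins can split a true cluster at a bin boundary, which a real tool would handle with overlapping windows.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

Pair = Tuple[str, str, int, str, int]  # (name, chrom1, pos1, chrom2, pos2)
Key = Tuple[str, int, str, int]        # (chrom_a, bin_a, chrom_b, bin_b)


def cluster_pairs(pairs: Iterable[Pair], bin_size: int = 1000,
                  min_support: int = 2) -> Dict[Key, List[str]]:
    clusters: Dict[Key, List[str]] = defaultdict(list)
    for name, c1, p1, c2, p2 in pairs:
        # Put the lexicographically smaller end first so a junction gets the
        # same key no matter which mate happens to be read 1.
        if (c2, p2) < (c1, p1):
            c1, p1, c2, p2 = c2, p2, c1, p1
        key = (c1, p1 // bin_size, c2, p2 // bin_size)
        clusters[key].append(name)
    # Clusters below min_support are treated as likely random chimeras.
    return {k: v for k, v in clusters.items() if len(v) >= min_support}


if __name__ == "__main__":
    pairs = [
        ("a", "chr1", 10500, "chr5", 20100),
        ("b", "chr5", 20900, "chr1", 10200),  # same junction, mates swapped
        ("c", "chr2", 500, "chr9", 900),      # unsupported singleton
    ]
    print(cluster_pairs(pairs))
```

The supporting pairs in each surviving cluster are what would then be realigned with a sensitive tool such as megablast.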

Also, are you sure they are suggesting translocations? They can also be retrotransposon insertions that have occurred in your test DNA, but are not present in the reference genome. AluYs, LINEs and SVAs are still active.

Aaron

**ramouz87** · 12-31-2009, 02:12 AM

Hi Zigster,
Thanks for your answer.
I'm new to the field, so I'm still experimenting with aligners and trying to understand how they work.
I've changed the settings to -m1 -n0. With these options we keep only reads that align to a unique position in the reference with no mismatches in the seed region, and we get the following statistics:

For s_1_1_sequence.fq:

    # reads processed: 16479658
    # reads with at least one reported alignment: 10592189 (64.27%)
    # reads that failed to align: 3406969 (20.67%)
    # reads with alignments suppressed due to -m: 2480500 (15.05%)

For s_1_2_sequence.fq:

    # reads processed: 16479673
    # reads with at least one reported alignment: 10372746 (62.94%)
    # reads that failed to align: 3704063 (22.48%)
    # reads with alignments suppressed due to -m: 2402864 (14.58%)

When aligning in paired-end mode with -m1 -n0 -X1000 (-X sets the maximum insert size), I got a very poor alignment rate:

    # reads processed: 16479658
    # reads with at least one reported alignment: 947283 (5.75%)
    # reads that failed to align: 15495410 (94.03%)
    # reads with alignments suppressed due to -m: 36965 (0.22%)

This is surprising, because if I take the single-end alignments and match the mates by their read IDs, more than 3.2 million reads match, even after all the filtering of duplicates and poly(A/T) reads.
I've attached two plots of the distance between mates.
In any case, pairs whose mates map within the normal distance range are set aside automatically, as are all pairs whose mates map to the same chromosome.
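The ID-matching step described above can be sketched as follows, assuming each single-end run has been reduced to a dictionary from read ID to its unique (chromosome, position) hit, and that mate IDs end in /1 and /2. The helper names are hypothetical. Note that this naive pairing ignores strand and insert size, whereas Bowtie's paired-end mode also enforces mate orientation and the -I/-X insert-size limits, so the two counts are not directly comparable.

```python
from typing import Dict, Tuple

Loc = Tuple[str, int]  # (chromosome, position) of a unique single-end hit


def strip_mate_suffix(read_id: str) -> str:
    """Drop a trailing /1 or /2 so both mates share one key."""
    return read_id[:-2] if read_id.endswith(("/1", "/2")) else read_id


def pair_single_end_hits(hits1: Dict[str, Loc], hits2: Dict[str, Loc]
                         ) -> Tuple[Dict[str, Tuple[Loc, Loc]],
                                    Dict[str, Tuple[Loc, Loc]]]:
    """Return (all matched pairs, inter-chromosomal subset)."""
    m1 = {strip_mate_suffix(k): v for k, v in hits1.items()}
    m2 = {strip_mate_suffix(k): v for k, v in hits2.items()}
    # Keep only IDs where both mates have a unique hit
    paired = {rid: (m1[rid], m2[rid]) for rid in m1.keys() & m2.keys()}
    inter = {rid: locs for rid, locs in paired.items()
             if locs[0][0] != locs[1][0]}
    return paired, inter


if __name__ == "__main__":
    hits1 = {"r1/1": ("chr1", 100), "r2/1": ("chr2", 50), "r3/1": ("chr3", 5)}
    hits2 = {"r1/2": ("chr1", 400), "r2/2": ("chr7", 90)}
    paired, inter = pair_single_end_hits(hits1, hits2)
    print(len(paired), inter)
```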

Hi lh3,
Thanks for your reply.
As mentioned above, for some reason the paired-end alignment with Bowtie is giving an unexpected result.
I was thinking of short-cutting step one: take from my analysis only the IDs of reads whose mates map to different chromosomes, extract the reads with these IDs from the FASTQ files and run novoalign on that selection. Do you think it's a good idea?
For the mapping quality, is it the -l parameter in novoalign that should be set to 35-40?
The default is log4(hg size / 2) + 5 = 20.xx.
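The extraction step in this shortcut can be sketched in a few lines. This assumes unwrapped four-line FASTQ records (as produced by the Illumina pipeline of this era) and that read IDs carry a /1 or /2 mate suffix; `filter_fastq` is a hypothetical helper, and tools such as seqtk or Biopython can do the same job.

```python
import io
from typing import Iterator, Set, TextIO


def filter_fastq(handle: TextIO, wanted: Set[str]) -> Iterator[str]:
    """Yield whole FASTQ records whose read ID (minus /1 or /2) is in `wanted`.

    Assumes unwrapped four-line records: header, sequence, '+', qualities.
    """
    while True:
        header = handle.readline()
        if not header:
            break  # end of file
        if not header.strip():
            continue  # tolerate stray blank lines between records
        seq, plus, qual = handle.readline(), handle.readline(), handle.readline()
        rid = header[1:].split()[0]  # text after '@' up to first whitespace
        if rid.endswith(("/1", "/2")):
            rid = rid[:-2]
        if rid in wanted:
            yield header + seq + plus + qual


if __name__ == "__main__":
    demo = "@r1/1\nACGT\n+\nIIII\n@r2/1\nTTTT\n+\nIIII\n"
    for record in filter_fastq(io.StringIO(demo), {"r2"}):
        print(record, end="")
```

Running it once per mate file with the same ID set yields matched _1/_2 files that can be passed to novoalign as a pair.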

Hi Aaron,
Thanks for your comment.
I only moved into NGS two months ago, so my experience is limited; I used to work with microarrays.
Could you give me more detail about your clustering approach for dealing with chimeric DNA? That could be helpful, as I have some experience with machine learning and could try to find out whether it can be improved.

Thanks to all of you and best wishes

Regards,
Ramzi


**ramouz87** · 01-05-2010, 05:28 AM

Hi,
Thanks for your suggestions.
I've run the analysis, and applying novoalign now discards a considerable share of the artefacts (98%), but I still have 5861 pairs showing translocations.
I've attached the contingency table so you can get an idea.
Is there any other way to filter this data further?
Thanks again.
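For readers who want to build a similar table from their own data, here is a minimal sketch that counts inter-chromosomal pairs per unordered chromosome pair and renders a symmetric tab-separated table. The input format (one (chrom1, chrom2) tuple per pair) and the function names are assumptions, not the format of the attached file.

```python
from collections import Counter
from typing import Iterable, List, Tuple


def chrom_pair_counts(pairs: Iterable[Tuple[str, str]]) -> Counter:
    """Count inter-chromosomal pairs per unordered chromosome pair."""
    # sorted() makes (chr5, chr1) and (chr1, chr5) the same key
    return Counter(tuple(sorted(p)) for p in pairs if p[0] != p[1])


def render_table(counts: Counter, chroms: List[str]) -> str:
    """Tab-separated symmetric table; the diagonal is shown as '-'."""
    lines = ["\t" + "\t".join(chroms)]
    for a in chroms:
        cells = ["-" if a == b else str(counts[tuple(sorted((a, b)))])
                 for b in chroms]
        lines.append(a + "\t" + "\t".join(cells))
    return "\n".join(lines)


if __name__ == "__main__":
    counts = chrom_pair_counts([("chr1", "chr5"), ("chr5", "chr1"),
                                ("chr2", "chrX"), ("chr3", "chr3")])
    print(render_table(counts, ["chr1", "chr2", "chr5", "chrX"]))
```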

Regards,
Ramzi

Attached file: after_novo_SV_table_PE_hg19.m1bs.txt (1.5 KB)

**ramouz87** · 01-06-2010, 05:04 AM

Bug in code (still a high number of artefacts even after novoalign)

Hi,
There was a small bug in the data fetching. After correcting it, the number of artefacts decreased from 303318 to 251374 (about 17% fewer), but that is still a very high number of artefacts.
I've attached the contingency table so you can get an overview of how the reads map across chromosomes.
Thanks in advance for any suggestions.

Regards,
Ramzi

Attached file: sv_S1_cancer_1.chimera.out.bedPE_CT_table_NOVO.txt (1.9 KB)

**lh3** · 01-06-2010, 06:31 AM

Many people will cluster aberrant reads with high mapping quality. But you should probably start digging into the literature (e.g. BreakDancer) and use a proper software package if SVs are your main interest.

**ramouz87** · 01-07-2010, 02:59 AM

Hi Heng,
I wanted to use BreakDancer two months ago, but there was a problem converting the BAM file (produced with BWA and then samtools) to a cfg file using the bam2cfg script. There is now a new version of BreakDancer in which the script was updated, so hopefully I will be able to run it.
Thanks for your suggestions.
Regards,
Ramzi

**KevinLam** · 07-19-2010, 02:35 AM

I have just used BreakDancer with BWA and it works 'fine' on Illumina 76 bp PE reads (I just plugged the Solexa reads directly into BWA; they are already in FASTQ format).

One thing the documentation skipped is that you need to use sorted BAMs for BreakDancer to work.

cheers

unexpected high number of chromosomal translocation from paired-end data

Comment

Comment

Comment

Comment

Comment

Comment

Comment

Comment

Comment

Latest Articles

ad_right_rmr

News