
**Richard Finney** · 05-12-2015, 08:14 AM

Show us your alignment and sorting commands.

**fanli** · 05-12-2015, 08:28 AM

Multiple alignments...

**pkMyt1** · 05-12-2015, 08:33 AM

Both counts come from iterating over the files in Python. The FASTQ files are read in blocks of four lines, each block being one read. This example is a MiSeq run, so 30 million reads (15 million in each FASTQ) seems realistic. The BAM count is from

import pysam
bamfile = pysam.AlignmentFile(o['bamfile'], "rb")
bamfile.count()
or
# Sum the mapped and unmapped columns of each idxstats line (recent pysam returns the output as one string).
bamfile_reads = sum(int(n) for l in pysam.idxstats(o['bamfile']).splitlines() for n in l.split('\t')[2:])

How To Get Number Of Reads In Bam File Efficiently In Python?

https://www.biostars.org/p/1890/

or simply counting the reads as I iterate the BAM file to do my analysis.
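The FASTQ side of the comparison, counting one read per block of four lines, can be sketched as below. This is a minimal illustration rather than the code used in the post: the function name `count_fastq_reads` is hypothetical, and it assumes an uncompressed FASTQ with exactly four lines per record (no wrapped sequence lines).

```python
import io


def count_fastq_reads(handle):
    """Count reads in a FASTQ stream, assuming four lines per record."""
    n_lines = 0
    for n_lines, _ in enumerate(handle, start=1):
        pass  # only the running line count is needed
    if n_lines % 4:
        raise ValueError(f"{n_lines} lines is not a multiple of 4; truncated FASTQ?")
    return n_lines // 4


# Tiny in-memory example with two records (made-up data).
demo = io.StringIO("@r1\nACGT\n+\nIIII\n@r2\nTTTT\n+\nIIII\n")
n_reads = count_fastq_reads(demo)
```

For a real file, the same function would take an open text handle, for example `open(path)` or `gzip.open(path, "rt")` for a compressed file.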

**pkMyt1** · 05-12-2015, 08:39 AM

Originally posted by fanli:

Multiple alignments...

So....
Would this imply my alignment settings are keeping things I should not?

bwa mem -a -T 25 -L '(100, 100)'

**dpryan** · 05-12-2015, 09:47 AM

I would imagine that the -a flag is to blame.
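To see why -a inflates the count: bwa mem writes the extra alignments it reports with -a as secondary records (SAM flag 0x100), and split alignments as supplementary records (0x800), so the number of BAM records can exceed the number of reads. A rough sketch of separating the three kinds by flag follows; the function name `tally_records` and the example flags are made up for illustration.

```python
from collections import Counter

# Bit values from the SAM specification.
SECONDARY = 0x100
SUPPLEMENTARY = 0x800


def tally_records(flags):
    """Classify SAM/BAM records by flag into primary, secondary, supplementary."""
    counts = Counter()
    for flag in flags:
        if flag & SECONDARY:
            counts["secondary"] += 1
        elif flag & SUPPLEMENTARY:
            counts["supplementary"] += 1
        else:
            # Primary lines, including unmapped reads: one per sequenced read.
            counts["primary"] += 1
    return counts


# Hypothetical flags: one read pair (99, 147), a secondary alignment for
# each mate (355, 403), and one supplementary alignment (2147).
counts = tally_records([99, 147, 355, 403, 2147])
```

With pysam, the flags could come from `(read.flag for read in bamfile.fetch(until_eof=True))`. If the BAM has not been filtered, the "primary" tally is the one that should match the FASTQ read count.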

**Brian Bushnell** · 05-12-2015, 11:35 AM

Originally posted by pkMyt1:

So....
Would this imply my alignment settings are keeping things I should not?

That depends on the goal of your experiment. What are you trying to do?

**pkMyt1** · 05-13-2015, 04:43 AM

Originally posted by Brian Bushnell:

That depends on the goal of your experiment. What are you trying to do?

This is duplex exome sequencing. Very deep, but only about 80 kb of capture. I did not want to lose any alignments where one read aligned and the other did not, whether because of a translocation or simply a sequencing error. This is why I used the -a option. Each read is uniquely tagged, so I had been able to filter things in the end. This is the first time I have seen this, but it is also the first time I have run a sample that I know contains many chromosomal rearrangements: translocations, duplications, and deletions. I will need to pull out some of these multiple alignments and look at them so I can better understand what they are.
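One possible way to pull out the multiply aligned reads for inspection is to group records by read name and mate, then keep the groups with more than one mapped record. The sketch below works on plain `(name, flag)` pairs so it runs without a BAM file; the function name `multi_aligned` and the example records are hypothetical.

```python
from collections import defaultdict

# Bit values from the SAM specification.
UNMAPPED = 0x4
READ1 = 0x40
READ2 = 0x80


def multi_aligned(records):
    """Return {(name, mate): n_records} for reads with more than one alignment.

    `records` is an iterable of (query_name, flag) pairs.
    """
    per_read = defaultdict(int)
    for name, flag in records:
        if flag & UNMAPPED:
            continue  # unmapped records carry no alignment to inspect
        mate = 1 if flag & READ1 else 2 if flag & READ2 else 0
        per_read[(name, mate)] += 1
    return {key: n for key, n in per_read.items() if n > 1}


# Made-up records: read "a" mate 1 has a primary (99) and a secondary (355)
# alignment; every other mate aligns once.
records = [("a", 99), ("a", 147), ("a", 355), ("b", 99), ("b", 147)]
multi = multi_aligned(records)
```

With pysam, the pairs could be generated as `((r.query_name, r.flag) for r in bamfile.fetch(until_eof=True))`, and the resulting names used to select the records worth viewing.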

**Brian Bushnell** · 05-13-2015, 09:43 AM

In that case, it sounds like considering all good alignments of the reads is probably best. The reason for all the multiple alignments is presumably that you're targeting a repetitive region.


Sorted BAM read count >2x total FASTQ count
