
**sparks** · 10-03-2011, 04:55 PM

Hi Jane,

There are a few things that might cause slightly different results. The first is the insert size and standard deviation setting (-i). In Novoalign this is used to set initial limits, and as more reads are processed the actual distribution of insert lengths is used instead. With MPI, each process maintains its own fragment length table, so there might be small differences, and it will take longer for the actual distribution to take effect.
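
To see why separate per-process tables matter, here is a toy model in Python. It is not Novoalign's actual algorithm, which this thread does not describe. It assumes the `-i` mean and standard deviation act as a prior worth a fixed, hypothetical number of pairs (`prior_weight`), which observed fragment lengths gradually outweigh. `FragmentLengthTable`, `simulate` and the round-robin dealing of pairs to MPI ranks are all illustrative assumptions.

```python
import math
import random
from dataclasses import dataclass


@dataclass
class FragmentLengthTable:
    """Toy fragment-length estimate; one instance per MPI process (hypothetical model)."""
    init_mean: float              # first -i value
    init_sd: float                # second -i value
    prior_weight: float = 1000.0  # hypothetical: how many pairs the -i guess counts for
    n: int = 0
    sum_x: float = 0.0
    sum_x2: float = 0.0

    def observe(self, length: float) -> None:
        """Record the fragment length of one proper pair."""
        self.n += 1
        self.sum_x += length
        self.sum_x2 += length * length

    def mean(self) -> float:
        w = self.prior_weight
        return (w * self.init_mean + self.sum_x) / (w + self.n)

    def sd(self) -> float:
        w = self.prior_weight
        # Blend the second moments of the -i prior and of the observed lengths.
        prior_x2 = self.init_sd ** 2 + self.init_mean ** 2
        ex2 = (w * prior_x2 + self.sum_x2) / (w + self.n)
        return math.sqrt(max(ex2 - self.mean() ** 2, 0.0))


def simulate(lengths, n_procs, init_mean, init_sd):
    """Feed the same pairs to one table (serial) and to n_procs tables (MPI-style)."""
    serial = FragmentLengthTable(init_mean, init_sd)
    ranks = [FragmentLengthTable(init_mean, init_sd) for _ in range(n_procs)]
    for i, x in enumerate(lengths):
        serial.observe(x)
        ranks[i % n_procs].observe(x)  # assumption: pairs dealt out round-robin
    return serial, ranks


if __name__ == "__main__":
    rng = random.Random(1)
    # Illustrative true distribution, wider than the initial -i guess.
    lengths = [max(1.0, rng.gauss(201, 53.7)) for _ in range(2000)]
    serial, ranks = simulate(lengths, 4, 200, 30)
    print(f"serial: mean {serial.mean():.1f}, sd {serial.sd():.1f}")
    for r, t in enumerate(ranks):
        print(f"rank {r}: mean {t.mean():.1f}, sd {t.sd():.1f}")
```

Each rank sees only a quarter of the pairs, so its estimate stays closer to the `-i` guess for longer than the serial estimate does, and the ranks disagree slightly with each other.
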
Also, if you use quality calibration, the MPI processes each maintain their own mismatch counts, so quality calibration may be slightly different and will take longer to kick in.
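
The same effect on quality calibration can be sketched with a toy mismatch table. This is an assumed model, not Novoalign's code: it converts the observed mismatch rate at each reported quality into an empirical Phred score, and only does so once a hypothetical minimum number of bases (`min_bases`) has been seen. `MismatchTable` and `feed` are illustrative names.

```python
import math
from collections import defaultdict


class MismatchTable:
    """Toy quality-calibration table for one MPI process (hypothetical model).

    Maps each reported base quality to [mismatches, bases aligned].
    """

    def __init__(self, min_bases: int = 5000):
        # Hypothetical threshold: bases needed at a quality before it is recalibrated.
        self.min_bases = min_bases
        self.counts = defaultdict(lambda: [0, 0])

    def observe(self, reported_q: int, mismatch: bool) -> None:
        c = self.counts[reported_q]
        c[0] += int(mismatch)
        c[1] += 1

    def calibrated(self, reported_q: int) -> float:
        mism, total = self.counts.get(reported_q, (0, 0))
        if total < self.min_bases or mism == 0:
            # Too little evidence (or no mismatches to estimate from): keep reported value.
            return float(reported_q)
        return -10.0 * math.log10(mism / total)  # empirical Phred score


def feed(tables, bases):
    """Deal (reported_q, mismatch) observations round-robin across tables."""
    for i, (q, mm) in enumerate(bases):
        tables[i % len(tables)].observe(q, mm)


if __name__ == "__main__":
    # 10,000 bases reported as Q30 that actually mismatch 1 time in 100 (true ~Q20).
    bases = [(30, i % 100 == 0) for i in range(10_000)]
    serial = MismatchTable()
    feed([serial], bases)
    ranks = [MismatchTable() for _ in range(4)]
    feed(ranks, bases)
    print("serial Q30 ->", round(serial.calibrated(30), 1))
    print("rank 0 Q30 ->", round(ranks[0].calibrated(30), 1))
```

With the whole run in one table the Q30 bases are recalibrated, while each of the four ranks has not yet seen enough bases, which is the "takes longer to kick in" behaviour described above.
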
With regard to the homopolymer filter and quality filter, reads are first identified as homopolymer and/or as having low-quality bases. This stops them from being used in the first, single-end phase of alignment; however, they are still used in the paired-end search if the mate was successfully mapped. If this results in a proper pair, the read won't be counted as homopolymer or low quality.
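
The counting rule above can be written as a small decision function. This is a sketch of the described logic only: the function name, its parameters and the use of a simple `(low, high)` fragment-length window for "proper pair" are assumptions for illustration.

```python
def counted_as_filtered(flagged: bool, mate_mapped: bool,
                        fragment_length, proper_window) -> bool:
    """Decide whether a read ends up reported as homopolymer/low quality (sketch).

    flagged:         read failed the homopolymer or quality filter
    mate_mapped:     the mate aligned in the single-end phase
    fragment_length: pair length if the read was placed near its mate, else None
    proper_window:   (low, high) fragment lengths accepted as a proper pair (assumed)
    """
    if not flagged:
        return False
    # Single-end phase: a flagged read is skipped, so it can only be placed
    # by the paired-end search anchored on a mapped mate.
    if mate_mapped and fragment_length is not None:
        low, high = proper_window
        if low <= fragment_length <= high:
            return False  # proper pair found: no longer counted as filtered
    return True
```

Because the proper-pair window depends on each process's fragment length table, the same flagged read can be rescued in one run and counted as filtered in another, which would shift the homopolymer and low-quality totals slightly between MPI and non-MPI runs.
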
I'd like to see your command line and also the insert size reported by novoalign. The differences should be reduced if you set the -i option more accurately.
There's no need to be concerned about the differences, other than to check that -i was set so that mean + 6 std dev is long enough to cover your fragments.
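
The mean + 6 std dev check is simple arithmetic. The sketch below assumes you already have a list of observed fragment lengths (for example, positive TLEN values from an earlier run's SAM output); the function names are hypothetical.

```python
def proper_pair_limit(mean: float, sd: float, k: float = 6.0) -> float:
    """Upper fragment length covered by the mean + k * sd rule of thumb."""
    return mean + k * sd


def fraction_beyond(lengths, mean: float, sd: float, k: float = 6.0) -> float:
    """Fraction of observed fragment lengths longer than mean + k * sd."""
    limit = proper_pair_limit(mean, sd, k)
    lengths = list(lengths)
    if not lengths:
        return 0.0
    return sum(1 for x in lengths if x > limit) / len(lengths)
```

For instance, `-i 250 40` would cover fragments up to 250 + 6 × 40 = 490; a non-trivial `fraction_beyond` suggests the `-i` values are too tight for the library.
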
The actual alignment code is identical between the different versions of Novoalign; the differences all relate to the fragment length distribution and the quality calibration function.
You can remove quality calibration differences by first running a sample of reads (say 100K) and saving the table with `-K <qcal.csv>`, then loading it in subsequent runs with `-k <qcal.csv>`.
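
A minimal sketch of that two-pass workflow, assuming gzipped paired FASTQ input. It takes the first 100K records from each mate (a head sample, not a random one) and builds the two command lines using only options that appear in this thread (`-d`, `-f`, `-F ILMFQ`, `-K`, `-k`). The file names and helper functions are hypothetical; add the rest of your usual options to both command lists.

```python
import gzip
import shlex
from itertools import islice


def take_reads(src: str, dst: str, n_reads: int = 100_000) -> None:
    """Copy the first n_reads FASTQ records (4 lines each) from a gzipped file."""
    with gzip.open(src, "rt") as fin, open(dst, "w") as fout:
        fout.writelines(islice(fin, 4 * n_reads))


def qcal_commands(index, read1, read2, sample1, sample2, qcal="qcal.csv"):
    """Return argv lists for the two passes: save the table (-K), then reuse it (-k)."""
    fmt = ["-F", "ILMFQ"]  # input format used in this thread
    save_run = ["novoalign", "-d", index, "-f", sample1, sample2, *fmt, "-K", qcal]
    full_run = ["novoalign", "-d", index, "-f", read1, read2, *fmt, "-k", qcal]
    return save_run, full_run


if __name__ == "__main__":
    # Hypothetical file names. Both mates are cut at the same record count,
    # so the sampled files stay in pair order.
    # take_reads("reads_1.txt.gz", "sample_1.fq")
    # take_reads("reads_2.txt.gz", "sample_2.fq")
    save_run, full_run = qcal_commands("hg18.nix", "reads_1.txt.gz", "reads_2.txt.gz",
                                       "sample_1.fq", "sample_2.fq")
    print(shlex.join(save_run), "> sample.sam")
    print(shlex.join(full_run), "> aligned.sam")
```

Since every full run then starts from the same saved table, the MPI and non-MPI runs no longer build up their calibration independently.
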

Colin

**jgSoton** · 10-04-2011, 04:23 AM

Hi,

Thanks for the reply. I feel more comfortable with the data now.

My command line is:

```
#mpiexec -f hostfile -n $nprocs -launcher rsh -iface ib0 $run_exe \
mpiexec -f ibhostfile -n $nprocs $run_exe --mmapoff \
-d /temp/EXOME_DATA/REF_GENOMES/HG18/hg18.nix \
-f /temp/EXOME_DATA//RESULTS/03/FASTQ/WTCHG_22039_06_1_sequence.txt.gz /temp/EXOME_DATA//RESULTS/03/FASTQ/WTCHG_22039_06_2_sequence.txt.gz \
-F ILMFQ -i 200 30 -o SAM -o SoftClip -k -a -g 65 -x 7 \
> SOTON0003a_aligned.sam 2> SOTON0003a_mapping.stats
```

I don't know where the insert size is output...

Jane

**sparks** · 10-04-2011, 05:57 AM

Hi Jane,

The insert size is reported near the end of the log file, SOTON0003a_mapping.stats.

Colin

**jgSoton** · 10-04-2011, 06:01 AM

Ahh,

```
# Mean 201, Std Dev 53.7
```

Jane
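
If you want to pull that line out of the log automatically, a small regex is enough. The pattern is inferred from the single line quoted above, so treat it as an assumption about the log format; it takes the last match because the value is reported near the end of the file. `insert_size_from_log` is a hypothetical helper name.

```python
import re

# Pattern inferred from the summary line quoted in this thread: "# Mean 201, Std Dev 53.7"
INSERT_RE = re.compile(r"#\s*Mean\s+([\d.]+),\s*Std Dev\s+([\d.]+)")


def insert_size_from_log(text: str):
    """Return (mean, sd) from the last matching line of a novoalign log, or None."""
    matches = INSERT_RE.findall(text)
    if not matches:
        return None
    mean, sd = matches[-1]
    return float(mean), float(sd)


if __name__ == "__main__":
    sample = "...\n# Mean 201, Std Dev 53.7\n"
    mean, sd = insert_size_from_log(sample)
    # Rounded values as one candidate for a more accurate -i setting.
    print(f"-i {round(mean)} {round(sd)}")  # prints: -i 201 54
```

In practice you would pass it the contents of SOTON0003a_mapping.stats instead of the inline sample string.
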

**sparks** · 10-04-2011, 06:18 AM

As you used -i 200 30, the range of fragment lengths for proper pairs would be 0 to 480. It should be OK, as penalties will adjust to the actual distribution. However, a few long fragments may not have been flagged as proper pairs.
The -k option and the -i difference will likely explain the small differences in results between the MPI and non-MPI runs.
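
One way to check this is to count, in each run's SAM output, long pairs that lack the proper-pair flag and the overall number of proper pairs. The sketch relies only on standard SAM columns (FLAG in column 2, RNEXT in column 7, TLEN in column 9); the function names and the 480 cutoff in the usage note are illustrative.

```python
def long_pairs_not_proper(sam_lines, max_len: int) -> int:
    """Count pairs with a fragment longer than max_len that lack the proper-pair flag.

    Each pair is counted once, via the mate with a positive TLEN (column 9).
    """
    count = 0
    for line in sam_lines:
        if line.startswith("@"):
            continue  # header line
        fields = line.rstrip("\n").split("\t")
        flag, rnext, tlen = int(fields[1]), fields[6], int(fields[8])
        if flag & (0x4 | 0x8 | 0x100 | 0x800):
            continue  # unmapped read/mate, secondary or supplementary alignment
        if rnext == "=" and tlen > max_len and not flag & 0x2:
            count += 1
    return count


def proper_pair_count(sam_lines) -> int:
    """Count primary alignments carrying the proper-pair flag (0x2)."""
    count = 0
    for line in sam_lines:
        if line.startswith("@"):
            continue
        flag = int(line.split("\t", 2)[1])
        if flag & 0x2 and not flag & (0x100 | 0x800):
            count += 1
    return count
```

Running both on the MPI and non-MPI SAM files (for example, `with open("SOTON0003a_aligned.sam") as fh: long_pairs_not_proper(fh, 480)`) shows how much of the difference comes from fragments beyond the proper-pair range.
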


Novoalign MPI homopolymer filter
