**GenoMax** · 11-02-2015, 11:16 AM

Was the fragment selection/experiment done in such a way that you expect the two reads to overlap?

**bi_maniac** · 11-02-2015, 11:25 AM

histogram files

**bi_maniac** · 11-02-2015, 11:37 AM

Originally posted by GenoMax:

> Was the fragment selection/experiment done in such a way that you expect the two reads to overlap?

YES

By the way, it was done with an amplicon library, and the expected insert size is 300-370 bp.

**GenoMax** · 11-02-2015, 11:42 AM

I don't know the full story here, but is the result telling you something useful? Purely speculating: perhaps there is an insertion in the "low merge" case such that the reads no longer overlap?

If you map the reads to the reference in both cases, what do you see?

**bi_maniac** · 11-02-2015, 11:45 AM

Relevant post from previous thread:

"
Currently I have only been able to perform this task with BBmerge, mergePairs.py and a custom software developed by myself.

I consistently get low percentages of merging: about 20% when limiting quality of overlap above 90%.

Is this normal? I assumed that merging would be much above 50%.
"

**bi_maniac** · 11-02-2015, 11:52 AM

Originally posted by GenoMax:

> I don't know the full story here, but is the result telling you something useful? Purely speculating: perhaps there is an insertion in the "low merge" case such that the reads no longer overlap?
>
> If you map the reads to the reference in both cases, what do you see?

Excuse my ignorance.

Which command would you advise me to use to map to a reference?

I assume this is possible with the BBMap suite.

In this case I know it is chromosome 14, and I downloaded the full hg19.

**Brian Bushnell** · 11-02-2015, 11:53 AM

You clearly have a weird bimodal distribution with sharp peaks at ~150 and ~200. But if you expect an insert size of 300-370bp, you should not plan on merging because they won't overlap! But please note that "insert size" means different things to different people. I use it to mean "The length of the genomic portion that is sequenced, including the unsequenced middle, if any". But some people also include adapters and primers so they get bigger numbers that may be more relevant to size selection, but are less relevant to downstream analysis.

I suspect your true distribution is at least trimodal (since even in the "high" case a lot don't overlap), and the highest mode is at least 300bp and therefore won't show up. Perhaps there are homologous locations in the organism that the primers are binding to - one is ~150bp long, one is ~200bp long, and the last one is the one that's actually of interest, if you expect it to be over 300bp?

You might want to clarify with the people who made the libraries or designed the primers exactly what they mean by 300-370bp, and why they expect the reads to overlap. For example, ask for a diagram showing the position of the genomic sequence, primers, and so forth, clearly labelled with how long each one is.
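The arithmetic behind "they won't overlap" can be sketched as follows. This is only an illustration, using the definition of insert size given above (the genomic portion, including any unsequenced middle) and assuming adapter-free reads of equal length. The read lengths of 150 and 250 are assumptions for the example; the actual read length was not stated in the thread.

```python
def expected_overlap(read_len, insert_size):
    """Bases covered by both reads of a pair, assuming two reads of
    length read_len sequenced inward from the ends of an insert."""
    if insert_size <= read_len:
        # Each read spans the whole insert (and runs into adapter),
        # so the entire insert is covered twice.
        return insert_size
    # Otherwise the reads share whatever is left after both lengths
    # are laid down on the insert; never less than zero.
    return max(0, 2 * read_len - insert_size)


if __name__ == "__main__":
    # Insert sizes taken from the thread: the two observed peaks and
    # the expected 300-370 range. Read lengths are hypothetical.
    for insert in (150, 200, 300, 370):
        print(insert, expected_overlap(150, insert), expected_overlap(250, insert))
```

An overlap of zero means the pair cannot be merged at all, and mergers generally also need some minimum overlap before they will join a pair, so values only slightly above zero are unlikely to merge either.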

**Brian Bushnell** · 11-02-2015, 11:55 AM

Originally posted by bi_maniac:

> Excuse my ignorance.
>
> Which command would you advise me to use to map to a reference?
>
> I assume this is possible with the BBMap suite.
>
> In this case I know it is chromosome 14, and I downloaded the full hg19.

Yes, that's the best option at this point.

bbmap.sh in1=r1.fq in2=r2.fq ref=hg19.fa ihist=ihist_mapping.txt reads=1m

Mapping to only chr14 is possible, but I'd recommend the whole genome.

**GenoMax** · 11-02-2015, 12:01 PM

@bi_maniac: Take the sequence of the primers and do a quick search over at UCSC: http://genome.ucsc.edu/cgi-bin/hgPcr?db=hg38 to see if you pull up more than one unique product. That may account for the multiple products @Brian thinks you have in your sample.
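For intuition about what an in-silico PCR search does, here is a toy sketch. It is not the UCSC tool: it only finds exact primer matches on the forward strand of a single sequence held in memory, and the template and primer sequences in the usage example are made up for illustration.

```python
def revcomp(seq):
    """Reverse complement of a DNA string (A/C/G/T only)."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def find_all(haystack, needle):
    """Start positions of every (possibly overlapping) exact match."""
    positions = []
    start = haystack.find(needle)
    while start != -1:
        positions.append(start)
        start = haystack.find(needle, start + 1)
    return positions


def predicted_products(template, fwd_primer, rev_primer, max_size=1000):
    """Sizes of products bounded by the forward primer and the reverse
    complement of the reverse primer, primers included."""
    template = template.upper()
    fwd = fwd_primer.upper()
    rev_site = revcomp(rev_primer.upper())
    sizes = []
    for f in find_all(template, fwd):
        for r in find_all(template, rev_site):
            size = r + len(rev_site) - f
            if r >= f and size <= max_size:
                sizes.append(size)
    return sorted(sizes)


if __name__ == "__main__":
    # Hypothetical template and primers, for illustration only.
    template = "TTTT" + "ACGTAC" + "T" * 10 + "GGCATT" + "TTTT"
    print(predicted_products(template, "ACGTAC", "AATGCC"))
```

More than one size in the returned list would correspond to the "more than one unique product" case described above.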

**bi_maniac** · 11-02-2015, 12:03 PM

OK give me a while, please!

**GenoMax** · 11-02-2015, 12:07 PM

Change the genome build accordingly if the primers were designed using hg19. The default genome build at UCSC is now GRCh38 for the tool above.

**bi_maniac** · 11-02-2015, 01:50 PM

Here is the BBMap run for the high_merge case, with chr14.fa as the reference (my current computer has only 4 GB of memory and could not process the whole genome).

Code:

java -Djava.library.path=/home/carlos/BBMap/bbmap/jni/ -ea -Xmx3g -cp /home/carlos/BBMap/bbmap/current/ align2.BBMap build=1 overwrite=true fastareadlen=500 -Xms2g -Xmx3g in1=high_merge_R1.fq in2=high_merge_R2.fq ref=/home/carlos/HG19/chr14.fa ihist=ihist_mapping.txt
Executing align2.BBMap [build=1, overwrite=true, fastareadlen=500, -Xms2g, -Xmx3g, in1=high_merge_R1.fq, in2=high_merge_R2.fq, ref=/home/carlos/HG19/chr14.fa, ihist=ihist_mapping.txt]

BBMap version 35.43
Set insert size histogram output to ihist_mapping.txt
Retaining first best site only for ambiguous mappings.
No output file.
NOTE:	Deleting contents of ref/genome/1 because reference is specified and overwrite=true
Writing reference.
Executing dna.FastaToChromArrays2 [/home/carlos/HG19/chr14.fa, 1, writeinthread=false, genscaffoldinfo=true, retain, waitforwriting=false, gz=true, maxlen=536670912, writechroms=true, minscaf=1, midpad=300, startpad=8000, stoppad=8000, nodisk=false]

Set genScaffoldInfo=true
Writing chunk 1
Set genome to 1

Loaded Reference:	0,332 seconds.
Loading index for chunk 1-1, build 1
No index available; generating from reference genome: /home/carlos/DATA/IG-EMR_080915/prba/ref/index/1/chr1_index_k13_c4_b1.block
Indexing threads started for block 0-1
Indexing threads finished for block 0-1
Generated Index:	68,469 seconds.
Analyzed Index:   	11,051 seconds.
Cleared Memory:    	0,703 seconds.
Processing reads in paired-ended mode.
Started read stream.
Started 2 mapping threads.
Detecting finished threads: 0, 1

   ------------------   Results   ------------------   

Genome:                	1
Key Length:            	13
Max Indel:             	16000
Minimum Score Ratio:  	0.56
Mapping Mode:         	normal
Reads Used:           	283940	(60393910 bases)

Mapping:          	920,090 seconds.
Reads/sec:       	308,60
kBases/sec:      	65,64


Pairing data:   	pct reads	num reads 	pct bases	   num bases

mated pairs:     	  0,0035% 	        5 	  0,0035% 	        2084
bad pairs:       	  0,0000% 	        0 	  0,0000% 	           0
insert size avg: 	  292,20
insert 25th %:   	  147,00
insert median:   	  167,00
insert 75th %:   	  229,00
insert std dev:  	  122,63
insert mode:     	  167


Read 1 data:      	pct reads	num reads 	pct bases	   num bases

mapped:          	  0,0035% 	        5 	  0,0035% 	        1042
unambiguous:     	  0,0028% 	        4 	  0,0030% 	         895
ambiguous:       	  0,0007% 	        1 	  0,0005% 	         147
low-Q discards:  	  0,0183% 	       26 	  0,0031% 	         910

perfect best site:	  0,0000% 	        0 	  0,0000% 	           0
semiperfect site:	  0,0000% 	        0 	  0,0000% 	           0
rescued:         	  0,0007% 	        1

Match Rate:      	      NA 	       NA 	  3,8966% 	         927
Error Rate:      	100,0000% 	        5 	 96,1034% 	       22863
Sub Rate:        	100,0000% 	        5 	  0,4834% 	         115
Del Rate:        	 20,0000% 	        1 	 95,6200% 	       22748
Ins Rate:        	  0,0000% 	        0 	  0,0000% 	           0
N Rate:          	  0,0000% 	        0 	  0,0000% 	           0


Read 2 data:      	pct reads	num reads 	pct bases	   num bases

mapped:          	  0,0035% 	        5 	  0,0037% 	        1156
unambiguous:     	  0,0035% 	        5 	  0,0037% 	        1156
ambiguous:       	  0,0000% 	        0 	  0,0000% 	           0
low-Q discards:  	  0,0190% 	       27 	  0,0039% 	        1211

perfect best site:	  0,0000% 	        0 	  0,0000% 	           0
semiperfect site:	  0,0000% 	        0 	  0,0000% 	           0
rescued:         	  0,0000% 	        0

Match Rate:      	      NA 	       NA 	  3,6493% 	         988
Error Rate:      	100,0000% 	        5 	 96,3507% 	       26086
Sub Rate:        	100,0000% 	        5 	  0,4026% 	         109
Del Rate:        	 60,0000% 	        3 	 95,7302% 	       25918
Ins Rate:        	 40,0000% 	        2 	  0,2179% 	          59
N Rate:          	  0,0000% 	        0 	  0,0000% 	           0

Total time:     	1006,450 seconds.

**ihist_high_mapping.txt**

#Mean	253,600
#Median	167
#Mode	167
#STDev	122,631
#PercentOfPairs	0,004
#InsertSize	Count
147	1
167	1
229	1
236	1
489	1
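As a sanity check on a histogram like the one above, the summary statistics can be recomputed from the size/count rows. The sketch below assumes the layout shown here (header lines starting with `#`, then whitespace-separated `InsertSize Count` rows) and skips anything that is not a pair of integers. The function names are made up for this example and are not part of BBMap.

```python
import math


def parse_ihist(text):
    """Return {insert_size: count} from insert-size histogram text."""
    counts = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # blank line or header
        fields = line.split()
        if len(fields) != 2 or not all(f.isdigit() for f in fields):
            continue  # not a size/count row
        counts[int(fields[0])] = int(fields[1])
    return counts


def weighted_stats(counts):
    """Number of pairs, mean, and population standard deviation."""
    total = sum(counts.values())
    if total == 0:
        raise ValueError("histogram contains no pairs")
    mean = sum(size * n for size, n in counts.items()) / total
    var = sum(n * (size - mean) ** 2 for size, n in counts.items()) / total
    return total, mean, math.sqrt(var)


if __name__ == "__main__":
    # The five rows posted above.
    rows = "#InsertSize\tCount\n147\t1\n167\t1\n229\t1\n236\t1\n489\t1\n"
    pairs, mean, stdev = weighted_stats(parse_ihist(rows))
    print(pairs, round(mean, 3), round(stdev, 3))
```

For the five rows above this gives 5 pairs, a mean of 253.6 and a standard deviation of about 122.63, in line with the header values. With only five pairs, none of these numbers says anything reliable about the library.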

**GenoMax** · 11-02-2015, 02:05 PM

This isn't looking good as you have probably figured out by now. The reads you have do not seem to map to the intended target on chr 14.
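To put a number on this, the mapped percentage can be recomputed from the counts in the log. The sketch assumes that "Reads Used" counts both mates, so the number of pairs is half of it; that assumption is consistent with the 0,0035% the log reports for 5 mapped reads. The helper names are made up for this example, and `parse_pct` only handles the decimal-comma percentages seen in this particular output.

```python
def parse_pct(field):
    """Convert a percentage field such as '0,0035%' to a float."""
    return float(field.strip().rstrip("%").replace(",", "."))


def mapped_percent(num_mapped, num_reads):
    """Percentage of reads that mapped."""
    return 100.0 * num_mapped / num_reads


if __name__ == "__main__":
    reads_used = 283940      # "Reads Used" from the log (both mates, assumed)
    pairs = reads_used // 2  # read 1 and read 2 each contribute this many
    print(round(mapped_percent(5, pairs), 4), parse_pct("0,0035%"))
```

Either way, 5 mapped reads out of roughly 142,000 is effectively zero for an amplicon that is supposed to come from chr14.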

**Brian Bushnell** · 11-02-2015, 02:06 PM

It looks like you may not have what you think you have. I recommend BLASTing some of those reads against nt to see if you get hits. They may not be human, or at least not chromosome 14.
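A small random subset is enough for a BLAST check like this. The sketch below is one way to pull a sample out of a FASTQ file and format it as FASTA for pasting into a BLAST search. It assumes plain four-line FASTQ records, and the function names and the sample size of 20 are arbitrary choices for the example.

```python
import random


def fastq_records(lines):
    """Yield (name, sequence) from an iterable of FASTQ lines,
    assuming exactly four lines per record."""
    it = iter(lines)
    for header in it:
        seq = next(it, "")
        next(it, "")  # '+' separator line
        next(it, "")  # quality line
        if header.startswith("@"):
            yield header[1:].strip(), seq.strip()


def sample_records(lines, n=20, seed=0):
    """Reservoir-sample up to n records without loading the whole file."""
    rng = random.Random(seed)
    sample = []
    for seen, record in enumerate(fastq_records(lines), start=1):
        if len(sample) < n:
            sample.append(record)
        else:
            j = rng.randrange(seen)  # keep the record with probability n/seen
            if j < n:
                sample[j] = record
    return sample


def to_fasta(records):
    """Format (name, sequence) pairs as FASTA text."""
    return "".join(f">{name}\n{seq}\n" for name, seq in records)


if __name__ == "__main__":
    # File name taken from the mapping command earlier in the thread.
    with open("high_merge_R1.fq") as fq:
        print(to_fasta(sample_records(fq, n=20)), end="")
```

The printed FASTA can then be searched against nt; if the top hits are consistently from another organism or another locus, that would support the suspicion raised above.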


Merging paired ends fastq files with BBMerge
