SEQanswers

Old 04-22-2015, 08:45 AM   #1
huily
Junior Member
 
Location: auburn, AL

Join Date: Jan 2015
Posts: 8
Many Ns in SOAPdenovo assembly results

Hi,

I used SOAPdenovo2 to assemble a plant genome, but there are many Ns in the scaffolds and the N50 is only 15000 bp. Some of the results are below. I don't know why this is happening and hope somebody can help me. Thank you very much!

Size_includeN 1339592585
Size_withoutN 305042437
Scaffold_Num 975033
Mean_Size 1373
Median_Size 120
Longest_Seq 221407
Shortest_Seq 100
Singleton_Num 803540
Average_length_of_break(N)_in_scaffold 1061

Known_genome_size NaN
Total_scaffold_length_as_percentage_of_known_genome_size NaN

scaffolds>100 842372 86.39%
scaffolds>500 119625 12.27%
scaffolds>1K 93419 9.58%
scaffolds>10K 35907 3.68%
scaffolds>100K 187 0.02%
scaffolds>1M 0 0.00%

Nucleotide_A 85568194 6.39%
Nucleotide_C 68439481 5.11%
Nucleotide_G 67296842 5.02%
Nucleotide_T 83737920 6.25%
GapContent_N 1034550148 77.23%
Non_ACGTN 0 0.00%
GC_Content 44.50% (G+C)/(A+C+G+T)

N10 59139 1741
N20 43739 4409
N30 34619 7868
N40 27474 12218
N50 21467 17745
N60 15647 25064
N70 10191 35422
N80 7208 51584
N90 931 95139



Regards,
huily
Old 04-22-2015, 09:32 AM   #2
vivek_
PhD Student
 
Location: Denmark

Join Date: Jul 2012
Posts: 163

It looks like the scaffolding is rather poor, so it's kind of expected to see a ton of gaps filled with Ns. What insert-size libraries were used for the assembly?
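
A rough way to put numbers on how gappy a set of scaffolds is, as a minimal sketch only: the function names `gap_summary` and `n50` are made up for illustration, and the input is assumed to be plain sequence strings already read from the scaffold FASTA. It is not how SOAPdenovo computes its own report.

```python
import re


def gap_summary(scaffolds):
    """Summarise runs of N across an iterable of scaffold sequences (strings)."""
    total = 0
    gap_lengths = []
    for seq in scaffolds:
        total += len(seq)
        # Each maximal run of N/n counts as one gap.
        gap_lengths.extend(len(m.group()) for m in re.finditer(r"[Nn]+", seq))
    n_bases = sum(gap_lengths)
    return {
        "total_bases": total,
        "n_bases": n_bases,
        "n_fraction": n_bases / total if total else 0.0,
        "gap_count": len(gap_lengths),
        "mean_gap_length": n_bases / len(gap_lengths) if gap_lengths else 0.0,
    }


def n50(lengths):
    """Length L such that sequences of length >= L hold at least half the total."""
    ordered = sorted(lengths, reverse=True)
    half = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half:
            return length
    return 0
```

Running `n50` on lengths with the Ns stripped out, next to the N-inclusive lengths, shows how much of the reported scaffold N50 is gap rather than sequence.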
Old 04-22-2015, 09:55 AM   #3
huily
Junior Member
 
Location: auburn, AL

Join Date: Jan 2015
Posts: 8

Two paired-end libraries, each with an insert length of about 250 bp, and one mate-pair library with an insert length of 7 kb. I have tried many times with Velvet and SOAPdenovo2, but none of the results look good.
Old 04-22-2015, 11:17 AM   #4
vivek_
PhD Student
 
Location: Denmark

Join Date: Jul 2012
Posts: 163

Have you done any pre-processing to clean your reads prior to assembly? SOAPdenovo has issues with chimeric mate-pair reads, which affect proper scaffolding if they are not removed before assembly. Also, specifying the mate-pair reads in the config file is rather tricky.

Here's some helpful discussion if you haven't seen it already: https://www.biostars.org/p/13142/
Old 04-22-2015, 11:42 AM   #5
huily
Junior Member
 
Location: auburn, AL

Join Date: Jan 2015
Posts: 8

Dear vivek,

Thank you very much! I trimmed and normalized my raw reads before assembly. Here is my config for soapdenovo.

max_rd_len=100
[LIB]
avg_ins=250
reverse_seq=0
asm_flags=3
rank=1
q1=/diag/home/hzz0036/goosegrass_genome/clc_trimmed_genome/AU_normalized/DNA-1_CGATGT_L002_R1_001_paired_trimmed_paired_1.fastq.normalized_K25_C30_pctSD200.fq
q2=/diag/home/hzz0036/goosegrass_genome/clc_trimmed_genome/AU_normalized/DNA-1_CGATGT_L002_R1_001_paired_trimmed_paired_2.fastq.normalized_K25_C30_pctSD200.fq
q1=/diag/home/hzz0036/goosegrass_genome/clc_trimmed_genome/PBU_normalized/SM01-PBU1_GTCCGC_L005_R1_001_paired_trimmed_paired_1.fastq.normalized_K25_C30_pctSD200.fq
q2=/diag/home/hzz0036/goosegrass_genome/clc_trimmed_genome/PBU_normalized/SM01-PBU1_GTCCGC_L005_R1_001_paired_trimmed_paired_2.fastq.normalized_K25_C30_pctSD200.fq
[LIB]
avg_ins=7000
reverse_seq=0
asm_flags=3
rank=2
q1=/diag/home/hzz0036/goosegrass_genome/clc_trimmed_genome/PBU7k_normalized/SM01-PBU1-7k_CCGTCC_L005_R1_001_paired_trimmed_paired_1.fastq.normalized_K25_C30_pctSD200.fq
q2=/diag/home/hzz0036/goosegrass_genome/clc_trimmed_genome/PBU7k_normalized/SM01-PBU1-7k_CCGTCC_L005_R1_001_paired_trimmed_paired_2.fastq.normalized_K25_C30_pctSD200.fq

My script is: SOAPdenovo-63mer all -s /diag/home/goosegrass_genome/SOAPdenovo/config -K 31 -R -o /diag/home/genome/SOAPdenovo/graph_prefix 1>scaff.log 2>scaff.err

Do you have any suggestions for improving my script? Also, I don't know whether my mate-pair reads are chimeric or not. How can I tell?

Thanks a lot.
Old 04-22-2015, 11:59 AM   #6
vivek_
PhD Student
 
Location: Denmark

Join Date: Jul 2012
Posts: 163

Someone in the thread I linked suggested setting reverse_seq=1 for the mate-pair library.
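
To see at a glance which libraries in a config that suggestion would apply to, something like the sketch below could help. It assumes the `key=value` layout with `[LIB]` sections shown in the config earlier in this thread. The function names are hypothetical, and the 2000 bp cutoff for calling a library "long insert" is an assumption, not something stated here. It only reports the settings. Whether `reverse_seq=0` or `1` is right still depends on whether the reads were already reverse-complemented upstream.

```python
def parse_soap_config(text):
    """Split SOAPdenovo-style config text into global options and [LIB] sections."""
    global_opts = {}
    libs = []
    current = global_opts
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if line == "[LIB]":
            # Start a new library section; later keys belong to it.
            current = {"files": []}
            libs.append(current)
            continue
        key, _, value = line.partition("=")
        key, value = key.strip(), value.strip()
        if current is not global_opts and key in ("q1", "q2"):
            # A library may list several q1/q2 pairs, so keep them in order.
            current["files"].append((key, value))
        else:
            current[key] = value
    return global_opts, libs


def long_insert_libs(libs, min_insert=2000):
    """List (rank, avg_ins, reverse_seq) for libraries with avg_ins >= min_insert."""
    hits = []
    for lib in libs:
        avg_ins = int(lib.get("avg_ins", 0))
        if avg_ins >= min_insert:
            hits.append((lib.get("rank"), avg_ins, lib.get("reverse_seq")))
    return hits
```

Values are kept as strings exactly as written in the file, apart from `avg_ins`, which is converted to an integer for the comparison.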

Other than that, something I did for a similar issue (albeit a long while ago) was to align the mate-pair reads to the draft genome. You could align your 7 kb library reads to the draft genome you currently have and see what kind of insert size distribution you observe.

If some reads are chimeric, you'll see mate pairs with insert sizes much smaller than 7 kb in the alignment results. You can discard those and redo the assembly with the remaining pairs to see if it improves scaffolding.
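
The filtering step described above could be sketched roughly as follows, assuming the mate-pair reads have already been aligned with some paired-end aligner and the output is available as SAM text. The function names and the `min_insert` cutoff are hypothetical, and a sensible cutoff would have to be chosen from the observed distribution. Only pairs where both mates land on the same scaffold can be measured, so on a very fragmented draft many genuine 7 kb pairs will simply not show up.

```python
def mate_pair_insert_sizes(sam_lines):
    """Yield (read_name, insert_size) once per pair mapped to a single scaffold."""
    for line in sam_lines:
        if line.startswith("@"):
            continue  # header line
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 11:
            continue  # not a complete alignment record
        name, flag, rname = fields[0], int(fields[1]), fields[2]
        rnext, tlen = fields[6], int(fields[8])
        # Require a paired read (0x1); skip unmapped read (0x4), unmapped
        # mate (0x8), secondary (0x100) and supplementary (0x800) records.
        if not flag & 0x1 or flag & (0x4 | 0x8 | 0x100 | 0x800):
            continue
        # Both mates must be on the same reference sequence.
        if rnext not in ("=", rname):
            continue
        # TLEN is positive for the leftmost mate, so each pair is counted once.
        if tlen <= 0:
            continue
        yield name, tlen


def split_by_insert(pairs, min_insert):
    """Separate read names into (kept, short) by observed insert size."""
    kept, short = [], []
    for name, size in pairs:
        (kept if size >= min_insert else short).append(name)
    return kept, short
```

The names in `short` are the candidates to drop from both FASTQ files before re-running the assembly. Pairs that never appear in either list were not measurable on the draft and need a separate decision.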
Old 04-22-2015, 12:25 PM   #7
huily
Junior Member
 
Location: auburn, AL

Join Date: Jan 2015
Posts: 8

I reversed the mate-pair sequences with CLC before assembly, so I set reverse_seq=0.
I also assembled with Velvet using multiple k-mers. There seem to be no Ns in the scaffolds, but the N50 is only 11199 bp, and I don't know why. I will try aligning the mate-pair reads to the draft genome to check for chimeric pairs. Thanks!