SEQanswers

Old 06-10-2015, 07:22 AM   #1
bio_informatics
Senior Member
 
Location: USA

Join Date: Nov 2013
Posts: 182
SPAdes: with different read length

Hi members,

I'm seeing something odd. I know that the reads in a FASTQ file don't all have to be the same length.
But in my trimmed data, the read lengths in the forward and reverse files are different. I only checked the first read of each file, though.

Data: paired end
Organism: E. coli

Output from the reverse reads (R2) is 101:

Code:
zcat sample_R2_trimmed.fastq.gz | awk '{if(NR%4==2) print length($1)}'  | head -1
Output from the forward reads (R1) is 54:

Code:
zcat sample_R1_trimmed.fastq.gz | awk '{if(NR%4==2) print length($1)}'  | head -1
To add to the puzzle, SPAdes uses a k-mer size of 55 during assembly, and the genome size of the resulting assembly is reasonable.

My question:

1) Shouldn't SPAdes select a k-mer size smaller than 55 here?
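
For context, a read shorter than k contains no k-mers of that size (a read of length L yields max(0, L - k + 1) of them), so such a read simply contributes nothing at that k. As far as is known here, SPAdes also assembles over several k values in turn, so shorter reads may still be used at the smaller ones; treat that as an assumption rather than a statement of how this particular run was configured. A minimal sketch for counting how many reads fall below a given k, assuming standard 4-line FASTQ records; the function name `count_reads_shorter_than` is made up for illustration, and `gzip -dc` stands in for `zcat`:

```shell
# Hypothetical helper: count reads in a gzipped FASTQ that are shorter than k.
# Assumes standard 4-line FASTQ records (sequence on line 2 of each record).
count_reads_shorter_than() {
  fastq_gz=$1
  k=$2
  gzip -dc "$fastq_gz" | awk -v k="$k" '
    NR % 4 == 2 { total++; if (length($0) < k) n_short++ }
    END { printf "%d of %d reads are shorter than k=%d\n", n_short, total, k }
  '
}
```

Possible usage on the files from this post: `count_reads_shorter_than sample_R1_trimmed.fastq.gz 55`.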
Old 06-10-2015, 07:41 AM   #2
GenoMax
Senior Member
 
Location: East Coast USA

Join Date: Feb 2008
Posts: 7,077

Can you tell us why they are different (and uniformly of length 54 for R1)? Did you trim them that way?
Old 06-10-2015, 07:48 AM   #3
bio_informatics
Senior Member
 
Location: USA

Join Date: Nov 2013
Posts: 182

Hi Genomax,
Thanks for your reply.

I have two sets of data: trimmed (the sequencing center did this trimming based on quality) and untrimmed. I used the already-trimmed data for the assembly.

Oh, they aren't uniform (looking at the already-trimmed file).
There are 72 different lengths: the maximum is 101 and the minimum is 30.
So every value from 30 to 101 occurs as a read length. That's strange. :-/
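
A length spread like this can be tabulated over the whole file rather than only the first read. A minimal sketch, assuming standard 4-line FASTQ records; the function name `read_length_summary` is hypothetical, and `gzip -dc` stands in for `zcat`:

```shell
# Hypothetical helper: summarise read lengths in a gzipped FASTQ file.
# Prints the number of distinct lengths, then the minimum and maximum length.
read_length_summary() {
  gzip -dc "$1" | awk '
    NR % 4 == 2 {
      len = length($0)
      # Count each length the first time it is seen.
      if (!(len in seen)) { seen[len] = 1; distinct++ }
      if (min == "" || len < min) min = len
      if (len > max) max = len
    }
    END { printf "distinct=%d min=%d max=%d\n", distinct, min, max }
  '
}
```

For a file matching the description here (every length from 30 to 101 present), this would print `distinct=72 min=30 max=101`.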

Last edited by bio_informatics; 06-10-2015 at 07:50 AM.
Old 06-10-2015, 07:57 AM   #4
GenoMax
Senior Member
 
Location: East Coast USA

Join Date: Feb 2008
Posts: 7,077

In that case your original question is no longer applicable, correct?
Old 06-10-2015, 07:58 AM   #5
bio_informatics
Senior Member
 
Location: USA

Join Date: Nov 2013
Posts: 182

Yes.
Sorry.

Quote:
Originally Posted by GenoMax View Post
In that case your original question is no longer applicable, correct?
Old 06-10-2015, 07:59 AM   #6
GenoMax
Senior Member
 
Location: East Coast USA

Join Date: Feb 2008
Posts: 7,077

No problem. It sounds like you did get a reasonably sized assembly, so all should be well.
Old 06-10-2015, 08:04 AM   #7
bio_informatics
Senior Member
 
Location: USA

Join Date: Nov 2013
Posts: 182

Yes, that is more important.
Thanks!

Tags
illumina 1.8, paired end, spades

All times are GMT -8.