I have removed the linkers and split the 454 mate pair reads with sff_extract; I have them now as (after deinterlacing) a pair of fastq files (454_1.fastq and 454_2.fastq) containing reads _1 and _2 only, respectively. In each case Read_1 represents the pre-linker and Read_2 represents the post-linker part of the original read, both in forward orientation:
schematic of original read
Code:
================================^^^^^^^^^^^^^^^=======================
454_1--->                           linker     454_2--->
When assembled, they should be ordered _2 --> _1 (again both in forward, i.e., 5'--3', orientation), with the library insert size distance between them:
schematic of assembled reads
Code:
454_2                                                    454_1
-------->                   (~3kb)                    -------->
==================================================================
would a YAML readset section like this work?
{
    orientation: "ff",
    type: "mate-pairs",
    right reads: [
        "/FULL_PATH_TO_DATASET/454_1.fastq"
    ],
    left reads: [
        "/FULL_PATH_TO_DATASET/454_2.fastq"
    ]
},
or should it be
{
    orientation: "ff",
    type: "mate-pairs",
    right reads: [
        "/FULL_PATH_TO_DATASET/454_2.fastq"
    ],
    left reads: [
        "/FULL_PATH_TO_DATASET/454_1.fastq"
    ]
},
?
(I adapted these views from http://seqanswers.com/forums/showpos...85&postcount=2 )
Last edited by ssully; 12-02-2014, 07:10 PM.
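For anyone wanting to test both layouts side by side, here is a minimal Python sketch that builds the two candidate dataset descriptions. The helper names (`readset`, `dataset_text`) are my own, and it assumes the SPAdes YAML parser accepts JSON-style quoted keys (JSON is a subset of YAML, but I have not verified this against SPAdes itself). The paths are the placeholders from the post.

```python
import json

def readset(left_path, right_path, orientation="ff", lib_type="mate-pairs"):
    """Build one readset entry using the keys shown in the post."""
    return {
        "orientation": orientation,
        "type": lib_type,
        "right reads": [right_path],
        "left reads": [left_path],
    }

def dataset_text(left_path, right_path):
    """Serialize a one-library dataset; the file is a list of readsets."""
    # Assumption: JSON-style output is acceptable as YAML input.
    return json.dumps([readset(left_path, right_path)], indent=2)

# First layout from the post: 454_2 (post-linker) as left, 454_1 as right.
variant_a = dataset_text(
    left_path="/FULL_PATH_TO_DATASET/454_2.fastq",
    right_path="/FULL_PATH_TO_DATASET/454_1.fastq",
)
# Second layout from the post: 454_1 (pre-linker) as left, 454_2 as right.
variant_b = dataset_text(
    left_path="/FULL_PATH_TO_DATASET/454_1.fastq",
    right_path="/FULL_PATH_TO_DATASET/454_2.fastq",
)

print(variant_a)
print(variant_b)
```

Each string could be written to its own file and passed via --dataset in a separate run, so the two results can be compared directly.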
Comment
-
Anyway, you can simply feed the data to SPAdes and check whether it inferred the insert size distribution properly.
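One way to do that check without reading the whole log by eye is to pull out every line that mentions the insert size. This is only a sketch: the function name is mine, the sample log text below is hypothetical, and the exact wording SPAdes uses in its log is an assumption, so the match is a loose case-insensitive substring search.

```python
def insert_size_lines(log_text):
    """Return the log lines that mention 'insert size' (case-insensitive)."""
    return [
        line.strip()
        for line in log_text.splitlines()
        if "insert size" in line.lower()
    ]

# Hypothetical log excerpt, for illustration only (not real SPAdes output).
example_log = "\n".join([
    "step A: reading library 1",
    "estimated Insert Size for library 1: about 3000",
    "step B: done",
])

for line in insert_size_lines(example_log):
    print(line)
```

If the reported value is nowhere near the expected ~3kb, that would suggest the order or orientation in the YAML is wrong.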
Comment
-
I don't know; the second variant seems to me to be saying, 'the reads from the right side of the library read (post-linker, 454_2.fastq) belong at the right end of the genome fragment', which would be incorrect.
For me it really comes down to what 'right reads' and 'left reads' mean in the YAML specification:
e.g., does 'right reads' refer to a read's position in the 454 mate pair library read (i.e., right side/post-linker in the 454 read, but maps to the left end of the genomic fragment) or to its position with respect to the genome (i.e., maps to the right end of the genomic fragment, but comes from the left side/pre-linker half of the 454 read)?
(It's also unusual to me that 'right reads' is specified before 'left reads' in the YAML, for both paired-end and mate-pair types, given that sequences are typically read by humans from left to right, 5' to 3'. Is there a particular reason for that?)
But anyway, I can try inputting it both ways, in two runs, and see which one assembles the 454 mate pairs correctly.
Last edited by ssully; 12-03-2014, 01:12 PM.
Comment
-
I worked out the correct orientation and order of 454 paired reads input for SPAdes, and have corrected the reads with the --iontorrent option (ionhammer). But now I have questions regarding ionhammer error correction: does it pay any attention to fastq quality scores?
Here is an original paired-end sff read (converted to fastq; note the 'sanger style' quality scores, and lower case for low-quality bases). I have underlined (marked with [U]...[/U]) the part that constitutes the 'post linker' read.
sff to fastq
Code:
@GIDY76W02G4JWL
tcagTTATTGATCAGTATTAGAATGAGGCCTATTAATAGCCAATTATCACATTTTGGATCTATTTTGTATCGATGATATCATTTATCGATAATCATCATAGTTATTTCGTAATAACTTCGTATAGCATACATTATACGAAGTTATACGA[U]TTATTGCTATAAATAAACGTACTTCTGGAGTAGAATTGAAGTGAGATAGAATTTCTGGTTTTAAGctgagactgccaaggcacacaggggatagg[/U]n
+
III;;;;BIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIII;:8599>>@:9////92EBEDDDGIIIIIIFEC?:??IIHHHEIIIIIIIIIICCECC:??C==?EEEIEGHHIIIIIIIIGHFHGIIIIC?==CIIIIEEAAAEE>8333C444IIICIIIIGGGGGIIIGGGGIIIIIIIIIIIIA>999=499----./25:===;=@A>>::::EEIIII@@BAGGGII!
sffToCA
Code:
@GIDY76W02G4JWLb clr=0,95 clv=1,0 max=1,0 tnt=1,0 rnd=t
TTATTGCTATAAATAAACGTACTTCTGGAGTAGAATTGAAGTGAGATAGAATTTCTGGTTTTAAGCTGAGACTGCCAAGGCACACAGGGGATAGG
+
IEEAAAEE>8333C444IIICIIIIGGGGGIIIGGGGIIIIIIIIIIIIA>999=499----./25:===;=@A>>::::EEIIII@@BAGGGII
here was my spades command
Code:spades.py --only-error-correction --iontorrent --dataset 454_4.yaml -t 8 --sc -k 21,33,55 --disable-gzip-output -o sff2ca_spades_corrected
and here is the output of ionhammer for the above read
Code:
>GIDY76W02G4JWLb
TTATTGCTATAAATAAACGTACTTCTGGAGTAGAATTGAAGTGAGATAGAATTTCT[U]G[/U]TTTTAAGCTGAGACTGCCAAGGCACACAGGGGATAGG
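To pinpoint where a corrected read differs from its input, a small diff helps. The sketch below uses Python's standard difflib; the function name `read_changes` is mine, and the two strings are short excerpts around the changed position, copied from the sffToCA input and the ionhammer output above.

```python
from difflib import SequenceMatcher

def read_changes(original, corrected):
    """List (tag, position_in_original, original_segment, corrected_segment)
    for every region where the two sequences differ."""
    matcher = SequenceMatcher(None, original, corrected, autojunk=False)
    return [
        (tag, i1, original[i1:i2], corrected[j1:j2])
        for tag, i1, i2, j1, j2 in matcher.get_opcodes()
        if tag != "equal"
    ]

# Excerpts around the changed position in read GIDY76W02G4JWLb.
before = "GAATTTCTGGTTTTAAGC"   # from the sffToCA fastq
after = "GAATTTCTGTTTTAAGC"     # from the ionhammer output

for change in read_changes(before, after):
    print(change)
```

On these excerpts the only reported change is a deletion of a single G, i.e. the "GG" before the T run was shortened to "G", while the low-quality tail of the read was left in place.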
So, I'm not clear on what ionhammer should be doing; it appears I either need to quality-trim my 454 reads *before* running them through ionhammer, *or* need to preserve the lower-case base formatting in the input file?
Last edited by ssully; 12-06-2014, 07:50 AM.
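If quality trimming before correction turns out to be the way to go, the idea can be sketched as follows. This is a deliberately naive 3'-end trim, not what any particular trimming tool does; the function names and the threshold of 20 are my own assumptions, and the quality string is assumed to be Phred+33 ('sanger style', as in the reads above).

```python
def phred_scores(quality, offset=33):
    """Decode a Phred+33 quality string into integer scores."""
    return [ord(ch) - offset for ch in quality]

def trim_3prime(sequence, quality, min_q=20):
    """Drop bases from the 3' end while their quality is below min_q."""
    scores = phred_scores(quality)
    end = len(scores)
    while end > 0 and scores[end - 1] < min_q:
        end -= 1
    return sequence[:end], quality[:end]

# Toy read: four high-quality bases followed by two low-quality ones.
seq, qual = trim_3prime("ACGTAC", "IIII-!")
print(seq, qual)
```

Here 'I' decodes to 40, '-' to 12 and '!' to 0, so the last two bases are removed. A real 454 read would likely need a windowed or clip-range-based approach instead.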
Comment