Originally posted by ssully
I have removed the linkers and split the 454 mate-pair reads with sff_extract. After deinterlacing, I now have a pair of FASTQ files (454_1.fastq and 454_2.fastq) containing only the _1 and the _2 reads, respectively. In each case Read_1 is the pre-linker part and Read_2 the post-linker part of the original read, both in forward orientation:
Schematic of original read:
Code:
================================^^^^^^^^^^^^^^^=======================
454_1--->                       linker         454_2--->
When assembled, they should be ordered _2 --> _1 (again, both in forward, i.e. 5' to 3', orientation), with the library insert size between them:
Schematic of assembled reads:
Code:
454_2                        454_1
-------->      (~3kb)        -------->
==================================================================
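Why _2 ends up upstream of _1 can be checked with a toy model of library construction: the genomic fragment is circularized through the linker, so a read that spans the linker runs from the fragment's 3' end, across the linker, into its 5' end. This is a simplified sketch, not the real 454 protocol: the random genome is made up, and `LINKER` is a hypothetical stand-in made of Ns (not the actual 454 linker sequence) so it can never occur by chance.

```python
import random

# Hypothetical stand-in linker: Ns never occur in the ACGT toy genome.
# This is NOT the real 454 linker sequence.
LINKER = "N" * 42


def circularized_read(genome: str, start: int, insert: int,
                      pre_len: int, post_len: int) -> str:
    """Toy 454 paired-end read.

    The fragment genome[start:start + insert] is circularized through the
    linker; the sequenced read covers the fragment's 3' end, then the
    linker, then the fragment's 5' end, all on the forward strand.
    """
    fragment = genome[start:start + insert]
    return fragment[-pre_len:] + LINKER + fragment[:post_len]


def split_at_linker(read: str) -> tuple[str, str]:
    """Split a read into its (pre-linker, post-linker) halves, i.e. (_1, _2)."""
    i = read.index(LINKER)
    return read[:i], read[i + len(LINKER):]


if __name__ == "__main__":
    rng = random.Random(42)
    genome = "".join(rng.choice("ACGT") for _ in range(10_000))
    read = circularized_read(genome, start=2_000, insert=3_000,
                             pre_len=80, post_len=80)
    read_1, read_2 = split_at_linker(read)
    # _2 comes from the fragment's start, _1 from its end.
    print("454_2 maps at", genome.find(read_2))
    print("454_1 maps at", genome.find(read_1))
```

With these toy parameters, 454_2 should map at the fragment start (offset 2000) and 454_1 roughly one insert length downstream (offset 4920), both forward, which is the _2 --> _1 order in the schematic.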
Would a YAML readset section like this work?
    {
      orientation: "ff",
      type: "mate-pairs",
      right reads: [
        "/FULL_PATH_TO_DATASET/454_1.fastq"
      ],
      left reads: [
        "/FULL_PATH_TO_DATASET/454_2.fastq"
      ]
    },
Or should it be this?
    {
      orientation: "ff",
      type: "mate-pairs",
      right reads: [
        "/FULL_PATH_TO_DATASET/454_2.fastq"
      ],
      left reads: [
        "/FULL_PATH_TO_DATASET/454_1.fastq"
      ]
    },
(I adapted these from http://seqanswers.com/forums/showpos...85&postcount=2 .)
Last edited by ssully; 12-02-2014, 07:10 PM.
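To avoid hand-editing two near-identical readsets, a small Python sketch can write both candidate layouts. It relies on JSON being, for practical purposes, valid YAML flow syntax, so SPAdes' YAML dataset reader should accept it; that is an assumption worth checking against your SPAdes version. The helper names and the output file names (`variant_1.yaml`, `variant_2.yaml`) are hypothetical, and the sketch deliberately does not decide which variant is correct.

```python
import json
from pathlib import Path


def mate_pair_dataset(left: list[str], right: list[str],
                      orientation: str = "ff") -> str:
    """Render a one-library dataset file in the layout used in the post.

    Output is JSON, which is also YAML flow syntax (assumption: SPAdes'
    YAML loader accepts it).
    """
    library = {
        "orientation": orientation,
        "type": "mate-pairs",
        "right reads": right,
        "left reads": left,
    }
    # A dataset file holds a list of libraries, even when there is only one.
    return json.dumps([library], indent=2) + "\n"


def write_variants(outdir: str, pre_linker: str, post_linker: str) -> list[Path]:
    """Write both candidate layouts from the post (hypothetical file names)."""
    out = Path(outdir)
    out.mkdir(parents=True, exist_ok=True)
    variants = {
        # First variant: 454_1 (pre-linker) as right reads, 454_2 as left reads.
        "variant_1.yaml": mate_pair_dataset(left=[post_linker], right=[pre_linker]),
        # Second variant: the same two files with their roles swapped.
        "variant_2.yaml": mate_pair_dataset(left=[pre_linker], right=[post_linker]),
    }
    paths = []
    for name, text in variants.items():
        path = out / name
        path.write_text(text)
        paths.append(path)
    return paths


if __name__ == "__main__":
    for p in write_variants("readsets",
                            "/FULL_PATH_TO_DATASET/454_1.fastq",
                            "/FULL_PATH_TO_DATASET/454_2.fastq"):
        print(p)
```

Each file would then be passed to its own run via `--dataset`, as in the spades.py command used later in this thread.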
Anyway, you can simply feed the data to SPAdes and check whether it inferred the insert size distribution properly.
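An independent sanity check on the insert size is to map the mates back (to a reference or to the assembly, with whatever tool you prefer) and summarize the spans yourself, then compare with the estimate SPAdes writes to its log. This sketch assumes you already have 0-based start coordinates for each _2/_1 pair on the same strand; the function names are hypothetical.

```python
from statistics import median


def ff_span(start_2: int, start_1: int, len_1: int) -> int:
    """Outer distance of an ff mate pair whose _2 read maps upstream of _1
    on the same strand (0-based starts): first base of _2 to last base of _1."""
    return start_1 + len_1 - start_2


def robust_summary(spans: list[int]) -> tuple[float, float]:
    """Median and median absolute deviation (MAD) of pair spans.

    Median/MAD rather than mean/SD, because a few chimeric or mis-mapped
    pairs can produce very large outliers.
    """
    m = median(spans)
    mad = median(abs(s - m) for s in spans)
    return m, mad
```

For example, `robust_summary([3000, 3010, 2990, 15000])` returns `(3005.0, 10.0)`: the single 15 kb outlier barely moves either statistic, whereas it would inflate a mean and standard deviation considerably.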
I don't know; to me the second variant seems to say 'the reads from the right side of the library read (post-linker, 454_2.fastq) belong at the right end of the genome fragment', which would be incorrect.
For me it really comes down to what 'right reads' and 'left reads' mean in the YAML specification.
For example, does 'right reads' refer to a read's position within the 454 mate-pair read (i.e., the right side/post-linker part of the 454 read, which maps to the left end of the genomic fragment), or to its position with respect to the genome (i.e., the read that maps to the right end of the genomic fragment but comes from the left side/pre-linker half of the 454 read)?
(It also seems odd to me that 'right reads' is specified before 'left reads' in the YAML, for both paired-end and mate-pair types, given that humans typically read sequences from left to right, 5' to 3'. Is there a particular reason for that?)
But anyway, I can try inputting it both ways, in two runs, and see which one assembles the 454 mate pairs correctly.
Last edited by ssully; 12-03-2014, 01:12 PM.
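One rough way to compare the two runs is scaffold contiguity: a library given in the wrong order or orientation would be expected to help scaffolding less, or even to mislead it. N50 is only a proxy (it says nothing about correctness), so it is worth combining with mapping checks like the one above. The run directory names in this sketch are hypothetical; `scaffolds.fasta` is the file SPAdes writes to its output directory.

```python
from pathlib import Path


def fasta_lengths(path: Path) -> list[int]:
    """Sequence lengths from a (possibly multi-line) FASTA file."""
    lengths: list[int] = []
    for line in path.read_text().splitlines():
        if line.startswith(">"):
            lengths.append(0)  # start a new record
        elif lengths:
            lengths[-1] += len(line.strip())
    return lengths


def n50(lengths: list[int]) -> int:
    """Length L such that sequences of length >= L cover at least half the total."""
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length
    return 0


if __name__ == "__main__":
    # Hypothetical output directories, one per YAML variant.
    for run in ("run_variant_1", "run_variant_2"):
        lengths = fasta_lengths(Path(run) / "scaffolds.fasta")
        print(run, "scaffolds:", len(lengths), "N50:", n50(lengths))
```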
I worked out the correct orientation and order of the 454 paired reads for SPAdes, and have corrected the reads with the --iontorrent option (ionhammer). But now I have questions about ionhammer error correction: does it pay any attention to FASTQ quality scores?
Here is an original paired-end SFF read, converted to FASTQ (note the Sanger-style quality scores, and lower case for low-quality bases). I have underlined (the [U]...[/U] tags) the part that constitutes the post-linker read.
sff to fastq:
Code:
@GIDY76W02G4JWL
tcagTTATTGATCAGTATTAGAATGAGGCCTATTAATAGCCAATTATCACATTTTGGATCTATTTTGTATCGATGATATCATTTATCGATAATCATCATAGTTATTTCGTAATAACTTCGTATAGCATACATTATACGAAGTTATACGA[U]TTATTGCTATAAATAAACGTACTTCTGGAGTAGAATTGAAGTGAGATAGAATTTCTGGTTTTAAGctgagactgccaaggcacacaggggatagg[/U]n
+
III;;;;BIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIII;:8599>>@:9////92EBEDDDGIIIIIIFEC?:??IIHHHEIIIIIIIIIICCECC:??C==?EEEIEGHHIIIIIIIIGHFHGIIIIC?==CIIIIEEAAAEE>8333C444IIICIIIIGGGGGIIIGGGGIIIIIIIIIIIIA>999=499----./25:===;=@A>>::::EEIIII@@BAGGGII!
sffToCA:
Code:
@GIDY76W02G4JWLb clr=0,95 clv=1,0 max=1,0 tnt=1,0 rnd=t
TTATTGCTATAAATAAACGTACTTCTGGAGTAGAATTGAAGTGAGATAGAATTTCTGGTTTTAAGCTGAGACTGCCAAGGCACACAGGGGATAGG
+
IEEAAAEE>8333C444IIICIIIIGGGGGIIIGGGGIIIIIIIIIIIIA>999=499----./25:===;=@A>>::::EEIIII@@BAGGGII
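Two small helpers make the comparison between these records explicit: decoding Sanger-style (Phred+33) quality characters, and finding the upper-case "clear range" implied by the lower-case convention of the sff-to-fastq output. Note that in the sffToCA record the former lower-case tail is upper case, so that information is no longer visible in the sequence itself. The helper names are hypothetical, and the clear-range function assumes that lower case marks clipped bases, as described above.

```python
def phred33(qual: str) -> list[int]:
    """Decode Sanger-style (Phred+33) quality characters to integer scores."""
    return [ord(c) - 33 for c in qual]


def clear_range(seq: str) -> tuple[int, int]:
    """0-based, end-exclusive span from the first to the last upper-case base.

    Assumes the sff-to-fastq convention above, where clipped (key or
    low-quality) bases are written in lower case. Returns (0, 0) if the
    whole read is lower case.
    """
    upper = [i for i, base in enumerate(seq) if base.isupper()]
    if not upper:
        return 0, 0
    return upper[0], upper[-1] + 1


if __name__ == "__main__":
    start, end = clear_range("tcagACGTacgtn")
    print(start, end)            # the upper-case span ACGT
    print(phred33("I5-!"))       # 'I' = Q40, '-' = Q12, '!' = Q0
```

For instance, the `----` stretch in the sffToCA quality string decodes to Q12 per base, and the final `!` on the trailing `n` of the original record decodes to Q0.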
Here was my SPAdes command:
Code:
spades.py --only-error-correction --iontorrent --dataset 454_4.yaml -t 8 --sc -k 21,33,55 --disable-gzip-output -o sff2ca_spades_corrected
And here is the ionhammer output for the read above:
Code:
>GIDY76W02G4JWLb
TTATTGCTATAAATAAACGTACTTCTGGAGTAGAATTGAAGTGAGATAGAATTTCT[U]G[/U]TTTTAAGCTGAGACTGCCAAGGCACACAGGGGATAGG
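Pyrosequencing and Ion Torrent errors are dominated by miscalled homopolymer lengths, and the only edit visible between the sffToCA input and the ionhammer output appears to be one of these: the GG run before TTTT became a single G. One way to confirm that an edit is purely a homopolymer-length change is to compare the run-length-encoded (homopolymer-compressed) reads. This is a generic sketch with hypothetical helper names, not a description of how ionhammer works internally.

```python
from itertools import groupby

BEFORE = "TTATTGCTATAAATAAACGTACTTCTGGAGTAGAATTGAAGTGAGATAGAATTTCTGGTTTTAAGCTGAGACTGCCAAGGCACACAGGGGATAGG"
AFTER = "TTATTGCTATAAATAAACGTACTTCTGGAGTAGAATTGAAGTGAGATAGAATTTCTGTTTTAAGCTGAGACTGCCAAGGCACACAGGGGATAGG"


def run_length_encode(seq: str) -> list[tuple[str, int]]:
    """Collapse a sequence into (base, homopolymer length) runs."""
    return [(base, len(list(group))) for base, group in groupby(seq.upper())]


def homopolymer_diffs(before: str, after: str) -> list[tuple[int, str, int, int]]:
    """Return (0-based offset in `before`, base, old length, new length) for
    every run whose length changed.

    Raises ValueError if the homopolymer-compressed sequences differ, i.e.
    the edit was not purely a change in homopolymer lengths.
    """
    runs_a = run_length_encode(before)
    runs_b = run_length_encode(after)
    if [b for b, _ in runs_a] != [b for b, _ in runs_b]:
        raise ValueError("reads differ by more than homopolymer lengths")
    diffs = []
    offset = 0
    for (base, n_a), (_, n_b) in zip(runs_a, runs_b):
        if n_a != n_b:
            diffs.append((offset, base, n_a, n_b))
        offset += n_a
    return diffs


if __name__ == "__main__":
    print(homopolymer_diffs(BEFORE, AFTER))  # [(56, 'G', 2, 1)]
```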
So I'm not clear on what ionhammer should be doing. Does this mean I need to quality-trim my 454 reads *before* running them through ionhammer, *or* that I need to preserve the lower-case base formatting in the input file?
Last edited by ssully; 12-06-2014, 07:50 AM.
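If you go the pre-trimming route, the main pitfall with mate pairs is keeping the two files in sync: when one mate is trimmed away, its partner has to go too. The sketch below trims 3' bases below a Phred+33 threshold and drops pairs jointly. It assumes plain 4-line FASTQ records with the two files in the same read order (as after deinterlacing). The thresholds (`min_q=20`, `min_len=30`) and output file names are illustrative assumptions, and simple 3' trimming is only one of several trimming strategies. It does not answer whether ionhammer itself uses quality scores.

```python
from typing import Iterator, TextIO


def read_fastq(handle: TextIO) -> Iterator[tuple[str, str, str]]:
    """Yield (header, sequence, quality) from a 4-line-per-record FASTQ file."""
    while True:
        header = handle.readline().rstrip("\n")
        if not header:
            return
        seq = handle.readline().rstrip("\n")
        handle.readline()  # '+' separator line
        qual = handle.readline().rstrip("\n")
        yield header, seq, qual


def trim_3prime(seq: str, qual: str, min_q: int = 20) -> tuple[str, str]:
    """Drop trailing bases whose Phred+33 quality is below min_q."""
    end = len(qual)
    while end > 0 and ord(qual[end - 1]) - 33 < min_q:
        end -= 1
    return seq[:end], qual[:end]


def trim_pairs(in_1: TextIO, in_2: TextIO, out_1: TextIO, out_2: TextIO,
               min_q: int = 20, min_len: int = 30) -> int:
    """Trim both mates; keep a pair only if both survive. Returns pairs kept."""
    kept = 0
    for (h1, s1, q1), (h2, s2, q2) in zip(read_fastq(in_1), read_fastq(in_2)):
        s1, q1 = trim_3prime(s1, q1, min_q)
        s2, q2 = trim_3prime(s2, q2, min_q)
        if len(s1) < min_len or len(s2) < min_len:
            continue  # drop both mates so the two files stay in sync
        out_1.write(f"{h1}\n{s1}\n+\n{q1}\n")
        out_2.write(f"{h2}\n{s2}\n+\n{q2}\n")
        kept += 1
    return kept


if __name__ == "__main__":
    with open("454_1.fastq") as in_1, open("454_2.fastq") as in_2, \
            open("454_1.trimmed.fastq", "w") as out_1, \
            open("454_2.trimmed.fastq", "w") as out_2:
        print("pairs kept:", trim_pairs(in_1, in_2, out_1, out_2))
```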