Hi all,
I'm a newbie to NGS work and have a few questions that I hope someone can help me with.
I have strand-specific paired-end Illumina data (100 bp). The company already did some clean-up (adaptor filtering etc.). I checked my two files with FastQC and quality-wise they look good (over Q30 on average), but there is a slight drop at the 3' end to about Q28, and the other graphs show a bit more variation in the first 5-7 bases. I did a test run with PrinSeq and Trimmomatic on a few hundred sequences, and Trimmomatic seems to give me the nicer output, which Trinity might like better. PrinSeq repeats the sequence identifier on the "+" line, so each record is: line 1, sequence identifier; line 2, the actual read; line 3, "+" followed by the sequence identifier again; line 4, quality information. Trimmomatic doesn't do that; line 3 contains only the "+", which matches my input file.
The two things I'd like to do are a head crop of 7 nucleotides (HEADCROP) and quality trimming at the 3' end with TRAILING. According to the manual, TRAILING seems to work with a quality score of 1, 2 or 3 (3 should be used) - what does that mean? I'd like to cut anything below Q30 at my 3' end. Some posts here on other Trimmomatic questions seem to suggest I can write 30 as well; is that true? Could I just say TRAILING:30, or does it have to be 3 (whatever that means)?
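To make the intent concrete, here is a minimal Python sketch of the trimming I am after. It is not Trimmomatic's code, only my understanding of what a head crop of 7 and a 3' quality cut at Q30 should do to a single read. The function names, the example read, and the Phred+33 offset are all assumptions made up for illustration.

```python
def headcrop(seq, qual, n):
    """Drop the first n bases together with their quality characters."""
    return seq[n:], qual[n:]


def trim_trailing(seq, qual, threshold, offset=33):
    """Drop bases from the 3' end while their Phred score is below threshold.

    offset=33 assumes Phred+33 encoding; use 64 for Phred+64 data.
    """
    end = len(seq)
    while end > 0 and ord(qual[end - 1]) - offset < threshold:
        end -= 1
    return seq[:end], qual[:end]


# Hypothetical 16 bp read: '5' is Q20, 'I' is Q40, '=' is Q28 (Phred+33).
seq = "ACGTACGTACGTACGT"
qual = "55IIIIIIIIIIII=="

cropped_seq, cropped_qual = headcrop(seq, qual, 7)
trimmed_seq, trimmed_qual = trim_trailing(cropped_seq, cropped_qual, 30)
print(trimmed_seq, trimmed_qual)
```

In this sketch the two Q28 bases at the 3' end are removed and trimming stops at the first base with Q30 or higher, which is the behaviour I am hoping to get.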
Strand-specificity shouldn't really matter for the trimming, should it (my data has RF orientation)? Can I still give the /1 file as my first input file and the /2 file as my second, or would I have to change that?
I'm also struggling with phred33/phred64. I read that phred33 is for Illumina version 1.8, and I also read the wiki page that most people refer to in this regard, but the identifier format that matches mine doesn't clearly say which version it belongs to. It's very difficult to get information from my sequencing company, so I hope to figure out myself which version they might have used. My sequence ID looks like this:
instrument:run id:1101:1374:1950#ATCAGAA/1
Is there a way to figure out the Illumina version based on this?
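Failing that, I was thinking of guessing the encoding from the quality characters themselves. Below is a rough Python sketch of that heuristic: in the printable ASCII range, any character below ';' (ASCII 59) can only occur in Phred+33 data, while characters above 'J' (ASCII 74) would point to Phred+64 for Illumina reads. The function name and the cut-offs are my own assumptions, so please correct me if the logic is off.

```python
def guess_phred_offset(quality_lines):
    """Guess the quality offset from FASTQ quality strings.

    Returns 33, 64, or None when the observed characters fit both encodings.
    Heuristic only: assumes Illumina-style data with scores up to about Q41.
    """
    codes = [ord(c) for line in quality_lines for c in line.strip()]
    if not codes:
        return None
    if min(codes) < 59:   # below ';' is impossible in any +64 encoding
        return 33
    if max(codes) > 74:   # above 'J' would mean > Q41 under Phred+33
        return 64
    return None


# Hypothetical quality lines (every 4th line of a FASTQ file).
print(guess_phred_offset(["IIII5555", "IIIIIIII"]))
print(guess_phred_offset(["hhhhdddd", "hhhhhhhh"]))
```

In practice I would feed it the quality lines of a few thousand reads rather than two, since a small sample of uniformly high-quality reads can fall in the ambiguous range.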
Thank you so much for your help and apologies for the lengthy post.
Nicole