**kmcarr** · 06-16-2009, 04:12 AM

Layla,

The fact that you are seeing the key tag (tcag) in your sequence indicates that you have the untrimmed sequence. SFF files store the complete flowgram, sequence and quality scores for a well. They also contain trimming information for each read: the 5' and 3' positions that bound the high quality sequence. The trim points also account for the key tag (and multiplex barcode, if used) at the 5' end and the library adapter at the 3' end if the insert was short.

When the FASTA and QUAL files are output from an SFF file using the sffinfo program, they normally contain just the trimmed sequence. It is also possible to output the entire untrimmed sequence by using the -n option when you run sffinfo. In this case the portions of the read which lie beyond the trim points are also output, but in lower case. That is what you are seeing: the lower case bases are those which the 454 software marked to be trimmed.
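
For anyone who wants to separate the two parts of an untrimmed read, here is a minimal Python sketch. It assumes the layout described above (lower case clip at the 5' end, upper case kept region, lower case clip at the 3' end). The function name `split_by_case` is made up for illustration and is not part of the 454 tools.

```python
import re

# Assumed layout of an untrimmed read: lower case 5' clip,
# upper case kept region, lower case 3' clip.
_CASE_LAYOUT = re.compile(r"([a-z]*)([A-Z]*)([a-z]*)")


def split_by_case(seq):
    """Return (five_prime_clip, kept, three_prime_clip) for one read.

    Hypothetical helper: raises ValueError if the read does not follow
    the assumed lower/UPPER/lower layout.
    """
    match = _CASE_LAYOUT.fullmatch(seq)
    if match is None:
        raise ValueError("read does not follow the lower/UPPER/lower layout")
    return match.groups()


# Made-up read: key tag tcag, eight kept bases, two clipped 3' bases.
clip5, kept, clip3 = split_by_case("tcagACGTACGTtt")
print(clip5, kept, clip3)
```

With the made-up read above, the key tag ends up in the first field and only the upper case stretch would go on to mapping.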

**Layla** · 06-16-2009, 04:50 AM

50% lower case bases

Thank you for the information kmcarr.

I ran a simple sffinfo -s file1.sff > file1.fna command, without the -n option, to produce this file. Since the 454 software has marked these bases to be trimmed, should I also remove them before mapping the reads to the human genome? My concern is that 50% of my bases, out of 500 MB, are in lower case, and after removing them each read will average only 50 bases instead of the 500 bases that Titanium should be giving.

Any suggestions on what one should do? I guess still holding onto those reads should not be an option?

L

**hlu** · 06-19-2009, 12:57 PM

You might want to contact software support about this issue. It sounds like misbehavior of the sffinfo software.

**dan** · 07-14-2009, 03:08 AM

Looking at the 454TrimStatus.txt file (produced by assembly or mapping of an SFF), I get the following values:

Mean Raw Length = 534
Mean Orig Trimmed Length = 380
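
If you only have an untrimmed FASTA file (one written with the -n option), comparable averages can be estimated by counting upper case bases per read. This is just a sketch under the assumption that upper case marks the bases kept after trimming. `read_fasta` and `length_summary` are hypothetical helper names, and the code does not parse 454TrimStatus.txt.

```python
def read_fasta(lines):
    """Yield (header, sequence) pairs from an iterable of FASTA lines."""
    header, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        else:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def length_summary(lines):
    """Return (mean raw length, mean upper case length) over all reads."""
    raw_total = kept_total = n_reads = 0
    for _, seq in read_fasta(lines):
        raw_total += len(seq)
        # Assumption: upper case bases are the ones kept after trimming.
        kept_total += sum(1 for base in seq if base.isupper())
        n_reads += 1
    if n_reads == 0:
        raise ValueError("no reads found")
    return raw_total / n_reads, kept_total / n_reads


# Made-up example records; in practice pass an open file handle instead.
example = [">r1", "tcagACGT", "ACgg", ">r2", "tcagAAAAAAtt"]
mean_raw, mean_kept = length_summary(example)
print(mean_raw, mean_kept)
```

Passing an open file, e.g. `length_summary(open("file1.fna"))`, gives the same two averages for a real data set.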

About trimming before mapping... you should certainly trim the key tag and any adapter sequence from your reads before mapping (there is no way this could or should map onto your genome except by chance, i.e. in error).

I was told that the 454 software takes no special account of low quality mismatches, i.e. gsMapper does not use quality information when mapping. For this reason, you should trim low quality bases before mapping. However, I'd be interested to know of any mapper that can take quality information into account, e.g. by not penalising a low quality mismatch, or by mapping with the high quality bases and using the low quality bases only when generating the consensus...
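
To make the "not penalising a low quality mismatch" idea concrete, here is a toy Python sketch. It is purely my own illustration and not how gsMapper or any other mapper scores alignments: each mismatch is weighted by the probability that the base call is correct, 1 - 10^(-Q/10), so a mismatch at a low Phred score costs less than one at a high score. The function name `weighted_mismatches` and the ungapped, equal-length comparison are assumptions.

```python
def weighted_mismatches(read, ref, quals):
    """Sum mismatch penalties, down-weighting low quality base calls.

    Toy scoring only: assumes an ungapped alignment of equal-length
    strings and Phred-scaled quality scores, one per read base.
    """
    if not (len(read) == len(ref) == len(quals)):
        raise ValueError("read, ref and quals must have the same length")
    penalty = 0.0
    for base, ref_base, q in zip(read.upper(), ref.upper(), quals):
        if base != ref_base:
            p_error = 10 ** (-q / 10)   # Phred definition of error probability
            penalty += 1.0 - p_error    # confident calls cost close to 1
    return penalty


# The single mismatch sits on a Q10 base, so it costs less than a full mismatch.
print(weighted_mismatches("ACGT", "ACGA", [40, 40, 40, 10]))
```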

It seems that the error model for 454 could be captured by an HMM. You could then map using all the available information for a read (excluding the key tag and any adapter sequence) and then somehow perform a multiple HMM to HMM alignment to generate the consensus... Any maths geniuses around?

Cheers,

**bioinfosm** · 07-14-2009, 06:58 AM

Perhaps MOSAIK from the Marth lab works with the quality values of 454 data.

Titanium upper and lower case bases