
**simonandrews** · 08-19-2011, 11:45 PM

Without seeing your actual data it's really difficult to make any sensible suggestions as to what might be different. If you're seeing differences in the sequence content plots then these will bias the results of the Kmer plots. If you could post 2 of your sequence content plots which look different we might be able to offer more concrete suggestions.

**byou678** · 08-22-2011, 07:13 AM

Any other ideas? Thanks in advance.

**byou678** · 08-23-2011, 05:41 AM

[IMG]C:\Users\whittier.2\Desktop[/IMG]

**byou678** · 08-23-2011, 05:42 AM

**simonandrews** · 08-23-2011, 05:47 AM

The links you tried to post point to a file on your desktop which we can't see. You either need to put the images on a public facing webserver, or add them as attachments to your post. To add an attachment go into the 'advanced' posting options and click on the paper clip icon.

**byou678** · 08-23-2011, 06:06 AM

Thanks simonandrews!

The "Per Base Sequence Content" picture for 5_1 is in the attachment.

Attached Files

per_base_sequence_content.png (37.4 KB, 237 views)

**byou678** · 08-23-2011, 06:09 AM

The "Per Base Sequence Content" picture for 5_2 is in the following attachment. Please take a look, compare it to 5_1, and give me some ideas. Thanks!

Attached Files

per_base_sequence_content.png (35.7 KB, 100 views)

**byou678** · 08-23-2011, 06:13 AM

The "Overrepresented Kmers" picture for 5_1 is in the attachment.

Attached Files

kmer_profiles.png (96.8 KB, 138 views)

**byou678** · 08-23-2011, 06:15 AM

The "Overrepresented Kmers" picture for 5_2 is in the attachment.

All four pictures above are FastQC results after quality trimming and adapter trimming.

Attached Files

Screen shot 2011-08-23 at 10.40.51 AM.png (89.9 KB, 107 views)

**simonandrews** · 08-23-2011, 07:02 AM

It looks like in the 5_1 sample you are reading through your diverse insert sequence into some kind of adapter. You can actually read part of the adapter sequence from the Kmer plot - it starts with GAGCGGTCA. I've had a quick look but I couldn't match that to any of the standard Illumina adapter sequences (it does occur in pBT-6 if that rings any bells), but you should check to see if it matches any of the primers or vectors you've used during your library construction.

If you can figure out the source of the sequence and you're happy the rest of your library is OK then you can rerun the adapter trimmer with this new sequence to remove the additional sequence and your libraries should look more like each other again.

**byou678** · 08-23-2011, 07:38 AM

Thanks again.

The adapter sequences are as below:
5' TACACTCTTTCCCTACACGACGCTCTTCCGATCT
5' AGATCGGAAGAGCGTCGTGTAGGGAAAGAGTGTA
5' GACGGCATACGAGCTCTTCCGATCT
5' AGATCGGAAGAGCTCGTATGCCGTC
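
A quick sanity check on the list above can be sketched in Python. The four adapter strings and the GAGCGGTCA fragment are taken from this thread; the function names are made up for illustration. The sketch confirms that the list is really two adapters plus their reverse complements, and looks for the Kmer-plot fragment in any of them on either strand.

```python
# Adapter sequences as listed in this post.
ADAPTERS = [
    "TACACTCTTTCCCTACACGACGCTCTTCCGATCT",
    "AGATCGGAAGAGCGTCGTGTAGGGAAAGAGTGTA",
    "GACGGCATACGAGCTCTTCCGATCT",
    "AGATCGGAAGAGCTCGTATGCCGTC",
]

# Start of the unexplained sequence read off the Kmer plot of 5_1.
MYSTERY = "GAGCGGTCA"

_COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq):
    """Return the reverse complement of an upper-case DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]


def adapters_containing(fragment, adapters):
    """Return the adapters that contain `fragment` on either strand."""
    rc = reverse_complement(fragment)
    return [a for a in adapters if fragment in a or rc in a]


if __name__ == "__main__":
    for i in (0, 2):
        paired = reverse_complement(ADAPTERS[i]) == ADAPTERS[i + 1]
        print(f"adapter {i + 1} / adapter {i + 2} reverse-complement pair: {paired}")
    hits = adapters_containing(MYSTERY, ADAPTERS)
    print(f"adapters containing {MYSTERY} (either strand): {hits}")
```

With these four sequences the fragment is not found on either strand, which agrees with the observation that it does not match the adapters that were trimmed.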

I use "Trim.pl" to do the quality trimming; its options are shown below.

Options:
--type <num> 0=standard trimming, 1=adaptive trimming, 2=windowed adaptive trimming. Default 0
--qual-threshold <num> quality threshold for trimming, default 20
--length-threshold <num> length threshold for trimming, default 20
--qual-type <num> 0=sanger qualities, 1=illumina qualities pipeline>=1.3, 2=illumina qualities pipeline<1.3. Default 0.
--pair1 <paired end input filename> fastq, paired end file. Must have same number of records as pair2. Required.
--pair2 <paired end input filename> fastq, paired end file. Must have same number of records as pair1. Required.
--outpair1 <paired end output file> Required.
--outpair2 <paired end output file> Required.
--single <single end output file> Required.

I chose the appropriate options and values for quality trimming: --type 2 (windowed adaptive trimming), --qual-type 1 (illumina qualities pipeline>=1.3), and the default values for --qual-threshold <num> and --length-threshold <num>.
I put the corresponding sequence data file names after --pair1 and --pair2, and named the output files for --outpair1, --outpair2 and --single.

I chose "cutadapt" for adapter trimming and ran the following command in Terminal on the 5.1 file (the 5.1 file after quality trimming). The output is saved in the file "5.1_adaptortrim.fastq".
$ cutadapt -b TACACTCTTTCCCTACACGACGCTCTTCCGATCT -b AGATCGGAAGAGCGTCGTGTAGGGAAAGAGTGTA -b GACGGCATACGAGCTCTTCCGATCT -b AGATCGGAAGAGCTCGTATGCCGTC 5.1_trimmed.fastq > 5.1_adaptortrim.fastq

Any more suggestions and many thanks!!

**simonandrews** · 08-23-2011, 07:49 AM

So the problem is that you have a sequence in your library which isn't one of the adapters you passed to cutadapt. I can't immediately see where it's come from, but since cutadapt didn't know about it, it didn't remove it, and your trimmed library is still biased. I'd suspect that if you looked at the size distribution of your two libraries after trimming you'd see that one has been trimmed significantly more than the other.
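
One way to compare the size distributions is a few lines of Python. This is only a sketch: it assumes plain, uncompressed FASTQ with exactly four lines per record, and the file names are passed on the command line (for example 5.1_adaptortrim.fastq and whatever the trimmed 5_2 file was called). The function names are invented for this example.

```python
import sys
from collections import Counter


def read_lengths(fastq_path):
    """Count read lengths in an uncompressed, 4-line-per-record FASTQ file."""
    lengths = Counter()
    with open(fastq_path) as handle:
        for line_number, line in enumerate(handle):
            if line_number % 4 == 1:  # second line of each record is the sequence
                lengths[len(line.strip())] += 1
    return lengths


def summarise(lengths):
    """Return (number of reads, mean read length) for a length Counter."""
    total = sum(lengths.values())
    mean = sum(size * n for size, n in lengths.items()) / total if total else 0.0
    return total, mean


if __name__ == "__main__":
    # Usage: python read_lengths.py trimmed_1.fastq trimmed_2.fastq
    for path in sys.argv[1:]:
        counts = read_lengths(path)
        total, mean = summarise(counts)
        print(f"{path}: {total} reads, mean length {mean:.1f}")
        for size in sorted(counts):
            print(f"  {size}\t{counts[size]}")
```

If one file shows many more short reads than the other, that library lost more sequence to trimming.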

You need to figure out as much of this mystery sequence as you can (either by finding the sequence in one of your primers or by looking at some of your sequences and seeing where the common sequence at the end stops). You can then pass this as an extra sequence to cutadapt which can remove it from your library.
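
The "look at some of your sequences" route can be roughly automated. The sketch below starts from the GAGCGGTCA seed read off the Kmer plot, collects whatever follows it in each read, and extends the seed one base at a time for as long as a single base clearly dominates. The function names, the 80% agreement threshold and the minimum of 10 supporting reads are arbitrary assumptions for illustration, and it again assumes uncompressed 4-line FASTQ.

```python
import sys
from collections import Counter


def fastq_sequences(fastq_path):
    """Yield the sequence line of each record in a 4-line-per-record FASTQ file."""
    with open(fastq_path) as handle:
        for line_number, line in enumerate(handle):
            if line_number % 4 == 1:
                yield line.strip()


def extend_seed(reads, seed, min_fraction=0.8, min_reads=10):
    """Extend `seed` to the right while the reads containing it agree on the next base."""
    tails = []
    for read in reads:
        pos = read.find(seed)
        if pos != -1:
            tails.append(read[pos + len(seed):])
    consensus = seed
    offset = 0
    while True:
        # Bases seen at this offset after the seed, over reads long enough to reach it.
        column = Counter(tail[offset] for tail in tails if len(tail) > offset)
        support = sum(column.values())
        if support < min_reads:
            break
        base, count = column.most_common(1)[0]
        if count / support < min_fraction:
            break
        consensus += base
        offset += 1
    return consensus


if __name__ == "__main__":
    # Usage: python extend_seed.py 5.1_adaptortrim.fastq [SEED]
    seed = sys.argv[2] if len(sys.argv) > 2 else "GAGCGGTCA"
    print(extend_seed(fastq_sequences(sys.argv[1]), seed))
```

The printed sequence is only a candidate. It is still worth checking it against the primers and vectors used in library construction before passing it to cutadapt as an extra adapter.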

**GenoMax** · 08-23-2011, 08:43 AM

byou678: Have you tried aligning the reads to your reference? Since this is an RNA-seq sample are we seeing just the standard "random primer effect" at the beginning of the read?

**byou678** · 08-23-2011, 09:42 AM

Yes, I aligned them using BWA. Could you explain your second question in more detail? Thanks for your reply.

Need help for FastQC results. Thanks!!