
**GenoMax** · 07-03-2016, 04:33 AM

Are you saying that there are actual NNNN reads, or just reads shorter than 51 bp?

If there are N's then that may indicate a failure of basecalling. It could be due to overloading. Generally sequencing facilities will not release this kind of data.

If that is a result of some sort of post-run data processing (where they replaced the adapter sequences with N's for example, don't know if BaseSpace does something like that) then you would need to ask. If you ignore/strip the N's is the rest of the data good quality?

**apredeus** · 07-03-2016, 05:39 AM

There are a bunch of NNNNN reads that are 20 bp long, and there are a bunch of other reads that are not N* but have a variable length. I'll try to align them to see if it will at least look like micro-RNA, but the thing is, you need to clip the adapters, and that's hard to do on a variable-length read.

It does not look like the cell is overloaded from FastQC report though. It looks like there's a small bubble there but that's all.
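To put numbers on "a bunch of NNNNN reads" and "variable length", a read-length histogram plus a count of all-N reads is a quick first check. Below is a minimal sketch, assuming a plain uncompressed 4-line-per-record FASTQ; the function name `summarize_fastq` and the demo records are made up for illustration.

```python
from collections import Counter
import io


def summarize_fastq(handle):
    """Return (length_counts, all_n_reads, total_reads) for a FASTQ stream.

    Assumes exactly four lines per record (no wrapped sequences).
    """
    lengths = Counter()
    all_n = 0
    total = 0
    for i, line in enumerate(handle):
        if i % 4 != 1:  # the sequence is the 2nd line of each record
            continue
        seq = line.strip()
        total += 1
        lengths[len(seq)] += 1
        # A read counts as "all N" when it is non-empty and contains only N
        if seq and set(seq.upper()) == {"N"}:
            all_n += 1
    return lengths, all_n, total


# Made-up demo records: one normal read, one all-N read, one partly masked read
demo = io.StringIO(
    "@r1\nACGTACGTAC\n+\nIIIIIIIIII\n"
    "@r2\nNNNNN\n+\n#####\n"
    "@r3\nACGTNNN\n+\nIIII###\n"
)
lengths, all_n, total = summarize_fastq(demo)
for length in sorted(lengths):
    print(length, lengths[length])
print("all-N reads:", all_n, "of", total)
```

For a real file, pass an open file handle (or `gzip.open(path, "rt")` for a gzipped file) instead of the `io.StringIO` demo.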

It was not a sequencing facility that did it - just a small institute that ran it on their MiSeq. So they totally might have done something wrong there; they don't run this sort of library very often - mostly they sequence strains of viruses.

**Brian Bushnell** · 07-03-2016, 10:10 AM

Reads don't come off the machine with variable length unless you set the Illumina software to trim the adapters during base-calling or demultiplexing or something (not sure exactly when it happens), or they've been postprocessed in some way. You should ask how the data was generated, or better yet, see if you can get the raw fastq data.

**apredeus** · 07-03-2016, 10:13 AM

Those were supposed to be raw fastq. But you are right, I was thinking along the same lines. I'll just come over and get the data from the device myself.

**jdk787** · 07-03-2016, 07:22 PM

I've seen this with short small RNA libraries when using MiSeq Reporter to demux with automatic adapter trimming.

To fix this you can re-demultiplex the run with bcl2fastq, or remove the adapter sequences from your sample sheet and re-demultiplex with MiSeq Reporter. Then just trim the adapters yourself.
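As a rough illustration of what "trim the adapters yourself" means on untrimmed reads, here is a minimal exact-match sketch. It is not a replacement for a real trimmer (dedicated tools also handle mismatches and quality). The adapter shown is the Illumina TruSeq Small RNA 3' adapter, which is only an assumption here since the thread does not say which kit was used; `trim_adapter` and `min_overlap` are hypothetical names.

```python
def trim_adapter(seq, qual, adapter, min_overlap=8):
    """Clip a 3' adapter (full, or a partial prefix at the read end).

    Exact matching only; returns the (sequence, quality) pair cut at the
    first position where the rest of the read matches the adapter start.
    """
    for start in range(len(seq) - min_overlap + 1):
        overlap = min(len(adapter), len(seq) - start)
        if seq[start:start + overlap] == adapter[:overlap]:
            return seq[:start], qual[:start]
    return seq, qual  # no adapter found, leave the read untouched


# Assumption: TruSeq Small RNA 3' adapter; substitute your kit's adapter
ADAPTER = "TGGAATTCTCGGGTGCCAAGG"
insert = "TGAGGTAGTAGGTTGTATAGTT"  # a 22-nt example insert

# 51-bp read: insert, then the full adapter, then read-through bases
read = insert + ADAPTER + "AACTCCAG"
trimmed_seq, trimmed_qual = trim_adapter(read, "I" * len(read), ADAPTER)
print(len(read), "->", len(trimmed_seq), trimmed_seq)

# Shorter read that ends partway through the adapter
partial = insert + ADAPTER[:10]
partial_seq, _ = trim_adapter(partial, "I" * len(partial), ADAPTER)
```

Trimming this way keeps the short inserts instead of letting the demultiplexer mask them.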

**GenoMax** · 07-04-2016, 03:45 AM

Originally posted by apredeus

Those were supposed to be raw fastq. But you are right, I was thinking along the same lines. I'll just come over and get the data from the device myself.

If you can't get the raw data or can't get the facility to re-run the analysis then just trim the N's off. One can safely assume that Illumina would know how to identify their own adapter sequences. It sounds like they are masked by the default demux process.

@Brian: What is an easy way to trim those N's using BBMap? I should add this to my BBMap tricks thread.

**jdk787** · 07-04-2016, 08:47 AM

Originally posted by GenoMax

If you can't get the raw data or can't get the facility to re-run the analysis then just trim the N's off. One can safely assume that Illumina would know how to identify their own adapter sequences. It sounds like they are masked by the default demux process.

I couldn't find this info for MiSeq Reporter, but did see this in the bcl2fastq guide:

--mask-short-adapter-reads arg (=22) smallest number of remaining bases (after masking bases below the minimum trimmed read length) below which whole read is masked

So it looks like it is possible that the adapters are being correctly identified, but the read remaining after trimming is shorter than 22 bp and may then be masked with NNNN.

Since this is micro RNA, I think it is worth trying to redemux without adapter trimming or changing this variable in order to unmask these reads instead of removing them. Doing this has worked for me when sequencing Small RNA libraries on the MiSeq.

**jdk787** · 07-04-2016, 09:24 AM

From MiSeq Reporter User Guide

Masking Short Reads

MiSeq Reporter includes a setting that prevents reads that have been almost entirely trimmed or masked from confounding downstream analysis, which is based on the following criteria:

- If the adapter is encountered within the first 32 bases of the read, the adapter sequence is N-masked.
- If the adapter is identified in the first 32 bases and the read includes ten or more bases from the start of the adapter, the entire read is N-masked. This ten-base limit is controlled by the configuration setting NMaskShortAdapterReads.
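To see why these two rules hit small RNA libraries so hard, here is a toy model that applies them literally. It is only a sketch of the quoted text, not MiSeq Reporter's actual matching logic: it uses exact matching, keeps the read length unchanged (so it does not reproduce the shortened reads seen in this thread), and the adapter sequence, `find_adapter`, and `n_mask` are assumptions made for illustration.

```python
def find_adapter(read, adapter, min_overlap=5):
    """Return the first position where the rest of the read matches the
    start of the adapter exactly, or -1 if there is no such position."""
    for start in range(len(read) - min_overlap + 1):
        overlap = min(len(adapter), len(read) - start)
        if read[start:start + overlap] == adapter[:overlap]:
            return start
    return -1


def n_mask(read, adapter, window=32, short_limit=10):
    """Toy version of the two quoted rules (literal reading, hypothetical)."""
    start = find_adapter(read, adapter)
    if start < 0 or start >= window:
        # The quoted rules only cover adapters found in the first 32 bases
        return read
    adapter_bases = min(len(adapter), len(read) - start)
    if adapter_bases >= short_limit:
        # Rule 2: ten or more adapter bases present, mask the whole read
        return "N" * len(read)
    # Rule 1: mask only the adapter bases
    end = start + adapter_bases
    return read[:start] + "N" * adapter_bases + read[end:]


ADAPTER = "TGGAATTCTCGGGTGCCAAGG"  # assumed small RNA 3' adapter
insert = "TGAGGTAGTAGGTTGTATAGTT"  # a 22-nt example insert

mirna_read = insert + ADAPTER + "AACTCCAG"  # 51 bp, adapter starts at base 22
short_read = insert + ADAPTER[:6]           # only 6 adapter bases present
long_read = "ACGT" * 10 + ADAPTER[:11]      # adapter starts at base 40

print(n_mask(mirna_read, ADAPTER))
print(n_mask(short_read, ADAPTER))
print(n_mask(long_read, ADAPTER))
```

Under this reading, a typical ~22-nt miRNA insert in a 51-bp read always has the adapter inside the first 32 bases with well over ten adapter bases present, so the whole read becomes N.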

**Brian Bushnell** · 07-04-2016, 11:26 AM

Originally posted by GenoMax

One can safely assume that Illumina would know how to identify their own adapter sequences.

I'd like to think so...

What is an easy way to trim those N's using BBMap? I should add this to my BBMap tricks thread.

You can use BBDuk or Reformat with "qtrim=rl trimq=1". That will only trim trailing and leading bases with a Q-score below 1, which means Q0, which means N (in either fasta or fastq format). The BBMap package automatically changes the Q-scores of Ns that are above 0 to 0, and of called bases with Q-scores below 2 to 2, since occasionally some Illumina software versions produce odd things like a handful of Q0 called bases or Ns with Q>0, neither of which makes any sense on the Phred scale.
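With BBDuk that would be something like `bbduk.sh in=reads.fq out=trimmed.fq qtrim=rl trimq=1` (the file names are placeholders). For anyone who wants to see what the result looks like on a single read, here is a small Python sketch of the same effect, stripping leading and trailing N's. It is not BBDuk's actual trimming algorithm, and `trim_n_ends` is a made-up name.

```python
def trim_n_ends(seq, qual):
    """Strip leading and trailing N bases, keeping quality in sync.

    Internal N's are left alone; an all-N read becomes empty.
    """
    left = 0
    right = len(seq)
    while left < right and seq[left] in "Nn":
        left += 1
    while right > left and seq[right - 1] in "Nn":
        right -= 1
    return seq[left:right], qual[left:right]


print(trim_n_ends("NNACGTNNN", "##IIII###"))  # N's at both ends
print(trim_n_ends("NNNN", "####"))            # fully masked read
print(trim_n_ends("ACNGT", "II#II"))          # internal N is kept
```

Fully masked reads come out empty, so they still need to be dropped by a minimum-length filter afterwards.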

@jdk787, thanks for posting the specific details of what's going on. Looks like defaults that make sense in many cases but not for small RNAs.

**agent_pilin** · 04-10-2019, 07:05 AM

Originally posted by apredeus

Hello all

I'm processing a micro-RNA-seq experiment for a collaborator of ours, and see a very unusual thing. They have sequenced three samples using MiSeq, with an expected read length of 51. However, instead I see lots of reads that are NNNNNNNNNNNNNN of length 20-21, and quite a few of intermediate length too.

This is very unusual - do you have any idea about why it might have happened?

Hello Alexander, have you found the reason for this problem?
I have the same problem with my latest sequencing data: read 1 is supposed to be 41 bp long, but the real length varies from 35 bp to 41 bp and some of the reads are poly-N!

**GenoMax** · 04-10-2019, 08:20 AM

Are your sequences adapter masked or are there genuine N's (no calls)?

**agent_pilin** · 04-10-2019, 08:28 AM

Originally posted by GenoMax

Are your sequences adapter masked or are there genuine N's (no calls)?

I think these are masked adapter sequences, but I did not perform the sequencing experiment myself; I only process the raw fastq data.

**apredeus** · 04-10-2019, 08:29 AM

Originally posted by agent_pilin

Hello Alexander, have you found the reason for this problem?
I have the same problem with my latest sequencing data: read 1 is supposed to be 41 bp long, but the real length varies from 35 bp to 41 bp and some of the reads are poly-N!

Hello,

I don't quite remember, since it was a long time ago, but I'm pretty sure this happened because the Illumina software was confused by the combination of adapter and short read sequence. So you would need to get the untrimmed sequences. If these are not available, get the BCL files and convert them to fastq yourself.

**agent_pilin** · 04-10-2019, 08:39 AM

Originally posted by apredeus

Hello,

I don't quite remember, since it was a long time ago, but I'm pretty sure this happened because the Illumina software was confused by the combination of adapter and short read sequence. So you would need to get the untrimmed sequences. If these are not available, get the BCL files and convert them to fastq yourself.

Thank you for your answer, it's a good idea!
Stanislav


MiSeq producing various length reads
