
**harrike** · 02-16-2016, 02:44 PM

Does the 2nd column indicate the strand? I checked a few lines of my data. The 2nd-column values (Number) and how often each occurs (Frequency) are listed below.

Frequency Number
847 147
75 163
902 339
94 355
97 403
909 419
71 83
847 99

Does the 419/339 pair stand for the "+" strand and the 147/99 pair for the "-" strand? What about the others?

**GenoMax** · 02-16-2016, 02:47 PM

See the added info in my post above.
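For readers following along: the 2nd column of a SAM record is a bitwise FLAG, so strand is only one of its bits (0x10 for the read itself, 0x20 for its mate). Below is a minimal Python sketch that decodes the values listed above. The helper names `decode_flag` and `read_strand` are illustrative and not part of STAR or samtools; the bit meanings follow the SAM specification.

```python
# SAM FLAG bits as defined in the SAM specification.
FLAG_BITS = {
    0x1: "paired",
    0x2: "proper_pair",
    0x4: "unmapped",
    0x8: "mate_unmapped",
    0x10: "reverse",
    0x20: "mate_reverse",
    0x40: "first_in_pair",
    0x80: "second_in_pair",
    0x100: "secondary",
    0x200: "qc_fail",
    0x400: "duplicate",
    0x800: "supplementary",
}


def decode_flag(flag):
    """Return the names of all bits that are set in a SAM FLAG value."""
    return [name for bit, name in FLAG_BITS.items() if flag & bit]


def read_strand(flag):
    """Return '-' if the read aligned to the reverse strand, else '+'."""
    return "-" if flag & 0x10 else "+"


# Decode the FLAG values that appear in the question.
for flag in (99, 147, 83, 163, 355, 403, 339, 419):
    print(flag, read_strand(flag), ",".join(decode_flag(flag)))
```

Under this decoding, 99/147 and 83/163 are the two proper-pair orientations, and 355/403 and 339/419 are the same orientations with the secondary-alignment bit (256) set. Which orientation corresponds to the transcript's "+" strand depends on the library protocol.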

**harrike** · 02-16-2016, 02:54 PM

Hi Genomax,

Thanks for providing the info. It is quite helpful, and everything is clear to me now.

Rui

**harrike** · 02-19-2016, 07:31 AM

Hi Alex,

This time I am running STAR on another set of data, which is strand-specific, paired-end, and has a read length of 150 bp. The command I used is:

STAR-STAR_2.4.2a/bin/Linux_x86_64/STAR --genomeDir Zmay_AGPv2_STAR_index/ --runThreadN 24 --readFilesIn Zm_TGACCA_L001_R1_001.fastq Zm_TGACCA_L001_R2_001.fastq --outSAMtype BAM Unsorted --outFilterMultimapNmax 20 --alignIntronMax 10000 --alignMatesGapMax 10000 --alignEndsType EndToEnd --outFilterIntronMotifs RemoveNoncanonical --outFilterType BySJout --outFileNamePrefix Zm_TGACCA_L001_R1R2_

According to the Log.final.out file, 56.34% of the reads are unmapped (too short).

What are the possible reasons for this low mapping rate? Thanks,

Rui

**GenoMax** · 02-19-2016, 08:06 AM

I suggest that you start by looking at a few (10-20) unmapped reads and blast them against nt to see what they are aligning to. You may be surprised by what you find and it may provide an explanation for the low % alignment.
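One way to pull out a handful of reads for BLAST is sketched below in Python. It assumes the unmapped reads are already in a plain 4-line-per-record FASTQ file (for example, as written by STAR's `--outReadsUnmapped Fastx` option); the function name `fastq_to_fasta` and the sample reads are made up for illustration.

```python
import io


def fastq_to_fasta(fastq_handle, n_reads=20):
    """Return the first n_reads FASTQ records as a FASTA-formatted string.

    Assumes strict 4-line FASTQ records (header, sequence, '+', quality).
    """
    records = []
    while len(records) < n_reads:
        header = fastq_handle.readline().rstrip("\n")
        if not header:
            break  # end of file
        seq = fastq_handle.readline().rstrip("\n")
        fastq_handle.readline()  # '+' separator line, ignored
        fastq_handle.readline()  # quality line, ignored
        records.append(">" + header[1:] + "\n" + seq)
    return "\n".join(records)


# Tiny in-memory example; with real data, pass an open file handle instead.
example = io.StringIO("@r1\nACGT\n+\nIIII\n@r2\nGGCC\n+\nIIII\n")
print(fastq_to_fasta(example, n_reads=2))
```

The resulting FASTA text can be pasted into a BLAST search against nt.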

**alexdobin** · 02-19-2016, 02:29 PM

Hi Rui,

Here are a few suggestions in addition to @GenoMax's.

1. You are using --alignEndsType EndToEnd, which requires end-to-end alignment for each read (no soft clipping). This might be too harsh for longer reads, which are more likely to have poor-quality tails, adapters at the ends, etc. Please try to map without this option.
2. Map read1 and read2 separately - you may have a problem with one of the reads.
3. Check sequencing quality by plotting quality scores vs position in read (Illumina pipelines typically produce these plots). If sequencing quality drops towards the ends of the reads for a substantial portion of the reads, this would explain poor mappability.

Cheers
Alex
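If no quality plots are at hand, suggestion 3 can be approximated with a short script. The Python sketch below computes the mean quality score at each read position; it assumes 4-line FASTQ records and Phred+33 encoding, and the function name `mean_quality_by_position` is illustrative only.

```python
def mean_quality_by_position(fastq_lines, offset=33):
    """Return the mean Phred score at each read position.

    fastq_lines: iterable of lines from a 4-line-per-record FASTQ file.
    offset: ASCII offset of the quality encoding (33 assumed here).
    """
    totals = []  # sum of scores per position
    counts = []  # number of reads covering each position
    for i, line in enumerate(fastq_lines):
        if i % 4 != 3:
            continue  # only the 4th line of each record holds qualities
        for pos, ch in enumerate(line.rstrip("\n")):
            if pos == len(totals):
                totals.append(0)
                counts.append(0)
            totals[pos] += ord(ch) - offset
            counts[pos] += 1
    return [t / c for t, c in zip(totals, counts)]


# Tiny example with two reads of different lengths.
lines = ["@r1", "ACGT", "+", "IIII", "@r2", "ACG", "+", "5I5"]
print(mean_quality_by_position(lines))
```

A steady drop in the returned values towards the last positions would point to low-quality read tails.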

**harrike** · 02-19-2016, 08:52 PM

Hi Alex,

Thanks for your suggestions.

I manually checked a couple of reads as GenoMax suggested, and found that the major reason for this low mapping rate is that most of the reads contain adapter sequence, due to poor construction of the RNA-seq library. I am going to trim the adapters and do the mapping again. The read quality is good according to FastQC.

I will also try relaxing the --alignEndsType option to see whether the mapping improves.

Rui
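A quick way to gauge how widespread adapter contamination is, before trimming, is to count the reads that contain the start of the adapter. The Python sketch below does only exact substring matching. The default `AGATCGGAAGAGC` is the common prefix of Illumina TruSeq adapters and is an assumption here, since the thread does not say which adapter was used; the function name `adapter_fraction` is also illustrative. Dedicated trimming tools additionally handle mismatches and partial adapters at the read end.

```python
def adapter_fraction(sequences, adapter_prefix="AGATCGGAAGAGC"):
    """Return the fraction of reads containing adapter_prefix exactly.

    adapter_prefix defaults to the Illumina TruSeq prefix, which is an
    assumption; replace it with the adapter used for your library.
    """
    total = 0
    hits = 0
    for seq in sequences:
        total += 1
        if adapter_prefix in seq:
            hits += 1
    return hits / total if total else 0.0


# Two made-up reads, one of which runs into the adapter.
reads = ["ACGTAGATCGGAAGAGCTT", "ACGTACGTACGT"]
print(adapter_fraction(reads))
```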

**SamCurt** · 04-04-2016, 09:08 AM

Just a quick question here. Is the parameters file used with --parametersFile just a list of command-line options, written the same way I would type them in the console?

**alexdobin** · 04-04-2016, 02:25 PM

The file with parameters should have each parameter on a separate line:
<parameterName> <parameterValue(s)>
parameterName should not contain the leading --
For instance,
genomeChrBinNbits 18
genomeSAsparseD 1
readFilesIn Read1 Read2
readFilesCommand -
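Based on the format described above, an existing command line could be converted into a parameters file with a few lines of Python. This is only a sketch: `cli_to_parameters_file` is a hypothetical helper, not something shipped with STAR, and it assumes every option starts with `--` while values never do.

```python
def cli_to_parameters_file(args):
    """Convert a list of command-line tokens into parameters-file text.

    Each '--name value ...' group becomes one line 'name value ...',
    with the leading '--' removed from the parameter name.
    """
    groups = []
    for token in args:
        if token.startswith("--"):
            groups.append([token[2:]])  # start a new parameter line
        elif groups:
            groups[-1].append(token)  # value of the current parameter
        else:
            raise ValueError("value %r appears before any --option" % token)
    return "\n".join(" ".join(parts) for parts in groups) + "\n"


# Reproduces the example lines given in the post above.
args = ["--genomeChrBinNbits", "18", "--genomeSAsparseD", "1",
        "--readFilesIn", "Read1", "Read2", "--readFilesCommand", "-"]
print(cli_to_parameters_file(args), end="")
```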

**SamCurt** · 04-05-2016, 08:42 PM

Thank you for the quick reply, Alex.

I also have another problem here. My new institution only has 2.4.0j on their cluster, and it'd take about a week to get a newer version installed. Do you think it's safe to run the first pass using 2.4.0j, and use its SJ.out.tab files for --sjdbFileChrStartEnd when I get, say, 2.5.1b?

**alexdobin** · 04-06-2016, 06:31 AM

Hi Sam,

This would generally be safe; however, when you publish your method, the reviewers and readers will have a bone to pick with you.

STAR does not really require installation; you can download a pre-compiled executable and run it instead of the one "installed" on your cluster.
I recommend re-generating the genome indexes for 2.5.1b.

Cheers
Alex

**GenoMax** · 04-06-2016, 06:46 AM

Originally posted by alexdobin:

I recommend re-generating the genome indexes for the 2.5.1b.

@Alex: Does that mean indexes generated with older versions won't work, or do you just recommend that they be regenerated?

**alexdobin** · 04-06-2016, 08:30 AM

The new versions of STAR may not work with old genome indexes in rare cases, hence my recommendation to re-generate with 2.5.1, which is very stable.

**SamCurt** · 05-02-2016, 08:55 AM

So, just for gene expression profiling purposes, should I keep the sjDb file set used for second-pass alignment constant?

Complete story: I have a set of ~40 samples that have already completed the entire two-pass alignment, for both gene expression and variation analysis purposes. The sjDb files from the first passes of these samples were used for their second-pass alignments.

Now I have received a further ~15 samples within the same project, for which I'd perform gene expression analysis only. I wonder whether I should do a first pass on these new samples and pool their sjDb files with the old ones for the second pass, or just do a "second pass" with the old sjDb files. My concern is obviously not about time, but rather whether using a different sjDb set would make the gene counts less comparable.
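For reference, "pooling" in the sense used above could look like the Python sketch below, which collects the unique junctions from several first-pass SJ.out.tab tables. It assumes the tables are tab-separated with chromosome, intron start, intron end, and strand in the first four columns; `pool_junctions` is a hypothetical helper. It only illustrates the pooling step and does not settle whether a changed junction set affects the comparability of gene counts.

```python
def pool_junctions(tables):
    """Return unique (chrom, start, end, strand) tuples across tables.

    tables: iterable of line iterables, one per SJ.out.tab file.
    Junctions are kept in order of first appearance.
    """
    seen = set()
    pooled = []
    for lines in tables:
        for line in lines:
            fields = line.rstrip("\n").split("\t")
            if len(fields) < 4:
                continue  # skip blank or malformed lines
            key = tuple(fields[:4])
            if key not in seen:
                seen.add(key)
                pooled.append(key)
    return pooled


# Made-up junction rows standing in for two samples' first-pass output.
old = ["chr1\t100\t200\t1\t1\t0\t5\t0\t30\n"]
new = ["chr1\t100\t200\t1\t1\t0\t9\t1\t40\n",
       "chr2\t10\t50\t2\t2\t0\t3\t0\t25\n"]
for junction in pool_junctions([old, new]):
    print("\t".join(junction))
```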
