SEQanswers

03-29-2011, 02:24 AM   #1
hlwright
Member
 
Location: Liverpool, UK

Join Date: Feb 2011
Posts: 30
Newbler runMapping via command line

Hello everyone, I am new to this forum and this is my first post so I hope someone can help me.

I have some 454 transcriptomic data which I am trying to analyse by mapping with Newbler against the human GRCh37.61 cDNA FASTA reference. At the moment I have to run Newbler from the command line, as I do not have enough RAM to launch it via Java. It seems to run quite well this way, so I am not too bothered about that.

I am exploring the command line options to try to improve the number of reads that map fully or partially. I am still getting a large number of reads classified as repeats, and I wondered if anyone had any tips on how to improve the quality of my mapping. (The default settings gave me 11% fully mapped, 6% partially mapped, 32% unmapped, 41% repeat, 6.5% chimeric, 3.5% too short.)

I have tried decreasing the seed length from 16 to 10 and this greatly decreased the number of unmapped reads, but increased the number of repeats (almost 50%). I have also changed the repeat score threshold from default (12) to 0 which has improved it a bit more and has greatly increased the number of contigs generated. I am now playing with the minimum overlap length but am getting more chimeric reads.

I am really just arbitrarily changing these numbers and could sit here from now until Christmas doing this, so I wondered if anyone had any advice or tips they could give me.
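
One way to make this less arbitrary is to decide on a small grid of settings up front and map each combination into its own output directory, so the read-status percentages can be compared side by side. Below is a minimal sketch in Python that only builds the command lines. The flag names `-sl` (seed length), `-rst` (repeat score threshold) and `-o` (output directory) are assumptions written from memory and should be checked against the `runMapping` help output; `-ml` is the minimum overlap length discussed in the reply below. The file names and grid values are placeholders.

```python
from itertools import product


def sweep_commands(reference, reads, seed_lengths, repeat_thresholds, min_overlaps):
    """Return (output_dir, argv) pairs, one per combination of settings."""
    commands = []
    for sl, rst, ml in product(seed_lengths, repeat_thresholds, min_overlaps):
        # One output directory per combination so no run overwrites another.
        out_dir = f"map_sl{sl}_rst{rst}_ml{ml}"
        argv = [
            "runMapping", "-o", out_dir,   # -o: assumed output-directory flag
            "-sl", str(sl),                # assumed flag name for seed length
            "-rst", str(rst),              # assumed flag name for repeat score threshold
            "-ml", str(ml),                # minimum overlap length, in bases
            reference, reads,
        ]
        commands.append((out_dir, argv))
    return commands


if __name__ == "__main__":
    # Placeholder inputs; print the commands for review before running anything.
    for out_dir, argv in sweep_commands(
        "reference.fasta", "reads.sff", [16, 12, 10], [12, 0], [40, 60]
    ):
        print(" ".join(argv))
```

Printing the commands first makes it easy to review the grid; each `argv` list could then be handed to `subprocess.run(argv, check=True)`.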

Before you ask why I am not using the assembler, well I just don't think I have enough reads to get a good assembly. My dataset contains around 50,000 reads per sample. What do you think?

Any advice would be very much appreciated. Thank you in advance.

Helen
03-29-2011, 11:53 PM   #2
flxlex
Moderator
 
Location: Oslo, Norway

Join Date: Nov 2008
Posts: 415

Reads marked 'Repeat' map equally well to multiple locations in the reference. The settings you are trying are not going to change that...

The only thing I can think of is to have more stringent alignment requirements, so that perhaps these reads start mapping uniquely (i.e. reads from different paralogues mapping to just one of the copies). This can be done by:

- increasing the minimum overlap length, -ml. The default is 40 bases, but you can go higher or, even better, use '-ml 90%' to force at least 90% of the length of the read to map (or try 95%).
- increasing the minimum overlap identity, -mi. The default is 90, but you could try '-mi 95' (no % here).

On the other hand, you might get fewer reads mapped this way....
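
As a rough illustration of the two options above, here is a small Python sketch that assembles such a command line. Only `-ml` and `-mi` come from this post; the `-o` output-directory flag, the argument order and the file names are assumptions to be checked against the `runMapping` help output.

```python
import shlex


def strict_mapping_command(reference, reads, out_dir, min_len="90%", min_identity=95):
    """Build a runMapping argument list with stricter overlap requirements."""
    if not 0 < min_identity <= 100:
        raise ValueError("min_identity must be greater than 0 and at most 100")
    return [
        "runMapping", "-o", out_dir,  # -o: assumed output-directory flag
        "-ml", str(min_len),          # bases, or a share of the read length such as "90%"
        "-mi", str(min_identity),     # percent identity, written without a % sign
        reference, reads,
    ]


if __name__ == "__main__":
    # Placeholder file names; print the command for review rather than running it.
    argv = strict_mapping_command("reference.fasta", "reads.sff", "map_strict")
    print(shlex.join(argv))
```

Keeping the identity as a plain number and the length as a string mirrors the difference noted above: `-ml` accepts a percentage, while `-mi` takes no % sign.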

Good luck anyways!
04-05-2011, 12:27 AM   #3
hlwright
Member
 
Location: Liverpool, UK

Join Date: Feb 2011
Posts: 30

Thank you for replying so quickly. I have been exploring many options with Newbler mapping.

Unfortunately, the options you suggested did not improve the number of reads mapped. However, I think I may have worked out the problem. I am using a cDNA FASTA reference as I have transcriptome reads. I have had a look at some of the reads which are 'unmapped', and a quick BLAST of a couple shows that these are ribosomal RNAs (and as such will not be in my cDNA FASTA file).

I wonder if anyone else has noticed this in the past? Do you know of a fasta file containing rRNA sequences that I could concatenate with my cDNA reference to maybe annotate my 'unmapped' reads?
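
Whichever rRNA FASTA file turns out to be suitable, the concatenation step itself can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the file names in the usage note are placeholders, and skipping records whose ID has already been seen is a design choice of this sketch (so the combined reference does not end up with duplicate accessions), not something Newbler requires as far as I know.

```python
def concatenate_fasta(paths, out_path):
    """Append FASTA files into one file, skipping records whose ID was already seen.

    Returns the number of records written.
    """
    seen = set()
    written = 0
    with open(out_path, "w") as out:
        for path in paths:
            keep = False  # ignore any stray lines before the first header of a file
            with open(path) as handle:
                for line in handle:
                    if line.startswith(">"):
                        fields = line[1:].split()
                        seq_id = fields[0] if fields else ""
                        keep = seq_id not in seen
                        if keep:
                            seen.add(seq_id)
                            written += 1
                    if keep:
                        # Ensure a trailing newline so records from different
                        # files never run together.
                        out.write(line if line.endswith("\n") else line + "\n")
    return written
```

For example, `concatenate_fasta(["cdna.fasta", "rrna.fasta"], "cdna_plus_rrna.fasta")` would write a combined reference that can be passed to the mapper in place of the cDNA-only file.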

Thank you
Helen
Tags
cdna, human, mapping, newbler
