SEQanswers

Go Back   SEQanswers > Bioinformatics > Bioinformatics



Similar Threads
Thread Thread Starter Forum Replies Last Post
fastq to bam hugh_hang Bioinformatics 10 05-22-2015 12:51 AM
Convert merged BAM back to per lane BAM or FASTQ file danielsbrewer Bioinformatics 6 10-03-2013 07:29 AM
FastQ/BAM compression tir_al General 16 05-15-2013 12:38 AM
Reverse engineering BAM files: BAM -> FASTQ gene coder Bioinformatics 3 01-03-2012 02:42 PM
Stand Alone Bam to FASTQ dcfargo Bioinformatics 11 08-02-2010 05:56 AM

Reply
 
Thread Tools
Old 10-21-2015, 09:19 AM   #1
elitatoun
Junior Member
 
Location: france

Join Date: Oct 2015
Posts: 7
Default some help ! BAM to Fastq

Dear all

I'm new on your forum and on the NGS bio informatic analysis !

Sorry for my english !

I need some help from specialists...i try to analyse NGS results for small non coding RNA...

until now i have made :
checked the qulity of the reads
removed the adaptators...
launched my first map (bowtie). And i have found that the half of my reads don't mapp my genom of reference...

and i try to understand why !

so i try to keep all the unmapped reads, to see what they are :
i have used bwa to keep my unmapped reads and i have obtained a SAI file.
I have converted the SAI file into BAM file

Now i will try to creat a new BAM file with only unmapped reads with this Command :

samtools view fichier bam.bam | awk '$3 =="*"' | samtools view -bS - > no_maps.bam

and after i will convert this BAM file into Fastq file with :


java -jar -Xmx6g ~/bin/SamToFastq.jar INPUT=no_maps.bam F=no_map.fastq VALIDATION_STRINGENCY=LENIENT

A very important think :
i have found these 2 commands, but i don't understand everything...that why i need your help !

If someone can explain me, it will be great

Thx
elitatoun is offline   Reply With Quote
Old 10-21-2015, 09:56 AM   #2
GenoMax
Senior Member
 
Location: East Coast USA

Join Date: Feb 2008
Posts: 7,046
Default

I am moving this post to a new thread for better visibility.

@elitatoun: If you were only interested in capturing reads that did not map with bowtie you could have done that using the "--un filename" option. This would have captured all reads that did not align to a new file.

Are you trying to find what these reads are? You could take a few of them (you will need to convert them to fasta format) and blast them at NCBI to get a general idea of what they are.
GenoMax is offline   Reply With Quote
Old 10-21-2015, 10:09 AM   #3
elitatoun
Junior Member
 
Location: france

Join Date: Oct 2015
Posts: 7
Default

@GenoMax
thank you for your reply ! i will try also your solution thx
and yes i'm trying to find what these reads are....
yes i will blast (a small part of them) at NCBI...but i have the half of all my read (several millons) so i will verify if they mapp on a "transcriptom sequences" just to see if i have a huge part of degraded mRNA in my reads
elitatoun is offline   Reply With Quote
Old 10-21-2015, 10:09 AM   #4
westerman
Rick Westerman
 
Location: Purdue University, Indiana, USA

Join Date: Jun 2008
Posts: 1,104
Default

Quote:
Originally Posted by elitatoun View Post

Code:
samtools view fichier bam.bam | awk '$3 =="*"' | samtools view -bS - > no_maps.bam
Hard to explain the above because the 'samtools view fichier' part does not make any sense. Getting rid of the 'fichier' part then means:

1) You are viewing the file 'bam.bam' and converting it from binary BAM format to human readable SAM format.
2) From the SAM choose all lines with the 3rd field equal to "*" (which means no mapping).
3) Convert the SAM file back to BAM and save it.

However a more simple method would be:
Code:
samtools view -b -f 4 > no_maps.bam
'-b' means output in BAM format. '-f 4' means to select all reads marked as 'the query sequence itself is unmapped'.


Quote:
and after i will convert this BAM file into Fastq file with :

Code:
java -jar -Xmx6g ~/bin/SamToFastq.jar INPUT=no_maps.bam F=no_map.fastq VALIDATION_STRINGENCY=LENIENT
Well I usually put the '-jar' command next the the jar file itself. Basically to me the '-jar' and '-Xmx6g' parameters are switched around in my mind but I do not think it makes a difference. Basically you are running a Java program that will do the conversion for you. That is just the way Java programs are run.

It looks like you are running a Picard tools program in which case the more modern way of doing this is:

Code:
 PicardCommandLine SamToFastq INPUT=no_maps.bam F=no_map.fastq VALIDATION_STRINGENCY=LENIENT
[/QUOTE]
westerman is offline   Reply With Quote
Old 10-21-2015, 10:16 AM   #5
westerman
Rick Westerman
 
Location: Purdue University, Indiana, USA

Join Date: Jun 2008
Posts: 1,104
Default

@GenoMax's solution (just capture the unmapped reads first) is good. Redoing the mapping might take some time and you won't learn about samtools nor picard that way but it is what I would have done as the first step. Of course I would have suggested using BBMap instead of bowtie. Am surprised GenoMax didn't suggest that. :-)
westerman is offline   Reply With Quote
Old 10-21-2015, 10:29 AM   #6
elitatoun
Junior Member
 
Location: france

Join Date: Oct 2015
Posts: 7
Default

@ westerman
thank you also for your reply !!!

i think i'm folling in love of this forum !!!

it's clear know for me

i will come back with the results i hope tomorow....
and i will try also your solution...

Why BBmap instead bowtie ?
elitatoun is offline   Reply With Quote
Old 10-21-2015, 10:31 AM   #7
GenoMax
Senior Member
 
Location: East Coast USA

Join Date: Feb 2008
Posts: 7,046
Default

Quote:
Originally Posted by westerman View Post
Of course I would have suggested using BBMap instead of bowtie. Am surprised GenoMax didn't suggest that. :-)
Just going with the flow of the original question

"fichier" is "file" so I assume that is just from some instructions @elitatoun was using.

@Elitatoun: This is small non-coding RNA data so perhaps depending on how the prep was done you may see an impact on the % alignment you are going to get.
GenoMax is offline   Reply With Quote
Old 10-21-2015, 12:03 PM   #8
elitatoun
Junior Member
 
Location: france

Join Date: Oct 2015
Posts: 7
Default

ok thank you Genomax
elitatoun is offline   Reply With Quote
Old 10-25-2015, 11:06 AM   #9
elitatoun
Junior Member
 
Location: france

Join Date: Oct 2015
Posts: 7
Default

Dear all,
thank you for your advices !
I have finished all my file treatments...and by your advice all it works...

i try to know if the unmapped reads come from degradation of long RNA and particularly mRNA....
so i have mapped the unmapped reads to reference sequence of transcriptome...but i'm not sur about this file (the transcriptome file)...because i don't know if inside the term of "transcriptome" we are talking about only mRNA or all RNA...
do you know a file with only sequences from mRNA ??
elitatoun is offline   Reply With Quote
Old 10-25-2015, 01:09 PM   #10
Brian Bushnell
Super Moderator
 
Location: Walnut Creek, CA

Join Date: Jan 2014
Posts: 2,707
Default

If you are unsure whether your transcriptome contains the transcripts of interest, I suggest you align to the genome instead (using a splice-aware aligner). Of course, you could just try mapping the unmapped reads to the genome; if they don't map, your transcriptome should be OK.

Unlike a genome, the contents of a transcriptome are not well defined; the accuracy and completeness varies a lot. I think they generally do not contain uRNAs, for example.
Brian Bushnell is offline   Reply With Quote
Old 10-26-2015, 03:02 AM   #11
GenoMax
Senior Member
 
Location: East Coast USA

Join Date: Feb 2008
Posts: 7,046
Default

Quote:
Originally Posted by elitatoun View Post
i try to know if the unmapped reads come from degradation of long RNA and particularly mRNA....
so i have mapped the unmapped reads to reference sequence of transcriptome...but i'm not sur about this file (the transcriptome file)...because i don't know if inside the term of "transcriptome" we are talking about only mRNA or all RNA...
do you know a file with only sequences from mRNA ??
What genome are you working on? Reason I am asking is to see if there is a relatively good known transcriptome for it.

Even though you may have had some degradation aligners should still be able to find a match for those reads if they really belong in your genome. Since they did not align the first time around I am suspicious of the strategy of doing a second round alignment with those unmapped reads to the transcriptome. Perhaps you need to try different options for the original aligner to see if you can get a better result.

What was the result of the blast search for those unmapped reads? Are they mapping to the genome you are working with?
GenoMax is offline   Reply With Quote
Old 10-26-2015, 03:16 AM   #12
elitatoun
Junior Member
 
Location: france

Join Date: Oct 2015
Posts: 7
Default

@GenoMax,

thank you fr your reply.

I am working with the baus taurus genome.
I will try this afternoon the blast.
But you're right it could be a problem of "options" with the first mapper

We have used bowtie (inside miRdeep2) the first time ==> 50% of the reads unmapped
I have used also bwa (just to see the proportion of mapped reads on the ref genome outside of a miR environnement) ==> 10% of unmapped reads.

so i will work with the options of bowtie !

I have obtained "normal" results at the end (comparison to pubmed) in terms of miR number or putatives....but only 12% of all my reads correspond to miR...and may be i can obtained better results if i modify my extraction procedure...that why i want to know if i have a lot of small reads in the same nucleotide range of my miR, but belonging to degraded mRNA !

I will keep you in touch !
Eli
elitatoun is offline   Reply With Quote
Old 10-26-2015, 03:36 AM   #13
GenoMax
Senior Member
 
Location: East Coast USA

Join Date: Feb 2008
Posts: 7,046
Default

How many cycles was this data (bp per read)?

It would be reasonable to see if blast gives reasonable full length "hits" across the entire read.

You may also want to look at BBMap (if you were planning to do additional alignments).
GenoMax is offline   Reply With Quote
Reply

Thread Tools

Posting Rules
You may not post new threads
You may not post replies
You may not post attachments
You may not edit your posts

BB code is On
Smilies are On
[IMG] code is On
HTML code is Off




All times are GMT -8. The time now is 02:05 PM.


Powered by vBulletin® Version 3.8.9
Copyright ©2000 - 2020, vBulletin Solutions, Inc.
Single Sign On provided by vBSSO