SEQanswers

Old 03-20-2016, 01:12 AM   #1
Alphabets
Junior Member
 
Location: SZ

Join Date: Mar 2016
Posts: 7
lobSTR dealing with the paired-end BAM file

Hello:

I am using lobSTR to process a paired-end BAM file. According to the documentation, I should run the following command on the BAM file before it is sorted by read name.

'''
lobSTR \
--index-prefix hg19_v3.0.2/lobstr_v3.0.2_hg19_ref/lobSTR_ \
-f my_sample.bam --bampair \
--rg-sample my_sample.sorted.bam --rg-lib my_sample \
--out my_sample_output
'''

My question is: which BAM file is correct in the third and fourth lines of the command above, the one before sorting or the one after sorting?
The documentation uses the unsorted BAM file, but I think the third line should take the sorted file rather than the raw one.
Also, I think the fourth line should take a tag, not the sorted file.

Does anyone have other ideas, or can anyone clear up my confusion? Thanks!
Old 03-20-2016, 05:42 AM   #2
mastal
Senior Member
 
Location: uk

Join Date: Mar 2009
Posts: 667

The --rg-sample and --rg-lib parameters should be tags with read-group information, not the bam files.

See the lobSTR FAQs for more info.

http://melissagymrek.com/lobstr-code/faq.html

I think if you start with a bam file rather than fastq files then it should be sorted and indexed. Note that the command line options description for --bampair says that the file should be sorted by name order, 'samtools sort -n'.
Old 03-20-2016, 05:06 PM   #3
Alphabets
Junior Member
 
Location: SZ

Join Date: Mar 2016
Posts: 7

I know what you mean: the --rg-sample and --rg-lib parameters should not be BAM files. But I am confused about the third line, "-f my_sample.bam --bampair". I think the -f parameter should be the sorted BAM, not the raw BAM. The documentation shows the raw BAM there rather than the sorted one, which confuses me.

The documentation is here:
http://melissagymrek.com/lobstr-code...html#pairedbam
Old 03-21-2016, 03:18 AM   #4
mastal
Senior Member
 
Location: uk

Join Date: Mar 2009
Posts: 667

In the end, it shouldn't matter, because anything that is expecting an unsorted bam file will work perfectly well if you give it a sorted bam file (but of course the reverse is not true, and if a program is expecting a sorted bam file it will throw an error if you give it an unsorted bam file).

I agree with you that the lobSTR documentation is confusing and not consistent. I had looked at the 'usage' page, which seems to give different examples than the genotype-calling page you gave the link to. Also in the usage page's definition of parameters, for --bampair it says that you have to give a bam file sorted by name (samtools sort -n), which it doesn't mention anywhere else where it shows examples of sorting the bam files with samtools sort.
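
One way to see how a given BAM file is currently sorted is to look at the SO field on the @HD header line, which the SAM specification defines as unknown, unsorted, queryname or coordinate. A minimal sketch, assuming samtools is installed; `sort_order` is a made-up helper name, and the header only records what the tool that wrote the file declared:

```shell
# Print the sort order recorded in a SAM header read from stdin.
# sort_order is a hypothetical helper, not part of samtools or lobSTR.
sort_order() {
    awk -F'\t' '
        /^@HD/ {
            # Scan the tab-separated tags on the @HD line for SO:<value>.
            for (i = 2; i <= NF; i++)
                if ($i ~ /^SO:/) { print substr($i, 4); found = 1 }
        }
        END { if (!found) print "unknown" }   # no @HD line or no SO tag
    '
}

# Usage (assumes samtools is on PATH and my_sample.bam exists):
# samtools view -H my_sample.bam | sort_order
```

A value of queryname would match what --bampair is described as expecting; coordinate would match the later sorting examples.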
Old 03-21-2016, 03:42 AM   #5
mastal
Senior Member
 
Location: uk

Join Date: Mar 2009
Posts: 667

I think maybe I understand a bit better now.

I think if you are running lobSTR with a paired-end bam file, you need to give it a bam file that has been name-sorted (samtools sort -n) for the -f parameter, because it needs the two reads of a pair to be next to each other in the file.
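
As a rough sketch of that order of steps, assuming a recent samtools (1.3 or later, where `sort` accepts `-o`) and reusing the lobSTR options from the first post. The `.namesorted.bam` output name, the `run` wrapper, the `DRY_RUN` variable and the `name_sort_and_align` function are all made up for illustration; by default the script only prints the commands rather than running them.

```shell
#!/bin/sh
# Sketch only: name-sort a paired-end BAM, then pass it to lobSTR.
index_prefix="hg19_v3.0.2/lobstr_v3.0.2_hg19_ref/lobSTR_"

# Print each command by default; set DRY_RUN=0 to actually execute it.
run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

name_sort_and_align() {
    in_bam=$1
    sorted_bam="${in_bam%.bam}.namesorted.bam"   # hypothetical output name

    # 1. Sort by read name so the two mates of a pair are adjacent.
    run samtools sort -n -o "$sorted_bam" "$in_bam"

    # 2. Give lobSTR the name-sorted file. --rg-sample and --rg-lib are
    #    read-group labels, not file names.
    run lobSTR \
        --index-prefix "$index_prefix" \
        -f "$sorted_bam" --bampair \
        --rg-sample my_sample --rg-lib my_sample \
        --out my_sample_output
}

name_sort_and_align my_sample.bam
```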

Later steps, like running allelotype, may need bam files sorted by coordinate, as shown in some of the examples where they sort the bam files that are output from running lobSTR.

Hope this makes more sense.
Old 03-21-2016, 04:56 AM   #6
Alphabets
Junior Member
 
Location: SZ

Join Date: Mar 2016
Posts: 7

Thank you for your reply. I agree with you, but I have run into a problem.

When I run lobSTR with my name-sorted paired-end BAM file, as you suggested, I get the following warnings:

'''
WARNING: Could not find pair for BRISCOE:4:... Is the bam file sorted by read name?
'''

My whole screen fills up with these warnings. Is that normal?

Also, when I run allelotype on the BAM file that lobSTR generates, I get these warnings in my output:

'''
WARNING: Skipping locus chr1:123585375. Invalid period size (20)
WARNING: Discarding duplicate of locus chr1: 123587826
'''

I don't know what these mean or what I should do.

Thank you.

Last edited by Alphabets; 03-21-2016 at 05:03 AM.
Old 03-21-2016, 06:37 AM   #7
mastal
Senior Member
 
Location: uk

Join Date: Mar 2009
Posts: 667

Where do the bam files you are trying to use with lobSTR come from?

Are they unaligned bams, or are they the result of alignment with another aligner?

Have any reads been removed from the data set? For example, reads that didn't align, leaving some reads without a mate?
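
If reads were dropped after alignment, some read names will appear only once in the file, which would be consistent with the "Could not find pair" warning. A minimal sketch for counting such names, assuming samtools is installed; `count_unpaired` is a made-up helper name, and it skips secondary (0x100) and supplementary (0x800) records so that only primary records are counted:

```shell
# Read SAM records from stdin and print how many read names do not
# have exactly two primary records. count_unpaired is hypothetical.
count_unpaired() {
    awk -F'\t' '
        /^@/ { next }                             # skip header lines
        {
            flag = $2
            if (int(flag / 256) % 2 == 1) next    # secondary alignment
            if (int(flag / 2048) % 2 == 1) next   # supplementary alignment
            seen[$1]++
        }
        END {
            n = 0
            for (name in seen) if (seen[name] != 2) n++
            print n
        }
    '
}

# Usage (assumes samtools is on PATH and my_sample.bam exists):
# samtools view my_sample.bam | count_unpaired
```

This keeps every read name in memory, so it may be worth trying on a subset of the file first. A result of 0 would suggest the mates are all present and the problem lies elsewhere, for example in the sort order.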
Old 03-21-2016, 04:53 PM   #8
Alphabets
Junior Member
 
Location: SZ

Join Date: Mar 2016
Posts: 7

The BAM files I use are alignments of HiSeq reads to the reference genome; redundant (duplicate) reads were removed and base realignment was done.

Does that matter? What should I do?

Thank you!
Tags
lobstr
