SEQanswers

Go Back   SEQanswers > Bioinformatics > Bioinformatics



Similar Threads
Thread Thread Starter Forum Replies Last Post
Ideas to filter known SNPs dawe Bioinformatics 9 08-02-2012 07:23 AM
ncRNA annotation pipeline ideas Francisc Bioinformatics 0 03-22-2012 05:00 AM
Need ideas about the advantages of Blast2Go byou678 Bioinformatics 5 08-27-2011 10:47 AM
no enrichment of beads --> any ideas why? lichtfaengerin 454 Pyrosequencing 1 08-11-2011 02:45 AM

Reply
 
Thread Tools
Old 07-03-2012, 06:49 PM   #1
vinay427
Member
 
Location: USA

Join Date: Jul 2012
Posts: 10
Question Useful bioinformatics tool ideas?

Hi,

I am working at a lab for the next several weeks, and as I have strong experience in the computer side of things, I wanted to develop a tool that would be useful. My initial idea is to compile various tools such as Trimmomatic, tophat, bowtie2, cufflinks, samtools, prinseq-lite, and prinseq-graph to make a sort of "one-stop shop" script for those assembling and visually representing a paired end FastQ file. I had issues learning about how these tools fit together, and figured that this script might help do so. I've attached it below, but it's very dysfunctional as it's not nearly finished. However, it gives you an idea of what it does. I'm not quite sure if this niche is already covered by another (GPL) tool? Do any of you have an idea for something that is greatly needed and would hopefully prove very helpful in this community?

Code:
#!/bin/bash

############################
# TITLE
clear
echo "FastQ Paired/Single End Sequence Analyzer"
echo "Licensed under GPL"
echo

# GET MAIN INFORMATION
echo -n "Enter path of first FastQ file (without extension): "
read -e FASTQ1
echo -n "Enter path of second FastQ file (without extension): "
read -e FASTQ2
echo
../$FASTQ1=FASTQ_DIR
echo -n "Enter path of desired paired read 1 file: "
read -e P_R1
echo -n "Enter path of desired UNpaired read 1 file: "
read -e U_R1
echo -n "Enter path of desired paired read 2 file: "
read -e P_R2
echo -n "Enter path of desired UNpaired read 2 file: "
read -e U_R2
echo -n "Enter path of desired trim log: "
read -e TRIM_LOG
echo -n "Enter number of threads to be used: "
read -e THREADS

# QUALITY MENU
PS3='Pick a quality scale: '
phred_options=("phred33" "phred64")
select PHRED_CHOICE in "${phred_options[@]}"
do
    case $PHRED_CHOICE in
    	"phred33")
			PHRED=phred33
			break
			;;
		"phred64")
			PHRED=phred64
			break
			;;
	esac
done

#################################
# MAIN ACTION MENU FROM TRIMMOMATIC
PS3='Pick a step: '
select TRIM_CHOICE in ILLUMINACLIP SLIDINGWINDOW LEADING TRAILING CROP HEADCROP MINLENGTH TOPHRED33 TOPHRED64 SKIP/CONTINUE
do
	case "$TRIM_CHOICE" in
		"ILLUMINACLIP")
			echo "Starting"
			echo -n "If desired, enter manual flags to be used: "
			read -e TRIM_FLAGS
			java -classpath trimmomatic.jar org.usadellab.trimmomatic.TrimmomaticPE $TRIM_FLAGS -threads $THREADS $PHRED -trimlog $TRIM_LOG $FASTQ_DIR/$FASTQ1.fastq $FASTQ_DIR/$FASTQ2.fastq $P_R1 $U_R1 $P_R2 $U_R2 ILLUMINACLIP
			;;
		"SLIDINGWINDOW")
			echo -n "Specify window size (number of bases to average across): "
			read -e SLIDINGWINDOW_WINDOW
			echo -n "Specify required quality: "
			read -e SLIDINGWINDOW_QUALITY
			echo -n "If desired, enter manual flags to be used: "
			read -e TRIM_FLAGS
			echo "Starting..."
			java -classpath trimmomatic.jar org.usadellab.trimmomatic.TrimmomaticPE $TRIM_FLAGS -threads $THREADS -$PHRED -trimlog $TRIM_LOG $FASTQ_DIR/$FASTQ1.fastq $FASTQ_DIR/$FASTQ2.fastq $P_R1 $U_R1 $P_R2 $U_R2 SLIDINGWINDOW:$SLIDINGWINDOW_WINDOW:$SLIDINGWINDOW_QUALITY
			;;
		"LEADING")
			echo -n "Specify min quality: "
			read -e LEADING_VALUE
			echo -n "If desired, enter manual flags to be used: "
			read -e TRIM_FLAGS
			echo "Starting..."
			java -classpath trimmomatic.jar org.usadellab.trimmomatic.TrimmomaticPE $TRIM_FLAGS -threads $THREADS -$PHRED -trimlog $TRIM_LOG $FASTQ_DIR/$FASTQ1.fastq $FASTQ_DIR/$FASTQ2.fastq $P_R1 $U_R1 $P_R2 $U_R2 LEADING:$LEADING_VALUE
			;;
		"TRAILING")
			echo -n "Specify min quality"
			read -e TRAILING_VALUE
			echo -n "If desired, enter manual flags to be used: "
			read -e TRIM_FLAGS
			echo "Starting"
			java -classpath trimmomatic.jar org.usadellab.trimmomatic.TrimmomaticPE $TRIM_FLAGS -threads $THREADS -$PHRED -trimlog $TRIM_LOG $FASTQ_DIR/$FASTQ1.fastq $FASTQ_DIR/$FASTQ2.fastq $P_R1 $U_R1 $P_R2 $U_R2 TRAILING:$TRAILING_VALUE
			;;
		"CROP")
			echo -n "Specify number of bases to keep (from start of read): "
			read -e CROP_VALUE
			echo -n "If desired, enter manual flags to be used: "
			read -e TRIM_FLAGS
			echo "Starting..."
			java -classpath trimmomatic.jar org.usadellab.trimmomatic.TrimmomaticPE $TRIM_FLAGS -threads $THREADS -$PHRED -trimlog $TRIM_LOG $FASTQ_DIR/$FASTQ1.fastq $FASTQ_DIR/$FASTQ2.fastq $P_R1 $U_R1 $P_R2 $U_R2 CROP:$CROP_VALUE
			;;
		"HEADCROP")
			echo -n "Specify number of bases to remove (from start of read): "
			read -e HEADCROP_VALUE
			echo -n "If desired, enter manual flags to be used: "
			read -e TRIM_FLAGS
			echo "Starting..."
			java -classpath trimmomatic.jar org.usadellab.trimmomatic.TrimmomaticPE $TRIM_FLAGS -threads $THREADS -$PHRED -trimlog $TRIM_LOG $FASTQ_DIR/$FASTQ1.fastq $FASTQ_DIR/$FASTQ2.fastq $P_R1 $U_R1 $P_R2 $U_R2 HEADCROP:$HEADCROP_VALUE
			;;
		"MINLENGTH")
			echo -n "Specify min length of reads to be kept: "
			read -e MINLENGTH_VALUE
			echo -n "If desired, enter manual flags to be used: "
			read -e TRIM_FLAGS
			echo "Starting..."
			java -classpath trimmomatic.jar org.usadellab.trimmomatic.TrimmomaticPE $TRIM_FLAGS -threads $THREADS -$PHRED -trimlog $TRIM_LOG $FASTQ_DIR/$FASTQ1.fastq $FASTQ_DIR/$FASTQ2.fastq $P_R1 $U_R1 $P_R2 $U_R2 MINLENGTH:$MINLENGTH_VALUE
			;;
		"TOPHRED33")
			echo "Starting..."
			echo -n "If desired, enter manual flags to be used: "
			read -e TRIM_FLAGS
			java -classpath trimmomatic.jar org.usadellab.trimmomatic.TrimmomaticPE $TRIM_FLAGS -threads $THREADS -$PHRED -trimlog $TRIM_LOG $FASTQ_DIR/$FASTQ1.fastq $FASTQ_DIR/$FASTQ2.fastq $P_R1 $U_R1 $P_R2 $U_R2 TOPHRED33
			;;
		"TOPHRED64")
			echo "Starting..."
			echo -n "If desired, enter manual flags to be used: "
			read -e TRIM_FLAGS
			java -classpath trimmomatic.jar org.usadellab.trimmomatic.TrimmomaticPE $TRIM_FLAGS -threads $THREADS -$PHRED -trimlog $TRIM_LOG $FASTQ_DIR/$FASTQ1.fastq $FASTQ_DIR/$FASTQ2.fastq $P_R1 $U_R1 $P_R2 $U_R2 TOPHRED64
			;;
        "GENERATE VISUAL REPORT")
            echo "Visual report will now be generated..."
            PS3='Pick a tool: '
                phred_options=("prinseq-lite, prinseq-graphs" "fastqc")
                select VISUAL_CHOICE in "${visual_options[@]}"
                do
                case $VISUAL_CHOICE in
		            "prinseq-lite, prinseq-graphs")
                        echo "Generating graph file..."
                        prinseq-lite -fastq $FASTQ_DIR/$FASTQ1.fastq -out_format 4 -graph_data fastq1.gd
                        prinseq-lite -fastq $FASTQ_DIR/$FASTQ2.fastq -out_format 4 -graph_data fastq2.gd
                        echo "Generating html report..."
                        mkdir read1_report
                        mkdir read2_report
                        cd read1_report
                        prinseq-graphs ../fastq1.gd -html_all
                        cd ../read2_report
                        prinseq-graphs ../fastq2.gd -html_all
	            		break
		            	;;
                    "fastqc"
                        fastqc $P_R1 $P_R2
                        echo -n "Type command for web browser to view html results in: "
                        read -e BROWSER
                        $BROWSER $FASTQ_DIR/$FASTQ1"_fastqc"/fastqc_report.html $FASTQ_DIR/$FASTQ2"_fastqc"/fastqc_report.html
                        ;;
            	esac
                done
            ;;
		"SKIP/CONTINUE")
			echo "Continuing..."
			break
			;;
	esac
done

##############################
# BOWTIE AND TOPHAT
echo -n "Enter path to reference FastA for index build: "
read -e "REF_PATH"
echo -n "If desired, enter manual flags to be used by tophat: "
read -e TOPHAT_FLAGS
bowtie/bowtie2-build $REF_PATH
$TOPHAT_PATH $TOPHAT_FLAGS $FASTQ_DIR/$FASTQ1.fastq $FASTQ_DIR/$FASTQ2.fastq
tophat_out/accepted_hits.bam=$ACC_HITS
samtools build $ACC_HITS

# TABLET
tablet $REF_PATH $ACC_HITS
Thanks,
vinay427
vinay427 is offline   Reply With Quote
Old 07-05-2012, 09:26 AM   #2
vinay427
Member
 
Location: USA

Join Date: Jul 2012
Posts: 10
Default

Bump, does anyone have any neat ideas or advice about a project that would be helpful to the community? Of course, it would be open-source licensed under GPL.
vinay427 is offline   Reply With Quote
Old 07-05-2012, 12:09 PM   #3
pbluescript
Senior Member
 
Location: Boston

Join Date: Nov 2009
Posts: 224
Default

It would be nice if there was a tool that could take a bam file aligned to a transcriptome and along with a gtf file convert the transcriptome coordinates to genome coordinates.

I have asked if such a tool exists here before and done my own searching with no luck. I've cobbled together a solution to the problem, but it's not fast and it only works in my specific case.
pbluescript is offline   Reply With Quote
Old 07-06-2012, 11:07 AM   #4
vinay427
Member
 
Location: USA

Join Date: Jul 2012
Posts: 10
Default

Anyone else feel that pbluescript's idea would be useful? I'm only trying to gain a general idea of people's opinions toward such a tool (converting .bam aligned to transcriptome and .gtf to genome coordinates).
vinay427 is offline   Reply With Quote
Old 07-06-2012, 03:30 PM   #5
alec
Member
 
Location: Cambridge, MA

Join Date: Apr 2011
Posts: 18
Default

If you go with the conversion tool, it would also be useful if it could convert from, say, hg18 to hg19 using a liftover file.

My vote would be for a Python wrapper to bowtie or bwa. For example, I often have barcoded data and would like to write a loop that reads a fastq file, removes the barcode, aligns the read, and then uses pysam to write it to a different bam file depending on the barcode. This would also give me more control over how multiply aligned or mispaired reads are handled, or let me store additional information with the alignment.
alec is offline   Reply With Quote
Old 07-08-2012, 08:06 PM   #6
A_Morozov
Member
 
Location: Russia, Irkutsk

Join Date: Feb 2011
Posts: 40
Default

vinay427, you can contribute quite a bit to BioPerl and BioPython modules if you know any of these languages. Just check out mailing lists: there always are some requests.
A_Morozov is offline   Reply With Quote
Old 07-09-2012, 05:52 AM   #7
pmiguel
Senior Member
 
Location: Purdue University, West Lafayette, Indiana

Join Date: Aug 2008
Posts: 2,317
Default

I would like a tool that visually displays the components of an RNA sample sequence by length and abundance. What I have in mind would be a 2 dimensional plot. X-axis would be "nucleotides" of length of a transcript. Y-axis would be read counts for the given transcript. The Y-axis might be usefully log-transformed.

So, if you did no ribosomal depletion, and sequenced 30 million reads into a sample, somewhere on the order of 90% of the reads (27 million) would be 28S and 18S nuclear rRNA. So those might be plotted as being 4 thousand and 2 thousand nucleotides on the x-axis and 18 million reads "thick" and 9 million reads thick respectively. Below them would be the next rank of high expressers -- possibly mitochondrial rRNA, or some abundant tissue specific gene. Or some small transcriptome contaminant, like from bacteria contaminating a cell culture.

Then subsequent rows of decreasing thickness would populated the lower part of the diagram.

The purpose of this would be to give a general, visual, overview of the composition of an RNA sample. I don't think you get that using a full genome viewer.

--
Phillip
pmiguel is offline   Reply With Quote
Old 07-10-2012, 06:22 AM   #8
Jean
Member
 
Location: Canada

Join Date: Nov 2008
Posts: 37
Default

Quote:
Originally Posted by pmiguel View Post
I would like a tool that visually displays the components of an RNA sample sequence by length and abundance. What I have in mind would be a 2 dimensional plot. X-axis would be "nucleotides" of length of a transcript. Y-axis would be read counts for the given transcript. The Y-axis might be usefully log-transformed.

So, if you did no ribosomal depletion, and sequenced 30 million reads into a sample, somewhere on the order of 90% of the reads (27 million) would be 28S and 18S nuclear rRNA. So those might be plotted as being 4 thousand and 2 thousand nucleotides on the x-axis and 18 million reads "thick" and 9 million reads thick respectively. Below them would be the next rank of high expressers -- possibly mitochondrial rRNA, or some abundant tissue specific gene. Or some small transcriptome contaminant, like from bacteria contaminating a cell culture.

Then subsequent rows of decreasing thickness would populated the lower part of the diagram.

The purpose of this would be to give a general, visual, overview of the composition of an RNA sample. I don't think you get that using a full genome viewer.

--
Phillip

Couldn't this be plotted in R with the mapping and transcriptome information?
Jean is offline   Reply With Quote
Old 07-10-2012, 06:56 AM   #9
pmiguel
Senior Member
 
Location: Purdue University, West Lafayette, Indiana

Join Date: Aug 2008
Posts: 2,317
Default

Quote:
Originally Posted by Jean View Post
Couldn't this be plotted in R with the mapping and transcriptome information?
That would be fine with me.

--
Phillip
pmiguel is offline   Reply With Quote
Old 07-19-2012, 09:31 AM   #10
vinay427
Member
 
Location: USA

Join Date: Jul 2012
Posts: 10
Default

Quote:
Originally Posted by alec View Post
My vote would be for a Python wrapper to bowtie or bwa. For example, I often have barcoded data and would like to write a loop that reads a fastq file, removes the barcode, aligns the read, and then uses pysam to write it to a different bam file depending on the barcode. This would also give me more control over how multiply aligned or mispaired reads are handled, or let me store additional information with the alignment.
What exactly do you mean by a wrapper? From what I've seen, would it basically pass through arguments to bowtie (or bowtie2) so that you can integrate it into your Python loop?

I found this example; is this essentially what you would want a Python wrapper for Bowtie to do, with different arguments?
Code:
input_filenames = ['BAM-1.bam','BAM-2.bam']
output_filename = 'all.bam'

merge_parameters = [output_filename] + input_filenames
pysam.merge(*merge_parameters)
vinay427 is offline   Reply With Quote
Old 07-19-2012, 10:56 AM   #11
severin
Genome Informatics Facility
 
Location: Iowa @isugif

Join Date: Sep 2009
Posts: 105
Default GBrowse2

I would like to see an extension to GBrowse2 that would permit for on the fly joining of scaffolds based on evidence in the tracks.
severin is offline   Reply With Quote
Old 07-20-2012, 06:53 AM   #12
alec
Member
 
Location: Cambridge, MA

Join Date: Apr 2011
Posts: 18
Default

Quote:
Originally Posted by vinay427 View Post
What exactly do you mean by a wrapper? From what I've seen, would it basically pass through arguments to bowtie (or bowtie2) so that you can integrate it into your Python loop?
I'm thinking of being able to align sequences one by one. For example, my reads often have custom barcodes at the beginning that need to be removed before alignment and written to a different file for each barcode:

Code:
files = dict([
    ["ATGCG", pysam.Samfile("sample1.bam", "wb")],
    ["GACTA", pysam.Samfile("sample2.bam", "wb")]
])
bwt = bowtie.load("/path/to/index")
for((seq, qual) in FastqReader("reads.fastq")):
    barcode = seq[0:5]
    if barcode in files:
        files[barcode].write(bwt.align(seq[5:], qual[5:]))
    else:
        pass #handle nonmatching barcodes here
There are also a lot of cases where I would want to store additional information in the auxiliary data, set certain flags, or otherwise modify the alignment before writing it. Of course, this can already be done with pysam, the trick is getting the alignment in the first place.
alec is offline   Reply With Quote
Reply

Thread Tools

Posting Rules
You may not post new threads
You may not post replies
You may not post attachments
You may not edit your posts

BB code is On
Smilies are On
[IMG] code is On
HTML code is Off




All times are GMT -8. The time now is 01:39 AM.


Powered by vBulletin® Version 3.8.9
Copyright ©2000 - 2020, vBulletin Solutions, Inc.
Single Sign On provided by vBSSO