Tools

This section of the site makes available some of the tools we have developed for our research.
  • The Blast links provide an interface for blasting against our chloroplast-specific databases.
  • We developed the Genbank to Fasta tool to be the most flexible Genbank to Fasta file converter that we know of.
  • The embl to pdf converters make nice figures out of simple annotated embl files.

Blast::Chloroplast_Public

For instructions and hints, see the Instructions below.



Choose program to use and database to search:

  • Program
  • Database
  • Sequence: enter it below in FASTA format, or load it from disk
  • Set subsequence: From, To

The query sequence is filtered for low complexity regions by default.

  • Filter: Low complexity, Mask for lookup table only
  • Expect
  • Matrix
  • Perform ungapped alignment
  • Query Genetic Codes (blastx only)
  • Database Genetic Codes (tblast[nx] only)
  • Frame shift penalty for blastx
  • Other advanced options
  • Graphical Overview
  • Alignment view
  • Descriptions
  • Alignments
  • Color schema

Instructions

For your convenience, we have pulled together all published chloroplast genomes and offer a web interface to Blast against those sequences.

The databases are available in "genome", "nt", and "aa" versions. The "genome" version is the entire nucleotide sequence of each genome. The "nt" version is the nucleotide sequence that corresponds to identified ORFs. The "aa" version is the translated peptide sequence of the ORFs. Please note that each Blast program has different requirements for nucleotide or amino acid input and databases. The following table is a guide:

Program    Input    Database
blastn     nt       nt or genome
blastp     aa       aa
blastx     nt       aa
tblastn    aa       nt or genome
tblastx    nt       nt or genome
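
The table above can also be written as a small lookup, which is handy for checking a program, input, and database combination before submitting a search. This is only an illustrative sketch: the names BLAST_REQUIREMENTS, is_valid_search, and programs_for are hypothetical and are not part of the web interface.

```python
# Query type and database versions each program accepts, transcribed from the table.
# "nt" = nucleotide ORFs, "aa" = translated peptides, "genome" = whole-genome nucleotide.
BLAST_REQUIREMENTS = {
    "blastn":  {"input": "nt", "databases": {"nt", "genome"}},
    "blastp":  {"input": "aa", "databases": {"aa"}},
    "blastx":  {"input": "nt", "databases": {"aa"}},
    "tblastn": {"input": "aa", "databases": {"nt", "genome"}},
    "tblastx": {"input": "nt", "databases": {"nt", "genome"}},
}


def is_valid_search(program, input_type, database):
    """Return True if the program accepts this query type and database version."""
    requirement = BLAST_REQUIREMENTS.get(program)
    if requirement is None:
        return False
    return input_type == requirement["input"] and database in requirement["databases"]


def programs_for(input_type, database):
    """List the programs that can run this query type against this database version."""
    return sorted(
        program
        for program in BLAST_REQUIREMENTS
        if is_valid_search(program, input_type, database)
    )
```

For example, programs_for("aa", "genome") returns only tblastn, since it is the one program in the table that takes a peptide query against a nucleotide database.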


The available databases are:

Database              What it is
Chloroplast_Public    Blast against all published whole chloroplast genomes at once. See here to check when we last synced our database with NCBI.

Blast::Fosmid_Ends

For instructions and hints, see the Instructions below.




 

 

Choose program to use and database to search:

  • Program
  • Database
  • Sequence: enter it below in FASTA format, or load it from disk
  • Set subsequence: From, To

The query sequence is filtered for low complexity regions by default.

  • Filter: Low complexity, Mask for lookup table only
  • Expect
  • Matrix
  • Perform ungapped alignment
  • Query Genetic Codes (blastx only)
  • Database Genetic Codes (tblast[nx] only)
  • Frame shift penalty for blastx
  • Other advanced options
  • Graphical Overview
  • Alignment view
  • Descriptions
  • Alignments
  • Color schema

Instructions

For your convenience we offer a web interface to Blast against all of our fosmid ends.

The fosmid end databases are available only in an "nt" version. This corresponds to the nt sequence of each fosmid end we have sequenced for a particular organism. Please note that each blast program has different requirements for nucleotide or amino acid input and databases. The following table is a guide:

Program    Input    Database
blastn     nt       nt
tblastn    aa       nt
tblastx    nt       nt


 

The "Fosmid_Ends" database includes all end sequences from all fosmid libraries created by this project.  There are also organism-specific fosmid ends databases available.

 

Convert Genbank to Fasta

This tool has moved to http://rocaplab.ocean.washington.edu/tools/genbank_to_fasta

Convert embl to circular map pdf

This page is designed to accept one or more annotated embl format files plus a fasta sequence file, and output a nice looking circular feature map in pdf format.

This converter is based on the circular_diagram.pl perl script written by the Sanger Institute.

Instructions:

Select at least one embl file and one fasta sequence file, then select your options. Each embl file of features will be displayed as a concentric ring in the final pdf. Reasonable defaults are already entered, but feel free to make changes and see how they change the result. When you are ready, click "Make PDF!"

Note that one unit of height or width = 1/72 of an inch.
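
As a quick aid for choosing sizes, the unit conversion stated above can be sketched in a few lines. The function names here are hypothetical and only illustrate the arithmetic; the numbers 594 and 792 are the page size defaults listed under More Available Options.

```python
POINTS_PER_INCH = 72  # one unit of height or width is 1/72 of an inch


def points_to_inches(points):
    """Convert a height or width in form units to inches."""
    return points / POINTS_PER_INCH


def inches_to_points(inches):
    """Convert a measurement in inches to form units."""
    return inches * POINTS_PER_INCH


print(points_to_inches(594))  # default page width, 8.25 inches
print(points_to_inches(792))  # default page height, 11.0 inches
```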

Choose Files:

Please select a fasta format sequence file to upload. Must be fasta format.


Please select an embl format file to upload. Must be embl format.

Radius:
Radius indicates where on the circle features from this embl file should be drawn. 0 is the center and 100 is the edge of the circle.

Feature Height:
Feature Height indicates the thickness of the features drawn in that circle.


Ready?


Additional Optional Files

Optionally, please select a second embl format file to upload.

Radius:
Feature Height:


Optionally, please select a third embl format file to upload.

Radius:
Feature Height:


Optionally, please select a fourth embl format file to upload.

Radius:
Feature Height:


Optional GC Content Calculation

Show GC ratio ring?:
What it is: If "yes" is selected, the GC ratio of the sequence will be calculated and displayed in a ring.

GC Window size:
What it is: The number of bases over which a running average GC is calculated.

GC Step Size:
What it is: The "resolution" of the GC content calculation. Higher numbers will have a smoother appearance.

Radius:
Feature Height:
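
To make the GC Window size and GC Step Size options more concrete, here is a minimal sketch of a windowed GC calculation. It assumes "GC ratio" means the fraction of G and C bases in each window and that windows do not wrap around the end of the sequence; the script behind this page may compute it differently, and the function name gc_ratio_windows is hypothetical.

```python
def gc_ratio_windows(sequence, window_size, step_size):
    """Return (start, gc_ratio) pairs for windows taken every step_size bases.

    Assumption: gc_ratio is (G + C) / window_size, and only full windows
    that fit inside the sequence are reported.
    """
    if window_size <= 0 or step_size <= 0:
        raise ValueError("window_size and step_size must be positive")
    seq = sequence.upper()
    results = []
    for start in range(0, len(seq) - window_size + 1, step_size):
        window = seq[start:start + window_size]
        gc_count = window.count("G") + window.count("C")
        results.append((start, gc_count / window_size))
    return results


# A window of 4 bases moved 2 bases at a time over a toy sequence.
print(gc_ratio_windows("GGCCAATT", window_size=4, step_size=2))
```

In this sketch the window size sets how many bases are averaged into each value, and the step size sets how many values are produced along the sequence.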


Optional GC Skew Calculation

Show GC Skew ring?:
What it is: If "yes" is selected, the GC skew of the sequence will be calculated and displayed in a ring.

GC Skew Window size:
What it is: The number of bases over which a running average GC skew is calculated.

GC Skew Step Size:
What it is: The "resolution" of the GC Skew calculation. Higher numbers will have a smoother appearance.

Radius:
Feature Height:
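
For readers unfamiliar with GC skew, the sketch below uses the common definition, (G - C) / (G + C), computed per window. Two things here are assumptions rather than documented behavior of the script behind this page: that it uses this definition, and that windows wrap past the end of the sequence because the map is circular. The function name gc_skew_windows is hypothetical.

```python
def gc_skew_windows(sequence, window_size, step_size):
    """Return (start, skew) pairs, with skew = (G - C) / (G + C) per window.

    Assumption: the sequence is circular, so a window that runs past the
    last base continues from the first base.
    """
    if window_size <= 0 or step_size <= 0:
        raise ValueError("window_size and step_size must be positive")
    seq = sequence.upper()
    length = len(seq)
    results = []
    for start in range(0, length, step_size):
        window = [seq[(start + offset) % length] for offset in range(window_size)]
        g_count = window.count("G")
        c_count = window.count("C")
        total = g_count + c_count
        # A window with no G or C has no defined skew; report 0.0 for it.
        skew = (g_count - c_count) / total if total else 0.0
        results.append((start, skew))
    return results


print(gc_skew_windows("GGGCAATC", window_size=4, step_size=4))
```

Positive values mean G outnumbers C in that window, and negative values mean the opposite.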


More Available Options


Page Width:
What it is: the width of each page. The default, 594, equals 8.25 inches.

Page Height:
What it is: the height of each page. The default, 792, equals 11 inches.

Scale Font Size:
What it is: the size of the font. Not the same scale as you are used to.

Label Distance:
What it is: the distance between the scale line/circle and the centre of the scale labels

Scale Label Distance:
What it is: the distance between the scale numbers in bases. Enter "0" for no labels.

Scale Tick Mark Distance:
What it is: the distance between the scale tick marks in bases. Enter "0" for no labels.

Scale Tick Mark Thickness:
What it is: the thickness of the scale tick marks

Scale Tick Mark Height:
What it is: the height of the scale tick marks

Minimum Feature Width:
What it is: the minimum width of a feature in bases

Draw Direction:
What it is: which direction to draw the figure

End Gap:
What it is: the gap (in bases) between the last base of the sequence and the first. This option exists to allow non-circular genomes to be drawn sensibly

Show Command Line?
What it is: Display in the browser all the command line options used to generate the pdf

Ready?

Details on this software are hard to find, but we found mention of this script here and downloaded it from here.

Convert embl to linear-map pdf

This page is designed to accept an annotated embl format file, and convert it to a nice looking linear feature map in pdf format.

This converter is based on the perl_to_ps.pl perl script written by the Sanger Institute.

Instructions:

Select a file to convert, then select your options. Reasonable defaults are already entered, but feel free to make changes and see how they change the result. When you are ready, click "convert!"

Select File

Please select an embl format file to upload. Must be embl format.


Commonly Used Options

Note that one unit of height or width = 1/72 of an inch.

Line Base Length:
What it is: the number of bases or peptides on each line

Feature Height:
What it is: the height of each feature

Line Spacing:
What it is: the vertical distance between each line

Font Size:
What it is: the size of the font. Not the same scale as you are used to.
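
When picking a Line Base Length, it can help to estimate how many lines and pages a sequence will need. The sketch below is a rough estimate only. It assumes each line takes up one Line Spacing of vertical room and that the usable page height is the Page Height minus a Border Width at top and bottom (both options are listed under More Available Options); the script's real layout may differ. The function names and the example values are hypothetical.

```python
import math


def lines_needed(sequence_length, line_base_length):
    """Number of lines required when each line holds line_base_length bases."""
    if line_base_length <= 0:
        raise ValueError("line_base_length must be positive")
    return math.ceil(sequence_length / line_base_length)


def estimated_pages(sequence_length, line_base_length, line_spacing,
                    border_width, page_height=792):
    """Rough page estimate; all distances are in units of 1/72 inch.

    Assumption: one line per line_spacing of usable height, where usable
    height is page_height minus border_width at the top and the bottom.
    """
    if line_spacing <= 0:
        raise ValueError("line_spacing must be positive")
    usable_height = page_height - 2 * border_width
    lines_per_page = max(1, int(usable_height // line_spacing))
    total_lines = lines_needed(sequence_length, line_base_length)
    return math.ceil(total_lines / lines_per_page)


# Hypothetical values: a 155,000 base sequence drawn 10,000 bases per line.
print(lines_needed(155000, 10000))
print(estimated_pages(155000, 10000, line_spacing=100, border_width=36))
```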

Ready?


More Available Options


Page Width:
What it is: the width of each page. The default, 594, equals 8.25 inches.

Page Height:
What it is: the height of each page. The default, 792, equals 11 inches.

Border Width:
What it is: the margin to leave at the edge of the page

Page Count:
What it is: The number of pages to generate

Scale Distance:
What it is: the distance between the scale (dna) line and the feature

Scale Font Size:
What it is: the size of the font. Not the same scale as you are used to.

Scale Number Distance:
What it is: the distance between the dna scale line and the labels at each end

Label Feature Distance:
What it is: the distance between the feature and its label

Tick Mark Spacing:
What it is: the distance (in bases) between the small tick marks on the scale line

Major Tick Spacing:
What it is: the distance (in bases) between the large tick marks on the scale line

Shade-line Spacing:
What it is: the distance between the diagonal lines used for shading. Experiment with this value.

Left Shaded Keys:
What it is: the keys of the features that should be shaded with left-slanting lines. (comma separated list)

Right Shaded Keys:
What it is: the keys of the features that should be shaded with right-slanting lines. (comma separated list)

Protein Features:
What it is: the keys of the features that should be drawn beside the dna line (comma separated list)

DNA Features:
What it is: the keys of the features that should be drawn on the dna line (comma separated list)

Feature Outline Width:
What it is: the width of the black border around each exon or feature

Base Label Position:
What it is: this controls which end of the dna line should have the base count label

Label Angle:
What it is: the angle of the feature labels (0 is horizontal)

Horizontal Protein Labels:
What it is: if "Yes", the labels on the protein features will be horizontal and centred; otherwise the labels will be drawn at the angle given by label_angle.

DNA Labels:
What it is: selects where to draw DNA feature labels

Protein Labels:
What it is: choose whether to draw Protein Labels

Show Command Line?
What it is: Display in the browser all the command line options used to generate the pdf

Ready?

Details on this software are hard to find, but we found mention of this script here and downloaded it from here.