
Frequently asked questions

How do I get non-redundant protein sequences in the FASTA format for my species?

How do I get gene coordinates for my species?

How do I get protein sequences for my species, if unavailable from NCBI?

How do I get gene coordinates for my species, if unavailable from NCBI?

Can I run SMURF on just one assembly (supercontig)?

Which gene coordinates are needed to run SMURF?

How do I get non-redundant protein sequences in the FASTA format for my species?

Go to NCBI RefSeq, choose "Protein" from the pull-down menu and type your query like so: Aspergillus fumigatus[orgn] AND srcdb_refseq[properties]. Alternatively, go to the WGS project list page, click on your species name, and then click the Protein link in the Entrez records table.
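If you prefer to script this step, the same query can be run through NCBI's E-utilities (ESearch followed by EFetch). The sketch below is not part of SMURF. It assumes Python 3, network access to eutils.ncbi.nlm.nih.gov, and that you stay within NCBI's request-rate limits. The function names build_query, esearch_url and fetch_fasta are hypothetical, and the batch size of 500 is an arbitrary choice.

```python
"""Sketch: fetch RefSeq protein FASTA for one organism via NCBI E-utilities."""
import json
import time
import urllib.parse
import urllib.request

EUTILS = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/"


def build_query(organism: str) -> str:
    # Same query as typed into the NCBI Protein search box.
    return f"{organism}[orgn] AND srcdb_refseq[properties]"


def esearch_url(organism: str) -> str:
    # usehistory=y keeps the hit list on NCBI's server for the EFetch calls.
    params = {"db": "protein", "term": build_query(organism),
              "usehistory": "y", "retmode": "json"}
    return EUTILS + "esearch.fcgi?" + urllib.parse.urlencode(params)


def fetch_fasta(organism: str, out_path: str, batch: int = 500) -> int:
    """Download all matching proteins as FASTA; return the number of hits."""
    with urllib.request.urlopen(esearch_url(organism)) as resp:
        result = json.load(resp)["esearchresult"]
    count = int(result["count"])
    with open(out_path, "w") as out:
        # Page through the stored result set in FASTA format.
        for start in range(0, count, batch):
            params = {"db": "protein", "query_key": result["querykey"],
                      "WebEnv": result["webenv"], "retstart": start,
                      "retmax": batch, "rettype": "fasta", "retmode": "text"}
            url = EUTILS + "efetch.fcgi?" + urllib.parse.urlencode(params)
            with urllib.request.urlopen(url) as resp:
                out.write(resp.read().decode())
            time.sleep(0.4)  # stay under NCBI's ~3 requests/second limit
    return count
```

For example, fetch_fasta("Aspergillus fumigatus", "proteins.fasta") writes the RefSeq proteins for that species to proteins.fasta. Using the server-side history (WebEnv/query_key) avoids passing thousands of IDs back to NCBI in the URL.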

How do I get gene coordinates for my species?

Download the fungal gene file from the NCBI FTP site:

ftp.ncbi.nih.gov/gene/DATA/ASN_BINARY/Fungi/All_Fungi.ags.gz

This file is in binary ASN.1 format, so get the gene2xml tool from:

ftp.ncbi.nih.gov/toolbox/ncbi_tools/cmdline/

To run gene2xml from the directory you downloaded it to, prefix ./ to the executable name so that the shell looks in the current directory:

./gene2xml

Run without arguments like this, it should print its set of switches to the screen. Use gene2xml to convert the file to XML, then parse the XML. Because the file covers all fungi, check the taxid fields to make sure you are keeping only the records for your organism.
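Once the file is converted to XML, the taxid filtering can be done with Python's standard ElementTree parser. This is a sketch under assumptions: the element names used here (Entrezgene, Org-ref_db/Dbtag with Dbtag_db "taxon", Gene-commentary_accession, Seq-interval_from/Seq-interval_to) follow NCBI's Entrezgene XML as far as we know, so check them against your own gene2xml output. It takes only the first interval of each gene's locus and converts NCBI's 0-based positions to 1-based. It returns Entrez Gene IDs rather than protein IDs, so you still need to map them to the IDs in your protein FASTA file. The names taxid_of and genes_for_taxid are hypothetical.

```python
import xml.etree.ElementTree as ET

# Path (relative to an Entrezgene element) to the organism's database tags.
TAXID_PATH = ("Entrezgene_source/BioSource/BioSource_org/Org-ref/"
              "Org-ref_db/Dbtag")


def taxid_of(gene):
    # The Dbtag whose Dbtag_db is "taxon" carries the NCBI Taxonomy ID.
    for tag in gene.findall(TAXID_PATH):
        if tag.findtext("Dbtag_db") == "taxon":
            return tag.findtext("Dbtag_tag/Object-id/Object-id_id")
    return None


def genes_for_taxid(xml_source, taxid):
    """Yield (gene_id, sequence accession, start, end) for one organism."""
    # iterparse streams the file, which matters for the large all-fungi dump.
    for _event, elem in ET.iterparse(xml_source, events=("end",)):
        if elem.tag != "Entrezgene":
            continue
        if taxid_of(elem) == str(taxid):
            gene_id = elem.findtext(
                "Entrezgene_track-info/Gene-track/Gene-track_geneid")
            locus = elem.find("Entrezgene_locus/Gene-commentary")
            interval = locus.find(".//Seq-interval") if locus is not None else None
            if interval is not None:
                accession = locus.findtext("Gene-commentary_accession")
                # Seq-interval positions are 0-based; shift to 1-based.
                start = int(interval.findtext("Seq-interval_from")) + 1
                end = int(interval.findtext("Seq-interval_to")) + 1
                yield gene_id, accession, start, end
        elem.clear()  # free the processed record's children
```

For example, iterating over genes_for_taxid("All_Fungi.xml", TAXID), where TAXID is your organism's NCBI Taxonomy ID, yields one tuple per matching gene.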

The requirements for SMURF gene coordinates are described below under "Which gene coordinates are needed to run SMURF?"

How do I get protein sequences for my species, if unavailable from NCBI?

See the Links page for a list of sequencing centers.

How do I get gene coordinates for my species, if unavailable from NCBI?

See the Links page for a list of sequencing centers, and download the gene information as a GFF-formatted file that maps gene coordinates to the largest assembled unit, the supercontig.
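Converting such a GFF file into SMURF's tab-delimited column order can be sketched as follows. This assumes the file is GFF3 (some centers distribute GFF2/GTF, whose attribute syntax differs), and that one feature line per gene carries both the protein ID and a functional description. Which feature type and attribute names hold these varies between centers, so feature="mRNA", id_key="ID" and note_key="product" are hypothetical defaults to adjust. Coordinates are written as they appear in the GFF (1-based, start <= end) regardless of strand; whether you need to adjust minus-strand genes is not covered here.

```python
import csv
import urllib.parse


def parse_attributes(field: str) -> dict:
    # GFF3 column 9: key=value pairs separated by ';', values percent-encoded.
    attrs = {}
    for part in field.strip().split(";"):
        if "=" in part:
            key, value = part.split("=", 1)
            attrs[key.strip()] = urllib.parse.unquote(value)
    return attrs


def gff_to_smurf_rows(lines, feature="mRNA", id_key="ID", note_key="product"):
    """Yield (protein ID, contig, start, end, function) for one feature type."""
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines, comments and ## directives
        cols = line.rstrip("\n").split("\t")
        if len(cols) != 9 or cols[2] != feature:
            continue  # not a feature line of the requested type
        attrs = parse_attributes(cols[8])
        # Column 1 is the contig/supercontig; columns 4 and 5 are start/end.
        yield (attrs.get(id_key, ""), cols[0], int(cols[3]), int(cols[4]),
               attrs.get(note_key, ""))


def write_smurf(rows, out_path: str) -> None:
    # Tab-delimited output in the column order SMURF asks for.
    with open(out_path, "w", newline="") as out:
        csv.writer(out, delimiter="\t", lineterminator="\n").writerows(rows)
```

To use it, pass an open GFF file to gff_to_smurf_rows and hand the result to write_smurf. Filtering on a single feature type avoids the duplicate rows you would get from per-exon CDS lines.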

The requirements for SMURF gene coordinates are described below under "Which gene coordinates are needed to run SMURF?"

Can I run SMURF on just one assembly (supercontig)?

Yes, as long as you have protein sequences and gene coordinates for the genes on that supercontig. SMURF does not require a completely sequenced genome.

Which gene coordinates are needed to run SMURF?

  • protein ID: the unique ID of each gene in your gene set. Make sure that the ID in your gene coordinate file matches the ID in your protein FASTA file.
  • chromosome: ID of the contig or chromosome on which the gene is located.
  • 5' gene start: start position of the gene (5' end).
  • 3' gene end: end position of the gene (3' end).
  • protein name/function/definition: any functional information you want to include.

Please make sure that the columns of your gene coordinate file are in this order and separated by tabs, i.e. a tab-delimited gene coordinate file.
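To catch the most common formatting mistakes before running SMURF, you can check the coordinate file against your FASTA headers. This is a minimal sketch, assuming the ID in each FASTA header is the first word after '>' and that the fifth (function) column is present even when empty. SMURF's own parser may be stricter or more lenient, and fasta_ids and check_coordinates are hypothetical names.

```python
import csv


def fasta_ids(fasta_lines):
    """IDs from FASTA headers: the first word after '>'."""
    ids = set()
    for line in fasta_lines:
        if line.startswith(">"):
            words = line[1:].split()
            if words:
                ids.add(words[0])
    return ids


def check_coordinates(coord_lines, fasta_lines):
    """Return a list of problems found in a SMURF gene coordinate file."""
    known = fasta_ids(fasta_lines)
    problems = []
    # QUOTE_NONE so quotes in the function column are kept literally.
    reader = csv.reader(coord_lines, delimiter="\t", quoting=csv.QUOTE_NONE)
    for lineno, cols in enumerate(reader, start=1):
        if not cols:
            continue  # ignore blank lines
        if len(cols) != 5:
            problems.append(
                f"line {lineno}: expected 5 tab-separated columns, got {len(cols)}")
            continue
        protein_id, contig, start, end, _function = cols
        if not contig:
            problems.append(f"line {lineno}: empty chromosome/contig ID")
        if not (start.isdigit() and end.isdigit()):
            problems.append(f"line {lineno}: gene start/end are not whole numbers")
        if protein_id not in known:
            problems.append(
                f"line {lineno}: ID {protein_id!r} not found in the FASTA file")
    return problems
```

A space-separated line shows up as a single column, which is the usual sign that the file is not tab-delimited.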
