Create a BLAST database
First, create a BLAST database for your genome or transcriptome. If your reference sequences are in a FASTA file, use this command line:
makeblastdb -in <reference.fa> -dbtype nucl -parse_seqids -out <database_name> -title "Database title"
The -parse_seqids option is required to keep the original sequence identifiers; without it, makeblastdb generates its own identifiers. The -title option is optional.
For more information on makeblastdb, see the NCBI BLAST+ Command Line User Manual.
Magic-BLAST will also work with a genome in a FASTA file instead of a BLAST database, but it will be very slow for anything larger than a bacterial genome, so we do not recommend this.
Example
To create a BLAST database from the reference file my_reference.fa
$ cat my_reference.fa
>sequence_1 Homo sapiens hemoglobin subunit alpha 2 (HBA2), mRNA
CATAAACCCTGGCGCGCTCGCGGGCCGGCACTCTTCTGGTCCCCACAGACTCAGAGAGAACCCACCATGG
TGCTGTCTCCTGCCGACAAGACCAACGTCAAGGCCGCCTGGGGTAAGGTCGGCGCGCACGCTGGCGAGTA
TGGTGCGGAGGCCCTGGAGAGGATGTTCCTGTCCTTCCCCACCACCAAGACCTACTTCCCGCACTTCGAC
CTGAGCCACGGCTCTGCCCAGGTTAAGGGCCACGGCAAGAAGGTGGCCGACGCGCTGACCAACGCCGTGG
CGCACGTGGACGACATGCCCAACGCGCTGTCCGCCCTGAGCGACCTGCACGCGCACAAGCTTCGGGTGGA
CCCGGTCAACTTCAAGCTCCTAAGCCACTGCCTGCTGGTGACCCTGGCCGCCCACCTCCCCGCCGAGTTC
ACCCCTGCGGTGCACGCCTCCCTGGACAAGTTCCTGGCTTCTGTGAGCACCGTGCTGACCTCCAAATACC
GTTAAGCTGGAGCCTCGGTAGCCGTTCCTCCTGCCCGCTGGGCCTCCCAACGGGCCCTCCTCCCCTCCTT
GCACCGGCCCTTCCTGGTCTTTGAATAAAGTCTGAGTGGGCAGCAAAAAAAAAAAAAAAAAA
>sequence_2 Homo sapiens alpha one globin (HBA1) mRNA, complete cds
CAGACTCAGAGAGAACCCACCATGGTGCTGTCTCCTGCCGACAAGACCAACGTCAAGGCCGCCTGGGGTA
AGGTCGGCGCGCACGCTGGCGAGTATGGTGCGGAGGCCCTGGAGAGGATGTTCCTGTCCTTCCCCACCAC
CAAGACCTACTTCCCGCACTTCGACCTGAGCCACGGCTCTGCCCAGGTTAAGGGCCACGGCAAGAAGGTG
GCCGACGCGCTGACCAACGCCGTGGCGCACGTGGACGACATGCCCAACGCGCTGTCCGCCCTGAGCGACC
TGCACGCGCACAAGCTTCGGGTGGACCCGGTCAACTTCAAGCTCCTAAGCCACTGCCTGCTGGTGACCCT
GGCCGCCCACCTCCCCGCCGAGTTCACCCCTGCGGTGCACGCCTCCCTGGACAAGTTCCTGGCTTCTGTG
AGCACCGTGCTGACCTCCAAATACCGTTAAGCTGGAGCCTCGGTGGCCATGCTTCTTGCCCCTTGGGC
use this command line:
makeblastdb -in my_reference.fa -out my_reference -parse_seqids -dbtype nucl
Note that the word following ‘>’ is a sequence identifier that will be used in Magic-BLAST reports. The identifier should be unique.
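Because the identifiers should be unique, it can be worth checking a FASTA file for repeats before running makeblastdb. The following is a minimal sketch, not part of Magic-BLAST or BLAST+; find_duplicate_ids is a hypothetical helper name, and it assumes the identifier is the text between '>' and the first whitespace:

```shell
# Hypothetical helper: print every sequence identifier that appears more
# than once in a FASTA file. The identifier is assumed to be the first
# whitespace-delimited word on a header line, without the leading '>'.
find_duplicate_ids() {
    awk '/^>/ {print substr($1, 2)}' "$1" | sort | uniq -d
}

# Example: no output means all identifiers in the file are unique.
if [ -f my_reference.fa ]; then
    find_duplicate_ids my_reference.fa
fi
```

For the example file above, the check prints nothing, since sequence_1 and sequence_2 are distinct.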
There are several ways to download whole genomes, transcriptomes, or selected sequences from NCBI. For example, to download human chromosome 1 using the NCBI EDirect tools, use:
esearch -db nucleotide -query NC_000001 | efetch -format fasta >NC_000001.fa
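Before building a database from a downloaded file, it can help to confirm that the download actually produced FASTA records. The sketch below simply counts header lines; count_fasta_records is a hypothetical helper name, not part of EDirect or BLAST+:

```shell
# Hypothetical helper: count the records in a FASTA file by counting
# the lines that start with '>'. Prints 0 for an empty file.
count_fasta_records() {
    awk '/^>/ {n++} END {print n + 0}' "$1"
}

# Example: a single-chromosome download is expected to hold one record.
if [ -f NC_000001.fa ]; then
    count_fasta_records NC_000001.fa
fi
```

A count of 0 usually means the query returned nothing or the download failed, so it is worth checking before running makeblastdb.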
You can download the full human genome or transcriptome from the NCBI human genome resources, or use the NCBI Genome search for any organism.
For example, to download the latest human genome and build a database from it, use:
wget ftp://ftp.ncbi.nlm.nih.gov/refseq/H_sapiens/annotation/GRCh38_latest/refseq_identifiers/GRCh38_latest_genomic.fna.gz
gzip -d GRCh38_latest_genomic.fna.gz
makeblastdb -in GRCh38_latest_genomic.fna -out GRCh38 -parse_seqids -dbtype nucl
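Once makeblastdb finishes, one way to confirm that the database is usable is to ask blastdbcmd, which ships with BLAST+, for a summary. This is a minimal sketch that assumes BLAST+ is installed and on the PATH; check_blast_db is a hypothetical wrapper name:

```shell
# Hypothetical helper: print summary information for a BLAST database,
# or return a non-zero status if the database cannot be opened.
check_blast_db() {
    if ! command -v blastdbcmd >/dev/null 2>&1; then
        echo "blastdbcmd not found; install BLAST+ first" >&2
        return 127
    fi
    # -info reports database details such as the title and sequence count.
    blastdbcmd -db "$1" -info
}

# Example (database name as passed to makeblastdb -out):
#   check_blast_db GRCh38
```

The argument is the name given to -out, not the name of the FASTA file.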