First, create a BLAST database for your genome or transcriptome. If your reference sequences are in a FASTA file, use this command line:

makeblastdb -in <reference.fa> -dbtype nucl -parse_seqids -out <database_name> -title "Database title"

The -parse_seqids option is required to keep the original sequence identifiers; without it, makeblastdb generates its own identifiers. The -title option can be omitted.
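One way to confirm that the original identifiers were kept is to list the database contents with blastdbcmd, which ships with BLAST+. The sketch below is only an illustration: the function name list_db_ids is hypothetical, and it assumes the database has already been built and that BLAST+ is on your PATH.

```shell
# Hypothetical helper: list the identifier and length of every sequence
# stored in a BLAST database. Assumes BLAST+ (blastdbcmd) is installed.
list_db_ids() {
    db="$1"
    if ! command -v blastdbcmd >/dev/null 2>&1; then
        echo "blastdbcmd not found; install BLAST+ first" >&2
        return 127
    fi
    # %a prints the accession (sequence identifier), %l the sequence length
    blastdbcmd -db "$db" -entry all -outfmt "%a %l"
}

# Usage (replace the placeholder with your own database name):
# list_db_ids <database_name>
```

If the listed identifiers match the words following '>' in your FASTA file, the database was built with the original identifiers.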

For more information on makeblastdb, see the NCBI BLAST+ Command Line User Manual.

Magic-BLAST can also search directly against a genome in a FASTA file, but this is very slow for anything larger than a bacterial genome, so we do not recommend it.

Example

To create a BLAST database from the reference file my_reference.fa:

$ cat my_reference.fa
>sequence_1 Homo sapiens hemoglobin subunit alpha 2 (HBA2), mRNA
CATAAACCCTGGCGCGCTCGCGGGCCGGCACTCTTCTGGTCCCCACAGACTCAGAGAGAACCCACCATGG
TGCTGTCTCCTGCCGACAAGACCAACGTCAAGGCCGCCTGGGGTAAGGTCGGCGCGCACGCTGGCGAGTA
TGGTGCGGAGGCCCTGGAGAGGATGTTCCTGTCCTTCCCCACCACCAAGACCTACTTCCCGCACTTCGAC
CTGAGCCACGGCTCTGCCCAGGTTAAGGGCCACGGCAAGAAGGTGGCCGACGCGCTGACCAACGCCGTGG
CGCACGTGGACGACATGCCCAACGCGCTGTCCGCCCTGAGCGACCTGCACGCGCACAAGCTTCGGGTGGA
CCCGGTCAACTTCAAGCTCCTAAGCCACTGCCTGCTGGTGACCCTGGCCGCCCACCTCCCCGCCGAGTTC
ACCCCTGCGGTGCACGCCTCCCTGGACAAGTTCCTGGCTTCTGTGAGCACCGTGCTGACCTCCAAATACC
GTTAAGCTGGAGCCTCGGTAGCCGTTCCTCCTGCCCGCTGGGCCTCCCAACGGGCCCTCCTCCCCTCCTT
GCACCGGCCCTTCCTGGTCTTTGAATAAAGTCTGAGTGGGCAGCAAAAAAAAAAAAAAAAAA
>sequence_2 Homo sapiens alpha one globin (HBA1) mRNA, complete cds
CAGACTCAGAGAGAACCCACCATGGTGCTGTCTCCTGCCGACAAGACCAACGTCAAGGCCGCCTGGGGTA
AGGTCGGCGCGCACGCTGGCGAGTATGGTGCGGAGGCCCTGGAGAGGATGTTCCTGTCCTTCCCCACCAC
CAAGACCTACTTCCCGCACTTCGACCTGAGCCACGGCTCTGCCCAGGTTAAGGGCCACGGCAAGAAGGTG
GCCGACGCGCTGACCAACGCCGTGGCGCACGTGGACGACATGCCCAACGCGCTGTCCGCCCTGAGCGACC
TGCACGCGCACAAGCTTCGGGTGGACCCGGTCAACTTCAAGCTCCTAAGCCACTGCCTGCTGGTGACCCT
GGCCGCCCACCTCCCCGCCGAGTTCACCCCTGCGGTGCACGCCTCCCTGGACAAGTTCCTGGCTTCTGTG
AGCACCGTGCTGACCTCCAAATACCGTTAAGCTGGAGCCTCGGTGGCCATGCTTCTTGCCCCTTGGGC

use this command line:

makeblastdb -in my_reference.fa -out my_reference -parse_seqids -dbtype nucl

Note that the word following ‘>’ is a sequence identifier that will be used in Magic-BLAST reports. The identifier should be unique.
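Because duplicate identifiers can cause trouble, it may be worth checking a FASTA file before building the database. The following is a minimal sketch using standard Unix tools only; the function name duplicate_ids is hypothetical, and it assumes the identifier is the first word immediately after '>' on each header line.

```shell
# Hypothetical helper: print every FASTA identifier that occurs more than once.
# The identifier is taken to be the first word after '>' on each header line.
duplicate_ids() {
    grep '^>' "$1" | awk '{ sub(/^>/, "", $1); print $1 }' | sort | uniq -d
}

# Usage with the example file (empty output means all identifiers are unique):
# duplicate_ids my_reference.fa
```

Any identifier printed by this check should be renamed in the FASTA file before running makeblastdb.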

There are several ways to download whole genomes, transcriptomes, or selected sequences from NCBI. For example, to download human chromosome 1 using the NCBI EDirect tools, use:

esearch -db nucleotide -query NC_000001 | efetch -format fasta >NC_000001.fa

You can download the full human genome or transcriptome from NCBI human genome resources, or use NCBI Genome search for any organism.

For example, to download the latest human genome, use:

wget ftp://ftp.ncbi.nlm.nih.gov/refseq/H_sapiens/annotation/GRCh38_latest/refseq_identifiers/GRCh38_latest_genomic.fna.gz
gzip -d GRCh38_latest_genomic.fna.gz
makeblastdb -in GRCh38_latest_genomic.fna -out GRCh38 -parse_seqids -dbtype nucl
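Large downloads are sometimes interrupted, so you may want to check the archive before decompressing it. The sketch below is one possible approach, not part of Magic-BLAST or BLAST+: the function name checked_gunzip is hypothetical, and it relies only on the standard gzip -t integrity test.

```shell
# Hypothetical helper: decompress a .gz file only if it passes
# gzip's built-in integrity test.
checked_gunzip() {
    if gzip -t "$1"; then
        gzip -d "$1"
    else
        echo "$1 appears truncated or corrupted; download it again" >&2
        return 1
    fi
}

# Usage with the file downloaded in the example:
# checked_gunzip GRCh38_latest_genomic.fna.gz
```

If the check fails, repeat the wget step before running makeblastdb.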