Using Nanopore sequence data to check specific genes

This page will guide you through the steps to check the sequence of a parasite line

The basic steps involve:

1. Basecalling your sequence data using Dorado

2. Demultiplexing your reads into a separate BAM file for each barcode

3. Mapping the reads from each barcode to your reference genome

4. Searching individual genes for coverage / SNPs

1. Basecalling using Dorado

Dorado should already be installed in your server account - please ask me to have a look if you have any issues running Dorado

i) Migrate to the folder containing data from MinKNOW

The folder will look similar to the image below, and the name of the folder will contain information on the date of the run and the flow cell used.

- Do not change the name of this folder. It will help us to trace back results to each Nanopore run.

[Screenshot: example MinKNOW run folder]

ii) Within this folder, create a folder for the new basecalling output

mkdir dorado_sup_basecall

mkdir = make a new directory

iii) Migrate into your new directory

cd dorado_sup_basecall

cd = change directory
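Before starting a long basecalling run, it can help to confirm that the raw data sits where the basecalling command expects it. The sketch below assumes the MinKNOW run folder contains a subfolder called pod5, one level above your new directory; count_pod5 is a hypothetical helper name, not part of Dorado.

```shell
# Hypothetical helper: count the .pod5 files in a folder.
# Assumes MinKNOW wrote the raw reads to a folder called "pod5".
count_pod5() {
    dir="$1"
    if [ ! -d "$dir" ]; then
        echo "Folder not found: $dir" >&2
        return 1
    fi
    # find lists every file ending in .pod5; wc -l counts them
    find "$dir" -type f -name '*.pod5' | wc -l | tr -d ' '
}

# Usage, from inside dorado_sup_basecall:
#   count_pod5 ../pod5
```

If the count is 0 or the folder is not found, check the folder name and your current location with ‘ls ..’ before running Dorado.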

iv) Use dorado to start basecalling

Note: Basecalling is slow and uses a lot of memory. It may fail to run if other large processes are running on the server at the same time.

dorado basecaller \
--min-qscore 10 \
--kit-name SQK-NBD114-96 \
sup ../pod5 > Filename_output.bam

Note:

Change --kit-name to match the kit you used, for example a native barcoding kit (NBD) or a rapid barcoding kit (RBK).

The above code uses the ‘sup’ (super-accurate) basecalling algorithm, which is the most accurate but also the slowest.

This creates a single file in BAM format that contains all of your sequence reads. Each read has a header line with information, including the barcode detected in that read.

We can use this file to split the data into one file per Nanopore barcode, so that each isolate ends up in its own file.

This process is known as DEMULTIPLEXING
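To get a feel for how many reads were assigned to each barcode before demultiplexing, the barcode tags can be tallied. This is a minimal sketch that assumes Dorado records the barcode in a BC:Z: tag on each read (look at one read with ‘samtools view Filename_output.bam | head -n 1’ if unsure); tally_barcodes is a hypothetical helper name.

```shell
# Hypothetical helper: tally reads per barcode from SAM text on stdin.
# Assumes the barcode is stored in a BC:Z: tag on each read.
tally_barcodes() {
    # keep only the BC:Z:... tag of each read, then count identical values
    grep -o 'BC:Z:[^[:space:]]*' | sort | uniq -c
}

# Usage (samtools view prints the reads as text):
#   samtools view Filename_output.bam | tally_barcodes
```

Reads without a barcode tag are not counted, so the totals may be lower than the number of reads in the file.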

3) Demultiplexing reads

i) Use dorado demux to demultiplex

dorado demux --output-dir ./classified_demux --no-classify ./

Within this folder (classified_demux), you will see individual bam files for each barcode.

Like below:

[Screenshot: BAM files for each barcode inside classified_demux]

You will see files for barcodes that you did not use. Do not worry: these files are likely empty or rubbish and can be ignored
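Unused barcodes usually show up as files with few or no reads. A minimal sketch for checking this, assuming samtools is available on the server (reads_per_bam is a hypothetical helper name):

```shell
# Hypothetical helper: print the read count for every BAM file in a folder.
reads_per_bam() {
    for bam in "$1"/*.bam; do
        [ -e "$bam" ] || continue          # skip if the folder has no BAM files
        # samtools view -c counts the reads in a file
        printf '%s\t%s\n' "$(basename "$bam")" "$(samtools view -c "$bam")"
    done
}

# Usage:
#   reads_per_bam ./classified_demux
```

The barcodes you actually used should stand out with much higher counts than the rest.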

ii) Format your demultiplexing output

First create a folder for the barcodes that you have used

mkdir barcodes_used

> Create a list within this folder of the names of the barcodes that you have used (barcodes_used.txt)

An example of this file is:

[Screenshot: example barcodes_used.txt]

We will use vim, a text editor, to create this file

vim ./barcodes_used/barcodes_used.txt

This will open a blank text document. Press ‘i’ to switch to insert mode, then type your text

List each barcode used in the correct format (matching the end of your filenames) on a new line - do not leave whitespace

To exit vim:

1. Press Esc

2. Type ‘:wq!’

3. Press ENTER
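If you prefer not to use vim, the same list can be written straight from the command line. This is a sketch only: barcode01 and barcode02 are placeholder names and must be replaced with names that match your own files.

```shell
# Make sure the folder exists (-p means no error if it is already there)
mkdir -p barcodes_used

# Write one barcode name per line; replace the placeholder names with your own
printf '%s\n' barcode01 barcode02 > barcodes_used/barcodes_used.txt

# Print the file to check it
cat barcodes_used/barcodes_used.txt
```

Note that ‘>’ overwrites any existing barcodes_used.txt, so run this only when you want to start the list from scratch.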

> Move your files into the barcodes used folder

cat ./barcodes_used/barcodes_used.txt | parallel -j 1 "mv ./PATH/TO/FILE/{}.bam ./barcodes_used"
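If GNU parallel is not available, a plain shell loop does the same job. A minimal sketch, where move_listed_bams is a hypothetical helper name and the three arguments are the list file, the folder holding the demultiplexed BAM files, and the destination folder:

```shell
# Hypothetical helper: move every BAM file named in a list into a folder.
move_listed_bams() {
    list="$1"; src="$2"; dest="$3"
    # read the list one line at a time (also handles a last line with no newline)
    while IFS= read -r name || [ -n "$name" ]; do
        [ -n "$name" ] || continue                  # skip blank lines
        if [ -f "$src/$name.bam" ]; then
            mv "$src/$name.bam" "$dest/"
        else
            echo "Not found: $src/$name.bam" >&2    # report typos in the list
        fi
    done < "$list"
}

# Usage:
#   move_listed_bams ./barcodes_used/barcodes_used.txt ./PATH/TO/FILE ./barcodes_used
```

Unlike the one-line version, this reports any name in the list that does not match a file, which helps to catch typos.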

5) Map your reads (bam files previously created) to a reference genome

I have copied reference genomes into your server account, within the folder ‘genomes’, see below:

[Screenshot: contents of the ‘genomes’ folder]

i) Create a mapping directory

You are currently in the ‘dorado_sup_basecall’ directory, which contains your barcode-sorted basecalled reads

It is a good idea to create a separate directory for the mapped reads

a) move up a directory

cd ../

b) create a new directory for mapped reads

mkdir mapping

c) migrate to the mapping directory

cd mapping

ii) Use Minimap2 to align your individual reads to the reference genome

Minimap2 is a sequence aligner that is recommended for aligning long reads created by Oxford Nanopore sequencing technologies.

Here is the reference for Minimap2, and here is a tutorial page.

Minimap2 should already be installed in your server account - please let me know if there are issues running Minimap2

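samtools fastq ~/PATH/TO/BARCODED/BAMs/classified_demux/XXX_barcode01.bam | minimap2 -ax map-ont ~/PATH/TO/GENOME/REFERENCE.fa - > barcode01_aligned.sam

Minimap2 reads FASTA or FASTQ files rather than BAM files, so samtools fastq first converts the reads to FASTQ and passes them on through the pipe (the ‘-’ tells Minimap2 to read from the pipe)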

You will have to do this individually for each barcode file, and remember to change the names of the input and output files accordingly
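To avoid retyping the command for every barcode, the alignment can be wrapped in a loop. This is a minimal sketch, not a tested pipeline: align_all is a hypothetical helper name, and each output file is named after its input file (for example XXX_barcode01_aligned.sam), which differs slightly from the naming used above.

```shell
# Hypothetical helper: align every demultiplexed BAM file in a folder to a reference.
align_all() {
    ref="$1"; bam_dir="$2"
    for bam in "$bam_dir"/*.bam; do
        [ -e "$bam" ] || continue
        sample=$(basename "$bam" .bam)     # file name without folder or .bam
        # samtools fastq writes the reads as FASTQ; "-" makes minimap2 read them from the pipe
        samtools fastq "$bam" | minimap2 -ax map-ont "$ref" - > "${sample}_aligned.sam"
    done
}

# Usage, from inside the mapping directory:
#   align_all ~/PATH/TO/GENOME/REFERENCE.fa ~/PATH/TO/BARCODED/BAMs/classified_demux
```

The SAM files are written to whichever directory you run the function from.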

iii) Format your individual aligned files

Alignment files are output as SAM files, which are large. It is best to convert these to BAM files and then DELETE YOUR SAM FILES

All of the following steps will utilise samtools, a useful package for working with SAM and BAM files after sequencing.

a) convert SAM file to BAM file

samtools view -Sb -o barcode01_aligned.bam barcode01_aligned.sam

-Sb indicates that the input is a SAM file and the output is a BAM file

Change the name of each file according to the sample you are working on

b) sort your BAM file

Sorting reorders the reads in your alignment based on their position when aligned to the reference genome

samtools sort -O bam -o barcode01_aligned.sorted.bam barcode01_aligned.bam

-O bam specifies the output as a BAM file

-o specifies the name of the output file

c) index your sorted BAM file

Indexing creates a ‘contents page’ of the aligned file, which is needed by many downstream applications, including Tablet

The index file created will be called FILENAME.bam.bai

samtools index barcode01_aligned.sorted.bam

Check for generation of the indexed file using ‘ls’ to list all files in the current directory
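The three steps above can also be chained for one sample. A minimal sketch, assuming a reasonably recent samtools, which can sort a SAM file directly and write the result as BAM, so the separate conversion step is not needed; sam_to_sorted_bam is a hypothetical helper name.

```shell
# Hypothetical helper: SAM -> sorted BAM -> index, then remove the large SAM file.
sam_to_sorted_bam() {
    sample="$1"      # e.g. barcode01_aligned (file name without .sam)
    # sort the alignments and write them as BAM; stop here if sorting fails
    samtools sort -O bam -o "$sample.sorted.bam" "$sample.sam" || return 1
    # build the .bai index next to the sorted BAM
    samtools index "$sample.sorted.bam" || return 1
    # only reached if both steps succeeded
    rm "$sample.sam"
}

# Usage:
#   sam_to_sorted_bam barcode01_aligned
```

The SAM file is deleted only after sorting and indexing have both succeeded, so a failed run does not cost you the alignment.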

6) Checking mapping statistics / QC

Check mapping statistics using samtools flagstat

samtools flagstat FILENAME.bam

This will output the number of reads and the percentage of reads mapping to the reference genome for a basic measure of how much P. knowlesi data you have
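For a single gene, the same sorted BAM file can be summarised over just that region. A minimal sketch, assuming you know the gene's coordinates on the reference: CHROM:START-END is a placeholder to fill in, and mean_depth is a hypothetical helper that averages the third column of samtools depth output (contig, position, depth).

```shell
# Hypothetical helper: average the depth column (column 3) of samtools depth output.
mean_depth() {
    awk '{ total += $3 }
         END { if (NR > 0) printf "%.1f\n", total / NR; else print "no positions" }'
}

# Usage (-a also reports positions with zero coverage, -r restricts to one region):
#   samtools depth -a -r CHROM:START-END barcode01_aligned.sorted.bam | mean_depth
```

A mean depth close to zero suggests the gene is poorly covered in that sample, so any SNPs seen there should be treated with caution.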

Genome coverage plots - to do after Easter!

7) Visualising your genome in Tablet

Install Tablet

Once Tablet is installed, you will need to copy your BAM file (FILENAME.bam) and its index file (FILENAME.bam.bai) over to the PC where Tablet is installed

First copy over the bam file…

scp USERNAME@10.18.0.25:FOLDER/FILE/PATH/TO/BAMFILE/FILENAME.bam ./

Then, copy over the index file:

scp USERNAME@10.18.0.25:FOLDER/FILE/PATH/TO/BAMFILE/FILENAME.bam.bai ./

Now you can open Tablet on your local PC and load the relevant BAM file and your reference genome - the reference genome is already downloaded onto your PC!

Below - For Amy to edit after Easter!!

8) Calling variants and looking for SNPs / INDELs

check the best variant caller for nanopore