To estimate gene abundances salmon (v.1.4.0) was applied in alignment-based mode as described in https://salmon.readthedocs.io/en/latest/salmon.html#quantifying-in-alignment-based-mode
For installation you can follow the explanations listed at https://combine-lab.github.io/salmon/getting_started/ (installs salmon in own conda environment)
input=microbepore/data/mapped/raw # input directory with all remapped filesfor file in${input}/*/*/*remapped.sorted.bam
do# folder and filenamesf_ex=${file##*/}foldername=$(echo${f_ex} | cut-d"_"-f 1,2,3)filename=${f_ex%%.*}# create dir for quantification using salmon in alignment-based mode (e.g. used in conda environment)mkdir microbepore/data/salmon
mkdir microbepore/data/salmon/${foldername}output=microbepore/data/salmon/${foldername}/${filename}mkdir${output}
conda activate salmon # activate conda environment# use salmon in alignment-based mode
salmon quant \-t${transcripts}\-l A \-a${file}\-o${output}\--threads 8
conda deactivate
done
Transcripts per million (TPM) were re-calculated using the salmon-computed effective transcript length, after dropping reads mapping to rRNAs, that are variable between non-depleted and depleted RNA sets (compare custom Rscripts/salmon_analysis.R).
Calculation of custom TPMs is performed in the modify_salmon_output function:
modify_salmon_output<-function(input,method){# read in salmon outputsuppressMessages(vroom(input,num_threads=8))%>%# sort by Number of reads mappingarrange(desc(NumReads))%>%# read in salmon output# rename columnsmutate(counts=NumReads,gene=Name,salmon_tpm=TPM)%>%# drop all rRNAsdplyr::filter(!gene%in%names_rRNA)%>%# select columnsselect(counts,gene,salmon_tpm,EffectiveLength)%>%# rename genesmutate(gene=str_split_fixed(gene,"-",2)[,2])%>%# sort by number of countsarrange(desc(counts))%>%# add gene information (including with to table)left_join(ecoli_gff,by="gene")%>%# make RNA type categoriesmutate(type_fine=ifelse(type=="rRNA",as.character(locus_name),as.character(type)),type_fine=ifelse(type=="tRNA","ncRNA",as.character(type_fine)))%>%# calculate RPK from salmon-calculated effectiveLength, TPM(hand) = RPK/sum(RPK)/1000000, additionally calc RPKM mutate(sample=method,rpk=(counts/EffectiveLength*1000),TPM_hand=rpk/(sum(rpk,na.rm=T)/1000000),rpkm=(counts/(sum(counts)/1000000))/width*1000)%>%# TPM_hand is the value used dplyr::select(gene,type_fine,counts,sample,rpkm,TPM_hand,salmon_tpm)}