On this page I showcase a selection of the software and tools I have developed over time.
Seq2Vec is a tool that converts long metagenomic reads (or contigs) into oligonucleotide frequency vectors. It is built to be fast and robust, and it writes to memory-mapped output files so that it can run in a highly parallel manner. Give it a go: https://github.com/anuradhawick/seq2vec.
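To make the idea of an oligonucleotide frequency vector concrete, here is a minimal Python sketch. It is an illustration only and not the Seq2Vec code: the function name `kmer_frequency_vector` and the default `k=3` are assumptions, and it does not merge reverse complements or use memory-mapped output.

```python
from itertools import product


def kmer_frequency_vector(seq, k=3):
    """Return normalised k-mer frequencies of `seq`.

    Entries follow the lexicographic order of all 4**k k-mers over ACGT.
    Hypothetical helper for illustration; not part of Seq2Vec.
    """
    kmers = ["".join(p) for p in product("ACGT", repeat=k)]
    index = {kmer: i for i, kmer in enumerate(kmers)}
    counts = [0] * len(kmers)

    seq = seq.upper()
    for i in range(len(seq) - k + 1):
        j = index.get(seq[i:i + k])  # None for k-mers with N or other symbols
        if j is not None:
            counts[j] += 1

    total = sum(counts)
    if total == 0:
        return [0.0] * len(kmers)
    return [c / total for c in counts]


if __name__ == "__main__":
    vec = kmer_frequency_vector("ACGTACGTTTGA", k=2)
    print(len(vec), vec[:4])
```

Each read becomes a fixed-length vector of size 4^k, so reads of very different lengths can be compared directly.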
seq2covvec does more or less the same job as Seq2Vec, but it generates coverage vectors for long reads (or contigs). It implements the k-mer coverage histogram described in MetaBCC-LR. Give it a go: https://github.com/anuradhawick/seq2covvec.
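The k-mer coverage histogram can be sketched in a few lines of Python. This is a simplified illustration, not the seq2covvec implementation: the helper names, the tiny `k`, and the `bin_width` and `n_bins` values are all assumptions chosen to keep the example readable.

```python
from collections import Counter


def count_kmers(reads, k):
    """Count every k-mer across the whole read set."""
    counts = Counter()
    for read in reads:
        for i in range(len(read) - k + 1):
            counts[read[i:i + k]] += 1
    return counts


def coverage_histogram(read, kmer_counts, k, bin_width=1, n_bins=3):
    """Histogram of the dataset-wide counts of the k-mers in one read.

    A k-mer seen c times falls in bin (c - 1) // bin_width; counts beyond
    the last bin are clamped into it. The result is normalised to sum to 1.
    """
    hist = [0] * n_bins
    for i in range(len(read) - k + 1):
        c = kmer_counts.get(read[i:i + k], 0)
        if c == 0:
            continue  # k-mer not present in the counted set
        hist[min((c - 1) // bin_width, n_bins - 1)] += 1

    total = sum(hist)
    if total == 0:
        return [0.0] * n_bins
    return [h / total for h in hist]


if __name__ == "__main__":
    reads = ["ACGTAC", "ACGTTT"]
    counts = count_kmers(reads, k=3)
    for r in reads:
        print(r, coverage_histogram(r, counts, k=3))
```

Reads from a highly abundant genome contain k-mers that occur many times in the dataset, so their histograms peak in the higher bins.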
Metagenomics and plasmid recovery
My PhD research focused on developing efficient algorithms and models for metagenomic binning and plasmid recovery. The following are the tools I developed during my studies.
- MetaBCC-LR – my first binning tool. It introduced the concept of the k-mer coverage histogram, and it uses t-SNE for dimensionality reduction and DBSCAN for clustering. Try it: https://github.com/metagentools/MetaBCC-LR.
- LRBinner – one disadvantage of MetaBCC-LR was that it had to use coverage and composition in two separate steps. LRBinner resolves this by using an auto-encoder to combine the two features. It also introduces a new clustering algorithm. Try it: https://github.com/anuradhawick/lrbinner.
- OBLR – this tool brought the read-overlap graph into long-read binning. UMAP is used to detect clusters and GraphSAGE performs the labelling. Probabilistic sampling addresses the issue of variable cluster sizes. It is implemented using the CUDA RAPIDS library. Try it: https://github.com/anuradhawick/oblr.
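As a rough illustration of why probabilistic sampling helps with variable cluster sizes, the Python sketch below draws points with probability inversely proportional to the size of their cluster. The inverse-size weighting and the name `balanced_sample` are assumptions made for this example; it shows the general idea only and is not the sampling code from the OBLR repository.

```python
import random
from collections import Counter


def balanced_sample(labels, n_samples, seed=0):
    """Sample point indices (with replacement) so small clusters are not drowned out.

    Each point gets weight 1 / (size of its cluster), so every cluster
    contributes roughly the same number of samples in expectation.
    """
    sizes = Counter(labels)
    weights = [1.0 / sizes[label] for label in labels]
    rng = random.Random(seed)
    return rng.choices(range(len(labels)), weights=weights, k=n_samples)


if __name__ == "__main__":
    # Hypothetical labels: one large cluster and one small cluster.
    labels = ["big"] * 90 + ["small"] * 10
    picked = balanced_sample(labels, n_samples=2000)
    print(Counter(labels[i] for i in picked))
```

With uniform sampling the small cluster would make up about 10% of the draws; with these weights both clusters are sampled about equally often.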
In computer vision, training an object detector is challenging when training data is limited. The Object Annotation Maker I created helps you bootstrap training images from a target image and a set of backgrounds. Try it: https://github.com/anuradhawick/Object-Annotation-Maker.
Serverless computing is very popular for web service development and rapid prototyping. I made this simple boilerplate to get you started, and I use it myself for https://vinyl.lk. Try it: https://github.com/anuradhawick/aws-lambda-serverless-boilerplate.
CUDA programming in C++ is challenging. Have a look at my simple k-mer counter, which I made for a presentation at work. Bits and pieces may be reusable. Give it a peek: https://github.com/anuradhawick/CUDA-k-mer-counting.
Also see https://github.com/anuradhawick/GPU-Optimized-Algorithms for how you can write CUDA code for some algorithms.
And see https://github.com/anuradhawick/GPU-Optimized-Biological-Data-Analysis for how I have implemented the Needleman-Wunsch (NW) algorithm, which is used in bioinformatics for global sequence alignment.
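For readers new to Needleman-Wunsch, here is a plain CPU reference sketch in Python. It is not the CUDA code from the repository above; the scoring values (`match=1`, `mismatch=-1`, `gap=-1`) and the linear gap penalty are assumptions picked for simplicity.

```python
def needleman_wunsch(a, b, match=1, mismatch=-1, gap=-1):
    """Global alignment of `a` and `b` with a linear gap penalty.

    Returns (score, aligned_a, aligned_b), where '-' marks a gap.
    """
    rows, cols = len(a) + 1, len(b) + 1

    # score[i][j] = best score for aligning a[:i] with b[:j]
    score = [[0] * cols for _ in range(rows)]
    for i in range(1, rows):
        score[i][0] = i * gap
    for j in range(1, cols):
        score[0][j] = j * gap

    def sub(i, j):
        return match if a[i - 1] == b[j - 1] else mismatch

    for i in range(1, rows):
        for j in range(1, cols):
            score[i][j] = max(
                score[i - 1][j - 1] + sub(i, j),  # align a[i-1] with b[j-1]
                score[i - 1][j] + gap,            # gap in b
                score[i][j - 1] + gap,            # gap in a
            )

    # Trace back from the bottom-right cell to recover one optimal alignment.
    i, j = len(a), len(b)
    out_a, out_b = [], []
    while i > 0 or j > 0:
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + sub(i, j):
            out_a.append(a[i - 1])
            out_b.append(b[j - 1])
            i, j = i - 1, j - 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            out_a.append(a[i - 1])
            out_b.append("-")
            i -= 1
        else:
            out_a.append("-")
            out_b.append(b[j - 1])
            j -= 1

    return score[-1][-1], "".join(reversed(out_a)), "".join(reversed(out_b))


if __name__ == "__main__":
    print(needleman_wunsch("ACGT", "AGT"))
```

Cells on the same anti-diagonal of the score matrix do not depend on each other, which is the property GPU implementations typically exploit to fill the matrix in parallel.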