My research interests lie at the intersection of Natural Language Processing and Computer Vision. In particular, I am interested in multi-modal learning, language grounding, and low-resource machine learning. Recently, I’ve also been exploring large-scale machine learning systems as part of my work at Chaldal!


Geospatial Language Model

OSM

Sept. 2023 ‑ Present 🟡

Working on a geospatial language model that improves the understanding of geo-entities in natural language. This in turn lets us build an address matching platform tuned specifically for house addresses, a task where platforms like Google Maps fail. The goal is to help delivery agents deliver orders faster and more efficiently.

Collaborators:

  • Tejas Viswanath (CTO, Chaldal Ltd.)
  • Asif Imitial (SWE, Chaldal Ltd.)

🏷️ Natural Language Processing LLMs Retrieval Augmented Generation


3D Particle Picking

Oct. 2022 ‑ Present 🟡 Manuscript in preparation

Worked on developing a semi-supervised framework for 3D particle picking from macromolecular samples that detects particles directly from subtomograms.

Collaborators:

  • Ajmain Yasar Ahmed Sahil (BUET)

Supervisors:

🏷️ Computer Vision Object Detection Image Processing


Low resource computer vision aided PD recognition

PULSAR

May 2022 ‑ Aug. 2023 🟢 Under review at MICCAI ’24

Developed PULSAR, an automated Parkinson's disease (PD) screening tool. It uses a spatio-temporal graph neural network to detect PD from videos. We were also the first to explore positive unlabeled learning in this setting, which addresses the scarcity of positive labels that is common in medical data. [Lead Investigator]
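To give a flavour of positive unlabeled learning, here is a minimal sketch of one widely used formulation, the non-negative PU risk estimator with a sigmoid loss. This is an illustration only: it is an assumption that this particular estimator resembles what PULSAR uses, and the function names, the scores, and the class prior value are all hypothetical.

```python
import math
from typing import Sequence


def sigmoid_loss(score: float, label: int) -> float:
    """Sigmoid loss 1 / (1 + exp(label * score)) for label in {+1, -1}."""
    return 1.0 / (1.0 + math.exp(label * score))


def mean_loss(scores: Sequence[float], label: int) -> float:
    """Average sigmoid loss when every score is treated as having `label`."""
    return sum(sigmoid_loss(s, label) for s in scores) / len(scores)


def nn_pu_risk(pos_scores: Sequence[float],
               unl_scores: Sequence[float],
               prior: float) -> float:
    """Non-negative PU risk (illustrative, not the PULSAR implementation).

    pos_scores: classifier outputs on labeled positive samples.
    unl_scores: classifier outputs on unlabeled samples.
    prior:      assumed fraction of positives in the unlabeled data.
    """
    risk_pos = prior * mean_loss(pos_scores, +1)
    # Negative-class risk is estimated from unlabeled data, corrected by
    # the positives' contribution, and clamped at zero to limit overfitting.
    risk_neg = mean_loss(unl_scores, -1) - prior * mean_loss(pos_scores, -1)
    return risk_pos + max(0.0, risk_neg)


# Hypothetical scores: an uninformative classifier outputs 0 everywhere.
baseline_risk = nn_pu_risk([0.0, 0.0], [0.0, 0.0, 0.0], prior=0.3)
# Hypothetical scores where the negative-risk term would go below zero.
clamped_risk = nn_pu_risk([10.0], [-10.0], prior=0.9)
```

The clamp at zero is the key design choice: without it, a flexible model can drive the estimated negative risk below zero, which usually signals overfitting to the small positive set.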

Collaborators:

Supervisors:

🏷️ Computer Vision AI for Healthcare 🔗 Project page


Quartet Fiduccia-Mattheyses Revisited for Larger Phylogenetic Studies

QFM-FI

Nov. 2022 ‑ April 2023 🟢 Accepted at Bioinformatics (Oxford University Press) Journal!

Our proposed method, QFM-FI, is a faster and improved version of the QFM algorithm that can amalgamate millions of quartets over thousands of taxa into a species tree with high accuracy. It achieves a speedup of 20,000x over its predecessor. I worked on the theoretical analysis of the running time and memory requirements of QFM-FI.

Collaborators:

  • Sharmin Akter Mim (BUET)

Supervisors:

🏷️ Bioinformatics Computational Biology


Other Research and Independent Exploration


Personalized Recommendation System

chaldal catalog page

June 2023 ‑ Present 🟡

Developing an in-house personalized recommender system from the ground up for the leading online grocery and delivery platform in Bangladesh (Chaldal.com), scaling it to serve more than one million users.

Collaborators:

  • Anand Bhaskar (Research Scientist, Meta)
  • Rohit Vaz (VP of Engineering, Chaldal Ltd.)
  • Asif Imitial (SWE, Chaldal Ltd.)

🏷️ Recommender Systems Reinforcement Learning


Bangla Plagiarism Detection


Jan. 2023 ‑ Mar. 2023 🟢

We created the first Bangla Plagiarism Dataset (available on Hugging Face 🤗) using a semi-supervised approach as part of our Machine Learning project. We also proposed two distinct approaches for detecting plagiarism: the first fine-tunes Bangla BERT, while the second uses sentence embeddings for multi-document plagiarism detection. The software will soon be available. A presentation on the project is available here.

Collaborators:

  • Ramisa Alam (BUET)

Supervisors:

🏷️ Natural Language Processing Plagiarism Detection 🔗 Project page


Occam’s Razor Strikes Again: Revisiting Short-Text Stream Clustering with Latest Sentence Embeddings

OPSEC

May 2021 ‑ May 2022 🟢

During the height of COVID-19, online classes were the new normal. The rise of MOOCs led to a very interesting research question: given a large number of questions already posed by students, how can we detect repetitive questions in real time and unify them into a single question for the instructor? Under the supervision of Prof. Subhra Kanti Karmakar, I worked on this problem of online short-text clustering. We developed a software tool called One Pass Sentence Embedding Clustering (OPSEC). This tool efficiently clusters short-text streams using a one-pass algorithm that computes similarity scores from sentence embeddings. It was used as an internal tool to cluster questions in online classes at Auburn University.
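The general one-pass idea can be sketched as follows. This is a simplified illustration, not the actual OPSEC implementation: it assumes embeddings arrive as plain lists of floats, uses cosine similarity against running cluster centroids, and the class name, the 0.8 threshold, and the toy 2-D vectors are all hypothetical.

```python
import math
from typing import List, Sequence


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two vectors; 0.0 if either has zero norm."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


class OnePassClusterer:
    """Assigns each incoming embedding to a cluster in a single pass."""

    def __init__(self, threshold: float = 0.8) -> None:
        self.threshold = threshold              # hypothetical similarity cutoff
        self.centroids: List[List[float]] = []  # running mean per cluster
        self.counts: List[int] = []             # number of members per cluster

    def add(self, embedding: Sequence[float]) -> int:
        """Return the cluster id for `embedding`, creating a cluster if needed."""
        best_id, best_sim = -1, -1.0
        for cid, centroid in enumerate(self.centroids):
            sim = cosine(embedding, centroid)
            if sim > best_sim:
                best_id, best_sim = cid, sim
        if best_id >= 0 and best_sim >= self.threshold:
            # Fold the new point into the running mean of the matched cluster.
            n = self.counts[best_id]
            self.centroids[best_id] = [
                (c * n + e) / (n + 1)
                for c, e in zip(self.centroids[best_id], embedding)
            ]
            self.counts[best_id] = n + 1
            return best_id
        # No cluster is similar enough: start a new one.
        self.centroids.append(list(embedding))
        self.counts.append(1)
        return len(self.centroids) - 1


# Toy stream of 2-D "embeddings" standing in for real sentence embeddings.
stream = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]
clusterer = OnePassClusterer(threshold=0.8)
labels = [clusterer.add(vec) for vec in stream]
print(labels)  # [0, 0, 1, 1]
```

Each item is seen exactly once and compared only against cluster centroids, so the cost per incoming question grows with the number of clusters rather than the number of questions seen so far.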

Collaborators:

  • Minh Smith (Auburn University)
  • Ramisa Alam (BUET)

Supervisors:

🏷️ Natural Language Processing Clustering 🔗 Project page


Bangla Sign Language Recognition

Sign Language

Oct. 2022 ‑ Feb. 2023 🟢

Sign language is an essential form of communication for large groups of people in society. Each sign is distinguished by variations in hand shape, motion profile, and the positioning of the hand, face, and other body parts, which makes visual sign language recognition a complex research area in computer vision. In this work, we present a new word-level Bangla Sign Language (BdSL) dataset consisting of 611 videos over 40 BdSL words, along with two approaches for classifying the BdSL40 dataset: a 3D Convolutional Neural Network model and a novel Graph Neural Network (GNN) based approach. To the best of our knowledge, this is the first study on word-level BdSL recognition. The dataset was transcribed from Indian Sign Language (ISL) using the Bangla Sign Language Dictionary (1997). The proposed GNN model achieved an F1 score of 89%. The study highlights the significant lexical and semantic similarity between BdSL, West Bengal Sign Language, and ISL, as well as the lack of word-level datasets for BdSL in the literature.

I supervised four junior-year undergraduates on this project!

Collaborators:

  • H.A.Z. Sameen Shahgir (BUET)
  • Khondker Salman Sayeed (BUET)
  • Md Toki Tahmid (BUET)
  • Tanjeem Azwad Zaman (BUET)

🏷️ Computer Vision Sign Language Recognition 🔗 Dataset