Hi! I'm Nayem

Computer Science Ph.D. Candidate
Research Assistant, ASPIRE research lab
Indiana University, Bloomington, IN, USA
Research focus on Machine Learning, Deep Learning, Speech/Audio Processing, NLP, DSP

About Me

Portrait of Nayem

I am a Ph.D. Candidate in the Luddy School of Informatics, Computing, and Engineering (Luddy SICE) at Indiana University, Bloomington. Additionally, I am completing my Ph.D. minor in Cognitive Science at Indiana University. I am working as a research assistant under the supervision of Professor Donald S. Williamson at the ASPIRE research lab.

My doctoral research focuses on speech/audio technology, particularly deep learning architectures that support hearing aids and voice-assistive technologies. In addition to my research, I am interested in a broad range of fields, including natural language processing (NLP), machine learning (ML), large language models (LLMs), and digital signal processing (DSP). My experience as an intern at Amazon (Alexa AI, Seller Partner Services), Microsoft Research (MSR), and BOSE has given me a strong foundation in these fields, and I have published peer-reviewed papers in the TASLP journal and at the ICASSP, INTERSPEECH, and MLSP conferences. I also worked on a novel machine learning problem, critical disease detection in a sensitive population of the community, with the Public Health Department of Indiana University under the Protected Health Information (PHI) project. During my undergraduate studies, I explored artificial intelligence (AI) and computer vision by developing a handwritten character recognition system for the Bangla/Bengali language.


Recent News

October 2023: Journal paper accepted for publication at IEEE/ACM TASLP 2024.
June 2023: Accepted paper at ISCA INTERSPEECH, 2023 for poster presentation.
May 2023: Started summer internship at Amazon Services LLC, Consumer SPIRIT, Seattle, WA.
April 2023: Reviewed papers for INTERSPEECH, 2023.
March 2023: Submitted a journal paper to IEEE TASLP 2023.
February 2023: Submitted a conference paper to INTERSPEECH, 2023.
December 2022: Completed fall internship at Amazon Services LLC, Alexa AI, Cambridge, MA.
August 2022: Completed summer internship at Microsoft Corporation, Microsoft Research, Redmond, WA.
April 2022: Successfully defended my Ph.D. Proposal.
August 2021: Presented paper virtually at ISCA INTERSPEECH, 2021. Also, worked as a remote student volunteer.
June 2021: Accepted paper at ISCA INTERSPEECH, 2021. Presented paper virtually at IEEE ICASSP, 2021.
February 2021: Accepted paper at IEEE ICASSP, 2021.
August 2020: Completed internship at BOSE Corporation, Boston, MA.
May 2020: Presented paper virtually at IEEE ICASSP, 2020.
January 2020: Paper accepted at IEEE ICASSP, 2020.
December 2019: Completed M.Sc. degree in CS from Indiana University.
October 2019: Presented paper at IEEE MLSP, 2019.
September 2019: Student volunteer at INTERSPEECH, 2019.
July 2019: Paper accepted at IEEE MLSP, 2019.
June 2019: Gave a research talk at Midwest Music and Audio Day (MMAD), 2019.

Publications

[TASLP 2024] Khandokar Md. Nayem and Donald S. Williamson. "Attention-based Speech Enhancement Using Human Quality Perception Modelling." IEEE/ACM Transactions on Audio, Speech, and Language Processing (TASLP), vol. 32, pp. 250-260, 2024. [paper]
[INTERSPEECH 2023] Khandokar Md. Nayem, Ran Xue, Ching-Yun Chang, and Akshaya Vishnu Kudlu Shanbhogue. "Knowledge Distillation on Joint Task End-to-End Speech Translation." ISCA INTERSPEECH, pp. 1493-1497, 2023. [paper, poster]
[arXiv 2023] Khandokar Md. Nayem and Donald S. Williamson. "Attention-based Speech Enhancement Using Human Quality Perception Modelling." arXiv, 2023. [paper]
[INTERSPEECH 2021] Khandokar Md. Nayem and Donald S. Williamson. "Incorporating Embedding Vectors from a Human Mean-Opinion Score Prediction Model for Monaural Speech Enhancement." ISCA INTERSPEECH, pp. 216-220, 2021. [paper, slides-3min, slides-15min, video-3min, video-15min]
[ICASSP 2021] Khandokar Md. Nayem and Donald S. Williamson. "Towards an ASR approach using Acoustic and Language Models for Speech Enhancement." IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp. 7123-7127, IEEE, 2021. [paper, poster, slides, video]
[ICASSP 2020] Khandokar Md. Nayem and Donald S. Williamson. "Monaural Speech Enhancement Using Intra-Spectral Recurrent Layers in the Magnitude and Phase Responses." IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp. 6224-6228, IEEE, 2020. [paper, slides, video]
[MLSP 2019] Khandokar Md. Nayem and Donald S. Williamson. "Incorporating intra-spectral dependencies with a recurrent output layer for improved speech enhancement." IEEE 29th International Workshop on Machine Learning for Signal Processing (MLSP), pp. 31-35, IEEE, 2019. [paper, poster]
[IU-VISION 2017] Shujon Naha, Khandokar Md. Nayem, and Md. Lisul Islam. "RSGAN: Recurrent Stacked Generative Adversarial Network for Conditional Video Generation." Poster presented at the IU computer vision project showcase, 2017. [paper, poster]
[BUET-Thesis 2014] Khandokar Md. Nayem, Mir Toornaw Islam, and Md. Monirul Islam. "Handwritten Writer Independent Bangla Character Recognition." Undergraduate thesis presented at BUET, 2014. [paper, poster]

Experience

Amazon, Seattle, WA
Applied Scientist II (L5) Intern, Seller Partner Services (SPS)
Summer 2023
Researched the application of large language models (LLMs) to class labeling over a closed taxonomy using product descriptions, while also generating chain-of-reasoning explanations to improve overall comprehension. (Python, PyTorch, SageMaker, LLMs: BART, GPT-2, GPT-J, FLAN, Falcon)
Amazon, Cambridge, MA
Applied Scientist II (L5) Intern, Alexa AI
Fall 2022
Researched the development of a real-time, end-to-end compressed multi-lingual speech translation system. Investigated the use of Large Language Models (LLMs) and applied a knowledge distillation approach to transfer their performance to smaller models with 50% and 75% fewer parameters. (Python, PyTorch, Transformer, Facebook-Fairseq) ~ Published at INTERSPEECH-2023
Microsoft Research, Redmond, WA
Audio & Acoustics Research Intern
Summer 2022
Focused on analyzing and improving the performance of speech enhancement algorithms to generate high-fidelity (Hi-Fi) speech by removing distortions and extending speech bandwidth. Applied causal LSTM models with various augmentation techniques to recover from distortions such as codec artifacts and clipping, and performed deep noise suppression. (Python, PyTorch, LSTM, Spotify-pedalboard)
Bose Corporation, Boston, MA
Machine Learning/Neural Signal Processing Intern
Summer 2020
Researched speech enhancement in remote microphone applications by removing self-speech, in order to provide better-quality sound with low latency to hearing aids and voice-assistive wearable devices. Utilized an LSTM-based architecture with speaker-dependent d-vectors for speaker identification to ensure real-time operation. (Python, TensorFlow, LSTM, d-vector)
Indiana University, Bloomington, IN
Research Assistant, ASPIRE research lab
Fall 2016 - Present
Proposed a joint-learning training scheme to enhance perceptual speech quality, leveraging a speech assessment model and a quantized language model to improve the performance of the enhancement model. Utilized conditional attention on crucial perceptual features extracted from subjective Mean-Opinion Score (MOS) ratings. (Python, PyTorch) ~ Published at TASLP-2024, preprint on arXiv
Developed an attention-based monaural speech enhancement model that aims to maximize the human perceptual rating of the enhanced speech by incorporating embedding vectors from a human Mean-Opinion Score (MOS) prediction model and jointly training the models on real-world noisy speech data. (Python, TensorFlow) ~ Published at INTERSPEECH-2021
Proposed and implemented a quantized speech prediction model that classifies speech spectra into a corresponding quantized class and applies a language-style model to ensure more realistic speech spectra. The acceptable quantization level was determined by a listener study designed in Qualtrics and run on Amazon MTurk. (Python, RNN, LSTM) ~ Published at ICASSP-2021
Designed a recurrent layer named the Intra-Spectral Recurrent (ISR) layer that captures spectral dependencies within the magnitude and phase responses of noisy speech using Markovian recurrent connections, and successfully deployed it in an LSTM-based single-channel speech enhancement model. (Python, Keras, RNN) ~ Published at ICASSP-2020
Formulated a new type of recurrent output layer that enforces spectral-level dependencies within each spectral time frame, modeling the Markovian assumption along the frequency axis in both uni-directional and bi-directional ways, and tested it in a magnitude speech enhancement model. (Python, Keras, RNN) ~ Published at MLSP-2019
Engineered a deep architecture named Recurrent Stacked Generative Adversarial Network (RSGAN), which generates video clips based on a pre-condition such as a sentence description, action classes, or fMRI signals. (Python, GAN) ~ Published at IU-VISION-2017
United International University (UIU), Dhaka, Bangladesh
Lecturer, Department of CSE
August 2016
Taught Computer Science courses, including C++ Programming, Algorithms, Digital Logic Design, and Pattern Recognition, to classes of more than 90 undergraduates.
Designed the curricula of multiple computer science courses according to accreditation standards and requirements.
Coached the participating team of 5 students for the regional round of the International Collegiate Programming Contest (ICPC).
REVE Systems, Dhaka, Bangladesh
Jr. Software Engineer, Team Media Gateway
January 2015
Programmed a media gateway controller to facilitate both calls and faxes between the telephone network and a VoIP network or another telephone network. (Megaco 1.0 protocol, Java)
Designed an easy-to-use front-end panel for VoIP administrators and customers. (JSP, JavaScript, Ajax, MySQL)

Education

Ph.D. in Computer Science
Indiana University, Bloomington, IN, USA
Minor in Cognitive Science
April 2024
Advisors: David J. Crandall, Donald S. Williamson
M.Sc. in Computer Science
Indiana University, Bloomington, IN, USA
December 2019
B.Sc. in Computer Science & Engineering (CSE)
Bangladesh University of Engineering & Technology (BUET), Dhaka, Bangladesh
July 2014

Additional Projects

Speech/Audio Processing
Researched noise cancellation techniques that filter a wide range of noises out of human speech using a noise-masking approach with deep neural network models. (MATLAB, Python, DNN, RNN, LSTM; code)
Improved speech quality and performance through the implementation of auxiliary information, such as phonemic structure or textual information, in speech enhancement and separation tasks. (Python, TensorFlow, Attention model; code)
Researched speech emotion detection systems that analyze speech to monitor human emotions, a valuable cue for tackling sensitive emotional situations and maintaining a healthy conversation. (CNN, RNN)
Implemented Recurrent Neural Network (RNN) and Convolutional Recurrent Neural Network (CRNN) models to automatically classify rhetorical questions with stress detection. Experiments were conducted on a personally collected dataset. (Python, Keras, NLTK; code)
Developed an end-to-end speech recognition system for the English language, utilizing a bi-directional recurrent neural network architecture without the need for frame-wise labeling. (Python, Keras, NLTK; code)
Utilized multiple machine learning algorithms such as SVM, Classification And Regression Trees (CART), and Random Forest to identify gender from voice and speech. Worked with a Kaggle competition dataset. (R)
Computer Vision, Image Processing
Developed a Recurrent Stacked Generative Adversarial Network (RSGAN) that generates video frames based on a pre-condition such as a sentence description, action classes, or fMRI signals. The model uses a fully convolutional LSTM network stacked in a StackGAN architecture. (Python, TensorFlow, GAN; code)
Developed deep neural models that generate surrounding parts of a color image. (PixelRNN, PixelCNN)
Developed a system for recognizing writer-independent handwritten full-size Bangla documents without using any external database. The system includes a new feature called Connecting Feature, which was investigated for its viability in real-life Bangla documents. (MATLAB)
Implemented various features such as eigenvectors, Haar-like, and bag of words to classify food images using SVM classifiers. Also used CNN features in a Deep Neural Network to achieve better accuracy. (C++, CImg, OverFeat packages)
Implemented image warping and matching using SIFT features to detect whether two images are a snapshot of the same object from different viewpoints. Also transformed the image to a general viewpoint (camera coordinate). (C++, CImg)
Created panoramic views from multiple images using image stitching with SIFT features. (C++, CImg)
Implemented a sliding window algorithm to detect cars in aerial snapshots of a parking lot. (C++)
Implemented digital watermarking in Fourier domain using FFT and detected whether an image was originally watermarked or falsified. (C++)
Developed a basic game engine that serves as a physics simulator. The engine allows for the creation of various 3D shapes with unique properties such as transparency, speed, and gravitational force, which follow Newton's Laws of Dynamics. (C++, OpenGL)
Natural Language Processing (NLP)
Implemented a Hidden Markov Model-based part-of-speech tagger for the English language using the Viterbi algorithm. (Python)
Developed a sentiment analysis system for a large Twitter post dataset, including gender detection. (Python)
Digital Health
Designed prediction models for nulliparous women to diagnose and prevent gestational diseases like diabetes, pre-eclampsia, and hypertension as part of the NSF Proactive Health Informatics (PHI) project. Also developed a smart system that tracks the daily physical activity of women, as collected by wearable devices, and helps to diagnose gestational complications. (Python, Random forest, DNN)
Web-based System
Designed and developed an Internal Guest Management System for the Ministry of Foreign Affairs, Bangladesh to manage guests in a corporate environment. This system allows different levels of admin users to control and monitor the entry of personnel and vehicles into the ministry premises, ensuring proper security measures. (HTML5, PHP, JavaScript, Ajax, MySQL)
Designed and developed an Automated Naval Transportation System for the Water Resource Board, Bangladesh. The system included an API for consumer ticket reservation and a validation system for naval transport. (HTML5, PHP, JavaScript, Oracle9i)
Robots & Hardware Projects
Created a Bluetooth-controlled robot with a built-in jaw for carrying objects and a mounted camera for remote control. The robot was built using an ATmega16 microcontroller, a servo motor, and a camera. (C++)
Developed a ball-following robot using the ePuck robot's infrared sensors and camera.
Designed and built a 4-bit microprocessor with a basic instruction set and an associated interpreter, featuring single-stage pipelining, a microprogrammed control unit, and separate data and instruction RAM.
Others
Developed a Petrel Software plug-in named FracDesigner that enhances the functionality of the commercial reservoir software by Schlumberger. The plug-in interprets seismic data, performs well correlation, builds reservoir models for simulation, submits and visualizes simulation results, calculates volumes, and designs development strategies to maximize reservoir exploitation. It also enables 2D and 3D seismic analysis of reservoirs and graphical representation of the data. This project was completed for the Schlumberger Ocean Competition 2013, and our team ranked 1st in Bangladesh and among the top 8 worldwide. (C#)
Developed and designed multiple eCommerce sites for trading daily products among regular clients and wholesalers. The sites offer seasonal discounts, wholesale discounts, and support payment systems like PayPal, Visa, and Mastercard. The admin can generate sales reports and update batch products in the back-end. (HTML5, PHP, JavaScript, OpenCart)
Designed and developed a secure peer-to-peer file sharing system that ensures privacy by implementing manual certification and RSA encryption (128-bit key) during file transfer over the network. (Java)
Designed and developed a modern online library management system with a user-friendly graphical interface. Implemented various features such as inventory management, online book access, and automated due calculation. (C++, iGraphics library)

Contact

Khandokar Md. Nayem
Luddy Hall
700 N Woodlawn Ave, #3061P
Bloomington, IN 47408
Email: knayem@iu.edu