About Me
A curious and passionate researcher with a hunger to make an impact, who takes pride in solving problems using data. With a research background in multidisciplinary computational and physical fields of science, my research interests include open, emerging technologies such as artificial intelligence, blockchain, and augmented reality, as well as their management and technological forecasting. My primary research focus in the AI domain is on multimodal networks and explainable AI.
Education
Centre for Research and Training in AI, University of Limerick, Limerick, Ireland
Member, Biocomputing and Developmental Systems Group, Lero, Research Ireland Centre for Software
University Grants Commission, NET (National Eligibility Test), India.
Lectureship Subject: Computer Science. Percentile: 98.01%
University of Delhi, Delhi, India.
Major: Informatics. CGPA: 7.39
Advisor: Dr. Sanjiv Singh. Conducted lectures on machine learning for M.Sc. students.
Christ University, Bangalore.
Major: Mathematics, Physics and Electronics. CGPA: 8.25
Conducted lectures on superconductors for B.Sc. students.
Teaching Experience
Teaching Assistant
Spring 2025
Software Development Project, University of Limerick, Ireland.
The course guided bachelor students through the end-to-end process of designing and developing a web application. Students gained hands-on experience with front-end technologies such as HTML, CSS, and Bootstrap, as well as back-end development using PHP and MySQL.
Teaching Assistant
Spring 2025
Applied Big Data & Visualisation, University of Limerick, Ireland.
The course was designed as a hands-on module to equip M.Sc. AI students with practical skills in managing and analysing large-scale data. While briefly covering the theoretical foundations of distributed data processing, the primary focus was on key technologies such as Hadoop and Apache Spark.
Teaching Assistant
Fall 2024
Evolutionary Algorithms & Humanoid Robotics, University of Limerick, Ireland.
The course introduced M.Sc. AI students to bio-inspired optimization methods, focusing on both theory and practical implementation. Core topics included genetic algorithms, evolution strategies, genetic programming, and grammatical evolution. Emphasis was placed on solution representation, selection mechanisms, and fitness evaluation.
Teaching team
Fall 2024
Business Information Systems, University of Limerick, Ireland.
The course was designed to provide bachelor students with a foundational understanding of how information systems support and enhance business operations. Lectures covered key topics such as data management, business process modelling, and the strategic use of information technology. Additionally, the course explored the role of IT in decision-making and introduced tools for analysing and optimizing business performance.
Teaching Assistant
Spring 2024
Explainable AI, University of Limerick, Ireland.
The course was designed to equip M.Sc. AI students with a thorough understanding of the explainable components of artificial intelligence. Lectures covered the distinctions between explainable and non-explainable models, as well as methodologies for interpreting non-explainable models.
Teaching team
Spring 2023
Data mining, University of Shanghai for Science and Technology, China.
The content of this course is spread across different stages of the data mining process such as problem definition, analytics design, data exploration, data preparation, modelling using both open-source AI platforms and Python programming, and visualization. The lectures will also provide a working foundation for real-world applications through case study discussions in the classroom.
Teaching team
Fall 2022
Python programming, University of Shanghai for Science and Technology, China.
This course is intended to lay the groundwork for real-world applications, which will be followed by an advanced Python programming course titled “Python Applications” in the final semester of the undergraduate program.
Teaching team
Fall 2021
Artificial Intelligence & Citizen Development for International Business, SKEMA Business School, France.
The course was organized to put theories from Data Management, AI, Natural Language Processing (NLP), and Digital Transformation (DT) into practice through low-code environments and citizen development concepts in the international business context. Taught the master’s course Artificial Intelligence for International Business during my appointment, receiving a teaching effectiveness evaluation of 4.2/5.
Publications
Singh, D. K., de Lima, A., Reyes Fernández de Bulnes, D., & Ryan, C. (2025). GraCo: Towards GRammar Assisted COunterfactuals. In Proceedings of the Genetic and Evolutionary Computation Conference Companion (GECCO ’25 Companion) [Accepted].
Blockchain technologies adoption in healthcare - Overcoming barriers amid the hype cycle to enhance patient care
Govindarajan, U. H., Singh, D. K., & Yadav, V. S. (2025). Blockchain technologies adoption in healthcare: Overcoming barriers amid the hype cycle to enhance patient care. Technological Forecasting and Social Change, 213, 124031.
An Analysis of Patent Grants between the years 2015-2023 in Conjunction with Academic Publications to Obtain Complementary Information and Build Educator Confidence in Web3 Integration
Govindarajan, U. H., Singh, D. K., Zhong, W. & Yadav, Sun-Lin, H. (2025). An Analysis of Patent Grants between the years 2015-2023 in Conjunction with Academic Publications to Obtain Complementary Information and Build Educator Confidence in Web3 Integration. Engineering Management Review (Accepted).
Forecasting cyber security threats landscape and associated technical trends in telehealth using bidirectional encoder representations from Transformers (BERT)
Govindarajan, U. H., Singh, D. K., & Gohel, H. A. (2023). Forecasting cyber security threats landscape and associated technical trends in telehealth using bidirectional encoder representations from Transformers (BERT). Computers & Security, 103404.
EvoDropX - Evolving Feature Dropout for Faithful Model Explanations
Submitted - The project deals with creating a faithful explainability method for identifying important features of the model.
Engagements
The IEEE International Conference on Computer Supported Cooperative Work in Design (IEEE CSCWD 2022), Hangzhou, China.
The special session Business Intelligence: The CSCW Perspectives, IEEE International Conference on Computer Supported Cooperative Work in Design (IEEE CSCWD 2022), Hangzhou, China.
International Conference on Artificial Intelligence, and its Applications (iCARTI, 2021), Mauritius.
Rural Revitalization & E-mobility
Mentored a team of 4 students from the National Rail & Transportation Institute, Vadodara to participate in the 8th China International College Students’ “Internet+” Innovation and Entrepreneurship Competition. The team secured the third position in the Shanghai zone.
Professional Experience
As a HIPAA-certified Data Scientist, I played a key role in orchestrating data science infrastructure, analyzing large and diverse datasets to extract business value, and driving process optimization. I actively researched and shared cutting-edge advancements in machine learning relevant to our use cases, ensuring the team stayed aligned with the latest industry standards.
Projects
-
Risk Adjustments and Analytics - Knowledge Integrated Neural Network (KINN) [AWS EMR, Kafka, Scala, Pyspark, Tensorflow Distributed, Scikit-learn]
Developed a fully scalable, distributed pipeline (Suspect Engine) for detecting missing medical conditions (ICD/HCC codes) from patients’ medical history (claim data). The architecture combines a machine learning method based on an autoencoder recommendation system with an expert knowledge approach based on a Clinical Rule Engine. Also designed and built an approach (Targeting Engine) for identifying patient-provider relationships and locating missing diagnostic codes.
-
Talix Coding Insight Improvement and Talix Taxonomy Visualization [AWS EC2, Tensorflow, Pytorch, Detectron 2, HuggingFace Transformers, Microsoft UniLM, Doctr, Docker, FastAPI]
Talix Coding Insight helps clinical coders find diagnostic codes in patient medical literature (charts). Improvements were made in the following directions:
- Reduced OCR costs by migrating from ABBYY FineReader to Doctr-based in-house OCR.
- Detection and segmentation of form type (Q&A versus OMR-based form) utilizing Mask R-CNN and layout parser.
- Q&A Form data extraction using LayoutLM V2.
- CascadeTabNet and Deepdoctection were used to detect tables and recognize table structures.
- Deployed Dockerized models with FastAPI endpoints.
- Medical charts to ICD-10 code generation using a multimodal network.
The Talix Taxonomy maps out over 1,000,000 health concepts and over 2,000,000 relationships and is constantly updated, expanded, and refined, making it the most comprehensive map of healthcare concepts in existence today. Designed and built a system that allows clinicians to visualize complex tabular taxonomy data in graph form. Added a dashboard that lets them traverse the graph and examine various qualities at the concept level.
-
Population Payment Management – Episode of Care Risk Score [AWS EMR, Pyspark, MLlib]
Developed a system to determine the Risk Coefficient associated with multiple types of episodes of care (pregnancy, cancer, etc.) using statistical models. The Risk Coefficient captures the demographic and clinical markers that affect the cost of care. An iterative GLM (Gaussian family with identity link) with Wald test-based feature dropping was used: features were retained only when their p-value was below 0.10.
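The iterative feature drop described above can be sketched as backward elimination. This is a minimal illustration under stated assumptions, not the production MLlib pipeline: a Gaussian GLM with identity link is fitted as ordinary least squares, the Wald p-values use a normal approximation, and the helper names (`invert`, `fit_gaussian_glm`, `backward_eliminate`) and the toy data are hypothetical.

```python
import math


def invert(matrix):
    """Invert a small square matrix with Gauss-Jordan elimination."""
    n = len(matrix)
    # Augment each row with the matching row of the identity matrix.
    aug = [list(map(float, row)) + [float(i == j) for j in range(n)]
           for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("singular design matrix")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        scale = aug[col][col]
        aug[col] = [v / scale for v in aug[col]]
        for r in range(n):
            if r != col:
                factor = aug[r][col]
                aug[r] = [a - factor * b for a, b in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def fit_gaussian_glm(columns, y):
    """Fit intercept + columns by least squares; return (coefs, p_values)."""
    n = len(y)
    design = [[1.0] + [col[i] for col in columns] for i in range(n)]
    p = len(design[0])
    xtx = [[sum(row[a] * row[b] for row in design) for b in range(p)]
           for a in range(p)]
    xty = [sum(row[a] * yi for row, yi in zip(design, y)) for a in range(p)]
    inv = invert(xtx)
    coefs = [sum(inv[a][b] * xty[b] for b in range(p)) for a in range(p)]
    rss = sum((yi - sum(c * v for c, v in zip(coefs, row))) ** 2
              for row, yi in zip(design, y))
    sigma2 = rss / (n - p)  # assumes n > p and a non-zero residual
    p_values = []
    for a in range(p):
        z = coefs[a] / math.sqrt(sigma2 * inv[a][a])  # Wald statistic
        p_values.append(math.erfc(abs(z) / math.sqrt(2)))  # two-sided
    return coefs, p_values


def backward_eliminate(features, y, alpha=0.10):
    """Refit, dropping the least significant feature, until all p < alpha."""
    kept = dict(features)
    while kept:
        names = list(kept)
        coefs, p_values = fit_gaussian_glm([kept[name] for name in names], y)
        # Index 0 is the intercept, which is always retained.
        worst = max(range(len(names)), key=lambda i: p_values[i + 1])
        if p_values[worst + 1] < alpha:
            return names, coefs
        del kept[names[worst]]
    return [], [sum(y) / len(y)]


# Toy data: y depends on x1 only; x2 is unrelated to the response.
x1 = [float(i) for i in range(8)]
x2 = [1.0, -1.0] * 4
noise = [0.5, -0.5, -0.5, 0.5, 0.5, -0.5, -0.5, 0.5]
y = [3.0 + 2.0 * a + e for a, e in zip(x1, noise)]
kept, coefs = backward_eliminate({"x1": x1, "x2": x2}, y)
```

In this toy run `x2` is dropped in the first iteration and the refit keeps only the intercept and `x1`. On real claim data the same loop would typically run on a distributed GLM implementation rather than on in-memory lists.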
-
Prior Authorization [AWS EC2, Hugging Face Transformers, Pythia, Falcon-40B, Llama 2 70B, LLMs]
Led the product development for fully automated Edifecs Prior Authorization, introducing a revolutionary approach integrating an AI-enabled solution with providers’ Electronic Health Records. The solution leveraged Large Language Model capabilities to transform policy requirements into a comprehensive questionnaire, seamlessly automating the matching of responses with Electronic Health Records. This innovative approach optimizes policy compliance within the broader healthcare framework.
-
Data Science Infrastructure Orchestration
- DataHub: DataHub was used for data discovery, observability, and federated governance. This allowed Edifecs’ Data Science team to easily examine the data sources and schemas available across Edifecs products and teams.
- MLflow: MLflow is an open-source platform for managing the ML lifecycle, including experimentation, reproducibility, deployment, and a central model registry. Deployed a Dockerized MLflow instance on EC2 so the data science team could track experiments and models.
- Automated Preliminary Exploratory Data Analysis: D-Tale, AutoViz, Sweetviz, and Pandas Profiling were used to automate the preliminary EDA. With the basic EDA ready in a few lines of code, data scientists were able to dive deeper.
- Explainable AI: Explainability is one of the main barriers to the practical adoption of AI, particularly in the healthcare domain, so we designed AI explainability guidelines for different types of models. In addition, a collection of explainability packages was assembled to help data scientists explain various kinds of ML models.
In this role, I focused on deeply understanding complex business problems and identifying opportunities for optimization through data-driven approaches. I conducted thorough research into current state-of-the-art techniques relevant to each use case, ensuring the solutions were grounded in the latest advancements. My work involved extensive dataset exploration, visualization, statistical analysis, and feature engineering to uncover actionable insights. I led the end-to-end process of model training, evaluation, and testing, continuously improving predictive performance through hyper-parameter tuning.
Projects
-
Inventory Management System for Commercial Refrigerators [Ubuntu, Tensorflow, OpenCV, NumPy, Pandas, Plotly]
Using monocular depth estimation together with object detection and segmentation, determined the number of bottles already inside a commercial refrigerator and how many more could be stored, with 98% accuracy (POC for Coca-Cola India). Data analysis and time series forecasting on the collected data were then used to forecast usage trend and seasonality.
-
Measuring Retail Store Traffic System [Ubuntu, Tensorflow, OpenCV, NumPy, Pandas, Intel Openvino]
Live tracking of store visitors with 71% Multiple Object Tracking Accuracy (MOTA) using a person re-identification algorithm (feature map extraction and pedestrian attribute recognition) with OSNet and DeepSORT. The system also analyses customer interactions with the store space, resulting in more effective and efficient store space management. The architecture is fully scalable and based on distributed computation.
-
Social Distancing and COVID-19 Management [Ubuntu, OpenCV, NumPy, Pandas, Intel Openvino]
Created a system to ensure that people are adhering to social distancing and wearing masks. In closed spaces, we used projective transform and camera calibration to determine the distance between people with a +/- 4 cm error margin. In addition, an analytic dashboard was created for the administration to determine specific areas and times of day that are more prone to violations.
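The distance check described above can be sketched as follows. This is a simplified illustration, not the deployed system: it assumes camera calibration has already produced a 3x3 homography `H` that maps image pixels to ground-plane coordinates in metres, and the matrix values, function names, and the 2 m threshold are hypothetical.

```python
import math
from itertools import combinations


def to_ground_plane(H, point):
    """Apply a 3x3 projective transform to an image point (x, y)."""
    x, y = point
    u = H[0][0] * x + H[0][1] * y + H[0][2]
    v = H[1][0] * x + H[1][1] * y + H[1][2]
    w = H[2][0] * x + H[2][1] * y + H[2][2]
    if w == 0:
        raise ValueError("point maps to infinity under this homography")
    # Divide by the homogeneous coordinate to get planar coordinates.
    return (u / w, v / w)


def ground_distance(H, p1, p2):
    """Distance between two image points after mapping to the ground plane."""
    return math.dist(to_ground_plane(H, p1), to_ground_plane(H, p2))


def find_violations(H, feet_points, min_distance_m=2.0):
    """Return index pairs of people standing closer than min_distance_m."""
    return [(i, j)
            for (i, a), (j, b) in combinations(enumerate(feet_points), 2)
            if ground_distance(H, a, b) < min_distance_m]


# Hypothetical calibration result and detected foot positions (pixels).
H = [[0.02, 0.0, 0.0],
     [0.0, 0.02, 0.0],
     [0.0, 0.001, 1.0]]
feet = [(100, 0), (150, 0), (400, 0)]
violations = find_violations(H, feet)  # pairs closer than 2 m
```

Foot positions are used rather than bounding-box centres because only points on the ground plane are mapped correctly by a ground-plane homography.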
-
Augmented Reality based installer Bot [Ubuntu, OpenCV, NumPy, Pandas, Tensorflow, PyOpenGL, OpenGL]
Designed a system that uses augmented reality to guide users through the installation of a network device such as a router. Object detection was used to determine which side of the device was facing the camera, and then OpenGL and OpenCV were used to project 3D objects.
-
Image Installation Auditor Tool [GCP, Opencv, NumPy, Pandas, Tensorflow]
Increased auditing efficiency by 95% by automating the process with a dashboard that classifies different hardware equipment from images (using image classification and image segmentation).
-
Call Center Transcripts Generation [GCP, Google UISRNN, Tensorflow, OpenSeq2seq, NLTK]
Used speech signal processing and speaker diarization to distinguish between the customer executive and the caller, and speech recognition to convert speech into text. Achieved 25% WER (Word Error Rate) on real-world noisy audio, beating Google’s Speech API accuracy by 5%.
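For reference, WER is the word-level edit distance between the reference transcript and the recognizer output, divided by the number of reference words. The sketch below is a minimal illustration with a hypothetical function name and example sentences; it splits on whitespace and does no text normalization, which a real evaluation would usually add.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """(substitutions + deletions + insertions) / number of reference words."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference transcript must not be empty")
    # Levenshtein distance over words, keeping only the previous DP row.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (ref_word != hyp_word)
            deletion = prev[j] + 1
            insertion = curr[j - 1] + 1
            curr[j] = min(substitution, deletion, insertion)
        prev = curr
    return prev[-1] / len(ref)


# One deleted word out of five reference words.
wer = word_error_rate("please reset my router now", "please reset router now")
```

Because insertions are counted, WER can exceed 100% when the hypothesis is much longer than the reference.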
-
Unsupervised Text Analytics [Ubuntu, NLTK, plotly, Sentence-Bert, Scikit-learn]
Devised an unsupervised approach, based on PCA and clustering, to find the different intents hidden inside sentences, questions, or comments. Raw sentences are encoded using Sentence-BERT (SBERT) and projected onto the top 100 eigenvectors using PCA, capturing 95% of the variance. Cluster analysis was then performed to find the hidden intent clusters, with the optimal number of clusters determined using WCSS and the elbow method. We received the Most Innovative Team award for this research.
-
Natural Language Understanding and Time series prediction [GCP, NLTK, Sentence-Bert, Keras, Scikit-learn]
Formulated a way to analyze interactions between customers and customer support, and to predict whether customers would continue with the service or opt out. The accuracy of the model was 92%. This helped in retaining customers who were at risk of opting out of the service.
In this internship, I began to understand the nuanced differences between real-world data and curated repository datasets, which are often cleaner and more structured.
Projects
-
Detection of Different Types of Arrhythmias [Ubuntu, Tensorflow, OpenCV]
Created an approach to automate the detection of cardiac arrhythmia from two-lead ECG data. My role was to train a TDNN-based model on pre-processed data to determine whether the QRS complexes were arrhythmic or not. We used filtering and SVM to determine the position of QRS complexes.
-
Detecting Number of Starch Structures on a Grayscale image [Ubuntu, OpenCV]
Used a blob detection methodology to automate the tedious task of detecting starch structures on a grayscale image. Not only did we detect the total number of starch structures, but we also determined the radius and position of each one.
Projects
-
Consumer Feedback Analysis [GCP, NLTK, Sentence-Bert, Keras, Scikit-learn]
Designed an application to analyze user feedback and determine technical areas that need more attention. Average classification accuracy was 93.8% across 6 classes.
Role
-
Work closely with backend, frontend, and product teams to develop analytics tools for user commute analytics.
-
Drive the development of analytical, data-driven tools that can drive better/faster decision making and/or more personalized interactions with our consumers.
-
Develop systems to analyze user behavior.
Technical Skills
Data Science skills
Exploratory Data Analysis, Time Series Analysis and Forecasting, Classification, Regression, Clustering and Statistical Analysis, Residual Analysis, SHAP Analysis, Computer Vision, NLP, Multimodal Tasks (Form Data Extraction), Recommendation Systems, Synthetic Data Generation, ETL Pipeline Development.
ML Packages
TensorFlow, PyTorch, Keras, Intel OpenVINO (Low Computation Deployment), Pandas, NumPy, SciPy, Scikit-learn, Matplotlib, NLTK, Seaborn, XGBoost, PyOpenGL, OpenCV, OpenSeq2Seq, PySpark, MLlib, Detectron 2, Transformers, FastAPI, TensorFlow Distributed (EMR).
Computation Skills
Data Structures and Algorithms, Architecture Design (HLD & LLD), Microsoft Visio, Networking, Cryptography, OS (Linux, Linux-BSD, CentOS), Database.
Programming Skills
Python, C, Java, Scala (Spark).
Data Ingestion
Kafka, S3, Redshift, MySQL, MongoDB, directly from source (cameras and sensors).
Development Environment
Anaconda, Jupyter Notebook, PyCharm IDE, Visual Studio Code, MobaXterm, IntelliJ (spark-submit EMR), Docker.
ML operations
MLflow, Datahub, AI Explainability packages.
External Courses
MITx - 6.00.1x
Introduction to Computer Science and Programming Using Python: John Guttag and Eric Grimson.
edX MicroMasters
Data Science Program: Sanjoy Dasgupta, Ilkay Altintas, Leo Porter, Alon Orlitsky, Yoav Freund.
- UCSanDiegoX - DSE200x : Python for Data Science.
- UCSanDiegoX - DSE210x : Statistics and Probability in Data Science using Python.
- UCSanDiegoX - DSE220x : Machine Learning Fundamentals.
- UCSanDiegoX - DSE230x : Big Data Analytics using Spark.
Deep Learning
Deeplearning.ai (All four Courses): Andrew Ng, Younes Bensouda, Kian Katanforoosh.
CS224N Stanford Online
Natural Language Processing with Deep Learning (Winter-2019): Christopher Manning, Abigail See.
CMU 10-715
Advanced Introduction to Machine Learning (Fall 2015): Barnabas Poczos, Alex Smola.
MIT Course 9.520
Statistical Learning Theory and Applications, (Fall 2015): Tomaso Poggio, Lorenzo Rosasco.
Microsoft DEV287x
Speech Recognition Systems: Adrian Leven.
Stanford CS224S/LINGUIST28
Spoken Language Processing by Andrew Maas.
More About Me
Alongside my interests in ML, DL, XAI, and Physics, some of my other interests and hobbies are:
- Playing Basketball
- Gaming (Apex Legends)
- Photography
- Flute
- Hiking
