Ajay Therala

Data Scientist & Data Engineer


Education

Master of Science in Computer Science

Arizona State University

August 2023 - May 2025

Bachelor of Technology in Information Technology

Jawaharlal Nehru Technological University

August 2017 - July 2021

Skills

Python
SQL
PySpark
Hadoop
Hive
JavaScript
Spark SQL
AWS
Generative AI
Deep Learning
Natural Language Processing
PyTorch
TensorFlow
OpenSearch
Scikit-learn
Git
Data Structures & Algorithms
Databases

Research Publications

Professional Experience

AI Full Stack Developer, Technology - Arizona State University

August 2024 - Present

  • Optimized Data Ingestion Pipeline - Spearheaded a reduction of over 95% in file chunking time for large datasets (from 1500 s to 4.87 s) by integrating an advanced chunking approach, improving data processing efficiency.
  • AWS Lambda Pipeline Deployment - Containerized the pipeline as a Docker image, pushed it to AWS ECR, and integrated it with AWS Lambda for serverless execution. Improved pipeline efficiency by adding multiprocessing to the chunking code to make full use of AWS Lambda's vCPUs.
  • Built a script that uses the Google Drive API to extract course content from Google Drive and index it into OpenSearch, enabling Retrieval-Augmented Generation (RAG) for fast, precise content discovery.
  • Expanded the file processing pipeline's supported file formats from 3 to 12 using the unstructured library, and reduced the ECR image size from 5.6 GB to 3.8 GB by refining the requirements file.

Systems Engineer (ML Developer), Digital Research & Innovation - Tata Consultancy Services Limited

August 2021 - August 2023

  • Engineered core components for TCS Cognitive Product Support, an intelligent domain-specific search engine, improving search accuracy by 30%.
  • Built Data Lens, a component for training custom named entity recognition (NER) models, achieving 80-90% accuracy by leveraging an ontology.
  • Applied Generative AI and prompt engineering to develop GPT-powered bots handling over 5,000 interactions daily, and delivered client demos that received strong positive feedback.
  • Leveraged AWS services to optimize data storage and automate document processing. Planned and developed a proof of concept (POC) for extracting key-value pairs from handwritten forms, streamlining data management.

Research Project Intern - Tata Consultancy Services Limited

January 2021 - August 2021

  • Investigated data refinement and balancing techniques while evaluating machine learning and deep learning algorithms on the NetML, CICIDS2017, and non-vpn2016 datasets, achieving a 6% improvement in detection accuracy with bagging and boosting algorithms.
  • Placed in the top 5 in the NetML Network Traffic Analytics Challenge 2020, surpassing the baseline metrics.
  • Presented research findings at ICAIIC 2022, sharing key insights with over 300 peers and industry professionals.

Projects

Text Similarity Model for Question-Answer Validation on Online Learning Platforms

Engineered an automated answer validation system for e-learning that achieves 87% accuracy by combining Siamese networks with large language models, a 6% improvement over the baseline.

Analyzing and Mitigating Hallucinations in Multimodal LLMs

Analyzed multimodal LLMs such as InstructBLIP on 15,000 test cases, identifying a 47% hallucination rate on count- and color-based questions, and curated 75,000 QA pairs with the Mistral LLM for further validation.

Contact Me

LinkedIn GitHub