Introduction

Hi there! I'm Aman Arya, a developer passionate about building reliable and scalable software systems.

My work ranges from deploying cloud-native web applications on AWS to designing data-intensive pipelines with Spark and Iceberg. I've engineered full-stack platforms that scaled to hundreds of customers.

I specialize in:
• Machine learning and data engineering
• Cloud architecture and AWS services
• Software engineering and distributed systems

Scroll down to learn more about me!

Curriculum Vitae


Education
Stony Brook University, NY
Master of Science in Data Science (Expected May 2025)
Relevant Coursework: Advanced Databases, Cloud Computing, Advanced Computational Algorithms, Machine Learning

Technical Skills
Languages: Python, Java, C++, JavaScript (ES6+), SQL, Shell
Frameworks: Spring Boot, React, Redux, Node.js, Express.js, Flask, FastAPI
Cloud & Tools: AWS (EC2, S3, Lambda), Docker, Kubernetes, CI/CD, OpenAI APIs, Apache Spark, Apache Iceberg
Databases: MongoDB, PostgreSQL, MySQL, Redis, PostGIS, Delta Lake

Recent Experience
The Research Foundation for SUNY — Software Engineering Research Assistant
Jan 2025 - Present
• Built and deployed a scalable full-stack flood risk analysis platform using Spring Boot and AWS
• Engineered an advanced RAG pipeline achieving 95% factual accuracy across validated queries
• Developed RESTful APIs and implemented CI/CD with GitHub Actions and comprehensive testing

Publications

Research Papers

1. Rangwal, G., Arya, A., Subramaniam, A., Singh, K.P., Liu, X. (2024). "Orbits and vertical height distribution of 4006 open clusters in the Galactic disk using Gaia DR3". Journal of Astrophysics and Astronomy.

This study combines astrophysics with data engineering to analyze Galactic structure. I developed a high-performance data pipeline using optimized spatial algorithms that reduced the cross-matching runtime between 227,000 stars and 7,000 clusters from 7 hours to 25 minutes. The implementation follows software engineering best practices, including modular design, robust validation, and workflow automation. The resulting visualization maps the spatial distribution of open clusters (colored dots) against the Milky Way's spiral arm structure, showing how efficient data processing enables large-scale astronomical analysis.

Distribution of open clusters in the Galactic disk
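
To illustrate the kind of spatial indexing that makes this sort of cross-matching fast, here is a minimal sketch in Python. It is not the pipeline from the paper: it assumes flat 2D coordinates and a fixed match radius, and the names `build_grid` and `cross_match` are hypothetical. Bucketing clusters into a uniform grid means each star is compared only against clusters in its own and neighboring cells instead of against every cluster.

```python
import math
from collections import defaultdict


def build_grid(points, cell):
    """Bucket (x, y) points into square cells of side `cell`."""
    grid = defaultdict(list)
    for idx, (x, y) in enumerate(points):
        grid[(math.floor(x / cell), math.floor(y / cell))].append(idx)
    return grid


def cross_match(stars, clusters, radius):
    """Map each star index to its nearest cluster index within `radius`.

    Assumption: planar Euclidean coordinates. Real sky coordinates would
    need an angular-separation metric instead.
    """
    grid = build_grid(clusters, radius)
    matches = {}
    for s_idx, (sx, sy) in enumerate(stars):
        cx, cy = math.floor(sx / radius), math.floor(sy / radius)
        best, best_d2 = None, radius * radius
        # With cell size == radius, any cluster within `radius` of the star
        # must lie in one of the 3x3 cells around the star's cell.
        for dx in (-1, 0, 1):
            for dy in (-1, 0, 1):
                for c_idx in grid.get((cx + dx, cy + dy), ()):
                    px, py = clusters[c_idx]
                    d2 = (sx - px) ** 2 + (sy - py) ** 2
                    if d2 <= best_d2:
                        best, best_d2 = c_idx, d2
        if best is not None:
            matches[s_idx] = best
    return matches


if __name__ == "__main__":
    stars = [(0.1, 0.1), (5.0, 5.0), (2.05, 1.95)]
    clusters = [(0.0, 0.0), (2.0, 2.0)]
    print(cross_match(stars, clusters, radius=0.5))
```

For roughly uniform data, the work per star depends on the local cluster density rather than on the total number of clusters, which is the general reason spatial indexing cuts runtimes so sharply.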

Projects

Sea Level Rise Forecast Webtool

Engineered a full-stack flood risk visualization platform using Flask, Google Earth Engine, and OpenAI's GPT-4. The application features an interactive flood risk map with real-time water level adjustments, a specialized AI chat assistant for CoastalDEM dataset queries, and tide predictions across major coastal stations. Built a responsive interface with Bootstrap 5 components, gradient backgrounds, and smooth animations that works consistently across devices.

Sea Level Rise Forecast Webtool Screenshot

Stock Price Prediction Analysis for Tech Companies

Market Regime Detection with HMM

Built a Hidden Markov Model (HMM) framework for S&P 500 market regime detection using Python and scikit-learn. Engineered an automated data pipeline with Apache Airflow for daily stock data processing and model retraining. Implemented the Forward-Backward, Viterbi, and Baum-Welch algorithms from scratch and achieved 85% accuracy in identifying market transitions.
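
As an illustration of the decoding step, here is a minimal Viterbi sketch in plain Python. It is not the project's code: it assumes a discrete-observation HMM, and the two-regime parameters in the usage example are made up for demonstration.

```python
import math


def _log(p):
    """Log that maps zero probability to -inf instead of raising."""
    return math.log(p) if p > 0 else float("-inf")


def viterbi(obs, start_p, trans_p, emit_p):
    """Return the most likely hidden-state path for `obs` (log space)."""
    n_states = len(start_p)
    # score[s]: best log-probability of any path ending in state s
    score = [_log(start_p[s]) + _log(emit_p[s][obs[0]]) for s in range(n_states)]
    back = []  # back[t][s]: best predecessor of state s at step t + 1
    for o in obs[1:]:
        prev, score, ptr = score, [], []
        for s in range(n_states):
            best = max(range(n_states), key=lambda r: prev[r] + _log(trans_p[r][s]))
            ptr.append(best)
            score.append(prev[best] + _log(trans_p[best][s]) + _log(emit_p[s][o]))
        back.append(ptr)
    # Trace back from the best final state.
    state = max(range(n_states), key=lambda s: score[s])
    path = [state]
    for ptr in reversed(back):
        state = ptr[state]
        path.append(state)
    return path[::-1]


if __name__ == "__main__":
    # Hypothetical regimes: 0 = calm, 1 = volatile.
    # Hypothetical observations: 0 = small daily move, 1 = large daily move.
    start_p = [0.8, 0.2]
    trans_p = [[0.9, 0.1], [0.2, 0.8]]
    emit_p = [[0.9, 0.1], [0.3, 0.7]]
    print(viterbi([0, 0, 1, 1, 1, 0], start_p, trans_p, emit_p))
```

Working in log space avoids numerical underflow on long observation sequences such as multi-year daily price series.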

simpleEnsemble: Advanced ML Package in R

Developed an R package implementing regularized regressions (Ridge, Lasso, Elastic Net) and tree-based methods (Random Forest, Bagging). Built a unified pipeline with automated feature screening, cross-validation, and model stacking. The package demonstrates expertise in statistical computing, algorithm implementation, and software development best practices.

Stock Price Predictions for Tech Companies
SVR/CVaR Regression Analysis Plot

Advanced Regression Methods Analysis

Compared Nu-Support Vector Regression (NuSVR) and Conditional Value at Risk (CVaR) regression. Implemented both algorithms from scratch and empirically verified their mathematical equivalence on multiple datasets. Developed a testing framework that measured 99% coefficient similarity across 1,000 train-test splits, providing insight into the theoretical link between the two methods.
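
For readers unfamiliar with CVaR, the sketch below shows the quantity itself, not the regression method or the project's code. It computes the empirical CVaR of a loss sample in two ways: as the mean of the worst tail, and through the Rockafellar-Uryasev minimization form. The function names are hypothetical, and the example assumes that (1 - alpha) times the sample size is a whole number, in which case the two values coincide.

```python
def cvar_tail_mean(losses, alpha):
    """Empirical CVaR: mean of the worst (1 - alpha) fraction of losses."""
    k = round((1 - alpha) * len(losses))
    if k < 1:
        raise ValueError("alpha leaves no tail samples")
    return sum(sorted(losses)[-k:]) / k


def cvar_rockafellar_uryasev(losses, alpha):
    """Empirical CVaR as min over c of c + mean(max(loss - c, 0)) / (1 - alpha)."""
    n = len(losses)

    def objective(c):
        return c + sum(max(x - c, 0.0) for x in losses) / ((1 - alpha) * n)

    # The objective is convex and piecewise linear with kinks at the samples,
    # so a minimum is attained at one of the sample values.
    return min(objective(c) for c in losses)


if __name__ == "__main__":
    losses = [float(v) for v in range(1, 11)]  # made-up loss sample
    print(cvar_tail_mean(losses, 0.8))
    print(cvar_rockafellar_uryasev(losses, 0.8))
```

The minimization form matters because it turns a tail statistic into a convex optimization problem, which is what allows CVaR to serve as a training objective for a regression model.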