Connor McManigal
Data Scientist | Machine Learning Engineer

As a passionate data scientist with a robust analytical foundation and diverse computational background, I thrive on innovation and continuous learning. I apply my problem-solving skills to uncover insights, develop data-driven solutions, and bridge the gap between technical insights and actionable business strategies.

Get to know me:

I have always been fascinated by the power of data: its ability to tell stories, uncover patterns, and drive meaningful decisions. Whether solving complex problems, extracting hidden insights, or making sense of vast amounts of information, I am passionate about transforming raw data into actionable knowledge. With a strong foundation in data science, I specialize in data manipulation, data visualization, statistical analysis, and machine learning. My hands-on experience spans a range of tools, including Python, R, SQL, Spark, BigQuery, and Google Cloud Platform (GCP), along with key libraries such as Pandas, Matplotlib, Scikit-learn, and PyTorch. These tools are my go-to for working with both structured and unstructured data, building predictive models, and developing data-driven strategies that inform decision-making.

Beyond data science, there's much more to know about me. I have over 16 years of experience in water polo, including four years as a Division I student-athlete at UC San Diego. This experience shaped my soft skills in communication, teamwork, time management, leadership, and discipline, all while fostering a strong work ethic and resilience. These qualities enable me to excel under pressure and in fast-paced environments. When I'm not immersed in data, I enjoy staying active and practicing my hunter-gatherer skills through freediving, spearfishing, lobstering, and fishing. There's something uniquely fulfilling about sustainably providing my own meals.

Education

2023-2024
University of California, Irvine
Master of Data Science
GPA: 4.0/4.0
2019-2023
University of California, San Diego
B.S. Cognitive Science with Specialization in Machine Learning and Neural Computation
GPA: 3.8/4.0

Experience

With five years of programming experience and over four years in machine learning and data science, I have interned at CoreLogic, the American Medical Association, and the UCSD Basement Innovation Sprints Program.

At CoreLogic, I gained hands-on experience with cloud computing using Google Cloud Platform, performed validation analysis for multiple resiliency models, derived resiliency and risk model statistics for Department of Insurance filings, and conducted research and development into automating tax file ingestion using prompt engineering and large language models.

At the American Medical Association, I conducted a comprehensive analysis of the AI landscape in healthcare, focusing on augmented AI applications, developed ethical frameworks for algorithm evaluation to ensure transparency and equity for patients and providers, and wrote a report on using machine learning to predict the likelihood of patient engagement with healthcare programs.

During my time at the UCSD Basement Innovation Sprints Program, I led a team to enhance a YOLOv4 computer vision algorithm for 24/7 surveillance of the elephant enclosure at the San Diego Zoo, collaborating closely with zoo stakeholders to address challenges such as nighttime operation and environmental noise.

Jun 2024 - Sep 2024
Data Science and Analytics Intern
CoreLogic
Jun 2022 - Sep 2022
Integrated Health Model Initiative Intern
American Medical Association
Feb 2022 - May 2022
Machine Learning Intern, Team Lead
UCSD Basement Innovation Sprints Program

Portfolio

UnVAEling Network Anomalies: Detecting Network Attacks with Variational Autoencoders

In this group project, we explored the performance of two variational autoencoder (VAE) approaches, a standard VAE and a mixed-loss VAE (MLVAE), for network traffic monitoring and security. VAEs are particularly effective in scenarios with sparse labels, making them well suited to detecting the rare occurrence of network attacks. Using PyTorch, we trained both VAE models on normal traffic data and validated their performance using the RT-IoT2022 dataset, which simulates communication between smart devices and includes nine types of attacks. For validation, we implemented one-left-out classification, using Bayesian optimization to define reconstruction loss thresholds that maximize the separation between normal and attack traffic. This approach enabled us to optimize model accuracy for normal traffic while minimizing errors for attack traffic. By implementing multinomial classification, we enhanced our ability to identify specific types of attack traffic, moving beyond traditional binary classification methods. After tuning the hyperparameters, we evaluated each model’s filtering accuracy and achieved a mean AUC of 0.7534 with MLVAE1 and 0.8092 with MLVAE2. Although these results are lower than those found in similar studies, they highlight MLVAE's potential for detecting stealthy or novel attacks, particularly in scan detection, by effectively leveraging reconstruction loss thresholds.
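
The thresholding idea can be illustrated with a minimal sketch. It is not the project's code: it assumes the reconstruction errors have already been computed by a trained model, it swaps the Bayesian optimization for a plain scan over candidate thresholds, and it uses balanced accuracy as a stand-in for "separation". The function names and the sample values are hypothetical.

```python
from typing import List, Tuple


def roc_auc(normal: List[float], attack: List[float]) -> float:
    """AUC as P(attack error > normal error); ties count as 0.5."""
    wins = 0.0
    for a in attack:
        for n in normal:
            if a > n:
                wins += 1.0
            elif a == n:
                wins += 0.5
    return wins / (len(attack) * len(normal))


def best_threshold(normal: List[float], attack: List[float]) -> Tuple[float, float]:
    """Scan observed errors as candidate thresholds.

    A sample is flagged as an attack when its reconstruction error
    exceeds the threshold. Returns (threshold, balanced accuracy).
    """
    best_t, best_score = 0.0, -1.0
    for t in sorted(set(normal + attack)):
        tnr = sum(e <= t for e in normal) / len(normal)  # normal traffic kept
        tpr = sum(e > t for e in attack) / len(attack)   # attacks flagged
        score = (tnr + tpr) / 2
        if score > best_score:
            best_t, best_score = t, score
    return best_t, best_score


# Hypothetical reconstruction errors, not results from the project.
normal_errors = [0.10, 0.12, 0.15, 0.20, 0.35]
attack_errors = [0.18, 0.40, 0.55, 0.60]

threshold, balanced_acc = best_threshold(normal_errors, attack_errors)
auc = roc_auc(normal_errors, attack_errors)
print(f"threshold={threshold:.2f} balanced_acc={balanced_acc:.3f} auc={auc:.3f}")
```

A scan like this only works for a single scalar threshold; a model-based search such as Bayesian optimization becomes more useful when several thresholds or loss weights are tuned jointly.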

Leveraging Sentiment Analysis and Data Augmentation to Recreate Recipe Scoring Algorithm

In this project, my team and I explored how sentiment analysis can be used to augment predictions and recreate the scoring algorithm for recipe reviews by analyzing a dataset of 18,000 reviews sourced from the UCI Machine Learning Repository. We employed the VADER and TextBlob libraries to derive polarity and subjectivity scores, which were used to enhance our dataset alongside original features like user reputation and response counts. We trained and compared the performance of Multilayer Perceptron (MLP) and Gradient Boosting Regressor (GBR) models, focusing on their capacity to capture complex relationships and non-linear patterns. For model training, we used an 80/20 data split and applied Scikit-learn's GridSearchCV for hyperparameter tuning. Our findings revealed that the GBR outperformed the MLP, achieving a Mean Absolute Error (MAE) of 21.446 compared to 22.672. While the study validated the predictive power of sentiment scores, it also highlighted limitations, such as the reliance on limited data and general-purpose sentiment analysis packages.
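
To make the evaluation setup concrete, here is a small standard-library sketch of an 80/20 split and the MAE metric used to compare the two models. It is an illustration only: the helper names, the seed, and the sample scores are hypothetical, and the project itself relied on Scikit-learn rather than hand-written helpers.

```python
import random
from typing import List, Sequence, Tuple


def train_test_split(rows: Sequence, test_fraction: float = 0.2,
                     seed: int = 0) -> Tuple[list, list]:
    """Shuffle a copy of the rows and hold out the last test_fraction."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    cut = int(round(len(shuffled) * (1 - test_fraction)))
    return shuffled[:cut], shuffled[cut:]


def mean_absolute_error(y_true: List[float], y_pred: List[float]) -> float:
    """Average absolute difference between true and predicted scores."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


# Hypothetical review scores and predictions from two models.
y_true = [100, 250, 40, 0]
pred_model_a = [90, 230, 55, 10]
pred_model_b = [120, 200, 40, 30]

mae_a = mean_absolute_error(y_true, pred_model_a)
mae_b = mean_absolute_error(y_true, pred_model_b)

train, test = train_test_split(range(10))
print(f"MAE A={mae_a:.2f}, MAE B={mae_b:.2f}, split={len(train)}/{len(test)}")
```

Because MAE is in the same units as the score being predicted, a lower value means the model's predictions are closer to the true scores on average.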

San Diego County 2021 Automobile Accident Analysis

For this project, my team and I analyzed automobile accidents in San Diego County in 2021, starting from a dataset that originally contained over 2.8 million records. We refined the dataset to focus on 23,915 accidents, extracting key variables such as date, month, season, and weather conditions. We used the tidyverse and ggplot2 packages for data manipulation and visualization. Our exploratory analysis revealed that December 14, 2021, had the highest number of accidents (379), with December totaling 4,055 accidents overall. Statistical analysis indicated a positive correlation between accident frequency and adverse weather, particularly in winter and fall, with foggy conditions linked to increased accident likelihood. Our findings emphasize the significant impact of seasonal and weather changes on road safety and are intended to inform strategies for reducing accidents in San Diego.
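
The per-day, per-month, and per-season counts described above amount to simple grouped tallies. The project did this in R with the tidyverse; the sketch below shows the same idea in Python's standard library. The sample dates are hypothetical, and the meteorological season boundaries are an assumption, since the project's own definition is not stated here.

```python
from collections import Counter
from datetime import date


def season_of(d: date) -> str:
    """Meteorological seasons (an assumption, not the project's definition)."""
    if d.month in (12, 1, 2):
        return "winter"
    if d.month in (3, 4, 5):
        return "spring"
    if d.month in (6, 7, 8):
        return "summer"
    return "fall"


# Hypothetical accident dates, one entry per accident.
accident_dates = [
    date(2021, 12, 14), date(2021, 12, 14), date(2021, 12, 14),
    date(2021, 12, 3), date(2021, 7, 4), date(2021, 7, 4),
    date(2021, 3, 9),
]

by_day = Counter(accident_dates)
by_month = Counter(d.month for d in accident_dates)
by_season = Counter(season_of(d) for d in accident_dates)

peak_day, peak_count = by_day.most_common(1)[0]
print(peak_day.isoformat(), peak_count, by_month[12], dict(by_season))
```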

Machine Learning for Diabetes Prediction: A Comparative Study of Binary Classification Techniques

This project assessed how well different machine learning algorithms predict diabetes in patients, addressing the complexities of accurate diagnosis, which is crucial for ensuring timely treatment. We used a dataset of 100,000 observations with features such as age, BMI, and HbA1c levels, applying binary classification algorithms, including logistic regression, decision trees, random forests, k-nearest neighbors, and support vector machines. Using grid search and random search for hyperparameter tuning, we evaluated model performance based on sensitivity, precision, specificity, and ROC-AUC. Ultimately, our decision tree model demonstrated the best balance of high weighted recall and low false-negative rates, scoring highest on our sixteen-point scale that considered key error metrics. We believe that, if deployed, this model would generalize well to new data, although further training on larger datasets and additional patient variables would enhance its effectiveness.
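
The metrics named above all derive from the four cells of a binary confusion matrix. The following sketch computes them from scratch; the function name and the labels are hypothetical, and the project's sixteen-point scoring scale is not reproduced here.

```python
from typing import Dict, List


def classification_metrics(y_true: List[int], y_pred: List[int]) -> Dict[str, float]:
    """Sensitivity, precision, specificity, and false-negative rate.

    Labels are 1 (diabetic) and 0 (non-diabetic). A ratio with a zero
    denominator is reported as 0.0.
    """
    tp = sum(t == 1 and p == 1 for t, p in zip(y_true, y_pred))
    tn = sum(t == 0 and p == 0 for t, p in zip(y_true, y_pred))
    fp = sum(t == 0 and p == 1 for t, p in zip(y_true, y_pred))
    fn = sum(t == 1 and p == 0 for t, p in zip(y_true, y_pred))

    def ratio(num: int, den: int) -> float:
        return num / den if den else 0.0

    return {
        "sensitivity": ratio(tp, tp + fn),  # recall on the positive class
        "precision": ratio(tp, tp + fp),
        "specificity": ratio(tn, tn + fp),
        "false_negative_rate": ratio(fn, tp + fn),
    }


# Hypothetical labels and predictions.
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 0, 0, 0, 0, 1, 0]

metrics = classification_metrics(y_true, y_pred)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

In a screening setting, the false-negative rate is often the metric to watch most closely, since a missed diagnosis delays treatment.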

Effect of NBA Injuries on Team Record Data Analysis (2010-2015 Seasons)

This project aimed to assess the relationship between player injuries and NBA team performance from 2010 to 2015, hypothesizing a negative correlation between the number of injured players and winning percentage due to reduced roster depth. We combined an injury dataset with the Historical NBA Performance dataset to match players with their teams, meticulously cleaning and merging the data using Pandas. Exploratory analysis was performed with Seaborn and Matplotlib to visualize distributions. Our OLS regression analysis via Statsmodels revealed a weak negative relationship between injuries and winning percentage, and a weak positive relationship for returning players. We also trained a Scikit-learn linear regressor to predict winning percentage from the numbers of total and returning injured players, yielding a root mean square error (RMSE) of 10.86. Finally, we developed a function to predict the 2016 winning percentage based on injury data, underscoring the need for more comprehensive injury data to improve accuracy.
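
For readers unfamiliar with the regression step, the sketch below fits a one-predictor ordinary least squares line in closed form and reports its RMSE. It is a simplification under stated assumptions: the project used Statsmodels and Scikit-learn with two predictors (total and returning injured players), while this example uses a single predictor and hypothetical numbers.

```python
import math
from typing import List, Tuple


def fit_simple_ols(x: List[float], y: List[float]) -> Tuple[float, float]:
    """Closed-form least squares for y = intercept + slope * x."""
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    s_xy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    s_xx = sum((xi - mean_x) ** 2 for xi in x)
    slope = s_xy / s_xx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def rmse(y_true: List[float], y_pred: List[float]) -> float:
    """Root mean square error between observed and predicted values."""
    mse = sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)
    return math.sqrt(mse)


# Hypothetical data: injured players per team and winning percentage.
injured_players = [2, 4, 6, 8]
win_pct = [60.0, 55.0, 45.0, 40.0]

slope, intercept = fit_simple_ols(injured_players, win_pct)
predicted = [intercept + slope * x for x in injured_players]
error = rmse(win_pct, predicted)
print(f"slope={slope:.2f} intercept={intercept:.2f} rmse={error:.3f}")
```

A negative slope here corresponds to the hypothesized relationship: more injured players, lower winning percentage. Note that this RMSE is computed on the training data, so it understates the error expected on unseen seasons.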

Contact Connor

Phone Number

(949)-630-5208

Work Email

mcmanigc@uci.edu

Personal Email

conmcmac@gmail.com
