Portfolio of Projects

Data Analytics Across Industries

Tableau Dashboard - Netflix Content Distribution

This interactive dashboard examines Netflix's worldwide content distribution and its growing preference for TV shows over movies. The project is built on the Kaggle dataset of Netflix movies and TV shows, with Python used for data cleaning and Tableau for visualization.

Tools: Python, Tableau

Techniques: Data Cleaning, Data Visualization

Outcome: The final Tableau dashboard offers an interactive way to explore Netflix's content strategy, showing how the platform's focus has shifted and reflecting broader viewing trends.
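To illustrate the kind of cleaning that typically precedes a dashboard like this, the sketch below counts titles by type and by the year they were added. It is not the project's actual code: the column names (`type`, `date_added`), the date format, and the sample rows are assumptions made for the example. It uses only the Python standard library so it runs without extra installs.

```python
import csv
import io
from collections import Counter
from datetime import datetime

# Hypothetical sample rows; column names and date format are assumptions.
SAMPLE_CSV = """title,type,date_added
Example Movie A,Movie,"January 5, 2019"
Example Show B,TV Show," March 12, 2020"
Example Show C,TV Show,
Example Movie D,Movie,"July 1, 2020"
"""


def count_titles_by_year_and_type(csv_text):
    """Return a Counter keyed by (year_added, type), skipping rows with no usable date."""
    counts = Counter()
    for row in csv.DictReader(io.StringIO(csv_text)):
        raw_date = (row.get("date_added") or "").strip()
        if not raw_date:
            continue  # missing date: drop the row rather than guess a year
        try:
            added = datetime.strptime(raw_date, "%B %d, %Y")
        except ValueError:
            continue  # unparseable date: also dropped
        counts[(added.year, row["type"].strip())] += 1
    return counts


counts = count_titles_by_year_and_type(SAMPLE_CSV)
for (year, kind), n in sorted(counts.items()):
    print(year, kind, n)
```

A table of counts per year and type in this shape can be exported to CSV and loaded into Tableau as a data source.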

Analysis of Diabetes Prevalence in Pima Indians

In this project, I conducted an exploratory data analysis to identify the prevalence and predictors of diabetes among Pima Indian women. I used Python and its libraries to clean, analyze, and visualize the data.

Tools: Python

Libraries: NumPy, pandas, seaborn, Matplotlib

Techniques: Exploratory Data Analysis, Statistical Modeling, Data Visualization

Outcome: My analysis revealed significant predictors of diabetes, providing valuable insights into its distribution within the Pima Indian community, which could inform future public health strategies.
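A minimal sketch of two basic exploratory steps, prevalence and a per-outcome comparison, is shown below. The records are made up for illustration and are not real data. The field names (`Glucose`, `BMI`, `Outcome`) and the convention that a zero reading stands for a missing value are assumptions based on the commonly distributed version of the dataset. The sketch uses the standard library rather than pandas so it runs anywhere.

```python
from statistics import mean

# Made-up illustrative records, NOT real data. Field names are an assumption.
records = [
    {"Glucose": 148, "BMI": 33.6, "Outcome": 1},
    {"Glucose": 85, "BMI": 26.6, "Outcome": 0},
    {"Glucose": 0, "BMI": 23.3, "Outcome": 1},  # 0 treated as a missing reading
    {"Glucose": 89, "BMI": 28.1, "Outcome": 0},
    {"Glucose": 137, "BMI": 43.1, "Outcome": 1},
]


def prevalence(rows):
    """Share of records with a positive outcome."""
    return sum(r["Outcome"] for r in rows) / len(rows)


def mean_by_outcome(rows, field):
    """Mean of `field` for each outcome, ignoring zeros treated as missing."""
    result = {}
    for outcome in (0, 1):
        values = [r[field] for r in rows if r["Outcome"] == outcome and r[field] != 0]
        result[outcome] = mean(values) if values else None
    return result


sample_prevalence = prevalence(records)
glucose_means = mean_by_outcome(records, "Glucose")
print(sample_prevalence)
print(glucose_means)
```

Comparing group means this way is only a first look; it does not by itself establish which variables are statistically significant predictors.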

Movie Recommendation System - Google PaLM 2 API

This team project introduces MovieConnect, a system that leverages Google's PaLM 2 API for creating a personalized movie recommendation engine. The team collaborated on algorithm development, system integration, and user interface design, focusing on enhancing the recommendation accuracy and user experience.

Tools: Google Generative AI (PaLM 2), Python, Streamlit

Libraries: pandas, scikit-learn, NumPy, Matplotlib, seaborn, IPython, ipywidgets

Techniques: Generative AI, Large Language Model (LLM) Integration, IPython for Interactive UI, Prompt Design and Testing, Web Application Development

Outcome: MovieConnect demonstrates the successful application of advanced AI in entertainment, significantly improving recommendation personalization and introducing a new standard for user interaction in digital content platforms.
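As a rough illustration of what prompt design for a recommendation task can look like, the sketch below assembles a prompt string from a user's liked titles and preferred genres. The function name, its parameters, and the prompt wording are all hypothetical and are not MovieConnect's actual prompts. No API call is made; the resulting string would be passed to whichever text-generation client is in use.

```python
def build_recommendation_prompt(liked_titles, preferred_genres, num_recommendations=5):
    """Assemble a plain-text prompt asking an LLM for movie recommendations.

    Hypothetical helper for illustration only.
    """
    if not liked_titles:
        raise ValueError("at least one liked title is needed to personalize the prompt")
    liked = "; ".join(liked_titles)
    genres = ", ".join(preferred_genres) if preferred_genres else "any genre"
    return (
        "You are a movie recommendation assistant.\n"
        f"The user enjoyed these movies: {liked}.\n"
        f"Preferred genres: {genres}.\n"
        f"Recommend {num_recommendations} movies the user has not listed, "
        "one per line, each with a one-sentence reason."
    )


prompt = build_recommendation_prompt(["Inception", "Arrival"], ["science fiction"], 3)
print(prompt)
```

Keeping the prompt in one function makes it easy to test variations of the wording against the same inputs.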

Used Car Market Analysis and Price Prediction

I developed a predictive model for forecasting used car prices, focusing on encoding categorical features and using regression techniques to predict 'price_log'. Using Python, I processed and analyzed the data, then built and validated regression models.

Tools: Python

Libraries: pandas, NumPy, Matplotlib, seaborn, scikit-learn

Techniques: Data Cleansing, Exploratory Data Analysis, Categorical Feature Encoding, Train-Test Split, Regression Modeling & Analysis (Linear Regression, Ridge/Lasso Regression), Model Performance Evaluation (R2 Score, RMSE)

Outcome: The project highlighted key factors influencing used car prices. Through various regression models, including Linear Regression, Ridge, and Random Forest, I identified crucial predictors such as Power and CarAge, which improved predictive accuracy for used car prices.

Boston House Price Prediction Project

In this project, I developed a machine learning model to predict Boston housing prices. I used regression analysis, employing Python's scikit-learn library for model training, testing, and validation on historical housing data.

Tools: Python

Libraries: NumPy, pandas, scikit-learn, Matplotlib, seaborn

Techniques: Linear Regression, Decision Trees, Random Forest, Model Evaluation Metrics (MSE, RMSE, MAE)

Outcome: The model effectively predicted house prices, demonstrating the importance of features such as the number of rooms and proximity to employment centers, and providing insight into housing market dynamics.
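The evaluation metrics named in the regression projects (MSE, RMSE, MAE, and R2) can be written out directly. The sketch below is illustrative only: the function name and the sample numbers are made up, and it uses plain Python rather than scikit-learn's metric functions so that each formula is visible.

```python
import math


def regression_metrics(y_true, y_pred):
    """Compute MSE, RMSE, MAE, and R2 for paired true and predicted values."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    n = len(y_true)
    errors = [t - p for t, p in zip(y_true, y_pred)]
    ss_res = sum(e * e for e in errors)  # residual sum of squares
    mean_true = sum(y_true) / n
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)  # total sum of squares
    mse = ss_res / n
    return {
        "mse": mse,
        "rmse": math.sqrt(mse),
        "mae": sum(abs(e) for e in errors) / n,
        "r2": 1 - ss_res / ss_tot,  # undefined if all true values are identical
    }


# Made-up values for illustration, not results from either project.
y_true = [3.0, 5.0, 7.0, 9.0]
y_pred = [2.0, 5.0, 8.0, 9.0]
metrics = regression_metrics(y_true, y_pred)
print(metrics)
```

When the target is log-transformed, as with 'price_log', error metrics computed this way are in log units unless the predictions are transformed back to prices first.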

Contact

ffazal@tepper.cmu.edu