Kio-med Pharma Sales Forecasting
Business Problem Statement - The retailer has provided historical sales data and wants to forecast sales one month ahead. The forecasts will be used to stock the warehouse in each city with the right supply of medicines for that month.
Summary - The dataset has approximately 22 million records covering 10 city categories and more than 3,380 medicine categories, ordered chronologically from 1st January 2015 to 30th June 2018. Sales have to be forecast for 30 days in July 2018, for each city and each relevant medicine listed in the test data.
Technical Solution
EDA - The data required datatype conversion, imputation of missing values, and one-hot encoding of the categorical variables. Heatmaps, scatter plots, and box plots were used to examine the correlation between variables, detect outliers, and inspect the distribution of the data. The visualizations were built in Tableau and matplotlib, and the insights were shared in plain business language.
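The preprocessing steps named above can be sketched roughly as follows. This is a minimal illustration on a toy frame, not the project's actual code: the column names (date, city, medicine, sales) and the choice of median and mode imputation are assumptions, since the real schema and imputation strategy are not given here.

```python
import pandas as pd

# Hypothetical toy data; the real column names and values are not known.
raw = pd.DataFrame({
    "date": ["2015-01-01", "2015-01-02", "2015-01-03", "2015-01-04"],
    "city": ["C1", "C2", "C1", None],
    "medicine": ["M1", "M1", "M2", "M2"],
    "sales": [10.0, None, 7.0, 12.0],
})


def preprocess(df: pd.DataFrame) -> pd.DataFrame:
    """Convert datatypes, impute missing values, one-hot encode categoricals."""
    out = df.copy()
    # Datatype conversion: parse the date strings into datetimes.
    out["date"] = pd.to_datetime(out["date"])
    # Imputation (assumed strategy): median for numeric, mode for categorical.
    out["sales"] = out["sales"].fillna(out["sales"].median())
    out["city"] = out["city"].fillna(out["city"].mode().iloc[0])
    # One-hot encode the categorical columns as 0/1 integer columns.
    return pd.get_dummies(out, columns=["city", "medicine"], dtype=int)


clean = preprocess(raw)
print(clean)
```

With thousands of medicine categories, one-hot encoding produces a very wide table, so integer label codes may be a more practical encoding for tree-based models.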
Model Building - This is a typical sales forecasting problem, which can be approached with either time-series forecasting or regression-based algorithms.
I took the route of building regression models. I started with linear regression and, because it performed poorly, moved to a decision tree regressor, followed by an elastic net, which combines L1 and L2 regularization.
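A baseline comparison of this kind might look like the sketch below. It uses synthetic data and scikit-learn defaults chosen for illustration; the features, the hyper-parameters (alpha, l1_ratio, max_depth), and the hold-out split are all assumptions rather than the settings used in the project.

```python
import numpy as np
from sklearn.linear_model import ElasticNet, LinearRegression
from sklearn.metrics import r2_score
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeRegressor

# Synthetic stand-in for the sales data (assumption: three numeric features).
rng = np.random.default_rng(0)
X = rng.uniform(0, 10, size=(400, 3))
y = 5 * np.sin(X[:, 0]) + X[:, 1] ** 2 + rng.normal(0, 0.5, size=400)

# Keep row order so the last 25% of rows act as the hold-out period.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, shuffle=False
)

# Illustrative hyper-parameters, not the tuned project values.
models = {
    "linear": LinearRegression(),
    "tree": DecisionTreeRegressor(max_depth=6, random_state=0),
    "elastic_net": ElasticNet(alpha=0.1, l1_ratio=0.5),
}

baseline_r2 = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    baseline_r2[name] = r2_score(y_test, model.predict(X_test))

print(baseline_r2)
```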
However, the predictions were still far from the actual values. All of these models suffered from high bias, that is, they underfit the data.
I then moved to an ensemble technique, gradient boosting, because boosting reduces bias by fitting each new tree to the errors of the previous ones. I tuned the hyper-parameters of the boosting algorithm iteratively, using the findings from each tuning round to guide the next.
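As a rough sketch of the boosting approach, the example below fits scikit-learn's GradientBoostingRegressor on synthetic daily sales and evaluates it on the last 30 days. The data, the feature set (integer city and medicine codes plus calendar features), and the hyper-parameter values are assumptions made for illustration; the library and settings actually used in the project are not stated here.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.metrics import r2_score

# Synthetic daily sales for 2 cities x 3 medicines (hypothetical data).
rng = np.random.default_rng(42)
dates = pd.date_range("2018-01-01", periods=120, freq="D")
rows = [(d, c, m) for d in dates for c in range(2) for m in range(3)]
df = pd.DataFrame(rows, columns=["date", "city_code", "medicine_code"])
df["sales"] = (
    50
    + 20 * df["city_code"]
    + 15 * df["medicine_code"]
    + 5 * (df["date"].dt.dayofweek >= 5)  # weekend uplift
    + rng.normal(0, 2, size=len(df))
)

# Calendar features derived from the date column.
df["dayofweek"] = df["date"].dt.dayofweek
df["day"] = df["date"].dt.day
features = ["city_code", "medicine_code", "dayofweek", "day"]

# Time-based split: the final 30 days form the hold-out month.
cutoff = dates[-30]
train = df[df["date"] < cutoff]
holdout = df[df["date"] >= cutoff]

# Illustrative hyper-parameters, not the tuned project values.
gbr = GradientBoostingRegressor(
    n_estimators=200, learning_rate=0.05, max_depth=3,
    subsample=0.8, random_state=0,
)
gbr.fit(train[features], train["sales"])

holdout_pred = gbr.predict(holdout[features])
holdout_r2 = r2_score(holdout["sales"], holdout_pred)
print(f"hold-out R2: {holdout_r2:.3f}")
```

Splitting by date rather than at random mirrors the forecasting task, where the model is trained on past sales and scored on a future month.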
I also tried neural networks and grid search for hyper-parameter tuning; however, these became computationally expensive.
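For reference, a grid search over boosting hyper-parameters can be sketched as below. The grid, the synthetic data, and the use of TimeSeriesSplit are assumptions for illustration. The cost grows with the number of parameter combinations multiplied by the number of folds, which is why this approach gets expensive on millions of rows.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.model_selection import GridSearchCV, TimeSeriesSplit

# Small synthetic dataset (assumption) so the search runs quickly.
rng = np.random.default_rng(1)
X = rng.uniform(0, 10, size=(300, 3))
y = 3 * X[:, 0] + np.sin(X[:, 1]) + rng.normal(0, 0.3, size=300)

# 2 x 2 grid with 3 folds = 12 model fits, plus one final refit.
param_grid = {"n_estimators": [50, 100], "max_depth": [2, 3]}

search = GridSearchCV(
    GradientBoostingRegressor(learning_rate=0.1, random_state=0),
    param_grid,
    scoring="neg_root_mean_squared_error",  # higher (closer to 0) is better
    cv=TimeSeriesSplit(n_splits=3),         # folds respect chronological order
)
search.fit(X, y)

best_params = search.best_params_
print(best_params, -search.best_score_)
```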
Although the dependent variable is approximately normally distributed, building the model on a sample of the data (relying on the central limit theorem) was a challenge, because each medicine and each city has its own effect on the sales value.
In the end, the gradient boosting algorithm gave the best R2 and RMSE values of the models tried.
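The two evaluation metrics can be computed as in the short sketch below. The function names and the sample numbers are illustrative only and are not results from the project.

```python
import numpy as np


def rmse(y_true, y_pred) -> float:
    """Root mean squared error: square root of the mean squared residual."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return float(np.sqrt(np.mean((y_true - y_pred) ** 2)))


def r2(y_true, y_pred) -> float:
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    ss_res = np.sum((y_true - y_pred) ** 2)
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)
    return float(1.0 - ss_res / ss_tot)


# Made-up example values, not project data.
actual = [3.0, 5.0, 7.0]
predicted = [2.0, 5.0, 9.0]
print(rmse(actual, predicted))  # sqrt(5/3), about 1.291
print(r2(actual, predicted))    # 1 - 5/8 = 0.375
```

A lower RMSE and an R2 closer to 1 both indicate a better fit; RMSE is expressed in the same units as the sales values, which makes it easier to explain to the business.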
Business Outcomes
The forecasts helped the client take appropriate measures and stock the warehouses well in advance. This saved considerable operational expense on logistics and showed which cities have demand for which kinds of medicines, as well as the areas the client can focus on to improve sales. The solution topped the hackathon.
Refer to the code here: code_sales forecasting.html