Dwell Time Prediction of Freight Containers Using the Naive Bayes Algorithm

Dwell time is the time a cargo container sits idle at the port. At the port, cargo containers are stacked in a storage area; because space is limited, containers are placed on top of one another. A rehandle happens when a container has to be moved because a container below it must be loaded onto the ship. The purpose of this project is to reduce the total number of rehandles by estimating the dwell time of each container and placing it in the yard accordingly.

The model was built for outbound (export) containers at JNPT port.

Methodology: There are two data sets:

  1. Train arrival and unloading data of containers from the Railways.
  2. Container Outgoing data for export from JNPT.

Using the above datasets, a Naive Bayes model was built to predict the dwell time of an incoming container. The independent variables include the size and weight of the container, its status (full or empty), its source of origin, etc. A minimal sketch of such a model in R follows.
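The sketch below uses the e1071 package; the column names, categories and dwell-time buckets are illustrative placeholders I made up, not the actual JNPT fields.

library(e1071)
library(data.table)

#######A toy Naive Bayes dwell-time classifier on made-up container data#######
set.seed(123)
containers <- data.table(size = factor(sample(c("20ft", "40ft"), 500, replace = TRUE)),
                         weight_class = factor(sample(c("light", "medium", "heavy"), 500, replace = TRUE)),
                         status = factor(sample(c("Full", "Empty"), 500, replace = TRUE)),
                         origin = factor(sample(c("ICD_A", "ICD_B", "ICD_C"), 500, replace = TRUE)),
                         dwell_bucket = factor(sample(c("short", "medium", "long"), 500, replace = TRUE)))
nb_model <- naiveBayes(dwell_bucket ~ size + weight_class + status + origin, data = containers)
predict(nb_model, containers[1:5])   # predicted dwell-time bucket for the first five containers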

Assumptions were made about the layout of the container storage space. A simulation was run based on the arrival and departure times of the containers in the month of June, and the dwell time of each container was predicted using the above model. First, the containers were stacked in the storage area in the order they arrived, without considering dwell time. Every time a container had to be taken out of storage, the number of containers above it was counted as rehandles, and in this way the total number of rehandles for the month was obtained. Then the containers were segregated based on predicted dwell time: containers with shorter predicted dwell times were put in a smaller stack and those with longer predicted dwell times in a larger stack, and the rehandles were counted in the same way as before. A toy version of this rehandle count is sketched below.
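In this sketch, the yard layout (fixed-height stacks filled in arrival order) is my own simplifying assumption, not the exact JNPT configuration.

#######A toy rehandle counter#######
count_rehandles <- function(arrival_order, departure_order, stack_height = 5){
  stack_of <- ceiling(seq_along(arrival_order) / stack_height)   # which stack each container is placed in
  names(stack_of) <- arrival_order
  yard <- split(arrival_order, stack_of)                         # bottom-to-top order within each stack
  total <- 0
  for (cont in departure_order){
    s <- as.character(stack_of[[cont]])
    pos <- match(cont, yard[[s]])
    total <- total + (length(yard[[s]]) - pos)                   # containers sitting above it count as rehandles
    yard[[s]] <- yard[[s]][-pos]
  }
  total
}
set.seed(1)
ids <- paste0("C", 1:10)
count_rehandles(ids, sample(ids))   # total rehandles for a random departure order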

The number of rehandles with the segregation strategy was 8% lower than with the unsegregated approach. With further improvements to the dwell time prediction model and a better segregation strategy, rehandles can be reduced further, lowering the cost of operations.

You can find the source code here.

Monte Carlo Tree Search

The tree search algorithm attempts to mimic the intuition of a portfolio manager when selecting the 'best' stress testing model for a portfolio of retail loans. The basis of selection is the trajectory the portfolio default probability would follow in a hypothetical stress situation, which managers evaluate using several heuristics. The algorithm attempts to capture these heuristics, significantly increasing the efficiency of the modelling process. The code can be found here: https://github.com/adu3110/Data-Science-Repository/tree/master/R
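As a flavour of the selection step, here is a toy UCB1 loop, the rule that typically drives node selection in MCTS. The candidate models and their noisy evaluation scores below are entirely made up, and the full algorithm explores a tree of modelling choices rather than a single level.

#######A toy UCB1 selection loop over hypothetical candidate models#######
set.seed(42)
candidates <- c("model_A", "model_B", "model_C")
true_quality <- c(0.55, 0.60, 0.50)   # hidden from the algorithm; used only to simulate noisy evaluations
n_sims <- rep(0, length(candidates))
rewards <- rep(0, length(candidates))
for (iter in 1:500){
  ucb <- ifelse(n_sims == 0, Inf,
                rewards / n_sims + sqrt(2 * log(iter) / n_sims))   # exploit average score + explore rarely tried models
  k <- which.max(ucb)
  r <- rnorm(1, mean = true_quality[k], sd = 0.1)                  # noisy 'rollout' score for the chosen model
  n_sims[k] <- n_sims[k] + 1
  rewards[k] <- rewards[k] + r
}
candidates[which.max(rewards / n_sims)]   # the model the search settles on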

Applying Dual Time Dynamics to Credit Risk Models

Dual-time Dynamics (DtD) attempts to model the loss behaviour of a portfolio of retail loans as a function of calendar time and portfolio maturation time, as well as the portfolio risk profile, with the impact of the macro-economy entering as an exogenous factor.

DtD decomposes vintage-level data into months-on-book (maturation), calendar date (exogenous) and vintage (quality) effects. It follows the same concept as the Age-Period-Cohort model commonly used in sociology, epidemiology and other population and demography studies.
In credit risk stress testing we want to model delinquency or default rates so that they can be forecast under different macroeconomic scenarios. In this context the calendar-time effect is our main interest, but it is confounded with vintage quality and loan age. More precisely, since loan age (a) equals the observation date (t) minus the origination date (v), i.e. a = t - v, the three effects cannot be uniquely identified without additional constraints, as the toy example below shows.
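A tiny, made-up example makes the identification problem visible: if age, calendar time and vintage all enter a model linearly, one of the three coefficients cannot be estimated.

#######A toy illustration of the a = t - v collinearity#######
set.seed(7)
toy <- expand.grid(t = 1:12, v = 1:6)   # observation quarter t, origination quarter v
toy <- toy[toy$t >= toy$v, ]
toy$a <- toy$t - toy$v                  # loan age
toy$y <- rnorm(nrow(toy))
coef(lm(y ~ a + t + v, data = toy))     # one coefficient is returned as NA: the three terms are perfectly collinear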

I worked on adapting the framework for application to stress testing models across a range of retail products. Following is the R code:

##############loading libraries#########################

library(data.table)
library(ggplot2)
library(gam)
library(plotly)
library(plot3D)

#######Number of Obligations and Time Points#######

Nobl <- 1000
T_final <- 200 #####time up to which we are generating the asset price series######

#######Generating Unemployment Data (Macro economic variable) for all 1000 customers#######################

unemployment_series <- c(0.08)
while (length(unemployment_series) < T_final){
  curr_unemp <- unemployment_series[length(unemployment_series)] + rnorm(1, 0, 0.01)
  if (curr_unemp > 0.04 & curr_unemp < 0.12){
    unemployment_series <- c(unemployment_series, curr_unemp)
  }
}
ggplot(data.table(arp = unemployment_series, pt = 1:T_final)) + geom_line(aes(pt, arp))

#############Generating Obligations Data##########

data_customer <- data.table(oblig_id = rep(paste("ID", seq(1, Nobl, 1), sep = "_"), T_final),
                            vintage = rep(1:T_final, each = Nobl),
                            unemployment = rep(unemployment_series, each = Nobl),
                            age = -1)

#############Generating Age###########

for (i in 1:20){
  curr_cust_list <- paste("ID", seq((50 * (i - 1) + 1), (50 * i)), sep = "_")
  data_customer[oblig_id %in% curr_cust_list, age := vintage - 1 - 9 * (i - 1)]
}
data_customer[age < 0, age := NA]
#######20 clusters of 50 customers: the first 50 are observed for all 200 quarters, and each subsequent cluster originates 9 quarters later#######
###########Generating Defaults For DtD Modeling (randomly allotting defaults)########

data_customer[, is_default := -1]
data_customer[is.na(age), is_default := NA]
#######default probability ramps linearly with age up to quarter 25, then stays flat at 10%#######
for (age_curr in unique(data_customer$age)){
  if (is.na(age_curr)){
    data_customer[age == age_curr, is_default := NA]
  } else if (age_curr <= 25){
    data_customer[age == age_curr, is_default := rbinom(nrow(data_customer[age == age_curr]),
                                                        1, (0.01 + 9 * age_curr * 0.0004))]
  } else {
    data_customer[age == age_curr, is_default := rbinom(nrow(data_customer[age == age_curr]),
                                                        1, 0.1)]
  }
}

View(data_customer[, mean(is_default),age])
ggplot(data_customer[, list(prob_def=mean(is_default)),age]) + geom_line(aes(age, prob_def))

#######GAM model for DtD########

gam_model <- gam(is_default ~ s(age, df = 3) + s(unemployment, df=3) + s(vintage, df=3),
family = binomial, data = data_customer, na.action=na.omit)

#########Generating Asset Prices##############

sigma_i <- runif(Nobl, 0.03, 0.05)
A_i <- sigma_i * 100000 #########asset_price####
B_i <- A_i - 1000 ##########default_threshold_price#####

############generating mu_i for each customer for each quarter########

GetReturnRate <- function(dt, group){
  dt$prob_def <- 1 / (1 + exp(-predict(gam_model, newdata = dt)))
  mu_i <- (log(B_i[group] / A_i[group]) + dt[age > 0]$age * (sigma_i[group]^2) / 2 -
             sigma_i[group] * sqrt(dt[age > 0]$age) * qnorm(dt[age > 0]$prob_def)) / dt[age > 0]$age
  dt$mu_i <- c(rep(NA, nrow(dt[is.na(age) | age == 0])), mu_i)
  return(dt)
}
data_customer <- data_customer[, GetReturnRate(.SD, .GRP), oblig_id]

########generating asset price using Merton's formula#####

GetAssetPrice <- function(dt, group){
  dt$asset_price <- exp(log(A_i[group]) + dt$mu_i * dt$age - dt$age * (sigma_i[group]^2) / 2 +
                          sigma_i[group] * sqrt(dt$age) * rnorm(nrow(dt), 0, 1))
  dt$asset_price[which(dt$age == 0)] <- A_i[group]   # price at origination is the starting asset value
  dt$default_threshold <- B_i[group]
  return(dt)
}

data_customer <- data_customer[, GetAssetPrice(.SD, .GRP), oblig_id]
data_customer[, default_or_not := as.numeric(asset_price < default_threshold)]
CreateRowData <- function(series){
  lst <- as.list(series)
  names(lst) <- paste0("T", 0:(length(series) - 1))
  return(lst)
}

data_cust_asset <- data_customer[, CreateRowData(asset_price), oblig_id]          # one row per obligor, columns T0, T1, ...
data_cust_def_or_not <- data_customer[, CreateRowData(default_or_not), oblig_id]

########Plotting#######

ViewPricePlot <- function(Nobl_no){
  curr_obgl <- paste("ID", Nobl_no, sep = "_")
  curr_dt <- data_customer[oblig_id == curr_obgl][!is.na(age)]
  p <- ggplot(curr_dt) + geom_point(aes(age, asset_price), size = 0.2) +
    geom_line(aes(age, asset_price)) +
    geom_hline(aes(yintercept = unique(curr_dt$default_threshold), color = "red")) +
    scale_color_identity(guide = "legend", labels = c("default threshold")) +
    ggtitle(paste("Asset Price for obligation", Nobl_no))
  return(p)
}

ViewPricePlot(600) #######view the price plot of the 600th customer#####

ggplot(data_customer[!is.na(age), list(prob_def = mean(asset_price < default_threshold)), age]) +
  geom_line(aes(age, prob_def))
def_rate <- data_customer[!is.na(age), list(prob_def = mean(asset_price < default_threshold)), .(vintage, age)]
Sys.setenv("plotly_username" = "aditi_tiwari")
Sys.setenv("plotly_api_key" = "Ow0zdx8YcWbyQsdeyJQs")
p <- plot_ly(def_rate, x = ~vintage, y = ~age, z = ~prob_def,
             type = "scatter3d", mode = "lines",
             opacity = 1, line = list(width = 6,
                                      color = ~vintage,
                                      reversescale = TRUE))
p
View(data_customer[oblig_id == "ID_100"])

The Journey began in 2002…

It all started at the age of 12, when I wrote my first code in Scheme with the help of my elder brother, and I soon fell in love with programming. So today I thought I would put down in words the journey of my programming life. Even as a kid, I realized that my capability to do anything in life was directly proportional to my coding skills. The more neat and succinct code I wrote, the cleaner and more organized I became. I kept working on my style of writing code and practiced a lot, and through that I developed a focused attitude towards any other task I performed. As with focus, unstructured time to explore an interest is a good way to foster curiosity, and after all, life is about continuous improvement and taking steady, tiny steps towards your goal.

From the very beginning of my education, I was a math lover and someone who enjoyed solving puzzles and logic problems. Programming was a very fruitful skill, though back then I never realized it, and I used it in my daily life. I learned my basic math-solving skills from the Trachtenberg Speed System of Basic Mathematics, and I remember wrapping it in a CLI to play with for fun; similarly, I made a CLI to store links to the books I was reading, just to keep them in proper order. I loved the shell back then. 😛

My interdisciplinary undergraduate course (Industrial & Production Engineering) gave me a wider spectrum of applications for programming. During my internship at an FMCG company, I built a UI for an inventory management project using HTML, JS, C++, and SQL; it was used to identify the number of plastic bottles and tin cans required for a particular area in different quarters. It was a distribution problem solved by this naive fix, and the work was appreciated.

My postgraduation at the Indian Institute of Science (IISc) helped me upskill my mathematical knowledge and gave me a different lens through which to look at problems. My favorite subject at IISc was Probability & Statistics, which forms the base of so many algorithms out there now. I firmly believe that it all starts from a random variable 'X'. This guy 'X' is something one needs to tame in order to understand an algorithm; by taming it, I mean learning the probability law it follows and the distribution it gives. And of course, with my love for coding alongside, I also learned machine learning and the big data environment (Hadoop).

With all this experience and continuous effort, I now stand in a space that is the intersection of mathematics and computational skills, somewhere I never realized I would be when I was a kid.

I believe that my profile and my skills as a person were shaped by that one skill which my elder brother sowed as a seed in me, nurtured deeply, and which will now nourish me forever.

Cheers

A 🙂