---
output:
  html_document: default
  pdf_document: default
---

```{r setup, echo = FALSE}
knitr::opts_chunk$set(error = TRUE)
```

# Deep Learning

We will demonstrate how to create deep neural networks with `keras`, which interfaces with `tensorflow`. This is a powerful tool for applications such as automatic postal code recognition, a key component of logistics systems that route deliveries accurately.

```{r chunk5}
library(keras)
reticulate::use_condaenv(condaenv = "r-tensorflow")
```

## A Multilayer Network on the MNIST Digit Data

The `keras` package includes several example datasets, among them the MNIST digit data. This dataset is a standard benchmark for digit classification, the task underlying logistics applications such as reading postal codes.

```{r chunk12}
mnist <- dataset_mnist()
x_train <- mnist$train$x
g_train <- mnist$train$y
x_test <- mnist$test$x
g_test <- mnist$test$y
dim(x_train)
dim(x_test)
```

There are 60,000 training images and 10,000 test images, each 28 x 28 pixels. The images are stored as a 3D array, so we reshape them into a matrix with one 784-element row per image. We also need to "one-hot" encode the class labels. Fortunately, `keras` provides convenient functions for both steps.

```{r chunk13}
x_train <- array_reshape(x_train, c(nrow(x_train), 784))
x_test <- array_reshape(x_test, c(nrow(x_test), 784))
y_train <- to_categorical(g_train, 10)
y_test <- to_categorical(g_test, 10)
```

Neural networks are sensitive to the scale of their inputs. The grayscale pixel intensities range from 0 to 255, so we rescale them to the [0, 1] interval.

```{r chunk14}
x_train <- x_train / 255
x_test <- x_test / 255
```

Now we are ready to define and fit our neural network model.
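First, though, it is worth seeing what the one-hot encoding step actually produces. Here is a minimal base-R sketch (no `keras` required) of what `to_categorical` computes for a small, made-up label vector:

```r
# One-hot encode a small label vector by hand: row (label + 1) of a
# 10 x 10 identity matrix encodes that digit (labels are 0-based, R is 1-based).
labels <- c(5, 0, 4)               # example digit labels
one_hot <- diag(10)[labels + 1, ]  # pick the matching identity-matrix rows
one_hot
rowSums(one_hot)                   # each row contains exactly one 1
```

Each observation becomes a length-10 vector that is zero everywhere except in the position of its class, which is exactly the format the cross-entropy loss below expects.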
```{r chunk15}
modelnn <- keras_model_sequential()
modelnn %>%
  layer_dense(input_shape = c(784), units = 256, activation = "relu") %>%
  layer_dropout(rate = 0.4) %>%
  layer_dense(units = 128, activation = "relu") %>%
  layer_dropout(rate = 0.3) %>%
  layer_dense(units = 10, activation = "softmax")
```

The first layer maps the 784 input units (the reshaped pixels) to a hidden layer of 256 units with ReLU activation. The dropout layers randomly zero a fraction of the units during training to reduce overfitting, and the final softmax layer produces a probability for each of the ten digit classes. This structure is typical for tasks like postal code digit recognition in delivery systems.

```{r chunk16}
summary(modelnn)
```

The model has 235,146 parameters; each dense layer contributes a weight matrix plus one bias per unit (784 x 256 + 256 = 200,960 for the first hidden layer, 256 x 128 + 128 = 32,896 for the second, and 128 x 10 + 10 = 1,290 for the output layer).

Next, we specify the fitting algorithm. We minimize the categorical cross-entropy loss, the standard objective for multiclass classification.

```{r chunk17}
modelnn %>% compile(loss = "categorical_crossentropy",
    optimizer = optimizer_rmsprop(),
    metrics = c("accuracy")
  )
```

We are now ready to train the model on the prepared data, wrapping the call in `system.time()` to record how long fitting takes.

```{r chunk18}
system.time(
  history <- modelnn %>%
    # fit(x_train, y_train, epochs = 30, batch_size = 128,
    fit(x_train, y_train, epochs = 15, batch_size = 128,
        validation_split = 0.2)
)
plot(history, smooth = FALSE)
```

With `validation_split = 0.2`, the model trains on 80% of the 60,000 training observations and reports its loss and accuracy on the held-out 20% after each epoch. Keeping training time reasonable matters when models like this are retrained regularly in production logistics systems.

We then evaluate the model on the test set to see how well it generalizes.

```{r chunk19}
accuracy <- function(pred, truth)
  mean(drop(as.numeric(pred)) == drop(truth))
modelnn %>% predict(x_test) %>% k_argmax() %>% accuracy(g_test)
```

In logistics, it is also important to inspect individual predictions, for instance to see how the model classifies particular postal code digits.
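The `k_argmax()` step above converts a matrix of class probabilities into predicted labels by taking, for each row, the index of its largest entry. A minimal base-R equivalent, using a small made-up probability matrix for illustration:

```r
# Turn a matrix of class probabilities into 0-based predicted digit labels.
probs <- rbind(c(0.1, 0.7, 0.2),  # largest entry in column 2 -> digit 1
               c(0.8, 0.1, 0.1))  # largest entry in column 1 -> digit 0
pred <- max.col(probs) - 1        # subtract 1: digits are labeled 0, 1, 2, ...
pred
```

The subtraction by one matters because MNIST labels start at 0 while R indexes columns from 1.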
```{r}
modelnn %>% predict(x_test) %>% head()
# Each row holds the ten class probabilities for one test image; for a
# confidently classified digit, nearly all of the probability mass falls
# in a single column.
prognoze <- modelnn %>% predict(x_test) %>% k_argmax() %>%
  drop() %>% as.numeric()
```

Comparing the first few predicted labels with the true labels gives a quick qualitative check. The test-set accuracy computed above remains the standard summary; the correlation below is only a crude indication of agreement between the two label vectors.

```{r}
head(prognoze)
head(g_test)
cor(prognoze, g_test)
```

In logistics, accurate digit classification (e.g., of postal codes) is crucial for correct package routing. As a simpler baseline, we can fit multiclass logistic regression, which in `keras` is just a network with no hidden layers: a single dense layer with softmax activation.

```{r chunk20}
modellr <- keras_model_sequential() %>%
  layer_dense(input_shape = 784, units = 10, activation = "softmax")
summary(modellr)
```

The fitting process is the same as before; only the architecture is simpler, with 784 x 10 weights plus 10 biases, i.e. 7,850 parameters.

```{r chunk21}
modellr %>% compile(loss = "categorical_crossentropy",
    optimizer = optimizer_rmsprop(), metrics = c("accuracy"))
modellr %>% fit(x_train, y_train, epochs = 30, batch_size = 128,
    validation_split = 0.2)
modellr %>% predict(x_test) %>% k_argmax() %>% accuracy(g_test)
```

This model is much faster to train but less accurate on MNIST than the deeper network. It can still be adequate for logistics applications where the recognition task is simpler but digit classification is still required.
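Both models end in a softmax activation, which turns a vector of raw scores into class probabilities. A minimal base-R sketch of the computation:

```r
# Softmax: exponentiate the scores and normalize so they sum to 1.
# Subtracting the maximum first is a standard trick for numerical stability;
# it does not change the result.
softmax <- function(z) {
  e <- exp(z - max(z))
  e / sum(e)
}

p <- softmax(c(2, 1, 0.1))
p       # probabilities, largest for the largest score
sum(p)  # 1
```

Because softmax is monotone in each score, taking the argmax of the probabilities gives the same predicted class as taking the argmax of the raw scores.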