The assignment should be submitted via Canvas. Submit a file called assignment5.zip, containing the following files:
- answers.pdf, for your answers to the written tasks, and for the output that the programming tasks ask you to include. Only PDF files will be accepted. All text should be typed, and any figures should be computer-generated. Scans of handwritten answers will NOT be accepted.
- All source code files needed to run your solutions for the programming tasks. Your Python code should run on the versions specified in the syllabus, unless permission is obtained via e-mail from the instructor or the teaching assistant. Also submit any other files that are needed in order to document or run your code (for example, additional source code files).
These naming conventions are mandatory; non-adherence to these specifications can incur a penalty of up to 20 points.
Your name and UTA ID number should appear on the top line of both answers.pdf and your source code files.
For all programming tasks, feel free to reuse, adapt, or modify any code that is posted on the slides or the class website.
Task 1 (40 points, programming)
File cifar2mnist_base.py contains incomplete code that performs transfer learning from the CIFAR10 dataset to the MNIST dataset and compares the result to training without transfer learning.
To complete that code, you must create a file called cifar2mnist_solution.py, where you implement the following Python functions:
- train_model(model, cifar_tr_inputs, cifar_tr_labels, batch_size, epochs)
- refined_model = load_and_refine(filename, training_inputs, training_labels, batch_size, epochs)
- test_acc = evaluate_my_model(model, test_inputs, test_labels)
Here is a description of what each function should do; a sketch of one possible implementation follows the list:
- The train_model function trains the model using the given training inputs and training labels (which come from the CIFAR10 dataset), using the specified batch size and number of epochs. For training, use Sparse Categorical Crossentropy as the loss function and Adam as the optimizer. You should let Keras use default values for any options that are not explicitly discussed in the task description. You need to decide what exactly is already done in cifar2mnist_base.py, and what this function needs to do so that the code works correctly.
- The load_and_refine function does the transfer learning as discussed in the slides and in class. It performs these steps:
- It loads a pre-trained model from filename.
- It creates a new model that contains all hidden layers (and already learned weights) of the pre-trained model and a new output layer with randomly initialized weights.
- It freezes the weights of the hidden layers of the new model.
- It trains the new model on the specified training_inputs and training_labels (which come from the MNIST dataset), using the specified batch size and number of epochs. For training, use Sparse Categorical Crossentropy as the loss function and Adam as the optimizer. You should let Keras use default values for any options that are not explicitly discussed in the task description.
IMPORTANT: Your function is responsible for doing whatever pre-processing needs to be done (and not already done in cifar2mnist_base.py) on the training inputs to make them work with this model.
- It returns the new model.
- The evaluate_my_model function computes the classification accuracy of the model on the given test inputs and test labels (which come from the MNIST dataset). Your function can call the Keras model.evaluate to do the main work, but (similar to load_and_refine) your function is responsible for any pre-processing that needs to be done on the test inputs before you call model.evaluate.
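To make these requirements concrete, here is a minimal sketch of the three functions, in the spirit of the code on the slides. It is one possible implementation, not the required one: the _mnist_to_cifar_shape helper is hypothetical, since the exact pre-processing depends on what cifar2mnist_base.py already does to the inputs, and the sketch assumes the pre-trained model is a Keras Sequential model whose last layer is its output layer.

import tensorflow as tf
from tensorflow import keras

def _mnist_to_cifar_shape(images):
    # HYPOTHETICAL pre-processing: convert 28x28 grayscale MNIST images to the
    # 32x32x3 shape that a CIFAR10-trained model expects. What is actually
    # needed depends on what cifar2mnist_base.py already does to the inputs.
    images = tf.convert_to_tensor(images, dtype=tf.float32)
    if images.ndim == 3:
        images = images[..., tf.newaxis]        # add a channel axis
    images = tf.image.resize(images, (32, 32))  # 28x28 -> 32x32
    return tf.repeat(images, 3, axis=-1)        # grayscale -> 3 channels

def train_model(model, cifar_tr_inputs, cifar_tr_labels, batch_size, epochs):
    # Saving the trained model, if needed, is assumed to happen elsewhere
    # (for example, in cifar2mnist_base.py).
    model.compile(optimizer="adam",
                  loss="sparse_categorical_crossentropy",
                  metrics=["accuracy"])
    model.fit(cifar_tr_inputs, cifar_tr_labels,
              batch_size=batch_size, epochs=epochs)

def load_and_refine(filename, training_inputs, training_labels, batch_size, epochs):
    pretrained = keras.models.load_model(filename)
    # Reuse every layer except the old output layer, freezing the transferred
    # weights; this assumes the pre-trained model is a Sequential model.
    new_model = keras.Sequential()
    for layer in pretrained.layers[:-1]:
        layer.trainable = False
        new_model.add(layer)
    new_model.add(keras.layers.Dense(10, activation="softmax"))  # new output layer
    new_model.compile(optimizer="adam",
                      loss="sparse_categorical_crossentropy",
                      metrics=["accuracy"])
    new_model.fit(_mnist_to_cifar_shape(training_inputs), training_labels,
                  batch_size=batch_size, epochs=epochs)
    return new_model

def evaluate_my_model(model, test_inputs, test_labels):
    # Assumes the model was compiled with an accuracy metric, so that
    # model.evaluate returns (loss, accuracy).
    _, test_acc = model.evaluate(_mnist_to_cifar_shape(test_inputs),
                                 test_labels, verbose=0)
    return test_acc

If cifar2mnist_base.py already reshapes the MNIST inputs, the helper (or parts of it) would be unnecessary; study the base code before deciding what pre-processing your functions must perform.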
You can see how these three functions are used in cifar2mnist_base.py, to verify that you understand what arguments they take and what they are supposed to do. When grading, we reserve the right to test your functions with different code (instead of cifar2mnist_base.py), so your functions should not refer to any global variables defined in cifar2mnist_base.py.
Program Output
The code in cifar2mnist_base.py prints all the output that is needed. Your functions do not need to produce any additional output; it is fine whether or not they do.
In your answers.pdf document, please provide ONLY THE LAST LINE (the line printing the classification accuracy) of the output produced by the test stage, when you run cifar2mnist_base.py with your solution.
Some Results
Here is some information to help you test your code. As a debugging tool, you can use the cifar10_e20_b128.keras model that my train_model function created (it took train_model about 18 minutes on my computer to train that model). Using that specific cifar10_e20_b128.keras model, I ran my load_and_refine and evaluate_my_model functions 10 times. The classification accuracy on the MNIST test set ranged from 93.09% to 93.57%.
Task 2 (40 points, programming)
File month_base.py contains incomplete code that trains and evaluates a neural network model (a fully-connected model) that predicts the month of the year that a specific moment belongs to, based on a time series of weather observations from the week before and the week after that moment. The Jena Climate dataset is used for training and testing the models. You can download the Jena Climate dataset as a CSV file from here: jena_climate_2009_2016.csv.
To complete that code, you must create a file called month_solution.py, where you implement the following Python functions:
- normalized_data = data_normalization(raw_data, train_start, train_end)
- (inputs, targets) = make_inputs_and_targets(data, months, size, sampling)
- history = build_and_train_dense(train_inputs, train_targets, val_inputs, val_targets, filename)
- test_acc = test_model(filename, test_inputs, test_targets)
- conf_matrix = confusion_matrix(filename, test_inputs, test_targets)
Here is a description of what each function should do; a sketch of one possible implementation follows the list:
- The data_normalization function normalizes the data so that the mean for each feature is 0 and the standard deviation for each feature is 1. The function takes as inputs:
- raw_data: the time series of weather observations that the code reads from the jena_climate_2009_2016.csv file.
- train_start, train_end: these specify, respectively, the start and end of the segment of raw_data that should be used for training.
The data_normalization function returns a time series, normalized_data, obtained by normalizing raw_data so that, for each feature, the mean value over the training segment is 0 and the standard deviation over the training segment is 1.
- The make_inputs_and_targets function creates a set of input objects and target values that can be used as a training, validation, or test set. The function takes as inputs:
- data: a time series which in our code is a segment (training, validation, or test segment) of the normalized_data time series.
- months: this is a time series of target values for data, so that months[i] is the correct month for the moment in time in which data[i] was recorded. Numbers are assigned to months following the usual convention (1 for January, 2 for February, etc.), except that 0 (and not 12) is assigned to December. This makes it easier later to train a model, as we will have class labels that start from 0.
- size: this specifies the size of the resulting set of inputs and targets. For example, if size == 10000, then the function extracts and returns 10,000 input vectors and 10,000 target values.
- sampling: this specifies how to sample the values in data. For example, if sampling == 6, then we sample one out of every six time steps from the data time series. This reduces the length of each input vector by a factor equal to sampling.
The make_inputs_and_targets function returns two values:
- inputs: This is a three-dimensional numpy array. The first dimension is equal to the argument size. inputs[i] is a 2D matrix containing two weeks of consecutive weather observations extracted from data, using the appropriate sampling rate. To determine the number of observations (number of time steps) that inputs[i] should have to cover two weeks of data, consider that the Jena Climate dataset records one observation every ten minutes (144 observations per day, so two weeks span 14 × 144 = 2016 raw time steps), and consider the sampling rate: with a sampling rate of 6, inputs[i] should contain only one observation per hour, which gives a total of 2016 / 6 = 336 observations. As a reminder, each observation is a 14-dimensional vector.
- targets: These are the target values for the inputs. targets[i] should be the month corresponding to the moment at which the mid-point of inputs[i] was recorded. For example, if the sampling rate is 6, inputs[i] contains 336 observations, and targets[i] should be the month corresponding to the moment when inputs[i][168] was recorded.
- The build_and_train_dense function trains a fully-connected model using the given training inputs and training labels. You should train for 10 epochs, use the Adam optimizer, and use the default batch size. Note that one of its arguments is a filename. Your function should save to that file the best model, according to classification accuracy on the validation set. The function returns the training history, which your function can obtain from its call to model.fit().
The first layers of the fully-connected model that you create should be as specified below:
model = keras.Sequential([keras.Input(shape=input_shape),
                          keras.layers.Flatten(),
                          keras.layers.Dense(64, activation="tanh"),
                          # from here on you decide what to do; there are multiple correct options
                          ])
- The test_model function computes the classification accuracy of the model saved in the file specified by filename, using the given test_inputs and test_targets. Your function can call the Keras model.evaluate to do the main work. The return value is the test accuracy, represented as a number between 0 and 1.
- The confusion_matrix function computes the confusion matrix of the model saved in the file specified by filename, using the given test_inputs and test_targets. The confusion matrix is a 12x12 matrix, where at row i and column j (with row and column indices starting at 0, Python-style) it stores the number of test objects that had true class label i and were classified by the model as having class label j.
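As with Task 1, here is a minimal sketch of the five functions, assuming that data, months, and the inputs are NumPy arrays. Two details are assumptions rather than requirements: make_inputs_and_targets picks window start positions at random, and build_and_train_dense adds a single 12-way softmax output layer after the required layers (one of the multiple correct options mentioned in the comment above).

import numpy as np
from tensorflow import keras

def data_normalization(raw_data, train_start, train_end):
    # Per-feature mean and std are computed on the training segment only,
    # then applied to the whole series. Whether train_end is inclusive is an
    # assumption here; month_base.py determines the exact convention.
    train_segment = raw_data[train_start:train_end]
    mean = train_segment.mean(axis=0)
    std = train_segment.std(axis=0)
    return (raw_data - mean) / std

def make_inputs_and_targets(data, months, size, sampling):
    # Two weeks = 14 days * 144 observations/day = 2016 raw time steps;
    # keeping every sampling-th step gives 2016 // sampling steps per input.
    raw_window = 14 * 144
    steps = raw_window // sampling
    inputs = np.zeros((size, steps, data.shape[1]))
    targets = np.zeros(size, dtype=int)
    rng = np.random.default_rng()
    for i in range(size):
        # ASSUMPTION: window start positions are chosen uniformly at random.
        start = rng.integers(0, len(data) - raw_window)
        inputs[i] = data[start : start + raw_window : sampling]
        mid = start + (steps // 2) * sampling   # raw index of the mid-point sample
        targets[i] = months[mid]
    return inputs, targets

def build_and_train_dense(train_inputs, train_targets,
                          val_inputs, val_targets, filename):
    model = keras.Sequential([keras.Input(shape=train_inputs.shape[1:]),
                              keras.layers.Flatten(),
                              keras.layers.Dense(64, activation="tanh"),
                              # ASSUMPTION: one of multiple correct choices
                              keras.layers.Dense(12, activation="softmax")])
    model.compile(optimizer="adam",
                  loss="sparse_categorical_crossentropy",
                  metrics=["accuracy"])
    # The checkpoint callback saves the model with the best validation
    # accuracy to the given filename.
    checkpoint = keras.callbacks.ModelCheckpoint(filename, monitor="val_accuracy",
                                                 save_best_only=True)
    history = model.fit(train_inputs, train_targets, epochs=10,
                        validation_data=(val_inputs, val_targets),
                        callbacks=[checkpoint])
    return history

def test_model(filename, test_inputs, test_targets):
    # Assumes the saved model was compiled with an accuracy metric.
    model = keras.models.load_model(filename)
    _, test_acc = model.evaluate(test_inputs, test_targets, verbose=0)
    return test_acc

def confusion_matrix(filename, test_inputs, test_targets):
    model = keras.models.load_model(filename)
    predicted = np.argmax(model.predict(test_inputs, verbose=0), axis=1)
    matrix = np.zeros((12, 12), dtype=int)
    for true_label, pred in zip(test_targets, predicted):
        matrix[int(true_label), pred] += 1
    return matrix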
You can see how these functions are used in month_base.py, to verify that you understand what arguments they take and what they are supposed to do. When grading, we reserve the right to test your functions with different code (instead of month_base.py), so your functions should not assume the existence of any global variables defined in month_base.py.
Program Output
The code in month_base.py prints all the output that is needed. Your functions do not need to produce any additional output; it is fine whether or not they do.
From the output that you get when you run month_base.py with your solution, please put in your answers.pdf document ONLY THE TWO LINES printing the classification accuracy on the validation and test sets for the dense model.
Some Results
Here is some information to help you test your code.
- A complete output of month_base.py using my solution is saved in the file month_output10.txt.
- I ran month_base.py using my solution 10 times. I wrote a small MATLAB script, script1.m, to summarize my results. This is the summary of the results:
dense val acc: min = 36.5%, max = 41.3%, mean = 38.8%, median = 38.5%
dense test acc: min = 40.4%, max = 46.7%, mean = 44.0%, median = 44.5%
Task 2b (Extra Credit, maximum 10 points)
In the previous task, the instructions for build_and_train_dense specify the initial layers that must be used. At the same time, the instructions leave it up to each implementation to decide what layers to add after those initial layers. 10 points will be given to the implementation of build_and_train_dense that gives the best test accuracy. Our experiments will use the same setup (amount of training data, length of time series, sampling rate) as in month_base.py.
To qualify for consideration, you need to run your solution 10 times, and report a summary of results in answers.pdf. Your result summary should follow the exact same format in which I summarize my results in the section titled "Some Results" at the end of Task 2.
Task 3 (10 points)
This question refers to the time series slides. Slide 52 shows results from the naive method. Slides 65-66 show results from a fully connected network. Slide 71 shows results from a recurrent network. As we see, for the fully connected network and the recurrent network, the best training error was significantly lower than the best validation error. On the other hand, for the naive method, the validation error was lower than the training error.
Part a: For the naive method, if we applied it on different datasets (with features similar to the dataset that we used, but collected from different locations), would you expect that in most experiments the training errors would be lower than validation errors? Describe how you would expect training and validation errors to compare in each experiment, and justify your answer.
Part b: Answer the same question as in part a, but for the fully connected network instead of the naive method.
Part c: Answer the same question as in part a, but for the recurrent network instead of the naive method.
Task 4 (10 points)
This question refers to the time series slides. Slide 60 defines a fully connected network where the output layer uses the identity activation function. What would go wrong if we used the tanh activation function for the output layer in this case? Justify your answer.