Friday, July 1, 2016

Lesson 1 Summary - in words

The pieces are starting to come together.

From beginning to end:

Take a dataset, e.g. images organized into folders (the folder name eventually becomes each image's "label")

For each folder, convert it into a 3D array (image index, x, y), with x and y being the pixel dimensions of each image, and normalize each image's pixel values
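
Something like this minimal sketch - assuming 28x28 grayscale PNGs, Pillow for reading, and the usual center-on-zero normalization (load_folder is just a placeholder name):

# Load every image in one class folder into a (num_images, height, width)
# array. Pixel values are centered on zero and scaled to roughly
# -0.5..0.5 via (pixel - depth/2) / depth.
import os
import numpy as np
from PIL import Image

IMAGE_SIZE = 28        # pixel width/height of each image (assumption)
PIXEL_DEPTH = 255.0    # max value of an 8-bit pixel

def load_folder(folder):
    files = [f for f in os.listdir(folder) if f.endswith('.png')]
    dataset = np.ndarray((len(files), IMAGE_SIZE, IMAGE_SIZE), dtype=np.float32)
    for i, name in enumerate(files):
        img = np.array(Image.open(os.path.join(folder, name)))
        dataset[i] = (img - PIXEL_DEPTH / 2) / PIXEL_DEPTH
    return dataset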

"Pickle" the 3D arrays into 1 file each

Organize into "data" (i.e. the image arrays) and "labels" (i.e. what each image is, e.g. "A") - this means loading the data back from the "pickled" files and randomizing it (see the sketch after this list):
  • Choose a set size (# of images) and calculate how many images X are needed from each folder/class
  • Create a new "merged" 3D array containing data for X images from each folder/class; at the same time, create a companion 1D array containing the label for each image
  • Randomize the merged array and corresponding label array - this is the train_dataset and train_labels for the model (i.e. "X" and "y")
  • Repeat the three steps above to create validation and testing datasets and labels
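
A rough sketch of those merge-and-randomize steps. This assumes, for illustration, that the class folders were pickled as 'A.pickle' through 'J.pickle' (10 classes), and merge_datasets/randomize are hypothetical helper names:

# Pull X images per class out of the pickled arrays into one merged
# dataset plus a companion 1D label array, then shuffle both.
# Class i's label is simply the integer i (0 = 'A', 1 = 'B', ...).
import pickle
import numpy as np

pickle_files = [c + '.pickle' for c in 'ABCDEFGHIJ']  # assumed file names

def merge_datasets(pickle_files, images_per_class):
    n = images_per_class * len(pickle_files)
    dataset = np.ndarray((n, IMAGE_SIZE, IMAGE_SIZE), dtype=np.float32)
    labels = np.ndarray(n, dtype=np.int32)
    for i, pf in enumerate(pickle_files):
        with open(pf, 'rb') as f:
            class_data = pickle.load(f)       # the folder's 3D array
        start = i * images_per_class
        dataset[start:start + images_per_class] = class_data[:images_per_class]
        labels[start:start + images_per_class] = i
    return dataset, labels

def randomize(dataset, labels):
    # shuffle data and labels with the same permutation so they stay paired
    perm = np.random.permutation(labels.shape[0])
    return dataset[perm], labels[perm]

train_dataset, train_labels = randomize(*merge_datasets(pickle_files, 10000))
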
Create a model and train it
  • e.g. use sklearn LogisticRegression() (see the sketch after this list)
  • do we need to set parameters?
    • multi_class='multinomial' in order to use cross-entropy as the loss function
    • solver='newton-cg' or 'lbfgs' (which support multi_class='multinomial')
  • Comparing the Udacity tutorial with the documentation for LogisticRegression(), is it correct that it does all of the following:
    • Sets up a linear model (y=Wx+b)
    • Applies softmax to convert computed values of y to probabilities
    • Compares the computed values of y with the actual values of y (one-hot encoded) using cross-entropy
    • Tweaks the model's parameters using an optimization algorithm to minimize the loss as defined in relation to cross-entropy
      • is it using stochastic gradient descent here? (taking many tiny steps instead of fewer big ones) - with solver='newton-cg' or 'lbfgs', apparently not: these are full-batch optimizers
      • so the loss is computed over the entire training set, not a small random sample
      • and the concept of momentum (keeping a running average of gradients and moving in that direction, rather than following only the current gradient)
      • and learning rate decay (gradually shrinking how much the weights change at each step as training progresses)
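
Putting that together, a minimal sketch. sklearn expects each sample as a flat feature vector, so the 28x28 images get reshaped to rows of 784; train_dataset and train_labels come from the randomizing step above:

# Flatten each image into a row and fit. multi_class='multinomial'
# with a compatible solver gives true softmax + cross-entropy
# training rather than one-vs-rest.
from sklearn.linear_model import LogisticRegression

X_train = train_dataset.reshape(train_dataset.shape[0], IMAGE_SIZE * IMAGE_SIZE)

model = LogisticRegression(multi_class='multinomial', solver='lbfgs')
model.fit(X_train, train_labels)
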
Validate the model

Test the model
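
And a final sketch for both of these steps, assuming valid_dataset/test_dataset and their label arrays were built the same way as the training set - score() reports accuracy (the fraction of images whose predicted label matches):

# Score the fitted model on the held-out sets.
X_valid = valid_dataset.reshape(valid_dataset.shape[0], IMAGE_SIZE * IMAGE_SIZE)
X_test = test_dataset.reshape(test_dataset.shape[0], IMAGE_SIZE * IMAGE_SIZE)

print('validation accuracy:', model.score(X_valid, valid_labels))
print('test accuracy:', model.score(X_test, test_labels))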
