Showing posts with label: machine_learning.

Azure Spot Instances

Thursday 26 November 2020

I have some free Azure student credit, so yesterday I decided to try using some Azure VMs to train some of my models. I soon realized that a student account does not include a quota for any GPU more powerful than a K80, and with a student account there is no way to request a quota increase. However, the student account does include a quota for "low priority instances", or spot instances, which are pre-emptible. So I set up a spot VM.

On AWS, spot VMs can sometimes run for days before being pre-empted. Not so on Azure. I tried about a half dozen times, and no instance ever lasted long enough to complete even half an epoch, or about an hour. I was very disappointed because the spot prices were much better than AWS spot prices. For Azure spot instances you can set a price you are willing to pay, but even setting the price above the on-demand price didn't make any difference.

My final complaint about Azure VMs is the shortage of images. AWS has a huge number of images for deep learning, so you can basically just start the instance and you are set to go. Azure only has a few such images, and they still required considerable configuration and package installation, which was made especially difficult by the fact that the instance kept shutting down.

I may use Azure on-demand VMs in the future, but the spot instances were largely useless.

Labels: machine_learning, azure

CoLab Pro

Thursday 12 November 2020

I have been using CoLab for quite a few years now and have always really appreciated the ability to get access to GPUs (and TPUs) for free. So when I recently found out about CoLab Pro I was reluctant to pay $10 a month for something I had been getting for free. However, at the same time I was paying hundreds of dollars a month for cloud GPU instances. Last week, after going well over my AWS budget the month before, I decided to try CoLab Pro, and I am very glad I did.

CoLab Pro gives you priority on high-end GPUs - so far I have never not gotten a V100. This is the same GPU I was paying a $0.90/hour spot (preemptible) rate for on AWS. For me, the main disadvantage of CoLab was that each instance usually lasted about 10 hours before shutting down, and instances would time out if left unattended. CoLab Pro instances will last up to 24 hours, and they will not time out. I had one running at work the other day, and when I got home I figured it had timed out, but when I went back the next morning it was still running!

Obviously, CoLab Pro is better suited to running experiments than to executing long training runs, and it doesn't support multiple GPUs. And if you are using TensorFlow you have access to TPUs (I prefer PyTorch). In the past I have repeatedly kicked myself after spending hundreds of dollars training a model, only to find a small mistake afterwards. In the future I will be running my experiments on CoLab Pro and only using VMs when I am sure everything is correct and I need to train models quickly.

 

Labels: machine_learning, aws, gpu, colab

VAE GAN

Sunday 08 December 2019

I had been trying to train a version of VAE-GAN for a few weeks and it wasn't working as well as I had hoped. As suggested in the VAE-GAN paper, I had added an auxiliary output to the discriminator that attempted to predict the 40 attributes provided with each image in the celeb-a dataset, and I was scaling that loss to try to bring it in line with the GAN discriminator loss. I was doing that scaling incorrectly, though, so the auxiliary loss ended up overwhelming the GAN loss: I was summing, rather than averaging, the per-attribute losses, while the lambda I was using to scale the loss was appropriate for a mean loss. With 40 attributes the auxiliary loss was roughly 40x the GAN loss to begin with, so I needed to divide the lambda by 40 to get the effect I wanted.
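
To make the scaling issue concrete, here is a minimal sketch of the difference, with made-up shapes and a placeholder lambda rather than my actual training code:

    import torch
    import torch.nn.functional as F

    # Placeholder tensors: attr_logits stands in for the discriminator's auxiliary
    # head output, attrs for the 40 binary celeb-a attributes of the batch.
    attr_logits = torch.randn(16, 40)              # (batch, 40 attributes)
    attrs = torch.randint(0, 2, (16, 40)).float()
    gan_loss = torch.tensor(0.7)                   # stand-in for the adversarial loss
    lam = 0.1                                      # lambda tuned for a mean-reduced loss

    per_element = F.binary_cross_entropy_with_logits(attr_logits, attrs, reduction='none')

    # What I was doing: summing over the 40 attributes makes the auxiliary term
    # roughly 40x larger, so it swamps the GAN loss.
    aux_summed = per_element.sum(dim=1).mean()
    bad_total = gan_loss + lam * aux_summed

    # The fix: average over the attributes as well, or keep the sum and divide
    # the lambda by the number of attributes.
    aux_mean = per_element.mean()
    good_total = gan_loss + lam * aux_mean
    good_total_alt = gan_loss + (lam / attrs.size(1)) * aux_summed   # equivalent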

After having corrected that error I am finally making some progress with these models. Below are sample images from two models I am training. The first outputs images at 160x160, the second at 128x128.

I guess the moral of this story is if something isn't working the way you expect it to, double check your math before you continue training it!

Labels: python, machine_learning, pytorch, gan

GAN Hacks

Saturday 21 September 2019

I've now been trying to train my GANs for quite a while and still haven't been too successful, but I have learned some tricks. I found this excellent article a while ago and I didn't really understand it completely at first, but after having tried a lot of its tricks I understand them now. Here are my thoughts and some additional tricks I have used:

  1. Item 5 from the article - use convolutional layers with strides of 2 rather than pools: one of the biggest problems in training GANs is maintaining the gradients. Since the gradients for the generator come from the discriminator, vanishing or exploding gradients are a huge problem and need to be avoided at all costs. A max pool eliminates all of the gradients in each window but one, so convolutional layers with a stride of 2 are a better way to downsample (see the first sketch after this list). Average pooling will also work, but I've found that stride-2 layers work better.
  2. With apologies to Frank Herbert, "the gradients must flow." I've had luck using dense convnets as the discriminator because of the improved gradient flow they provide.
  3. Item 6 - soft and noisy labels - this has helped a LOT. I haven't tried using random labels, but I have had luck using labels that are slightly off from 0 or 1, like 0.1 or 0.99 (see the training-step sketch after this list). This keeps the discriminator from becoming too confident in its predictions and keeps the gradient to the generator from exploding. I've learned that when training GANs, exploding gradients are just as bad as vanishing gradients, in that the generator learns nothing.
  4. The article also suggests occasionally flipping the labels, which I'm not sure exactly how to interpret. In practice, if the discriminator gets too strong I will occasionally flip the labels for a few training steps to confuse it a bit and then flip them back. This seems to help the generator catch up a bit.
  5. One other thing I have found is that using smaller batch sizes seems to work better. When I started using the V100 GPUs I immediately increased my batch size to the max the GPU could handle, but the generator did not learn well at all. Reducing the batch size helps a lot, possibly by introducing some additional regularization to the discriminator. 
  6. Dropout - the article mentions using dropout in the generator, which I haven't tried. I do use dropout in the discriminator, which I wasn't sure about since it will reduce the gradients, but it does help slow down the discriminator, which seems to help training.
  7. Item 11 - I have tried to do this and wasted a lot of time. If your training has collapsed it is not likely you will be able to uncollapse it by training one network more than the other. I would suggest that rather than training one network more than the other you make sure that the networks are roughly equally matched from the start. Training the generator more, for example, tends to lead to mode collapse; training the discriminator more tends to lead to the gradients exploding or vanishing.
  8. Item 12 - I haven't tried this one yet but it is interesting. I've heard a lot about using auxiliary outputs to provide regularization and if I had labelled images I would definitely try this one. In fact, I may try to label my images somehow in order to do so.
  9. One thing that was not mentioned in the article, but which I have found very helpful, is using separate batches for real and fake images when training the discriminator (also shown in the training-step sketch after this list). At first I thought this was a bizarre idea and wasn't sure how it would help, but it really does.
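
To illustrate item 1, here is a minimal PyTorch sketch of the two ways to halve the resolution; the channel counts and activation are arbitrary placeholders:

    import torch.nn as nn

    # A max pool only passes a gradient through the single max element in each
    # window, while a stride-2 convolution keeps a gradient path through every
    # input pixel.
    pooled_down = nn.Sequential(
        nn.Conv2d(64, 128, kernel_size=3, padding=1),
        nn.LeakyReLU(0.2, inplace=True),
        nn.MaxPool2d(2),
    )

    strided_down = nn.Sequential(
        nn.Conv2d(64, 128, kernel_size=3, stride=2, padding=1),  # downsamples by 2
        nn.LeakyReLU(0.2, inplace=True),
    )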
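
And here is a rough sketch of a discriminator step combining soft labels (item 3) with separate real and fake batches (item 9); the discriminator, generator, optimizer, and latent size are placeholders, not my actual models:

    import torch
    import torch.nn.functional as F

    def discriminator_step(discriminator, generator, real_images, opt_d, latent_dim=128):
        """One discriminator update with soft labels and separate real/fake batches."""
        batch_size = real_images.size(0)
        device = real_images.device

        # Soft labels: 0.9/0.1 instead of 1/0 keep the discriminator from becoming
        # too confident and blowing up the gradient that reaches the generator.
        real_labels = torch.full((batch_size, 1), 0.9, device=device)
        fake_labels = torch.full((batch_size, 1), 0.1, device=device)

        opt_d.zero_grad()

        # Separate batches: a full forward/backward pass on the real images...
        real_logits = discriminator(real_images)
        loss_real = F.binary_cross_entropy_with_logits(real_logits, real_labels)
        loss_real.backward()

        # ...and a separate forward/backward pass on the fakes, rather than
        # mixing the two into one batch.
        noise = torch.randn(batch_size, latent_dim, device=device)
        fake_images = generator(noise).detach()
        fake_logits = discriminator(fake_images)
        loss_fake = F.binary_cross_entropy_with_logits(fake_logits, fake_labels)
        loss_fake.backward()

        opt_d.step()
        return (loss_real + loss_fake).item()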

Some additional tips on how to construct a GAN:

  1. Start small - when I started playing with GANs I immediately made two large, deep convnets and tried to train them and they learned nothing. I recommend you start with a very small network, train it enough to make sure it is learning something, then add a layer and repeat. I still don't know what the problem with my original networks was, or if I just wasn't patient enough, but it's a lot easier to find problems if you add one layer at a time (or one block at a time) than if you start off with a 100 layer network.
  2. Keep things simple - training a GAN involves making sure that two networks are learning at roughly the same pace; it's a delicate dance, and I would recommend not throwing too many bells and whistles into it. As in the previous tip, make sure everything is working properly before you add some newfangled loss function or dynamic loss weighting or anything else into it.

 

Labels: machine_learning, gan
