Could someone explain to me what's behind the FaceNet paper? (one-shot learning, siamese networks and triplet loss)
I've been struggling for about 3 weeks with my one-shot learning project. I'm trying to unlock my computer with my face. Unfortunately, I'm far from achieving this. First, I wanted to understand well the concepts behind one-shot learning, and especially triplet loss, before anything else. So now I'm trying to train a network (in PyTorch) with transfer learning, which I hope will lead me to my goal.
What I understand so far:
One-shot learning
- It's a method where a model should minimise the Euclidean distance between the embeddings of two faces of the same person and, conversely, maximise the Euclidean distance between the embeddings of faces of different people. In other words, the model should map any face into a d-dimensional Euclidean space where faces of the same person are close to each other and faces of different people are far away from each other.
- The model does not need to be trained on the identities it will later see. In other words, once well trained, anyone can use it to compare a fixed, unchanged photo of their face to another photo of themselves.
- Face verification is the ability to maximise the distances between any face that doesn't belong to (say) the authorized person and minimise only the distances between faces belonging to the authorized person (a 1:1 problem; see the sketch after this list).
- Face recognition is the same ability, but against a set of authorized persons (a 1:K problem).
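To make the verification idea concrete, here is a minimal sketch of thresholding the embedding distance (the threshold value of 1.0 is an arbitrary assumption, not something prescribed by the paper):

import torch
import torch.nn.functional as F

def same_person(model, img_a, img_b, threshold=1.0):
    # img_a, img_b: preprocessed face tensors of shape (3, H, W)
    model.eval()
    with torch.no_grad():
        emb = model(torch.stack([img_a, img_b]))           # (2, 512) embeddings
    dist = F.pairwise_distance(emb[0:1], emb[1:2]).item()  # Euclidean distance
    return dist < threshold                                # accept if the faces are close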
Triplet mining
To ensure that the model actually learns, one needs to feed it triplets that are well defined and not trivial. For a dataset of faces, this leads to:
- triplets such that [for all distinct (i, j, k)]: identity(face[i]) == identity(face[j]) and identity(face[i]) != identity(face[k]). These are called "valid triplets", and the three faces are called the Anchor, the Positive and the Negative.
- triplets whose faces are not already far apart in the Euclidean space (this prevents trivial losses which collapse to zero). With d(a,p) the anchor-positive distance and d(a,n) the anchor-negative distance, triplets with d(a,p) < d(a,n) < d(a,p) + margin are called semi-hard, and triplets with d(a,n) < d(a,p) are called hard.
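Concretely, the triplet loss from the paper is L = max(d(a,p) - d(a,n) + margin, 0). A minimal sketch of it in PyTorch (this is essentially what nn.TripletMarginLoss computes):

import torch.nn.functional as F

def triplet_loss(anchor, positive, negative, margin=0.2):
    # anchor, positive, negative: (batch, embedding_dim) tensors
    d_ap = F.pairwise_distance(anchor, positive)  # anchor-positive distances
    d_an = F.pairwise_distance(anchor, negative)  # anchor-negative distances
    # easy triplets (d_an > d_ap + margin) contribute zero loss
    return F.relu(d_ap - d_an + margin).mean()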
Starting from these basics, I looked for examples on the internet. I understood that the usual ways to produce triplets are online mining and offline mining. I used the marvelous code from https://omoindrot.github.io/triplet-loss to implement the batch-hard and batch-all strategies, which are online mining.
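For reference, my understanding of the batch-hard strategy from that post, translated from TensorFlow to PyTorch (a sketch, assuming labels are integer identity IDs and that each batch contains at least two identities with at least two images each):

import torch

def batch_hard_triplet_loss(embeddings, labels, margin=0.2):
    # embeddings: (batch, dim) tensor, labels: (batch,) integer identity IDs
    dist = torch.cdist(embeddings, embeddings)          # (batch, batch) Euclidean distances
    same = labels.unsqueeze(0) == labels.unsqueeze(1)   # same-identity mask
    # hardest positive per anchor: farthest embedding with the same label
    pos_dist = dist.clone()
    pos_dist[~same] = 0.0
    hardest_pos = pos_dist.max(dim=1).values
    # hardest negative per anchor: closest embedding with a different label
    neg_dist = dist.clone()
    neg_dist[same] = float('inf')
    hardest_neg = neg_dist.min(dim=1).values
    return torch.relu(hardest_pos - hardest_neg + margin).mean()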
My questions:
From this point, I'm kind of lost. I've tried different approaches to build my dataset, but my loss never converges and the model doesn't seem to learn anything.
Description of my approach (in PyTorch)
Model and dataset
I'm using InceptionResnetV1 from the facenet_pytorch library, pretrained on CASIA-WebFace. I unfreeze the last two layers: the linear layer model.last_linear (1792 → 512) and model.last_bn, which leaves me with 918,528 trainable parameters and an output embedding of dimension 512.
For the dataset, I'm using the Head Pose Image Database, which contains 15 persons with, for each, 2 frontal pictures and 186 pictures in various head poses. This gives a set of 2797 pose pictures (one person has 193 pictures) plus 30 frontal pictures.
My work
I understood that the model should see various identities during training. So first, I tried PyTorch's nn.TripletMarginLoss and provided an anchor (one of the two frontal pictures of each identity), a positive (one of the 186 pictures of the anchor's identity) and a negative (a random face of a different identity).
This was unsuccessful: the loss seems to decrease, but the model doesn't generalize to the test set.
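For clarity, the triplet sampling I describe looks roughly like this sketch (the dictionaries frontals_by_id and images_by_id are hypothetical names for my preprocessed data):

import random
from torch.utils.data import Dataset

class TripletFaceDataset(Dataset):
    # frontals_by_id / images_by_id: dicts mapping identity -> list of image tensors
    def __init__(self, frontals_by_id, images_by_id):
        self.frontals_by_id = frontals_by_id
        self.images_by_id = images_by_id
        self.ids = list(images_by_id.keys())

    def __len__(self):
        return sum(len(imgs) for imgs in self.images_by_id.values())

    def __getitem__(self, idx):
        # idx is ignored: a fresh triplet is sampled at random on each call
        identity = random.choice(self.ids)
        anchor = random.choice(self.frontals_by_id[identity])  # one of the two frontal shots
        positive = random.choice(self.images_by_id[identity])  # same identity, any pose
        other = random.choice([i for i in self.ids if i != identity])
        negative = random.choice(self.images_by_id[other])     # different identity
        return anchor, positive, negative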
I thought maybe I wasn't providing enough semi-hard or hard triplets to the loss, so I constructed 15 datasets, one per identity "i": each contains the positive faces of identity "i" plus negative faces of the other identities. Each dataset therefore contains 2797 images and returns an image with its label (1 if the identity of the face corresponds to the dataset's identity, else 0). I looped over each identity dataset (with a batch loop inside each one) and used batch hard this time (https://omoindrot.github.io/triplet-loss), but again, unsuccessful.
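One thing I noticed while reading about batch hard is that it assumes every batch contains P identities with K images each, so each anchor has both positives and negatives available. A sketch of such a sampler, under the assumption that labels are integer identity IDs (to be used with batch_size = p * k in the DataLoader):

import random
from torch.utils.data import Sampler

class PKSampler(Sampler):
    # labels: list of integer identity IDs, one per dataset index
    def __init__(self, labels, p=8, k=4):
        self.p, self.k = p, k
        self.index_by_id = {}
        for idx, label in enumerate(labels):
            self.index_by_id.setdefault(label, []).append(idx)

    def __len__(self):
        return (len(self.index_by_id) // self.p) * self.p * self.k

    def __iter__(self):
        ids = list(self.index_by_id)
        random.shuffle(ids)
        for start in range(0, len(ids) - self.p + 1, self.p):
            for identity in ids[start:start + self.p]:
                # yield k random images of this identity
                yield from random.sample(self.index_by_id[identity], self.k)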
Questions
- Do I need to create a much simpler model and train it from scratch?
- Does my method seem correct: should the anchor pass through the same model as the positive and the negative?
- How should I set the margin?
- About face verification: are my statements above correct? I expect to train my model without pictures of me, and then be able to minimise/maximise the Euclidean distances between any face embeddings. Is that right?
- Is this feasible with decent accuracy (i.e. around 95%) as a small project?
Thanks all for your time, I hope my explanations were clear. I've put a piece of code below.
import torch
import torch.nn as nn
from facenet_pytorch import InceptionResnetV1

model = InceptionResnetV1(pretrained='casia-webface')
device = torch.device('cuda') if torch.cuda.is_available() else torch.device('cpu')
num_epochs = 10

# Freeze the whole backbone, then unfreeze only the last two layers
for param in model.parameters():
    param.requires_grad = False
model.last_linear.weight.requires_grad = True
model.last_bn.weight.requires_grad = True
model.last_bn.bias.requires_grad = True  # BN bias too (gives the 918,528 count below)
# Build the optimizer after freezing, over the trainable parameters only
optimizer = torch.optim.Adam(filter(lambda p: p.requires_grad, model.parameters()), lr=0.001)
. . .
. . .
. . .
Block8-511 [-1, 1792, 3, 3] 0
AdaptiveAvgPool2d-512 [-1, 1792, 1, 1] 0
Dropout-513 [-1, 1792, 1, 1] 0
Linear-514 [-1, 512] 917,504
BatchNorm1d-515 [-1, 512] 1,024
================================================================
Total params: 23,482,624
Trainable params: 918,528
Non-trainable params: 22,564,096
----------------------------------------------------------------
Input size (MB): 0.29
Forward/backward pass size (MB): 88.63
Params size (MB): 89.58
Estimated Total Size (MB): 178.50
----------------------------------------------------------------
import datetime

def model_loop(model, epochs, trainloader, batch_size, pytorchLoss, optimizer, device):
    ### This loop uses the PyTorch TripletMarginLoss with a training set of 4972 images
    ### separated into Anchors / Positives / Negatives
    delta_time = datetime.timedelta(hours=1)
    timezone = datetime.timezone(offset=delta_time)
    model.to(device)
    train_loss_list = []
    size_train = len(trainloader.dataset)
    last_batch = len(trainloader) - 1
    for epoch in range(epochs):  # use the epochs argument, not the global num_epochs
        t = datetime.datetime.now(tz=timezone)
        str_t = '{:%Y-%m-%d %H:%M:%S}'.format(t)
        print(f"{str_t} : Epoch {epoch+1} on {device} \n---------------------------")
        train_loss = 0.0
        model.train()
        for batch, (imgsA, imgsP, imgsN) in enumerate(trainloader):
            # Transfer data to GPU if available
            imgsA, imgsP, imgsN = imgsA.to(device), imgsP.to(device), imgsN.to(device)
            # Clear the gradients
            optimizer.zero_grad()
            # Make predictions & compute the mini-batch training loss
            predsA, predsP, predsN = model(imgsA), model(imgsP), model(imgsN)
            loss = pytorchLoss(predsA, predsP, predsN)
            # Compute the gradients
            loss.backward()
            # Update weights
            optimizer.step()
            # Aggregate mini-batch training losses
            train_loss += loss.item()
            train_loss_list.append(loss.item())  # track the per-batch loss, not the running sum
            if batch == 0 or batch == last_batch:
                current = batch * batch_size + len(imgsA)  # use the batch_size argument
                print(f"mini-batch loss for training : \
                      {loss.item():>7f} [{current:>5d}/{size_train:>5d}]")
        # Compute the global training loss as the mean of the mini-batch losses
        train_loss /= len(trainloader)
        print(f"--Fin Epoch {epoch+1}/{epochs} \n Training Loss: {train_loss:>7f}")
        print('\n')
    return train_loss_list
train_loss = model_loop(model=model,
                        epochs=num_epochs,
                        trainloader=train_dataloader,
                        batch_size=256,
                        pytorchLoss=nn.TripletMarginLoss(margin=0.1),
                        optimizer=optimizer,
                        device=device)
2022-02-18 20:26:30 : Epoch 1 on cuda
-------------------------------
mini-batch loss for training : 0.054199 [ 256/ 4972]
mini-batch loss for training : 0.007469 [ 4972/ 4972]
--Fin Epoch 1/10
Training Loss: 0.026363
2022-02-18 20:27:48 : Epoch 5 on cuda
-------------------------------
mini-batch loss for training : 0.005694 [ 256/ 4972]
mini-batch loss for training : 0.011877 [ 4972/ 4972]
--Fin Epoch 5/10
Training Loss: 0.004944
2022-02-18 20:29:24 : Epoch 10 on cuda
-------------------------------
mini-batch loss for training : 0.002713 [ 256/ 4972]
mini-batch loss for training : 0.001007 [ 4972/ 4972]
--Fin Epoch 10/10
Training Loss: 0.003000
Stats over a test dataset of 620 images:
TP : 11.25%
TN : 98.87%
FN : 88.75%
FP : 1.13%
The model accuracy is 55.06%
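For context, these rates come from comparing each test embedding against a reference embedding of the authorized person and thresholding the distance, along the lines of this sketch (the threshold and variable names are assumptions):

import torch
import torch.nn.functional as F

@torch.no_grad()
def verification_stats(model, reference_img, test_imgs, is_me, threshold=1.0):
    # reference_img: (3, H, W); test_imgs: (N, 3, H, W); is_me: (N,) bool tensor
    model.eval()
    ref = model(reference_img.unsqueeze(0))  # (1, 512) reference embedding
    embs = model(test_imgs)                  # (N, 512) test embeddings
    accepted = F.pairwise_distance(embs, ref.expand_as(embs)) < threshold
    tp = (accepted & is_me).sum().item()     # authorized face accepted
    tn = (~accepted & ~is_me).sum().item()   # impostor face rejected
    fp = (accepted & ~is_me).sum().item()    # impostor face accepted
    fn = (~accepted & is_me).sum().item()    # authorized face rejected
    return tp, tn, fp, fn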
Tags: pytorch, computer-vision, loss-function, siamese-network, triplet