Could someone explain to me what's behind the FaceNet paper? (one-shot learning, siamese networks and triplet loss)
I've been struggling for about 3 weeks with my one-shot learning project. I'm trying to unlock my computer with my face. Unfortunately, I'm far from achieving this. First, I wanted to understand well the concepts behind one-shot learning, and especially triplet loss, before anything else. So now I'm trying to train a network (in PyTorch) with transfer learning, which I hope will lead me to my goal.
What I understand so far:
One-shot learning
- It's a method where a model should minimise the Euclidean distance between the embeddings of two faces of the same person and, conversely, maximise the Euclidean distance between the embeddings of faces of different people. In other words, the model should map any face into a d-dimensional Euclidean space where faces of the same person are close to each other and faces of different people are far away from each other.
- The model does not need to be trained on the identities it will later see. In other words, once well trained, anyone can use it to compare a fixed, unchanged photo of their face to another photo of themselves.
- Face verification is the ability to maximise the distances between any face that doesn't belong to (say) the authorized person and minimise only the distances between faces belonging to the authorized person (a 1:1 problem; see the sketch after this list).
- Face recognition is the same ability, but against a set of authorized persons (a 1:K problem).
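To make the verification idea concrete, here is a minimal sketch of thresholding the embedding distance (the threshold value of 1.0 is an arbitrary assumption, not something prescribed by the paper):

import torch
import torch.nn.functional as F

def same_person(model, img_a, img_b, threshold=1.0):
    # img_a, img_b: preprocessed face tensors of shape (3, H, W)
    model.eval()
    with torch.no_grad():
        emb = model(torch.stack([img_a, img_b]))           # (2, 512) embeddings
    dist = F.pairwise_distance(emb[0:1], emb[1:2]).item()  # Euclidean distance
    return dist < threshold                                # accept if the faces are close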
Triplet mining
To ensure that the model actually learns, one needs to feed it triplets that are well defined and not trivial. For a dataset of faces, this leads to:
- triplets such that [for all distinct (i, j, k)]: identity(face[i]) == identity(face[j]) and identity(face[i]) != identity(face[k]). These are called "valid triplets", and the three faces are called the Anchor, the Positive and the Negative.
- triplets whose faces are not already far apart in the Euclidean space (this prevents trivial losses which collapse to zero). With d(a,p) the anchor-positive distance and d(a,n) the anchor-negative distance, triplets with d(a,p) < d(a,n) < d(a,p) + margin are called semi-hard, and triplets with d(a,n) < d(a,p) are called hard.
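Concretely, the triplet loss from the paper is L = max(d(a,p) - d(a,n) + margin, 0). A minimal sketch of it in PyTorch (this is essentially what nn.TripletMarginLoss computes):

import torch.nn.functional as F

def triplet_loss(anchor, positive, negative, margin=0.2):
    # anchor, positive, negative: (batch, embedding_dim) tensors
    d_ap = F.pairwise_distance(anchor, positive)  # anchor-positive distances
    d_an = F.pairwise_distance(anchor, negative)  # anchor-negative distances
    # easy triplets (d_an > d_ap + margin) contribute zero loss
    return F.relu(d_ap - d_an + margin).mean()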
Starting from these basics, I looked for examples on the internet. I understood that the usual ways to produce triplets are online mining and offline mining. I used the marvelous code from https://omoindrot.github.io/triplet-loss to implement the batch-hard and batch-all strategies, which are online mining.
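For reference, my understanding of the batch-hard strategy from that post, translated from TensorFlow to PyTorch (a sketch, assuming labels are integer identity IDs and that each batch contains at least two identities with at least two images each):

import torch

def batch_hard_triplet_loss(embeddings, labels, margin=0.2):
    # embeddings: (batch, dim) tensor, labels: (batch,) integer identity IDs
    dist = torch.cdist(embeddings, embeddings)          # (batch, batch) Euclidean distances
    same = labels.unsqueeze(0) == labels.unsqueeze(1)   # same-identity mask
    # hardest positive per anchor: farthest embedding with the same label
    pos_dist = dist.clone()
    pos_dist[~same] = 0.0
    hardest_pos = pos_dist.max(dim=1).values
    # hardest negative per anchor: closest embedding with a different label
    neg_dist = dist.clone()
    neg_dist[same] = float('inf')
    hardest_neg = neg_dist.min(dim=1).values
    return torch.relu(hardest_pos - hardest_neg + margin).mean()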
My questions:
From this point, I'm kind of lost. I've tried different approaches to build my dataset, but my loss never converges and the model doesn't seem to learn anything.
Description of my approach (in PyTorch)
Model and dataset
I'm using InceptionResnetV1 from the facenet_pytorch library, pretrained on CASIA-WebFace. I unfreeze the last two layers: the linear layer model.last_linear (1792 → 512) and model.last_bn, which leaves me with 918,528 trainable parameters and an output embedding of dimension 512.
For the dataset, I'm using the Head Pose Image Database, which contains 15 persons with, for each, 2 frontal pictures and 186 pictures in various head poses. This gives a set of 2797 pose pictures (one person has 193 pictures) plus 30 frontal pictures.
My work
I understood that the model should see various identities during training. So first, I tried PyTorch's nn.TripletMarginLoss and provided an anchor (one of the two frontal pictures of each identity), a positive (one of the 186 pictures of the anchor's identity) and a negative (a random face of a different identity).
This was unsuccessful: the loss seems to decrease, but the model doesn't generalize to the test set.
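For clarity, the triplet sampling I describe looks roughly like this sketch (the dictionaries frontals_by_id and images_by_id are hypothetical names for my preprocessed data):

import random
from torch.utils.data import Dataset

class TripletFaceDataset(Dataset):
    # frontals_by_id / images_by_id: dicts mapping identity -> list of image tensors
    def __init__(self, frontals_by_id, images_by_id):
        self.frontals_by_id = frontals_by_id
        self.images_by_id = images_by_id
        self.ids = list(images_by_id.keys())

    def __len__(self):
        return sum(len(imgs) for imgs in self.images_by_id.values())

    def __getitem__(self, idx):
        # idx is ignored: a fresh triplet is sampled at random on each call
        identity = random.choice(self.ids)
        anchor = random.choice(self.frontals_by_id[identity])  # one of the two frontal shots
        positive = random.choice(self.images_by_id[identity])  # same identity, any pose
        other = random.choice([i for i in self.ids if i != identity])
        negative = random.choice(self.images_by_id[other])     # different identity
        return anchor, positive, negative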
I thought maybe I wasn't providing enough semi-hard or hard triplets to the loss, so I constructed 15 datasets, one per identity "i": each contains the positive faces of identity "i" plus negative faces of the other identities. Each dataset therefore contains 2797 images and returns an image with its label (1 if the identity of the face corresponds to the dataset's identity, else 0). I looped over each identity dataset (with a batch loop inside each one) and used batch hard this time (https://omoindrot.github.io/triplet-loss), but again, unsuccessful.
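One thing I noticed while reading about batch hard is that it assumes every batch contains P identities with K images each, so each anchor has both positives and negatives available. A sketch of such a sampler, under the assumption that labels are integer identity IDs (to be used with batch_size = p * k in the DataLoader):

import random
from torch.utils.data import Sampler

class PKSampler(Sampler):
    # labels: list of integer identity IDs, one per dataset index
    def __init__(self, labels, p=8, k=4):
        self.p, self.k = p, k
        self.index_by_id = {}
        for idx, label in enumerate(labels):
            self.index_by_id.setdefault(label, []).append(idx)

    def __len__(self):
        return (len(self.index_by_id) // self.p) * self.p * self.k

    def __iter__(self):
        ids = list(self.index_by_id)
        random.shuffle(ids)
        for start in range(0, len(ids) - self.p + 1, self.p):
            for identity in ids[start:start + self.p]:
                # yield k random images of this identity
                yield from random.sample(self.index_by_id[identity], self.k)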
Questions
- Do I need to create a much simpler model and train it from scratch?
- Does my method seem correct: should the anchor pass through the same model as the positive and the negative?
- How should I set the margin?
- About face verification: are my statements above correct? I expect to train my model without pictures of me, and then be able to minimise/maximise the Euclidean distances between any face embeddings. Is that right?
- Is this feasible with decent accuracy (i.e. around 95%) as a small project?
Thanks all for your time, I hope my explanations were clear. I've put a piece of code below.
import torch
import torch.nn as nn
from facenet_pytorch import InceptionResnetV1

model = InceptionResnetV1(pretrained='casia-webface')
device = torch.device('cuda') if torch.cuda.is_available() else torch.device('cpu')
num_epochs = 10

# Freeze the whole backbone, then unfreeze only the last two layers
for param in model.parameters():
    param.requires_grad = False
model.last_linear.weight.requires_grad = True
model.last_bn.weight.requires_grad = True
model.last_bn.bias.requires_grad = True  # BN bias too (gives the 918,528 count below)
# Build the optimizer after freezing, over the trainable parameters only
optimizer = torch.optim.Adam(filter(lambda p: p.requires_grad, model.parameters()), lr=0.001)
. . .
. . .
. . .
Block8-511 [-1, 1792, 3, 3] 0
AdaptiveAvgPool2d-512 [-1, 1792, 1, 1] 0
Dropout-513 [-1, 1792, 1, 1] 0
Linear-514 [-1, 512] 917,504
BatchNorm1d-515 [-1, 512] 1,024
================================================================
Total params: 23,482,624
Trainable params: 918,528
Non-trainable params: 22,564,096
----------------------------------------------------------------
Input size (MB): 0.29
Forward/backward pass size (MB): 88.63
Params size (MB): 89.58
Estimated Total Size (MB): 178.50
----------------------------------------------------------------
import datetime

def model_loop(model, epochs, trainloader, batch_size, pytorchLoss, optimizer, device):
    ### This loop uses the PyTorch TripletMarginLoss with a training set of 4972 images
    ### separated into Anchors / Positives / Negatives
    delta_time = datetime.timedelta(hours=1)
    timezone = datetime.timezone(offset=delta_time)
    model.to(device)
    train_loss_list = []
    size_train = len(trainloader.dataset)
    last_batch = len(trainloader) - 1
    for epoch in range(epochs):  # use the epochs argument, not the global num_epochs
        t = datetime.datetime.now(tz=timezone)
        str_t = '{:%Y-%m-%d %H:%M:%S}'.format(t)
        print(f"{str_t} : Epoch {epoch+1} on {device} \n---------------------------")
        train_loss = 0.0
        model.train()
        for batch, (imgsA, imgsP, imgsN) in enumerate(trainloader):
            # Transfer data to GPU if available
            imgsA, imgsP, imgsN = imgsA.to(device), imgsP.to(device), imgsN.to(device)
            # Clear the gradients
            optimizer.zero_grad()
            # Make predictions & compute the mini-batch training loss
            predsA, predsP, predsN = model(imgsA), model(imgsP), model(imgsN)
            loss = pytorchLoss(predsA, predsP, predsN)
            # Compute the gradients
            loss.backward()
            # Update weights
            optimizer.step()
            # Aggregate mini-batch training losses
            train_loss += loss.item()
            train_loss_list.append(loss.item())  # track the per-batch loss, not the running sum
            if batch == 0 or batch == last_batch:
                current = batch * batch_size + len(imgsA)  # use the batch_size argument
                print(f"mini-batch loss for training : \
                      {loss.item():>7f} [{current:>5d}/{size_train:>5d}]")
        # Compute the global training loss as the mean of the mini-batch losses
        train_loss /= len(trainloader)
        print(f"--Fin Epoch {epoch+1}/{epochs} \n Training Loss: {train_loss:>7f}")
        print('\n')
    return train_loss_list
train_loss = model_loop(model=model,
                        epochs=num_epochs,
                        trainloader=train_dataloader,
                        batch_size=256,
                        pytorchLoss=nn.TripletMarginLoss(margin=0.1),
                        optimizer=optimizer,
                        device=device)
2022-02-18 20:26:30 : Epoch 1 on cuda
-------------------------------
mini-batch loss for training : 0.054199 [ 256/ 4972]
mini-batch loss for training : 0.007469 [ 4972/ 4972]
--Fin Epoch 1/10
Training Loss: 0.026363
2022-02-18 20:27:48 : Epoch 5 on cuda
-------------------------------
mini-batch loss for training : 0.005694 [ 256/ 4972]
mini-batch loss for training : 0.011877 [ 4972/ 4972]
--Fin Epoch 5/10
Training Loss: 0.004944
2022-02-18 20:29:24 : Epoch 10 on cuda
-------------------------------
mini-batch loss for training : 0.002713 [ 256/ 4972]
mini-batch loss for training : 0.001007 [ 4972/ 4972]
--Fin Epoch 10/10
Training Loss: 0.003000
Stats over a test dataset of 620 images:
TP : 11.25%
TN : 98.87%
FN : 88.75%
FP : 1.13%
The model accuracy is 55.06%
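For context, these rates come from comparing each test embedding against a reference embedding of the authorized person and thresholding the distance, along the lines of this sketch (the threshold and variable names are assumptions):

import torch
import torch.nn.functional as F

@torch.no_grad()
def verification_stats(model, reference_img, test_imgs, is_me, threshold=1.0):
    # reference_img: (3, H, W); test_imgs: (N, 3, H, W); is_me: (N,) bool tensor
    model.eval()
    ref = model(reference_img.unsqueeze(0))  # (1, 512) reference embedding
    embs = model(test_imgs)                  # (N, 512) test embeddings
    accepted = F.pairwise_distance(embs, ref.expand_as(embs)) < threshold
    tp = (accepted & is_me).sum().item()     # authorized face accepted
    tn = (~accepted & ~is_me).sum().item()   # impostor face rejected
    fp = (accepted & ~is_me).sum().item()    # impostor face accepted
    fn = (~accepted & is_me).sum().item()    # authorized face rejected
    return tp, tn, fp, fn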
Tags: pytorch, computer-vision, loss-function, siamese-network, triplet