
Learning Vector Approach

07.01.2021 | Processing/Language Models/PyTorch


1. Pretrained Vector Approach
@2. Learning Vector Approach

We can extend the language model from the last chapter. Instead of using pretrained vectors, we add an extra hidden layer, the embedding layer $e$, in which the encodings themselves are trained. The procedure is almost the same as before, but now the inputs are one-hot vectors. Each one is passed through $e$, and the results are concatenated. That is, $n$ one-hot vectors of dimension $|V|$ (three in our case) are each mapped by $e$ to a vector of dimension $d$, the embedding dimension, and the results are concatenated into a single vector of dimension $n \cdot d$. This vector is used as input for the $h$ layer.
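Written out, the embedding step could look as follows. This is only a sketch of the description above: the weight matrix $E$ and the names $x_1, \dots, x_n$ for the one-hot inputs are notation assumed here, not taken from the previous chapter.

$$
e(x_i) = E^\top x_i \in \mathbb{R}^{d}, \qquad E \in \mathbb{R}^{|V| \times d},
$$

$$
x = [\, e(x_1);\ e(x_2);\ \dots;\ e(x_n) \,] \in \mathbb{R}^{n \cdot d}.
$$

Since $x_i$ is one-hot with its single 1 at position $k$, the product $E^\top x_i$ is simply the $k$-th row of $E$. Applying $e$ to a one-hot vector is therefore the same as looking up a row by the word's index.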

PyTorch has the built-in module nn.Embedding, which does a lot of the work for us. We only need to define it with the right dimensions and then feed it the index of each word within the window, rather than an explicit one-hot vector.
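To make the connection between one-hot vectors and word indices concrete, here is a minimal sketch. The sizes and the index values are toy choices made up for illustration. It checks that looking up indices in nn.Embedding gives the same result as multiplying one-hot vectors with the embedding weight matrix, and that the concatenated result has dimension $n \cdot d$.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

torch.manual_seed(0)

vocab_size, emb_dim = 10, 4          # toy sizes, chosen only for illustration
emb = nn.Embedding(vocab_size, emb_dim)

# Indices of the words in one context window (here n = 3 words).
ctx_ix = torch.tensor([2, 7, 5], dtype=torch.long)

# Lookup by index: one row of the weight matrix per word, shape (3, emb_dim).
looked_up = emb(ctx_ix)

# The same result via explicit one-hot vectors of dimension |V|.
one_hot = F.one_hot(ctx_ix, num_classes=vocab_size).float()   # shape (3, |V|)
via_matmul = one_hot @ emb.weight                             # shape (3, emb_dim)

# Concatenate the three embeddings into one row of dimension n * d = 12.
concatenated = looked_up.view((1, -1))

print(torch.allclose(looked_up, via_matmul))
print(concatenated.shape)
```

The index lookup avoids building the $|V|$-dimensional one-hot vectors at all, which matters once the vocabulary gets large.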

The model is now defined as

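    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    class NGramLangModel(nn.Module):
        def __init__(self, vocab_size, emb_dim, context_size):
            super(NGramLangModel, self).__init__()
            # Embedding layer e: maps a word index to a trainable vector.
            self.emb = nn.Embedding(vocab_size, emb_dim)
            self.lin1 = nn.Linear(context_size * emb_dim, 128)
            self.lin2 = nn.Linear(128, vocab_size)

        def forward(self, inputs):
            # Look up the embeddings and concatenate them into one row.
            e = self.emb(inputs).view((1, -1))
            h = F.relu(self.lin1(e))
            u = self.lin2(h)
            # log_softmax returns log-probabilities, not raw logits.
            log_probs = F.log_softmax(u, dim=1)
            return log_probs

    def concat_words(ctx):
        # word_to_ix (word -> index) is defined elsewhere, not in this section.
        return torch.tensor([word_to_ix[w] for w in ctx], dtype=torch.long)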

We do not need the synonyms anymore. The helper function for word concatenation has been simplified a lot: it now only returns the word indices. Note that the reshaping (the call to view) happens inside the model. We train with the following code:

    for epoch in range(n_epochs):
        total_loss = 0
        for ctx, target in ngrams:
            ctx_itx = concat_words(ctx)
            model.zero_grad()
            log_probs = model(ctx_itx)
            loss = loss_fun(log_probs, torch.tensor([word_to_ix[target]], dtype=torch.long))
            loss.backward()
            optimizer.step()
            total_loss += loss.item()
        n0 = len(ngrams)
        print(total_loss / n0)
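The loss function loss_fun is not defined in this section. Since the model returns log-probabilities, a natural choice is nn.NLLLoss, and the sketch below assumes exactly that. It uses a random score vector as a stand-in for the output of lin2 and shows that the loss is just minus the log-probability of the target word, and that it agrees with cross-entropy applied to the raw scores.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

torch.manual_seed(0)

vocab_size = 6                                   # toy size, for illustration only
u = torch.randn(1, vocab_size)                   # stand-in for the output of lin2
target = torch.tensor([3], dtype=torch.long)     # index of the target word

# What the model returns: log-probabilities over the vocabulary.
log_probs = F.log_softmax(u, dim=1)

# Assumed loss: negative log-likelihood, which expects log-probabilities.
loss_fun = nn.NLLLoss()
loss = loss_fun(log_probs, target)

# NLLLoss picks out minus the log-probability of the target word.
by_hand = -log_probs[0, target.item()]

# Cross-entropy on the raw scores combines log_softmax and NLLLoss.
combined = F.cross_entropy(u, target)

print(loss.item(), by_hand.item(), combined.item())
```

All three printed numbers should agree up to floating-point error. Under this assumption, the value printed after each epoch in the training loop is the average negative log-likelihood per n-gram.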

The code for measuring performance is the same as before. With this approach we get

    4.316492726400333
    4.234486750869881
    4.1520942314251466
    16/213

The number of epochs, the window size and $n$ can all be tuned for better performance.
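One way such tuning could be organised is a small grid search. The sketch below is not the code used above: it assumes a made-up toy corpus, treats the window size as the number of context words, and uses a batched nn.Sequential variant of the model so that each setting trains quickly. The helper names make_ngrams and train_once are hypothetical.

```python
import torch
import torch.nn as nn

def make_ngrams(tokens, context_size):
    # Each sample: (context_size preceding words, the word that follows).
    return [(tokens[i:i + context_size], tokens[i + context_size])
            for i in range(len(tokens) - context_size)]

def train_once(tokens, context_size, emb_dim, n_epochs, lr=0.1, seed=0):
    torch.manual_seed(seed)
    vocab = sorted(set(tokens))
    word_to_ix = {w: i for i, w in enumerate(vocab)}
    ngrams = make_ngrams(tokens, context_size)
    X = torch.tensor([[word_to_ix[w] for w in ctx] for ctx, _ in ngrams])
    y = torch.tensor([word_to_ix[t] for _, t in ngrams])

    # Batched variant of the model: (B, n) -> (B, n, d) -> (B, n * d) -> ...
    model = nn.Sequential(
        nn.Embedding(len(vocab), emb_dim),
        nn.Flatten(),
        nn.Linear(context_size * emb_dim, 128),
        nn.ReLU(),
        nn.Linear(128, len(vocab)),
        nn.LogSoftmax(dim=1),
    )
    loss_fun = nn.NLLLoss()
    optimizer = torch.optim.SGD(model.parameters(), lr=lr)
    for _ in range(n_epochs):
        optimizer.zero_grad()
        loss = loss_fun(model(X), y)   # full-batch loss over all n-grams
        loss.backward()
        optimizer.step()
    return loss.item()                 # training loss at the last epoch

# Toy corpus, invented for this example.
tokens = "the cat sat on the mat and the dog sat on the rug".split()

results = {}
for context_size in (2, 3):
    for emb_dim in (8, 16):
        for n_epochs in (5, 50):
            results[(context_size, emb_dim, n_epochs)] = train_once(
                tokens, context_size, emb_dim, n_epochs)

# Print the settings ordered by final training loss, lowest first.
for setting, loss in sorted(results.items(), key=lambda kv: kv[1]):
    print(setting, round(loss, 3))
```

This compares training loss only. For a fair comparison on real data, each setting would instead be scored with the same performance measure used above, ideally on text the model was not trained on.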
