Questions tagged [natural-language]
Natural Language Processing (NLP) is a set of techniques from linguistics, artificial intelligence, machine learning, and statistics that aim to process and understand human language.
1,151 questions
1 vote · 0 answers · 12 views · +100 bounty
NER with custom tags: how should I approach it?
I am building a "field tagger" for documents. Basically, a document, in my case something like a proposal or sales quote, would have a bunch of entities scattered throughout it, and we want ...
0 votes · 0 answers · 28 views
Normalizing the embedding space of an encoder language model with respect to categorical data
Suppose we have a tree/hierarchy of categories (e.g. categories of products in an e-commerce website), each node being assigned a title. Assume that the title of each node is semantically accurate, ...
0 votes · 0 answers · 9 views
Why learn an embedding before self attention when training transformers?
I understand that self-attention layers learn the "role" of a word in a sentence while embedding layers learn the relationship between the words. But I am not totally convinced that a self-...
0 votes · 0 answers · 12 views
Log-likelihood calculation for unigrams
I am calculating the log-likelihood for each unigram that I generated using the CountVectorizer, to see each unigram's importance. However, I got all positive values after calculating the log-...
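For context, here is a minimal sketch (plain Python, with a made-up toy corpus standing in for the CountVectorizer output) of per-unigram log-likelihood under a maximum-likelihood unigram model. Each term is a count times a log-probability, so it is always ≤ 0; uniformly positive values usually indicate a log-likelihood *ratio* statistic or a sign error, not a plain log-likelihood:

```python
import math
from collections import Counter

# Hypothetical corpus standing in for the real CountVectorizer input.
tokens = "the cat sat on the mat the cat slept".split()
counts = Counter(tokens)
total = sum(counts.values())

# Log-likelihood contribution of each unigram under the MLE unigram model:
# count(w) * log(count(w) / N). Each term is <= 0 because count/N <= 1.
loglik = {w: c * math.log(c / total) for w, c in counts.items()}

for w, ll in sorted(loglik.items(), key=lambda kv: kv[1]):
    print(f"{w:6s} {ll:8.3f}")
```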
4 votes · 2 answers · 534 views
Why is my randomForest model in R overfitting?
I am trying to train a Random Forest model in R for sentiment analysis. The model works with a tf-idf matrix and learns from it how to classify a review as positive or negative.
Positive ones are ...
0 votes · 0 answers · 20 views
Where does the equation $ C = 6 \times N \times T $ come from for Large Language Models, especially with a simple explanation for both passes?
Why $ C = 6 \times N \times T $?
I'm trying to understand the computational steps, specifically during the backward pass of neural networks, in relation to the widely cited formula $ C = 6 \times N \...
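For reference, the standard FLOP accounting behind this approximation (as used in the transformer scaling-law literature) is: each of the $N$ parameters contributes one multiply and one add per token in the forward pass, and the backward pass costs roughly twice the forward pass because it computes gradients with respect to both activations and weights.

```latex
% Per token, the forward pass through N parameters costs about 2N FLOPs
% (one multiply-accumulate per parameter). The backward pass costs about
% 4N FLOPs (gradients w.r.t. activations and w.r.t. weights). Over T
% training tokens:
C \approx \underbrace{2N}_{\text{forward}} T + \underbrace{4N}_{\text{backward}} T = 6NT
```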
0 votes · 0 answers · 18 views
Can 3D convolutions appropriately capture a frozen embedding space?
My project is a strange combination of NLP and Computer Vision.
I have datapoints that are 3D tensors, where each element is a token from an NLP vocabulary. The vocabulary is around 1000 unique "words"...
0 votes · 1 answer · 26 views
Find event date given the probabilities of finding an event
I have a set of clinical notes with dates for each patient, and an NLP model which gives a score between 0.0 and 1.0 for a certain event being present in the note. Given the scores, what is the best ...
0 votes · 0 answers · 10 views
Appropriateness of the Universal Sentence Encoder model
I have a classification problem where the goal is to predict, based on a small paragraph, if an individual is British or not.
The model used for the classification is Universal Sentence Encoder (to ...
0 votes · 1 answer · 33 views
Clustering of large text datasets with unknown number of clusters
I have a list of hotel names which may or may not be correct, and which may use different spellings (such as '&' instead of 'and'). I want to use clustering in order to group the hotels with different ...
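A common baseline for this kind of task, sketched below under assumptions (toy hotel names, a hypothetical similarity threshold, `difflib.SequenceMatcher` as the string-similarity measure), is to normalize the names and then greedily group them by similarity; the number of clusters then falls out of the threshold rather than being fixed up front:

```python
from difflib import SequenceMatcher

def normalize(name: str) -> str:
    # Collapse common spelling variants before comparing.
    return name.lower().replace("&", "and").strip()

def cluster_names(names, threshold=0.85):
    """Greedy single-pass clustering: each name joins the first cluster
    whose representative is similar enough, else it starts a new cluster.
    The cluster count is implied by the threshold, not chosen in advance."""
    clusters = []  # list of lists of original names
    for name in names:
        norm = normalize(name)
        for cluster in clusters:
            rep = normalize(cluster[0])
            if SequenceMatcher(None, norm, rep).ratio() >= threshold:
                cluster.append(name)
                break
        else:
            clusters.append([name])
    return clusters

hotels = ["Grand Hotel & Spa", "grand hotel and spa", "Seaview Inn", "Sea View Inn"]
print(cluster_names(hotels))
```

Greedy single-pass grouping is order-dependent; for larger lists, pairwise similarities fed into agglomerative clustering with a distance threshold behave more predictably.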
1 vote · 0 answers · 18 views
BERT eval loss increases while performance metrics also increase
I want to fine-tune BERT for Named Entity Recognition (NER). However, when fine-tuning over several epochs on different datasets I get a weird behaviour where the training loss decreases, eval loss ...
0 votes · 0 answers · 100 views
Locality sensitive hashing (LSH) with word embeddings and cosine similarity
I would like to ask about the methodology of the LSH algorithm with word embeddings and cosine similarity for identifying similar documents.
First, I tokenize my sentences to create a list of tokens. Then, I ...
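For reference, the standard way to pair LSH with cosine similarity is random-hyperplane hashing (SimHash): each embedding is reduced to a bit signature, and the fraction of matching bits approximates the angle between vectors. A minimal pure-Python sketch with made-up 4-dimensional "document embeddings" (the dimensions and vectors are illustrative, not from the question):

```python
import random

random.seed(0)

DIM = 4        # toy embedding dimensionality
NUM_BITS = 16  # signature length: more bits -> finer cosine approximation

# Each hyperplane is a random Gaussian vector; the sign of the dot
# product with an embedding contributes one bit of the signature.
hyperplanes = [[random.gauss(0, 1) for _ in range(DIM)] for _ in range(NUM_BITS)]

def signature(vec):
    return tuple(
        1 if sum(h_i * v_i for h_i, v_i in zip(h, vec)) >= 0 else 0
        for h in hyperplanes
    )

def hamming_similarity(sig_a, sig_b):
    # Fraction of matching bits; in expectation this equals
    # 1 - angle(a, b) / pi, a monotone proxy for cosine similarity.
    return sum(a == b for a, b in zip(sig_a, sig_b)) / len(sig_a)

# Toy document embeddings: doc_b points nearly the same way as doc_a,
# doc_c points in a very different direction.
doc_a = [0.9, 0.1, 0.0, 0.2]
doc_b = [0.8, 0.2, 0.1, 0.2]
doc_c = [-0.7, 0.9, -0.3, 0.1]

sim_ab = hamming_similarity(signature(doc_a), signature(doc_b))
sim_ac = hamming_similarity(signature(doc_a), signature(doc_c))
print(sim_ab, sim_ac)
```

In practice the signatures are bucketed (e.g. by banding the bits) so that only documents sharing a bucket are compared exactly, which is what makes LSH sub-quadratic.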
0 votes · 0 answers · 9 views
Problems in understanding Word2vec architectures
I probably have a very simple question, but I did not find any clear resource on the web.
First let's consider the Skip-gram model, in which we try to predict a context word given the target word. In ...
2 votes · 1 answer · 141 views
If a document set is too small for running a topic model, can you simply multiply the document set by a factor of 10 to be able to run the model?
Say I'm using Top2Vec as a topic model to capture the top 10 salient topics across documents. I have an array that contains the documents of the corpus. Initially, there are not enough documents to ...
0 votes · 0 answers · 73 views
How does unigram tokenization use the EM algorithm?
I intuitively understand what is happening in the unigram tokenizer, and I think I would also understand the EM algorithm if I could figure out the formulation in terms I understand, i.e. what is the latent ...