Questions tagged [nlp]

1

votes
1

answer
423

Views

Dialog api v2 - Unexpected error while acquiring application default credentials: Could not load the default credentials

I am trying to implement a chatbot application with Google Dialogflow. I was following this GitHub tutorial https://github.com/dialogflow/dialogflow-nodejs-client-v2 to implement the API. This is my code: var express = require('express'); var router = express.Router(); const projectId = 'my-project-id'...
pavithra rox
0

votes
0

answer
19

Views

How to handle continuous values in an array

I would like to create a submission file for the problem, but my predictions contain continuous values in the array. Please help me solve this. I have array values like this: predictions array([[5.5161709e-01, 4.4297403e-01, 5.3959554e-03, 1.2935511e-05], [5.5161709e-01, 4.4297403e-01, 5.3959554e-03, 1...
suri
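
One possible reading of the question above, assuming each row holds per-class probabilities and the submission expects one class index per row, is to take the position of the largest value in each row. A minimal sketch in plain Python; `to_class_labels` is a hypothetical helper name, the first row reuses the values from the excerpt, and the second row is made up for illustration:

```python
def to_class_labels(predictions):
    """Return, for each row, the index of its largest value."""
    return [max(range(len(row)), key=row.__getitem__) for row in predictions]


predictions = [
    [5.5161709e-01, 4.4297403e-01, 5.3959554e-03, 1.2935511e-05],  # from the excerpt
    [0.10, 0.20, 0.65, 0.05],  # made-up row for illustration
]
labels = to_class_labels(predictions)
print(labels)
```

If the predictions are already a NumPy array, `predictions.argmax(axis=1)` gives the same indices.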
0

votes
0

answer
4

Views

[Keras][Embedding] How do I expand the vocabulary size of a pre-trained embedding

I have a pre-trained Keras model and there's a word embedding [1000 vocabulary * 200 dimensions] inside the model. Now I want to load it back into memory and continue training it with new data. The vocabulary size increased because of the new data. I am wondering if it's possible to replace this...
Weiye Deng
1

votes
1

answer
868

Views

How to perform clustering on Word2Vec

I have a semi-structured dataset where each row pertains to a single user: id, skills 0,'java, python, sql' 1,'java, python, spark, html' 2, 'business management, communication' It is semi-structured because the skills can only be selected from a list of 580 unique values. My goal is to clust...
Ivan
0

votes
1

answer
42

Views

Recursion in nltk's RegexpParser

Based on the grammar in the chapter 7 of the NLTK Book: grammar = r''' NP: {+} # ... ''' I want to expand NP (noun phrase) to include multiple NP joined by CC (coordinating conjunctions: and) or , (commas) to capture noun phrases like: The house and tree The apple, orange and mango Car, house, and p...
Leito
1

votes
1

answer
86

Views

Alter a single entity in spaCy

Is it possible to change a single entity in spaCy? I have some Doc objects in a list, and some of the docs contain a 'FRAUD' label. However, I need to change a few of the 'FRAUD' entity labels to 'FALSE_ALARM'. I'm using spaCy's matcher to find the 'FALSE_ALARM' entities, but I can't override...
Xraycat922
1

votes
3

answer
45

Views

Concepts to measure text “relevancy” to a subject?

I do side work writing/improving a research project web application for some political scientists. This application collects articles pertaining to the U.S. Supreme Court and runs analysis on them, and after nearly a year and a half, we have a database of around 10,000 articles (and growing) to work w...
ecole96
1

votes
0

answer
5

Views

Is it possible to load a pre-trained model into spaCy?

I want to be able to use spaCy's full functionality, but my language of choice does not currently have its own model in spaCy. By full functionality, I mean the convolutional layer that is shared between the tagger, parser and NER, and being able to update all of these different models by extending the v...
user2827262
1

votes
2

answer
93

Views

Calculate TF-IDF for a single word in Textacy

I'm trying to use Textacy to calculate the TF-IDF score for a single word across the standard corpus, but am a bit unclear about the result I am receiving. I was expecting a single float which represented the frequency of the word in the corpus. So why am I receiving a list (?) of 7 results? 'accule...
ardochhigh
0

votes
0

answer
7

Views

Python wordcloud can't present Hebrew

I am trying to create a word cloud for a text in Hebrew. The text is: את הסיפור שלנו סיפרנו לעצמנו כל הזמן. בכפייתיות. בעל פה. לפעמים התעייפנו עוד לפני שהתחלנו ובכל זאת סיפרנו במשך שעות. הקשבנו רוב קש...
okuoub
2

votes
0

answer
17

Views

NLP - negative sampling - how to draw negative samples from noise distribution?

From my understanding, negative sampling randomly samples K negative samples from a noise distribution, P(w). The noise distribution is basically the frequency distribution + some modification on words. Typically we choose K = 5 ~ 20 negative samples. P(w) = Uw(w)^(3/4) / normalization_factor And I'...
Eric Kim
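
The noise distribution described in the excerpt above, P(w) proportional to the unigram count raised to the 3/4 power, can be sketched with the standard library alone. This is an illustration under stated assumptions, not the implementation of any particular word2vec library: the function names, the toy corpus, and the optional `exclude` parameter (used here to avoid drawing the target word) are all hypothetical.

```python
import random
from collections import Counter


def noise_distribution(tokens, power=0.75):
    """Map each word to count(word)**power, normalized to sum to 1."""
    counts = Counter(tokens)
    weights = {word: count ** power for word, count in counts.items()}
    total = sum(weights.values())
    return {word: weight / total for word, weight in weights.items()}


def draw_negatives(dist, k, exclude=(), rng=random):
    """Draw k words (with replacement) according to the noise distribution."""
    candidates = [word for word in dist if word not in exclude]
    weights = [dist[word] for word in candidates]
    return rng.choices(candidates, weights=weights, k=k)


tokens = "the cat sat on the mat the end".split()  # toy corpus
dist = noise_distribution(tokens)
negatives = draw_negatives(dist, k=5, exclude={"cat"}, rng=random.Random(0))
print(dist)
print(negatives)
```

Raising counts to the 3/4 power flattens the distribution: here "the" makes up 3/8 of the raw tokens but receives a smaller share of the noise probability, so rarer words are drawn somewhat more often than their raw frequency would suggest.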
1

votes
2

answer
667

Views

tf-idf with TfidfVectorizer on Japanese text

I am working with a huge collection of documents written in several languages. I want to compute cosine distance between documents from their tf-idf scores. So far I have: from sklearn.feature_extraction.text import TfidfVectorizer # The documents are located in the same folder as the script text_fi...
Edgar Derby
1

votes
2

answer
498

Views

Embedding in Keras

Which algorithm is used for embedding in Keras built-in function? Word2vec? Glove? Other? https://keras.io/layers/embeddings/
oren_isp
1

votes
2

answer
1.4k

Views

How to combine both word embeddings and pos embedding together to build the classifier

POS tags are labels such as 'NP' and 'VERB'. How can I combine these features with word2vec vectors? Like the following vectors? keyword V1 V2 V3 V4 V5 V6 corruption 0.07397 0.290874 -0.170812 0.085428 'VERB' 'NP' people ..............................
Wei Chen
1

votes
2

answer
63

Views

nltk.org example of Sentence segmentation with Naive Bayes Classifier: how does .sent separate sentences and how does the ML algorithm improve it?

There is an example in the nltk.org book (chapter 6) where they use a Naive Bayes classifier to classify a punctuation symbol as finishing a sentence or not finishing one... This is what they do: First they take a corpus and use the .sent method to get the sentences and build an index from them of whe...
Martin
1

votes
3

answer
50

Views

How can I solve a classification problem with a dependent variable with more than two values

I have a simple NLP problem, where I have some written reviews that have a simple binary positive or negative judgement. In this case I am able to train and test using, as independent variables, the columns of X that contain the 'bag of words', namely the single words in a sparse matrix. from sklearn.feat...
Drocchio
1

votes
2

answer
34

Views

Assign an ID based on keywords present in Tweets

I have extracted Tweets by feeding in 44 different keywords, and the output is in a file which consists of 400k tweets in total. The output file has tweets that contain the relevant keywords. How could I create a separate ID column which contains the keyword present in that tweet? Eg: The tweet is:...
Skurup
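
One way to approach the question above is to scan each tweet for every keyword and store the matches in a new column. A minimal sketch using only the standard library; the keywords, the sample tweets, and the helper name `matching_keywords` are made up for illustration, and whole-word, case-insensitive matching is an assumption about what counts as "present":

```python
import re


def matching_keywords(tweet, keywords):
    """Return the keywords that occur in the tweet as whole words, ignoring case."""
    found = []
    for keyword in keywords:
        pattern = r"\b" + re.escape(keyword) + r"\b"
        if re.search(pattern, tweet, flags=re.IGNORECASE):
            found.append(keyword)
    return found


keywords = ["flood", "earthquake"]  # hypothetical keyword list
tweets = [
    "Flood warning issued for the coast",
    "Earthquake and flood relief funds announced",
    "Nothing to see here",
]
# Each row pairs the tweet with a comma-separated ID column of matched keywords.
rows = [(tweet, ", ".join(matching_keywords(tweet, keywords))) for tweet in tweets]
for row in rows:
    print(row)
```

Keywords that begin with a non-word character such as `#` would need a different boundary rule than `\b`.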
1

votes
2

answer
105

Views

Predicting Missing Words in a sentence - Natural Language Processing Model [closed]

I have the sentence below: I want to ____ the car because it is cheap. I want to predict the missing word using an NLP model. Which NLP model should I use? Thanks.
Eliyah
0

votes
1

answer
13

Views

Text Classification Approach

I have data with 2 important columns, Product Name and Product Category. I wanted to classify a search term into a category. The approach (in Python using Sklearn & DaskML) to create a classifier was: Clean Product Name column for stopwords, numbers, etc. Create 90% 10% train-test split Convert text...
user519326
2

votes
0

answer
20

Views

Efficient way to replace incorrect words in Series of strings in Python

I'm working with handwritten text data, so it has lots of orthographic errors. I'm currently working with pyspellchecker to clean the data and I'm using the correct() method to find the most likely word when a word doesn't exist. My approach was to create a dictionary with all poorly written...
Juan C
1

votes
1

answer
34

Views

Unigram vs Bigram vs Posgram in Natural Language Processing

I want to know the meaning of unigram, bigram and posgram, and the differences between them. I have searched the Internet but I could not find a comprehensive answer. Any help would be very much appreciated.
Eliyah
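
To make the terms in the question above concrete: a unigram is a single token, a bigram is a pair of adjacent tokens, and, taking "posgram" to mean an n-gram built over part-of-speech tags rather than words (an assumption about the asker's usage), the same sliding window can be applied to a tag sequence. A minimal sketch; the sentence and its hand-assigned tags are illustrative, not the output of any tagger:

```python
def ngrams(items, n):
    """Return all runs of n adjacent items as tuples."""
    return [tuple(items[i:i + n]) for i in range(len(items) - n + 1)]


tokens = ["I", "want", "to", "buy", "the", "car"]
tags = ["PRON", "VERB", "PART", "VERB", "DET", "NOUN"]  # hand-assigned for illustration

unigrams = ngrams(tokens, 1)    # single words
bigrams = ngrams(tokens, 2)     # pairs of adjacent words
pos_bigrams = ngrams(tags, 2)   # pairs of adjacent POS tags

print(unigrams)
print(bigrams)
print(pos_bigrams)
```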
1

votes
4

answer
1.6k

Views

Online (preferably) lookup API of a word's class

I have a list of words and I want to filter it down so that I only have the nouns from that list of words (Using Java). To do this I am looking for an easy way to query a database of words for their type. My question is does anybody know of a free, easy word lookup API that would enable me to find...
Ben Page
1

votes
2

answer
1.5k

Views

How to measure Syntactic Similarity between a query and a document?

Is there a way to measure the syntactic similarity between a query (sentence) and a document (a set of sentences)?
hatemfaheem
1

votes
1

answer
128

Views

How to use Google Natural Language with Portuguese sentences, with gcloud CLI tool?

I used this command: 'gcloud ml language analyse-syntax --language=pt-br --content='Capítulo' and get this error: ERROR: (gcloud.ml.language.analyze-syntax) Failed to read command line argument [--content=Cap\xedtulo] because it does not appear to be valid 7-bit ASCII. gcloud ml language to be an...
Warley Andre
1

votes
0

answer
187

Views

NLTK: Auto suggestion for query completion using grammar

I want to implement Auto-Suggestion for question completion (Refer Section 3.2) for my FCFG using nltk APIs. E.g. Consider the following CFG grammar: S -> NP VP NP -> Det NN | PropN VP -> V NP | V V -> 'eats' | 'sleeps' | 'ate' Det -> 'a' | 'an' | 'the' NN -> 'police' | 'horse' | 'apple' | 'potato'...
Aman Gill
1

votes
1

answer
91

Views

How to train a machine to label individual words in a text

For a text (say): 'I am leaving India today. I am headed to USA for a week.' 'I am travelling from India to USA' I need to train the machine to label USA as 'Destination' and India as 'Source' I am using SpaCy's NER to extract the locations. How should I proceed to create a training set and train it...
Phoenix
1

votes
2

answer
221

Views

Google Natural Language Sentiment Analysis Aggregate Scores

In this part of the documentation of the Google Cloud Platform Natural Language API, it is described that The overall score and magnitude values for an entity are an aggregate of the specific score and magnitude values for each mention of the entity. I can't figure out how this aggregation works. In...
Michael
1

votes
0

answer
56

Views

Caption of images on wikipedia pages

I'm looking to check the caption (the text below each image) on a Wikipedia article. I wish to parse those strings (mostly using regex) and then, if a caption matches, save the link of that image. I've been importing wikipedia directly to parse text, but after looking around the net I saw I'd need a di...
someone1
1

votes
1

answer
397

Views

Google NLP authentication/call issue

I am working on an MVC web application that uses Google Natural Language Processing API to parse different input from users. I have successfully consumed and implemented the API operations and everything works fine as long as I run the application on my local machine. But as soon as I publish a vers...
Abdullah
1

votes
0

answer
110

Views

Stanford CoreNLP NER .NET gives different output from the Java version and the online demo

I am doing an NLP NER task using Stanford CoreNLP. While trying the .NET version, I noticed that its output is different from that of the online demo and the Java version (and those two are identical). Let's take an example of 'Obama was born on August 4, 1961, at Kapiolani Medical...
Ahmed Salah
1

votes
1

answer
482

Views

TypeError: '<' not supported between instances of 'NoneType' and 'str' using Pyner for named entity recognition

I am trying to pass an email string to Pyner to pull out all the entities into a dictionary. I can verify my setup works with this returning two PERSON entities import ner tagger = ner.SocketNER(port=9191, output_format='slashTags') t = 'My daughter Sophia goes to the university of California. Jame...
PeachyDinosaur
1

votes
0

answer
24

Views

How can I plug in my own NER into the Stanford NLP parser pipeline?

I am trying to provide parsing on some biodiversity literature but I have my own NER tool that I have developed to identify species names. I need to plug this into the parser pipeline somehow to enhance the dependency parsing but I am not sure how to go about it and haven't been able to find anythin...
Sandra Young
1

votes
0

answer
513

Views

Cosine Similarity and LDA topics

I want to compute cosine similarity between LDA topics. The gensim function matutils.cossim can do it, but I don't know which parameters (vectors) I can pass to this function. Here is a snippet of code: import numpy as np import lda from sklearn.feature_extraction.text import CountVectorizer cvec...
BARIK FATI
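
Independently of which library call is used, the quantity asked about above can be sketched directly. This assumes each topic is represented by its dense row of word probabilities (for example, a row of a topic-word matrix); the three toy topics below are made up, and `cosine_similarity` is a hand-written helper rather than a gensim function:

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length dense vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0  # convention chosen here for all-zero vectors
    return dot / (norm_u * norm_v)


# Toy topic-word matrix: one row per topic, one column per vocabulary word.
topic_word = [
    [0.5, 0.3, 0.2, 0.0],
    [0.4, 0.4, 0.1, 0.1],
    [0.0, 0.1, 0.2, 0.7],
]
similarities = [[cosine_similarity(a, b) for b in topic_word] for a in topic_word]
for row in similarities:
    print([round(value, 3) for value in row])
```

Topics 0 and 1 put most of their mass on the same words and therefore score close to 1, while topic 2 scores low against both.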
1

votes
1

answer
291

Views

Gensim word2vec most_similar filtering by # prefix

I have a word2vec model trained on twitter. I imported it into gensim using from gensim.models.keyedvectors import KeyedVectors word_vectors = KeyedVectors.load_word2vec_format('./twitter.txt', binary=False) I would like to use a function similar to this one: word_vectors.most_similar(positive=['w...
physicsmajor1
1

votes
1

answer
63

Views

Ambiguous Entity in Stanford NER

I am working with Stanford NER, and my question is about ambiguous entities. For example, I have 2 sentences: I love oranges. Orange is my dress code for tomorrow. How can I train on these 2 sentences so that the first orange is tagged as Fruit and the second orange as Color? Thanks
Deepa Huddar
1

votes
1

answer
44

Views

Gensim word embedding training with initial values

I have a dataset with documents separated into different years, and my objective is to train an embedding model for each year's data, while at the same time, the same word appearing in different years will have similar vector representations. Like this: for word 'compute', its vector in year 1 is [0...
richards
1

votes
1

answer
261

Views

Remove repeated n-grams from text with NLTK

I am starting to use nltk with python to analyse chat corpora. To start, I would like to identify the most used words, then I would like to use LDA to identify the topics of the conversations. I clean the texts as: stop = set(stopwords.words('english')) stop.update(['.', ',', ''', ''', '?', '!', ':',...
Titus Pullo
1

votes
1

answer
133

Views

Identify input text without any entities in IBM Watson Conversation

Is there a way with Watson to identify input.text's that have no entities at all? I don't need to know anything about the input.text, I just need to know if it does or doesn't have an entity.
Rica Gurgel
1

votes
0

answer
186

Views

How to generate word embeddings in Portuguese using Gensim?

I have the following problem: for English, my code successfully generates word embeddings with Gensim, and similar phrases are close to each other in terms of cosine distance: The angle between 'Response time and error measurement' and 'Relation of user perceived response time to error measurem...
Rubens_Zimbres
1

votes
0

answer
351

Views

Creating questions from sentences

I'm trying to create an algorithm that could transform sentences to questions. This is the code: def sentence_to_question(arg): hverbs = ['is', 'have', 'had', 'was', 'could', 'would', 'will', 'do', 'did', 'should', 'shall', 'can', 'are'] words = arg.split(' ') zen_sim = (0, '', '') for hverb in hver...
ShellRox
