DeepLearning4J - ParagraphVectors: Why is similarity negative?

I'm using the ParagraphVectors tool in the DeepLearning4J framework. I train a model on a set of text documents and then calculate the similarity between those documents.

According to the reference page (http://deeplearning4j.org/word2vec), the tool measures similarity with cosine similarity, which I expected to lie between 0 and 1. However, for some pairs of documents I get negative scores.

Can anybody tell me why that is?

Thank you in advance.

Answers


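By definition, cosine similarity lies in the range [-1, 1]. See https://en.wikipedia.org/wiki/Cosine_similarity

The [0, 1] range only holds when every vector component is non-negative, as with term-count vectors. Learned embeddings have signed components, so negative values are perfectly possible for w2v/d2v.

In practice, however, you will rarely see -1 or anything close to it.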
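
To make this concrete, here is a minimal sketch in plain Java that computes cosine similarity by hand. It does not use the DL4J API; the class name, the method name and the two small vectors are made up for illustration and merely stand in for learned document vectors. Because the components can be negative, the dot product (and therefore the similarity) can be negative as well.

```java
final class CosineSimilarityDemo {

    // Cosine similarity: dot(a, b) / (|a| * |b|). The result lies in [-1, 1].
    static double cosine(double[] a, double[] b) {
        if (a.length != b.length) {
            throw new IllegalArgumentException("vectors must have the same length");
        }
        double dot = 0.0;
        double normA = 0.0;
        double normB = 0.0;
        for (int i = 0; i < a.length; i++) {
            dot += a[i] * b[i];
            normA += a[i] * a[i];
            normB += b[i] * b[i];
        }
        if (normA == 0.0 || normB == 0.0) {
            throw new IllegalArgumentException("cosine similarity is undefined for a zero vector");
        }
        return dot / (Math.sqrt(normA) * Math.sqrt(normB));
    }

    public static void main(String[] args) {
        // Hypothetical document vectors with signed components (not real model output).
        double[] docA = {0.5, -0.2, 0.1};
        double[] docB = {-0.4, 0.3, 0.2};
        // The dot product is negative here, so the similarity is negative.
        System.out.println(cosine(docA, docB));

        // Non-negative vectors (e.g. term counts) can never give a negative score.
        double[] countsA = {1.0, 0.0, 2.0};
        double[] countsB = {0.0, 3.0, 1.0};
        System.out.println(cosine(countsA, countsB));
    }
}
```

A negative score simply means the two vectors point in roughly opposite directions in the embedding space. If you need scores in [0, 1], one common option is to rescale with (s + 1) / 2, though that changes the scale rather than the ranking.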

