CountVectorizer vs TfidfVectorizer

Machine learning models such as linear regression, logistic regression, and k-nearest neighbours take two inputs: a feature matrix X and a target variable y, both made up of numbers.

Text data does not come organised as a matrix or vector of real numbers, so we call it unstructured. It has to be converted into numbers before a model can use it.

CountVectorizer converts text into fixed-length vectors by counting how many times each word appears in each document. The result is a bag-of-words representation: word order is discarded and only the counts are kept.
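To make the counting idea concrete, here is a minimal pure-Python sketch of a bag-of-words. It is not scikit-learn's implementation: the function name bag_of_words, the two sample sentences, and the simple lowercase-and-split tokenization are all assumptions made for illustration, and CountVectorizer applies its own tokenization rules, so its output can differ.

```python
from collections import Counter


def bag_of_words(docs):
    """Return (vocabulary, count_matrix) for a list of strings.

    Hypothetical helper for illustration only.
    """
    # Assumption: lowercase and split on whitespace, a simplification
    # of what a real tokenizer does.
    tokenized = [doc.lower().split() for doc in docs]
    # A sorted vocabulary gives every document the same fixed-length layout.
    vocabulary = sorted({token for tokens in tokenized for token in tokens})
    matrix = []
    for tokens in tokenized:
        counts = Counter(tokens)  # missing words count as 0
        matrix.append([counts[word] for word in vocabulary])
    return vocabulary, matrix


docs = ["the cat sat", "the cat saw the dog"]
vocabulary, matrix = bag_of_words(docs)
print(vocabulary)  # ['cat', 'dog', 'sat', 'saw', 'the']
print(matrix)      # [[1, 0, 1, 0, 1], [1, 1, 0, 1, 2]]
```

Each row has one entry per vocabulary word, so every document ends up as a vector of the same length regardless of how long the original text was.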

Limitations:

CountVectorizer

TF-IDF often works better than CountVectorizer because it also takes the importance of each word into account.

Formula → The tf-idf weight is composed of two terms. The first is the normalized Term Frequency (TF): the number of times a word appears in a document divided by the total number of words in that document. The second is the Inverse Document Frequency (IDF): the logarithm of the number of documents in the corpus divided by the number of documents in which the term appears.
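The formula as described can be sketched in a few lines of plain Python. This is an illustration under stated assumptions, not TfidfVectorizer itself: the function name tf_idf, the sample sentences, the whitespace tokenization, and the choice of the natural logarithm are all mine. scikit-learn's TfidfVectorizer by default uses a smoothed IDF and normalizes each row, so its numbers will not match these.

```python
import math


def tf_idf(docs):
    """Return one {word: weight} dict per document.

    Hypothetical helper that follows the textbook formula only.
    """
    # Assumption: lowercase and split on whitespace.
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)

    # Document frequency: in how many documents does each word appear?
    doc_freq = {}
    for tokens in tokenized:
        for word in set(tokens):
            doc_freq[word] = doc_freq.get(word, 0) + 1

    weights = []
    for tokens in tokenized:
        doc_weights = {}
        for word in set(tokens):
            tf = tokens.count(word) / len(tokens)      # normalized term frequency
            idf = math.log(n_docs / doc_freq[word])    # natural log is an assumption
            doc_weights[word] = tf * idf
        weights.append(doc_weights)
    return weights


docs = ["the cat sat", "the cat saw the dog"]
weights = tf_idf(docs)
print(weights[0]["the"])  # 0.0, because "the" appears in every document
print(weights[0]["sat"])  # (1/3) * ln(2), since "sat" appears in only one document
```

A word that occurs in every document gets an IDF of zero and so contributes nothing, while a word confined to a single document is weighted up. That is the sense in which tf-idf accounts for the importance of a word rather than just its count.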

Limitations:

TfidfVectorizer
