How to figure out whether your data can be described by Zipf’s Law or not

It’s way harder than it should be to get Google to point you towards instructions for figuring out whether or not your data fits Zipf’s Law.  Since this blog is all about the effects of Zipf’s Law, this seems like a good place to publicize how to do that.  It turns out to be pretty easy, once you’ve learned what it is that you need to do!

1) You’ll want to use R and load the igraph package.

2) Put your data into a numeric vector.  I sorted mine, but as far as I can tell that shouldn’t be required.

3) Pass your vector to the power.law.fit() function (renamed fit_power_law() in newer igraph releases).

4) The output will include KS.stat, which is the Kolmogorov-Smirnov test statistic, and KS.p, which is the associated p-value.

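5) If your p-value is greater than .05, the test does not reject the hypothesis that your data could have been drawn from the fitted power law, so your data is consistent with it.  If it’s less than .05, the test rejects that hypothesis: your data does NOT fit the power law.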

For more information on igraph’s power.law.fit() function: http://igraph.org/r/doc/power.law.fit.html
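
Here is a minimal sketch of the steps above in R. The data is made up for illustration: word_counts is a hypothetical vector of simulated counts, not real corpus data, and the exponent 2.5 is an arbitrary choice. It assumes a version of igraph where the function is called fit_power_law(); on older versions, power.law.fit() takes the same vector. Depending on your igraph version, KS.p may only be reported when you ask for it, so check the help page for your installation.

```r
library(igraph)

# Hypothetical data: 1000 simulated counts drawn from a power law
# (exponent 2.5, minimum value 1) by inverse transform sampling,
# rounded down to whole numbers.
set.seed(42)
u <- runif(1000)
word_counts <- floor((1 - u)^(-1 / (2.5 - 1)))

# Fit a power law to the vector. Sorting beforehand is not done here.
fit <- fit_power_law(word_counts)

# Inspect the result: alpha is the fitted exponent, xmin the fitted
# lower cutoff, KS.stat the Kolmogorov-Smirnov test statistic.
print(fit$alpha)
print(fit$xmin)
print(fit$KS.stat)

# The p-value, if this igraph version reports it by default.
if (!is.null(fit$KS.p)) {
  print(fit$KS.p)
}
```

To try this on your own data, replace word_counts with your own numeric vector and leave the rest as it is.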
