How to figure out whether your data can be described by Zipf’s Law or not

It’s way harder than it should be to get Google to point you towards instructions for figuring out whether or not your data fits Zipf’s Law. Since this blog is all about the effects of Zipf’s Law, this seems like a good place to publicize how to do that. It turns out to be pretty easy, once you’ve learned what it is that you need to do!

1) You’ll want to use R and load the igraph package: run install.packages("igraph") once to install it, then library(igraph) in each session.

2) Put your data into a numeric vector. I sorted mine, but that shouldn’t be required, since the fit depends only on the values and not on their order.

3) Pass your vector to the power.law.fit() function. (Newer igraph releases call it fit_power_law().)

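4) The output will include KS.stat, which is the test statistic for the Kolmogorov-Smirnov test, and KS.p, which is the associated p-value. (In newer igraph releases you may have to ask for the p-value explicitly with p.value = TRUE.)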

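5) If the p-value is less than .05, the test rejects the hypothesis that your data could have been drawn from the fitted power law, so your data does NOT fit the power law. If it’s greater than .05, the test can’t rule the power law out, which is what you’ll see when your data DOES fit (although a large p-value isn’t proof all by itself).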
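
Here’s a rough sketch of what those five steps can look like in an R session. Treat it as an illustration rather than the one right way to do it: the vector x is made-up data (random draws from a power law with exponent 2.5) standing in for your own, and the check on p.value is my assumption about how to cope with the difference between older igraph versions, which return KS.p automatically, and newer ones, which only compute it on request.

```r
library(igraph)

set.seed(42)

# Hypothetical stand-in for real data: 2000 draws from a continuous
# power law with exponent 2.5 and minimum value 1, generated by
# inverse-transform sampling. Replace x with your own numeric vector.
alpha_true <- 2.5
x <- (1 - runif(2000))^(-1 / (alpha_true - 1))

# Assumption: if the installed fit_power_law() has a p.value argument,
# KS.p has to be requested explicitly; otherwise it is returned anyway.
if ("p.value" %in% names(formals(fit_power_law))) {
  fit <- fit_power_law(x, p.value = TRUE)
} else {
  fit <- fit_power_law(x)
}

fit$alpha    # estimated exponent of the fitted power law
fit$xmin     # estimated lower cutoff; only values >= xmin are fitted
fit$KS.stat  # Kolmogorov-Smirnov statistic (smaller means a closer fit)
fit$KS.p     # p-value; below .05 rejects the power-law hypothesis
```

Note that the fit only covers the part of your data at or above fit$xmin, so it’s worth checking how much of your vector that actually includes.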

For more information on igraph’s power.law.fit() function: http://igraph.org/r/doc/power.law.fit.html
