Zipf's Law for Natural Cities Extracted from Location-Based Social Media Data
Independent thesis Advanced level (degree of Master (One Year)), 10 credits / 15 HE credits
Student thesis
Zipf’s law is one of the empirical statistical regularities found in many natural systems, ranging from the protein sequences of immune receptors in cells to the intensity of solar flares. Verifying the universality of Zipf’s law offers opportunities to identify commonalities among phenomena that exhibit power-law behavior. Since power-law-like phenomena are, as many previous studies have indicated, often interpreted as evidence of complex systems, exploring the universality of Zipf’s law may also help explain underlying generative mechanisms and endogenous processes, such as self-organization and chaos.
The main purpose of this study was to verify whether Zipf’s law holds for the sizes, numbers and populations of natural cities. Traditional city boundaries are derived from census-based, top-down imposed data and are therefore arbitrary and subjective. In contrast, this study delineated a new kind of city boundary, namely natural cities, by applying the head/tail breaks rule to location-based social media data from four sources: Twitter, Brightkite, Gowalla and Freebase. To capture and quantify the hierarchical levels across the heterogeneous scales of cities, the ht-index derived from the head/tail breaks rule was employed. The validity of Zipf’s law was then examined.
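The head/tail breaks rule and the ht-index mentioned above can be illustrated with a minimal sketch. It is not the thesis's implementation: the function names, the 40% head threshold (a commonly used convention) and the sample values are assumptions made for illustration. The ht-index is taken here as one plus the number of times the data can be split around its mean while the head remains a minority.

```python
from statistics import mean


def head_tail_breaks(values, head_limit=0.4):
    """Return the mean values used as break points by head/tail breaks.

    head_limit is an assumed threshold: splitting stops once the head
    (values above the mean) is no longer a clear minority of the data.
    """
    breaks = []
    data = list(values)
    while len(data) > 1:
        m = mean(data)
        head = [v for v in data if v > m]
        # Stop when nothing lies above the mean or the head is too large.
        if not head or len(head) / len(data) > head_limit:
            break
        breaks.append(m)
        data = head  # recurse into the head only
    return breaks


def ht_index(values, head_limit=0.4):
    """One plus the number of successful head/tail splits."""
    return len(head_tail_breaks(values, head_limit)) + 1


# Hypothetical city sizes: a few large values and many small ones.
sizes = [100, 50, 25, 12, 6, 3, 2, 1, 1, 1, 1, 1, 1, 1, 1, 1]
print(head_tail_breaks(sizes))
print(ht_index(sizes))
```

A higher ht-index indicates more hierarchical levels; a data set whose values are all equal has no head above the mean and yields an ht-index of 1.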
The results revealed that the natural cities deviated in subtle patterns when different social media data were examined. Using the ht-index calculated with the head/tail breaks method, the study found that the hierarchical levels were influenced less by spatio-temporal changes than by the data itself. The study also found that Zipf’s law is not universal when location-based social media data are used. A comparison with city numbers extracted from nightlight imagery indicated why Zipf’s law does not hold for location-based social media data: bias in customer behavior. Because of this bias, natural cities emerged much more frequently in certain regions and countries than in others, so their emergence was not represented objectively. Furthermore, the study showed that whether Zipf’s law can be observed depends not only on the data itself and man-made limitations but also on calculation methods, data precision and scale, and the idealized status of the observed data.
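Since the outcome depends on the calculation method, it may help to see one simple way a Zipf exponent can be estimated. The sketch below fits a least-squares line to log(size) against log(rank); this is only one of several possible estimators and is not claimed to be the one used in the thesis. The function name and the synthetic sizes are assumptions for illustration. Under Zipf's law the estimated exponent is close to 1.

```python
import math


def zipf_exponent(sizes):
    """Estimate the rank-size exponent by least squares in log-log space.

    Fits log(size) = c - alpha * log(rank) and returns alpha.
    """
    ranked = sorted((s for s in sizes if s > 0), reverse=True)
    n = len(ranked)
    if n < 2:
        raise ValueError("need at least two positive sizes")
    xs = [math.log(rank) for rank in range(1, n + 1)]
    ys = [math.log(size) for size in ranked]
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return -sxy / sxx  # negate the slope so the exponent is positive


# Synthetic sizes that follow Zipf's law exactly: size = 1000 / rank.
synthetic = [1000.0 / rank for rank in range(1, 51)]
print(round(zipf_exponent(synthetic), 3))
```

Least squares on ranked data is easy to compute but sensitive to the tail of small values, which is one reason different calculation methods can lead to different conclusions about whether the law holds.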
Place, publisher, year, edition, pages
2015. , 44 + appendixes p.
big data, location-based social media data, Zipf's law, power law, natural cities, ht-index
Earth and Related Environmental Sciences
Identifiers
URN: urn:nbn:se:hig:diva-19121
OAI: oai:DiVA.org:hig-19121
DiVA: diva2:796324
Subject / course
Geomatics – master’s programme (one year) (swe or eng)
2015-01-28, 13103, UNIVERSITY GÄVLE, GÄVLE, 14:00 (English)
Åhlén, Julia
Ma, Ding