Exact Probability Distribution versus Entropy
2014 (English). In: Entropy, ISSN 1099-4300, E-ISSN 1099-4300, Vol. 16, no. 10, pp. 5198-5210. Article in journal (Refereed). Published.
The problem addressed concerns the determination of the average number of successive attempts of guessing a word of a certain length consisting of letters with given probabilities of occurrence. Both first- and second-order approximations to a natural language are considered. The guessing strategy used is guessing words in decreasing order of probability. When word and alphabet sizes are large, approximations are necessary in order to estimate the number of guesses. Several kinds of approximations are discussed, demonstrating moderate requirements regarding both memory and central processing unit (CPU) time. When considering realistic sizes of alphabets and words (100), the number of guesses can be estimated within minutes with reasonable accuracy (a few percent) and may therefore constitute an alternative to, e.g., various entropy expressions. For many probability distributions, the density of the logarithm of probability products is close to a normal distribution. For those cases, it is possible to derive an analytical expression for the average number of guesses. The proportion of guesses needed on average compared to the total number decreases almost exponentially with the word length. The leading term in an asymptotic expansion can be used to estimate the number of guesses for large word lengths. Comparisons with analytical lower bounds and entropy expressions are also provided.
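For small alphabets and short words, the quantity discussed in the abstract, the average number of guesses under the decreasing-probability strategy, can be computed exactly by brute force. The following sketch assumes a first-order model (i.i.d. letters) and hypothetical letter probabilities; it enumerates all words, sorts their probabilities in decreasing order, and forms the rank-weighted sum. It is an illustration of the quantity being estimated, not the paper's approximation method.

```python
import itertools
import math

def average_guesses(probs, length):
    """Exact expected number of guesses for a word of the given length,
    with i.i.d. letters drawn from `probs` (first-order approximation),
    guessing words in decreasing order of probability."""
    # Probability of each word is the product of its letter probabilities.
    word_probs = sorted(
        (math.prod(p) for p in itertools.product(probs, repeat=length)),
        reverse=True,
    )
    # Expected guesses: sum over ranks k of k * P(word ranked k).
    return sum(k * p for k, p in enumerate(word_probs, start=1))

# Hypothetical three-letter alphabet:
probs = [0.5, 0.3, 0.2]
print(average_guesses(probs, 3))  # average guesses out of 27 possible words
```

This enumeration grows exponentially with word length, which is precisely why the paper develops approximations for realistic alphabet and word sizes.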
Place, publisher, year, edition, pages
Basel, Switzerland: MDPI AG, 2014. Vol. 16, no. 10, pp. 5198-5210.
Keywords: information entropy, security, guessing
Research subject Computer Science
Identifiers
URN: urn:nbn:se:kau:diva-34247
DOI: 10.3390/e16105198
ISI: 000344459500003
OAI: oai:DiVA.org:kau-34247
DiVA: diva2:754175