arxiv:math/0204078v1 [math.gr] 7 apr 2002 · 2018-10-29 · arxiv:math/0204078v1 [math.gr] 7 apr...

22
arXiv:math/0204078v1 [math.GR] 7 Apr 2002 Contemporary Mathematics Measuring sets in infinite groups Alexandre V. Borovik, Alexei G. Myasnikov, and Vladimir Shpilrain Abstract. We are now witnessing a rapid growth of a new part of group the- ory which has become known as “statistical group theory”. A typical result in this area would say something like “a random element (or a tuple of elements) of a group G has a property P with probability p”. The validity of a statement like that does, of course, heavily depend on how one defines probability on groups, or, equivalently, how one measures sets in a group (in particular, in a free group). We hope that new approaches to defining probabilities on groups outlined in this paper create, among other things, an appropriate framework for the study of the “average case” complexity of algorithms on groups. Contents 1. Introduction 1 2. Conditions on a measure 3 3. Atomic probability measures 4 4. Kolmogorov complexity functions 7 5. Kolmogorov complexity functions on finitely generated groups 9 6. Short elements bias and behaviour at infinity 9 7. Degrees of polynomial growth “on average” 11 8. Measures generated by random walks 13 9. Behaviour of the induced measures on finite factor groups 14 10. The growth function and asymptotic density 17 References 21 1. Introduction A new part of group theory, often called “statistical group theory”, is becoming increasingly popular since it connects group theory to other areas of science, most of all to statistics and to theoretical computer science. A typical result in this area would say something like “a random element (or a tuple of elements) of a group G has a property P with probability p” (see e.g. [1], [5], [18]). The validity of a statement like that does, of course, heavily depend on 1991 Mathematics Subject Classification. Primary 20E05, secondary 60B15. The first author was supported by the Royal Society and The Leverhulme Trust. c 0000 (copyright holder) 1

Upload: others

Post on 17-Mar-2020

11 views

Category:

Documents


0 download

TRANSCRIPT

arX

iv:m

ath/

0204

078v

1 [

mat

h.G

R]

7 A

pr 2

002

Contemporary Mathematics

Measuring sets in infinite groups

Alexandre V. Borovik, Alexei G. Myasnikov, and Vladimir Shpilrain

Abstract. We are now witnessing a rapid growth of a new part of group the-ory which has become known as “statistical group theory”. A typical result inthis area would say something like “a random element (or a tuple of elements)of a group G has a property P with probability p”. The validity of a statementlike that does, of course, heavily depend on how one defines probability ongroups, or, equivalently, how one measures sets in a group (in particular, in afree group). We hope that new approaches to defining probabilities on groupsoutlined in this paper create, among other things, an appropriate frameworkfor the study of the “average case” complexity of algorithms on groups.

Contents

1. Introduction 12. Conditions on a measure 33. Atomic probability measures 44. Kolmogorov complexity functions 75. Kolmogorov complexity functions on finitely generated groups 96. Short elements bias and behaviour at infinity 97. Degrees of polynomial growth “on average” 118. Measures generated by random walks 139. Behaviour of the induced measures on finite factor groups 1410. The growth function and asymptotic density 17References 21

1. Introduction

A new part of group theory, often called “statistical group theory”, is becomingincreasingly popular since it connects group theory to other areas of science, mostof all to statistics and to theoretical computer science.

A typical result in this area would say something like “a random element (or atuple of elements) of a group G has a property P with probability p” (see e.g. [1],[5], [18]). The validity of a statement like that does, of course, heavily depend on

1991 Mathematics Subject Classification. Primary 20E05, secondary 60B15.The first author was supported by the Royal Society and The Leverhulme Trust.

c©0000 (copyright holder)

1

2 ALEXANDRE V. BOROVIK, ALEXEI G. MYASNIKOV, AND VLADIMIR SHPILRAIN

how one defines probability on groups, or, equivalently, how one measures sets ina group. This is the problem that we address in the present paper; we feel that itdeserves some discussion at a general methodological level. A much more technicaldevelopment of some of the ideas can be found in [3].

Our approach to statistical group theory is formed by needs of practical com-putations with infinite groups. In particular, the starting point of our study ofmeasures on groups was the desire to identify measures which can be used in theanalysis of the behaviour of genetic algorithms on infinite groups [15, 16, 22].Hence one of our main requirements for a measure is that it reflects the nature ofalgorithms used for generating random or pseudo-random elements of a group. InSection 2 we discuss some other natural conditions which would make a measuresuitable for practical computations.

Since we focus mostly on finitely generated discrete groups, almost all measuresdefined in the present paper are atomic. Recall that a probabilistic measure on acountable set is atomic if every subset is measurable. Clearly, the latter conditionis natural in the context of computational group theory where we are restrictedto dealing with finite sets of elements. In Section 3, we suggest a simple (perhapseven naive) general approach to constructing atomic measures on countable groupsassociated with various length (or “complexity”) functions.

In Sections 4, 5, we look at the problem of measuring sets in infinite groupswhen pseudorandom elements of a group are generated by a deterministic pro-cess. This naturally leads to Kolmogorov complexity of words as a more adequateconcept of the “length function” on a free group. The invariance theorems forKolmogorov complexity provide for a uniform treatment of different ways to definecomputational complexity functions on groups.

Section 6 deals with the so-called short elements bias which occurs in any gen-erator of random elements of a group G with a given atomic probability distributionµ. This effect is based on a simple observation that the measure µ(S) of an infiniteset S ⊂ G is essentially defined by a few short elements from S. There are severalprincipal approaches to this problem. The most popular one suggests to considerrelative frequencies

ρk(S) =|S ∩Bk||Bk|

(where Bk is the ball of radius k in the Cayley graph of G with respect to a givenset of generators) and their behaviour at “infinity” when k → ∞. This leads to thenotion of asymptotic density of S defined as the following limit (if it exists):

ρ(S) = limk→∞

|S ∩Bk||Bk|

.

Another way to avoid the short elements bias is to replace a single measure µ onG by a parametric family of distributions {µL} in which L is the average length ofelements of G relative to µL. In this case the asymptotic behaviour of the set S isdescribed by the limit

µ∞(S) = limL→∞

µL(S).

We discuss asymptotic densities in the last section and refer to [3] for a detaileddiscussion of the second method.

In Section 7, we show how a specific choice of the probability distribution on afree group allows one to introduce degrees of polynomial growth “on average” for

MEASURING SETS IN INFINITE GROUPS 3

functions on groups. In particular, it is a possible way for introducing hierarchiesof the average case complexity of various algorithms for infinite groups, makingmeaningful statements like “the algorithm works in cubic time on average”.

Section 8 discusses probability measures generated by random walks on theCayley graph of a free group Fn. This class of measures is well suited for analysisof randomized algorithms on groups and behaves nicely with respect to taking finitefactor groups.

In Section 10, we analyze some limitations of the asymptotic density as a tool formeasuring sets in infinite groups. The most important one is that it is not sensitiveenough, i.e., many sets of intuitively different sizes have the same asymptotic density(usually 1 or 0). Indeed, in [26], Woess proved that every normal subgroup ofinfinite index in a free group Fn, n > 2, has asymptotic density 0.

Another disappointment is offered by Theorem 10.4 which states that the setof primitive elements of a free group Fn, n > 2, has asymptotic density 0. In fact,our proof provides lower and upper bounds for the relative frequencies of the set ofprimitive elements of Fn, which is a result of independent interest.

One way around this problem is to consider the so-called growth rate of therelative frequency defined by

γ(S) = lim supk→∞

k√

ρk(S).

This is a useful characteristic of growth of the set S and it is more sensitive thanthe asymptotic density ρ(S) (see Section 10.2).

On the other hand, the function γ(S) is not even additive, which makes itdifficult to use as a measuring tool.

It would be very interesting to check whether sets having the same asymptoticdensities (see, for example, [1], [5], [18]) will show different sizes with respect to amore sensitive and adequate measure.

2. Conditions on a measure

Let G be a group (finite or countable infinite). A probabilistic measure µ on Ghas to satisfy the standard axioms of a probability space.

Recall that a probability space is a set X together with a σ-algebra of subsetsA of X and a probability measure µ : A −→ R which is a real-valued functionsatisfying the following axioms:

(M1) For any set S ⊆ A, µ(S) > 0.(M2) µ(X) = 1.(M3) If Si, i ∈ I, is a countable collection of pairwise disjoint sets from A, then

µ

(

i∈I

Si

)

=∑

i∈I

µ(Si).

Analysis of behaviour of randomized algorithms on groups requires estimationof the probability of a given element. A measure µ on X is atomic if it satisfies thefollowing condition

(M4) X is countable and every subset S ⊆ X is µ-measurable.

In this paper, we shall consider mostly atomic measures on a free group Fn

of rank n with basis {x1, . . . , xn}. Note that since every coset with respect to a

4 ALEXANDRE V. BOROVIK, ALEXEI G. MYASNIKOV, AND VLADIMIR SHPILRAIN

subgroup H < Fn is measurable, this defines an induced measure on the factor setFn/H ; this new induced measure is also atomic.

A question that might arise (and which is usually addressed in the theoryof growth of groups) is whether or not a measure should depend on a particularbasis of Fn. Independence of a basis means invariance of the measure under theaction of AutFn. In the case of atomic measures, this implies that infinitely manysingletons {gh}, h ∈ Fn, have the same measure as {g} does, which implies µ(g) = 0.Hence the only (AutFn)-invariant atomic measure on Fn is the singular measureconcentrated at the origin 1, and atomic measures necessarily depend on the choiceof a basis in Fn.

Finally, we record several informal conditions on a measure which will guide usin evaluating behaviour of various measures:

(C1) a measure should be natural, i.e., it should meet our expectations of whatthe sizes of various sets in a group are.

(C2) a measure should be sensitive, i.e., sets that (intuitively) seem to be ofdifferent sizes, should have different measures.

(C3) µ(S) (or, at least, rather tight bounds for µ(S)) should be easily computablefor “natural” sets S ⊆ Fn.

(C4) a measure should admit a natural generator of random elements in the group.

In the rest of the paper we consider various approaches to constructing measureson groups, checking them against our list of conditions (C1)–(C4).

3. Atomic probability measures

In this section, we discuss a general method which allows one to define atomicprobabilistic measures on countable groups. Our definition of a measure is based ona notion of “complexity” of elements of G and a probability distribution on the setof all non-negative integers N . These are the initial basic objects which determinethe intrinsic behaviour of the corresponding measure.

Depending on a problem at hand, one can use different types of complexities,for example, it can be a descriptive complexity, a computational complexity, theminimal length of the word representing an element with respect to a given gener-ating set, or the length of the normal form of an element. In general, a complexityfunction (or a complexity) on a group G is an arbitrary non-negative integral func-tion c : G −→ N such that for every n ∈ N the preimage c−1(n) is a finite subsetof G. Note that a group G with a complexity function is countable. At the endof the section we discuss more general complexity functions which allow elementsof G to have infinite complexity and elements with real value complexities (in thiscase the group G can be uncountable).

The most important example of a complexity function is the length function ona group. Let G be a finitely generated group with a given finite set of generatorsS ⊆ G. For an element g ∈ G by lS(g) we denote the minimal non-negative integern such that g = y1 . . . yn for some yi ∈ S ∪ S−1. If H is a subgroup of G then therestriction of lS on H gives rise to a new complexity function on H , which mightnot be a length function on H .

In Section 4 we consider another important type of complexity functions on G,so-called Kolmogorov complexity functions.

MEASURING SETS IN INFINITE GROUPS 5

For a given complexity function c on G and a discrete probability distributionon non-negative integers N , given by a density function d : N −→ R, we are goingto construct an atomic measure µc,d on G.

Recall that, for an atomic measure on a countable set X , the value of µ on anarbitrary subset S ⊂ X is defined uniquely by the formula

µ(S) =∑

w∈S

µ(w).(3.1)

This shows that an atomic measure µ is completely determined by its values onsingletons {w}, w ∈ X. In other words, to define µ it suffices to define a functionp : X −→ R (and put µ(w) = p(w)), which is called a probability mass function ora density function on X , such that:

p(w) > 0 for all x ∈ X,∑

w∈X

p(w) = 1.

Now we put forward one more condition on the measure µ which ties µ to a com-plexity function c : G → N :

(C5) for any u, v ∈ G, c(u) = c(v) implies µ(u) = µ(v).

We call a measure µ on G c-invariant if µ satisfies (C5), i.e., elements of the samec-complexity have the same measure.

The following result describes c-invariant measures on G. For k ∈ N define thek-sphere Ck with respect to a complexity c as follows:

Ck = {w ∈ G | c(w) = k}.Similarly, by Bn we denote the disc or the ball of radius n with respect to c:

Bn = {w ∈ G | c(w) 6 n}.

Lemma 3.1. Let µ be an atomic measure on G and c a complexity function onG. Then:

(1) If µ is c-invariant, then the function dµ : N −→ R defined by

dµ : k −→ µ(Ck)

is a probability measure on N ;(2) if d : N −→ R is a probability measure on N , then the function pc,d : G −→

R defined by

pc,d(w) =d(c(w))

| Cn |is a probability function which gives rise to an atomic c-invariant measureon G.

Proof is obvious. �

Remark 3.2. If c(w) is the length of an element w ∈ G with respect to a givenfinite set of generators of G, then c-invariant measures play an important role inasymptotic group theory. We will call such measures homogeneous.

Since d(k) −→ 0 as k −→ ∞, the following elementary but fundamental prop-erty of µc,d holds.

6 ALEXANDRE V. BOROVIK, ALEXEI G. MYASNIKOV, AND VLADIMIR SHPILRAIN

More complex (with respect to a given complexity c) elements of a setS ⊂ X contribute less to µc,d(S).

This discussion shows that in defining the probability measures µc,d on G,everything boils down to the problem of choosing a complexity function c : G −→ Nand a moderating distribution d : N −→ R. This choice might depend on aparticular problem one would like to address. To that end, it seems reasonable to:

(a) use complexity functions on G which reflect intrinsic features of the problemat hand;

(b) use probability distributions on N which reflect the nature of the processwhich generates (pseudo)random elements of the group, or which allow tostudy the statistical properties of the group in a meaningful and computa-tionally feasible way.

We discuss possible complexity functions on groups in the next section. Hereare some well-established parametric families of density functions on N that wehave considered in this framework:

(i) The Poisson density:

dλ(k) = e−λλk

k!

(ii) The exponential density:

dλ(k) = (1− e−λ)e−λk

(iii) The standard normal (Gauss) density:

da,σ(k) = b · e− (k−a)2

σ

(iv) The Cauchy density:

dλ(k) = b · 1

(k − λ)2 + 1

(v) The Dirac density:

dm(k) =

{

1 if k = m0 if k 6= m

The following density function depends on a given complexity function c on G:(vi) The finite disc uniform density:

dm(k) =

{

|Ck||Bm| if k ≤ m

0 otherwise

For example, the Cauchy density functions are very convenient for definingdegrees of polynomial growth “on average”, while the exponential density functionsare suitable for defining degrees of exponential growth “on average” (see Section 7).

On the other hand, different distributions arise from different random genera-tors of elements in a group. For example, Dirac densities correspond to the uniformrandom generators on the spheres Cn; finite disc densities arise from uniform ran-dom generators on the discs Bn. Measures on the free group Fn associated withthe exponential moderating distribution on N have especially nice properties andare studied in some detail in [3].

MEASURING SETS IN INFINITE GROUPS 7

Remark 3.3. Sometimes we allow for some elements in the set X to haveinfinite complexity ∞. In this case we consider complexity functions of the typec : X −→ N ∪ {∞} such that c−1(n) is a finite subset of X for every integer n (sothat we may have infinitely many elements in X with complexity ∞). Followingour requirement that more complex elements contribute less to the measure, wedefine the probability density function p on X for elements with finite complexity,and for elements x ∈ X of infinite complexity we just put p(x) = 0.

4. Kolmogorov complexity functions

4.1. Kolmogorov complexity functions on free groups. In this sectionwe discuss complexity functions on free groups. We are going to represent elementsof free groups as reduced words in a given basis, so the starting point of thisdiscussion is complexity of words.

Let A be a finite alphabet and A∗ the set of all finite words in A. The length|w| of a word w = x1 . . . xn, xi ∈ A, is equal to n. The length function l : w −→ |w|maps A∗ into N . Observe that l−1(n) = |A|n. It follows that

• The length function l : A∗ −→ N is a complexity function on A∗.

This is one of the basic complexity functions that we will consider in this paper.The problem however is that some very long words in A∗ do not look very complex.

For example, the word a10100

, where a ∈ A, has length 10100, but this word is veryeasy to describe. This leads to the notion of Kolmogorov complexity, or descriptivecomplexity, or sometimes it is called algorithmic complexity. We refer to [14], [25]for detailed treatment of Kolmogorov complexity.

Intuitively, Kolmogorov complexity of a word w ∈ A∗ is the minimum possiblesize (length) of a description of w with respect to a given formal general procedure.For example, one may think of the descriptive complexity of w as of the the mini-mum possible size of a program, in a given programming language, which produces

w after finitely many steps. In this case, the word a10100

above will have complexitymuch less than 10100.

Now we give a formal definition of Kolmogorov complexity.It is sufficient to consider programs of a very particular type, say, Turing ma-

chines. Denote by B = {0, 1} the standard binary alphabet and by B∗ the set ofall finite binary strings. Every Turing machine M determines a partial (perhapsempty) recursive function fM : B∗ −→ A∗. The function fM is defined on x ∈ B∗

if and only if the machine M starts on the tape with a word x, halts in finitelymany steps, and a word w ∈ A∗ is written on the tape. In this event, fM (x) = w.Denote by sM (x) and tM (x), respectively, the tape space and the number of stepsneeded for M to write w and halt.

The Kolmogorov complexity of a word w ∈ A∗ with respect to a given machineM is defined as follows:

KM (w) =

{

min{|x| | x ∈ B∗ & fM (x) = w} if such x exists∞ otherwise

Similarly, one can define time bounded and space bounded Kolmogorov com-plexities of a word w ∈ A∗. Namely, for a Turing machine M and a function

β : N −→ N

8 ALEXANDRE V. BOROVIK, ALEXEI G. MYASNIKOV, AND VLADIMIR SHPILRAIN

(a bound) define the space bounded and time bounded Kolmogorov complexitiesKSM (x, β) and KTM (x, β) (with respect to the bound β) as follows:

KSM (w, β) =

{

min{|x| | x ∈ B∗ & fi(x) = w & sM (x) 6 β(|x|)} if such x exists∞ otherwise

KTM(w, β) =

{

min{|x| | x ∈ B∗ & fi(x) = w & tM (x) 6 β(|x|)} if such x exists∞ otherwise

These definitions depend on a given Turing machine M . It turns out that auniversal Turing machine M provides an optimal notion of Kolmogorov complexity.Namely, the following invariance theorem holds.

Invariance Theorem I ([21] [13], [4]). There exists a Turing machine U suchthat for any Turing machine M and for any word w ∈ A∗ the following inequalityholds:

KU (w) 6 KM (w) + CM ,

where CM is a constant which does not depend on w.

Similar results hold for space bounded and time bounded Kolmogorov com-plexities.

Invariance Theorem II [11]. There exists a Turing machine U such that for anyTuring machine M , for any bound β : N −→ N , and for any word w ∈ A∗ thefollowing inequalities hold:

KSU (w,CMβ) 6 KM (w, β) + CM ,

KTU (w,CMβ log(β) 6 KM (w, β) + CM ,

where CM is a constant which does not depend on w.

The theorems above allow one to considerKU (w) as the Kolmogorov complexityK(w) of a word w ∈ A∗, were U is an arbitrary fixed optimal machine.

We will use these different types of Kolmogorov complexities in constructingmeasures on a free group. But first, two remarks are in order (a bad news and agood news).

Remark 4.1. The function w −→ K(w) is not recursive. Thus, we cannoteffectively compute the Kolmogorov complexity of a given word. However, it turnsout that we can estimate K(w) by the length of w. This shows that the length |w|as a complexity of w is back in the game.

Remark 4.2. There exists a constant C such that for any word w ∈ A∗

K(w) 6 |w| log2 |A|+ C.

Actually, there are better estimates which will be discussed elsewhere.

Now we define Kolmogorov complexity of an element of a free group F = F (X)with a basis X . Let X−1 = {x−1 | x ∈ X} and A = X ∪ X−1. For an elementf ∈ F denote by wf the unique reduced word in the alphabet A which representsf .

Now the Kolmogorov complexity K(f,X) of f , with respect to the basis X , isdefined as

K(f) = K(wf ).

MEASURING SETS IN INFINITE GROUPS 9

It is easy to see that for different bases X and Y of F , the correspondingKolmogorov complexities are equivalent up to some additive constants, i.e., thereare positive integers C1 and C2 such that for any f ∈ F ,

K(f, Y )− C1 6 K(f,X) 6 K(f, Y ) + C2.

This allows us to fix an arbitrary basis of the free group F and consider all theKolmogorov complexities with respect to this particular basis.

5. Kolmogorov complexity functions on finitely generated groups

Let G be a group generated by a finite set X . There are several ways ofintroducing Kolmogorov complexity on G.

Method I. Let F (X) be a free group on X and let η : F (X) −→ G be thecanonical epimorphism. For an element g ∈ G define the Kolmogorov complexityK(g,X) of g with respect to X as follows

K(g,X) = min{K(w,X) | η(w) = g},i.e., the Kolmogorov complexity of g ∈ G is the minimum of Kolmogorov complex-ities of representatives of g with respect to X .

As we have already seen, K(g,X) can be estimated from above by the geodesiclength of g in the Cayley graph of G with respect to the set X of generators. It isextremely difficult to deal with this type of complexity if the set of such geodesics isitself very complex. In this event, it might be useful to consider some special normalforms of elements. For example, this is the case for braid groups with respect tothe set of Artin generators.

Method II. Suppose the group G has a set of normal forms V , i.e., there existsa subset V ⊂ F such that η |V is one-to-one. Then one can define Kolmogorovcomplexity of g ∈ G with respect to V as

K(g, V ) = K(η−1|V (g)).The following method provides an average Kolmogorov complexity with respect

to a given measure on F .

Method III. Let η : F −→ G be as above. Let d : N −→ R be a probability onN and c : F −→ N a Kolmogorov complexity function on F . As we have discussedabove, there exists a probability measure µ = µc,d on the group F . Define anaverage Kolmogorov complexity of g ∈ G with respect to d, c, and X as follows:

KA(g, d, c,X) =

w∈η−1(g) c(w)µc,d(w)∑

w∈η−1(g) µc,d(w).

6. Short elements bias and behaviour at infinity

In this section we discuss how to deal with short elements bias in an infinitegroup.

Let G be a finitely generated group with a finite set X of generators. Forsimplicity we will discuss only measures corresponding to the length function lXon G, but similar arguments can be applied for other complexity functions as well.Let

Cn = {g ∈ G | lX(g) = n}, Bn = {g ∈ G | lX(g) 6 n}Let µ be an atomic function on G. Since

µ(Ck) = 1, we have µ(Ck) −→ 0as k −→ ∞. Therefore elements of bigger length in a set S ⊆ G contribute less

10 ALEXANDRE V. BOROVIK, ALEXEI G. MYASNIKOV, AND VLADIMIR SHPILRAIN

to µ(S), i.e., we witness the short elements bias. On one hand, this meets ourintuitive expectations because, technically, random elements of a very big lengthare inaccessible to us, e.g., no computer can generate for us a random element oflength > 10100. On the other hand, it may happen that only a few short elementsessentially define the measure of an infinite set.

There are several approaches to deal with the short elements bias.Method I.In practical computations with groups, when we wish to evaluate performance

of a given algorithm A, a typical solution to this problem is the following. Choosea sufficiently big random positive integer n (or several of them), generate (pseudo)randomly and uniformly enough elements of length n, and run your algorithm onthe produced inputs. The choice of n usually depends on the computer resourcesand the hardness of the algorithm. Mathematically this can be modelled by theprobability distribution µlX ,dn with Dirac density dn. The only problem is thechoice of n. Theoretically, to avoid the bias toward short elements, one has to takethe limit when n → ∞. More precisely, let R be a subset of G. Denote by ρn(R)the measure of R with respect to Dirac density concentrating at n:

ρ(s)n (R) =|R ∩ Cn||Cn|

.

Then the asymptotic behaviour of the set R (with respect to Dirac densities) canbe characterized by the following limit (if it exists):

ρ(s)(R) = limn→∞

ρ(s)n (R)

This limit is called the shperical asymptotic density of the set R.Similarly, if we measure R with respect to the disc uniform density function

(with respect to the complexity function lX):

ρ(d)n (R) =|R ∩Bn||Bn|

,

and then take the limit (if it exists)

ρ(d)(R) = limn→∞

ρ(d)n (R)

then we get the disc asymptotic density of the set R.In Section 10 we discuss asymptotic densities in detail, here it is worthwhile

to mention only that these characteristics are not sensitive enough to distinguishvarious subsets of groups. For example, all subgroups of infinite index have thesame asymptotic density (disc or spherical) equal to 0.

Method II.Let µ be a probability distribution on G. Then the mean length Lµ,X of ele-

ments of G with respect to µ and the set X of generators is the expected value ofthe length function lX :

Lµ,X =∑

g∈G

lX(g)µ(g).

For example, if G = F (X) and a measure µλ = µlX ,dλis given on F (X) by the

exponential density:

dλ(k) = (1− e−λ)e−λk

MEASURING SETS IN INFINITE GROUPS 11

then the mean length Lλ of words in F distributed according to µλ is equal to

Lλ =∑

w∈F

|w|µ(w) = (1− e−λ)

∞∑

k=1

k(e−λ)k =

=(1− e−λ)

∞∑

k=1

k(e−λ)k−1 =1

(eλ − 1)eλ.

Hence Lλ → ∞ when λ → 0. Thus, we have

a family of probabilistic distributions {µλ}λ with parameter λ suchthat the mean length Lλ tends to ∞ when the parameter λ approaches0.

Similar results hold for all other distributions introduced in Section 3 (afterrenormalizing the parameters).

Now one can measure the behaviour of R at infinity by the following limit (ifit exists)

µ∞(R) = limλ→0

µλ(R).

A study of limits of this type was initiated in [3].Note that if {µk}k is the Dirac parametric family of distributions, then

µ∞(R) = ρ(s)(R),

and if {µk}k is the parametric family of disc uniform distributions, then

µ∞(R) = ρ(d)(R).

7. Degrees of polynomial growth “on average”

Let c : Fn −→ R be a complexity function and µ the measure moderated bythe Cauchy distribution

d(k) =6

π2· 1

k2.

The advantage of this distribution is that it allows to measure degrees of polynomialgrowth of functions on Fn “on average” in the following sense.

Let f : Fn −→ R be a non-negative real valued function. We say that f haspolynomial growth of degree α > 0 if α is the greatest lower bound of the set of

real positive numbers β such that the mean value of f(w)c(w)β−1 is finite, that is,

α = inf

{

β

w∈Fn

f(w)

c(w)β−1µc,d(w) converges

}

.(7.1)

It follows immediately from the construction of our measure that the function c(w)m

has growth of degree m. (Just recall that the series∑

1n1+ǫ converges for all ǫ > 0,

while the harmonic series∑ 1

n diverges.)In particular, if f(x) is the running time of some algorithm with input x, this

definition allows us to make meaningful statements like “the algorithm works incubic time on average”.

Now we are going to generalize this situation and try to find sufficient conditionsto define the degrees of growth of functions on an arbitrary infinite factor group ofFn.

12 ALEXANDRE V. BOROVIK, ALEXEI G. MYASNIKOV, AND VLADIMIR SHPILRAIN

Let G = Fn/R be an infinite factor group of the free group Fn and η : Fn −→ Gthe natural homomorphism. We shall list some natural conditions for the complex-ity function c on Fn and the moderating probability distribution d which allow usto define degrees of the average growth of functions on G.

(D1) (Existence of a complexity function on G.) The mean complexity

c(g) =

w∈g c(w)µc,d(w)∑

w∈g µc,d(w)

is finite for every g ∈ G.

(D2) (Existence of degrees.) There exists a positive number τ such that the series

g∈G

c(g)aµ(g)

converges for a < τ and diverges for a > τ ; µ here is the measure on Ginduced by the measure µc,d on Fn.

If these conditions are satisfied, we can define the degree of growth of an arbitrarynonnegative real valued function f(g) on G as

(degree of growth of f) = inf

β

g∈G

f(g)

c(g)β−τµ(g) converges

.(7.2)

Now we at least have the (easy to check) property that the degree of growth of thefunction c(g)m is m.

Note that not every moderating probability distribution d on N ∪{0} is suitablefor defining degrees of polynomial growth. For example, if we take in the definitionof the degrees of growth on the free group Fn, the exponential distribution

d(k) = (1 − e−λ) · e−λk,

we note that the series∑

x∈Fn

c(x)aµc,d(x)

converges for all a.

Question 7.1. Can one find a moderating probability distribution d such thatthe conditions D1 and D2 are satisfied for every factor group G = Fn/R of infiniteindex and the measure µ on G induced from µc,d?

The Cauchy distribution d(k) = 6π2 · 1

k2 is still on the list of candidates for theaffirmative answer.

We need to warn that the degree of growth of a function f onG is not necessarilyequal to the degree of growth of its lift f ◦ η : Fn −→ R. It might happen that,for certain problems, it is much more convenient to work in a free group than inits factor group G and evaluate the degrees of growth of the lifted function f ◦ ηinstead of those of f . Therefore general results concerning relations between degreesof growth on Fn and G = Fn/R would be rather interesting.

MEASURING SETS IN INFINITE GROUPS 13

8. Measures generated by random walks

8.1. Random walks. An interesting and, in some cases, easier to analyzeclass of measures is related to random walks on Cayley graphs of groups. From analgorithmic point of view, the most natural way to produce a random element ina finitely generated group is to first make a random word on the generators, andthen apply relations. This is, in disguise, a random walk on the Cayley graph ofa given group. This becomes especially relevant when hardware random numbersgenerators are used to produce random words.

Let us look at this basic procedure in more detail.Let Fn be a free group on free generators x1, . . . , xn. We can associate with it a

free monoid Mn with the generators x1, X1, . . . , xn, Xn and the natural homomor-phism π : Mn −→ Fn which sends xi to xi and Xi to x−1

i . A random walk of lengthl on the Cayley tree of Fk which starts at the identity 1 is naturally described bya word of length l in Mn.

In essence, we take Mn, not Fn, as the ambient algebraic structure, and in-troduce measures and complexity functions on Mn rather than on Fn. There isa compelling evidence that, in at least some problems, it might be convenient towork in the ambient free monoid Mn. For example, it appears that physicists preferto use the random walk approach in their study of statistics of braids, knots andheaps, the latter being closely related to locally free groups

LFn = 〈f1, . . . , fn | [fi, fj] = 1 for |i − j| > 1〉(see e.g. [6, 7, 8, 23]). In any case, the physical process of the accumulation ofsoot (in the 2-dimensional case) is most naturally modeled by random walks on themonoid of positive words in LFn.

Let d : N ∪ {0} −→ R be a moderating probability distribution. Following theanalogy with our constructions for a free group, we introduce an atomic probabilitymeasure µ on Mn by assigning equal probabilities to words of equal complexity andthe total measure d(k) to the set of words of complexity k. This can be interpretedas running random walks on Fn of random complexities k distributed with theprobability density d(k). The probability for a random walk to stop at the elementx ∈ Fk defines a measure µ(x) on Fk. Obviously,

µ(x) =∑

π(w)=x

µ(w).

8.2. How is this new measure related to Kolmogorov complexitymeasures? The measure that we have just defined should have properties veryclose to Kolmogorov complexity measures described in Section 4. The word x =π(w) is obtained from a word w by cancelling all adjacent pairs of elements of theform xiXi, which amounts to running a certain Turing machine on the input wordw. Therefore, in view of the previous discussion, the Kolmogorov complexity c(x)of x differs from the Kolmogorov complexity c(w) of w by at most an additiveconstant:

c(x) 6 c(w) + C.

Since for most words of a fixed big length l, Kolmogorov complexity is close tothe length of the word, we should expect roughly the same asymptotic behaviourfrom a random walk measure as we do from a measure associated with Kolmogorovcomplexity.

14 ALEXANDRE V. BOROVIK, ALEXEI G. MYASNIKOV, AND VLADIMIR SHPILRAIN

8.3. Comparing the length of a random walk with the geodesic length.In a simple case, where the complexity function on Mn is just the usual length func-tion l(w), we can similarly compare its behaviour with the behaviour of the geodesiclength l = lgeod(x) of the end point x = π(w) of the walk on the Cayley tree Γ(Fn)of Fn, described by the word w.

It is easy to see that the function x 7→ lgeod(x) maps a random walk on Γ(Fn) toa non-symmetric random walk W+ on the set N∪ {0} of nonnegative integers withreflection at 0. In this new walk, we make steps of length 1; we move from the pointl = 0 to the right with probability 1, and, from any other point l 6= 0, we move tothe right with the probability p = 2n−1

2n and to the left with the probability q = 12n .

Obviously, the mean value of l is at least the mean value of l for a random walk Won Z without reflection at 0; in this walk we start at 0 and move by 1 to the rightwith the probability p and by 1 to the left with the probability q. In W , we makem moves to the right with the probability

(

km

)

· pm · qk−m and end up at the pointl′ = m− (k−m) = 2m− k. Thus, the random variable l′ is a linear transformationof the random variable m distributed according to the binomial distribution. Sincethe expectation E(m) = pk, we deduce that E(l′) = 2pk − k = n−1

n k.Therefore, for the random walk W+, the expected value of l is bounded from

below by n−1n k = E(l′) 6 E(l). The upper bound E(l) 6 k is obvious. Thus, the

expected geodesic length E(lgeod(x)) of an element x ∈ Fn produced by a randomwalk of length k is estimated as

n− 1

nk 6 E(lgeod(x)) 6 k.

9. Behaviour of the induced measures on finite factor groups

In this section we try to establish possible criteria to evaluate how “natural” agiven measure is, i.e., how it matches our expectations of what the probability ofhitting particular sets should be. In general, these expectations may, of course, varyfrom individual to individual, but we have some common grounds, at least, in thecase of finite sets. So, our idea is to set up some tests to compare a given measureµ on a free group F = Fn with the induced measures on finite quotients of F .Most people will probably agree that, for example, the probability for a randomlychosen element of F to have even length should be about 1/2. More generally, it isnatural to expect that a subgroup H < F of an index m would have the measure(approximately) equal to 1

m . Unfortunately, this is not the case if µ(1) > 0, indeed,

if m >> 1µ(1) then µ(H) > µ(1) >> 1

m .

Moreover, most people working with probabilities on finite groups are likely toprefer measures which are well behaved with respect to taking “big” finite factorsFn/R, for example, measures which yield reasonable bounds for the total variancedistance

1

2

g∈Fn/R

µ(g)− 1

|Fn : R|

(9.1)

of the induced distribution from the uniform distribution on Fn/R. Again, thisnatural condition cannot be satisfied basically for the same reason, namely, thevalue of (9.1) is bounded from below by 1

2 |µ(1)− (1/m)|, where m = |Fn : R|, anddoes not converge to 0 as m −→ ∞.

MEASURING SETS IN INFINITE GROUPS 15

This is a typical example of an unexcusable dependance of a measure on shortelements. This is one of the many reasons to believe in the following (maybecontroversial) metamathematical thesis:

There is no particular atomic measure on an infinite group which wouldmeet our expectations of sizes of particular sets in the group

A possible practical outcome of this thesis is either to consider a whole familyof parametric measures on a group instead of a fixed one, or to adjust our criteriato allow a margin of an error of approximation. The first approach was developedto some extent in [3], here we make an attempt to consider the second one.

One may wish to exclude anomalously big probabilities of short elements bytaking the measure of an element gR in a finite factor group Fn/R to be therenormalized measure of the set of “large” elements in gR:

µl(gR) =µ((Fn rBl) ∩ gR)

µ(Fn rBl),(9.2)

where Bl is the ball of radius l centered at 1. Anyway, it is only natural to assumethat, when assessing the average case complexity of algorithmic problems of prac-tical interest, we are working with words of sufficiently large size; the bias towardsshort elements can be safely ignored. A measure µ can be accepted as “good” if,for values of l much smaller than m,

1

2

g∈Fn/R

µl(g)−1

|Fn : R|

<1

e,(9.3)

or is bounded by some other reasonable constant, or decreases exponentially withthe growth of l:

1

2

g∈Fn/R

µl(g)−1

|Fn : R|

= o(e−cl).(9.4)

The reader probably feels already at this point that we are leaning towards theclassical concept of a random walk on a group. Taking images of “sufficientlylong” elements means allowing a sufficiently long random walk on the finite groupFn/R. It is reasonable to take the mixing time of the random walk on the factorgroup Fn/R with respect to the generators x1R, . . . , xnR for the characteristicword length l in the criterion (9.3). Recall that the mixing time is defined as theminimum l0 such that

‖Pl0 − U‖ =1

2

g∈Fn/R

Pl0(g)−1

|Fn : R|

<1

e,(9.5)

where Pl(g) is the probability for a random walk of length l to end up at the elementg, and U is the uniform distribution on Fn/R. From the mixing time on, a randomwalk distribution converges to the uniform distribution exponentially fast:

‖Pkl0 − U‖ =1

2

g∈Fn/R

Pkl0 (g)−1

|Fn : R|

= o(e−ck).(9.6)

Therefore, we can consider an alternative criterion for good behaviour of themeasure on finite factor groups:

16 ALEXANDRE V. BOROVIK, ALEXEI G. MYASNIKOV, AND VLADIMIR SHPILRAIN

(C1*) For every finite factor group G = Fn/R there exists a number l0 such that

1

2

g∈Fn/R

µkl0(g)−1

|Fn : R|

= o(e−k).

Of course, the value of l0 (“the mixing time”) is all important. One shouldexpect from a ‘good’ measure on Fn that it is approximately the same as themixing time of a random walk on Fn/R. It is worth mentioning here, that, by aresult by Pak [19], a random walk on a finite group G with respect to a randomgenerating set of size O(log |G|) mixes under O(log |G|) steps.

It is relatively easy to see that our criterion (C1*) is met by a wide class ofmeasures associated with the length function on Fn.

Theorem 9.1. If the moderating distribution d(k) satisfies a rather naturalcondition d(k) > 0 for infinitely may values of k then, for every finite factor groupG = Fn/R, the induced measure µl on G satisfies the condition (C1*).

9.1. Proof of Theorem 9.1. Let |G| = m. Every probability distributionon G is a vector p1, . . . , pm of non-negative real numbers subject to the conditionp1 + · · ·+ pm = 1. Thus, the set of all probability distributions on G is the convexhull ∆ of the distributions Eg concentrated at g ∈ G, i.e., Eg(h) = 1 if h = g and0 otherwise. We introduce on ∆ the total variance metric

‖P −Q‖ =1

2

g∈G

|P (g)−Q(g)|.

A step of a random walk induces an affine transformation τ : Rm −→ Rm by the

rule

τ : Eg 7→ 1

2n

h adjacent to g

Eh,

where the adjacency is considered in the Cayley graph of G.Since all vertices of the Cayley graph of G have the same degree, τ fixes the

uniform distribution U = (1/m, . . . , 1/m). By somewhat abusing the notation, wecan move the origin of the coordinate system in R

m to the point U . It will beconvenient to restrict the action of τ to the subspace R

m−1 spanned in Rm by the

polytope ∆. Then the distance ‖X−U‖ becomes a norm on Rm−1 (which we denote

by ‖X‖; hopefully, this will not cause a confusion), and τ becomes a linear operatoron R

m−1. Note that the norms ‖τk(Eg)‖ of vertices of the convex polytope τk(∆)are all equal. Since τ maps the vertices Eg of ∆ to points of ∆, it does the sameto the vertices of τk(∆). As a result, we have the following monotonicity property:

‖τk+1(Eg)‖ 6 ‖τk(Eg)‖.We can use again the property that τ maps the vertices Eg of the convex

polytope ∆ to prove that ‖τ(Eg)‖E < ‖Eg‖E for all g ∈ G, where ‖ ‖E stands forthe standard Euclidean norm on R

m−1. It follows that

‖τ‖E = maxX∈∆,X 6=0

‖τ(X)‖E‖X‖E

< 1,

hence ‖τk(X)‖E −→ 0 as k −→ ∞ for every distribution X . Moreover, ‖τk(X)‖Edecreases exponentially: ‖τk(X)‖E = o(ck). Since any two norms on a finite di-mensional space are equivalent, we have

1

C‖X‖E < ‖X‖ < C‖X‖E

MEASURING SETS IN INFINITE GROUPS 17

for some constant C, and it follows that ‖τk(X)‖ = o(ck).We have, in our new notation,

1

2

g∈G

µl(g)−1

m

= ‖µl‖ =

1∑

k>l d(k)

k>l

d(k)τk(E1)

=1

k>l d(k)

k>l

d(k)τk(E1)

61

k>l d(k)

k>l

d(k)τ l(E1)

= ‖τ l(E1)‖ ·∑

k>l d(k)∑

k>l d(k)= ‖τ l(E1)‖.

We see that the probabilistic distribution on G produced by a random walk ofrandom length > l converges to the uniform distribution as fast as the probabilitydistribution after l steps of the usual random walk does. In particular, this provesthat the measure µl on G satisfies the condition (C1*).

10. The growth function and asymptotic density

In this section, we discuss an approach to analyze the asymptotic behaviour ofsets in a group via asymptotic densities introduced in Section 6.

Let G be a group with a complexity function c : G → N (for example, a groupgenerated by a finite set X with the length function lX).

A pseudo-measure on G is a real-valued nonnegative additive function definedon some subsets of G. The pseudo-measure µ is called atomic if µ(S) is defined forany finite subset S of G. Let Ck and Bk be, respectively, the sphere and the ballof radius k in G with respect to the complexity c.

For a set R ⊆ F , we define its spherical asymptotic density with respect to µas the following limit (if it exists):

ρ(s)µ (R) = limk→∞

ρ(s)k (R),

where

ρ(s)k (R) =

µ(R ∩ Ck)

µ(Ck).

Similarly, we define disc asymptotic density of R as the limit (if it exists):

ρ(d)µ (R) = limk→∞

ρ(d)k (R),

where

ρ(d)k (R) =

µ(R ∩Bk)

µ(Bk).

One can also define the density functions above using lim sup rather then lim.For example, if µ is the cardinality function, i.e., µ(A) = |A|, then we obtain

the standard asymptotic density functions ρ(c) and ρ(d) on G.Moreover, if µ is c-invariant, i.e., µ(u) = µ(v) provided c(u) = c(v), then the

spherical asymptotic density with respect to µ is equal to the standard sphericalasymptotic density:

sρµ = sρ.

18 ALEXANDRE V. BOROVIK, ALEXEI G. MYASNIKOV, AND VLADIMIR SHPILRAIN

Lemma 10.1. For any atomic pseudo-measure µ on G, the asymptotic densitiesρ(c) and ρ(d) are also atomic pseudo-measures on G.

Proof is obvious.

Lemma 10.2. Let µ be a pseudo-measure on G. Suppose that

limk→∞

µ(Bk) = ∞.

Then, for any subset R ⊆ G, if the spherical asymptotic density ρ(c)µ (R) exists, then

the disc asymptotic density ρ(d)µ (R) exists, too, and

ρ(c)µ (R) = ρ(d)µ (R).

Proof. Let xk = µ(R ∩Bk) and yk = µ(Bk). Then yk < yk+1 and lim yk = ∞.By Stolz’s theorem

limk→∞

xk

yk= lim

k→∞

xk − xk−1

yk − yk−1.

Hence

ρ(d)µ (R) = limk→∞

µ(R ∩ Sk)

µ(Sk)= ρ(s)µ (R),

as claimed.In view of this result we will refer to the standard (spherical or disc) densities

ρ.Asymptotic densities provide a useful, though very coarse, tool to describe

behaviour of sets at infinity. Furthermore, there are very natural subsets which aresadly unmeasurable with respect to ρ. To see this, consider the set En of words ofeven length in a free group Fn of rank n. Then ρ(En) is easily seen to be undefined.A way around this problem (involving generalized summation methods for series)is discussed in [3].

Moreover, the following well-known result shows that that ρ is not sufficientlysensitive.

Theorem 10.3. (Woess [26]) If N is a normal subgroup of Fn, n > 2, ofinfinite index, then ρ(N) = 0.

Another disappointment is offered by the following result. As usual, we callan element u ∈ Fn primitive if it is part of a free basis of Fn, or, equivalently, ifα(u) = x1 for some α ∈ Aut(Fn).

Theorem 10.4. Let Fn be the free group of a finite rank n > 2. Then:

ρ(Prn) = 0,

where Prn is the set of all primitive elements of the group Fn. More precisely, ifP (n, k) is the number of primitive elements of length k in Fn, n ≥ 3, then for someconstants c1, c2, one has

c1 · (2n− 3)k ≤ P (n, k) ≤ c2 · (2n− 2)k.

We see that the asymptotic density ρ is not sensitive enough in measuring setsin Fn. It would be very interesting therefore to check whether probabilistic resultsof, say, [1], [5], [18], that are based on the asymptotic density ρ, will hold uponreplacing ρ with a more adequate measuring tool.

MEASURING SETS IN INFINITE GROUPS 19

10.1. Proof of Theorem 10.4. Our proof is based on the fact that theWhitehead graph of any primitive element of length > 2 has either an isolatededge or a cut vertex, i.e., a vertex that, having been removed from the graph to-gether with all incident edges, increases the number of connected components ofthe graph. Recall that the Whitehead graph Wh(u) of a word u is obtained asfollows. The vertices of this graph correspond to the elements of the free generatingset X and their inverses. If the word u has a subword xixj , then there is an edge

in Wh(u) that connects the vertex xi to the vertex x−1j ; if u has a subword xix

−1j ,

then there is an edge that connects xi to xj , etc. We note that usually, there is onemore edge (the external edge) included in the definition of the Whitehead graph:this is the edge that connects the vertex corresponding to the last letter of u, tothe vertex corresponding to the inverse of the first letter.

Assume first that the Whitehead graph of u has a cut vertex. We are goingto show that the number of elements u of length k with this property is no biggerthan C · (2n− 2)k−1, where C = C(n) is a constant.

Let Wh(u) be a disjoint union of two graphs, Γ1 and Γ2, complemented bya (cut) vertex A together with all incident edges. Let n1 > 1 and n2 > 1 be thenumber of vertices in Γ1 and Γ2, respectively. Then, in particular, n1+n2 = 2n−1.Let m = min(n1, n2).

The first letter of u can be any of the 2n possible ones. For the following letterhowever we have no more than 2n−m possibilities since, for example, if the firstletter, call it xi, corresponds to a vertex from Γ1, then the following letter, call itxj , cannot be such that the vertex corresponding to x−1

j belongs to Γ2, because

otherwise, there would be an edge in Wh(u) that connects Γ1 to Γ2 directly, notthrough A, i.e., A would not be a cut vertex.

Thus, there are only 2n− n2 6 2n−m possibilities for the following letter inu. The same argument applies to every letter in u, starting with the second one;therefore, the total number of possibilities (corresponding to a particular choice ofn1 and n2) is no bigger than C · (2n−m)k−1, where C = C(n) is a constant.

Now consider two cases:

(i) m > 2.

In this case, the number in question is bounded by C · (2n− 2)k−1.

(ii) m = 1.

In this case, one of the graphs, say, Γ2, consists of a single vertex (call it x1 fornotational convenience) connected only to the cut vertex (call it x2); in particular,the vertex x1 is a terminal vertex of the graph Wh(u), and, whenever the letter x1

occurs in u, it is followed by x−12 . Let q > 1 be the number of occurrences of x1 in

u.Apply the automorphism φ : x1 7→ x1x2, xi 7→ xi, i > 2, to the element u.

Then |φ(u)| = |u| − q < |u|. Now φ(u) is a primitive element, too; hence the wholeargument above is applicable to φ(u). In particular, if, in the notation above,m > 2for this element φ(u), then, as we have just proved, the number of those elementsis bounded by C · (2n− 2)k−1−q for some C = C(n). This number is also equal tothe number of elements u of the type we are considering now (for a particular q)because of the one-to-one correspondence between elements u and φ(u).

If m = 1 for the element φ(u), then we use the same trick again, until weget to a primitive element with m > 2. In any case, the number of primitive

20 ALEXANDRE V. BOROVIK, ALEXEI G. MYASNIKOV, AND VLADIMIR SHPILRAIN

elements of length k, with m = 1, and with q > 1 occurrences of x1, is bounded byC · (2n− 2)k−1−q for some C = C(n).

Now we have to sum up for all possible values of q. Note that q cannot be equalto k since xk

1 is not a primitive element; also, q cannot be equal to k − 1 since in

the Whitehead graph of xk−11 x2, the vertex corresponding to x1 is not a terminal

vertex. Hence, we have

k−2∑

q=1

C · (2n− 2)k−1−q = C · (2n− 2)k−1 − 1

2n− 36 C · (2n− 2)k−1.

Finally, we have to multiply this number by the number of ways we can choosetwo vertices (terminal and cut) out of 2n, i.e., by (2n − 1)n, but this does notchange the type of the bound. Therefore, we get here the same bound as we got inthe case m > 2.

Thus, summing up for all possible values of n1 > 1 and n2 > 1 such thatn1 + n2 = 2n − 1, we see that the total number of possibilities in (i) and (ii) isbounded by C · (2n− 2)k for some C = C(n).

Finally, we address the remaining case, where the Whitehead graph of u hasan isolated edge. In that case, some cyclic permutation of u must be of the formx±1i x±1

j u1, where u1 does not depend on xi, xj , and j does not have to be differentfrom i. The number of elements with this property is easily seen to be bounded byC · k · (2n− 3)k−1 for some constant C = C(n).

Therefore, the ratio of the number of primitive elements of length k to the num-

ber of all elements of length k is no more than C · (2n−2)k

2n(2n−1)k−1 = C′ · (2n−22n−1 )

k, where

C′ = C′(n) is a constant. This ratio obviously tends to 0, and, moreover, well-known properties of a geometric series now imply that the ratio of the number ofprimitive elements of length 6 k to the number of all elements of length 6 k tendsto 0, too.

Finally, we note that the lower bound c1 ·(2n−3)k ≤ P (n, k) is obvious becauseevery element of the form x1 · u(x2, ..., xn−1) is primitive.

Just to complete the picture, we also mention here the bounds for the numberof primitive elements of length k in F2:

Proposition 10.5. ([17]) The number of primitive elements of length k in thegroup F2 is:

(a) more than 4√3· (√3)k if k is odd.

(b) more than 43 · (

√3)k if k is even.

Informally speaking, “most” primitive elements in F2 are conjugates of primi-tive elements of smaller length. This is not the case in Fn for n > 2, where “most”primitive elements are of the form u ·x±1

i · v, where u, v are arbitrary elements thatdo not depend on xi.

10.2. The rate of convergence of the asymptotic density. Let S be asubset of a free group Fn of rank n > 2. As we have mentioned in the Introduction,the growth rate of the set S is defined as the limit

γ(S) = lim supk→∞

k√

ρk(S),

MEASURING SETS IN INFINITE GROUPS 21

where

ρk(S) =|S ∩Bk||Bk|

are the disc relative frequencies of S. The growth rate γ(S) is gauging the speedof convergence to zero of the sequence ρk(S). Indeed, the standard results fromcalculus show that if γ(S) < 1, then ρ(S) = 0, and, moreover, the sequence ρk(S)converges to 0 exponentially fast.

A classical theorem by Grigorchuk [9] states that if N ≤ Fn is a normal sub-group and Fn/N is not an amenable group, then γ(N) < 1. This gives us manyeasy examples of normal subgroups N with the exponential speed of convergenceof the asymptotic density. For example, this happens every time when the factorgroup Fn/N contains a non-abelian free group.

It would be interesting to have a look at the other end of the spectrum andestimate the speed of convergence of the sequence ρk(N) for an obviously “big”subgroup N ≤ Fn. The following result is one of the very few instances where wehave concrete information:

Theorem 10.6. (Sharp [20]) If n > 2, then the spherical relative frequencies

ρ(s)k =

|[Fn, Fn] ∩ Sk||Sk|

of words of length k from the derived subgroup [Fn, Fn] of the free group Fn asymp-totically behave as

ρ(s)k ∼

{

0 if k is odd,C

kn/2 if k is even.

References

[1] G. N. Arzhantseva, A property of subgroups of infinite index in a free group, Proc. Amer.Math. Soc. 128 (2000), 3205–3210.

[2] L. Babai, R. Beals, A polynomial-time theory of black box groups. I. Groups St. Andrews1997 in Bath, I, 30–64, London Math. Soc. Lecture Note Ser., 260, Cambridge Univ. Press,Cambridge, 1999.

[3] A. V. Borovik, A. G. Myasnikov and V. N. Remeslennikov, Multiplicative measures on free

groups, preprint.[4] G. Chaitin, On the length of programs for computing finite binary sequences: statistical

considerations, J. Assoc. Comput. Mach., 16 (1969), 145–149.[5] C. Champetier, Statistical properties of finitely presented groups, Adv. Math. 116 (1995),

197–262.[6] A. Comtet and S. Nechaev, Random operator approach for word enumeration in braid

groups, J. Phys. A, Math. Gen. 31 (1998), No. 26, 5609–5630.[7] J. Desbois and S. Nechaev, Statistical mechanics of braided Markov chains. I: Analytical

methods and numerical simulations model, J. Stat. Phys. 88 (1997), No.1–2, 201–229.[8] J. Desbois and S. Nechaev, Statistics of reduced words in locally free and braid groups:

Abstract studies and applications to ballistic growth model, J. Phys. A, Math. Gen. 31

(1998), No.12, 2767–2789.[9] R. I. Grigorchuk, Symmetrical random walks on discrete groups, in “Multicomponent ran-

dom systems” (R. L. Dobrushin and Ya. G. Sinai, eds.), Dekker, New York, 1980, pp. 285–325.

[10] R. Grigorchuk, P. de la Harpe, On problems related to growth, entropy, and spectrum in

group theory, J. Dynam. Control Systems 3 (1997), 51–89.[11] J. Hartmanis, Generalized Kolmogorov complexity and the structure of feasible computa-

tions, In Proc. 24th IEEE Simposium on Foundations of Computer Science, pp. 439–445,1983.

22 ALEXANDRE V. BOROVIK, ALEXEI G. MYASNIKOV, AND VLADIMIR SHPILRAIN

[12] W. Kantor, A. Seress, Permutation group algorithms via black box recognition algorithms.Groups St. Andrews 1997 in Bath, II, 436–446, London Math. Soc. Lecture Note Ser.,261, Cambridge Univ. Press, Cambridge, 1999.

[13] A. Kolmogorov, Three approaches for defining the concept of information quantity. Prob.Inform. Trans., 1 (1965), 1-7.

[14] M. Li and P. M. B. Vitanyi, An introduction to Kolmogorov complexity and its applications.Graduate Texts in Computer Science, 2nd ed., Springer, 1997.

[15] A. D. Myasnikov, Genetic algorithms for the Whitehead method, preprint.[16] A. D. Myasnikov and A. G. Myasnikov, ‘Balanced presentations of the trivial group on

two generators and the Andrews-Curtis conjecture’, in Groups and Computation III, (W.Kantor and A. Seress, eds.), de Gruyter, Berlin, 2001, pp. 257-264.

[17] A. G. Myasnikov, V. Shpilrain, Automorphic orbits in free groups, J. Algebra, to appear.[18] A. Yu. Ol’shanskii, Almost every group is hyperbolic, Internat. J. Algebra Comput. 2

(1992), 1–17.

[19] I. Pak, Random walks on finite groups with few random generators, Electronic J. of Prob.,4 (1999), 1–11.

[20] R. Sharp, Local limit theorems for free groups, Math. Ann. 321 (2001), 889–904.[21] R. Solomonoff, A formal theory of inductive inference, part 1 and part 2, Information and

Control, 7 (1964), 1–22, 224–254.[22] A. Ushakov, Genetic algorithms for the conjugacy problem in one relator groups, in prepa-

ration.[23] A. M Vershik, S. Nechaev and R. Bikbov, Statistical properties of locally free groups with

applications to braid groups and growth of random heaps, Commun. Math. Phys. 212

(2000), 469–501.[24] J. Wang, Average-case completeness of a word problem for groups, in ‘Proceedings of the

27th Annual Symposium on Theory of computing, ACM Press, 1995, pp. 325–334.[25] O. Watanabe, Kolmogorov Complexity and Computational Complexity, Springer-Verlag,

EATS Monographs on Theoretical Computer Science 1992.[26] W. Woess, Cogrowth of groups and simple random walks, Arch. Math. 41 (1983), 363–370.

Department of Mathematics, UMIST, PO Box 88, Manchester M60 1QD, United

Kingdom; http://www.ma.umist.ac.uk/avb

E-mail address: [email protected]

Department of Mathematics, The City College of New York, New York, NY 10031,

USA; http://home.att.net/~ alexeim/index.htm

E-mail address: [email protected]

Department of Mathematics, The City College of New York, New York, NY 10031,

USA; http://zebra.sci.ccny.cuny.edu/web/shpil/

E-mail address: [email protected]