sizing and profiling the small, medium and micro business

8/17/2019 Sizing and Profiling the Small, Medium and Micro Business

1/6

Sizing and profiling the Small, Medium and Micro Business

Market in South AfricaGalpin, Jacky

University of the Witwatersrand, School of Statistics and Actuarial Science

1 Jan Smuts Avenue

Johannesburg (2000), South Africa [email protected]

Neethling, Ariane

University of the Free State and University of Stellenbosch

[email protected]

Introduction

FinMark Trust is an independent Trust, established with funding from the United Kingdom Department for

International Development, with the objective of “Making financial markets work for the poor” in Africa. One

aim is to build a picture of the informal business sector, both as to the size and characteristics, and well as the

role they can play in developing countries. Businesses range from very informal (such as vendors on streetcorners and hawkers) to semi-formal (such as those running a garden service or computer repair business), to

more formal registered businesses, with a formal office.

The 2010 FinscopeTM survey targeted Small, Micro and Medium Enterprises (SMMEs) in South Africa

(SA). A nationally representative sample of business owners aged 16+, with less than 200 employees, was

drawn. The objectives of the survey were to estimate the size of the small business market in SA, to quantify the

number of people engaged in small business activities, and to profile the businesses.

Sample design

A stratified random sample of 1000 enumerator areas (EAs) was drawn, representative of SA at national,

provincial and geo-type levels (Finscope, 2010). Probability proportional to size sampling was used, with the

estimated number of households per EA in 2009 being used as the measure of size. The dominant race group

(Black, Coloured, Asian, White) of the EA was used as a further stratification variable, to ensure that a

representative sample of all race groups was obtained. Power rule allocation (power = 0.7) was used to ensure an

adequate sample size in each of the strata, this disproportionate allocation procedure being recommended for

surveys with numerous small strata where there is a need for relatively precise estimates at each stratum level

(Lehtonen & Pahkinen, 1994). The distribution of the EAs is shown in Table 1.

The dwelling units (DUs) in each chosen EA were listed. A step-size was chosen to yield six segments for

the EA, as the aim was to obtain six interviews with small business owners in each selected EA. From each

starting point successive dwellings in the segment (as marked on a map supplied to the interviewers) were

visited, with contact information being listed, as well as whether any of the household members was involved in

a small business. On finding a dwelling with a SMME, a full interview was conducted, and the interviewer

progressed to the next starting point. This gave the hit rate information for each EA, namely the number of

dwelling with no „success‟, before a success was obtained. The number of successful interviews in the EA was

also recorded. A total of 5676 interviews with SMMEs were obtained.

nt. Statistical Inst.: Proc. 58th World Statistical Congress, 2011, Dublin (Session CPS055) p

mailto:[email protected]:[email protected]:[email protected]


2/6

2

Province

Geo-

area

Dominant race group Overall total per

Geo-area and

provinceBlacks Coloureds Asians Whites

Western

Cape

Urban 37 43 2 37 119

Rural 0 13 0 0 13

Total 37 56 2 37 132

Eastern Cape

Urban 34 15 2 18 69

Rural 52 4 0 0 56

Total 86 19 2 18 125

Northern

Cape

Urban 13 15 0 7 35

Rural 7 6 0 0 13

Total 20 21 0 7 48

Free State

Urban 40 5 0 16 61

Rural 16 1 0 0 17

Total 56 6 0 16 78

Kwazulu

Natal

Urban 45 7 29 27 108

Rural 58 0 0 0 58

Total 103 7 29 27 166

North West

Urban 27 3 2 16 48

Rural 36 0 0 0 36

Total 63 3 2 16 84

Gauteng

Urban 102 14 12 59 187

Rural 10 0 0 1 11

Total 112 14 12 60 198

Mpumalanga

Urban 26 2 1 14 43

Rural 36 0 0 0 36

Total 62 2 1 14 79

Limpopo

Urban 12 0 1 10 23

Rural 67 0 0 0 67

Total 79 0 1 10 90

Overall total per race: 618 128 49 205 1000

Table 1: Distribution of the EAs for the stratified random sample

Data were weighted to the population figures based on the EA inclusion probability, the inclusion

probability of a household, and the weight of a person having one or more small businesses. The negative

binomial approach was used to determine the inclusion probability of a household, by taking the number of

„failures‟ (households with no small business owners) into account. The final weights were used to estimate the

number of SMME‟s. It is estimated that there are just under 6 million small businesses in SA, with 5.6 million

small business owners. The geographical spread of these, relative to the population, is shown in Figure 1.



3/6

3

Figure 1: Geographical distribution of small businesses and population (Finscope, 2010)

Profiling the small businesses

In order to profile the businesses, a Business Sophistication Measures (BSM) was created. Questions used

concerned „hard facts‟, such as where the business operates from (e.g. street corner, no fixed address, house,

office block) as well as access to and use of a number of services such as water, electricity, computers, banking

and insurance. The responses were coded as 1=yes or 0=no, which places all 181 questions on the same scale,

making principal component analysis (PCA) the appropriate technique for creating an index. The data were

weighted up to the estimated population size.

The first principal component explained 14.3% of the variance, and formed the BSM index. The sensitivity

of the analyses was investigated by omitting questions with very few informants in either the yes or no

categories. Two scenarios were investigated, omitting variables with fewer than 10, and fewer than 30

informants in one of the categories. A k-means cluster analysis was used to group informants into similar groups.

Groups with small numbers of informants (i.e. informants who differ from the others) were omitted from the

PCA, in order to check the sensitivity of index. The results of these investigations showed little sensitivity of the

major principal components to the combinations of variables and informants.

The resulting principal component scores were divided into equi-sized groups, giving and initial ranking of

the businesses. Initially, 20 groups were formed, which would allow for merging of similar groups, to obtain a

smaller number of distinct groups. The choice of 20 initial groups resulted in approximately 284 equivalent

informants per group. (Equivalent informants are obtained by rescaling the sum of the weighted informants

from the population size to the sample size.) The groups could then be examined as to their characteristics, and

„similar‟ groups could be merged, and anomalous groups split.

The stability of these groups was investigated using discriminant analysis (DA), using the groups and the

variables used to create the scores. The success in recovering the original 20 groups is shown in Table 2. The

highlighting indicates the number of equivalent informants correctly assigned by the DA, while the last column

gives the percentage correctly classified for each 20 individual group. A total of 54.1% were correctly classified.



4/6

4

Gp #infs 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 %

1 320 315 6 98

2 284 81 132 31 21 18 47

3 255 48 40 126 22 17 1 49

4 261 27 37 15 140 26 4 12 54

5 296 22 18 16 50 123 27 30 4 4 1 42

6 285 21 7 5 9 21 122 64 24 11 43

7 291 6 12 4 9 27 32 142 17 35 6 49

8 281 8 1 6 8 7 13 76 81 47 28 6 1 29

9 281 11 3 3 9 7 22 18 131 41 17 18 1 47

10 293 3 2 5 11 3 6 24 14 46 93 22 44 16 3 32

11 280 3 2 1 6 3 9 9 47 99 56 33 8 4 35

12 277 2 4 1 4 17 1 3 22 23 152 25 9 13 1 55

13 286 1 2 2 1 15 30 36 117 58 23 3 41

14 210 4 3 8 6 4 2 8 21 23 80 34 18 38

15 352 1 1 1 1 1 1 5 11 27 20 19 193 69 2 55

16 289 1 1 5 1 9 8 15 36 197 15 68

17 286 1 2 2 4 1 6 11 55 163 42 5718 282 1 2 1 17 39 208 12 74

19 285 1 1 2 5 42 230 4 81

20 282 2 21 259 92

Table 2: Comparison of the classification of BSM groups (rows) and DA (columns)

BSM group A (8) B (7) C (7) D (6)

1 -3 84.8 89.4 93.3 93.3

4 70.3

5 68.3

6 76.8 76.87 -8 69.7

9-10 78

11-13 71.5 76.6 76.6

14 73 73

15-16 72 85.4

17 -18 80.6 80.6 80.6

19 83.4 83.4 83.4 83.4

20 91.5 91.5 91.5 91.5

% correct 75.6 79.8 80.6 82.9

Table 3: Combinations of the 20 groups, showing the percentages correctly classified by the DA

Possible combinations of these groups were investigated, in order to determine „similar‟ BSM categories,

namely those for which the reconstruction by the DA was satisfactory (at least 70% for all groups). Table 3

shows the results for 4 possible combinations, resulting in 6-8 final groups, together with the overall percentage

of correct classification. These groups were then profiled in terms of the variables, to allow the Finscope experts



5/6

5

to assess the coherence of the groups with respect to interpretation in business terms. The 8 group solution

(scenario A) was chosen on the grounds of providing reasonable differentiation and business usefulness.

Investigation of variables discriminating between the BSM groups, using Chi-squared automatic

interaction detection (CHAID)

Chi-squared automatic interaction detection (CHAID) was used to profile of the BSM groups (Kass, 1980,

Hawkins and Kass, 1994). At the first step, CHAID looks at which of the predictors differentiate between the 8

BSM categories. The most discriminating variable was „do not have any insurance‟=1, and 0=‟have some

insurance‟ (p=1.6e-809 for the contingency table). At the second step, CHAID examines each of the new nodes

to determine the most significant predictor for each node. The stronger discriminator between the BSM groups

with some insurance, was „own, lease or hire: internet‟ (p=3.6e-96). For the BSM groups who did not have any

insurance, the strongest predictor was „do not use a bank for the business‟ p=1.9e-588). Table 4 shows the

details of the CHAID, against the 8 BSM groups, showing increase in sophistication for the BSM groups.

Percentages above 10 are highlighted.

BSM1 BSM2 BSM3 BSM4 BSM5 BSM6 BSM7 BSM8 Infs

N o i n s u r a n c e

No bank

used for

business

No hotrunning

water

inside

54.8 35.4 8.2 1.1 0.2 0.2 2010

No

inside

toilet4.2 34.0 42.5 10.7 6.0 2.6 0.1 740

Bank used

for business

Business

registered

No up-

to-date

financial

records

0.7 23.4 50.2 15.3 6.6 3.8 687

No bank

used for

business

Hot

running

water

inside

6.3 63.8 19.4 8.8 1.9 160

Bank used

for business

Business

registered

Up-to-

date

financial

records

0.7 17.6 28.0 37.6 14.2 2.0 917

No bank

used for

business

Toilet

inside0.7 24.5 26.6 21.7 22.4 4.2 142

Bank used

for business

Not

registered

No

running

water

inside

4.5 17.4 29.2 39.9 8.4 0.6 177

Notregistered

Running

water

inside0.4 4.0 61.2 26.3 8.3 278

S o m e i n s u r a n c e

No internet Registered 0.6 1.3 13.2 48.4 30.2 6.3 159

No internet Not

Registered15.8 45.8 38.3 241

Lease/own

internet4.9 95.1 165

Table 4: Characterisation of the 8 BSM groups



6/6

6

It should be noted that, since CHAID is aimed at determining the most predictive variables and splits at each

stage, the p-values can be interpreted as giving a ranking of the usefulness of the variables. So although allvariables used are significant at the 1% level, the ratio of the p-values gives an indication of the relative

predictive power of the variables. The no insurance variable is approximately 107 times more predictive than thenext strongest predictor: „have a credit or debit card‟.

Looking at the number of employees in the businesses in the different BSM groups, shown in Figure 2, it can be seen that the lower BSM groups essentially have less than 10 employees.

Figure 2: distribution of the number of employees by BSM group

Conclusions

The BSM index has been interrogated by users, and has been accepted as a useful categorization.

References

Finscope (2010). Finscope South Africa Small Business Survey 2010. Finmark Trust, South Africa.

Kass, GV (1980). An exploratory technique for investigating large quantities of categorical data.

Journal of the Royal Statistical Society, Series C (Applied Statistics), Vol. 29, No. 2, pp 119-127.

Hawkins, DM and Kass, GV. (1982) Automatic Interaction Detection. In: Hawkins, DM. (Ed)

Topics in Applied Multivariate Analysis. Cambridge University Press: Cambridge.

Lehtonen, R. and Pahkinen, E.J. (1994) Practical Methods for Design and Analysis of Complex

Surveys. John Wiley & Sons, New York.

284281572567565113011411136N=

Analysis weighted by WTSC

BSM_8GP

87654321

N_ E M P L O Y

200

175

150

125

100

75

50

25

0

4841520148085303466751585260369413256514704233549205072514151635112241955655548503336822483321528728604910536855205555462452615238528919642042686511051145667797

4877

2861

4166

5150

4461

51573136

25873621113031213602930499033205000155533163432470291136062029328110111032

55254188

488051591319

76549993876228942765137

5385

172413932934449242802195143338791665420229614545

49454315195285443352910434127102043273032003383407236564376385883320921575460217818424064726763

44286241384965671140412424515336196143063766702385001204554412349389931434712978151442149473419382530384200258130284403390516905273207713944738228315044861059831304447462076218455045127024629717332847130510882661179100228332134505634672750355567230904380117210436184337194316126869338142772196951455422070665969464714032363341820723250190418931396235714904043267412725423780461168420095470319988360516637289464134193443113534257457520811301154813071819163151795739411507430851015387266397120308931815387445577751328510305641100547926295764007290396240061132369039581905340517906260524314499403244351616422252121055952113110271138293928212529673799304642391111240276020675358494813696554523481590264158947351676110042432673473418345599422645844634851830864755550852701110113425391266179913121722329409757149764903533132045541280140647221422756361049562053483131314235005189622695509495551324906173928645488225554872953

4302366415842078551355362854732578551302518461922712304223033363337389247016488183061383281236425122129421646448433138083600265345014404183129029688416574484709309239322526522716379337112010210243443398289130952375548028362353884198313138962271132947974870386832513722688139740754729382013141008316567619053319633528489115637081189181634103882163054635893740680363198966325022829419070759283528393729355834383029259914782834270866239537508594537856145745602622835521345611871379542284169362412503383115634725276289446452974870208221242147876771356211150295442343743345420356231952284124780231084081336416622889215327002374535645113667166340141197

3768753486383283835183902087320266741893743320525801595860377026527418292096281939486831321269243742388813190838772977105242638636683828523853459249619211882453496268146191341284021394368250949683104358358645108793821366018381655391974392218818324849245668523782711682468292187313431458292110850401001


sizing and profiling the small, medium and micro business

Documents