sizing and profiling the small, medium and micro business

Upload: 2lie

Post on 06-Jul-2018

214 views

Category:

Documents


0 download

TRANSCRIPT

  • 8/17/2019 Sizing and Profiling the Small, Medium and Micro Business

    1/6

    Sizing and profiling the Small, Medium and Micro Business

    Market in South AfricaGalpin, Jacky

    University of the Witwatersrand, School of Statistics and Actuarial Science

    1 Jan Smuts Avenue

     Johannesburg (2000), South Africa [email protected] 

     Neethling, Ariane

    University of the Free State and University of Stellenbosch

     [email protected]

    Introduction

    FinMark Trust is an independent Trust, established with funding from the United Kingdom Department for

    International Development, with the objective of “Making financial markets work for the poor” in Africa. One

    aim is to build a picture of the informal business sector, both as to the size and characteristics, and well as the

    role they can play in developing countries. Businesses range from very informal (such as vendors on streetcorners and hawkers) to semi-formal (such as those running a garden service or computer repair business), to

    more formal registered businesses, with a formal office.

    The 2010 FinscopeTM  survey targeted Small, Micro and Medium Enterprises (SMMEs) in South Africa

    (SA). A nationally representative sample of business owners aged 16+, with less than 200 employees, was

    drawn. The objectives of the survey were to estimate the size of the small business market in SA, to quantify the

    number of people engaged in small business activities, and to profile the businesses.

    Sample design

    A stratified random sample of 1000 enumerator areas (EAs) was drawn, representative of SA at national,

     provincial and geo-type levels (Finscope, 2010). Probability proportional to size sampling was used, with the

    estimated number of households per EA in 2009 being used as the measure of size. The dominant race group

    (Black, Coloured, Asian, White) of the EA was used as a further stratification variable, to ensure that a

    representative sample of all race groups was obtained. Power rule allocation (power = 0.7) was used to ensure an

    adequate sample size in each of the strata, this disproportionate allocation procedure being recommended for

    surveys with numerous small strata where there is a need for relatively precise estimates at each stratum level

    (Lehtonen & Pahkinen, 1994). The distribution of the EAs is shown in Table 1.

    The dwelling units (DUs) in each chosen EA were listed. A step-size was chosen to yield six segments for

    the EA, as the aim was to obtain six interviews with small business owners in each selected EA. From each

    starting point successive dwellings in the segment (as marked on a map supplied to the interviewers) were

    visited, with contact information being listed, as well as whether any of the household members was involved in

    a small business. On finding a dwelling with a SMME, a full interview was conducted, and the interviewer

     progressed to the next starting point. This gave the hit rate information for each EA, namely the number of

    dwelling with no „success‟, before a success was obtained. The number of successful interviews in the EA was

    also recorded. A total of 5676 interviews with SMMEs were obtained.

    nt. Statistical Inst.: Proc. 58th World Statistical Congress, 2011, Dublin (Session CPS055)   p

    mailto:[email protected]:[email protected]:[email protected]

  • 8/17/2019 Sizing and Profiling the Small, Medium and Micro Business

    2/6

    2

    Province

    Geo-

    area

    Dominant race group Overall total per

    Geo-area and

    provinceBlacks Coloureds Asians Whites

    Western

    Cape

    Urban 37 43 2 37 119

    Rural 0 13 0 0 13

    Total 37 56 2 37 132

    Eastern Cape

    Urban 34 15 2 18 69

    Rural 52 4 0 0 56

    Total 86 19 2 18 125

    Northern

    Cape

    Urban 13 15 0 7 35

    Rural 7 6 0 0 13

    Total 20 21 0 7 48

    Free State

    Urban 40 5 0 16 61

    Rural 16 1 0 0 17

    Total 56 6 0 16 78

    Kwazulu

    Natal

    Urban 45 7 29 27 108

    Rural 58 0 0 0 58

    Total 103 7 29 27 166

    North West

    Urban 27 3 2 16 48

    Rural 36 0 0 0 36

    Total 63 3 2 16 84

    Gauteng

    Urban 102 14 12 59 187

    Rural 10 0 0 1 11

    Total 112 14 12 60 198

    Mpumalanga

    Urban 26 2 1 14 43

    Rural 36 0 0 0 36

    Total 62 2 1 14 79

    Limpopo

    Urban 12 0 1 10 23

    Rural 67 0 0 0 67

    Total 79 0 1 10 90

    Overall total per race: 618 128 49 205 1000

    Table 1: Distribution of the EAs for the stratified random sample

    Data were weighted to the population figures based on the EA inclusion probability, the inclusion

     probability of a household, and the weight of a person having one or more small businesses. The negative

     binomial approach was used to determine the inclusion probability of a household, by taking the number of

    „failures‟ (households with no small business owners) into account. The final weights were used to estimate the

    number of SMME‟s. It is estimated that there are just under 6 million small businesses in SA, with 5.6 million

    small business owners. The geographical spread of these, relative to the population, is shown in Figure 1.

    nt. Statistical Inst.: Proc. 58th World Statistical Congress, 2011, Dublin (Session CPS055)   p

  • 8/17/2019 Sizing and Profiling the Small, Medium and Micro Business

    3/6

    3

    Figure 1: Geographical distribution of small businesses and population (Finscope, 2010)

    Profiling the small businesses

    In order to profile the businesses, a Business Sophistication Measures (BSM) was created.  Questions used

    concerned „hard facts‟, such as where the business operates from (e.g. street corner, no fixed address, house,

    office block) as well as access to and use of a number of services such as water, electricity, computers, banking

    and insurance. The responses were coded as 1=yes or 0=no, which places all 181 questions on the same scale,

    making principal component analysis (PCA) the appropriate technique for creating an index. The data were

    weighted up to the estimated population size.

    The first principal component explained 14.3% of the variance, and formed the BSM index. The sensitivity

    of the analyses was investigated by omitting questions with very few informants in either the yes or no

    categories. Two scenarios were investigated, omitting variables with fewer than 10, and fewer than 30

    informants in one of the categories. A k-means cluster analysis was used to group informants into similar groups.

    Groups with small numbers of informants (i.e. informants who differ from the others) were omitted from the

    PCA, in order to check the sensitivity of index. The results of these investigations showed little sensitivity of the

    major principal components to the combinations of variables and informants.

    The resulting principal component scores were divided into equi-sized groups, giving and initial ranking of

    the businesses. Initially, 20 groups were formed, which would allow for merging of similar groups, to obtain a

    smaller number of distinct groups. The choice of 20 initial groups resulted in approximately 284 equivalent

    informants per group. (Equivalent informants are obtained by rescaling the sum of the weighted informants

    from the population size to the sample size.) The groups could then be examined as to their characteristics, and

    „similar‟ groups could be merged, and anomalous groups split.

    The stability of these groups was investigated using discriminant analysis (DA), using the groups and the

    variables used to create the scores. The success in recovering the original 20 groups is shown in Table 2. The

    highlighting indicates the number of equivalent informants correctly assigned by the DA, while the last column

    gives the percentage correctly classified for each 20 individual group. A total of 54.1% were correctly classified.

    nt. Statistical Inst.: Proc. 58th World Statistical Congress, 2011, Dublin (Session CPS055)   p

  • 8/17/2019 Sizing and Profiling the Small, Medium and Micro Business

    4/6

    4

    Gp #infs 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 %

    1 320 315 6 98

    2 284 81 132 31 21 18 47

    3 255 48 40 126 22 17 1 49

    4 261 27 37 15 140 26 4 12 54

    5 296 22 18 16 50 123 27 30 4 4 1 42

    6 285 21 7 5 9 21 122 64 24 11 43

    7 291 6 12 4 9 27 32 142 17 35 6 49

    8 281 8 1 6 8 7 13 76 81 47 28 6 1 29

    9 281 11 3 3 9 7 22 18 131 41 17 18 1 47

    10 293 3 2 5 11 3 6 24 14 46 93 22 44 16 3 32

    11 280 3 2 1 6 3 9 9 47 99 56 33 8 4 35

    12 277 2 4 1 4 17 1 3 22 23 152 25 9 13 1 55

    13 286 1 2 2 1 15 30 36 117 58 23 3 41

    14 210 4 3 8 6 4 2 8 21 23 80 34 18 38

    15 352 1 1 1 1 1 1 5 11 27 20 19 193 69 2 55

    16 289 1 1 5 1 9 8 15 36 197 15 68

    17 286 1 2 2 4 1 6 11 55 163 42 5718 282 1 2 1 17 39 208 12 74

    19 285 1 1 2 5 42 230 4 81

    20 282 2 21 259 92

    Table 2: Comparison of the classification of BSM groups (rows) and DA (columns)

    BSM group A (8) B (7) C (7) D (6)

    1 -3 84.8 89.4 93.3 93.3

    4 70.3

    5 68.3

    6 76.8 76.87 -8 69.7

    9-10 78

    11-13 71.5 76.6 76.6

    14 73 73

    15-16 72 85.4

    17 -18 80.6 80.6 80.6

    19 83.4 83.4 83.4 83.4

    20 91.5 91.5 91.5 91.5

    % correct 75.6 79.8 80.6 82.9

    Table 3: Combinations of the 20 groups, showing the percentages correctly classified by the DA

    Possible combinations of these groups were investigated, in order to determine „similar‟ BSM categories,

    namely those for which the reconstruction by the DA was satisfactory (at least 70% for all groups). Table 3

    shows the results for 4 possible combinations, resulting in 6-8 final groups, together with the overall percentage

    of correct classification. These groups were then profiled in terms of the variables, to allow the Finscope experts

    nt. Statistical Inst.: Proc. 58th World Statistical Congress, 2011, Dublin (Session CPS055)   p

  • 8/17/2019 Sizing and Profiling the Small, Medium and Micro Business

    5/6

    5

    to assess the coherence of the groups with respect to interpretation in business terms. The 8 group solution

    (scenario A) was chosen on the grounds of providing reasonable differentiation and business usefulness.

    Investigation of variables discriminating between the BSM groups, using Chi-squared automatic

    interaction detection (CHAID)

    Chi-squared automatic interaction detection (CHAID) was used to profile of the BSM groups (Kass, 1980,

    Hawkins and Kass, 1994). At the first step, CHAID looks at which of the predictors differentiate between the 8

    BSM categories. The most discriminating variable was „do not have any insurance‟=1, and 0=‟have some

    insurance‟ (p=1.6e-809 for the contingency table). At the second step, CHAID examines each of the new nodes

    to determine the most significant predictor for each node. The stronger discriminator between the BSM groups

    with some insurance, was „own, lease or hire: internet‟ (p=3.6e-96). For the BSM groups who did not have any

    insurance, the strongest predictor was „do not use a bank for the business‟ p=1.9e-588). Table 4 shows the

    details of the CHAID, against the 8 BSM groups, showing increase in sophistication for the BSM groups.

    Percentages above 10 are highlighted.

    BSM1 BSM2 BSM3 BSM4 BSM5 BSM6 BSM7 BSM8 Infs

       N  o   i  n  s  u  r  a  n  c  e

     No bank

    used for

     business

     No hotrunning

    water

    inside

    54.8 35.4 8.2 1.1 0.2 0.2 2010

     No

    inside

    toilet4.2 34.0 42.5 10.7 6.0 2.6 0.1 740

    Bank used

    for business

    Business

    registered

     No up-

    to-date

    financial

    records

    0.7 23.4 50.2 15.3 6.6 3.8 687

     No bank

    used for

     business

    Hot

    running

    water

    inside

    6.3 63.8 19.4 8.8 1.9 160

    Bank used

    for business

    Business

    registered

    Up-to-

    date

    financial

    records

    0.7 17.6 28.0 37.6 14.2 2.0 917

     No bank

    used for

     business

    Toilet

    inside0.7 24.5 26.6 21.7 22.4 4.2 142

    Bank used

    for business

     Not

    registered

     No

    running

    water

    inside

    4.5 17.4 29.2 39.9 8.4 0.6 177

     Notregistered

    Running

    water

    inside0.4 4.0 61.2 26.3 8.3 278

       S  o  m  e   i  n  s  u  r  a  n  c  e

     No internet Registered 0.6 1.3 13.2 48.4 30.2 6.3 159

     No internet Not

    Registered15.8 45.8 38.3 241

    Lease/own

    internet4.9 95.1 165

    Table 4: Characterisation of the 8 BSM groups

    nt. Statistical Inst.: Proc. 58th World Statistical Congress, 2011, Dublin (Session CPS055)   p

  • 8/17/2019 Sizing and Profiling the Small, Medium and Micro Business

    6/6

    6

    It should be noted that, since CHAID is aimed at determining the most predictive variables and splits at each

    stage, the p-values can be interpreted as giving a ranking of the usefulness of the variables. So although allvariables used are significant at the 1% level, the ratio of the p-values gives an indication of the relative

     predictive power of the variables. The no insurance variable is approximately 107 times more predictive than thenext strongest predictor: „have a credit or debit card‟.

    Looking at the number of employees in the businesses in the different BSM groups, shown in Figure 2, it can be seen that the lower BSM groups essentially have less than 10 employees.

    Figure 2: distribution of the number of employees by BSM group

    Conclusions

    The BSM index has been interrogated by users, and has been accepted as a useful categorization.

    References

    Finscope (2010).  Finscope South Africa Small Business Survey 2010. Finmark Trust, South Africa.

    Kass, GV (1980). An exploratory technique for investigating large quantities of categorical data.

     Journal of the Royal Statistical Society, Series C (Applied Statistics), Vol. 29, No. 2, pp 119-127.

    Hawkins, DM and Kass, GV. (1982) Automatic Interaction Detection. In: Hawkins, DM. (Ed)

    Topics in Applied Multivariate Analysis.  Cambridge University Press: Cambridge.

    Lehtonen, R. and Pahkinen, E.J. (1994)  Practical Methods for Design and Analysis of Complex

    Surveys. John Wiley & Sons, New York.

    284281572567565113011411136N=

    Analysis weighted by WTSC

    BSM_8GP

    87654321

       N_   E   M   P   L   O   Y

    200

    175

    150

    125

    100

    75

    50

    25

    0

    4841520148085303466751585260369413256514704233549205072514151635112241955655548503336822483321528728604910536855205555462452615238528919642042686511051145667797

    4877

    2861

    4166

    5150

    4461

    51573136

    25873621113031213602930499033205000155533163432470291136062029328110111032

    55254188

    488051591319

    76549993876228942765137

    5385

    172413932934449242802195143338791665420229614545

    49454315195285443352910434127102043273032003383407236564376385883320921575460217818424064726763

    44286241384965671140412424515336196143063766702385001204554412349389931434712978151442149473419382530384200258130284403390516905273207713944738228315044861059831304447462076218455045127024629717332847130510882661179100228332134505634672750355567230904380117210436184337194316126869338142772196951455422070665969464714032363341820723250190418931396235714904043267412725423780461168420095470319988360516637289464134193443113534257457520811301154813071819163151795739411507430851015387266397120308931815387445577751328510305641100547926295764007290396240061132369039581905340517906260524314499403244351616422252121055952113110271138293928212529673799304642391111240276020675358494813696554523481590264158947351676110042432673473418345599422645844634851830864755550852701110113425391266179913121722329409757149764903533132045541280140647221422756361049562053483131314235005189622695509495551324906173928645488225554872953

    4302366415842078551355362854732578551302518461922712304223033363337389247016488183061383281236425122129421646448433138083600265345014404183129029688416574484709309239322526522716379337112010210243443398289130952375548028362353884198313138962271132947974870386832513722688139740754729382013141008316567619053319633528489115637081189181634103882163054635893740680363198966325022829419070759283528393729355834383029259914782834270866239537508594537856145745602622835521345611871379542284169362412503383115634725276289446452974870208221242147876771356211150295442343743345420356231952284124780231084081336416622889215327002374535645113667166340141197

    3768753486383283835183902087320266741893743320525801595860377026527418292096281939486831321269243742388813190838772977105242638636683828523853459249619211882453496268146191341284021394368250949683104358358645108793821366018381655391974392218818324849245668523782711682468292187313431458292110850401001

    nt. Statistical Inst.: Proc. 58th World Statistical Congress, 2011, Dublin (Session CPS055)   p