

Electronic Commerce Research and Applications 11 (2012) 518–533


Journal homepage: www.elsevier.com/locate/ecra

The impact of ICT development on the global digital divide

Shing H. Doong a,1, Shu-Chun Ho b,⇑

a Shu Te University, No. 59, Hengshan Rd., Yanchao District, Kaohsiung City 824, Taiwan
b National Kaohsiung Normal University, No. 116, Heping 1st Rd., Lingya District, Kaohsiung City 802, Taiwan

Article info

Article history: Available online 22 February 2012

Keywords: Convergence; Cross-country study; Data clustering; Digital divide; Information and communication technology; National development; Peer effects

1567-4223/$ - see front matter © 2012 Elsevier B.V. All rights reserved.
doi:10.1016/j.elerap.2012.02.002

⇑ Corresponding author. Tel.: +886 7 7172930x1707.
E-mail addresses: [email protected] (S.H. Doong), [email protected] (S.-C. Ho).

1 Tel.: +886 7 6158000x3004.

Abstract

Information and communication technology (ICT) has accelerated the growth of the global economy and improved the quality of life of the world’s inhabitants. ICT has brought new ways of creating livelihoods for people. The diffusion of ICT has also increased year by year and made it possible to reduce poverty. The opportunities created by ICT may also eventually decrease the “distance” between countries in many other ways. Because access to ICT plays a key role in defining the global digital divide, it is important to study how the ICT gaps among countries have changed. This study examines global ICT development in the last decade. We collected secondary data for 136 countries from 2000 to 2008. Four relevant variables are used as proxies for the ICT development status of a country. Because of this multivariate nature of the data, most previous studies have applied a composite index approach to represent the ICT status of a country. For this study, we developed a framework to reduce multivariate raw data into an ordinal number representing a country’s ICT development level. The methodology behind the framework involves data clustering and multi-dimensional data ranking. After applying this data reduction procedure, we explored ICT development paths of different countries, and also conducted panel data analysis based on gross national income and various fixed effects.

© 2012 Elsevier B.V. All rights reserved.

1. Introduction

Information and communication technology (ICT) has fostered economic growth and social progress in the past few decades. Prior studies have shown that ICT plays a critical role in national e-commerce growth (Fathian et al. 2008, Ho et al. 2007, 2011), economic growth (Hanafizadeh et al. 2009, Andrianaivo and Kpodar 2011, Papaioannou and Dimelis 2007, Tcheng et al. 2007, Seo et al. 2009), and country development (Heeks 2008). Both developed and developing countries have boosted their national investments in ICT to drive their economic growth (Dewan and Kraemer 2000, Andrianaivo and Kpodar 2011, Tcheng et al. 2007). Heeks (2008) argued that ICT development requires new technologies and new approaches to innovate and integrate. The diffusion of ICT in recent years has also surprised many analysts who serve with leading international organizations. These include the United Nations, the World Bank, the Organization for Economic Cooperation and Development (OECD), and the International Telecommunication Union (ITU), as well as the governments of many countries.

For example, benchmark progress in worldwide ICT access, with an emphasis on mobile applications, was realized in 2008. This was earlier than the 2015 date predicted by the World Summit on the Information Society (WSIS) in 2005, which estimated that more than half of the world’s inhabitants would have access to ICTs by 2015. Mobile applications have been designed not only for voice communications, but also for business transactions and information access (UNCTAD 2009). In developing countries, the number of users accessing the Internet from mobile devices has also risen rapidly. As the largest developing country in the world, China had 233 million mobile Internet users, with an estimated annual growth rate of 51%, in 2009 (CINIC 2010).

Many countries have endeavored to develop ICT through heavy resource investments over the years. Wealthier countries are considered to have more resources at their disposal for ICT development and may have reached a higher level of ICT development. Thus, it is critical to investigate the result of the ICT investments in the past decade. The objective of this paper is to explore the global ICT development trend and to examine national wealth effects on the trend. Through regional network effects, countries in spatial proximity may influence one another in their ICT development. Social influence theory (Friedkin 1998) stipulates that an individual’s behavior may depend on the behavior of others to whom the individual is tied. The study of Agarwal et al. (2009) supported this social influence theory for an individual’s Internet adoption behavior, which may be influenced by the Internet adoption behavior of the individual’s spatial peers. In addition to national wealth effects on ICT development, we are also interested in peer effects resulting from spatial proximity on the ICT development of a country. Thus, this study provides another example of social influence theory at the entity level of countries.


We intend to answer the following research questions. What are the trends of ICT development in a global sense for the last decade? Do countries with different gross national income (GNI) levels have different ICT development paths? Are there peer effects in the ICT development of countries with spatial proximity?

In Section 2, we review the literature related to ICT and the framework in our analysis. Section 3 illustrates our model development and analysis approach: data clustering, cluster ranking, metrics for measuring ICT gaps, and the panel study. In Section 4, we describe the secondary data collected for data analysis and data preprocessing according to our framework. Sections 5 and 6 provide analysis and discussion, and Section 7 concludes with our findings, contributions, and limitations.

2. Literature review

We first review studies of the digital divide. To analyze global ICT development, we then assess studies related to ICT that have considered it as a general purpose technology. This perspective suggests a multivariate representation of ICT development data.

2.1. Digital divide

Though more than half of the world’s inhabitants have access to ICT, the distribution of resources has not been uniform throughout the world. For example, there is more communication fiber in the Asian, North American and European continents than in the African continent. Even within the same continent, though, there are different levels of ICT access for different countries and regions. As ICT plays a key role in economic growth, the disparities have created many socio-economic imbalance problems in the world. The phrase digital divide, in particular, has caught the attention of academic researchers and policy-makers worldwide. The digital divide refers to the gap between those who have access to IT and those who do not (Rice and Katz 2003). The OECD (2001) defined the digital divide as “the gap between individuals, households, businesses and geographic areas at different socio-economic levels with regard both to their opportunities to access information and communication technologies and to their use of the Internet for a wide variety of activities.” Thus, the concept of the digital divide has two key components: granularity and contents. Granularity refers to the level of entities, such as individuals, businesses, countries and regions, where the gap occurs. Contents refer to the activities that define the gap, for example, ICT development and use of the Internet.

Alleviation of the global digital divide has been a major task of international organizations such as the United Nations (UN), the World Bank, and the G8 countries (Canada, France, Germany, Italy, Japan, Russia, the United Kingdom and the United States). These organizations have endeavored to explore how ICT impacts the development of a country. They have analyzed the status quo of the development of ICT in countries, and have provided practical evidence year by year. In addition, various researchers have applied different approaches to studying the digital divide (Bélanger and Carter 2009, Cuervo and Menendez 2006, Dasgupta et al. 2001, Kauffman and Techatassanasoontorn 2005a, Sacchi et al. 2009).

Research on digital disparity can be divided into the study of the global digital divide (the gap between countries) and the domestic digital divide (the gap between groups within countries). Cross-country digital divides result from social and economic inequalities among developed and developing countries. Some prior studies have focused on the extent of the cross-country divide (Chinn and Fairlie 2007, Crenshaw and Robinson 2006, Cuervo and Menendez 2006, Dasgupta et al. 2001, Dewan et al. 2005, Dewan et al. 2010, Emrouznejad et al. 2010, Hanafizadeh et al. 2009, Kauffman and Techatassanasoontorn 2005a, Shirazi et al. 2009, Vicente and Lopez 2011). We summarize these studies in terms of measures of digital divide, data type, research method, data period, unit of analysis, and variables examined in Table 1. These studies use wireless technology, PC, Internet, and ICT indicators to measure the digital divide. They also examine a large set of variables that may affect the digital divide across countries. Most of these studies collected secondary data and performed cross-sectional and time-series analyses (see Table 1).

Previous studies imply that opportunity to access ICT is a key component in measuring the digital divide. In addition, Internet access, PC access, user digital capability, and government policy also form a basis for measurement of the digital divide. ICT-related measures are critical indicators that show the differences between rich and poor countries (Chinn and Fairlie 2007, Cuervo and Menendez 2006). ICT opportunities and the digital divide have an interesting bidirectional relationship: ICT opportunities influence the digital divide, and the digital divide may hinder ICT opportunities as well (OECD 2005). The ICT opportunities of a country seem to be closely tied to the ICT development in that country. Instead of addressing the full set of issues related to the digital divide, we will use a broad perspective to measure ICT development in this study. We also intend to measure ICT as a general purpose technology (GPT).

2.2. ICT as a general purpose technology

General purpose technologies (GPTs) are original ideas or techniques that have the potential to significantly influence a variety of industries in a country (Guerrieri and Padoan 2007). GPTs are characterized by their pervasiveness of use, inherent potential for technical improvements and innovational complementarities (Bresnahan and Trajtenberg 1995). Examples include the steam engine and electrical systems. As GPTs improve and spread through an economy, the economy may achieve improved productivity. Since ICT is a type of GPT, measuring ICT development is a multifaceted challenge. The WSIS 2003 annual meeting provided guidelines for measuring the ICT development of a country (Sciadis 2005). The increasing penetration of ICT involves several critical indices for economic growth and technology diffusion. They include mobile phone penetration, Internet penetration, PC penetration, investment in ICT infrastructure, and so on. These indicators of ICT development have been empirically tested in previous studies (Cuervo and Menendez 2006, Ho et al. 2007, Kauffman and Techatassanasoontorn 2005a, UNCTAD 2010). In this study, we argue that the Internet and mobile phones are GPTs, and we focus on these two GPTs.

Technology adoption does not take place uniformly across the world. Researchers have argued that spatial proximity is likely to result in relational proximity when there are increased interactions (Niles and Hanson 2003). Agarwal et al. (2009) examined geographical variation in Internet use by using the spatial distribution of individuals to define their reference group. An individual’s social or peer group refers to everyone else living in the same region (for example, a US county). Peer effects suggest that people in the same region affect an individual’s propensity to use the Internet (Agarwal et al. 2009). Chinn and Fairlie (2007) argued that income and the telecommunication infrastructure contribute to Internet penetration. Moreover, applications of the Internet also seem to have accelerated the development of electronic commerce worldwide. The introduction of e-commerce has gradually changed the structure of global business as well (Gibbs et al. 2003). Developing countries can utilize the opportunities of e-commerce and ICTs to increase their competitiveness (UNCTAD 2001). Developing ICT infrastructure is beneficial: it leads to the growth of a domestic economy and also fosters economic development and infrastructure upgrades (Meso et al. 2009, Okoli et al. 2010).

Table 1
Studies on the digital divide.

| Authors | Measurement of digital divide | Data/method | Year/countries | Variables |
| --- | --- | --- | --- | --- |
| Kauffman and Techatassanasoontorn (2005a) | Digital wireless technology diffusion | Cross-country, time-series data | 1992–1999, 43 countries | Wealth, telecom infrastructure, market competition, access cost, standards |
| Dewan et al. (2005) | IT penetration (mainframes, PCs, Internet) | Cross-country, time-series data | 1985–2001, 40 countries | Density of main telephone lines, average monthly telephone subscription cost, average cost of local call, size of urban population, GDP per capita, average years of schooling, size of trade in goods in the economy |
| Crenshaw and Robinson (2006) | Internet | Panel data analysis | 1995–2000, 58 countries | Internet hosts, telephone mainlines, employment in service sector, political openness, global urban share |
| Cuervo and Menendez (2006) | ICT-related indicators | Factor analysis and cluster analysis | 2001, 15 European Union countries | Computers, main telephone lines, broadband connections, secure servers, businesses with a website, businesses buying online, Internet dial-up access cost, households connected to the Internet, public services online, active population using a computer for professional purposes |
| Dewan et al. (2010) | PC and Internet | Cross-country, secondary data analysis | 1991–2005, 26 countries | GDP, PCs, Internet users, average PC unit price, average monthly cost of telephone access |
| Emrouznejad et al. (2010) | ICT opportunity index | Data envelopment analysis | 2007, 183 economies | Main telephone lines, mobile cellular subscribers, international Internet bandwidth, adult literacy rates, gross enrolment rates (primary, secondary, tertiary), Internet users, households with a TV, computers, broadband Internet subscribers, international outgoing telephone traffic |
| Banker et al. (2011) | Digital trading platform | Secondary data, regression model | 881 transactions | Raw grade indicator, premium grade indicator, coefficient of variation, sell transaction indicator, click and book indicator, order book management indicator, seller/buyer is a trader, number of seller/buyer transactions |
| Talukdar and Gauri (2011) | Internet access and usage | Telephone surveys | 2000 and 2008, US | Internet adoption level at home, annual household income, education level |

2.3. Analysis of multifaceted ICTs

To achieve a more comprehensive understanding, we will use multiple indicators to represent the ICT development of a country. Many studies have attempted to capture the multifaceted nature of ICT by using composite indicators known as indices (Fazio et al. 2000, Crenshaw and Robinson 2006, Cuervo and Menendez 2006). An index refers to a mathematical combination of several primitive variables measuring different aspects of ICT development; popular ICT indices include, for example, the Information Society Index (IDC 1995) and the Digital Access Index (ITU 2003). Determining the proper forms and weights with which to combine primitive variables is a key challenge for these index approaches.

Corrocher and Ordanini (2002) provided a multivariate statistics-based method to compose an index for measuring the digital divide. The authors used principal component analysis to aggregate elementary indicators into six digitization factors: market, diffusion, infrastructure, human resources, competition and competitiveness. They then aggregated these six factors to obtain a synthetic index that can be used to gauge the digital divide. Cuervo and Menendez (2006) provided a methodology based on factor analysis and cluster analysis to analyze the digital divide among fifteen European Union (EU) countries. The authors collected ten ICT-related indicators from 2001 that were useful for the analysis of the digital divide. These indicators included ICT infrastructure (e.g., number of computers per 100 inhabitants) and the pervasiveness of ICT (e.g., percentage of businesses buying online). In order to minimize the number of variables, Cuervo and Menendez (2006) used factor analysis to extract two meaningful factors: ICT infrastructure and diffusion (Factor 1), and e-government and Internet access cost (Factor 2). The authors then applied cluster analysis to group together countries with similar factor scores (UNESCO 2003). Digital gaps among the fifteen EU countries were then analyzed based on the cluster results.

3. Model development

Researchers have been endeavoring to understand the implications of ICT development in various countries. This requires measures and analysis of the relationship between ICT and national development. Several methodological approaches and statistical techniques have been developed and applied to meet this need. Our multivariate ICT development data contain four indicators (mobile phone, Internet, telecommunication investment, and telecommunication revenue) for each country in the study, for the years 2000 to 2008. To analyze the global ICT development trend, one can compose a representative index from these four indicators and study the path of this index. Principal component analysis is often used to combine multiple raw variables into a composite index (Johnson and Wichern 2007). We used the first principal component score to observe the layout of the first 10 countries in our data set (see Fig. 1).
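As an illustration of how a first principal component score can serve as a composite index, the following is a minimal sketch in plain Python. It is not the authors' implementation: standardizing each indicator before the analysis, using power iteration to find the leading eigenvector, and the sample values are all assumptions made for this example.

```python
import math

def first_pc_scores(rows, iters=500):
    """Return (loadings, scores) of the first principal component.

    rows: one list of raw indicator values per country.
    Assumes every indicator varies across countries (non-zero standard deviation).
    """
    n, d = len(rows), len(rows[0])
    # Standardize each indicator (column) to zero mean and unit variance.
    means = [sum(r[c] for r in rows) / n for c in range(d)]
    sds = [math.sqrt(sum((r[c] - means[c]) ** 2 for r in rows) / (n - 1))
           for c in range(d)]
    z = [[(r[c] - means[c]) / sds[c] for c in range(d)] for r in rows]
    # Correlation matrix of the standardized indicators.
    corr = [[sum(zr[a] * zr[b] for zr in z) / (n - 1) for b in range(d)]
            for a in range(d)]
    # Power iteration: repeated multiplication approaches the leading eigenvector.
    # The uneven starting vector avoids starting orthogonal to it in common cases.
    v = [1.0 / (c + 1) for c in range(d)]
    for _ in range(iters):
        w = [sum(corr[a][b] * v[b] for b in range(d)) for a in range(d)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    # The sign of a principal component is arbitrary; make the first loading positive.
    if v[0] < 0:
        v = [-x for x in v]
    scores = [sum(zr[c] * v[c] for c in range(d)) for zr in z]
    return v, scores

# Hypothetical values of the four indicators for four countries in one year
# (illustrative numbers only, not data from the study).
sample = [
    [10.0, 2.0, 1.0, 3.0],
    [45.0, 20.0, 2.5, 4.0],
    [80.0, 55.0, 3.0, 5.5],
    [120.0, 75.0, 4.5, 6.0],
]
loadings, scores = first_pc_scores(sample)
```

Each country receives one score per year, so plotting the scores against the year gives one curve per country.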

Fig. 1. First principal component score for ten countries. [Line chart: horizontal axis, year (2000–2008); vertical axis, first principal component score (−2 to 3); one curve each for Albania, Algeria, Argentina, Armenia, Australia, Austria, Bahamas, Bahrain, Bangladesh, and Barbados.]

From the ten curves, it is difficult to say whether the composite index has converged throughout the period. If we had plotted the curves for all 136 countries in our data set, the chart would look much messier. With this short time-series data set, the definition of convergence cannot be applied literally: the limit of each score curve cannot be detected with only nine data points over time. However, we found that gaps among the countries narrowed from 2000 to 2008. Applying the σ-convergence concept from Barro and Sala-i-Martin (2004), we can compute the variance of component scores in each year and study how it varies over time. We expect to see a fluctuating variance curve with generally declining amplitudes over the study period.
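The yearly variance computation described above can be sketched as follows. This is only an illustration: the function name and the sample scores are hypothetical, and the population form of the variance (dividing by the number of countries) is an assumption.

```python
def yearly_variance(scores_by_year):
    """Map each year to the variance of that year's cross-section of scores."""
    result = {}
    for year, scores in sorted(scores_by_year.items()):
        n = len(scores)
        mean = sum(scores) / n                      # common mean of the cross-section
        result[year] = sum((w - mean) ** 2 for w in scores) / n
    return result

# Hypothetical component scores for four countries in three years
# (illustrative numbers only, not data from the study).
example = {
    2000: [-1.5, -0.5, 0.5, 1.5],
    2004: [-1.0, -0.5, 0.5, 1.0],
    2008: [-0.5, -0.25, 0.25, 0.5],
}
variances = yearly_variance(example)
# variances == {2000: 1.25, 2004: 0.625, 2008: 0.15625}
```

A variance series that shrinks over the years, as in this made-up example, is what narrowing gaps between countries would look like under this measure.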

3.1. Extending σ-convergence with data clustering

To define the convergence of ICT development across countries, we apply the Barro and Sala-i-Martin (2004) σ-convergence approach to compute the variance of cross-sectional data at each point in time. This variance, $\sum_{i=1}^{n} (w_i - \mu)^2 / n$, is based on a common mean $\mu$ for all data $w_i$ in a cross-section. Here, $w_i$ represents a generic univariate variable with $n$ data points. When we examine the curves more carefully, it seems more reasonable to assert that there are multiple means in the process of measuring data dispersion. For example, we could assume a three-mean or four-mean state. Assume that four means $\mu_1, \mu_2, \mu_3, \mu_4$ are to be used and each data point $w_i$ is associated with a properly selected mean $\mu_{j(i)}$, where $j(i) \in \{1, 2, 3, 4\}$ denotes the association; that is, $j(i)$ assigns $w_i$ to the mean $\mu_{j(i)}$ in a variance formula based on multiple means. Then, a multiple-mean formula capturing local variances may be defined as

$$\mathrm{var} = \sum_{i=1}^{n} \left( w_i - \mu_{j(i)} \right)^2 .$$

The association of a mean $\mu_{j(i)}$ with the data point $w_i$ is not trivial when the means themselves are to be determined at the same time. Using multiple means to measure data variance has the advantage of accommodating local data dispersion, and it should result in a smaller variance than a single-mean formula can provide. Though simultaneous determination of the means $\mu_1, \mu_2, \mu_3, \mu_4$ and the association function $j(i)$ to make the variance as small as possible is a computationally difficult problem, heuristic algorithms such as k-means clustering algorithms exist that converge quickly to local minima. Here $k$ stands for the number of means to be used in the partitioning of data. For example, $k = 4$ in the previous example.
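To make the comparison between a single mean and multiple means concrete, here is a small univariate sketch. It assumes that $j(i)$ picks the nearest mean and uses unnormalized sums of squares on both sides so the two quantities are comparable; the function names and the data are hypothetical.

```python
def single_mean_ss(w):
    """Sum of squared deviations from the one common mean."""
    mu = sum(w) / len(w)
    return sum((x - mu) ** 2 for x in w)

def multi_mean_ss(w, means):
    """Sum of squared deviations, each point using its nearest mean (j(i))."""
    return sum(min((x - m) ** 2 for m in means) for x in w)

def kmeans_1d(w, means, iters=100):
    """Heuristic search for the means: alternate assignment and averaging."""
    means = list(means)
    for _ in range(iters):
        groups = [[] for _ in means]
        for x in w:
            j = min(range(len(means)), key=lambda idx: (x - means[idx]) ** 2)
            groups[j].append(x)
        # An empty group keeps its previous mean.
        new_means = [sum(g) / len(g) if g else m for g, m in zip(groups, means)]
        if new_means == means:
            break
        means = new_means
    return means

# Hypothetical univariate data with two visible groups.
w = [1.0, 2.0, 3.0, 11.0, 12.0, 13.0]
means = kmeans_1d(w, [0.0, 5.0])       # -> [2.0, 12.0]
total = single_mean_ss(w)              # -> 154.0 (common mean is 7.0)
local = multi_mean_ss(w, means)        # -> 4.0
```

With two means the measured dispersion drops from 154.0 to 4.0 in this example, because each point is compared only with the mean of its own group.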

Cuervo and Menendez (2006) argued that data clustering provides a useful tool to handle multivariate data related to ICT development. Data clustering has been widely used in data mining to explore the implicit structure embedded in data (Han and Kamber 2007). The goal of a clustering algorithm is to partition the data into non-overlapping subsets called clusters, such that intra-cluster data similarity is maximized while inter-cluster data similarity is minimized. Two popular families of algorithms can be applied to segment data: hierarchical clustering algorithms and partitional clustering algorithms (Han and Kamber 2007). An agglomerative hierarchical algorithm starts with each data point as its own cluster. Using a well-defined inter-cluster distance, such as the average linkage or the complete linkage, the two closest clusters are merged to form a bigger cluster. This procedure is repeated until all of the data belong to a single cluster. A dendrogram showing the merging process is generated at the end. Application users then decide where to cut the dendrogram to create different clusters. The cut points represent users’ preferences for inter-cluster and intra-cluster data similarity levels, and determine the number of clusters. The selection of an appropriate data similarity level is sometimes a subjective choice. One drawback of the hierarchical algorithm is that it merges the closest clusters at each intermediate step. This type of greedy procedure may result in a poor dendrogram in the end. Improved algorithms using stochastic methods to merge clusters have been introduced in many studies (Witten and Frank 2005).
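The agglomerative procedure described above can be sketched in a few lines of plain Python. This is a naive illustration, not an efficient or authoritative implementation: it uses average linkage with Euclidean distance, and stopping at a requested number of clusters stands in for cutting the dendrogram. All names and data points are hypothetical.

```python
import math
from itertools import combinations

def average_linkage(a, b):
    """Mean Euclidean distance over all pairs with one point from each cluster."""
    return sum(math.dist(p, q) for p in a for q in b) / (len(a) * len(b))

def agglomerate(points, num_clusters):
    """Greedily merge the two closest clusters until num_clusters remain."""
    clusters = [[p] for p in points]       # every point starts as its own cluster
    merges = []                            # merge history (the dendrogram levels)
    while len(clusters) > num_clusters:
        # Greedy step: pick the pair of clusters with the smallest linkage distance.
        i, j = min(
            combinations(range(len(clusters)), 2),
            key=lambda ij: average_linkage(clusters[ij[0]], clusters[ij[1]]),
        )
        height = average_linkage(clusters[i], clusters[j])
        merges.append((clusters[i], clusters[j], height))
        merged = clusters[i] + clusters[j]
        clusters = [c for idx, c in enumerate(clusters) if idx not in (i, j)]
        clusters.append(merged)
    return clusters, merges

# Hypothetical two-dimensional points.
points = [(0, 0), (0, 1), (5, 5), (5, 6), (10, 0)]
clusters, merges = agglomerate(points, num_clusters=3)
```

Because each merge is chosen only by the current closest pair, an early merge is never undone, which is the greedy behavior the text identifies as a drawback.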

Unlike hierarchical algorithms, a partitioning algorithm such as the k-means algorithm uses representative points to segment data and aims to optimize an objective function measuring the quality of data segmentations. Assuming the number of clusters has been determined, a k-means algorithm will start with a random set of $k$ representative points called initial centroids $g_1, g_2, \ldots, g_k$. Depending on the original data points, these centroids may be multivariate as well. The k-means algorithm then repeats the following two steps in sequence until some preset conditions are met:

• Partition step: Assign each data point to the cluster represented by the nearest centroid.
• Expectation step: Update the centroid of each cluster by averaging the data points belonging to the same cluster.

These two steps are iterated to minimize a dissimilarity measure that indicates the quality of the data segmentation. A commonly used dissimilarity measure is based on the Euclidean distance (Han and Kamber 2007):

$$\mathrm{Dissimilarity} = \sum_{j=1}^{k} \sum_{x_i \in P_j} \left\| x_i - g_j \right\|_E^2 \qquad (1)$$

In the above equation, the inner summation is taken over all points $x_i$ belonging to cluster $P_j$ represented by the centroid $g_j$, and the outer summation is taken over all clusters. The Euclidean distance $\| x_i - g_j \|_E$ between a data point and its cluster centroid measures the local data dispersion. This dissimilarity measure is the multivariate analogue of the variance that captures local variances with multiple means. Here, $i$ represents the index of a data point, $j$ is the index of a cluster, and $k$ denotes the number of clusters in data clustering or the number of means in a variance formula based on multiple means.
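As a worked instance of Eq. (1), the sketch below evaluates the dissimilarity for a fixed partition. The helper names and the sample clusters are hypothetical; only the formula itself comes from the text.

```python
def centroid(points):
    """Component-wise average of the points in one cluster."""
    dims = len(points[0])
    return tuple(sum(p[c] for p in points) / len(points) for c in range(dims))

def dissimilarity(clusters, centroids):
    """Eq. (1): total squared Euclidean distance of points to their centroids."""
    total = 0.0
    for points, g in zip(clusters, centroids):    # outer sum over clusters P_j
        for x in points:                          # inner sum over x_i in P_j
            total += sum((xc - gc) ** 2 for xc, gc in zip(x, g))
    return total

# Hypothetical partition of four two-dimensional points into k = 2 clusters.
clusters = [[(1.0, 1.0), (3.0, 1.0)], [(8.0, 8.0), (8.0, 10.0)]]
centroids = [centroid(c) for c in clusters]       # [(2.0, 1.0), (8.0, 9.0)]
value = dissimilarity(clusters, centroids)        # 1 + 1 + 1 + 1 = 4.0
```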


It is a computationally hard task to find the centroids $g_1, g_2, \ldots, g_k$ that minimize the objective function globally (Mahajan et al. 2009). The iterative steps of the k-means algorithm frequently converge to a local minimum instead. Random restarts with different initial centroids have been proposed to alleviate this local optimum problem. Another issue with k-means algorithms is the number of clusters. Finding a way to determine the number $k$ such that intra-cluster dissimilarity is minimized while inter-cluster dissimilarity is maximized is still an active research topic in data mining. Objective criteria based on cross-validation principles or clustering validation indices have been proposed (Han and Kamber 2007). In some cases, subjective opinion also matters in the determination of the cluster number. We will explain later how we determine this critical number for our data set.
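The two k-means steps and the random-restart remedy can be sketched together as follows. This is a minimal illustration under stated assumptions rather than the procedure used in the study: initial centroids are sampled from the data points, an empty cluster keeps its previous centroid, iteration stops when the centroids no longer change, and the function names, the seed, and the data are hypothetical.

```python
import random

def sq_dist(x, g):
    """Squared Euclidean distance between a point and a centroid."""
    return sum((a - b) ** 2 for a, b in zip(x, g))

def kmeans(points, k, rng, max_iter=100):
    """One k-means run; returns (dissimilarity, centroids, clusters)."""
    centroids = rng.sample(points, k)             # random initial centroids
    for _ in range(max_iter):
        # Partition step: assign each point to its nearest centroid.
        clusters = [[] for _ in range(k)]
        for x in points:
            j = min(range(k), key=lambda idx: sq_dist(x, centroids[idx]))
            clusters[j].append(x)
        # Update step: average the points of each cluster.
        new = [
            tuple(sum(col) / len(pts) for col in zip(*pts)) if pts else centroids[j]
            for j, pts in enumerate(clusters)
        ]
        if new == centroids:                      # centroids stable: stop
            break
        centroids = new
    cost = sum(sq_dist(x, centroids[j]) for j, pts in enumerate(clusters) for x in pts)
    return cost, centroids, clusters

def kmeans_restarts(points, k, restarts=20, seed=0):
    """Run k-means several times and keep the run with the lowest dissimilarity."""
    rng = random.Random(seed)
    return min((kmeans(points, k, rng) for _ in range(restarts)), key=lambda r: r[0])

# Hypothetical two-dimensional points forming two groups.
points = [(1, 1), (1, 2), (2, 1), (9, 9), (9, 10), (10, 9)]
cost, centroids, clusters = kmeans_restarts(points, k=2)
```

Keeping the best of several runs does not guarantee a global minimum; it only reduces the chance of reporting a poor local one.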

3.2. Ranking k-means clusters

There are two advantages to using data clustering to summarize the internal data structure of ICT development. First, the technique can easily handle multivariate data sets, which are quite common for ICT-relevant data. Thus, we do not need to use a composite index approach to analyze the data set. Second, by categorizing ICT data with discrete cluster labels, we reduce the fluctuation and noise normally associated with interval-type data. For example, in Fig. 1, it is hard to analyze the trend of the component score curves. If we discretize the component scores into different levels, we would have a better chance of detecting the trajectory of ICT development over the years.

By using data clustering on a cross-sectional data of multivari-ate ICT indicators, we can summarize the development status ofcountries with discrete cluster labels. Since we also obtained atime-series of these cross-sectional data, finding a way to trackthe evolution of the cluster labels is also important. Centroids fromk-means algorithms are natural choices for representing clusterlabels. If we can rank centroids, we can use the natural numbers1, 2, and 3, etc. as cluster labels. Using numbers as cluster labelsallows us to track the ICT development status of a country overthe study period. Unfortunately, there is no easy way to rankmultivariate centroids, though we can easily compare two realnumbers.

The non-dominated solutions concept in multi-objective programming may provide a solution to our problem (Deb 2001). Using two-dimensional data as an example, we say that a point $(a_1, b_1)$ is dominated by another point $(a_2, b_2)$ if and only if their components satisfy the conditions $a_1 \le a_2$ and $b_1 \le b_2$, and at least one of these inequalities is strict. In other words, the second point is as good as the first one in every component and it must be better in at least one component. For example, (10,20) is dominated by (15,20). However, it may happen that two data points are not comparable under this domination relation. For example, (10,20) and (15,15) are not dominated by each other in any direction. Therefore, the domination relation is only a partially-ordered relation for sets of multivariate data. Though Zorn’s (1935) lemma may be helpful to deal with partially-ordered sets, we adopt a simpler approach to make the set of centroids a totally-ordered set.
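The domination relation described above can be illustrated with a minimal Python sketch. The function name `dominates` is a hypothetical helper introduced only for this illustration; the two test pairs are the examples given in the text.

```python
from typing import Sequence


def dominates(q: Sequence[float], p: Sequence[float]) -> bool:
    """Return True if q dominates p: q is at least as large as p in every
    component and strictly larger in at least one component."""
    if len(p) != len(q):
        raise ValueError("points must have the same dimension")
    at_least_as_good = all(b >= a for a, b in zip(p, q))
    strictly_better = any(b > a for a, b in zip(p, q))
    return at_least_as_good and strictly_better


# (10, 20) is dominated by (15, 20).
print(dominates((15, 20), (10, 20)))
# (10, 20) and (15, 15) are not comparable: neither dominates the other.
print(dominates((15, 15), (10, 20)), dominates((10, 20), (15, 15)))
```

Because some pairs are incomparable, sorting with this relation alone cannot produce a full ranking, which is why a total order is needed.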

Two centroids can be compared based on the values of a chosen component in each. For example, assume that the mobile penetration rate is a preferred variable to compare two centroids. Then, the one with a higher mobile penetration rate ranks higher than the other one. In very rare cases when two centroids have exactly the same mobile penetration rate, then we can use the second preferred component to make the comparison. To maintain consistency in comparison, the order of component preferences is the same for every yearly cross-sectional data set. With this definition of order comparison, the multivariate centroids data set becomes a totally-ordered set. After ranking all centroids in each year, we use ordinal cluster labels to represent the ICT development status of a country and track the movement of these ordinal labels.
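The comparison rule above amounts to a lexicographic order on the centroid components. The following sketch shows one way to turn it into ordinal cluster scores. The function name `rank_centroids`, the column order (MP, IUP, INV, REV) and the centroid values are assumptions made for illustration and are not taken from the study's data.

```python
from typing import Dict, Sequence


def rank_centroids(centroids: Sequence[Sequence[float]],
                   preference: Sequence[int]) -> Dict[int, int]:
    """Map each centroid index to an ordinal score (1 = lowest, k = highest).

    `preference` lists component indices from most to least preferred;
    ties on one component are broken by the next one.
    """
    def key(j: int):
        return tuple(centroids[j][i] for i in preference)

    order = sorted(range(len(centroids)), key=key)
    return {j: rank for rank, j in enumerate(order, start=1)}


# Hypothetical z-scored centroids with columns (MP, IUP, INV, REV).
centroids = [
    (0.9, 0.5, 0.1, 0.2),
    (-0.8, -0.7, 0.0, -0.1),
    (0.2, 0.1, 0.3, 0.0),
    (0.2, -0.3, -0.2, 0.4),   # ties with the previous centroid on MP
]
scores = rank_centroids(centroids, preference=(0, 1, 2, 3))
print(scores)
```

Python compares tuples lexicographically, so the tie on the first component between the last two centroids is resolved by the second component, as described in the text. Using the same `preference` tuple for every year keeps the ranking consistent across cross-sections.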

3.3. Entropy: comparing the distribution of cluster members

After using data clustering to segment countries into different ICT development groups and assigning cluster labels with ordinal numbers, we can gain insights into the ICT development trend of each country. For example, using the same ten countries in Fig. 1 and their ordinal cluster labels (1–4), we can plot the curve of their ICT development history (see Fig. 2).

The clustering is done for all 136 countries in our data set. Fig. 2 shows the results for ten countries. In each year, we partitioned the 136 countries into four clusters based on their ICT relevant attributes. The clustering and ranking procedure transforms the interval type multivariate data into ordinal cluster labels called cluster scores. With this data discretization, ICT development stabilizes for these countries near the end of the study period. Since the cluster scores for each country do not change after 2006, convergence seems to have occurred. However, the following questions remained unanswered. Do all 136 countries have a convergent ICT development path? Do they converge mostly to high cluster scores or low cluster scores? How do we measure the dispersion of cluster scores in each year and how does this dispersion move along the time axis? We will use the concept of entropy to tackle this issue.

Entropy is a concept in thermodynamics, but it was applied by Shannon to study information theory in the late 1940s (Witten and Frank 2005). It can be used to gauge the diversity level of multiple categories in a data set. Assume that a set contains $k$ different types of objects and the probability of seeing each type of object is $p_1, p_2, \ldots, p_k$. Objects refer to cluster labels, thus, using our previous example of 4 clusters, $k = 4$. Then the heterogeneous level of information content of this set is computed as an entropy value:

$$\text{Entropy} = -p_1 \log_2 p_1 - \cdots - p_k \log_2 p_k.$$

This entropy value reaches its maximum of $\log_2 k$ when the probability distribution on different types of objects is uniform, i.e. $p_1 = p_2 = \cdots = p_k = 1/k$. If the distribution is concentrated on one single type of object though, for example, $p_1 = 1$, then it has the minimal value of 0. The size of the entropy value is often used to measure the homogeneity of the elements in a data set. The larger the entropy value, the more heterogeneous the set, and the smaller the entropy, the more homogeneous the set.
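As a quick check of the two extreme cases mentioned above, the entropy formula can be sketched in a few lines of Python. The function name `entropy` is a hypothetical helper; terms with zero probability are skipped, following the usual convention that $p \log_2 p \to 0$ as $p \to 0$.

```python
import math
from typing import Sequence


def entropy(probabilities: Sequence[float]) -> float:
    """Shannon entropy in bits; zero-probability terms contribute nothing."""
    return sum(-p * math.log2(p) for p in probabilities if p > 0)


uniform = entropy([0.25, 0.25, 0.25, 0.25])   # k = 4, maximum log2(4)
single = entropy([1.0, 0.0, 0.0, 0.0])        # all mass on one cluster
print(uniform, math.log2(4), single)
```

With four clusters the uniform case gives the maximum value $\log_2 4 = 2$, and the fully concentrated case gives 0.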

3.4. A research framework for studying trends of ICT development

We next develop a framework to analyze ICT development trends at the country level. This framework applies to a cross-sectional time-series data set of different ICT development attributes. It consists of a cluster score assignment step and a trend analysis step. The cluster score assignment step is applied to all countries in our data set for every year. The trend analysis step may be applied to a restricted subset of countries that do not change over the study period. For example, we may restrict the subjects to all Asian countries or members of certain economic organizations. Assuming that ICT development attributes have been collected for a fixed set of countries over successive years, for the cross-sectional data in each year we apply multivariate clustering algorithms to partition the set of countries, with each cross-section of data to be partitioned into the same number of clusters. A consistent ranking method will be applied to the yearly clustering result to denote cluster labels as cluster scores. Using the ordinal cluster scores, we can track how the ICT development level of a country changes over the years. The cluster scores approach simplifies data constructs embedded in a clustering result, and it enables us to track the ICT development trend more easily.

Fig. 2. ICT cluster labels of the ten countries in Fig. 1 during the study period (vertical axis: cluster labels; horizontal axis: year, 2000–2008; one curve each for Albania, Algeria, Argentina, Armenia, Australia, Austria, Bahamas, Bahrain, Bangladesh and Barbados). Note: In 2000, Cluster Label 1 included Albania, Algeria, Bangladesh, Cluster Label 3 included Argentina, Armenia, Bahamas, Bahrain, Barbados, and Cluster Label 4 included Australia, Austria. In 2008, Cluster Label 1 included Algeria, Armenia, Bangladesh, Cluster Label 3 included Albania, and Cluster Label 4 included Argentina, Australia, Austria, Bahamas, Bahrain and Barbados.

Tracking the trend of a country is a micro-application of the cluster scores. Let $s_t(c)$ represent the cluster score of country $c$ in the year $t$. We sort the countries in our data set alphabetically by name, with Albania as country 1 ($c = 1$) and Zambia as country 136 ($c = 136$) in the analysis. The study period runs from 2000 to 2008, thus $t$ can be 2000 up to 2008. The trend of a country can be explored by the absolute change of the score over successive years, $d_t(c) = |s_{t+1}(c) - s_t(c)|$. If the cluster score $s_t(c)$ converges to a limit, that is, country $c$ has a convergent ICT development path, then $d_t(c)$ should converge to zero as $t$ becomes large. We have seen that the ICT development status of the ten countries in Fig. 2 converged to their respective limits near the end of the study period.

At a macro-level, we also want to explore the trends of all countries as a whole. By aggregating the individual changes $D_t = \sum_c d_t(c)$ from all countries, we can explore whether ICT development has converged for them as a group. In addition to this convergence metric, we would also like to measure how the diversity of cluster scores changes over the years. Let $E_t$ denote the entropy of cluster scores in year $t$. With these metrics, four interesting scenarios arise as $t$ approaches the end of the study period:

• Scenario I: Both $D_t$ and $E_t$ go to zero as $t$ approaches the end of the study period. In this case, the ICT development of most countries will converge to a common level.
• Scenario II: $D_t$ goes to zero, and $E_t$ remains large. The ICT development of most countries will converge, and the convergence will be uniform on every level.
• Scenario III: $D_t$ remains large, and $E_t$ goes to zero. This indicates that cluster scores will be concentrated at a single level near the end of the study period, and some countries will not have a convergent ICT development path. A possible cause for this could be a large fluctuation of cluster scores for many countries. For example, some countries may belong to the low ICT level in a year, and then move to the high ICT level in the following year.
• Scenario IV: Both $D_t$ and $E_t$ remain large as $t$ approaches the end of the study period. In this case, some countries do not have a convergent ICT development path, and cluster scores are likely to be distributed uniformly on every level. A possible cause for this phenomenon might be that two nearby cluster levels swap their members in successive years.
• All other cases are intermediate cases that fall between the four extreme cases. Our data will show the actual pattern of the countries collected for this study.

3.5. A panel data analysis of ICT development based on a nation’s wealth

The previous section describes a framework for implementing an exploratory study of ICT development on a global basis. That framework contains no causal factor to explain why development has evolved as it has. ICT development is a capital-intensive development process for most countries. A country must invest its own funds in this area or invite foreign parties to conduct development projects. Rich countries usually have more resources at their disposal, and they often invest more money than poor countries in ICT projects. Do rich countries always have a high level of ICT development though? Is there any causal link between a country’s national wealth and its ICT development trend? A country’s wealth may be represented by its gross national income (GNI), which is available as a time-series data set for the country. Collecting GNI time-series for a cross-section of countries allows us to make a panel data set. Cluster scores used to denote the ICT development status of countries over the years make another panel data set. A causal study linking GNI and ICT development may be conducted by using established regression tools in panel data research. A properly established panel study model can help explain why trends have happened.

Let $ICT_{ct} = s_t(c)$ and $GNI_{ct}$ respectively denote the ICT cluster score and the World Bank GNI classification of country $c$ at year $t$. If we consider the cluster score as interval-type data, we can regress $ICT_{ct}$ against $GNI_{ct}$ with fixed-effects or random-effects linear models as follows:

Fixed effects:
$$ICT_{ct} = \beta_0 + \beta_1 GNI_{ct} + \alpha_2 C_2 + \alpha_3 C_3 + \cdots + \alpha_N C_N + \gamma_2 T_2 + \gamma_3 T_3 + \cdots + \gamma_M T_M + u_{ct} \quad (2)$$

Random effects:
$$ICT_{ct} = \beta_0 + \beta_1 GNI_{ct} + u_{ct} \quad (3)$$

where $C_2, C_3, \ldots, C_N$ are dummy variables for the country effects, $T_2, T_3, \ldots, T_M$ are dummy variables for the time effects, and $u_{ct}$ is the idiosyncratic error. A dummy variable takes the value of 0 or 1 only, and is assumed 0 unless its special fixed effect is being considered. For example, $C_2 = 1$ only when we are considering the country fixed effect of Country 2, and $T_2 = 1$ only when we are considering the time fixed effect of Year 2 in our study. The panel data contain $N$ countries and $M$ years. (In this study, Country 2 ($c = 2$) is Algeria and Year 2 is 2001 ($t = 2001$); also, $N = 136$ and $M = 9$.) The first country and the first year are used as the base to compute country or time fixed effects, thus their dummy variables are omitted in the fixed effects model. Using the Hausman (1978) test, we can determine whether the random-effects model is consistent and should be preferred over the fixed-effects model.
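To make the role of the country and year effects concrete, the sketch below estimates the slope $\beta_1$ of a two-way fixed-effects model on a toy panel. It is not the dummy-variable regression of Eq. (2) itself: it uses the within transformation (subtracting country means and year means and adding back the grand mean), which yields the same slope as the dummy-variable form when the panel is balanced. All names and all data values are hypothetical, and the Hausman test is not implemented here.

```python
from typing import Dict, Hashable, Tuple

Panel = Dict[Tuple[Hashable, Hashable], float]   # keyed by (country, year)


def two_way_demean(v: Panel) -> Panel:
    """Subtract country means and year means, add back the grand mean."""
    countries = {c for c, _ in v}
    years = {t for _, t in v}
    if len(v) != len(countries) * len(years):
        raise ValueError("this sketch assumes a balanced panel")
    c_mean = {c: sum(v[c, t] for t in years) / len(years) for c in countries}
    t_mean = {t: sum(v[c, t] for c in countries) / len(countries) for t in years}
    g_mean = sum(v.values()) / len(v)
    return {(c, t): v[c, t] - c_mean[c] - t_mean[t] + g_mean for (c, t) in v}


def within_slope(x: Panel, y: Panel) -> float:
    """Slope of y on x after removing country and year fixed effects."""
    xd, yd = two_way_demean(x), two_way_demean(y)
    sxx = sum(val ** 2 for val in xd.values())
    if sxx == 0:
        raise ValueError("x has no variation left after demeaning")
    return sum(xd[key] * yd[key] for key in xd) / sxx


# Toy balanced panel: hypothetical GNI classes (1-4) for three countries.
gni_rows = {"A": [1, 1, 2], "B": [2, 3, 3], "C": [4, 4, 4]}
years = [2000, 2001, 2002]
country_effect = {"A": 0.0, "B": 0.5, "C": -0.3}     # hypothetical alphas
year_effect = {2000: 0.0, 2001: 0.2, 2002: 0.4}      # hypothetical gammas

x = {(c, t): float(g) for c, row in gni_rows.items() for t, g in zip(years, row)}
y = {(c, t): 1.0 + 0.8 * x[c, t] + country_effect[c] + year_effect[t] for (c, t) in x}

slope = within_slope(x, y)
print(round(slope, 6))
```

Because the toy response is generated without noise, the estimated slope recovers the value 0.8 used to build it, regardless of the country and year effects.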

On the other hand, if we consider the cluster score to be an ordinal number, then we can use an ordered probit or logit regression model to estimate the expected value of ICT given GNI with an appropriate link function $F$ (see Appendix B):

$$E(ICT_{ct} \mid GNI_{ct}) = F(\beta \cdot GNI_{ct} + \alpha_2 C_2 + \alpha_3 C_3 + \cdots + \alpha_N C_N + \gamma_2 T_2 + \gamma_3 T_3 + \cdots + \gamma_M T_M) \quad (4)$$

A satisfactory panel regression model with fixed country or year effects may help us explain why a shift in the ICT development trend has occurred for a country or a given year. This type of causal model complements our exploratory study, which attempts to discover what the changes are, by reasoning about why the changes may be happening.

4. Data preprocessing and cluster score computation

To investigate global ICT development, we collected secondary data published by international organizations. The data collection procedure involved some tradeoffs between the breadth and depth of data. When it comes to data breadth, the concern is with the number of countries. In contrast, data depth refers to the number of variables associated with each country. Since our focus is on a global study, we decided to place more emphasis on breadth than depth. Overall, we collected the four ICT relevant variables for 136 countries from the Year 2000 to the Year 2008.

4.1. Data collection

To analyze a large sample of 136 countries, we selected four variables to represent the ICT development of each country. We argued that mobile phones and the Internet are ICT-related general purpose technologies (GPTs). Capital investment in telecommunications and total telecommunication revenues represent the overall ICT market in a country. Table 2 shows the definitions of the variables and sources of our data.

We do not claim that the above variables are exhaustive, but we believe they represent a country’s ICT development to some extent. The first two variables measure the ICT infrastructure (info-use of WSIS 2003), while the last two variables measure the financial results of ICT development (infodensity of WSIS 2003).

In addition to the ICT-related variables, we also collected the gross national income (GNI) classification from the World Bank (2009) for the years 2000 to 2008. Each year, the World Bank determines boundary points to stratify countries into four categories: high GNI, upper middle GNI, lower middle GNI, and low GNI. High GNI countries typically include developed countries such as Austria, Japan and the U.S., and low GNI countries include developing countries such as Bangladesh, Gambia and Zambia. Though a country’s GNI category may change from year to year, the changes are infrequent. For example, during the period from 2000 to 2008, Armenia changed its status only once: the country moved from a low GNI status to a lower middle GNI status in 2002. The United States kept its high GNI level throughout the entire period.

Table 2
Definitions of variables and data sources.

Variables                               Definition                                                                                        Sources
Mobile phone penetration                Subscriptions to a public mobile telephone service                                                ITU, Euromonitor Intl
Internet user                           Number of total fixed-line Internet subscriptions                                                 Euromonitor Intl
Capital investment in telecom           Gross annual investment in telecom including fixed, mobile and Internet services                  Euromonitor Intl
Total telecom revenue                   Gross annual telecom revenue earned from fixed, mobile and Internet services                      Euromonitor Intl
Gross national income (GNI) per capita  Economies classified as low, middle (subdivided into lower middle, upper middle), or high income  World Bank (2009)

Note: We converted values to international dollars in capital investment for telecom/GDP and total telecom revenue/GDP to avoid the variation in purchasing power.

4.2. Data description

The raw data in Table 2 were further normalized to account for size differences among countries. We divided the national mobile subscription number by the population of a country to get the mobile penetration variable. We obtained the Internet user penetration by dividing the number of Internet users in each country by the population. We adjusted two financial indicators (capital investment and telecom revenue) according to the GDP of each country. Descriptive statistics of these new variables for the end years are shown in Table 3.

Table 3
Descriptive statistics for 2000 and 2008.

Variables                          Min 2000   Min 2008   Max 2000   Max 2008   Mean 2000  Mean 2008  SD 2000    SD 2008
Mobile penetration                 .0002      .0074      .8173      2.0781     .1886      .8024      .2479      .4419
Internet user penetration          .000002    .0040      .4802      .8710      .0897      .3012      .1337      .2681
Capital investment in telecom/GDP  .000012    .000023    .0276      .0392      .0082      .0069      .0059      .0075
Total telecom revenue/GDP          .0010      .0009      .0695      .1192      .0278      .0319      .0141      .0193

From the descriptive statistics, we see that the two financial indicators are considerably smaller in magnitude than the two penetration variables. Since data clustering algorithms using Euclidean distance to measure dissimilarity level may ignore the impact of variables with less strength, we further normalized each variable by its z-score transformation $z = (x - \mu)/\sigma$. Here $\mu$ and $\sigma$ are the sample mean and sample standard deviation of a variable. We performed this normalization procedure for the cross-sectional data of every year so that each variable has a sample mean of 0 and a sample standard deviation of 1. For example, a $z = (x - .1886)/.2479$ transformation is applied to the mobile penetration variable for 2000, and for 2008 the formula is $z = (x - .8024)/.4419$. The final variables used in this study are shown in Table 4.

Table 4
List of final variables.

Variables                               Description                       Use
GNI                                     Gross national income             Panel data analysis
Mobile penetration (MP)^a               Mobile subscription/population    Clustering
Internet user penetration (IUP)^a       Internet subscription/population  Clustering
Capital investment in telecom (INV)^a   Investment/GDP                    Clustering
Total telecom revenue (REV)^a           Telecom revenue/GDP               Clustering

^a z-Score of the variables in the description field.

4.3. Cluster score computation

According to the proposed research framework, we need to compute a cluster score for every country and every year. The cluster score assignment begins with a data clustering procedure to cluster the multivariate ICT variables (MP, IUP, INV, and REV) in each year. We chose k-means algorithms rather than hierarchical algorithms for the following reasons:

• A hierarchical algorithm needs to make a subjective selection of the cut point on its dendrogram to determine the number of clusters. This may be difficult. Also, its greedy procedure for merging the closest clusters at every intermediate step may not produce a good dendrogram.
• The objective function used by a k-means algorithm is a natural extension of the concept of σ-convergence in Barro and Sala-i-Martin (2004). In addition, centroids from the algorithm are good representatives for the clusters, and they can be ranked with an appropriate procedure. Ranking clusters allows us to convert cluster labels into ordinal numbers as cluster scores, and also enables the analyses that follow.

There are some issues associated with the k-means algorithm though. For example, how do we stop the iterative steps and find adequate centroids? We performed the partition-expectation cycle 100 times for each run. In most cases, the centroids stopped moving before the 100th cycle, so this is a good termination criterion. To improve the chances of obtaining good final centroids, we ran the algorithm 100 times using different initial centroids in each run. The run with the minimum total dissimilarity as described in Eq. (1) produced the final data clusters for a year in our procedure.
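The normalization and clustering steps of Sections 4.2 and 4.3 can be sketched as follows. This is a minimal standard-library illustration, not the authors' implementation: the function names, the toy data and the handling of empty clusters are assumptions, and the total dissimilarity is taken here as the sum of squared Euclidean distances to the assigned centroid, which may differ in detail from Eq. (1).

```python
import random
import statistics
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


def zscore_columns(rows: Sequence[Sequence[float]]) -> List[Point]:
    """Standardize each column to sample mean 0 and sample std. dev. 1."""
    columns = list(zip(*rows))
    means = [statistics.mean(col) for col in columns]
    sds = [statistics.stdev(col) for col in columns]   # sample (n - 1) version
    return [tuple((v - m) / s for v, m, s in zip(row, means, sds)) for row in rows]


def sq_dist(a: Point, b: Point) -> float:
    return sum((u - v) ** 2 for u, v in zip(a, b))


def assign(points: List[Point], centroids: List[Point]) -> List[int]:
    """Label each point with the index of its nearest centroid."""
    return [min(range(len(centroids)), key=lambda j: sq_dist(p, centroids[j]))
            for p in points]


def kmeans_once(points: List[Point], k: int, rng: random.Random,
                max_cycles: int = 100):
    """One k-means run: at most `max_cycles` assign/update cycles."""
    centroids = rng.sample(points, k)          # random initial centroids
    for _ in range(max_cycles):
        labels = assign(points, centroids)
        updated = []
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            # An empty cluster keeps its previous centroid (a simplification).
            updated.append(tuple(sum(c) / len(members) for c in zip(*members))
                           if members else centroids[j])
        if updated == centroids:               # centroids stopped moving
            break
        centroids = updated
    labels = assign(points, centroids)
    total = sum(sq_dist(p, centroids[lab]) for p, lab in zip(points, labels))
    return total, labels, centroids


def kmeans_restarts(points: List[Point], k: int, restarts: int = 100, seed: int = 0):
    """Keep the run with the smallest total dissimilarity."""
    rng = random.Random(seed)
    return min((kmeans_once(points, k, rng) for _ in range(restarts)),
               key=lambda run: run[0])


# Hypothetical (MP, INV/GDP) values for six countries in one year.
raw = [(0.10, 0.010), (0.12, 0.012), (0.15, 0.011),
       (0.90, 0.030), (0.95, 0.028), (1.00, 0.031)]
z = zscore_columns(raw)
total, labels, centroids = kmeans_restarts(z, k=2)
print(labels)
```

Standardizing first keeps the small financial ratios from being swamped by the penetration variables in the distance computation. The restarts address the local-minimum issue noted in Section 3.1, since a single run depends on its initial centroids.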

A more critical problem related to the k-means algorithm is the determination of the parameter $k$, the number of intended clusters. Several cluster validation indices have been discussed in the literature. Among these indices, the Davis and Bouldin (1979) index measures the intra-to-inter clustering effect. This index is defined as:

$$DB = \frac{1}{k} \sum_{j=1}^{k} D_j$$

where $D_j = \max_{j' \neq j} \{D_{j,j'}\}$ and $D_{j,j'}$ is the within-to-between cluster spread for the involved clusters $j$ and $j'$. That is, $D_{j,j'} = (d_j + d_{j'})/dist_{j,j'}$, where $d_j$ and $d_{j'}$ are the average within-cluster distances of clusters $j$ and $j'$, and $dist_{j,j'}$ is the distance between the centroids of the two involved clusters. (In other words, $d_j$ measures the intra-cluster dissimilarity of a cluster and $dist_{j,j'}$ measures the inter-cluster dissimilarity between two clusters.) Since our goal is to minimize the intra-cluster dissimilarity and maximize the inter-cluster dissimilarity at the same time, we prefer to choose a value of $k$ with a smaller Davis–Bouldin index.
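The index defined above (usually spelled Davies–Bouldin in the clustering literature) can be computed directly from a clustering result. The sketch below follows the definition in the text, with $d_j$ taken as the average Euclidean distance of a cluster's members to its centroid. The function name and the toy data are hypothetical.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


def davies_bouldin(points: Sequence[Point], labels: Sequence[int],
                   centroids: Sequence[Point]) -> float:
    """DB = (1/k) * sum_j max_{j' != j} (d_j + d_j') / dist(g_j, g_j')."""
    k = len(centroids)
    spread: List[float] = []                   # d_j for every cluster
    for j in range(k):
        members = [p for p, lab in zip(points, labels) if lab == j]
        if not members:
            raise ValueError(f"cluster {j} is empty")
        spread.append(sum(math.dist(p, centroids[j]) for p in members) / len(members))
    total = 0.0
    for j in range(k):
        total += max((spread[j] + spread[m]) / math.dist(centroids[j], centroids[m])
                     for m in range(k) if m != j)
    return total / k


# Two tight, well-separated toy clusters.
points = [(0.0, 0.0), (2.0, 0.0), (10.0, 0.0), (12.0, 0.0)]
labels = [0, 0, 1, 1]
centroids = [(1.0, 0.0), (11.0, 0.0)]
db = davies_bouldin(points, labels, centroids)
print(db)
```

In this toy case each cluster has an average within-cluster distance of 1 and the centroids are 10 apart, so the index is $(1 + 1)/10 = 0.2$. Moving the clusters closer together or spreading their members out would raise it.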

5. Data analysis

The first principal component (FPC) score is initially used as a composite index approach to explore the trend of global ICT development. Then, the results of the cluster scores approach are presented. The cluster scores approach is also applied to a fixed subset of countries (OECD members) as an example to check the viability of the approach. Finally, panel data analysis is presented to assess whether there is a causal relationship between national wealth and ICT development status. Peer effects from spatial proximity are also presented as a modified panel regression model.

5.1. A composite index approach with FPC scores

The FPC score from the four ICT variables (MP, IUP, INV and REV) was computed for every country in each year from 2000 to 2008. The FPC accounted for about 44.6–49.5% of the data variance in every year. Using σ-convergence, we computed the standard deviation of the FPC scores to measure the extent of data dispersion (see Fig. 3).

The standard deviation declined initially, increased in 2002–2004 and then continued to diminish until 2008. If we ignore the minor fluctuations from 2002 to 2004, this seems to indicate that global ICT development tended to achieve σ-convergence in the end. Thus, the FPC scores of all countries become less dispersed from their common mean 0 as a whole; but the result does not tell us whether a country has convergent FPC scores. (The mean of all FPC scores in a year is 0.) We have seen in Fig. 1 that it is difficult to check the convergence of the FPC scores of a country because of the complexity involved in interval type scores. Although we can apply the research framework to cluster the univariate variable (FPC score) and discretize each country’s ICT development status with its cluster score, we did not do this because the FPC accounted for less than 50% of the total variance in each year. Adding the second principal component score will increase the explanatory power to more than 50% of total variance, but it also creates difficulties associated with a multivariate trend analysis.

5.2. Using multivariate ICT development indicators

The cross-sectional multivariate data (MP, IUP, INV, REV) are partitioned into a proper number of clusters in each year. We first use a cross-validation approach to determine the number of clusters. The open-source software package Weka 3.6 has an expectation–maximization clustering algorithm that can help determine the number of clusters (Witten and Frank 2005). Three to five clusters are appropriate for yearly cross-sections from 2000 to 2008. So we decided to try a k-means algorithm with three to five centroids. The total dissimilarity and Davis–Bouldin index for each cross-sectional data set are shown in Table 5.

For each year, the total dissimilarity decreases as the number of centroids increases. So the more centroids we have, the higher the within-cluster similarity that can be obtained. However, the between-cluster dissimilarity can become smaller as well. This is the reason why the Davis–Bouldin index includes a between-cluster distance ($dist_{j,j'}$) in its formula. After calculating the Davis–Bouldin index for each year, we see that the five-cluster partition has a smaller index only for 2001. Though it also has a minimum value of .86 in 2002, unfortunately, one of the clusters contains only one country. Thus, we viewed the next smaller index (.93) as the preferred partition for this year. Bold-faced numbers in the table indicate the preferred number of clusters according to the Davis–Bouldin index.

According to the Davis–Bouldin index, a four-cluster partition is preferable for five of the yearly cross-sectional data sets (2002, 2003, 2006, 2007 and 2008), while a three-cluster partition is appropriate for three of them (2000, 2004 and 2005). In contrast, a five-cluster partition makes sense for the cross-sectional data set of 2001. There are two reasons for not using different numbers of clusters for different years. First, this practice creates difficulties in tracking cluster movements from year to year. For example, if we use four clusters for 2003 and three clusters for 2004, there will be a problem of tracking changes from the four clusters in 2003 to the three clusters in 2004. Second, the maximum entropy value of cluster scores in a year depends on the number of clusters. In a year with three clusters, this maximum value is $\log_2 3$, and the maximum value will be $\log_2 4$ for a year with four clusters. For these reasons, we preferred to choose the same number of clusters for each year. Since a four-cluster partition had more years with a preferred Davis–Bouldin index, we used a four-cluster partition for every year. Last, a subjective reason to partition countries into four different ICT development clusters is that the World Bank practices a four-stratum classification of GNI.

Fig. 3. Standard deviation of the first principal component scores in each cross-section (vertical axis: standard deviation; horizontal axis: year, 2000–2008).

Table 5
Total dissimilarity and Davis–Bouldin index.

                     Clusters  2000    2001    2002    2003    2004    2005    2006    2007    2008
Total dissimilarity  3         189.88  197.91  220.25  208.20  209.77  205.21  214.13  225.92  239.48
                     4         154.27  160.11  166.24  163.69  182.41  171.74  172.89  180.93  190.55
                     5         130.51  131.84  142.06  134.88  157.00  146.59  143.01  144.21  151.68
Davis–Bouldin index  3         0.96    1.01    1.13    0.98    0.99    1.00    1.02    1.07    1.07
                     4         1.05    1.14    0.93    0.86    1.10    1.08    0.98    0.97    0.98
                     5         1.01    1.00    0.86    0.99    1.18    1.12    1.05    1.05    1.07

Fig. 4. Standardized convergence metric $D_t$ and entropy metric $E_t$ for all countries (vertical axis: D and E metrics; horizontal axis: year, 2000–2008).

After ranking the centroids, we may label them as high, upper middle, lower middle and low ICT development centroids like the World Bank has done for the GNI classification. The corresponding clusters are marked with ordinal numbers of 4, 3, 2 and 1. Using cluster scores to denote the ICT development status of a country, we can obtain useful insights about its ICT development path over time. For example, cluster scores of the US for the years from 2000 to 2008 are all 4s, meaning the US belonged to the high ICT development cluster during the whole study period. Cluster scores for Armenia over the same period are 3, 2, 1, 1, 2, 1, 1, 1 and 1, indicating that Armenia belonged to the upper middle cluster in 2000, the lower middle cluster in 2001 and 2004, and the low cluster for all other years. Notice that the cluster score $s_t(c)$ for country $c$ in year $t$ is a relative measure for that year only, because we conducted the data clustering procedure independently every year. Thus, a country may have had cluster scores $s_{2000}(c) = 3$ and $s_{2001}(c) = 2$ when it could not keep pace with the global ICT development in 2001.

Accordingly, cluster scores for the US have converged, and so have the scores for Armenia. To measure the global trend, we use the aggregate cluster score changes $D_t$ introduced before. To minimize the effects of big jumps (e.g., from 1 to 4), we refine our previous definition of the metric as follows:

$$D_t = \sum_{c=1}^{136} d_t(c), \qquad d_t(c) = \min(1, |s_{t+1}(c) - s_t(c)|), \qquad t = 2000, \ldots, 2008$$

In other words, regardless of the extent of the gaps, we only count how many times a country has changed its ICT status over the study period. The entropy value for measuring the diversity of cluster scores in a year is computed as:

$$E_t = \sum_{s=1}^{4} -p_{t,s} \log_2(p_{t,s}), \qquad t = 2000, \ldots, 2008$$

where $p_{t,s}$ is the proportion of cluster score $s \in \{1,2,3,4\}$ in year $t$. Using these two metrics, we plot the changes over the years in Fig. 4. The convergence metric $D_t$ has been normalized to the range [0, 1] by dividing each number by 92. The largest migration of cluster scores occurred from 2003 to 2004. The smallest number of score changes, three, appeared in the period from 2006 to 2007. The period from 2007 to 2008 had five score changes. Cluster score migration occurred before 2006 but slowed down after that year. Near the end of the study period, most countries’ ICT development levels stabilized.
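The two metrics can be computed from a table of yearly cluster scores as in the sketch below. The function names are hypothetical. The two score sequences are the ones quoted in the text for Armenia and the US, so this two-country panel only illustrates the mechanics and not the 136-country results.

```python
import math
from collections import Counter
from typing import Dict, List

YEARS = list(range(2000, 2009))

# Cluster scores quoted in the text for two countries, 2000-2008.
scores: Dict[str, List[int]] = {
    "Armenia": [3, 2, 1, 1, 2, 1, 1, 1, 1],
    "United States": [4, 4, 4, 4, 4, 4, 4, 4, 4],
}


def change_count(panel: Dict[str, List[int]], i: int) -> int:
    """D_t between year index i and i + 1; each country counts at most once."""
    return sum(min(1, abs(s[i + 1] - s[i])) for s in panel.values())


def score_entropy(panel: Dict[str, List[int]], i: int) -> float:
    """E_t: entropy (in bits) of the cluster score distribution in year index i."""
    counts = Counter(s[i] for s in panel.values())
    n = len(panel)
    return sum(-(c / n) * math.log2(c / n) for c in counts.values())


d_series = [change_count(scores, i) for i in range(len(YEARS) - 1)]
e_series = [score_entropy(scores, i) for i in range(len(YEARS))]
print(d_series)
print(e_series[0])
```

Capping each country's contribution at 1 means a jump from 1 to 4 counts the same as a move from 1 to 2, so $D_t$ is simply the number of countries whose score changed between two consecutive years.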

How these countries’ development levels converged is another story. With four categories or clusters in a data set, the maximum entropy value is 2. In 2001, the entropy value was close to 2, thus the cluster scores distribution should be pretty uniform for that year. Unfortunately, we did not see an entropy ($E_t$) curve that goes to zero like the convergence ($D_t$) curve. Thus, we cannot conclude that countries converged to the same ICT development level.

Charting the cluster scores distribution in Fig. 5, we can observe the trend of each cluster over the years. In 2000, more than half of the countries (70 to be exact) were in the lower middle and low ICT development clusters. Later, more than half of the countries (70) were in the upper middle or high clusters in 2008. The number of countries in the ICT low cluster stayed about the same in 2000 and 2008, but the number in the ICT high cluster grew. This growth came at the cost of the upper middle cluster.

Fig. 6 shows the movement of centroids in the mobile phone penetration (MP) variable. The mobile phone penetration rate increased for all clusters over the years. The lower three clusters (upper middle, lower middle, and low) grew together, while the high cluster maintained a substantial lead during the study period. Even though the lower clusters had a higher rate of growth, it may take them many more years to catch up to the level enjoyed by the leading countries. The growth trend of the Internet user penetration (IUP) variable is similar. Both telecom financial-related variables (INV and REV) had similar trends of movement to one another. Fig. 7 shows the time series of the INV variable of all centroids. The capital investment rate moved up and down over the years, indicating no clear trend.

Fig. 5. Number of countries in each cluster (Note: 4 = high, 3 = upper middle, 2 = lower middle, and 1 = low ICT development status). [Plot not reproduced: number of countries (0 to 80) by year, 2000–2008, one series per cluster.]

Fig. 6. The MP (mobile penetration) variable of four centroids. Note: 4 = high, 3 = upper middle, 2 = lower middle, and 1 = low ICT development status. [Plot not reproduced: MP variable (0 to 1.4) by year, 2000–2008, one series per cluster.]

Fig. 7. The INV (Capital Investment in Telecom) variable of four centroids. Note: 4 = high, 3 = upper middle, 2 = lower middle, and 1 = low ICT development status. [Plot not reproduced: INV variable (0 to 0.06) by year, 2000–2008, one series per cluster.]

5.3. ICT development trend for OECD countries

With the ICT development cluster scores at our disposal, we are ready to explore the ICT development trends for a subset of countries. Out of the 136 countries in our data, there are 29 OECD members (Australia, Austria, Belgium, Canada, Czech Republic, Denmark, Finland, France, Germany, Greece, Hungary, Iceland, Ireland, Italy, Japan, Luxembourg, Mexico, Netherlands, New Zealand, Norway, Poland, Portugal, South Korea, Spain, Sweden, Switzerland, Turkey, United Kingdom, and the United States). During our study period of 2000–2008, there were no OECD membership changes in our data set. Thus, this group of countries can serve as a pivotal example to examine our proposed framework.

Fig. 8. Standardized convergence metric Dt and entropy metric Et of 29 OECD countries. [Plot not reproduced: D and E metrics (0 to 1.2) by year, 2000–2008.]

528 S.H. Doong, S.-C. Ho / Electronic Commerce Research and Applications 11 (2012) 518–533

Fig. 8 shows the convergence (Dt) and entropy (Et) curves of the OECD group over the study period. The Dt curve was normalized by dividing the raw number of cluster score changes by 6; this is the largest change in cluster scores, which happened from 2000 to 2001. Because Dt was zero in 2007 and 2008, we see that the ICT development of OECD members converged. The entropy value has fallen from 0.994 (2000) to 0.575 (2008). Cluster scores appear in different primary clusters. Indeed, most OECD countries converged to the high ICT development status at the end of the study period. In 2008, there were 26 members in the high ICT cluster, one member (Canada) in the upper middle cluster and two members (Mexico and Turkey) in the low ICT cluster.
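The entropy figures quoted for the OECD group can be checked with a few lines of code. The following is a minimal sketch, assuming that Et is the Shannon entropy of the cluster-score distribution with base-2 logarithms (an assumption that is consistent with an entropy of 2 for a uniform spread over four clusters); the function name `cluster_entropy` and the uniform example are illustrative and not taken from the paper.

```python
import math

def cluster_entropy(counts):
    """Shannon entropy (base 2) of a cluster-score distribution given member counts."""
    total = sum(counts)
    probs = [n / total for n in counts if n > 0]  # empty clusters contribute nothing
    return -sum(p * math.log2(p) for p in probs)

# OECD members per cluster in 2008, as reported in the text:
# low, lower middle, upper middle, high.
oecd_2008 = [2, 0, 1, 26]

# Hypothetical even split of 136 countries over four clusters.
uniform = [34, 34, 34, 34]

print(round(cluster_entropy(oecd_2008), 3))
print(cluster_entropy(uniform))
```

Under these assumptions the 2008 OECD distribution gives about 0.575, which agrees with the value reported above, while the even split gives 2.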

5.4. Panel data study of GNI and ICT causal effects

Up to this point, we have explored the trends in global ICT development, but we have not studied why they happened. As we argued before, a nation’s wealth may be an important factor affecting its ICT development trend. We therefore conducted a panel data analysis with GNI as the cause and the ICT cluster score as the effect, using the software package STATA 9.2. First, we considered simple linear regression models with random effects and fixed effects (see Table 6).

According to Hsiao (2003), the fixed-effects model is always consistent, but may not be as efficient as the random-effects model. On the other hand, the random-effects model is efficient, but may not be consistent because it may fail to recognize correlations between regressors and idiosyncratic errors. The Hausman test shows that there is a significant difference between the fixed-effects coefficients and the random-effects coefficients (p = .000). Thus, the fixed-effects model should be used to model our panel data. After using dummy variables for countries and years, we ran the linear regression model in Eq. (2) again with the results reported in Table 7. Many countries have a significant fixed effect that controls for the specific time-invariant environment of a country. For example, the relatively stable human capital of a country is part of the country fixed effect that has an important impact on its ICT development. Although model fitness has increased substantially (R² = .802), the fixed effects of country and year together render our main regressor (GNI) insignificant (p = .056) at the significance level of α = .05. The year dummy variable for 2003 has a significant negative impact, which may explain why many countries shifted to the low cluster in 2003 (see Fig. 5).
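To illustrate how a Hausman comparison works when only one slope coefficient is shared by the two models, the sketch below uses the scalar form of the statistic with the rounded GNI estimates from Table 6. It assumes that the "Std. dev." entries are coefficient standard errors and that the constant is left out of the comparison. The paper does not report the statistic itself, so the printed value is only an illustration, and `hausman_scalar` is a hypothetical helper rather than a STATA routine.

```python
import math

def hausman_scalar(b_fe, se_fe, b_re, se_re):
    """Hausman statistic and p-value for one coefficient (chi-square, 1 d.f.)."""
    var_diff = se_fe ** 2 - se_re ** 2
    if var_diff <= 0:
        raise ValueError("variance difference must be positive for the scalar test")
    h = (b_fe - b_re) ** 2 / var_diff
    # Survival function of a chi-square with 1 d.f.: P(X > h) = erfc(sqrt(h / 2)).
    p_value = math.erfc(math.sqrt(h / 2.0))
    return h, p_value

# Rounded GNI estimates and standard errors from Table 6.
h, p = hausman_scalar(b_fe=0.240, se_fe=0.076, b_re=0.620, se_re=0.043)
print(f"H = {h:.2f}, p = {p:.2e}")
```

With these inputs the statistic is large and the p-value is far below .05, in line with the reported p = .000 and with the choice of the fixed-effects model.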

Although linear models provide useful insights into the relationship between regressors and output, our cluster scores are actually ordinal numbers. Thus, it is more natural to use panel data analysis with an ordered categorical outcome variable. The above analysis shows that fixed-effects models are more appropriate for our data. Thus, we conducted an ordered probit analysis with fixed effects using a STATA add-on package (reoprob) (Frechette 2001). Since a full ordered probit model with both the country and year fixed-effects failed to converge, we ran the analysis with the year fixed-effects only. Table 8 shows that the main regressor GNI has a significant positive impact on the dependent variable and the dummy variable for 2003 has a significant negative impact.

Considering only significant factors at the p < .05 level in the ordered probit model (Table 8), we have a regression model as follows (see Eq. (4) and Appendix B).

$$E(\mathrm{ICT}_{ct} \mid \mathrm{GNI}_{ct}) = F(f_{ct}) = 4 - \Phi(2.113 - f_{ct}) - \Phi(3.025 - f_{ct}) - \Phi(4.264 - f_{ct})$$

$$f_{ct} = 1.241\,\mathrm{GNI}_{ct} - .430\,T_4 + 1.028\,T_5$$

$$\Phi(u) = \int_{-\infty}^{u} \frac{1}{\sqrt{2\pi}}\, e^{-w^2/2}\, dw$$

The GNI factor has a significant positive impact on the ICT status of a country. A higher GNI value increases the expected value of ICT. Thus, wealthier countries are expected to have a higher ICT development level. In 2000, 27 of the 35 GNI high countries belonged to the ICT high cluster (ratio = .77), and in 2008, 42 of the 44 GNI high countries were in the ICT high cluster (ratio = .95). The ratios of GNI upper middle countries belonging to the ICT high cluster were .08 and .30 in 2000 and 2008, respectively.

There is a significant negative impact on ICT in 2003 and a significant positive impact in 2004. The estimation model for 2003 is $f_{c,2003} = 1.241\,\mathrm{GNI}_{c,2003} - .430$. Thus, countries migrated to a lower ICT level unless they had a higher GNI value to counteract the specific year fixed-effect. On the other hand, the estimation model for 2004 is $f_{c,2004} = 1.241\,\mathrm{GNI}_{c,2004} + 1.028$, reflecting a reverse trend of global ICT development in that year (see Fig. 5). Little change in the cluster scores distribution was observed after 2004. This can be explained by the estimation model $f_{ct} = 1.241\,\mathrm{GNI}_{ct}$, t = 2005, 2006, 2007, 2008, and the relatively stable GNI classification.
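The effect of GNI and of the 2003 and 2004 year dummies on the expected cluster score can be made concrete with a short calculation. This is a sketch under stated assumptions: the estimates are copied from Table 8, GNI is coded 1 (low) to 4 (high) following the World Bank strata, only the significant year dummies are kept, and the random-effect component of the fitted model (ρ) is ignored. The names `expected_ict` and `std_normal_cdf` are hypothetical.

```python
import math

# Estimates copied from Table 8 (ordered probit with year fixed-effects).
CUTS = (2.113, 3.025, 4.264)
BETA_GNI = 1.241
YEAR_EFFECT = {2003: -0.430, 2004: 1.028}  # only the significant year dummies

def std_normal_cdf(u):
    """Phi(u), the standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(u / math.sqrt(2.0)))

def expected_ict(gni, year):
    """E(ICT | GNI) = 4 - sum_k Phi(cut_k - f), with f = beta * GNI + year effect."""
    f = BETA_GNI * gni + YEAR_EFFECT.get(year, 0.0)
    return 4.0 - sum(std_normal_cdf(cut - f) for cut in CUTS)

# Expected cluster score for each assumed GNI level in three contrasting years.
for year in (2003, 2004, 2005):
    row = [round(expected_ict(g, year), 2) for g in (1, 2, 3, 4)]
    print(year, row)
```

For any fixed GNI level the expected score is lowest in 2003, highest in 2004, and in between for a year without a significant dummy, which mirrors the dip and rebound described above.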

5.5. Panel data models with region fixed-effects

If we want to study whether the ICT development of countries in a geographical region has converged, we can apply procedures used for the investigation of OECD countries. For example, we may ask whether Asian countries, as a whole, have convergent ICT development paths by considering the convergence metric Dt and the entropy metric Et. On the other hand, if we want to investigate how a specific geographical region may impact the ICT development of a country, we can use a panel data model by considering region fixed-effects. The United Nations defines the world in terms of five geographical regions. The region of a country is fixed. To see region fixed-effects on ICT development, we introduce a time-invariant dummy variable Regionk for each region (see Table 9, Eq. (4) and Appendix B).

$$E(\mathrm{ICT}_{ct} \mid \mathrm{GNI}_{ct}) = F(\beta \cdot \mathrm{GNI}_{ct} + \alpha_2\,\mathrm{Region}_{009} + \alpha_3\,\mathrm{Region}_{019} + \alpha_4\,\mathrm{Region}_{142} + \alpha_5\,\mathrm{Region}_{150} + u_{ct})$$

The ordered probit model with Region fixed effects is reported in Table 10.


Table 6. Panel data GNI-ICT regressions with a linear model.

| | Fixed-effects regression: estimate (std. dev.) | Fixed-effects regression: p | Random-effects regression: estimate (std. dev.) | Random-effects regression: p |
|---|---|---|---|---|
| GNI | .240 (.076) | .002 | .620 (.043) | .000 |
| Constant | 1.80 (.190) | .000 | .859 (.122) | .000 |
| Model fit | R² = .496 | model sig. = .002 | R² = .496 | model sig. = .000 |

Note: Standard statistical test notations are used: R² denotes the coefficient of determination and p denotes the probability of obtaining a test statistic at least as extreme as the one actually observed, assuming that the null hypothesis is true. The popular significance level α = .05 was used in the study.

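Table 7. Panel data linear model with fixed effects.

| | GNI | 2001 | 2002 | 2003 | 2004 | 2005 | 2006 | 2007 | 2008 | Constant |
|---|---|---|---|---|---|---|---|---|---|---|
| Estimate | .153 | .068 | .108 | −.180 | .404 | .045 | .192 | .162 | .175 | 2.031 |
| (Std. dev.) | (.080) | (.073) | (.073) | (.073) | (.073) | (.073) | (.074) | (.075) | (.076) | (.258) |
| p | .056 | .350 | .140 | .014 | .000 | .537 | .009 | .030 | .021 | .000 |

Note: R² = .802, model sig. = .000. The columns 2001 to 2008 are year dummy variables. Country effects are not shown.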

Table 8. Ordered probit model with year fixed-effects.

| | GNI | 2001 | 2002 | 2003 | 2004 | 2005 | 2006 | 2007 | 2008 | cut1 | cut2 | cut3 | ρ |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Estimate | 1.241 | .242 | .180 | −.430 | 1.028 | .038 | .293 | .171 | .143 | 2.113 | 3.025 | 4.264 | .627 |
| (Std. dev.) | (.072) | (.164) | (.169) | (.176) | (.161) | (.171) | (.172) | (.173) | (.172) | (.226) | (.231) | (.248) | (.031) |
| p | .000 | .139 | .287 | .015 | .000 | .826 | .088 | .321 | .404 | .000 | .000 | .000 | .000 |

Note: Log likelihood = −962.20, model sig. = .000.

Table 9. Geographical regions according to the United Nations.

| UN code | Region | Examples of countries | Number of countries in our study |
|---|---|---|---|
| 002 | Africa | Algeria, Egypt, South Africa | 36 |
| 009 | Oceania | Australia, Tonga | 6 |
| 019 | Americas | Barbados, Peru, US | 23 |
| 142 | Asia | China, Japan, Taiwan | 38 |
| 150 | Europe | France, Germany, UK | 33 |


It appears that only Region019 (the Americas) and Region150 (Europe) have significant region fixed-effects. The base region is Africa, and so its fixed effect is omitted. A negative region effect was observed for the Americas region, while a positive effect was observed for the Europe region. Thus, if a country is in the Americas, then its region had a negative impact on its ICT development during the study period. This seems to contradict the common sense that countries in the Americas generally have better ICT development than countries in Africa. The estimation model $f_{ct} = 1.01\,\mathrm{GNI}_{ct} - .544\,\mathrm{Region}_{019} + 1.636\,\mathrm{Region}_{150}$ involves the GNI factor, so we need to interpret the result more carefully. The GNI classifications, according to the World Bank, differ substantially for countries of different regions. For example, in 2000, the average GNI level for the Americas was 2.61, while the average GNI level for Africa was only 1.47 in the same year. Thus, the negative impact of the dummy variable Region019 can be interpreted as inefficient utilization of national wealth in ICT development for countries of the Americas. In contrast, countries in Europe managed to improve their ICT development by taking advantage of positive region fixed-effects during the study period.

Table 10. Ordered probit model for region fixed-effects.

| | GNI | Region009 | Region019 | Region142 | Region150 | cut1 | cut2 | cut3 | ρ |
|---|---|---|---|---|---|---|---|---|---|
| Estimate | 1.010 | −.100 | −.544 | .048 | 1.636 | 1.731 | 2.591 | 3.800 | .536 |
| (Std. dev.) | (.089) | (.631) | (.272) | (.302) | (.316) | (.187) | (.193) | (.215) | (.047) |
| p | .000 | .874 | .046 | .875 | .000 | .000 | .000 | .000 | .000 |

Note: Log likelihood = −987.19, model sig. = .000.

5.6. Peer effects from spatial proximity

Do countries with spatial proximity tend to have similar ICT development paths? A cross-sectional study by Agarwal et al. (2009) demonstrated the peer effects of the Internet: people in the same region affect the propensity of Internet use by other individuals. Are these peer effects applicable on a country-wide level? The preceding section on region fixed-effects partially investigates this issue. We should keep in mind that the region fixed-effects are time invariant. In this section, we study how the ICT development of peer countries in the same UN defined region as a focal country may impact the focal country’s ICT development during the study period. To examine peer effects, we need to create a variable to aggregate the ICT levels of peer countries in a region. Peer effects are possible causes that may increase the likelihood of a particular action based on the incidence of actions by peers (Agarwal et al. 2009). Unlike social norms that are frequently perceived subjectively, peer effects can be more objectively measured. Agarwal et al. (2009) used the average Internet adoption rate of peer respondents residing in the same county as the focal respondent to represent peer effects. Thus, peer effects are not measured by the perception of the focal respondent.

To calculate peer effects, we define ICTPEct as the average ICT cluster score of peer countries residing in the same UN defined region as the focal country c in year t. Similar to Agarwal et al. (2009), this peer influence is individualized within a region and updated for every year:

$$\mathrm{ICTPE}_{ct} = \sum_{m \in R - \{c\}} \mathrm{ICT}_{mt} \,/\, (|R| - 1)$$

where R denotes the set of countries in the region from Table 9 that contains country c, |R| is the number of countries in the region, and m ∈ R − {c} indicates all other countries in the region except c. The ICT cluster score is regressed against the main regressor GNI and the peer effects variable ICTPE. Table 11 shows that the peer effects variable is significant. Thus, peer effects are observed in our data.
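The leave-one-out average behind ICTPE can be sketched as follows. The country labels, scores, and region assignments are hypothetical toy values chosen for illustration (they are not the paper's data), and `peer_effect` is an assumed helper name.

```python
def peer_effect(scores, region_of, country):
    """Average ICT cluster score of all other countries in the focal country's region."""
    region = region_of[country]
    peers = [c for c in scores if region_of[c] == region and c != country]
    if not peers:
        raise ValueError(f"{country} has no peers in region {region}")
    return sum(scores[c] for c in peers) / len(peers)

# Hypothetical cluster scores for a single year (not the paper's data).
scores_t = {"A": 4, "B": 3, "C": 1, "D": 2, "E": 2}
# Hypothetical assignment of the toy countries to UN region codes.
region_of = {"A": "150", "B": "150", "C": "150", "D": "002", "E": "002"}

for c in scores_t:
    print(c, peer_effect(scores_t, region_of, c))
```

Because the focal country is excluded from its own average, each country in a region gets its own value of the variable, which is what "individualized within a region" refers to. Repeating the calculation for every year yields the panel variable.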

6. Discussion

ICTs have been viewed as impacting the economic growth of a country and improving the quality of life of the poor. During the past decades, countries have increasingly invested in their ICT infrastructure to enhance development. How ICT development varies across countries has become a critical indicator of the global digital divide. It is therefore important to investigate the trends of global ICT development for better resource allocation at the national level.

Our first research question was to explore global ICT development trends in the last decade. To answer the question, we used data clustering and centroid ranking to summarize the ICT development status of a country into an ordinal number called a cluster score. This idea is similar to the World Bank’s classification of a nation’s wealth (GNI) into four strata. Differences in cluster scores implicitly define digital divides between two countries. By plotting cluster scores of a country from different years, we can detect the trend of the country’s ICT development. We applied a data clustering approach to demonstrate that each of the listed countries in Fig. 1 had a convergent ICT development path over the study period. The cluster scores curve stabilized after 2006 for each country. In addition, by aggregating cluster score changes from all countries into a convergence metric Dt, we can examine the convergence trend of global ICT development. Most countries had a convergent ICT development path during the study period. Using the entropy metric Et, we see that the convergence happened at non-unique ICT development levels. Indeed, our cluster scores approach shows that there were two heavy clusters (high and low clusters) near the end of the study period, and the high ICT development cluster grew at the expense of the upper middle ICT development cluster in the last decade.

Since a country’s ICT development level changed from year to year, we cannot say that ICT investments were ineffective for those countries initially belonging to the lower two levels (lower middle and low) in 2000. Some of these countries have moved to the upper two levels through well-designed development projects. Using the scores distribution, we also see that about half of the countries were in the upper two levels and the other half in the lower two levels from 2000 to 2008. Thus, judging from this simple count of cluster members, the severity of global digital divides was not reduced. It is encouraging to see that the key indicator MP (mobile penetration rate) increased for all four clusters; but the unstable capital investments in telecom (INV) for the lower two levels make the goal of eliminating worldwide digital divides questionable. These results imply that the digital divide is alleviated somewhat if we measure it by the mobile phone penetration rate, but capital investments in telecom cannot give us a definite answer. Since telecom investments of a country usually allocate resources to infrastructure, the consequent development from the ICT infrastructure can only be realized after a few years.

Table 11. Ordered probit model for peer effects.

| | GNI | ICTPE | cut1 | cut2 | cut3 | ρ |
|---|---|---|---|---|---|---|
| Estimate | .837 | 1.349 | 4.127 | 5.037 | 6.327 | .644 |
| (Std. dev.) | (.099) | (.117) | (.289) | (.297) | (.325) | (.029) |
| p | .000 | .000 | .000 | .000 | .000 | .000 |

Note: Log likelihood = −956.20, model sig. = .000.

After exploring the trend in global ICT development, we then analyzed the relationship between a nation’s wealth and its ICT development. Prior research has shown that national wealth effects are significant in a country’s ICT development. Thus, we used a panel data model to analyze the causal effect of GNI on the ICT development of a country. We found that GNI had a significant positive impact on the ICT cluster score in the ordered probit regression model. That is, countries with higher national income tended to enjoy a higher ICT development status. This finding is consistent with prior research.

Our second question asked whether countries with different GNI levels have different ICT development paths. Based on the results from our panel data modeling, the answer is affirmative in general. For example, Albania and Argentina had GNI level sequences of 2,2,2,2,2,2,2,2,2 and 3,3,3,3,3,3,3,3,3, and their ICT development paths were 1,2,3,2,3,2,3,3,3 and 3,2,3,1,2,1,4,4,4 during our study period from 2000 to 2008. This example shows the positive impact of GNI on ICT. In contrast, Armenia had a GNI level sequence of 1,1,2,2,2,2,2,2,2, similar to that of Albania, but its ICT development path was 3,2,1,1,2,1,1,1,1, which is quite different from Albania’s. This discrepancy can only be explained by a country fixed-effect specific to Armenia. Because of a convergence issue in model estimation, the ordered probit model was not run with country fixed-effects. With a simple linear regression model, there was a significant and negative country fixed-effect of −.947 (p = .001) for Armenia.
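The three ICT paths quoted above also give a small worked example of the convergence metric defined in Appendix A, where dt(c) = |st+1(c) − st(c)| and Dt sums dt(c) over countries. The sketch below applies that definition to these three countries only, so the resulting values illustrate the mechanics and are not the paper's global Dt series; `convergence_metric` is a hypothetical helper name.

```python
# ICT cluster-score paths for 2000-2008, as quoted in the text above.
paths = {
    "Albania":   [1, 2, 3, 2, 3, 2, 3, 3, 3],
    "Argentina": [3, 2, 3, 1, 2, 1, 4, 4, 4],
    "Armenia":   [3, 2, 1, 1, 2, 1, 1, 1, 1],
}
years = list(range(2000, 2009))

def convergence_metric(paths, years):
    """Return {t: D_t}, where D_t sums |s_{t+1}(c) - s_t(c)| over all countries c."""
    metric = {}
    for i, year in enumerate(years[:-1]):
        metric[year] = sum(abs(p[i + 1] - p[i]) for p in paths.values())
    return metric

d = convergence_metric(paths, years)
print(d)
```

For this subset the metric is 3 for 2000 through 2004, 4 for 2005, and 0 for 2006 and 2007, so all three paths are flat from 2006 onward even though they settle at different cluster scores (3, 4, and 1).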

To sum up, GNI consistently shows a positive impact on ICT development, and thus countries with higher GNI levels tended to have a higher ICT development path. Individual country fixed-effects may also play a role in the ICT development path of a country. Currently, these country fixed-effects can only be examined through simple linear regression models. The ordered probit model did show year fixed-effects that match our observation of ICT development trends with cluster scores.

Finally, our third research question on peer effects investigated social influence theory at the entity level of countries. Spatial proximity was determined with respect to the UN defined geographical regions. Two countries are considered peers if they belong to the same region. Before considering the behavior of peer countries in terms of ICT development status, we first investigated region fixed effects, which are a natural extension of country fixed-effects. We found that region fixed-effects did exist. For example, the Americas had a negative region fixed-effect on ICT development, while Europe had a positive region fixed-effect. The Americas regional effect was negative with respect to its members’ GNI contributions to ICT development, and the Europe regional effect was the opposite. What caused these regional effects remains an issue to be studied further.

Following the idea of social influence from Agarwal et al. (2009), we constructed a peer influence variable to measure peer effects based on regional spatial proximity. This peer influence variable averages the ICT development cluster scores of peers in the same region. We expected higher peer effects to bring a higher ICT development path to a country, due to social influences from spatial peers. Countries in the same region may have more interactions in terms of international trade, cultural exchange activities, or business transactions. More interaction among countries may lead to more social learning with respect to government policies and investments. Our findings validate this expectation by exhibiting a positive and significant impact of the peer effects. Thus social influence theory has also been validated at the country level for the behavior of ICT development.

7. Conclusions

The rapid development of ICT has diminished the socio-economic gap between countries. Although it is too early to claim that ICT development will eliminate the digital divide across countries altogether, ICT is certainly an influential factor that has affected the global economy. The purpose of this research was to investigate ICT development across countries in the last decade. In this study, we collected secondary data on ICT-relevant variables for 136 countries spanning five continents from 2000 to 2008. This is a large data set among cross-country studies. It has been a great challenge to collect secondary, country-level data because of the differences in data capture practices and the time lags of the data reports. Prior studies (Kauffman and Techatassanasoontorn 2004, 2005b, 2005c; Kauffman and Kumar 2005; Ho et al. 2007, 2011) have shown that effective data collection is critical to making the study of global technology diffusion possible.

Our major findings are as follows. First, the first principal component score and data clustering approaches show that most countries had a convergent ICT development path from 2000 to 2008. The convergence occurred at non-unique ICT development levels. Both the high and low clusters contained more countries from 2005 to 2008. Judging from mobile phone penetration, we are optimistic that the global digital divide had narrowed somewhat by the end of the study period. Overall, the ICT development of most countries converged near the end of the study period. Second, the panel data analysis results demonstrate that countries with different GNI levels have different ICT development paths. Countries with higher GNI levels tend to invest more in ICT infrastructure and have a higher ICT development path, which has been confirmed in prior studies. We can infer that ICT infrastructure and investment have a positive association with the level of country wealth. Third, country and region fixed-effects have also been found, which indicates why countries with similar GNI time series had different ICT development paths. By considering the ICT development of peer countries in the same region, we also validated peer effects on the ICT development of a country.

We next discuss the methodological, empirical, and theoretical contributions. First, our methodological contribution is that we applied different approaches to probe the trends of our data. The different methodological approaches allowed us to perform empirical estimations with alternative assessments from multiple analyses. Our approach includes the use of the k-means algorithm to cluster multivariate ICT data, which gives us a way to interpret a cluster through its centroid. We selected an appropriate number of clusters based on the Davies–Bouldin cluster validation index. Cluster scores simplified our investigation of the ICT development trend of different countries and they made the ICT development path of a country easier to comprehend. Cluster scores also enabled a panel data study connecting the GNI level and the ICT development status of a country. Our proposed framework reduces multivariate ICT data into a manageable format of ordinal cluster scores. Cluster scores gave us practical insights into the ICT development paths of many countries. With cluster scores, we were also able to inspect the impact of national wealth and various fixed effects on the ICT development of a country. Lastly, we applied the panel data analysis approach to examine whether GNI impacts national ICT development. Overall, different methodological approaches contributed to a pattern of results that allows us to learn more about the empirical patterns in our collected data.

Second, our empirical contribution is that we examined the trend of ICT development of 136 countries from 2000 to 2008. We also examined the convergence and peer effects of ICT development across countries. These empirical results provide practical insights for government policy makers as well as international organizations, such as the United Nations and the OECD. Governments and international organizations can allocate national and international financial resources more effectively and institute related national policies and plans for ICT development.

Third, the theoretical contribution is that we applied different approaches to examine the convergence and peer effects of global ICT development. Whether ICT development across countries has been converging or diverging has been a critical research issue. Most of the previous studies have built theoretical models and have applied modeling approaches to predict convergence or divergence. Few studies have empirically examined global ICT development with a large data set. We also empirically validated peer influence theory for ICT development across countries. We applied social influence theory (Friedkin 1998, Agarwal et al. 2009) for an initial examination of the peer effects among countries. We measured the peer effects based on regional spatial proximity, which allows identification of the peer effects from regions. We have argued that disparate ICT developments created global digital divides. In this study, our focus was on the analysis of global ICT development. Thus, it is natural to ask: how can we apply the knowledge we learn from the analysis of ICT development to issues related to global digital divides? This knowledge gap between ICT development and digital divides should be bridged for the theory developed in the study to have practical use in improving the quality of life.

There are several limitations of this study that are worth pointing out. First, due to the lack of available data, this study is limited in that it only analyzed four ICT indicators. Although our data set covered more countries and a longer period of time, missing data constrained our ability to examine other variables. There is always a tradeoff between the coverage and the size of a data set. Since our focus in this study was to explore the trends over time, we chose to use the larger data set. Future research can focus on collecting and analyzing more indicators. Second, we used a heuristic method based on subjective judgments of the importance of various indicators to rank multivariate centroids from the k-means algorithm. Future research may develop a more objective procedure to compare two centroids. Third, peer effects in this study were limited to peers defined by spatial proximity. As social network analysis becomes more prevalent in general social studies, we may need to consider peer effects resulting from different social ties or cultural differences. For example, peers may be defined in terms of economic or cultural proximity. Do countries involved in a social network created by economic relations also exhibit peer effects from this network? Future studies can also investigate these types of peer effects from non-spatial proximity.

Acknowledgments

The authors thank the program chairs, the anonymous reviewers, and the audience for helpful remarks on our research at the 2011 International Conference on Electronic Commerce in Liverpool, UK in August 2011, where an earlier version of this research was presented. We also benefited from the comments and inputs of Fred Riggins and Rob Kauffman. Shu-Chun Ho and Shing H. Doong thank the National Science Council in Taiwan for partially supporting the research under Grant Nos. NSC97-2410-H-017-034-MY2 and NSC99-2410-H-366-006-MY2.


Appendix A. Notation table

| Notation | Definition | Comments |
|---|---|---|
| $w_i$ | A generic univariate variable | The index i runs from 1 to n |
| $k$ | The number of clusters in data clustering or the number of means in a multiple means based variance formula | In the data clustering step, we set k = 4 to get 4 clusters |
| $j, j'$ | A generic cluster number | Must be an integer between 1 and k |
| $j(i)$ | Assigns $w_i$ to mean $\mu_{j(i)}$ in a multiple means based variance formula | Must be an integer between 1 and k |
| $x_i$ | A generic multivariate variable used in data clustering | Conceptually, this denotes the multivariate variable for ICT development |
| $g_j$ | The j-th multivariate centroid | j is an integer between 1 and k |
| $P_j$ | The cluster represented by centroid $g_j$ | j is an integer between 1 and k |
| $c$ | A generic index used for alphabetically sorted countries | In our data set, c is between 1 and 136 |
| $t$ | A generic index used for years in the study period | In our data set, t is between 2000 and 2008 |
| $s_t(c)$ | The cluster score of country c in year t | With our data, $s_t(c) \in \{1, 2, 3, 4\}$ |
| $d_t(c)$ | The absolute difference $\lvert s_{t+1}(c) - s_t(c) \rvert$ | This measures the change of cluster scores in two successive years |
| $D_t$ | The aggregate cluster score change from all countries in year t, i.e., $\sum_c d_t(c)$ | The convergence metric used to check for the convergence of cluster scores of many countries |
| $E_t$ | The entropy of cluster scores distribution in year t | The entropy metric used to detect concentrations of cluster scores |
| $\mathrm{ICT}_{ct}$ | The cluster score of country c in year t | Same as $s_t(c)$ |
| $\mathrm{GNI}_{ct}$ | The GNI classification from the World Bank of country c in year t | Used as the main regressor in most panel data models |
| $C_1, C_2, \ldots, C_N$ | Dummy variables (0 or 1) for country fixed effects | N = 136 for our data set |
| $T_1, T_2, \ldots, T_M$ | Dummy variables (0 or 1) for time (year) fixed effects | M = 9 for our data set |
| $E(\mathrm{ICT}_{ct} \mid \mathrm{GNI}_{ct})$ | The expected value of ICT given GNI | See Appendix B for formula derivation |
| $u_{ct}$ | Idiosyncratic errors | Used in panel data models |
| $f_{ct}$ | A linear combination of all regressors including GNI and dummy variables of various fixed effects | The total contribution from all regressors in panel data models |
| $y_{ct}$ | $f_{ct} + u_{ct}$ | Sum of the contribution from regressors and idiosyncratic errors |
| $D_j$ | $\max_{j \ne j'} \{D_{j,j'}\}$ | $D_{j,j'}$ is the within-to-between cluster spread for clusters j and j′ in the Davies–Bouldin index |
| $\mathrm{dist}_{j,j'}$ | Distance between the centroids of clusters j and j′ | It measures the between-cluster dissimilarity in the Davies–Bouldin index |
| $\mathrm{Region}_k$ | Dummy variable (0 or 1) for region fixed effects | k can be 002, 009, 019, 142 and 150 according to the UN |
| $\mathrm{ICTPE}_{ct}$ | A panel data variable representing ICT peer effects for country c in year t | Similar to the peer effect used in Agarwal et al. (2009) |
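The Davies–Bouldin quantities listed in the notation table can be illustrated with a compact calculation. This is a sketch of one common form of the index, assuming that the within-cluster spread is the mean Euclidean distance of members to their centroid and that the index is the average of Dj over clusters. The paper's exact spread definition is not shown in this part of the text, and the data and the name `davies_bouldin` below are hypothetical.

```python
import math

def centroid(points):
    """Component-wise mean of a non-empty list of equal-length tuples."""
    return tuple(sum(col) / len(points) for col in zip(*points))

def davies_bouldin(clusters):
    """Davies-Bouldin index for a list of clusters, each a list of points."""
    centers = [centroid(pts) for pts in clusters]
    # Within-cluster spread: mean Euclidean distance of members to their centroid.
    spread = [
        sum(math.dist(p, ctr) for p in pts) / len(pts)
        for pts, ctr in zip(clusters, centers)
    ]
    k = len(clusters)
    worst = []
    for j in range(k):
        ratios = [
            (spread[j] + spread[m]) / math.dist(centers[j], centers[m])
            for m in range(k) if m != j
        ]
        worst.append(max(ratios))  # D_j = max over j' != j of D_{j,j'}
    return sum(worst) / k

# Hypothetical 2-D data (not the paper's ICT variables).
tight = [[(0.0, 0.0), (0.0, 2.0)], [(10.0, 0.0), (10.0, 2.0)]]
loose = [[(0.0, 0.0), (0.0, 2.0)], [(3.0, 0.0), (3.0, 2.0)]]
print(davies_bouldin(tight), davies_bouldin(loose))
```

A lower value indicates compact, well-separated clusters, so a candidate number of clusters k can be compared by running the clustering for several k and preferring the one with the smallest index.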

Appendix B. Derivation of Eq. (4)

The derivation of Eq. (4) runs as follows. If we assume fourlevels of ICT development status (1 = low, 2 = low middle,3 = upper middle, 4 = high), that is, ICTct e {1,2,3,4}, then anordered probit or logit regression model is given by a hiddenvariable yct = b � GNIct + a2C2 + � � � + aNCN + c2T2 + � � � + cTTM + uct,three cut points cut1 < cut2 < cut3 and the following formulae(Hsiao 2003):

PrðICTct ¼ 1Þ ¼ Prðyct < cut1ÞPrðICTct ¼ 2Þ ¼ Prðcut1 6 yct < cut2ÞPrðICTct ¼ 3Þ ¼ Prðcut2 6 yct < cut3ÞPrðICTct ¼ 4Þ ¼ Prðcut3 6 yctÞ

When the idiosyncratic error uit assumes a standard logistic proba-bility density function eu/(1 + eu)2, we have the logit model. In thefollowing, we assume a probit model. uit assumes a standard normalprobability density function uðuÞ ¼ ð1=

ffiffiffiffiffiffiffi2pp

Þe�ðu2=2Þ. Let UðuÞ ¼R u�1uðwÞdw denote the cumulative distribution function. Denoting

fct = b � GNIct + a2C2 + . . . + aNCN + c2T2 + . . . + cMTM and from yct <cut1, we obtain an equivalent event uct < cut1 � fct. Therefore,

Pr(ICTct = 1) = Pr (yct < cut1) = Pr(uct < cut1 � fct) = U(cut1 � fct).Similarly, we obtain Pr (ICTct = 2) = U(cut2 � fct) �U(cut1 � fct),Pr(ICTct = 3) = U(cut3 � fct) �U(cut2 � fct) and Pr(ICTct = 4) = 1 �U(cut3 � fct). Finally, the expected value of ICT given GNI is

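$$
\begin{aligned}
E(\mathrm{ICT}_{ct} \mid \mathrm{GNI}_{ct}) &= \Phi(\mathrm{cut}_1 - f_{ct}) + 2\bigl(\Phi(\mathrm{cut}_2 - f_{ct}) - \Phi(\mathrm{cut}_1 - f_{ct})\bigr) \\
&\quad + 3\bigl(\Phi(\mathrm{cut}_3 - f_{ct}) - \Phi(\mathrm{cut}_2 - f_{ct})\bigr) + 4\bigl(1 - \Phi(\mathrm{cut}_3 - f_{ct})\bigr) \\
&= 4 - \Phi(\mathrm{cut}_1 - f_{ct}) - \Phi(\mathrm{cut}_2 - f_{ct}) - \Phi(\mathrm{cut}_3 - f_{ct}) \\
&= F(f_{ct}) = F(\beta \cdot \mathrm{GNI}_{ct} + \alpha_2 C_2 + \cdots + \alpha_N C_N + \gamma_2 T_2 + \cdots + \gamma_M T_M)
\end{aligned}
$$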
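The collapse of the weighted sum into the closed form can be checked numerically. The sketch below is illustrative only: it evaluates the four probit level probabilities for a given value of the regressor contribution, then compares the direct expectation with the closed form (4 minus the three cumulative probabilities). The cut points and contribution values are hypothetical, not estimates from the study, and the function names are introduced here for illustration.

```python
import math


def std_normal_cdf(x):
    """Standard normal cumulative distribution function via math.erf."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def level_probabilities(f_ct, cuts):
    """Return [Pr(ICT=1), ..., Pr(ICT=4)] under the ordered probit model."""
    c1, c2, c3 = cuts
    phi1 = std_normal_cdf(c1 - f_ct)
    phi2 = std_normal_cdf(c2 - f_ct)
    phi3 = std_normal_cdf(c3 - f_ct)
    return [phi1, phi2 - phi1, phi3 - phi2, 1.0 - phi3]


def expected_level_direct(f_ct, cuts):
    """Expectation as the probability-weighted sum of the levels 1..4."""
    probs = level_probabilities(f_ct, cuts)
    return sum(level * p for level, p in enumerate(probs, start=1))


def expected_level_closed_form(f_ct, cuts):
    """Closed form: 4 minus the sum of the three cumulative probabilities."""
    return 4.0 - sum(std_normal_cdf(c - f_ct) for c in cuts)


if __name__ == "__main__":
    cuts = (-1.0, 0.0, 1.0)  # hypothetical cut points, cut1 < cut2 < cut3
    for f_ct in (-2.0, 0.0, 0.5, 2.0):  # hypothetical regressor contributions
        direct = expected_level_direct(f_ct, cuts)
        closed = expected_level_closed_form(f_ct, cuts)
        print(f"f_ct={f_ct:+.1f}  direct={direct:.6f}  closed={closed:.6f}")
```

Because each cumulative probability decreases as the contribution grows, the closed form also makes it easy to see that the expected level increases monotonically with the contribution and stays between 1 and 4.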

References

Agarwal, R., Animesh, A., and Prasad, K. Social interactions and the "digital divide": explaining variations in Internet use. Information Systems Research, 20, 2, 2009, 277–294.

Andrianaivo, M., and Kpodar, K. ICT, financial inclusion, and growth: evidence from African countries, International Monetary Fund, 2011. Working paper. Available at www.imf.org/external/pubs/ft/wp/2011/wp1173.pdf.

Banker, R., Mitra, S., and Sambamurthy, V. The effects of digital trading platforms on commodity prices in agricultural supply chains. MIS Quarterly, 35, 3, 2011, 599–611.

Barro, R. J., and Sala-i-Martin, X. Economic Growth, 2nd edition. MIT Press, Cambridge, MA, 2004.

Bélanger, F., and Carter, L. The impact of the digital divide on e-government use. Communications of the ACM, 52, 4, 2009, 132–135.

Bresnahan, T. F., and Trajtenberg, M. General purpose technologies: engines of growth? Journal of Econometrics, 65, 1995, 83–108.

Chinn, M. D., and Fairlie, R. W. The determinants of the global digital divide: a cross-country analysis of computer and Internet penetration. Oxford Economic Papers, 59, 1, 2007, 16–44.

CINIC. Statistical survey report on Internet development in China. China Internet Network Information Center, Beijing, China, January, 2010.

Corrocher, N., and Ordanini, A. Measuring the digital divide: a framework for the analysis of cross-country differences. Journal of Information Technology, 17, 1, 2002, 9–19.

Cuervo, M. R. V., and Menendez, A. J. L. A multivariate framework for the analysis of the digital divide: evidence for the European Union. Information & Management, 43, 6, 2006, 756–766.

Crenshaw, E. M., and Robinson, K. K. Jump-starting the internet revolution: how structural conduciveness and global connections help diffuse the internet. Journal of the Association for Information Systems, 7, 1, 2006, 4–18.

Dasgupta, S., Lall, S., and Wheeler, D. Policy reform, economic growth, and the digital divide: an econometric analysis. Working paper 2567, World Bank, Washington, DC, 2001.

Davies, D., and Bouldin, D. A cluster separation measure. IEEE Transactions on Pattern Analysis and Machine Intelligence, 1, 1979, 224–227.

Deb, K. Multi-Objective Optimization Using Evolutionary Algorithms. John Wiley and Sons, New York, 2001.

Dewan, S., Ganley, D., and Kraemer, K. L. Across the digital divide: a cross-country analysis of the determinants of IT penetration. Journal of the Association for Information Systems, 6, 12, 2005, 409–432.

Dewan, S., Ganley, D., and Kraemer, K. L. Complementarities in the diffusion of personal computers and the internet: implications for the global digital divide. Information Systems Research, 21, 4, 2010, 925–941.

Dewan, S., and Kraemer, K. Information technology and productivity: evidence from country-level data. Management Science, 46, 4, 2000, 548–562.

Emrouznejad, A., Cabanda, E., and Gholami, R. An alternative measure of the ICT-opportunity index. Information & Management, 47, 2010, 246–254.

Fathian, M., Akhavan, P., and Hoorali, M. E-readiness assessment of non-profit ICT SMEs in a developing country: the case of Iran. Technovation, 28, 9, 2008, 578–590.

Fazio, M., Simone, A., Gregori, E., and Riccardini, F. Measuring the Digital Divide, the Digital Divide: Enhancing Access to ICTs. OECD, Paris, France, 2000.

Frechette, G. sg158: Random-effects ordered probit. Stata Technical Bulletin, 59, 2001, 23–27.

Friedkin, N. A Structural Theory of Social Influence. Cambridge University Press, Cambridge, UK, 1998.

Guerrieri, P., and Padoan, P. C. Modeling ICT as a general purpose technology: evaluation models and tools for assessment of innovation and sustainable development at the EU level. 35, 2007.

Han, J., and Kamber, M. Data Mining: Concepts and Techniques, 2nd edition. Elsevier Morgan Kaufmann, San Francisco, 2007.

Hanafizadeh, M., Saghaei, A., and Hanafizadeh, P. An index for cross-country analysis of ICT infrastructure and access. Telecommunications Policy, 33, 7, 2009, 385–405.

Hausman, J. Specification tests in econometrics. Econometrica, 46, 6, 1978, 1251–1271.

Heeks, R. ICT4D 2.0: the next phase of applying ICT for international development. IEEE Computer, 41, 6, 2008, 26–33.

Ho, S. C., Kauffman, R. J., and Liang, T. P. A growth theory perspective on B2C e-commerce growth in Europe: an exploratory study. Electronic Commerce Research and Applications, 6, 3, 2007, 237–259.

Ho, S. C., Kauffman, R. J., and Liang, T. P. Internet-based selling technology and e-commerce growth: a hybrid growth theory approach with cross-model inference. Information Technology and Management, 12, 3, 2011, 409–429.

Hsiao, C. Analysis of Panel Data, 2nd edition. Cambridge University Press, Cambridge, UK, 2003.

IDC. Information society index. Framingham, MA, 1995. Available at www.idc.com/groups/isi/main.html.

ITU. World telecommunication development report: access indicators for the information society. Geneva, Switzerland, 2003.

Johnson, R. A., and Wichern, D. W. Applied Multivariate Statistical Analysis, 6th edition. Pearson Prentice Hall, Englewood Cliffs, NJ, 2007.

Kauffman, R. J., and Kumar, A. The role of MAR and Jacob externalities in the growth of IT industry clusters. Working paper, MIS Research Center, Carlson School of Management, University of Minnesota, Minneapolis, MN, 2005.

Kauffman, R. J., and Techatassanasoontorn, A. A. Is there a global digital divide for digital wireless phone technologies? Journal of the Association for Information Systems, 6, 12, 2005, 338–382.

Kauffman, R. J., and Techatassanasoontorn, A. A. Does one standard promote faster growth? In R. Sprague (ed.), Proceedings of the 37th Hawaii International Conference on Systems Science, Kona, HI, January 2004, IEEE Computing Society Press, Los Alamitos, CA, 2004.

Kauffman, R. J., and Techatassanasoontorn, A. A. International diffusion of digital mobile technology: a coupled-hazard state-based approach. Information Technology and Management, 6, 2, 2005, 253–292.

Gibbs, J. L., Kraemer, K. L., and Dedrick, J. Environment and policy factors shaping global e-commerce diffusion: a cross-country comparison. The Information Society, 19, 1, 2003, 5–18.

Mahajan, M., Nimbhorkar, P., and Varadarajan, K. The planar k-means problem is NP-hard. Lecture Notes in Computer Science, 5431, 2009, 274–285.

Meso, P., Musa, P., Straub, D., and Mbarika, V. Information infrastructure, governance, and socio-economic development in developing countries. European Journal of Information Systems, 18, 2009, 52–65.

Niles, S., and Hanson, S. A. New era of accessibility. Journal of the Urban and Regional Information Systems Association, 15, 1, 2003, 35–40.

OECD. Guide for measuring the information society. Paris, France, 2005.

OECD. Understanding the digital divide. Paris, France, 2001.

Okoli, C., Mbarika, V. W. A., and McCoy, S. The effects of infrastructure and policy on e-business in Latin America and Sub-Saharan Africa. European Journal of Information Systems, 19, 2010, 5–20.

Papaioannou, S. K., and Dimelis, S. P. Information technology as a factor of economic development: evidence from developed and developing countries. Economics of Innovation and New Technology, 16, 3, 2007, 179–194.

Rice, R. E., and Katz, J. E. Comparing internet and mobile phone usage: digital divides of usage, adoption, and dropouts. Telecommunications Policy, 27, 8–9, 2003, 597–623.

Sacchi, A., Giannini, E., Bochic, R., Reinhard, N., and Lopes, A. B. Digital inclusion with the Internet: would you like fries with that? Communications of the ACM, 52, 3, 2009, 113–116.

Sciadis, G. (Ed.) From the Digital Divide to Digital Opportunities: Measuring Infostates for Development. ITU, Geneva, Switzerland, 2005.

Seo, H.-J., Lee, Y. S., and Oh, J. H. Does ICT investment widen the growth gap? Telecommunications Policy, 33, 8, 2009, 422–431.

Shirazi, F., Gholami, R., and Higón, A. The impact of information and communication technology (ICT), education and regulation on economic freedom in Islamic Middle Eastern countries. Information & Management, 46, 8, 2009, 426–433.

Talukdar, D., and Gauri, D. K. Home Internet access and usage in the USA: the trends in the socio-economic digital divide. Communications of the Association for Information Systems, 28, 1, 2011, 85–98.

Tcheng, H., Huet, J. M., Viennois, I., and Romdhane, M. Telecoms and development in Africa: the chicken or the egg? Convergence Letter, 8, 16, 2007.

UNCTAD. E-commerce and development report 2001. Publication No. E.01.II.D.30, United Nations, New York, NY, 2001.

UNCTAD. Information economy report 2009: trends and outlook in turbulent times. Publication sales No. E.09.II.D.18, United Nations, New York, NY, 2009.

UNCTAD. Information economy report 2010: ICTs, enterprises and poverty alleviation. Publication sales No. E.10.II.D.17, United Nations, New York, NY, 2010.

UNESCO. Measuring and monitoring the information and knowledge societies: a statistical challenge. UNESCO, Paris, France, 2003.

Vicente, M. R., and Lopez, A. J. Assessing the regional digital divide across the European Union. Telecommunications Policy, 35, 3, 2011, 220–237.

Witten, I., and Frank, E. Data Mining: Practical Machine Learning Tools and Techniques. Morgan Kaufmann, San Francisco, CA, 2005.

World Bank. Country and lending groups. Washington, DC, 2009. Available at data.worldbank.org/about/country-classifications/country-and-lending-groups.

Zorn, M. A remark on method in transfinite algebra. Bulletin of the American Mathematical Society, 41, 10, 1935, 667–670.