

Research Article
Edge Caching for D2D Enabled Hierarchical Wireless Networks with Deep Reinforcement Learning

Wenkai Li,1 Chenyang Wang,1 Ding Li,1 Bin Hu,2 Xiaofei Wang,1 and Jianji Ren3

1College of Intelligence and Computing, Tianjin University, 300350 Tianjin, China
2Department of Network Engineering, Technical College for Deaf, Tianjin University of Technology, 300000 Tianjin, China
3Institute of Computer Science and Technology, Henan Polytechnic University, 454150 Jiaozuo, Henan, China

Correspondence should be addressed to Xiaofei Wang; xiaofeiwang@tju.edu.cn

Received 15 November 2018; Revised 16 January 2019; Accepted 10 February 2019; Published 27 February 2019

Guest Editor: Yubin Zhao

Copyright © 2019 Wenkai Li et al. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

Edge caching is a promising method to deal with the traffic explosion problem in future networks. In order to satisfy the demands of user requests, contents can be proactively cached locally in the proximity of users (e.g., at base stations or user devices). Recently, some learning-based edge caching optimizations have been discussed. However, most of the previous studies explore the influence of a dynamic and constantly expanding action and caching space, leading to impracticality and low efficiency. In this paper, we study the edge caching optimization problem by utilizing the Double Deep Q-network (Double DQN) learning framework to maximize the hit rate of user requests. Firstly, we obtain the Device-to-Device (D2D) sharing model by considering both online and offline factors, and then we formulate the optimization problem, which is proved to be NP-hard. Then the edge caching replacement problem is derived as a Markov decision process (MDP). Finally, an edge caching strategy based on Double DQN is proposed. The experimental results based on large-scale real traces show the effectiveness of the proposed framework.

1. Introduction

With the development of network services and the sharp increase of mobile devices, severe traffic pressure poses an urgent demand on network operators to explore an effective paradigm towards 5G. Related works show that the requests for the top 10% of videos account for 80% of all traffic, that is, the repeated downloads of the same content [1]. Device-to-Device (D2D) content sharing is an effective method to reduce mobile network traffic. In this way, users can download required content from nearby devices and enjoy data services with low access latency [2], which can improve their quality of service (QoS).

In order to design an efficient caching strategy in mobile networks, we need to obtain the statistical information of user requests and sharing activities through system learning from the extreme volume of mobile traffic. In previous work, some important factors in mobile networks (such as content popularity, mobility models, user preferences, and user behaviour) are assumed to be well known, which is not rigorous [3]. Recently, learning-based methods have been proposed to jointly optimize mobile content sharing and caching [4, 5]. The authors of [6] calculated the minimum offloading loss according to the user's request interval and explored content caching at small base stations (SBSs). Srinivasan et al. [7] used the Q-learning method to determine the load-based spectrum, optimizing spectral sharing. However, traditional RL techniques are not feasible for mobile network environments with large state spaces.

Motivated by this, we study the D2D edge caching strategy in hierarchical wireless networks in order to maximize offloaded traffic and reduce pressure through D2D communication. The cache replacement process is modelled as a Markov decision process (MDP). Finally, a Double Deep Q-network (Double DQN) based edge caching strategy is proposed. The contributions of this paper are summarized as follows:

(i) We model the D2D sharing activities by considering both online factors (users' social behaviours) and offline factors (user mobility). The optimization is then proved to be NP-hard.

Hindawi Wireless Communications and Mobile Computing, Volume 2019, Article ID 2561069, 12 pages, https://doi.org/10.1155/2019/2561069


Figure 1: Illustration of the edge caching architecture in D2D networks (the MNO core network connects to the Internet and to the BSs serving D2D-enabled users).

(ii) The cache replacement problem is formulated as a Markov decision process (MDP) to address the continuous nature of the edge caching problem. We then propose a Double DQN-based edge caching strategy to deal with the challenge of action/state space explosion.

(iii) Combined with the theoretical model, real-trace evaluation, and a simulation platform, the proposed Double DQN-based edge caching strategy achieves better performance than several existing caching algorithms, including least recently used (LRU), least frequently used (LFU), and first-in first-out (FIFO).

The rest of this article is organized as follows. We review the related work in the second part. The third part introduces the system model. The fourth part introduces the cache optimization strategy and formulates the relevant problem. The fifth part presents the details of the cache strategy optimization. Finally, in the sixth part, large-scale experiments based on real traces are carried out.

2. Related Work

There are many studies on edge caching in mobile networks. For example, it is argued in [8–10] that adding caching to mobile networks is very promising. Femtocaching proposed in [11, 12] and AMVS-NDN proposed by [13] are both committed to adding caches at BSs for the purpose of offloading traffic. The authors of [14–16] proposed collaborative caching strategies between BSs, which greatly improve the QoS of users. In recent years, the application of intelligence in wireless networks has been attracting more and more attention. Research in [17, 18] shows that reinforcement learning (RL) has great potential in the design of BS content caching schemes. In particular, the authors proposed a base station cache replacement strategy based on Q-learning and used multiarmed bandits (MAB) to place the cache through RL techniques [17]. However, considering the extreme complexity of the actual network environment and the enormous size of the state space, traditional RL techniques are not feasible. Besides, all of the works mentioned above focus on single-level caching without considering multilevel caching.

Multitier caching is widely used to exploit the potential of system infrastructure, especially in web caching systems [19–21] and IPTV systems [22]. Reference [23] focused on the theoretical performance analysis of content caching in HetNets, assuming that all contents are of the same size. However, [22, 23] do not involve the design of caching policies, which requires practical considerations in terms of constraints (for instance, limited front-end/backhaul capacity and diversity of content sizes) and the specific characteristics of network topologies.

3. System Model

As shown in Figure 1, we consider a hierarchical network architecture. The core network communicates with the base stations via backhaul links, and each base station communicates with its users via cellular links. N mobile users $U = \{u_1, u_2, \ldots, u_N\}$ are uniformly distributed, each with a local buffer size in $L_u = \{l_{u_1}, l_{u_2}, \ldots, l_{u_N}\}$. Users can establish direct communications with each other via D2D links, and they can also be served by the BSs via cellular links. M files are stored in the content library $\mathcal{F} = \{f_1, f_2, \ldots, f_M\}$, and their content sizes are denoted as $L_f = \{l_{f_1}, l_{f_2}, \ldots, l_{f_M}\}$, where $l_f$ represents the size of the requested content $f$. The cache state is described by $s^c_{u,f}$. Here $s^c_{u,f}$ is binary: $s^c_{u,f} = 1$ denotes that user $u$ caches content $f$, while $s^c_{u,f} = 0$ means no caching.

3.1. Content Popularity and User Preference. The popularity of a content is often described as the probability that the content from the library $\mathcal{F}$ is requested by all the users. Denote an $N \times M$ popularity matrix $\mathbf{P}$, where $q_{u,f} = \mathbf{P}(q_{n,m})$ is the probability of user $u_n$ requesting content $f_m$ in the $(n,m)$-th component. In related studies, content popularity is usually described by the Zipf distribution as [24]

$$q_{u,f} = \frac{R_{u,f}^{-\beta}}{\sum_{i \in \mathcal{F}} R_i^{-\beta}} \qquad (1)$$

where $R_{u,f}$ is the popularity rank that user $u$ gives to content $f$ in descending order, and $\beta \ge 0$ is the Zipf exponent.

We measured users' sharing activities through large-scale tracing of D2D sharing based on Xender. As shown in Figure 3 [25], in the real world the matrix $\mathbf{P}$ changes over time (we will introduce the traces in detail in the sixth part). We assume that the matrix remains constant within each period, and our caching strategy refreshes with changes of the popularity matrix $\mathbf{P}$. The period of user sharing activities can be divided into peak hours and off-peak hours, and the cache replacement action occurs during the off-peak hours of each period.

User preference: the user preference, denoted as $P_{u,f}$, is the probability distribution of a user's requests over the contents. In the content popularity matrix $\mathbf{P}$, each row denotes the popularity vector of a user, which reflects the preference of the user for each content in a statistical way. Assuming that the content popularity and user preference are stochastic, we can obtain the relation

$$P_{u,f} = \sum_{u=1}^{N} w_u\, q_{u,f} \qquad (2)$$

where $w_u$ is the probability of user $u \in U$ sending a request for the contents $f \in \mathcal{F}$, given a user request distribution $W = [w_1, w_2, \ldots, w_N]$ with $\sum_{u=1}^{N} w_u = 1$ and $w_u \in [0,1]$, which reflects the request activity level of each user.

3.2. D2D Sharing Model. Under D2D-aided cellular networks, users can select either the D2D link mode or the cellular link mode. In the D2D link mode, users can request and receive content from the others via D2D links (e.g., Wi-Fi or Bluetooth); otherwise they request the content from the BSs directly in a cellular manner. In our model, users select the D2D link mode first; if the requested content is neither in their own buffers nor in their neighbours', the cellular link mode is chosen.

To model the D2D sharing activities among mobile users, the opportunistic encounter (e.g., user mobility, meeting probability, and geographical distance) and the social relationship (e.g., online relations and user preference) are the two important factors to be concerned about.

(1) Opportunistic Encounter. It is necessary to ensure that the distance between two users is less than the critical value $d_c$ when they communicate via a D2D link. Since the devices are carried by humans or vehicles, we use the meeting probability to describe user mobility.

Similar to the prior work [26], we regard $\lambda_{u,v}$ as the contact rate of users $u$ and $v$, which follows a Poisson distribution, and the contact events are independent of the user preference. We can model the opportunistic delivery as a Poisson process with rate $P_{u,f}\lambda_{u,v}$. If user $u$ caches content $f$ in its buffer, we can derive the probability $p_{u,v}$ that user $v$ receives content $f$ from user $u$ before the content expires at time $T_f$. For a node pair, we can derive that

$$p_{u,v} = \int_{0}^{T_f} P_{u,f}\lambda_{u,v}\, e^{-P_{u,f}\lambda_{u,v} y}\, dy = 1 - e^{-P_{u,f}\lambda_{u,v} T_f} \qquad (3)$$

However, if content $f$ is not cached by user $u$, then $p_{u,v} = 0$. Combined with the definition of $s^c_{u,f}$, we can rewrite (3) as $p_{u,v} = 1 - e^{-P_{u,f}\lambda_{u,v} T_f s^c_{u,f}}$. Hence the probability that user $v$ cannot receive content $f$ from any other user $u \in U$ is $\prod_{u \in U}(1 - p_{u,v})$. Then the probability of user $v$ receiving content $f$ can be expressed by

$$P_{u,v} = 1 - \prod_{u \in U} (1 - p_{u,v}) = 1 - e^{-P_{u,f} T_f \sum_{u \in U} \lambda_{u,v} s^c_{u,f}} \qquad (4)$$

(2) Social Relationship. Regarding the social relationship among users, mobile users with weak social ties may not be willing to share content with others owing to security/privacy concerns. On the other hand, users sometimes have spare resources and are willing to share content with others, yet the sharing activity may still fail because of hardware/bandwidth restrictions (the content may be too large, or the transmission speed too slow). Thus we consider that the social relationship mainly depends on user preference and the content transmission rate condition.

We employ the notion of cosine similarity to measure the preference similarity between two users, and the preference similarity factor $C_{u,v}$ is defined as

$$C_{u,v} = \frac{\sum_{f \in \mathcal{F}} q_{u,f}\, q_{v,f}}{\sqrt{\sum_{f \in \mathcal{F}} (q_{u,f})^2}\,\sqrt{\sum_{f \in \mathcal{F}} (q_{v,f})^2}}, \quad \forall u, v \in U \qquad (5)$$

Finally, based on the opportunistic encounter and the social relationship, we can obtain the probability of D2D sharing between users $u$ and $v$ as follows:

$$P^{D2D}_{u,v} = C_{u,v} \cdot P_{u,v}, \quad \forall u, v \in U, \forall f \in \mathcal{F} \qquad (6)$$

where $\sum_{v \in U} P^{D2D}_{u,v} \le 1, \forall u \in U$; that is, the sum of the probabilities of D2D sharing between each user and the other users is at most 1.
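The chain from contact rates and cache states to the D2D sharing probability can be sketched as below. This is our illustration under stated simplifications, not the authors' code: all inputs are made up, and the per-user request probability $P_{u,f}$ is collapsed into one scalar `P_f`.

```python
import numpy as np

# Minimal sketch of Eqs. (3)-(6) for a single content f. All inputs are
# illustrative assumptions (and the per-user request probability P_{u,f}
# is simplified to one scalar P_f), not values from the paper's traces.
def d2d_sharing_prob(q, lam, sc_f, P_f, T_f):
    """q: (N, M) request distributions; lam: (N, N) contact rates;
    sc_f: (N,) binary cache states for f; P_f: request prob.; T_f: lifetime."""
    # Eq. (5): cosine similarity between user preference vectors
    norm = np.linalg.norm(q, axis=1)
    C = (q @ q.T) / np.outer(norm, norm)
    # Eq. (4): prob. that v gets f from some caching user before it expires
    rate_v = lam.T @ sc_f                        # sum_u lam[u, v] * sc_f[u]
    P_recv = 1.0 - np.exp(-P_f * T_f * rate_v)   # one value per receiver v
    # Eq. (6): combine social similarity with opportunistic delivery
    return C * P_recv[None, :]                   # P^{D2D}[u, v]

rng = np.random.default_rng(1)
q = rng.dirichlet(np.ones(5), size=4)            # 4 users, 5 contents
lam = rng.uniform(0.1, 1.0, size=(4, 4))
p = d2d_sharing_prob(q, lam, sc_f=np.array([1, 0, 1, 0]), P_f=0.3, T_f=2.0)
```

Note how setting a user's cache bit to 0 removes it from the contact-rate sum, which is exactly the role of $s^c_{u,f}$ in the rewritten form of (3).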

3.3. Association of Users and BSs. Users can request content directly from the associated local BS when the requested content cannot be satisfied by D2D sharing. We define $P^{BS}_u$ as the cellular serving ratio, i.e., the average probability that the requests of user $u$ have to be served by the local BS via the backhaul link rather than by D2D communications. Thus we can obtain $P^{BS}_u = 1 - \sum_{v \in U} P^{D2D}_{u,v}, \forall u \in U$. In this paper, we consider that the content transmission process can be finished within the user mobility tolerance time, i.e., before the user moves out of the communication range of the local BS. The requested content can be satisfied from the buffer of the local BS, obtained from the neighbouring BSs via BS-BS links, or fetched from the Internet via the backhaul link. Let $P^{BS}_{u,B}$ denote the probability of BS $n$ serving user $u$; then we have

$$P^{BS}_{u,B} = \frac{\sum_i T^{BS}_{u,n}(i)}{\sum_{n \in \mathcal{N}} \sum_i T^{BS}_{u,n}(i)} \qquad (7)$$

where $T^{BS}_{u,n}(i)$ denotes the time period of the $i$-th cellular serving from BS $n$ to user $u$ during the total sample time $T_{tot}$. Therefore, we have the probability $P^{BS}_{u,B_n}$ that user $u$ is served by BS $n$ as follows:

$$P^{BS}_{u,B_n} = P^{BS}_u \cdot P^{BS}_{u,B}, \quad \forall u \in U, \forall n \in \mathcal{N} \qquad (8)$$

Note that $\sum_{n \in \mathcal{N}} P^{BS}_{u,B_n} + \sum_{v \in U} P^{D2D}_{u,v} = 1, \forall u \in U$.

3.4. Communication Model. We model the wireless transmission delay between a user and the BS as the ratio between the content size and the downlink data rate. Similar to [27], the downlink data rate from BS $n$ to user $u$ can be expressed as

$$r_{u,n} = w \log_2\left(1 + \frac{q_u\, g_{u,n}}{\sigma^2 + \sum_{v \in U \setminus \{u\}} q_v\, g_{v,n}}\right) \qquad (9)$$

where $w$ is the channel bandwidth, $\sigma^2$ represents the background noise power, $q_u$ is the transmission power from BS $n$ to user $u$, and $g_{u,n}$ is the channel gain, determined by the distance between user $u$ and BS $n$.

3.5. Optimization for the D2D-Enabled Edge Caching Problem. Mobile users can share content via D2D communications. For a user pair $(u, v)$, $v$ can get the requested content $f$ from $u$ if $u$ has the content (i.e., $s^c_{u,f} = 1$) while $v$ does not, with probability $P^{D2D}_{u,v}$. Thus the content offloaded from the BSs or the Internet via the D2D link between $u$ and $v$ can be obtained as $l_f P^{D2D}_{u,v}$. Summing over all contents and users, we can obtain the total content $O^{D2D}$ offloaded via D2D sharing as

$$O^{D2D} = \sum_{f \in \mathcal{F}} l_f \sum_{u \in U} P^{D2D}_{u,v}\, s^c_{u,f}\, (1 - s^c_{v,f}) \qquad (10)$$

Our aim is to maximize the total size of the content offloaded at users via D2D sharing while satisfying the buffer size constraints of all mobile users. Formally, the optimization problem is defined as

$$\max \quad O^{D2D}$$
$$\text{s.t.} \quad \sum_{f \in \mathcal{F}} s^c_{u,f}\, l_f \le L_u, \quad \forall u \in U$$
$$\qquad\ s^c_{u,f} \in \{0, 1\}, \quad \forall u \in U, \forall f \in \mathcal{F} \qquad (11)$$

where $\sum_{f \in \mathcal{F}} s^c_{u,f}\, l_f \le L_u$ is the buffer size constraint of each mobile user's device, and $s^c_{u,f} \in \{0, 1\}$ is the caching state in each mobile device.
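As a toy illustration of Problem (11) (our sketch, not the authors' method), the following brute force enumerates one user's feasible cache sets under the buffer constraint; the sizes, cache states, and the single sharing probability are made-up assumptions:

```python
from itertools import combinations

# Toy brute-force sketch of Problem (11), restricted to one caching user u
# and one requester v: choose the subset of contents u caches, subject to
# u's buffer size, maximizing offloaded traffic for contents v lacks.
def best_cache_set(sizes, p_d2d, sc_v, buffer_size):
    files = range(len(sizes))
    best, best_gain = (), 0.0
    for k in range(len(sizes) + 1):
        for subset in combinations(files, k):
            if sum(sizes[f] for f in subset) > buffer_size:
                continue  # buffer constraint of (11) violated
            # objective of (10): l_f * P^{D2D} * s^c_{u,f} * (1 - s^c_{v,f})
            gain = sum(sizes[f] * p_d2d * (1 - sc_v[f]) for f in subset)
            if gain > best_gain:
                best, best_gain = subset, gain
    return best, best_gain

sizes = [4, 3, 2, 5]                 # content sizes l_f
sc_v = [0, 1, 0, 0]                  # requester v already holds content 1
cache, gain = best_cache_set(sizes, p_d2d=0.5, sc_v=sc_v, buffer_size=7)
# cache == (2, 3): contents 2 and 3 fill the buffer and offload the most
```

The exhaustive search is exponential in the number of contents, consistent with the NP-hardness shown next; this is why the paper turns to a learning-based strategy in Section 5.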

The optimization problem (11) is NP-hard.

Proof. Let $e_{u,v,f} = s^c_{u,f}(1 - s^c_{v,f})$, so that $e_{u,v,f} \in \{0, 1\}$. Thus we can rewrite Problem (11) as

$$\max \quad \sum_{f \in \mathcal{F}} l_f \sum_{u \in U} P^{D2D}_{u,v}\, e_{u,v,f}$$
$$\text{s.t.} \quad \sum_{f \in \mathcal{F}} s^c_{u,f}\, l_f \le L_u, \quad \forall u \in U$$
$$\qquad\ e_{u,v,f},\ s^c_{u,f} \in \{0, 1\}, \quad \forall u, v \in U, \forall f \in \mathcal{F} \qquad (12)$$

where $\sum_{f \in \mathcal{F}} s^c_{u,f}\, l_f \le L_u$ is the cardinality constraint of $L_u$. It is easy to observe that Problem (12) has the same structure as the problem formulated in [28], which has been proved to be NP-hard.

3.6. Cache Replacement Model. We model the cache replacement process as an MDP. Below we discuss the details of the related state space, action space, and reward function.

(1) State Space. We define $s^{c,i}_{u,f}$ as the content caching state during decision epoch $i$ with respect to content $f \in \mathcal{F}$, which independently takes a value from a state space $\mathcal{P}$: $s^{c,i}_{u,f} = 1$ means content $f$ is cached at user $u$, and $s^{c,i}_{u,f} = 0$ means the opposite. In addition, $s^{r,i}_v$ is introduced to denote the content currently requested by another user $v$ in decision epoch $i$. The state of an available user during each decision epoch $i$ can be represented by

$$\mathbf{z}^i = (s^{r,i}_v, s^{c,i}_u) \in \mathcal{Z} \stackrel{\text{def}}{=} \{1, 2, \ldots, F\} \times \prod_{f \in \mathcal{F}} \mathcal{P} \qquad (13)$$

(2) Action Space. The system action with respect to state $\mathbf{z}^i$ can be denoted as $\mathcal{A}(\mathbf{z}^i)$. All users possess the same action space $\mathcal{A}$:

$$\mathcal{A} = \{a^{D2D}_i, a^{BS}_i\} \qquad (14)$$

Namely, the system action $\mathcal{A}(\mathbf{z}^i)$ can be divided into two parts according to their different characters as follows.

(a) Requests Handled via the D2D Link. The available cache control at the adjacent users is represented by $a^{D2D}_i \stackrel{\text{def}}{=} [a^{D2D}_{i,0}, a^{D2D}_{i,1}, \ldots, a^{D2D}_{i,F}]$, where $a^{D2D}_{i,f} \in \{0, 1\}$ ($f \in \{1, \ldots, F\}$) indicates whether, and which, content at the local user should be replaced by the currently requested content, and $a^{D2D}_{i,0} \in \{0, 1\}$ represents whether the local user makes the replacement, i.e., whether the content request is handled by the user itself.

(b) Requests Handled by BSs. Each user can also get content directly from the BSs when the D2D link fails to meet the requirements. $a^{BS}_i \in \{0, 1\}$ is introduced to represent this kind of action, where $a^{BS}_i = 1$ means that the request is chosen to be handled directly by the BSs, namely, the user fetches the content from the BSs.

(3) Reward Function. The reward (utility) function $\mathcal{R}(\mathbf{z}, \mathcal{A})$, which determines the reward fed back to the user when performing action $\mathcal{A}(\mathbf{z}^i)$ upon state $\mathbf{z}^i$, shall be determined by the interactive wireless environment so as to lead the DRL agent at the users (introduced later) towards the ideal performance. Among the QoS metrics, the most important is the hit rate of user-requested content. Our goal is to maximize the hit rate of user requests. Therefore, in our edge caching architecture, we design the reward function as

$$\mathcal{R}(\mathbf{z}^i, \mathcal{A}(\mathbf{z}^i)) = \begin{cases} e^{l_f}, & \mathcal{A}(\mathbf{z}^i) = a^{D2D}_i \\ e^{-l_f}, & \mathcal{A}(\mathbf{z}^i) = a^{BS}_i \end{cases} \qquad (15)$$

where an exponential function of the traffic is adopted to guide the objective of maximizing the offloaded traffic.
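The reward rule of Eq. (15) can be sketched in a few lines; this is our illustrative reading of the formula (the content size `l_f` is an assumed input, and the action labels are ours):

```python
import math

# Sketch of reward function (15): a request served via D2D earns e^{l_f},
# a fallback to the BS earns e^{-l_f}, so offloading larger contents is
# rewarded more strongly. The content size l_f is an assumption.
def reward(action, l_f):
    if action == "d2d":                 # A(z^i) = a_i^{D2D}
        return math.exp(l_f)
    if action == "bs":                  # A(z^i) = a_i^{BS}
        return math.exp(-l_f)
    raise ValueError(f"unknown action: {action}")
```

For example, with $l_f = 2$ the D2D reward is $e^2 \approx 7.39$ while the BS fallback earns only $e^{-2} \approx 0.135$, so the agent is pushed towards D2D offloading of large contents.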

4. Edge Caching Policy Discussion

In hierarchical wireless networks with cache-enabled D2D communications, we explore the maximum capacity of the network based on the mobility and social behaviours of users. The goal is to optimize network edge caching by offloading contents to users via D2D communications, reducing the system cost of content exchange between the BSs and the core network via cellular links.

4.1. Problem Formulation. Based on the above analysis and combined with (15), the optimization objective is defined as

$$R^{\text{long}} = \max_{\mathcal{A}}\; E_{\mathcal{A}}\left[\lim_{I \to \infty} \frac{1}{I} \sum_{i=1}^{I} \mathcal{R}(\mathbf{z}^i, \mathcal{A}(\mathbf{z}^i)) \;\middle|\; \mathbf{z}^1 = \mathbf{z}\right] \qquad (16)$$

which indicates maximizing the expected long-term reward value conditioned on any initial state $\mathbf{z}^1$.

Nevertheless, in general, a single-agent infinite-horizon MDP with the discounted utility (17) can be used to approximate the expected infinite-horizon undiscounted value when $\gamma \in [0, 1)$ approaches 1:

$$V(\mathbf{z}, \mathcal{A}) = E_{\mathcal{A}}\left[\sum_{i=1}^{\infty} \gamma^{i-1} \cdot \mathcal{R}(\mathbf{z}^i, \mathcal{A}(\mathbf{z}^i)) \;\middle|\; \mathbf{z}^1 = \mathbf{z}\right] \qquad (17)$$

Further, we can obtain the optimal state value function $V(\mathbf{z})$ for any initial state $\mathbf{z}$ as

$$V(\mathbf{z}) = V(\mathbf{z}, \mathcal{A}^*), \quad \forall \mathbf{z} \in \mathcal{Z} \qquad (18)$$

In conclusion, each user is expected to learn an optimal control policy $\mathcal{A}^*$ that maximizes $V(\mathbf{z}, \mathcal{A})$ for any initial state $\mathbf{z}$. The optimal control policy can be described as follows:

$$\mathcal{A}^* = \arg\max_{\mathcal{A}} V(\mathbf{z}, \mathcal{A}), \quad \forall \mathbf{z} \in \mathcal{Z} \qquad (19)$$

5. Double DQN-Based Edge Cache Strategy

5.1. Reinforcement Learning. Reinforcement learning is a machine learning paradigm; in other words, it is a way for an agent to keep learning from trial and error and finally discover patterns. RL problems can be described as optimal control decision-making problems in an MDP. RL comes in many forms, among which the tabular Q-learning algorithm is commonly used. Q-learning is an off-policy learning algorithm that allows an agent to learn through current or past experiences.

In our D2D caching architecture, the agent pertains to the user: it senses and obtains its current cache state $\mathbf{z}^i$. Then the agent selects and carries out an action $\mathcal{A}(\mathbf{z}^i)$. Meanwhile, the environment experiences a transition from $\mathbf{z}^i$ to a new state $\mathbf{z}^{i+1}$ and returns a reward $\mathcal{R}(\mathbf{z}^i, \mathcal{A}(\mathbf{z}^i))$.

According to the Bellman equation, the optimal Q-value function $Q(\mathbf{z}, \mathcal{A})$ can be expressed as (20), where $\mathbf{z} = \mathbf{z}^i$ is the state at the current decision epoch $i$ and the next state is $\mathbf{z}' = \mathbf{z}^{i+1}$ after taking the action $\mathcal{A} = \mathcal{A}(\mathbf{z}^i)$:

$$Q(\mathbf{z}, \mathcal{A}) = \mathcal{R}(\mathbf{z}, \mathcal{A}) + \gamma \cdot \sum_{\mathbf{z}'} \Pr\{\mathbf{z}' \mid \mathbf{z}, \mathcal{A}\} \cdot \max_{\mathcal{A}'} Q(\mathbf{z}', \mathcal{A}') \qquad (20)$$

The iterative formula of the Q-function can be obtained as

$$Q^{i+1}(\mathbf{z}, \mathcal{A}) = Q^i(\mathbf{z}, \mathcal{A}) + \alpha^i \cdot \left(\mathcal{R}(\mathbf{z}, \mathcal{A}) + \gamma \cdot \max_{\mathcal{A}'} Q^i(\mathbf{z}', \mathcal{A}') - Q^i(\mathbf{z}, \mathcal{A})\right) \qquad (21)$$

where $\alpha^i \in [0, 1)$ is the learning rate, and the state $\mathbf{z}^i$ turns into the state $\mathbf{z}^{i+1}$ when the agent chooses action $\mathcal{A}(\mathbf{z}^i)$, along with the corresponding reward $\mathcal{R}(\mathbf{z}^i, \mathcal{A}(\mathbf{z}^i))$. Based on (21), a Q-table can be used to store the Q value of each state-action pair when the state and action space dimensions are not high in the Q-learning algorithm. We summarize the training algorithm based on Q-learning in Algorithm 1. The complexity of the Q-learning algorithm depends primarily on the scale of the problem. Updating the Q value in a given state requires determining the maximum Q value over all possible actions in that state in the table. In a given state, if there are $n$ possible actions, finding the maximum Q value requires $n - 1$ comparisons; in other words, if there are $m$ states, the update of the entire Q-table requires $m(n - 1)$ comparisons. Hence, the learning process in Q-learning becomes extremely difficult when the scenarios involve huge network state and action spaces. Therefore, using a neural network to generate the Q value becomes a potential solution.
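Update rule (21) with epsilon-greedy exploration can be sketched on a toy cache-replacement loop. This is our illustration, not the paper's trace-driven environment: the single state, the two actions, and the 0/1 rewards are made-up assumptions.

```python
import random
from collections import defaultdict

# Tabular Q-learning sketch of update rule (21) with epsilon-greedy
# exploration on a toy cache-replacement loop (illustrative assumptions).
def q_learning_step(Q, state, actions, env_step, alpha=0.1, gamma=0.9, eps=0.1):
    if random.random() < eps:
        action = random.choice(actions)                     # explore
    else:
        action = max(actions, key=lambda a: Q[(state, a)])  # exploit
    next_state, r = env_step(state, action)
    # Eq. (21): Q <- Q + alpha * (R + gamma * max_a' Q(z', a') - Q)
    target = r + gamma * max(Q[(next_state, a)] for a in actions)
    Q[(state, action)] += alpha * (target - Q[(state, action)])
    return next_state

def env_step(state, action):
    # toy environment: caching the requested content yields reward 1
    return state, (1.0 if action == "replace" else 0.0)

random.seed(0)
Q = defaultdict(float)
s = "request_f1"
for _ in range(200):
    s = q_learning_step(Q, s, ["replace", "keep"], env_step)
# Q learns that replacing (caching) dominates keeping in this toy state
```

The `defaultdict` plays the role of the Q-table; its size grows with the state-action space, which is exactly the scalability problem motivating the neural network approximation below.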

5.2. Double Deep Q-Learning. DQN is the first model that successfully combines deep learning with reinforcement learning. It replaces the Q-table with a neural network, which effectively handles complicated, high-dimensional RL problems. DQN comes in many variations, the most famous of which is Double DQN [29]. In our model, we use Double DQN to train the DRL agents at the users, structured as shown in Figure 2. The Q-function can be approximated to the optimal Q value by updating the parameters $\tau^i$ of the neural network as follows:

$$Q(\mathbf{z}, \mathcal{A}) \approx Q((\mathbf{z}, \mathcal{A}); \tau^i) \qquad (22)$$

Experience replay is the core component of DQN. It is in effect a memory that stores transitions with a finite size $N_m$, whose entries are overwritten cyclically. It can effectively eliminate the correlation between training data. The transition sample can be represented as


Initialization: Q-table
Iteration:
1  for each episode
2    Initialize $\mathbf{z}$
3    for each step of the episode
4      Generate $a$ at random
5      if $a \le \epsilon$
6        randomly select an action
7      else
8        choose $\mathcal{A}(\mathbf{z})$ using the policy derived from $Q(\mathbf{z}, \mathcal{A})$
9      Take action $\mathcal{A}(\mathbf{z})$
10     Obtain $\mathcal{R}(\mathbf{z}, \mathcal{A}(\mathbf{z}))$ and $\mathbf{z}'$
11     Update the Q-table: $Q(\mathbf{z}, \mathcal{A}) \leftarrow Q(\mathbf{z}, \mathcal{A}) + \alpha \cdot (\mathcal{R}(\mathbf{z}, \mathcal{A}) + \gamma \cdot \max_{\mathcal{A}'} Q(\mathbf{z}', \mathcal{A}') - Q(\mathbf{z}, \mathcal{A}))$
12     $\mathbf{z} \leftarrow \mathbf{z}'$
13   end for
14 end for

Algorithm 1: Q-learning-based content caching algorithm.

Figure 2: Illustration of the training process (MainNet and TargetNet with replay memory, loss function, and gradient-based parameter updating; the MainNet selects $\arg\max_{\mathcal{A}} Q$ while the TargetNet evaluates the selected action).

(z119894A(z119894) 119877(z119894A(z119894))z119894+1) which represents one statetransition The whole experience pool can be denoted asM = 119879119894minus119873119898+1 119879119894 Note that each DRL agent maintainstwo Q networks namely Q(zA 120591119894) and 1198761015840(zA 1205911198941015840) withnetwork Q used to choose action and network 1198761015840 to evaluateaction Besides the counterpart 120591119894 of network Q periodicallyupdates the weight parameters 1205911198941015840 of network 1198761015840

Throughout the training process the DRL agent ran-domly samples a minibatch M1015840 from the experience replayMThen at each epoch the networkQ is trained towards thedirection of minimizing the loss function as

$$L(\tau_i) = \mathbb{E}_{(\mathbf{z}, \mathcal{A}, R(\mathbf{z},\mathcal{A}), \mathbf{z}') \in \mathcal{M}'}\Big[\big(R(\mathbf{z}, \mathcal{A}) + \gamma \, Q'\big(\mathbf{z}', \arg\max_{\mathcal{A}'} Q(\mathbf{z}', \mathcal{A}'; \tau_i);\ \tau_i'\big) - Q(\mathbf{z}, \mathcal{A}; \tau_i)\big)^2\Big] \tag{23}$$

With (23), the gradient guiding the updates of $\tau$ can be calculated as $\partial L(\tau_i)/\partial \tau_i$. Hence, Stochastic Gradient Descent (SGD) is performed until the convergence of the Q networks, approximating the optimal state-action Q-function. We summarize the training algorithm based on Double DQN in Algorithm 2.
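The decoupled selection/evaluation in loss (23) can be illustrated with plain NumPy arrays standing in for the two networks: the main network picks the argmax action, the target network evaluates it. The Q tables and batch values are made up for the example.

```python
import numpy as np

GAMMA = 0.9

# Hypothetical Q-value tables standing in for the two networks:
# q_main[s, a]   ~ Q(z, A; tau_i)    (action selection)
# q_target[s, a] ~ Q'(z, A; tau_i')  (action evaluation)
q_main = np.array([[0.2, 0.8], [0.5, 0.1]])
q_target = np.array([[0.3, 0.6], [0.4, 0.9]])

def double_dqn_targets(rewards, next_states):
    # argmax over the main network chooses the action ...
    best_actions = np.argmax(q_main[next_states], axis=1)
    # ... but the target network evaluates it (the decoupling in eq. (23))
    evaluated = q_target[next_states, best_actions]
    return rewards + GAMMA * evaluated

rewards = np.array([1.0, 0.0])
next_states = np.array([0, 1])
y = double_dqn_targets(rewards, next_states)
# loss (23) is then the mean squared error between y and Q(z, A; tau_i)
```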


Figure 3: Statistics for all sharing activities [25]. (Share activity per hour over one week, on a logarithmic scale from $10^0$ to $10^8$, broken down by content type: App, Audio, File, Folder, Image, Music, Video, Other, and Aggregate.)

Initialization: experience replay memory $\mathcal{M}$; main $Q$ network with random weights $\tau$; target $Q'$ network with $\tau' = \tau$; the period of replacing the target $Q$ network $\phi$.
Iteration:
1: for each episode do
2:   Initialize $\mathbf{z}$
3:   $i \leftarrow 0$
4:   for each step of episode do
5:     $i \leftarrow i + 1$
6:     Randomly generate $a$
7:     if $a \le \epsilon$ then
8:       randomly select an action
9:     else
10:      $\mathcal{A}(\mathbf{z}) \leftarrow \arg\max_{\mathcal{A}(\mathbf{z})} Q(\mathbf{z}, \mathcal{A}(\mathbf{z}); \tau_i)$
11:    Take action $\mathcal{A}(\mathbf{z}_i)$
12:    Obtain $R(\mathbf{z}_i, \mathcal{A}(\mathbf{z}_i))$ and $\mathbf{z}'$
13:    Store $T \leftarrow (\mathbf{z}, \mathcal{A}(\mathbf{z}), R(\mathbf{z}, \mathcal{A}(\mathbf{z})), \mathbf{z}')$ into $\mathcal{M}$
14:    Randomly sample a mini-batch of transitions $\mathcal{M}' \subseteq \mathcal{M}$
15:    Update $\tau_i$ with $\partial L(\tau_i)/\partial \tau_i$
16:    if $i == \phi$ then
17:      Update $\tau_i'$
18:      $i \leftarrow 0$
19:    $\mathbf{z} \leftarrow \mathbf{z}'$
20:  end for
21: end for

Algorithm 2: Double DQN-based content caching algorithm.
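The control flow of Algorithm 2 — counting steps and copying the main parameters $\tau$ into the target parameters $\tau'$ every $\phi$ steps — can be sketched as follows. The parameter vector and the stand-in "gradient update" are illustrative only.

```python
import numpy as np

rng = np.random.default_rng(1)
PHI = 4  # target-network replacement period phi

tau = rng.normal(size=(3,))   # main network parameters
tau_prime = tau.copy()        # target network parameters
sync_log = []

i = 0
for step in range(10):
    i += 1
    # (a real training step would sample a minibatch and do SGD on loss (23))
    tau = tau + 0.01              # stand-in for one gradient update
    if i == PHI:                  # lines 16-18 of Algorithm 2
        tau_prime = tau.copy()    # replace target network weights
        sync_log.append(step)
        i = 0
```

The target network therefore lags the main network by at most $\phi$ updates, which stabilizes the bootstrap targets.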

Regarding algorithm complexity, it mainly consists of collecting transitions and executing backpropagation to train the parameters. Since collecting one transition requires $O(1)$ computational complexity, the total computational complexity of collecting $K$ transitions into the replay memory is $O(K)$. Let $a$ and $b$ denote the number of layers and the maximum number of units in each layer, respectively. Training parameters with backpropagation and gradient descent requires a computational complexity of $O(mabi)$, where $m$ and $i$ denote the number of transitions randomly sampled


Figure 4: Content popularity versus content ranking (log-log scale, ranking from $10^0$ to $10^4$, popularity up to $1.2 \times 10^{-2}$), showing the original values and the MZipf fitting curve.

from the replay memory and the number of iterations, respectively. Furthermore, the replay memory and the parameters of the double deep Q-learning model dominate the storage complexity. Specifically, storing $K$ transitions needs a space complexity of about $O(K)$, while the parameters need a space complexity of about $O(ab)$.

6. Experiment

In this section, we evaluate the proposed cache policy based on experimental results from the mobile application Xender.

6.1. Dataset. Xender is a mobile app that enables offline D2D communication activities. It provides a new way to share the diversified content files users are interested in without accessing 3G/4G cellular mobile networks, largely reducing repeated traffic load and waste of network resources, and thereby achieving resource sharing. Currently, Xender has around 10 million daily and 100 million monthly active users, as well as about 110 million daily content deliveries.

We capture Xender's trace for one month (from 01/08/2016 to 31/08/2016), including 450,786 active mobile users conveying 153,482 content files and 271,785,952 content requests [30]. As shown in Figure 4, the content popularity distribution in Xender's trace can be fitted by an MZipf distribution with a plateau factor of -0.88 and a skewness factor of 0.35.
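For reference, a Mandelbrot-Zipf (MZipf) popularity distribution with the reported plateau and skewness factors can be generated as below; the library size of 1000 is an arbitrary choice for the sketch.

```python
import numpy as np

def mzipf(num_contents, skewness, plateau):
    """Mandelbrot-Zipf pmf: p(r) proportional to (r + plateau)^(-skewness)."""
    ranks = np.arange(1, num_contents + 1)
    weights = (ranks + plateau) ** (-skewness)
    return weights / weights.sum()

# Factors reported for the Xender trace: plateau -0.88, skewness 0.35.
popularity = mzipf(num_contents=1000, skewness=0.35, plateau=-0.88)
```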

6.2. Parameter Settings. In our simulations, four BSs are employed with a maximum coverage range of 250 m, $g_{u,n} = 30.6 + 36.7\log_{10} l_{u,n}$ in dB [31] is taken as the channel gain model, and the channel bandwidth of each BS is set to 20 MHz. The delays of the D2D link, BS to MNO, and MNO to Internet are 5 ms, 20 ms, and 100 ms, respectively. Besides, the total transmit power of a BS is 40 W, serving at most 500 users. With respect to the parameter settings of Double DQN, a single-layer fully connected feedforward neural network including 200 neurons is used to serve as both the target and the eval $Q$ network. Other parameter values are given in Table 1.

6.3. Evaluation Results. In order to evaluate the performance of our caching strategy, we compare it with three classic cache replacement algorithms:
(1) LRU: replace the least recently used content first.
(2) LFU: replace the least frequently used content first.
(3) FIFO: replace the content that entered the cache first.
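As an example of baseline (1), a minimal LRU cache can be built on an ordered dictionary; the capacity and request sequence below are illustrative.

```python
from collections import OrderedDict

class LRUCache:
    """Baseline (1): evict the least recently used content on overflow."""
    def __init__(self, capacity):
        self.capacity = capacity
        self.cache = OrderedDict()

    def request(self, content):
        if content in self.cache:           # cache hit: refresh recency
            self.cache.move_to_end(content)
            return True
        if len(self.cache) >= self.capacity:
            self.cache.popitem(last=False)  # evict least recently used
        self.cache[content] = True
        return False

cache = LRUCache(capacity=2)
hits = [cache.request(c) for c in ["f1", "f2", "f1", "f3", "f2"]]
```

LFU and FIFO follow the same interface, differing only in the eviction rule.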

Figure 5 shows the performance comparison of cache hit ratio, delay, and traffic at F = 1000 and C = 100 MB. As we can see, at the beginning of the simulation, the caching strategy we proposed is at a clear disadvantage in all three aspects, but soon the hit rate increases and eventually stabilizes. This is because our reward function is designed to increase the cache hit rate, so our DRL agent is dedicated to maximizing the system hit rate. It can be seen that our caching strategy is about 9%, 12%, and 14% higher than LRU, LFU, and FIFO in terms of hit rate, respectively. At the same time, the improvement of the hit rate has a positive impact on the delay and traffic indicators: the delay of our strategy is 12%, 17%, and 21% lower than that of LRU, LFU, and FIFO, respectively, and the traffic saved is 8%, 10%, and 14%, respectively.

In addition, we explore the effect of content quantity on the performance comparison results. We compare the performance when the number of contents is 1000 and 2000. As shown in Figure 6, it can be inferred that when the number of contents increases, the convergence of the algorithm changes and the hit rate decreases. However, this does not change the overall trend of the algorithm. Our


Figure 5: Performance evaluation in terms of hit rate, delay, and cellular traffic with respect to time, for the DRL, LRU, LFU, and FIFO algorithms. (a) Hit rate; (b) delay; (c) traffic.

Table 1: Parameter values.

Parameter: $F$ | $w$ [MHz] | $\sigma^2$ [dBm] | $\mathcal{M}$ | $\mathcal{M}'$ | $\gamma$ | $\epsilon$ | $\alpha$ | $\phi$
Range:     1000 | 200 | -95 | 5000 | 200 | 0.9 | 0.1 | 0.05 | 250

caching strategy still performs best among these four algorithms.

Finally, we explore the effects of the learning rate and exploration probability on the performance of our algorithm. As shown in Figure 7, the learning rate is set to 0.5 and 0.05, and the exploration probability to 0.1 and 0.5, respectively. It can be seen that both factors have a great impact on the cache strategy, mainly manifesting in convergence and performance. Thus, a large number of experiments were performed to find an appropriate learning rate and exploration probability for the proposed edge caching scenarios. Hence, in our setting, $\alpha = 0.05$ and $\epsilon = 0.1$ are selected for achieving better performance.

7. Conclusions

In this paper, we study the edge caching strategy of layered wireless networks. Specifically, we use the Markov decision


Figure 6: Performance comparison between F = 1000 and F = 2000. (a) Hit rate; (b) delay; (c) traffic.

process and Deep Reinforcement Learning in the proposed edge cache replacement strategy. Experimental results based on real traces show that our proposed strategy is superior to LRU, LFU, and FIFO in terms of hit rate, delay, and traffic offload. Finally, we also explore the impact of the learning rate and exploration probability on algorithm performance.

In the future, we will focus more on the user layer's impact on cache replacement. (1) In the existing D2D model, the transmission process of files is not persistent, and complex user movement will lead to the interruption of content delivery; in the future, we will consider this factor in the reward function. (2) The cache replacement process requires additional costs such as latency and energy consumption, all of which should be considered, but how to quantify these factors in the simulation experiments still needs to be explored. (3) The computing resources of user devices are limited. Although Deep Reinforcement Learning can solve the problem of dimensional explosion, it still requires a lot of computing resources. Therefore, we will explore the application of more lightweight learning algorithms in D2D-aided cellular networks.

Data Availability

The data used to support the findings of this study have not been made available for commercial reasons.


Figure 7: Performance of hit rate under different parameters. (a) Hit rate under $\alpha = 0.05$ and $\alpha = 0.5$; (b) hit rate under $\epsilon = 0.1$ and $\epsilon = 0.5$.

Conflicts of Interest

The authors declare that they have no conflicts of interest.

Acknowledgments

The conference version of this manuscript was first presented at the 2018 IEEE 15th International Conference on Mobile Ad Hoc and Sensor Systems (MASS). The authors have extended the work significantly by exploiting the edge caching problem with a deep reinforcement learning framework in this journal version. This work was supported in part by the National Key Research and Development Program of China under grant 2018YFC0809803 and in part by the Natural Science Foundation of China under grant 61702364.

References

[1] X. Wang, M. Chen, Z. Han et al., "TOSS: traffic offloading by social network service-based opportunistic sharing in mobile social networks," in Proceedings of the INFOCOM, pp. 2346-2354, 2014.

[2] M. Gregori, J. Gomez-Vilardebo, J. Matamoros, and D. Gunduz, "Wireless content caching for small cell and D2D networks," IEEE Journal on Selected Areas in Communications, vol. 34, no. 5, pp. 1222-1234, 2016.

[3] T. Rodrigues, F. Benevenuto, M. Cha, K. Gummadi, and V. Almeida, "On word-of-mouth based discovery of the web," in Proceedings of the 2011 ACM SIGCOMM Internet Measurement Conference (IMC '11), pp. 381-396, November 2011.

[4] J. Song, M. Sheng, T. Q. Quek, C. Xu, and X. Wang, "Learning based content caching and sharing for wireless networks," IEEE Transactions on Communications, vol. 99, pp. 1-1, 2017.

[5] N. Morozs, T. Clarke, and D. Grace, "Distributed heuristically accelerated Q-learning for robust cognitive spectrum management in LTE cellular systems," IEEE Transactions on Mobile Computing, vol. 15, no. 4, pp. 817-825, 2016.

[6] B. N. Bharath, K. G. Nagananda, and H. V. Poor, "A learning-based approach to caching in heterogenous small cell networks," IEEE Transactions on Communications, vol. 64, no. 4, pp. 1674-1686, 2016.

[7] M. Srinivasan, V. J. Kotagi, and C. S. R. Murthy, "A Q-learning framework for user QoE enhanced self-organizing spectrally efficient network using a novel inter-operator proximal spectrum sharing," IEEE Journal on Selected Areas in Communications, vol. 34, no. 11, pp. 2887-2901, 2016.

[8] X. Wang, M. Chen, T. Taleb, A. Ksentini, and V. C. M. Leung, "Cache in the air: exploiting content caching and delivery techniques for 5G systems," IEEE Communications Magazine, vol. 52, no. 2, pp. 131-139, 2014.

[9] M. Sheng, C. Xu, J. Liu, J. Song, X. Ma, and J. Li, "Enhancement for content delivery with proximity communications in caching enabled wireless networks: architecture and challenges," IEEE Communications Magazine, vol. 54, no. 8, pp. 70-76, 2016.

[10] E. Zeydan, E. Bastug, M. Bennis et al., "Big data caching for networking: moving from cloud to edge," IEEE Communications Magazine, vol. 54, no. 9, pp. 36-42, 2016.

[11] N. Golrezaei, A. Molisch, A. G. Dimakis, and G. Caire, "Femtocaching and device-to-device collaboration: a new architecture for wireless video distribution," IEEE Communications Magazine, vol. 51, no. 4, pp. 142-149, 2013.

[12] N. Golrezaei, K. Shanmugam, A. G. Dimakis, A. F. Molisch, and G. Caire, "FemtoCaching: wireless video content delivery through distributed caching helpers," in Proceedings of the IEEE Conference on Computer Communications (INFOCOM 2012), pp. 1107-1115, March 2012.

[13] B. Han, X. Wang, N. Choi, T. Kwon, and Y. Choi, "AMVS-NDN: adaptive mobile video streaming and sharing in wireless named data networking," in Proceedings of the 2013 IEEE Conference on Computer Communications Workshops (INFOCOM WKSHPS), pp. 375-380, April 2013.

[14] K. Shanmugam, N. Golrezaei, A. G. Dimakis, A. F. Molisch, and G. Caire, "FemtoCaching: wireless content delivery through distributed caching helpers," IEEE Transactions on Information Theory, vol. 59, no. 12, pp. 8402-8413, 2013.

[15] X. Li, X. Wang, S. Xiao, and V. C. Leung, "Delay performance analysis of cooperative cell caching in future mobile networks," in Proceedings of the 2015 IEEE International Conference on Signal Processing for Communications (ICC), pp. 5652-5657, June 2015.

[16] S. H. Chae, J. Y. Ryu, T. Q. Quek, and W. Choi, "Cooperative transmission via caching helpers," in Proceedings of the GLOBECOM 2015 - 2015 IEEE Global Communications Conference, pp. 1-6, San Diego, CA, USA, December 2015.

[17] J. Gu, W. Wang, A. Huang, H. Shan, and Z. Zhang, "Distributed cache replacement for caching-enable base stations in cellular networks," in Proceedings of the 2014 1st IEEE International Conference on Communications (ICC 2014), pp. 2648-2653, Australia, June 2014.

[18] C. Wang, S. Wang, D. Li, X. Wang, X. Li, and V. C. Leung, "Q-learning based edge caching optimization for D2D enabled hierarchical wireless networks," in Proceedings of the 2018 IEEE 15th International Conference on Mobile Ad Hoc and Sensor Systems (MASS), pp. 55-63, Chengdu, China, October 2018.

[19] P. Rodriguez, C. Spanner, and E. W. Biersack, "Analysis of web caching architectures: hierarchical and distributed caching," IEEE/ACM Transactions on Networking, vol. 9, no. 4, pp. 404-418, 2001.

[20] H. Che, Y. Tung, and Z. Wang, "Hierarchical web caching systems: modeling, design and experimental results," IEEE Journal on Selected Areas in Communications, vol. 20, no. 7, pp. 1305-1314, 2002.

[21] K. Poularakis and L. Tassiulas, "On the complexity of optimal content placement in hierarchical caching networks," IEEE Transactions on Communications, vol. 64, no. 5, pp. 2092-2103, 2016.

[22] J. Dai, Z. Hu, B. Li, J. Liu, and B. Li, "Collaborative hierarchical caching with dynamic request routing for massive content distribution," in Proceedings of the IEEE Conference on Computer Communications (INFOCOM 2012), pp. 2444-2452, March 2012.

[23] E. Bastug, M. Bennis, and M. Debbah, "Living on the edge: the role of proactive caching in 5G wireless networks," IEEE Communications Magazine, vol. 52, no. 8, pp. 82-89, 2014.

[24] M. Hefeeda and O. Saleh, "Traffic modeling and proportional partial caching for peer-to-peer systems," IEEE/ACM Transactions on Networking, vol. 16, no. 6, pp. 1447-1460, 2008.

[25] S. Wang, Y. Zhang, H. Wang, Z. Huang, X. Wang, and T. Jiang, "Large scale measurement and analytics on social groups of device-to-device sharing in mobile social networks," Mobile Networks and Applications, vol. 23, no. 2, pp. 203-215, 2017.

[26] A. Balasubramanian, B. Levine, and A. Venkataramani, "DTN routing as a resource allocation problem," in Proceedings of the ACM SIGCOMM 2007 Conference on Computer Communications, pp. 373-384, August 2007.

[27] X. Chen, L. Jiao, W. Li, and X. Fu, "Efficient multi-user computation offloading for mobile-edge cloud computing," IEEE/ACM Transactions on Networking, vol. 24, no. 5, pp. 2795-2808, 2016.

[28] T. H. Cormen, C. E. Leiserson, R. Rivest et al., Introduction to Algorithms, MIT Press, Cambridge, MA, USA, 2nd edition, 2001.

[29] H. V. Hasselt, A. Guez, and D. Silver, "Deep reinforcement learning with Double Q-learning," in Proceedings of the AAAI, pp. 2094-2100, 2016.

[30] X. Li, X. Wang, P. Wan, Z. Han, and V. C. Leung, "Hierarchical edge caching in device-to-device aided mobile networks: modeling, optimization, and design," IEEE Journal on Selected Areas in Communications, vol. 36, no. 8, pp. 1768-1785, 2018.

[31] 3GPP, "Further advancements for E-UTRA physical layer aspects (release 9)," Tech. Rep. 36.814 V1.2.0, 2009.




Figure 1: Illustration of edge caching architecture in D2D networks (users and BSs connected through the MNO core network to the Internet).

(ii) The cache replacement problem is formulated as a Markov decision process (MDP) to address the continuous nature of the edge caching problem. We then propose a Double DQN-based edge caching strategy to deal with the challenge of action/state space explosion.

(iii) Combined with the theoretical model, real-trace evaluation, and a simulation experimental platform, the proposed Double DQN-based edge caching strategy achieves better performance than several existing caching algorithms, including least recently used (LRU), least frequently used (LFU), and first-in first-out (FIFO).

The rest of this article is organized as follows. We review the related work in the second part. The third part introduces the system model. The fourth part introduces the cache optimization strategy and raises the relevant problem. The fifth part introduces the details of cache strategy optimization. And in the sixth part, large-scale experiments based on real traces are carried out.

2. Related Work

There is much research on edge caching in mobile networks. For example, it is studied and proposed in [8-10] that adding caching to the mobile network is very promising. Femtocaching proposed in [11, 12] and AMVS-NDN proposed by [13] are both committed to adding a cache in the BS for the purpose of offloading traffic. The authors of [14-16] proposed collaborative caching strategies between BSs, which greatly improve the QoS of users. In recent years, the application of intelligence in wireless networks has been getting more and more attention. Research in [17, 18] shows that reinforcement learning (RL) has great potential in the design of BS content caching schemes. Particularly, the authors proposed a base station caching replacement strategy based on Q-learning and used multiarmed bandits (MAB) to place the cache through RL techniques [17]. However, considering the extreme complexity of the actual network environment and the huge state space, traditional RL techniques are not feasible. Besides, all of the works mentioned above focus on single-level caching without considering multilevel caching.

Multitier caching is widely used to exploit the potential of system infrastructure, especially in web caching systems [19-21] and IPTV systems [22]. Reference [23] focused on the theoretical performance analysis of the content cache in HetNets, which assumes that the contents are of the same size. However, [22, 23] do not involve the design of caching policies, which requires practical considerations in terms of constraints (for instance, limited front-end/backhaul capacity and diversity of content sizes) and specific characteristics of network topologies.

3. System Model

As shown in Figure 1, we consider a hierarchical network architecture. The core network communicates with $N$ base stations via the backhaul link, and the base stations communicate with the users via cellular links. $N$ mobile users are uniformly distributed, $U = \{u_1, u_2, \ldots, u_N\}$, with local buffer sizes $L_u = \{l_{u_1}, l_{u_2}, \ldots, l_{u_N}\}$. Users can establish direct communications with each other via D2D links, and they can also be served by the BSs via cellular links. $M$ files are stored in the content library $F = \{f_1, f_2, \ldots, f_M\}$, and their content sizes are denoted as $L_f = \{l_{f_1}, l_{f_2}, \ldots, l_{f_M}\}$, where $l_f$ represents the size of the requested content $f$. The cache state is described by $s^c_{u,f}$. Here $s^c_{u,f}$ is binary, where $s^c_{u,f} = 1$ denotes that user $u$ caches content $f$, while $s^c_{u,f} = 0$ means no caching.

3.1. Content Popularity and User Preference. The popularity of content is often described as the probability of a content from the library $F$ being requested by all the users. Denote an $N \times M$ popularity matrix $P$, where $q_{u,f} = P(q_{n,m})$ is the probability that user $u_n$ requests content $f_m$, in the $(n,m)$-th component. In related studies, the content popularity is usually described by the ZipF distribution as [24]:

$$q_{u,f} = \frac{R_{u,f}^{-\beta}}{\sum_{i \in F} R_i^{-\beta}} \tag{1}$$

where $R_{u,f}$ is the popularity rank that user $u$ gives to content $f$ in descending order, and $\beta \ge 0$ is the ZipF exponent. We measured users' sharing activities by large-scale tracing of D2D sharing based on Xender. As shown in Figure 3 [25], in the real world the matrix $P$ changes over time (we will introduce the trace in detail in the sixth part). We assume that the matrix remains constant within each period, and our caching strategy refreshes with the changes of the popularity matrix $P$. The period of user sharing activities can be divided into peak hours and peak-off hours, and the cache replacement action occurs during the peak-off hours of each period.

User preference: the user preference, denoted as $P_{u,f}$, is the probability distribution of a user's requests for each content. In the content popularity matrix $P$, each row denotes the popularity vector of a user, which reflects the preference of that user for certain contents in a statistical way. Assuming that the content popularity and user preference are stochastic, we can obtain the relation:

$$P_{u,f} = \sum_{u=1}^{N} w_u\, q_{u,f} \tag{2}$$

where $w_u$ is the probability of user $u \in U$ sending a request for various contents $f \in F$, given a user request distribution $W = [w_1, w_2, \ldots, w_N]$ with $\sum_{u=1}^{N} w_u = 1$ and $w_u \in [0, 1]$, which reflects the request activity level of each user.
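Equations (1) and (2) can be sketched numerically as follows; the library size, ZipF exponent, rankings, and activity weights are illustrative values, not from the trace.

```python
import numpy as np

BETA = 0.8  # ZipF exponent (illustrative)

def zipf_popularity(ranks, beta):
    """Eq. (1): q_{u,f} = R_{u,f}^{-beta} / sum_i R_i^{-beta}."""
    weights = np.asarray(ranks, dtype=float) ** (-beta)
    return weights / weights.sum()

# Per-user popularity rankings over F = 5 contents (rows = users).
ranks = np.array([[1, 2, 3, 4, 5],
                  [5, 4, 3, 2, 1]])
q = np.apply_along_axis(zipf_popularity, 1, ranks, BETA)

# Eq. (2): aggregate preference weighted by request activity w_u.
w = np.array([0.7, 0.3])    # sums to 1
preference = w @ q          # P_f = sum_u w_u q_{u,f}
```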

3.2. D2D Sharing Model. Under D2D-aided cellular networks, users can select either the D2D link mode or the cellular link mode. In the D2D link mode, users can request and receive content from others via D2D links (e.g., Wi-Fi or Bluetooth), or request the content from the BSs directly in a cellular manner. In our model, the users select the D2D link mode in advance; if the requested content is not in their own buffers (or their neighbours'), the cellular link mode is chosen.

To model the D2D sharing activities among mobile users, opportunistic encounter (e.g., user mobility, meeting probability, and geographical distance) and social relationship (e.g., online relations and user preference) are two important factors to be considered.

(1) Opportunistic Encounter. It is necessary to ensure that the distance between two users is less than the critical value $d_c$ when the users communicate via a D2D link. Since the devices are carried by humans or vehicles, we use the meeting probability to describe the user mobility.

Similar to the prior work [26], we regard $\lambda_{u,v}$ as the contact rate of users $u$ and $v$, which follows the Poisson distribution, and the contact event is independent of the user preference. We can model the opportunistic delivery as a Poisson process with rate $P_{u,f}\lambda_{u,v}$. If user $u$ caches content $f$ in its buffer, we can derive the probability $p_{u,v}$ that user $v$ receives content $f$ from user $u$ before the content expires at time $T_f$. For a node pair, we can derive that:

$$p_{u,v} = \int_0^{T_f} P_{u,f}\lambda_{u,v}\, e^{-P_{u,f}\lambda_{u,v} y}\, dy = 1 - e^{-P_{u,f}\lambda_{u,v} T_f} \tag{3}$$

However, if the content $f$ is not cached at user $u$, then $p_{u,v} = 0$. Combined with the definition of $s^c_{u,f}$, we can rewrite (3) as $p_{u,v} = (1 - e^{-P_{u,f}\lambda_{u,v} T_f})\, s^c_{u,f}$. Hence, the probability that user $v$ cannot receive content $f$ from any other user $u \in U$ is $\prod_{u \in U}(1 - p_{u,v})$. Then, the probability of user $v$ receiving content $f$ can be expressed by:

$$P_{u,v} = 1 - \prod_{u \in U}(1 - p_{u,v}) = 1 - e^{-P_{u,f} T_f \sum_{u \in U} \lambda_{u,v}\, s^c_{u,f}} \tag{4}$$

(2) Social Relationship. Regarding social relationships among users, mobile users with weak social relationships may not be willing to share content with others owing to security/privacy concerns. On the other hand, users sometimes have additional resources and are willing to share content with others; however, the sharing activities may still fail because of hardware/bandwidth restrictions (the content may be too large or the traffic speed too slow). Thus, we consider that the social relationship mainly depends on user preference and the content transmission rate condition.

We employ the notion of Cosine Similarity to measure the preference similarity between two users, and the preference similarity factor $C_{u,v}$ is defined as:
$$C_{u,v} = \frac{\sum_{f \in F} q_{u,f}\, q_{v,f}}{\sqrt{\sum_{f \in F} (q_{u,f})^2}\, \sqrt{\sum_{f \in F} (q_{v,f})^2}}, \quad \forall u, v \in U \tag{5}$$

Finally, based on the opportunistic encounter and social relationship, we can obtain the probability of D2D sharing between users $u$ and $v$ as follows:
$$P^{D2D}_{u,v} = C_{u,v} \cdot P_{u,v}, \quad \forall u, v \in U,\ \forall f \in F \tag{6}$$
where $\sum_{v \in U} P^{D2D}_{u,v} \le 1$, $\forall u \in U$: the sum of the probabilities of D2D sharing between each user and all other users is at most 1.
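For a single candidate pair $(u, v)$ and content $f$, eqs. (3), (5), and (6) combine as in the sketch below (the full model aggregates over all caching users as in eq. (4)); all numerical inputs are illustrative.

```python
import numpy as np

T_F = 10.0  # content lifetime T_f (illustrative)

def d2d_share_probability(q_u, q_v, pref_uf, lam_uv, cached_u):
    """Sketch of eqs. (3), (5), (6) for a single (u, v, f) triple."""
    # eq. (3) with the cache indicator folded in:
    # p_uv = (1 - exp(-P_uf * lam_uv * T_f)) * s^c_uf
    p_uv = (1.0 - np.exp(-pref_uf * lam_uv * T_F)) * cached_u
    # eq. (5): cosine similarity of the two users' popularity vectors
    c_uv = q_u @ q_v / (np.linalg.norm(q_u) * np.linalg.norm(q_v))
    # eq. (6): opportunistic encounter x social relationship
    return c_uv * p_uv

q_u = np.array([0.5, 0.3, 0.2])
q_v = np.array([0.4, 0.4, 0.2])
p = d2d_share_probability(q_u, q_v, pref_uf=0.3, lam_uv=0.5, cached_u=1)
```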

3.3. Association of Users and BSs. Users can request content directly from the associated local BS when the request cannot be satisfied by D2D sharing. Define $P^{BS}_u$ as the cellular serving ratio, i.e., the average probability that the requests of user $u$ have to be served by the local BS via the backhaul link rather than by D2D communications. Thus, we can obtain $P^{BS}_u = 1 - \sum_{v \in U} P^{D2D}_{u,v}$, $\forall u \in U$. In this paper,


we consider that the content transmission process can be finished within the user mobility tolerant time, e.g., before the user moves out of the communication range of the local BS. The requested content can be satisfied from the buffer of the local BS, obtained from the neighbour BSs via BS-BS links, or fetched from the Internet via the backhaul link. Let $P^{BS}_{u,B}$ denote the probability of BS $n$ serving user $u$; then we have:

$$P^{BS}_{u,B} = \frac{\sum_i T^{BS}_{u,n}(i)}{\sum_{n \in \mathcal{N}} \sum_i T^{BS}_{u,n}(i)} \tag{7}$$

where $T^{BS}_{u,n}(i)$ denotes the time period of the $i$-th cellular serving from BS $n$ to user $u$ during the total sample time $T_{tot}$. Therefore, we have the probability $P^{BS}_{u,B_n}$ that user $u$ is served by BS $n$ as follows:

$$P^{BS}_{u,B_n} = P^{BS}_u \cdot P^{BS}_{u,B}, \quad \forall u \in U,\ \forall n \in \mathcal{N} \tag{8}$$

Note that $\sum_{n \in \mathcal{N}} P^{BS}_{u,n} + \sum_{v \in U} P^{D2D}_{u,v} = 1$, $\forall u \in U$.

3.4. Communication Model. We model the wireless transmission delay between a user and the BS as the ratio between the content size and the downlink data rate. Similar to [27], the downlink data rate from BS $n$ to user $u$ can be expressed as:

$$r_{u,n} = w \log_2\left(1 + \frac{q_u\, g_{u,n}}{\sigma^2 + \sum_{v \in U \setminus \{u\}} q_v\, g_{v,n}}\right) \tag{9}$$

where $w$ is the channel bandwidth, $\sigma^2$ represents the background noise power, $q_u$ is the transmission power of BS $n$ to user $u$, and $g_{u,n}$ is the channel gain, determined by the distance between user $u$ and BS $n$.

3.5. Optimization for the D2D-Enabled Edge Caching Problem. Mobile users can share content via D2D communications. In a user pair $(u, v)$, $v$ can get the requested content $f$ from $u$ if $u$ has the content (e.g., $s^c_{u,f} = 1$) while $v$ does not, with probability $P^{D2D}_{u,v}$. Thus, the content offloaded from the BSs or Internet via the D2D link between $u$ and $v$ can be obtained as $l_f P^{D2D}_{u,v}$. Considering whether each user has the content $f$ or not, we can obtain the total content $O^{D2D}$ offloaded via D2D sharing as:

$$O^{D2D} = \sum_{f \in F} l_f \sum_{u \in U} P^{D2D}_{u,v}\, s^c_{u,f}\, (1 - s^c_{v,f}) \tag{10}$$

Our aim is to maximize the total size of content offloaded at users via D2D sharing while satisfying the buffer size constraints of all mobile users. Formally, the optimization problem is defined as:

$$\begin{aligned} \max \quad & O^{D2D} \\ \text{s.t.} \quad & \sum_{f \in F} s^c_{u,f}\, l_f \le L_u, \quad \forall u \in U \\ & s^c_{u,f} \in \{0, 1\}, \quad \forall u \in U,\ \forall f \in F \end{aligned} \tag{11}$$

where $\sum_{f \in F} s^c_{u,f} l_f \le L_u$ is the buffer size constraint of all the mobile users' devices and $s^c_{u,f} \in \{0, 1\}$ is the caching state in each mobile device.
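The objective (10) and the buffer constraint of (11) can be evaluated for a toy instance as follows; the sizes, buffers, caching matrix, and sharing probabilities are all made up for illustration.

```python
import numpy as np

# Illustrative instance: 3 users, 4 contents.
l_f = np.array([10.0, 20.0, 5.0, 8.0])   # content sizes L_f
L_u = np.array([25.0, 30.0, 15.0])       # user buffer sizes
s_c = np.array([[1, 0, 1, 1],            # caching decision s^c_{u,f}
                [0, 1, 0, 1],
                [1, 0, 1, 0]])
# P^{D2D}[u, v]: sharing probability between users (each row sums <= 1)
p_d2d = np.array([[0.0, 0.3, 0.2],
                  [0.3, 0.0, 0.1],
                  [0.2, 0.1, 0.0]])

# Buffer constraint of problem (11): sum_f s^c_{u,f} l_f <= L_u for all u
feasible = bool(np.all(s_c @ l_f <= L_u))

# Objective (10): content offloaded when u caches f and v does not
offload = sum(
    l_f[f] * p_d2d[u, v] * s_c[u, f] * (1 - s_c[v, f])
    for f in range(4) for u in range(3) for v in range(3)
)
```

A caching decision is only admissible when `feasible` holds; the NP-hard problem is to choose `s_c` maximizing `offload` subject to that check.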

The optimization problem (11) is NP-hard.

Proof. Let $e_{u,v,f} = s^c_{u,f}(1 - s^c_{v,f})$, with $e_{u,v,f} \in \{0, 1\}$. Thus, we can rewrite Problem (11) as:

$$\begin{aligned} \max \quad & \sum_{f \in F} l_f \sum_{u \in U} P^{D2D}_{u,v}\, s^c_{u,f}\, (1 - s^c_{v,f}) \\ \text{s.t.} \quad & \sum_{f \in F} s^c_{u,f}\, l_f \le L_u, \quad \forall u \in U \\ & e_{u,v,f},\ s^c_{u,f} \in \{0, 1\}, \quad \forall u, v \in U,\ \forall f \in F \end{aligned} \tag{12}$$

where $\sum_{f \in \mathcal{F}} s^c_{u,f} \, l_f \le L_u$ is the cardinality constraint with bound $L_u$. It is easy to observe that Problem (11) has the same structure as the problem formulated in [28], which has been proved to be NP-hard.

3.6. Cache Replacement Model. We model the cache replacement process as an MDP. Besides, we discuss the details of the related state space, action space, and reward function as follows.

(1) State Space. We define $s^{c,i}_{u,f}$ as the content caching state during each decision epoch $i$ with respect to the content $f \in \mathcal{F}$, which independently picks a value from a state space $\mathcal{P}$. $s^{c,i}_{u,f} = 1$ means content $f$ is cached in user $u$, and $s^{c,i}_{u,f} = 0$ means the opposite. In addition, $s^{r,i}_v$ is introduced to denote the content currently requested by other users $v$ in the decision epoch $i$. The state of an available user during each decision epoch $i$ can be represented by

$$\mathbf{z}_i = \left(s^{r,i}_v, \mathbf{s}^{c,i}_u\right) \in \mathcal{Z} \stackrel{\text{def}}{=} \{1, 2, \ldots, F\} \times \prod_{f \in \mathcal{F}} \mathcal{P} \quad (13)$$

(2) Action Space. The system action with respect to the state $\mathbf{z}_i$ can be denoted as $A(\mathbf{z}_i)$. All users possess the same action space $\mathcal{A}$:

$$\mathcal{A} = \left\{a^{D2D}_i, \; a^{BS}_i\right\} \quad (14)$$

Namely, the system action $A(\mathbf{z}_i)$ can be divided into two parts according to their different characteristics, as follows.

(a) Requests Handled via D2D Link. The available cache control in the adjacent users is represented by $a^{D2D}_i \stackrel{\text{def}}{=} [a^{D2D}_{i,0}, a^{D2D}_{i,1}, \ldots, a^{D2D}_{i,F}]$, where $a^{D2D}_{i,f} \in \{0, 1\}$ ($f \in \{1, \ldots, F\}$) indicates whether and which content in the local user should be replaced by the currently requested content, and $a^{D2D}_{i,0} \in \{0, 1\}$ represents whether the local user makes a replacement at all, i.e., whether the content request is handled by the user itself.

(b) Requests Handled by BSs. Certainly, each user can get content directly from the BSs when the D2D link fails to meet the requirements. $a^{BS}_i \in \{0, 1\}$ is introduced to represent this kind of action, where $a^{BS}_i = 1$ means that the request is chosen to be directly handled by the BSs, namely, the user fetches the content from the BSs.

(3) Reward Function. The reward (utility) function $\mathcal{R}(\mathbf{z}, A)$, which determines the reward fed back to the user when performing the action $A(\mathbf{z}_i)$ upon the state $\mathbf{z}_i$, shall be determined in the interactive wireless environment to lead the DRL agent (introduced later) in users towards achieving ideal performance. Among the QoS metrics, the most important is to improve the hit rate of user-requested content.

Wireless Communications and Mobile Computing 5

Our goal is to maximize the hit rate of user requests. Therefore, in our edge caching architecture, we design the reward function as

$$\mathcal{R}\left(\mathbf{z}_i, A(\mathbf{z}_i)\right) = \begin{cases} e^{l_f}, & A(\mathbf{z}_i) = a^{D2D}_i \\ e^{-l_f}, & A(\mathbf{z}_i) = a^{BS}_i \end{cases} \quad (15)$$

where an exponential function with respect to the traffic is adopted to guide the objective of maximizing the offloaded traffic.
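The piecewise reward in (15) is straightforward to sketch; the function below is an illustrative reading of it (argument names are ours, not from the paper):

```python
import math

def reward(action_is_d2d, l_f):
    """Reward of (15): exp(l_f) for a D2D-handled request, exp(-l_f) for a BS fetch.

    l_f is the size of the requested content; the exponential rewards
    offloading large contents via D2D and penalizes fetching them from BSs.
    """
    return math.exp(l_f) if action_is_d2d else math.exp(-l_f)

print(reward(True, 2.0))   # e^2  ≈ 7.389
print(reward(False, 2.0))  # e^-2 ≈ 0.135
```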

4. Edge Caching Policy Discussion

In the hierarchical wireless networks with cache-enabled D2D communications, we explore the maximum capacity of the network based on the mobility and social behaviours of users. The goal is to optimize network edge caching by offloading contents to users via D2D communications, thereby reducing the system cost of content exchange between BSs and the core network via cellular links.

4.1. Problem Formulation. Based on the above analysis and combined with (15), the optimization objective is defined as

$$R_{\text{long}} = \max \; E_A \left[ \lim_{I \to \infty} \frac{1}{I} \sum_{i=1}^{I} \mathcal{R}\left(\mathbf{z}_i, A(\mathbf{z}_i)\right) \,\middle|\, \mathbf{z}_1 = \mathbf{z} \right] \quad (16)$$

which indicates maximizing the expected long-term reward conditioned on any initial state $\mathbf{z}_1$.

Nevertheless, in general, a single-agent infinite-horizon MDP with the discounted utility (17) can be used to approximate the expected infinite-horizon undiscounted value when $\gamma \in [0, 1)$ approaches 1:

$$V(\mathbf{z}, A) = E_A \left[ \sum_{i=1}^{\infty} \gamma^{i-1} \cdot \mathcal{R}\left(\mathbf{z}_i, A(\mathbf{z}_i)\right) \,\middle|\, \mathbf{z}_1 = \mathbf{z} \right] \quad (17)$$

Further, we can obtain the optimal state value function $V(\mathbf{z})$ for any initial state $\mathbf{z}$ as

$$V(\mathbf{z}) = V(\mathbf{z}, A^*), \quad \forall \mathbf{z} \in \mathcal{Z} \quad (18)$$

In conclusion, each user is expected to learn an optimal control policy $A^*$ that maximizes $V(\mathbf{z}, A)$ for any initial state $\mathbf{z}$. The optimal control policy can be described as follows:

$$A^* = \arg\max_{A} V(\mathbf{z}, A), \quad \forall \mathbf{z} \in \mathcal{Z} \quad (19)$$

5. Double DQN-Based Edge Cache Strategy

5.1. Reinforcement Learning. Reinforcement learning (RL) is a machine learning paradigm in which an agent keeps learning from trial and error and finally discovers patterns. RL problems can be described as optimal control decision-making problems in an MDP. RL takes many forms, among which the tabular Q-learning algorithm is commonly used. Q-learning is an off-policy learning algorithm that allows an agent to learn from current or past experiences.

In our D2D caching architecture, the agent pertains to the user: it senses and obtains its current cache state $\mathbf{z}_i$. Then the agent selects and carries out an action $A(\mathbf{z}_i)$. Meanwhile, the environment experiences a transition from $\mathbf{z}_i$ to a new state $\mathbf{z}_{i+1}$, and the agent obtains a reward $\mathcal{R}(\mathbf{z}_i, A(\mathbf{z}_i))$.

According to the Bellman equation, the optimal Q-value function $Q(\mathbf{z}, A)$ can be expressed as (20), where $\mathbf{z} = \mathbf{z}_i$ is the state at the current decision epoch $i$ and the next state is $\mathbf{z}' = \mathbf{z}_{i+1}$ after taking the action $A = A(\mathbf{z}_i)$:

$$Q(\mathbf{z}, A) = \mathcal{R}(\mathbf{z}, A) + \gamma \cdot \sum_{\mathbf{z}'} \Pr\left\{\mathbf{z}' \mid \mathbf{z}, A\right\} \cdot \max_{A'} Q(\mathbf{z}', A') \quad (20)$$

The iterative formula of the Q-function can be obtained as

$$Q_{i+1}(\mathbf{z}, A) = Q_i(\mathbf{z}, A) + \alpha_i \cdot \left( \mathcal{R}(\mathbf{z}, A) + \gamma \cdot \max_{A'} Q_i(\mathbf{z}', A') - Q_i(\mathbf{z}, A) \right) \quad (21)$$

where $\alpha_i \in [0, 1)$ is the learning rate, and the state $\mathbf{z}_i$ turns into the state $\mathbf{z}_{i+1}$ when the agent chooses action $A(\mathbf{z}_i)$, along with the corresponding reward $\mathcal{R}(\mathbf{z}_i, A(\mathbf{z}_i))$. Based on (21), a Q-table can be used in the Q-learning algorithm to store the Q value of each state-action pair when the state and action space dimensions are not high. We summarize the training algorithm based on Q-learning in Algorithm 1. The complexity of the Q-learning algorithm depends primarily on the scale of the problem: updating the Q value in a given state requires determining the maximum Q value over all possible actions in the corresponding state of the table. If there are $n$ possible actions in a given state, finding the maximum Q value requires $n - 1$ comparisons; in other words, if there are $m$ states, the update of the entire Q-table requires $m(n - 1)$ comparisons. Hence, the learning process in Q-learning becomes extremely difficult in scenarios with huge network state and action spaces. Therefore, using a neural network to generate the Q value becomes a potential solution.

5.2. Double Deep Q-Learning. DQN is the first model that successfully combines deep learning with reinforcement learning. It replaces the Q-table with a neural network, which effectively solves complicated and high-dimensional RL problems. It comes in many variations, the most famous of which is Double DQN [29]. In our model, we use Double DQN to train the DRL agents in users, as shown in Figure 2. The Q-function can be approximated to the optimal Q value by updating the parameters $\tau_i$ of the neural network as follows:

$$Q(\mathbf{z}, A) \approx Q\left((\mathbf{z}, A); \tau_i\right) \quad (22)$$

Experience replay is the core component of DQN. It is effectively a memory for storing transitions with a finite size $N_m$, whose stored entries are overwritten in a loop. It can effectively eliminate the correlation between training data. The transition sample can be represented as $T_i =$


Initialization: Q-table
Iteration:
1: for each episode do
2:   Initialize $\mathbf{z}$
3:   for each step of the episode do
4:     Generate $a$ at random
5:     if $a \le \epsilon$ then
6:       Randomly select an action
7:     else
8:       Choose $A(\mathbf{z})$ using the policy derived from $Q(\mathbf{z}, A)$
9:     Take action $A(\mathbf{z})$
10:    Obtain $\mathcal{R}(\mathbf{z}, A(\mathbf{z}))$ and $\mathbf{z}'$
11:    Update the Q-table: $Q(\mathbf{z}, A) \leftarrow Q(\mathbf{z}, A) + \alpha \cdot \left(\mathcal{R}(\mathbf{z}, A) + \gamma \cdot \max_{A'} Q(\mathbf{z}', A') - Q(\mathbf{z}, A)\right)$
12:    $\mathbf{z} \leftarrow \mathbf{z}'$
13:  end for
14: end for

Algorithm 1: Q-learning-based content caching algorithm.
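As a sketch of how the $\epsilon$-greedy loop and the update of (21) look in code, the snippet below runs tabular Q-learning on a toy two-state environment; `step_fn` and all other names are hypothetical stand-ins, not the paper's caching simulator:

```python
import random
from collections import defaultdict

def q_learning_step(Q, z, actions, alpha, gamma, epsilon, step_fn):
    """One epsilon-greedy Q-learning step with the update of (21).

    Q       : dict mapping (state, action) -> Q value (the Q-table)
    step_fn : environment transition (z, a) -> (reward, next state); hypothetical
    """
    if random.random() <= epsilon:
        a = random.choice(actions)                  # explore
    else:
        a = max(actions, key=lambda x: Q[(z, x)])   # exploit: greedy action
    r, z_next = step_fn(z, a)
    td_target = r + gamma * max(Q[(z_next, x)] for x in actions)
    Q[(z, a)] += alpha * (td_target - Q[(z, a)])    # line 11 of Algorithm 1
    return z_next

# Toy two-state environment: action 1 yields a "cache hit" reward of 1.
random.seed(0)
Q = defaultdict(float)
step_fn = lambda z, a: (1.0 if a == 1 else 0.0, a)
z = 0
for _ in range(500):
    z = q_learning_step(Q, z, [0, 1], alpha=0.05, gamma=0.9,
                        epsilon=0.1, step_fn=step_fn)
```

After a few hundred steps the table comes to prefer the rewarding action with high probability, i.e., $Q[(0, 1)] > Q[(0, 0)]$.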

Figure 2: Illustration of the training process (replay memory, main net $Q$, target net $Q'$, loss function, and gradient-based parameter updating).

$(\mathbf{z}_i, A(\mathbf{z}_i), \mathcal{R}(\mathbf{z}_i, A(\mathbf{z}_i)), \mathbf{z}_{i+1})$, which represents one state transition. The whole experience pool can be denoted as $\mathcal{M} = \{T_{i - N_m + 1}, \ldots, T_i\}$. Note that each DRL agent maintains two Q networks, namely $Q(\mathbf{z}, A; \tau_i)$ and $Q'(\mathbf{z}, A; \tau_i')$, with network $Q$ used to choose actions and network $Q'$ to evaluate them. Besides, the weight parameters $\tau_i$ of network $Q$ periodically overwrite the weight parameters $\tau_i'$ of network $Q'$.
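A minimal replay-memory sketch along these lines (class and method names are ours): a fixed-capacity buffer whose oldest transitions are overwritten, plus uniform minibatch sampling to break temporal correlation:

```python
import random
from collections import deque

class ReplayMemory:
    """Fixed-size experience replay: old transitions are overwritten in a loop."""

    def __init__(self, capacity):
        self.buffer = deque(maxlen=capacity)   # drops the oldest entry when full

    def store(self, z, a, r, z_next):
        self.buffer.append((z, a, r, z_next))  # one transition T_i

    def sample(self, batch_size):
        # Uniform sampling breaks the temporal correlation of training data.
        return random.sample(list(self.buffer), batch_size)

mem = ReplayMemory(capacity=100)
for t in range(250):          # wraps past capacity: only the last 100 survive
    mem.store(t, t % 2, 1.0, t + 1)
batch = mem.sample(32)
```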

Throughout the training process, the DRL agent randomly samples a minibatch $\mathcal{M}'$ from the experience replay memory $\mathcal{M}$. Then, at each epoch, the network $Q$ is trained towards the direction of minimizing the loss function:

$$L(\tau_i) = E_{(\mathbf{z}, A, \mathcal{R}(\mathbf{z}, A), \mathbf{z}') \in \mathcal{M}'} \left[ \left( \mathcal{R}(\mathbf{z}, A) + \gamma \cdot Q'\left(\mathbf{z}', \arg\max_{A'} Q(\mathbf{z}', A'; \tau_i); \tau_i'\right) - Q(\mathbf{z}, A; \tau_i) \right)^2 \right] \quad (23)$$

With (23), the gradient guiding the updates of $\tau$ can be calculated as $\partial L(\tau_i) / \partial \tau_i$. Hence, Stochastic Gradient Descent (SGD) is performed until the convergence of the Q networks, approximating the optimal state-action Q-function. We summarize the training algorithm based on Double DQN in Algorithm 2.
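The decoupling in the loss (23), where the main network selects $\arg\max_{A'}$ while the target network evaluates the chosen action, can be sketched as a batched target computation (an illustrative sketch, not the paper's implementation):

```python
import numpy as np

def double_dqn_targets(q_main_next, q_target_next, rewards, gamma):
    """Double DQN targets: the main network argmax-selects the next action,
    the target network evaluates it (cf. the loss in (23)).

    q_main_next, q_target_next : (batch, n_actions) Q-values for the next states z'
    """
    a_star = np.argmax(q_main_next, axis=1)                # selection: main net
    evals = q_target_next[np.arange(len(a_star)), a_star]  # evaluation: target net
    return rewards + gamma * evals

q_main = np.array([[1.0, 2.0], [3.0, 0.5]])
q_tgt = np.array([[0.2, 0.4], [0.6, 0.8]])
targets = double_dqn_targets(q_main, q_tgt, np.array([1.0, 0.0]), gamma=0.9)
```

For the first sample, the main net picks action 1 and the target net's value 0.4 is used, giving a target of $1 + 0.9 \cdot 0.4 = 1.36$; plain DQN would instead have used $\max$ over the target net's own values.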


Figure 3: Statistics of all sharing activities [25] (share activity over time in hours, by content type: app, audio, file, folder, image, music, video, other, and aggregate).

Initialization: experience replay memory $\mathcal{M}$, main $Q$ network with random weights $\tau$, target $Q'$ network with $\tau' = \tau$, and the period $\phi$ of replacing the target $Q$ network
Iteration:
1: for each episode do
2:   Initialize $\mathbf{z}$
3:   $i \leftarrow 0$
4:   for each step of the episode do
5:     $i \leftarrow i + 1$
6:     Randomly generate $a$
7:     if $a \le \epsilon$ then
8:       Randomly select an action
9:     else
10:      $A(\mathbf{z}) \leftarrow \arg\max_{A(\mathbf{z})} Q(\mathbf{z}, A(\mathbf{z}); \tau_i)$
11:    Take action $A(\mathbf{z}_i)$
12:    Obtain $\mathcal{R}(\mathbf{z}_i, A(\mathbf{z}_i))$ and $\mathbf{z}'$
13:    Store $T \leftarrow (\mathbf{z}, A(\mathbf{z}), \mathcal{R}(\mathbf{z}, A(\mathbf{z})), \mathbf{z}')$ into $\mathcal{M}$
14:    Randomly sample a mini-batch of transitions $\mathcal{M}' \subseteq \mathcal{M}$
15:    Update $\tau_i$ with $\partial L(\tau_i) / \partial \tau_i$
16:    if $i == \phi$ then
17:      Update $\tau_i'$
18:      $i \leftarrow 0$
19:    $\mathbf{z} \leftarrow \mathbf{z}'$
20:  end for
21: end for

Algorithm 2: Double DQN-based content caching algorithm.

Regarding algorithm complexity, the cost mainly includes collecting transitions and executing backpropagation to train the parameters. Since collecting one transition requires $O(1)$ computational complexity, the total computational complexity for collecting $K$ transitions into the replay memory is

$O(K)$. Let $a$ and $b$ denote the number of layers and the maximum number of units in each layer, respectively. Training the parameters with backpropagation and gradient descent requires a computational complexity of $O(mabi)$, where $m$ and $i$ denote the number of transitions randomly sampled


Figure 4: Content popularity versus content ranking (log-log scale), showing the original values and the fitted values.

from the replay memory and the number of iterations, respectively. Furthermore, the replay memory and the parameters of the double deep Q-learning model dominate the storage complexity. Specifically, storing $K$ transitions needs a space complexity of about $O(K)$, while the parameters need a space complexity of about $O(ab)$.

6. Experiment

In this section, we evaluate the proposed cache policy based on experimental results from the mobile application Xender.

6.1. Dataset. Xender is a mobile app that enables offline D2D communication activities. It provides a new way to share the diversified content files users are interested in without accessing 3G/4G cellular mobile networks, largely reducing repeated traffic load and the waste of network resources, thereby achieving resource sharing. Currently, Xender has around 10 million daily and 100 million monthly active users, as well as about 110 million daily content deliveries.

We captured Xender's trace for one month (from 01/08/2016 to 31/08/2016), including 450,786 active mobile users conveying 153,482 content files and 271,785,952 content requests [30]. As shown in Figure 4, the content popularity distribution in Xender's trace can be fitted by an MZipf distribution with a plateau factor of -0.88 and a skewness factor of 0.35.
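For illustration, the Mandelbrot-Zipf (MZipf) popularity with the fitted factors quoted above can be sketched as follows; the functional form $p(r) \propto (r + q)^{-s}$ is the standard MZipf model, and the helper name is ours:

```python
import numpy as np

def mzipf_popularity(n_contents, plateau, skew):
    """Mandelbrot-Zipf popularity p(r) ∝ (r + q)^(-s) for ranks r = 1..n,
    with plateau factor q and skewness factor s, normalized to sum to 1."""
    ranks = np.arange(1, n_contents + 1)
    weights = (ranks + plateau) ** (-skew)
    return weights / weights.sum()

# Factors fitted to the Xender trace above: plateau q = -0.88, skewness s = 0.35.
pop = mzipf_popularity(1000, plateau=-0.88, skew=0.35)
```

The resulting distribution is monotonically decreasing in rank, matching the shape of Figure 4.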

6.2. Parameter Settings. In our simulations, four BSs are employed with a maximum coverage range of 250 m; $g_{un} = 30.6 + 36.7 \log_{10} l_{un}$ in dB [31] is taken as the channel gain model; and the channel bandwidth of each BS is set to 20 MHz. The delays of the D2D link, BS to MNO, and MNO to Internet are 5 ms, 20 ms, and 100 ms, respectively. Besides, the total

transmit power of each BS is 40 W, serving at most 500 users. With respect to the parameter settings of Double DQN, a single-layer fully connected feedforward neural network with 200 neurons serves as both the target and the eval $Q$ network. Other parameter values are given in Table 1.

6.3. Evaluation Results. In order to evaluate the performance of our caching strategy, we compare it with three classic cache replacement algorithms:
(1) LRU: replace the least recently used content first.
(2) LFU: replace the least frequently used content first.
(3) FIFO: replace the content that entered the cache first.
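For reference, a baseline such as LRU can be sketched in a few lines (an illustrative implementation, not the one used in the experiments):

```python
from collections import OrderedDict

class LRUCache:
    """Baseline replacement policy: evict the least recently used content."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.store = OrderedDict()            # keys ordered from LRU to MRU

    def request(self, content):
        """Return True on a cache hit, False on a miss (caching the content)."""
        if content in self.store:
            self.store.move_to_end(content)   # hit: mark as most recently used
            return True
        if len(self.store) >= self.capacity:
            self.store.popitem(last=False)    # evict the least recently used
        self.store[content] = True
        return False

cache = LRUCache(2)
hits = [cache.request(c) for c in ["a", "b", "a", "c", "b"]]
# hits == [False, False, True, False, False]: re-requesting "a" is the only hit
```

LFU and FIFO differ only in the eviction rule: a frequency counter or pure insertion order instead of recency.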

Figure 5 shows the performance comparison in terms of cache hit rate, delay, and traffic at F = 1000 and C = 100 MB. As we can see, at the beginning of the simulation the proposed caching strategy is at a clear disadvantage in all three aspects, but the hit rate soon increases and eventually stabilizes. This is because our reward function is designed to increase the cache hit rate, so our DRL agent is dedicated to maximizing the system hit rate. It can be seen that our caching strategy achieves a hit rate 9%, 12%, and 14% higher than LRU, LFU, and FIFO, respectively. At the same time, the improvement of the hit rate has a positive impact on the delay and traffic indicators: the delay of our strategy is 12%, 17%, and 21% lower than that of LRU, LFU, and FIFO, respectively, and the traffic saved is 8%, 10%, and 14%, respectively.

In addition, we explored the effect of the number of contents on the performance comparison. We compared the performance when the number of contents is 1000 and 2000. As shown in Figure 6, it can be inferred that when the number of contents increases, the convergence of the algorithm changes and the hit rate decreases. However, this does not change the overall trend of the algorithms; our


Figure 5: Performance evaluation in terms of (a) hit rate, (b) delay, and (c) cellular traffic (MB) over time, comparing the DRL, LFU, LRU, and FIFO algorithms.

Table 1: Parameter values.

Parameter:  $F$    $w$ [MHz]   $\sigma^2$ [dBm]   $\mathcal{M}$   $\mathcal{M}'$   $\gamma$   $\epsilon$   $\alpha$   $\phi$
Value:      1000   200         -95                5000            200              0.9        0.1          0.05       250

caching strategy still performs best among these four algorithms.

Finally, we explored the effects of the learning rate and exploration probability on the performance of our algorithm. As shown in Figure 7, the learning rate is set to 0.5 and 0.05 and the exploration probability to 0.1 and 0.5, respectively. It can be seen that both factors have a great impact on the cache strategy, mainly manifested in convergence and performance. Thus, a large number of experiments were performed to find an appropriate learning rate and exploration probability for the proposed edge caching scenarios. Hence, in our setting, $\alpha = 0.05$ and $\epsilon = 0.1$ are selected for achieving better performance.

7. Conclusions

In this paper, we study the edge caching strategy of hierarchical wireless networks. Specifically, we use the Markov decision


Figure 6: Performance comparison between F = 1000 and F = 2000 in terms of (a) hit rate, (b) delay, and (c) cellular traffic (MB) over time.

process and deep reinforcement learning in the proposed edge cache replacement strategy. Experimental results based on real traces show that our proposed strategy is superior to LRU, LFU, and FIFO in terms of hit rate, delay, and traffic offload. Finally, we also explored the impact of the learning rate and exploration probability on algorithm performance.

In the future, we will focus more on the user layer's impact on cache replacement. (1) In the existing D2D model, the transmission of files is not persistent, and complex user movement can interrupt content delivery; we will consider this factor in the reward function. (2) The cache replacement process incurs additional costs, such as latency and energy consumption, all of which should be considered, but how to quantify these factors in the simulation experiments still needs to be explored. (3) The computing resources of user devices are limited; although deep reinforcement learning can solve the problem of dimensional explosion, it still requires substantial computing resources. Therefore, we will explore the application of more lightweight learning algorithms in D2D-aided cellular networks.

Data Availability

The data used to support the findings of this study have not been made available for commercial reasons.


Figure 7: Performance of hit rate under different parameters: (a) hit rate under $\alpha = 0.05$ and $\alpha = 0.5$; (b) hit rate under $\epsilon = 0.1$ and $\epsilon = 0.5$.

Conflicts of Interest

The authors declare that they have no conflicts of interest

Acknowledgments

The conference version of this manuscript was first presented at the 2018 IEEE 15th International Conference on Mobile Ad Hoc and Sensor Systems (MASS). The authors have extended the work significantly by exploiting the edge caching problem with a deep reinforcement learning framework in this journal version. This work was supported in part by the National Key Research and Development Program of China under grant 2018YFC0809803 and in part by the Natural Science Foundation of China under grant 61702364.

References

[1] X. Wang, M. Chen, Z. Han et al., "TOSS: traffic offloading by social network service-based opportunistic sharing in mobile social networks," in Proceedings of the INFOCOM, pp. 2346-2354, 2014.

[2] M. Gregori, J. Gomez-Vilardebo, J. Matamoros, and D. Gunduz, "Wireless content caching for small cell and D2D networks," IEEE Journal on Selected Areas in Communications, vol. 34, no. 5, pp. 1222-1234, 2016.

[3] T. Rodrigues, F. Benevenuto, M. Cha, K. Gummadi, and V. Almeida, "On word-of-mouth based discovery of the web," in Proceedings of the 2011 ACM SIGCOMM Internet Measurement Conference, IMC'11, pp. 381-396, November 2011.

[4] J. Song, M. Sheng, T. Q. Quek, C. Xu, and X. Wang, "Learning based content caching and sharing for wireless networks," IEEE Transactions on Communications, vol. 99, pp. 1-1, 2017.

[5] N. Morozs, T. Clarke, and D. Grace, "Distributed heuristically accelerated Q-learning for robust cognitive spectrum management in LTE cellular systems," IEEE Transactions on Mobile Computing, vol. 15, no. 4, pp. 817-825, 2016.

[6] B. N. Bharath, K. G. Nagananda, and H. V. Poor, "A learning-based approach to caching in heterogenous small cell networks," IEEE Transactions on Communications, vol. 64, no. 4, pp. 1674-1686, 2016.

[7] M. Srinivasan, V. J. Kotagi, and C. S. R. Murthy, "A Q-learning framework for user QoE enhanced self-organizing spectrally efficient network using a novel inter-operator proximal spectrum sharing," IEEE Journal on Selected Areas in Communications, vol. 34, no. 11, pp. 2887-2901, 2016.

[8] X. Wang, M. Chen, T. Taleb, A. Ksentini, and V. C. M. Leung, "Cache in the air: exploiting content caching and delivery techniques for 5G systems," IEEE Communications Magazine, vol. 52, no. 2, pp. 131-139, 2014.

[9] M. Sheng, C. Xu, J. Liu, J. Song, X. Ma, and J. Li, "Enhancement for content delivery with proximity communications in caching enabled wireless networks: architecture and challenges," IEEE Communications Magazine, vol. 54, no. 8, pp. 70-76, 2016.

[10] E. Zeydan, E. Bastug, M. Bennis et al., "Big data caching for networking: moving from cloud to edge," IEEE Communications Magazine, vol. 54, no. 9, pp. 36-42, 2016.

[11] N. Golrezaei, A. Molisch, A. G. Dimakis, and G. Caire, "Femtocaching and device-to-device collaboration: a new architecture for wireless video distribution," IEEE Communications Magazine, vol. 51, no. 4, pp. 142-149, 2013.

[12] N. Golrezaei, K. Shanmugam, A. G. Dimakis, A. F. Molisch, and G. Caire, "FemtoCaching: wireless video content delivery through distributed caching helpers," in Proceedings of the IEEE Conference on Computer Communications, INFOCOM 2012, pp. 1107-1115, March 2012.

[13] B. Han, X. Wang, N. Choi, T. Kwon, and Y. Choi, "AMVS-NDN: adaptive mobile video streaming and sharing in wireless named data networking," in Proceedings of the 2013 IEEE Conference on Computer Communications Workshops (INFOCOM WKSHPS), pp. 375-380, April 2013.

[14] K. Shanmugam, N. Golrezaei, A. G. Dimakis, A. F. Molisch, and G. Caire, "FemtoCaching: wireless content delivery through distributed caching helpers," IEEE Transactions on Information Theory, vol. 59, no. 12, pp. 8402-8413, 2013.

[15] X. Li, X. Wang, S. Xiao, and V. C. Leung, "Delay performance analysis of cooperative cell caching in future mobile networks," in Proceedings of the 2015 IEEE International Conference on Signal Processing for Communications (ICC), pp. 5652-5657, June 2015.

[16] S. H. Chae, J. Y. Ryu, T. Q. Quek, and W. Choi, "Cooperative transmission via caching helpers," in Proceedings of the GLOBECOM 2015 - 2015 IEEE Global Communications Conference, pp. 1-6, San Diego, CA, USA, December 2015.

[17] J. Gu, W. Wang, A. Huang, H. Shan, and Z. Zhang, "Distributed cache replacement for caching-enable base stations in cellular networks," in Proceedings of the 2014 1st IEEE International Conference on Communications, ICC 2014, pp. 2648-2653, Australia, June 2014.

[18] C. Wang, S. Wang, D. Li, X. Wang, X. Li, and V. C. Leung, "Q-learning based edge caching optimization for D2D enabled hierarchical wireless networks," in Proceedings of the 2018 IEEE 15th International Conference on Mobile Ad Hoc and Sensor Systems (MASS), pp. 55-63, Chengdu, China, October 2018.

[19] P. Rodriguez, C. Spanner, and E. W. Biersack, "Analysis of web caching architectures: hierarchical and distributed caching," IEEE/ACM Transactions on Networking, vol. 9, no. 4, pp. 404-418, 2001.

[20] H. Che, Y. Tung, and Z. Wang, "Hierarchical web caching systems: modeling, design and experimental results," IEEE Journal on Selected Areas in Communications, vol. 20, no. 7, pp. 1305-1314, 2002.

[21] K. Poularakis and L. Tassiulas, "On the complexity of optimal content placement in hierarchical caching networks," IEEE Transactions on Communications, vol. 64, no. 5, pp. 2092-2103, 2016.

[22] J. Dai, Z. Hu, B. Li, J. Liu, and B. Li, "Collaborative hierarchical caching with dynamic request routing for massive content distribution," in Proceedings of the IEEE Conference on Computer Communications, INFOCOM 2012, pp. 2444-2452, March 2012.

[23] E. Bastug, M. Bennis, and M. Debbah, "Living on the edge: the role of proactive caching in 5G wireless networks," IEEE Communications Magazine, vol. 52, no. 8, pp. 82-89, 2014.

[24] M. Hefeeda and O. Saleh, "Traffic modeling and proportional partial caching for peer-to-peer systems," IEEE/ACM Transactions on Networking, vol. 16, no. 6, pp. 1447-1460, 2008.

[25] S. Wang, Y. Zhang, H. Wang, Z. Huang, X. Wang, and T. Jiang, "Large scale measurement and analytics on social groups of device-to-device sharing in mobile social networks," Mobile Networks and Applications, vol. 23, no. 2, pp. 203-215, 2017.

[26] A. Balasubramanian, B. Levine, and A. Venkataramani, "DTN routing as a resource allocation problem," in Proceedings of the ACM SIGCOMM 2007 Conference on Computer Communications, pp. 373-384, August 2007.

[27] X. Chen, L. Jiao, W. Li, and X. Fu, "Efficient multi-user computation offloading for mobile-edge cloud computing," IEEE/ACM Transactions on Networking, vol. 24, no. 5, pp. 2795-2808, 2016.

[28] T. H. Cormen, C. E. Leiserson, R. Rivest et al., Introduction to Algorithms, MIT Press, Cambridge, MA, USA, 2nd edition, 2001.

[29] H. V. Hasselt, A. Guez, and D. Silver, "Deep reinforcement learning with Double Q-learning," in Proceedings of the AAAI, pp. 2094-2100, 2016.

[30] X. Li, X. Wang, P. Wan, Z. Han, and V. C. Leung, "Hierarchical edge caching in device-to-device aided mobile networks: modeling, optimization and design," IEEE Journal on Selected Areas in Communications, vol. 36, no. 8, pp. 1768-1785, 2018.

[31] 3GPP, "Further advancements for E-UTRA physical layer aspects (release 9)," Tech. Rep. 36.814 V1.2.0, 2009.




that the user $u$ caches the content $f$, while $s^c_{u,f} = 0$ means no caching.

3.1. Content Popularity and User Preference. The popularity of a content is often described as the probability that the content from the library $\mathcal{F}$ is requested by all the users. Denote an $N \times M$ popularity matrix $\mathbf{P}$, where $q_{u,f} = \mathbf{P}(q_{n,m})$ in the $(n, m)$-th component is the probability that user $u_n$ requests content $f_m$. In related studies, the content popularity is usually described by the Zipf distribution [24]:

$$q_{u,f} = \frac{R_{u,f}^{-\beta}}{\sum_{i \in \mathcal{F}} R_i^{-\beta}} \quad (1)$$

where $R_{u,f}^{-\beta}$ is the popularity index that user $u$ gives to content $f$ in descending order and $\beta \ge 0$ is the Zipf exponent.

We measured users' sharing activities by large-scale tracing of D2D sharing based on Xender. As shown in Figure 3 [25], in the real world the matrix $\mathbf{P}$ changes over time (we will introduce the trace in detail in Section 6). We assume that the matrix remains constant within each period, and our caching strategy refreshes with changes of the popularity matrix $\mathbf{P}$. The period of user sharing activities can be divided into peak hours and off-peak hours; the cache replacement action occurs during the off-peak hours of each period.

User preference, denoted as $\mathbb{P}_{u,f}$, is the probability distribution of a user's requests over the contents. In the content popularity matrix $\mathbf{P}$, each row denotes the popularity vector of a user, which statistically reflects the preference of the user for a certain content. Assuming that the content popularity and user preference are stochastic, we can obtain the relation

$$\mathbb{P}_{u,f} = \sum_{u=1}^{N} w_u q_{u,f} \quad (2)$$

where $w_u$ is the probability of user $u \in \mathcal{U}$ sending a request for the various contents $f \in \mathcal{F}$, given a user request distribution $W = [w_1, w_2, \ldots, w_N]$ with $\sum_{u=1}^{N} w_u = 1$ and $w_u \in [0, 1]$, which reflects the request activity level of each user.

3.2. D2D Sharing Model. Under D2D-aided cellular networks, users can select either the D2D link mode or the cellular link mode. In the D2D link mode, users can request and receive content from others via D2D links (e.g., Wi-Fi or Bluetooth) or request the content from the BSs directly in a cellular manner. In our model, the users select the D2D link mode in advance; if the requested content is in neither their own buffers nor their neighbours', the cellular link mode is chosen.

To model the D2D sharing activities among mobile users, the opportunistic encounter (e.g., user mobility, meeting probability, and geographical distance) and the social relationship (e.g., online relations and user preference) are the two important factors to be considered.

(1) Opportunistic Encounter. It is necessary to ensure that the distance between two users is less than the critical value $d_c$ when the users communicate via a D2D link. Since the devices are carried by humans or vehicles, we use the meeting probability to describe user mobility.

Similar to the prior work [26], we regard $\lambda_{uv}$ as the contact rate of users $u$ and $v$, which follows the Poisson distribution; the contact event is independent of the user preference. We can model the opportunistic delivery as a Poisson process with rate $\mathbb{P}_{u,f} \lambda_{uv}$. If user $u$ caches content $f$ in its buffer, we can derive the probability $p_{uv}$ that user $v$ receives content $f$ from user $u$ before the content expires at time $T_f$. For a node pair, we can derive that

$$p_{uv} = \int_{0}^{T_f} \mathbb{P}_{u,f} \lambda_{uv} \, e^{-\mathbb{P}_{u,f} \lambda_{uv} y} \, dy = 1 - e^{-\mathbb{P}_{u,f} \lambda_{uv} T_f} \quad (3)$$

However, if the content $f$ is not cached at user $u$, then $p_{uv} = 0$. Combined with the definition of $s^c_{u,f}$, we can rewrite (3) as $p_{uv} = 1 - e^{-\mathbb{P}_{u,f} \lambda_{uv} T_f s^c_{u,f}}$. Hence, the probability that user $v$ cannot receive content $f$ from any other user $u \in \mathcal{U}$ is $\prod_{u \in \mathcal{U}} (1 - p_{uv})$. Then the probability of user $v$ receiving content $f$ from some user $u$ can be expressed by

$$P_{u,v} = 1 - \prod_{u \in U} \left(1 - p_{u,v}\right) = 1 - e^{-P_{u,f} T_f \sum_{u \in U} \lambda_{u,v}\, sc_{u,f}} \quad (4)$$
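As an illustrative aid (not part of the paper), Eqs. (3)-(4) can be sketched in Python; the function names and flat argument lists are our own, and the binary caching state $sc_{u,f}$ enters as a simple multiplier, which is equivalent to the exponent form for $sc_{u,f} \in \{0,1\}$:

```python
import math

def pairwise_delivery_prob(pref_uf, contact_rate_uv, ttl_f, cached_uf):
    """Eq. (3): probability that v receives content f from u before it
    expires at T_f; zero when u does not cache f (cached_uf = 0)."""
    return (1.0 - math.exp(-pref_uf * contact_rate_uv * ttl_f)) * cached_uf

def aggregate_delivery_prob(pref_f, contact_rates, ttl_f, cache_states):
    """Eq. (4): probability that v obtains f from at least one other user,
    as one minus the product of all pairwise miss probabilities."""
    miss = 1.0
    for lam, sc in zip(contact_rates, cache_states):
        miss *= 1.0 - pairwise_delivery_prob(pref_f, lam, ttl_f, sc)
    return 1.0 - miss
```

With a single caching user, the aggregate probability collapses to the pairwise one, which is a quick sanity check on the product form.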

(2) Social Relationship. Mobile users with a weak social relationship may be unwilling to share content with others owing to security/privacy concerns. On the other hand, users sometimes have spare resources and are willing to share content with others; however, the sharing activity may still fail because of hardware/bandwidth restrictions (the content may be too large, or the transmission rate too low). Thus, we consider that the social relationship mainly depends on the user preference and the content transmission rate condition.

We employ the notion of cosine similarity to measure the preference similarity between two users, and the preference similarity factor $C_{u,v}$ is defined as

$$C_{u,v} = \frac{\sum_{f \in \mathcal{F}} q_{u,f}\, q_{v,f}}{\sqrt{\sum_{f \in \mathcal{F}} (q_{u,f})^2}\,\sqrt{\sum_{f \in \mathcal{F}} (q_{v,f})^2}}, \quad \forall u, v \in U \quad (5)$$

Finally, based on the opportunistic encounter and the social relationship, we obtain the probability of D2D sharing between users $u$ and $v$ as follows:

$$P^{D2D}_{u,v} = C_{u,v} \cdot P_{u,v}, \quad \forall u, v \in U, \forall f \in \mathcal{F} \quad (6)$$

where $\sum_{v \in U} P^{D2D}_{u,v} \le 1, \forall u \in U$; that is, the total probability of D2D sharing between each user and all other users is at most 1.
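A minimal sketch of Eqs. (5)-(6), with hypothetical function names; here `q_u` and `q_v` are per-content preference vectors (e.g., request counts):

```python
import math

def preference_similarity(q_u, q_v):
    """Cosine similarity C_uv from Eq. (5) between two preference vectors."""
    dot = sum(a * b for a, b in zip(q_u, q_v))
    norm = math.sqrt(sum(a * a for a in q_u)) * math.sqrt(sum(b * b for b in q_v))
    return dot / norm if norm > 0 else 0.0

def d2d_sharing_prob(c_uv, p_uv):
    """Eq. (6): the opportunistic delivery probability P_uv discounted by
    the social preference similarity C_uv."""
    return c_uv * p_uv
```

Identical preference vectors give a similarity of 1, while orthogonal interests (no shared content) give 0, so the social factor can only shrink the raw encounter probability.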

3.3. Association of Users and BSs. Users can request content directly from the associated local BS when the request cannot be satisfied by D2D sharing. We define $P^{BS}_u$ as the cellular serving ratio, i.e., the average probability that the requests of user $u$ have to be served by the local BS via the backhaul link rather than by D2D communications. Thus we obtain $P^{BS}_u = 1 - \sum_{v \in U} P^{D2D}_{u,v}, \forall u \in U$. In this paper,

4 Wireless Communications and Mobile Computing

we consider that the content transmission process can be finished within the user mobility tolerant time, e.g., before the user moves out of the communication range of the local BS. The requested content can be served from the buffer of the local BS, obtained from the neighbour BSs via BS-to-BS links, or fetched from the Internet via the backhaul link. Let $\hat{P}^{BS}_{u,n}$ denote the probability of BS $n$ serving user $u$; then we have

$$\hat{P}^{BS}_{u,n} = \frac{\sum_i T^{BS}_{u,n}(i)}{\sum_{n \in \mathcal{N}} \sum_i T^{BS}_{u,n}(i)} \quad (7)$$

where $T^{BS}_{u,n}(i)$ denotes the time period of the $i$-th cellular serving from BS $n$ to user $u$ during the total sample time $T_{tot}$. Therefore, we have the probability $P^{BS}_{u,n}$ that user $u$ is served by BS $n$ as follows:

$$P^{BS}_{u,n} = P^{BS}_u \cdot \hat{P}^{BS}_{u,n}, \quad \forall u \in U, \forall n \in \mathcal{N} \quad (8)$$

Note that $\sum_{n \in \mathcal{N}} P^{BS}_{u,n} + \sum_{v \in U} P^{D2D}_{u,v} = 1, \forall u \in U$.

3.4. Communication Model. We model the wireless transmission delay between a user and a BS as the ratio between the content size and the downlink data rate. Similar to [27], the downlink data rate from BS $n$ to user $u$ can be expressed as

$$r_{u,n} = w \log_2 \left(1 + \frac{q_u\, g_{u,n}}{\sigma^2 + \sum_{v \in U \setminus \{u\}} q_v\, g_{v,n}}\right) \quad (9)$$

where $w$ is the channel bandwidth, $\sigma^2$ represents the background noise power, $q_u$ is the transmission power of BS $n$ to user $u$, and $g_{u,n}$ is the channel gain, determined by the distance between user $u$ and BS $n$.

3.5. Optimization for the D2D-Enabled Edge Caching Problem. Mobile users can share content via D2D communications. For a user pair $(u, v)$, user $v$ can get the requested content $f$ from $u$ if $u$ has the content (i.e., $sc_{u,f} = 1$) while $v$ does not, with probability $P^{D2D}_{u,v}$. Thus, the content offloaded from the BSs or the Internet via the D2D link between $u$ and $v$ is $l_f P^{D2D}_{u,v}$. Accounting for the caching states of all users, we can obtain the total content $O^{D2D}$ offloaded via D2D sharing as

$$O^{D2D} = \sum_{f \in \mathcal{F}} l_f \sum_{u \in U} P^{D2D}_{u,v}\, sc_{u,f}\, \left(1 - sc_{v,f}\right) \quad (10)$$
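Eq. (10) can be evaluated with a nested loop. The toy sketch below (names are our own) additionally sums explicitly over the requesting users $v$, which the displayed formula leaves implicit:

```python
def total_d2d_offload(sizes, p_d2d, cache):
    """Expected traffic offloaded via D2D sharing, in the spirit of Eq. (10).
    sizes[f]    -- content size l_f
    p_d2d[u][v] -- sharing probability P^D2D_uv
    cache[u][f] -- caching state sc_uf in {0, 1}
    A (u, v, f) triple contributes only when u caches f and v does not."""
    total = 0.0
    users = range(len(cache))
    for f, l_f in enumerate(sizes):
        for u in users:
            for v in users:
                if u != v:
                    total += l_f * p_d2d[u][v] * cache[u][f] * (1 - cache[v][f])
    return total
```

For two users where only user 0 caches the single content, only the (0, 1) direction contributes, matching the $sc_{u,f}(1 - sc_{v,f})$ indicator.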

Our aim is to maximize the total size of content offloaded to users via D2D sharing while satisfying the buffer size constraints of all mobile users. Formally, the optimization problem is defined as

$$\begin{aligned} \max\quad & O^{D2D} \\ \text{s.t.}\quad & \sum_{f \in \mathcal{F}} sc_{u,f}\, l_f \le L_u, \quad \forall u \in U \\ & sc_{u,f} \in \{0, 1\}, \quad \forall u \in U, \forall f \in \mathcal{F} \end{aligned} \quad (11)$$

where $\sum_{f \in \mathcal{F}} sc_{u,f}\, l_f \le L_u$ is the buffer size constraint of each mobile user's device, and $sc_{u,f} \in \{0, 1\}$ is the caching state in each mobile device.
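For intuition about the knapsack-like structure of the constraint, here is a toy greedy heuristic (not the paper's method) that fills a single user's buffer with the most popular contents that still fit, respecting $\sum_f sc_{u,f}\, l_f \le L_u$:

```python
def greedy_cache_placement(sizes, popularity, buffer_size):
    """Illustrative heuristic only: cache the most popular contents that
    fit in one user's buffer, honoring the constraint of Problem (11)."""
    order = sorted(range(len(sizes)), key=lambda f: popularity[f], reverse=True)
    sc, used = [0] * len(sizes), 0
    for f in order:
        if used + sizes[f] <= buffer_size:
            sc[f] = 1           # cache content f
            used += sizes[f]
    return sc
```

Like any greedy rule for a knapsack-type problem, this is not optimal in general, which is consistent with the hardness result stated next.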

Theorem. The optimization problem (11) is NP-hard.

Proof. Let $e_{u,v,f} = sc_{u,f}(1 - sc_{v,f})$, with $e_{u,v,f} \in \{0, 1\}$. Thus we can rewrite Problem (11) as

$$\begin{aligned} \max\quad & \sum_{f \in \mathcal{F}} l_f \sum_{u \in U} P^{D2D}_{u,v}\, e_{u,v,f} \\ \text{s.t.}\quad & \sum_{f \in \mathcal{F}} sc_{u,f}\, l_f \le L_u, \quad \forall u \in U \\ & e_{u,v,f},\, sc_{u,f} \in \{0, 1\}, \quad \forall u, v \in U, \forall f \in \mathcal{F} \end{aligned} \quad (12)$$

where $\sum_{f \in \mathcal{F}} sc_{u,f}\, l_f \le L_u$ is the capacity constraint with budget $L_u$. It is easy to observe that Problem (11) has the same structure as the problem formulated in [28], which has been proved to be NP-hard. ∎

3.6. Cache Replacement Model. We model the cache replacement process as an MDP, and we discuss the details of the related state space, action space, and reward function as follows.

(1) State Space. We define $sc^i_{u,f}$ as the caching state of content $f \in \mathcal{F}$ during decision epoch $i$, which independently takes a value from a state space $\mathcal{P}$: $sc^i_{u,f} = 1$ means content $f$ is cached by user $u$, and $sc^i_{u,f} = 0$ means the opposite. In addition, $sr^i_v$ denotes the content currently requested by another user $v$ in decision epoch $i$. The state of an available user during each decision epoch $i$ can be represented by

$$\mathbf{z}_i = \left(sr^i_v, \mathbf{sc}^i_u\right) \in \mathcal{Z} \stackrel{\text{def}}{=} \{1, 2, \ldots, F\} \times \prod_{f \in \mathcal{F}} \mathcal{P} \quad (13)$$

(2) Action Space. The system action with respect to the state $\mathbf{z}_i$ is denoted as $\mathcal{A}(\mathbf{z}_i)$. All users possess the same action space $\mathcal{A}$:

$$\mathcal{A} = \left\{\mathbf{a}^{D2D}_i, a^{BS}_i\right\} \quad (14)$$

Namely, the system action $\mathcal{A}(\mathbf{z}_i)$ can be divided into two parts according to their different characters, as follows.

(a) Requests Handled via the D2D Link. The available cache control at the adjacent users is represented by $\mathbf{a}^{D2D}_i \stackrel{\text{def}}{=} [a^{D2D}_{i,0}, a^{D2D}_{i,1}, \ldots, a^{D2D}_{i,F}]$, where $a^{D2D}_{i,f} \in \{0, 1\}$ $(f \in \{1, \ldots, F\})$ indicates whether content $f$ in the local user's buffer should be replaced by the currently requested content, and $a^{D2D}_{i,0} \in \{0, 1\}$ represents whether the local user makes no replacement, i.e., the content request is handled by the user itself.

(b) Requests Handled by BSs. Certainly, each user can get content directly from the BSs when the D2D link fails to meet the requirements. $a^{BS}_i \in \{0, 1\}$ is introduced to represent this kind of action, where $a^{BS}_i = 1$ means that the request is directly handled by the BSs, namely, the user fetches the content from the BSs.

(3) Reward Function. The reward (utility) function $\mathcal{R}(\mathbf{z}, \mathcal{A})$, which determines the reward fed back to the user when performing the action $\mathcal{A}(\mathbf{z}_i)$ upon the state $\mathbf{z}_i$, shall be determined by the interactive wireless environment so as to lead the DRL agent in users (introduced later) toward achieving the desired performance. Among the QoS metrics, the most important one is improving the hit rate of user-requested


content. Our goal is to maximize the hit rate of user requests. Therefore, in our edge caching architecture, we design the reward function as

$$\mathcal{R}\left(\mathbf{z}_i, \mathcal{A}(\mathbf{z}_i)\right) = \begin{cases} e^{l_f}, & \mathcal{A}(\mathbf{z}_i) = \mathbf{a}^{D2D}_i \\ e^{-l_f}, & \mathcal{A}(\mathbf{z}_i) = a^{BS}_i \end{cases} \quad (15)$$

where an exponential function of the traffic is adopted to guide the objective of maximizing the offloaded traffic.
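A direct transcription of Eq. (15) follows (the function name is our own); note that with raw content sizes $l_f$ the exponential can overflow, so in practice sizes would be normalized first:

```python
import math

def caching_reward(served_via_d2d, content_size):
    """Eq. (15): reward e^{l_f} when the request is served via D2D,
    e^{-l_f} when it falls back to the BS, steering the agent toward
    maximizing D2D-offloaded traffic."""
    return math.exp(content_size) if served_via_d2d else math.exp(-content_size)
```

For any positive size, serving via D2D always yields strictly more reward than falling back to the BS, which is what pushes the learned policy toward D2D offloading.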

4. Edge Caching Policy Discussion

In hierarchical wireless networks with cache-enabled D2D communications, we explore the maximum capacity of the network based on the mobility and social behaviours of users. The goal is to optimize network edge caching by offloading contents to users via D2D communications, thereby reducing the system cost of content exchange between the BSs and the core network via cellular links.

4.1. Problem Formulation. Based on the above analysis and combined with (15), the optimization objective is defined as

$$R_{\text{long}} = \max_{\mathcal{A}} E \left[ \lim_{I \to \infty} \frac{1}{I} \sum_{i=1}^{I} \mathcal{R}\left(\mathbf{z}_i, \mathcal{A}(\mathbf{z}_i)\right) \,\Big|\, \mathbf{z}_1 = \mathbf{z} \right] \quad (16)$$

which indicates maximizing the expected long-term reward value conditioned on any initial state $\mathbf{z}_1$.

Nevertheless, in general, a single-agent infinite-horizon MDP with the discounted utility (17) can be used to approximate the expected infinite-horizon undiscounted value as $\gamma \in [0, 1)$ approaches 1:

$$V(\mathbf{z}, \mathcal{A}) = E_{\mathcal{A}} \left[ \sum_{i=1}^{\infty} \gamma^{i-1} \cdot \mathcal{R}\left(\mathbf{z}_i, \mathcal{A}(\mathbf{z}_i)\right) \,\Big|\, \mathbf{z}_1 = \mathbf{z} \right] \quad (17)$$

Further, we can obtain the optimal state value function $V(\mathbf{z})$ for any initial state $\mathbf{z}$ as

$$V(\mathbf{z}) = V(\mathbf{z}, \mathcal{A}^*), \quad \forall \mathbf{z} \in \mathcal{Z} \quad (18)$$

In conclusion, each user is expected to learn an optimal control policy $\mathcal{A}^*$ that maximizes $V(\mathbf{z}, \mathcal{A})$ for any initial state $\mathbf{z}$. The optimal control policy can be described as follows:

$$\mathcal{A}^* = \arg\max_{\mathcal{A}} V(\mathbf{z}, \mathcal{A}), \quad \forall \mathbf{z} \in \mathcal{Z} \quad (19)$$

5. Double DQN-Based Edge Cache Strategy

5.1. Reinforcement Learning. Reinforcement learning (RL) is a machine learning paradigm in which an agent keeps trying, learns from mistakes, and finally discovers patterns. RL problems can be described as optimal control decision-making problems in an MDP. RL takes many forms, among which the tabular Q-learning algorithm is commonly used. Q-learning is an off-policy learning algorithm that allows an agent to learn through current or past experiences.

In our D2D caching architecture, the agent resides in the user: it senses and obtains its current cache state $\mathbf{z}_i$, then selects and carries out an action $\mathcal{A}(\mathbf{z}_i)$. Meanwhile, the environment transitions from $\mathbf{z}_i$ to a new state $\mathbf{z}_{i+1}$, and the agent obtains a reward $\mathcal{R}(\mathbf{z}_i, \mathcal{A}(\mathbf{z}_i))$.

According to the Bellman equation, the optimal Q-value function $Q(\mathbf{z}, \mathcal{A})$ can be expressed as (20), where $\mathbf{z} = \mathbf{z}_i$ is the state at the current decision epoch $i$ and $\mathbf{z}' = \mathbf{z}_{i+1}$ is the next state after taking the action $\mathcal{A} = \mathcal{A}(\mathbf{z}_i)$:

$$Q(\mathbf{z}, \mathcal{A}) = \mathcal{R}(\mathbf{z}, \mathcal{A}) + \gamma \cdot \sum_{\mathbf{z}'} \Pr\left\{\mathbf{z}' \mid \mathbf{z}, \mathcal{A}\right\} \cdot \max_{\mathcal{A}'} Q(\mathbf{z}', \mathcal{A}') \quad (20)$$

The iterative formula of the Q-function can be obtained as

$$Q_{i+1}(\mathbf{z}, \mathcal{A}) = Q_i(\mathbf{z}, \mathcal{A}) + \alpha_i \cdot \left( \mathcal{R}(\mathbf{z}, \mathcal{A}) + \gamma \cdot \max_{\mathcal{A}'} Q_i(\mathbf{z}', \mathcal{A}') - Q_i(\mathbf{z}, \mathcal{A}) \right) \quad (21)$$

where $\alpha_i \in [0, 1)$ is the learning rate, and the state $\mathbf{z}_i$ turns into the state $\mathbf{z}_{i+1}$ when the agent chooses action $\mathcal{A}(\mathbf{z}_i)$, along with the corresponding reward $\mathcal{R}(\mathbf{z}_i, \mathcal{A}(\mathbf{z}_i))$. Based on (21), a Q-table can be used to store the Q-value of each state-action pair when the state and action space dimensions are not high; we summarize the training procedure based on Q-learning in Algorithm 1. The complexity of the Q-learning algorithm depends primarily on the scale of the problem. Updating the Q-value in a given state requires determining the maximum Q-value over all possible actions in that state: with $n$ possible actions, finding the maximum requires $n - 1$ comparisons. In other words, with $m$ states, updating the entire Q-table requires $m(n - 1)$ comparisons. Hence, the learning process of Q-learning becomes extremely difficult in scenarios with huge network state and action spaces. Therefore, using a neural network to generate the Q-value becomes a potential solution.

5.2. Double Deep Q-Learning. DQN is the first model that successfully combines deep learning with reinforcement learning. It replaces the Q-table with a neural network, which effectively handles complicated, high-dimensional RL problems. It comes in many variations, the most famous of which is Double DQN [29]. In our model, we use Double DQN to train the DRL agents in users, as shown in Figure 2. The Q-function can be approximated to the optimal Q-value by updating the parameters $\tau_i$ of the neural network as follows:

$$Q(\mathbf{z}, \mathcal{A}) \approx Q\left((\mathbf{z}, \mathcal{A}); \tau_i\right) \quad (22)$$

Experience replay is the core component of DQN. It is a memory of finite size $N_m$ that stores transitions, and its stored records are overwritten cyclically. It effectively eliminates the correlation between training data. A transition sample can be represented as $T_i =$


Initialization: Q-table.
Iteration:
1: for each episode do
2:   Initialize z
3:   for each step of the episode do
4:     Generate a at random
5:     if a ≤ ε then
6:       randomly select an action
7:     else
8:       choose A(z) using the policy derived from Q(z, A)
9:     Take action A(z)
10:    Obtain R(z, A(z)) and z′
11:    Update the Q-table: Q(z, A) ← Q(z, A) + α · (R(z, A) + γ · max_{A′} Q(z′, A′) − Q(z, A))
12:    z ← z′
13:  end for
14: end for

Algorithm 1: Q-learning-based content caching algorithm.
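One step of Algorithm 1 might look as follows in Python; the tabular representation and the function hooks are our own simplification of the paper's state and action spaces:

```python
import random
from collections import defaultdict

def q_learning_step(q_table, state, actions, reward_fn, transition_fn,
                    alpha=0.05, gamma=0.9, epsilon=0.1):
    """One iteration of Algorithm 1: epsilon-greedy action selection
    followed by the tabular update of Eq. (21)."""
    if random.random() <= epsilon:
        action = random.choice(actions)          # explore
    else:
        action = max(actions, key=lambda a: q_table[(state, a)])  # exploit
    reward = reward_fn(state, action)
    next_state = transition_fn(state, action)
    best_next = max(q_table[(next_state, a)] for a in actions)
    td_target = reward + gamma * best_next
    q_table[(state, action)] += alpha * (td_target - q_table[(state, action)])
    return next_state
```

With an all-zero table, reward 1, and a self-looping state, a single greedy step moves the Q-value by exactly α toward the target, illustrating the update rule of (21).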

Figure 2: Illustration of the training process: transitions from the outside network are stored in the replay memory; the main net selects actions via argmax Q, the target net evaluates them, and the loss-function gradient drives the parameter updates of the main net.

$(\mathbf{z}_i, \mathcal{A}(\mathbf{z}_i), \mathcal{R}(\mathbf{z}_i, \mathcal{A}(\mathbf{z}_i)), \mathbf{z}_{i+1})$, which represents one state transition. The whole experience pool can be denoted as $\mathcal{M} = \{T_{i-N_m+1}, \ldots, T_i\}$. Note that each DRL agent maintains two Q-networks, namely $Q(\mathbf{z}, \mathcal{A}; \tau_i)$ and $Q'(\mathbf{z}, \mathcal{A}; \tau'_i)$, with network $Q$ used to choose the action and network $Q'$ to evaluate it. Besides, the weight parameters $\tau'_i$ of network $Q'$ are periodically updated from the parameters $\tau_i$ of network $Q$.

Throughout the training process, the DRL agent randomly samples a minibatch $\mathcal{M}'$ from the experience replay $\mathcal{M}$. Then, at each epoch, the network $Q$ is trained toward minimizing the loss function

$$L(\tau_i) = E_{(\mathbf{z}, \mathcal{A}, \mathcal{R}(\mathbf{z}, \mathcal{A}), \mathbf{z}') \in \mathcal{M}'} \left[ \left( \mathcal{R}(\mathbf{z}, \mathcal{A}) + \gamma \cdot Q'\left(\mathbf{z}', \arg\max_{\mathcal{A}'} Q(\mathbf{z}', \mathcal{A}'; \tau_i); \tau'_i\right) - Q(\mathbf{z}, \mathcal{A}; \tau_i) \right)^2 \right] \quad (23)$$

With (23), the gradient guiding the updates of $\tau$ can be calculated as $\partial L(\tau_i)/\partial \tau_i$. Hence, stochastic gradient descent (SGD) is performed until the convergence of the Q-networks, approximating the optimal state-action Q-function. We summarize the training algorithm based on Double DQN in Algorithm 2.


Figure 3: Statistics for all sharing activities (share activity versus time in hours, by content type: app, audio, file, folder, image, music, video, other, and aggregate) [25].

Initialization: experience replay memory M; main Q-network with random weights τ; target Q′-network with τ′ = τ; period ϕ of replacing the target Q-network.
Iteration:
1: for each episode do
2:   Initialize z
3:   i ← 0
4:   for each step of the episode do
5:     i ← i + 1
6:     Randomly generate a
7:     if a ≤ ε then
8:       randomly select an action
9:     else
10:      A(z) ← arg max_{A(z)} Q(z, A(z); τ_i)
11:    Take action A(z)
12:    Obtain R(z, A(z)) and z′
13:    Store T ← (z, A(z), R(z, A(z)), z′) into M
14:    Randomly sample a minibatch of transitions M′ ⊆ M
15:    Update τ_i with ∂L(τ_i)/∂τ_i
16:    if i == ϕ then
17:      Update τ′_i and set i ← 0
18:    z ← z′
19:  end for
20: end for

Algorithm 2: Double DQN-based content caching algorithm.
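The decoupled target inside the loss (23) — the main network selects the action, the target network evaluates it — can be sketched as follows (function names are our own; the networks are passed in as plain callables):

```python
def double_dqn_target(reward, next_state, q_main, q_target, actions, gamma=0.9):
    """Target value inside the loss of Eq. (23): the main network Q chooses
    the best next action, and the target network Q' evaluates it, which
    curbs the overestimation bias of vanilla DQN."""
    best_action = max(actions, key=lambda a: q_main(next_state, a))
    return reward + gamma * q_target(next_state, best_action)

def td_loss(q_value, target):
    """Squared TD error for one sampled transition."""
    return (target - q_value) ** 2
```

In the usage below, the main net prefers action 1 but the target net values it at only 0.5, so the target is 1 + 0.9 × 0.5 rather than 1 + 0.9 × 5, showing how the decoupling tempers overestimation.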

Regarding algorithm complexity, it mainly comprises collecting transitions and executing backpropagation to train the parameters. Since collecting one transition requires $O(1)$ computation, the total computational complexity for collecting $K$ transitions into the replay memory is $O(K)$. Let $a$ and $b$ denote the number of layers and the maximum number of units per layer, respectively. Training the parameters with backpropagation and gradient descent requires a computational complexity of $O(m \cdot a \cdot b \cdot i)$, where $m$ and $i$ denote the number of transitions randomly sampled


Figure 4: Content popularity (original values and fitted values) versus content ranking.

from the replay memory and the number of training iterations, respectively. Furthermore, the replay memory and the parameters of the Double Deep Q-learning model dominate the storage complexity: storing $K$ transitions needs a space complexity of about $O(K)$, while the parameters need about $O(a \cdot b)$.

6. Experiment

In this section, we evaluate the proposed cache policy based on the experimental results of the mobile application Xender.

6.1. Dataset. Xender is a mobile app that realizes offline D2D communication activities. It provides a new way to share the diversified content files users are interested in without accessing 3G/4G cellular mobile networks, largely reducing repeated traffic load and the waste of network resources, thereby achieving resource sharing. Currently, Xender has around 10 million daily and 100 million monthly active users, as well as about 110 million daily content deliveries.

We captured Xender's trace for one month (from 01/08/2016 to 31/08/2016), including 450,786 active mobile users conveying 153,482 content files and 271,785,952 content requests [30]. As shown in Figure 4, the content popularity distribution in Xender's trace can be fitted by an MZipf distribution with a plateau factor of −0.88 and a skewness factor of 0.35.
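As a hedged sketch of the fitted popularity model, assuming the common Mandelbrot-Zipf form $P(r) \propto (r + q)^{-s}$ with the reported plateau factor $q = -0.88$ and skewness $s = 0.35$ (the paper does not spell out the exact parameterization):

```python
def mzipf_popularity(num_contents, plateau=-0.88, skewness=0.35):
    """Normalized Mandelbrot-Zipf popularity over a content catalogue:
    P(r) proportional to 1 / (r + q)^s for rank r = 1..F. With q = -0.88,
    r + q stays positive for every rank r >= 1."""
    weights = [1.0 / ((r + plateau) ** skewness) for r in range(1, num_contents + 1)]
    total = sum(weights)
    return [w / total for w in weights]
```

The resulting distribution is a valid probability vector that decays monotonically with rank, matching the shape in Figure 4.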

6.2. Parameter Settings. In our simulations, four BSs are employed with a maximum coverage range of 250 m; $g_{u,n} = 30.6 + 36.7 \log_{10} l_{u,n}$ in dB [31] is taken as the channel gain model, where $l_{u,n}$ is the distance between user $u$ and BS $n$; and the channel bandwidth of each BS is set to 20 MHz. The delays of the D2D link, BS to MNO, and MNO to Internet are 5 ms, 20 ms, and 100 ms, respectively. Besides, the total transmit power of a BS is 40 W, serving at most 500 users. With respect to the parameter settings of Double DQN, a single-layer fully connected feedforward neural network with 200 neurons serves as both the target and the eval $Q$-network. Other parameter values are given in Table 1.

6.3. Evaluation Results. In order to evaluate the performance of our caching strategy, we compare it with three classic cache replacement algorithms:
(1) LRU: replace the least recently used content first.
(2) LFU: replace the least frequently used content first.
(3) FIFO: replace the content that entered the cache first.
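For reference, the LRU baseline can be sketched in a few lines; the LFU and FIFO baselines differ only in the eviction rule. This is our own illustrative implementation, not the paper's simulator:

```python
from collections import OrderedDict

class LRUCache:
    """Least-recently-used baseline: on a miss with a full cache, evict
    the content whose last request is oldest."""
    def __init__(self, capacity):
        self.capacity = capacity
        self.store = OrderedDict()   # insertion order doubles as recency order

    def request(self, content):
        """Return True on a cache hit, False on a miss (with insertion)."""
        if content in self.store:
            self.store.move_to_end(content)   # hit: refresh recency
            return True
        if len(self.store) >= self.capacity:
            self.store.popitem(last=False)    # miss: evict the LRU victim
        self.store[content] = True
        return False
```

The test traces a short request sequence: after "a", "b", "a", inserting "c" evicts "b" (the least recently used item), so a later request for "b" misses.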

Figure 5 shows the performance comparison of cache hit ratio, delay, and traffic at F = 1000 and C = 100 MB. As we can see, at the beginning of the simulation the proposed caching strategy is clearly at a disadvantage in all three aspects, but the hit rate soon increases and eventually stabilizes. This is because our reward function is designed to increase the cache hit rate, so our DRL agent is dedicated to maximizing the system hit rate. It can be seen that our caching strategy achieves a hit rate about 9%, 12%, and 14% higher than LRU, LFU, and FIFO, respectively. At the same time, the improvement of the hit rate has a positive impact on the delay and traffic indicators: the delay of our strategy is 12%, 17%, and 21% lower than that of LRU, LFU, and FIFO, respectively, and the traffic saved is 8%, 10%, and 14%, respectively.

In addition, we explored the effect of content quantity on the performance comparison. We compared the performance when the number of contents is 1000 and 2000. As shown in Figure 6, when the number of contents increases, the convergence of the algorithm changes and the hit rate decreases; however, the overall trend of the algorithm does not change, and our


Figure 5: Performance evaluation in terms of hit rate, delay, and cellular traffic over time, comparing the DRL, LRU, LFU, and FIFO algorithms. (a) Hit rate. (b) Delay. (c) Traffic.

Table 1: Parameter values.

Parameter: F | w [MHz] | σ² [dBm] | M | M′ | γ | ε | α | ϕ
Value:      1000 | 20 | −95 | 5000 | 200 | 0.9 | 0.1 | 0.05 | 250

caching strategy still performs best among these four algorithms.

Finally, we explored the effects of the learning rate and the exploration probability on algorithm performance. As shown in Figure 7, the learning rate is set to 0.5 and 0.05, and the exploration probability to 0.1 and 0.5, respectively. It can be seen that both factors have a great impact on the caching strategy, mainly manifested in convergence and performance. Thus, a large number of experiments were performed to find an appropriate learning rate and exploration probability for the proposed edge caching scenarios. Hence, in our setting, α = 0.05 and ε = 0.1 are selected for achieving better performance.

7. Conclusions

In this paper, we studied the edge caching strategy of hierarchical wireless networks. Specifically, we use the Markov decision


Figure 6: Performance comparison between F = 1000 and F = 2000. (a) Hit rate. (b) Delay. (c) Traffic.

process and deep reinforcement learning in the proposed edge cache replacement strategy. Experimental results based on real traces show that the proposed strategy is superior to LRU, LFU, and FIFO in terms of hit rate, delay, and traffic offload. Finally, we also explored the impact of the learning rate and the exploration probability on algorithm performance.

In the future, we will focus more on the user layer's impact on cache replacement. (1) In the existing D2D model, the transmission process of files is not persistent, and complex user movement can lead to the interruption of content delivery; we will consider this factor in the reward function. (2) The cache replacement process requires additional costs, such as latency and energy consumption, all of which should be considered, but how to quantify these factors in the simulation experiments still needs to be explored. (3) The computing resources of user devices are limited. Although deep reinforcement learning can solve the problem of dimensional explosion, it still requires substantial computing resources. Therefore, we will explore the application of more lightweight learning algorithms in D2D-aided cellular networks.

Data Availability

The data used to support the findings of this study have not been made available for commercial reasons.


Figure 7: Performance of hit rate under different parameters. (a) Hit rate under α = 0.05 and α = 0.5. (b) Hit rate under ε = 0.1 and ε = 0.5.

Conflicts of Interest

The authors declare that they have no conflicts of interest

Acknowledgments

The conference version of this manuscript was first presented at the 2018 IEEE 15th International Conference on Mobile Ad Hoc and Sensor Systems (MASS). The authors have significantly extended that work by exploiting the edge caching problem with a deep reinforcement learning framework in this journal version. This work was supported in part by the National Key Research and Development Program of China under grant 2018YFC0809803 and in part by the Natural Science Foundation of China under grant 61702364.

References

[1] X. Wang, M. Chen, Z. Han et al., "TOSS: traffic offloading by social network service-based opportunistic sharing in mobile social networks," in Proceedings of the INFOCOM, pp. 2346-2354, 2014.

[2] M. Gregori, J. Gomez-Vilardebo, J. Matamoros, and D. Gunduz, "Wireless content caching for small cell and D2D networks," IEEE Journal on Selected Areas in Communications, vol. 34, no. 5, pp. 1222-1234, 2016.

[3] T. Rodrigues, F. Benevenuto, M. Cha, K. Gummadi, and V. Almeida, "On word-of-mouth based discovery of the web," in Proceedings of the 2011 ACM SIGCOMM Internet Measurement Conference (IMC '11), pp. 381-396, November 2011.

[4] J. Song, M. Sheng, T. Q. Quek, C. Xu, and X. Wang, "Learning based content caching and sharing for wireless networks," IEEE Transactions on Communications, vol. 99, pp. 1-1, 2017.

[5] N. Morozs, T. Clarke, and D. Grace, "Distributed heuristically accelerated Q-learning for robust cognitive spectrum management in LTE cellular systems," IEEE Transactions on Mobile Computing, vol. 15, no. 4, pp. 817-825, 2016.

[6] B. N. Bharath, K. G. Nagananda, and H. V. Poor, "A learning-based approach to caching in heterogenous small cell networks," IEEE Transactions on Communications, vol. 64, no. 4, pp. 1674-1686, 2016.

[7] M. Srinivasan, V. J. Kotagi, and C. S. R. Murthy, "A Q-learning framework for user QoE enhanced self-organizing spectrally efficient network using a novel inter-operator proximal spectrum sharing," IEEE Journal on Selected Areas in Communications, vol. 34, no. 11, pp. 2887-2901, 2016.

[8] X. Wang, M. Chen, T. Taleb, A. Ksentini, and V. C. M. Leung, "Cache in the air: exploiting content caching and delivery techniques for 5G systems," IEEE Communications Magazine, vol. 52, no. 2, pp. 131-139, 2014.

[9] M. Sheng, C. Xu, J. Liu, J. Song, X. Ma, and J. Li, "Enhancement for content delivery with proximity communications in caching enabled wireless networks: architecture and challenges," IEEE Communications Magazine, vol. 54, no. 8, pp. 70-76, 2016.

[10] E. Zeydan, E. Bastug, M. Bennis et al., "Big data caching for networking: moving from cloud to edge," IEEE Communications Magazine, vol. 54, no. 9, pp. 36-42, 2016.

[11] N. Golrezaei, A. Molisch, A. G. Dimakis, and G. Caire, "Femtocaching and device-to-device collaboration: a new architecture for wireless video distribution," IEEE Communications Magazine, vol. 51, no. 4, pp. 142-149, 2013.

[12] N. Golrezaei, K. Shanmugam, A. G. Dimakis, A. F. Molisch, and G. Caire, "FemtoCaching: wireless video content delivery through distributed caching helpers," in Proceedings of the IEEE Conference on Computer Communications (INFOCOM 2012), pp. 1107-1115, March 2012.

[13] B. Han, X. Wang, N. Choi, T. Kwon, and Y. Choi, "AMVS-NDN: adaptive mobile video streaming and sharing in wireless named data networking," in Proceedings of the 2013 IEEE Conference on Computer Communications Workshops (INFOCOM WKSHPS), pp. 375-380, April 2013.

[14] K. Shanmugam, N. Golrezaei, A. G. Dimakis, A. F. Molisch, and G. Caire, "FemtoCaching: wireless content delivery through distributed caching helpers," IEEE Transactions on Information Theory, vol. 59, no. 12, pp. 8402-8413, 2013.

[15] X. Li, X. Wang, S. Xiao, and V. C. Leung, "Delay performance analysis of cooperative cell caching in future mobile networks," in Proceedings of the 2015 IEEE International Conference on Communications (ICC), pp. 5652-5657, June 2015.

[16] S. H. Chae, J. Y. Ryu, T. Q. Quek, and W. Choi, "Cooperative transmission via caching helpers," in Proceedings of the GLOBECOM 2015 - 2015 IEEE Global Communications Conference, pp. 1-6, San Diego, CA, USA, December 2015.

[17] J. Gu, W. Wang, A. Huang, H. Shan, and Z. Zhang, "Distributed cache replacement for caching-enable base stations in cellular networks," in Proceedings of the 2014 1st IEEE International Conference on Communications (ICC 2014), pp. 2648-2653, Australia, June 2014.

[18] C. Wang, S. Wang, D. Li, X. Wang, X. Li, and V. C. Leung, "Q-learning based edge caching optimization for D2D enabled hierarchical wireless networks," in Proceedings of the 2018 IEEE 15th International Conference on Mobile Ad Hoc and Sensor Systems (MASS), pp. 55-63, Chengdu, China, October 2018.

[19] P. Rodriguez, C. Spanner, and E. W. Biersack, "Analysis of web caching architectures: hierarchical and distributed caching," IEEE/ACM Transactions on Networking, vol. 9, no. 4, pp. 404-418, 2001.

[20] H. Che, Y. Tung, and Z. Wang, "Hierarchical web caching systems: modeling, design and experimental results," IEEE Journal on Selected Areas in Communications, vol. 20, no. 7, pp. 1305-1314, 2002.

[21] K. Poularakis and L. Tassiulas, "On the complexity of optimal content placement in hierarchical caching networks," IEEE Transactions on Communications, vol. 64, no. 5, pp. 2092-2103, 2016.

[22] J. Dai, Z. Hu, B. Li, J. Liu, and B. Li, "Collaborative hierarchical caching with dynamic request routing for massive content distribution," in Proceedings of the IEEE Conference on Computer Communications (INFOCOM 2012), pp. 2444-2452, March 2012.

[23] E. Bastug, M. Bennis, and M. Debbah, "Living on the edge: the role of proactive caching in 5G wireless networks," IEEE Communications Magazine, vol. 52, no. 8, pp. 82-89, 2014.

[24] M. Hefeeda and O. Saleh, "Traffic modeling and proportional partial caching for peer-to-peer systems," IEEE/ACM Transactions on Networking, vol. 16, no. 6, pp. 1447-1460, 2008.

[25] S. Wang, Y. Zhang, H. Wang, Z. Huang, X. Wang, and T. Jiang, "Large scale measurement and analytics on social groups of device-to-device sharing in mobile social networks," Mobile Networks and Applications, vol. 23, no. 2, pp. 203-215, 2017.

[26] A. Balasubramanian, B. Levine, and A. Venkataramani, "DTN routing as a resource allocation problem," in Proceedings of the ACM SIGCOMM 2007 Conference on Computer Communications, pp. 373-384, August 2007.

[27] X. Chen, L. Jiao, W. Li, and X. Fu, "Efficient multi-user computation offloading for mobile-edge cloud computing," IEEE/ACM Transactions on Networking, vol. 24, no. 5, pp. 2795-2808, 2016.

[28] T. H. Cormen, C. E. Leiserson, R. Rivest et al., An Introduction to Algorithms, MIT Press, Cambridge, MA, USA, 2nd edition, 2001.

[29] H. V. Hasselt, A. Guez, and D. Silver, "Deep reinforcement learning with Double Q-learning," in Proceedings of the AAAI, pp. 2094-2100, 2016.

[30] X. Li, X. Wang, P. Wan, Z. Han, and V. C. Leung, "Hierarchical edge caching in device-to-device aided mobile networks: modeling, optimization, and design," IEEE Journal on Selected Areas in Communications, vol. 36, no. 8, pp. 1768-1785, 2018.

[31] 3GPP, "Further advancements for E-UTRA physical layer aspects (release 9)," Tech. Rep. 36.814 V1.2.0, 2009.


4 Wireless Communications and Mobile Computing

we consider that the content transmission process can be finished within the user mobility tolerant time, e.g., before the user moves out of the communication range of the local BS. The requested content can be served from the buffer of the local BS or obtained from the neighbour BSs via the BS-BS link, as well as from the Internet via the backhaul link. Let $P_{u,B}^{BS}$ denote the probability of BS $n$ serving user $u$; then we have

$$P_{u,B}^{BS} = \frac{\sum_i T_{u,n}^{BS}(i)}{\sum_{n\in\mathcal{N}}\sum_i T_{u,n}^{BS}(i)} \qquad (7)$$

where $T_{u,n}^{BS}(i)$ denotes the time period of the $i$-th cellular service from BS $n$ to user $u$ during the total sample time $T_{tot}$. Therefore, we have the probability $P_{u,n}^{BS}$ that user $u$ is served by BS $n$ as follows:

$$P_{u,n}^{BS} = P_{u}^{BS} \cdot P_{u,B}^{BS}, \quad \forall u \in U,\ \forall n \in \mathcal{N} \qquad (8)$$

Note that $\sum_{n\in\mathcal{N}} P_{u,n}^{BS} + \sum_{v\in U} P_{u,v}^{D2D} = 1$, $\forall u \in U$.

3.4. Communication Model. We model the wireless transmission delay between the user and the BS as the ratio between the content size and the downlink data rate. Similar to [27], the downlink data rate from BS $n$ to user $u$ can be expressed as

$$r_{u,n} = w \log_2\left(1 + \frac{q_u\, g_{u,n}}{\sigma^2 + \sum_{v\in U\setminus\{u\}} q_v\, g_{v,n}}\right) \qquad (9)$$

where $w$ is the channel bandwidth, $\sigma^2$ represents the background noise power, $q_u$ is the transmission power of BS $n$ to user $u$, and $g_{u,n}$ is the channel gain, determined by the distance between user $u$ and BS $n$.

3.5. Optimization for the D2D-Enabled Edge Caching Problem. Mobile users can share content via D2D communications. For a user pair $u$ and $v$, user $v$ can get the requested content $f$ from $u$ if $u$ has the content (i.e., $sc_{u,f} = 1$) while $v$ does not, with probability $P_{u,v}^{D2D}$. Thus, the content offloaded from the BSs or the Internet via the D2D link between $u$ and $v$ amounts to $l_f P_{u,v}^{D2D}$. Accounting for whether each user $u$ has content $f$ or not, the total content $O^{D2D}$ offloaded via D2D sharing is

$$O^{D2D} = \sum_{f\in F} l_f \sum_{u\in U} P_{u,v}^{D2D}\, sc_{u,f}\,(1 - sc_{v,f}) \qquad (10)$$
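As an illustrative sketch (with toy numbers, not values from the paper), the offload objective in (10) can be evaluated as follows; for illustration, the sum here runs over all ordered user pairs $(u,v)$:

```python
# Illustrative sketch of the D2D offload objective in Eq. (10), using toy
# numbers (not from the paper). P[u][v] stands for P^{D2D}_{u,v},
# sc[u][f] for the caching state, and l[f] for the content size.

def d2d_offload(P, sc, l):
    """sum_f l_f * sum_{u,v} P_{u,v} * sc_{u,f} * (1 - sc_{v,f})."""
    total = 0.0
    for f in range(len(l)):
        for u in range(len(sc)):
            for v in range(len(sc)):
                if u == v:
                    continue
                # u can offload content f to v only if u caches it and v does not
                total += l[f] * P[u][v] * sc[u][f] * (1 - sc[v][f])
    return total

P = [[0.0, 0.5], [0.5, 0.0]]   # D2D sharing probabilities between two users
sc = [[1, 0], [0, 0]]          # user 0 caches content 0; user 1 caches nothing
l = [10.0, 20.0]               # content sizes (MB)
print(d2d_offload(P, sc, l))   # 10.0 * 0.5 = 5.0
```

Only content 0 contributes here, since user 1 caches nothing it could share back.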

Our aim is to maximize the total size of content offloaded to users via D2D sharing while satisfying the buffer size constraints of all mobile users. Formally, the optimization problem is defined as

$$\begin{aligned} \max \quad & O^{D2D} \\ \text{s.t.} \quad & \sum_{f\in F} sc_{u,f}\, l_f \le L_u, \quad \forall u \in U \\ & sc_{u,f} \in \{0,1\}, \quad \forall u \in U,\ \forall f \in F \end{aligned} \qquad (11)$$

where $\sum_{f\in F} sc_{u,f}\, l_f \le L_u$ is the buffer size constraint of each mobile user's device and $sc_{u,f} \in \{0,1\}$ is the caching state of content $f$ in device $u$.

The optimization problem (11) is NP-hard

Proof. Let $e_{u,v,f} = sc_{u,f}(1 - sc_{v,f})$, with $e_{u,v,f} \in \{0,1\}$. Thus, we can rewrite Problem (11) as

$$\begin{aligned} \max \quad & \sum_{f\in F} l_f \sum_{u\in U} P_{u,v}^{D2D}\, sc_{u,f}\,(1 - sc_{v,f}) \\ \text{s.t.} \quad & \sum_{f\in F} sc_{u,f}\, l_f \le L_u, \quad \forall u \in U \\ & e_{u,v,f},\ sc_{u,f} \in \{0,1\}, \quad \forall u, v \in U,\ \forall f \in F \end{aligned} \qquad (12)$$

where $\sum_{f\in F} sc_{u,f}\, l_f \le L_u$ is the cardinality constraint with budget $L_u$. It is easy to observe that Problem (11) has the same structure as the knapsack-type problem formulated in [28], which has been proved to be NP-hard.
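To make the knapsack-type structure concrete, the following sketch (with hypothetical sizes and values, not from the paper) solves a single user's buffer sub-problem by standard 0/1 knapsack dynamic programming; the exact problem in (11) additionally couples users through $P_{u,v}^{D2D}$:

```python
# Illustrative 0/1 knapsack DP, the structure the proof reduces Eq. (11)
# to for one user: pick contents f with sizes l_f and values v_f
# (stand-ins for the D2D offload gain) under buffer budget L_u.

def knapsack(sizes, values, capacity):
    # dp[c] = best total value achievable with buffer budget c
    dp = [0.0] * (capacity + 1)
    for size, value in zip(sizes, values):
        for c in range(capacity, size - 1, -1):  # reverse: each content cached once
            dp[c] = max(dp[c], dp[c - size] + value)
    return dp[capacity]

sizes = [3, 4, 2]         # content sizes l_f
values = [5.0, 6.0, 3.0]  # hypothetical offload gain of caching each content
print(knapsack(sizes, values, 5))  # best choice is contents 0 and 2: 8.0
```

The integral capacity constraint is what makes the exact problem hard; the DP above is pseudo-polynomial in the buffer size.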

3.6. Cache Replacement Model. We model the cache replacement process as an MDP. Below, we discuss the details of the related state space, action space, and reward function.

(1) State Space. We define $sc_{u,f}^{i}$ as the caching state of content $f \in F$ during decision epoch $i$, which independently takes a value from a state space $\mathcal{P}$: $sc_{u,f}^{i} = 1$ means content $f$ is cached at user $u$, and $sc_{u,f}^{i} = 0$ means the opposite. In addition, $sr_{v}^{i}$ denotes the content currently requested by another user $v$ in decision epoch $i$. The state of an available user during each decision epoch $i$ can be represented by

$$z_i = (sr_{v}^{i}, sc_{u}^{i}) \in \mathcal{Z} \stackrel{\text{def}}{=} \{1, 2, \dots, F\} \times \prod_{f\in F} \mathcal{P} \qquad (13)$$
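For illustration, a state of the form (13) can be encoded as a hashable tuple for later Q-table lookups; the function and argument names below are ours, not the paper's:

```python
# Hypothetical sketch: encoding the MDP state z_i = (sr_v^i, sc_u^i) of
# Eq. (13) as a hashable tuple so it can index a tabular Q-function.

def encode_state(requested, cache_vector):
    """requested: id of the currently requested content (1..F);
    cache_vector: binary caching states sc_u over all F contents."""
    return (requested, tuple(cache_vector))

z = encode_state(3, [1, 0, 0, 1])
print(z)  # (3, (1, 0, 0, 1))
```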

(2) Action Space. The system action with respect to the state $z_i$ is denoted as $A(z_i)$. All users possess the same action space $\mathcal{A}$:

$$\mathcal{A} = \{a_{i}^{D2D}, a_{i}^{BS}\} \qquad (14)$$

Namely, the system action $A(z_i)$ can be divided into two parts according to their different characters, as follows.

(a) Requests Handled via the D2D Link. The available cache control at the adjacent users is represented by $a_{i}^{D2D} \stackrel{\text{def}}{=} [a_{i,0}^{D2D}, a_{i,1}^{D2D}, \dots, a_{i,F}^{D2D}]$, where $a_{i,f}^{D2D} \in \{0,1\}$ $(f \in \{1, \dots, F\})$ indicates whether, and which, content at the local user should be replaced by the currently requested content, and $a_{i,0}^{D2D} \in \{0,1\}$ represents whether the local user makes a replacement at all, i.e., whether the content request is handled by the user itself.

(b) Requests Handled by BSs. Certainly, each user can get content directly from the BSs when the D2D link fails to meet the requirements; $a_{i}^{BS} \in \{0,1\}$ is introduced to represent this kind of action, where $a_{i}^{BS} = 1$ means that the request is chosen to be directly handled by the BSs, namely, the user shall fetch the content from the BSs.

(3) Reward Function. The reward (utility) function $R(z, A)$, which determines the reward fed back to the user when performing the action $A(z_i)$ upon the state $z_i$, shall be determined by the interactive wireless environment so as to lead the DRL agent (we will introduce it later) at each user towards achieving the desired performance. Among the QoS metrics, the most important is to improve the hit rate of user-requested


content. Our goal is to maximize the hit rate of user requests. Therefore, in our edge caching architecture, we design the reward function as

$$R(z_i, A(z_i)) = \begin{cases} e^{l_f}, & A(z_i) = a_{i}^{D2D} \\ e^{-l_f}, & A(z_i) = a_{i}^{BS} \end{cases} \qquad (15)$$

where an exponential function of the traffic is adopted to guide the objective of maximizing the offloaded traffic.
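A minimal sketch of the reward rule (15), assuming $l_f$ is the size of the requested content and labelling the two action types with the illustrative strings 'd2d' and 'bs':

```python
# Minimal sketch of the reward function in Eq. (15). The action labels
# 'd2d' and 'bs' are illustrative stand-ins for a_i^{D2D} and a_i^{BS}.
import math

def reward(l_f, action):
    if action == 'd2d':    # request served via D2D: reward grows with traffic
        return math.exp(l_f)
    elif action == 'bs':   # request falls back to the BS: penalized
        return math.exp(-l_f)
    raise ValueError('unknown action')

# Serving a request locally via D2D always beats fetching it from the BS.
print(reward(2.0, 'd2d') > reward(2.0, 'bs'))  # True
```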

4. Edge Caching Policy Discussion

In hierarchical wireless networks with cache-enabled D2D communications, we explore the maximum capacity of the network based on the mobility and social behaviours of users. The goal is to optimize network edge caching by offloading contents to users via D2D communications, thereby reducing the system cost of content exchange between the BSs and the core network via cellular links.

4.1. Problem Formulation. Based on the above analysis and combined with (15), the optimization objective is defined as

$$R^{long} = \max_{A} E\left[\lim_{I\to\infty} \frac{1}{I}\sum_{i=1}^{I} R(z_i, A(z_i)) \,\Big|\, z_1 = z\right] \qquad (16)$$

which indicates maximizing the expected long-term reward value conditioned on any initial state $z_1$.

Nevertheless, in general, a single-agent infinite-horizon MDP with the discounted utility (17) can be used to approximate the expected infinite-horizon undiscounted value as $\gamma \in [0,1)$ approaches 1:

$$V(z, A) = E_A\left[\sum_{i=1}^{\infty} \gamma^{\,i-1} \cdot R(z_i, A(z_i)) \,\Big|\, z_1 = z\right] \qquad (17)$$

Further, we can obtain the optimal state value function $V(z)$ for any initial state $z$ as

$$V(z) = V(z, A^{*}), \quad \forall z \in \mathcal{Z} \qquad (18)$$

In conclusion, each user is expected to learn an optimal control policy $A^{*}$ that maximizes $V(z, A)$ from any initial state $z$. The optimal control policy can be described as follows:

$$A^{*} = \arg\max_{A} V(z, A), \quad \forall z \in \mathcal{Z} \qquad (19)$$

5. Double DQN-Based Edge Cache Strategy

5.1. Reinforcement Learning. Reinforcement learning (RL) is a machine learning approach in which an agent keeps trying, learns from mistakes, and finally discovers patterns. RL problems can be described as optimal control decision-making problems in an MDP. RL comes in many forms, among which the tabular Q-learning algorithm is commonly used. Q-learning is an off-policy learning algorithm that allows an agent to learn through current or past experiences.

In our D2D caching architecture, the agent at each user senses and obtains its current cache state $z_i$. The agent then selects and carries out an action $A(z_i)$. Meanwhile, the environment experiences a transition from $z_i$ to a new state $z_{i+1}$, and the agent obtains a reward $R(z_i, A(z_i))$.

According to the Bellman equation, the optimal Q-value function $Q(z, A)$ can be expressed as (20), where $z = z_i$ is the state at the current decision epoch $i$ and $z' = z_{i+1}$ is the next state after taking the action $A = A(z_i)$:

$$Q(z, A) = R(z, A) + \gamma \cdot \sum_{z'} \Pr\{z' \mid z, A\} \cdot \max_{A'} Q(z', A') \qquad (20)$$

The iterative update formula of the Q-function can be obtained as

$$Q_{i+1}(z, A) = Q_i(z, A) + \alpha_i \cdot \left(R(z, A) + \gamma \cdot \max_{A'} Q_i(z', A') - Q_i(z, A)\right) \qquad (21)$$

where $\alpha_i \in [0,1)$ is the learning rate, and the state $z_i$ turns into the state $z_{i+1}$ when the agent chooses action $A(z_i)$, along with the corresponding reward $R(z_i, A(z_i))$. Based on (21), a Q-table can be used to store the Q-value of each state-action pair when the state and action space dimensions are not high; this tabular training procedure is summarized in Algorithm 1. The complexity of the Q-learning algorithm depends primarily on the scale of the problem. Updating the Q-value in a given state requires determining the maximum Q-value over all possible actions in that state: with $n$ possible actions, finding the maximum requires $n - 1$ comparisons; in other words, with $m$ states, the update of the entire Q-table requires $m(n-1)$ comparisons. Hence, the learning process in Q-learning becomes extremely difficult when the scenario has huge network state and action spaces. Therefore, using a neural network to generate Q-values becomes a potential solution.

5.2. Double Deep Q-Learning. DQN is the first model that successfully combined deep learning with reinforcement learning. It replaces the Q-table with a neural network, which effectively solves complicated, high-dimensional RL problems. DQN comes in many variations, the most famous of which is Double DQN [29]. In our model, we use Double DQN to train the DRL agents at the users, as illustrated in Figure 2. The Q-function is driven towards the optimal Q-value by updating the neural network parameters $\tau_i$, so that

$$Q(z, A) \approx Q((z, A); \tau_i) \qquad (22)$$

Experience replay is the core component of DQN. It is, in effect, a memory of finite size $N_m$ for storing transitions, whose entries are overwritten cyclically. It can effectively eliminate the correlation between training data. A transition sample can be represented as $T_i =$


Initialization: Q-table.
Iteration:
1: for each episode do
2:   Initialize $z$
3:   for each step of the episode do
4:     Generate $a$ at random
5:     if $a \le \epsilon$ then
6:       randomly select an action
7:     else
8:       choose $A(z)$ using the policy derived from $Q(z, A)$
9:     Take action $A(z)$
10:    Obtain $R(z, A(z))$ and $z'$
11:    Update the Q-table: $Q(z, A) \leftarrow Q(z, A) + \alpha \cdot (R(z, A) + \gamma \cdot \max_{A'} Q(z', A') - Q(z, A))$
12:    $z \leftarrow z'$
13:  end for
14: end for

Algorithm 1: Q-learning-based content caching algorithm.
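The steps of Algorithm 1 can be sketched in Python as follows; the environment interface (`reset`/`step`) and the toy cache environment are our illustrative assumptions, not part of the paper:

```python
# Sketch of Algorithm 1: tabular Q-learning with epsilon-greedy exploration.
# The environment interface and ToyCacheEnv below are hypothetical.
import random
from collections import defaultdict

def q_learning(env, actions, episodes=100, alpha=0.05, gamma=0.9, eps=0.1):
    Q = defaultdict(float)                      # Q[(state, action)], zero-initialized
    for _ in range(episodes):
        z = env.reset()
        done = False
        while not done:
            if random.random() <= eps:          # explore
                a = random.choice(actions)
            else:                               # exploit current Q estimates
                a = max(actions, key=lambda x: Q[(z, x)])
            z_next, r, done = env.step(a)
            best_next = max(Q[(z_next, x)] for x in actions)
            # Eq. (21): Q(z,a) <- Q(z,a) + alpha*(r + gamma*max_a' Q(z',a') - Q(z,a))
            Q[(z, a)] += alpha * (r + gamma * best_next - Q[(z, a)])
            z = z_next
    return Q

class ToyCacheEnv:
    """Hypothetical single-state toy: caching the right content (action 1) pays off."""
    def __init__(self):
        self.t = 0
    def reset(self):
        self.t = 0
        return 0
    def step(self, a):
        self.t += 1
        return 0, (1.0 if a == 1 else 0.0), self.t >= 10

random.seed(0)
Q = q_learning(ToyCacheEnv(), actions=[0, 1], episodes=200)
print(Q[(0, 1)] > Q[(0, 0)])  # True: the rewarding action ranks higher
```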

Figure 2: Illustration of the training process. The MainNet selects actions via $\arg\max_{A} Q(z_i, A; \tau_i)$, the TargetNet evaluates them, transitions are stored in the replay memory, and the loss-function gradient updates the MainNet parameters.

$(z_i, A(z_i), R(z_i, A(z_i)), z_{i+1})$, which represents one state transition. The whole experience pool can be denoted as $\mathcal{M} = \{T_{i-N_m+1}, \dots, T_i\}$. Note that each DRL agent maintains two Q-networks, namely $Q(z, A; \tau_i)$ and $Q'(z, A; \tau_i')$, with network $Q$ used to choose actions and network $Q'$ used to evaluate them. Besides, the weight parameters $\tau_i'$ of network $Q'$ are periodically updated to the counterpart $\tau_i$ of network $Q$.
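The replay memory described above can be sketched as a fixed-size buffer whose oldest transitions are overwritten cyclically (a minimal sketch, not the authors' implementation):

```python
# Minimal sketch of the DQN experience replay memory: a fixed-size buffer
# (size N_m) overwritten cyclically, with random minibatch sampling.
import random
from collections import deque

class ReplayMemory:
    def __init__(self, capacity):
        self.buffer = deque(maxlen=capacity)    # oldest entries drop automatically
    def store(self, z, a, r, z_next):
        self.buffer.append((z, a, r, z_next))   # one transition T_i
    def sample(self, batch_size):
        return random.sample(self.buffer, min(batch_size, len(self.buffer)))

mem = ReplayMemory(capacity=3)
for i in range(5):                  # store 5 transitions into capacity 3
    mem.store(i, 0, 1.0, i + 1)
print(len(mem.buffer))              # 3: the two oldest transitions were overwritten
```

Random sampling from this buffer is what breaks the temporal correlation between consecutive training samples.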

Throughout the training process, the DRL agent randomly samples a minibatch $\mathcal{M}'$ from the experience replay $\mathcal{M}$. Then, at each epoch, the network $Q$ is trained towards minimizing the loss function

$$L(\tau_i) = E_{(z, A, R(z,A), z') \in \mathcal{M}'}\left[\left(R(z, A) + \gamma \cdot Q'\!\left(z', \arg\max_{A'} Q(z', A'; \tau_i);\, \tau_i'\right) - Q(z, A; \tau_i)\right)^2\right] \qquad (23)$$

With (23), the gradient guiding the updates of $\tau$ can be calculated as $\partial L(\tau_i)/\partial \tau_i$. Hence, Stochastic Gradient Descent (SGD) is performed until the convergence of the Q-networks, approximating the optimal state-action Q-function. We summarize the training algorithm based on Double DQN in Algorithm 2.
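The decoupled target inside (23) — the action is selected by the main network but evaluated by the target network — can be sketched as follows, with Q-values given as plain lists instead of network outputs (our simplification):

```python
# Sketch of the Double DQN target inside the loss of Eq. (23): the main
# network selects argmax_{A'} Q(z', A'; tau), the target network evaluates it.
# Q-values are plain lists here rather than neural network outputs.

def double_dqn_target(r, q_main_next, q_target_next, gamma=0.9, done=False):
    if done:
        return r
    a_star = max(range(len(q_main_next)), key=lambda a: q_main_next[a])  # select with main net
    return r + gamma * q_target_next[a_star]                             # evaluate with target net

# The main net prefers action 1, but the target net rates that action at 2.0
# rather than its own maximum (5.0) -- this decoupling reduces overestimation.
y = double_dqn_target(1.0, q_main_next=[0.5, 3.0], q_target_next=[5.0, 2.0])
print(y)  # 1.0 + 0.9 * 2.0 = 2.8
```

A vanilla DQN target would instead use $\max_{A'} Q'(z', A')$, i.e., 1.0 + 0.9 * 5.0 = 5.5 here, illustrating the overestimation Double DQN avoids.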


Figure 3: Statistics for all sharing activities over one week, by content type (app, audio, file, folder, image, music, video, other, and aggregate) [25].

Initialization: experience replay memory $\mathcal{M}$; main $Q$ network with random weights $\tau$; target $Q'$ network with $\tau' = \tau$; the period $\phi$ of replacing the target $Q$ network.
Iteration:
1: for each episode do
2:   Initialize $z$
3:   $i \leftarrow 0$
4:   for each step of the episode do
5:     $i \leftarrow i + 1$
6:     Randomly generate $a$
7:     if $a \le \epsilon$ then
8:       randomly select an action
9:     else
10:      $A(z) \leftarrow \arg\max_{A(z)} Q(z, A(z); \tau_i)$
11:    Take action $A(z)$
12:    Obtain $R(z, A(z))$ and $z'$
13:    Store $T \leftarrow (z, A(z), R(z, A(z)), z')$ into $\mathcal{M}$
14:    Randomly sample a mini-batch of transitions $\mathcal{M}' \subseteq \mathcal{M}$
15:    Update $\tau_i$ with $\partial L(\tau_i)/\partial \tau_i$
16:    if $i == \phi$ then
17:      Update $\tau_i'$
18:      $i \leftarrow 0$
19:    $z \leftarrow z'$
20:  end for
21: end for

Algorithm 2: Double DQN-based content caching algorithm.

Regarding algorithm complexity, the algorithm mainly includes collecting transitions and executing backpropagation to train the parameters. Since collecting one transition requires $O(1)$ computational complexity, the total computational complexity for collecting $K$ transitions into the replay memory is $O(K)$. Let $a$ and $b$ denote the number of layers and the maximum number of units in each layer, respectively. Training the parameters with backpropagation and gradient descent requires a computational complexity of $O(mabi)$, where $m$ and $i$ denote the number of transitions randomly sampled


Figure 4: Content popularity versus content ranking (fitted values and original values).

from the replay memory and the number of iterations, respectively. Furthermore, the replay memory and the parameters of the double deep Q-learning model dominate the storage complexity. Specifically, storing $K$ transitions needs a space complexity of about $O(K)$, while the parameters need a space complexity of about $O(ab)$.

6. Experiment

In this section, we evaluate the proposed cache policy based on experimental results from the mobile application Xender.

6.1. Dataset. Xender is a mobile app that enables offline D2D communication activities. It provides a new way to share the diversified content files users are interested in without accessing 3G/4G cellular mobile networks, largely reducing repeated traffic load and the waste of network resources, thereby achieving resource sharing. Currently, Xender has around 10 million daily and 100 million monthly active users, as well as about 110 million daily content deliveries.

We captured Xender's trace for one month (from 01/08/2016 to 31/08/2016), including 450,786 active mobile users conveying 153,482 content files over 271,785,952 content requests [30]. As shown in Figure 4, the content popularity distribution in Xender's trace can be fitted by an MZipf distribution with a plateau factor of -0.88 and a skewness factor of 0.35.
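A sketch of the Mandelbrot-Zipf (MZipf) popularity model used for the fit, $p(r) \propto (r + q)^{-\alpha}$ with plateau factor $q$ and skewness $\alpha$; the default values below are the fits reported above, while the functional form is our assumption of the standard MZipf definition:

```python
# Sketch of the MZipf popularity model, p(rank) ~ (rank + q)^(-alpha),
# with the plateau factor q and skewness alpha fitted on the Xender trace
# as reported in the text. The functional form is the standard MZipf.

def mzipf(num_contents, plateau=-0.88, skewness=0.35):
    weights = [(r + plateau) ** (-skewness) for r in range(1, num_contents + 1)]
    total = sum(weights)
    return [w / total for w in weights]     # normalized popularity per rank

pop = mzipf(1000)
print(abs(sum(pop) - 1.0) < 1e-9)  # True: a valid probability distribution
print(pop[0] > pop[1] > pop[2])    # True: popularity decays with rank
```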

6.2. Parameter Settings. In our simulations, four BSs are employed, each with a maximum coverage range of 250 m; $g_{u,n} = 30.6 + 36.7 \log_{10} l_{u,n}$ in dB [31] is taken as the channel gain model, and the channel bandwidth of each BS is set to 20 MHz. The delays of the D2D link, BS to MNO, and MNO to Internet are 5 ms, 20 ms, and 100 ms, respectively. Besides, the total transmit power of a BS is 40 W, serving at most 500 users. With respect to the parameter settings of Double DQN, a single-layer fully connected feed-forward neural network with 200 neurons is used to serve as both the target and the eval $Q$ network. Other parameter values are given in Table 1.

6.3. Evaluation Results. In order to evaluate the performance of our caching strategy, we compare it with three classic cache replacement algorithms: (1) LRU: replace the least recently used content first; (2) LFU: replace the least frequently used content first; (3) FIFO: replace the content cached earliest first.
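Minimal sketches of two of the baseline policies above (LRU and FIFO; LFU is analogous with a frequency counter), assuming a simple cache of `capacity` unit-size items:

```python
# Minimal sketches of the LRU and FIFO baseline replacement policies,
# assuming unit-size contents and a cache of `capacity` items.
from collections import OrderedDict, deque

class LRUCache:
    def __init__(self, capacity):
        self.capacity, self.items = capacity, OrderedDict()
    def access(self, f):
        if f in self.items:
            self.items.move_to_end(f)            # mark as most recently used
        else:
            if len(self.items) >= self.capacity:
                self.items.popitem(last=False)   # evict least recently used
            self.items[f] = True

class FIFOCache:
    def __init__(self, capacity):
        self.capacity, self.items = capacity, deque()
    def access(self, f):
        if f not in self.items:
            if len(self.items) >= self.capacity:
                self.items.popleft()             # evict the earliest-cached item
            self.items.append(f)

lru, fifo = LRUCache(2), FIFOCache(2)
for f in ['a', 'b', 'a', 'c']:                   # 'a' is touched again before 'c'
    lru.access(f)
    fifo.access(f)
print(list(lru.items))   # ['a', 'c']: 'b' was least recently used
print(list(fifo.items))  # ['b', 'c']: 'a' entered the cache first
```

The same request trace leads to different evictions, which is exactly the behavioural difference the comparison in Figure 5 measures.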

Figure 5 shows the performance comparison of cache hit ratio, delay, and traffic at F = 1000 and C = 100 MB. As we can see, at the beginning of the simulation, our proposed caching strategy is clearly at a disadvantage in all three aspects. But soon the hit rate increases and eventually stabilizes. This is because our reward function is designed to increase the cache hit rate; thus, our DRL agent is dedicated to maximizing the system hit rate. It can be seen that our caching strategy achieves a hit rate 9%, 12%, and 14% higher than LRU, LFU, and FIFO, respectively. At the same time, the improvement of the hit rate has a positive impact on the delay and traffic indicators: the delay of our strategy is 12%, 17%, and 21% lower than that of LRU, LFU, and FIFO, respectively, and the traffic saved is 8%, 10%, and 14%, respectively.

In addition, we explored the effect of content quantity on the comparison results by evaluating the performance when the number of contents is 1000 and 2000. As shown in Figure 6, when the number of contents increases, the convergence of the algorithm changes and the hit rate decreases; however, this does not change the overall trend of the algorithm. Our


Figure 5: Performance evaluation of the DRL, LRU, LFU, and FIFO algorithms with respect to time, in terms of (a) hit rate, (b) delay (ms), and (c) cellular traffic (MB).

Table 1: Parameter values.

Parameter | $F$ | $w$ [MHz] | $\sigma^2$ [dBm] | $\mathcal{M}$ | $\mathcal{M}'$ | $\gamma$ | $\epsilon$ | $\alpha$ | $\phi$
Value | 1000 | 200 | -95 | 5000 | 200 | 0.9 | 0.1 | 0.05 | 250

caching strategy still performs the best among these four algorithms.

Finally, we explored the effects of the learning rate and exploration probability on the algorithm's performance. As shown in Figure 7, the learning rate is set to 0.5 and 0.05, and the exploration probability to 0.1 and 0.5, respectively. It can be seen that both factors have a great impact on the caching strategy, mainly manifested in convergence and performance. Thus, a large number of experiments were performed to find an appropriate learning rate and exploration probability for the proposed edge caching scenarios. Hence, in our setting, $\alpha = 0.05$ and $\epsilon = 0.1$ are selected for achieving better performance.

7. Conclusions

In this paper, we study the edge caching strategy of hierarchical wireless networks. Specifically, we use the Markov decision


Figure 6: Performance comparison between F = 1000 and F = 2000, in terms of (a) hit rate, (b) delay (ms), and (c) cellular traffic (MB).

process and Deep Reinforcement Learning in the proposed edge cache replacement strategy. The experimental results based on real-world traces show that our proposed strategy is superior to LRU, LFU, and FIFO in terms of hit rate, delay, and traffic offload. Finally, we also explored the impact of the learning rate and exploration probability on algorithm performance.

In the future, we will focus more on the user layer's impact on cache replacement. (1) In the existing D2D model, the transmission process of files is not persistent, and complex user movement will lead to the interruption of content delivery; we will consider this factor in the reward function. (2) The cache replacement process requires additional costs, such as latency and energy consumption, all of which should be considered; however, how to quantify these factors in the simulation experiments still needs to be explored. (3) The computing resources of user devices are limited. Although Deep Reinforcement Learning can solve the problem of dimensional explosion, it still requires a lot of computing resources. Therefore, we will explore the application of more lightweight learning algorithms in D2D-aided cellular networks.

Data Availability

The data used to support the findings of this study have not been made available for commercial reasons.


Figure 7: Performance of hit rate under different parameters: (a) hit rate under $\alpha = 0.05$ and $\alpha = 0.5$; (b) hit rate under $\epsilon = 0.1$ and $\epsilon = 0.5$.

Conflicts of Interest

The authors declare that they have no conflicts of interest

Acknowledgments

The conference version of this manuscript was first presented at the 2018 IEEE 15th International Conference on Mobile Ad Hoc and Sensor Systems (MASS). The authors have extended the work significantly by exploiting the edge caching problem with a deep reinforcement learning framework in this journal version. This work was supported in part by the National Key Research and Development Program of China under grant 2018YFC0809803 and in part by the Natural Science Foundation of China under grant 61702364.

References

[1] X. Wang, M. Chen, Z. Han et al., "TOSS: traffic offloading by social network service-based opportunistic sharing in mobile social networks," in Proceedings of the INFOCOM, pp. 2346–2354, 2014.

[2] M. Gregori, J. Gomez-Vilardebo, J. Matamoros, and D. Gunduz, "Wireless content caching for small cell and D2D networks," IEEE Journal on Selected Areas in Communications, vol. 34, no. 5, pp. 1222–1234, 2016.

[3] T. Rodrigues, F. Benevenuto, M. Cha, K. Gummadi, and V. Almeida, "On word-of-mouth based discovery of the web," in Proceedings of the 2011 ACM SIGCOMM Internet Measurement Conference, IMC'11, pp. 381–396, November 2011.

[4] J. Song, M. Sheng, T. Q. Quek, C. Xu, and X. Wang, "Learning based content caching and sharing for wireless networks," IEEE Transactions on Communications, vol. 99, pp. 1-1, 2017.

[5] N. Morozs, T. Clarke, and D. Grace, "Distributed heuristically accelerated Q-learning for robust cognitive spectrum management in LTE cellular systems," IEEE Transactions on Mobile Computing, vol. 15, no. 4, pp. 817–825, 2016.

[6] B. N. Bharath, K. G. Nagananda, and H. V. Poor, "A learning-based approach to caching in heterogenous small cell networks," IEEE Transactions on Communications, vol. 64, no. 4, pp. 1674–1686, 2016.

[7] M. Srinivasan, V. J. Kotagi, and C. S. R. Murthy, "A Q-learning framework for user QoE enhanced self-organizing spectrally efficient network using a novel inter-operator proximal spectrum sharing," IEEE Journal on Selected Areas in Communications, vol. 34, no. 11, pp. 2887–2901, 2016.

[8] X. Wang, M. Chen, T. Taleb, A. Ksentini, and V. C. M. Leung, "Cache in the air: exploiting content caching and delivery techniques for 5G systems," IEEE Communications Magazine, vol. 52, no. 2, pp. 131–139, 2014.

[9] M. Sheng, C. Xu, J. Liu, J. Song, X. Ma, and J. Li, "Enhancement for content delivery with proximity communications in caching enabled wireless networks: architecture and challenges," IEEE Communications Magazine, vol. 54, no. 8, pp. 70–76, 2016.

[10] E. Zeydan, E. Bastug, M. Bennis et al., "Big data caching for networking: moving from cloud to edge," IEEE Communications Magazine, vol. 54, no. 9, pp. 36–42, 2016.

[11] N. Golrezaei, A. Molisch, A. G. Dimakis, and G. Caire, "Femtocaching and device-to-device collaboration: a new architecture for wireless video distribution," IEEE Communications Magazine, vol. 51, no. 4, pp. 142–149, 2013.

[12] N. Golrezaei, K. Shanmugam, A. G. Dimakis, A. F. Molisch, and G. Caire, "FemtoCaching: wireless video content delivery through distributed caching helpers," in Proceedings of the IEEE Conference on Computer Communications, INFOCOM 2012, pp. 1107–1115, March 2012.

[13] B. Han, X. Wang, N. Choi, T. Kwon, and Y. Choi, "AMVS-NDN: adaptive mobile video streaming and sharing in wireless named data networking," in Proceedings of the 2013 IEEE Conference on Computer Communications Workshops (INFOCOM WKSHPS), pp. 375–380, April 2013.

[14] K. Shanmugam, N. Golrezaei, A. G. Dimakis, A. F. Molisch, and G. Caire, "FemtoCaching: wireless content delivery through distributed caching helpers," IEEE Transactions on Information Theory, vol. 59, no. 12, pp. 8402–8413, 2013.

[15] X. Li, X. Wang, S. Xiao, and V. C. Leung, "Delay performance analysis of cooperative cell caching in future mobile networks," in Proceedings of the 2015 IEEE International Conference on Communications (ICC), pp. 5652–5657, June 2015.

[16] S. H. Chae, J. Y. Ryu, T. Q. Quek, and W. Choi, "Cooperative transmission via caching helpers," in Proceedings of the GLOBECOM 2015 - 2015 IEEE Global Communications Conference, pp. 1–6, San Diego, CA, USA, December 2015.

[17] J. Gu, W. Wang, A. Huang, H. Shan, and Z. Zhang, "Distributed cache replacement for caching-enable base stations in cellular networks," in Proceedings of the 2014 1st IEEE International Conference on Communications, ICC 2014, pp. 2648–2653, Australia, June 2014.

[18] C. Wang, S. Wang, D. Li, X. Wang, X. Li, and V. C. Leung, "Q-learning based edge caching optimization for D2D enabled hierarchical wireless networks," in Proceedings of the 2018 IEEE 15th International Conference on Mobile Ad Hoc and Sensor Systems (MASS), pp. 55–63, Chengdu, China, October 2018.

[19] P. Rodriguez, C. Spanner, and E. W. Biersack, "Analysis of web caching architectures: hierarchical and distributed caching," IEEE/ACM Transactions on Networking, vol. 9, no. 4, pp. 404–418, 2001.

[20] H. Che, Y. Tung, and Z. Wang, "Hierarchical web caching systems: modeling, design and experimental results," IEEE Journal on Selected Areas in Communications, vol. 20, no. 7, pp. 1305–1314, 2002.

[21] K. Poularakis and L. Tassiulas, "On the complexity of optimal content placement in hierarchical caching networks," IEEE Transactions on Communications, vol. 64, no. 5, pp. 2092–2103, 2016.

[22] J. Dai, Z. Hu, B. Li, J. Liu, and B. Li, "Collaborative hierarchical caching with dynamic request routing for massive content distribution," in Proceedings of the IEEE Conference on Computer Communications, INFOCOM 2012, pp. 2444–2452, March 2012.

[23] E. Bastug, M. Bennis, and M. Debbah, "Living on the edge: the role of proactive caching in 5G wireless networks," IEEE Communications Magazine, vol. 52, no. 8, pp. 82–89, 2014.

[24] M. Hefeeda and O. Saleh, "Traffic modeling and proportional partial caching for peer-to-peer systems," IEEE/ACM Transactions on Networking, vol. 16, no. 6, pp. 1447–1460, 2008.

[25] S. Wang, Y. Zhang, H. Wang, Z. Huang, X. Wang, and T. Jiang, "Large scale measurement and analytics on social groups of device-to-device sharing in mobile social networks," Mobile Networks and Applications, vol. 23, no. 2, pp. 203–215, 2017.

[26] A. Balasubramanian, B. Levine, and A. Venkataramani, "DTN routing as a resource allocation problem," in Proceedings of the ACM SIGCOMM 2007 Conference on Computer Communications, pp. 373–384, August 2007.

[27] X. Chen, L. Jiao, W. Li, and X. Fu, "Efficient multi-user computation offloading for mobile-edge cloud computing," IEEE/ACM Transactions on Networking, vol. 24, no. 5, pp. 2795–2808, 2016.

[28] T. H. Cormen, C. E. Leiserson, R. Rivest et al., An Introduction to Algorithms, MIT Press, Cambridge, MA, USA, 2nd edition, 2001.

[29] H. V. Hasselt, A. Guez, and D. Silver, "Deep reinforcement learning with Double Q-learning," in Proceedings of the AAAI, pp. 2094–2100, 2016.

[30] X. Li, X. Wang, P. Wan, Z. Han, and V. C. Leung, "Hierarchical edge caching in device-to-device aided mobile networks: modeling, optimization, and design," IEEE Journal on Selected Areas in Communications, vol. 36, no. 8, pp. 1768–1785, 2018.

[31] 3GPP, "Further advancements for E-UTRA physical layer aspects (release 9)," Tech. Rep. 36.814 V1.2.0, 2009.



Wireless Communications and Mobile Computing 5

content. Our goal is to maximize the hit rate of user requests. Therefore, in our edge caching architecture, we design the reward function as

\[
\mathcal{R}(\mathbf{z}_i, \mathcal{A}(\mathbf{z}_i)) =
\begin{cases}
e^{l_f}, & \mathcal{A}(\mathbf{z}_i) = a_i^{D2D} \\
e^{-l_f}, & \mathcal{A}(\mathbf{z}_i) = a_i^{BS}
\end{cases}
\tag{15}
\]

where an exponential function with respect to the traffic is adopted to guide the objective of maximizing the offloaded traffic.
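The case structure of the reward in (15) can be sketched in a few lines of Python. This is an illustrative sketch, not the authors' implementation; the string action labels and the `traffic` parameter (standing in for \(l_f\)) are hypothetical names chosen for clarity.

```python
import math

def reward(action: str, traffic: float) -> float:
    """Reward of Eq. (15): exponential in the traffic l_f.

    Serving a request via D2D is rewarded with exp(l_f); falling
    back to the base station is penalized with exp(-l_f).
    """
    if action == "d2d":       # A(z_i) = a_i^D2D
        return math.exp(traffic)
    elif action == "bs":      # A(z_i) = a_i^BS
        return math.exp(-traffic)
    raise ValueError(f"unknown action: {action}")
```

The exponential shape means the gap between the D2D reward and the BS penalty grows quickly with the traffic volume, which biases the agent toward caching high-traffic content at user devices.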

4. Edge Caching Policy Discussion

In hierarchical wireless networks with cache-enabled D2D communications, we explore the maximum capacity of the network based on the mobility and social behaviours of users. The goal is to optimize network edge caching by offloading contents to users via D2D communications and reducing the system cost of content exchange between BSs and the core network via cellular links.

4.1. Problem Formulation. Based on the above analysis and combined with (15), the optimization objective is defined as

\[
R^{\text{long}} = \max_{\mathcal{A}} E_{\mathcal{A}}\left[ \lim_{I \rightarrow \infty} \frac{1}{I} \sum_{i=1}^{I} \mathcal{R}(\mathbf{z}_i, \mathcal{A}(\mathbf{z}_i)) \;\Big|\; \mathbf{z}_1 = \mathbf{z} \right]
\tag{16}
\]

which indicates maximizing the expected long-term reward value conditioned on any initial state \(\mathbf{z}_1\).

Nevertheless, in general, a single-agent infinite-horizon MDP with the discounted utility (17) can be used to approximate the expected infinite-horizon undiscounted value when \(\gamma \in [0, 1)\) approaches 1:

\[
V(\mathbf{z}, \mathcal{A}) = E_{\mathcal{A}}\left[ \sum_{i=1}^{\infty} \gamma^{i-1} \cdot \mathcal{R}(\mathbf{z}_i, \mathcal{A}(\mathbf{z}_i)) \;\Big|\; \mathbf{z}_1 = \mathbf{z} \right]
\tag{17}
\]

Further, we can obtain the optimal state value function \(V(\mathbf{z})\) for any initial state \(\mathbf{z}\) as

\[
V(\mathbf{z}) = V(\mathbf{z}, \mathcal{A}^{*}), \quad \forall \mathbf{z} \in \mathcal{Z}
\tag{18}
\]

In conclusion, each user is expected to learn an optimal control policy \(\mathcal{A}^{*}\) that maximizes \(V(\mathbf{z}, \mathcal{A})\) for any initial state \(\mathbf{z}\). The optimal control policy can be described as follows:

\[
\mathcal{A}^{*} = \arg\max_{\mathcal{A}} V(\mathbf{z}, \mathcal{A}), \quad \forall \mathbf{z} \in \mathcal{Z}
\tag{19}
\]

5. Double DQN-Based Edge Cache Strategy

5.1. Reinforcement Learning. Reinforcement learning (RL) is a machine learning approach in which an agent keeps trying, learns from its mistakes, and finally discovers patterns. RL problems can be described as optimal control decision-making problems in an MDP. RL takes many forms, among which the tabular Q-learning algorithm is commonly used. Q-learning is an off-policy learning algorithm that allows an agent to learn through current or past experiences.

In our D2D caching architecture, the agent pertains to the user; it senses and obtains its current cache state \(\mathbf{z}_i\). Then the agent selects and carries out an action \(\mathcal{A}(\mathbf{z}_i)\). Meanwhile, the environment experiences a transition from \(\mathbf{z}_i\) to a new state \(\mathbf{z}_{i+1}\), and the agent obtains a reward \(\mathcal{R}(\mathbf{z}_i, \mathcal{A}(\mathbf{z}_i))\).

According to the Bellman equation, the optimal Q-value function \(Q(\mathbf{z}, \mathcal{A})\) can be expressed as (20), where \(\mathbf{z} = \mathbf{z}_i\) is the state at the current decision epoch \(i\) and \(\mathbf{z}' = \mathbf{z}_{i+1}\) is the next state after taking the action \(\mathcal{A} = \mathcal{A}(\mathbf{z}_i)\):

\[
Q(\mathbf{z}, \mathcal{A}) = \mathcal{R}(\mathbf{z}, \mathcal{A}) + \gamma \cdot \sum_{\mathbf{z}'} \Pr\{\mathbf{z}' \mid \mathbf{z}, \mathcal{A}\} \cdot \max_{\mathcal{A}'} Q(\mathbf{z}', \mathcal{A}')
\tag{20}
\]

The iterative formula of the Q-function can be obtained as

\[
Q_{i+1}(\mathbf{z}, \mathcal{A}) = Q_{i}(\mathbf{z}, \mathcal{A}) + \alpha_i \cdot \left( \mathcal{R}(\mathbf{z}, \mathcal{A}) + \gamma \cdot \max_{\mathcal{A}'} Q_{i}(\mathbf{z}', \mathcal{A}') - Q_{i}(\mathbf{z}, \mathcal{A}) \right)
\tag{21}
\]

where \(\alpha_i \in [0, 1)\) is the learning rate, and the state \(\mathbf{z}_i\) turns into the state \(\mathbf{z}_{i+1}\) when the agent chooses action \(\mathcal{A}(\mathbf{z}_i)\), along with the corresponding reward \(\mathcal{R}(\mathbf{z}_i, \mathcal{A}(\mathbf{z}_i))\). Based on (21), a Q-table can be used to store the Q value of each state-action pair when the state and action space dimensions are not high in the Q-learning algorithm. We summarize the training algorithm based on Q-learning in Algorithm 1. The complexity of the Q-learning algorithm depends primarily on the scale of the problem. Updating the Q value in a given state requires determining the maximum Q value over all possible actions in that state in the table; if there are \(n\) possible actions, finding the maximum Q value requires \(n - 1\) comparisons. In other words, if there are \(m\) states, updating the entire Q-table requires \(m(n - 1)\) comparisons. Hence, the learning process in Q-learning becomes extremely difficult when the scenarios involve huge network state and action spaces. Therefore, using a neural network to generate Q values becomes a potential solution.
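The tabular update of (21) can be sketched in a few lines. The state and action labels below are hypothetical placeholders, and the learning rate and discount factor follow Table 1 (\(\alpha = 0.05\), \(\gamma = 0.9\)); this is an illustrative sketch, not the authors' code.

```python
from collections import defaultdict

def q_update(Q, z, a, r, z_next, actions, alpha=0.05, gamma=0.9):
    """One tabular update of Eq. (21):
    Q(z,a) += alpha * (r + gamma * max_a' Q(z',a') - Q(z,a))."""
    best_next = max(Q[(z_next, a2)] for a2 in actions)  # n-1 comparisons
    Q[(z, a)] += alpha * (r + gamma * best_next - Q[(z, a)])
    return Q[(z, a)]

Q = defaultdict(float)  # Q-table, zero-initialized on first access
q_update(Q, "z0", "cache_f1", 1.0, "z1", ["cache_f1", "cache_f2"])
print(Q[("z0", "cache_f1")])  # 0.05: one step from zero toward the target
```

The `max` over actions inside `q_update` is exactly the \(n - 1\) comparisons counted above, which is why the table-based approach stops scaling once the state and action spaces grow.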

5.2. Double Deep Q-Learning. DQN was the first model to successfully combine deep learning with reinforcement learning. It replaces the Q-table with a neural network, which effectively solves complicated, high-dimensional RL problems. It comes in many variations, the most famous of which is Double DQN [29]. In our model, we use Double DQN to train the DRL agents in users, as shown in Figure 2. The Q-function can be approximated to the optimal Q value by updating the parameters \(\tau_i\) of the neural network as follows:

\[
Q(\mathbf{z}, \mathcal{A}) \approx Q((\mathbf{z}, \mathcal{A}); \tau_i)
\tag{22}
\]

Experience replay is the core component of DQN. It is essentially a memory that stores transitions, with a finite size \(N_m\), whose stored entries are overwritten cyclically. It can effectively eliminate the correlation between training data. A transition sample can be represented as \(T_i = (\mathbf{z}_i, \mathcal{A}(\mathbf{z}_i), \mathcal{R}(\mathbf{z}_i, \mathcal{A}(\mathbf{z}_i)), \mathbf{z}_{i+1})\), which represents one state transition. The whole experience pool can be denoted as \(\mathcal{M} = \{T_{i-N_m+1}, \ldots, T_i\}\). Note that each DRL agent maintains two Q networks, namely \(Q(\mathbf{z}, \mathcal{A}; \tau_i)\) and \(Q'(\mathbf{z}, \mathcal{A}; \tau_i')\), with network \(Q\) used to choose actions and network \(Q'\) used to evaluate them. Besides, the parameters \(\tau_i\) of network \(Q\) periodically update the weight parameters \(\tau_i'\) of network \(Q'\).

Initialization: Q-table
Iteration:
1: for each episode do
2:   Initialize z
3:   for each step of the episode do
4:     Generate a at random
5:     if a ≤ ε then
6:       Randomly select an action
7:     else
8:       Choose A(z) using the policy derived from Q(z, A)
9:     Take action A(z)
10:    Obtain R(z, A(z)) and z′
11:    Update the Q-table: Q(z, A) ← Q(z, A) + α · (R(z, A) + γ · max_{A′} Q(z′, A′) − Q(z, A))
12:    z ← z′
13:  end for
14: end for

Algorithm 1: Q-learning-based content caching algorithm.

Figure 2: Illustration of the training process (transitions sampled from the replay memory feed the MainNet, which selects actions via argmax; the TargetNet evaluates them; the loss function drives gradient-based parameter updating, and the MainNet parameters are periodically copied to the TargetNet).
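A replay memory with finite size \(N_m\) and cyclic overwriting, as described above, can be sketched with a bounded deque. The class and field names are illustrative choices, not the authors' implementation.

```python
from collections import deque
import random

class ReplayMemory:
    """Finite replay memory of size N_m; when full, the oldest
    transition is overwritten, matching M = {T_{i-N_m+1}, ..., T_i}."""
    def __init__(self, capacity: int):
        self.buffer = deque(maxlen=capacity)   # drops oldest item when full

    def store(self, z, a, r, z_next):
        # T_i = (z_i, A(z_i), R(z_i, A(z_i)), z_{i+1})
        self.buffer.append((z, a, r, z_next))

    def sample(self, batch_size: int):
        # uniform random minibatch M' used for training
        return random.sample(self.buffer, min(batch_size, len(self.buffer)))

M = ReplayMemory(capacity=5000)
M.store("z0", "a0", 1.0, "z1")
batch = M.sample(32)
```

Uniform sampling from this buffer is what breaks the temporal correlation between consecutive transitions that would otherwise destabilize training.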

Throughout the training process, the DRL agent randomly samples a minibatch \(\mathcal{M}'\) from the experience replay memory \(\mathcal{M}\). Then, at each epoch, the network \(Q\) is trained towards the direction of minimizing the loss function

\[
L(\tau_i) = E_{(\mathbf{z}, \mathcal{A}, \mathcal{R}(\mathbf{z}, \mathcal{A}), \mathbf{z}') \in \mathcal{M}'} \left[ \left( \mathcal{R}(\mathbf{z}, \mathcal{A}) + \gamma \cdot Q'\!\left(\mathbf{z}', \arg\max_{\mathcal{A}'} Q(\mathbf{z}', \mathcal{A}'; \tau_i); \tau_i'\right) - Q(\mathbf{z}, \mathcal{A}; \tau_i) \right)^2 \right]
\tag{23}
\]

With (23), the gradient guiding the updates of \(\tau\) can be calculated as \(\partial L(\tau_i)/\partial \tau_i\). Hence, Stochastic Gradient Descent (SGD) is performed until the convergence of the Q networks, approximating the optimal state-action Q-function. We summarize the training algorithm based on Double DQN in Algorithm 2.
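The decoupling inside (23) — the main network selects the next action while the target network evaluates it — is what distinguishes Double DQN from vanilla DQN [29]. A minimal sketch of the target computation for one transition (hypothetical Q-value lists, \(\gamma = 0.9\) as in Table 1):

```python
def double_dqn_target(r, q_next_main, q_next_target, gamma=0.9):
    """Target value inside the loss of Eq. (23): the main network
    picks the best next action (argmax over its own Q values),
    while the target network supplies the value of that action."""
    best = max(range(len(q_next_main)), key=lambda a: q_next_main[a])
    return r + gamma * q_next_target[best]

# Main net prefers action 0, but the (stale) target net rates it 0.8:
y = double_dqn_target(1.0, q_next_main=[1.0, 0.5], q_next_target=[0.8, 0.9])
print(y)  # 1.0 + 0.9 * 0.8 = 1.72
```

Had the target network both selected and evaluated (plain DQN), the target would instead use its own maximum (0.9 here); the decoupling curbs the systematic overestimation of Q values.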


Figure 3: Statistics for all sharing activities [25] (share activity over 168 hours for app, audio, file, folder, image, music, video, other, and aggregate content types).

Initialization: Experience replay memory M, main Q network with random weights τ, target Q′ network with τ′ = τ, and the period φ of replacing the target Q network
Iteration:
1: for each episode do
2:   Initialize z
3:   i ← 0
4:   for each step of the episode do
5:     i ← i + 1
6:     Randomly generate a
7:     if a ≤ ε then
8:       Randomly select an action
9:     else
10:      A(z) ← argmax_{A(z)} Q(z, A(z); τ_i)
11:    Take action A(z)
12:    Obtain R(z, A(z)) and z′
13:    Store T ← (z, A(z), R(z, A(z)), z′) into M
14:    Randomly sample a mini-batch of transitions M′ ⊆ M
15:    Update τ_i with ∂L(τ_i)/∂τ_i
16:    if i == φ then
17:      Update τ_i′
18:      i ← 0
19:    z ← z′
20:  end for
21: end for

Algorithm 2: Double DQN-based content caching algorithm.

Regarding algorithm complexity, the procedure mainly includes collecting transitions and executing backpropagation to train the parameters. Since collecting one transition requires \(O(1)\) computational complexity, the total computational complexity for collecting \(K\) transitions into the replay memory is \(O(K)\). Let \(a\) and \(b\) denote the number of layers and the maximum number of units in each layer, respectively. Training the parameters with backpropagation and gradient descent requires a computational complexity of \(O(mabi)\), where \(m\) and \(i\) denote the number of transitions randomly sampled from the replay memory and the number of iterations, respectively. Furthermore, the replay memory and the parameters of the double deep Q-learning model dominate the storage complexity. Specifically, storing \(K\) transitions requires a space complexity of about \(O(K)\), while the parameters require a space complexity of about \(O(ab)\).

Figure 4: Content popularity (fitted versus original values; content popularity, ×10⁻², against content ranking from 10⁰ to 10⁴).

6. Experiment

In this section, we evaluate the proposed cache policy based on experimental results from the mobile application Xender.

6.1. Dataset. Xender is a mobile app that supports offline D2D communication activities. It provides a new way to share the diversified content files users are interested in without accessing 3G/4G cellular mobile networks, largely reducing repeated traffic load and the waste of network resources, and thereby achieving resource sharing. Currently, Xender has around 10 million daily and 100 million monthly active users, as well as about 110 million daily content deliveries.

We captured Xender's trace for one month (from 01/08/2016 to 31/08/2016), including 450,786 active mobile users conveying 153,482 content files and 271,785,952 content requests [30]. As shown in Figure 4, the content popularity distribution in Xender's trace can be fitted by an MZipf distribution with a plateau factor of −0.88 and a skewness factor of 0.35.
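The MZipf (Mandelbrot–Zipf) fit above can be reproduced in a few lines. The sketch below assumes the common form \(p(r) = (r + q)^{-s} / \sum_{j}(j + q)^{-s}\), with \(q\) the plateau factor and \(s\) the skewness factor, plugging in the fitted values from the trace; the function name and the catalogue size are illustrative.

```python
def mzipf(rank: int, q: float, s: float, n: int) -> float:
    """Mandelbrot-Zipf popularity: p(r) = (r + q)^(-s) / sum_j (j + q)^(-s),
    where q is the plateau factor and s the skewness factor."""
    norm = sum((j + q) ** (-s) for j in range(1, n + 1))
    return (rank + q) ** (-s) / norm

# Fitted values reported for the Xender trace: q = -0.88, s = 0.35.
probs = [mzipf(r, q=-0.88, s=0.35, n=1000) for r in range(1, 1001)]
```

With such a small skewness factor the popularity curve is relatively flat, so the head of the catalogue captures much less of the demand than under a classic Zipf law — which is precisely why a learned replacement policy has room to beat simple frequency-based heuristics.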

6.2. Parameter Settings. In our simulations, four BSs are employed with a maximum coverage range of 250 m; \(g_{u,n} = 30.6 + 36.7 \log_{10} l_{u,n}\) in dB [31] is taken as the channel gain model, and the channel bandwidth of each BS is set to 20 MHz. The delays of the D2D link, BS to MNO, and MNO to Internet are 5 ms, 20 ms, and 100 ms, respectively. Besides, the total transmit power of a BS is 40 W, serving at most 500 users. With respect to the parameter settings of Double DQN, a single-layer fully connected feedforward neural network with 200 neurons is used to serve as both the target and the evaluation \(Q\) networks. Other parameter values are given in Table 1.
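The channel gain model above is a one-liner; the sketch below assumes the distance \(l_{u,n}\) is in meters (this form is the meter-scale equivalent of the 3GPP TR 36.814 pico-cell model \(140.7 + 36.7\log_{10} d_{\text{km}}\)). Function and variable names are our own.

```python
import math

def path_loss_db(distance_m: float) -> float:
    """Channel gain model g_{u,n} = 30.6 + 36.7 * log10(l_{u,n}) in dB [31],
    with l_{u,n} the user-to-BS distance in meters."""
    return 30.6 + 36.7 * math.log10(distance_m)

print(round(path_loss_db(250), 1))  # loss at the 250 m cell edge: 118.6
```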

6.3. Evaluation Results. In order to evaluate the performance of our caching strategy, we compared it with three classic cache replacement algorithms:
(1) LRU: replaces the least recently used content first.
(2) LFU: replaces the least frequently used content first.
(3) FIFO: replaces the content that was cached first.
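As a concrete reference point for baseline (1), an LRU replacement policy can be sketched with an ordered dictionary; LFU and FIFO follow the same pattern with a frequency counter or a plain queue as the eviction key. The class and content names are hypothetical illustrations, not the evaluation code.

```python
from collections import OrderedDict

class LRUCache:
    """Baseline (1): on a miss with a full cache, evict the
    least recently used content."""
    def __init__(self, capacity: int):
        self.capacity, self.store = capacity, OrderedDict()

    def request(self, content) -> bool:
        hit = content in self.store
        if hit:
            self.store.move_to_end(content)      # refresh recency on a hit
        else:
            if len(self.store) >= self.capacity:
                self.store.popitem(last=False)   # evict least recently used
            self.store[content] = True
        return hit

cache = LRUCache(capacity=2)
hits = [cache.request(c) for c in ["f1", "f2", "f1", "f3", "f2"]]
print(hits)  # [False, False, True, False, False]
```

Unlike these fixed heuristics, the DRL policy learns its eviction rule from the reward signal, which is what the comparisons below measure.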

Figure 5 shows the performance comparison of cache hit ratio, delay, and traffic at F = 1000 and C = 100 M. As we can see, at the beginning of the simulation, the caching strategy we proposed was at a clear disadvantage in all three aspects, but the hit rate soon increased and eventually stabilized. This is because our reward function is designed to increase the cache hit rate, so our DRL agent is dedicated to maximizing the system hit rate. It can be seen that our caching strategy achieves a hit rate about 9%, 12%, and 14% higher than LRU, LFU, and FIFO, respectively. At the same time, the improvement of the hit rate has a positive impact on the delay and traffic indicators: the delay of our strategy is 12%, 17%, and 21% lower than that of LRU, LFU, and FIFO, respectively, and the traffic saved is 8%, 10%, and 14%, respectively.

In addition, we explored the effect of content quantity on the performance comparison. We compared the performance when the number of contents is 1000 and 2000. As shown in Figure 6, it can be inferred that when the number of contents increases, the convergence of the algorithm changes and the hit rate decreases. However, this does not change the overall trend of the algorithms. Our


Figure 5: Performance evaluation in terms of (a) hit rate, (b) delay (ms), and (c) cellular traffic (MB) with respect to time, for the DRL, LRU, LFU, and FIFO algorithms.

Table 1: Parameter values.

Parameter:  F     w [MHz]  σ² [dBm]  M     M′   γ    ε    α     φ
Value:      1000  200      −95       5000  200  0.9  0.1  0.05  250

caching strategy still performs best among the four algorithms.

Finally, we explored the effects of the learning rate and the exploration probability on our algorithm's performance. As shown in Figure 7, the learning rate is set to 0.5 and 0.05 and the exploration probability to 0.1 and 0.5, respectively. It can be seen that both of these factors have a great impact on the caching strategy, mainly manifested in convergence and performance. Thus, a large number of experiments were performed to find an appropriate learning rate and exploration probability for the proposed edge caching scenarios. Hence, in our setting, α = 0.05 and ε = 0.1 are selected for achieving better performance.

Figure 6: Performance comparison between F = 1000 and F = 2000 in terms of (a) hit rate, (b) delay (ms), and (c) cellular traffic (MB).

7. Conclusions

In this paper, we study the edge caching strategy of hierarchical wireless networks. Specifically, we use the Markov decision process and deep reinforcement learning in the proposed edge cache replacement strategy. The experimental results based on real traces show that our proposed strategy is superior to LRU, LFU, and FIFO in terms of hit rate, delay, and traffic offloading. Finally, we also explored the impact of the learning rate and exploration probability on algorithm performance.

In the future, we will focus more on the user layer's impact on cache replacement. (1) In the existing D2D model, the transmission of files is not persistent, and complex user movement can interrupt content delivery; in future work we will consider this factor in the reward function. (2) The cache replacement process incurs additional costs, such as latency and energy consumption, all of which should be considered; however, how to quantify these factors in simulation experiments still needs to be explored. (3) The computing resources of user devices are limited; although deep reinforcement learning can solve the problem of dimensional explosion, it still requires substantial computing resources. Therefore, we will explore the application of more lightweight learning algorithms in D2D-aided cellular networks.

Data Availability

The data used to support the findings of this study have not been made available for commercial reasons.


Figure 7: Performance of hit rate under different parameters: (a) hit rate under α = 0.05 and α = 0.5; (b) hit rate under ε = 0.1 and ε = 0.5.

Conflicts of Interest

The authors declare that they have no conflicts of interest.

Acknowledgments

The conference version of this manuscript was first presented at the 2018 IEEE 15th International Conference on Mobile Ad Hoc and Sensor Systems (MASS). The authors have significantly extended that work in this journal version by exploiting the edge caching problem within a deep reinforcement learning framework. This work was supported in part by the National Key Research and Development Program of China under grant 2018YFC0809803 and in part by the Natural Science Foundation of China under grant 61702364.

References

[1] X. Wang, M. Chen, Z. Han et al., "TOSS: traffic offloading by social network service-based opportunistic sharing in mobile social networks," in Proceedings of IEEE INFOCOM, pp. 2346–2354, 2014.
[2] M. Gregori, J. Gomez-Vilardebo, J. Matamoros, and D. Gunduz, "Wireless content caching for small cell and D2D networks," IEEE Journal on Selected Areas in Communications, vol. 34, no. 5, pp. 1222–1234, 2016.
[3] T. Rodrigues, F. Benevenuto, M. Cha, K. Gummadi, and V. Almeida, "On word-of-mouth based discovery of the web," in Proceedings of the 2011 ACM SIGCOMM Internet Measurement Conference (IMC '11), pp. 381–396, November 2011.
[4] J. Song, M. Sheng, T. Q. Quek, C. Xu, and X. Wang, "Learning based content caching and sharing for wireless networks," IEEE Transactions on Communications, vol. 99, pp. 1-1, 2017.
[5] N. Morozs, T. Clarke, and D. Grace, "Distributed heuristically accelerated Q-learning for robust cognitive spectrum management in LTE cellular systems," IEEE Transactions on Mobile Computing, vol. 15, no. 4, pp. 817–825, 2016.
[6] B. N. Bharath, K. G. Nagananda, and H. V. Poor, "A learning-based approach to caching in heterogenous small cell networks," IEEE Transactions on Communications, vol. 64, no. 4, pp. 1674–1686, 2016.
[7] M. Srinivasan, V. J. Kotagi, and C. S. R. Murthy, "A Q-learning framework for user QoE enhanced self-organizing spectrally efficient network using a novel inter-operator proximal spectrum sharing," IEEE Journal on Selected Areas in Communications, vol. 34, no. 11, pp. 2887–2901, 2016.
[8] X. Wang, M. Chen, T. Taleb, A. Ksentini, and V. C. M. Leung, "Cache in the air: exploiting content caching and delivery techniques for 5G systems," IEEE Communications Magazine, vol. 52, no. 2, pp. 131–139, 2014.
[9] M. Sheng, C. Xu, J. Liu, J. Song, X. Ma, and J. Li, "Enhancement for content delivery with proximity communications in caching enabled wireless networks: architecture and challenges," IEEE Communications Magazine, vol. 54, no. 8, pp. 70–76, 2016.
[10] E. Zeydan, E. Bastug, M. Bennis et al., "Big data caching for networking: moving from cloud to edge," IEEE Communications Magazine, vol. 54, no. 9, pp. 36–42, 2016.
[11] N. Golrezaei, A. Molisch, A. G. Dimakis, and G. Caire, "Femtocaching and device-to-device collaboration: a new architecture for wireless video distribution," IEEE Communications Magazine, vol. 51, no. 4, pp. 142–149, 2013.
[12] N. Golrezaei, K. Shanmugam, A. G. Dimakis, A. F. Molisch, and G. Caire, "FemtoCaching: wireless video content delivery through distributed caching helpers," in Proceedings of IEEE INFOCOM 2012, pp. 1107–1115, March 2012.
[13] B. Han, X. Wang, N. Choi, T. Kwon, and Y. Choi, "AMVS-NDN: adaptive mobile video streaming and sharing in wireless named data networking," in Proceedings of the 2013 IEEE Conference on Computer Communications Workshops (INFOCOM WKSHPS), pp. 375–380, April 2013.
[14] K. Shanmugam, N. Golrezaei, A. G. Dimakis, A. F. Molisch, and G. Caire, "FemtoCaching: wireless content delivery through distributed caching helpers," IEEE Transactions on Information Theory, vol. 59, no. 12, pp. 8402–8413, 2013.
[15] X. Li, X. Wang, S. Xiao, and V. C. Leung, "Delay performance analysis of cooperative cell caching in future mobile networks," in Proceedings of the 2015 IEEE International Conference on Communications (ICC), pp. 5652–5657, June 2015.
[16] S. H. Chae, J. Y. Ryu, T. Q. Quek, and W. Choi, "Cooperative transmission via caching helpers," in Proceedings of GLOBECOM 2015 - IEEE Global Communications Conference, pp. 1–6, San Diego, CA, USA, December 2015.
[17] J. Gu, W. Wang, A. Huang, H. Shan, and Z. Zhang, "Distributed cache replacement for caching-enable base stations in cellular networks," in Proceedings of the 2014 IEEE International Conference on Communications (ICC), pp. 2648–2653, Australia, June 2014.
[18] C. Wang, S. Wang, D. Li, X. Wang, X. Li, and V. C. Leung, "Q-learning based edge caching optimization for D2D enabled hierarchical wireless networks," in Proceedings of the 2018 IEEE 15th International Conference on Mobile Ad Hoc and Sensor Systems (MASS), pp. 55–63, Chengdu, China, October 2018.
[19] P. Rodriguez, C. Spanner, and E. W. Biersack, "Analysis of web caching architectures: hierarchical and distributed caching," IEEE/ACM Transactions on Networking, vol. 9, no. 4, pp. 404–418, 2001.
[20] H. Che, Y. Tung, and Z. Wang, "Hierarchical web caching systems: modeling, design and experimental results," IEEE Journal on Selected Areas in Communications, vol. 20, no. 7, pp. 1305–1314, 2002.
[21] K. Poularakis and L. Tassiulas, "On the complexity of optimal content placement in hierarchical caching networks," IEEE Transactions on Communications, vol. 64, no. 5, pp. 2092–2103, 2016.
[22] J. Dai, Z. Hu, B. Li, J. Liu, and B. Li, "Collaborative hierarchical caching with dynamic request routing for massive content distribution," in Proceedings of IEEE INFOCOM 2012, pp. 2444–2452, March 2012.
[23] E. Bastug, M. Bennis, and M. Debbah, "Living on the edge: the role of proactive caching in 5G wireless networks," IEEE Communications Magazine, vol. 52, no. 8, pp. 82–89, 2014.
[24] M. Hefeeda and O. Saleh, "Traffic modeling and proportional partial caching for peer-to-peer systems," IEEE/ACM Transactions on Networking, vol. 16, no. 6, pp. 1447–1460, 2008.
[25] S. Wang, Y. Zhang, H. Wang, Z. Huang, X. Wang, and T. Jiang, "Large scale measurement and analytics on social groups of device-to-device sharing in mobile social networks," Mobile Networks and Applications, vol. 23, no. 2, pp. 203–215, 2017.
[26] A. Balasubramanian, B. Levine, and A. Venkataramani, "DTN routing as a resource allocation problem," in Proceedings of the ACM SIGCOMM 2007 Conference on Computer Communications, pp. 373–384, August 2007.
[27] X. Chen, L. Jiao, W. Li, and X. Fu, "Efficient multi-user computation offloading for mobile-edge cloud computing," IEEE/ACM Transactions on Networking, vol. 24, no. 5, pp. 2795–2808, 2016.
[28] T. H. Cormen, C. E. Leiserson, R. Rivest et al., Introduction to Algorithms, MIT Press, Cambridge, MA, USA, 2nd edition, 2001.
[29] H. V. Hasselt, A. Guez, and D. Silver, "Deep reinforcement learning with Double Q-learning," in Proceedings of the AAAI Conference on Artificial Intelligence, pp. 2094–2100, 2016.
[30] X. Li, X. Wang, P. Wan, Z. Han, and V. C. Leung, "Hierarchical edge caching in device-to-device aided mobile networks: modeling, optimization, and design," IEEE Journal on Selected Areas in Communications, vol. 36, no. 8, pp. 1768–1785, 2018.
[31] 3GPP, "Further advancements for E-UTRA physical layer aspects (release 9)," Tech. Rep. 36.814 V1.2.0, 2009.


Page 6: Edge Caching for D2D Enabled Hierarchical Wireless ...downloads.hindawi.com/journals/wcmc/2019/2561069.pdf · caching proposed in [, ] and AMVS-NDN proposed by [] are both committed

6 Wireless Communications and Mobile Computing

Initialization Q-TableIteration1 for each episode2 Initialize z3 for each step of episode4 Generatea at random5 ifa le 1205766 randomly select an action7 else8 chooseA(z) using policy derived from 119876(zA)9 Take actionA(z)10 ObtainR(zA(z)) and z1015840

11 Update Q-Table 119876(zA) larr997888 119876(zA) + 120572 sdot (R(zA) + 120574 sdotmaxΦ1015840119876(z1015840A1015840) minus 119876(zA))12 zlarr997888 z1015840

13 end for14 end for

Algorithm 1 Q-Learning-based content caching algorithm

Outside Network

Replay Memory

Loss FunctionParameter

Updating

Gradient

hellip hellip

helliphellip hellip

hellip

MainNet

TargetNet

lowast= argsmaxQ(i+1

(i+1) )

i

i

+1

(i(i) Q (i(i)) i+1)

(i(i))Q (i(i) )

maxQ(i+1 (i+1)

)

argsmaxQ(i (i) )

Figure 2 Illustration of training process

(z119894A(z119894) 119877(z119894A(z119894))z119894+1) which represents one statetransition The whole experience pool can be denoted asM = 119879119894minus119873119898+1 119879119894 Note that each DRL agent maintainstwo Q networks namely Q(zA 120591119894) and 1198761015840(zA 1205911198941015840) withnetwork Q used to choose action and network 1198761015840 to evaluateaction Besides the counterpart 120591119894 of network Q periodicallyupdates the weight parameters 1205911198941015840 of network 1198761015840

Throughout the training process the DRL agent ran-domly samples a minibatch M1015840 from the experience replayMThen at each epoch the networkQ is trained towards thedirection of minimizing the loss function as

119871 (120591119894) = 119864(zAR(zA)z1015840)isinM119894 [(R (zA) + 120574

sdot 1198761015840 (z argmaxA1015840

119876(z1015840A1015840 120591119894) 1205911015840119894))

minus 119876 (zA 120591119894))2](23)

And with (23) the gradient guiding updates of 120591 can becalculated by 120597119871(120591119894)120597120591119894 Hence Stochastic Gradient Descent(SGD) is performed until the convergence of Q networksfor approximating optimal state-action Q-function We con-clude the training algorithm based on the Double DQN inAlgorithm 2

Wireless Communications and Mobile Computing 7

108

107

106

105

104

103

102

101

100

24 48 72 96 120 144 168

Time (Hours)

App AggregateAudio FileFolder ImageMusic OtherVideo

Shar

eAct

ivity

Figure 3 Statistic for all sharing activities [25]

Initialization Experience replay memoryM main 119876 network with random weights 120591 target 1198761015840network with 1205911015840 = 120591 and the period of replacing target Q network 120601Iteration1 for each episode2 Initialize z3 i larr997888 04 for each step of episode5 119894 larr997888 119894 + 16 Randomly generatea8 ifa le 1205769 randomly select an action10 else11 A(z) larr997888 arg maxA(z)119876(zA(z) 120591119894)12 Take actionA(z119894)13 ObtainR(z119894A(z119894)) and z101584014 Store 119879 larr997888 (zA(z)R(zA(z))z1015840) intoM15 Randomly sample a mini-batch of transitions M1015840 isinM16 Update 120591119894 with 120597119871(120591119894)12059712059111989417 if i== 12060118 Update 120591119894101584019 119894 larr997888 020 zlarr997888 z1015840

21 end for22 end for

Algorithm 2 Double DQN-based content caching algorithm

About algorithm complexity it mainly includes collect-ing transitions and executing backpropagation to train theparameters Since collecting one transition requires 119874(1)computational complexity the total computational complex-ity for collecting 119870 transitions into the replay memory is

119874(119870) Let 119886 and 119887 denote the number of layers and themaximum number of units in each layer respectively Train-ing parameters with backpropagation and gradient descentrequires the computational complexity of 119874(119898119886119887119894) where mand i denote the number of transitions randomly sampled

8 Wireless Communications and Mobile Computing

fitting valuesoriginal values

101

102

103

104

100

Content Ranking

00

02

04

06

08

10

12

Con

tent

Pop

ular

ity

1e-2

Figure 4 Content popularity

from the replaymemory and the number of iterations respec-tively Furthermore the replay memory and the parametersof the double deep Q-learning model dominate the storagecomplexity Specially storing 119870 transitions needs the aboutspace complexity of 119874(119870) while the parameters need theabout space complexity of 119874(119886119887)6 Experiment

In this section we evaluate the proposed cache policybased on the experimental results of the mobile applicationXender

61 DataSet Xender is a mobile APP that can realize offlineD2D communication activities It provides a new way toshare diversified content files users are interested in withoutaccessing 3G4G cellular mobile networks largely reducingrepeated traffic load and waste of network resources asa result achieving resource sharing Currently Xender hasaround 10 million daily and 100 million monthly activeusers as well as about 110 million daily content deliver-ies

We captureXenderrsquos trace for onemonth (from01082016to 31082016) including 450786 active mobile users con-veying 153482 content files and 271785952 content requests[30] As shown in Figure 4 the content popularity distribu-tion in the Xenderrsquos trace can be fitted by MZipf distributionwith a plateau factor of minus088 and a skewness factor of035

62 Parameter Settings In our simulations four BSs areemployed with maximum cover range 250 m 119892119906119899 = 306 +367log10119897119906119899 in dB [31] is taken as the channel gain modeland the channel bandwidth of each BS is set as 20 MHzThe delays of D2D link BS to MNO and MNO to Internetare 5ms 20ms and 100ms respectively Besides the total

transmit power of BS is 40Wwith serving at most 500 UsersWith respect to the parameter settings of Double DQN asingle-layer fully connected feed forward neural networkincluding 200 neurons is used to serve as the target andthe eval 119876 network Other parameter values are given inTable 1

6.3. Evaluation Results. To evaluate the performance of our caching strategy, we compare it with three classic cache replacement algorithms:
(1) LRU: replace the least recently used content first.
(2) LFU: replace the least frequently used content first.
(3) FIFO: replace the earliest cached content first.
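The three baselines can be sketched as follows; this is a minimal, illustrative implementation, since the paper does not give its baseline code:

```python
from collections import OrderedDict, deque

class LRUCache:
    """Evict the least recently used content on overflow."""
    def __init__(self, capacity):
        self.capacity, self.items = capacity, OrderedDict()

    def request(self, content):
        hit = content in self.items
        if hit:
            self.items.move_to_end(content)        # refresh recency
        else:
            if len(self.items) >= self.capacity:
                self.items.popitem(last=False)     # drop least recently used
            self.items[content] = True
        return hit

class LFUCache:
    """Evict the least frequently used content on overflow."""
    def __init__(self, capacity):
        self.capacity, self.freq = capacity, {}

    def request(self, content):
        hit = content in self.freq
        if not hit and len(self.freq) >= self.capacity:
            coldest = min(self.freq, key=self.freq.get)
            del self.freq[coldest]
        self.freq[content] = self.freq.get(content, 0) + 1
        return hit

class FIFOCache:
    """Evict the earliest cached content on overflow."""
    def __init__(self, capacity):
        self.capacity, self.queue, self.items = capacity, deque(), set()

    def request(self, content):
        hit = content in self.items
        if not hit:
            if len(self.items) >= self.capacity:
                self.items.discard(self.queue.popleft())
            self.queue.append(content)
            self.items.add(content)
        return hit
```

For example, on the request stream a, b, a, c, b with capacity 2, LRU evicts b when c arrives (a was touched more recently), while FIFO evicts a (it entered first), so the final request for b hits under FIFO but misses under LRU.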

Figure 5 shows the performance comparison in terms of cache hit ratio, delay, and traffic at F = 1000 and C = 100 M. At the beginning of the simulation, our proposed caching strategy is clearly at a disadvantage in all three aspects, but its hit rate soon increases and eventually stabilizes. This is because our reward function is designed to increase the cache hit rate, so our DRL agent is dedicated to maximizing the system hit rate. Our caching strategy achieves a hit rate that is 9%, 12%, and 14% higher than LRU, LFU, and FIFO, respectively. The improvement in hit rate also has a positive impact on the delay and traffic metrics: the delay of our strategy is 12%, 17%, and 21% lower than that of LRU, LFU, and FIFO, respectively, and the traffic saved is 8%, 10%, and 14%, respectively.
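The percentage improvements quoted above are relative to each baseline; a hypothetical helper (the names and example values are illustrative, not taken from the paper's data) shows the computation:

```python
def relative_improvement_pct(ours, baseline):
    # Relative improvement of our metric over a baseline value, in percent.
    return 100.0 * (ours - baseline) / baseline

# For instance, raising the hit rate from 0.44 to 0.48 is about a 9% gain.
gain = relative_improvement_pct(0.48, 0.44)
```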

In addition, we explored the effect of the number of contents on the comparison results by evaluating the performance with 1000 and 2000 contents. As shown in Figure 6, when the number of contents increases, the convergence behavior of the algorithm changes and the hit rate decreases. However, this does not change the overall trend of the algorithm, and our


Figure 5: Performance evaluation in terms of hit rate, delay, and traffic over time for the DRL, LRU, LFU, and FIFO algorithms: (a) hit rate; (b) delay (ms); (c) cellular traffic (MB).

Table 1: Parameter values.

Parameter | F    | w [MHz] | σ² [dBm] | M    | M′  | γ   | ε   | α    | ϕ
Value     | 1000 | 200     | −95      | 5000 | 200 | 0.9 | 0.1 | 0.05 | 250

caching strategy still performs best among the four algorithms.

Finally, we explored the effects of the learning rate and the exploration probability on algorithm performance. As shown in Figure 7, the learning rate is set to 0.5 and 0.05, and the exploration probability to 0.1 and 0.5, respectively. Both factors have a great impact on the caching strategy, mainly in terms of convergence and performance. Thus, a large number of experiments were performed to find an appropriate learning rate and exploration probability for the proposed edge caching scenarios. In our setting, α = 0.05 and ε = 0.1 are selected for better performance.
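The exploration probability ε governs the agent's ε-greedy action selection; a minimal sketch with the selected default ε = 0.1 follows (the learning rate α = 0.05 would enter the separate gradient update, which is omitted here):

```python
import random

def epsilon_greedy_action(q_values, epsilon=0.1):
    # With probability epsilon, explore a uniformly random action;
    # otherwise exploit the action with the largest estimated Q-value.
    if random.random() < epsilon:
        return random.randrange(len(q_values))
    return max(range(len(q_values)), key=lambda a: q_values[a])
```

A larger ε explores more caching actions early on but keeps taking suboptimal actions after convergence, which matches the trade-off observed in Figure 7.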

7. Conclusions

In this paper, we studied the edge caching strategy for hierarchical wireless networks. Specifically, we used the Markov decision


Figure 6: Performance comparison between F = 1000 and F = 2000: (a) hit rate; (b) delay (ms); (c) cellular traffic (MB).

process and deep reinforcement learning in the proposed edge cache replacement strategy. Experimental results based on a real-world trace show that our proposed strategy outperforms LRU, LFU, and FIFO in terms of hit rate, delay, and traffic offloading. Finally, we also explored the impact of the learning rate and the exploration probability on algorithm performance.

In the future, we will focus more on the user layer's impact on cache replacement. (1) In the existing D2D model, file transmission is not persistent, and complex user movement can interrupt content delivery; we will account for this factor in the reward function. (2) The cache replacement process incurs additional costs, such as latency and energy consumption, all of which should be considered; how to quantify these factors in simulation still needs to be explored. (3) The computing resources of user devices are limited. Although deep reinforcement learning can handle the dimensionality explosion, it still requires substantial computing resources. Therefore, we will explore more lightweight learning algorithms for D2D-aided cellular networks.

Data Availability

The data used to support the findings of this study have not been made available for commercial reasons.


Figure 7: Hit rate under different parameters: (a) hit rate under α = 0.05 and α = 0.5; (b) hit rate under ε = 0.1 and ε = 0.5.

Conflicts of Interest

The authors declare that they have no conflicts of interest.

Acknowledgments

The conference version of this manuscript was first presented at the 2018 IEEE 15th International Conference on Mobile Ad Hoc and Sensor Systems (MASS). The authors have significantly extended the work in this journal version by exploiting the edge caching problem within a deep reinforcement learning framework. This work was supported in part by the National Key Research and Development Program of China under Grant 2018YFC0809803 and in part by the Natural Science Foundation of China under Grant 61702364.

References

[1] X. Wang, M. Chen, Z. Han et al., "TOSS: traffic offloading by social network service-based opportunistic sharing in mobile social networks," in Proceedings of the IEEE INFOCOM, pp. 2346–2354, 2014.

[2] M. Gregori, J. Gomez-Vilardebo, J. Matamoros, and D. Gunduz, "Wireless content caching for small cell and D2D networks," IEEE Journal on Selected Areas in Communications, vol. 34, no. 5, pp. 1222–1234, 2016.

[3] T. Rodrigues, F. Benevenuto, M. Cha, K. Gummadi, and V. Almeida, "On word-of-mouth based discovery of the web," in Proceedings of the 2011 ACM SIGCOMM Internet Measurement Conference (IMC '11), pp. 381–396, November 2011.

[4] J. Song, M. Sheng, T. Q. Quek, C. Xu, and X. Wang, "Learning based content caching and sharing for wireless networks," IEEE Transactions on Communications, vol. 99, pp. 1–1, 2017.

[5] N. Morozs, T. Clarke, and D. Grace, "Distributed heuristically accelerated Q-learning for robust cognitive spectrum management in LTE cellular systems," IEEE Transactions on Mobile Computing, vol. 15, no. 4, pp. 817–825, 2016.

[6] B. N. Bharath, K. G. Nagananda, and H. V. Poor, "A learning-based approach to caching in heterogenous small cell networks," IEEE Transactions on Communications, vol. 64, no. 4, pp. 1674–1686, 2016.

[7] M. Srinivasan, V. J. Kotagi, and C. S. R. Murthy, "A Q-learning framework for user QoE enhanced self-organizing spectrally efficient network using a novel inter-operator proximal spectrum sharing," IEEE Journal on Selected Areas in Communications, vol. 34, no. 11, pp. 2887–2901, 2016.

[8] X. Wang, M. Chen, T. Taleb, A. Ksentini, and V. C. M. Leung, "Cache in the air: exploiting content caching and delivery techniques for 5G systems," IEEE Communications Magazine, vol. 52, no. 2, pp. 131–139, 2014.

[9] M. Sheng, C. Xu, J. Liu, J. Song, X. Ma, and J. Li, "Enhancement for content delivery with proximity communications in caching enabled wireless networks: architecture and challenges," IEEE Communications Magazine, vol. 54, no. 8, pp. 70–76, 2016.

[10] E. Zeydan, E. Bastug, M. Bennis et al., "Big data caching for networking: moving from cloud to edge," IEEE Communications Magazine, vol. 54, no. 9, pp. 36–42, 2016.

[11] N. Golrezaei, A. Molisch, A. G. Dimakis, and G. Caire, "Femtocaching and device-to-device collaboration: a new architecture for wireless video distribution," IEEE Communications Magazine, vol. 51, no. 4, pp. 142–149, 2013.

[12] N. Golrezaei, K. Shanmugam, A. G. Dimakis, A. F. Molisch, and G. Caire, "FemtoCaching: wireless video content delivery through distributed caching helpers," in Proceedings of the IEEE INFOCOM 2012, pp. 1107–1115, March 2012.

[13] B. Han, X. Wang, N. Choi, T. Kwon, and Y. Choi, "AMVS-NDN: adaptive mobile video streaming and sharing in wireless named data networking," in Proceedings of the 2013 IEEE Conference on Computer Communications Workshops (INFOCOM WKSHPS), pp. 375–380, April 2013.

[14] K. Shanmugam, N. Golrezaei, A. G. Dimakis, A. F. Molisch, and G. Caire, "FemtoCaching: wireless content delivery through distributed caching helpers," IEEE Transactions on Information Theory, vol. 59, no. 12, pp. 8402–8413, 2013.

[15] X. Li, X. Wang, S. Xiao, and V. C. Leung, "Delay performance analysis of cooperative cell caching in future mobile networks," in Proceedings of the 2015 IEEE International Conference on Communications (ICC), pp. 5652–5657, June 2015.

[16] S. H. Chae, J. Y. Ryu, T. Q. Quek, and W. Choi, "Cooperative transmission via caching helpers," in Proceedings of the 2015 IEEE Global Communications Conference (GLOBECOM), pp. 1–6, San Diego, CA, USA, December 2015.

[17] J. Gu, W. Wang, A. Huang, H. Shan, and Z. Zhang, "Distributed cache replacement for caching-enabled base stations in cellular networks," in Proceedings of the 2014 IEEE International Conference on Communications (ICC), pp. 2648–2653, Australia, June 2014.

[18] C. Wang, S. Wang, D. Li, X. Wang, X. Li, and V. C. Leung, "Q-learning based edge caching optimization for D2D enabled hierarchical wireless networks," in Proceedings of the 2018 IEEE 15th International Conference on Mobile Ad Hoc and Sensor Systems (MASS), pp. 55–63, Chengdu, China, October 2018.

[19] P. Rodriguez, C. Spanner, and E. W. Biersack, "Analysis of web caching architectures: hierarchical and distributed caching," IEEE/ACM Transactions on Networking, vol. 9, no. 4, pp. 404–418, 2001.

[20] H. Che, Y. Tung, and Z. Wang, "Hierarchical web caching systems: modeling, design and experimental results," IEEE Journal on Selected Areas in Communications, vol. 20, no. 7, pp. 1305–1314, 2002.

[21] K. Poularakis and L. Tassiulas, "On the complexity of optimal content placement in hierarchical caching networks," IEEE Transactions on Communications, vol. 64, no. 5, pp. 2092–2103, 2016.

[22] J. Dai, Z. Hu, B. Li, J. Liu, and B. Li, "Collaborative hierarchical caching with dynamic request routing for massive content distribution," in Proceedings of the IEEE INFOCOM 2012, pp. 2444–2452, March 2012.

[23] E. Bastug, M. Bennis, and M. Debbah, "Living on the edge: the role of proactive caching in 5G wireless networks," IEEE Communications Magazine, vol. 52, no. 8, pp. 82–89, 2014.

[24] M. Hefeeda and O. Saleh, "Traffic modeling and proportional partial caching for peer-to-peer systems," IEEE/ACM Transactions on Networking, vol. 16, no. 6, pp. 1447–1460, 2008.

[25] S. Wang, Y. Zhang, H. Wang, Z. Huang, X. Wang, and T. Jiang, "Large scale measurement and analytics on social groups of device-to-device sharing in mobile social networks," Mobile Networks and Applications, vol. 23, no. 2, pp. 203–215, 2017.

[26] A. Balasubramanian, B. Levine, and A. Venkataramani, "DTN routing as a resource allocation problem," in Proceedings of the ACM SIGCOMM 2007 Conference on Computer Communications, pp. 373–384, August 2007.

[27] X. Chen, L. Jiao, W. Li, and X. Fu, "Efficient multi-user computation offloading for mobile-edge cloud computing," IEEE/ACM Transactions on Networking, vol. 24, no. 5, pp. 2795–2808, 2016.

[28] T. H. Cormen, C. E. Leiserson, R. Rivest et al., Introduction to Algorithms, MIT Press, Cambridge, MA, USA, 2nd edition, 2001.

[29] H. V. Hasselt, A. Guez, and D. Silver, "Deep reinforcement learning with Double Q-learning," in Proceedings of the AAAI Conference on Artificial Intelligence, pp. 2094–2100, 2016.

[30] X. Li, X. Wang, P. Wan, Z. Han, and V. C. Leung, "Hierarchical edge caching in device-to-device aided mobile networks: modeling, optimization, and design," IEEE Journal on Selected Areas in Communications, vol. 36, no. 8, pp. 1768–1785, 2018.

[31] 3GPP, "Further advancements for E-UTRA physical layer aspects (Release 9)," Tech. Rep. 36.814 V1.2.0, 2009.



8 Wireless Communications and Mobile Computing

fitting valuesoriginal values

101

102

103

104

100

Content Ranking

00

02

04

06

08

10

12

Con

tent

Pop

ular

ity

1e-2

Figure 4 Content popularity

from the replaymemory and the number of iterations respec-tively Furthermore the replay memory and the parametersof the double deep Q-learning model dominate the storagecomplexity Specially storing 119870 transitions needs the aboutspace complexity of 119874(119870) while the parameters need theabout space complexity of 119874(119886119887)6 Experiment

In this section we evaluate the proposed cache policybased on the experimental results of the mobile applicationXender

61 DataSet Xender is a mobile APP that can realize offlineD2D communication activities It provides a new way toshare diversified content files users are interested in withoutaccessing 3G4G cellular mobile networks largely reducingrepeated traffic load and waste of network resources asa result achieving resource sharing Currently Xender hasaround 10 million daily and 100 million monthly activeusers as well as about 110 million daily content deliver-ies

We captureXenderrsquos trace for onemonth (from01082016to 31082016) including 450786 active mobile users con-veying 153482 content files and 271785952 content requests[30] As shown in Figure 4 the content popularity distribu-tion in the Xenderrsquos trace can be fitted by MZipf distributionwith a plateau factor of minus088 and a skewness factor of035

62 Parameter Settings In our simulations four BSs areemployed with maximum cover range 250 m 119892119906119899 = 306 +367log10119897119906119899 in dB [31] is taken as the channel gain modeland the channel bandwidth of each BS is set as 20 MHzThe delays of D2D link BS to MNO and MNO to Internetare 5ms 20ms and 100ms respectively Besides the total

transmit power of BS is 40Wwith serving at most 500 UsersWith respect to the parameter settings of Double DQN asingle-layer fully connected feed forward neural networkincluding 200 neurons is used to serve as the target andthe eval 119876 network Other parameter values are given inTable 1

63 Evaluation Results In order to evaluate the performanceof our caching strategy we compared it with three classiccache replacement algorithms(1) LRU replace the least recently used content(2) LFU replace the least commonly used content first(3) FIFO replace the first in content first

Figure 5 shows the performance comparison of cachehit ratio delay and traffic at F=1000 and C=100M As wecan see at the beginning of the simulation the cachingstrategy we proposed was surely at a great disadvantageamong three aspects But soon the hit rate increased andstabilized eventually This is because our reward functionis used to increase the cache hit rate thus our DRL agentis dedicated to maximizing the system hit rate It can beseen that our caching strategy is significantly 9 12and 14 higher than LRU LFU and FIFO in terms ofhit rate respectively At the same time the improvementof the hit rate has a positive impact on the delay trafficindicators and other indexes The delay of our strategy is12 17 and 21 lower than that of LRU LFU and FIFOrespectively Besides the traffic saved is 8 10 and 14respectively

In addition we explored the effect of content quan-tity on performance comparison results We compared theperformance when the number of contents is 1000 and2000 As shown in Figure 6 it can be inferred that whenthe number of contents increases the convergence of thealgorithm changes and the hit rate decreases Howeverit cannot change the overall trend of the algorithm Our

Wireless Communications and Mobile Computing 9

0 20 40 60 80 100

Time

DRL AlgorithmLFU Algorithm

LRU AlgorithmFIFO Algorithm

025

030

035

040

045

050

055

Hit

Rate

()

(a) Hit rate

1e5

07

08

09

10

Del

ay (m

s)

0 20 40 60 80 100

Time

DRL AlgorithmLFU Algorithm

LRU AlgorithmFIFO Algorithm

(b) Delay

1e3

Cel

lula

r Tra

ffic (

MB)

50

55

60

65

70

75

45

0 20 40 60 80 100

Time

DRL AlgorithmLFU Algorithm

LRU AlgorithmFIFO Algorithm

(c) Traffic

Figure 5 Performance evaluation in terms of hit rate delay and traffic with respect to the time

Table 1 Parameter Value

119865 119908[119872119867119911] 1205902[119889119861119898] M M1015840 120574 120598 120572 120601Range 1000 200 -95 5000 200 09 01 005 250

caching strategy can still perform optimally in these fouralgorithms

Finally we explored the effects of learning rate and explo-ration probability on our algorithm performances As shownin Figure 7 learning rate is 05 and 005 and explorationprobability is 01 and 05 respectively It can be seen that bothof these factors have a great impact on the cache strategymainly manifesting in convergence and performance Thuslarge numbers of experiments are performed to find an

appropriate learning rate and exploration probability for theproposed edge caching scenarios Hence in our setting120572 = 005 and 120598 = 01 are selected for achieving betterperformance

7 Conclusions

In this paper we study the edge caching strategy of layeredwireless networks Specifically we use the Markov decision

10 Wireless Communications and Mobile Computing

0 20 40 60 80 100

Time

025

030

035

040

045

050

055

Hit

Rate

()

F =1000F =2000

(a) Hit rate

1e5

07

08

09

10

Del

ay (m

s)

0 20 40 60 80 100

Time

F =1000F =2000

(b) Delay

1e3

Cel

lula

r Tra

ffic (

MB)

50

55

60

65

70

75

45

0 20 40 60 80 100

Time

F =1000F =2000

(c) Traffic

Figure 6 Performance comparison between F=1000 F=2000

process and Deep Reinforcement Learning in the proposededge cache replacement strategy The experimental resultsbased on actual tracking show that our proposed strategy issuperior to LRU LFU and FIFO in terms of hit rate delay andtraffic offload Finally we also explored the impact of learningrate and exploration probability on algorithm performance

In the future wersquoll focus more on the user layerrsquos impacton cache replacement (1) In the existing D2D model thetransmission process of files is not persistent and complexuser movement will lead to the interruption of contentdelivery In the future we will consider this factor in thereward function (2) The cache replacement process requiresadditional costs such as latency and energy consumption

all of which should be considered but how to quantifythese factors in the simulation experiment still needs to beexplored (3) The computing resources of user devices arelimited Although Deep Reinforcement Learning can solvethe problem of dimensional explosion it still requires alot of computing resources Therefore we will explore theapplication of more lightweight learning algorithms in D2D-aid cellular networks

Data Availability

The data used to support the findings of this study have notbeen made available because commercial reasons

Wireless Communications and Mobile Computing 11

0 20 40 60 80 100

Time

025

030

035

040

045

050

055

Hit

Rate

()

=005=05

(a) Hit rate under 120572 = 005 120572 = 05

025

030

035

040

045

050

055

Hit

Rate

()

0 20 40 60 80 100

Time

=01=05

(b) Hit rate under 120598 = 01 120598 = 05

Figure 7 Performance of hit rate under different parameters

Conflicts of Interest

The authors declare that they have no conflicts of interest

Acknowledgments

The conference version of the manuscript is firstly pre-sented in 2018 IEEE 15th International Conference onMobile Ad Hoc and Sensor Systems (MASS) Authors haveextended the work significantly by exploiting the edgecaching problem with deep reinforcement learning frame-work in this journal version This work was supportedin part by the National Key Research and DevelopmentProgram of China under grant 2018YFC0809803 and in partby the Natural Science Foundation of China under grant61702364

References

[1] X Wang M Chen Z Han et al ldquoTOSS traffic offloading bysocial network service-based opportunistic sharing in mobilesocial networksrdquo in Proceedings of the INFOCOM pp 2346ndash2354 2014

[2] M Gregori J Gomez-Vilardebo J Matamoros andD GunduzldquoWireless content caching for small cell and D2D networksrdquoIEEE Journal on Selected Areas in Communications vol 34 no5 pp 1222ndash1234 2016

[3] T Rodrigues F Benevenuto M Cha K Gummadi and VAlmeida ldquoOn word-of-mouth based discovery of the webrdquo inProceedings of the 2011 ACM SIGCOMM Internet MeasurementConference IMCrsquo11 pp 381ndash396 November 2011

[4] J Song M Sheng T Q Quek C Xu and X Wang ldquoLearningbased content caching and sharing for wireless networksrdquo IEEETransactions on Communications vol 99 pp 1-1 2017

[5] N Morozs T Clarke and D Grace ldquoDistributed heuristi-cally accelerated Q-learning for robust cognitive spectrum

management in LTE cellular systemsrdquo IEEE Transactions onMobile Computing vol 15 no 4 pp 817ndash825 2016

[6] B N Bharath K G Nagananda and H V Poor ldquoA learning-based approach to caching in heterogenous small cell networksrdquoIEEE Transactions on Communications vol 64 no 4 pp 1674ndash1686 2016

[7] M Srinivasan V J Kotagi and C S R Murthy ldquoA Q-learning framework for user QoE enhanced self-organizingspectrally efficient network using a novel inter-operator prox-imal spectrum sharingrdquo IEEE Journal on Selected Areas inCommunications vol 34 no 11 pp 2887ndash2901 2016

[8] X Wang M Chen T Taleb A Ksentini and V C M LeungldquoCache in the air exploiting content caching and deliverytechniques for 5G systemsrdquo IEEE Communications Magazinevol 52 no 2 pp 131ndash139 2014

[9] M Sheng C Xu J Liu J Song X Ma and J Li ldquoEnhancementfor content delivery with proximity communications in cachingenabled wireless networks Architecture and challengesrdquo IEEECommunications Magazine vol 54 no 8 pp 70ndash76 2016

[10] E Zeydan E Bastug M Bennis et al ldquoBig data caching fornetworkingmoving fromcloud to edgerdquo IEEECommunicationsMagazine vol 54 no 9 pp 36ndash42 2016

[11] N Golrezaei A Molisch A G Dimakis and G Caire ldquoFemto-caching and device-to-device collaboration a new architecturefor wireless video distributionrdquo IEEE Communications Maga-zine vol 51 no 4 pp 142ndash149 2013

[12] N Golrezaei K Shanmugam A G Dimakis A F Molischand G Caire ldquoFemtoCaching Wireless video content deliverythrough distributed caching helpersrdquo in Proceedings of the IEEEConference on Computer Communications INFOCOM2012 pp1107ndash1115 March 2012

[13] B Han XWang N Choi T Kwon and Y Choi ldquoAMVS-NDNAdaptivemobile video streaming and sharing inwireless nameddata networkingrdquo in Proceedings of the 2013 IEEE Conference onComputer Communications Workshops (INFOCOMWKSHPS)pp 375ndash380 April 2013

[14] K Shanmugam N Golrezaei A G Dimakis A F MolischandGCaire ldquoFemtoCaching wireless content delivery through

12 Wireless Communications and Mobile Computing

distributed caching helpersrdquo IEEE Transactions on InformationTheory vol 59 no 12 pp 8402ndash8413 2013

[15] X Li X Wang S Xiao and V C Leung ldquoDelay performanceanalysis of cooperative cell caching in future mobile networksrdquoin Proceedings of the 2015 IEEE International Conference onSignal Processing for Communications (ICC) pp 5652ndash5657June 2015

[16] S H Chae J Y Ryu T Q Quek and W Choi ldquoCooperativetransmission via caching helpersrdquo in Proceedings of the GLOBE-COM 2015 - 2015 IEEE Global Communications Conference pp1ndash6 San Diego CA USA December 2015

[17] J GuWWang A Huang H Shan and Z Zhang ldquoDistributedcache replacement for caching-enable base stations in cellularnetworksrdquo in Proceedings of the 2014 1st IEEE InternationalConference on Communications ICC 2014 pp 2648ndash2653Australia June 2014

[18] C Wang S Wang D Li X Wang X Li and V C LeungldquoQ-learning based edge caching optimization for D2D enabledhierarchical wireless networksrdquo in Proceedings of the 2018 IEEE15th International Conference on Mobile Ad Hoc and SensorSystems (MASS) pp 55ndash63 Chengdu China October 2018

[19] P Rodriguez C Spanner and E W Biersack ldquoAnalysis of webcaching architectures Hierarchical and distributed cachingrdquoIEEEACM Transactions on Networking vol 9 no 4 pp 404ndash418 2001

[20] H Che Y Tung and Z Wang ldquoHierarchical web cachingsystems modeling design and experimental resultsrdquo IEEEJournal on Selected Areas in Communications vol 20 no 7 pp1305ndash1314 2002

[21] K Poularakis and L Tassiulas ldquoOn the complexity of optimalcontent placement in hierarchical caching networksrdquo IEEETransactions on Communications vol 64 no 5 pp 2092ndash21032016

[22] J Dai Z Hu B Li J Liu and B Li ldquoCollaborative hierarchicalcaching with dynamic request routing for massive content dis-tributionrdquo in Proceedings of the IEEE Conference on ComputerCommunications INFOCOM 2012 pp 2444ndash2452March 2012

[23] E Bastug M Bennis and M Debbah ldquoLiving on the edgethe role of proactive caching in 5G wireless networksrdquo IEEECommunications Magazine vol 52 no 8 pp 82ndash89 2014

[24] M Hefeeda and O Saleh ldquoTraffic modeling and proportionalpartial caching for peer-to-peer systemsrdquo IEEEACM Transac-tions on Networking vol 16 no 6 pp 1447ndash1460 2008

[25] S Wang Y Zhang H Wang Z Huang X Wang and T JiangldquoLarge scale measurement and analytics on social groups ofdevice-to-device sharing in mobile social networksrdquo MobileNetworks and Applications vol 23 no 2 pp 203ndash215 2017

[26] A Balasubramanian B Levine and A Venkataramani ldquoDTNrouting as a resource allocation problemrdquo in Proceedings of theACM SIGCOMM 2007 Conference on Computer Communica-tions pp 373ndash384 August 2007

[27] X Chen L Jiao W Li and X Fu ldquoEfficient multi-user compu-tation offloading formobile-edge cloud computingrdquo IEEEACMTransactions on Networking vol 24 no 5 pp 2795ndash2808 2016

[28] T H Cormen C E Leiserson R Rivest et al An Introductionto Algorithms MIT Press Cambridge MA USA 2nd edition2001

[29] H V Hasselt A Guez and D Silver ldquoDeep reinforcementlearning with Double Q-learningrdquo in Proceedings of the AAAIpp 2094ndash2100 2016

[30] X Li X Wang P Wan Z Han and V C Leung ldquoHierarchicaledge caching in device-to-device aided mobile networks mod-eling optimization and designrdquo IEEE Journal on Selected Areasin Communications vol 36 no 8 pp 1768ndash1785 2018

[31] 3GPP ldquoFurther advancements for E-UTRA physical layeraspects (release 9)rdquo Tech Rep 36814 V120 2009

International Journal of

AerospaceEngineeringHindawiwwwhindawicom Volume 2018

RoboticsJournal of

Hindawiwwwhindawicom Volume 2018

Hindawiwwwhindawicom Volume 2018

Active and Passive Electronic Components

VLSI Design

Hindawiwwwhindawicom Volume 2018

Hindawiwwwhindawicom Volume 2018

Shock and Vibration

Hindawiwwwhindawicom Volume 2018

Civil EngineeringAdvances in

Acoustics and VibrationAdvances in

Hindawiwwwhindawicom Volume 2018

Hindawiwwwhindawicom Volume 2018

Electrical and Computer Engineering

Journal of

Advances inOptoElectronics

Hindawiwwwhindawicom

Volume 2018

Hindawi Publishing Corporation httpwwwhindawicom Volume 2013Hindawiwwwhindawicom

The Scientific World Journal

Volume 2018

Control Scienceand Engineering

Journal of

Hindawiwwwhindawicom Volume 2018

Hindawiwwwhindawicom

Journal ofEngineeringVolume 2018

SensorsJournal of

Hindawiwwwhindawicom Volume 2018

International Journal of

RotatingMachinery

Hindawiwwwhindawicom Volume 2018

Modelling ampSimulationin EngineeringHindawiwwwhindawicom Volume 2018

Hindawiwwwhindawicom Volume 2018

Chemical EngineeringInternational Journal of Antennas and

Propagation

International Journal of

Hindawiwwwhindawicom Volume 2018

Hindawiwwwhindawicom Volume 2018

Navigation and Observation

International Journal of

Hindawi

wwwhindawicom Volume 2018

Advances in

Multimedia

Submit your manuscripts atwwwhindawicom

Page 8: Edge Caching for D2D Enabled Hierarchical Wireless ...downloads.hindawi.com/journals/wcmc/2019/2561069.pdf · caching proposed in [, ] and AMVS-NDN proposed by [] are both committed

8 Wireless Communications and Mobile Computing

fitting valuesoriginal values

101

102

103

104

100

Content Ranking

00

02

04

06

08

10

12

Con

tent

Pop

ular

ity

1e-2

Figure 4 Content popularity

from the replaymemory and the number of iterations respec-tively Furthermore the replay memory and the parametersof the double deep Q-learning model dominate the storagecomplexity Specially storing 119870 transitions needs the aboutspace complexity of 119874(119870) while the parameters need theabout space complexity of 119874(119886119887)6 Experiment

In this section, we evaluate the proposed caching policy based on experimental results from the mobile application Xender.

6.1. Dataset. Xender is a mobile app that supports offline D2D communication. It provides a new way to share the diverse content files users are interested in without accessing 3G/4G cellular networks, largely reducing repeated traffic load and the waste of network resources and thereby achieving resource sharing. Currently, Xender has around 10 million daily and 100 million monthly active users, as well as about 110 million daily content deliveries.

We captured Xender's trace for one month (from 01/08/2016 to 31/08/2016), covering 450,786 active mobile users, 153,482 content files, and 271,785,952 content requests [30]. As shown in Figure 4, the content popularity distribution in the Xender trace can be fitted by an MZipf distribution with a plateau factor of −0.88 and a skewness factor of 0.35.
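The MZipf fit above can be reproduced numerically. The following is an illustrative sketch only, assuming the Mandelbrot–Zipf form P(i) ∝ (i + q)^(−s), where q is the plateau factor and s is the skewness factor; the function name `mzipf` is ours, not from the paper:

```python
import numpy as np

def mzipf(n, skewness, plateau):
    """Mandelbrot-Zipf popularity: P(i) proportional to (i + q)^(-s)."""
    ranks = np.arange(1, n + 1)
    weights = (ranks + plateau) ** (-skewness)
    return weights / weights.sum()  # normalize to a probability distribution

# Parameters fitted on the Xender trace: plateau q = -0.88, skewness s = 0.35.
pop = mzipf(n=1000, skewness=0.35, plateau=-0.88)
print(pop[0] > pop[-1])  # True: popularity decays with rank
```

Sampling requests from such a distribution is a common way to drive cache simulations when the raw trace is unavailable.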

6.2. Parameter Settings. In our simulations, four BSs are employed with a maximum coverage range of 250 m; g_un = 30.6 + 36.7 log10(l_un) in dB [31] is taken as the channel gain model, and the channel bandwidth of each BS is set to 20 MHz. The delays of the D2D link, BS to MNO, and MNO to Internet are 5 ms, 20 ms, and 100 ms, respectively. Besides, the total transmit power of each BS is 40 W, serving at most 500 users. With respect to the parameter settings of Double DQN, a single-layer fully connected feedforward neural network with 200 neurons serves as both the target and the eval Q-network. Other parameter values are given in Table 1.
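As a quick illustration of these settings, the channel gain model and the layered fetch delays can be computed directly. This is a sketch under the stated parameters; the helper name `path_loss_db` and the assumption that per-hop delays simply add are ours:

```python
import math

def path_loss_db(distance_m):
    """Channel gain model used in the simulation: 30.6 + 36.7*log10(l) in dB."""
    return 30.6 + 36.7 * math.log10(distance_m)

# Gain at the maximum BS coverage range of 250 m.
gain_edge = path_loss_db(250)

# Worst-case fetch delay when a request misses every caching layer and must
# be served from the Internet (assuming the per-hop delays are additive).
delay_ms = 5 + 20 + 100  # D2D link + BS-to-MNO + MNO-to-Internet
print(round(gain_edge, 1), delay_ms)  # 118.6 125
```

The 25x gap between a D2D hit (5 ms) and an Internet fetch (125 ms) is what makes the hit rate the natural optimization target.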

6.3. Evaluation Results. In order to evaluate the performance of our caching strategy, we compare it with three classic cache replacement algorithms: (1) LRU: replace the least recently used content first; (2) LFU: replace the least frequently used content first; (3) FIFO: replace the earliest cached content first.
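For reference, the eviction rule of the first baseline can be sketched in a few lines. This is an illustrative LRU cache of our own; the class and method names are not from the paper:

```python
from collections import OrderedDict

class LRUCache:
    """Evict the least recently used content when capacity is exceeded."""
    def __init__(self, capacity):
        self.capacity = capacity
        self.store = OrderedDict()  # insertion/access order tracks recency

    def request(self, content):
        """Return True on a cache hit, False on a miss (content then cached)."""
        if content in self.store:
            self.store.move_to_end(content)  # mark as most recently used
            return True
        if len(self.store) >= self.capacity:
            self.store.popitem(last=False)   # drop the least recently used
        self.store[content] = True
        return False

cache = LRUCache(capacity=2)
hits = [cache.request(c) for c in ["a", "b", "a", "c", "b"]]
print(hits)  # [False, False, True, False, False]: "b" was evicted before its second request
```

LFU and FIFO differ only in the eviction key (access count and insertion time, respectively), which is why all three can fail to track the shifting popularity that the DRL agent learns.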

Figure 5 shows the performance comparison in terms of cache hit ratio, delay, and traffic at F = 1000 and C = 100 MB. At the beginning of the simulation, our proposed caching strategy is clearly at a disadvantage in all three aspects, but its hit rate soon increases and eventually stabilizes. This is because our reward function is designed around the cache hit rate, so the DRL agent is dedicated to maximizing the system hit rate. Our caching strategy achieves a hit rate 9%, 12%, and 14% higher than LRU, LFU, and FIFO, respectively. At the same time, the improved hit rate has a positive impact on the delay and traffic indicators: the delay of our strategy is 12%, 17%, and 21% lower than that of LRU, LFU, and FIFO, respectively, and the traffic saved is 8%, 10%, and 14%, respectively.

In addition, we explored the effect of content quantity on the comparison results by evaluating the performance when the number of contents is 1000 and 2000. As shown in Figure 6, when the number of contents increases, the convergence of the algorithm changes and the hit rate decreases; however, this does not change the overall trend of the algorithm. Our


Figure 5: Performance evaluation in terms of hit rate, delay, and traffic with respect to time, comparing the DRL, LRU, LFU, and FIFO algorithms. (a) Hit rate (%); (b) delay (×10⁵ ms); (c) cellular traffic (×10³ MB).

Table 1: Parameter values.
Parameter: F | w [MHz] | σ² [dBm] | M | M′ | γ | ε | α | φ
Range: 1000 | 200 | −95 | 5000 | 200 | 0.9 | 0.1 | 0.05 | 250

caching strategy still performs best among the four algorithms.

Finally, we explored the effects of the learning rate and the exploration probability on algorithm performance. As shown in Figure 7, the learning rate is set to 0.5 and 0.05, and the exploration probability to 0.1 and 0.5, respectively. Both factors have a great impact on the caching strategy, mainly in terms of convergence and performance. Thus, a large number of experiments were performed to find an appropriate learning rate and exploration probability for the proposed edge caching scenarios. Hence, α = 0.05 and ε = 0.1 are selected in our setting for better performance.
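The exploration probability ε governs the classic ε-greedy trade-off in the agent's action selection. A minimal sketch of our own (illustrative code, not the paper's implementation), using the selected ε = 0.1:

```python
import random

def epsilon_greedy(q_values, epsilon):
    """With probability epsilon pick a random action, otherwise the greedy one."""
    if random.random() < epsilon:
        return random.randrange(len(q_values))  # explore
    return max(range(len(q_values)), key=lambda a: q_values[a])  # exploit

random.seed(0)  # for reproducibility of this sketch
q = [0.1, 0.9, 0.3]
actions = [epsilon_greedy(q, epsilon=0.1) for _ in range(1000)]
print(actions.count(1) / 1000)  # roughly 0.9 + 0.1/3: mostly the greedy action
```

A larger ε explores more of the caching action space but sacrifices steady-state hit rate, which matches the convergence behavior observed in Figure 7.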

7. Conclusions

In this paper, we study the edge caching strategy of hierarchical wireless networks. Specifically, we use the Markov decision


Figure 6: Performance comparison between F = 1000 and F = 2000. (a) Hit rate (%); (b) delay (×10⁵ ms); (c) cellular traffic (×10³ MB).

process and Deep Reinforcement Learning in the proposed edge cache replacement strategy. Experimental results based on real traces show that our proposed strategy is superior to LRU, LFU, and FIFO in terms of hit rate, delay, and traffic offload. Finally, we also explored the impact of the learning rate and exploration probability on algorithm performance.

In the future, we will focus more on the user layer's impact on cache replacement. (1) In the existing D2D model, the transmission of files is not persistent, and complex user movement can interrupt content delivery; we will consider this factor in the reward function. (2) The cache replacement process incurs additional costs such as latency and energy consumption, all of which should be considered, but how to quantify these factors in the simulation still needs to be explored. (3) The computing resources of user devices are limited. Although Deep Reinforcement Learning can mitigate the problem of dimensional explosion, it still requires substantial computing resources; therefore, we will explore more lightweight learning algorithms in D2D-aided cellular networks.

Data Availability

The data used to support the findings of this study have not been made available for commercial reasons.


Figure 7: Performance of hit rate under different parameters. (a) Hit rate under α = 0.05 and α = 0.5; (b) hit rate under ε = 0.1 and ε = 0.5.

Conflicts of Interest

The authors declare that they have no conflicts of interest.

Acknowledgments

The conference version of this manuscript was first presented at the 2018 IEEE 15th International Conference on Mobile Ad Hoc and Sensor Systems (MASS). The authors have significantly extended the work in this journal version by exploiting the edge caching problem with a deep reinforcement learning framework. This work was supported in part by the National Key Research and Development Program of China under grant 2018YFC0809803 and in part by the Natural Science Foundation of China under grant 61702364.

References

[1] X. Wang, M. Chen, Z. Han et al., "TOSS: traffic offloading by social network service-based opportunistic sharing in mobile social networks," in Proceedings of the INFOCOM, pp. 2346–2354, 2014.

[2] M. Gregori, J. Gomez-Vilardebo, J. Matamoros, and D. Gunduz, "Wireless content caching for small cell and D2D networks," IEEE Journal on Selected Areas in Communications, vol. 34, no. 5, pp. 1222–1234, 2016.

[3] T. Rodrigues, F. Benevenuto, M. Cha, K. Gummadi, and V. Almeida, "On word-of-mouth based discovery of the web," in Proceedings of the 2011 ACM SIGCOMM Internet Measurement Conference, IMC'11, pp. 381–396, November 2011.

[4] J. Song, M. Sheng, T. Q. Quek, C. Xu, and X. Wang, "Learning based content caching and sharing for wireless networks," IEEE Transactions on Communications, vol. 99, pp. 1-1, 2017.

[5] N. Morozs, T. Clarke, and D. Grace, "Distributed heuristically accelerated Q-learning for robust cognitive spectrum management in LTE cellular systems," IEEE Transactions on Mobile Computing, vol. 15, no. 4, pp. 817–825, 2016.

[6] B. N. Bharath, K. G. Nagananda, and H. V. Poor, "A learning-based approach to caching in heterogenous small cell networks," IEEE Transactions on Communications, vol. 64, no. 4, pp. 1674–1686, 2016.

[7] M. Srinivasan, V. J. Kotagi, and C. S. R. Murthy, "A Q-learning framework for user QoE enhanced self-organizing spectrally efficient network using a novel inter-operator proximal spectrum sharing," IEEE Journal on Selected Areas in Communications, vol. 34, no. 11, pp. 2887–2901, 2016.

[8] X. Wang, M. Chen, T. Taleb, A. Ksentini, and V. C. M. Leung, "Cache in the air: exploiting content caching and delivery techniques for 5G systems," IEEE Communications Magazine, vol. 52, no. 2, pp. 131–139, 2014.

[9] M. Sheng, C. Xu, J. Liu, J. Song, X. Ma, and J. Li, "Enhancement for content delivery with proximity communications in caching enabled wireless networks: architecture and challenges," IEEE Communications Magazine, vol. 54, no. 8, pp. 70–76, 2016.

[10] E. Zeydan, E. Bastug, M. Bennis et al., "Big data caching for networking: moving from cloud to edge," IEEE Communications Magazine, vol. 54, no. 9, pp. 36–42, 2016.

[11] N. Golrezaei, A. Molisch, A. G. Dimakis, and G. Caire, "Femtocaching and device-to-device collaboration: a new architecture for wireless video distribution," IEEE Communications Magazine, vol. 51, no. 4, pp. 142–149, 2013.

[12] N. Golrezaei, K. Shanmugam, A. G. Dimakis, A. F. Molisch, and G. Caire, "FemtoCaching: wireless video content delivery through distributed caching helpers," in Proceedings of the IEEE Conference on Computer Communications, INFOCOM 2012, pp. 1107–1115, March 2012.

[13] B. Han, X. Wang, N. Choi, T. Kwon, and Y. Choi, "AMVS-NDN: adaptive mobile video streaming and sharing in wireless named data networking," in Proceedings of the 2013 IEEE Conference on Computer Communications Workshops (INFOCOM WKSHPS), pp. 375–380, April 2013.

[14] K. Shanmugam, N. Golrezaei, A. G. Dimakis, A. F. Molisch, and G. Caire, "FemtoCaching: wireless content delivery through distributed caching helpers," IEEE Transactions on Information Theory, vol. 59, no. 12, pp. 8402–8413, 2013.

[15] X. Li, X. Wang, S. Xiao, and V. C. Leung, "Delay performance analysis of cooperative cell caching in future mobile networks," in Proceedings of the 2015 IEEE International Conference on Signal Processing for Communications (ICC), pp. 5652–5657, June 2015.

[16] S. H. Chae, J. Y. Ryu, T. Q. Quek, and W. Choi, "Cooperative transmission via caching helpers," in Proceedings of the GLOBECOM 2015 - 2015 IEEE Global Communications Conference, pp. 1–6, San Diego, CA, USA, December 2015.

[17] J. Gu, W. Wang, A. Huang, H. Shan, and Z. Zhang, "Distributed cache replacement for caching-enable base stations in cellular networks," in Proceedings of the 2014 1st IEEE International Conference on Communications, ICC 2014, pp. 2648–2653, Australia, June 2014.

[18] C. Wang, S. Wang, D. Li, X. Wang, X. Li, and V. C. Leung, "Q-learning based edge caching optimization for D2D enabled hierarchical wireless networks," in Proceedings of the 2018 IEEE 15th International Conference on Mobile Ad Hoc and Sensor Systems (MASS), pp. 55–63, Chengdu, China, October 2018.

[19] P. Rodriguez, C. Spanner, and E. W. Biersack, "Analysis of web caching architectures: hierarchical and distributed caching," IEEE/ACM Transactions on Networking, vol. 9, no. 4, pp. 404–418, 2001.

[20] H. Che, Y. Tung, and Z. Wang, "Hierarchical web caching systems: modeling, design and experimental results," IEEE Journal on Selected Areas in Communications, vol. 20, no. 7, pp. 1305–1314, 2002.

[21] K. Poularakis and L. Tassiulas, "On the complexity of optimal content placement in hierarchical caching networks," IEEE Transactions on Communications, vol. 64, no. 5, pp. 2092–2103, 2016.

[22] J. Dai, Z. Hu, B. Li, J. Liu, and B. Li, "Collaborative hierarchical caching with dynamic request routing for massive content distribution," in Proceedings of the IEEE Conference on Computer Communications, INFOCOM 2012, pp. 2444–2452, March 2012.

[23] E. Bastug, M. Bennis, and M. Debbah, "Living on the edge: the role of proactive caching in 5G wireless networks," IEEE Communications Magazine, vol. 52, no. 8, pp. 82–89, 2014.

[24] M. Hefeeda and O. Saleh, "Traffic modeling and proportional partial caching for peer-to-peer systems," IEEE/ACM Transactions on Networking, vol. 16, no. 6, pp. 1447–1460, 2008.

[25] S. Wang, Y. Zhang, H. Wang, Z. Huang, X. Wang, and T. Jiang, "Large scale measurement and analytics on social groups of device-to-device sharing in mobile social networks," Mobile Networks and Applications, vol. 23, no. 2, pp. 203–215, 2017.

[26] A. Balasubramanian, B. Levine, and A. Venkataramani, "DTN routing as a resource allocation problem," in Proceedings of the ACM SIGCOMM 2007 Conference on Computer Communications, pp. 373–384, August 2007.

[27] X. Chen, L. Jiao, W. Li, and X. Fu, "Efficient multi-user computation offloading for mobile-edge cloud computing," IEEE/ACM Transactions on Networking, vol. 24, no. 5, pp. 2795–2808, 2016.

[28] T. H. Cormen, C. E. Leiserson, R. Rivest et al., An Introduction to Algorithms, MIT Press, Cambridge, MA, USA, 2nd edition, 2001.

[29] H. V. Hasselt, A. Guez, and D. Silver, "Deep reinforcement learning with Double Q-learning," in Proceedings of the AAAI, pp. 2094–2100, 2016.

[30] X. Li, X. Wang, P. Wan, Z. Han, and V. C. Leung, "Hierarchical edge caching in device-to-device aided mobile networks: modeling, optimization, and design," IEEE Journal on Selected Areas in Communications, vol. 36, no. 8, pp. 1768–1785, 2018.

[31] 3GPP, "Further advancements for E-UTRA physical layer aspects (release 9)," Tech. Rep. 36.814 V1.2.0, 2009.


[17] J GuWWang A Huang H Shan and Z Zhang ldquoDistributedcache replacement for caching-enable base stations in cellularnetworksrdquo in Proceedings of the 2014 1st IEEE InternationalConference on Communications ICC 2014 pp 2648ndash2653Australia June 2014

[18] C Wang S Wang D Li X Wang X Li and V C LeungldquoQ-learning based edge caching optimization for D2D enabledhierarchical wireless networksrdquo in Proceedings of the 2018 IEEE15th International Conference on Mobile Ad Hoc and SensorSystems (MASS) pp 55ndash63 Chengdu China October 2018

[19] P Rodriguez C Spanner and E W Biersack ldquoAnalysis of webcaching architectures Hierarchical and distributed cachingrdquoIEEEACM Transactions on Networking vol 9 no 4 pp 404ndash418 2001

[20] H Che Y Tung and Z Wang ldquoHierarchical web cachingsystems modeling design and experimental resultsrdquo IEEEJournal on Selected Areas in Communications vol 20 no 7 pp1305ndash1314 2002

[21] K Poularakis and L Tassiulas ldquoOn the complexity of optimalcontent placement in hierarchical caching networksrdquo IEEETransactions on Communications vol 64 no 5 pp 2092ndash21032016

[22] J Dai Z Hu B Li J Liu and B Li ldquoCollaborative hierarchicalcaching with dynamic request routing for massive content dis-tributionrdquo in Proceedings of the IEEE Conference on ComputerCommunications INFOCOM 2012 pp 2444ndash2452March 2012

[23] E Bastug M Bennis and M Debbah ldquoLiving on the edgethe role of proactive caching in 5G wireless networksrdquo IEEECommunications Magazine vol 52 no 8 pp 82ndash89 2014

[24] M Hefeeda and O Saleh ldquoTraffic modeling and proportionalpartial caching for peer-to-peer systemsrdquo IEEEACM Transac-tions on Networking vol 16 no 6 pp 1447ndash1460 2008

[25] S Wang Y Zhang H Wang Z Huang X Wang and T JiangldquoLarge scale measurement and analytics on social groups ofdevice-to-device sharing in mobile social networksrdquo MobileNetworks and Applications vol 23 no 2 pp 203ndash215 2017

[26] A Balasubramanian B Levine and A Venkataramani ldquoDTNrouting as a resource allocation problemrdquo in Proceedings of theACM SIGCOMM 2007 Conference on Computer Communica-tions pp 373ndash384 August 2007

[27] X Chen L Jiao W Li and X Fu ldquoEfficient multi-user compu-tation offloading formobile-edge cloud computingrdquo IEEEACMTransactions on Networking vol 24 no 5 pp 2795ndash2808 2016

[28] T H Cormen C E Leiserson R Rivest et al An Introductionto Algorithms MIT Press Cambridge MA USA 2nd edition2001

[29] H V Hasselt A Guez and D Silver ldquoDeep reinforcementlearning with Double Q-learningrdquo in Proceedings of the AAAIpp 2094ndash2100 2016

[30] X Li X Wang P Wan Z Han and V C Leung ldquoHierarchicaledge caching in device-to-device aided mobile networks mod-eling optimization and designrdquo IEEE Journal on Selected Areasin Communications vol 36 no 8 pp 1768ndash1785 2018

[31] 3GPP ldquoFurther advancements for E-UTRA physical layeraspects (release 9)rdquo Tech Rep 36814 V120 2009

International Journal of

AerospaceEngineeringHindawiwwwhindawicom Volume 2018

RoboticsJournal of

Hindawiwwwhindawicom Volume 2018

Hindawiwwwhindawicom Volume 2018

Active and Passive Electronic Components

VLSI Design

Hindawiwwwhindawicom Volume 2018

Hindawiwwwhindawicom Volume 2018

Shock and Vibration

Hindawiwwwhindawicom Volume 2018

Civil EngineeringAdvances in

Acoustics and VibrationAdvances in

Hindawiwwwhindawicom Volume 2018

Hindawiwwwhindawicom Volume 2018

Electrical and Computer Engineering

Journal of

Advances inOptoElectronics

Hindawiwwwhindawicom

Volume 2018

Hindawi Publishing Corporation httpwwwhindawicom Volume 2013Hindawiwwwhindawicom

The Scientific World Journal

Volume 2018

Control Scienceand Engineering

Journal of

Hindawiwwwhindawicom Volume 2018

Hindawiwwwhindawicom

Journal ofEngineeringVolume 2018

SensorsJournal of

Hindawiwwwhindawicom Volume 2018

International Journal of

RotatingMachinery

Hindawiwwwhindawicom Volume 2018

Modelling ampSimulationin EngineeringHindawiwwwhindawicom Volume 2018

Hindawiwwwhindawicom Volume 2018

Chemical EngineeringInternational Journal of Antennas and

Propagation

International Journal of

Hindawiwwwhindawicom Volume 2018

Hindawiwwwhindawicom Volume 2018

Navigation and Observation

International Journal of

Hindawi

wwwhindawicom Volume 2018

Advances in

Multimedia

Submit your manuscripts atwwwhindawicom


Wireless Communications and Mobile Computing 11

[Figure 7: Performance of hit rate under different parameters. (a) Hit rate versus time under α = 0.05 and α = 0.5. (b) Hit rate versus time under ε = 0.1 and ε = 0.5.]

Conflicts of Interest

The authors declare that they have no conflicts of interest.

Acknowledgments

A conference version of this manuscript was first presented at the 2018 IEEE 15th International Conference on Mobile Ad Hoc and Sensor Systems (MASS). The authors have significantly extended that work in this journal version by exploring the edge caching problem with a deep reinforcement learning framework. This work was supported in part by the National Key Research and Development Program of China under Grant 2018YFC0809803 and in part by the Natural Science Foundation of China under Grant 61702364.

References

[1] X. Wang, M. Chen, Z. Han et al., "TOSS: traffic offloading by social network service-based opportunistic sharing in mobile social networks," in Proceedings of the IEEE INFOCOM, pp. 2346–2354, 2014.

[2] M. Gregori, J. Gomez-Vilardebo, J. Matamoros, and D. Gunduz, "Wireless content caching for small cell and D2D networks," IEEE Journal on Selected Areas in Communications, vol. 34, no. 5, pp. 1222–1234, 2016.

[3] T. Rodrigues, F. Benevenuto, M. Cha, K. Gummadi, and V. Almeida, "On word-of-mouth based discovery of the web," in Proceedings of the 2011 ACM SIGCOMM Internet Measurement Conference (IMC '11), pp. 381–396, November 2011.

[4] J. Song, M. Sheng, T. Q. Quek, C. Xu, and X. Wang, "Learning based content caching and sharing for wireless networks," IEEE Transactions on Communications, vol. 99, pp. 1–1, 2017.

[5] N. Morozs, T. Clarke, and D. Grace, "Distributed heuristically accelerated Q-learning for robust cognitive spectrum management in LTE cellular systems," IEEE Transactions on Mobile Computing, vol. 15, no. 4, pp. 817–825, 2016.

[6] B. N. Bharath, K. G. Nagananda, and H. V. Poor, "A learning-based approach to caching in heterogenous small cell networks," IEEE Transactions on Communications, vol. 64, no. 4, pp. 1674–1686, 2016.

[7] M. Srinivasan, V. J. Kotagi, and C. S. R. Murthy, "A Q-learning framework for user QoE enhanced self-organizing spectrally efficient network using a novel inter-operator proximal spectrum sharing," IEEE Journal on Selected Areas in Communications, vol. 34, no. 11, pp. 2887–2901, 2016.

[8] X. Wang, M. Chen, T. Taleb, A. Ksentini, and V. C. M. Leung, "Cache in the air: exploiting content caching and delivery techniques for 5G systems," IEEE Communications Magazine, vol. 52, no. 2, pp. 131–139, 2014.

[9] M. Sheng, C. Xu, J. Liu, J. Song, X. Ma, and J. Li, "Enhancement for content delivery with proximity communications in caching enabled wireless networks: architecture and challenges," IEEE Communications Magazine, vol. 54, no. 8, pp. 70–76, 2016.

[10] E. Zeydan, E. Bastug, M. Bennis et al., "Big data caching for networking: moving from cloud to edge," IEEE Communications Magazine, vol. 54, no. 9, pp. 36–42, 2016.

[11] N. Golrezaei, A. Molisch, A. G. Dimakis, and G. Caire, "Femtocaching and device-to-device collaboration: a new architecture for wireless video distribution," IEEE Communications Magazine, vol. 51, no. 4, pp. 142–149, 2013.

[12] N. Golrezaei, K. Shanmugam, A. G. Dimakis, A. F. Molisch, and G. Caire, "FemtoCaching: wireless video content delivery through distributed caching helpers," in Proceedings of the IEEE Conference on Computer Communications (INFOCOM 2012), pp. 1107–1115, March 2012.

[13] B. Han, X. Wang, N. Choi, T. Kwon, and Y. Choi, "AMVS-NDN: adaptive mobile video streaming and sharing in wireless named data networking," in Proceedings of the 2013 IEEE Conference on Computer Communications Workshops (INFOCOM WKSHPS), pp. 375–380, April 2013.

[14] K. Shanmugam, N. Golrezaei, A. G. Dimakis, A. F. Molisch, and G. Caire, "FemtoCaching: wireless content delivery through distributed caching helpers," IEEE Transactions on Information Theory, vol. 59, no. 12, pp. 8402–8413, 2013.

[15] X. Li, X. Wang, S. Xiao, and V. C. Leung, "Delay performance analysis of cooperative cell caching in future mobile networks," in Proceedings of the 2015 IEEE International Conference on Communications (ICC), pp. 5652–5657, June 2015.

[16] S. H. Chae, J. Y. Ryu, T. Q. Quek, and W. Choi, "Cooperative transmission via caching helpers," in Proceedings of the 2015 IEEE Global Communications Conference (GLOBECOM), pp. 1–6, San Diego, CA, USA, December 2015.

[17] J. Gu, W. Wang, A. Huang, H. Shan, and Z. Zhang, "Distributed cache replacement for caching-enable base stations in cellular networks," in Proceedings of the 2014 IEEE International Conference on Communications (ICC), pp. 2648–2653, Australia, June 2014.

[18] C. Wang, S. Wang, D. Li, X. Wang, X. Li, and V. C. Leung, "Q-learning based edge caching optimization for D2D enabled hierarchical wireless networks," in Proceedings of the 2018 IEEE 15th International Conference on Mobile Ad Hoc and Sensor Systems (MASS), pp. 55–63, Chengdu, China, October 2018.

[19] P. Rodriguez, C. Spanner, and E. W. Biersack, "Analysis of web caching architectures: hierarchical and distributed caching," IEEE/ACM Transactions on Networking, vol. 9, no. 4, pp. 404–418, 2001.

[20] H. Che, Y. Tung, and Z. Wang, "Hierarchical web caching systems: modeling, design and experimental results," IEEE Journal on Selected Areas in Communications, vol. 20, no. 7, pp. 1305–1314, 2002.

[21] K. Poularakis and L. Tassiulas, "On the complexity of optimal content placement in hierarchical caching networks," IEEE Transactions on Communications, vol. 64, no. 5, pp. 2092–2103, 2016.

[22] J. Dai, Z. Hu, B. Li, J. Liu, and B. Li, "Collaborative hierarchical caching with dynamic request routing for massive content distribution," in Proceedings of the IEEE Conference on Computer Communications (INFOCOM 2012), pp. 2444–2452, March 2012.

[23] E. Bastug, M. Bennis, and M. Debbah, "Living on the edge: the role of proactive caching in 5G wireless networks," IEEE Communications Magazine, vol. 52, no. 8, pp. 82–89, 2014.

[24] M. Hefeeda and O. Saleh, "Traffic modeling and proportional partial caching for peer-to-peer systems," IEEE/ACM Transactions on Networking, vol. 16, no. 6, pp. 1447–1460, 2008.

[25] S. Wang, Y. Zhang, H. Wang, Z. Huang, X. Wang, and T. Jiang, "Large scale measurement and analytics on social groups of device-to-device sharing in mobile social networks," Mobile Networks and Applications, vol. 23, no. 2, pp. 203–215, 2017.

[26] A. Balasubramanian, B. Levine, and A. Venkataramani, "DTN routing as a resource allocation problem," in Proceedings of the ACM SIGCOMM 2007 Conference on Computer Communications, pp. 373–384, August 2007.

[27] X. Chen, L. Jiao, W. Li, and X. Fu, "Efficient multi-user computation offloading for mobile-edge cloud computing," IEEE/ACM Transactions on Networking, vol. 24, no. 5, pp. 2795–2808, 2016.

[28] T. H. Cormen, C. E. Leiserson, R. Rivest et al., Introduction to Algorithms, MIT Press, Cambridge, MA, USA, 2nd edition, 2001.

[29] H. V. Hasselt, A. Guez, and D. Silver, "Deep reinforcement learning with Double Q-learning," in Proceedings of the AAAI Conference on Artificial Intelligence, pp. 2094–2100, 2016.

[30] X. Li, X. Wang, P. Wan, Z. Han, and V. C. Leung, "Hierarchical edge caching in device-to-device aided mobile networks: modeling, optimization, and design," IEEE Journal on Selected Areas in Communications, vol. 36, no. 8, pp. 1768–1785, 2018.

[31] 3GPP, "Further advancements for E-UTRA physical layer aspects (release 9)," Tech. Rep. 36.814 V1.2.0, 2009.
