demonstration tpredictor: a micro-blog based system for teenagers' stress...
TRANSCRIPT
tPredictor: A Micro-blog Based System for Teenagers’Stress Prediction
Jing Huang, Qi Li, Zhuonan Feng, Yiping Li, Ling FengDept. of Computer Science and Technology
Centre for Computational Mental Healthcare Research, Institute of Data ScienceTsinghua University, Beijing 100084, China
{j-huang14,liqi13}@mails.tsinghua.edu.cn, [email protected],[email protected]
ABSTRACTToo much stress is easy to do harm to the physical andpsychological health of teenagers, because most teenagersneither have the ability to cope with the stress by them-selves nor like seeking adults’ help initiatively. Social mediahas demonstrated its feasibility in detecting teenagers’ stresswith the micro-blog having become a popular channel forteenagers’ self-expression. In this demonstration, we presenta system called tPredictor, which can predict teenagers’ fu-ture stress based on the social media micro-blog. Two ques-tions are to be resolved: (1) what will the stress level ofthe teenager(s) be in the next time unit? (2) how will thestress level of the teenager(s) change (increase, decrease, re-main unchanged) in the next time unit? tPredictor tack-les the above prediction questions, taking into account theinfluence of future stressful events on teenagers’ emotion.Considering the similarity of stock price movement and thestress level change, we define the stress candlestick chartsand the stress reversal signals for stress change trend pre-diction. tPredictor can predict the stress of both individ-uals and a group of teenagers, which provides a platformfor teenagers themselves, their parents or some institutionssuch as schools to know teenagers’ future stress for takingmeasures timely to prevent the serious consequences.
KeywordsMicro-blog, teenager, psychological pressure, prediction
1. INTRODUCTIONNowadays, teenagers are inevitably suffering much stress
from various aspects. A survey, conducted by the Ameri-can Psychological Association in August 2013, showed thatthe teenagers were the most stressed-out age group in theU.S [1]. Due to the unsoundness of teenagers’ psycholog-ical regulation mechanism, too much stress contributes toteenagers’ bad consequences more easily such as sleepless-
c⃝2016, Copyright is with the authors. Published in Proc. 19th Inter-national Conference on Extending Database Technology (EDBT), March15-18, 2016 - Bordeaux, France: ISBN 978-3-89318-070-7, on OpenPro-ceedings.org. Distribution of this paper is permitted under the terms of theCreative Commons license CC-by-nc-nd 4.0
ness, depression and even suicide. However, for most teenager-s, they lack experience and can’t realize the seriousness oftheir stress so that they seldom actively seek for adults’ help.And for guardians, parents can’t be around and focus onteenagers every minute and even abundant teachers are notsufficient to attend to each student. Therefore, it is very use-ful to find methods for teenagers to understand their ownstress, for parents to timely know teenagers’ stress situa-tion when teenagers don’t initiatively reveal their stress tothem and for teachers to be able to get hold of all the s-tudents’ stress status. Nowadays, micro-blog has become amajor channel for teenagers’ self-expression. Teenagers postso many tweets expressing their personal emotions everyday,which provides abundant available data to detect teenagers’stress and further predict their stress change. In the lit-erature, many researchers have studied using micro-blog toanalyze people’s mental health. [8] found the differencebetween depressed and non-depressed people through ana-lyzing their tweeting contents and behaviors . [9] proposeda depression detection model based on the sentiment anal-ysis of the micro-blog. [10, 5] provided a machine learn-ing method to detect teenagers’ stress of study, affection,inter-personal and self-cognition. However, all the studiesaimed to detect the existing psychological problems wherethe harm has done in fact. To prevent bad consequences,[3] investigated people’s social media behaviors and built astatistical classifier to predict the depression. Our previouswork focused on the teenagers group and predicted theirfuture stress using a stress level time series detected frommicro-blog. It integrated the stressful event to predict fu-ture stress level and defined the stress candlestick chart topredict future stress general change trend [7, 6].
In this demonstration, we present a system called tPredictorto predict teenagers’ future stress value and change trend.It is a system implementation of our previous work [7, 6]which can visualize the prediction results and enable usersto easily understand the stress prediction results. For thefunctionalities, we further extend the system to the groupstress prediction. It aggregates all the prediction results ofteenagers and presents some statistical results of the groupstress prediction. It also picks up the teenagers needing helpbased on out predicted future stress status from a group ofteenagers. tPredictor provides a straightforward way forusers to obtain the most important information, as well asget hold of the overall stress situation and meanwhile knowwell each teenager’s stress status. Therefore, our system canbe applied to both institutional users, such as schools and
Demonstration
Series ISSN: 2367-2005 612 10.5441/002/edbt.2016.61
StressSequences
Teenagers Tweets
StressPrediction
Stress Value
Prediction
Stress Trend
Prediction
DataPreparation
Stress SequenceAggregation
Missing DataImputation
Stress Candlestick
Chart Generation
User Requests
User
Prediction Results
UserInteraction
Result Presentation
Figure 1: System Architecture
individual users such as teenagers themselves and parents.
2. SYSTEM ACHITECTUREFig.1 shows the system framework including the User In-
teraction, the Data preparation and the Stress Prediction.
2.1 User InteractionUser Requests receives users’ requests including adding
new teenagers to their concerning list, choosing teenagersto be predicted, and setting prediction parameters like thetime granularity and the event information.Result Presentation aims to visualize the prediction
results for users’ easy understanding. It analyzes the groupprediction results and picks up the most important informa-tion to users. In details, it aggregates teenagers by differentpredicted stress values and change trends, and presents theirdistributions through lucid charts, which helps users get holdof the overall stress situation. Besides, it demonstrates eachteenager’s predicted stress together, which enables users todo the comparative analysis and quickly find the teenagerswhose future stress will be severe.
2.2 Data PreparationThe Data preparation preprocesses the original stress se-
quences which detected from teenagers’ micro-blog [10].Stress Sequence Aggregation aggregates the tweets’
stress sequence with different time granularity (day, month,etc.) to obtain six stress-related indexes. The stress se-quence is the Stress(T ): (t1, l1), (t2, l2)...(tn, ln), where tirepresents the time of ith tweet, and the li represents thestress level of the ith tweet. The six stress-related indexesinclude the max stress level Lmax, the min stress level Lmin,the average stress level Lavg, the sum stress level Lsum, thenumber of stress tweets Lscount, the total number of tweetsLcount, and the proportion of stress tweets Lproportion in theaggregated time interval.Missing Data Imputation aims to solve the missing
data problem which is because of the casualness of users’tweeting time and frequency. We apply the Gaussian Pro-cess Regression to do the data imputation. The adjacentdata, including the stress value before and after the miss-ing data, will be used to inference the missing value. The(∆t, L) , where ∆t and L denote the time distance and thestress level of the adjacent time unit, is used as the inputfeature vector form. We use non-missing data as the train-ing data and then get the estimated value of the missingdata through the trained model. Our experiments showedthat the Gaussian Process Regression performed better thanother imputation methods including the nearest mean ap-proach, the linear interpolation and exponential smoothing.
First
Last
1 2 3 4 5 6 7 8 9
Last
First
Long upper
shadow
Big Stress decrease
First=Last
Stress decrease
Decreasing
candlestick
Special
Candlesticks
Last
First
Long lower
shadow
Big Stress increase
First=Last
Stress increase
Increasing
candlestick
Special
Candlesticks
Last
First
1 2 3 4 5 6 7 8 9
First
stress level
Last
stress level
Lowest
stress level
Highest
stress level
Body
Upper
shadow
Lower
shadow
(a) The Structure (b) The Decreasing Reversal Signals (c)The Increasing Reversal Signals
Figure 2: The Stress Candlestick Charts
Stress Candlestick Chart Generation generates thestress candlestick chart SC and its corresponding featurevector SCF . The stress candlestick derives from the can-dlestick chart of stock price due to some similarities be-tween stock price movement and stress level change. Forinstance, the stock price is influenced by the game betweensellers and buyers and the related events of companies whilethe stress level is influenced by the personal self-regulationmechanism and the stressful events. The stress candlestickchart SC is defined as (Lfirst, Llast, Lhigh, Llow), namely thefirst, last, highest and lowest stress level in the time unit re-spectively. The stress candlestick chart feature vector SCFis defined as a five tuples (shape, bodylen, upperlen, low−erlen, changeslope) which represents the shape, body length,uppershadow length, lowershadow length of the SC as Fig. 2(a) shows and the stress change rate between two SCs.
2.3 Stress Prediction
2.3.1 Stress Value PredictionStress Value Prediction aims to predict the max, min and
average stress level of the next time unit. The three stressvalues can comprehensively represent the teenagers’ stressstatus and are also the major concerns to their followers.
Feature-Aware Time Series Prediction. Accordingto Granger Causality analysis, the Lmax, Lmin and Lavg
are related to not only their past values but also the oth-er stress-related indexes such as the Lsum, Lscount, Lcount
and Lproportion. Based on the confidence of 95%, we findthat the Lmax is correlated to Lsum, Lmin is correlatedto (Lproportion, Lcount), and Lavg is correlated to (Lsum,Lproportion). We apply the seasonal Autoregressive Integrat-ed Moving Average (SVARIMA)[2] approach to our prob-lem. The key function of the stress value prediction is:
Ln+1 = C +
k−1∑i=0
AiLn−i +
k−1∑i=0
BiXn−i +
r−1∑i=0
θiεn−i + εn+1
Ln+1 is the stress value we want to predict, where L isthe past values of the corresponding predicted stress in-dexes (max, min and average). The X represents differ-ent features of the other correlated stress value sequencessuch as the (Lsum, Lproportion) to Lavg. The order k and ris determined by the Akaike’s Information Criterion (AIC)and other parameters are estimated through damped leastsquares method. εi is the white noise error terms satisfyingE(εi) = 0, and C is a constant.
Stressful Event Influence. The SVARIMA model pre-dicts stress values based on the historical stress change pat-tern. However, some happened or forthcoming stressful eventsmay add extra influence which can’t be neglected. Hence, weuse the Moving Average Convergence/Divergence(MACD)to depict the extra stressful event influence. We first find
613
TeenStressPredictor Homepage Introduction Hello Login Help
Your username Your password
Register Login
Figure 3: The System Homepage
Search
Add
Choose Teenagers
Add Teenagers
Set Parameters
Predict
Study Early
Previous Next1 2 3 4 5
Figure 4: User Requests
the stressful events and divide them into study and emotionevent sets according to the users’ tweets. Then we determinethe start and end of the event influence cycle along with thestress value: 0. In the event influence cycle, we computethe MACD of the stress values, which can get a sequencerepresenting the extra event influence. For the two eventinfluence sequence sets, we use the Generalized SequentialPattern(GSP) mining algorithm to mine the frequent se-quence with 50% confidence to represent the extra influenceof the two kinds of events. Finally, we divide equally themined sequence into early, middle and later stage, and usethe average MACD value of each stage as the adjustmentvalue, which will be added to the original value predictedby SVARIMA model, if the predicted time is in the corre-sponding stage of different kinds of stressful events.
2.3.2 Stress Trend PredictionStress Trend Prediction aims to predict the future stress
change trend including increase, decrease and remaining un-changed. We explore the candlestick chart to predict thestress trend. It outperforms the trend prediction methodusing the predicted stress values to subtract the last one,which might be a small fluctuation within the future trend.Stress Reversal Signals. Fig. 2 (b) shows the decreas-
ing reversal signals in a increasing process and Fig. 2 (c)shows the increasing reversal signals in a decreasing process.We take the decreasing reversal signals as examples for in-terpretation. For the 1-4 decreasing reversal signals, their
last stress level is lower than the first stress level, which rep-resents a decrease in the time unit of the stress candle stickand indicates a continual decrease trend in the future. Forthe 5-6 decreasing reversal signals, their max stress level ismuch higher than the last stress level. It indicates the heavystress is gradually released through teenagers’ stress regula-tion mechanism in the time unit of the stress candle stick,where the release mechanism is likely to remain effective inthe future. For the 8-9 decreasing reversal signals, their firststress level is the same as the last stress level which meansthe power of stress release balances the stress accumulation.In the past increasing process, the power of stress accumu-lation is stronger than the stress release and now they arebalanced. Therefore, the power of releasing stress may be-come stronger than accumulating stress next, which signifiesthe stress will decrease in the future.
Trend Decision Making. Let n be the current time, wetrace back to the nearest local highest or lowest stress leveland form a stress pattern Pcurrent: (SCFn−k+1, ..., SCFn)sequence, where k is the length of Pcurrent.[Case 1 ] If SCFn is not the reversal signal, the stress changetrend will continue.[Case 2 ] If SCFn is the reversal signal, we decide whetherthe stress change trend will reverse according to the past
experience. Firstly, we define the distance D(SCFi, SCF′i )
between two SCF s to find similar past stress pattern Ppast
to the current stress pattern Pcurrent.
D(SCFi, SCFj) =5∑
k=1
wkD(fik, fjk),5∑
k=1
wk = 1.
Here, fik and fjk denote feature values of SCFi and SCFj ,and wk is the parameter determined by the analytic hierar-chy process (AHP). For the nominal feature Shape of SCF ,
D(fi1, fj1) =
{1, if fi1 ̸= fj10, if fi1 = fj1.
.
For other four numeric features, D(fik, fjk) = |f′ik − f
′jk|,
j = 2, · · · , 5, where f′ik and f
′jk are the normalized fik and
fjk between 0 and 1. We set a distance threshold to judgewether two SCF s match successfully. After matching thelast SCFn−k+1, we get a past pattern Ppast with length oft. If the matchrate = k/t is higher than a threshold, weconsider the two patterns match successfully. Finally, weobtain a set of matched patterns Pmatched and observe thefuture trend after the reversal signals in Pmatched, where thestress level time series have been segmented [4] to eliminatethe influence of fluctuation to get the general change trend.We choose the majority change trends of Pmatched as thepredicted result. If there is no matched sequence, the stresschange trend will be predicted to reverse.
3. EVALUATIONWe collect stress sequences of 91 teenagers obtained by
the stress detection method whose precision is proved to be82.6% [10], for the prediction results evaluation. For thestress value prediction, integrating stressful events and S-VARIMA model reduces the MAPE (Mean Absolute Per-centage Error) of the predicted max, min and average stresslevel by 18%, 43%, and 3% separately, compared to the s-ingle SVARIMA model. For the stress trend prediction, theprecision of our method achieves 83.79% which outperform-s the time series (74.82%), MACD (63.59%) and KDJ (S-tochastic Oscillator) (66.73%) based prediction approaches.
614
Michal
Group1
Michael 2015-8-27 2015-9-22
Aug 3,2015 Sep 21,2015
Ca
nd
lestick
Ch
art
Accu
mu
latio
n
Username:
Group:
Predicted Vaule
Change Trend:
Max :
Min :
Average:
Average Show
18,Sep12,Sep 15,Sep9,Sep6,Sep3,Sep31,Aug28,Aug
Saturday, Sep 6, 2015
*Average: 3.2
First: 3
Highest: 4
Lowest: 3
Last: 2
*Accumulation: 25
*Sum of tweets: 6
Now
10
20
6
2
4
Stre
ss A
ccum
ula
tion
Stre
ss C
an
dle
stick C
ha
rt
Aug 27, 2015 Sep 22, 2015
3.6
2.0
2.4
2
4
1
3
Increase
From To
18,Sep12,Sep 15,Sep9,Sep6,Sep3,Sep31,Aug28,Aug
Now
0
Michal :
I feel so nervous about the
exam! 13:20 Sep 4, 2015
Event Type: Study
Stress Level
Event-Exam
(a) Individual Prediction Results
Increase Not-change Decrease
33.33%(8)
41.67%(10)
25.00%
(6)
Slow increasing
from low stress level
50
%(4
)
12
.5%
(1)
12.5%(1)
25%(2)
Max Average Min
Max Average Min
Name: Bob
Max: 4.6
Average: 4.1
Min: 3.7
Teenagers number
Stress level
Teenagers ID
0% 0%
8.33%
12.5% 12.5%12.5%
16.67% 16.67%
33.33%33.33%33.33%
29.16%29.16%
25%
37.5%
Stress level
Name Group Trend
Class1
Class1
Class1
Class1
Class1
Class1
Class1
Class1
Increase
Increase
Increase
Increase
Increase
Increase
Increase
Increase
Ann
Mike
Paul
Amy
Bob
John
May
Joy
Name Group Trend
Class1
Class1
Class1
Class1
Increase
Increase
Increase
Increase
Ann
Mike
Amy
Joy
Slow increasing
from high stress level
Fast increasing
from high stress level
Fast increasing
from low stress level
(b) Group Prediction Results
Figure 5: Prediction Results
4. SYSTEM DEMONSTRATIONDuring the demonstration, attendees can access tPredictor
through the browser and experience our friendly interaction.First, the user enters the home page of tPredictor and
logins with his/her account as Fig. 3 shows.After logining, the user enters the second page as Fig. 4
shows. In this page, the user can view the profile of theteenagers listed based on the user’s search. The user canclick teenagers in the list to see more detailed informationsuch as the photo and self-introduction. Besides, a newteenager can be added to his/her teenagers list. For the pre-diction, the user chooses one or a group of teenagers to bepredicted and sets the parameters including the time granu-larity and the stressful event information. Here the user setthe time granularity to be day, event type to be study andthe event stage to be early about the upcoming exam.When the user chooses one teenager to be predicted, the
prediction results are presented through two charts as Fig. 5(a) shows. The stress curve exhibits the past and predict-ed stress values of the teenager, where the original tweetscontent and event information can also be presented in thischart when the user clicks the corresponding point. Thestress candlestick chart is to demonstrate the future stresschange trend of the teenager and the detailed stress-relatedindexes are also shown in this chart.When the user chooses a group of teenagers to be pre-
dicted, he/she can see both individual prediction results oftwo charts in Fig. 5 (a) and the statistical group predictionresults as Fig. 5 (b) shows. The two doughnut charts tellthe user about the teenagers proportion of different futurestress change trends and the corresponding teenagers will belisted directly when the user clicks one part of the dough-nut chart. It helps the user easily know whose stress willincrease. The histogram shows the corresponding numberof teenagers whose future stress values are in different stresslevel range. Through the histogram, the user can understandthe overall stress situation of the group in the future. For ex-ample, the stress situation of the group is severe when mostteenagers’ future stress values are between 2-5. The line
chart shows the predicted stress value of each teenager ofthe group, through which the user can easily find teenagerswho have much higher stress level
AcknowledgementThe work is supported by National Natural Science Foun-dation of China (61373022, 61532015, 71473146) and Chi-nese Major State Basic Research Development 973 Program(2015CB352301).
5. REFERENCES[1] APA’s 2013 Stress In America survey.
http://www.apa.org/news/press/releases/stress/2013, 2013.[2] G. Box and G. Jenkins. Time series analysis:Forecasting
and Control. Holden-Day, San Francisco, 1970.[3] M. De Choudhury, M. Gamon, S. Counts, and E. Horvitz.
Predicting depression via social media. In ICWSM, 2013.[4] J. Jiang, Z. Zhang, and H. Wang. A new segmentation
algorithm to stock time series based on pip approach. InWireless Communications, Networking and MobileComputing, 2007. WiCom 2007. International Conferenceon, pages 5609–5612. IEEE, 2007.
[5] Q. Li, Y. Xue, J. Jia, and L. Feng. Helping teenagersrelieve psychological pressures: A micro-blog based system.In EDBT, pages 660–663, 2014.
[6] Y. Li, Z. Feng, and L. Feng. Using candlestick charts topredict adolescent stress trend on micro-blog. ProcediaComputer Science, 63:221–228, 2015.
[7] Y. Li, J. Huang, H. Wang, and L. Feng. Predictingteenager’s future stress level from micro-blog. In Proc. ofCBMS, 2015.
[8] M. Park, D. W. McDonald, and M. Cha. Perceptiondifferences between the depressed and non-depressed usersin twitter. In Proc. of ICWSM, 2013.
[9] X. Wang, C. Zhang, Y. Ji, L. Sun, L. Wu, and Z. Bao. Adepression detection model based on sentiment analysis inmicro-blog social network. In Trends and Applications inKnowledge Discovery and Data Mining, pages 201–213.Springer, 2013.
[10] Y. Xue, Q. Li, L. Jin, L. Feng, D. A. Clifton, and G. D.Clifford. Detecting adolescent psychological pressures frommicro-blog. In Health Information Science, pages 83–94.Springer, 2014.
615