Internship Report: Fujitsu Research and Development Center, Beijing
Qiu Cheng
Report topics
Work at Fujitsu
Introduction to the Auto-Regressive and Moving Average (ARMA) model
Introduction to RHadoop
Work at Fujitsu
Studied data selection methods: TBSC, the mean method, and indicative segments
Optimized the ARMA model and the SVR model
Dynamically combined the ARMA model and the SVR model
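Mean method: basic steps
1. Find the five days whose values from 1:00 to 9:00 are closest, in Euclidean distance, to those of the forecast day.
2. Expand these five days using the Euclidean distance over the values from 10:00 to 20:00.
3. Cluster all days obtained in the first two steps into two classes with k-means.
4. Take the most recent same weekday before the forecast day as the judging day, compute its Euclidean distance to the two cluster centers, and select the nearer cluster.
5. Average the days in the selected cluster to obtain the forecast.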
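The five steps above can be sketched in plain Python. This is only an illustrative sketch under stated assumptions, not the implementation used in the internship: each day is assumed to be a list of 24 hourly values, the "expansion" in step 2 is assumed to mean adding each seed day's nearest other day over hours 10 to 20, and the names `euclid`, `kmeans2`, and `mean_method` are hypothetical.

```python
import math


def euclid(a, b):
    """Euclidean distance between two equal-length sequences."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def assign(points, centers):
    """Index of the nearest center for every point."""
    return [min(range(len(centers)), key=lambda c: euclid(p, centers[c]))
            for p in points]


def kmeans2(points, max_iter=50):
    """Tiny 2-means with a deterministic start (first point, farthest point)."""
    centers = [list(points[0]),
               list(max(points, key=lambda p: euclid(p, points[0])))]
    for _ in range(max_iter):
        labels = assign(points, centers)
        updated = []
        for c in range(2):
            members = [p for p, lab in zip(points, labels) if lab == c]
            # An empty cluster keeps its previous center.
            updated.append([sum(col) / len(members) for col in zip(*members)]
                           if members else centers[c])
        if updated == centers:
            break
        centers = updated
    return centers, assign(points, centers)


def mean_method(history, morning, judge_day, n_seed=5):
    """history: past days (24 hourly values each); morning: hours 1..9 of the
    forecast day; judge_day: profile of the most recent same weekday."""
    ids = range(len(history))
    # Step 1: the n_seed days closest to the forecast day over hours 1..9.
    seeds = sorted(ids, key=lambda d: euclid(history[d][1:10], morning))[:n_seed]
    # Step 2 (assumed meaning): add each seed's nearest other day over hours 10..20.
    pool = set(seeds)
    for s in seeds:
        others = [d for d in ids if d != s]
        pool.add(min(others, key=lambda d: euclid(history[d][10:21],
                                                   history[s][10:21])))
    days = [history[d] for d in sorted(pool)]
    # Step 3: split the pooled days into two clusters.
    centers, labels = kmeans2(days)
    # Step 4: keep the cluster whose center is nearest to the judging day.
    best = min(range(2), key=lambda c: euclid(centers[c], judge_day))
    chosen = [p for p, lab in zip(days, labels) if lab == best] or days
    # Step 5: the hourly mean of the chosen cluster is the forecast.
    return [sum(col) / len(chosen) for col in zip(*chosen)]
```

Calling `mean_method(history, morning, judge_day)` returns a 24-value forecast profile. The judging day is passed in directly so that the sketch does not need any calendar logic.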
Introduction to the ARMA model
Principles of the ARMA model
Optimizing the ARMA model
Using ARMA models in R
Basic principles of ARMA
Auto-Regressive model
Moving Average model
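For reference, the two components and their combination are usually written as follows. This is the standard textbook form; the symbols (constant $c$, mean $\mu$, coefficients $\varphi_i$ and $\theta_j$, white-noise error $\varepsilon_t$, orders $p$ and $q$) are the conventional ones and are assumed here, so they may differ from the notation on the slide.

```latex
\begin{aligned}
\text{AR}(p):\quad & X_t = c + \sum_{i=1}^{p} \varphi_i X_{t-i} + \varepsilon_t \\
\text{MA}(q):\quad & X_t = \mu + \varepsilon_t + \sum_{j=1}^{q} \theta_j \varepsilon_{t-j} \\
\text{ARMA}(p,q):\quad & X_t = c + \sum_{i=1}^{p} \varphi_i X_{t-i} + \varepsilon_t + \sum_{j=1}^{q} \theta_j \varepsilon_{t-j}
\end{aligned}
```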
[Figure: plot with vertical axis ticks from 0 to 100 and horizontal axis points X0 to X12]
Basic principles of ARMA
The autoregressive model describes the relationship between the current value and past values.
The moving average model describes the accumulated error of the autoregressive part.
The ARMA model combines the prediction of the autoregressive model with this accumulated error.
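To make the combination concrete, here is a minimal Python sketch of one-step-ahead ARMA prediction with coefficients that are already known. The function name `arma_predict` and the parameters `phi` (autoregressive coefficients), `theta` (moving average coefficients), and `c` (constant) are assumptions for illustration; fitting the coefficients is a separate step that this sketch does not cover.

```python
def arma_predict(series, phi, theta, c=0.0):
    """One-step-ahead ARMA predictions for given coefficients.

    Returns (preds, resid): preds has one entry per observation plus a final
    entry that forecasts the next, unseen value; resid holds the past errors.
    Terms that would reach before the start of the series are skipped.
    """
    n = len(series)
    preds, resid = [], []
    for t in range(n + 1):
        # Autoregressive part: weighted sum of the previous observations.
        ar = sum(phi[i] * series[t - 1 - i]
                 for i in range(len(phi)) if t - 1 - i >= 0)
        # Moving average part: weighted sum of the previous prediction errors.
        ma = sum(theta[j] * resid[t - 1 - j]
                 for j in range(len(theta)) if t - 1 - j >= 0)
        pred = c + ar + ma
        preds.append(pred)
        if t < n:
            # The error of this prediction feeds the later moving average terms.
            resid.append(series[t] - pred)
    return preds, resid
```

For example, `arma_predict([1.0, 2.0, 3.0], phi=[0.5], theta=[0.5])` walks through the three observations and appends a forecast for the fourth.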
Optimizing the ARMA model
Akaike’s Information Criterion (AIC)
AIC, Bias Corrected (AICc)
Bayesian Information Criterion (BIC)
All of the criteria above apply to ARMA models fitted by maximum likelihood estimation.
The AIC criterion
AIC = -2 ln(L) + 2k
L: the maximized likelihood of the model;
k: the number of parameters in the model.
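The three criteria can be computed directly from a fitted model's log-likelihood. The sketch below uses the standard textbook formulas, with `n` as the number of observations; the helper `best_order` and its `candidates` mapping are hypothetical, and the log-likelihood values are assumed to come from models that were already fitted elsewhere. Software packages may count `k` slightly differently (for example, by including the error variance).

```python
import math


def aic(loglik, k):
    """Akaike's Information Criterion: -2 ln(L) + 2k."""
    return -2.0 * loglik + 2.0 * k


def aicc(loglik, k, n):
    """Bias-corrected AIC for a sample of n observations (needs n > k + 1)."""
    return aic(loglik, k) + 2.0 * k * (k + 1) / (n - k - 1)


def bic(loglik, k, n):
    """Bayesian Information Criterion: -2 ln(L) + k ln(n)."""
    return -2.0 * loglik + k * math.log(n)


def best_order(candidates, n, criterion="aicc"):
    """Pick the (p, q) order with the smallest criterion value.

    candidates maps (p, q) -> (log-likelihood, number of parameters).
    """
    score = {
        "aic": lambda ll, k: aic(ll, k),
        "aicc": lambda ll, k: aicc(ll, k, n),
        "bic": lambda ll, k: bic(ll, k, n),
    }[criterion]
    return min(candidates, key=lambda order: score(*candidates[order]))
```

A lower value is better for all three criteria; BIC penalizes extra parameters more heavily than AIC once `n` exceeds about 8, so it tends to select smaller models.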
Using ARMA models in R
arima
auto.arima
The arima function
arima(x,
  order = c(0, 0, 0),
  seasonal = list(order = c(0, 0, 0), period = NA),
  xreg = NULL,
  include.mean = TRUE,
  transform.pars = TRUE,
  fixed = NULL,
  init = NULL,
  method = c("CSS-ML", "ML", "CSS"),
  n.cond,
  optim.method = "BFGS",
  optim.control = list(),
  kappa = 1e6
)
Parameters of arima in R
The auto.arima function
auto.arima(x,
  d = NA, D = NA,
  max.p = 5, max.q = 5, max.P = 2, max.Q = 2, max.order = 5,
  start.p = 2, start.q = 2, start.P = 1, start.Q = 1,
  stationary = FALSE,
  ic = c("aicc", "aic", "bic"),
  stepwise = TRUE, trace = FALSE,
  approximation = (length(x) > 100 | frequency(x) > 12),
  xreg = NULL,
  test = c("kpss", "adf", "pp"),
  seasonal.test = c("ocsb", "ch"),
  allowdrift = TRUE, lambda = NULL,
  parallel = FALSE, num.cores = NULL
)
Nowadays, we have lots of data. BIG DATA!
What is R?
Why R?
What is needed?
There is a need for more than counts and averages on these big data sets.
Analyzing all of the data can lead to insights that sampling or subsets can’t reveal.
Why R and Hadoop?
Introduction to RHadoop
Purpose of RHadoop
The open-source RHadoop project makes it easier to extract data from Hadoop for analysis with R, and to run R within the nodes of the Hadoop cluster: essentially, to transform Hadoop into a massively parallel statistical computing cluster based on R.
RHadoop
rhdfs
Manipulate HDFS directly from R
Mimic as much of the HDFS Java API as possible
rhdfs Functions
rmr
Designed to be the simplest and most elegant way to write MapReduce programs
Gives the R programmer the tools necessary to perform data analysis in a way that is “R” like
Provides an abstraction layer to hide the implementation details
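The programming model that such an abstraction layer hides can be illustrated in a few lines of plain Python. This is a single-process sketch of the map, shuffle, and reduce phases, not the rmr API: the function `mapreduce` below and the helpers `hour_load` and `mean_load` are hypothetical names chosen for the illustration, and a real Hadoop job distributes these phases across the cluster.

```python
from collections import defaultdict


def mapreduce(records, map_fn, reduce_fn):
    """Run map, group by key (shuffle), then reduce, all in one process."""
    groups = defaultdict(list)
    for record in records:
        # Map phase: each record may emit any number of (key, value) pairs.
        for key, value in map_fn(record):
            groups[key].append(value)
    # Reduce phase: each key is reduced together with all of its values.
    return {key: reduce_fn(key, values) for key, values in groups.items()}


def hour_load(record):
    """Map: emit the hour as key and the measured load as value."""
    hour, load = record
    yield hour, load


def mean_load(hour, loads):
    """Reduce: average all loads observed for one hour."""
    return sum(loads) / len(loads)
```

With this sketch, `mapreduce([(9, 10.0), (9, 20.0), (10, 30.0)], hour_load, mean_load)` groups the records by hour and averages each group. The user supplies only the two small functions; the grouping and data movement stay inside `mapreduce`.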
rmr mapreduce Function
Thank you!