Website Auto Scraping with AutoIt and .NET HttpRequest
TRANSCRIPT
Agenda
•Background
•Behavior and System Analysis
•HLD
Background
•User requirement
• A desktop application that scrapes Odds and OnlineList data from the rental site
• 1,652 links in total, containing 165,200 records
• Arrange the data format and export it to an Excel file
•Non-functional requirement
• Avoid account lockout caused by request traffic that resembles a DDoS attack
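The non-functional requirement above amounts to pacing the requests. The following is a minimal sketch of one way to do that, written in Python for illustration only (the deck's own implementation is in C# and is not shown). The function names and the base and jitter values are assumptions, not figures from the slides. The record total is consistent with 100 records per link.

```python
import random
import time

TOTAL_LINKS = 1652
RECORDS_PER_PAGE = 100  # page size stated in the Behavior Analysis slide
EXPECTED_RECORDS = TOTAL_LINKS * RECORDS_PER_PAGE  # 165200


def paced_delays(n_requests, base_seconds=1.0, jitter_seconds=0.5, seed=None):
    """Yield one wait time per request: a fixed base plus random jitter.

    The base and jitter defaults are hypothetical; tune them to the site.
    """
    rng = random.Random(seed)
    for _ in range(n_requests):
        yield base_seconds + rng.uniform(0.0, jitter_seconds)


def fetch_all(links, fetch, sleep=time.sleep, **pacing):
    """Call fetch(link) for every link, sleeping between consecutive calls."""
    results = []
    delays = paced_delays(len(links), **pacing)
    for index, (link, delay) in enumerate(zip(links, delays)):
        if index > 0:  # no wait before the very first request
            sleep(delay)
        results.append(fetch(link))
    return results
```

Passing `sleep` as a parameter keeps the pacing logic testable without real waiting; in production the default `time.sleep` applies.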
Behavior and System Analysis
Behavior Analysis
•The website requires login first
•The login session stays alive while the user idles on the MainPage (the client posts a periodic sync request)
•After login, each click on the MainPage opens a pop-up window to display the data
•Each data page displays 100 records according to the given filter
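A scraper that mimics this behavior has to reproduce the periodic sync request itself, otherwise the session may expire during a long run. The sketch below, in Python for illustration, shows only the timing decision. The class name, the `send_sync` callback, and the 30-second interval are assumptions; the slides do not state the real interval or the sync endpoint.

```python
import time


class KeepAlive:
    """Decide when the next idle sync request is due."""

    def __init__(self, send_sync, interval_seconds=30.0, clock=time.monotonic):
        self._send_sync = send_sync        # hypothetical callback that posts the sync request
        self._interval = interval_seconds  # assumed value, not taken from the site
        self._clock = clock
        self._last_sent = clock()

    def tick(self):
        """Send a sync request if the interval has elapsed; return True if sent."""
        now = self._clock()
        if now - self._last_sent >= self._interval:
            self._send_sync()
            self._last_sent = now
            return True
        return False
```

Calling `tick()` between page requests keeps the keep-alive on the same thread as the scraping loop, which avoids concurrent requests on one session.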
System Analysis
•Security
• The website requires login to obtain the SessionId and StickyId used in requests
• The website has a security mechanism that redirects invalid requests
• A one-time token, issued on login to the MainPage, prevents users from requesting page data without permission
• All sub-pages (pop-up windows) can only be opened from the MainPage
• The website uses RESTful-like routing that includes the UserSession token
•Routing and Request Post
• URL routing includes the Login Token
• MainPage routing includes the BuildVersion
• Requests to the Odds and OnlineList services must add a Query key (passed from the Main window)
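To make the routing points concrete, here is a small Python sketch that assembles a service URL from a login token in the path and a query key in the query string. The URL layout, the parameter name `key`, the host, and the function name are all hypothetical; the slides state only that these values appear in the routing and the request, not where.

```python
from urllib.parse import quote, urlencode


def build_service_url(base_url, login_token, service, query_key, filters):
    """Compose a URL of the assumed form <base>/<token>/<service>?key=...&<filters>."""
    # Percent-encode each path segment so a token with special characters stays intact.
    path = "/".join(quote(part, safe="") for part in (login_token, service))
    # The query key comes first, followed by the caller's filter parameters.
    query = urlencode({"key": query_key, **filters})
    return f"{base_url.rstrip('/')}/{path}?{query}"


url = build_service_url(
    "https://example.invalid",  # placeholder host
    "tok123",                   # placeholder login token
    "Odds",
    "qk",                       # placeholder query key
    {"page": 1},
)
```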
Challenge
•Issue 1
• Too many links to scrape with Selenium or a similar solution.
•Issue 2
• Some data must be decrypted and regenerated with JavaScript (RSA token, one-time token, etc.).
•Issue 3
• The response headers (Session and StickyId) must be captured to mock the requests that query the Odds and OnlineList services.
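Issue 3 can be illustrated with a short Python sketch that pulls two values out of `Set-Cookie` response headers and replays them in a `Cookie` request header. This assumes the SessionId and StickyId arrive as cookies with exactly those names, which the slides do not confirm; the function names and sample header values are invented for the example.

```python
from http.cookies import SimpleCookie


def extract_cookies(set_cookie_headers, wanted=("SessionId", "StickyId")):
    """Return the wanted cookie values found in a list of Set-Cookie header strings."""
    found = {}
    for header in set_cookie_headers:
        jar = SimpleCookie()
        jar.load(header)  # attributes such as Path or HttpOnly are parsed, not returned as cookies
        for name, morsel in jar.items():
            if name in wanted:
                found[name] = morsel.value
    return found


def cookie_header(cookies):
    """Format captured cookies as the value of a Cookie request header."""
    return "; ".join(f"{name}={value}" for name, value in cookies.items())


captured = extract_cookies([
    "SessionId=abc123; Path=/; HttpOnly",  # made-up sample values
    "StickyId=node7; Path=/",
])
replay = cookie_header(captured)
```

In .NET the equivalent bookkeeping is usually delegated to a cookie container attached to the request, so manual parsing like this is only needed when the values must be inspected or reused elsewhere.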
HLD
Use of Technology
•C# and .NET Framework
•AutoIt (Download)
• AutoIt v3 is a freeware BASIC-like scripting language designed for automating the Windows GUI and general scripting.
• Has a script editor for building scripts
• Can execute shell scripts
• Can compile scripts to .exe files
•AutoItX aka NAutoIt (Download)
• Methods available to AutoIt BASIC, but not provided via AutoItX, are replaced by .NET counterparts.
• AutoItX offers PowerShell, .NET, C, COM, COM interop and reg-free COM interfaces.
Use of Technology - AutoIt
•AutoIt Window Info
Use of Technology - AutoIt
•SciTE Script Editor
• Write scripts in an IDE with IntelliSense-style code hints and a Help guide
Use of Technology - AutoIt
•Run Script
• Execute an .au3 or .a3x file.
•Compile Script to .exe
• Convert an .au3 script to an .exe or .a3x file
Use of Technology - AutoItX
•Setup
• Add a project reference to AutoItX3.Assembly.dll
• Add AutoItX3.dll to the project and set it to copy to the output directory (CopyToOutput)
Q & A
11F., No.399, Ruiguang Rd., Neihu Dist., Taipei City 114, Taiwan TEL: +886 2 2798 8529 Fax: +886 2 2798 8531 Website : www.xuenn.com
THANK YOU!