Introduction to project
DESCRIPTION
An introductory talk for the Hacker News Kansai meetup on the Ruby rewrite of Jim Breen's wwwjdic
TRANSCRIPT
![Page 1: Introduction to project](https://reader033.vdocuments.mx/reader033/viewer/2022051818/54b7428d4a79595b708b45a2/html5/thumbnails/1.jpg)
1
About me
Mark Burns (マーク・バーンズ)
about.me/mark.burns
Japanese-speaking Ruby developer
On holiday from England
I love Ruby and startups
![Page 2: Introduction to project](https://reader033.vdocuments.mx/reader033/viewer/2022051818/54b7428d4a79595b708b45a2/html5/thumbnails/2.jpg)
2
Introduction
Jim Breen’s (Monash University)
Japanese-English online dictionary
wwwjdic.com
Data freely available
Accepts user contributions
![Page 3: Introduction to project](https://reader033.vdocuments.mx/reader033/viewer/2022051818/54b7428d4a79595b708b45a2/html5/thumbnails/3.jpg)
3
wwwjdic (rewrite)
https://github.com/markburns/wwwjdic
![Page 4: Introduction to project](https://reader033.vdocuments.mx/reader033/viewer/2022051818/54b7428d4a79595b708b45a2/html5/thumbnails/4.jpg)
4
Current interaction
GET http://wwwjdic.com
301 -> http://www.edrdg.org/cgi-bin/wwwjdic/wwjdic?1C
POST http://www.csse.monash.edu.au/~jwb/cgi-bin/wwwjdic.cgi?1E
BODY: dsrchkey=%CD%F1&dicsel=1
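The form POST above can be reproduced from Ruby's standard library. This is a minimal sketch that only builds the request and does not send it. The method name `build_lookup_request` is hypothetical, and the search key is passed as raw bytes because the slide does not name the encoding (EUC-JP is an assumption).

```ruby
require "net/http"
require "uri"

# Endpoint copied from the slide; whether it still responds is not checked here.
ENDPOINT = URI("http://www.csse.monash.edu.au/~jwb/cgi-bin/wwwjdic.cgi?1E")

# Builds (but does not send) the form POST shown on the slide.
# raw_key is the search key as raw bytes in the server's legacy encoding
# (assumed to be EUC-JP; the slide does not say).
def build_lookup_request(raw_key, dictionary: "1")
  request = Net::HTTP::Post.new(ENDPOINT)
  # .b marks the key as binary so each byte is percent-encoded as-is.
  request.body = URI.encode_www_form("dsrchkey" => raw_key.b, "dicsel" => dictionary)
  request["Content-Type"] = "application/x-www-form-urlencoded"
  request
end

puts build_lookup_request("\xCD\xF1").body
```

Having to know a byte-level encoding just to look up a word is part of the motivation for a cleaner API.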
![Page 5: Introduction to project](https://reader033.vdocuments.mx/reader033/viewer/2022051818/54b7428d4a79595b708b45a2/html5/thumbnails/5.jpg)
5
Response
![Page 6: Introduction to project](https://reader033.vdocuments.mx/reader033/viewer/2022051818/54b7428d4a79595b708b45a2/html5/thumbnails/6.jpg)
6
Aims
JSON API
Cleaner UI
Nice features: e.g. autocomplete
Easily extensible open source codebase
![Page 9: Introduction to project](https://reader033.vdocuments.mx/reader033/viewer/2022051818/54b7428d4a79595b708b45a2/html5/thumbnails/9.jpg)
9
Autocomplete
![Page 10: Introduction to project](https://reader033.vdocuments.mx/reader033/viewer/2022051818/54b7428d4a79595b708b45a2/html5/thumbnails/10.jpg)
10
Trie index
http://oldblog.antirez.com/post/autocomplete-with-redis.html
Autocomplete
![Page 11: Introduction to project](https://reader033.vdocuments.mx/reader033/viewer/2022051818/54b7428d4a79595b708b45a2/html5/thumbnails/11.jpg)
11
Trie index
Time: O(log(N)), N =~ 150,000
Space: N*(Ma+1) =~ 51MB
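A small worked example of the space figure, reading Ma as the average entry length (an assumption about the slide's notation). In the scheme from the linked antirez post, each word contributes one entry per prefix plus one terminated entry, so a word of length M costs M+1 entries. The helper names below are hypothetical.

```ruby
# Expands one word into the entries the index stores for it:
# every prefix, then the whole word plus a "*" terminator.
def index_entries(word)
  prefixes = (1..word.length).map { |len| word[0, len] }
  prefixes << "#{word}*"
end

# Sum of (M + 1) over all words, i.e. N * (Ma + 1).
def entry_upper_bound(words)
  words.sum { |word| word.length + 1 }
end

# Sorted, de-duplicated entries, as a sorted set would hold them.
def build_index(words)
  words.flat_map { |word| index_entries(word) }.uniq.sort
end

WORDS = %w[egg walrus waltz].freeze

puts entry_upper_bound(WORDS)   # 4 + 7 + 6 = 17
puts build_index(WORDS).size    # 14, because "w", "wa", "wal" are shared
```

Shared prefixes are stored once, so N*(Ma+1) is an upper bound rather than an exact count.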
![Page 12: Introduction to project](https://reader033.vdocuments.mx/reader033/viewer/2022051818/54b7428d4a79595b708b45a2/html5/thumbnails/12.jpg)
12
TRIE
![Page 13: Introduction to project](https://reader033.vdocuments.mx/reader033/viewer/2022051818/54b7428d4a79595b708b45a2/html5/thumbnails/13.jpg)
13
https://github.com/markburns/wwwjdic/blob/master/app/data_access/auto_complete.rb
![Page 14: Introduction to project](https://reader033.vdocuments.mx/reader033/viewer/2022051818/54b7428d4a79595b708b45a2/html5/thumbnails/14.jpg)
14
["eg", "ega", "egal", "egali", "egalit", "egalita", "egalitar", "egalitari", "egalitaria", "egalitarian", "egalitarian*", "egg", "egg ",
"egg (", "egg (e"]
![Page 15: Introduction to project](https://reader033.vdocuments.mx/reader033/viewer/2022051818/54b7428d4a79595b708b45a2/html5/thumbnails/15.jpg)
15
["egg dish", "egg dishe", "egg dishes", "egg dishes*", "egg l", "egg la", "egg lai", "egg laid", "egg laid ", "egg laid i", "egg laid in", "egg laid in ", "egg laid in w",
"egg laid in wi", "egg laid in win"]
![Page 16: Introduction to project](https://reader033.vdocuments.mx/reader033/viewer/2022051818/54b7428d4a79595b708b45a2/html5/thumbnails/16.jpg)
16
["egg laid in wint", "egg laid in winte", "egg laid in winter", "egg laid in winter*", "egg m",
"egg me", "egg mem", "egg memb", "egg membr", "egg membra", "egg membran",
"egg membrane", "egg membrane*", "egg s", "egg sa"]
![Page 17: Introduction to project](https://reader033.vdocuments.mx/reader033/viewer/2022051818/54b7428d4a79595b708b45a2/html5/thumbnails/17.jpg)
17
"walr""walt"
"walrus"
["walr", "walru", "walrus", "walrus*", "walruse", "walruses", "walruses*", "walt", "waltz", "waltz ", "waltz (",
"waltz (c", "waltz (co", "waltz (com", "waltz (comp"]
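The lookup over that sorted range can be sketched in plain Ruby, with a sorted array standing in for the Redis sorted set. This is not the code from auto_complete.rb. It is a minimal illustration of the idea: jump to the prefix's position, walk forward, and keep entries that end in the "*" terminator. The name `complete` and the `limit` parameter are hypothetical.

```ruby
# Entries copied from the "walr" slide, already in sorted order.
INDEX = [
  "walr", "walru", "walrus", "walrus*", "walruse", "walruses", "walruses*",
  "walt", "waltz", "waltz ", "waltz (", "waltz (c", "waltz (co", "waltz (com",
  "waltz (comp"
].freeze

# Returns complete words starting with prefix, in index order.
def complete(index, prefix, limit: 10)
  # Binary search for the first entry >= prefix (the O(log N) step).
  start = index.bsearch_index { |entry| entry >= prefix }
  return [] if start.nil?

  results = []
  index.drop(start).each do |entry|
    break unless entry.start_with?(prefix)  # walked past the prefix's range
    results << entry.chomp("*") if entry.end_with?("*")
    break if results.size >= limit
  end
  results
end

p complete(INDEX, "walr")
```

With the slide's data, "walr" completes to "walrus" and "walruses". "walt" yields nothing here only because the slide's window is cut off before any terminated "waltz" entry.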
![Page 18: Introduction to project](https://reader033.vdocuments.mx/reader033/viewer/2022051818/54b7428d4a79595b708b45a2/html5/thumbnails/18.jpg)
18
shutl.com & graphs
![Page 19: Introduction to project](https://reader033.vdocuments.mx/reader033/viewer/2022051818/54b7428d4a79595b708b45a2/html5/thumbnails/19.jpg)
19
Isomorphism?
![Page 20: Introduction to project](https://reader033.vdocuments.mx/reader033/viewer/2022051818/54b7428d4a79595b708b45a2/html5/thumbnails/20.jpg)
20
N-grams
安心 リフォーム へ の 近道 [TAB]29 (Anshin reform he no chikamichi, "a shortcut to worry-free renovation")
安心 + リフォーム + へ + の + 近道
安心 [TAB]41,322,178
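A minimal sketch of reading such records, assuming each line holds space-separated tokens, a tab, and a count, and that the commas in the count are only for display on the slide. `NGram` and `parse_ngram` are hypothetical names.

```ruby
# One n-gram record: its tokens and how often it was seen.
NGram = Struct.new(:tokens, :count)

# Parses "token token ...<TAB>count" into an NGram.
def parse_ngram(line)
  text, count = line.chomp.split("\t", 2)
  # split(" ") ignores the stray space before the tab.
  NGram.new(text.split(" "), Integer(count.delete(",").strip, 10))
end

five_gram = parse_ngram("安心 リフォーム へ の 近道 \t29")
unigram   = parse_ngram("安心\t41,322,178")

puts five_gram.tokens.size
puts unigram.count
```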
![Page 21: Introduction to project](https://reader033.vdocuments.mx/reader033/viewer/2022051818/54b7428d4a79595b708b45a2/html5/thumbnails/21.jpg)
21
Present/State of Play
Data import to redis
Indexed word lookup
Autocomplete
Begun work on text glossing
![Page 22: Introduction to project](https://reader033.vdocuments.mx/reader033/viewer/2022051818/54b7428d4a79595b708b45a2/html5/thumbnails/22.jpg)
22
Noticeably missing
Not yet released to production
No test/staging server
However, should be easy enough to run locally
![Page 23: Introduction to project](https://reader033.vdocuments.mx/reader033/viewer/2022051818/54b7428d4a79595b708b45a2/html5/thumbnails/23.jpg)
23
Future
Wordnet plus graph db => mapping of languages
Analysis of kanji
User experience/Design/Polish
N-grams
Other ideas/collaboration?
![Page 24: Introduction to project](https://reader033.vdocuments.mx/reader033/viewer/2022051818/54b7428d4a79595b708b45a2/html5/thumbnails/24.jpg)
24
https://github.com/markburns/wwwjdic
http://www.slideshare.net/_mark_burns/slides-24568551
about.me/mark.burns
Questions?