Internationalised Domain Names & Internet Investigations
DESCRIPTION
English is not the only language that the Internet “speaks.” Internationalised Domain Names (IDNs) now allow for domain names in Arabic, Cyrillic, Chinese, and other non-Latin characters. This session will show how to trace IDNs and will examine some of the IDN information security issues. There will be a quick introduction to working with foreign language Websites and useful tips for using online search and translation tools.
TRANSCRIPT
Internationalised Domain Names, Foreign Language Websites, & Investigations
Jonathan D. Abolins
Thu, 28 July 2011, 11:00 AM - 12:00 PM PDT (GMT-08:00)
Post-Webinar Version with additional notes.
Introduction
About me
Why this topic
Some notes about this presentation’s approach.
Note About Translation Tools
Machine translation tools help a lot.
But they can also leave out much or mislead.
Helps to know the languages involved or work with a competent translator.
But the translators might not know about some recent Internet developments.
Quick Overview of Terms
Labels – example: www.veresoftware.com
TLD – Top Level Domain (e.g., .com or .uk)
ccTLD – Country Code TLD (e.g., .uk, .ru)
IDN – Internationalised Domain Name
Unicode
ACE – ASCII Compatible Encoding
Punycode (RFC 3492), a form of ACE
(Diagram: a domain name split at the dots into Label 1, Label 2, Label 3.)
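A minimal Python sketch of how these terms fit together, assuming a plain dot-separated domain name. The function name split_labels is made up for this illustration and is not from any library.

```python
def split_labels(domain: str) -> dict:
    """Split a domain name into labels and flag Punycode (ACE) labels."""
    # A trailing dot (fully qualified form) is dropped before splitting.
    labels = domain.rstrip(".").split(".")
    return {
        "labels": labels,
        # In the stored (logical) order, the TLD is always the last label.
        "tld": labels[-1],
        # ACE labels are recognisable by the "xn--" prefix.
        "ace_labels": [lab for lab in labels if lab.lower().startswith("xn--")],
    }


info = split_labels("www.veresoftware.com")
print(info["labels"])  # ['www', 'veresoftware', 'com']
print(info["tld"])     # com
```

The same split works on a Punycode name, where every label starting with xn-- is an encoded IDN label.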
OSINT in an Alphabet Soup of the Networked World
But see http://www.cartoonistgroup.com/store/add.php?iid=8381
Sometimes, alphabet soup is soup, not a coded message.
A couple of Examples of non-English Windows 7 Desktops
First is Russian.
Second is Arabic. Note the shift to the right.
They were done by switching the languages on one of my Windows 7 Ultimate PCs.
The GUI labels for My Documents, My Music, etc. are localised. But the underlying directory names, as seen via the dir command in a CMD window, did not change.
The Net No Longer “Speaks” Primarily English
Old days
Had to use code pages (character encodings) for non-Latin text. Can be confusing.
Difficult to mix languages.
Now
Unicode covers most of the world’s writing systems. 90+ scripts.
Still encounter code pages.
But Underlying Code is Universal
Bits & Bytes
Programming languages
HTML codes
IP Addresses
Etc.
This can work to your advantage!
If a foreign site offers English, why read the foreign language version?
http://krebsonsecurity.com/2010/12/russian-police-only-translate-the-good-news/
What if you can’t read Russian?
File/Pathnames May Have Clues…
http://www.mvd.ru/news/
File/Pathnames May Have Clues…
http://www.mvd.ru/presscenter/
Note for the Previous Slides…
Sometimes the foreign site might be using a site structure developed in the English-speaking world. Particularly the case with some Web forums.
Other times, the Web designers are trying to avoid problems with mixing texts for directory and file names.
In any case, the file path info often can be a help.
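A small Python sketch of pulling the path pieces out of a URL so the English-looking directory names stand out. The helper name path_segments is hypothetical; only the standard library is used.

```python
from urllib.parse import urlsplit


def path_segments(url: str) -> list:
    """Return the non-empty directory and file names from a URL path."""
    return [segment for segment in urlsplit(url).path.split("/") if segment]


# The two example URLs from the previous slides.
print(path_segments("http://www.mvd.ru/news/"))         # ['news']
print(path_segments("http://www.mvd.ru/presscenter/"))  # ['presscenter']
```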
Tip: Google Chrome Has Built-in Translation Function
http://habrahabr.ru/blogs/DIY/
Search Tip: A Picture is Worth 1K Words
An image search might help to zero in on the entries of interest.
Especially useful if you want to save time wading through foreign-language hits.
Example search for the RASKAT (Раскат) data destruction device from Russia. Look for images that look “computerish”.
Google Translate Annoyance: URL Conversion
Uncheck the Phonetic Typing box before entering URLs for site translation.
Tried to type in “http://www.xakep.ru” but Google “Russified” it.
Internationalised Domain Names (IDN)
Intro – The Phonebook Analogy
Imagine a phonebook where people could have entries in their preferred scripts. Mr. Wong could have his in Chinese. Ms. Romanov could have hers in Russian. And so on. Many people will choose to have both Latin text and foreign text entries for the same phone number. Makes it easier for their family and friends to find them. But others fret about the different texts.
Underneath it all, however, the phone system hardware, networks, and the phone numbers remain the same.
Something like this is happening with the Internet.
The First Four IDN ccTLDs
In May 2010
United Arab Emirates: .امارات
Saudi Arabia: .السعودیة
Russian Federation: .рф
Egypt: .مصر
More IDN ccTLDs have been launched.
Remember, IDNs can also exist under non-IDN TLDs. Example: גינדי.com or bücher.com
http://blog.icann.org/2010/05/idn-cctlds-%E2%80%93-the-first-four/
Examples of IDNs & Punycode
com.גינדי
스타벅스코리아.com
газпром.рф
مصر.سجل
汕头大学.中国
xn--pssza05mm53a.xn--fiqs8s/
Gindi Realty (Israel): com.גינדי
Punycode: http://xn--6dbcrb7a.com/
Offline IDN Example
Starbucks Korea: 스타벅스코리아.com
Punycode: http://xn--oy2b35ckwhba574atvuzkc.com/
Shantou University (PRC): 汕头大学.中国/
Same as http://stu.edu.cn
Punycode: http://xn--pssza05mm53a.xn--fiqs8s/
Sajela.MiSr (Egypt): مصر.سجل
Punycode: http://xn--rgbn6c.xn--wgbh1c/
Fun with Arabic & Other RTL (Right-to-Left) IDN URLs
Reading direction can switch.
Example URL: http:// مصر.سجل /Files/GeneralPolicy.pdf
The direction changes can cause problems in various tools and procedures.
This is where Punycode really helps: http://xn--rgbn6c.xn--wgbh1c/Files/GeneralPolicy.pdf
1 ----> <----------2 3 --------------------------------------------->
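One way to get a direction-safe form of a whole URL is to convert only the host part to Punycode and leave the rest alone. The sketch below uses Python's built-in idna codec, which follows the older IDNA 2003 rules; that is assumed to be good enough for illustration. The function name url_to_ace is made up for this example.

```python
from urllib.parse import urlsplit, urlunsplit


def url_to_ace(url: str) -> str:
    """Rewrite a URL so its host is in Punycode (ACE); the path is untouched."""
    parts = urlsplit(url)
    host = parts.hostname or ""
    ace_host = host.encode("idna").decode("ascii")
    # Keep an explicit port if there was one. Any user:password@ part is
    # dropped in this simplified sketch.
    netloc = ace_host if parts.port is None else f"{ace_host}:{parts.port}"
    return urlunsplit((parts.scheme, netloc, parts.path, parts.query, parts.fragment))


print(url_to_ace("http://bücher.com/Files/GeneralPolicy.pdf"))
# http://xn--bcher-kva.com/Files/GeneralPolicy.pdf
```

The result is plain ASCII, so it reads left to right in every tool.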
Punycode
DNS works with Punycode for IDN labels.
Example: مصر.سجل
Punycode: xn--rgbn6c.xn--wgbh1c
.xn--wgbh1c is Punycode for the مصر IDN ccTLD. Note the distinctive xn-- prefix.
Much safer way to store & use IDNs.
Various online and offline tools for conversion.
Conversion works in both directions: Unicode IDN <-> Punycode.
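A minimal sketch of the two-way conversion using Python's built-in idna codec. The codec implements IDNA 2003; newer IDNA 2008 rules can give different results for a few characters, so treat this as an illustration, not a registry-grade tool. The function names are made up for the example.

```python
def to_punycode(domain: str) -> str:
    """Unicode IDN -> ACE/Punycode form (each label gets the xn-- prefix if needed)."""
    return domain.encode("idna").decode("ascii")


def to_unicode(ace_domain: str) -> str:
    """ACE/Punycode form -> Unicode IDN."""
    return ace_domain.encode("ascii").decode("idna")


print(to_punycode("bücher.com"))  # xn--bcher-kva.com
print(to_punycode("рф"))          # xn--p1ai
print(to_unicode("xn--fiqs8s"))   # 中国
```

Storing the Punycode form in case notes and keeping the Unicode form alongside it for readability avoids most copy and paste problems.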
An Online Converter
http://idnaconv.phlymail.de/
idn: An Offline IDN Converter (Linux)
Challenges with IDNs
Recognising what it is (domain name, URL, e-mail address).
Which end is the ccTLD?
What language is it?
What country of registry?
Sad 'cause I can't find the ص (Saad) key. (How do I enter the IDN?) Some characters have multiple codes.
Many tools don't work correctly with IDNs.
Homograph (Look-alike) Attacks
Recognising IDNs. Not just URLs. How About IDN E-mail Addresses?
What if you found a note with this:ваше_имя@письмо.рф ?
Would you know it’s an e-mail address?
Would your translator recognise it as an e-mail address?
By the Way, What About Vocalisation of URLs & e-Mail Addresses in Foreign Languages?
The way a URL or an email address – IDN or not – is said can differ across languages.
How is the “at” symbol or the “dot” said? Example with Russian and “ivan@pochta.ru”:
“Ivan sobachka pochta tochka ru” or “Ivan sobachka pochta dot ru”
Sobachka (собачка – “little dog”) is a popular Russian way of vocalising the “@” sign.
Tochka (точка – “point”) or Dot (дот) is used for the “.” mark.
How to say an e-mail address in Russian: http://www.themoscowtimes.com/opinion/article/the-really-cool-people-say-dot/439857.html
What Does the IDN URL Mean?
How Do I Type the IDN?
Copy & Paste Directly from page
Google Translate
Wikipedia
Keyboard input
Need the right keyboard or keytops.
System setup for allowing foreign-language input.
Character map tools
One Character, Multiple Codes
http://singapore41.icann.org/meetings/singapore2011/presentation-idn-variant-tlds-update-20jun11-en.pdf
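To see why one visible character can have more than one code, it helps to print the code point and official Unicode name of each character. The sketch below uses the Arabic letter Yeh and the Farsi Yeh as an assumed example pair; the helper name describe_chars is made up.

```python
import unicodedata


def describe_chars(text: str) -> list:
    """Return (code point, Unicode name) for every character in text."""
    return [(f"U+{ord(ch):04X}", unicodedata.name(ch, "UNKNOWN")) for ch in text]


# Two letters that look alike in many fonts and positions.
for code, name in describe_chars("\u064a\u06cc"):
    print(code, name)
# U+064A ARABIC LETTER YEH
# U+06CC ARABIC LETTER FARSI YEH
```

Two domain names that differ only in such a pair are different names to DNS, which is the variant issue the ICANN presentation covers.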
Common Net Commands & IDN
Windows cmd CLI is a problem without modification.
Tools have to be able to handle Unicode.
ping
nslookup
dig
Whois (can be tricky at times)
Punycode is more reliable.
Not All Our Tools Are Unicode or IDN-Ready
Whois & IDN ccTLD Domains
Whois on the domain name might not always work well with some IDN ccTLD domains.
But there are options, including:
Get and lookup IP address
Use IANA db & Delegation Record
IANA Root Zone db: http://www.iana.org/domains/root/db/#
IANA Delegation Records
http://www.iana.org/domains/root/db/xn--p1ai.html
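The delegation record address above appears to be the Punycode form of the TLD plus ".html". Assuming that pattern holds for other TLDs (it is inferred from this one example, not from IANA documentation), a small Python sketch can build the address from a Unicode TLD. The function name is hypothetical.

```python
def iana_delegation_url(tld: str) -> str:
    """Build a likely IANA delegation record URL for a TLD (pattern is an assumption)."""
    # Strip a leading dot, then convert to the ACE/Punycode form.
    ace = tld.lstrip(".").encode("idna").decode("ascii").lower()
    return f"http://www.iana.org/domains/root/db/{ace}.html"


print(iana_delegation_url(".рф"))
# http://www.iana.org/domains/root/db/xn--p1ai.html
```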
Security Concern: Homograph Attacks
Are These Sets The Same?
АаВьСсЕеНКкМРрОоТуХхЗ
AaBbCcEeHKkMPpOoTyXx3
Looking at the Underlying Code
АаВьСсЕеНКкМРрОоТуХхЗ <-Cyrillic
AaBbCcEeHKkMPpOoTyXx3 <-ASCII
0410 0430 0412 044C 0421 0441 0415 0435 041D 041A 043A 041C 0420 0440 041E 043E 0422 0443 0425 0445 0417
0041 0061 0042 0062 0043 0063 0045 0065 0048 004B 006B 004D 0050 0070 004F 006F 0054 0079 0058 0078 0033
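The hex values above are Unicode code points. A short Python sketch for producing the same kind of dump from any string follows; the function name code_points is made up for the example, and the test string is written with escapes so the look-alike letters cannot be confused in the source.

```python
def code_points(text: str) -> list:
    """Return the Unicode code point of each character as four-digit hex."""
    return [f"{ord(ch):04X}" for ch in text]


cyrillic = "\u0410\u0430"  # Cyrillic capital and small A
ascii_text = "Aa"          # Latin capital and small A

print(code_points(cyrillic))    # ['0410', '0430']
print(code_points(ascii_text))  # ['0041', '0061']
```

Strings that look identical on screen but give different dumps are not the same string.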
Homographs for Fraud & Punycode for Detection
http://www.facebook.com/ really is http://www.facebook.com/
http://www.facebοok.com/ is http://www.xn--facebok-dpf.com/
http://www.faceboοk.com/ is http://www.xn--facebok-epf.com/
http://www.facebοοk.com/ is http://www.xn--facebk-m0ea.com/
http://idnaconv.phlymail.de/
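Besides converting to Punycode, a rough check for mixed scripts inside one label can flag candidates like the ones above. The sketch below takes the first word of each character's Unicode name (LATIN, GREEK, CYRILLIC, ...) as its script. That is a simplifying assumption, not the formal Unicode Script property, and the function names are made up.

```python
import unicodedata


def scripts_in_label(label: str) -> set:
    """Rough set of scripts used by the letters in a label."""
    scripts = set()
    for ch in label:
        if ch.isalpha():
            # First word of the Unicode name, e.g. "GREEK SMALL LETTER OMICRON".
            scripts.add(unicodedata.name(ch, "UNKNOWN").split()[0])
    return scripts


def looks_mixed(label: str) -> bool:
    """True when a single label mixes more than one script."""
    return len(scripts_in_label(label)) > 1


spoof = "faceb\u03bfok"  # Greek omicron in place of the first Latin "o"
print(sorted(scripts_in_label(spoof)))  # ['GREEK', 'LATIN']
print(looks_mixed(spoof))               # True
print(looks_mixed("facebook"))          # False
```

A label written entirely in one look-alike script would pass this check, so it complements the Punycode view and does not replace it.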
Homograph Attack Concerns
Raised by various people, including 3ric Johanson at ShmooCon in 2005.
He registered www.xn--pypal-4ve.com to spoof PayPal.
Anti-Phishing Working Group Global Phishing Survey 1H2010: last true homograph attack was in 2009. A “hotmail.net” look-alike: xn--hotmal-t9a.net
Global Phishing Survey 1H2010: http://tinyurl.com/2ch5o87
Not All Homographs Are Bad.
Clever Homograph: xakep.ru
Special Topic: Character Encodings
Code Pages / Character Encodings
Examples:
Arabic: Windows 1256, IBM 864
Cyrillic: IBM 855, KOI8-R, Windows 1251
Hebrew: IBM 862, Windows 1255
See also http://en.wikipedia.org/wiki/Code_pages
Character Encoding in Internet documents
If page doesn’t render properly:
Check HTML source for clues like <META HTTP-EQUIV="Content-Type" CONTENT="text/html; charset=KOI8-R">
Server’s country location might be a clue.
Try browser’s character encoding tools. (Firefox example)
For Cyrillic, check out these tools:
Universal Cyrillic Decoder page http://2cyr.com/decode/
Russian Anywhere (re) package for many Linux distros.
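The META tag check mentioned above can be sketched in Python. This is a simple pattern match on the raw page bytes, assuming the charset is declared the way the example tag shows; it is not a full HTML parser, and the function name sniff_charset is made up.

```python
import re


def sniff_charset(html_bytes: bytes):
    """Return the charset named in the page source, or None if none is found."""
    match = re.search(rb'charset\s*=\s*["\']?([A-Za-z0-9_\-]+)', html_bytes, re.IGNORECASE)
    return match.group(1).decode("ascii") if match else None


page = b'<META HTTP-EQUIV="Content-Type" CONTENT="text/html; charset=KOI8-R">'
print(sniff_charset(page))  # KOI8-R
```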
Example
http://www.lena.ru/songs.html
Firefox – Character Encoding Set to Auto Detect
http://www.lena.ru/songs.html
In recent versions of Firefox: Firefox button -> Web Developer -> Character Encoding -> Auto Detect
In some cases, trial & error is needed.
This method also can work for local files.
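The same trial and error can be done outside the browser. The Python sketch below decodes the same bytes with several candidate encodings so the readable result can be picked by eye. The candidate list is an assumption drawn from the Cyrillic code pages named earlier, and the function name try_decodings is made up.

```python
def try_decodings(raw: bytes, candidates=("utf-8", "koi8-r", "cp1251", "cp855")) -> dict:
    """Decode raw bytes with each candidate encoding; None marks a failed decode."""
    results = {}
    for name in candidates:
        try:
            results[name] = raw.decode(name)
        except UnicodeDecodeError:
            results[name] = None
    return results


# Simulate a KOI8-R page fragment, then look at every interpretation.
raw = "Привет".encode("koi8-r")
for encoding, text in try_decodings(raw).items():
    print(encoding, "->", text)
```

Only one of the printed lines should read as sensible Russian; the others show the kind of garbling a wrong code page produces.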
Resources
ICANN IDN Info: http://www.icann.org/en/topics/idn/
Blog: http://blog.icann.org/
IDN Wiki: http://idn.icann.org/
IDN TLD Map: http://www.icann.org/en/maps/idntld.htm
IDN Blog: http://idnblog.com/
Verisign IDN FAQ: http://www.verisigninc.com/en_US/products-and-services/domain-name-services/domain-information-center/idn-resources/idn-faq/index.xhtml
This Domain Name is Greek to Me: An Introduction to Internationalized Domain Names for Investigators (DFI News): http://www.dfinews.com/article/domain-name-greek-me-introduction-internationalized-domain-names-investigators?page=0,1
Internationalized Domain Names & Investigations in the Networked World (one of the DojoCon 2010 videos): http://www.irongeek.com/i.php?page=videos/dojocon-2010-videos
Resources (cont)
XN--ICANN: http://www.hackerfactor.com/blog/index.php?/archives/321-xn-ICANN.html
IDNForums.Com (emphasis upon buying & selling IDN domains): http://www.idnforums.com/
IANA ccTLDs Database: http://www.iana.org/domains/root/db/#
Strathclyde Forensics – IDN Homograph Attacks: http://www.computerforensicsglasgow.info/IDN_Homograph_Attacks.htm
New Arrival in Russian Spam – .РФ: http://www.thesecurityblog.com/2011/02/new-arrival-in-russian-spam-%D1%80%D1%84/
An IDN – Punycode Converter: http://idnaconv.phlymail.de/
How to say an e-mail address in Russian: http://www.themoscowtimes.com/opinion/article/the-really-cool-people-say-dot/439857.html
Resources (cont)
Keyboard Setup
How to Change Keyboard Language
http://www.lib.uchicago.edu/e/using/catalog/inputoptions.html
http://tlt.its.psu.edu/suggestions/international/keyboards/winkey.html
http://www.al-bab.com/arab/comp.htm
Translation and Language Issues
American Translators Association: Getting It Right (insights into translation issues): http://www.atanet.org/publications/getting_it_right.php
Basis Technology – Excellent papers & presentations on language issues: http://www.basistech.com/resources/ (The links on the left have more papers on topics such as Middle Eastern Languages, Digital Forensics, etc.)
Resources: Google Searches for Some IDN ccTLDs
Republic of Korea: 한국
http://www.google.com/search?q=site%3A.한국
Serbia: СРБ
http://www.google.com/search?q=site%3A%D0%A1%D0%A0%D0%91
People's Republic of China: 中国
http://www.google.com/search?q=site%3A.%E4%B8%AD%E5%9B%BD
http://www.google.com/search?q=site%3A.%E4%B8%AD%E5%9C%8B
Hong Kong SAR: 香港
http://www.google.com/search?q=site%3A.%E9%A6%99%E6%B8%AF
Taiwan: 台湾
http://www.google.com/search?q=site%3A.%E5%8F%B0%E6%B9%BE
http://www.google.com/search?q=site%3A.%E5%8F%B0%E7%81%A3
Egypt: مصر
http://www.google.com/search?q=site%3A.مصر
Jordan: الاردن
http://www.google.com/search?q=site%3A.%D8%A7%D9%84%D8%A7%D8%B1%D8%AF%D9%86
Saudi Arabia: السعودیة
http://www.google.com/search?q=site%3A.%D8%A7%D9%84%D8%B3%D8%B9%D9%88%D8%AF%D9%8A%D8%A9
Russian Federation: РФ
http://www.google.com/search?q=site%3A.%D0%A0%D0%A4
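The %XX runs in these search links are the UTF-8 bytes of the IDN ccTLD, percent-encoded. A short Python sketch for building such a link from the Unicode form follows. It assumes the same Google search address and site: operator used in the links above; the function name is made up.

```python
from urllib.parse import quote


def google_site_search_url(tld: str) -> str:
    """Build a site: search URL for a TLD, percent-encoding any non-ASCII text."""
    query = "site:." + tld.lstrip(".")
    return "http://www.google.com/search?q=" + quote(query, safe="")


print(google_site_search_url("РФ"))
# http://www.google.com/search?q=site%3A.%D0%A0%D0%A4
print(google_site_search_url("中国"))
# http://www.google.com/search?q=site%3A.%E4%B8%AD%E5%9B%BD
```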