chipping away at censorship with user-generated content sam burnett, nick feamster and santosh...
Post on 15-Jan-2016
228 Views
Preview:
TRANSCRIPT
Chipping Away at Censorship with User-Generated Content
Sam Burnett, Nick Feamster and Santosh Vempala
Internet Censorship is a Problem
• 12 censors• 11 monitors• 25% of population• More on the way
See http://rsf.org for more
It’s Not Only China…at Home, Too
Censored net Uncensored netBob
Firewall
Alice
Intro to Internet Censorship
Block TrafficPunish User
CensorCensor
Censored net Uncensored netBob
Firewall
Alice
Solution: Use a Helper
The helper sends messages to and from blocked hosts on your behalf
Helper
Design Goals for the Helper
• Be robust against blocking• Be deniable against user identification• Require no dedicated infrastructure
What about Proxies and Mixnets? (e.g., Tor)
Proxy Proxy
Censored net Uncensored netBob
Firewall
Alice
• Censors can block proxies if the proxy list is public• Not deniable if encryption is incriminating• Requires dedicated infrastructure (network of proxies)
What About Covert Channels?(e.g., Infranet)
• Not entirely robust against blocking• More deniable because messages are hidden• Requires dedicated infrastructure (Web servers)
Unblocked host
usenix.org
Censored net Uncensored netBob
Firewall
Alice
Alice
Collage: Let User-Generated Content Help Defeat Censorship
• Robust by using redundancy• Users generate innocuous-looking traffic• No dedicated infrastructure required
User-generated content hosts
Bob, a Flickr user
Why Might Collage Work?
• Lots of User-Generated Content (UGC)– More than 4 billion Flickr images– A day of video uploaded to YouTube every minute
• Many sites host UGC• We have tools to store censored data in UGC
– Steganography, watermarking
Outline
• Background and Design Goals• Collage Design• Performance and Demo
Content host
Bob
Collage, Step-by-Step
Step 1: Obtain messageStep 3: Obtain cover media• Your personal photos• Generous users
Step 4: Embed message in cover• Next slideStep 5: Upload UGC to content hostStep 6: Find and download UGCStep 7: Decode message from UGCStep 2: Pick message identifier• Application specific• Only intended recipient should know it
VectorMessage
Embedded Vector
Alice
Collage steps:1. Obtain message2. Pick message identifier3. Obtain cover media4. Embed message in cover5. Upload UGC to content host
6. Find and download UGC7. Decode message from UGC
Embedding Messages in Vectors
• Encrypt the message using the identifier• Generate chunks using erasure coding
– Generate many chunks, recover from any k-subset– Allows splitting among many vectors, robustness
• Embed chunks into vectors
Steganography: hard to detectWatermarking: hard to remove
Do the reverse to decode
Collage steps:1. Obtain message2. Pick message identifier3. Obtain cover media4. Embed message in cover5. Upload UGC to content host
6. Find and download UGC7. Decode message from UGC
Where Are Bob’s Vectors?
• Crawling all of Flickr is not an option• Must agree on a subset of a content host
without any immediate communication
Collage steps:1. Obtain message2. Pick message identifier3. Obtain cover media4. Embed message in cover5. Upload UGC to content host
6. Find and download UGC7. Decode message from UGC
Solution: A predictable way of mapping message identifiers to subsets of content hosts
Message Identifier
Solution: Task Mapping
Tasks
http://nytimes.com
3
6
9
1111
9
• Receivers perform these tasks to get vectors
• Senders publish vectors so that when receivers perform tasks, they get the sender’s vectors
Tasks1. Hash the identifier2. Hash the tasks3. Map identifier to closest tasks
Collage steps:1. Obtain message2. Pick message identifier3. Obtain cover media4. Embed message in cover5. Upload UGC to content host
6. Find and download UGC7. Decode message from UGC
1
Look at JohnDoe’s videos on YouTube
Search for blue flowers on Flickr
How Does Collage Meet the Design Goals?
• Robust against blocking– Erasure coding– Many content hosts
• Deniable against user identification– Traffic only to/from content hosts– Depends upon task construction
• Require no dedicated infrastructure– Messages stored on content hosts
How Do You Start Using Collage?
Send & Receive Messages1. Distribute software
– CDROM– Spam everyone– A secure network
2. Refresh task list– Receive using Collage– Online resource
3. Message identifier– Application specific
Help Censored Users1. Donate your UGC vectors
– Photos on Flickr– Tweets on Twitter– Etc.
2. Write Collage applications
Outline
• Background and Design Goals• Collage Design• Performance and Demo
Performance Metrics
• Sender and receiver traffic overhead• Sender and receiver transfer time• Storage required on content hosts
But these metrics can vary a lot:• Different content hosts• Different tasks• Different applications
Case Study
News Articles Covert TweetsContent host Flickr TwitterMessage size 30 KB 140 BytesVectors needed 5 30Storage needed 600 KB 4 KBSending traffic 1,200 KB 1,100 KBSending time 5 minutes 60 minutesReceiving traffic 6,000 KB 600 KBReceiving time 2 minutes ½ minute
Experiments performed on a 768/128 Kbps DSL connection
Demo of a Collage Application
What Should You Do Now?
• Try out the demo application• Donate your photos
– For now, only Flickr Pro users– Embed news articles when you upload photos
Visit http://gtnoise.net/collage
Conclusion
• Collage evades Internet censorship by tunneling messages inside user-generated content– Robust against blocking– Deniable against user identification– Requires no dedicated infrastructure
• More work needed– Statistical deniability against traffic analysis– Learn timing behavior from users– Tor bridge discovery
http://gtnoise.net/collage
Yes, Steganography is Broken
• Doesn’t mean it isn’t still practical• Collage can use new information hiding
techniques as they come along– Not dependent upon a particular technology
• Use many hiding techniques in parallel
Can Bob Be Behind the Firewall?
Comparison to Other SystemsTechnique System Robustness Deniability Infrastructure
Onion routing TorLayered encryption; bridges
Encryption Network of relay servers
Anonymous remailing Mixminion Many hops; mixing Encryption Network of
remailers
Covert channels
Infranet Could use many proxy forwarders
Browsing regular Web sites; Steganography
Custom Web servers
CovertFS Multiple content hosts
Browsing regular Web sites; Steganography
None
CollageErasure coding; Many content hosts
Browsing content hosts; Info hiding
None
“A Web based Covert File System”, HotOS ’07
• Stores files inside of Flickr photos• Intended for personal file storage• Collage is more robust
– Erasure coding– Automatically spread data among content hosts
• Could be implemented using Collage• Implementation not publically available
Where Could We Store Content?
• Photo sharing sites (Wikipedia lists 71)• Video sharing sites (Wikipedia lists 68)• Web forums, blogs
– Plugins for popular forum and blogging software
What If Senders and Receivers Disagree on Task Lists?
• Consistent hashing ensures some agreement• Depends on number of tasks per identifier:
– If identifier mapped to 2 tasks, can still communicate with 25% disagreement
– If identifier mapped to 4 tasks, can still communicate with 50% disagreement
Agreeing on Message Identifiers
• Goal: Agree without communicating• For news reader application:
– Public knowledge, hopefully censor doesn’t block• Another option:
– Exchange keys with communicators beforehand– Identifier is, e.g., ciphertext of current date
What Affects Deniability?
• Task construction:– The content host– Popularity the sequence of Web requests made– Injection of think times– Injection of garbage requests– Alternative routes to the same vectors
• Frequency of task execution• Are you uploading suspicious vectors?
– E.g., tagging pictures with bogus terms
Censorship Is an Arms Race
• Popular systems will eventually be blocked• We must continually create systems that are
harder to compromise
top related