[freenet-chat] New Scientist article on harvesting social network sites (analysis)

Matthew Toseland toad at amphibian.dyndns.org
Thu Jun 8 13:07:06 UTC 2006


So... apart from the general reminder not to blab too much online if you
care about your privacy (and especially on MySpace/LinkedIn/etc), what
relevance does this have to us?

1. Joining the dots is easy.

Between phone records and email records, the NSA have "joining the dots"
pretty much sewn up.

"phone logs .. can only be used to build a very basic picture of
someone's contact network, a process sometimes called 'connecting the
dots'. Clusters of people in highly connected groups become apparent, as
do people with few connections who appear to be intermediaries between
such groups. The idea is to see by how many links or 'degrees' separate
people from, say, a member of a blacklisted organization"
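
To make "degrees" concrete, here is a minimal Python sketch of the link
counting the article describes: a breadth-first search over a contact
graph built from call records. The call log, the names and both function
names are made up for illustration; nothing here is taken from the
article or from any real analysis tool.

```python
from collections import deque


def build_contacts(call_log):
    """Turn (caller, callee) records into an undirected adjacency map."""
    contacts = {}
    for a, b in call_log:
        contacts.setdefault(a, set()).add(b)
        contacts.setdefault(b, set()).add(a)
    return contacts


def degrees_of_separation(contacts, source):
    """Breadth-first search: number of links from source to each reachable person."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        person = queue.popleft()
        for peer in contacts.get(person, ()):
            if peer not in dist:
                dist[peer] = dist[person] + 1
                queue.append(peer)
    return dist


# Hypothetical call log; "dave" stands in for the blacklisted member.
CALL_LOG = [("alice", "bob"), ("bob", "carol"), ("carol", "dave"), ("alice", "erin")]
CONTACTS = build_contacts(CALL_LOG)
DEGREES = degrees_of_separation(CONTACTS, "dave")
```

With this toy log, alice ends up three links from dave (via carol and
bob) and erin four. Anyone with no path to the source is simply absent
from the result.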

2. Identifying the nature of the connections is hard. Social networking
sites offer a short-cut, indicating a) whether these connections are
regarded as significant/worthy of record (people's enemies won't
normally be added on Orkut), and b) what shared interests connect the
two individuals.

"By adding online social networking data to its phone analyses, the NSA
could connect people at deeper levels, through shared activities ... "
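
A rough sketch of what "connecting people through shared activities"
could look like in practice: take links already known from phone or
email records and annotate each with whatever interests both parties
list on their profiles. The data and the function name are hypothetical;
this only illustrates the idea, not the method used in the research the
article describes.

```python
def shared_context(links, profiles):
    """For each known link, list the interests both people declared publicly."""
    context = {}
    for a, b in links:
        common = profiles.get(a, set()) & profiles.get(b, set())
        context[(a, b)] = sorted(common)
    return context


# Links as they might come out of phone/email records (made-up names).
LINKS = [("alice", "bob"), ("bob", "carol"), ("carol", "dave")]

# Interests as they might be scraped from social networking profiles.
PROFILES = {
    "alice": {"flying lessons", "chess"},
    "bob": {"flying lessons", "cycling"},
    "carol": {"cycling"},
    # dave has no public profile, so his links stay unexplained
}

CONTEXT = shared_context(LINKS, PROFILES)
```

The alice-bob link picks up "flying lessons" as context, while the
carol-dave link stays bare because dave published nothing, which is
point a) of not giving information away.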

3. The Semantic Web will make this easier. IMHO you have two choices: either
a) don't give away any information on the web, or b) build tools that
make it as easy as possible to harvest the semantic web and find out as
much as possible about people, just as the NSA are. You don't have
access to bank records and cellphone locator fixes, but nonetheless you
can get a good idea of what's out there.


Now, how do we apply this?

A. The NSA already knows your contact list, because they read your email
headers and watch your phone logs. It's the strength of the connection,
and the context of the connection, that is harder to establish.

B. Darknet links are considerably harder to detect than
MySpace/Orkut/etc links. They require packet analysis hardware at the
ISP, and may require network level correlation / flow analysis for more
advanced transports. Unfortunately, this is entirely plausible in the
future.

C. Darknet links, assuming they can be detected, tell the NSA (or
Chinese intel, or the RIAA working with law enforcement as provided for
under pending EU legislation) that:
a) There is probably a social connection (unless it's a #freenet-refs
connection, as most such links are; correlate with email records etc to
find out if it's a real social connection), AND
b) There is probably a certain level of trust. Right now, connecting to
somebody on darknet requires a significant level of trust because of a
lack of countermeasures against malicious "friends", but it will always
require a greater level of trust than a "random stranger" link; "random
strangers" will often be agents of the Evil Powers attempting to
infiltrate the darknet, so you at least have to establish that they
aren't bots!

D. If Freenet is huge, and relatively mainstream, and your darknet peers
are similar to your AIM buddy list, and there are reasonable
countermeasures against treachery, then darknet connections don't
necessarily tell Them much more than your AIM list does. However, if
Freenet remains largely a haven for extreme libertarians, terrorists,
paedophiles, and geeks, then being connected to somebody on Freenet
tells Them a great deal, and the emphasis has to be on steganography.

E. So we have an interesting dilemma:
- The more hostile the environment (the more the risk of being
  discovered running a node, the smaller the network, etc), the more
  information (and leverage) is available to a party discovering that
  two people are connected over the darknet. Extreme stego transports
  and so on may reduce the risk of traffic analysis, and in such
  circumstances, users will be willing to put up with the reduced
  performance and increased hassle imposed thereby. But this will reduce
  the number of users further, to a degree. The rewards for treachery
  are also higher, and the likely available resources on the attacker's
  side per user will also be higher.
- The less hostile the environment, the less the need for darknet, in
  theory, since it may not be harvested and blocked. On the other hand,
  darknet may well be a superior topology performance-wise once it
  reaches a certain density (look at the weeks it takes to get good
  performance on 0.5!). But the less hostile the environment, the more
  people are on the darknet. And the more people who are on the darknet,
  the faster it grows, the more useful it is, the safer it is, the less
  information is given away by the knowledge of a darknet link.

Essentially, the more hostile the environment (and also the weaker the
security of Freenet itself against treachery), the greater the strength
of the social connection required for darknet, and the more significant
the fact of a darknet connection becomes. The less hostile the
environment, the closer your darknet connection list is to your AIM
connection list / email addressbook, the less information is given away
by knowing about a darknet link.
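
One crude way to put numbers on "less information is given away" is the
standard self-information measure, -log2(p), where p is the fraction of
the population running a node. This is a toy model of my own choosing,
not anything measured: it ignores who the users are and how strong the
link is, and the user counts below are invented.

```python
import math


def membership_surprisal_bits(users, population):
    """Bits an observer gains on learning that a given person runs a node.

    Toy assumption: every person is equally likely to be a user, so the
    only thing that matters is the adoption fraction users / population.
    """
    if not 0 < users <= population:
        raise ValueError("users must be between 1 and population")
    return -math.log2(users / population)


# Invented figures: a niche network versus a mainstream one.
NICHE_BITS = membership_surprisal_bits(1_000, 1_000_000)
MAINSTREAM_BITS = membership_surprisal_bits(250_000, 1_000_000)
```

With these made-up figures the niche network gives away roughly ten bits
per discovered user and the mainstream one exactly two, which is the
same direction as the argument above: the bigger and more ordinary the
network, the less a discovered link says about you.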

On Thu, Jun 08, 2006 at 01:40:40PM +0100, Matthew Toseland wrote:
> Page 30-31, New Scientist issue 2555, 10 June 2006.
> Keep out of MySpace: Social networking websites could be the latest
> target of the US National Security Agency
> 
> New Scientist has discovered that the Pentagon's National Security
> Agency, which specializes in eavesdropping and code-breaking, is funding
> research into the mass harvesting of the information that people post
> about themselves on social networks. And it could harness advances in
> internet technology - specifically the forthcoming "semantic web"
> championed by the web standards organization W3C - to combine data
> from social networking websites with details such as banking, retail and
> property records, allowing the NSA to build extensive, all-embracing
> personal profiles of individuals.
> ...
> Meanwhile, the NSA is pursuing its plans to tap the web, since phone
> logs have limited scope. They can only be used to build a very basic
> picture of someone's contact network, a process sometimes called
> "connecting the dots". Clusters of people in highly connected groups
> become apparent, as do people with few connections who appear to be
> intermediaries between such groups. The idea is to see by how many links
> or "degrees" separate people from, say, a member of a blacklisted
> organization.
> 
> By adding the online social networking data to its phone analyses, the
> NSA could connect people at deeper levels, such as taking flying
> lessons. Typically online social networking sites ask members to enter
> details of their immediate and extended circles of friends, whose blogs
> they might follow. People often list other facets of their personality,
> including political, sexual, entertainment, media and sporting
> preferences too. Some go much further, and a few have lost their jobs by
> publicly describing drinking and drug-taking exploits...
> 
> "You should always assume anything you write online is stapled to your
> resume. People don't realise you get Googled just to get a job interview
> these days," says [ PGP chief security officer ] Callas.
> 
> Other data the NSA could combine with social networking details includes
> information on purchases, where we go (available from cellphone
> records...) and what major financial transactions we make, such as
> buying a house.
> 
> Right now this is difficult to do, because today's web is stuffed with
> data in incompatible formats. Enter the semantic web, which aims to iron
> out these incompatibilities over the next few years via a common data
> structure called the Resource Description Framework...
> 
> "RDF turns the web into a kind of universal spreadsheet that is readable
> by computers as well as people," says David de Roure at the University
> of Southampton, UK, who is an adviser to the W3C. "It means you will be
> able to ask a website questions you couldn't ask before, or perform
> calculations on the data it contains."...
> 
> [the NSA]'s interest in [harvesting the semantic web] is evident in a
> funding footnote to a research paper delivered at the W3C's WWW2006
> conference in Edinburgh, UK, in late May.
> 
> That paper, entitled Semantic Analytics on Social Networks, by a
> research team led by Amit Sheth of the University of Georgia in Athens
> and Anupam Joshi of the University of Maryland in Baltimore reveals how
> data from online social networks and other databases can be combined to
> uncover facts about people. The footnote said the work was part-funded
> by an organization called ARDA.
> 
> ... Chief among ARDA's aims is to make sense of the massive amounts of
> data the NSA collects - some of its sources grow by around 4 million
> gigabytes a month.
> ...
> So the team developed software that combined data from the RDF tags of
> online social network Friend of a Friend (www.foaf-project.org), where
> people simply outline who is in their circle of friends, and a
> semantically tagged commercial bibliographic database called DBLP, which
> lists the authors of computer science papers.
> 
> Joshi says their system found conflicts of interest between potential
> reviewers and authors pitching papers for an internet conference. "It
> certainly made relationship finding between people much easier", Joshi
> says. "It picked up softer [ non-obvious ] conflicts we would not have
> seen before."
> 
> The technology will work in exactly the same way for intelligence and
> national security services and for financial dealings, such as detecting
> insider trading, the authors say. Linking "who knows who" with
> purchasing or bank records could highlight groups of terrorists, money
> launderers or blacklisted groups, says Sheth.
> 
> ... [ ARDA renamed to Disruptive Technologies Office ... ]
> ... [ references to the Total Information Awareness project, which was
> shelved, but elements continue in the September 2003 Defence
> Appropriations Act ] ...
> 
> Privacy groups worry that "automated intelligence profiling" could sully
> people's reputations or even lead to miscarriages of justice -
> especially since the data from social networking sites may often be
> inaccurate, untrue, or incomplete, De Roure warns.
> 
> But Tim Finin, a colleague of Joshi's, thinks that the spread of such
> technology is unstoppable. "Information is getting easier to merge, fuse
> and draw inferences from. There is money to be made and control to be
> gained in doing so. And I don't see much that will stop it," he says.
> 
> Callas thinks people have to wise up about how much information about
> themselves they should divulge on public websites. It may sound obvious,
> he says, but being discreet is a big part of maintaining privacy. Time,
> perhaps, to hit the delete button.
> 
> 
> -- 
> Matthew J Toseland - toad at amphibian.dyndns.org
> Freenet Project Official Codemonkey - http://freenetproject.org/
> ICTHUS - Nothing is impossible. Our Boss says so.


