From prosenmai at hotmail.com Sun Apr 6 12:03:15 2008
From: prosenmai at hotmail.com (Peter Rosenmai)
Date: Sun, 6 Apr 2008 22:03:15 +1000
Subject: [Tech] Would it be possible to effectively shutdown Freenet within the PRC?
Message-ID:

Hello all,

I've looked around and have been unable to find an answer to the following questions. I hope I am posting these to the right group.

1. Suppose Freenet were to prove such an irritant to the mainland Chinese government that they decided to shut it down altogether within China. How great a technical challenge would this present? I understand that the PRC farms out much of the responsibility for censoring internet traffic to ISPs: Chinese ISPs could simply look for and block the Freenet protocol, couldn't they?

2. Would it be possible for the PRC to run Freenet nodes in order to determine the IP addresses of other nodes within China?

3. Is it true that the PRC has previously blocked Freenet? If so, how was this achieved?

Kind regards,
Peter from Sydney

From nextgens at freenetproject.org Sun Apr 6 12:27:06 2008
From: nextgens at freenetproject.org (Florent Daignière)
Date: Sun, 6 Apr 2008 14:27:06 +0200
Subject: [Tech] Would it be possible to effectively shutdown Freenet within the PRC?
In-Reply-To:
References:
Message-ID: <20080406122703.GF3501@freenetproject.org>

* Peter Rosenmai [2008-04-06 22:03:15]:

> Hello all,
> I've looked around and have been unable to find an answer to the following questions. I hope I am posting these to the right group.
>
> 1. Suppose Freenet were to prove such an irritant to the mainland Chinese government that they decided to shut it down altogether within China. How great a technical challenge would this present? I understand that the PRC farms out much of the responsibility for censoring internet traffic to ISPs: Chinese ISPs could simply look for and block the Freenet protocol, couldn't they?

Blocking opennet is easy if not trivial; blocking darknet is way more complicated. How exactly would you fingerprint the Freenet protocol? To block something you have to distinguish it from the background "noise". No doubt there are ways of doing that, but it's a non-trivial problem... Only one technique has been brought to our attention so far, and we are going to mitigate its effectiveness soon by implementing something we call transport-plugins (a steganographic layer on top of the protocol).

> 2. Would it be possible for the PRC to run Freenet nodes in order to determine the IP addresses of other nodes within China?

They could determine the IP addresses of opennet nodes, yes. That wouldn't work for darknet nodes of course, and that's why growing a real darknet is so important.

> 3. Is it true that the PRC has previously blocked Freenet? If so, how was this achieved?

Yes, using deep packet inspection on their firewalls: the old version of the protocol had some matchable session bytes. They have also been blocking the website for ages.

-------------- next part --------------
A non-text attachment was scrubbed...
Name: not available
Type: application/pgp-signature
Size: 189 bytes
Desc: Digital signature
Url : http://emu.freenetproject.org/pipermail/tech/attachments/20080406/8b3f4e94/attachment.pgp

From m.rogers at cs.ucl.ac.uk Sun Apr 6 23:17:15 2008
From: m.rogers at cs.ucl.ac.uk (Michael Rogers)
Date: Mon, 07 Apr 2008 00:17:15 +0100
Subject: [Tech] Would it be possible to effectively shutdown Freenet within the PRC?
In-Reply-To:
References:
Message-ID: <47F959FB.9050906@cs.ucl.ac.uk>

Peter Rosenmai wrote:
> Chinese ISPs could simply look for and block the Freenet protocol, couldn't they?

Hi Peter,

It's certainly possible in theory, but I'm not sure whether it can be done with the technology they're currently using.

First, it might be difficult to detect the Freenet protocol reliably: as far as I know all parts of the protocol are encrypted or obfuscated, even the initial handshake.

Second, I don't think internet traffic within China passes through the same filters as international traffic.

Third, the international routers don't perform the filtering themselves, they send a copy of every packet to a separate piece of equipment that kills connections that match certain rules by sending forged TCP RST packets to both ends of the connection. Freenet uses UDP rather than TCP, so sending TCP RSTs wouldn't work, but perhaps they have another way of filtering UDP.

> 2. Would it be possible for the PRC to run Freenet nodes in order to determine the IP addresses of other nodes within China?

Yes. Freenet users can choose between 'darknet' mode, in which they only connect to their trusted friends, and 'opennet' mode, in which the node automatically finds other nodes to connect to. Opennet users can also have darknet friends. By running an opennet node the Chinese government could discover opennet users in China very easily. With additional effort it might be possible to follow the darknet connections of the opennet users to discover some or all of the darknet users too.

> 3. Is it true that the PRC has previously blocked Freenet? If so, how was this achieved?

The protocol was previously based on TCP and there were some plaintext bytes in the initial handshake that could be used to identify a Freenet connection. Nowadays the protocol's based on UDP and as far as I know there are no plaintext fields any more.
Cheers,
Michael

From prosenmai at hotmail.com Mon Apr 7 14:37:55 2008
From: prosenmai at hotmail.com (Peter Rosenmai)
Date: Tue, 8 Apr 2008 00:37:55 +1000
Subject: [Tech] Would it be possible to effectively shutdown Freenet within the PRC?
In-Reply-To: <47F959FB.9050906@cs.ucl.ac.uk>
References: <47F959FB.9050906@cs.ucl.ac.uk>
Message-ID:

Hi Michael,

So, do I understand correctly then that if my Freenet node sends out a packet to my ISP (to be forwarded on to your Freenet node) that packet will feature:

- Encrypted content about which my ISP can know virtually nothing (unless it has the relevant private key);
- A UDP protocol header

That is, until the packet content is decrypted, there is nothing in the header to indicate that the packet is a "Freenet packet". So it's like having a Freenet packet hidden behind encryption within the content part of the packet? (Although I guess the encrypted Freenet packet would have to be broken up across a number of packets). Or am I completely off the mark?

And where is the initial handshake done? I would have thought that a handshake only becomes possible once the packet content is decrypted. That is, my ISP and your ISP look at the header of the packet I send and then send it through to the Freenet port on your machine, believing it to be something other than it is. I mean, isn't it only when your Freenet node decrypts the packet content that it is able to see the Freenet protocol?

Your comment about malicious opennet nodes finding darknet friends to use as entry points into darknets is interesting. I would assume that once one darknet node is found in such a manner, the IP addresses of all nodes on the relevant darknet could then be discovered. That is, if my opennet node has a darknet friend, doesn't this mean that I have effectively amalgamated the relevant darknet into the opennet? If so, this "darknet friend" business seems like a dangerous idea. Or am I missing something here?

Oh, and finally, do I understand correctly that a denial of service attack on Freenet would actually cause the Freenet webpage being attacked to become more - rather than less - available?! That is, a barrage of requests for the webpage would in fact proliferate it across the network?

Sorry to throw all these questions at you. I'm mulling over a hypothesis that networks such as Freenet and TOR are vulnerable to being entirely shut down in autocratic states unless they can attract large groups of mainstream users.

Thank you very much for your comments.

Peter

From m.rogers at cs.ucl.ac.uk Mon Apr 7 16:01:32 2008
From: m.rogers at cs.ucl.ac.uk (Michael Rogers)
Date: 07 Apr 2008 17:01:32 +0100
Subject: [Tech] Would it be possible to effectively shutdown Freenet within the PRC?
In-Reply-To:
References: <47F959FB.9050906@cs.ucl.ac.uk>
Message-ID:

Hi Peter,

Thanks for your questions. I'm interested in the robustness/censorship-resistance issue myself so I'm happy to have a go at answering them, but take my information with a grain of salt because I'm not very familiar with the code.
> So, do I understand correctly then that if my Freenet node sends out a packet to my ISP (to be forwarded on to your Freenet node) that packet will feature:
> - Encrypted content about which my ISP can know virtually nothing (unless it has the relevant private key);
> - A UDP protocol header
> That is, until the packet content is decrypted, there is nothing in the header to indicate that the packet is a "Freenet packet".

Right, the handshaking packets are obfuscated so they can only be recognised as Freenet packets if you know the node identifiers of both the nodes involved in the handshake, and all subsequent packets are encrypted (except the UDP header).

> So it's like having a Freenet packet hidden behind encryption within the content part of the packet? (Although I guess the encrypted Freenet packet would have to be broken up across a number of packets). Or am I completely off the mark?

No, that's right, large messages are broken up across multiple packets and small messages are coalesced (multiple messages per packet). Each packet also contains a random amount of padding to make traffic analysis harder.

> And where is the initial handshake done? I would have thought that a handshake only becomes possible once the packet content is decrypted.

The handshaking packets are obfuscated rather than encrypted - I don't know the details but there's a bit more information on the wiki:

http://wiki.freenetproject.org/FreenetZeroPointSevenSecurity

> That is, my ISP and your ISP look at the header of the packet I send and then send it through to the Freenet port on your machine, believing it to be something other than it is. I mean, isn't it only when your Freenet node decrypts the packet content that it is able to see the Freenet protocol?

Right, the ISPs just look at the UDP header and forward the packet, they don't need to examine the payload or recognise the application-layer protocol (although of course they might try to do so in the case of a national firewall for example).

> Your comment about malicious opennet nodes finding darknet friends to use as entry points into darknets is interesting. I would assume that once one darknet node is found in such a manner, the IP addresses of all nodes on the relevant darknet could then be discovered. That is, if my opennet node has a darknet friend, doesn't this mean that I have effectively amalgamated the relevant darknet into the opennet? If so, this "darknet friend" business seems like a dangerous idea. Or am I missing something here?

Darknet is more secure than opennet for a couple of reasons. First, only an ISP or someone with equivalent eavesdropping ability (eg a government agency) can compile a list of darknet nodes by recursively following each node's connections; on the other hand anyone who runs an opennet node can compile a list of opennet nodes. Second, once you know the address and port of an opennet node you can connect to it, which makes certain other attacks (denial of service, traffic analysis) easier. Learning the address and port of a darknet node doesn't give you the ability to connect to it, you must convince its owner to trust you first.

> Oh, and finally, do I understand correctly that a denial of service attack on Freenet would actually cause the Freenet webpage being attacked to become more - rather than less - available?! That is, a barrage of requests for the webpage would in fact proliferate it across the network?

Yup, a barrage of requests for an existing file would be a very poor DoS attack against Freenet, requesting nonexistent files would be more effective.
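The fragmentation, coalescing and padding described earlier in this reply can be sketched in a few lines of Python. This is only a toy illustration: the framing (a fragment count followed by length-prefixed fragments), the names `pack` and `unpack`, and the constants `MAX_DATA` and `MAX_PADDING` are all assumptions made for the example and are not taken from Freenet's code or wire format. A real node would also encrypt each packet, so the framing bytes would not be visible on the wire.

```python
import secrets
import struct

MAX_DATA = 512     # hypothetical per-packet data budget in bytes
MAX_PADDING = 64   # hypothetical upper bound on random padding in bytes

def pack(messages, max_data=MAX_DATA, max_padding=MAX_PADDING):
    """Fragment large messages, coalesce small ones, then pad each packet."""
    # Split every message into fragments that fit within one packet.
    fragments = [msg[i:i + max_data]
                 for msg in messages
                 for i in range(0, len(msg), max_data)]
    groups, current, used = [], [], 0
    for frag in fragments:
        # Start a new packet when the next fragment would not fit.
        if current and used + len(frag) > max_data:
            groups.append(current)
            current, used = [], 0
        current.append(frag)
        used += len(frag)
    if current:
        groups.append(current)
    packets = []
    for group in groups:
        # Toy framing: fragment count, then length-prefixed fragments.
        body = struct.pack("!H", len(group))
        for frag in group:
            body += struct.pack("!H", len(frag)) + frag
        # Append a random amount of random padding.
        padding = secrets.token_bytes(secrets.randbelow(max_padding + 1))
        packets.append(body + padding)
    return packets

def unpack(packet):
    """Return the fragments carried in a packet, ignoring trailing padding."""
    (count,) = struct.unpack_from("!H", packet, 0)
    offset, fragments = 2, []
    for _ in range(count):
        (length,) = struct.unpack_from("!H", packet, offset)
        offset += 2
        fragments.append(packet[offset:offset + length])
        offset += length
    return fragments
```

With these toy limits, a 1300-byte message followed by two short ones ends up in three packets: two full fragments, then the remainder coalesced with the short messages. Because of the padding, packet lengths do not directly reveal message lengths.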
Cheers,
Michael

From ian.clarke at gmail.com Mon Apr 7 23:36:30 2008
From: ian.clarke at gmail.com (Ian Clarke)
Date: Mon, 7 Apr 2008 18:36:30 -0500
Subject: [Tech] Distributed file system using routing inspired by Freenet
Message-ID: <823242bd0804071636w4ad9fbfo4e2191e05b5e64@mail.gmail.com>

http://video.google.com/videoplay?docid=-2372664863607209585

He mentions Freenet's use of this technique about 10-15 minutes in. They also use erasure codes, so it seems they are using a few techniques that we also use (it is unclear whether we were the direct source of this inspiration).

Ian.

From toad at amphibian.dyndns.org Tue Apr 8 11:36:34 2008
From: toad at amphibian.dyndns.org (Matthew Toseland)
Date: Tue, 8 Apr 2008 12:36:34 +0100
Subject: [Tech] Distributed file system using routing inspired by Freenet
In-Reply-To: <823242bd0804071636w4ad9fbfo4e2191e05b5e64@mail.gmail.com>
References: <823242bd0804071636w4ad9fbfo4e2191e05b5e64@mail.gmail.com>
Message-ID: <200804081236.44302.toad@amphibian.dyndns.org>

On Tuesday 08 April 2008 00:36, Ian Clarke wrote:
> http://video.google.com/videoplay?docid=-2372664863607209585
>
> He mentions Freenet's use of this technique about 10-15 minutes in, they also use erasure codes, so it seems they are using a few techniques that we also use (unclear about whether we were the direct source of this inspiration).

They use 500% redundancy in their RS codes. Right now we use 150% (including the original 100%). Maybe we should increase this? A slight increase to say 200% or 250% may give significantly better performance, despite the increased overhead...

Also, they discourage low uptime nodes by not giving them any extra storage. I'm not sure exactly what we can do about this, but it's a problem we need to deal with.

We should also think about randomising locations less frequently. It can take a while to recover, and the current code randomizes roughly every 13 to 22 hours. It may be useful to increase this significantly? Unfortunately this parameter is very dependent on the network size and so on, it's not really something we can get a good value for from simulations... I suggest we increase it by say a factor of 4, and if we get major location distribution issues, we can reduce it again.

>
> Ian.

From toad at amphibian.dyndns.org Tue Apr 8 12:00:05 2008
From: toad at amphibian.dyndns.org (Matthew Toseland)
Date: Tue, 8 Apr 2008 13:00:05 +0100
Subject: [Tech] Distributed file system using routing inspired by Freenet
In-Reply-To: <200804081236.44302.toad@amphibian.dyndns.org>
References: <823242bd0804071636w4ad9fbfo4e2191e05b5e64@mail.gmail.com> <200804081236.44302.toad@amphibian.dyndns.org>
Message-ID: <200804081300.14311.toad@amphibian.dyndns.org>

On Tuesday 08 April 2008 12:36, Matthew Toseland wrote:
> On Tuesday 08 April 2008 00:36, Ian Clarke wrote:
> > http://video.google.com/videoplay?docid=-2372664863607209585
> >
> > He mentions Freenet's use of this technique about 10-15 minutes in, they also use erasure codes, so it seems they are using a few techniques that we also use (unclear about whether we were the direct source of this inspiration).
>
> They use 500% redundancy in their RS codes. Right now we use 150% (including the original 100%). Maybe we should increase this? A slight increase to say 200% or 250% may give significantly better performance, despite the increased overhead...

In fact, I think I can justify a figure of 200% (the original plus 100%, so 128 -> 255 blocks to fit within the 8 bit fast encoding limit). On average, in the long term, a block will be stored on 3 nodes.
Obviously a lot of popular data will be stored on more nodes than 3, but in terms of the datastore, this is the approximate figure. On an average node with a 1GB datastore, the 512MB cache has a lifetime of less than a day; stuff lasts a lot longer in the store, and on average data is stored on 3 nodes (by design).

We then multiply that by two from splitfile redundancy, to get a total redundancy of 6. Wuala works well with a factor of 5 redundancy... but that's entirely due to FEC. They simulated ordinary redundancy and needed a factor of 24 to be reliable, but a factor of 5 for FEC. So maybe what we need is less network level redundancy and more FEC level redundancy? So we're talking about the data itself.

IMHO we can't reduce the network level redundancy much below the current store-in-3-nodes, because we do use freenet for things other than splitfiles - frost posts, the top level block, ... The top level block is a special case, it will usually be fetchable because anyone trying to fetch the splitfile will fetch it even if they give up afterwards, and even if they just followed a link in fproxy and got a size warning and changed their mind...

Wuala's simulations assume 25% uptime, and they don't allow nodes to have extra storage unless they have at least 17% uptime. Can we implement something similar? We would have to not take low uptime nodes into account when determining whether we are a sink for a key, the problem with this is that we'd have to reliably tell whether nodes are low uptime... On opennet, there is enough connection churn that we're unlikely to have had a node for the many days necessary to measure this. We could reduce the connection churn but this would come at the cost of reduced connectivity - when a node disconnects, we give it a few minutes to reconnect, and then we move on. A full blown reputation system as Wuala uses would be a lot of work and a lot of debugging...
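The trade-off between store-level replication and FEC redundancy discussed above can be made concrete with a small calculation. The sketch below assumes, as the Wuala simulations mentioned above do, that each node is online with probability 0.25, and additionally that nodes are online independently of one another. The function names, the independence assumption, and the reading of "150%" as 128 data blocks plus 64 check blocks are assumptions made for illustration, not measurements of Freenet.

```python
from math import comb

def replica_availability(p, copies):
    """Probability that at least one of `copies` independent replicas is online."""
    return 1.0 - (1.0 - p) ** copies

def fec_availability(q, k, n):
    """Probability that at least k of n blocks are retrievable,
    when each block is retrievable independently with probability q."""
    return sum(comb(n, i) * q**i * (1.0 - q) ** (n - i) for i in range(k, n + 1))

UPTIME = 0.25  # per-node uptime assumed in Wuala's simulations
COPIES = 3     # average number of nodes storing a block, per the message above
K = 128        # data blocks per segment

# Chance that any single block can be reached on at least one of its 3 nodes.
q = replica_availability(UPTIME, COPIES)

for label, n in (("150%", 192), ("200%", 255)):
    p_ok = fec_availability(q, K, n)
    print(f"{label}: {n} blocks, segment recoverable with probability {p_ok:.4f}")
```

Under these assumptions the expected number of reachable blocks is about 111 of 192 at 150% and about 147 of 255 at 200%, so only the second case clears the 128 blocks needed to decode a segment on average.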
> Also, they discourage low uptime nodes by not giving them any extra storage. I'm not sure exactly what we can do about this, but it's a problem we need to deal with.
>
> We should also think about randomising locations less frequently. It can take a while to recover, and the current code randomizes roughly every 13 to 22 hours. It may be useful to increase this significantly? Unfortunately this parameter is very dependent on the network size and so on, it's not really something we can get a good value for from simulations... I suggest we increase it by say a factor of 4, and if we get major location distribution issues, we can reduce it again.

This may be important.

> >
> > Ian.

From alejandro at mosteo.com Tue Apr 8 12:17:50 2008
From: alejandro at mosteo.com (Jano)
Date: Tue, 08 Apr 2008 14:17:50 +0200
Subject: [Tech] Distributed file system using routing inspired by Freenet
References: <823242bd0804071636w4ad9fbfo4e2191e05b5e64@mail.gmail.com>
Message-ID:

Ian Clarke wrote:
> http://video.google.com/videoplay?docid=-2372664863607209585
>
> He mentions Freenet's use of this technique about 10-15 minutes in, they also use erasure codes, so it seems they are using a few techniques that we also use (unclear about whether we were the direct source of this inspiration).

What about this one:

http://www.omemo.com/

It's distributed anonymous storage, totally à la Freenet. It was programmed (at least in part) by the guy behind mp2p and piolet (famous in its day). I haven't tested it yet.

It seems the race for the next p2p generation is truly open.

From alejandro at mosteo.com Tue Apr 8 12:20:51 2008
From: alejandro at mosteo.com (Jano)
Date: Tue, 08 Apr 2008 14:20:51 +0200
Subject: [Tech] Distributed file system using routing inspired by Freenet
References: <823242bd0804071636w4ad9fbfo4e2191e05b5e64@mail.gmail.com> <200804081236.44302.toad@amphibian.dyndns.org>
Message-ID:

Matthew Toseland wrote:
(snip)
> We should also think about randomising locations less frequently. It can take a while to recover, and the current code randomizes roughly every 13 to 22 hours. It may be useful to increase this significantly? Unfortunately this parameter is very dependent on the network size and so on, it's not really something we can get a good value for from simulations... I suggest we increase it by say a factor of 4, and if we get major location distribution issues, we can reduce it again.

In this regard, I've been tracking my location for some time, sampling each hour. I can't really say if what I'm seeing is expected/sane. See attached.

-------------- next part --------------
A non-text attachment was scrubbed...
Name: location.png
Type: image/png
Size: 9425 bytes
Desc: not available
Url : http://emu.freenetproject.org/pipermail/tech/attachments/20080408/54bf50a7/attachment.png

From m.rogers at cs.ucl.ac.uk Tue Apr 8 20:09:39 2008
From: m.rogers at cs.ucl.ac.uk (Michael Rogers)
Date: Tue, 08 Apr 2008 21:09:39 +0100
Subject: [Tech] Distributed file system using routing inspired by Freenet
In-Reply-To: <200804081300.14311.toad@amphibian.dyndns.org>
References: <823242bd0804071636w4ad9fbfo4e2191e05b5e64@mail.gmail.com> <200804081236.44302.toad@amphibian.dyndns.org> <200804081300.14311.toad@amphibian.dyndns.org>
Message-ID: <47FBD103.8050503@cs.ucl.ac.uk>

Matthew Toseland wrote:
> We then multiply that by two from splitfile redundancy, to get a total redundancy of 6. Wuala works well with a factor of 5 redundancy... but that's entirely due to FEC.
Two FEC blocks each replicated three times aren't really comparable to five FEC blocks, are they? You can't use any combination of them to recover the data.

> They simulated ordinary redundancy and needed a factor of 24 to be reliable, but a factor of 5 for FEC.

I'm not convinced their churn model is realistic - they assume that the nodes' uptimes are independent, but studies of Gnutella and Skype show strong daily and weekly cycles - if each node is online 25% of the time it doesn't follow that 25% of the nodes are online at any given time.

> So maybe what we need is less network level redundancy and more FEC level redundancy?

Sounds like a good idea, although won't it lead to higher search overhead (each FEC block will be replicated fewer times)?

> Wuala's simulations assume 25% uptime, and they don't allow nodes to have extra storage unless they have at least 17% uptime. Can we implement something similar?

In theory we could reject inserts from peers that haven't been active for, say, 4 of the last 24 hours, but in practice would that just drive away users and decrease the amount of available content?

> A full blown reputation system as Wuala uses would be a lot of work and a lot of debugging...

Wuala has centralised identity management, Freenet doesn't. That means we can't prevent Sybil attacks or whitewashing, which makes designing a reputation system even harder. IMO it's a can of worms. Snakes, even. Dragons!
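The point made earlier in this message about daily cycles can be illustrated with a deterministic toy model. Every node below is online for 6 hours out of 24 (25% uptime), but the fraction of nodes online at a given hour depends on how the online windows are distributed across the day. The start hours and the function name are made up for the example and do not come from any measurement.

```python
HOURS = 24
WINDOW = 6  # each node is online 6 of every 24 hours, i.e. 25% uptime

def online_fraction(start_hours):
    """Fraction of nodes online during each hour of the day."""
    counts = [0] * HOURS
    for start in start_hours:
        for offset in range(WINDOW):
            counts[(start + offset) % HOURS] += 1
    return [c / len(start_hours) for c in counts]

# Evenly spread case: one node starts its online window at each hour.
spread = online_fraction(list(range(HOURS)))

# Daily-cycle case: all nodes come online in the evening (hours 17 to 20).
clustered = online_fraction([17, 18, 18, 19, 19, 19, 20, 20])

print("spread:    min", min(spread), "max", max(spread))
print("clustered: min", min(clustered), "max", max(clustered))
```

In the clustered case the average over the day is still 25%, but for much of the day no node is online at all, so a redundancy level tuned for independent uptimes would be too low during the quiet hours.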
Cheers,
Michael

From toad at amphibian.dyndns.org Tue Apr 8 22:58:09 2008
From: toad at amphibian.dyndns.org (Matthew Toseland)
Date: Tue, 8 Apr 2008 23:58:09 +0100
Subject: [Tech] Distributed file system using routing inspired by Freenet
In-Reply-To: <47FBD103.8050503@cs.ucl.ac.uk>
References: <823242bd0804071636w4ad9fbfo4e2191e05b5e64@mail.gmail.com> <200804081300.14311.toad@amphibian.dyndns.org> <47FBD103.8050503@cs.ucl.ac.uk>
Message-ID: <200804082358.16489.toad@amphibian.dyndns.org>

On Tuesday 08 April 2008 21:09, Michael Rogers wrote:
> Matthew Toseland wrote:
> > We then multiply that by two from splitfile redundancy, to get a total redundancy of 6. Wuala works well with a factor of 5 redundancy... but that's entirely due to FEC.
>
> Two FEC blocks each replicated three times aren't really comparable to five FEC blocks, are they? You can't use any combination of them to recover the data.

Sure. It's not directly comparable. However, hopefully we have a higher average uptime, and the factor of 3 replication in the stores is necessary because of non-splitfile blocks.

> > They simulated ordinary redundancy and needed a factor of 24 to be reliable, but a factor of 5 for FEC.
>
> I'm not convinced their churn model is realistic - they assume that the nodes' uptimes are independent, but studies of Gnutella and Skype show strong daily and weekly cycles - if each node is online 25% of the time it doesn't follow that 25% of the nodes are online at any given time.

True. What would the impact of this be?

> > So maybe what we need is less network level redundancy and more FEC level redundancy?
>
> Sounds like a good idea, although won't it lead to higher search overhead (each FEC block will be replicated fewer times)?

That may not be a bad thing, but just like on Wuala, we have requestor-side healing...

> > Wuala's simulations assume 25% uptime, and they don't allow nodes to have extra storage unless they have at least 17% uptime. Can we implement something similar?
>
> In theory we could reject inserts from peers that haven't been active for, say, 4 of the last 24 hours, but in practice would that just drive away users and decrease the amount of available content?

Perhaps... I was thinking more in terms of storing stuff on nodes with reasonable uptimes...

> > A full blown reputation system as Wuala uses would be a lot of work and a lot of debugging...
>
> Wuala has centralised identity management, Freenet doesn't. That means we can't prevent Sybil attacks or whitewashing, which makes designing a reputation system even harder. IMO it's a can of worms. Snakes, even. Dragons!

:)

> Cheers,
> Michael

From j16sdiz+freenet at gmail.com Wed Apr 9 04:28:19 2008
From: j16sdiz+freenet at gmail.com (Daniel Cheng)
Date: Wed, 9 Apr 2008 12:28:19 +0800
Subject: [Tech] Distributed file system using routing inspired by Freenet
In-Reply-To: <200804081300.14311.toad@amphibian.dyndns.org>
References: <823242bd0804071636w4ad9fbfo4e2191e05b5e64@mail.gmail.com> <200804081236.44302.toad@amphibian.dyndns.org> <200804081300.14311.toad@amphibian.dyndns.org>
Message-ID:

2008/4/8 Matthew Toseland :
> On Tuesday 08 April 2008 12:36, Matthew Toseland wrote:
> > On Tuesday 08 April 2008 00:36, Ian Clarke wrote:
> > > http://video.google.com/videoplay?docid=-2372664863607209585
> > >
> > > He mentions Freenet's use of this technique about 10-15 minutes in, they also use erasure codes, so it seems they are using a few techniques that we also use (unclear about whether we were the direct source of this inspiration).
> >
> > They use 500% redundancy in their RS codes. Right now we use 150% (including the original 100%). Maybe we should increase this? A slight increase to say 200% or 250% may give significantly better performance, despite the increased overhead...
>
> In fact, I think I can justify a figure of 200% (the original plus 100%, so 128 -> 255 blocks to fit within the 8 bit fast encoding limit). On average, in the long term, a block will be stored on 3 nodes. Obviously a lot of popular data will be stored on more nodes than 3, but in terms of the datastore, this is the approximate figure. On an average node with a 1GB datastore, the 512MB cache has a lifetime of less than a day; stuff lasts a lot longer in the store, and on average data is stored on 3 nodes (by design).

I think the downloader would "heal" a broken file by re-inserting the missing FEC blocks, right?

If that is the case, I think we can use 300% (or higher) redundancy, but only insert a random portion of the blocks. When a downloader downloads this file, it inserts some other random FEC blocks for the file.

Under this scheme, the inserter doesn't have to pay a high bandwidth overhead, yet the redundancy still increases.

> [snip]

Regards,
Daniel Cheng

From m.rogers at cs.ucl.ac.uk Wed Apr 9 08:24:31 2008
From: m.rogers at cs.ucl.ac.uk (Michael Rogers)
Date: 09 Apr 2008 09:24:31 +0100
Subject: [Tech] Distributed file system using routing inspired by Freenet
In-Reply-To: <200804082358.16489.toad@amphibian.dyndns.org>
References: <823242bd0804071636w4ad9fbfo4e2191e05b5e64@mail.gmail.com> <200804081300.14311.toad@amphibian.dyndns.org> <47FBD103.8050503@cs.ucl.ac.uk> <200804082358.16489.toad@amphibian.dyndns.org>
Message-ID:

On Apr 8 2008, Matthew Toseland wrote:
> Sure. It's not directly comparable. However, hopefully we have a higher average uptime, and the factor of 3 replication in the stores is necessary because of non-splitfile blocks.

In theory would it be possible to use FEC for all blocks, or is it wasteful if you have less than one segment's worth of data?
>> I'm not convinced their churn model is realistic - they assume that the >> nodes' uptimes are independent, but studies of Gnutella and Skype show >> strong daily and weekly cycles - if each node is online 25% of the time >> it doesn't follow that 25% of the nodes are online at any given time. > >True. What would the impact of this be? There would be fewer nodes online at certain points in the cycle (eg 4am on Monday morning) so you'd need higher redundancy (or at least a warning saying "No glot, clom Fliday"). >> > So maybe what we need is less network level redundancy and more FEC >> > level redundancy? >> >> Sounds like a good idea, although won't it lead to higher search >> overhead (each FEC block will be replicated fewer times)? > > That may not be a bad thing, but just like on Wuala, we have > requestor-side healing... True, but I was thinking of the number of hops the search will have to travel: one block replicated three times can be found more cheaply than one of three blocks replicated once each. (Higher FEC redundancy is still a good idea IMO, I'm just thinking about the tradeoffs.) >Perhaps... I was thinking more in terms of storing stuff on nodes with >reasonable uptimes... Good point, I guess it's a waste of bandwidth to store data on a transient node. Cheers, Michael From alejandro at mosteo.com Wed Apr 9 12:44:07 2008 From: alejandro at mosteo.com (Jano) Date: Wed, 09 Apr 2008 14:44:07 +0200 Subject: [Tech] Distributed file system using routing inspired by Freenet References: <823242bd0804071636w4ad9fbfo4e2191e05b5e64@mail.gmail.com> <200804081236.44302.toad@amphibian.dyndns.org> <200804081300.14311.toad@amphibian.dyndns.org> <47FBD103.8050503@cs.ucl.ac.uk> Message-ID: Michael Rogers wrote: > Matthew Toseland wrote: >> We then multiply that by two from splitfile redundancy, to get a total >> redundancy of 6. Wuala works well with a factor of 5 redundancy... but >> that's entirely due to FEC. 
> > Two FEC blocks each replicated three times aren't really comparable to > five FEC blocks, are they? You can't use any combination of them to > recover the data. > >> They simulated ordinary redundancy and needed a factor >> of 24 to be reliable, but a factor of 5 for FEC. > > I'm not convinced their churn model is realistic - they assume that the > nodes' uptimes are independent, but studies of Gnutella and Skype show > strong daily and weekly cycles - if each node is online 25% of the time > it doesn't follow that 25% of the nodes are online at any given time. I have observed this first hand in a crawler I implemented long ago for the Shareaza network. See, for example, here: http://crawler.trillinux.org/history.html (I'm no longer maintaining these pages, BTW. Somebody took over circa 2005.) From toad at amphibian.dyndns.org Wed Apr 9 14:26:44 2008 From: toad at amphibian.dyndns.org (Matthew Toseland) Date: Wed, 9 Apr 2008 15:26:44 +0100 Subject: [Tech] Distributed file system using routing inspired by Freenet In-Reply-To: References: <823242bd0804071636w4ad9fbfo4e2191e05b5e64@mail.gmail.com> <200804082358.16489.toad@amphibian.dyndns.org> Message-ID: <200804091526.45446.toad@amphibian.dyndns.org> On Wednesday 09 April 2008 09:24, Michael Rogers wrote: > On Apr 8 2008, Matthew Toseland wrote: > >Sure. It's not directly comparable. However, hopefully we have a higher > >average uptime, and the factor of 3 replication in the stores is necessary > >because of non-splitfile blocks. > > In theory would it be possible to use FEC for all blocks, or is it wasteful > if you have less than one segment's worth of data? Well the problem is what to do with single blocks... A frost post is an SSK plus usually a CHK for example. If a splitfile is inserted as a CHK, there will be a single top level block for the CHK. Granted these are more popular than the splitfile blocks... 
> > >> I'm not convinced their churn model is realistic - they assume that the > >> nodes' uptimes are independent, but studies of Gnutella and Skype show > >> strong daily and weekly cycles - if each node is online 25% of the time > >> it doesn't follow that 25% of the nodes are online at any given time. > > > >True. What would the impact of this be? > > There would be fewer nodes online at certain points in the cycle (eg 4am on > Monday morning) so you'd need higher redundancy (or at least a warning > saying "No glot, clom Fliday"). > > >> > So maybe what we need is less network level redundancy and more FEC > >> > level redundancy? > >> > >> Sounds like a good idea, although won't it lead to higher search > >> overhead (each FEC block will be replicated fewer times)? > > > > That may not be a bad thing, but just like on Wuala, we have > > requestor-side healing... > > True, but I was thinking of the number of hops the search will have to > travel: one block replicated three times can be found more cheaply than one > of three blocks replicated once each. (Higher FEC redundancy is still a > good idea IMO, I'm just thinking about the tradeoffs.) Sure, but more hops also means more storage capacity, more reliable content. On the other hand it means higher bandwidth overhead - but a lot of our bandwidth overhead is searching for stuff we can't find at the moment. > > >Perhaps... I was thinking more in terms of storing stuff on nodes with > >reasonable uptimes... > > Good point, I guess it's a waste of bandwidth to store data on a transient > node. But how would we implement that? > > Cheers, > Michael
From toad at amphibian.dyndns.org Wed Apr 9 14:29:57 2008 From: toad at amphibian.dyndns.org (Matthew Toseland) Date: Wed, 9 Apr 2008 15:29:57 +0100 Subject: [Tech] Distributed file system using routing inspired by Freenet In-Reply-To: References: <823242bd0804071636w4ad9fbfo4e2191e05b5e64@mail.gmail.com> <200804081300.14311.toad@amphibian.dyndns.org> Message-ID: <200804091529.58162.toad@amphibian.dyndns.org> On Wednesday 09 April 2008 05:28, Daniel Cheng wrote: > 2008/4/8 Matthew Toseland : > > On Tuesday 08 April 2008 12:36, Matthew Toseland wrote: > > > On Tuesday 08 April 2008 00:36, Ian Clarke wrote: > > > > http://video.google.com/videoplay?docid=-2372664863607209585 > > > > > > > > He mentions Freenet's use of this technique about 10-15 minutes in, > > > > they also use erasure codes, so it seems they are using a few > > > > techniques that we also use (unclear about whether we were the direct > > > > source of this inspiration). > > > > > > They use 500% redundancy in their RS codes. Right now we use 150% (including > > > the original 100%). Maybe we should increase this? A slight increase to say > > > 200% or 250% may give significantly better performance, despite the > > increased > > > overhead... > > > > In fact, I think I can justify a figure of 200% (the original plus 100%, so > > 128 -> 255 blocks to fit within the 8 bit fast encoding limit). On average, > > in the long term, a block will be stored on 3 nodes. Obviously a lot of > > popular data will be stored on more nodes than 3, but in terms of the > > datastore, this is the approximate figure. On an average node with a 1GB > > datastore, the 512MB cache has a lifetime of less than a day; stuff lasts a > > lot longer in the store, and on average data is stored on 3 nodes (by > > design).
> > I think the downloader would "heal" a broken file by re-inserting the > missing FEC blocks, right? > > If that is the case, I think we can use 300% (or higher) redundancy, > but only insert a random portion of them. When a downloader download > this file, he insert (some other) random blocks of FEC for this file. > Under this scheme, the inserter don't have to pay for a high bandwidth > overhead cost, while increasing the redundancy. I'm not worried about inserters paying a high bandwidth cost actually. Right now inserts are a lot faster than requests. What I'm worried about is if we have too much redundancy, our overhead in terms of data storage will be rather high, and that reduces the amount of data that is fetchable. > > > [snip] > > Regards, > Daniel Cheng From j16sdiz+freenet at gmail.com Wed Apr 9 15:27:43 2008 From: j16sdiz+freenet at gmail.com (Daniel Cheng) Date: Wed, 9 Apr 2008 23:27:43 +0800 Subject: [Tech] Distributed file system using routing inspired by Freenet In-Reply-To: <200804091529.58162.toad@amphibian.dyndns.org> References: <823242bd0804071636w4ad9fbfo4e2191e05b5e64@mail.gmail.com> <200804081300.14311.toad@amphibian.dyndns.org> <200804091529.58162.toad@amphibian.dyndns.org> Message-ID: On Wed, Apr 9, 2008 at 10:29 PM, Matthew Toseland wrote: > > On Wednesday 09 April 2008 05:28, Daniel Cheng wrote: > > 2008/4/8 Matthew Toseland : > > > On Tuesday 08 April 2008 12:36, Matthew Toseland wrote: > > > > On Tuesday 08 April 2008 00:36, Ian Clarke wrote: > > > > > http://video.google.com/videoplay?docid=-2372664863607209585 > > > > > > > > > > He mentions Freenet's use of this technique about 10-15 minutes in, > > > > > they also use erasure codes, so it seems they are using a few > > > >
> techniques that we also use (unclear about whether we were the direct > > > > > source of this inspiration). > > > > > > > > They use 500% redundancy in their RS codes. Right now we use 150% > (including > > > > the original 100%). Maybe we should increase this? A slight increase to > say > > > > 200% or 250% may give significantly better performance, despite the > > > increased > > > > overhead... > > > > > > In fact, I think I can justify a figure of 200% (the original plus 100%, > so > > > 128 -> 255 blocks to fit within the 8 bit fast encoding limit). On > average, > > > in the long term, a block will be stored on 3 nodes. Obviously a lot of > > > popular data will be stored on more nodes than 3, but in terms of the > > > datastore, this is the approximate figure. On an average node with a 1GB > > > datastore, the 512MB cache has a lifetime of less than a day; stuff lasts > a > > > lot longer in the store, and on average data is stored on 3 nodes (by > > > design). > > > > I think the downloader would "heal" a broken file by re-inserting the > > missing FEC blocks, right? > > > > If that is the case, I think we can use 300% (or higher) redundancy, > > but only insert a random portion of them. When a downloader download > > this file, he insert (some other) random blocks of FEC for this file. > > Under this scheme, the inserter don't have to pay for a high bandwidth > > overhead cost, while increasing the redundancy. > > I'm not worried about inserters paying a high bandwidth cost actually. Right > now inserts are a lot faster than requests. What I'm worried about is if we > have too much redundancy, our overhead in terms of data storage will be > rather high, and that reduces the amount of data that is fetchable. Disk are getting cheaper and cheaper... Also, high data redundancy means we can drop any blocks of them without problem, right? The only potential problem I have in mind is the LRU drop policy on store full. 
All blocks of an unpopular item may drops around the same time if we use this policy.. I think if the redundancy is high enough, we should use: - Random drop old data on store full. - LRU drop on Cache full. which should give a good balance of data retention and load balancing Regards, Daniel Cheng From alejandro at mosteo.com Wed Apr 9 15:45:36 2008 From: alejandro at mosteo.com (Jano) Date: Wed, 09 Apr 2008 17:45:36 +0200 Subject: [Tech] Distributed file system using routing inspired by Freenet References: <823242bd0804071636w4ad9fbfo4e2191e05b5e64@mail.gmail.com> <200804081300.14311.toad@amphibian.dyndns.org> <200804091529.58162.toad@amphibian.dyndns.org> Message-ID: Matthew Toseland wrote: > On Wednesday 09 April 2008 05:28, Daniel Cheng wrote: >> 2008/4/8 Matthew Toseland >> : >> > On Tuesday 08 April 2008 12:36, Matthew Toseland wrote: >> > > On Tuesday 08 April 2008 00:36, Ian Clarke wrote: >> > > > http://video.google.com/videoplay?docid=-2372664863607209585 >> > > > >> > > > He mentions Freenet's use of this technique about 10-15 minutes in, >> > > > they also use erasure codes, so it seems they are using a few >> > > > techniques that we also use (unclear about whether we were the direct >> > > > source of this inspiration). >> > > >> > > They use 500% redundancy in their RS codes. Right now we use 150% > (including >> > > the original 100%). Maybe we should increase this? A slight increase to > say >> > > 200% or 250% may give significantly better performance, despite the >> > increased >> > > overhead... >> > >> > In fact, I think I can justify a figure of 200% (the original plus 100%, > so >> > 128 -> 255 blocks to fit within the 8 bit fast encoding limit). On > average, >> > in the long term, a block will be stored on 3 nodes. Obviously a lot of >> > popular data will be stored on more nodes than 3, but in terms of the >> > datastore, this is the approximate figure. 
On an average node with a 1GB >> > datastore, the 512MB cache has a lifetime of less than a day; stuff lasts > a >> > lot longer in the store, and on average data is stored on 3 nodes (by >> > design). >> >> I think the downloader would "heal" a broken file by re-inserting the >> missing FEC blocks, right? >> >> If that is the case, I think we can use 300% (or higher) redundancy, >> but only insert a random portion of them. When a downloader download >> this file, he insert (some other) random blocks of FEC for this file. >> Under this scheme, the inserter don't have to pay for a high bandwidth >> overhead cost, while increasing the redundancy. > > I'm not worried about inserters paying a high bandwidth cost actually. Right > now inserts are a lot faster than requests. What I'm worried about is if we > have too much redundancy, our overhead in terms of data storage will be > rather high, and that reduces the amount of data that is fetchable. FWIW, my store is 50% full (of 8GB) after several weeks of mostly 24/7 uptime, whereas the cache gets filled pretty fast. Plus reinsertions are requested quite often in Frost shortly after an announcement. Could this mean that stores aren't currently being fully exploited? From m.rogers at cs.ucl.ac.uk Wed Apr 9 16:13:25 2008 From: m.rogers at cs.ucl.ac.uk (Michael Rogers) Date: 09 Apr 2008 17:13:25 +0100 Subject: [Tech] Distributed file system using routing inspired by Freenet In-Reply-To: References: <823242bd0804071636w4ad9fbfo4e2191e05b5e64@mail.gmail.com> <200804081300.14311.toad@amphibian.dyndns.org> <200804091529.58162.toad@amphibian.dyndns.org> Message-ID: On Apr 9 2008, Daniel Cheng wrote: >Disk are getting cheaper and cheaper... >Also, high data redundancy means we can drop any blocks of them without >problem, right? I'm not sure it's that simple - imagine two files with unequal popularity.
If we increase the redundancy of both files, causing some blocks to be dropped, what will happen to the reliability of the two files? > The only potential problem I have in mind is the LRU drop policy on store > full. All blocks of an unpopular item may drops around the same time if > we use this policy.. Good point. >I think if the redundancy is high enough, we should use: > - Random drop old data on store full. > - LRU drop on Cache full. >which should give a good balance of data retention and load balancing I've recently done some simulations of LRU vs FIFO vs random replacement, but I haven't had time to write up the results yet. The short version is that random replacement performs better than LRU or FIFO for some workloads, and isn't significantly worse for any workload. I didn't simulate multi-block files or FEC, though. Cheers, Michael From m.rogers at cs.ucl.ac.uk Wed Apr 9 16:20:07 2008 From: m.rogers at cs.ucl.ac.uk (Michael Rogers) Date: 09 Apr 2008 17:20:07 +0100 Subject: [Tech] Distributed file system using routing inspired by Freenet In-Reply-To: <200804091526.45446.toad@amphibian.dyndns.org> References: <823242bd0804071636w4ad9fbfo4e2191e05b5e64@mail.gmail.com> <200804082358.16489.toad@amphibian.dyndns.org> <200804091526.45446.toad@amphibian.dyndns.org> Message-ID: On Apr 9 2008, Matthew Toseland wrote: > Well the problem is what to do with single blocks... A frost post is an > SSK plus usually a CHK for example. If a splitfile is inserted as a CHK, > there will be a single top level block for the CHK. Granted these are > more popular than the splitfile blocks... Ah, I see what you mean, the top level can't be FECed because we don't want to have to use several keys to identify the file. >> Good point, I guess it's a waste of bandwidth to store data on a >> transient node. > >But how would we implement that? Don't send inserts to, or accept inserts from, peers unless they've been active for x of the last y hours?
But I reckon it would annoy people to be told that they couldn't post a Frost message because their node wasn't reliable enough, it would cause the store to fill up even more slowly, and might it also have implications for anonymity? Cheers, Michael From toad at amphibian.dyndns.org Wed Apr 9 21:52:52 2008 From: toad at amphibian.dyndns.org (Matthew Toseland) Date: Wed, 9 Apr 2008 22:52:52 +0100 Subject: [Tech] Distributed file system using routing inspired by Freenet In-Reply-To: References: <823242bd0804071636w4ad9fbfo4e2191e05b5e64@mail.gmail.com> <200804091529.58162.toad@amphibian.dyndns.org> Message-ID: <200804092252.59529.toad@amphibian.dyndns.org> On Wednesday 09 April 2008 16:27, Daniel Cheng wrote: > On Wed, Apr 9, 2008 at 10:29 PM, Matthew Toseland > wrote: > > > > On Wednesday 09 April 2008 05:28, Daniel Cheng wrote: > > > 2008/4/8 Matthew Toseland : > > > > On Tuesday 08 April 2008 12:36, Matthew Toseland wrote: > > > > > On Tuesday 08 April 2008 00:36, Ian Clarke wrote: > > > > > > http://video.google.com/videoplay?docid=-2372664863607209585 > > > > > > > > > > > > He mentions Freenet's use of this technique about 10-15 minutes in, > > > > > > they also use erasure codes, so it seems they are using a few > > > > > > techniques that we also use (unclear about whether we were the direct > > > > > > source of this inspiration). > > > > > > > > > > They use 500% redundancy in their RS codes. Right now we use 150% > > (including > > > > > the original 100%). Maybe we should increase this? A slight increase to > > say > > > > > 200% or 250% may give significantly better performance, despite the > > > > increased > > > > > overhead... > > > > > > > > In fact, I think I can justify a figure of 200% (the original plus 100%, > > so > > > > 128 -> 255 blocks to fit within the 8 bit fast encoding limit). On > > average, > > > > in the long term, a block will be stored on 3 nodes. 
Obviously a lot of > > > > popular data will be stored on more nodes than 3, but in terms of the > > > > datastore, this is the approximate figure. On an average node with a 1GB > > > > datastore, the 512MB cache has a lifetime of less than a day; stuff lasts > > a > > > > lot longer in the store, and on average data is stored on 3 nodes (by > > > > design). > > > > > > I think the downloader would "heal" a broken file by re-inserting the > > > missing FEC blocks, right? > > > > > > If that is the case, I think we can use 300% (or higher) redundancy, > > > but only insert a random portion of them. When a downloader download > > > this file, he insert (some other) random blocks of FEC for this file. > > > Under this scheme, the inserter don't have to pay for a high bandwidth > > > overhead cost, while increasing the redundancy. > > > > I'm not worried about inserters paying a high bandwidth cost actually. Right > > now inserts are a lot faster than requests. What I'm worried about is if we > > have too much redundancy, our overhead in terms of data storage will be > > rather high, and that reduces the amount of data that is fetchable. > > Disk are getting cheaper and cheaper... Yes, but the flipside is people want to share bigger and bigger files. > Also, high data redundancy means we can drop any blocks of them without > problem, right? Not necessarily. We have data redundancy precisely because blocks get dropped for various reasons e.g. because a node goes offline. > > The only potential problem I have in mind is the LRU drop policy on store full. > All blocks of an unpopular item may drops around the same time if we use > this policy.. > > I think if the redundancy is high enough, we should use: > - Random drop old data on store full. Hmmm. I dunno... simulations would be interesting. > - LRU drop on Cache full. 
> which should give a good balance of data retention and load balancing > > Regards, > Daniel Cheng From toad at amphibian.dyndns.org Wed Apr 9 21:54:51 2008 From: toad at amphibian.dyndns.org (Matthew Toseland) Date: Wed, 9 Apr 2008 22:54:51 +0100 Subject: [Tech] Distributed file system using routing inspired by Freenet In-Reply-To: References: <823242bd0804071636w4ad9fbfo4e2191e05b5e64@mail.gmail.com> <200804091526.45446.toad@amphibian.dyndns.org> Message-ID: <200804092254.51923.toad@amphibian.dyndns.org> On Wednesday 09 April 2008 17:20, Michael Rogers wrote: > On Apr 9 2008, Matthew Toseland wrote: > > Well the problem is what to do with single blocks... A frost post is an > > SSK plus usually a CHK for example. If a splitfile is inserted as a CHK, > > there will be a single top level block for the CHK. Granted these are > > more popular than the splitfile blocks... > > Ah, I see what you mean, the top level can't be FECed because we don't want > to have to use several keys to identify the file. Well, there *are* ways we could do that... > > >> Good point, I guess it's a waste of bandwidth to store data on a > >> transient node. > > > >But how would we implement that? > > Don't send inserts to, or accept inserts from, peers unless they've been > active for x of the last y hours? But I reckon it would annoy people to be > told that they couldn't post a Frost message because their node wasn't > reliable enough, it would cause the store to fill up even more slowly, and > might it also have implications for anonymity? Yeah, that doesn't work. And detecting uptime is difficult on opennet anyway - you have to rely on what they tell you because of churn.
> > Cheers, > Michael From toad at amphibian.dyndns.org Wed Apr 9 21:55:57 2008 From: toad at amphibian.dyndns.org (Matthew Toseland) Date: Wed, 9 Apr 2008 22:55:57 +0100 Subject: [Tech] Distributed file system using routing inspired by Freenet In-Reply-To: References: <823242bd0804071636w4ad9fbfo4e2191e05b5e64@mail.gmail.com> Message-ID: <200804092255.58339.toad@amphibian.dyndns.org> On Wednesday 09 April 2008 17:13, Michael Rogers wrote: > I've recently done some simulations of LRU vs FIFO vs random replacement, > but I haven't had time to write up the results yet. The short version is > that random replacement performs better than LRU or FIFO for some > workloads, and isn't significantly worse for any workload. I didn't > simulate multi-block files or FEC, though. Any ideas why? Did you simulate a two layer store, a plain cache, or a plain store? > > Cheers, > Michael From j16sdiz+freenet at gmail.com Wed Apr 9 22:48:32 2008 From: j16sdiz+freenet at gmail.com (Daniel Cheng) Date: Thu, 10 Apr 2008 06:48:32 +0800 Subject: [Tech] Distributed file system using routing inspired by Freenet In-Reply-To: References: <823242bd0804071636w4ad9fbfo4e2191e05b5e64@mail.gmail.com> <200804081300.14311.toad@amphibian.dyndns.org> <200804091529.58162.toad@amphibian.dyndns.org> Message-ID: On Thu, Apr 10, 2008 at 12:13 AM, Michael Rogers wrote: > On Apr 9 2008, Daniel Cheng wrote: > >Disk are getting cheaper and cheaper...
> >Also, high data redundancy means we can drop any blocks of them without > >problem, right? > > I'm not sure it's that simple - imagine two files with unequal popularity. > If we increase the redundancy of both files, causing some blocks to be > dropped, what will happen to the reliability of the two files? I guess both files are spread over a number of nodes, and only a small portion would overlap (just a guess, I don't know how Freenet really works). A single popular file can't push another file out on its own. In previous posts, I proposed healing *only* a random portion of a file's blocks. This makes redundancy grow inverse exponentially with popularity. It's really hard for a single file to have that kind of popularity. > > The only potential problem I have in mind is the LRU drop policy on store > > full. All blocks of an unpopular item may drops around the same time if > > we use this policy.. > > Good point. > > > >I think if the redundancy is high enough, we should use: > > - Random drop old data on store full. > > - LRU drop on Cache full. > >which should give a good balance of data retention and load balancing > > I've recently done some simulations of LRU vs FIFO vs random replacement, > but I haven't had time to write up the results yet. The short version is > that random replacement performs better than LRU or FIFO for some > workloads, and isn't significantly worse for any workload. I didn't > simulate multi-block files or FEC, though.
> > Cheers, > Michael > > > _______________________________________________ > Tech mailing list > Tech at freenetproject.org > http://emu.freenetproject.org/cgi-bin/mailman/listinfo/tech > From juiceman69 at gmail.com Thu Apr 10 02:00:05 2008 From: juiceman69 at gmail.com (Juiceman) Date: Wed, 9 Apr 2008 22:00:05 -0400 Subject: [Tech] Distributed file system using routing inspired by Freenet In-Reply-To: References: <823242bd0804071636w4ad9fbfo4e2191e05b5e64@mail.gmail.com> <200804082358.16489.toad@amphibian.dyndns.org> <200804091526.45446.toad@amphibian.dyndns.org> Message-ID: <8b525dee0804091900l5a37bfa0k6ac32098560b0e03@mail.gmail.com> On Wed, Apr 9, 2008 at 12:20 PM, Michael Rogers wrote: > On Apr 9 2008, Matthew Toseland wrote: > > Well the problem is what to do with single blocks... A frost post is an > > SSK plus usually a CHK for example. If a splitfile is inserted as a CHK, > > there will be a single top level block for the CHK. Granted these are > > more popular than the splitfile blocks... > > Ah, I see what you mean, the top level can't be FECed because we don't want > to have to use several keys to identify the file. > > > >> Good point, I guess it's a waste of bandwidth to store data on a > >> transient node. > > > >But how would we implement that? > > Don't send inserts to, or accept inserts from, peers unless they've been > active for x of the last y hours? But I reckon it would annoy people to be > told that they couldn't post a Frost message because their node wasn't > reliable enough, it would cause the store to fill up even more slowly, and > might it also have implications for anonymity? Also, remember one of the target audience for Freenet is dissidents or someone with important whistle blower information that needs to be released anonymously. Can we expect these folks to run a node for several hours just to publish data? 
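A minimal sketch of the rule debated in the messages above ("peers unless they've been active for x of the last y hours"), assuming uptime is sampled once per hour. The class name `UptimeTracker`, the hourly sampling and the example thresholds are hypothetical and are not taken from Freenet's code. As noted in the thread, on opennet the samples would have to be self-reported, and the rule has usability and possibly anonymity costs.

```python
from collections import deque


class UptimeTracker:
    """Remembers whether a peer was online in each of the last `window_hours` hours.

    Hypothetical helper for illustration only; not part of Freenet.
    """

    def __init__(self, window_hours: int, required_hours: int) -> None:
        if not 0 < required_hours <= window_hours:
            raise ValueError("need 0 < required_hours <= window_hours")
        self.required_hours = required_hours
        # One boolean per hour; the deque silently discards samples older than the window.
        self.samples = deque(maxlen=window_hours)

    def record_hour(self, was_online: bool) -> None:
        """Append the observation for the most recent hour."""
        self.samples.append(was_online)

    def eligible_for_inserts(self) -> bool:
        """True if the peer was online for at least x of the last y hours."""
        return sum(self.samples) >= self.required_hours


# Example: require 6 online hours out of the last 24 (assumed thresholds).
tracker = UptimeTracker(window_hours=24, required_hours=6)
for hour in range(24):
    tracker.record_hour(hour < 8)  # online for the first 8 hours only
print(tracker.eligible_for_inserts())
```

A freshly started node would fail this check for several hours, which is the situation the dissident/whistleblower objection above is about.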
From j16sdiz+freenet at gmail.com Thu Apr 10 02:10:08 2008 From: j16sdiz+freenet at gmail.com (Daniel Cheng) Date: Thu, 10 Apr 2008 10:10:08 +0800 Subject: [Tech] Distributed file system using routing inspired by Freenet In-Reply-To: <8b525dee0804091900l5a37bfa0k6ac32098560b0e03@mail.gmail.com> References: <823242bd0804071636w4ad9fbfo4e2191e05b5e64@mail.gmail.com> <200804082358.16489.toad@amphibian.dyndns.org> <200804091526.45446.toad@amphibian.dyndns.org> <8b525dee0804091900l5a37bfa0k6ac32098560b0e03@mail.gmail.com> Message-ID: On Thu, Apr 10, 2008 at 10:00 AM, Juiceman wrote: > On Wed, Apr 9, 2008 at 12:20 PM, Michael Rogers wrote: > > On Apr 9 2008, Matthew Toseland wrote: > > > Well the problem is what to do with single blocks... A frost post is an > > > SSK plus usually a CHK for example. If a splitfile is inserted as a CHK, > > > there will be a single top level block for the CHK. Granted these are > > > more popular than the splitfile blocks... > > > > Ah, I see what you mean, the top level can't be FECed because we don't want > > to have to use several keys to identify the file. > > > > > > >> Good point, I guess it's a waste of bandwidth to store data on a > > >> transient node. > > > > > >But how would we implement that? > > > > Don't send inserts to, or accept inserts from, peers unless they've been > > active for x of the last y hours? But I reckon it would annoy people to be > > told that they couldn't post a Frost message because their node wasn't > > reliable enough, it would cause the store to fill up even more slowly, and > > might it also have implications for anonymity? > > Also, remember one of the target audience for Freenet is dissidents or > someone with important whistle blower information that needs to be > released anonymously. Can we expect these folks to run a node for > several hours just to publish data? > But they won't insert large files, would they? How about limiting the insertion rate? We can do this on inserter's node. 
Regards, Daniel Cheng From colin at sq7.org Thu Apr 10 05:40:49 2008 From: colin at sq7.org (Colin Davis) Date: Thu, 10 Apr 2008 01:40:49 -0400 Subject: [Tech] Distributed file system using routing inspired by Freenet In-Reply-To: References: <823242bd0804071636w4ad9fbfo4e2191e05b5e64@mail.gmail.com> <200804081300.14311.toad@amphibian.dyndns.org> <47FBD103.8050503@cs.ucl.ac.uk> <200804082358.16489.toad@amphibian.dyndns.org> Message-ID: <47FDA861.1000505@sq7.org> Just another stupid question from the perspective of a user- Is there a reason why the number of redundant FEC blocks isn't user-configurable? "How redundant do you want your data to be? The more redundancy you specify, the slower it will be to insert, but the more resilient the data will be to network damage" Faster Inserts -------------------[]------More Reliable Data Users who want their data better propagated are likely to submit it to multiple nodes anyway, increasing the total number of blocks. But this may end up having 3 copies of block A, and none of blocks B, C and D, so it can't heal the file. It just seems like any Freenet-etiquette behavior needs to be enforced by the other clients observing behavior, rather than the original client being trusted to do the right thing, or only insert in a certain way. From toad at amphibian.dyndns.org Thu Apr 10 15:36:01 2008 From: toad at amphibian.dyndns.org (Matthew Toseland) Date: Thu, 10 Apr 2008 16:36:01 +0100 Subject: [Tech] Distributed file system using routing inspired by Freenet In-Reply-To: <47FDA861.1000505@sq7.org> References: <823242bd0804071636w4ad9fbfo4e2191e05b5e64@mail.gmail.com> <47FDA861.1000505@sq7.org> Message-ID: <200804101636.08924.toad@amphibian.dyndns.org> On Thursday 10 April 2008 06:40, Colin Davis wrote: > Just another stupid question from the perspective of a user- Is there a > reason why the number of redundant FEC blocks isn't user-configurable? > > "How redundant do you want your data to be?
The more-redundant you > specify, the slower it will be to insert, but the more resilient the > data will be to network damage" > Faster Inserts -------------------[]------More Reliable Data > > Users who want their data better propagated are likely to submit it to > multiple nodes anyway, increasing the total number of blocks.. But this > may end up having 3 copies of block A, and none of block B C D, so it > can't heal the file. > > It just seems like any Freenet-etiquette behavior needs to be enforced by > the other clients observing behavior, rather than the original client > being trusted to do the right thing, or only insert in a certain way. If we make it configurable, everyone will increase it, and because everyone has increased it, everyone will increase it some more. This is what happened on 0.3 with HTL, which is one reason we don't have configurable HTL any more. From colin at sq7.org Thu Apr 10 16:47:30 2008 From: colin at sq7.org (Colin Davis) Date: Thu, 10 Apr 2008 12:47:30 -0400 Subject: [Tech] Distributed file system using routing inspired by Freenet In-Reply-To: <200804101636.08924.toad@amphibian.dyndns.org> References: <823242bd0804071636w4ad9fbfo4e2191e05b5e64@mail.gmail.com> <47FDA861.1000505@sq7.org> <200804101636.08924.toad@amphibian.dyndns.org> Message-ID: <47FE44A2.6050707@sq7.org> > If we make it configurable, everyone will increase it, and because everyone > has increased it, everyone will increase it some more. This is what happened > on 0.3 with HTL, which is one reason we don't have configurable HTL any more. > > Forgive me for being naive, but that seems more like a flaw in the network design than tweaking the FEC setting.
Consider this- After freenet is released, JohnQHacker puts up a website, offering TweakedFreenet.jar, which is recompiled to push more FEC blocks, to more servers, and generally be very greedy. Wouldn't you get into the same Arms race, with people needing to also download the TweakedFreenet to keep up? It seems like some sort of "in network" solution, such as a variation of a tit-for-tat measurement.. Inserting file A with 100X redundancy is approximately the same as inserting file B which is 100X the size.. If we enforce fairness on accepting traffic unless they've "earned" it, a user can then decide how to "spend" that bandwidth.. One mostly lossy 10M file, or 10X redundancy on your 1M file. -Colin > ------------------------------------------------------------------------ > > _______________________________________________ > Tech mailing list > Tech at freenetproject.org > http://emu.freenetproject.org/cgi-bin/mailman/listinfo/tech From m.rogers at cs.ucl.ac.uk Thu Apr 10 18:14:43 2008 From: m.rogers at cs.ucl.ac.uk (Michael Rogers) Date: Thu, 10 Apr 2008 19:14:43 +0100 Subject: [Tech] Distributed file system using routing inspired by Freenet In-Reply-To: <47FE44A2.6050707@sq7.org> References: <823242bd0804071636w4ad9fbfo4e2191e05b5e64@mail.gmail.com> <47FDA861.1000505@sq7.org> <200804101636.08924.toad@amphibian.dyndns.org> <47FE44A2.6050707@sq7.org> Message-ID: <47FE5913.6030808@cs.ucl.ac.uk> Colin Davis wrote: > It seems like some sort of "in network" solution, such as a variation of > a tit-for-tat measurement.. Inserting file A with 100X redundancy is > approximately the same as inserting file B which is 100X the size.. If > we enforce fairness on accepting traffic unless they've "earned" it, a > user can then decide how to "spend" that bandwidth.. One mostly lossy > 10M file, or 10X redundancy on your 1M file. In principle I think this is a great idea. 
In practice I've spent a lot of time working on tit-for-tat-ish incentive mechanisms for multi-hop networks, without much success. That probably just means I should find another line of work, but it might also mean the problem is harder than it looks. In a single-hop network such as BitTorrent, the value provided by a neighbouring node is directly related to how much it spends on you: if it spends 1MB of bandwidth uploading to you, you receive 1MB of data (or some fixed fraction of 1MB, allowing for overhead), all of which is directly useful to you. That makes it easy to design strategies that reward cooperative neighbours and punish uncooperative neighbours. (It turns out that TFT isn't actually a very good strategy in this context, but the point is that good strategies can be found.) But in a multi-hop network the relationship between cost and benefit is more complicated: assuming all nodes allocate bandwidth to cooperative neighbours, if you receive a request from neighbour A, should you forward it to neighbour B? First, will you get a response or will you spend the bandwidth and have nothing to show for it? Second, if you get a response and return it to A, will the cooperation you earn from A be worth more than the cooperation previously earned from B and spent on A's request? I've been banging my head against this problem for a while, and I can't come up with a model where it makes sense for selfish nodes to forward requests. It makes sense to answer requests locally if you can, to earn cooperation from your neighbours, but it doesn't make sense to forward them. Unfortunately if everyone behaves like that, the network doesn't function. Doubtless someone else can solve this problem, but at this stage I'm just hoping they won't solve it before my thesis is written up. 
;-) Cheers, Michael From toad at amphibian.dyndns.org Thu Apr 10 18:35:27 2008 From: toad at amphibian.dyndns.org (Matthew Toseland) Date: Thu, 10 Apr 2008 19:35:27 +0100 Subject: [Tech] Tit for tat is difficult was Re: Distributed file system using routing inspired by Freenet In-Reply-To: <47FE5913.6030808@cs.ucl.ac.uk> References: <823242bd0804071636w4ad9fbfo4e2191e05b5e64@mail.gmail.com> <47FE44A2.6050707@sq7.org> <47FE5913.6030808@cs.ucl.ac.uk> Message-ID: <200804101935.34561.toad@amphibian.dyndns.org> On Thursday 10 April 2008 19:14, Michael Rogers wrote: > Colin Davis wrote: > > It seems like some sort of "in network" solution, such as a variation of > > a tit-for-tat measurement.. Inserting file A with 100X redundancy is > > approximately the same as inserting file B which is 100X the size.. If > > we enforce fairness on accepting traffic unless they've "earned" it, a > > user can then decide how to "spend" that bandwidth.. One mostly lossy > > 10M file, or 10X redundancy on your 1M file. > > In principle I think this is a great idea. In practice I've spent a lot > of time working on tit-for-tat-ish incentive mechanisms for multi-hop > networks, without much success. That probably just means I should find > another line of work, but it might also mean the problem is harder than > it looks. This is interesting. We will eventually need some form of tit for tat, won't we? Not necessarily in inserts, IIRC we talked about it as a way to prevent an attacker flooding opennet with spam requests/inserts? It's something we've talked about for a long time anyway... > > In a single-hop network such as BitTorrent, the value provided by a > neighbouring node is directly related to how much it spends on you: if > it spends 1MB of bandwidth uploading to you, you receive 1MB of data (or > some fixed fraction of 1MB, allowing for overhead), all of which is > directly useful to you. 
That makes it easy to design strategies that > reward cooperative neighbours and punish uncooperative neighbours. (It > turns out that TFT isn't actually a very good strategy in this context, > but the point is that good strategies can be found.) > > But in a multi-hop network the relationship between cost and benefit is > more complicated: assuming all nodes allocate bandwidth to cooperative > neighbours, if you receive a request from neighbour A, should you > forward it to neighbour B? First, will you get a response or will you > spend the bandwidth and have nothing to show for it? Second, if you get > a response and return it to A, will the cooperation you earn from A be > worth more than the cooperation previously earned from B and spent on > A's request? > > I've been banging my head against this problem for a while, and I can't > come up with a model where it makes sense for selfish nodes to forward > requests. It makes sense to answer requests locally if you can, to earn > cooperation from your neighbours, but it doesn't make sense to forward > them. Unfortunately if everyone behaves like that, the network doesn't > function. > > Doubtless someone else can solve this problem, but at this stage I'm > just hoping they won't solve it before my thesis is written up. ;-) > > Cheers, > Michael -------------- next part -------------- A non-text attachment was scrubbed... 
Name: not available
Type: application/pgp-signature
Size: 189 bytes
Desc: not available
Url : http://emu.freenetproject.org/pipermail/tech/attachments/20080410/a3dcc1ca/attachment.pgp

From sandos at home.se Thu Apr 10 19:32:06 2008
From: sandos at home.se (John Bäckstrand)
Date: Thu, 10 Apr 2008 21:32:06 +0200
Subject: [Tech] Distributed file system using routing inspired by Freenet
In-Reply-To:
References: <823242bd0804071636w4ad9fbfo4e2191e05b5e64@mail.gmail.com> <200804081236.44302.toad@amphibian.dyndns.org>
Message-ID: <47FE6B36.3020305@home.se>

Jano wrote:
> Matthew Toseland wrote:
>
> (snip)
>
>> We should also think about randomising locations less frequently. It can take
>> a while to recover, and the current code randomizes roughly every 13 to 22
>> hours. It may be useful to increase this significantly? Unfortunately this
>> parameter is very dependent on the network size and so on, it's not really
>> something we can get a good value for from simulations... I suggest we
>> increase it by say a factor of 4, and if we get major location distribution
>> issues, we can reduce it again.
>>
>
> In this regard, I've been tracking my location for some time, sampling each
> hour. I can't really say if what I'm seeing is expected/sane. See attached.
>

Interesting. I think avg. store location and maybe hit location would be interesting too, no?

---
John Bäckstrand

From m.rogers at cs.ucl.ac.uk Sun Apr 13 15:29:44 2008
From: m.rogers at cs.ucl.ac.uk (Michael Rogers)
Date: Sun, 13 Apr 2008 16:29:44 +0100
Subject: [Tech] Tit for tat is difficult was Re: Distributed file system using routing inspired by Freenet
In-Reply-To: <200804101935.34561.toad@amphibian.dyndns.org>
References: <823242bd0804071636w4ad9fbfo4e2191e05b5e64@mail.gmail.com> <47FE44A2.6050707@sq7.org> <47FE5913.6030808@cs.ucl.ac.uk> <200804101935.34561.toad@amphibian.dyndns.org>
Message-ID: <480226E8.9070409@cs.ucl.ac.uk>

Matthew Toseland wrote:
> This is interesting.
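> We will eventually need some form of tit for tat,
> won't we? Not necessarily in inserts, IIRC we talked about it as a way
> to prevent an attacker flooding opennet with spam requests/inserts?
> It's something we've talked about for a long time anyway...

Yup, I used to think it would work (at least for requests, inserts are more complicated) but I'm not so sure any more. Imagine a node modified so that when it receives a request, it returns the data if it's in the cache or the store; otherwise it returns RNF with the remaining hops - it never forwards requests. This node will return fewer hits than a normal node, but it might return more hits *per byte of bandwidth used*. In that case, even if there's a TFT-like incentive mechanism, it might make sense for selfish users to modify their nodes. But if everyone does that, the network collapses.

Cheers,
Michael

From j16sdiz+freenet at gmail.com Tue Apr 15 03:02:30 2008
From: j16sdiz+freenet at gmail.com (Daniel Cheng)
Date: Tue, 15 Apr 2008 11:02:30 +0800
Subject: [Tech] node swapping algorithm
Message-ID:

Hi list,

On LocationManager#shouldSwap(), I wonder if we should use a geometric mean instead of just multiplying them together.
Let's look at the current implementation:

748         double A = 1.0;
749         for(int i=0;i<friendLocs.length;i++) {
750             if(Math.abs(friendLocs[i] - myLoc) <= Double.MIN_VALUE*2) continue;

#1. some friends are skipped.
    this number should be small -- my friends shouldn't be sooo close to me.
    rare, but not impossible (?)

751             A *= Location.distance(friendLocs[i], myLoc);
752         }
753         for(int i=0;i<hisFriendLocs.length;i++) {
754             if(Math.abs(hisFriendLocs[i] - hisLoc) <= Double.MIN_VALUE*2) continue;

#2. again some friends are skipped. (same as #1)

755             A *= Location.distance(hisFriendLocs[i], hisLoc);
756         }
757
758         // B = the same, with our two values swapped
759         double B = 1.0;
760         for(int i=0;i<friendLocs.length;i++) {

( never mention his number of friends may not be the same as mine
  should we use for(int i=0;i<[...] )

761             if(Math.abs(friendLocs[i] - hisLoc) <= Double.MIN_VALUE*2) continue;

#3. skip yet other friends...
    but this time, we compare my friend's location with his location ..
    I have him as my friend, so this is skipped _at least_ once.

    The number of friends skipped here is not the same as those in #1 and #3..
    i.e. A and B are calculated using different number of samples.

    as Location.distance() < 1.0 , the more round you multiply,
    the less the value.

762             B *= Location.distance(friendLocs[i], hisLoc);
763         }
764         for(int i=0;i<hisFriendLocs.length;i++) {
765             if(Math.abs(hisFriendLocs[i] - myLoc) <= Double.MIN_VALUE*2) continue;
766             B *= Location.distance(hisFriendLocs[i], myLoc);
767         }

[...]

771         if(A>B) return true;

#4 And now, we compare A and B. This is unfair!

Regards,
Daniel Cheng

From j16sdiz+freenet at gmail.com Tue Apr 15 03:44:32 2008
From: j16sdiz+freenet at gmail.com (Daniel Cheng)
Date: Tue, 15 Apr 2008 11:44:32 +0800
Subject: [Tech] node swapping algorithm
In-Reply-To:
References:
Message-ID:

On Tue, Apr 15, 2008 at 11:02 AM, Daniel Cheng wrote:
> Hi list,
>
> On LocationManager#shouldSwap(), I wonder if we should use a geometric
> mean instead of just multiplying them together.

As an experiment, I have prepared a patch here:
http://sdiz.net/experimental/geomatic-mean-exponential-shouldswap.patch

This is just a proof-of-concept patch.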
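
[Illustrative sketch only. This is not the patch linked above and not the code in LocationManager; it is a minimal, self-contained comparison of the two rules being discussed, written under stated assumptions. The class name SwapSketch, its method names, and the example locations are hypothetical. It works with sums of logarithms rather than raw products, uses a circular distance for the "same location" check, and models only the deterministic A>B comparison, not whatever follows the elided "[...]" lines.]

```java
class SwapSketch {
    /** Circular distance between two locations in [0, 1). */
    static double distance(double a, double b) {
        double d = Math.abs(a - b);
        return Math.min(d, 1.0 - d);
    }

    /**
     * Sum of log-distances from loc to each peer, skipping peers that sit at
     * (numerically) the same location. Returns {logSum, peersCounted}.
     */
    static double[] logProduct(double loc, double[] peers) {
        double sum = 0.0;
        int counted = 0;
        for (double p : peers) {
            double d = distance(p, loc);
            if (d <= Double.MIN_VALUE * 2) continue; // avoid log(0)
            sum += Math.log(d);
            counted++;
        }
        return new double[] { sum, counted };
    }

    /** Plain product rule: compare A and B directly (here in log space). */
    static boolean shouldSwapProduct(double myLoc, double[] myPeers,
                                     double hisLoc, double[] hisPeers) {
        double a = logProduct(myLoc, myPeers)[0] + logProduct(hisLoc, hisPeers)[0];
        double b = logProduct(hisLoc, myPeers)[0] + logProduct(myLoc, hisPeers)[0];
        return a > b;
    }

    /** Geometric-mean rule: divide each log-sum by the number of factors used. */
    static boolean shouldSwapGeometricMean(double myLoc, double[] myPeers,
                                           double hisLoc, double[] hisPeers) {
        double[] a1 = logProduct(myLoc, myPeers), a2 = logProduct(hisLoc, hisPeers);
        double[] b1 = logProduct(hisLoc, myPeers), b2 = logProduct(myLoc, hisPeers);
        double a = (a1[0] + a2[0]) / Math.max(1.0, a1[1] + a2[1]);
        double b = (b1[0] + b2[0]) / Math.max(1.0, b1[1] + b2[1]);
        return a > b;
    }

    public static void main(String[] args) {
        // Made-up case where the other node is one of my direct peers,
        // so one factor is skipped when computing B but none when computing A.
        double[] myPeers = { 0.3, 0.9 };
        double[] hisPeers = { 0.05 };
        System.out.println("product rule:        " + shouldSwapProduct(0.0, myPeers, 0.3, hisPeers));
        System.out.println("geometric-mean rule: " + shouldSwapGeometricMean(0.0, myPeers, 0.3, hisPeers));
    }
}
```

[With these made-up locations A has three factors and B has two, and the two rules give different answers. That only shows that the factor count can matter when peers are skipped; it says nothing about which rule is the right one for the swapping algorithm.]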
Regards,
Daniel Cheng

From m.rogers at cs.ucl.ac.uk Tue Apr 15 08:28:23 2008
From: m.rogers at cs.ucl.ac.uk (Michael Rogers)
Date: Tue, 15 Apr 2008 09:28:23 +0100
Subject: [Tech] node swapping algorithm
In-Reply-To:
References:
Message-ID: <48046727.1070003@cs.ucl.ac.uk>

Daniel Cheng wrote:
> 750             if(Math.abs(friendLocs[i] - myLoc) <=
> Double.MIN_VALUE*2) continue;

What's the purpose of this check and why does it use Math.abs instead of Location.distance?

Cheers,
Michael

From toad at amphibian.dyndns.org Tue Apr 15 10:28:44 2008
From: toad at amphibian.dyndns.org (Matthew Toseland)
Date: Tue, 15 Apr 2008 11:28:44 +0100
Subject: [Tech] node swapping algorithm
In-Reply-To:
References:
Message-ID: <200804151128.50568.toad@amphibian.dyndns.org>

On Tuesday 15 April 2008 04:02, Daniel Cheng wrote:
> Hi list,
>
> On LocationManager#shouldSwap(), I wonder if we should use a geometric
> mean instead of just multiplying them together.
> Let's see the current implement:
>
> 748         double A = 1.0;
> 749         for(int i=0;i<friendLocs.length;i++) {
> 750             if(Math.abs(friendLocs[i] - myLoc) <=
> Double.MIN_VALUE*2) continue;
>
> #1. some friends are skipped.
>     this number should be small -- my friends shouldn't be sooo close to me.
>     rare, but not impossible (?)

It's possible with a race condition. If it does happen it will sabotage things, so we check for it. If it persists we randomise our location elsewhere.

>
> 751             A *= Location.distance(friendLocs[i], myLoc);
> 752         }
> 753         for(int i=0;i<hisFriendLocs.length;i++) {
> 754             if(Math.abs(hisFriendLocs[i] - hisLoc) <=
> Double.MIN_VALUE*2) continue;
>
> #2. again some friends are skipped. (same as #1)
>
> 755             A *= Location.distance(hisFriendLocs[i], hisLoc);
> 756         }
> 757
> 758         // B = the same, with our two values swapped
> 759         double B = 1.0;
> 760         for(int i=0;i<friendLocs.length;i++) {
>
> ( never mention his number of friends may not be the same as mine
>   should we use for(int i=0;i<[...] )
>
> 761             if(Math.abs(friendLocs[i] - hisLoc) <=
> Double.MIN_VALUE*2) continue;
>
> #3. skip yet other friends...
>     but this time, we compare my friend's location with his location ..
>     I have him as my friend, so this is skipped _at least_ once.

Normally we don't swap with our direct peers. There is a 10 hop HTL. So no, we don't skip him. I don't understand how it is different - as far as I can see it's symmetrical: first we calculate the product for the current situation, then we calculate it for the swapped situation. Either way we ignore nodes which are too close together in order to avoid zeros.

>
>     The number of friends skipped here is not the same as those in #1 and #3..
>     i.e. A and B are calculated using different number of samples.
>
>     as Location.distance() < 1.0 , the more round you multiply,
>     the less the value.
>
>
> 762             B *= Location.distance(friendLocs[i], hisLoc);
> 763         }
> 764         for(int i=0;i<hisFriendLocs.length;i++) {
> 765             if(Math.abs(hisFriendLocs[i] - myLoc) <=
> Double.MIN_VALUE*2) continue;
> 766             B *= Location.distance(hisFriendLocs[i], myLoc);
> 767         }
>
> [...]
>
> 771         if(A>B) return true;
>
> #4 And now, we compare A and B. This is unfair!

No it isn't, it's the algorithm specified, which is an application of the Metropolis-Hastings algorithm.

>
>
> Regards,
> Daniel Cheng

-------------- next part --------------
A non-text attachment was scrubbed...
Name: not available Type: application/pgp-signature Size: 189 bytes Desc: not available Url : http://emu.freenetproject.org/pipermail/tech/attachments/20080415/d6ccdb58/attachment.pgp From toad at amphibian.dyndns.org Tue Apr 15 10:30:08 2008 From: toad at amphibian.dyndns.org (Matthew Toseland) Date: Tue, 15 Apr 2008 11:30:08 +0100 Subject: [Tech] node swapping algorithm In-Reply-To: <48046727.1070003@cs.ucl.ac.uk> References: <48046727.1070003@cs.ucl.ac.uk> Message-ID: <200804151130.09461.toad@amphibian.dyndns.org> On Tuesday 15 April 2008 09:28, Michael Rogers wrote: > Daniel Cheng wrote: > > 750 if(Math.abs(friendLocs[i] - myLoc) <= > > Double.MIN_VALUE*2) continue; > > What's the purpose of this check and why does it use Math.abs instead of > Location.distance? To prevent unjustified zeros when there is a race condition or other error causing there to be two nodes with the same location (e.g. when a swap ends up swapping with a direct peer). > > Cheers, > Michael -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: application/pgp-signature Size: 189 bytes Desc: not available Url : http://emu.freenetproject.org/pipermail/tech/attachments/20080415/9a1ac4b1/attachment.pgp From m.rogers at cs.ucl.ac.uk Tue Apr 15 13:20:19 2008 From: m.rogers at cs.ucl.ac.uk (Michael Rogers) Date: Tue, 15 Apr 2008 14:20:19 +0100 Subject: [Tech] node swapping algorithm In-Reply-To: <200804151130.09461.toad@amphibian.dyndns.org> References: <48046727.1070003@cs.ucl.ac.uk> <200804151130.09461.toad@amphibian.dyndns.org> Message-ID: <4804AB93.3070402@cs.ucl.ac.uk> Matthew Toseland wrote: > On Tuesday 15 April 2008 09:28, Michael Rogers wrote: >> Daniel Cheng wrote: >>> 750 if(Math.abs(friendLocs[i] - myLoc) <= >>> Double.MIN_VALUE*2) continue; >> What's the purpose of this check and why does it use Math.abs instead of >> Location.distance? 
> > To prevent unjustified zeros when there is a race condition or other error > causing there to be two nodes with the same location (e.g. when a swap ends > up swapping with a direct peer). If it's checking for strictly identical locations, why not use == ? Surely there's no chance of rounding errors when we're talking about two bit-for-bit copies of the same value? OTOH if it's checking for nearly-identical locations then shouldn't it use Location.distance rather than Math.abs? But either way, this check doesn't stop the product of the distances from rounding to zero if the factors are small enough - if that's the intent then why not just test whether the product is zero after leaving the loop? Cheers, Michael From ian.clarke at gmail.com Wed Apr 16 01:46:15 2008 From: ian.clarke at gmail.com (Ian Clarke) Date: Tue, 15 Apr 2008 20:46:15 -0500 Subject: [Tech] node swapping algorithm In-Reply-To: References: Message-ID: <823242bd0804151846l3b461c3yb4496a1cf08f8aef@mail.gmail.com> On Mon, Apr 14, 2008 at 10:02 PM, Daniel Cheng > wrote: > On LocationManager#shouldSwap(), I wonder if we should use a geometric > mean instead of just multiplying them together. I don't pretend to fully understand the math behind it, but I don't think Oskar made this decision arbitrarily. We definitely should *not* tinker with this stuff without buy-in from him. Ian. -- Email: ian at uprizer.com Cell: +1 512 422 3588 Skype: sanity -------------- next part -------------- An HTML attachment was scrubbed... 
URL: http://emu.freenetproject.org/pipermail/tech/attachments/20080415/fdcc5a77/attachment.htm From bombe at pterodactylus.net Tue Apr 22 21:18:29 2008 From: bombe at pterodactylus.net (David =?utf-8?q?=E2=80=98Bombe=E2=80=99_Roden?=) Date: Tue, 22 Apr 2008 23:18:29 +0200 Subject: [Tech] Fcp woes ..final In-Reply-To: <200803202219.50012.toad@amphibian.dyndns.org> References: <47E122D9.9010507@arcor.de> <47E2B572.304@web.de> <200803202219.50012.toad@amphibian.dyndns.org> Message-ID: <200804222318.29392.bombe@pterodactylus.net> On Thursday 20 March 2008 23:19:44 Matthew Toseland wrote: > If it's java-specific it's useless, you can always steal the code you need > from Fred (as jSite does). Maybe I'm wrong (I doubt it, though :) but jSite doesn't contain a single line taken from Fred... ;) Bombe -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: application/pgp-signature Size: 197 bytes Desc: This is a digitally signed message part. Url : http://emu.freenetproject.org/pipermail/tech/attachments/20080422/708fc337/attachment.pgp From bombe at pterodactylus.net Tue Apr 22 21:25:11 2008 From: bombe at pterodactylus.net (David =?utf-8?q?=E2=80=98Bombe=E2=80=99_Roden?=) Date: Tue, 22 Apr 2008 23:25:11 +0200 Subject: [Tech] Fcp woes ..final In-Reply-To: References: <47E122D9.9010507@arcor.de> Message-ID: <200804222325.11280.bombe@pterodactylus.net> On Monday 24 March 2008 14:22:10 Jano wrote: > > Just dropping by to tell, that I finally resign on writing a highlevel > > Python wrapper > > for the protocol. There is still too many show stoppers in there to > > write it nice and > > clean. > Ummm... I'm sorry to hear that. I didn't follow your efforts; the thing is > that I have an Ada high level binding and I didn't find any stopping > problems. Same here with my high-level client written in Java. 
FCP might not be the best of protocols (indeed I never liked it very much) but as far as message parsing is concerned it's very simple. Bombe -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: application/pgp-signature Size: 197 bytes Desc: This is a digitally signed message part. Url : http://emu.freenetproject.org/pipermail/tech/attachments/20080422/ae1d8b03/attachment.pgp From m.rogers at cs.ucl.ac.uk Tue Apr 29 21:57:26 2008 From: m.rogers at cs.ucl.ac.uk (Michael Rogers) Date: Tue, 29 Apr 2008 22:57:26 +0100 Subject: [Tech] Re datastore simulations In-Reply-To: <200804292238.53981.toad@amphibian.dyndns.org> References: <200804292238.53981.toad@amphibian.dyndns.org> Message-ID: <481799C6.6070906@cs.ucl.ac.uk> Matthew Toseland wrote: > sdiz is planning to implement an unindexed (random replacement, salted) > datastore post 0.7.0. Is there any chance you could do some simulations of > this? It's not quite random replacement, it's an approximation to random > replacement... I'd be happier if we had some experimental data showing it > doesn't cause catastrophe. Sure, if I can get some CPU time I'd be happy to. Do you have any more information about the difference between sdiz's scheme and random replacement? The simulations I've done so far have used a random salt for each node, so two keys that collide on one node probably won't collide on the next node. Another problem is coming up with a realistic traffic model, which is something I keep running up against with simulations (not just Freenet, PhD stuff as well). 
Cheers, Michael From m.rogers at cs.ucl.ac.uk Wed Apr 30 12:19:23 2008 From: m.rogers at cs.ucl.ac.uk (Michael Rogers) Date: 30 Apr 2008 13:19:23 +0100 Subject: [Tech] Re datastore simulations In-Reply-To: <200804301219.11319.toad@amphibian.dyndns.org> References: <200804292238.53981.toad@amphibian.dyndns.org> <200804292358.14338.toad@amphibian.dyndns.org> <200804301219.11319.toad@amphibian.dyndns.org> Message-ID: On Apr 30 2008, Matthew Toseland wrote: > Keys to block number. Block numbers to keys is handled by the on disk > structure. So we can actually pick a random block number to dump - but at > the cost of having to keep a key index. Cool, I see what you mean now - I'll simulate that too. > I'm surprised that hashing works so well, it has some big disadvantages > e.g. once the datastore is say half full, half of all new incoming keys > will overwrite old data rather than being added to the end. So we end up > storing less data: it takes a much longer time for the datastore to fill > up. Hmm, good point. On the other hand filling the store (or 99% filling it) would typically only take a few days, so maybe it's more important to optimise the steady state behaviour than the startup behaviour? > What is the approximate ratio of store filling rates for the same size > store on LRU versus on a direct hashing implementation? Can you simulate > this? So far I've been allowing the simulations to reach a steady state before making any measurements, but it shouldn't be a problem to simulate it. > IMHO most of it will be filesharing, just as a massive chunk of the total > internet bandwidth is filesharing. OK, I'll simulate filesharing two popularity distributions, uniform and Zipf. Each file will contain a lognormally distributed number of blocks, and the downloader will randomly choose 2/3 of them to request. I won't bother with splitfile healing, inserts, churn, congestion, swapping, phase of the moon, etc. 
> SSK polling for messages obviously > will also be huge, right now we have 2.5 SSKs for every CHK (but SSKs are > ~ 10x than CHKs). That should reduce a bit in future with some new > measures such as RecentlyFailed ... but it will increase as FMS is more > widely adopted... So no idea really... I do know that if we spend all our > bandwidth on SSK polling, filesharing will not work well. :| Also, SSKs > are kept in a separate store from CHKs, this is not likely to change. I'll stick to simulating CHKs for the moment - RecentlyFailed and ULPRs will affect the way SSKs are cached, but I don't have time to dig into the code to find out how they work (and into Frost and FMS to find out what kind of traffic patterns they produce). Cheers, Michael From m.rogers at cs.ucl.ac.uk Wed Apr 30 08:34:10 2008 From: m.rogers at cs.ucl.ac.uk (Michael Rogers) Date: 30 Apr 2008 09:34:10 +0100 Subject: [Tech] Re datastore simulations In-Reply-To: <200804292358.14338.toad@amphibian.dyndns.org> References: <200804292238.53981.toad@amphibian.dyndns.org> <481799C6.6070906@cs.ucl.ac.uk> <200804292358.14338.toad@amphibian.dyndns.org> Message-ID: On Apr 29 2008, Matthew Toseland wrote: >Oh, so you didn't actually simulate true random replacement with an index? Sorry, I don't understand - what would the index contain if you were doing random replacement? > The big question is whether it is safe to have an implementation that > doesn't support rekeying. If we have to periodically rekey then an > indirect implementation will be necessary, which works on the same > principles but is much more complex. I guess the question is what we're trying to protect against by rekeying. If it's an attacker pushing a single block out of our cache by inserting other blocks and then requesting the target block with HTL=1 to see whether it was pushed out, I don't think we should spend any effort trying to prevent the attack. There are a hundred worse things an attacker could do with a similar amount of effort. 
>I thought there were standard models? I assume they all suck? There are standard models for things like phone and web traffic, but the question is how will people use Freenet - how much of the traffic will be file sharing, messaging, web browsing, etc? Cheers, Michael From toad at amphibian.dyndns.org Tue Apr 29 21:38:53 2008 From: toad at amphibian.dyndns.org (Matthew Toseland) Date: Tue, 29 Apr 2008 22:38:53 +0100 Subject: [Tech] Re datastore simulations Message-ID: <200804292238.53981.toad@amphibian.dyndns.org> sdiz is planning to implement an unindexed (random replacement, salted) datastore post 0.7.0. Is there any chance you could do some simulations of this? It's not quite random replacement, it's an approximation to random replacement... I'd be happier if we had some experimental data showing it doesn't cause catastrophe. -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: application/pgp-signature Size: 189 bytes Desc: not available Url : http://emu.freenetproject.org/pipermail/tech/attachments/20080429/a71d88de/attachment.pgp From toad at amphibian.dyndns.org Tue Apr 29 22:58:09 2008 From: toad at amphibian.dyndns.org (Matthew Toseland) Date: Tue, 29 Apr 2008 23:58:09 +0100 Subject: [Tech] Re datastore simulations In-Reply-To: <481799C6.6070906@cs.ucl.ac.uk> References: <200804292238.53981.toad@amphibian.dyndns.org> <481799C6.6070906@cs.ucl.ac.uk> Message-ID: <200804292358.14338.toad@amphibian.dyndns.org> On Tuesday 29 April 2008 22:57, you wrote: > Matthew Toseland wrote: > > sdiz is planning to implement an unindexed (random replacement, salted) > > datastore post 0.7.0. Is there any chance you could do some simulations of > > this? It's not quite random replacement, it's an approximation to random > > replacement... I'd be happier if we had some experimental data showing it > > doesn't cause catastrophe. > > Sure, if I can get some CPU time I'd be happy to. 
Do you have any more > information about the difference between sdiz's scheme and random > replacement? The simulations I've done so far have used a random salt > for each node, so two keys that collide on one node probably won't > collide on the next node. Oh, so you didn't actually simulate true random replacement with an index? That would be interesting for comparison. Yes what you describe is exactly what sdiz is proposing - a direct implementation using a single table. The big question is whether it is safe to have an implementation that doesn't support rekeying. If we have to periodically rekey then an indirect implementation will be necessary, which works on the same principles but is much more complex. > > Another problem is coming up with a realistic traffic model, which is > something I keep running up against with simulations (not just Freenet, > PhD stuff as well). :| I thought there were standard models? I assume they all suck? > > Cheers, > Michael -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: application/pgp-signature Size: 189 bytes Desc: not available Url : http://emu.freenetproject.org/pipermail/tech/attachments/20080429/078bbb58/attachment.pgp From toad at amphibian.dyndns.org Wed Apr 30 11:19:05 2008 From: toad at amphibian.dyndns.org (Matthew Toseland) Date: Wed, 30 Apr 2008 12:19:05 +0100 Subject: [Tech] Re datastore simulations In-Reply-To: References: <200804292238.53981.toad@amphibian.dyndns.org> <200804292358.14338.toad@amphibian.dyndns.org> Message-ID: <200804301219.11319.toad@amphibian.dyndns.org> On Wednesday 30 April 2008 09:34, you wrote: > On Apr 29 2008, Matthew Toseland wrote: > >Oh, so you didn't actually simulate true random replacement with an index? > > Sorry, I don't understand - what would the index contain if you were doing > random replacement? Keys to block number. Block numbers to keys is handled by the on disk structure. 
So we can actually pick a random block number to dump - but at the cost of having to keep a key index. > > > The big question is whether it is safe to have an implementation that > > doesn't support rekeying. If we have to periodically rekey then an > > indirect implementation will be necessary, which works on the same > > principles but is much more complex. > > I guess the question is what we're trying to protect against by rekeying. > If it's an attacker pushing a single block out of our cache by inserting > other blocks and then requesting the target block with HTL=1 to see whether > it was pushed out, I don't think we should spend any effort trying to > prevent the attack. There are a hundred worse things an attacker could do > with a similar amount of effort. True, I suppose. I'm surprised that hashing works so well, it has some big disadvantages e.g. once the datastore is say half full, half of all new incoming keys will overwrite old data rather than being added to the end. So we end up storing less data: it takes a much longer time for the datastore to fill up. What is the approximate ratio of store filling rates for the same size store on LRU versus on a direct hashing implementation? Can you simulate this? This is another advantage of an indirect design: if we can separate the hashtable from the storage, we can make it bigger than the storage, and achieve store filling rates close to that with LRU. > > >I thought there were standard models? I assume they all suck? > > There are standard models for things like phone and web traffic, but the > question is how will people use Freenet - how much of the traffic will be > file sharing, messaging, web browsing, etc? IMHO most of it will be filesharing, just as a massive chunk of the total internet bandwidth is filesharing. SSK polling for messages obviously will also be huge, right now we have 2.5 SSKs for every CHK (but SSKs are ~ 10x than CHKs). 
That should reduce a bit in future with some new measures such as RecentlyFailed ... but it will increase as FMS is more widely adopted... So no idea really... I do know that if we spend all our bandwidth on SSK polling, filesharing will not work well. :| Also, SSKs are kept in a separate store from CHKs, this is not likely to change. > > Cheers, > Michael -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: application/pgp-signature Size: 189 bytes Desc: not available Url : http://emu.freenetproject.org/pipermail/tech/attachments/20080430/2e37cb4e/attachment.pgp From toad at amphibian.dyndns.org Wed Apr 30 14:45:43 2008 From: toad at amphibian.dyndns.org (Matthew Toseland) Date: Wed, 30 Apr 2008 15:45:43 +0100 Subject: [Tech] Re datastore simulations In-Reply-To: References: <200804292238.53981.toad@amphibian.dyndns.org> <200804301219.11319.toad@amphibian.dyndns.org> Message-ID: <200804301545.44244.toad@amphibian.dyndns.org> On Wednesday 30 April 2008 13:19, Michael Rogers wrote: > On Apr 30 2008, Matthew Toseland wrote: > > Keys to block number. Block numbers to keys is handled by the on disk > > structure. So we can actually pick a random block number to dump - but at > > the cost of having to keep a key index. > > Cool, I see what you mean now - I'll simulate that too. > > > I'm surprised that hashing works so well, it has some big disadvantages > > e.g. once the datastore is say half full, half of all new incoming keys > > will overwrite old data rather than being added to the end. So we end up > > storing less data: it takes a much longer time for the datastore to fill > > up. > > Hmm, good point. On the other hand filling the store (or 99% filling it) > would typically only take a few days, so maybe it's more important to > optimise the steady state behaviour than the startup behaviour? Depends on how big it is. 
> > > What is the approximate ratio of store filling rates for the same size > > store on LRU versus on a direct hashing implementation? Can you simulate > > this? > > So far I've been allowing the simulations to reach a steady state before > making any measurements, but it shouldn't be a problem to simulate it. Ok. > > > IMHO most of it will be filesharing, just as a massive chunk of the total > > internet bandwidth is filesharing. > > OK, I'll simulate filesharing two popularity distributions, uniform and > Zipf. Each file will contain a lognormally distributed number of blocks, > and the downloader will randomly choose 2/3 of them to request. I won't > bother with splitfile healing, inserts, churn, congestion, swapping, phase > of the moon, etc. > > > SSK polling for messages obviously > > will also be huge, right now we have 2.5 SSKs for every CHK (but SSKs are > > ~ 10x than CHKs). That should reduce a bit in future with some new > > measures such as RecentlyFailed ... but it will increase as FMS is more > > widely adopted... So no idea really... I do know that if we spend all our > > bandwidth on SSK polling, filesharing will not work well. :| Also, SSKs > > are kept in a separate store from CHKs, this is not likely to change. > > I'll stick to simulating CHKs for the moment - RecentlyFailed and ULPRs > will affect the way SSKs are cached, but I don't have time to dig into the > code to find out how they work (and into Frost and FMS to find out what > kind of traffic patterns they produce). Sensible imho. > > Cheers, > Michael -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: application/pgp-signature Size: 189 bytes Desc: not available Url : http://emu.freenetproject.org/pipermail/tech/attachments/20080430/62c149a0/attachment.pgp
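
[Illustrative sketch only, not taken from the thread or from the Freenet source. The question raised above, how quickly a direct salted-hash store fills compared with an LRU store of the same size, can be estimated with a toy model. Everything here is an assumption made for illustration: the class name StoreFillSketch, the bit-mixing function standing in for the real salted hash, one slot per key, and a stream of distinct uniformly random keys with no requests, no churn and no popularity distribution.]

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Random;

class StoreFillSketch {
    /** 64-bit bit mixer (splitmix64 finalizer), used as a stand-in for the real hash. */
    static long mix(long x) {
        x ^= x >>> 30; x *= 0xbf58476d1ce4e5b9L;
        x ^= x >>> 27; x *= 0x94d049bb133111ebL;
        x ^= x >>> 31;
        return x;
    }

    /** Direct-mapped store: each key has exactly one slot, chosen by a salted hash. */
    static int fillDirectMapped(int capacity, int inserts, long salt, Random rng) {
        boolean[] used = new boolean[capacity];
        int occupied = 0;
        for (int i = 0; i < inserts; i++) {
            long key = rng.nextLong();
            int slot = (int) Math.floorMod(mix(key ^ salt), (long) capacity);
            if (!used[slot]) {      // a new key landing on a used slot overwrites old data
                used[slot] = true;
                occupied++;
            }
        }
        return occupied;
    }

    /** LRU store: an access-ordered LinkedHashMap that evicts only once it is full. */
    static int fillLru(int capacity, int inserts, Random rng) {
        Map<Long, Boolean> store = new LinkedHashMap<Long, Boolean>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<Long, Boolean> eldest) {
                return size() > capacity;
            }
        };
        for (int i = 0; i < inserts; i++) {
            store.put(rng.nextLong(), Boolean.TRUE);
        }
        return store.size();
    }

    public static void main(String[] args) {
        int capacity = 10_000;
        // Insert as many keys as there are slots and compare occupancy.
        int direct = fillDirectMapped(capacity, capacity, 42L, new Random(1));
        int lru = fillLru(capacity, capacity, new Random(1));
        System.out.println("direct-mapped slots used: " + direct + " of " + capacity);
        System.out.println("LRU slots used:           " + lru + " of " + capacity);
    }
}
```

[Under these assumptions the LRU store is full after as many inserts as it has slots, while the direct-mapped store is expected to have used a fraction of roughly 1 - 1/e (about 63%) of its slots, because later keys increasingly land on slots that are already occupied. A real comparison would need the traffic model and caching rules discussed in the thread.]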