[Tech] Distributed file system using routing inspired by Freenet
Matthew Toseland
toad at amphibian.dyndns.org
Wed Apr 9 21:52:52 UTC 2008
On Wednesday 09 April 2008 16:27, Daniel Cheng wrote:
> On Wed, Apr 9, 2008 at 10:29 PM, Matthew Toseland
> <toad at amphibian.dyndns.org> wrote:
> >
> > On Wednesday 09 April 2008 05:28, Daniel Cheng wrote:
> > > 2008/4/8 Matthew Toseland <toad at amphibian.dyndns.org>:
> > > > On Tuesday 08 April 2008 12:36, Matthew Toseland wrote:
> > > > > On Tuesday 08 April 2008 00:36, Ian Clarke wrote:
> > > > > > http://video.google.com/videoplay?docid=-2372664863607209585
> > > > > >
> > > > > > He mentions Freenet's use of this technique about 10-15 minutes in,
> > > > > > they also use erasure codes, so it seems they are using a few
> > > > > > techniques that we also use (unclear about whether we were the direct
> > > > > > source of this inspiration).
> > > > >
> > > > > They use 500% redundancy in their RS codes. Right now we use 150% (including
> > > > > the original 100%). Maybe we should increase this? A slight increase to say
> > > > > 200% or 250% may give significantly better performance, despite the increased
> > > > > overhead...
> > > >
> > > > In fact, I think I can justify a figure of 200% (the original plus 100%, so
> > > > 128 -> 255 blocks to fit within the 8 bit fast encoding limit). On average,
> > > > in the long term, a block will be stored on 3 nodes. Obviously a lot of
> > > > popular data will be stored on more nodes than 3, but in terms of the
> > > > datastore, this is the approximate figure. On an average node with a 1GB
> > > > datastore, the 512MB cache has a lifetime of less than a day; stuff lasts a
> > > > lot longer in the store, and on average data is stored on 3 nodes (by
> > > > design).
> > >
> > > I think the downloader would "heal" a broken file by re-inserting the
> > > missing FEC blocks, right?
> > >
> > > If that is the case, I think we can use 300% (or higher) redundancy,
> > > but only insert a random portion of the blocks. When a downloader
> > > downloads this file, he inserts (some other) random FEC blocks for it.
> > > Under this scheme, the inserter doesn't have to pay a high bandwidth
> > > overhead cost, while redundancy still increases.
> >
> > I'm not worried about inserters paying a high bandwidth cost actually. Right
> > now inserts are a lot faster than requests. What I'm worried about is if we
> > have too much redundancy, our overhead in terms of data storage will be
> > rather high, and that reduces the amount of data that is fetchable.
>
> Disks are getting cheaper and cheaper...
Yes, but the flipside is people want to share bigger and bigger files.
> Also, high data redundancy means we can drop any blocks of them without
> problem, right?
Not necessarily. We have data redundancy precisely because blocks get dropped
for various reasons e.g. because a node goes offline.
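
To put rough numbers on the redundancy figures discussed above, here is a
minimal sketch. It assumes that any 128 blocks out of n are enough to decode a
segment, and that each block is independently retrievable with probability p.
The independence assumption is not from the thread and is optimistic:
correlated losses (for example, all blocks of an unpopular file aging out
together) would break it. The function name and the mapping of "150%" to
128 -> 192 blocks are illustrative assumptions.

```python
from math import comb


def segment_recovery_probability(k, n, p):
    """Probability that at least k of n blocks are retrievable.

    Assumes each block is retrievable independently with probability p
    (a simplifying assumption, not something the thread establishes).
    """
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


# Compare 128 -> 192 blocks (assumed to correspond to "150%") with
# 128 -> 255 blocks (the "200%" figure) for a few per-block survival rates.
for n in (192, 255):
    for p in (0.5, 0.6, 0.7, 0.8):
        prob = segment_recovery_probability(128, n, p)
        print(f"n={n} p={p:.1f} recovery={prob:.4f}")
```

The same function can be used to weigh the storage overhead of a larger n
against the gain in recovery probability for whatever p seems realistic.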
>
> The only potential problem I have in mind is the LRU drop policy on store full.
> All blocks of an unpopular item may drop around the same time if we use
> this policy.
>
> I think if the redundancy is high enough, we should use:
> - Random drop old data on store full.
Hmmm. I dunno... simulations would be interesting.
> - LRU drop on Cache full.
> which should give a good balance of data retention and load balancing
>
> Regards,
> Daniel Cheng
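
As a starting point for the kind of simulation mentioned above, here is a toy
sketch comparing the two drop policies on a single store. Everything in it is
an assumption for illustration: one node, files inserted one after another,
no reads at all (so LRU degenerates to oldest-first eviction, the "unpopular
item" case), and a file counts as recoverable if at least k of its n blocks
remain. It says nothing about routing, caching, or a real multi-node network.

```python
import random
from collections import Counter, deque


def recoverable_files(policy, capacity, num_files, k, n, seed=0):
    """Insert num_files files of n blocks each into one store of the given
    capacity, then count the files that still have at least k blocks.

    policy is "lru" (oldest block dropped first, since nothing is ever read)
    or "random" (a uniformly random stored block is dropped).
    """
    rng = random.Random(seed)
    store = deque() if policy == "lru" else []
    present = Counter()  # file id -> number of its blocks still stored
    for file_id in range(num_files):
        for _ in range(n):
            if len(store) >= capacity:
                if policy == "lru":
                    victim = store.popleft()
                else:
                    # Swap a random entry to the end so the pop is O(1).
                    i = rng.randrange(len(store))
                    store[i], store[-1] = store[-1], store[i]
                    victim = store.pop()
                present[victim] -= 1
            store.append(file_id)
            present[file_id] += 1
    return sum(1 for count in present.values() if count >= k)


for policy in ("lru", "random"):
    survivors = recoverable_files(policy, capacity=4000, num_files=100, k=128, n=255)
    print(policy, survivors)
```

Varying n relative to k (for example 192, 255, or 384 blocks for k = 128) is
one way to explore the "if the redundancy is high enough" condition in this
simplified setting.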