[Tech] LIFO simulations
Michael Rogers
m.rogers at cs.ucl.ac.uk
Tue Dec 12 23:14:12 UTC 2006
Jano wrote:
> Ummm. Each branch is counted as a success? All of them must succeed? Just
> the longer one? Is this nonsense?
Not nonsense, but the choice between the various options seems to be
fairly arbitrary. On the other hand there's only one way to define route
length for requests, so I guess we could add a hop counter to
ChkDataFound and SskDataFound.
> * I'm dropping messages at the tail when queues reach 50.000 messages queued
> (for search and transfer queues). I implemented this in the hope of getting
> rid of OOMs. I'm getting them anyway, so I've screwed something in the
> process.
Not necessarily - 500 peers * 2 queues * 50,000 messages could easily
eat a gig or two of memory.
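
As a rough check on that figure, here is a minimal Python sketch (not the
simulator's own code). The per-message sizes of 20 and 40 bytes are
assumptions picked to bracket "a gig or two"; the real footprint depends on
how the simulator represents a queued message.

```python
def queue_memory_bytes(peers, queues_per_peer, max_queued, bytes_per_message):
    """Worst-case memory held by full queues, ignoring container overhead."""
    return peers * queues_per_peer * max_queued * bytes_per_message


# Figures from the discussion: 500 peers, 2 queues each, 50,000 messages per queue.
# bytes_per_message is an assumed value, not a measured one.
for bytes_per_message in (20, 40):
    total = queue_memory_bytes(500, 2, 50_000, bytes_per_message)
    print(f"{bytes_per_message} bytes/message -> {total / 1e9:.1f} GB")
```

With every queue full that is 50 million queued messages, so even a few
dozen bytes per message is enough to exhaust a typical heap.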
> I could only simulate up to 30 with lifo queues and this change;
> see the graph. I don't think it's correct. Have we some idea on what is the
> theoretical maximum throughput for the simulated network, as currently
> defined?
It depends on how far the data's travelling. Ignoring inserts and slow
nodes for the moment, half the requests are for CHKs and half are for
SSKs, so the average size of a reply is about 17 KB. The total capacity
of the network is 1500 KB per second, and the maximum sustainable
throughput (for FIFO at least) seems to be about 8000 replies in 2
hours = 11 replies per second. That would imply that replies are
travelling an average of 1500/(11*17) = 8 hops, but that's a *very*
rough estimate.
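
The estimate above can be written out as a small Python sketch. It assumes
the network is saturated and that each reply uses its full size in
bandwidth on every hop it crosses, so capacity = replies per second * reply
size * hops. The function name is hypothetical.

```python
def average_hops(capacity_kb_per_s, replies_per_s, reply_kb):
    """Hops per reply if all capacity is spent forwarding replies."""
    return capacity_kb_per_s / (replies_per_s * reply_kb)


# Figures from the text: 1500 KB/s capacity, 11 replies/s, 17 KB per reply.
hops = average_hops(1500, 11, 17)
print(f"{hops:.1f} hops")
```

Anything that uses bandwidth without delivering a reply (inserts, failed
requests, protocol overhead) would lower the real figure, which is one
reason to treat the result as a rough upper bound.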
> * I'm counting just remote successes. If we are measuring the load balancing
> performance, I don't think the local hits are of any interest and could
> mask the remote ones.
Good point - maybe that's why our figures for backoff at high loads are
different. (I'm also at revision 11135 by the way.) It makes sense that
you'd see a low success rate but high throughput if requests were either
succeeding locally or not at all.
Unfortunately this suggests that simply counting the number of successes
(or even remote successes) isn't an adequate measure of throughput -
being able to retrieve the nearest tenth of the keyspace in one minute
isn't equivalent to being able to retrieve the entire keyspace in ten
minutes...
Any suggestions for a better metric?
> * I'm not computing failures anymore since messages dropped by far exceed
> successes. Half an hour of simulation would produce near 1GB of logs, given
> the amount of msgs dropped!
We should probably replace the logging statements with a static counter.
Bear in mind that a dropped message doesn't necessarily lead to a
failure - under some circumstances the upstream node can move on if it
gets a timeout, so a search can suffer several dropped messages and
still succeed.
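
A minimal sketch of the counter idea, in Python rather than the
simulator's own language. The class and method names are hypothetical. It
keeps drops and search outcomes in separate shared counters, so a search
that loses messages but still succeeds is not counted as a failure.

```python
from collections import Counter


class SimStats:
    """Hypothetical stand-in for the simulator's statistics."""

    # Class-level counters shared by every node, analogous to a static field.
    dropped = Counter()
    outcomes = Counter()

    @classmethod
    def message_dropped(cls, queue_name):
        # Replaces a per-message log statement with a cheap increment.
        cls.dropped[queue_name] += 1

    @classmethod
    def search_finished(cls, succeeded):
        cls.outcomes["success" if succeeded else "failure"] += 1


# One search loses two messages to full queues, then succeeds after a retry.
SimStats.message_dropped("search")
SimStats.message_dropped("transfer")
SimStats.search_finished(succeeded=True)

print(dict(SimStats.dropped))
print(dict(SimStats.outcomes))
```

Printing the totals once at the end of a run would replace the gigabyte of
per-message log output with a couple of lines.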
Cheers,
Michael