Welcome Board Archive - The recent latency issues
( 5) From: Rufus Title: The recent latency issues
Posted On: Friday, June 08 2018, 07:45PM
---------------------------------------------------------------------------
The recent 'lag' problems
I'd rather not call them 'lag' ... the proper term is 'chunking.'
It's actually the game stopping processing input/output while it
works on something else.
Legend runs on a pulse-based system: it runs one pass of the main game
loop every 250ms (4 times/second). When I added in the 'save people
often' code, I determined this would be viable because we spent, on
average, 216ms in 'sleep'. That means the mud was simply waiting for a
set amount of time to keep pace (for those in the know, we don't use
'sleep()' because of its somewhat unpredictable nature; we have a more
reliable algorithm).
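For readers who have not seen a pulse loop before, here is a minimal sketch of the general idea in Python. It is not Legend's code: the function name, its parameters, and the deadline-based pacing are assumptions made for illustration, and it waits with plain time.sleep, which the post says the real mud avoids in favor of its own algorithm.

```python
import time

PULSE_MS = 250  # from the post: one main-loop pass every 250ms


def run_pulses(work, pulses, pulse_s=PULSE_MS / 1000.0,
               clock=time.monotonic, wait=time.sleep):
    """Run `work` once per pulse and idle until the next deadline.

    Hypothetical helper. Returns a list of (work_seconds, idle_seconds),
    one entry per pulse.
    """
    stats = []
    deadline = clock() + pulse_s
    for _ in range(pulses):
        start = clock()
        work()                       # the whole game update for this pulse
        done = clock()
        idle = max(0.0, deadline - done)
        if idle > 0:
            wait(idle)               # nothing left to do, so wait out the pulse
        stats.append((done - start, idle))
        # Step the deadline by a fixed amount so timing errors do not pile up.
        deadline += pulse_s
    return stats
```

With a 250ms pulse, an average of 216ms idle leaves roughly 34ms of work per pulse. If work() ever takes longer than the pulse, idle drops to zero and nothing else is serviced until work() returns, which is the 'chunking' described above.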
When we added that code, our character list update -- which, just so
you know, runs through 12,000+ players/mobs, 2500+ act updates, etc
every quarter of a second -- was taking up no more than about 30ms
on average. We had plenty of time and disk is fast now. Even your
friendly local druid with a ton of herbs saves in a matter of a few
milliseconds even though their pfiles are huge.
When someone is removed from the game for being link-dead, the update
takes a bigger hit, but even before this recent spate of chunking,
it was rarely noticeable, taking < 45ms, and usually far, far less.
People started complaining about the lag on typing 'save'. I optimized
some of that code, though the real cause was something else: something
that had previously not really been a problem and suddenly became one.
Still, there was an obvious potential slowdown of little value, and I
removed it.
The chunking during the update cycle, though, is perplexing.
We did change the compiler, but we'd not changed any code in the
update or save paths in a while. One of the potential fixes (that
didn't work) was reverting the compiler.
I added a number of options to our internal profiler to help track
this down. Thank you for your patience with the reboots!
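As a rough illustration of what per-section profiling can look like, here is a small Python sketch that tracks the count, average, and peak time for one section of the loop. The class and function names are hypothetical and this is not our internal profiler; it only shows why the peak has to be tracked separately from the average.

```python
import time


class SectionStats:
    """Running count, average, and peak for one timed section (in ms)."""

    def __init__(self, budget_ms):
        self.budget_ms = budget_ms   # e.g. the 250ms pulse
        self.count = 0
        self.total_ms = 0.0
        self.peak_ms = 0.0
        self.over_budget = 0         # how many samples exceeded the budget

    def record(self, elapsed_ms):
        self.count += 1
        self.total_ms += elapsed_ms
        self.peak_ms = max(self.peak_ms, elapsed_ms)
        if elapsed_ms > self.budget_ms:
            self.over_budget += 1

    @property
    def average_ms(self):
        return self.total_ms / self.count if self.count else 0.0


def timed(stats, fn):
    """Call fn(), record how long it took in stats, and return its result."""
    start = time.perf_counter()
    result = fn()
    stats.record((time.perf_counter() - start) * 1000.0)
    return result
```

A single long stall barely moves the average over thousands of pulses, but it shows up immediately in peak_ms and over_budget.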
There are many places where we could do some optimization, but the
amount of work and the amount of risk are substantial. Major
alterations to this part of the code are akin to open-heart surgery.
Personally, I run a virtual machine whose specs mirror, pretty closely,
the VPS we have the mud on. Even with the pulse updates ramped up to
20/second (50ms vs 250ms), the average cycle through the character list
is 13ms. On the main mud it's 108ms. The peak on my testmud is 85ms;
on the main mud right now, it's in excess of 6s.
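To put those figures in proportion, here is the arithmetic as a short Python sketch. The helper names are made up for illustration; the inputs are the numbers quoted above.

```python
def budget_share(avg_ms, pulse_ms):
    """Fraction of one pulse spent on the character list update."""
    return avg_ms / pulse_ms


def pulses_spanned(stall_ms, pulse_ms):
    """How many pulse intervals a single stall covers."""
    return stall_ms / pulse_ms


# Test VM at 20 pulses/second: 13ms average out of a 50ms pulse.
test_share = budget_share(13, 50)      # about 26% of the pulse

# Main mud at 4 pulses/second: 108ms average out of a 250ms pulse.
main_share = budget_share(108, 250)    # about 43% of the pulse

# A 6s peak on the main mud covers 6000 / 250 = 24 pulse intervals,
# and the post says the real peak is in excess of that.
stalled = pulses_spanned(6000, 250)
```

So the test VM uses about a quarter of a pulse five times shorter, while the main mud uses over 40% of the full 250ms pulse on average, and a single peak freezes the game for at least 24 pulses' worth of time.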
Everything I've looked into points to environmental issues. We're
going to go down that road first to see if we can resolve this,
as resolving it in code is a 4-6 month (full-time) project.
Thanks for sticking with us. Sorry about the issues!