drgullen

Server Crash vs. Server Restart


There are a lot of posts on here complaining about persistence being broken.  In fact, it is working as designed.  I know you're probably getting ready to flame me right now for saying that, but hear me out.  I'm not saying it is a good design.  In fact, it is an awful design, but it technically does work correctly.  I think part of the problem here is that people don't realize how often the servers are crashing.  It's the server crashes that are the real problem -- they are the culprits for the loss of persistence, not the persistence system itself.

Let me show you what I mean.  I used my own server to place a tent on Skalisty Island.  I then issued the command "taskkill /F /IM DayZServer_x64.exe", which simulates a server crash, since the /F switch forces the exe to close immediately.  The result was this:

dayz-servercrash-1.png

followed by...

dayz-servercrash-2.png

THIS IS A KEY POINT HERE.  I can't tell you how many streamers I've watched playing DayZ that think this was a scheduled restart.  Every time you get that red "No message received..." error text in the middle of the screen, that is a server crash, not a restart.  At that moment, the server is in a "persistence loss" state: if it comes back up without the binary files in the storage folder(s) (which are now corrupted) being replaced from a backup, the result is this:

dayz-servercrash-3.png

I logged back in and sure enough, the tent is gone.  Why?  Because the binary files were corrupted, so the system loads a fresh Chernarus from the XML files and creates a new set of storage persistence files.

Now, let's try it again, this time waiting for a scheduled server restart.  On my server, it restarts every 2 hours, so I place another tent in approximately the same location...

dayz-restart-1.png

...and then when the server restarts, I get this message:

dayz-restart-2.png

THIS IS A KEY POINT AS WELL.  This message leads you to believe that THIS was a server crash, when in fact, it was not.  It was a scheduled restart of my server.  I wait for a minute for the server to come back up, click Play to log back in and...

dayz-restart-3.png

...the tent is still there because the restart shut down the exe gracefully, meaning no binary file corruption occurred (NOTE: my server is in real time, so the sun has set a bit which is why the image looks a bit different).

The point I am trying to make here is that this is not going to be a simple fix by Bohemia to resolve this problem.  It would be a major redesign of how persistence is handled.  The binary file system would have to be scrapped altogether for something else.  The question is, will Bohemia actually do that to fix this problem now that the game is officially released?

Regardless of whether they do or not, all you server owners out there should be aware that you absolutely need to be creating backup files on a regular basis.  If you get a server crash, restore the latest backup before restarting the server; otherwise you will definitely have persistence loss.  If your GSP doesn't allow you to do that, consider moving your server to a provider that allows game servers on a virtual server, which will give you full access to the server to do as you please.  On my server, I have a batch file taking backups every hour, so the worst case for me is rolling back 59 minutes of persistence if the server happened to crash in the 59th minute of the hour, just before the next scheduled backup.
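To make the backup-and-restore idea concrete, here is a minimal sketch in Python rather than a batch file. It is not the actual script used on any server: the function names, the "storage-<timestamp>" folder naming, and the paths in the usage note are all assumptions made for illustration. It simply copies the whole storage folder into a timestamped backup, and after a crash replaces the storage folder with the newest backup before the server is started again.

```python
import shutil
import time
from pathlib import Path
from typing import Optional


def take_backup(storage_dir: Path, backup_root: Path,
                stamp: Optional[str] = None) -> Path:
    """Copy the whole storage folder into a new timestamped backup folder.

    The "storage-<stamp>" naming is an assumption for this sketch; the
    timestamp format sorts alphabetically in chronological order.
    """
    stamp = stamp or time.strftime("%Y%m%d-%H%M%S")
    target = backup_root / f"storage-{stamp}"
    shutil.copytree(storage_dir, target)  # fails if target already exists
    return target


def latest_backup(backup_root: Path) -> Optional[Path]:
    """Return the newest backup folder, or None if there are no backups."""
    backups = sorted(p for p in backup_root.glob("storage-*") if p.is_dir())
    return backups[-1] if backups else None


def restore_latest(storage_dir: Path, backup_root: Path) -> bool:
    """Replace the (possibly corrupted) storage folder with the newest backup.

    Run this only while the server is stopped. Returns False if there is
    nothing to restore.
    """
    newest = latest_backup(backup_root)
    if newest is None:
        return False
    if storage_dir.exists():
        shutil.rmtree(storage_dir)
    shutil.copytree(newest, storage_dir)
    return True
```

With hypothetical paths, an hourly scheduled task would call something like take_backup(Path("C:/DayZServer/storage_1"), Path("D:/backups")), and a crash-recovery script would call restore_latest with the same two paths before launching the exe again.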

So, is persistence broken?  Technically, no, it isn't.  It does what it's supposed to do -- save the state of the world on the server and, in the event of corruption, load a new world instead.  So "when are you going to fix persistence?" is not the question we should be asking.  The real question we should be asking is...why is the server crashing so often?  Figuring out the server crashes will go a long way towards helping the stability of persistence.

1 hour ago, Lysus said:

Is it the storage_1 file that I'm backing up?  Thanks!

It's a folder that contains many files, but yes, that's the one.

13 hours ago, drgullen said:

So, is persistence broken?  Technically, no, it isn't.  It does what it's supposed to do -- save the state of the world on the server and, in the event of corruption, load a new world instead.  So "when are you going to fix persistence?" is not the question we should be asking.  The real question we should be asking is...why is the server crashing so often?  Figuring out the server crashes will go a long way towards helping the stability of persistence.

I'd like to disagree with this; I do think persistence is broken. We're now talking about a server crash vs. a scheduled restart. What about a manual restart or taking the server down for a (mod) update or whatever? I don't know how long you have been playing DayZ but I have been playing it for quite a while. Server crashes are nothing new in DayZ's development. In the past, all persistent items would still be there after a server crash. Conclusion: persistence is in fact broken.

There has to be a fix for this; you can't expect the servers to run stable and never crash, that's impossible. A server can crash for multiple reasons: it might be the server files, it might be the OS of the server or even the hardware. Persistence backups used to be a way to restore corrupted persistence, which occurred on rare occasions. Back then, creating a backup once or twice a day would suffice. If you needed to restore persistence, some hours would be lost but it occurred so rarely that it wasn't really an issue.

This is something the developers have to fix, not the GSPs or server owners. Creating regular backups is a workaround; what we need is a complete fix. It worked as intended in the past so why shouldn't it work right now or in the future? I really hope they fix this as soon as possible. This is currently, in my opinion, the most pressing issue. I have no incentive to play on official servers because once you're geared (which doesn't take long either), there is nothing left to do. On top of that, I haven't even tried base building on an extended scale, only messed around with it a bit.

Edited by IMT

as items disappear even when servers are running, i would say persistence is indeed broken.

but i also completely agree with the notion that we should have the ability to choose how we serialize this data out, instead of some undocumented binary blob files

obviously this could go into any type of database storage

and writing those files every few seconds is not really ideal for SSDs either and creates unnecessary I/O, which is just another source of problems. of course, it works nicely on a ramdisk. but why write to disk, if you use ramdisks.

 

30 minutes ago, IMT said:

you can't expect the servers to run stable and never crash, that's impossible.

tend to agree. there is simply nothing which can't crash. even a perfectly programmed system with overly perfect hardware could crash with the next solar flare or a slight power fluctuation. not saying the focus should not be on reducing crashes. but if your server crashes, and your data has been safely streamed into a database, you won't care as much as when it basically makes things disappear at will.

9 hours ago, IMT said:

I'd like to disagree with this; I do think persistence is broken. We're now talking about a server crash vs. a scheduled restart. What about a manual restart or taking the server down for a (mod) update or whatever? I don't know how long you have been playing DayZ but I have been playing it for quite a while. Server crashes are nothing new in DayZ's development. In the past, all persistent items would still be there after a server crash. Conclusion: persistence is in fact broken.

I think these are two separate issues personally.  I have seen the posts where people have claimed that things like tents have vanished before their eyes.  I have not witnessed anything like that personally, but if that is indeed happening, that is an issue for sure, but it is slightly different than the point of this post -- my point is in regards to the binary files and the reasons that they might get corrupted.  Items disappearing in-game is a different issue.

You've mentioned a manual restart here -- it would be the same thing in that case -- if you're allowing the exe to shut down gracefully, there should be no issue.  If you're right-clicking on the process and selecting End Task or using the /F switch in taskkill to stop the server immediately as I've demonstrated here, you're going to have persistence loss.

9 hours ago, IMT said:

There has to be a fix for this; you can't expect the servers to run stable and never crash, that's impossible. A server can crash for multiple reasons: it might be the server files, it might be the OS of the server or even the hardware. Persistence backups used to be a way to restore corrupted persistence, which occurred on rare occasions. Back then, creating a backup once or twice a day would suffice. If you needed to restore persistence, some hours would be lost but it occurred so rarely that it wasn't really an issue.

I've worked for several software companies in my career and to me, this game server data is no different than financial data for a bank or astronomical data for NASA.  As you point out here, servers can crash for many reasons, both software and hardware related.  This is why I made the post in the first place -- at the end of the day, IT IS up to the server owners to protect the data.  At every company I've worked for, our software had disclaimers relieving us of the responsibility for data loss -- Bohemia is no different.

If there are items in the world vanishing when they shouldn't, yes, Bohemia needs to fix that.  If the DayZ servers are crashing for no apparent reason, yes, Bohemia needs to fix that as well.  Even with all that fixed though, you are one Windows Server blue screen of death away from pissing off a bunch of your players who lost hours of work building a base.  So regardless of what happened in the past, hourly backups at a minimum are the way to go.  Backing up once or twice a day isn't enough, IMO.


Many server owners (myself included) have reported persistence wipes after regular, scheduled restarts. And yes, they were normal restarts.

On 1/7/2019 at 1:30 PM, drgullen said:

THIS IS A KEY POINT HERE.  I can't tell you how many streamers I've watched playing DayZ that think this was a scheduled restart.  Every time you get that "No message received..." red text error message in the middle of the screen, that is a server crash and not a restart and at that moment

Large swaths of gameservers.com community-public servers reset at the same time, every day.  I can expect gameserver machines to feed me this red text every single time at 11:10 and 23:10 (my time) on a regular basis.  Usually, if I'm on one of their servers near this time I'll get out of any car I may be driving because if I don't, I'll have to run a good 300 meters in the direction I was traveling previously, to find my car.  It's pretty consistent.

1 hour ago, Parazight said:

Large swaths of gameservers.com community-public servers reset at the same time, every day.  I can expect gameserver machines to feed me this red text every single time at 11:10 and 23:10 (my time) on a regular basis.  Usually, if I'm on one of their servers near this time I'll get out of any car I may be driving because if I don't, I'll have to run a good 300 meters in the direction I was traveling previously, to find my car.  It's pretty consistent.

The question, though, is how the GSPs are restarting the server.  As I said in my first post, my experience at least is that the red message = server crash, whereas the "you have been kicked" message is a graceful restart.  That makes sense if you think about it -- part of a graceful restart would be sending a disconnection command to all connected clients, whereas a crash is the equivalent of pulling the rug out from underneath you.

Also, there are multiple binary files and I don't believe all of them get corrupted with every crash -- only the ones that were being written to at the time the server went down -- so the fact you can find your car doesn't mean it wasn't a crash.

In the case of gameservers.com, do you know for 100% certainty that the 11:10 and 23:10 restarts are graceful restarts?


I have no idea if they are graceful or not.  Typically, users rejoin and then continue doing what they were doing without too much bother.  I should assume that there's something I'm probably missing and purposefully avoided speculating as to why servers were crashing in my last post here.

28 minutes ago, Parazight said:

I have no idea if they are graceful or not.  Typically, users rejoin and then continue doing what they were doing without too much bother.  I should assume that there's something I'm probably missing and purposefully avoided speculating as to why servers were crashing in my last post here.

Yeah, see this is an important point that we are discussing right now.  On my server, I issue a regular "taskkill /IM DayZServer_x64.exe" command (no /F switch) to begin the restart process.  My batch file then goes into an "are you finished yet?" loop, checking every 3 seconds whether that exe still exists because, in my experience at least, it takes the server between 10 and 15 seconds to gracefully exit.  During that time, I believe it is sending disconnection commands to the clients and writing the final updates to the binary files and players database before closing down.  So, I loop and wait until the exe is gone before I copy the storage folder as my current persistence backup and then start her up again.  When I do it this way, I do not get any red "No message received..." messages if I'm a connected player -- I get booted back to the logo screen with the "You were kicked off the game" message.
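The wait-until-the-exe-is-gone step can be sketched roughly as follows. This is a Python illustration, not the batch file itself: the function names, the timeout, and the use of "tasklist" to detect the process are assumptions of this sketch. The only things taken from the description above are the plain "taskkill /IM" request without /F and the 3-second polling interval.

```python
import subprocess
import time
from typing import Callable

SERVER_EXE = "DayZServer_x64.exe"


def is_running(image_name: str = SERVER_EXE) -> bool:
    """Ask the Windows task list whether the server process still exists."""
    result = subprocess.run(
        ["tasklist", "/FI", f"IMAGENAME eq {image_name}", "/NH"],
        capture_output=True, text=True,
    )
    return image_name.lower() in result.stdout.lower()


def request_graceful_stop(image_name: str = SERVER_EXE) -> None:
    """Ask the process to close. No /F switch, so it is not force-killed."""
    subprocess.run(["taskkill", "/IM", image_name], check=False)


def wait_for_exit(still_running: Callable[[], bool],
                  poll_seconds: float = 3.0,
                  timeout_seconds: float = 120.0,
                  sleep: Callable[[float], None] = time.sleep) -> bool:
    """Poll until the process is gone.

    Returns True once the process has exited, or False if it is still
    there after roughly timeout_seconds (the timeout is an assumption
    added for this sketch so the loop cannot hang forever).
    """
    waited = 0.0
    while still_running():
        if waited >= timeout_seconds:
            return False
        sleep(poll_seconds)
        waited += poll_seconds
    return True
```

A restart script on the server machine would call request_graceful_stop(), then wait_for_exit(is_running), and only copy the storage folder and relaunch the exe if that returns True. If it returns False, the shutdown did not complete and the storage files should be treated as suspect.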

I will admit that I'm still learning all this and I may be wrong, but I contend that if you are seeing the red "No message received..." message on a regular/scheduled basis (i.e. 11:10 and 23:10), then that tells me Gameservers.com is not shutting down the server gracefully and is therefore creating the possibility of persistence issues.


Well, an update.

So, I logged onto a community public server near a base I'd been working on at 11am (my time) and, as expected, saw the red text at 11:10am.  I lost 'connection' to the server.  I was the only one on the server.  I came back after the server was back up.  Only the cars survived.  So, I should assume that these scheduled restarts are not always shutting down gracefully and/or something else is at play.
