World Community Grid Wiki

Tasks canceled by system and when

At times the servers or a technician will determine that a batch or quorum of tasks (work units) is redundant or bad. In those cases the system is instructed to prepare an action message for the client. When the client contacts the servers, which happens at least once every 3 days, the message is transmitted and the relevant task(s) will be aborted:

  • when not yet started, and
  • even when started, if the task is known to crash anyway.

knreed posted several articles identifying the conditions that will terminate a job when using BOINC version 5.10 and up:

[Sep 19, 2007 2:17:19 PM]

Regarding some of the questions above about how BOINC will abort late workunits, this works with two options turned on:

1) The server sends a message to the clients asking the client to check in no later than three days after its previous communication with the server. There are many reasons why contact will happen more frequently than that, but in the worst case three days is the upper limit.

2) If a workunit is cancelled, deleted on the server (after being assimilated and moved into the queue to be sent back to the scientists), or the information about the workunit is no longer in the database, then the server will send the abort message to the client.

These two things work together to handle the case where someone goes on vacation and shuts off their computer for a week. As soon as they turn their computer back on, condition #1 engages and the client talks to the server. The server will then tell the client to abort a number of the results and the client will then start working on new work.
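The timing in the vacation case can be sketched in a few lines. This is only an illustration of the arithmetic described in the post, not BOINC code: the constant and function names are hypothetical, and the only value taken from the post is the three-day upper limit.

```python
from datetime import datetime, timedelta

# Assumption: the three-day upper limit mentioned in the post.
# The name MAX_CONTACT_INTERVAL is made up for this sketch.
MAX_CONTACT_INTERVAL = timedelta(days=3)


def contact_deadline(last_contact: datetime) -> datetime:
    """Latest moment by which the client is asked to check in again."""
    return last_contact + MAX_CONTACT_INTERVAL


def is_contact_overdue(last_contact: datetime, now: datetime) -> bool:
    """True when the client should talk to the server right away."""
    return now >= contact_deadline(last_contact)


# Computer switched off for a week: the check-in is overdue at power-up,
# so the client contacts the server and can receive any pending aborts.
last_contact = datetime(2007, 9, 1, 12, 0)
back_from_vacation = last_contact + timedelta(days=7)
print(is_contact_overdue(last_contact, back_from_vacation))  # True

# One day after the last contact nothing forces a check-in yet.
print(is_contact_overdue(last_contact, last_contact + timedelta(days=1)))  # False
```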

A slightly older post, below, elaborates further on what was in place for client version 5.8 and is implemented in release 5.10:

[Jun 21, 2007 7:46:08 PM]

The rules that exist now are:

1) If the workunit has been assimilated and the related files deleted on the server, has had an error, or has been cancelled, then the server will tell the client to cancel the in-progress result the next time the client contacts the server.

The 5.10 client should be able to handle this next case (and I need to test this feature on the server before unleashing it on you, since the 5.8 client had a bug which would crash the core client when the abort-if-unstarted message was sent; as a result the 5.10 client listens for an abort-if-not-started message):

2) Once the workunit has been assimilated on the server (i.e. result(s) have been accepted and any future results returned will not contribute to the science), tell the client to abort any results for that workunit that haven't started the next time the client contacts the server.
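As a rough sketch, the two rules can be read as a mapping from the workunit's state on the server to the message sent at the next contact. The state names, message strings and function below are hypothetical and chosen only for illustration; they are not the BOINC server's actual identifiers.

```python
from enum import Enum, auto
from typing import Optional


class WorkunitState(Enum):
    """Hypothetical server-side states, named after the rules in the post."""
    IN_PROGRESS = auto()              # still needed by the science
    ASSIMILATED = auto()              # results accepted, files still present
    ASSIMILATED_AND_DELETED = auto()  # results accepted, files deleted
    ERRORED = auto()
    CANCELLED = auto()


# Hypothetical message names for this sketch.
ABORT = "abort"
ABORT_IF_NOT_STARTED = "abort_if_not_started"


def message_for(state: WorkunitState) -> Optional[str]:
    """Pick the message to send at the client's next contact, if any."""
    # Rule 1: the result can no longer be used at all.
    if state in (
        WorkunitState.ASSIMILATED_AND_DELETED,
        WorkunitState.ERRORED,
        WorkunitState.CANCELLED,
    ):
        return ABORT
    # Rule 2: further results would not contribute to the science,
    # so only results that have not started are stopped.
    if state is WorkunitState.ASSIMILATED:
        return ABORT_IF_NOT_STARTED
    return None


print(message_for(WorkunitState.CANCELLED))    # abort
print(message_for(WorkunitState.ASSIMILATED))  # abort_if_not_started
print(message_for(WorkunitState.IN_PROGRESS))  # None
```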

At some future point BOINC will add a preference to the client to treat 'abort if not started' messages as 'abort' messages from the servers.

Rule #1 makes sure that you are not crunching work that neither earns you credit nor contributes to the science.

Rule #2 helps reduce the number of times that you would be crunching work that does not contribute to the science.

Once the preference is added to a future BOINC client, then rule #2 will work with that preference to ensure that the work you are doing will contribute to the science.
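On the client side, the difference between the two messages and the effect of the planned preference might look like the sketch below. Everything here is an assumption made for illustration: the message strings, the function, and in particular the `treat_if_not_started_as_abort` parameter, which stands in for a preference that the post describes only as a future addition.

```python
# Hypothetical message names for this sketch.
ABORT = "abort"
ABORT_IF_NOT_STARTED = "abort_if_not_started"


def client_should_abort(
    message: str,
    started: bool,
    treat_if_not_started_as_abort: bool = False,
) -> bool:
    """Decide whether the client drops a result after a server message.

    treat_if_not_started_as_abort models the future preference from the
    post; it is not an existing BOINC setting name.
    """
    if message == ABORT:
        # Unconditional: the result is dropped even if it is running.
        return True
    if message == ABORT_IF_NOT_STARTED:
        # Normally only unstarted results are dropped; with the
        # preference enabled, started results are dropped as well.
        return (not started) or treat_if_not_started_as_abort
    return False


# A running result survives 'abort if not started' by default...
print(client_should_abort(ABORT_IF_NOT_STARTED, started=True))  # False
# ...but not once the volunteer opts in to the stricter behaviour.
print(client_should_abort(ABORT_IF_NOT_STARTED, started=True,
                          treat_if_not_started_as_abort=True))  # True
```

Keeping the preference off by default matches the behaviour described for rule #2, where work that has already started is left to finish.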

In many cases these are caused by someone shutting off their computer and taking a vacation.

I can tell you that there is significant effort going into scheduling and doing what is possible to maximize the use of the volunteers' computers. A large part of the BOINC 5.10 client consists of improvements to the client's scheduling algorithm. A simulator was created and numerous test cases were run through the simulator.

On the server side, version 5.09 facilitated these efficiency improvements.
