mirror of
https://github.com/saltstack/salt.git
synced 2025-04-14 08:40:20 +00:00
Port #50078 to master
This commit is contained in:
parent
716698ec8e
commit
d453ea94e9
1 changed files with 57 additions and 0 deletions
57
rfcs/0003-job-retry.md
Normal file
57
rfcs/0003-job-retry.md
Normal file
|
@ -0,0 +1,57 @@
|
|||
- Feature Name: Job retry
|
||||
- Start Date: 2018-10-16
|
||||
- RFC PR: https://github.com/saltstack/salt/compare/develop...cachedout:job_retry
|
||||
- Salt Issue: (leave this empty)
|
||||
|
||||
# Summary
|
||||
[summary]: #summary
|
||||
|
||||
This feature enables Salt to detect when a minion did not run a job and when the minion comes back online,
|
||||
the job is published to that minion.
|
||||
|
||||
# Motivation
|
||||
[motivation]: #motivation
|
||||
|
||||
For a long time, people have complained that Salt has no way to send a job to a minion that was not available
|
||||
to receive it at the time that it was published. Doing so would allow Salt administrators to publish a job and
|
||||
not worry about whether or not a minion that was down at the time did not receive the job.
|
||||
|
||||
# Design
|
||||
[design]: #detailed-design
|
||||
|
||||
At a high level, the way that this feature operates is relatively simple and only minor extensions to the core
|
||||
of salt itself are necessary.
|
||||
|
||||
All operations will be gated behind a configuration option called `job_retry` which will default to being
|
||||
disabled.
|
||||
|
||||
We propose a `job_retry` engine be created which can run on the master. The purpose of this engine is to detect
|
||||
minion "start" events and then to check a master cache for any jobs which might be enqueued and, if found,
|
||||
pull those pending jobs from the queue and publish them to the minion.
|
||||
|
||||
Jobs are enqueued on the master using the masterapi cache subsystem. There shall be an individual cache for each
|
||||
minion which is found not to respond. In this cache will be a list of JIDs which the master believes were not
|
||||
sent.
|
||||
|
||||
Jobs to be retried are inserted into the cache by the LocalClient when it deterermines that a minion which it expected
|
||||
to return did not.
|
||||
|
||||
Please see https://github.com/saltstack/salt/compare/develop...cachedout:job_retry for a POC.
|
||||
|
||||
## Alternatives
|
||||
[alternatives]: #alternatives
|
||||
|
||||
No alternatives considered but ideas welcome.
|
||||
|
||||
## Unresolved questions
|
||||
[unresolved]: #unresolved-questions
|
||||
|
||||
* May not work with anything that interacts with the master but bypasses the LocalClient.
|
||||
|
||||
* Should minions also check to see if they have run the job? Probably.
|
||||
|
||||
# Drawbacks
|
||||
[drawbacks]: #drawbacks
|
||||
|
||||
The main drawback is the risk of running a job twice. Additionally, failure to clear a cache
|
||||
may cause a job to be run in a loop which would be pretty bad. ;-/
|
Loading…
Add table
Reference in a new issue