[RFC] 2.5.X semaphore starvation fix^H^H^Hwork-around

Mike Galbraith (efault@gmx.de)
Wed, 07 May 2003 13:00:04 +0200


Greetings Folks,

As some of you know, I've been muttering about the behavior of the new
scheduler, and have discovered that my main problem is the priority boost
tasks receive from sleeping on a contended lock. This can lead to a whole
string of tasks which have been promoted above _other_ lock holders,
effectively starving those holders for an indeterminate amount of time with
some workloads. (make -j30 bzImage on an ext3 fs is a wonderful example,
but the problem appears to be generic; it is visible here with ext2 as
well.) With ext3 on my UP/preempt box, this leads to nearly zero
concurrency for a make -j30 bzImage.

The method I decided to try to defeat this problem is to take advantage of
those times when the scheduler would normally just return to the same task
(the one that asked to be rescheduled), and at that point slip in a lower
priority task if possible, to give a lower priority lock holder a chance
to release its lock. For instance, once you have _one_ priority 17 task
left and it finishes its timeslice, instead of returning to that task (or
a higher priority one), select the stalest task from either the active
array or the expired array. It won't run for long, but _might_ have time
to release the lock. If you try the attached patch on ext3, you'll see
that it's pretty effective. It's still subject to a long queue at a single
priority, but I figured I'd post this for now and see what kind of
comments/flames I get back before doing any more experiments 8)
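A minimal user-space sketch of the selection idea described above, under loudly simplified assumptions: tasks live in a flat array rather than the scheduler's per-priority bitmap queues, a lower prio value means a higher priority (as in the 2.5 scheduler), and last_run holds the tick at which a task last got the cpu. struct toy_task, pick_highest() and pick_next() are hypothetical names, and the sketch leaves out the extra filtering, the expired-to-active requeueing and the jiffies wraparound handling that the attached patch deals with.

```c
#include <assert.h>

/*
 * Hypothetical toy model, not the kernel's runqueue: a lower prio value
 * means a higher priority, and last_run is the tick at which the task
 * last got the cpu.
 */
struct toy_task {
	const char *name;
	int prio;
	unsigned long last_run;
	int expired;		/* 1 if the task sits in the expired array */
};

/* Normal behaviour: highest-priority task in the active array, or -1. */
static int pick_highest(const struct toy_task *t, int n)
{
	int best = -1;

	for (int i = 0; i < n; i++) {
		if (t[i].expired)
			continue;
		if (best < 0 || t[i].prio < t[best].prio)
			best = i;
	}
	return best;
}

/*
 * Work-around idea: if the normal pick would simply hand the cpu back to
 * prev, pick the stalest other task (oldest last_run) from either array,
 * so that a lower priority lock holder gets a chance to run.  Falls back
 * to the normal pick when there is no other task.
 */
static int pick_next(const struct toy_task *t, int n, int prev)
{
	int next = pick_highest(t, n);
	int stalest = -1;

	assert(n > 0 && prev >= 0 && prev < n);
	if (next != prev)
		return next;
	for (int i = 0; i < n; i++) {
		if (i == prev)
			continue;
		if (stalest < 0 || t[i].last_run < t[stalest].last_run)
			stalest = i;
	}
	return stalest >= 0 ? stalest : next;
}
```

For example, with a boosted priority 17 task as prev, a priority 20 lock holder last run at tick 40, and a priority 25 task in the expired array last run at tick 60, pick_highest() hands the cpu straight back to the boosted task, while pick_next() picks the holder.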

I also made sure that no task can keep the cpu for more than 50ms without
at least _trying_ to switch. This is a variant of Ingo's timeslice split
method, but doesn't introduce (isn't supposed to, anyway ;) any unneeded
context switches for friendly tasks.
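At HZ ticks per second, 50ms is HZ/20 ticks. The sketch below shows a per-tick check of that kind, assuming last_run is the tick at which the running task was last switched in, and using the wraparound-safe signed-difference idiom of the kernel's time_after_eq(). toy_time_after_eq(), toy_granularity() and toy_should_try_switch() are hypothetical helpers, not kernel API, and the check only says when to _try_ switching; whether a different task actually runs is still up to the scheduler.

```c
#include <assert.h>

/*
 * Hypothetical stand-in for the kernel's time_after_eq(): the signed
 * difference stays correct when the tick counter wraps around.
 */
#define toy_time_after_eq(a, b)	((long)((a) - (b)) >= 0)

/* 50ms expressed in ticks at the given HZ, but never less than one tick. */
static unsigned long toy_granularity(unsigned long hz)
{
	assert(hz > 0);
	return hz / 20 ? hz / 20 : 1;
}

/*
 * Per-tick check for the running task: nonzero once it has held the cpu
 * for at least one granularity period since last_run, i.e. when it should
 * be flagged for rescheduling so the scheduler at least tries to switch.
 */
static int toy_should_try_switch(unsigned long now, unsigned long last_run,
				 unsigned long hz)
{
	return toy_time_after_eq(now, last_run + toy_granularity(hz));
}
```

At HZ=1000 that gives 50 ticks: a task switched in at tick 1000 is not flagged at tick 1049 but is at tick 1050, and because of the signed difference the same holds across a wrap of the counter.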

Comments/flames welcome. (test reports as well)

-Mike

Attachment: twiddle.diff

--- linux-2.5.69.virgin/kernel/sched.c	Tue May  6 11:36:53 2003
+++ linux-2.5.69X/kernel/sched.c	Wed May  7 11:29:22 2003
@@ -74,6 +74,8 @@
 #define MAX_SLEEP_AVG		(10*HZ)
 #define STARVATION_LIMIT	(10*HZ)
 #define NODE_THRESHOLD		125
+#define MIN_REQUEUE_TIME	MAX_TIMESLICE
+#define TIMESLICE_GRANULARITY   (HZ/20 ?: 1)
 
 /*
  * If a task is 'interactive' then we reinsert it in the active
@@ -1166,9 +1168,9 @@
  * increasing number of running tasks:
  */
 #define EXPIRED_STARVING(rq) \
-		(STARVATION_LIMIT && ((rq)->expired_timestamp && \
-		(jiffies - (rq)->expired_timestamp >= \
-			STARVATION_LIMIT * ((rq)->nr_running) + 1)))
+		(STARVATION_LIMIT && (rq)->expired_timestamp && \
+		time_after_eq(jiffies, (rq)->expired_timestamp + \
+		(STARVATION_LIMIT * ((rq)->nr_running) + 1)))
 
 /*
  * This function gets called by the timer code, with HZ frequency.
@@ -1248,12 +1250,91 @@
 			enqueue_task(p, rq->expired);
 		} else
 			enqueue_task(p, rq->active);
+		goto out;
+	}
+	if (time_after_eq(jiffies, p->last_run + TIMESLICE_GRANULARITY)) {
+		dequeue_task(p, rq->active);
+		set_tsk_need_resched(p);
+		p->prio = effective_prio(p);
+		enqueue_task(p, rq->active);
 	}
 out:
 	spin_unlock(&rq->lock);
 	rebalance_tick(rq, 0);
 }
 
+/*
+ * Search an array for the oldest runnable task.  if searching the
+ * expired array, requeue it to the head of it's corresponding active
+ * queue and update rq->expired_timestamp.
+ *
+ * Return the location of the task, or MAX_PRIO if nothing found.
+ */
+static inline int select_oldest(runqueue_t *rq, prio_array_t *array)
+{
+	int idx = 0, old_idx = 0;
+	struct list_head *head, *curr;
+	task_t *next, *old = NULL, *skip = NULL;
+	int exp = array == rq->expired;
+
+next_queue:
+	if (!idx)
+		idx = sched_find_first_bit(array->bitmap);
+	else
+		idx = find_next_bit(array->bitmap, MAX_PRIO, idx);
+	if (idx >= MAX_PRIO)
+		goto out;
+
+	head = array->queue + idx;
+	curr = head->next;
+next_task:
+	next = list_entry(curr, task_t, run_list);
+	curr = curr->next;
+
+	/* Find a runnable candidate. */
+	if (next == rq->curr || next->state > TASK_INTERRUPTIBLE ||
+			test_ti_thread_flag(next->thread_info, TIF_NEED_RESCHED) ||
+			(next->state == TASK_INTERRUPTIBLE && !signal_pending(next)) ||
+			time_before(jiffies, next->last_run + MIN_REQUEUE_TIME)) {
+		if (exp && (!skip || time_after(next->last_run, skip->last_run)))
+			skip = next;
+		if (curr != head)
+			goto next_task;
+		idx++;
+		goto next_queue;
+	}
+	/* Good, we found a candidate. Evaluate it.. */
+	if (!old || time_after(next->last_run, old->last_run)) {
+		old = next;
+		old_idx = idx;
+	}
+	/* and record a skip if present. */
+	if (exp && curr != head) {
+		next = list_entry(curr, task_t, run_list);
+		if (!skip || time_after_eq(next->last_run, skip->last_run))
+			skip = next;
+	}
+	idx++;
+	goto next_queue;
+out:
+	if (old) {
+		old->state = TASK_RUNNING;
+		if (exp) {
+			dequeue_task(old, array);
+			if (!array->nr_active)
+				rq->expired_timestamp = 0;
+			else
+				rq->expired_timestamp = skip->last_run;
+			list_add(&old->run_list, rq->active->queue + old->prio);
+			__set_bit(old->prio, rq->active->bitmap);
+			rq->active->nr_active++;
+			old->array = rq->active;
+		}
+		return old_idx;
+	}
+	return idx;
+}
+
 void scheduling_functions_start_here(void) { }
 
 /*
@@ -1265,7 +1346,7 @@
 	runqueue_t *rq;
 	prio_array_t *array;
 	struct list_head *queue;
-	int idx;
+	int idx = 0, retry = 0;
 
 	/*
 	 * Test if we are atomic.  Since do_exit() needs to call into
@@ -1290,8 +1371,8 @@
 	spin_lock_irq(&rq->lock);
 
 	/*
-	 * if entering off of a kernel preemption go straight
-	 * to picking the next task.
+	 * if entering off of a kernel preemption or reschedule,
+	 * go straight to picking the next task.
 	 */
 	if (unlikely(preempt_count() & PREEMPT_ACTIVE))
 		goto pick_next_task;
@@ -1330,7 +1411,23 @@
 		rq->expired_timestamp = 0;
 	}
 
-	idx = sched_find_first_bit(array->bitmap);
+	if (!idx)
+		idx = sched_find_first_bit(array->bitmap);
+	else if (unlikely(retry)) {
+		prio_array_t *expired = rq->expired;
+		/* If we didn't switch, try the oldest active task. */
+		idx = select_oldest(rq, array);
+		/* If we still won't switch, try the oldest expired task. */
+		if (idx >= MAX_PRIO || expired->nr_active > array->nr_active) {
+			int tmp = select_oldest(rq, expired);
+			if (tmp < MAX_PRIO)
+				idx = tmp;
+		}
+		/* If we simply _can't_ switch, punt back to highest. */
+		if (idx >= MAX_PRIO)
+			idx = sched_find_first_bit(array->bitmap);
+	}
+
 	queue = array->queue + idx;
 	next = list_entry(queue->next, task_t, run_list);
 
@@ -1342,19 +1439,27 @@
 	if (likely(prev != next)) {
 		rq->nr_switches++;
 		rq->curr = next;
+		next->last_run = jiffies;
 
 		prepare_arch_switch(rq, next);
 		prev = context_switch(rq, prev, next);
 		barrier();
 
 		finish_task_switch(prev);
-	} else
+	} else {
+		if (!retry++) {
+			idx = MAX_PRIO;
+			goto pick_next_task;
+		}
 		spin_unlock_irq(&rq->lock);
+	}
 
 	reacquire_kernel_lock(current);
 	preempt_enable_no_resched();
-	if (test_thread_flag(TIF_NEED_RESCHED))
+	if (test_thread_flag(TIF_NEED_RESCHED)) {
+		idx = 0;
 		goto need_resched;
+	}
 }
 
 #ifdef CONFIG_PREEMPT