[11:13:13]  <halfline> ajax: we really do need a way for a server to tell
   the kernel when it's doing work on behalf of a particular client
   [11:13:31]  <halfline> that would solve the whole dbus auditing problem
   steve grubb has, too
   [11:13:54]  <halfline> maybe something to talk about at plumbers...
   [11:14:49]  <ajax> solaris had an X extension for this, apparently?
   [11:14:50]  <ajax> https://bugs.freedesktop.org/show_bug.cgi?id=2192
   [11:15:30]  <ajax> i forget most of the details, i think that one was
   mostly about bumping client priority to reduce frontend latency, not about
   throttling stupid shit
   [11:16:22]  <halfline> hmm too hard to grok that in passing
   [11:18:08]  <halfline> basically i think we need something like
   prctl(PR_SET_CLIENT, client_fd)
   [11:18:57]  <halfline> and from that point on the kernel knows the process
   is doing work on behalf of the process on the other side of the fd
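To make the per-request cost concrete: PR_SET_CLIENT is only a proposal here, not an existing prctl option, so the sketch below stubs it with a hypothetical set_client_ctx() that records the fd and counts how many kernel entries the real call would need. Bracketing every request this way costs two extra calls per request.

```c
#include <assert.h>
#include <stddef.h>

/* HYPOTHETICAL: stands in for prctl(PR_SET_CLIENT, client_fd). No such
 * prctl option exists; this stub only records the fd and counts how many
 * times the kernel would have been entered. */
static int current_client = -1;     /* -1 = not serving any client */
static unsigned long ctx_calls = 0; /* would-be syscalls */

static void set_client_ctx(int client_fd)
{
    assert(client_fd >= -1);
    current_client = client_fd;
    ctx_calls++;
}

/* Placeholder for decoding and executing one request. */
static void handle_request(int client_fd, int req)
{
    (void)client_fd;
    (void)req;
}

/* Naive scheme: enter and leave the client's context around every request. */
static void serve_per_request(const int *client_fds, const int *reqs, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        set_client_ctx(client_fds[i]); /* enter: one call */
        handle_request(client_fds[i], reqs[i]);
        set_client_ctx(-1);            /* exit: a second call */
    }
}
```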
   [11:19:08]  <ajax> i think that's too much overhead for X, but for audit
   sure.
   [11:19:24]  <halfline> oh i see
   [11:19:27]  <ajax> i really can't afford two more syscalls per request
   [11:19:36]  <halfline> well we could do it in a smarter way
   [11:19:40]  <ajax> well, okay, per soft-ctxsw, but sure.
   [11:19:54]  <halfline> fcntl
   [11:20:48]  <ajax> maybe.  if you make it an fcntl then obviously you enter
   a client context once you do read()
   [11:20:56]  <ajax> but when do you exit that client's context?
   [11:21:06]  <ajax> write?  next read/select?
   [11:21:12]  <kylem> Optimizing Unix Resource Scheduling for User
   Interaction
   [11:21:12]  <kylem> Steve Evans, Kevin Clarke, Dave Singleton, Bart
   Smaalders
   [11:21:13]  <kylem> SunSoft Inc.
   [11:21:16]  <kylem> that's a blast from the past. :)
   [11:21:47]  <halfline> ajax: hmm
   [11:22:04]  <ajax> yeah, there's been plenty of work on this in the past.
   remember when reading it that HZ used to be 100.
   [11:22:15]  <ajax> which is _way_ too slow for graphics
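For scale, a rough comparison, assuming a 60 Hz display (an illustrative figure, not one from the discussion): at HZ=100 a single timer tick already covers most of a frame.

```latex
T_{\text{tick}} = \frac{1}{\mathrm{HZ}} = \frac{1}{100\,\text{s}^{-1}} = 10\,\text{ms},
\qquad
T_{\text{frame}} = \frac{1}{60\,\text{Hz}} \approx 16.7\,\text{ms},
\qquad
\frac{T_{\text{tick}}}{T_{\text{frame}}} = \frac{10}{16.\overline{6}} = 0.6
```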
   [11:22:52]  <kylem> yeah, i can't think of many ways to do this that
   aren't a completely horrid hack.
   [11:23:04]  <kylem> i'll read the paper after lunch.
   [11:23:18]  <ajax> i think rcvbuf does actually get me a lot of the way to
   where i want to be
   [11:24:15]  <ajax> i don't know what my typical readq is for a greedy
   client, but i can find that out, that's just numerology
   [11:28:27]  <halfline> so we could have a magic futex associated with each
   client
   [11:28:40]  <halfline> when you hold it, the kernel knows you're serving
   that client
   [11:29:40]  <halfline> hmm, but the kernel doesn't get involved when it's
   uncontended, i guess, reading the man page
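halfline's reading of the man page holds: a futex-based lock only enters the kernel under contention. A minimal sketch of a per-client lock in the style of the three-state mutex from Ulrich Drepper's "Futexes Are Tricky" (client_lock and futex_calls are illustrative names) counts syscalls to show that an uncontended acquire/release never reaches the kernel, so the kernel would learn nothing about which client is being served.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <linux/futex.h>
#include <stdatomic.h>
#include <stddef.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Hypothetical per-client lock: 0 = free, 1 = held, 2 = held with waiters. */
typedef struct {
    atomic_int state;
} client_lock;

/* Counts how often we actually enter the kernel, for illustration. */
static atomic_int futex_calls;

static void futex_call(atomic_int *addr, int op, int val)
{
    atomic_fetch_add(&futex_calls, 1);
    (void)syscall(SYS_futex, (int *)addr, op, val, NULL, NULL, 0);
}

static void client_lock_acquire(client_lock *l)
{
    assert(l != NULL);
    int c = 0;
    /* Fast path: 0 -> 1 with one atomic op, no syscall at all. */
    if (atomic_compare_exchange_strong(&l->state, &c, 1))
        return;
    /* Contended: mark "waiters present" and sleep in the kernel. */
    if (c != 2)
        c = atomic_exchange(&l->state, 2);
    while (c != 0) {
        futex_call(&l->state, FUTEX_WAIT_PRIVATE, 2);
        c = atomic_exchange(&l->state, 2);
    }
}

static void client_lock_release(client_lock *l)
{
    /* Old value 1 means nobody was waiting: again no syscall. */
    if (atomic_fetch_sub(&l->state, 1) != 1) {
        atomic_store(&l->state, 0);
        futex_call(&l->state, FUTEX_WAKE_PRIVATE, 1);
    }
}
```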
   [11:30:47]  <halfline> i guess the point is, you can't add any new calls
   at all
   [11:30:55]  <halfline> it has to be implicitly figured out from what you
   already do
   [11:30:57]  <kylem> i wonder if we could do something awesome with fuse.
   [11:31:02]  <halfline> since what you already do is performance critical
   [11:31:53]  <halfline> unless it was something really fast, like writing
   to a special address in memory?
   [11:33:07]  <kylem> hrm.
   [11:33:17]  <kylem> we could do the prctl thing with a vdso. that would be
   relatively fast.
   [11:33:51]  <halfline> oh like the gettimeofday hack?
   [11:33:58]  <kylem> yeh.
   [11:34:03]  <kylem> and getpid.
   [11:34:12]  <kylem> (Wait, did we ever put getpid in there?)
   [11:34:28]  <ajax> pretty sure we did
   [11:34:36]  <halfline> i thought that only worked for reading from the
   kernel not writing to the kernel?
   [11:34:49]  <kylem> back in a bit.
   [11:35:00]  <kylem> halfline, we can't write to kernel space, but we can
   put something somewhere the kernel can easily get.
   [11:35:01]  <ajax> vdso is readonly right now, yeah
   [11:35:28]  <ajax> a writeable vdso segment isn't _that_ much logically
   different from a futex
   [11:35:46]  <ajax> it's just a bunch of predefined futexes..
   [11:37:10]  <halfline> okay i don't know much about them
   [11:40:03]  <halfline> i think we just need some writeable mapped memory,
   a single integer where we write to the kernel "i'm handling this client
   now" and afterwards "i'm done"
   [11:40:43]  <ajax> that doesn't help scheduling
   [11:41:03]  <ajax> accounting, sure, because that will read the ctx value
   out when it needs it
   [11:41:29]  <halfline> but once the kernel has the accounting information,
   it can perform scheduling tweaks for you
   [11:41:36]  <ajax> but the scheduler assumes ctx transitions happen at
   scheduling, you'd have to tell it more explicitly
   [11:41:41]  <ajax> the write itself won't trigger anything
   [11:41:44]  <ajax> unless it's a pagefault
   [11:41:58]  <halfline> and page faults are very expensive
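The writable-mapped-integer idea and ajax's objection to it can be seen in a small model. No such kernel-shared page exists today: below, an ordinary anonymous MAP_SHARED mapping stands in for it, and sample_for_accounting() stands in for hypothetical kernel code that reads the word whenever it happens to run. Entering or leaving a client's context is a single store, which is cheap but triggers nothing; accounting can read the value later, yet the scheduler is never told that it changed.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>
#include <sys/mman.h>

/* HYPOTHETICAL layout of a page shared between a server and the kernel. */
struct client_ctx_page {
    atomic_int current_client; /* 0 = idle, otherwise a client id */
};

static struct client_ctx_page *ctx_page_map(void)
{
    void *p = mmap(NULL, sizeof(struct client_ctx_page),
                   PROT_READ | PROT_WRITE, MAP_SHARED | MAP_ANONYMOUS, -1, 0);
    if (p == MAP_FAILED)
        return NULL;
    return p; /* anonymous mappings are zero-filled, so we start idle */
}

/* Server side: a plain store. No syscall, no fault once the page is
 * resident, and therefore nothing the scheduler gets to react to. */
static void ctx_enter(struct client_ctx_page *pg, int client_id)
{
    assert(client_id != 0);
    atomic_store_explicit(&pg->current_client, client_id, memory_order_release);
}

static void ctx_exit(struct client_ctx_page *pg)
{
    atomic_store_explicit(&pg->current_client, 0, memory_order_release);
}

/* Stand-in for kernel accounting code that samples the word whenever it
 * happens to run; it only learns what is there at that moment. */
static int sample_for_accounting(struct client_ctx_page *pg)
{
    return atomic_load_explicit(&pg->current_client, memory_order_acquire);
}
```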
   [11:43:24]  <halfline> anyway, something to gnaw on.
   [11:44:13]  <halfline> maybe the answer is what you said originally, force
   all apps to only deal with one fd per iteration of poll
   [11:44:37]  <halfline> and mark that one fd ahead as one to use for
   accounting
   [11:44:49]  <halfline> accounting ends on next poll
   [11:45:05]  <ajax> maybe.
   [11:45:18]  <ajax> but i mean, that's sort of secondary?
   [11:45:30]  <halfline> or maybe accounting ends on next poll or on next
   read of some other fd in the fdset
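halfline's implicit rule (a read() on a client fd enters that client's context; the next poll(), or a read() of some other fd, ends it) answers ajax's earlier question of when a context exits. A hypothetical model of that bookkeeping, with CPU time reported in arbitrary units, might look like this; implicit_acct and its functions are illustrative names, not an existing interface.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_FDS 64

/* HYPOTHETICAL model of kernel-side bookkeeping: the "current client" is
 * inferred only from syscalls the server already makes. */
struct implicit_acct {
    int current_fd;                 /* -1 = no client context */
    unsigned long charged[MAX_FDS]; /* work units charged per client fd */
    unsigned long unattributed;     /* work done outside any context */
};

static void acct_init(struct implicit_acct *a)
{
    *a = (struct implicit_acct){ .current_fd = -1 };
}

/* Returning to poll()/select() ends whatever context we were in. */
static void acct_on_poll(struct implicit_acct *a)
{
    a->current_fd = -1;
}

/* read() on a client fd enters that client's context; reading a different
 * fd in the same iteration implicitly ends the previous one. */
static void acct_on_read(struct implicit_acct *a, int fd)
{
    assert(fd >= 0 && fd < MAX_FDS);
    a->current_fd = fd;
}

/* CPU time spent between syscalls, charged to the current context. */
static void acct_on_work(struct implicit_acct *a, unsigned long units)
{
    if (a->current_fd >= 0)
        a->charged[a->current_fd] += units;
    else
        a->unattributed += units;
}
```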
   [11:45:35]  <ajax> the problem that attempts to solve is the scheduler
   making bad decisions
   [11:46:03]  <ajax> and i don't think it's making bad decisions.  there
   appears to be more work to do so it's doing it.
   [11:46:34]  <halfline> well the issue is, the kernel can only make
   decisions based on the available information
   [11:46:38]  <ajax> if i want to influence that i should make it look like
   there's nothing to do.
   [11:46:50]  <halfline> and for a server, one important piece of
   information is which clients it's serving and when
   [11:46:54]  <halfline> but the kernel doesn't have that information
   [11:47:16]  <ajax> that's why i'm saying try shrinking the receive buffer
   [11:47:28]  <halfline> ajax: your recvbuf thing will probably be "good
   enough" for your specific issue
   [11:47:33]  <ajax> client write()s, it blocks because the buffer is full.
   [11:47:35]  <halfline> and that's fine
   [11:48:00]  <ajax> i suspect fixing that will actually make it so
   accounting tricks aren't even needed though
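ajax's alternative needs no new kernel interface at all. A minimal sketch of shrinking a client connection's receive buffer with the standard setsockopt(SO_RCVBUF) call; shrink_rcvbuf() is an illustrative helper, and the right size would come from measuring a greedy client's typical read queue, as ajax says. How strictly a given socket type enforces SO_RCVBUF varies, so it is worth checking against the transport actually in use.

```c
#include <assert.h>
#include <sys/socket.h>

/* Ask for a small receive buffer on a client connection so that a greedy
 * client fills it quickly and its write() blocks (or returns EAGAIN).
 * Returns the size the kernel actually granted, or -1 on error. */
static int shrink_rcvbuf(int fd, int bytes)
{
    assert(bytes > 0);
    if (setsockopt(fd, SOL_SOCKET, SO_RCVBUF, &bytes, sizeof bytes) < 0)
        return -1;

    int granted = 0;
    socklen_t len = sizeof granted;
    /* Linux doubles the requested value to cover bookkeeping overhead,
     * so read back what was actually applied. */
    if (getsockopt(fd, SOL_SOCKET, SO_RCVBUF, &granted, &len) < 0)
        return -1;
    return granted;
}
```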
   [11:48:06]  <halfline> was just hoping to kill some other birds with a new
   stone
   [11:48:27]  <halfline> well i'd like to fix the sgrubb auditing problem
   too
   [11:49:03]  <halfline> but i guess you probably don't want an audit entry
   for every draw operation in the x server so...
   [11:49:10]  <halfline> maybe i'm shoehorning where i shouldn't be
   [11:49:11]  <ajax> yeah.  i think those end up being different enough
   problems that you don't want to conflate them.
   [11:49:19]  <ajax> good thought and all, but.
   [11:52:53]  <halfline> part of the issue is, the accounting has to be very
   fast to be something X could make use of for improved scheduling, which
   probably means it has to be deduced implicitly
   [11:53:12]  <halfline> but the accounting has to be very trustworthy and
   accurate for it to be something audit could make use of
   [11:53:22]  <halfline> which probably means it can't be deduced implicitly
   [11:53:34]  <ajax> yeah.  make it work, make it good, then make it fast.
   [11:54:44]  <ajax> prctl seems entirely reasonable for audit's needs and
   if i ever think i need it for X i can probably make it work
   [11:54:59]  <ajax> like, right now we don't do any estimation of request
   cost
   [11:55:21]  <ajax> which is lame.  i've got a ton of information about
   that.
   [11:55:36]  <ajax> and i try to drain multiple reqs per read(), so.
   [11:56:07]  <ajax> i could amortize the prctl across multiple reqs and
   only fire it if i think the next timeslice is going to be expensive
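ajax's amortized variant can be sketched the same way: estimate the cost of everything drained by one read() and make the context call only for batches that look expensive. set_client_ctx() again stands in for the proposed, nonexistent prctl, and estimate_cost() with its threshold is a made-up placeholder for the request-cost model ajax says X doesn't have yet.

```c
#include <assert.h>
#include <stddef.h>

/* HYPOTHETICAL stand-in for prctl(PR_SET_CLIENT, fd); counts calls only. */
static unsigned long ctx_calls;
static int current_client = -1;

static void set_client_ctx(int fd)
{
    current_client = fd;
    ctx_calls++;
}

/* HYPOTHETICAL per-request cost estimate, in arbitrary units. */
static unsigned estimate_cost(int opcode)
{
    return opcode == 1 ? 50u : 1u; /* pretend opcode 1 is an expensive op */
}

/* Serve every request drained from one client by a single read().
 * The context call is made at most twice per batch, and only when the
 * batch looks expensive enough to be worth telling the kernel about. */
static void serve_batch(int fd, const int *opcodes, size_t n, unsigned threshold)
{
    assert(opcodes != NULL || n == 0);
    unsigned long cost = 0;
    for (size_t i = 0; i < n; i++)
        cost += estimate_cost(opcodes[i]);

    int tagged = cost >= threshold;
    if (tagged)
        set_client_ctx(fd);
    for (size_t i = 0; i < n; i++)
        (void)opcodes[i]; /* ... execute the request on behalf of fd ... */
    if (tagged)
        set_client_ctx(-1);
}
```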
   [11:57:14]  <halfline> yea maybe something like that would work