00:37nyef: Hrm. Well, the AIGLX thing has *some* effect.
00:38nyef: If I move the nouveau_dri.so file that it uses out of the way before starting X, the difference is very obvious in the logs.
00:38nyef: And this also implies that I have a tesla-specific lockup scenario.
01:36imirkin: nyef: moving it out of the way means you don't get GL at all
01:36imirkin: well, hw-accelerated GL
01:36imirkin: it's unfortunately needed for both direct and indirect operation
01:36imirkin: due to largely various historical silliness
01:51nyef: Problem is, it's the *wrong version*.
01:51nyef: It's the system copy, not the development copy.
01:53nyef: Hrm. Do I still get issues if gxine can't find that .so either?
01:57nyef: ... Yeah, I still get issues under such circumstances.
01:59nyef: I can drag a Mate Terminal window around with no problems, but if I try to drag gxine around, I get a lockup within seconds.
02:01nyef: Does mean that it's not gallium at fault, though, and that it's an nv50 issue, not an nvc0 issue.
02:10nyef: Hrm. Going to have to try and find a Fermi card and a system to wrap around it. Joy.
03:45rhyskidd: karolherbst: sure that nv140 is from nouveau? it still has the errors with BIT table 'P' i'd seen before
03:46karolherbst: rhyskidd: you are right, it isn't
03:46rhyskidd: oob pointers
03:47rhyskidd: have seen something similar with the vbios of a Tesla V100 off techpowerup, which is almost certainly a "windows"-format dumped vbios
03:47rhyskidd: (i know it's not windows format per se, but has some other headers around it)
03:47rhyskidd: not nouveau derived
05:48orbea: oh cool, one of the recent xorg-server commits seems to have fixed the DRI3 + modesetting issue for me
05:48orbea: well, at least it doesn't immediately blow up anymore
06:07karolherbst: orbea: what issue?
06:07orbea: karolherbst: would hang with startx, I think it had to do with compton, but I never got around to bisecting it
06:08orbea: actually, it was worse at first, used to hang glxgears (black window, no gears) too even without compton
06:09orbea: but that was fixed maybe a few weeks ago?
06:10orbea: started with xserver 1.20.0
15:06pendingchaos: imirkin: have you tested the phi patch with the traces you mentioned?
15:53imirkin: pendingchaos: no, but i will right now
15:56imirkin: pendingchaos: is the cycle estimate purely informational (and potentially to see how various opts do)?
15:58pendingchaos: you have another use in mind?
16:03nyef: Use them to drive optimization selection?
16:04imirkin: pendingchaos: well, looks like hearthstone works ok with your patch
16:04imirkin: and yeah, latency estimates are often used to drive instruction ordering within a BB
16:05nyef: Is xf86-video-nouveau multi-threaded or single-threaded?
16:05imirkin: (or rather, works as ok as without - there's still some fail, never tracked it down)
16:05imirkin: nyef: single
16:06nyef: Okay, good. I don't have to try to track down thread-interaction issues, at least. Thank you.
16:06imirkin: everything's serialized by the X server afaik
16:06imirkin: that code is as close to bug-free as it gets...
16:06imirkin: there are some known issues with acceleration of certain X primitives that no one ever uses, but that's a separate issue
16:07imirkin: (trapezoids and whatnot)
16:07nyef:points out that he's trying to figure out an X lockup triggered by moving around a gxine window that's not actually playing anything.
16:07imirkin: what makes you think that X locks up?
16:08imirkin: see if LIBGL_ALWAYS_SOFTWARE=1 gxine still locks things up
16:08nyef: It still happens even if I rename away all of the nouveau_dri.so files.
16:08imirkin: also, the vdpau stuff has seriously gone downhill lately
16:08imirkin: i think there's a kernel bug which makes it just not work
16:08imirkin: which gpu is this on?
16:09nyef: Happens on tesla, doesn't happen on kepler.
16:09imirkin: and on tesla, it's the dma_pusher thing?
16:09nyef: Not always. Managed to have it happen without any kernel messages once.
16:10imirkin: well the dma pusher thing is a kernel-level issue
16:10imirkin: we're not switching channels correctly ... or something
16:10imirkin: (if we knew what, it'd be fixed already)
16:10nyef: It's always pointed at the X server, never any other process.
16:11imirkin: you don't really know that
16:11imirkin: nouveau reports the process that opened the fd, not the process to which that fd was passed over a domain socket
16:12imirkin: not sure if there's a way of retrieving that
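
The fd-passing point above can be illustrated with a small sketch. It assumes Python 3.9+ on a Unix-like system (for `socket.send_fds` / `socket.recv_fds`), and it uses a temporary file rather than a DRM device node. The function name `pass_fd_demo` is made up for this example. It only shows that a descriptor opened on one end of a Unix domain socket arrives on the other end as a different fd number that still refers to the same open file, which is why "the process that opened the fd" and "the process using it" need not be the same.

```python
import os
import socket
import tempfile


def pass_fd_demo():
    """Open a file, pass its descriptor over a Unix socket, compare both ends.

    Returns (same_file, distinct_numbers).
    """
    opener_end, receiver_end = socket.socketpair(socket.AF_UNIX, socket.SOCK_STREAM)
    with tempfile.TemporaryFile() as f, opener_end, receiver_end:
        opener_fd = f.fileno()
        # The "opener" ships the descriptor as SCM_RIGHTS ancillary data.
        socket.send_fds(opener_end, [b"fd"], [opener_fd])
        _msg, fds, _flags, _addr = socket.recv_fds(receiver_end, 16, 1)
        received_fd = fds[0]
        try:
            a, b = os.fstat(opener_fd), os.fstat(received_fd)
            # Same device + inode means both descriptors name the same file.
            same_file = (a.st_dev, a.st_ino) == (b.st_dev, b.st_ino)
            distinct_numbers = received_fd != opener_fd
        finally:
            os.close(received_fd)
    return same_file, distinct_numbers
```

In a real X setup the two socket ends would live in different processes; the sketch keeps both in one process only so that it is self-contained.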
16:12nyef: I'm trying to update a test machine so that I can try with fermi, but that's likely to take all day.
16:52karolherbst: imirkin: do you think this has to be ">=" instead of ==? https://github.com/mesa3d/mesa/blob/master/src/gallium/drivers/nouveau/nvc0/nvc0_screen.c#L948
17:09nyef: That'd depend on if it's a "something new was introduced with this revision" thing or a "there's something quirky about this revision in particular" thing, surely?
17:12Armada: nyef, that subchannel is used here: https://github.com/mesa3d/mesa/blob/master/src/gallium/drivers/nouveau/nvc0/nvc0_transfer.c#L612
17:13Armada: It's likely it's a new feature in that revision and not something quirky, without changing the first check current behaviour results in a subchannel being used without an engine
17:14Armada: bound to the subchannel
17:14rhyskidd: karolherbst: perhaps skeggsb has a nouveau-derived vbios from the volta (GV100) he's been using for initial bringup?
17:15karolherbst: rhyskidd: maybe
17:16nyef: Armada: Thank you. That has just somewhat reduced my level of compound ignorance.
17:24karolherbst: I have a super silly fix for the last CTS fail and I kind of fear I don't get any fails now...
17:43karolherbst: ahhhh :(
17:45karolherbst: imirkin: this patch fixes the packed_depth_stencil.blit.depth32f_stencil8 test: https://github.com/karolherbst/mesa/commit/44222f90597089cf16b9fa3001b4a3445d3bada2
17:45karolherbst: does it make sense?
17:45karolherbst: it seems to not help any of those piglit fails
17:45karolherbst: doing a piglit run now to check for regressions
17:46karolherbst: but uhm, we kind of pass all CTS tests
17:46karolherbst: except those which sometimes fail
17:47karolherbst: Armada: with that changed I get "fifo: PBDMA0: 00000010 [HCE_ILLEGAL_CLASS] ch 3 00000000 0000a0b5"
17:48karolherbst: on pascal
17:55Armada: hmm, a0b5 doesn't exist on the switch either, but b0b5 does
17:57Armada: in that case, the nvc0_transfer check should be changed until more recent copy engines are supported
18:00Armada: karolherbst, could you try adding another check for greater or equal to NVF0_P2MF_CLASS that sets 0xb0b5 to the subchannel?
18:11karolherbst: Armada: after the piglit run, yes
18:41nyef: MCP89, no nouveau_dri.so files to be found. LIBGL_ALWAYS_SOFTWARE=1 gxine... lockup.
18:42nyef: fifo: DMA_PUSHER - ch 3 [X] get 0000034b54 put 0000034b78 ib_get 00000273 ib_put 0000029a state 80000040 (err: INVALID_CMD) push 00400040
18:46nyef: What next, I start gxine again, then use lsof to see if it has any suspicious-looking fds open?
18:47nyef: Or re-enable nouveau_dri.so and try to replicate the issue using modesetting instead of xf86-video-nouveau?
18:49karolherbst: imirkin: hum, no regressions/fixes inside piglit with that patch
18:52pendingchaos: karolherbst, imirkin: is this a reasonable assumption to make (even with CROSS edges): https://hastebin.com/dogecudabo.diff (in CFGIterator::search)?
18:52karolherbst: Armada: also illegal class on pascal
18:53Armada: I guess for now just change nvc0_transfer so it doesn't use that subchannel
18:53imirkin: karolherbst: didn't test piglit, just a handful of traces
18:53Armada: only use nve4 m2mf transfers on nve4
18:54karolherbst: imirkin: what are you referring to? I meant the patch I have for the CTS fail
18:55imirkin: karolherbst: oh. nevermind.
18:55karolherbst: I am very suspicious regarding that patch
18:55karolherbst: it looks too trivial
18:55imirkin: nyef: if you get errors without 3d accel, only xorg is performing accel
18:55imirkin: nyef: and fbcon, theoretically, but only if you switch to a console
18:56imirkin: karolherbst: that MIGHT make sense
18:56karolherbst: yeah, I know
18:57imirkin: depending on how rast is done
18:57imirkin: problem is, it'll only affect very large rects
18:57nyef: imirkin: So, does that mean "not a context switch issue"?
18:57imirkin: so you have to test it with maximally-sized surfaces
18:57nyef: I suppose it could mean "context setup isn't quite right"...
18:57karolherbst: imirkin: mhh
18:57imirkin: nyef: wellll ... unlikely to be a context switch issue. but the issue is the same as the other "random tesla fail" issues
18:58imirkin: i.e. DMA_PUSHER gets upset, things go downhill from there
18:58karolherbst: imirkin: well it affects the CTS tests which do 256x256 surfaces
18:58imirkin: karolherbst: right
18:58imirkin: but the reason those numbers are so large
18:58imirkin: is to deal with very large surfaces
18:58karolherbst: what is more suspicious is, that it doesn't affect any fail/pass inside piglit
18:59karolherbst: so I am wondering
19:00karolherbst: imirkin: ohhh I was wrong about piglit,
19:00karolherbst: the tests which fail do _look_ better
19:00karolherbst: the test still fails
19:00karolherbst: at least the window content looks better
19:01karolherbst: instead of those weirdly scaled textures, they are sized correctly
19:01imirkin: have a look at the commit logs for that code
19:01imirkin: it's iterated a few times
19:01imirkin: perhaps i left some comments a few of those times
19:02karolherbst: anyway, it seems like with this it should be easier to fix the piglit tests as well, because seems like one bug less inside the path. Let me check the commits
19:03karolherbst: imirkin: "nvc0: fix blit triangle size to fully cover FB's > 8192x8192" https://github.com/karolherbst/mesa/commit/a651bc027d5ed4150bb5240fc9f46a6ca569f665
19:04karolherbst: ohh wait
19:04karolherbst: that doesn't add the shift
19:05imirkin: yeah. like i said - there have been a few fixes in that area
19:05imirkin: and i definitely remember leaving it off in a state where some of those ext_framebuffer_multisample tests looked wrong
19:05imirkin: and differently wrong on nv50 and nvc0
19:05imirkin: which i wasn't too happy about
19:05karolherbst: uhh, that code is old
19:07karolherbst: ohhh I see
19:08imirkin: the idea of that code is that you draw a single triangle
19:08imirkin: which goes way outside the bounds of the fb
19:08imirkin: but the whole fb gets covered, in a single draw. rather than 2 for a quad.
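
A minimal sketch of the single-triangle idea described above. It assumes a framebuffer spanning (0,0) to (width,height) and picks the vertices (0,0), (2*width,0), (0,2*height); these coordinates and the helper names are chosen for illustration and are not taken from the nvc0 blit code. It checks that every pixel centre lies inside that one oversized triangle.

```python
def covering_triangle(width, height):
    """One triangle whose hypotenuse passes through (width, height)."""
    return [(0.0, 0.0), (2.0 * width, 0.0), (0.0, 2.0 * height)]


def inside(tri, px, py):
    """Sign test against the three edge functions of the triangle."""
    def edge(a, b):
        return (b[0] - a[0]) * (py - a[1]) - (b[1] - a[1]) * (px - a[0])

    values = [edge(tri[0], tri[1]), edge(tri[1], tri[2]), edge(tri[2], tri[0])]
    return all(v >= 0 for v in values) or all(v <= 0 for v in values)


def covers_framebuffer(width, height):
    """True if every pixel centre of the framebuffer is inside the triangle."""
    tri = covering_triangle(width, height)
    return all(
        inside(tri, x + 0.5, y + 0.5)
        for y in range(height)
        for x in range(width)
    )
```

The parts of the triangle outside the framebuffer are simply clipped or scissored away, so one draw replaces the two triangles of a quad.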
19:09karolherbst: the thing is, do we have to upscale the triangle if the destination is a ms surface?
19:10karolherbst: or well
19:10karolherbst: we kind of do
19:11karolherbst: but we do that for the coordinates
19:11karolherbst: so x0/1 and y0/1 are increased
19:12karolherbst: but do we also have to increase the vertex
19:17imirkin: problem is whether the rast is multisampled or not
19:17imirkin: and how it deals with a MS fb
20:01pendingchaos: ping on the CFGIterator::search() question?
20:05imirkin: pendingchaos: sorry, not sure offhand, i'd have to read through a bunch more code
20:06imirkin: unfortunately the CFG is not quite the way it's supposed to be in practice
20:06imirkin: i've been too chicken about fixing it
20:06imirkin: the idea is that edges are categorized based on the MST + extra edges
20:06imirkin: but ... that's not how they're done in practice
20:13pendingchaos: not sure how that applies to a CFG though
20:15HdkR: imirkin: You know how many times I've told people to blit with a single triangle and it has blown their mind that they have never thought of doing that? :P
20:18imirkin: pendingchaos: all those edge types aren't just randomly named
20:18imirkin: i think i have a comment about it
20:18imirkin: but basically tree = part of the MST of the CFG
20:18imirkin: forward = jump to a descendent in the MST
20:18imirkin: back = jump to a parent
20:18imirkin: cross = other
20:19imirkin: this info is mostly used for layout of the code
20:20imirkin: i.e. you have a bunch of bb's, which all jump to one another -- how to lay them out in actual code space to avoid unnecessary jumps all over the place
20:20imirkin: but it's also used for some other matters, like critical edge detection
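
The tree/forward/back/cross naming above matches the textbook classification of edges relative to a depth-first spanning tree. The sketch below assumes that reading of "MST" and uses a plain dict as a hypothetical CFG representation; it is not how codegen assigns edge types.

```python
def classify_edges(cfg, entry):
    """Classify every reachable edge of cfg relative to a DFS spanning tree.

    cfg maps a block name to the ordered list of its successors.
    """
    discover, finish, kinds = {}, {}, {}
    clock = 0

    def visit(node):
        nonlocal clock
        clock += 1
        discover[node] = clock
        for succ in cfg.get(node, []):
            if succ not in discover:
                kinds[(node, succ)] = "tree"      # first time we reach succ
                visit(succ)
            elif succ not in finish:
                kinds[(node, succ)] = "back"      # succ is an ancestor still on the stack
            elif discover[node] < discover[succ]:
                kinds[(node, succ)] = "forward"   # succ is an already finished descendant
            else:
                kinds[(node, succ)] = "cross"     # anything else
        clock += 1
        finish[node] = clock

    visit(entry)
    return kinds


# Small made-up CFG: a diamond with a shortcut edge and a loop back to the entry.
example_cfg = {
    "bb0": ["bb1", "bb2", "bb3"],
    "bb1": ["bb3"],
    "bb2": ["bb3"],
    "bb3": ["bb0"],
}
example_kinds = classify_edges(example_cfg, "bb0")
```

With this definition the result depends on the order in which successors are visited, so the same edge can come out as tree or cross depending on the walk.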
20:27pqatsi: Hello folks! I'm still fighting with the F28 environment on my Inspiron 7000. The issue now is that I can't get past (in the best case) the login screen; it freezes permanently. Using a live CD and a chroot, I got the output of journalctl --boot=-1: https://pastebin.com/212PhD7F
20:27pqatsi: What can I do to have at least a usable system?
20:30nyef: Downgrade your video card?
20:32imirkin: pqatsi: boot with nouveau.modeset=0
20:55karolherbst: imirkin: inside codegen the edges are rather categorized by the type of "jump" creating that edge though.
20:57karolherbst: if-then/if-else: Edge::TREE, endif: Edge::FORWARD, loops: Edge::TREE, loop-end: Edge::BACK, break: Edge::CROSS, continue: Edge::BACK. There is also that special case for RET, but... that doesn't really matter
20:58karolherbst: normally you categorize the edges based on the iteration path you choose through the CFG, but we don't do that
20:58karolherbst: so sometimes edges could end up as a different one, depending on which path you take
21:00karolherbst: I am sure that by accident all the tree edges could be equal to the MST, but I highly doubt that this is always the case
21:00karolherbst: also, we create circles
21:00karolherbst: which by definition can't be a MST
21:04karolherbst: uhm, maybe we don't create circles, but at least we could end up with non-connected tree edges
21:12ReinUsesLisp: hello, what does post-fermi SEL instruction do?
21:18karolherbst: ReinUsesLisp: select value based on the result of the compare
21:18karolherbst: ReinUsesLisp: src0 compareOp 0 ? src1 : src2
21:19karolherbst: src2 compareOp 0 ? src0 : src1 actually
21:19imirkin: karolherbst: it's *supposed* to be based on the MST though
21:19karolherbst: so a SEL.LT a b c d writes either b or c into a depending on whether d is less than 0 or not
21:20imirkin: karolherbst: that's not a thing
21:20imirkin: there's a SELP
21:20imirkin: and a FCMP/ICMP/etc
21:20karolherbst: imirkin: sure, but the if-then and the else-then block connect through a Tree::Forward edge to the successor
21:20imirkin: which is wrong.
21:20karolherbst: imirkin: CMP is SET, no?
21:20imirkin: which is what i was trying to explain. it should be based on the MST, but isn't.
21:21ReinUsesLisp: so `SEL R18, RZ, c[0x1][0x0], !P0` would translate to `R18 = !P0 ? RZ : c`
21:21imirkin: OP_SET -> *CMP. i assumed the question was about the SEL name in nvdisasm
21:21karolherbst: ReinUsesLisp: mhh, that is actually a SELP
21:21imirkin: ReinUsesLisp: c1, but yeah
21:21karolherbst: imirkin: ohh right, I might have got it wrong with SLCT vs SELP and their names in nvdisasm
21:21ReinUsesLisp: oh, yeah, mistyped
21:22karolherbst: but yeah, slct and selp are basically the same (in our nouveau terms)
21:22karolherbst: just selp already gets a boolean input
21:22imirkin: karolherbst: errrr, OP_SET -> *SET. OP_SLCT -> *CMP
21:22karolherbst: imirkin: k
21:22imirkin: SELP uses a predicate
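
The two select flavours discussed above can be modelled roughly as follows. This is a sketch of the semantics as stated in the conversation (SELP picks on a predicate, SLCT compares its third source against zero); the function names, the `COMPARE_OPS` table and the sample constant-buffer value are made up for illustration.

```python
import operator

# Hypothetical subset of compare operations, for illustration only.
COMPARE_OPS = {
    "LT": operator.lt,
    "GE": operator.ge,
    "EQ": operator.eq,
    "NE": operator.ne,
}


def selp(pred, src0, src1):
    """Select on an already computed boolean predicate."""
    return src0 if pred else src1


def slct(cmp_name, src0, src1, src2):
    """Select by comparing src2 against zero: src2 <op> 0 ? src0 : src1."""
    return src0 if COMPARE_OPS[cmp_name](src2, 0) else src1


# `SEL R18, RZ, c[0x1][0x0], !P0` read as R18 = !P0 ? RZ : c1[0x0]
RZ = 0
c1_0 = 0x1234          # made-up constant buffer contents
p0 = False
r18 = selp(not p0, RZ, c1_0)
```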
21:22karolherbst: naming is always confusing
21:22imirkin: such is life.
21:23ReinUsesLisp: about SSY and SYNC calls, those should be handled by GLSL compiler, right?
21:23imirkin: they're added in by the compiler, yes
21:23imirkin: not by the glsl compiler, but by layers further down.
21:23imirkin: (glsl compiler's job is to parse the glsl and produce an IR to be consumed further on)
21:24ReinUsesLisp: yea, I was thinking from IR to GLSL terms
21:24ReinUsesLisp: ok, thanks!
21:25imirkin: may i ask why you're asking?
21:25ReinUsesLisp: Nintendo Switch emulation
21:25HdkR: imirkin: I hear it's the only reason why people care about Maxwell these days
21:25imirkin: yeah, you have to keep track of the most recent SSY, and convert the SYNC into a jump
21:26imirkin: this will happen with if/else for the most part
21:26imirkin: internally it allows the hw to know when all lanes are "SIMD" again
21:26ReinUsesLisp: oh, so it can be ignored
21:27imirkin: SYNC is a jump, so you can't exactly skip it
21:27HdkR: Don't forget loops that have unstructured control flow inside of it :P
21:27imirkin: if/else -> SSY; @P0 BRA else; if-code; SYNC; else-code; SYNC
21:28imirkin: the SSY has a pointer to after the else code
21:28imirkin: we call it "JOINAT" and "JOIN"
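
For the emulation use case, the "keep track of the most recent SSY, and convert the SYNC into a jump" advice could be sketched like this. The tuple-based instruction format and label names are hypothetical (real shaders use addresses), and the linear scan assumes structured code laid out in order; it does not cover PBRK/BRK or unstructured control flow.

```python
def lower_sync(program):
    """Drop SSY, turn each SYNC into a branch to the innermost SSY target."""
    ssy_stack = []
    out = []
    for insn in program:
        op = insn[0]
        if op == "SSY":
            # Remember the reconvergence point; emit nothing for it.
            ssy_stack.append(insn[1])
        elif op == "SYNC":
            if not ssy_stack:
                raise ValueError("SYNC without a preceding SSY")
            out.append(("BRA", ssy_stack[-1]))
        elif op == "LABEL":
            # Reaching the reconvergence point closes the innermost region.
            while ssy_stack and ssy_stack[-1] == insn[1]:
                ssy_stack.pop()
            out.append(insn)
        else:
            out.append(insn)
    return out


# if/else -> SSY; @P0 BRA else; if-code; SYNC; else-code; SYNC
if_else = [
    ("SSY", "endif"),
    ("BRA", "else", "P0"),
    ("OP", "if-code"),
    ("SYNC",),
    ("LABEL", "else"),
    ("OP", "else-code"),
    ("SYNC",),
    ("LABEL", "endif"),
]
lowered = lower_sync(if_else)
```

The second branch to `endif` lands on the very next instruction and could be dropped by a later cleanup pass.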
21:29imirkin: and right. there are various structures with a for loop as well
21:29imirkin: i just wanted to make a simple example :)
21:29imirkin: but usually one would use a PBRK for that
21:29HdkR: Or a combination of the two
21:29imirkin: i.e. PBRK + BRK when exiting the loop
21:30imirkin: and if there's if/else inside the loop, then SSY + SYNC
21:30imirkin: but someone who knew what they were doing wouldn't necessarily be bound by such simplicity :)
21:30skeggsb: and fortunately gone in volta :P
21:30imirkin: moral of the story - don't let HdkR write games
21:30HdkR: er uh
21:30HdkR:hides stockpile of games
21:31HdkR: skeggsb: You get new fun toys in Volta though ;)
21:31imirkin: skeggsb: yeah, that's definitely nice.
21:32HdkR: <3 Volta's threading model
21:35karolherbst: oh no :( more CTS fails
21:36karolherbst: some robustness stuff
21:43karolherbst: imirkin: apparently we fail some robust_buffer_access_behavior tests as well
21:57karolherbst: but maybe that's just prime related..
22:24karolherbst: mhh, can somebody check if GLX_ARB_create_context_robustness is exposed as a GLX extension inside glxinfo? (not server/client)
22:25karolherbst: but I am sure it doesn't get reported due to prime
22:26imirkin: for me it's reported on client but not server
22:26imirkin: dunno if we need to do something or not
22:26imirkin: might need newer xorg
22:26karolherbst: I guess so
22:26karolherbst: well, my X runs on intel
22:27karolherbst: let me check with a dedicated nouveau X
22:30karolherbst: X doesn't start on nouveau here, because the GPU doesn't report any displays
22:30karolherbst: well at least the modesetting ddx isn't happy
22:30karolherbst: nouveau ddx just crashes
22:34karolherbst: also my X is like 1.20 or so
22:34karolherbst: uhh 1.19.6 actually
22:34karolherbst: imirkin: well, it works perfectly with intel, just prime offloaded nouveau not
22:36imirkin: i have not investigated.