00:01karolherbst: and 0x400 is still not enough with even more iterations, but that's not surprising
00:01imirkin_: just always set to like 1MB
00:02imirkin_: i don't even know why they made that option configurable :p
00:02karolherbst: and what if things really really slow down?
00:02karolherbst: yeah.. I noticed it too late
00:03karolherbst: apperantly it helps to not do predicate bras inside loops... so maybe we should just never emit those if the loop is already too deep?
00:06RSpliet: karolherbst: as a rule that doesn't sound right. You can do predicated bra's inside a loop, as long as the threads sync up again before the next iteration
00:06karolherbst: RSpliet: right.. which just never happens if the other path is a break, no?
00:06RSpliet: it should.
00:07karolherbst: the other thread break out of the loop
00:07karolherbst: so they can never join on the next iteration
00:07karolherbst: only of the outer loop at best
00:07karolherbst: but in this case, it's even a return
00:07karolherbst: so they won't sync up on the outer loop either
00:07RSpliet: the unconditional break will set the "break" masks of all active threads. Consequently, no threads are active at the end of executing the brk. This will trigger a pop
00:10HdkR: Only 1 pop? :)
00:10RSpliet: That pop (of type "branch") takes the conditional branch target PC off the stack, along with the new mask to be loaded into the "branch" mask
00:11RSpliet: HdkR: only one pop required, there's no reconvergence point required because all threads that execute the unconditional brk will not have to be reactivated
00:11RSpliet: Hence no reconvergence point for the conditional branch is pushed either
00:13HdkR: Ah right. I was thinking of a different situation
00:16RSpliet: Anyway, a stack of 512B per warp should be sufficient for this program, so clearly somewhere there's a flaw in my mental model of the GPU control flow mechanism... or I'm overlooking sth in the program.
00:17karolherbst: RSpliet: it's not
00:17karolherbst: even 1k is not
00:17RSpliet: program should require 5 entries, it can store 8
00:17karolherbst: RSpliet: well.... you now what imirkin_ would say in such cases ;)
00:17RSpliet: Yeah, if stack entries are still 64b (which I presume they are), that's... odd
00:17karolherbst: I set steps to 50
00:17karolherbst: so 100 outer loops * 100 inner loop iterations
00:17karolherbst: : TRAPS
00:18karolherbst: steps set to 30 seems to be the sweet spot with the stack set to 0x400
00:19karolherbst: ohh wait
00:19karolherbst: so 31*31 iterations
00:19RSpliet: I don't know what imirkin_ would say in such cases. Probably something like shouldn't require running such complex programs in the first palce?
00:20karolherbst: no, between believing you or the hardware, he chooses the hardware ;)
00:20imirkin_: this is great. i can just keep quiet, and my snarky comments are implied
00:20karolherbst: imirkin_: :D
00:20imirkin_: i should build an AI bot
00:21imirkin_: that intersperses various references to despair.com demotivators
00:21imirkin_: at opportune moments
00:21RSpliet: But... yes, that's a privilege you earned for contributing lots of top-notch stuff to nouveau
00:21karolherbst: but that helps us
00:21karolherbst: we don't have the variable "how big is the on-chip stack" anymore
00:21karolherbst: now, why are 31*31 iterations "fine" with a 1kb off-chip stack
00:22karolherbst: and why is 32*32 not
00:22karolherbst: or mhhh
00:22karolherbst: I depend too much of the calculation...
00:24karolherbst: allthough, I can't say it does 32*32 or 31*31 iterations for real..
00:24karolherbst: there is this nasty if
00:26RSpliet: wait, yeah defo don't bet on me. 1KiB holds 128 entries. 128 != 33 or something. That must mean that on every iteration of the outer loop you leak one or two stack entries...?
00:26karolherbst: maybe that's it?
00:26karolherbst: could be something stupid like that
00:26RSpliet: Yeah, but I still don't see why
00:26karolherbst: some threads have to hit the iteration, because otherwise increasing the amount wouldn't change anything
00:27karolherbst: maybe you in 5 days would :p
02:08imirkin: ok, so KHR-GL45.tessellation_shader.single.xfb_captures_data_from_correct_stage is a mesa bug
02:09imirkin: i'm like 99% sure someone had a patch for fixing KHR-GL45.conditional_render_inverted.functional ... sigh
02:09imirkin: i probably didn't like something about it
02:10imirkin: tobijk was the one who did stuff with it...
02:11imirkin: aha found it. and yeah, it was broken. ok
04:17rhyskidd: anyone here know what issues nouveau has with witcher2?
04:21imirkin: iirc it worked
04:21imirkin: at least at one point
04:22imirkin: it was super-slow
04:23imirkin: looks like i have a trace here... let's try
04:24rhyskidd: yeh, like i get it will be slow
04:24rhyskidd: interested in seeing what novel engines are out there with decent user interest, so push nouveau
04:24rhyskidd: e.g. there's lots of unity and source2 ...
04:25rhyskidd: and those engines that push interesting pieces of opengl
04:55imirkin: karolherbst: https://github.com/imirkin/mesa/commits/cts2 -- i want to push all of these (except the NEEDS TEST one, unless it gets tested && works)
04:55imirkin: karolherbst: tagged for stable, so we get a nice 19.0 release
06:01imirkin: rhyskidd: heh, witcher2 is $3 on steam now
06:01rhyskidd: that's the reason that it re-caught my interest
06:02imirkin: ah :)
06:15imirkin: pendingchaos: did you have a series to implement GL_COMPUTE_SHADER_INVOCATIONS_ARB ?
06:16imirkin: i seem to remember you at least talking about it
06:49imirkin: karolherbst: add GL_RGBA2 to your "rgba4" change -- i need that for the updated KHR copy_image test to pass
07:11imirkin: hrmph. looks like i used the wrong master file for the cts run... doing that again =/
07:17imirkin: karolherbst: i want to push out the fp64 stuff while we're at it... got it in a branch somewhere?
07:18imirkin: should do it piece-meal, test it on every gen separately
07:37imirkin: i wonder if the simplest thing for counting indirect invocations is to get the macro processor to keep track of it in a scratch register
07:46imirkin: with the updated CTS mustpass file:
07:46imirkin: Passed: 7700/7781 (99.0%)
07:46imirkin: Failed: 22/7781 (0.3%)
07:47imirkin: i liked it a lot better when it was 5.
07:48imirkin: ok, down to 6 without fp64. 1 of those is bogus, 1 is real (but a mesa glsl-level fail), 3 are pipeline stats, and 1 is the dreaded KHR-GL45.packed_depth_stencil.blit.depth32f_stencil8
10:02karolherbst: imirkin: cts_v2 on my repo
10:04diogenes_: Hello awesome nouveau devs, i wanna thank you very much because recently i've reached the same awesome performance in games with nouveau as with native drivers on windows, excellent work! Thank you!
10:44RSpliet: karolherbst: By the way, wrt the presumed stack overflow issue. Is there a way to extend the trap "handler" to dump the warp's PC, SP and the stack contents (type, pc, mask) to dmesg? With a bit of luck, that narrows down the problem (esp. if the latter shows repeated stack frames, the frame's target PC will let us know who's pushing them)
10:44karolherbst: I think so
10:45karolherbst: the bigger problem is, that we have no infra for putting all the data in yet
10:45karolherbst: kind of need a driver ssbo to dump stuff in
10:45karolherbst: and make sure threads don't overwrite each other
10:46karolherbst: RSpliet: I guess I will just try to come up with a cl kernel showing the exactly same issue
10:46karolherbst: way easier to toy around with it that way
10:46RSpliet: For now I'd say just use debug prints to the kernel log? It reports the trap from an interrupt handler, presumably that means the GPU is paused until the interrupt is handled.
10:47karolherbst: yeah... but I know how to execute a shader trap handler
10:47karolherbst: so I could just dmp whatever data from the shader
10:48karolherbst: RSpliet: anyway, I really want to get a proper trap handler for nouveau, just waiting on a few information on how to properly pause in the trap handler
10:48karolherbst: but setting one up is kind of trivial
10:49karolherbst: should help a lot with a lot of random shader trap issues
10:49RSpliet: Ok, well... if waiting proves unsuccesful, I'd say go for quick and dirty and solve issues while you wait :-P
10:50RSpliet: (But... I'm obvs not entitled to determine your planning :-D)
10:50karolherbst: yeah... we could just dump whatever we collected
10:50karolherbst: but... when do dump it?
10:51karolherbst: RSpliet: I kind of want to be able to run mesa in "debug mode" and whenever we get a trap, we should just print it out to stderr or something, but that kind of needs more kernel interfaces and so on :/
10:53RSpliet: karolherbst: yeah.... that sounds like an excellent phase two :-P Idk man, I've always been a kernel dev for nouveau, so my instinct is "we've got a printf that shouts It's a TRAP!' to dmesg, we can sprinkle a few more prints around that one to dump additional data". But... if the data must be extracted through command submission, this whole scheme might not work.
10:54karolherbst: RSpliet: well, I want to essentially see what each register contains, where the TRAP happened, which shader caused it... stuff like that
10:55RSpliet: So do I! But... baby steps maybe? Anyway, given my €0,02 - and those are pretty rare (and soon pretty valuable) in the land of the £ ;-)
10:55pendingchaos: imirkin: I made two: one using a macro for indirect dispatches which would sometimes fail for some reason and one which reads the indirect command from the cpu and works fine
10:57pendingchaos: and adding a PUSH_KICK worked around the failures for some reason
10:59pendingchaos: macro: https://github.com/pendingchaos/mesa/commit/555b8ddfee9 read_from_cpu: https://github.com/pendingchaos/mesa/commit/6aaff0c502b
11:51karolherbst: RSpliet: you know what's most weird about the issue? That depending on the steps, other invocations are traping and invocations which trapped before might not anymore
11:51karolherbst: not on a per thread base
11:51karolherbst: more per tile
11:57RSpliet: karolherbst: that... might mean the stack is a red herring.
11:58karolherbst: well, using the off chip stack helps though
11:59RSpliet: Ok, let me take that back. There could be two bugs :-P
11:59RSpliet: Having more information extracted about the trap would be useful.
12:03karolherbst: do you know if we are able to extract anything from the kernel side?
12:03karolherbst: or maybe the entire warp just aborts
12:03karolherbst: allthough I saw that some tiles were partially rendered, but only a few times
12:04karolherbst: all that is pretty reproducible in general, so always the same tiles of the texture not rendered, etc...
12:17RSpliet: karolherbst: I'm not the expert on this sadly
14:38imirkin: pendingchaos: thanks, i'll have a closer look
14:38imirkin: karolherbst: have a look at the branch i published. i'd like to push everything in it tonight (except the NEED TEST one, unless you test it)
14:38imirkin: [and the non-nouveau changes will have to wait for review, obviously]
14:39imirkin: karolherbst: https://github.com/imirkin/mesa/commits/cts2
14:40karolherbst: imirkin: did you check for piglit regressions or something?
14:40imirkin: i ran CTS and dEQP
14:41imirkin: the fixes are pretty directed, so i doubt it'll hit piglit. i did run the appropriate piglits, but not a full run
14:41karolherbst: I was sure that I wrote a similiar patch for the 3d image stuff at some point, but something wasn't quite right, but could have been the madsp stuff as well...
14:41imirkin: you mean for kepler? or for maxwell?
14:41imirkin: well, this definitely passes all the image tests on kepler
14:41imirkin: perhaps an element was missing somewhere
14:42karolherbst: yeah, maybe
14:42imirkin: it *is* odd that we end up doing that madsp of z * 1 + y
14:42imirkin: but ... what do i know
14:42imirkin: perhaps the clamp ops do something unexpected
14:43karolherbst: might be, yes
14:44karolherbst: the patches look fine I think
14:46imirkin: i'm also going to request they all end up in 19.0
14:46imirkin: want to get as close to conformance as possible
14:46imirkin: karolherbst: where are the fp64 patches again?
14:47imirkin: oh, i looked in _v3 and they weren't there
14:47imirkin: thanks =]
14:47karolherbst: yeah... I used the v3 for the gles stuff
14:47imirkin: nvc0/blit: some weird MS issue fixed
14:47imirkin: that's not a correct fix
14:47imirkin: i'll have to think
14:47imirkin: that's very annoying =/
14:48karolherbst: I think that issue got fixed by one of our gles patches though...
14:48karolherbst: something super weird was happening
14:48karolherbst: or maybe not
14:48karolherbst: I didn't check fir a long time
14:51imirkin: no, still happens
14:51imirkin: there are some GLES3 deqp's that this happens for too
14:51imirkin: which are very similar
14:51karolherbst: I see
14:52imirkin: although iirc it was fixed by a flush somewhere
14:52imirkin: i just couldn't figure out where, other than a giant mega-flush
14:53imirkin: heh - https://github.com/karolherbst/mesa/commit/9e33a0118898c12302e78c3f62443be153efbc65
14:53imirkin: i thought you had already done something like that
14:53imirkin: o well
14:53imirkin: i actually like yours better... hm
15:08imirkin: karolherbst: do try to test that maxwell 3d images patch to see if that fixes the "single layer" tests for you
15:12karolherbst: meh... the CTS broke out of tree builds
15:13karolherbst: had to refetch the externals
15:16imirkin: fwiw i used the GLES31 image tests for most of it
15:17imirkin: much easier to debug with
15:21imirkin: pendingchaos: did you ever consider not having the indirect_cp_inv buffer at all and just permanently stealing some SCRATCH registers for the purpose?
15:22pendingchaos: sounds like a good idea though
15:23imirkin: pendingchaos: also do you remember exactly how you triggered the errors with the macro? or they just kinda happen when running the invocations cts test?
15:23pendingchaos: it was with a modified piglit test IIRC
15:25pendingchaos: I don't think it was random at all btw
15:30imirkin: pendingchaos: ok. if you happen to still have the modified piglit, i'd like to play with it
15:31karolherbst: imirkin: I think I also had it failing for me randomly
15:31karolherbst: like in an entire CTS pass, those compute counter tests sometimes failed
15:34pendingchaos: imirkin: this should be it: https://hastebin.com/rojovomoje.c
15:38karolherbst: imirkin: yeah... on your branch it's essentially completly broken
15:38karolherbst: images that is
15:38karolherbst: instead of 1 failed, I get dozens and assertions
15:40karolherbst: imirkin: KHR-GL45.shader_image_load_store.uniform-limits asserts for me even with the maxwell patch reverted
15:52imirkin_: karolherbst: any more info on the failure?
15:52karolherbst: imirkin_: those are just failures around all image tests
15:52karolherbst: the assert is inside RA
15:52karolherbst: glcts: ../src/gallium/drivers/nouveau/codegen/nv50_ir_util.cpp:197: void nv50_ir::Interval::unify(nv50_ir::Interval&): Assertion `this != &that' failed.
15:52imirkin_: ok. i do have some RA patches locally
15:52imirkin_: er, some compiler
15:53karolherbst: yeah, there is a coordinate too many
15:53imirkin_: hrmph. i thought i fixed that.
15:53imirkin_: you said you reverted my patch though?
15:53karolherbst: or uhm...
15:54karolherbst: yeah, I did
15:54karolherbst: currently trying to understand what's wrong
15:56karolherbst: heh... seems like the TGSI is already using undefined values
15:57imirkin_: so which of my patches is causing it?
15:57imirkin_: interesting. wtf is that TEMP
16:02imirkin_: can you play with the dEQP-GLES31 image tests? those are a lot more directed and won't catch unrelated bits of failure.
16:05karolherbst: mhh, it's not the kepler 3d image fix which is causing that assert
16:05karolherbst: "nvc0/ir: fix second tex argument after levelZero optimization"
16:06imirkin_: i did look at the generated code for maxwell, seemed fine at first glance
16:17imirkin_: karolherbst: do you have a specific TGSI shader that fails with that patch? or is the gist above enough to trigger the fail?
16:17karolherbst: imirkin_: the link I posted
16:18imirkin_: the one with the ATOMUADD?
16:18imirkin_: inconceivable :)
16:18imirkin_: if it touches that insturction then yeah, the dream is over ;)
16:19imirkin_: can you test the maxwell 3d image thing with that patch removed?
16:21karolherbst: Failed: 33/51 (64.7%)
16:21imirkin_: wait, those are the CTS ones right?
16:21imirkin_: can you please check with dEQP-GLES31? those are actually a lot clearer as to what's wrong.
16:23karolherbst: imirkin_: name of the tests?
16:27karolherbst: doesn't look good
16:28karolherbst: Failed: 251/747 (33.6%)
16:28imirkin_: which ones failed?
16:45imirkin_: on the bright side, the 3d single_layer ones pass ;)
16:45imirkin_: but i broke everything else
16:47imirkin_: does e.g. dEQP-GLES31.functional.image_load_store.cube.store.rgba32f_single_layer pass without my patches?
16:49imirkin_: oh. so i didn't break it. that's good.
16:50karolherbst: uhm.. wait
16:50karolherbst: I reverted the wrong commit
16:50karolherbst: imirkin_: it does pass with the maxwell patch reverted
16:50karolherbst: I reverted the kepler one :/
16:51imirkin_: ok. well that's surprising.
16:51imirkin_: ohhhhh hrm
16:51imirkin_: no it's not
16:51imirkin_: coz ... hm
16:52imirkin_: ok. this will require playing around with.
16:52imirkin_: thanks for testing :)
16:52imirkin_: i'm going to leave that one out then
16:52imirkin_: might have to make it conditional
16:52imirkin_: i.e. double-up the surface ops
16:52imirkin_: based on whether it's secretly 3d or not
16:54karolherbst: did you check what nvidia is doing?
16:54imirkin_: iirc they have a complex thing too, conditioned on SOMETHING
16:55imirkin_: i mostly do stuff in a way that makes sense to me rather than look at what nvidia does
16:55imirkin_: i only revert to that when i really can't figure it out
16:55karolherbst: I see
16:55karolherbst: I mean we could do the detiling for maxwell I guess, seems straightforward enough
16:55imirkin_: esp when there are ops involed that i have no idea what they do
16:55karolherbst: yeah... nvidia used such a shader on maxwell too
16:55karolherbst: no idea what it does
16:55imirkin_: that's just ... so unnecessary
16:56imirkin_: the solution is simple
16:56imirkin_: but it'll require me to have hardware plugged in to actually work it out
16:56imirkin_: right now i'm focusing on kepler since kepler is the generation that's most useful on nouveau
16:56karolherbst: if you bind a layer it kind of makes sense
16:56karolherbst: because you wouldn't get a benefit of tiling on the z axis anyway
16:56imirkin_: for that operation, no
16:56imirkin_: but when you later texture from it, yes
17:13karolherbst: imirkin_: mind pinging me if you have the branch ready, then I'd run piglit on my pascal
17:52imirkin_: karolherbst: well, i don't think i can do this blind. so i'm just going to remove that patch and leave it to another day.
17:52imirkin_: i do have a maxwell and a pascal i can test with, just have to plug them in
17:53imirkin_: actually maybe i can do this blind
17:53imirkin_: i'll try tonight
17:53karolherbst: imirkin_: I meant for the other patches as well, just to be sure there are no other random fails like with the levelZero patch
17:53imirkin_: oh sure. will do.
18:00karolherbst: mupuf: regarding buglog, is there some documentation giving a higher overview on how to essentially build the whole thing? Like one server collecting the test results, other servers listening on git commits and running them, etcc?
18:10karolherbst: I mean, for us it would be enough to just track regressions inside mesa
18:10karolherbst: everything else we can think about later
18:13karolherbst: so basically I would like to have a document describing what has to be done to have a setup checking for regression on certain components
18:14karolherbst: which is I am sure everybody kind of wants to start with anyhow if they don't do CI yet
20:15pmoreau: karolherbst: Is mt_fixes_take2 the branch I should try for MT issues, or mt_fixes_to_upstream?
20:15karolherbst: pmoreau: take2
21:00karolherbst: pmoreau: I've got a compilation error with one of your patches
21:00karolherbst: "clover: Implement clCreateProgramWithILKHR"
21:01karolherbst: "../src/gallium/state_trackers/clover/core/program.cpp:37:23: error: ‘compile_from_spirv’ is not a member of ‘clover::llvm’"
21:03karolherbst: where are the newest version of all patches?
21:04pmoreau: Locally on my laptop, waiting for something apparently
21:12karolherbst: pmoreau: sooo yeah.. I think you might have fixed the compilation error in a later patch
21:12pmoreau: It could very well be
21:12karolherbst: ohhhh, I know what I do now :) I push that branch to mesa-ci
21:12karolherbst: shouldn't be too bad
21:32pmoreau: karolherbst: By the way, I pulled in your two glsl patches from nir_to_upstream, to avoid having the FIXME in my branch for the “WIP: clover: add support for nir shaders via spirv” patch.