05:24 chatter29: hey guys
10:18 karolherbst: I think I will finally finish that LiveOnly pass, in furmark:
10:18 karolherbst: 1404->1581 points
10:38 pmoreau: karolherbst: Almost gained 200 points, not too bad at all!
10:40 karolherbst: thing is, this is the only application affected :/
10:40 karolherbst: didn't find anything else
10:40 karolherbst: but hey, doesn't matter
10:40 karolherbst: more perf than nvidia
10:43 pmoreau: Ah, too bad it is the only one affected, but as long as the others aren’t affected negatively, it is good enough. :-)
10:43 karolherbst: yeah
10:43 karolherbst: maybe something gets affected though
10:43 karolherbst: I even have a demo which shows visual bugs if done wrongly
10:43 karolherbst: but no perf improvements there
10:44 karolherbst: although it could be due to PCIe
10:44 karolherbst: cause 1600fps
10:44 karolherbst: yep
10:44 pmoreau: 1600fps? What are you running, glxgears? :-p
10:44 karolherbst: nvidia in furmark: 1449 points, nouveau 1581
10:44 karolherbst: some irrlicht demo
10:44 pmoreau: Ok
10:45 karolherbst: being nearly 10% faster is pretty nice
10:45 Omar007: Optimizing for benchmark programs are we now? :P
10:45 karolherbst: the only way which is reliable
10:46 karolherbst: ...
10:46 karolherbst: and if I use bumblebee, nvidia gets to 1608 points
10:46 karolherbst: yeah, because on a native X they can be slower
10:46 karolherbst: that logic
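The percentages implied by the scores quoted above can be checked with a few lines of arithmetic. This is only a sketch; `percentGain` is a hypothetical helper name, and the inputs are the furmark scores from the log (1404 and 1581 for nouveau before and after the pass, 1449 for nvidia on native X, 1608 for nvidia through bumblebee).

```cpp
#include <cassert>

// Relative change from `before` to `after`, in percent.
// Hypothetical helper, used only to check the numbers quoted in the log.
double percentGain(double before, double after) {
    assert(before > 0.0);  // a zero or negative baseline makes no sense here
    return (after - before) / before * 100.0;
}
```

With these scores, `percentGain(1404, 1581)` is about 12.6 (the gain from the pass) and `percentGain(1449, 1581)` is about 9.1 (nouveau versus nvidia on native X), while `percentGain(1608, 1581)` is negative, since the bumblebee result is still ahead.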
10:49 karolherbst: pmoreau: the code is quite crappy though for now: https://github.com/karolherbst/mesa/commit/6b9d625d38455ad9232b2001ec57234f3ed3c74f
10:49 karolherbst: maybe I could simply check for "insn->asTex()" instead of doing the switch on the op?
11:25 karolherbst: imirkin: are "shl, vote, quadon, quadpop" also considered "quadops"?
11:38 karolherbst: pmoreau: wuhu, alien isolation is also affected and generates artefacts :), nice nice
11:56 pmoreau: karolherbst: Yeah! More artefacts is better! :-D
11:57 pmoreau: I’ll have a look at the code later
11:57 karolherbst: I am mainly searching for affected applications
11:57 karolherbst: my pass is correct, this just happens if I set that flag always to true
12:27 karolherbst: pmoreau: in payday2 this pass even causes the GPU to hang :/
12:27 karolherbst: like randomly
12:28 karolherbst: "fifo: SCHED_ERROR 0a [CTXSW_TIMEOUT]"
12:28 karolherbst: and I think artefacts can appear in every game
12:28 karolherbst: it's always where triangles/vertices overlap I think
12:28 karolherbst: but not everywhere
12:36 karolherbst: there is some condition where I can't set that flag as well
12:46 dboyan_: karolherbst: I think you mean "shfl" by saying "shl"?
12:48 karolherbst: yes
12:51 dboyan_: I don't know about quadops, but I know that vote and shfl are quite generic operations. shfl is used to calculate derivatives on maxwell though, because it provides a means to exchange data between threads in a quad
12:51 karolherbst: dboyan_: I try to implement this: https://trello.com/c/bW7SYHtW/56-add-pass-to-set-liveonly-on-tex-instructions-when-possible
12:52 karolherbst: I guess this also includes shfl
12:53 karolherbst: and this is my patch for now: https://github.com/karolherbst/mesa/commit/df896b1e93cd9c9da8eb8e44936cb6de0c55c06f
12:53 karolherbst: everything seems to be still rendered correctly
12:53 karolherbst: but in payday2 the GPU just hangs at some point
12:53 karolherbst: no idea why
12:54 dboyan_: Ah, here's the context
12:55 karolherbst: I really want to implement it, because performance in furmark increases by 12.5%
12:55 karolherbst: I doubt it matters that much in other applications, but still
12:55 karolherbst: in the end everything counts :)
13:02 dboyan_: karolherbst: I guess "vote" and "shfl" are so-called "cross-lane instructions"
13:02 karolherbst: yeah
13:03 karolherbst: I've added them to the list
13:03 karolherbst: also quadpop and quadon
13:03 dboyan_: yeah, seem so
13:04 karolherbst: still hang
13:05 karolherbst: maybe it can't be done on all tex* instructions? or there are more requirements? dunno
13:07 karolherbst: imirkin: do you have any ideas? https://github.com/karolherbst/mesa/commit/48d89dbc408da4ea235a04cf6418ad1f86c7fb66
13:08 karolherbst: there is a condition we/I miss and I don't see which one
13:09 karolherbst: maybe my pass is also just broken...
13:10 dboyan_: karolherbst: you didn't check for tex instructions as mentioned there I guess
13:10 karolherbst: I do
13:11 karolherbst: "use->asTex()"
13:11 dboyan_: ah
13:26 karolherbst: in the cuda tools it is called "NODEP"
14:12 dboyan_: karolherbst: I don't think I have enough knowledge for this problem, but is it possible that it's related to control flow?
14:12 dboyan_: The snippet in https://lists.freedesktop.org/archives/mesa-dev/2017-March/150404.html suddenly came to my mind
14:13 dboyan_: Although totally no idea if it is a valid case
14:15 karolherbst: I could try to find out if any piglit test hangs as well
14:15 karolherbst: or I try to figure out when Nvidia sets that bit and when not
14:17 dboyan_: btw, we also have the name "nodep" in envydis
14:17 karolherbst: yeah
14:17 karolherbst: but it gets printed as "all" and "live"
14:20 dboyan_: It's called "nodep" in gm107 envydis. Different styles
14:20 karolherbst: k
14:21 karolherbst: I have to take a look at the generated shaders, maybe my pass just has a trivial bug
14:22 karolherbst: but I think that pass has to be moved post RA anyway
17:44 yann-kaelig: Hi
17:48 yann-kaelig: I'm using the linux-rt kernel and the screen freezes when I try to use kde-plasma. I can access the computer through ssh, and get an output of dmesg. http://dpaste.com/2955M3R
17:48 yann-kaelig: Now I'm on mate and I don't have this sort of issue
17:51 karolherbst: yann-kaelig: nouveau is kind of bad with multi-context multi-threaded issues
17:52 karolherbst: it's being worked on afaik
17:53 yann-kaelig: karolherbst, hi. Does that mean it's a nouveau issue only with kde-plsame ?
17:53 yann-kaelig: plasma*
17:53 karolherbst: basically yes
17:53 karolherbst: it can hit every application though doing multithreaded OpenGL
17:53 karolherbst: or if there are many OpenGL based applications active
17:53 karolherbst: Plasma5 tends to use a lot of OpenGL these days
17:53 yann-kaelig: ok, I understand
20:43 imvr: Hello, can the extremely quick change of words at the same point on the screen cause flickering/tearing ?
20:52 karolherbst: dboyan_: mhhh odd, my pass seems to look fine though :/ *sigh*
20:52 karolherbst: I am sure imirkin could say what is wrong there
21:09 imirkin: ..?
21:10 karolherbst: imirkin: I tried to work on that LiveOnly card again
21:10 karolherbst: I changed a little in my patches and could verify that I don't add rendering errors in multiple applications
21:10 karolherbst: but still improve performance especially in furmark
21:10 karolherbst: but
21:11 karolherbst: the issue is, that I have a game where the GPU just hangs
21:11 karolherbst: and I don't know why
21:11 karolherbst: here is the updated version of the patch: https://github.com/karolherbst/mesa/commit/e83b6d85b8ecfb8f43f5729c012e4b53e45bf122
21:11 karolherbst: currently I try to hack some ptx assembly code to somehow figure out when Nvidia sets that flag
21:12 karolherbst: mhhh
21:13 karolherbst: somehow I get the feeling, that NVidia doesn't use the same constraints on nvc0+ like they do on older chipsets
21:13 karolherbst: example: https://devtalk.nvidia.com/default/topic/689830/cuda-low-level-programming-strange-ptxas-behavior/
21:13 karolherbst: if I compile that for sm20 and sm30, I get no "NODEP" at all
21:13 karolherbst: but for sm_11 it seems to be there
21:14 imirkin: ok, well, sm11 is totally different, so don't even bother with that
21:14 hakzsam: karolherbst: can you try heaven in ultra (tess extreme) with recent mesa against nouveau?
21:14 imirkin: NODEP is different than liveOnly iirc
21:14 karolherbst: no
21:14 karolherbst: it's the same
21:14 karolherbst: I just checked that today
21:14 imirkin: ok. i don't really have time to look at this ... sorry
21:14 karolherbst: that bit we set in the emitter gets translated to NODEP in the cuda tools
21:14 karolherbst: k
21:15 karolherbst: hakzsam: k
21:15 hakzsam: karolherbst: there is a potential regression
21:15 hakzsam: roofs might be pink
21:15 karolherbst: :O
21:17 karolherbst: hakzsam: do you want me to bisect it as well?
21:17 hakzsam: no
21:17 hakzsam: I know the first bad
21:17 karolherbst: well, they ain't pink
21:17 hakzsam: at least with radeonsi
21:17 hakzsam: no regressions?
21:17 karolherbst: I try to play with the options
21:17 karolherbst: but it looks good
21:18 hakzsam: karolherbst: do you have 0bceefc29591d64d5d529a726e68b837f1f504b2 ?
21:19 karolherbst: yes
21:19 karolherbst: ohhh
21:19 karolherbst: now it is pink
21:19 karolherbst: it's not only related to tess extreme
21:20 hakzsam: can you upload a screen? and try without this commit as well?
21:20 hakzsam: (just to confirm)
21:20 karolherbst: the dragon is blue as well
21:22 karolherbst: hakzsam: https://i.imgur.com/EJxngWy.jpg
21:22 hakzsam: okay thanks
21:22 hakzsam: same issue
21:23 imirkin: merge registers was accidentally fixing something?
21:23 imirkin: ftr, that color is 'purple', not pink :)
21:23 karolherbst: hakzsam: without that commit the issue is still there
21:23 imirkin: diff commit for nvc0
21:23 hakzsam: imirkin: yeah, wtvr the color is :)
21:23 karolherbst: true...
21:23 karolherbst: what was the commit for nvc0 :D
21:24 imirkin: 00b504474014663ff1b00d273d219cd9a02091de
21:24 karolherbst: thanks
21:25 karolherbst: yep
21:25 karolherbst: issue gone
21:25 hakzsam: ok cool
21:26 hakzsam: karolherbst: I guess disabling the renumber_registers() pass fixes the issue?
21:27 karolherbst: on plain master?
21:27 imirkin: oh, renumber_registers is still run? kill that with fire.
21:27 hakzsam: karolherbst: yes
21:27 karolherbst: where was the place?
21:27 hakzsam: karolherbst: just look for renumber_registers()
21:27 karolherbst: smart...
21:28 hakzsam: imirkin: I think the renumber_registers() pass is still useful
21:29 karolherbst: hakzsam: fixed with disabling it
21:29 hakzsam: thanks karolherbst
21:33 karolherbst: imirkin: maybe a quick answer for a short question if you have the time for that: do you know if there is something else important for the evaluation except quadops and tex instructions? I am sure you would have put that into the card, but just want to make sure
21:34 karolherbst: I don't even know what that flag does anyway....
21:35 imirkin: dfdx/dfdy, but those map to quadops
21:35 karolherbst: yeah
21:36 karolherbst: I also added shuffle and such things, just in case. But this won't matter for the application I testing on
21:36 karolherbst: *am
21:37 imirkin: hakzsam: they're all subtly buggy passes that add no value =/
21:38 hakzsam: true
21:38 hakzsam: I'm comparing shaders
21:38 airlied: imirkin: subtly interlinked buggy passes
21:39 airlied: for the tgsi stuff :-)
21:39 imirkin: airlied: yeah. e.g. the DCE pass does some necessary-for-correctness stuff. urgh.
21:40 hakzsam: that would be sad :)
21:40 imirkin: hakzsam: but true.
21:40 hakzsam: probably
21:40 karolherbst: running DCE on the tgsi is no bad idea though
21:40 imirkin: i tracked it down a while back
21:40 hakzsam: imirkin: you remember?
21:40 imirkin: https://github.com/imirkin/mesa/commits/st-pass-cleanup
21:41 imirkin: look at the WIP eliminate DCE commit
21:41 imirkin: sometimes an instruction is marked dead early on
21:41 imirkin: like in the emitter
21:41 hakzsam: ah you disabled all TGSI opts? :)
21:42 imirkin: yeah.
21:42 hakzsam: that a solution, at least for backends with real compilers...
21:42 hakzsam: *that's
21:42 imirkin: yep
21:42 imirkin: my target was nv50_ir
21:42 imirkin: but i was mostly prototyping at that point
21:42 hakzsam: okay
21:43 imirkin: i ended up coming to the conclusion that some of those passes perform work which accidentally causes the output code to be better than it otherwise might be
21:43 hakzsam: ah
21:43 hakzsam: buggy passes, then?
21:43 imirkin: no
21:43 imirkin: not buggy
21:43 imirkin: but for reasons other than what their primary purpose is
21:44 imirkin: the final code ends up looking better
21:44 hakzsam: okay, I see
21:44 imirkin: (also buggy, but i fixed a bunch of bugs)
21:44 airlied: some passes are also lazy, they leave crap around for later passes to clean
21:45 imirkin: this was a goodie - 047b91771845453826dcdd0019adc7333348b158
21:45 imirkin: or this 1614c39a8fc205d7b1cb5b16737c233fbcc5b678
21:45 hakzsam: okay, it's time to figure out the issue :)
21:50 airlied: and write a piglit :-)
21:51 hakzsam: yeah, it's the season of piglit tests :)
21:54 karolherbst: mhhh
21:54 karolherbst: I can't get ptxas to generate a tex instruction with that live flag set
21:56 karolherbst: maybe it only works for special tex instructions
22:03 imirkin: karolherbst: adjust the number of invocations perhaps?
22:03 imirkin: the live flag makes most sense in frag shaders though
22:04 karolherbst: mhhh
22:04 karolherbst: true
22:04 karolherbst: maybe I can't enable it inside vertex shaders?
22:08 karolherbst: luckily that hang happens in under a minute
22:08 imirkin: mmmmm iirc it has no meaning outside frag. could be wrong.
22:09 karolherbst: well we don't need to run that pass outside FPs anyway, so that check makes sense
22:09 karolherbst: but yeah, still freeze
22:13 karolherbst: it generates a texfetch instruction... let's see
22:32 karolherbst: ha, I just implement a shader from us into PTX, that will show nvidia :/
22:33 karolherbst: *sigh*
22:33 karolherbst: I really don't want to install old versions of those tools now
22:35 karolherbst: imirkin: maybe if there are dependencies to other tex, we can't enable it at all? not only the source, but also the use?
22:35 imirkin: so the deal is
22:35 imirkin: the liveonly flag, as i understand it,
22:35 imirkin: affects whether the texture is fetched for non-live invocations
22:36 imirkin: aka helper invocations
22:36 imirkin: so ... why do helper invocations even exist?
22:36 imirkin: they exist so that derivatives can be computed properly
22:36 imirkin: which means any texture that does not have an explicit LOD supplied (or explicit derivatives)
22:36 imirkin: or anything else that makes use of the data in helper invocation lanes
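imirkin's rule can be written down as a small classification: which instructions read data that only exists if helper invocations ran? The sketch below is an illustration under stated assumptions, not Mesa code. The `Op` enum and `consumesHelperLaneData` are hypothetical names, not the real nv50_ir opcodes, and treating `Vote` and `Shfl` as consumers is the conservative choice discussed earlier in the log.

```cpp
#include <cassert>

// Hypothetical, simplified opcode set for illustration; not the nv50_ir enum.
enum class Op {
    Tex,      // texture fetch with implicit LOD (derivatives taken from the quad)
    TexLod,   // texture fetch with an explicit LOD
    TexGrad,  // texture fetch with explicit derivatives
    Dfdx, Dfdy, Quadop,
    Vote, Shfl,  // cross-lane instructions, treated conservatively
    Add, Mov, Phi, Export
};

// True if the instruction uses values from other lanes of the quad (or warp),
// so its inputs must also be valid in helper invocations.
bool consumesHelperLaneData(Op op) {
    switch (op) {
    case Op::Tex:
    case Op::Dfdx:
    case Op::Dfdy:
    case Op::Quadop:
    case Op::Vote:
    case Op::Shfl:
        return true;
    default:
        return false;  // explicit LOD/gradient fetches, plain ALU ops, exports
    }
}

// Usage sketch: only the implicit-LOD fetch depends on helper lanes.
inline void classificationExample() {
    assert(consumesHelperLaneData(Op::Tex));
    assert(!consumesHelperLaneData(Op::TexLod));
}
```

A texture result that never reaches one of these consumers is not needed in helper lanes, which is the case where a live-only fetch looks safe under this reading.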
22:37 karolherbst: mhh, okay I doubt I have that part "explicit LOD and explicit derivatives"
22:37 karolherbst: I just check if the tex is used for another tex or quadop instructions
22:37 imirkin: well, if you disable it any time any texture op hits, you're fine as well.
22:37 karolherbst: hum....
22:37 karolherbst: I did that
22:37 imirkin: you could just be more aggressive if you wanted to be
22:38 karolherbst: I think
22:38 karolherbst: ahhh, got it
22:38 imirkin: and i assume you follow through phi's, etc?
22:38 imirkin: as well as other ops?
22:38 karolherbst: yeah
22:38 imirkin: i.e. texture(add(texture())) ?
22:38 karolherbst: I do a deep search basically
22:38 karolherbst: I even have shaders showing that
22:39 imirkin: karolherbst: check with this shader:
22:39 karolherbst: https://gist.github.com/karolherbst/a89987660872e431fdd5448274d6a819
22:39 karolherbst: I've added that liveOnly flag to the print
22:39 karolherbst: all/live
22:39 imirkin: vec4 foo; if () { foo = texture(); } else { foo = vec4(1); }; texture(foo.xy)
22:40 imirkin: you have a single BB there, so it's simpler. try it with something more complex :)
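The "deep search" through phis and other ops, applied to the branchy shader imirkin describes, could look roughly like the following. This is a minimal sketch over a made-up mini-IR, not the pass from the linked commits. `Insn`, `reachesQuadConsumer`, `markLiveOnly` and `buildBranchExample` are all hypothetical names, and it assumes that a fetch may be live-only exactly when its result never flows into an instruction that needs helper-lane data.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical mini-IR, for illustration only (not nv50_ir).
struct Insn {
    bool isTex = false;          // texture fetch, a candidate for the flag
    bool needsQuadData = false;  // implicit-LOD tex, quadop, shfl, vote, ...
    std::vector<int> users;      // indices of instructions reading our result
    bool liveOnly = false;       // result of the analysis
};

// Does the value produced by instruction `idx` reach, through any chain of
// users (phi, add, mov, ...), an instruction that needs helper-lane data?
static bool reachesQuadConsumer(const std::vector<Insn>& prog, int idx,
                                std::vector<bool>& visited) {
    assert(idx >= 0 && static_cast<std::size_t>(idx) < prog.size());
    if (visited[idx]) return false;  // already explored; also breaks phi cycles
    visited[idx] = true;
    for (int u : prog[idx].users) {
        if (prog[u].needsQuadData) return true;
        if (reachesQuadConsumer(prog, u, visited)) return true;
    }
    return false;
}

// Mark every texture fetch whose result is never needed in helper lanes.
void markLiveOnly(std::vector<Insn>& prog) {
    for (std::size_t i = 0; i < prog.size(); ++i) {
        if (!prog[i].isTex) continue;
        std::vector<bool> visited(prog.size(), false);
        prog[i].liveOnly =
            !reachesQuadConsumer(prog, static_cast<int>(i), visited);
    }
}

// Shape of the test shader from the log:
//   if () foo = texture(); else foo = vec4(1);  texture(foo.xy)
std::vector<Insn> buildBranchExample() {
    std::vector<Insn> p(5);
    p[0].isTex = true; p[0].needsQuadData = true; p[0].users = {2};  // tex in the 'if' branch
    p[1].users = {2};                                                // mov vec4(1) in the 'else' branch
    p[2].users = {3};                                                // phi merging both values into foo
    p[3].isTex = true; p[3].needsQuadData = true; p[3].users = {4};  // texture(foo.xy), implicit LOD
    // p[4] is the final export and has no users.
    return p;
}
```

In this example the first fetch must not be marked, because its result reaches the coordinates of the second fetch through the phi, while the second fetch only feeds the export and can be marked. The `visited` set keeps the walk finite when phis form a cycle, as they do in loops.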
22:41 karolherbst: yeah, but I think it looks correct there, doesn't it?
22:41 imirkin: maybe, maybe not
22:42 karolherbst: also that code you gave me doesn't compile, and I never wrote any OpenGL/GLSL code in my life
22:42 karolherbst: I think I will make it work though
22:43 karolherbst: ahh, here we go
22:44 karolherbst: mhh
22:44 karolherbst: exit got left over
22:58 karolherbst: it would be so nice to know which shader actually hangs....
22:58 karolherbst: maybe I just go ahead and implement stuff so that we know that
22:58 karolherbst: I think it is really just a shader looping forever or so