01:36Lyude: oooo, I /may/ have figured out what the random disp init fail is, but I haven't actually tested my theory yet
01:36Lyude: that being said; the display channel already being active is definitely what's happening here
02:25endrift: I think I heard someone say that there were some PowerPC fixes recently, is it worth trying out my NV4* AGP cards again on my G5?
02:26endrift: the Radeon I have in there is soooo slow
06:53Tom^: would you guys want a vbios dump from a 1060 laptop card, or something else tested on it? just recently got a new laptop with it.
10:39RSpliet: Tom^: I'm sure all data is welcome
10:39Tom^: RSpliet: oki, will dump it then
10:40RSpliet: Oh that's Pascal as well. You can send it to mmio.dumps...
10:41Tom^: yeah, will do
10:50Tom^: RSpliet: sent.
10:50RSpliet: Cheers! Does it work with nouveau by the way?
10:50Tom^: RSpliet: i guess the strap_peek could have been just added as text in the mail, but attached it as a file
10:50Tom^: havent tried, only blob so far. got it just a few days ago
10:51RSpliet: All fine, we'll work it out when we need the data
10:51Tom^: been messing with some undervolting, this 8750 thermal throttles otherwise xD
10:52RSpliet: Ah right, yeah that's not the kind of fun to be had with nouveau at the moment
10:53Tom^: not sure how MSI thought it to be a good idea to not make sure the cooler is sufficient for the i7 8750h at full turbo *facepalm*
10:53RSpliet: It might be if you live in Greenland...
12:31karolherbst: who was the last person working on a scheduler pass?
12:31karolherbst: I kind of need that now
12:34pendingchaos: I have a wip instruction scheduler
12:34pendingchaos: though if you're asking for a specific person, I'm probably not them since I've never mentioned it
12:34karolherbst: pendingchaos: is it a SSA based one?
12:35karolherbst: but somebody had something like a year ago or so
12:35karolherbst: or half a year, dunno
12:35karolherbst: I am currently trying to fix some fundamental issues inside codegen
12:36karolherbst: and for one of the possible solutions, I need such a scheduling pass
12:36pendingchaos: I think older versions of it (which I still have around) worked with both pre-RA and post-RA code, though I later removed the pre-RA stuff to focus on post-RA scheduling
12:37pendingchaos: maybe you're thinking of https://github.com/RSpliet/mesa/commits/insn_sched?
12:37karolherbst: mhh, I see
12:37karolherbst: pendingchaos: well, post RA scheduling is nice and all, but it doesn't really help with the general problem sadly
12:38karolherbst: optimizing against better sched opcodes or improving dual issuing is one possible target for it, but you are constrained by whatever layout RA chooses, so you can't move instructions around freely :(
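A minimal sketch of why the register layout chosen by RA limits post-RA reordering: once the allocator reuses a physical register, a write-after-read hazard appears that did not exist while every value had its own virtual register. The `Insn` type, the register names and the specific assignment are hypothetical, made up for illustration; this is not codegen's actual representation.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Insn:
    op: str
    dst: str
    srcs: tuple


def can_swap(first, second):
    """Return True if two adjacent instructions may be reordered safely."""
    raw = first.dst in second.srcs   # second reads what first writes
    war = second.dst in first.srcs   # second overwrites an input of first
    waw = first.dst == second.dst    # both write the same register
    return not (raw or war or waw)


# Pre-RA: every value lives in its own virtual register.
pre_ra = (
    Insn("add", "%v2", ("%v0", "%v1")),
    Insn("mul", "%v5", ("%v3", "%v4")),
)

# Post-RA: the allocator (hypothetically) reused $r0 for the mul result,
# which is still an input of the add.
post_ra = (
    Insn("add", "$r2", ("$r0", "$r1")),
    Insn("mul", "$r0", ("$r3", "$r4")),
)

print(can_swap(*pre_ra))   # independent values, free to reorder
print(can_swap(*post_ra))  # blocked by the write-after-read on $r0
```

The two instruction pairs compute the same thing; only the register assignment differs, and that alone decides whether the scheduler may move them.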
12:39karolherbst: I had a post RA scheduler to improve dual issue on kepler, which gave me around 10% more perf
12:39karolherbst: but that was only in pixmark_piano and everything else wasn't really affected
12:40karolherbst: pendingchaos: I have a good example though why we can't trust those shader-db stats at all
12:40karolherbst: pendingchaos: gpr: gputest_pixmark_piano/7.shader_test - 1 49 -> 79 inst: gputest_pixmark_piano/7.shader_test - 1 3523 -> 2950
12:40karolherbst: you would expect like really bad perf, because gpr usage is that high and killed parallelism, right?
12:41karolherbst: 60% higher gpr usage, and only 17% lower instruction count. sounds like a bad deal, doesn't it?
12:41karolherbst: guess what
12:42karolherbst: 2507 -> 2588 points
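For reference, the relative changes behind the numbers quoted above can be worked out with a small helper. The function name is just an illustration; the inputs are the gpr count, instruction count and benchmark score from the messages above.

```python
def pct_change(before, after):
    """Relative change from before to after, in percent."""
    return (after - before) / before * 100.0


gpr = pct_change(49, 79)        # register usage
inst = pct_change(3523, 2950)   # instruction count
score = pct_change(2507, 2588)  # pixmark_piano points

print(f"gpr {gpr:+.1f}%  inst {inst:+.1f}%  score {score:+.1f}%")
```

That comes out at roughly 61% more registers and 16% fewer instructions, for a score that still improves by about 3%.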
12:43pendingchaos: I guess because parallelism isn't very useful with ALU heavy shaders?
12:43karolherbst: it is
12:43karolherbst: we don't do CFG based opts
12:43karolherbst: nir does
12:43karolherbst: I was comparing tgsi vs nir based input
12:43karolherbst: I think I will add some more stats to the output for real
12:44karolherbst: 1. dual issue rate 2. max loop depth
12:44pendingchaos: isn't parallelism mostly useful to hide the long latencies of memory operations?
12:44karolherbst: it is, but
12:44karolherbst: if you have 30 threads vs 60 threads you can do more work with 60 threads
12:45karolherbst: but yeah..
12:45karolherbst: anyway, the gprs usage is usually higher as nir tends to put all loads into the head of a BB
12:45karolherbst: so you can get insanely huge live values
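A toy illustration of the effect described here: hoisting all loads to the top of a basic block keeps every loaded value live until its use, which raises the peak number of live values. The instruction lists and the `max_live` helper are hypothetical, and the liveness model is deliberately simplistic (one straight-line block, a source dies at its last use).

```python
def max_live(insns):
    """insns: list of (dst, srcs). Returns the peak count of live values."""
    last_use = {}
    for i, (_, srcs) in enumerate(insns):
        for s in srcs:
            last_use[s] = i

    live = set()
    peak = 0
    for i, (dst, srcs) in enumerate(insns):
        # A source that is read here for the last time frees its slot.
        for s in srcs:
            if last_use.get(s) == i:
                live.discard(s)
        # A result only occupies a slot if something reads it later.
        if dst is not None and dst in last_use:
            live.add(dst)
        peak = max(peak, len(live))
    return peak


# All loads at the head of the block.
hoisted = [
    ("a", ()), ("b", ()), ("c", ()), ("d", ()),
    ("t0", ("a", "b")), ("t1", ("t0", "c")), ("t2", ("t1", "d")),
    (None, ("t2",)),  # store
]

# Same computation, each load placed right before its first use.
sunk = [
    ("a", ()), ("b", ()), ("t0", ("a", "b")),
    ("c", ()), ("t1", ("t0", "c")),
    ("d", ()), ("t2", ("t1", "d")),
    (None, ("t2",)),  # store
]

print(max_live(hoisted), max_live(sunk))
```

With only four loads the difference is small, but it grows with the number of loads that get hoisted.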
12:47karolherbst: pendingchaos: I had the same problems with immediates
12:47karolherbst: and there I got like 140 gprs
12:47karolherbst: perf was terrible
12:48karolherbst: cut by 40%
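A rough sketch of how per-thread register usage limits the number of resident warps. The register file size and warp limit below are assumptions for illustration, not values taken from this discussion, and the estimate ignores allocation granularity and every other occupancy limit (shared memory, block count and so on).

```python
REGFILE_PER_SM = 65536   # assumption: 32-bit registers available per SM
WARP_SIZE = 32           # threads per warp
MAX_WARPS_PER_SM = 64    # assumption: hardware cap on resident warps


def resident_warps(gprs_per_thread):
    """Upper bound on resident warps when only registers are the limit."""
    by_regs = REGFILE_PER_SM // (gprs_per_thread * WARP_SIZE)
    return min(MAX_WARPS_PER_SM, by_regs)


for gprs in (49, 79, 140):
    print(f"{gprs:3d} gprs -> at most {resident_warps(gprs)} warps")
```

Under these assumptions, going from 49 to 79 gprs drops the bound from 41 to 25 warps, and 140 gprs leaves room for only 14.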
12:59pendingchaos: I mean what caused the problem
12:59karolherbst: RSpliet: with your scheduling stuff : 2588 -> 2691 points
13:00karolherbst: pendingchaos: just nir being silly and pushing them on top of the shader, I think some passes do that, dunno
13:01karolherbst: gprs usage went down to 52 with it (from 79)
13:02pendingchaos: might be interesting: https://www.complang.tuwien.ac.at/andi/papers/plsa_94.pdf and https://www.researchgate.net/publication/234828261_Code_scheduling_and_register_allocation_in_large_basic_blocks seem to be about doing RA that works better with post-RA scheduling
13:05karolherbst: well the main issue I want to solve is how we do lowering of instructions in general. Currently we always have to make a trade-off between lowering before opts (less context for opts) or lowering after opts (and miss optimizations)
13:05karolherbst: with volta this will become a more pressing issue as some instructions don't exist anymore
15:01heeen: I have hiccups in xorg of multiple seconds with nvidia binary drivers, is that better with nouveau?
15:10RSpliet: karolherbst: I know. pixmark piano was the only place where I saw a benefit on Maxwell though
15:11RSpliet: On Kepler I observed a benefit for Unigine too with that branch
15:12karolherbst: dual issuing I guess
15:12RSpliet: Part that, part early issue of DRAM requests
15:12karolherbst: could be
15:17pabs3: heeen: no such issue with nouveau for me, I think it depends on the GPU model though
15:21RSpliet: I'm rather curious about how much NVIDIA allows you to not stall on DRAM requests if the following instructions in your thread don't access the register you requested the data into...
15:22RSpliet: And how much of that requires schedcode fiddling
16:27pendingchaos: karolherbst: got some xmad numbers with hitman: https://hastebin.com/mamirajeza.txt
16:27karolherbst: pendingchaos: nice
16:27karolherbst: pendingchaos: 4th patch is just the 3 -> 2 xmad and the 3 -> 1 shift stuff?
16:27karolherbst: I guess this is a good base to play around with and see which optimizations are actually beneficial
16:28pendingchaos: yes, it is (it happens before the MUL/MAD -> XMAD optimization though)
16:29pendingchaos: "good base"?
16:29karolherbst: starting point
16:30karolherbst: pendingchaos: regarding that typo: 1.0894 * 1.0079 = 1.09800626 ;)
16:34HdkR: 10/10 :)
16:34pendingchaos: 1.089355089 * 1.007845934 = 1.097902098
16:36karolherbst: more precise values
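The arithmetic in this exchange is two speedup factors being multiplied rather than their percentages being added. A small sketch using the factors quoted above (the helper name is just for illustration):

```python
from math import prod


def combined_speedup(factors):
    """Independent speedup factors compound multiplicatively."""
    return prod(factors)


rounded = combined_speedup([1.0894, 1.0079])
precise = combined_speedup([1.089355089, 1.007845934])

print(f"rounded inputs: {rounded:.6f}")
print(f"precise inputs: {precise:.6f}")
```

The small gap between the two results comes purely from rounding the input factors to four decimals.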
17:00pendingchaos: karolherbst: https://patchwork.freedesktop.org/patch/242598/ also gives some nice numbers: https://hastebin.com/pinipuqoqi.txt
17:01karolherbst: ahh, we didn't merge that yet?
17:01karolherbst: yeah, that should impact everything using compute shaders
17:01karolherbst: it seems like those games are quite bottlenecked through our compute shaders
17:02karolherbst: or g access is just that slow
17:02karolherbst: pendingchaos: do you have access to TombRaider as well?
17:02karolherbst: worth benchmarking there as the performance should be much better than Hitman
17:02karolherbst: *as well
17:02pendingchaos: I might do that too sometime later
17:03karolherbst: so you got your feral key I assume? :)
17:04karolherbst: now you need a new HDD :p
17:08pendingchaos: yeah, the games are pretty huge
17:09pendingchaos: Middle-earth: Shadow of Mordor seems to be among the largest I've seen: 62 GB
17:10karolherbst: some game needs 100GB, but I don't know which one
17:11karolherbst: maybe it was rise of the tomb raider?
17:13karolherbst: not that one, was just 25GB
17:14pendingchaos: XCOM 2 is 70 GB
17:14pendingchaos: I don't see any 100 GB one though
19:49pendingchaos: karolherbst: with Tomb Raider on High Quality, the constant buffer patch gives no change
19:49pendingchaos: the 3rd and 4th xmad patches seem to give a 0.57% to 1.14% improvement
19:52endrift: imirkin: you around?
19:58karolherbst: pendingchaos: so, not that big of a difference. Try to increase the hair quality in the settings
19:58karolherbst: to TressFX
19:58karolherbst: this should impact perf quite a bit
19:58karolherbst: and should make those patches have a bigger impact (I think)
20:02pendingchaos: didn't TressFX hang some gpus?
20:06karolherbst: mhhh, right, it doesn't render correctly
20:17pendingchaos: seems to hang with my 1060
20:41karolherbst: pendingchaos_: something inside dmesg?
20:41karolherbst: I think there are some out of bound reads
20:44pendingchaos: karolherbst: I think so? I'll get a hastebin in a sec
20:46* pendingchaos assumes journalctl shows dmesg messages
20:47pendingchaos: yeah, I think it is: https://en.wikipedia.org/wiki/Dmesg#Output
20:57karolherbst: pendingchaos: you can also do journalctl --dmesg
20:57karolherbst: mhh, yeah
20:57karolherbst: might be worth trying to figure out what we do wrong there
21:00pendingchaos: any idea what the "GPC1/TPC4/TEX: 80000041" and similar messages mean?
21:04pendingchaos: are they the source of the fault?
21:33karolherbst: not quite sure
21:33karolherbst: anyway, we can't do much with that information
21:34karolherbst: we kind of need a trap handler if we want to debug something like that
21:37karolherbst: maybe I should work on that actually
21:37karolherbst: as this would make debugging issues like this quite easy
22:04karolherbst: pendingchaos: yeah... I guess I will work on that fault handler thing
22:04karolherbst: or do you want to?
22:05pendingchaos: I assume it would involve kernel development?
22:05karolherbst: maybe? I am not entirely sure how we want to do that
22:05pendingchaos: I haven't done any kernel development at all
22:05karolherbst: normally you install a trap handler for your shaders
22:06karolherbst: and the kernel isn't involved at all
22:06karolherbst: you kind of need a way to retrieve the data
22:07karolherbst: well, I can do that anyway as we need it
22:17karolherbst: pendingchaos: soo the basic idea is: reserve some global memory, install the trap handler (and write it)
22:18karolherbst: now we have two options: 1. let userspace know and print it out 2. kernel prints it out
22:19karolherbst: I would kind of prefer to do all that inside the kernel
22:19karolherbst: skeggsb, imirkin: do you both know how easy it would be to dump the shader when the MP traps?
22:19karolherbst: just the binary, nothing else
22:23pendingchaos: I think there might have been some work for something like those a while back: https://github.com/mesa3d/mesa/commit/e2dded78ea9209a11897e178d5f585f66262ce1e ?
22:23pendingchaos: it looks unfinished though
22:24karolherbst: yeah, I know
22:25karolherbst: it is also just for compute shaders
22:25karolherbst: the thing is, you also want to extract the shader
22:25karolherbst: for a single OpenCL kernel or something... okay, manageable
22:25karolherbst: but if you want to debug traps while having a game running, then you need it
22:26karolherbst: also, you need to signal userspace
22:26karolherbst: or you have a new buffer for every shader invocation and check the status
22:27karolherbst: actually, maybe we could do something like that
23:29pendingchaos: imirkin, karolherbst: could https://patchwork.freedesktop.org/patch/242598/ be reviewed sometime?