00:04imirkin: reinuseslisp: that's surprising
00:04imirkin: i don't remember if i had a theory about what mrg_* did
00:05reinuseslisp: from what I can tell it's not envyas' fault since it emits the same output that the blob does (using the arguments shown by nvdisasm)
00:06imirkin: yeah, i spent a bunch of time getting those ops accurate in envydis (relative to nvdisasm, that is)
00:09HdkR: reinuseslisp: What's your pascal?
00:09HdkR: then it won't
00:10imirkin: HdkR: i thought all GP10x supported half
00:10reinuseslisp: it's weird since it works without mrg_h0/mrg_h1
00:10imirkin: HdkR: what's mrg?
00:10imirkin: how does it merge h0?
00:10imirkin: is it just a broadcast of h0 to both halves?
00:10HdkR: Oh maybe it has it but crap throughput
00:11imirkin: that's ok - nouveau has crap clocks
00:11imirkin: even if the throughput were awesome... wouldn't matter
00:11HdkR:is in the middle of a video game, distracted
00:15reinuseslisp: f32 as a result modifier works, it reads the lower component
00:15imirkin: reinuseslisp: some combinations of parameters can be illegal
00:16imirkin: if you're using nouveau, did you get an error in dmesg?
00:16imirkin: just coz nvdisasm decodes it doesn't mean it works
00:18karolherbst: imirkin: ohh, btw, ping on the TEXS patches
00:18imirkin: ideally to tree
00:19karolherbst: imirkin: https://patchwork.freedesktop.org/series/47777/ https://github.com/karolherbst/mesa/commits/scalar_tex
00:19karolherbst: ohh uhm... I have some more patches on that tree
00:20karolherbst: force pushed
00:20reinuseslisp: heh, yes: ILLEGAL_INSTR_ENCODING
00:21imirkin: yeah, so probably need more modifiers
00:23HdkR: imirkin: mrg isn't a broadcast
00:23karolherbst: HdkR: is it like the XMAD mrg?
00:25imirkin: karolherbst: gatherMask isn't mutually exclusive with the texture output mask
00:25imirkin: er, gatherComp
00:25imirkin: the gather comp is which color you want
00:25imirkin: the mask is which outputs get written
00:28karolherbst: yeah, and I only print which color we want. I could actually add the mask for TXG in addition, but that sounded less important to me back at the time
00:43reinuseslisp: "so probably need more modifiers" the emitted binary is the same as one emitted by the blob ("0000 f7af 8080 165d" vs. "0000 f7af 8080 165d"). I guess it's just my GPU
00:44reinuseslisp: I'll try to make it similar to XMAD's
00:44reinuseslisp: good to know that this is reported in dmesg
00:55HdkR: sigill or something in dmesg?
00:55reinuseslisp: [ 6945.149719] nouveau 0000:01:00.0: gr: TRAP ch 2 [007fb8d000 inject]
00:55reinuseslisp: [ 6945.149728] nouveau 0000:01:00.0: gr: GPC0/TPC0/MP trap: global 00000000  warp 3c0009 [ILLEGAL_INSTR_ENCODING]
00:57reinuseslisp: mrg here would preserve dest's value (unlike xmad)?
01:01HdkR: reinuseslisp: Overwrites the half you chose, preserves the half you didn't
01:02reinuseslisp: I guess it takes the respective component of the pack (mrg_h0 takes and writes lower, mrg_h1 to higher)
01:16reinuseslisp: That'll do. Sorry for bothering, thanks!
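A minimal sketch of the merge behaviour HdkR describes above (overwrite the chosen half, preserve the other), modelled on a plain 32-bit value holding two packed 16-bit halves. The names merge_mode and merge_half are hypothetical, and treating h0 as the low 16 bits is an assumption based on reinuseslisp's reading; this is not taken from envytools or the hardware docs.

```c
#include <stdint.h>

/* Hypothetical model of the mrg_h0 / mrg_h1 result modifiers. */
enum merge_mode { MRG_H0, MRG_H1 };

/* Write a 16-bit result into one half of dest and keep the other half.
 * Assumption: h0 is the low 16 bits, h1 is the high 16 bits. */
static uint32_t merge_half(uint32_t dest, uint16_t result, enum merge_mode mode)
{
    if (mode == MRG_H0)
        return (dest & 0xffff0000u) | result;            /* keep high half */
    return (dest & 0x0000ffffu) | ((uint32_t)result << 16); /* keep low half */
}
```

For example, merge_half(0xaaaabbbb, 0x1234, MRG_H0) leaves the high half 0xaaaa untouched and replaces only the low half.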
01:35karolherbst: imirkin: mhh, that bo with the missing kref is the one created inside nouveau_pushbuf_new
01:35karolherbst: inside that loop
01:39karolherbst: and they get refed inside nouveau_pushbuf_space
01:39karolherbst: I bet the problem is, that nouveau_pushbuf_space is never called
01:43imirkin: karolherbst: actually the mask is a lot more important
01:43imirkin: [for txg]
01:44imirkin: [for everything]
01:45karolherbst: well, for txg we could print something like rxxr if the mask is 5 and the gathercomp is 0, right?
01:45karolherbst: mask is 9
01:45karolherbst: 5 would be rxr
01:45karolherbst: but... I don't like that
01:45karolherbst: looks ugly
01:45imirkin: karolherbst: no
01:45imirkin: there's nothing special about txg
01:46imirkin: for gather, you just grab a color from a bunch of pixels
01:46imirkin: for 5 it should be rb
01:46imirkin: and that's it
01:46imirkin: or even better, r_b_
01:47karolherbst: and what does gatherComp do then?
01:48imirkin: it selects which color to grab
01:48imirkin: from the various texels
01:48imirkin: whereas the mask affects which outputs get written to
01:52karolherbst: mhh, but how can you actually select different colors?
01:52karolherbst: I mean, you can pretty much select one
01:52karolherbst: but how would r_b_ look like?
01:52karolherbst: gathercomp is just 2 bits
01:53karolherbst: my understanding is that gathercomp == 3 and mask == 0xf would write aaaa as the output
01:53karolherbst: and gathercomp == 1 and mask == 0x3 would write gg__
01:54imirkin: yes, but ... that's confusing to think about
01:54imirkin: the non-confusing way to think about it is to treat txg just like every other tex function
01:54imirkin: i.e. r = output 1. g = output 2. etc
01:55karolherbst: okay, but that's not how that works with txg in practise :/
01:55imirkin: except that's how every single person thinks of txg
01:56imirkin: the "rgba" is the vec4 output
01:56imirkin: it has nothing to do with the colors
01:56imirkin: there can be texture swizzles, etc
01:56karolherbst: mhhh I see
01:56karolherbst: so it is more like an xyzw which just ends up being colors in the end
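The distinction discussed above could be sketched roughly like this: the mask is printed per output slot (so mask 5 becomes "r_b_"), while gatherComp is a separate 2-bit selector for the color that gets gathered. The function names format_tex_mask and gather_comp_name are hypothetical and this is not the actual nv50 ir printing code; it only assumes that bit 0 of the mask is the first output.

```c
#include <stdint.h>

/* Write a 4-character mask string plus NUL into out (5 bytes).
 * Bit i of mask set means output slot i is written; unwritten
 * slots are shown as '_' so it is obvious that this is a mask. */
static void format_tex_mask(uint8_t mask, char out[5])
{
    static const char names[4] = { 'r', 'g', 'b', 'a' };
    for (int i = 0; i < 4; i++)
        out[i] = (mask & (1u << i)) ? names[i] : '_';
    out[4] = '\0';
}

/* gatherComp is only 2 bits: it picks which color is fetched from
 * each texel, independently of which outputs get written. */
static char gather_comp_name(uint8_t gather_comp)
{
    static const char names[4] = { 'r', 'g', 'b', 'a' };
    return names[gather_comp & 0x3];
}
```

With this, mask 0x5 prints as "r_b_" and mask 0x9 as "r__a", regardless of whether gatherComp selects r, g, b or a.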
02:04imirkin: karolherbst: can you check if https://www.shadertoy.com/view/4sf3Dj works ok for you?
02:05imirkin: iirc it worked on nv50, but it appears to be totally fubar now
02:05karolherbst: imirkin: will check tomorrow
02:05imirkin: not on nouveau at all right now?
02:13karolherbst: it is 4 am :p
02:13karolherbst: I have my other machine where I can switch to nouveau in a few minutes, but I really want to sleep soonish
02:27imirkin: karolherbst: your condenseDefs thing doesn't work
02:27imirkin: oh wait
02:27imirkin: i misread... hm
02:28karolherbst: yeah... the patch layout is stupid :/
02:29karolherbst: I am quite positive it works, but I wasn't trying to actually break it though
02:29imirkin: no, i literally misread.
02:33imirkin: karolherbst: uhhh wtf is going on with that getTEXSMask ?
02:34karolherbst: imirkin: yeah well... dunno
02:34karolherbst: that's what nvdisasm gave me
02:34karolherbst: imirkin: the mask is just 3 bits in the scalar variants
02:34karolherbst: so they even threw out some combinations
02:34imirkin: right ...
02:35karolherbst: and some values are reused
02:35imirkin: i'm going to have to dig into this more
02:35karolherbst: for 1-2 vs 3-4 defs
02:35imirkin: have you updated envydis for this?
02:35karolherbst: it is already there afaik
02:35karolherbst: maybe I missed something
02:35imirkin: yeah, i remember 1-2 being diff from 3-4
02:35karolherbst: let me check
02:35karolherbst: I think I wrote some patches there
02:36karolherbst: imirkin: https://github.com/envytools/envytools/commit/e83f36dd3c2733e44b5170f898ba9a9f1f6611eb#diff-ae584cf0004fc957b7b49ced039a117e
02:36imirkin: right ok
04:58HdkR: karolherbst: Nevermind, it was a lie. Just died coming back up again :P
12:59karolherbst: imirkin: mhhh... the pushbuf gets fully initialized after we call into nvc0_program_library_upload
12:59karolherbst: so something inside nvc0->base.push_data is doing that
13:01karolherbst: uhhhh, I see it now
13:07karolherbst: it works! yay
13:08karolherbst: dolphin renders correctly while using one hardware channel + per context pushbuf/fence list
13:10karolherbst: soo, now a bit of deeper testing
13:26karolherbst: imirkin: we have some more issues though: we could end up with two uploaded builtin libraries if two contexts are created nearly at the same time
13:27karolherbst: inside nvc0_program_library_upload between if (screen->lib_code) and the nouveau_heap_alloc
13:27karolherbst: imirkin: do you prefer the "lock the screen" or "do and delete if something else already did it" approach?
13:28karolherbst: mhh, although nouveau_heap_alloc returns 1 if the heap is already set, still a bit racy as both threads could enter nouveau_heap_alloc at the same time
15:01karolherbst: imirkin: so okay, what should be the problem with the shadertoy?
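A rough sketch of the "do and delete if something else already did it" option, using a C11 atomic compare-and-swap. Everything here is hypothetical: fake_screen and library_upload_once are illustration-only names, and malloc merely stands in for the real allocation and upload step. The "lock the screen" alternative would instead wrap the check and the allocation in one mutex.

```c
#include <stdatomic.h>
#include <stdlib.h>

/* Hypothetical stand-in for the screen; not the real nvc0_screen. */
struct fake_screen {
    _Atomic(void *) lib_code;   /* NULL until a library is published */
};

/* Returns the published library. If two threads race, both may
 * allocate, but only one pointer is published and the loser frees
 * its own copy. */
static void *library_upload_once(struct fake_screen *screen, size_t size)
{
    void *existing = atomic_load(&screen->lib_code);
    if (existing)
        return existing;

    void *mine = malloc(size);  /* placeholder for alloc + upload */
    if (!mine)
        return NULL;

    void *expected = NULL;
    if (atomic_compare_exchange_strong(&screen->lib_code, &expected, mine))
        return mine;            /* we won the race */

    free(mine);                 /* someone else published first */
    return expected;            /* CAS stored the winner here */
}
```

The cost of this variant is a possibly wasted allocation on the losing thread; the benefit is that no lock is held while the upload happens.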
15:01karolherbst: seems to work on my gm204
15:03karolherbst: imirkin: do you know applications which use multiple gl contexts for data streaming?
15:04karolherbst: glamor seems to be stable as well and I didn't see any rendering issues so far
15:11karolherbst: Lyude: do you have some clock gating patches for maxwell?
15:24imirkin: karolherbst: if there were a problem, you'd know it :)
15:25imirkin: did you see a gamecube logo, or a bunch of random dots all over the place?
15:33karolherbst: gamecube logo
15:37karolherbst: imirkin: but overall my patches seem to fix multi context crashes :)
15:37karolherbst: and surprisingly the changes are rather small: " 33 files changed, 358 insertions(+), 231 deletions(-)"
15:39karolherbst: HdkR: those ubershaders aren't that heavy actually with the proper GPU :p
16:07imirkin: karolherbst: try it with some real games
16:07imirkin: iirc i had a lot of moments where i was like "yeah, it works"
16:07karolherbst: imirkin: yeah... well, I don't know which one uses multiple contexts
16:07imirkin: and then ... no.
16:07imirkin: there was some open-source one
16:08imirkin: with a 2 at the end =/
16:08imirkin: warsow 2?
16:08imirkin: iirc they have a hack for nouveau to use single contexts
16:10karolherbst: well, at least I am able to verify those crashes in dolphin are gone, but on my system I wasn't really able to crash glamor reliably anyway :/
16:10karolherbst: this seems to be the warsow bug: https://lists.freedesktop.org/archives/nouveau/2015-December/023425.html
16:16imirkin: yeah. they stuck a thing into their engine that detects nouveau and avoids multiple threads
16:17imirkin: just change the driver name to 'noveau' and that should get it going
16:29karolherbst: imirkin: in nouveau_screen_get_vendor?
16:33karolherbst: nice, it crashes
16:33karolherbst: okay, now with my patches
16:43karolherbst: mhh hit that kref assert again... let's see
16:47karolherbst: imirkin: okay.. my patches are an improvement as the game doesn't crash at start time anymore... but I think I will have some fun time tracking all those referencing issues down
17:13karolherbst: this is again the bo from the internal pushbuffer state :/
17:13karolherbst: something is weird
19:00Lyude: karolherbst: there is a branch on my gitlab but I discovered it needs a bit more work
19:01Lyude: it looks like I'm accidentally loading the clkgate packs after enabling cg_ctrl which explains why the power saving didn't save anywhere near as much as I thought it would
19:03karolherbst: Lyude: I see
19:03Lyude: ended up being a bit more work left than I thought; yeah
19:09karolherbst: Lyude: do you think you will find some time to finish that over the next few days?
19:10Lyude: karolherbst: I can try during work, now that work thing is out of the way
19:10Lyude: you're welcome to play around with the branch as well
19:10karolherbst: well, I would try it on my gm204
19:10karolherbst: but I still want to focus on the multi context crash fixes :p
19:11karolherbst: I think I am on the right track, just have to track down some silly issues
19:12HdkR: karolherbst: Still not super hard on the GPU, but people love running these games at 4k and it cripples most at that resolution :P
19:12karolherbst: HdkR: right... I was just testing nouveau at FHD :p
19:35karolherbst: imirkin: how much do you know about those libdrm_nouveau internals? Currently I have that issue that the nvpb->bo thing is refed by that bo, but nvpb->bo is nvpb->bos and I don't see how nvpb->bo can be changed without also refing it
19:38imirkin: i try to forget every time i learn ;)
19:39imirkin: i doubt i can provide any real insight you wouldn't already have from reviewing the code yourself
19:39imirkin: all the kref stuff is ... subtle
19:41karolherbst: the handover to the next bo is happening inside nouveau_pushbuf_space, that much I know
21:01karolherbst: imirkin: do you think you'll find some time for the scalar tex patches today? Then I could have an updated series next weekend or so
21:02imirkin: yeah. i think it looked basically good
21:02imirkin: update the printing thing
21:03karolherbst: sure, I will think about a proper way
21:04imirkin: if (tex->tex.mask != 0x3 && tex->tex.mask != 0xf)
21:04imirkin:    return false;
21:04imirkin: i do find this surprising for OP_TXG
21:04karolherbst: I know
21:04karolherbst: but there is no way to encode masks
21:04karolherbst: because you don't emit it at all
21:04karolherbst: or did I miss something?
21:05imirkin: but if the mask == 1
21:05karolherbst: sure, but how to emit it?
21:05imirkin: i'll get back to you.
21:05karolherbst: I had the same issue, because it would make sense
21:05imirkin: i mean, you could always just say rx, r255
21:05karolherbst: but... single vs double reg
21:05karolherbst: nope, dest are always filled to double first
21:06karolherbst: TEXS for two colors also use rx, r255
21:06imirkin: or TLD4S rather
21:07karolherbst: anyway, we can't emit a mask for tld4s
21:07karolherbst: only gatherComp
21:07imirkin: i find that unlikely.
21:07imirkin: i will play with it
21:07imirkin: gimme a few
21:07karolherbst: I mean... you could "reserve" the rxd.hi
21:07karolherbst: and just don't use it
21:07imirkin: (just coz it's unlikely doesn't mean it's not true)
21:08karolherbst: I don't know how hard I actually tried to get the blob to emit that, but I am sure I checked up on that
21:11imirkin: wtf is 'TXA'
21:12karolherbst: imirkin: where did you see a TXA?
21:13imirkin: both in envydis and in my random fuzzing attempts
21:13karolherbst: mhh, it doesn't exist in the nvdisasm doc
21:13karolherbst: do you know what nvdisasm calls it?
21:15karolherbst: mhh, interesting
21:16karolherbst: ohhhh I might have an idea
21:16imirkin: which is...
21:16karolherbst: nvidia's fancy Temporal Anti-Aliasing
21:16karolherbst: called TXAA
21:17imirkin: ok, well it seems like you're right with TLD4S
21:17karolherbst: yep... but you always have TLD4 ;)
21:17karolherbst: I think they just empirically checked how to cover the most common cases and just went with that
21:18imirkin: there's nowhere in the opcode to put it
21:18karolherbst: they can't put in everything, so they had to decide what to put in
21:18karolherbst: they could also just have one bit for the component and one for selecting single/double outputs
21:19karolherbst: and only allow r and b or something
21:19imirkin: probably r and a would have been enough
21:19imirkin: i don't think i've seen anything else
21:19karolherbst: _but_ the source/dest layout is kind of the same for all scalar tex operations, so why make a special case for tld4s
21:20karolherbst: there is that TXF.A2D thing though :/
21:20karolherbst: ohh that reminds me
21:20imirkin: btw, can i assume you ran piglit on this?
21:20imirkin: at least the texturing tests
21:20karolherbst: TEX.A2D might have the same exception. _but_ I am sure I checked that
21:20karolherbst: yeah, full piglit
21:20karolherbst: I would rerun it again though
21:20karolherbst: just to be sure
21:22karolherbst: the switch emulator guys already REed all that as well. Might be worth to take a look and see if something doesn't match
21:23karolherbst: but I think I even ran the CTS with those patches
21:23imirkin: anyways, other than that printing thing where you should just make it the regular mask always, your thing looks fine.
21:27karolherbst: imirkin: yeah... I am just a bit worried that printing the mask as colors for txg might be confusing for some. I kind of would prefer a r.xyw or something, maybe even r.rga even if that looks a little odd
21:27imirkin: the universe is confusing for some :p
21:27imirkin: maybe those some shouldn't be reading the nv50 ir output
21:28karolherbst: but if you see b.yz you directly know whats up
21:28karolherbst: imirkin: well, but we can make things easier to understand if there is an easy solution for it ;)
21:28imirkin: ok, well your current thing is REALLY confusing :p
21:29karolherbst: yeah.. it doesn't tell the whole truth
21:31karolherbst: I think in the end I would rather go for a "color.xyzw" style for txg, or maybe we could just do "color rgba"... any preferences? Skipping the gatherComp doesn't sound like a proper solution anyway and just having "rg" for txg is also a bit confusing as you don't know which color you actually get
21:31imirkin: color rgba
21:32imirkin: i.e. "r rgba" makes sense
21:32imirkin: although for the mask
21:32imirkin: i'd prefer to see
21:32imirkin: "r___" to "r"
21:32karolherbst: those ___ don't really add more information, do they?
21:33imirkin: they make it obvious that it's a mask
21:33karolherbst: true, but then you have those ___ in the print :/
21:55karolherbst: imirkin: mhh, does this make sense? after a pushbuf was flushed with pushbuf_flush, it can't be PUSH_KICKed?
21:56karolherbst: first call into nouveau is nouveau_buffer_transfer_unmap
21:57karolherbst: mhh, pushbuf_validate is called after, but that isn't enough sadly
22:00karolherbst: mhhh, the issue seems to be somewhere else though
22:23karolherbst: imirkin: did you ever get to building warsow?...
22:23karolherbst: this thing is totally annoying to debug
22:23karolherbst: at some point when I debug too long, the connection to the local server times out and I don't hit the path where it should crash anymore :/
22:23karolherbst: trying to load the tutorial just works then :/
22:26imirkin: iirc it worked fine with my locking patches
22:29karolherbst: imirkin: okay sure, but what did you lock around? each pushbuf access?
22:31imirkin: yeah, and lots of things can also trigger kicks behind your back
22:31imirkin: like nouveau_bo_map and so on
22:31imirkin: ultimately i realized the approach was unworkable
22:31imirkin: and it didn't help that some geniuses decided to ship it as part of distros
22:31karolherbst: right, I remember
22:31imirkin: so i removed all traces of it online, or at least tried to
22:32karolherbst: also the perf benefit of using multiple contexts kind of vanishes, no?
22:32imirkin: not if there's OTHER things your application does than make GL calls
22:32imirkin: also hardly everything requires pushbuf access
22:33imirkin: i did try to keep it to a minimum of locked "area"
22:33karolherbst: true, it isn't that bad, but I prefer a solution without having to lock
22:33karolherbst: but ultimately I might end up having to lock as well :/ there is still that screen pushbuf
22:34imirkin: the proper solution is locking + no libdrm_nouveau kicking randomly
22:34imirkin: my approach was missing the latter bit
22:34imirkin: and another key to all this is to be able to have a fence wait from userspace that doesn't spin the cpu
22:34imirkin: it's not CRITICAL, but definitely useful
22:35imirkin: but after 2 years waiting on such a thing, i'm not holding my breath
22:37karolherbst: mhh, anyway, fixed my kref assertion
22:37karolherbst: "nouveau_pushbuf_bufctx(nvc0->base.pushbuf, nvc0->bufctx); nouveau_pushbuf_validate(nvc0->base.pushbuf);" before the PUSH_KICK(nvc0->base.pushbuf); inside nvc0_flush does the trick...
22:38karolherbst: the other assert I fixed similarly by having that inside nvc0_create
22:38karolherbst: but this just looks like something is odd
22:38karolherbst: maybe something super trivial like nothing gets refed after all references are gone and flushing requires something to be refed....
22:38karolherbst: or dunno
22:39karolherbst: maybe the bufctx just gets removed and kicking without it doesn't work
22:39imirkin: sounds like you have an underlying issue
22:39imirkin: and you're patching around it while ignoring the issue.
22:39imirkin: but before you waste too much time
22:39imirkin: the locking approach won't work
22:39karolherbst: or maybe just a simple assumption doesn't hold up anymore
22:39imirkin: i proved it to myself at the time
22:40karolherbst: yeah... I don't really lock
22:40imirkin: because of how libdrm_nouveau works
22:40karolherbst: and I am sure I don't even need it
22:40karolherbst: I just flush out the screen pushbuf when the context one gets flushed
22:40karolherbst: but... I doubt I actually need it
22:40karolherbst: doesn't help with anything
22:50imirkin: you need it.
22:50imirkin: the commands were composed assuming a certain hw context
22:52karolherbst: well, normally you use the context pushbuf for that
22:53karolherbst: and there you have the state tracking of the context
22:54imirkin: oh, you have the screen its own pushbuf?
22:54karolherbst: radeonsi has a similar approach as well, they have a pseudo context in the screen
22:55karolherbst: for ... things. some of those gallium methods only have the screen and no context attached
22:55imirkin: seems like this could create awkwardness wrt the various resource tracking. interesting.
22:58karolherbst: maybe we want also a screen owned context and just do our context switching stuff with that as well... dunno if we actually need it though
22:58karolherbst: anyway, dolphin and warsow seem to work now
23:00karolherbst: imirkin: do you know of other applications which use multiple contexts?
23:03karolherbst: was able to trigger a shader trap inside dolphin...
23:04imirkin: not offhand... i forget
23:04imirkin: there were def others i tested with
23:09karolherbst: mhh, but I guess it didn't work not because some applications were triggering special cases, but it wasn't reliably working?
23:11imirkin: i encountered cases that were unfixable
23:11karolherbst: I see
23:17imirkin: basically i was fighting the fact that nouveau_bo_map & co could call the kick handler
23:29karolherbst: yeah.. I can see how that could mess up stuff
23:30karolherbst: imirkin: did you at some point run into race conditions regarding parallel shader compilations?
23:30karolherbst: I would like to helgrind dolphin... just I don't want to wait until next week for the output log
23:34imirkin: that should work
23:34imirkin: compiler's entirely isolated
23:34karolherbst: well, I blame access to screen->text or something
23:35karolherbst: maybe some threads uploading to the same address or weird things like that... dunno
23:35karolherbst: although that shouldn't happen...
23:35imirkin: that's part of upload
23:35imirkin: not compilation
23:37karolherbst: mhh, well the error I get is something like "gr: SHADER a244020e, sph: 0x44020e, stage: 0x22" anyway
23:38imirkin: usually means the header is off
23:39karolherbst: line before that is "gr: TRAP ch 8 [017e0e0000 Xorg]"
23:39karolherbst: will try to trigger it with mesa master
23:39karolherbst: I currently blame those ubershaders as I just figured out how to actually use them if the parallel shader compilation is forced off in dolphin
23:44karolherbst: HdkR: with ubershader async, ubershaders are used until the specialized shaders are compiled, right?
23:48HdkR: karolherbst: Correct
23:50karolherbst: HdkR: I get some weird error in zelda windwaker ...
23:50karolherbst: HdkR: with ubershader async
23:50karolherbst: but I can't test it with mesa master + nouveau, because crashes
23:50karolherbst: basically this intro video after the intro scene
23:54HdkR: Sure, the bit where you wait past the fly over and then a little scrolling intro sequence occurs
23:54HdkR: Just a few quads
23:55karolherbst: for me it is black.. sometimes
23:55karolherbst: and I get shader TRAPs
23:55karolherbst: dunno what is happening there
23:55karolherbst: I am not that desperate to run helgrind yet
23:55HdkR: Does nouveau actively emit traps or something?
23:56HdkR: Because those should be dead simple compared to the rest
23:56HdkR: Probably one TEV stage or something
23:56karolherbst: something is up there
23:56karolherbst: and I don't trigger it with mesa master :/
23:56karolherbst: maybe it isn't the ubershaders
23:57HdkR: In the Dolphin side you can force uber or force specialized to see if it is one or the other at least
23:57karolherbst: HdkR: maybe it is the switching :p
23:58HdkR: Could be. Test all three :P