02:07imirkin: karolherbst: which is what the from_tgsi logic does ;)
02:10karolherbst: yeah right, but we have a slightly different situation with nir here. With tgsi we insert the exports after parsing the TGSI instructions. In nir those are part of the instructions and might be there twice or more (the exports)
02:10karolherbst: it shouldn't matter that much though
02:11imirkin: karolherbst: but you can stage the outputs in nir just like you can in tgsi
02:11imirkin: it's really the same thing
02:11karolherbst: no, how would I be able to? for that I would have to detect those and remove the duplicates in nir
02:12imirkin: just write them to a temp set of values
02:12imirkin: and then in the final block, export them.
02:12karolherbst: they are part of the instructions
02:12karolherbst: they are literally instructions
02:12karolherbst: intrinsic store_output
02:12karolherbst: it is there twice
02:13karolherbst: we don't have it with TGSI like this
02:13imirkin: and each time, that becomes mov tmparea, the-value
02:13imirkin: and then in the final exit block, you do export tmparea
02:13imirkin: i really don't see what the issue is
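The staging idea imirkin describes above can be sketched roughly as follows. This is a toy model and not nouveau's codegen: `Emitter`, `storeOutput`, `emitExports` and the textual instruction strings are all hypothetical names made up for illustration. Every `store_output` turns into a move into a per-slot temporary, and the exports are emitted once, in the single exit block.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <vector>

// Hypothetical stand-in for a converter; none of these names exist in nouveau.
struct Emitter {
    std::vector<std::string> body;      // instructions emitted into normal blocks
    std::vector<std::string> tail;      // instructions emitted into the final exit block
    std::map<int, std::string> staged;  // output slot -> name of its staging temporary

    // Called for every store_output, no matter how often a slot is written.
    void storeOutput(int slot, const std::string &value) {
        std::string &tmp = staged[slot];
        if (tmp.empty())
            tmp = "tmp" + std::to_string(slot);
        body.push_back("mov " + tmp + ", " + value);
    }

    // Called once, when the single exit block is built.
    void emitExports() {
        for (const auto &entry : staged)
            tail.push_back("export o[" + std::to_string(entry.first) + "], " + entry.second);
    }
};
```

With this shape, two stores to the same slot produce two moves but only one export, so duplicate `store_output` intrinsics need no special detection.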
02:15karolherbst: mhh, then phi nodes would be added and all the other stuff, right?
02:15karolherbst: well right, I can do it like this, and I also need a workaround for the successor block
02:15karolherbst: because that ret block, has no next ;)
02:15imirkin: right, so like i said, you need a final exit block
02:16imirkin: or you can add the code every time there is an exit from the main function
02:16imirkin: but i think that kinda sucks
02:16karolherbst: I still need to create a fake edge between the last block of the loop and the tail block
02:16karolherbst: otherwise the graph code complains
02:16karolherbst: but having the last block be detached also makes it crash
02:16imirkin: yeah, the return has to go to the tail
02:16imirkin: i had to fix that for the tgsi thing too
02:17karolherbst: I already have a solution for this, I just don't move the exports currently
02:17imirkin: that sure was fun to track down in a giant enormous program
02:17imirkin: iirc i added asserts which made the problem easier to notice
02:17karolherbst: well, in the end I saw that the other blocks weren't there
02:17karolherbst: but with my old code, ra was happy
02:17imirkin: commit 52b68375aeaa1ff6bca48eb833176d3498aa48f7
02:17imirkin: and commit adcc547bfbef362067bb3b4e3aee75b287bc6189
02:18karolherbst: we could make dce smarter to eliminate those dead blocks
02:18karolherbst: the instructions
02:18karolherbst: I already wondered what that is about
02:18karolherbst: well, the message didn't help, but I kind of knew why it happened
02:19imirkin: but before the assert was there, it was much harder to figure out why it was randomly crashing in the middle of random stuff
02:20karolherbst: here is a comparison between what tgsi creates and what I currently do: https://gist.github.com/karolherbst/3363847c8137d3e9180423fc142cc500
02:20karolherbst: wondering if I should bring that preret/ret back
02:20karolherbst: I had it once
02:20karolherbst: but it felt kind of pointless
02:21karolherbst: but now with the clip stuff it kind of makes sense again
02:21karolherbst: but do you see how BB:3 is pretty much dead code in both versions?
02:22karolherbst: mhh interesting
02:22karolherbst: in the TGSI there is no NOP
02:22karolherbst: in the nir I have this: "vec4 32 ssa_15 = undefined"
02:23karolherbst: and this value is used in BB:3
02:25imirkin: that's fairly common
02:25imirkin: when you have loops and whatnot
02:25karolherbst: yeah I know
02:25karolherbst: but it isn't there in the tgsi
02:25imirkin: something may be falsely assigned to it, dunno
02:26karolherbst: they simply use the same value in the TGSI
02:26karolherbst: but yeah
02:26karolherbst: we might be able to tweak DCE to remove the dead code here as well
02:27karolherbst: but this is more like finding dead blocks through the CFG, right?
02:27karolherbst: which is annoying due to the fake edge we add...
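One way to picture "finding dead blocks through the CFG" is a plain reachability walk from the entry block that refuses to follow edges marked as fake. The sketch below is only an illustration under that assumption: `Edge`, its `fake` flag and `reachableBlocks` are hypothetical and do not correspond to nouveau's Graph classes. It also shows why the fake edge is annoying: a tail block that is reachable only through it comes out as dead and has to be kept by hand.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical CFG edge; `fake` marks an edge added only to keep the graph connected.
struct Edge {
    std::size_t to;
    bool fake;
};

// Returns, per block, whether it is reachable from `entry` over real edges only.
std::vector<bool> reachableBlocks(const std::vector<std::vector<Edge>> &succ,
                                  std::size_t entry) {
    std::vector<bool> seen(succ.size(), false);
    std::vector<std::size_t> work{entry};
    seen[entry] = true;
    while (!work.empty()) {
        std::size_t bb = work.back();
        work.pop_back();
        for (const Edge &e : succ[bb]) {
            if (e.fake || seen[e.to])
                continue;  // skip fake edges and already visited blocks
            seen[e.to] = true;
            work.push_back(e.to);
        }
    }
    return seen;
}
```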
02:58karolherbst: right, back to that clip thing
02:58karolherbst: or rather sleep
14:20Exterminador: hello guys.. today, I've run "dmesg" command on my Ubuntu 17.10, and I've found something that i can't understand (newbie here). it's in this paste http://paste.ubuntu.com/26194676/ on line 871 (I've pasted the entire dmesg output just in case)
15:26imirkin: Exterminador: yeah, it's an issue that became easy to hit with whatever some newer distros ship
15:26imirkin: Exterminador: i believe it should be fixed with a newer kernel
15:28imirkin: Exterminador: either way, that's not an error, it's a warning that sad performance is ahead
15:29imirkin: Exterminador: actually the fix i had in mind only affects newer GPUs
15:30imirkin: either way, it's not a critical error
15:40Exterminador: oh. probably because I run an old laptop already and the HDD must be some way dying too
15:40imirkin: well, this is unrelated
15:41imirkin: it's related to how we manage PRAMIN
15:41imirkin: i really do think this changed a bit recently, but i can't find the relevant commit
15:42imirkin: but basically with wayland, more things end up with a GL context, which in turn holds on to a PRAMIN area iirc, and we run out of space
15:42imirkin: and have to fall back to a different type of access
15:43imirkin: ideally there'd be an LRU or something but ... i don't think there is
15:53Exterminador: well, since it's not a critical error, I guess I don't need to be worried. but the funny thing is that Xubuntu doesn't have that error (at least I didn't see it)
15:55imirkin: yeah, it's an issue brought on by wayland i believe
15:58Exterminador: as long as it works without major issues, I'm good.
15:58Exterminador: I always like to have the latest release, but sometimes things get buggy, I know
16:00imirkin: yeah. unfortunately recent software likes to make use of features that aren't well-supported by nouveau
16:00imirkin: for some reason everyone has started to feel the need to use opengl for accelerating trivial programs
16:00imirkin: which just leads to more opportunities for failure
16:01imirkin: i'd strongly recommend AMD for your next purchase.
16:03Exterminador: I'll keep that in mind. thanks for the heads up
16:14karolherbst: imirkin: is the pramin region fixed in size?
16:14karolherbst: well, that reserved area
16:15karolherbst: because maybe it would make sense to make it increase if it is too small for a context
16:55AndrewR: hi ... should I file a bug about a virgl regression on a nouveau-powered host system? (unfortunately, the last working version was ..a long time ago, just retested with Mesa 17.4.0-devel (git-96fc5fbf23) on both host and guest (qemu 2.11) - and bam - no more glxgears, or any gl programs.....)
16:56karolherbst: AndrewR: I would ask the virgl devs
16:56karolherbst: except you have no opengl on your host
16:56AndrewR: karolherbst, I have!
16:57karolherbst: well if it works on the host, but not in the guest, then I don't know what is wrong. I would ask the virgl devs
16:58AndrewR: karolherbst, I hope someone on #dri-devel test it from time to time ..but I think airlied also has #nouveau open..so i asked here first. Not really worst bug, just ..may be something few noticed
17:01karolherbst: I doubt it is nouveau's fault if it doesn't work in the guest
17:01karolherbst: or not really
17:02karolherbst: because you don't use Nouveau components inside the guest, right?
17:02imirkin: AndrewR: yeah, file a bug if you have a trace
17:03karolherbst: AndrewR: well to be more precise here, what is the issue you see, just nothing rendered or no context at all?
17:03imirkin: esp if that trace renders with llvmpipe
17:16AndrewR: imirkin, good idea ...if trace will be no more than few GBs (a bit short on tmpfs..and disk space)
17:16AndrewR: https://bugs.freedesktop.org/show_bug.cgi?id=104291 - started ....
17:19imirkin: karolherbst: i may have an afternoon to do ... stuff. should i use it to write a nir -> nvir adapter, or are you good?
17:19karolherbst: I am good
17:19karolherbst: I just use this to learn more about the nvir stuff as well
17:19karolherbst: and a lot of things are already working quite fine
17:20karolherbst: just features are missing
17:20karolherbst: thanks for asking though
17:20karolherbst: imirkin: but if you want, you can take a look at what I did and give comments on that
17:20karolherbst: that would help
17:21karolherbst: but it isn't in a state I would send out to the ML
17:28imirkin: not really ... i'd rather not look now
17:29imirkin: since i suspect you're at like ... the 10% line
17:30karolherbst: sounds about right
17:30imirkin: perhaps i'll re-re-re-re-re-try to do bindless. 10th time is going to be a charm.
17:30karolherbst: you could take a look at some pending patches maybe
17:31karolherbst: there is still this textureGrad thing
17:31karolherbst: or did you push it?
17:31imirkin: yeah that's no fun.
17:31karolherbst: well, maybe still push it so that's at least fixed?
17:31karolherbst: or would you rather want to understand it before doing it?
17:32imirkin: i think i'll just push it.
17:32imirkin: with a giant comment of "it should work this other way too, but for some reason that's unclear, it doesn't. oh, and blob does it this way too"
17:32karolherbst: sounds good
17:32karolherbst: I have no idea who uses textureGrad in a way that it breaks because of this issue
17:32karolherbst: and to be honest
17:33karolherbst: I don't want to debug any application having this issue
17:33karolherbst: because I feel it will be painful until we find it uses textureGrad or so
17:33karolherbst: ohh and maybe the fp64 stuff as well?
17:34karolherbst: other than that there is nothing in my cts branch which might be ready
17:35imirkin: yeah, fp64 stuff is a good option too
17:35karolherbst: I don't know if I sent this thing out: https://github.com/karolherbst/mesa/commit/2abf0999dbe1e876c4e58c1541144cda3f5be1ac
17:35imirkin: the textureGrad thing is a *very* minor correctness issue though. careful tests can find it, but a normal app won't care.
17:35karolherbst: ahh, okay
17:35karolherbst: so mainly we fix it for the CTS
17:35imirkin: you did not
17:36imirkin: that one sounds bad....
17:36imirkin: so we mess up when a phi refers to a phi? hm
17:36imirkin: that's unfortunate!
17:36karolherbst: no clue, I don't remember writing that patch
17:36imirkin: perhaps a robber came in the middle of the night
17:37imirkin: committed it in your tree
17:37imirkin: and vanished into the darkness
17:37karolherbst: Jun 10th...
17:37karolherbst: that might explain
17:37imirkin: if you have the source shader that causes it
17:37imirkin: i'd like to at least be able to investigate what really goes wrong
17:38karolherbst: yeah, trying to get it
17:38karolherbst: currently too lazy to go to my other laptop, so I just download the game again and see if I can hit it here
17:38imirkin: it does seem like the CSE thing could mess up in that case
17:38imirkin: however it's unclear to me that your solution is correct
17:38karolherbst: maybe i even pushed the shaders...
17:41karolherbst: "nvc0_program_translate:610 - shader translation failed: -4" at least something
17:43karolherbst: "Error in Graph:createEdge: edge already exists"
17:52imirkin: the shader would be nice... file a bug if possible
17:53karolherbst: imirkin: https://gist.githubusercontent.com/karolherbst/f244241adda4141bbebdb11cadaca3db/raw/06f2fcf2b15e2ac827e5cab4b78dbd2717a08c31/gistfile1.txt
17:54imirkin: bug would be great so i don't forget
17:54imirkin: or rather, so that when i inevitably forget, we haven't lost the info :)
17:55imirkin: i wonder what those #prgram's do
17:57karolherbst: ohh yeah
17:57imirkin: #pragma's i mean
17:57karolherbst: I could imagine they tweak the optimisation things or so
17:57imirkin: although search results suggest that the answer is 'nothing'
17:57karolherbst: the same goes for gccs inline, right?
17:57karolherbst: but still a lot of people think it does
17:57imirkin: nah, inline does stuff
17:58imirkin: it's not guaranteed to do stuff
17:58karolherbst: more or less
17:58imirkin: but it's an indication to gcc that weighs in its calculation of whether to inline or not
17:58karolherbst: oh, it is actually? okay
17:58karolherbst: I thought it is pointless unless you specifically tell gcc to not inline by itself
17:58imirkin: AndrewR: oh, i know that crash
17:59karolherbst: nice, I got that clip thing working :)
17:59imirkin: AndrewR: er hm... maybe not....
17:59karolherbst: yeah, will create a bug report soonish
18:01AndrewR: imirkin, may be you fixed something similar in not too distant past (on nv43)
18:02imirkin: well, the issue i remember is where the clear happens *very* early
18:02imirkin: and upsets some stuff inside the driver
18:02imirkin: however this is inside the state tracker
18:03AndrewR: imirkin, I think modesetting driver also uses GL with virgl nowadays ...? and I have both qt3 and cairo (for gtk) built with gl support
18:04imirkin: yes, that would be the case
18:04imirkin: but things that happen inside the guest use virgl, not nouveau
18:05imirkin: virgl then produces a data stream that is passed to the host
18:05imirkin: and virglrenderer converts that into GL calls to the host's GL driver
18:05imirkin: so any crashes in the guest are in the virgl path, not nouveau path
18:05imirkin: if qemu crashes, that's a different matter
18:05imirkin: (or rather, virglrenderer)
18:09imirkin: CRAP. of course when reviewing the textureGrad thing i realized it's buggy. of course.
18:10imirkin: why do i bother reviewing my shitty code if i know i'm just going to be disappointed :( should just push it as-is and be blissfully unaware of its bugs.
18:12imirkin: (the issue, of course, is now that everything's based in lane0, i don't broadcast the depth-compare and offsets to lane0)
18:17imirkin: oh nice. offsets have to be const.
18:17imirkin: so it's just the shadow ref
19:00karolherbst: imirkin: mhh interesting
19:01imirkin: i think i know why the old code was broken
19:01imirkin:goes to hack
19:02imirkin: gah. the problem was staring me in the face this whole time
19:03imirkin: we broadcast new coordinates to the other lanes
19:03imirkin: which was good
19:03imirkin: but we neglected to broadcast the array/depth values to the other lanes
19:03imirkin: so they still used their local values
19:03imirkin: which was bad.
19:03imirkin: so now the question is ...
19:03imirkin: am i better off with this l0 approach, or better off fixing the old approach
19:05imirkin: and the answer is yes - we wouldn't have to broadcast back into the proper lane.
19:05karolherbst: mhh, wondering why nvidia does it their way then
19:06imirkin: wellll ... i wonder if their way is somehow better for non-frag stages. dunno.
19:11imirkin: my fix didn't fix it
19:12imirkin: oh, that's coz i'm a typoing fool
19:16karolherbst: guess I can't do things like this: " ld u32 %r297d c0[%r296+0x40]" :)
19:17imirkin: types really should match
19:18karolherbst: ahh "vec3 64 ssa_168 = intrinsic load_uniform"
19:18karolherbst: it has a 64 and not a 32 :)
19:19karolherbst: does it matter for loads if it is a f32 or u32?
19:19karolherbst: I know I should prefer u32, but just asking
19:21imirkin: can't imagine that it would
19:22imirkin: but i suppose some things could be hard-coded to look for U32
19:22imirkin: doubt it though
19:31karolherbst: I should have thought more about the type stuff... I have some more issues regarding it: "mul u64 %r296 %r294 %r295"
19:32imirkin: where are you getting 64-bit muls from?
19:32karolherbst: no, that was my mistake
19:32karolherbst: by fixing the load, I put the same dType for the mul as well
19:33karolherbst: but I can get 64bit things now, because I kind of moved on to all glsl tests (-gs -cs)
19:33karolherbst: I couldn't find any fundamental issue with the prior tests anymore
19:35karolherbst: now I get an assert inside setup_non_interleaved_attribs.... I think I will ignore this one for now
21:52imirkin: mwk: do you have a clear understanding of what quadon/quadpop do?
22:16imirkin: volta isa: http://docs.nvidia.com/cuda/cuda-binary-utilities/index.html#volta
22:17imirkin: fun additions like "integer dot product and accumulate"
22:17imirkin: wonder wtf that does...
22:18karolherbst: "P40 also accelerates INT8 vector dot products (IDP2A/IDP4A instructions), with a peak throughput of 47.0 INT8 TOP/s."
22:18karolherbst: okay, tensor was f16 float...
22:20karolherbst: it is for deep learning
22:22karolherbst: but I am wondering why they didn't add it with the pascal isa...
22:24imirkin: oh, so it's a vector thing
22:25karolherbst: so then it appears that either Volta or a P40 has this instruction
22:26karolherbst: wondering if other GP102 also have this, or GP100 for that matter
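For readers wondering the same thing as imirkin, a 4-way "integer dot product and accumulate" can be emulated in plain C++ as below. This is a sketch that assumes the instruction behaves like the signed form of CUDA's documented `__dp4a` intrinsic (four packed 8-bit lanes per 32-bit word, products summed into a 32-bit accumulator); the function name `dot4Accumulate` is made up here, and the chat itself does not confirm the exact semantics of the hardware instruction.

```cpp
#include <cassert>
#include <cstdint>

// Emulates a signed 4-way int8 dot product with accumulate.
// Each 32-bit operand is treated as four packed signed 8-bit lanes.
std::int32_t dot4Accumulate(std::uint32_t a, std::uint32_t b, std::int32_t acc) {
    for (int lane = 0; lane < 4; ++lane) {
        // Extract one byte per operand and reinterpret it as a signed value.
        auto ai = static_cast<std::int8_t>((a >> (8 * lane)) & 0xff);
        auto bi = static_cast<std::int8_t>((b >> (8 * lane)) & 0xff);
        acc += static_cast<std::int32_t>(ai) * static_cast<std::int32_t>(bi);
    }
    return acc;
}
```

So the "vector" here is packed inside ordinary 32-bit registers, which is why it shows up as a scalar-looking integer instruction.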
22:35karolherbst: imirkin: do I have to create merges in the input IR for 64bit values? Or is it okay if I load 64bit values directly?
22:36imirkin: if it's a 64-bit value you can use it
22:36imirkin: if it's 2 32-bit values you have to use a merge to make a 64-bit value out of it
22:37karolherbst: that explains it
22:38karolherbst: mhh, something is wrong in my uniform loading code
22:38imirkin: and if it's a 64-bit value that you want to use as a 32-bit, you need to use a split
22:38imirkin: bld.mkSplit helps with that
22:39karolherbst: yeah, I am aware of that. I was just wondering if it is okay to use 64bit values in the first place without merges, because in the example I have here the TGSI one creates a lot of merges
22:39karolherbst: and in nir I just know that a value is 64bit
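What a merge and a split mean for the values themselves can be shown with ordinary integers. The sketch below is not the nvir builder API (it deliberately does not guess the signature of `bld.mkSplit`); `merge32x2` and `split64` are hypothetical helpers, and placing the low word first is an assumption made for the illustration.

```cpp
#include <cassert>
#include <cstdint>
#include <utility>

// Build one 64-bit value out of two 32-bit halves (low word first).
std::uint64_t merge32x2(std::uint32_t lo, std::uint32_t hi) {
    return (static_cast<std::uint64_t>(hi) << 32) | lo;
}

// Take a 64-bit value apart again; returns {low word, high word}.
std::pair<std::uint32_t, std::uint32_t> split64(std::uint64_t v) {
    return {static_cast<std::uint32_t>(v), static_cast<std::uint32_t>(v >> 32)};
}
```

If the input IR already hands over a genuine 64-bit value, as nir does here, neither step is needed until something wants to look at the 32-bit halves.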
22:40karolherbst: ohhh, wait
22:40karolherbst: I think I found the issue
22:40karolherbst: "ld u64 %r10d c0[0x0]" + "ld u64 %r11d c0[0x4]" ;)
22:44imirkin: that can't work
22:44imirkin: the const offset has to be aligned
22:44imirkin: i.e. if you do a 64-bit load, it has to be aligned-to-8
22:44imirkin: or else you'll get yelled at
22:44imirkin: it's best to just do 32-bit loads and let MemoryOpt do its thing
22:44imirkin: you also get the load propagation benefits
22:45karolherbst: yeah I know
22:45karolherbst: I just need to adjust the nir offset correctly
22:45karolherbst: it was only correct for 32bit values until now
22:46karolherbst: for each component I did something like this: "(offset * 4 + i) * 4"
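The arithmetic behind the bug can be sketched like this. It assumes that the base offset counts 16-byte vec4 slots, which is how the quoted formula reads, and that components of one value are packed back to back; both are assumptions, and `componentByteOffset` and `isNaturallyAligned` are hypothetical helper names. For 32-bit components the result matches the quoted formula, while for 64-bit components the stride per component becomes 8 bytes, so the second load lands at 0x8 instead of the misaligned 0x4.

```cpp
#include <cassert>
#include <cstdint>

// Byte offset of component `comp` of a value starting at vec4 slot `slot`.
// Assumption: a slot is 16 bytes and components are tightly packed.
std::uint32_t componentByteOffset(std::uint32_t slot, std::uint32_t comp,
                                  std::uint32_t bitSize) {
    const std::uint32_t bytes = bitSize / 8;  // 4 for 32-bit, 8 for 64-bit
    return slot * 16 + comp * bytes;
}

// A load of `bitSize` bits needs an offset that is a multiple of its size.
bool isNaturallyAligned(std::uint32_t byteOffset, std::uint32_t bitSize) {
    return byteOffset % (bitSize / 8) == 0;
}
```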