00:47 anholt_: jekstrand: trying to look at your explicit io series, but kind of lost in how I'm supposed to convert a driver
00:47 anholt_: now I've got this new load_vulkan_descriptor, but it's not clear what it's supposed to do compared to the vulkan_resource_index
00:59 clever: anholt_: how does the full kms driver for vc4, stop the firmware from doing its own updates to the display list registers?
03:03 jekstrand: anholt_: It's 100% valid for it to do nothing.
03:04 jekstrand: anholt_: It gives you a transition point in your code for when the access chain goes from walking blocks to walking through memory.
03:04 jekstrand: anholt_: In ANV, for some types, it turns into "load the pointer from a UBO". For some other stuff, it's a no-op.
03:07 jekstrand: Before, you got a series of resource_index and resource_reindex and then the access chain built an offset.
03:07 jekstrand: Now you get resource_index, resource_reindex, load_descriptor and the result of the load_descriptor is cast to a pointer.
03:07 jekstrand: It's assumed that the load_descriptor takes the result of indexing and turns it into a pointer of some form.
03:07 jekstrand: Whatever "pointer" means.
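A toy model of the chain jekstrand describes, as a sketch only: indexing walks descriptor blocks, load_descriptor turns the result into some "pointer", and only then does the access chain walk memory. The table layout, the addresses, and every Python name here are hypothetical illustrations, not Mesa or ANV code.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class ResourceIndex:
    desc_set: int
    binding: int
    array_index: int

def resource_index(desc_set, binding, array_index=0):
    # Start of the chain: select a descriptor by set/binding/array element.
    return ResourceIndex(desc_set, binding, array_index)

def resource_reindex(res, delta):
    # Still walking descriptor blocks: move within the binding's array.
    return ResourceIndex(res.desc_set, res.binding, res.array_index + delta)

# Hypothetical driver state: (set, binding) -> (base address, per-element stride).
DESCRIPTOR_TABLE = {(0, 1): (0x1000, 0x100)}

def load_descriptor(res):
    # Transition point: the index becomes a "pointer" (here a plain integer).
    # A driver could equally make this a no-op and keep passing the index along.
    base, stride = DESCRIPTOR_TABLE[(res.desc_set, res.binding)]
    return base + res.array_index * stride

def deref_member(pointer, byte_offset):
    # From here on the access chain walks memory, not descriptor blocks.
    return pointer + byte_offset
```

For example, `deref_member(load_descriptor(resource_reindex(resource_index(0, 1, 2), 1)), 16)` walks to array element 3 of binding 1 and then 16 bytes into it.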
05:46 airlied: tarceri: when you fixed up tests for the uniform removals did you fix up the CTS GTF tests?
05:46 airlied: or file issues for them?
06:34 tarceri: airlied: yes the fix was merged
06:43 airlied: tarceri: okay I should update my closed repo then hopefully
07:36 MrCooper: kusma: what protocol is ttps:// ? ;)
08:03 pepp: karolherbst: your modified ext_shader_image_load_store piglit test passes on amdgpu-pro; so dropping the -1 hack and fixing the signed/unsigned bug sounds good
09:42 karolherbst: pepp: ahh cool.. sadly the test doesn't work on mesa :/
09:42 pepp: karolherbst: I can work on fixing this unless you want to do it
09:43 karolherbst: there were two issues I had to deal with: 1. the initial test didn't execute anything at all 2. after the modifications the pixels count of the fb was the iteration count
09:43 karolherbst: pepp: if you don't mind, you should do it... my knowledge about all this GL stuff is quite small :p
09:44 karolherbst: pepp: btw.. did you also play around with the size and wrap parameter on amdgpu-pro?
09:44 pepp: karolherbst: not sure I know more than you but I wrote the buggy code so that's kind of expected that I fix it :)
09:44 pepp: not yet
09:46 pepp: karolherbst: so I'll use your changes as a starting point and open a MR for piglit to fix / extend the test (eg: verifying that signed + atomic fails instead of not testing it)
09:47 karolherbst: sounds good
09:55 MrCooper: hakzsam: if Marge hits 'Branch cannot be merged', reassign the MR to her again; if the merge works for you directly, it'll work for her as well (and if she's already processing another MR, no CI resources will be wasted)
09:56 hakzsam: ok
10:13 MrCooper: airlied: "drm fixes for 5.7-rc2", time machine seems to be malfunctioning :)
12:09 karolherbst: anybody ever worked on implementing GL_KHR_shader_subgroup?
14:50 ajax: eric_engestrom: linker script or an objcopy pass look like the only options, yeah
15:46 bbrezillon: jekstrand: I've simplified the i2f lowering pass and moved it to nir_lower_int64 => https://gitlab.freedesktop.org/kusma/mesa/-/merge_requests/148/diffs?commit_id=6c2a23c1dfb94d9c175f09642d0b09626f753525
15:46 bbrezillon: also added a lowering pass for f2i
15:53 karolherbst: bbrezillon: did you look into conversion modes already?
15:54 karolherbst: it's still a bigger item on my todo list.. but CL is just super annoying about this :/
15:54 jenatali: karolherbst: Yeah, daniels added some logic to vtn which builds nir sequences to handle them
15:54 daniels: that's the MR I said you were going to hate :P
15:55 daniels: https://gitlab.freedesktop.org/kusma/mesa/-/merge_requests/118
15:55 daniels: I wrote that about five times in different ways, and didn't like any of them
15:55 karolherbst: daniels: we have native instructions for all of those
15:56 karolherbst: so I prefer adding nir instructions for all of the different rounding modes as well
15:56 daniels: karolherbst: _all_ of them? :o
15:56 karolherbst: yes
15:56 karolherbst: all of them
15:56 daniels: wow
15:56 karolherbst: it's a flag on the cvt instruction
15:56 karolherbst: well.. even alu instructions have it I think
15:56 karolherbst: which CL supported in 1.0 btw
15:56 daniels: so that's fun, guess I get to type up something that generates both NIR and C implementations for every combination
15:57 karolherbst: :p
15:57 karolherbst: be lucky CL 1.1 throw it out
15:57 karolherbst: :D
15:57 jekstrand: bbrezillon: Cool. Thanks!
15:57 karolherbst: *threw
15:57 karolherbst: daniels: yeah.. the problem is.. x86 FPU is just broken here and C code is super annoying to write
15:57 karolherbst: but I mostly did the work already
15:57 karolherbst: let me find it
15:58 daniels: blinks
15:59 jenatali: karolherbst: Sounds like we should've talked to you first :P
15:59 jenatali: Oh well, at least the nir implementations will still be useful/necessary for our driver
15:59 jekstrand: bbrezillon: 10km view looks pretty good
15:59 karolherbst: yeah.. lowering code will be required I guess for some hw or drivers
16:00 bbrezillon: karolherbst: I suggested the same thing (having one op per variant) when reviewing daniels series, but didn't want to push too hard 'cause I feared he would ask me to do it :D
16:00 karolherbst: daniels: that stuff explains it a little: http://wok.oblomov.eu/tecnologia/gpgpu/opencl-rounding-modes/
16:00 karolherbst: "#pragma OPENCL SELECT_ROUNDING_MODE" is what got removed silently
16:00 karolherbst: more or less
16:01 karolherbst: ahh. cl_khr_select_fprounding_mode was the extension..
16:01 karolherbst: https://github.com/KhronosGroup/OpenCL-Docs/blob/master/ext/cl_khr_select_fprounding_mode.asciidoc
16:02 karolherbst: but yeah.. we don't have to deal with it I hope :p
16:04 daniels: good lord
16:04 karolherbst: nv hw supports it :p
16:04 karolherbst: mostly I think..
16:04 karolherbst: not 100% on the alu stuff
16:04 karolherbst: daniels: https://gitlab.freedesktop.org/karolherbst/mesa/-/commits/nouveau_nir_spirv_opencl_v5/
16:05 karolherbst: look for commits with convert in the title
16:05 karolherbst: daniels: https://gitlab.freedesktop.org/karolherbst/mesa/-/commit/cf2fd99c9e0f65aa50011333de9d533e18ff08b2 is the main one?
16:05 karolherbst: stuff might be missing
16:05 karolherbst: and outdated
16:05 karolherbst: but it has nice constant folding code :p
16:06 karolherbst: int -> float _rtz ...
16:06 karolherbst: :D
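A minimal sketch of what constant folding an int -> float32 conversion with _rtz can look like, next to the default round-to-nearest-even. This is not the code from the linked commit; the function names are hypothetical, and it assumes Python's `struct` packing of `'f'` rounds to nearest even, which holds on IEEE 754 platforms in the default rounding mode.

```python
import struct

def _f32_bits(x):
    # Round a Python float (binary64) to binary32 and return its bit pattern.
    return struct.unpack('<I', struct.pack('<f', x))[0]

def _f32_from_bits(bits):
    return struct.unpack('<f', struct.pack('<I', bits))[0]

def i2f32_rte(n):
    # Integers below 2**53 in magnitude are exact in binary64, so the only
    # rounding happens in the binary64 -> binary32 step (nearest even).
    if abs(n) >= 1 << 53:
        raise ValueError("sketch only handles |n| < 2**53")
    return _f32_from_bits(_f32_bits(float(n)))

def i2f32_rtz(n):
    # Start from the nearest-even result; if it moved away from zero,
    # step back one ULP by decrementing the magnitude bits.
    f = i2f32_rte(n)
    if abs(f) > abs(n):
        f = _f32_from_bits(_f32_bits(f) - 1)
    return f
```

With 2**24 + 3 as input, the two modes pick different neighbours: nearest-even gives 16777220.0 and round-toward-zero gives 16777218.0.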
16:06 karolherbst: not even quite sure if I got the CTS to accept everything...
16:07 karolherbst: or my own code to stress test it
16:07 karolherbst: https://github.com/karolherbst/HMM-examples/blob/master/convert_test.c
16:08 karolherbst: https://github.com/karolherbst/HMM-examples/blob/master/convert_test.cl
16:10 daniels: oh, that's interesting - I'd be happy to run with that and see how far I could take it
16:10 daniels: it's not too dissimilar in parts to what I did, and definitely quite similar to one of my initial implementations
16:10 karolherbst: I am sure the commit won't apply cleanly anymore
16:10 daniels: yours seems to be missing a sat impl for const values?
16:10 karolherbst: it's like... old
16:10 karolherbst: yeah..
16:10 karolherbst: I doubt I did sat
16:10 karolherbst: this stuff was complicated enough
16:10 karolherbst: maybe I have some updated patches somewhere though :/
16:11 karolherbst: I mean.. I totally understand why somebody doesn't want to go through the trouble of adding all of that.. just that there is hw out there supporting it natively
16:11 karolherbst: AMD does so as well I think
16:11 karolherbst: intel maybe too
16:12 karolherbst: daniels: anyway.. I don't feel strongly about the _sat variants
16:12 karolherbst: _sat lowering is.. trivial compared to lowering of rounding modes
16:13 karolherbst: rounding modes can just easily let everything explode inside the kernels
16:31 danvet: mlankhorst, I think drm-misc backmerge once airlied has processed would be good
16:31 danvet: airlied, it's on fire with build time conflicts, see sfr in case you missed
16:32 danvet: mlankhorst, also there's a huge patch series for dma mapping fixes from marek, and that needs -rc1
16:34 bbrezillon: karolherbst: BTW, I assumed u2f rounding mode is round-nearest-even
16:35 karolherbst: bbrezillon: "Use the default rounding mode for this destination type, _rtz for conversion to integers or the default rounding mode for conversion to floating-point types."
16:35 bbrezillon: hm
16:35 karolherbst: yeah...
16:35 karolherbst: CL has a default rounding mode :/
16:35 karolherbst: it sucks
16:36 karolherbst: bbrezillon: see cl_device_fp_config
16:36 karolherbst: but I don't know if one can change it actually...
16:37 bbrezillon: yes, but I'd expect it to be vtn's responsibility to emit the right op
16:37 jenatali: No, it's just reported by the device
16:37 bbrezillon: the right NIR op I mean
16:37 karolherbst: jenatali: okay... CL has a couple of those "default" thingies in the API and I always fear that something is configurable...
16:38 karolherbst: but this also means that somebody thought at some point of adding a toggle ...
16:38 jenatali: For the full profile, the mandated minimum floating-point capability for devices that are not of type CL_DEVICE_TYPE_CUSTOM is: CL_FP_ROUND_TO_NEAREST | CL_FP_INF_NAN.
16:39 bbrezillon: My question was more, what do we expect nir_u2f to do with regards to rounding
16:39 bbrezillon: ?
16:39 karolherbst: bbrezillon: maybe we should just name them all explicitly
16:39 karolherbst: and emit the correct thing
16:40 karolherbst: in vtn
16:40 bbrezillon: is it "do as the default says" or is it implicitly "rtne"
16:40 jenatali: karolherbst: I think the "default" rounding mode is that #pragma that you mentioned
16:40 bbrezillon: karolherbst: yep, I think that'd be preferable
16:40 karolherbst: jenatali: that pragma doesn't exist
16:41 bbrezillon: karolherbst: at the same time, I don't want to modify all the code using the non-specialized converters :-(
16:42 bbrezillon: unless that's a mechanical s/nir_x2f/nir_x2f_rtne/ change
16:42 karolherbst: probably..
16:42 karolherbst: or we keep the name and just handle it
16:43 karolherbst: we have it explicit for fp16 though
16:43 jenatali: Default rounding mode is rte
16:43 bbrezillon: jenatali: not sure I understand what the vectorizer test expects (build failure on my MR)
16:43 bbrezillon: maybe I should drop those opt passes
16:44 jenatali: bbrezillon: Looks like something got copy-prop optimized when I took a quick look, but I didn't compare to the before of that test to see what it used to look like
16:44 bbrezillon: the only one I need is '(('unpack_64_2x32', ('pack_64_2x32_split', a, b)), ('vec2', a, b))'
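The rule bbrezillon quotes says that unpacking a freshly packed 64-bit value just yields the two original 32-bit halves. A toy rewriter over tuple expressions shows the idea; it is a sketch that only borrows the tuple syntax, and `simplify` is a hypothetical helper, not Mesa's algebraic pass.

```python
def simplify(expr):
    # Leaves (variable names, constants) are returned unchanged.
    if not isinstance(expr, tuple):
        return expr
    op, *args = expr
    # Simplify operands first so nested occurrences are caught too.
    args = [simplify(a) for a in args]
    if (op == 'unpack_64_2x32'
            and isinstance(args[0], tuple)
            and args[0][0] == 'pack_64_2x32_split'):
        _, a, b = args[0]
        return ('vec2', a, b)
    return (op, *args)
```

For instance, `simplify(('unpack_64_2x32', ('pack_64_2x32_split', 'a', 'b')))` returns `('vec2', 'a', 'b')`, while an unpack of anything else is left alone.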
16:45 karolherbst: mhh/
16:45 karolherbst: I remember clinfo reporting it differently in the past.. weird
16:45 karolherbst: oh well
16:45 jenatali: karolherbst: From the up-to-date spec: Round to nearest even is currently the only rounding mode required by the OpenCL specification for single precision and double precision operations and is therefore the default rounding mode.
16:46 karolherbst: jenatali: yeah.. I was more thinking that devices _could_ report a different rounding mode as the default one
16:46 karolherbst: but maybe I am mistaken
16:46 jenatali: karolherbst: Not for full devices at least
16:51 karolherbst: yeah.. seems like it's not configurable.. okay, cool
16:51 karolherbst: and I'd really ignore this silly pragma unless something really uses it
16:52 karolherbst: only thing is I'd prefer to not have to emit more than two ops for the conversions :p
16:52 karolherbst: and I think we can implement everything in either one or two conversion ops
16:52 karolherbst: nvidia hw doesn't support direct fp64 to int8 conversions eg.. so it has to be split up in two
16:53 karolherbst: like fp64 -> int32 -> int8
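A sketch of the two-step split for the saturating, round-toward-zero case, where going through int32 gives the same answer as a direct fp64 -> int8 conversion. The helper names are hypothetical; mapping NaN to 0 is taken as the saturated-conversion behaviour and should be treated as an assumption here.

```python
import math

def f2i_sat_rtz(x, bits):
    # Float -> signed integer of the given width: truncate toward zero,
    # clamp out-of-range inputs, map NaN to 0.
    lo, hi = -(1 << (bits - 1)), (1 << (bits - 1)) - 1
    if math.isnan(x):
        return 0
    if x <= lo:
        return lo
    if x >= hi:
        return hi
    return math.trunc(x)

def i2i_sat(v, bits):
    # Signed integer narrowing with clamping.
    lo, hi = -(1 << (bits - 1)), (1 << (bits - 1)) - 1
    return max(lo, min(hi, v))

def f64_to_i8_two_step(x):
    # fp64 -> int32 -> int8, as two conversion ops.
    return i2i_sat(f2i_sat_rtz(x, 32), 8)
```

Without saturation the two paths are not guaranteed to agree for out-of-range inputs, which is one reason the split has to be chosen per conversion rather than applied blindly.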
16:53 bbrezillon: karolherbst: panfrost can only do one step at a time IIRC
16:53 bbrezillon: s/panfrost/midgard/
16:53 karolherbst: bbrezillon: okay.. but at least it supports all the modes and sat, right?
16:53 bbrezillon: so it's probably a pretty common limitation
16:53 karolherbst: so you'd have 4 at most
16:54 bbrezillon: that I didn't check, but I remember alyssa mentioning it does support round and sat modifiers
16:54 karolherbst: ahh yeah
16:54 karolherbst: I remember now
16:55 karolherbst: so yeah, there is support for it :p
16:55 karolherbst: it does make sense to have it native in hw if the hw is for compute workloads
16:57 karolherbst: bbrezillon: I am actually wondering if we want to add a rounding mode field on nir_alu_instr....
16:58 karolherbst: it seems like we have a round mode on all float alu instructions
16:58 karolherbst: all 4 modes even
16:58 karolherbst: well.. 8
16:58 karolherbst: :p
16:59 karolherbst: but you can ignore the other 4
16:59 bbrezillon: I count only 4
16:59 bbrezillon: what are the other 4?
16:59 karolherbst: int
16:59 karolherbst: round to X int
16:59 bbrezillon: ah
17:00 bbrezillon: I thought those were separate instructions
17:00 karolherbst: it's for alu
17:00 karolherbst: like you do a fadd and want it to have it rounded to int
17:00 karolherbst: so you get 4.0 instead of 4.2 or something
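A small sketch of the "round to X int" idea: do the float add, then round the result to an integral float under one of four modes. The mode names are hypothetical labels, not ISA mnemonics, and the model ignores how the hardware rounds the intermediate sum.

```python
import math

# Hypothetical mode labels mapped to the rounding function each one applies.
ROUND_TO_INT = {
    'nearest_even': round,      # Python's round() breaks ties to even
    'toward_zero': math.trunc,
    'floor': math.floor,
    'ceil': math.ceil,
}

def fadd_round_int(a, b, mode):
    # Add, then snap the result to an integral value, returned as a float.
    return float(ROUND_TO_INT[mode](a + b))
```

With 1.7 + 2.5 (about 4.2), every mode except `'ceil'` yields 4.0, and `'ceil'` yields 5.0.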
17:01 bbrezillon: hm, so HW supports that as a rounding mode, interesting
17:01 karolherbst: yeah
17:01 karolherbst: I assume it's super cheap to implement
17:01 karolherbst: and it just takes a bit in the encoding space
17:02 karolherbst: I am not sure how well it's supported.. just that there are some bits in the ISA for that
17:02 karolherbst: like it seems like we don't emit it for fadd, but fmul/ffma
17:03 karolherbst: ehh.. even quadops have a rounding mode
17:03 karolherbst: interesting
17:14 karolherbst: after some thinking I think we need to be a bit more explicit about it ...
17:14 karolherbst: so a rounding mode on all alu instructions is not a good idea
19:01 airlied: MrCooper: doh, I knew sending that I'd done something wrong, but my brain wasn't cooperating
19:49 vivijim: agd5f_: airlied: drm-tip build breaks with some broader config...
19:50 vivijim: drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c:1357:2: error: implicit declaration of function ‘drm_gem_object_put_unlocked
19:50 vivijim: offending patch: fd9a9f8801de ("drm/amdgpu: Use GEM obj reference for KFD BOs")
19:55 danvet: vivijim, it's a merge conflict
19:55 danvet: airlied should be fixing it when pulling in the drm-misc-next pull from mlankhorst
19:56 airlied: yeah that'll be Monday because I don't hate myself that much
19:59 vivijim: cool then... no rush
20:22 mdnavare: hwentlan: Could you or Nicholas take a look at this patch: https://patchwork.freedesktop.org/patch/371201/?series=78279&rev=3 , we have pulled out the vrr_range from amdgpu and made it common in drm_debugfs
20:28 buckley310: Ever since my distro upgraded from Mesa 20.0.2 to 20.0.7, my GPU locks up whenever I open Steam. Downgrading to 20.0.2 fixes it, but upgrading to 20.1.1 doesn't make a difference. issues/3132 looks like a similar hang, I also use a 5700XT, but the trigger conditions are very different. What should I do D: (http://ix.io/2pCg)
20:33 Lyude: buckley310: #radeon can probably help a bit more here, and you should probably also file a bug here (not sure where the proper issue hub for radeon is these days, but they can probably tell you)
20:33 buckley310: ok ill ask over there. thanks
20:35 buckley310: so if i should file a mesa bug, should i include any logs other than the linked text?
20:41 Lyude: buckley310: yeah but I don't know exactly what yet (haven't worked on mesa in a while), that log is probably a good start at least
20:50 karolherbst: jekstrand: is there a way of converting loops in nir so that each loop has no explicit continue clauses?
20:50 karolherbst: I think I need it for volta+
20:51 anholt_: I don't think there's an existing pass for that.
20:52 karolherbst: mhhh...
20:52 karolherbst: the issue is, we have only plain jumps
20:52 karolherbst: and we need to handle all divergence ourselves
20:52 karolherbst: no special jumps for breaks/conts or anything
20:53 karolherbst: and we need to setup cfg barriers to sync on
20:53 karolherbst: so if you have a loop with 3 explicit continues, you'd have to set up 4 barriers, one for each continue and one to sync up after the loop
20:54 karolherbst: and converting a loop with multiple continues into multiple loops with none would make that kind of easier
20:54 karolherbst: so we only need to set the barrier before the loop
20:54 karolherbst: and sync on it after the loop
21:04 karolherbst: mhh.. but that would make breaks quite annoying to handle...
21:04 karolherbst: mhhh
21:12 jekstrand: karolherbst: It shouldn't be too hard to handle. It'd be roughly the same algorithm we use for handling early returns in functions
21:22 jekstrand: But likely predicating everything left in the loop isn't actually what you want. That'd get you some very messy control-flow.
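The predication approach jekstrand mentions can be illustrated at source level: each continue becomes a flag update, and everything after it in the loop body is guarded by that flag. This is only a sketch of the transformation's shape with made-up example loops, not a NIR pass.

```python
def with_continues(values):
    out = []
    for v in values:
        if v < 0:
            continue
        if v % 2:
            continue
        out.append(v)
    return out

def predicated(values):
    # Same loop with no explicit continue: the rest of the body after each
    # former continue is wrapped in a check of the `active` flag.
    out = []
    for v in values:
        active = True
        if v < 0:
            active = False
        if active:
            if v % 2:
                active = False
        if active:
            out.append(v)
    return out
```

Both functions return the same list for any input; the nesting in the second one grows with every continue, which is the messy control flow jekstrand warns about.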
22:19 karolherbst: jekstrand: yeah.. I think I will just insert some fake blocks when converting to our backend IR and insert the barriers there..
22:19 karolherbst: otherwise I'd have to deal with the loop breaks and make it break multiple loops at once
22:19 karolherbst: and stuff...
22:21 karolherbst: and I already know where all breaks/continues are.. so that shouldn't be a big issue