00:47anholt_: jekstrand: trying to look at your explicit io series, but kind of lost in how I'm supposed to convert a driver
00:47anholt_: now I've got this new load_vulkan_descriptor, but it's not clear what it's supposed to do compared to the vulkan_resource_index
00:59clever: anholt_: how does the full kms driver for vc4, stop the firmware from doing its own updates to the display list registers?
03:03jekstrand: anholt_: It's 100% valid for it to do nothing.
03:04jekstrand: anholt_: It gives you a transition point in your code for when the access chain goes from walking blocks to walking through memory.
03:04jekstrand: anholt_: In ANV, for some types, it turns into "load the pointer from a UBO". For some other stuff, it's a no-op.
03:07jekstrand: Before, you got a series of resource_index and resource_reindex and then the access chain built an offset.
03:07jekstrand: Now you get resource_index, resource_reindex, load_descriptor and the result of the load_descriptor is cast to a pointer.
03:07jekstrand: It's assumed that the load_descriptor takes the result of indexing and turns it into a pointer of some form.
03:07jekstrand: Whatever "pointer" means.
05:46airlied: tarceri: when you fixed up tests for the uniform removals did you fix up the CTS GTF tests?
05:46airlied: or file issues for them?
06:34tarceri: airlied: yes the fix was merged
06:43airlied: tarceri: okay I should update my closed repo then hopefully
07:36MrCooper: kusma: what protocol is ttps:// ? ;)
08:03pepp: karolherbst: your modified ext_shader_image_load_store piglit test passes on amdgpu-pro; so dropping the -1 hack and fixing the signed/unsigned bug sounds good
09:42karolherbst: pepp: ahh cool.. sadly the test doesn't work on mesa :/
09:42pepp: karolherbst: I can work on fixing this unless you want to do it
09:43karolherbst: there were two issues I had to deal with: 1. the initial test didn't execute anything at all 2. after the modifications the pixels count of the fb was the iteration count
09:43karolherbst: pepp: if you don't mind, you should do it... my knowledge about all this GL stuff is quite small :p
09:44karolherbst: pepp: btw.. did you also play around with the size and wrap parameter on amdgpu-pro?
09:44pepp: karolherbst: not sure I know more than you but I wrote the buggy code so that's kind of expected that I fix it :)
09:44pepp: not yet
09:46pepp: karolherbst: so I'll use your changes as a starting point and open a MR for piglit to fix / extend the test (eg: verifying that signed + atomic fails instead of not testing it)
09:47karolherbst: sounds good
09:55MrCooper: hakzsam: if Marge hits 'Branch cannot be merged', reassign the MR to her again; if the merge works for you directly, it'll work for her as well (and if she's already processing another MR, no CI resources will be wasted)
10:13MrCooper: airlied: "drm fixes for 5.7-rc2", time machine seems to be malfunctioning :)
12:09karolherbst: anybody ever worked on implementing GL_KHR_shader_subgroup?
14:50ajax: eric_engestrom: linker script or an objcopy pass look like the only options, yeah
15:46bbrezillon: jekstrand: I've simplified the i2f lowering pass and moved it to nir_lower_int64 => https://gitlab.freedesktop.org/kusma/mesa/-/merge_requests/148/diffs?commit_id=6c2a23c1dfb94d9c175f09642d0b09626f753525
15:46bbrezillon: also added a lowering pass for f2i
15:53karolherbst: bbrezillon: did you look into conversion modes already?
15:54karolherbst: it's still a bigger item on my todo list.. but CL is just super annoying about this :/
15:54jenatali: karolherbst: Yeah, daniels added some logic to vtn which builds nir sequences to handle them
15:54daniels: that's the MR I said you were going to hate :P
15:55daniels: I wrote that about five times in different ways, and didn't like any of them
15:55karolherbst: daniels: we have native instructions for all of those
15:56karolherbst: so I prefer adding nir instructions for all of the different rounding modes as well
15:56daniels: karolherbst: _all_ of them? :o
15:56karolherbst: all of them
15:56karolherbst: it's a flag on the cvt instruction
15:56karolherbst: well.. even alu instructions have it I think
15:56karolherbst: which CL supported in 1.0 btw
15:56daniels: so that's fun, guess I get to type up something that generates both NIR and C implementations for every combination
15:57karolherbst: be lucky CL 1.1 throw it out
15:57jekstrand: bbrezillon: Cool. Thanks!
15:57karolherbst: daniels: yeah.. the problem is.. x86 FPU is just broken here and C code is super annoying to write
15:57karolherbst: but I mostly did the work already
15:57karolherbst: let me find it
15:59jenatali: karolherbst: Sounds like we should've talked to you first :P
15:59jenatali: Oh well, at least the nir implementations will still be useful/necessary for our driver
15:59jekstrand: bbrezillon: 10km view looks pretty good
15:59karolherbst: yeah.. lowering code will be required I guess for some hw or drivers
16:00bbrezillon: karolherbst: I suggested the same thing (having one op per variant) when reviewing daniels series, but didn't want to push to hard 'cause I feared he would ask me to do it :D
16:00karolherbst: daniels: that stuff explains it a little: http://wok.oblomov.eu/tecnologia/gpgpu/opencl-rounding-modes/
16:00karolherbst: "#pragma OPENCL SELECT_ROUNDING_MODE" is what got removed silently
16:00karolherbst: more or less
16:01karolherbst: ahh. cl_khr_select_fprounding_mode was the extension..
16:02karolherbst: but yeah.. we don't have to deal with it I hope :p
16:04daniels: good lord
16:04karolherbst: nv hw supports it :p
16:04karolherbst: mostly I think..
16:04karolherbst: not 100% on the alu stuff
16:04karolherbst: daniels: https://gitlab.freedesktop.org/karolherbst/mesa/-/commits/nouveau_nir_spirv_opencl_v5/
16:05karolherbst: look for commits with convert in the title
16:05karolherbst: daniels: https://gitlab.freedesktop.org/karolherbst/mesa/-/commit/cf2fd99c9e0f65aa50011333de9d533e18ff08b2 is the main one?
16:05karolherbst: stuff might be missing
16:05karolherbst: and outdated
16:05karolherbst: but it has nice constant folding code :p
16:06karolherbst: int -> float _rtz ...
16:06karolherbst: not even quite sure if I got the CTS to accept everything...
16:07karolherbst: or my own code to stress test it
16:10daniels: oh, that's interesting - I'd be happy to run with that and see how far I could take it
16:10daniels: it's not too dissimilar in parts to what I did, and definitely quite similar to one of my initial implementations
16:10karolherbst: I am sure the commit won't apply cleanly anymore
16:10daniels: yours seems to be missing a sat impl for const values?
16:10karolherbst: it's like... old
16:10karolherbst: I doubt I did sat
16:10karolherbst: this stuff was complicated enough
16:10karolherbst: maybe I have some updated patches somewhere thoguh :/
16:11karolherbst: I mean.. I totally understand why somebody doesn't want to get through the trouble of adding all of that.. just that there is hw out there supporting it natively
16:11karolherbst: AMD does so as well I think
16:11karolherbst: intel maybe too
16:12karolherbst: daniels: anyway.. I don't feel strongly about the _sat variants
16:12karolherbst: _sat lowering is.. trivial compared to lowering of rounding modes
16:13karolherbst: rounding modes can just easily let everything explode inside the kernels
16:31danvet: mlankhorst, I think drm-misc backmerge once airlied has processed would be good
16:31danvet: airlied, it's on fire with build time conflicts, see sfr in case you missed
16:32danvet: mlankhorst, also there's a huge patch series for dma mapping fixes from marek, and that needs -rc1
16:34bbrezillon: karolherbst: BTW, I assumed u2f rounding mode is round-nearest-even
16:35karolherbst: bbrezillon: "Use the default rounding mode for this destination type, _rtz for conversion to integers or the default rounding mode for conversion to floating-point types."
16:35karolherbst: CL has a default rounding mode :/
16:35karolherbst: it sucks
16:36karolherbst: bbrezillon: see cl_device_fp_config
16:36karolherbst: but I don't know if one can change it actually...
16:37bbrezillon: yes, but I'd expect it to be vtn's responsibility to emit the right op
16:37jenatali: No, it's just reported by the device
16:37bbrezillon: the right NIR op I mean
16:37karolherbst: jenatali: okay... CL has a couple of those "default" thingies in the API and I always fear that something is configurable...
16:38karolherbst: but this is also means that somebody thought at some point of adding a toggle ...
16:38jenatali: For the full profile, the mandated minimum floating-point capability for devices that are not of type CL_DEVICE_TYPE_CUSTOM is: CL_FP_ROUND_TO_NEAREST | CL_FP_INF_NAN.
16:39bbrezillon: My question was more, what do we expect nir_u2f to do with regards to rounding
16:39karolherbst: bbrezillon: maybe we should just name them all explicitly
16:39karolherbst: and emit the correct thing
16:40karolherbst: in vtn
16:40bbrezillon: is it "do as the default says" or is it implicitly "rtne"
16:40jenatali: karolherbst: I think the "default" rounding mode is that #pragma that you mentioned
16:40bbrezillon: karolherbst: yep, I think that'd be preferable
16:40karolherbst: jenatali: that pragma doesn't exist
16:41bbrezillon: karolherbst: at the same time, I don't want to modify all the code using the non-specialized converters :-(
16:42bbrezillon: unless that's a mechanical s/nir_x2f/nir_x2f_rnte/ change
16:42karolherbst: or we keep the name and just handle it
16:43karolherbst: we have it explicit for fp16 though
16:43jenatali: Default rounding mode is rte
16:43bbrezillon: jenatali: not sure I understand what the vectorizer test expect (build failure on my MR)
16:43bbrezillon: maybe I should drop those opt passes
16:44jenatali: bbrezillon: Looks like something got copy-prop optimized when I took a quick look, but I didn't compare to the before of that test to see what it used to look like
16:44bbrezillon: the only one I need is '(('unpack_64_2x32', ('pack_64_2x32_split', a, b)), ('vec2', a, b))'
16:45karolherbst: I remember clinfo reporting it differently in the past.. weird
16:45karolherbst: oh well
16:45jenatali: karolherbst: From the up-to-date spec: Round to nearest even is currently the only rounding mode required by the OpenCL specification for single precision and double precision operations and is therefore the default rounding mode.
16:46karolherbst: jenatali: yeah.. I was more thinking that devices _could_ report a different rounding mode as the default one
16:46karolherbst: but maybe I am mistaken
16:46jenatali: karolherbst: Not for full devices at least
16:51karolherbst: yeah.. seems like it's not configureable.. okay, cool
16:51karolherbst: and I'd really ignore this silly pragme unless something really uses it
16:52karolherbst: only think is I'd prefer to not have to emit more than two ops for the conversions :p
16:52karolherbst: and I think we can implement very in either one or two conversion ops
16:52karolherbst: nvidia hw doesn't support directl fp64 to int8 conversions eg.. so it has to be split up in two
16:53karolherbst: like fp64 -> int32 -> int8
16:53bbrezillon: karolherbst: panfrost can only do one step at a time IIRC
16:53karolherbst: bbrezillon: okay.. but at least it supports all the modes and sat, right?
16:53bbrezillon: so it's probably a pretty common limitation
16:53karolherbst: so you'd have 4 at most
16:54bbrezillon: that I didn't check, but I remember alyssa mentioning it does support round and sat modifiers
16:54karolherbst: ahh yeah
16:54karolherbst: I remember now
16:55karolherbst: so yeah, there is support for it :p
16:55karolherbst: it does make sense to have it native in hw if the hw is for compute workloads
16:57karolherbst: bbrezillon: I am actually wondering if we want to add a rounding mode field on nir_alu_instr....
16:58karolherbst: it seems like we have a round mode on all float alu instructions
16:58karolherbst: all 4 modes even
16:58karolherbst: well.. 8
16:59karolherbst: but you can ignore the other 4
16:59bbrezillon: I count only 4
16:59bbrezillon: what are the other 4?
16:59karolherbst: round to X int
17:00bbrezillon: I thought those were separate instructions
17:00karolherbst: it's for alu
17:00karolherbst: like you do a fadd and want it to have it rounded to int
17:00karolherbst: so you get 4.0 instead of 4.2 or something
17:01bbrezillon: hm, so HW support that as a rounding mode, interesting
17:01karolherbst: I assume it's super cheap to implement
17:01karolherbst: and it just takes a bit in the encoding space
17:02karolherbst: I am not sure how well it's supported.. just that there are some bits in the ISA for that
17:02karolherbst: like it seems like we don't emit it for fadd, but fmul/ffma
17:03karolherbst: ehh.. even quadops have a rounding mode
17:14karolherbst: after some thinking I think we need to be a bit more explicit about it ...
17:14karolherbst: so a rounding mode on all alu instructions is not a good idea
19:01airlied: MrCooper: doh, I knew sending that I'd sone something wrong, but my brain wasn't cooperating
19:49vivijim: agd5f_: airlied: drm-tip build breaks with some broader config...
19:50vivijim: drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c:1357:2: error: implicit declaration of function ‘drm_gem_object_put_unlocked
19:50vivijim: offending patch: fd9a9f8801de ("drm/amdgpu: Use GEM obj reference for KFD BOs")
19:55danvet: vivijim, it's a merge conflict
19:55danvet: airlied should be fixing it when pulling in the drm-misc-next pull from mlankhorst
19:56airlied: yeah that'll be Monday because I don't hate myself that much
19:59vivijim: cool then... no rush
20:22mdnavare: hwentlan: Coould you or Nicholas take a look at this patch: https://patchwork.freedesktop.org/patch/371201/?series=78279&rev=3 , we have pulled out the vrr_range from amdgpu and made it common in drm_debugfs
20:28buckley310: Ever since my distro upgraded from Mesa 20.0.2 to 20.0.7, my GPU locks up whenever I open Steam. Downgrading to 20.0.2 fixes it, but upgrading to 20.1.1 doesn't make a difference. issues/3132 looks like a similar hang, I also use a 5700XT, but the trigger conditions are very different. What should I do D: (http://ix.io/2pCg)
20:33Lyude: buckley310: #radeon can probably help a bit more here, and you should probably also file a bug here (not sure where the proper issue hub for radeon is these days, but they can probably tell you)
20:33buckley310: ok ill ask over there. thanks
20:35buckley310: so if i should file a mesa bug, should i include any logs other than the linked text?
20:41Lyude: buckley310: yeah but I don't know exactly what yet (haven't worked on mesa in a while), that log is probably a good start at least
20:50karolherbst: jekstrand: is there a way of converting loops in nir so that each loop has no explicit continue clauses?
20:50karolherbst: I think I need it for volta+
20:51anholt_: I don't think there's an existing pass for that.
20:52karolherbst: the issue is, we have only plain jumps
20:52karolherbst: and we need to handle all divergency ourself
20:52karolherbst: no special jumps for breaks/conts or anything
20:53karolherbst: and we need to setup cfg barriers to sync on
20:53karolherbst: so if you have a loop with 3 explicit continues, you'd have to set up 4 barriers, one for each contniue and one to sync up after the loop
20:54karolherbst: and converting a loop with multiple continues into multiple loops with none would make that kind of easier
20:54karolherbst: so we only need to set the barrier before the loop
20:54karolherbst: and sync on it after the loop
21:04karolherbst: mhh.. but that would make breaks quite annoying to handle...
21:12jekstrand: karolherbst: It shouldn't be too hard to handle. It'd be roughly the same algorithm we use for handling early returns in functions
21:22jekstrand: But likely predicating everything left in the loop isn't actually what you want. That'd get you some very messy control-flow.
22:19karolherbst: jekstrand: yeah.. I think I will just insert some fake blocks when converting to our backend IR and insert the barriers there..
22:19karolherbst: otherwise I'd have to deal with the loop breaks and make it break multiple loops at once
22:19karolherbst: and stuff...
22:21karolherbst: and I already know where all breaks/continues are.. so that shouldn't be a big issue