07:03 tomeu: MrCooper: anholt: once we merge https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3295, then we can enable 1/5th of gles3 tests in CI for panfrost so alyssa can move full speed on gles3 support :)
09:14 j4ni: mdnavare: fyi https://gitlab.freedesktop.org/drm/intel/issues/932
09:15 j4ni: agd5f_: ^
09:16 j4ni: it's now filed against i915, but it's drm dsc helper issue really
10:37 ickle: RIP: 0010:drm_atomic_bridge_chain_check+0x2f/0x2e0
10:40 ickle: bbrezillon: ^^
10:43 bbrezillon: ickle: where can I find the bug report?
10:44 ickle: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_7691/filelist.html
10:44 ickle: pick any pstore
10:44 ickle: https://intel-gfx-ci.01.org/tree/drm-tip/combined-alt.html? for the full picture
10:56 bbrezillon: ickle: I guess there's no way to feed a patch to the CI without pushing to drm-misc
10:56 ickle: you can send to intel-gfx@lists.freedesktop.org (or intel-gfx-trybot@)
11:02 bbrezillon: ickle: oh, nice!
11:50 bbrezillon: ickle: hm, I was not registered to the -trybot ML
11:51 ickle: it's in patchwork, https://patchwork.freedesktop.org/series/71693/
11:51 ickle: I've asked Tomi to re-enable CI so it can be run
11:51 bbrezillon: great!
12:27 Adrinael: bbrezillon, "Participating hosts (2 -> 43)
12:27 Adrinael: CI is happy
12:48 tomeu: Adrinael: what's the recommended way of making sure IGT tests use the right DRM device? I have KMS tests failing here because are trying to use the panfrost device
12:49 Adrinael: tomeu, there's --device now, and IGT_DEVICE_CANTREMEMBERNAME
12:49 Adrinael: tomeu, see tools/lsgpu for syntax for the selection
12:50 tomeu: Adrinael: so no auto-detection?
12:50 Adrinael: tomeu, without any hints it uses the first /dev/dri/card[0-9] that matches the DRIVER_ flag used for drm_open_device
12:51 Adrinael: IGT_FORCE_DRIVER=drivernamehere does name-based selection for DRIVER_ANY tests
13:00 tomeu: Adrinael: ok, this is for IGT, so I guess something either in IGT or above it should auto-detect the devices
13:01 tomeu: sorry, for kernelci
13:01 Adrinael: What do you mean by autodetect
13:02 tomeu: if the test targets the KMS API, then igt core would open the first device that can make modesetting
13:03 Adrinael: The best the current architecture can do is open the device with the requested driver, and then check if it can do the test
13:04 Adrinael: How do you even know if a dri dev node can do modesetting or not? Without opening and trying
13:14 ickle: bbrezillon: can you please push the fix?
13:16 bbrezillon: ickle: sure
13:21 bbrezillon: ickle: hm, there's a conflict with drm-next
13:22 bbrezillon: "dim: FAILURE: Could not merge drm/drm-next"
13:23 ickle: not me for a change; waitasec to see if that resolves it
13:24 ickle: ok
13:28 bbrezillon: ickle: don't know what you did but it passes now
13:37 dolphin: bbrezillon: I was in the middle of merge
13:38 ickle: oh well
14:10 alyssa: Out of curiousity, does any non-mali hardware have native ops for format conversions?
14:11 HdkR: f2i, and i2f?
14:11 HdkR: or texture fetch format conversion?
14:11 alyssa: Independent from the texture/fbo stuff, just like .... R11F to rgba16f
14:11 imirkin_: nvidia has I2F variants which convert a specific byte of a 32-bit int value to unorm
14:11 alyssa: (and vice versa)
14:11 alyssa: RGBA4 to RGBA16F and back, etc.
14:11 alyssa: un/packing stuff
14:12 imirkin_: every GPU has 11f -> 16f conversion -- it's called "shl" :)
14:12 imirkin_: or rather, "extbf"
14:12 imirkin_: (extract bitfield)
14:13 alyssa: imirkin_: Still ends up being a bunch of ALU instructions vs one intrinsic
14:13 imirkin_: extract bitfield is one ALU op
14:13 imirkin_: or you mean one per channel? sure.
14:13 alyssa: Oh, sRGB->RGB/vice-versa
14:13 imirkin_: but in scalar architectures (the vast majority of GPUs out there), it's not a big deal
14:13 alyssa: I guess there's no reason to worry about this when you're not literally implementing an ES3 blend pipeline in software :V
14:14 imirkin_: right, blending is always handled in hardware. mali is special.
14:14 alyssa: Special, yes.
14:15 imirkin_: although like advanced blend is implemented in "software", and yeah, it's a ton of operations
14:15 imirkin_: usually more than the rest of the frag shader :)
14:16 HdkR: Hold on while I implement photoshop burn in my blend unit
14:16 imirkin_:is holding
14:16 HdkR: :P
14:16 imirkin_:is still on hold
14:17 HdkR: Too bad, you'll never get it
14:17 imirkin_: i'll make a gpu which ONLY supports advanced blend :p
14:17 HdkR: =O
14:17 imirkin_: [and no shaders]
14:17 HdkR: Madness
14:17 imirkin_: so you have to just cleverly supply inputs into the blend unit in order to get the computations you need
14:18 imirkin_: like clipping on pre-nv30
14:19 HdkR: ALUs are just an extension of the blend unit
14:19 HdkR: It's the programmable bit
14:19 HdkR: :P
14:34 alyssa: Tenth law of graphics: all graphics systems will expand until they implement half of Krita poorly
14:35 alyssa:guesses she'll be at this a while
14:36 alyssa: Conceptually I'm trying to add un/pack_color_pan intrinsics that take a format and a colour and make magic happen
14:36 alyssa: Given NIR and the hw, this might be a bit of friction
14:56 bbrezillon: danvet, imre: do we have a strong reason to keep drm_kms_helper as a separate module?
15:01 danvet: bbrezillon, it provides solid motivation to not muddle the midlayer
15:02 tomeu: Adrinael: yeah, was thinking that when igt is being asked for a modesetting device, that it would try the next one if DRM_IOCTL_MODE_GETRESOURCES returns -1
15:03 Adrinael: tomeu, that breaks reproducability though
15:04 tomeu: Adrinael: in what way?
15:04 Adrinael: tomeu, in multi-device systems you really should lock on which device you're using to properly detect the case of the driver absolutely b0rking instead of silently using a different one and saying there's no changes
15:04 tomeu: well, picking the first device is already problematic, as that depends on the probe order
15:05 Adrinael: That too, the first node might change
15:06 bbrezillon: danvet: well, now we have a circular dep between drm.ko and drm_kms_helper.ko (as reported by imre in his reply to "drm/bridge: Add a drm_bridge_state object")
15:06 tomeu: Adrinael: yeah, meant to say that it would not be adding that much reproducibility
15:07 danvet: bbrezillon, you screwed up?
15:07 Adrinael: tomeu, but when you use --device, you're not just picking the first node
15:07 bbrezillon: danvet: I don't think I can do any better if I want to use the private obj helpers to implement the bridge state
15:07 danvet: bbrezillon, and yes those dep loops are exectly the bad midlayer smell
15:08 bbrezillon: which is what you recommended to do
15:08 danvet: bridges should be in the helpers
15:08 danvet: but that's a serious can of worms too
15:09 danvet: bbrezillon, so build is broken now, at least the modular one?
15:09 bbrezillon: danvet: yes
15:09 danvet: step 1 revert
15:09 danvet: and we need to get this compile testing sorted somehow
15:09 danvet:pining for gitlab, but oh well
15:10 tomeu: Adrinael: ah yeah, when a device is explicitly passed, no retries should happen
15:12 bbrezillon: danvet: that's weird 0day bots didn't catch that
15:12 danvet: bbrezillon, 0day is dead
15:13 danvet: and you're supposed to compile test before pushing, not praying for 0day to test afterwards
15:14 bbrezillon: I do push to my own tree that's supposed to be scanned by 0day bots (which I didn't know were dead)
15:14 danvet: well it's not completely dead
15:14 danvet: but by all practical purposes, it's daed
15:14 bbrezillon: and I don't compile test all possible combinations, that's for sure
15:14 danvet: there's 3 defconfigs for -misc
15:14 danvet: (which everyone ignores, but well)
15:15 bbrezillon: danvet: ok, so revert
15:15 bbrezillon: then, what's the plan?
15:16 bbrezillon: narmstrong: FYI ^
15:16 danvet: bbrezillon, hm where/how does it fail?
15:17 narmstrong: bbrezillon: is it easily fixable ? otherwise revert
15:18 bbrezillon: danvet: __drm_atomic_helper_bridge_duplicate_state() (drm_bridge.c which is linked in drm.ko) calls __drm_atomic_helper_private_obj_duplicate_state() (which is part of drm_kms_helper.ko)
15:18 danvet: uh why is a _helper_ function in drm.k
15:18 bbrezillon: because other bridge helpers where there IIRC
15:19 danvet: all the other state helpers are in drm_atomic_state_helper.c
15:19 danvet: yeah bridge is a mess wrt the core/helper split
15:19 bbrezillon: s/where/were/
15:19 danvet: bbrezillon, if it's just moving a few functions then just do the fixup patch
15:19 bbrezillon: so, it's all about moving atomic helpers to drm_atomic_state_helper.c?
15:20 bbrezillon: danvet: ok
15:21 danvet: bbrezillon, yeah from a quick look moving these 2 should be enough
15:21 danvet: since private state is core stuff actually
15:23 bbrezillon: that's weird
15:23 bbrezillon: I just checked my .config and both DRM and DRM_KMS_HELPER are modules
15:23 danvet: yeah it doesn't fail while compiling, only while installing
15:24 danvet: so compile test target wouldn't even have caught this
15:24 bbrezillon: hm, that's why
15:24 danvet:cries harder
15:25 danvet: mlankhorst_, your pull got caught in this :-(
15:26 bbrezillon: danvet: shouldn't be hard to add a modules_install target once you have the rest of the CI ready
15:28 tomeu: Adrinael: hmm, or maybe we should always set that a device supports the modesetting API in the regular case, and change GPU drivers to use drm_open_driver_render
15:30 agd5f_: hwentlan, https://gitlab.freedesktop.org/drm/intel/issues/932
15:32 jekstrand: bnieuwenhuizen, tarceri: Does Shadow of the Tomb Raider work on RADV?
15:33 jekstrand: The port, not the DXVK version
15:38 hakzsam: jekstrand: should work
15:39 jekstrand: hakzsam: Ok....
15:39 jekstrand: hakzsam: I think I found an app bug. I've got an e-mail with Feral about it.
15:39 jekstrand: Somehow it works on our older hardware but not on newer stuff :/
15:39 hakzsam: what sort of bug?
15:39 jekstrand: It uses shared memory with control barriers but no memory barriers
15:40 hakzsam: interesting find
15:41 jekstrand: In GLSL, barrier() is supposed to implicitly do a shared memory or TCS output barrier
15:41 jekstrand: Not so much in SPIR-V
15:41 hakzsam: I don't own the game myself, so can't test. But I remember a phoronix article about it and I'm quite sure other radv devs tried that game
15:43 jekstrand: Yeah, phoronix says it works
15:43 jekstrand: And it works on our older HW
15:43 jekstrand: Kayden: I think I may have found many of our Gen11 compute shader bugs ^^
15:44 bbrezillon: mlankhorst_: there are 2 other fixes I'd like to push before you prepare a new PR
15:45 bbrezillon: "drm/vc4: dsi: Fix bridge chain handling" and "drm/exynos: dsi: Fix bridge chain handling"
15:50 bnieuwenhuizen: jekstrand: haven't tried that recently but IIRC I tried it in 2019Q4 and it worked on RADV
15:50 jekstrand: bnieuwenhuizen: Yeah, it may be that your hardware does that automatically
15:55 bnieuwenhuizen: jekstrand: yeah, seems like all our HW does (and when it doesn't the LLVM backend will add it >_<)
15:55 bbrezillon: danvet: hm, now I don't know where to put the 'select DRM_KMS_HELPER' since there's no DRM_BRIDGE Kconfig entry
15:56 bbrezillon: my bad
15:56 danvet: in your driver?
15:56 danvet: that uses the helpers
15:56 bbrezillon: helpers are used by the core too
15:57 danvet: uh that's still backwards then
15:57 bbrezillon: so, there's a DRM_BRIDGE option
15:57 bbrezillon: but drm_bridge.c is always compiled
15:58 danvet: another patch to shred that option?
15:58 danvet: bbrezillon, drm_atomic_default_bridge_duplicate_state <- this also needs to be moved to the helpers
15:58 danvet: and made an EXPORT
15:58 danvet: plus drivers updated
15:59 danvet: exact same dance we do for all the others
15:59 danvet: others = plane/crtc/connector
15:59 bbrezillon: does that mean I shouldn't provide a default when it's not implemented?
15:59 danvet: yeah that's one outcome
16:00 bbrezillon: ouch! that's likely to involve patching all bridge drivers
16:01 danvet: ok I guess we're in revert territory then
16:01 narmstrong: erf
16:02 danvet: consistency with how this stuff works for plane/crtc/connector definitely wins
16:02 danvet: (we might want to change that, but that's a bigger series)
16:03 danvet: plus I think this is holding up intel-gfx-ci, so we really don't have the time to muse over this, even for just a day
16:03 danvet: bbrezillon, narmstrong you're taking care of this, including cross ack to get the revert landed asap?
16:04 danvet:kinda busy with trying to catch up on mails
16:05 narmstrong: Yup I’ll ack once the reverts are on the ml
16:06 AndrewR: Hi all. For some reason recent mesa + piglit fail to build one test for me: /dev/shm/piglit/tests/spec/ext_disjoint_timer_query/simple-query.c:100: undefined reference to `glGetInteger64vEXT' (mesa git-dd09f1d806, piglit 4f9e5f783b9c359d22e9ae27b54b2e940429ead2
16:06 danvet: bbrezillon, [PATCH 0/4] Complete ddc symlink addition <- you're taking care for merging, looks all ready?
16:06 danvet: same company and all
16:12 bbrezillon: narmstrong: done
16:13 bbrezillon: can you apply them (I'm AFK for the next 2 hours)?
16:19 danvet: tomba, drm/omapdrm: use BUG_ON macro for error debugging. <- should I just stuff that one into drm-misc-next?
16:34 narmstrong: bbrezillon: got to go home, can’t reply to ml until a few hours, but you have my ack from irc
16:39 robher: sravn: Since tagr is on board, are you planning to add the rest of the compatible strings for simple panel schema? It's probably easiest to extract them all from my patch.
16:56 andrzej_p: danvet: I can merge the [PATCH 0/4] Complete ddc symlink addition; but as far as I can tell the maintainers didn't say neither yes nor no
17:03 tomba: danvet: yes, that's fine.I can also pick and push it later this week. got back to work today.
17:25 cmarcelo: jekstrand: I agree a "just control barrier" intrinsic must exist (always assumed current NIR intrinsic intention was to be that -- which is arguable), but does the "this is a glsl barrier" should also exist? or we could just lower GLSL's ir_barrier into the low-level + the memory(ies) barrier?
17:25 jekstrand: Ideally, I'd say we should just lower in glsl_to_nir. The real question is how many drivers we have to fix if we do that.
17:26 jekstrand: Probably radeonsi, nouveay, ir3, v3d, and Intel.
17:27 jekstrand: Of course, we could say that's what it means, add extra memory barriers in glsl_to_nir, and let them delete any unneeded code.
17:27 jekstrand: Worst case, they end up with extra synchronization
17:27 imirkin_: what's the issue?
17:27 cmarcelo: jekstrand: if you lower to nir_barrier + nir_memory_barrier_shared, not sure what need to be changed in the drivers.
17:28 jekstrand: imirkin_: Trying to separate the memory and control bits of barrier()
17:28 jekstrand: cmarcelo: Yeah, could do that
17:28 imirkin_: control like control flow?
17:28 cmarcelo: jekstrand: semi related => https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3224
17:29 jekstrand: imirkin_: barrier() in glsl is a control-flow barrier AND a memory barrier: shared for compute and patch output for TCS.
17:29 jekstrand: In SPIR-V, OpControlBarrier can be just a control barrier
17:29 anholt_: cmarcelo: looking at your MR, I was hoping that it could be reworked to be "these barriers lower to these groups of HW barriers, and remove redundancies accordingly"
17:30 cmarcelo: anholt_: I could generalize to that if would be useful...
17:30 imirkin_: afaik if you want to do that in C, you put in a "compiler" barrier
17:30 imirkin_: which is essentially a memory barrier
17:30 jekstrand: imirkin_: This isn't C
17:30 imirkin_: that doesn't reference "real" memory
17:30 imirkin_: i know, but it can still be useful to know how others do it
17:31 jekstrand: cmarcelo, anholt_: Yeah, I think it'd be useful to lower all of the other barriers to scoped_barrier.
17:31 jekstrand: imirkin_: I'm really not sure what you're suggesting.
17:31 cmarcelo: jekstrand: *that* would need some drivers changes (-:
17:32 jekstrand: cmarcelo: Only if it's non-optional
17:32 cmarcelo: jekstrand: how so?
17:32 imirkin_: jekstrand: creating a new barrier type
17:32 imirkin_: and saying that all memory barriers are control flow barriers
17:32 imirkin_: (which i think is right, no?)
17:32 jekstrand: cmarcelo: What I was suggeting to handle what I thought anholt_ was suggeting is to first lower everything to scoped and then remove barriers. That way we can get the optimization even in GLSL land.
17:33 jekstrand: imirkin_: I guess in some sense that's true. Maybe?
17:33 jekstrand: glsl barrier() is more like a semaphore wait
17:33 imirkin_: so you have memory_barrier_global, shared, etc, and then make one like "control" or something, which is the lowest level of barrier
17:34 imirkin_: heh
17:34 imirkin_: oh right
17:34 imirkin_: that barrier.
17:34 jekstrand: yeah...
17:34 imirkin_: sorry, i got confused.
17:34 imirkin_: i was thinking control flow barrier for like ... compiler opts
17:34 imirkin_: yeah, the waiting barrier is totally different. ignore everything i said =/
17:34 jekstrand: :P
17:34 anholt_: jekstrand: that looks really reasonable at first glance
17:35 anholt_: (though I don't understand the acquire/release semantics related to HW I care about yet)
17:35 jekstrand: hehe
17:38 cmarcelo: jekstrand: back to control, still want to have a nir_intrinsic for the GLSL control barrier()?
17:38 jekstrand: Unsure
17:41 jekstrand: cmarcelo: I think what I'd like to see is us add memory semantics to barrier() and then add a pass which splits it into a no-memory control barrier and a properly split pair of scoped barriers.
17:41 jekstrand: Or maybe not
17:41 jekstrand: I don't know
17:41 jekstrand: We can also just add a pair of shared_barrier around the control barrier and let others write a pass to detect that and delete the redundant barriers if they want.
17:42 jekstrand: I guess the real question is if there are people who can bake barriers into other instructions easily
17:44 cmarcelo: I'd like to eventually move everything to the use scoped_memory_barrier (GLSL and the non VkMM SPIR-V), dropping the other memory barriers, and rename scoped_memory_barrier to memory_barrier. perhaps I could try that.
17:46 jekstrand: Ok, I think what I'm going to do right now is try to finish up this MR that adds the workaround for old GLSLang
18:08 sravn: robher: plan to push my current set of patches, and then start by adding compatible from pending panles. When this is done somehow pull out all the simple panel compatible from your pathc, including descriptions.
18:09 sravn: robher: Before pushing my patches I need a-b from you on the new set (of two patches). One detail I like you to look at is if the comment for each compatible is the right way to do it.
18:09 robclark: sravn: when you have something, do you mind pushing a branch somewhere that Bjorn and I can rebase on?
18:10 anholt_: bbrezillon: I don't think I'm going to have time to make sense of the bridge stuff. ack, I guess, if you're sure it fixes things.
18:11 bbrezillon: anholt_: it fixed things for the exynos driver
18:12 bbrezillon: and I'm sure it's currently buggy
18:29 anholt_: Kayden: made piglit MR 206, but still not finding how I broke i965
18:37 jekstrand: Kayden: I wonder if the missing barriers explain the TCS 8-patch failures we were seeing on gen9
18:56 danvet: andrzej_p, imo all ok to merge
18:56 danvet: drm-misc is also serving as fallback for maintainers that don't respond for subsystem-wide refactorings and stuff
18:56 danvet: but maybe get bbrezillon to push stuff for more blame distribution :-)
18:59 bbrezillon: danvet: blame distribution, looks like all the blame is on me recently :)
18:59 danvet: :-)
19:03 bbrezillon: danvet, andrzej_p: I'll apply tomorrow
19:03 bbrezillon: hope that's not too late
19:29 andrzej_p: danvet: pushed
19:40 sravn: robclark: at the moment I have nowhere I can push this, and I will get back to t later this week. So we can move forward with the pending panel patches (4 at the moment)
19:44 cmarcelo: jekstrand: (or perhaps bnieuwenhuizen): a SPIR-V question does it make sense a OpSelectionMerge refering as a merge block to one of the branches in a OpConditionalBranch? In a /structured/ control flow is not clear to me how one of the branches could be also a merge block. an example here:
19:44 cmarcelo: https://github.com/KhronosGroup/VK-GL-CTS/blob/master/external/vulkancts/data/vulkan/amber/graphicsfuzz/control-flow-switch.amber#L143
19:45 jekstrand: That's a "fun" question
19:45 jekstrand: Which no one can answer
19:46 cmarcelo: (we are crashing in this particular one, haven't figured yet if due to that fun or some other fun down the shader)
19:46 jekstrand: cmarcelo: Oh, that case actually looks pretty straightforward
19:46 jekstrand: It's just an if
19:46 jekstrand: cmarcelo: I had a branch to try and make this stuff better...
19:48 jekstrand: cmarcelo: That one looks like it's just "if (foo) continue;" which should work :/
19:48 cmarcelo: jekstrand: there's no path from 130 to 129, how is 129 a "merge block"
19:49 jekstrand: cmarcelo: Good question. :-)
19:49 jekstrand: cmarcelo: Unfortunately, SPIR-V doesn't have rules for when you can and must and must not insert merge blocks.
19:50 bnieuwenhuizen: cmarcelo: where does 130 go to?
19:50 jekstrand: bnieuwenhuizen: It jumps to the loop continue
19:51 cmarcelo: bnieuwenhuizen: 120 -> 95... 129 eventually goes to 120 too.
19:51 cmarcelo: I suspect the "right" thing would be the merge block to be set as 120.
19:52 cmarcelo: jekstrand: I thought whether or not we have a OpSelectionMerge is underspecified, but once we had one, the rules should cover it...
19:53 jekstrand: krh: Do you have tessellation working in ir3 yet?
19:53 jekstrand: cmarcelo: maybe? I'm not sure what those rules are
19:53 robclark: sravn: ok, then ping me when you have something.. I can poke Bjorn to rebase his two panel drivers too
19:53 jekstrand: robclark: Do you have tessellation working yet?
19:53 robclark: jekstrand: yes
19:56 bnieuwenhuizen: cmarcelo: still figuring out how to defend my intuition by spec, but by my intuition a merge block is something that post-dominates the conditionalbranch when everything postdominated by a break/continue is excluded, and this seems to qualify?
19:58 jekstrand: cmarcelo: Does the Vulkan MM apply to TCS patch outputs?
19:58 bnieuwenhuizen: and since one side does an unconditional break (130 is a break block), what other block would you think is a merge block?
19:58 jekstrand: cmarcelo: Like are you supposed to be able to do the same data passing stunts?
19:59 cmarcelo: correction: it can't be 120 because that's already a merge block for something else.
20:00 jekstrand: robclark: Do you need memory barriers for TCS patch outputs?
20:00 jekstrand: or krh
20:00 bnieuwenhuizen: cmarcelo: https://pastebin.com/WnhqnU24
20:03 cmarcelo: bnieuwenhuizen: I see, so we exclude the break/continue, and in practice there's only one "branch" to be considered for merging.
20:05 bnieuwenhuizen: yeah
20:05 cmarcelo: bnieuwenhuizen: but I also can't find the text that justifies this. :-/
20:05 krh: jekstrand: yes, it's all done
20:05 bnieuwenhuizen: cmarcelo: search for "the only blocks in a construct that can branch outside the construct are" in the SPIR-V spec section 2.11
20:05 krh: jekstrand: no
20:06 krh: jekstrand: afaict, adreno supports up to 32-wide dispatch, which means that all tcs threads always run in the same wavefront
20:06 jekstrand: right
20:07 bnieuwenhuizen: cmarcelo: the construct consists of just block 122, 130 is a break block, so the 122->130 jump satisfies "a break block for the innermost loop it is nested inside of", and 129 satisfies "a block branching to the construct’s merge block "
20:08 jekstrand: cmarcelo: I think we're missing a bunch of barrier support places...
20:08 cmarcelo: jekstrand: I think MM applies with the new OutputMemory storage class... there's text describing the non-MM behavior https://www.khronos.org/registry/vulkan/specs/1.1-extensions/html/vkspec.html#memory-model-tessellation-output-ordering
20:08 jekstrand:will send patchse.
20:08 anholt_: krh: the closed vulkan driver reports a subgroup size of 64, fwiw
20:08 bnieuwenhuizen: ugh the scope of construct may be wrong with the spec: "a selection construct: the set of blocks dominated by a selection header, minus the set of blocks dominated by the header’s merge block " <- should include 130
20:10 bnieuwenhuizen: I don't think that definition makes sense at all in the presence of breaks and continues
20:12 cmarcelo: bnieuwenhuizen: the text "the only blocks in a construct that can branch outside..." does allow us to have the break blocks, but doesn't seem to me restrict the merge block definition.
20:13 bnieuwenhuizen: cmarcelo: a merge block in itself has no constraints on it right?
20:15 bnieuwenhuizen: cmarcelo: the merge block itself is not defined as post-dominating of any kind, it just has to be chosen by the compiler such that all the other rules about what is contained in the construct and what jumps out of the construct etc. hold
20:16 bnieuwenhuizen: (unless it is defined and I'm missing it?)
20:21 cmarcelo: bnieuwenhuizen: I guess I read to much into "a merge block where the control flow subsequently converges".
20:24 cmarcelo: bnieuwenhuizen: why do you think 130 should be out of the selection construct?
20:24 bnieuwenhuizen: cmarcelo: I don't anymore, see [9:12:05 pm] <bnieuwenhuizen> ugh the scope of construct may be wrong with the spec: "a selection construct: the set of blocks dominated by a selection header, minus the set of blocks dominated by the header’s merge block " <- should include 130
20:27 bnieuwenhuizen: cmarcelo: on the other hand, by this definition 120 should also be in the construct as it is not dominated by 129, which seems ... weird
20:29 cmarcelo: looking at the spirv issues, I think there was some discussion last year about this...
20:40 krh: anholt_: it probably multi-issues under the hood and can vary the width depending on the dispatch mode?
20:42 cmarcelo: bnieuwenhuizen: 120 being in the construct is indeed weird. it is a side-effect of "pulling the merge block as up as possible", imagine this was a switch, one of the cases branch, but the other two not. we wouldn't be able to "pull up" the merge block, so it would end up being 120.
20:44 cmarcelo: jekstrand: "barrier support places", you mean considering the barrier in certain passes?
20:44 jekstrand: cmarcelo: I'll have a MR shortly
20:44 jekstrand: cmarcelo: And you can tell me why it's wrong. :-)
22:31 jekstrand: cmarcelo: https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3307
22:31 jekstrand: cmarcelo: I think I got things cleaned up a bit better at any rate
22:33 cmarcelo: jekstrand: "Commits (0)"?
22:35 cmarcelo: jekstrand: maybe a rebase gone wrong...
22:39 jekstrand: cmarcelo: Yeah, I was mid-rebase when I pushed. :/
22:40 mdnavare: vsyrjala: airlied: In detailed_dtata_monitor_range struct we declare u8 pixel_clock_mhz; /* need to multiply by 10 */ does that mean we need to multiply this value by 10 get pixel_clock in Mhz ?
22:49 Kayden: anholt_: I'm really not seeing it either. The atomics are clearly moved in the shader code by the right amount. And the binding table setup code seems to be swapping them...
22:49 Kayden: it sure looks like it ought to work
22:49 anholt_: Kayden: yeah. I haven't gone as far as dumping out the contents of the binding table, seems like the next step
22:49 anholt_: (is there an easy INTEL_DEBUG= for that?)
22:50 Kayden: It ought to happen in INTEL_DEBUG=color,bat ... |& less -FRS, but it isn't. I can try and fix that
22:50 Kayden: (pretty sure I -did- at one point, but I guess I accidentally tossed that instead of landing it...)
22:52 Kayden: ah, still have the patch in a branch
22:52 imirkin_: now just have to figure out which one :)
22:52 Kayden: git cherry-pick kwg/decoder-compute-counts will make it work
22:59 Kayden: http://whitecape.org/paste/bt-diff.txt - looks right to me...
23:00 Kayden: ssbo_start is 29
23:00 Kayden: and there are 8 of each
23:00 Kayden: and they're swapped
23:06 cmarcelo: jekstrand: odd that OpControlBarrier requires output memory synchronization. I guess because there wasn't an output memory storage class, OpControlBarrier was "patched" to include output memory barrier semantics (before MM happened)...?
23:08 jekstrand: cmarcelo: Yeah, it is odd but it's current with the most recent GLSL spec
23:08 jekstrand: cmarcelo: Also, GLSLang doesn't give us anything useful there yet. :(
23:11 imirkin_: cmarcelo: barrier() in TCS is basically designed to allow things after the barrier to read varyings written before the barrier
23:12 imirkin_: (both patch as well as regular varyings ... you can access other invocations' varyings in a TCS)
23:12 imirkin_: however i'm not sure it has the same memory flush semantics in a compute shader, tbh
23:14 jekstrand: It depends on whether or not you have memory there
23:14 chrisf: i thought it was completely relaxed in compute shader, and if you wanted a memory dependency you needed to insert appropriate memory barriers
23:14 jekstrand: For most people it was a no-op
23:14 chrisf: (at the glsl language level)
23:14 jekstrand: chrisf: Not for shared memory
23:14 Kayden: anholt_: the shader code looks right to me, too... the atomic messages in the loop used to be surface [29..36] and now they're [37..44]. the SSBO message at the top used to have an ADD 8D and now it doesn't
23:15 jekstrand: chrisf: See 1.1.2 https://www.khronos.org/registry/OpenGL/specs/gl/GLSLangSpec.4.60.html#changes
23:15 anholt_: Kayden: the umin()s scared me for a minute, but those appear to all be on array index before the shifting by the base.
23:15 chrisf: jekstrand: alright then
23:16 Kayden: anholt_: Yeah, I think that's just clamping array indexes before we offset
23:17 Kayden: there's just the one SSBO call too..
23:18 anholt_: do we have a backend no-optimization flag to try?
23:19 anholt_: (given that the NIR looks sane, and the backend code is totally shuffled, seems like the next likely place to hide a failure is in some busted opt)
23:21 cmarcelo: jekstrand: reviewed some, I'll get back to the other (TCS related) patches tomorrow.
23:21 jekstrand: cmarcelo: Sounds good
23:22 cmarcelo: jekstrand: after this series land I'm still inclined to propose an MR with the scoped memory barrier change I mentioned above.
23:22 Kayden: anholt_: no, I've just been hacking fs_visitor::optimize when I need to
23:22 Kayden: the backend code doesn't look all that shuffled to me though
23:23 jekstrand: cmarcelo: Which change was that?
23:25 anholt_: Kayden: side-by-side is working OK. The reg alloc changing meant my diff -u was *shrug*
23:25 cmarcelo: jekstrand: emit scoped barriers instead of the inividual barrier types, eventually get rid of the olders and rename scoped_mem_barrier to mem_barrier...
23:26 Kayden: anholt_: Yeah, definitely...side-by-side is way easier
23:27 Kayden: anholt_: sometimes I use dwdiff -c too
23:27 jekstrand: cmarcelo: Yeah, I'm a fan of that
23:28 jekstrand: cmarcelo: I'd really like to see the back-end implement two things: scoped_memory_barrier (which we can rename to memory_barrier) and control_barrier.
23:28 Kayden: (word diff with red/green replacements, so your register renumbering just shows both numbers on the same line, with the instruction still the same)
23:29 Kayden: huh, I think our printing might be busted for the SSBO write
23:30 Kayden: sends(16) nullUD a14UD g32UD g[a0]UD 0x00000080
23:30 Kayden: address register...14??? and g[a0]?
23:30 jekstrand: Kayden: Yeah, I have a patch for that
23:31 jekstrand: Kayden: https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3309
23:33 anholt_: Kayden: ooh. I like that tool
23:34 Kayden: jekstrand: thanks! that's much less crazy. the g[a0] is still weird though
23:34 Kayden: sends(16) nullUD g30UD g32UD g[a0]UD 0x00000080
23:34 jekstrand: Kayden: Yeah, that's a bit weird...
23:34 Kayden: there shouldn't be a src2
23:34 jekstrand: Kayden: We should probably just say a0 there instead
23:34 jekstrand: Kayden: It's an indirect send
23:35 Kayden: yeah
23:35 jekstrand: g[a0] is a bit wrong
23:35 Kayden: yeah, implies it's an indirect register read, which it isn't
23:35 jekstrand: I mean, that is what's in the source, but it's not really what we want to print
23:35 jekstrand: a0.0:UD would be better
23:41 Kayden: the last SSBO in the shader looks off, but I'm struggling at reading
23:42 Kayden: oh, there are two store_ssbo at the end
23:43 Kayden: I only see 1 though..
23:43 Kayden: I'm seeing 0x0000001dUD (BTI 29) and & 0xff...but no + 8 in either case
23:45 Kayden: oh, no, it's there
23:46 Kayden: in the old code, g5 = 8D, and add(16) g24<1>UD g5<8,8,1>D 0x0000001dUD { align1 1H };