00:09 jekstrand: karolherbst: I'm thinking instead of using vars_to_explicit_types, we probably want to re-use the thing I made for constants so we can also scrape out a blob of global initialization data.
00:09 karolherbst: mhhh. maybe
00:10 karolherbst: but it's a bit annoying as we kind of have to do it before we even have the nir
00:10 jekstrand: No we don't
00:10 jekstrand: We can create the global buffer the first time a kernel is created and then, every subsequen time, assert that the size and init data match.
00:10 karolherbst: right, it would be enough to just upload it with the first kernel we launch from that program
00:10 karolherbst: mhhh
00:10 karolherbst: that might work
00:10 jekstrand: It's not great, I'll grant, but I think it'll work.
00:11 karolherbst: do we want to add output arguments to the pass or where do we want to store the data then?
00:11 jekstrand: Maybe?
00:11 jekstrand: Or maybe we just want nir_shader::global_mem_init_data
00:12 karolherbst: yeah.. might work as well
00:17 jenatali: Sounds reasonable to me
00:32 karolherbst: oh wow, so do we have a non broken opengl implementation for macos now? :D
00:51 dcbaker[m]: You mean zink over moulton-vk :D
01:01 dcbaker[m]: Oh dang, that's exactly what it is. Lol
01:16 cmarcelo: jekstrand: jenatali: I'm likely missing something here. isn't is_in_set essentially just must?
01:16 cmarcelo: just _must_be
01:17 jenatali: cmarcelo: It's may_be, but asserting that if it is, that it also must_be
01:23 jenatali: cmarcelo: (A|B) is_in_set (A|B|C) without asserting, while (A|B|D) is_in_set (A|B|C) assert-fails
01:30 cmarcelo: (A|B) must_be (A|B|C) ~> true, (A|B|D) must_be (A|B|C) ~> false... is the difference only the assertion?
01:41 cmarcelo: jenatali: jekstrand: would be easier to see the relationship with _must_be if the return clause used the same value that is asserted (so same implm as _must_be for that case) OR if the helper was written in terms of maybe/must...
02:41 cmarcelo: jenatali: ok. I've find easier to think in terms of must with asserts, but I can see it is also in the same way a may_be with asserts. nevermind the suggestion above.
02:56 jenatali: cmarcelo: Yeah, originally the asserts were just inline, so the new helper is really just if (may_be) { assert(must_bet); return true; }
03:00 jenatali: jekstrand: Maybe writing it that way (^^) would make it clearer?
05:00 cmarcelo: jenatali: I like that suggestion.
07:53 igorkovalenko: hello for some reason I cannot add a label r600 to my merge request https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/7417 - is it a restricted action? or gitlab issie?
07:53 gitbot: Mesa issue (Merge request) 7417 in mesa "r600: amend space check for chips older than EVERGREEN" [Opened]
08:17 MrCooper: igorkovalenko: only project members with access level "Developer" or higher can set labels on MRs
08:21 igorkovalenko: MrCooper: thanks, new to gitlab
08:23 igorkovalenko: MrCooper: otoh https://docs.mesa3d.org/submittingpatches.html says this: Add labels to your MR to help reviewers find it
08:23 MrCooper: it's a good idea for those who can :)
08:26 igorkovalenko: I also see pipeline build fails, but from same document Patches should never introduce build breaks :)
08:28 igorkovalenko: this one https://gitlab.freedesktop.org/igor.v.kovalenko/mesa/-/jobs/5350111
08:48 MrCooper: yeah, an MR can only be merged if the pipeline is green
08:50 igorkovalenko: I'm sure breakage is not mine, as my change is only adding =0 to initialize a variable; and my local clang build is clean so likely the issue is with meson-clang pipeline step itself
08:51 igorkovalenko: I think many pipelines are just not started since they require manual step e.g. https://gitlab.freedesktop.org/mesa/mesa/-/pipelines/222721
08:52 igorkovalenko: I wanted to see how it works so I clicked each manual one, and found that meson-clang fails later in the pipeline
08:56 igorkovalenko: MrCooper: may I ask you to add r600 label to https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/7417
08:56 gitbot: Mesa issue (Merge request) 7417 in mesa "r600: amend space check for chips older than EVERGREEN" [Opened]
08:57 igorkovalenko: restarted pipeline and it passed now, looks like something transient
08:59 danvet: mripard, can you roll -fixes forward pls?
08:59 danvet:has maybe a patch
09:10 mripard: danvet: roll it forward to what?
09:10 mripard: drm-fixes?
09:13 danvet: mripard, drm-misc-fixes to -rc3
09:13 danvet: or so
09:13 danvet: so we're not stuck forever on -rc2 like last round
09:14 mripard: you mean in the future?
09:16 MrCooper: igorkovalenko: most pipelines intentionally don't run automatically to save resources
09:29 bbrezillon: mripard: I just pushed patches to misc-fixes, so please don't drop them in the process :-) (I know it happened in the past because of force-pushes, maybe that can't happen anymore)
09:34 mripard: bbrezillon: I'm not sure why it happened for you in the past, but that's definitely not the plan
09:49 danvet: mripard, oh I missed that you missed the -rc3 window already :-/
09:49 danvet: mripard, and yeah I think -fixes should be at most 2 weeks behind
09:49 danvet: otherwise things get needlessly complicated
09:49 danvet: so if you can't ff to -rc4, I'd just backmerge
09:49 mripard: danvet: hmmm
09:50 mripard: -rc2 was released yesterday?
09:50 bbrezillon: :D
09:50 danvet: the thing is, the early -rc tend to be garbage
09:50 danvet: mripard, -rc3 I mean
09:50 danvet: sry
09:50 danvet:not awake
09:50 bbrezillon: still, -rc3 is not out yet
09:50 danvet: yeah I'm waking up
09:50 danvet: :-/
09:50 danvet: mripard, it's all good, apologies for the confusion
09:50 danvet:goes and applies the patch
09:51 mripard: danvet: np :) I was seriously wondering what I did wrong or miss this time :)
10:11 pq: hmm, looks like COLOR_ENCODING propert values are undocumented
10:14 emersion: indeed
10:14 emersion: is it a standard drm prop?
10:14 pq: I think it is
10:14 pq: I started wondering which one of the BT.601 color spaces the 601 setting there means
10:15 lynxeye: MrCooper: what exactly does "Someone canceled the CI" mean? Just try again later?
10:15 MrCooper: yeah, sorry, I guess I was too slow retrying one of the jobs; I already reassigned the MR to Marge
10:16 pq: emersion, the issues is that BT.601 defines two different sets of RGB primaries: one for 525-line image and another for 625-line.
10:17 pq: so PAL vs. NTSC in practice or something?
10:17 pq: emersion, IOW, there are two different YUV<->RGB transformations I think
10:20 pq: hmm, or are there...
10:21 pq: vsyrjala, does the 525 va 625 line difference in primary chromaticities affect the RGB<->YUV transformation, or is the matrix always the same?
10:21 lynxeye: MrCooper: np, just wanted to know what to do, since I generally don't get in touch with the CI too much, so was a bit confused about the message
10:24 MrCooper: it was pretty accurate :) the pipeline was about to time out due to some kind of runner / network issue, I tried saving it by cancelling and retrying the affected jobs, but failed
10:26 danvet: tzimmermann, do you need any more review for your series?
10:26 danvet: or the igt side? (didn't see that on a quick look)
10:27 tzimmermann: danvet, i think the kernel code should be fine. the read/write changed slightly, but nothing big
10:28 danvet: poke me if you need an ack for the igt
10:28 tzimmermann: danvet, i'd appreciate if you coul dtake a look at the test cases
10:29 tzimmermann: i guess, they only really succeed with the kernel-side patchset applied
10:29 tzimmermann: everything else is luck
10:29 tzimmermann: but the tests succeeded locally when i tested them
10:29 tzimmermann: test url is in the cover letter
10:31 tzimmermann: danvet, BTW there's one build (containers:igt) in the igt MR that does not work. but i don't quite understand the error message
10:32 danvet: might be a gitlab error, no idea
10:32 danvet: Adrinael, ivyl https://gitlab.freedesktop.org/tzimmermann/igt-gpu-tools/-/jobs/5352496
10:32 danvet: tzimmermann, btw igt still using mailing lists and patches afaik
10:32 danvet: https://gitlab.freedesktop.org/drm/igt-gpu-tools/-/blob/master/CONTRIBUTING.md#sending-patches
10:33 Adrinael: Yes, patches on mailing list please
10:33 Adrinael: And that error message on the gitlab-ci is a server issue
10:33 danvet: reported on #freedesktop already?
10:35 tzimmermann: danvet, it's igt-dev@lists.freedesktop.org ?
10:36 danvet: the CONTRIBUTING link says so :-)
10:38 Adrinael: bleh can't find that particular one in issues
10:38 Adrinael: But we've been having container jobs fail occasionally with different error messages
10:38 Adrinael: Retry never seems to reproduce
10:50 MrCooper: Adrinael: any reason for not using the ci-templates machinery for building/copying the image?
10:52 Adrinael: ivyl, ^ do you have an answer for that, I forgot the details
11:00 ivyl: MrCooper, Adrinael: mostly because the templates were wayland templates back in the days + were lacking all the thing necessary to make our mailing list <-> gitlab ci glue work
11:00 ivyl: I am planning to look at that again soon, but kinda busy with Wine right now
11:01 MrCooper: sounds good
11:09 danvet: bbrezillon, for the drm/scheduler annotations one option might be merging it, but with an opt-in flag that drivers set
11:09 danvet: for now at least
11:48 linkmauve: kusma, since you seem to be the current maintainer of mesa-demos, would you be fine with switching the es2_* demos (and es1_* and possibly even some other ones) to either GLFW or Waffle, so that they can be used on more than just X11?
11:49 linkmauve: Maybe also moving off FreeGLUT, because its support of GLES isn’t very good, I noticed it doesn’t build if I don’t have GLES/gl.h (the GLESv1_CM header) present on my system. :/
11:52 emersion: ^ +1
12:10 danvet: tzimmermann, if you want to keep the common map/buf pointers in the fbdev igt
12:11 danvet: just pull the setup code into the first igt_fixture
12:11 kusma: linkmauve: I don't consider myself like a maintainer of mesa-demos at all, I just had a few itches there lately ;)
12:11 danvet: in case that's not clear from what I typed up
12:11 linkmauve: Ok. ^^
12:12 linkmauve: I guess I’ll scratch my own itches first and submit a MR.
12:12 kusma: :)
12:12 linkmauve: We can decide what to do after that.
12:13 kusma: linkmauve: In principle that sounds fine to me, though.
12:43 tzimmermann: airlied, danvet: is anything wrong with last week's PR for drm-misc-next ?
12:43 danvet: hm I thought airlied applied it, but I didn't check
12:43 danvet: there's also an imx one pending I think
12:44 tzimmermann: ok, i'll just wait then
12:44 tzimmermann: it's not applied AFAICT
12:44 danvet: yeah I thought it was for the backmerge
12:44 danvet: ah this very minute now also pending i915 pull
12:44 danvet: and the imx one is -fixes
12:45 tzimmermann: i was just wondering, because i wanted to prepare this week's update and got a few hundred patches
12:48 danvet: yeah maybe wait for tomorrow, should be done by done
12:48 danvet: *done by then
12:48 danvet: otherwise I can apply it
13:13 vsyrjala: pq: yuv<->rgb matrix is always the same
13:18 vsyrjala: pq: oh and bt.601 originally didn't even specify the chromacities. that stuff was only added in some late revision
13:18 pq: aha
13:19 pq: vsyrjala, now that I read the COLOR_ENCODING property description again, I don't actually understand it at all. What does it do?
13:20 vsyrjala: it selects the yuv<->rgb matrix
13:21 pq: does e.g. value BT.2020 in COLOR_ENCODING mean that both the input and output of the matrix are in BT.2020?
13:22 pq: what property determines whether the wire format is RGB or YUV?
13:23 pq: ...or is it assumed that RGB is always sRGB, and if you pick the BT.2020 RGB->YUV matrix, then it will also map from sRGB to BT.2020, i.e. a very small sub-set of the gamut?
13:24 pq: of the BT.2020 (hypothetical) gamut
13:34 vsyrjala: this stuff doesn't care about the gamut
13:35 vsyrjala: only thing that sort of matters is the trannsfer func, since the matrix operates on non-linear data
13:42 pq: what's the point of BT.601 vs. BT.709 vs. BT.2020 if not gamut?
13:46 pq: or is YCbCr merely a numerical encoding scheme rather than a color space, just like using e.g. 12-bit words to represent values 0.0 - 1.0?
13:57 mukapapa_: Hi, I recently wrote a drm driver for the mipi/spi-based ILI9163V display, based on tinydrm, and found it was mostly copy and paste. So I wondered what you think about building a generic mipi/spi driver that is configured via device tree/debug fs?
14:00 mukapapa: The idea would be that one can start with the generic driver, configure it for bringup via debugfs and if it works, just copy the settings to device tree.
14:00 vsyrjala: i've never seen a full explanation on how the coefficients were chosen. i suppose the gamut may have been one factor in that decision. maybe they also understood human vision a bit better in later years, so the luma vs. chroma split of the new coefficients is maybe more accurate
14:03 pcercuei: mukapapa: https://lkml.org/lkml/2020/8/22/236
14:04 pq: vsyrjala, right. Well, if I ever wanted to make use of that property, I'm going to have to learn that stuff.
14:05 vsyrjala: ycbcr<->rgb matrix is easy. you use whatever your input data says you must use
14:07 pq: that sounds like it really is just numerical encoding
14:07 vsyrjala: yeah
14:07 pq: and that you can use various different color spaces with the same matrix
14:08 pq: ...which is the odd thing to me, if the point to achieve something like lossy compression wrt. keeping human visual quality
14:10 vsyrjala: lots of legacy baggage in all of this. dunno if it can ever make any sense
14:10 pq: heh
14:10 pq: I bet it's not meant to work except for the exact single thing it has been used in practise.
14:10 ajax: pq: the "lossy compression" part is that, if you map the human visual gamut as a blob in 3-space then r/g/b isn't an especially good choice of basis vectors
14:11 pq: ajax, I was more thinking of chroma sub-sampling
14:11 ajax: so was i?
14:12 pq: ok
14:12 ajax: like, look at the rgb->yuv formula and note that when you're computing Y the coeff on R is like twice the coeff of B
14:12 ajax: which means you're wasting a lot of bits on specifying a pointless B value
14:13 ajax: using cr/cb means you get better visual quality for the same number of bits
14:14 ajax: you can subsample the r/b planes too if you want and it has sort of the same effect, visually, just lower quality
14:14 pq: it seems you are talking about quantizatio, but I was referring to spatial sub-sampling? But both have a point.
14:16 ajax: yeah, sorry, they're orthogonal
14:17 pq: :-)
14:17 ajax: was just trying to say that the ycrcb basis vectors more closely describe the shape of the gamut blob than rgb
14:18 ajax: after which it still turns out that your red and blue sensitivity is way worse both in both space and intensity so you can use few bits per sample and/or fewer samples
14:19 pq: indeed
14:20 vsyrjala: very fortunate that tv started out as black and white :P
14:20 pq: that can be experienced first-hand by looking at e.g. modern blue decoration lights - they look so blurry when no other light is present
14:20 pq: haha
14:20 ajax: specific colorspaces choose particular coeffs for their channels mostly because what they really do is pick an output gamut first and work backwards to the basis
14:22 ajax: vsyrjala: fun fact, when they were first testing adding color to ntsc the test image was of a bowl of fruit, which was being viewed in another room
14:22 ajax: some joker picked up the banana out of the bowl and painted it blue
14:23 ajax: which caused no small amount of confusion and panic on the viewing end
14:26 vsyrjala:puts that on the "moments to visit while time travelling" list
14:32 emersion: please don't change the color of the banana again when time travelling…
14:34 pq: maybe that explains something about all the RGB/BGR confusion we never seem to get rid of...
14:42 karolherbst: curro: any good idea on how to deal with this? https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/7241/diffs?commit_id=67a2e46563d5016305fbf8aea52557f5d0d74f8f
14:42 gitbot: Mesa issue (Merge request) 7241 in mesa "Draft: OpenCL 1.2 image support" [Opencl, Clover, Opened]
14:42 karolherbst: I was thinking about just storing it directly how gallium expects it, but then we might have to change quite some code
14:43 karolherbst: or maybe just fix validation and deal with it internally..
14:43 karolherbst: dunno
14:50 tango_: uh nice
14:50 tango_: openclimages
14:57 karolherbst: tango_: we already have basic support for CL 1.1 image features :)
14:57 karolherbst: just some minor things missing
14:57 karolherbst: that MR is really just about fixing some additional bugs and adding CL 1.2 features
15:05 karolherbst: curro: I was also thinking that maybe we don't need those image subclasses and could simplify the image creation code by just setting all values properly and get rid of the big switch in there...
15:14 ajax: peglgears on nvidia: 57090 fps. peglgears on zink on nvidia: 5984 fps.
15:14 ajax: _some_ room for improvement
15:15 jekstrand: cmarcelo: It's for cases like lower_io where we want to enfoce that you lower all the generic modes simultaneously.
15:16 kisak: ajax: my brain tried really hard to read that as "pepegears", as in cryptic Twitch emote slang :/
15:16 jekstrand: jenatali: I could write it that way, I suppose.
15:16 jekstrand: jenatali: It might actually make things quite a bit cleaner if I defined may_be/must_be first.
15:17 cmarcelo: jekstrand: right, I initially spend time trying to figure out "why is this not just must?"... I think jenatali definition based on may_be/must_be make things more explicit about their relationship.
15:27 jekstrand: cmarcelo: Updated
15:28 jekstrand: cmarcelo: I'm now defining may/must first and the others in terms of those
15:28 jekstrand: cmarcelo: I've also made the asserts in _is and _is_one_of clearer.
15:28 dschuermann: jekstrand (or any other intel guy): anyone opposed to https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4791/diffs?commit_id=76b96e3146de04d9c318a04c456a01a067863398 ? the idea is to make it possible for bitsize lowerings / vectorizations and such to be conditional
15:28 gitbot: Mesa issue (Merge request) 4791 in mesa "aco: implement 16-bit integer arithmetic instead of lowering" [Aco, Nir, Opened]
15:28 dschuermann: pendingchaos: ^
15:42 jekstrand: dschuermann: Mostly looks good.
15:42 jekstrand: dschuermann: Left a comment but should be pretty trivial.
15:55 dschuermann: thx
16:27 danvet: bbrezillon, I guess I need to dig out my patches again
16:27 danvet: for the annotations
16:27 danvet: it should go into drm/scheduler code
16:27 danvet: but if you launch a work queue or so
16:27 danvet: you need to also annotate that one
16:36 jenatali: jekstrand: Let me know if any of your updates are significant enough that you'd like a re-review, it's a big enough MR that trying to track changes and continually re-review isn't really feasible :) Otherwise I'm assuming they're just minor tweaks in response to feedback
16:36 karolherbst: nice, runtime with array images fixed :)
16:37 jekstrand: jenatali: Fair. I don't think there's been much, honestly.
16:37 jekstrand: jenatali: Mostly just moving stuff between patches.
16:37 jenatali: jekstrand: Yeah, based on the comments back and forth that's what it seemed like, just wanted to make sure
16:38 jekstrand: jenatali: Might want to review OpGenericPtrMemSemantics
16:38 jekstrand: jenatali: But I've not got any CTS tests to pass yet so maybe not quite yet?
16:39 jenatali: jekstrand: I skimmed the patches you pushed yesterday afternoon, they all seem a step in the right direction, and that one specifically looks good to land; I'll comment my r-b on the MR
16:39 jekstrand: jenatali: Ok, cool.
16:39 jekstrand: jenatali: I'm going to try to see if I can get a bunch of CTS tests passing today
16:39 jenatali: jekstrand: Are you going to try to get CL2.0 global variables working?
16:39 jekstrand: jenatali: That way I can have confidence that the patches are doing what we need and that all the SPIR-V bits are in place.
16:40 jekstrand: Obviously, we need CL 3.0 for advertising it.
16:40 jekstrand: jenatali: Yeah, I'm going to try to at least get a hack going there.
16:40 jekstrand: jenatali: We'll see if I succed. :D
16:40 jenatali: Cool, good luck :)
16:41 jekstrand: I think my plan will be to land the NIR/SPIR-V bits once we've proven they work and leave the clover bits for a WIP MR that'll have to wait until we get basic 3.0.
16:41 jenatali: That sounds reasonable to me
16:43 karolherbst: jekstrand: quick review: https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/7241/diffs?commit_id=26695be1a5d566520ee2c97f5866c5e10111bb7f
16:43 gitbot: Mesa issue (Merge request) 7241 in mesa "Draft: OpenCL 1.2 image support" [Opencl, Clover, Opened]
16:44 karolherbst: or should I rather swizzle everything out besides the last one? mhh
16:44 jekstrand: karolherbst: Has my patch to add 0.5 already landed?
16:44 karolherbst: yeah
16:44 karolherbst: with the first series
16:45 karolherbst: but no array images yet, so it's fine
16:46 jekstrand: karolherbst: rb
16:47 karolherbst: thx
16:48 karolherbst: mhh
16:49 karolherbst: something off with writing to 1Darray
16:53 jekstrand: karolherbst: I'm thinking of splitting vars_to_explicit_types in two. In particular, separating out variable location assignment into its own thing.
16:54 jenatali: jekstrand: Not that I think it's a bad idea but I'm curious, what for?
16:55 jekstrand: jenatali: We need different handling for constants and global vs. others.
16:55 jekstrand: Though maybe the only real difference there is that we need to scrape initializers?
16:55 jekstrand: Maybe that's what I need to split out?
16:56 jenatali: Why can't you scrape initializers after vars_to_explicit_types?
16:56 jekstrand: Yeah, that's just what I'm thinking.
16:56 jekstrand: I think I totally can.
16:59 bbrezillon: danvet: you mean, we should annotate what's in the work function, not the schedule_work() call, right?
16:59 danvet: bbrezillon, well the work function annotated in the driver
16:59 karolherbst: jenatali: btw, do you support any optional formats in your runtime?
16:59 curro: karolherbst: image subclasses are intended to avoid switch-case statements, they shouldn't require more than one switch statement at image creation time
16:59 danvet: and the various drm/scheduler callbacks already annotated in drm/scheduler code
16:59 danvet: that should also cover the schedule_work
17:00 karolherbst: curro: well.. but they don't help with avoiding them
17:00 danvet: but from a lockdep pov schedule_work does nothing
17:00 bbrezillon: right
17:00 bbrezillon: we're on the same page
17:00 danvet: ok
17:00 karolherbst: or well.. we wouldn't need switch statements anyway
17:00 bbrezillon: then I wonder why I need to annotate the reset function if the drm_sched_start/stop() are already annotated
17:01 jenatali: karolherbst: Yeah, I think so? I mapped all CL formats to corresponding DXGI formats and then just query if the D3D implementation supports them
17:01 danvet: bbrezillon, atm they're not
17:01 danvet: https://cgit.freedesktop.org/~danvet/drm/commit/?id=642ff01df4c49f25db3b421cb00da81f15388fbc
17:01 karolherbst: jenatali: ahh, okay
17:01 danvet: https://cgit.freedesktop.org/~danvet/drm/commit/?id=dcc2c4696baffce20b2d42b4a3e8fdb174b1c9e0
17:01 karolherbst: jenatali: I just had troubles with some of them and no idea if mesa is broken or the CTS
17:01 karolherbst: like those "x" formats
17:01 jenatali: karolherbst: Which ones?
17:01 curro: karolherbst: they don't help avoiding /every/ switch statement, there is image type special-casing result of API inconsistencies they don't intend to address. removing the image subclasses wouldn't fix that either
17:01 danvet: but the the work func for your own thing would need driver annotations no matter what
17:01 karolherbst: and luminance/intensity
17:01 jenatali: karolherbst: I don't think we support any of those
17:02 jenatali: I think I tried turning them on but the CTS complained
17:02 karolherbst: curro: but it would get rid of the one at image creation time
17:02 karolherbst: and gets rid of all the subclassing
17:02 danvet: bbrezillon, so would be good to test with these two + work func annotated
17:02 bbrezillon: danvet: uh, really! that's super tricky to get right if that's the case
17:02 danvet: and see what goes boom
17:02 karolherbst: jenatali: ahh, right
17:02 karolherbst: jenatali: I think nobody supports it in hw anyway so the CTS might just be buggy?
17:02 karolherbst: dunno
17:02 danvet: bbrezillon, hm what do you mean?
17:03 karolherbst: but yeah.. anyway, that was just an idea I was thinking about
17:03 bbrezillon: I mean that figuring out what might or might not call drm_fence_signal() when you use high level helpers is far from trivial
17:03 jenatali: karolherbst: Dunno. I think the only RGBx we support is the 101010 format
17:03 karolherbst: I am more curious if you have any idea on how to deal with 1Darray specifically
17:04 karolherbst: jenatali: it's the only combination allowed by the spec anyway
17:04 karolherbst: with 1010101
17:04 karolherbst: ehh
17:04 karolherbst: but yeah...
17:04 karolherbst: I've turned them of for now as well
17:04 karolherbst: they are ... strange
17:04 curro: karolherbst: but you'd likely need a different switch statement in order to init the fields correctly anyway, so i don't see much of a benefit, and there may be a cost of... additional switch case statements
17:05 karolherbst: yeah.. dunno. I didn't try it out yet
17:05 karolherbst: I was also thinking of having a image_base template class to get rid of some of the subclassing pain, like the dimension stuff
17:05 karolherbst: but.. mhh
17:05 karolherbst: it's all a bit annoying
17:05 jenatali: karolherbst: https://github.com/microsoft/OpenCLOn12/blob/master/include/formats.hpp#L139 - "This is failing a bunch of tests" :)
17:05 curro: karolherbst: regarding the array coordinate fix, i don't really mind using the gallium ordering of components or the CL one, we should just choose one of them and stick to it consistently in the api/ and core/ code
17:05 karolherbst: yeah..
17:05 bbrezillon: danvet: anyway, I'll add those annotations to panfrost_reset() and see if lockdep complains
17:06 karolherbst: I am inclined to convert on the api level and stick internally to how gallium does things
17:06 karolherbst: jenatali: heh, you could have some macro magic making it easier to declare all of those :p
17:06 jenatali: Eh, there's not that many
17:07 karolherbst: jenatali: your list also looks very small
17:07 karolherbst: definetly smaller than what we support
17:07 jenatali: karolherbst: Maybe it ends up just being the required set after that, I dunno
17:07 karolherbst: yeah.. looks a bit like that
17:08 jenatali: We can probably light up the depth extension at some point, but I didn't look much beyond that
17:08 karolherbst: I also expose CL_RA fully
17:08 karolherbst: and CL_A
17:08 karolherbst: jenatali: yeah... but that requires CL 2.0 compiler support
17:08 karolherbst: I checked it out already, but was very painful
17:08 jenatali: ? Isn't depth an extension to 1.2?
17:08 karolherbst: jenatali: https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/7241/diffs?commit_id=4b5777616f0b3eefe844764b5e58c647ec663195
17:08 gitbot: Mesa issue (Merge request) 7241 in mesa "Draft: OpenCL 1.2 image support" [Opencl, Clover, Opened]
17:08 karolherbst: jenatali: might be an extension as well?
17:09 curro: karolherbst: that sounds fine to me, but instead of your patch i think it would be better to do it from the vector() constructor, it would make sense to pass it the image as extra argument so it's able to reorder components according to the image type
17:09 karolherbst: but clang complained about missing overloads
17:09 jenatali: Ah, yeah extension in 1.x and required in 2.0
17:09 jenatali: karolherbst: https://www.khronos.org/registry/OpenCL/specs/3.0-unified/html/OpenCL_Ext.html#cl_khr_depth_images
17:09 karolherbst: ahh
17:09 karolherbst: maybe I just need to wire up the extension then
17:09 karolherbst: fun
17:10 karolherbst: curro: yeah... I was thinking about that as well
17:10 karolherbst: I just found every solution ugly enough to start thinking about better ones
17:10 jenatali: Ugh though D16 is required... D3D10 dropped D16 and only supports D24/D32
17:10 karolherbst: yep
17:11 karolherbst: uhh wait
17:11 jenatali: Unless this extension spec is just wrong
17:12 karolherbst: no, 2.0 requires CL_DEPTH with CL_UNORM_INT16 at least
17:13 karolherbst: jenatali: but you don't have to support it for 3.0 soo...
17:13 jenatali: Yeah... that's a shame, I'd like to light up support for float or unorm24, but if I have to claim unorm16 to do it that's kinda bad
17:13 jenatali:sighs
17:13 karolherbst: jenatali: but yeah. why was D16 dropped? nobody using it?
17:14 jenatali: Dunno, D3D10 predates my time in the industry by quite a while
17:14 karolherbst: ahh
17:15 jenatali: I know we tried to consolidate things to avoid having multiple ways to do the same thing, like killing RGBA/BGRA channel ordering (only to have to bring it back for 8bit unorm because we picked the wrong one...)
17:16 curro: karolherbst: i just don't like vector_t having CL component ordering at one point until you fix it up to have gallium component ordering, i'd rather have the type of the object imply what the meaning of its contents is
17:16 karolherbst: right
17:16 karolherbst: but that might mean we have to change validation as well
17:16 karolherbst: but that might be fine
17:16 karolherbst: or we validate first and then construct objects...
17:16 karolherbst: dunno
17:17 curro: karolherbst: or you validate as part of constructing the vector_t object
17:17 karolherbst: mhhh
17:17 karolherbst: that might work
17:19 curro: karolherbst: it might involve splitting the vector() constructor into a variant for origin vectors and another one for region vectors, which i think would make sense to do anyway
17:19 karolherbst: yeah, probably
17:40 jekstrand: karolherbst: How do I get at the program from clover::nir::spirv_to_nir?
17:41 karolherbst: go up the stack a little
17:41 karolherbst: I think...
17:42 karolherbst: yeah
17:42 karolherbst: program::link -> link_program -> spirv_to_nir
17:42 karolherbst: can probably just pass in whatever you need
17:42 karolherbst: or store the information in the module object
17:48 jekstrand: Oh, yeah, it gets called by program::link
17:48 jekstrand: So just need to pass the program down a couple steps
18:00 jekstrand: Ugh...
18:00 jekstrand: /usr/include/c++/10/bits/unique_ptr.h:83:16: error: invalid application of ‘sizeof’ to incomplete type ‘clover::root_buffer’
18:00 jekstrand: Never mind. Found my missing include
18:27 bnieuwenhuizen: airlied: does drm-next only pull stuff around the merge window?
18:29 airlied: bnieuwenhuizen: nope constantly
18:29 jekstrand: karolherbst: Got a test passing with a global variable \o/
18:30 airlied: usually from rc2 to rc6
18:30 bnieuwenhuizen: ah so we're right in the period
18:31 airlied: yup, though usually only misc, amd and i915 are active
18:32 bnieuwenhuizen: right, just wondering when I can start landing mesa stuff for modifiers
18:32 airlied: other subtrees usually just do one pull ar rc6
18:32 karolherbst: jekstrand: nice!
18:32 airlied: so whenever agd5f unloads his tree if the patch is in there
18:38 jenatali: jekstrand: \o/ that was fast
18:45 karolherbst: mhh, right, this even chain stuff is still broken :/
18:45 airlied: tzimmermann: just on the backlog, waited for rc2 will get them today or tmrw
18:45 karolherbst: and nothing I tried seems to help
18:49 jekstrand: karolherbst: https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/7433
18:49 gitbot: Mesa issue (Merge request) 7433 in mesa "WIP: clover/nir: Generic pointer support" [Nir, Opencl, Clover, Opened]
18:49 jekstrand: karolherbst: I'm going to leave my WIP clover patches there when this is all done
18:51 karolherbst: jekstrand: clover: Add a per-program global buffer is wrong
18:51 karolherbst: you can't assert the data to be identical
18:51 jekstrand: karolherbst: Why? It has to be.
18:51 karolherbst: ohh, the preuploaded one..
18:51 karolherbst: right
18:52 karolherbst: yeah.. it's fine
18:52 karolherbst: I was just reading it and though you were to check the buffer content
18:52 jekstrand: No, I specifically don't do that. :D
18:52 karolherbst: yeah..
18:52 jekstrand: It kind-of sucks that we have to carry around a copy just for an assert.
18:52 karolherbst: Maybe the phrasing in the commit could be improved a little though
18:52 jekstrand: I guess I could make the copy only exist in debug builds
18:52 karolherbst: yeah...
18:52 karolherbst: maybe
19:02 jekstrand: karolherbst, jenatali: The CTS for generic pointers does test memcpy. And it's broken, of course. :)
19:02 jenatali: \o/
19:02 jenatali: What's broken still?
19:02 jekstrand: Still working on that
19:02 jekstrand: by the time I get to lower_memcpy, my pointer casts are gone.
19:02 jekstrand: I don't know why yet
19:11 airlied:realises I didn't run in wimpy mode
19:11 airlied: relational going on 32 hours no
19:13 jekstrand: Found it. We were calling nir_lower_io on constant memory too early
19:14 jekstrand: airlied: That'll do it!
19:16 jenatali: airlied: Are you supporting doubles?
19:16 jenatali: 32 hours for relationals seems... wrong... WARP passes in 14 minutes
19:17 jenatali: Eh granted that was a beefy machine
19:17 airlied: jenatali: I have doubles but I shoudln't be lowering them
19:17 jenatali: airlied: I was just asking since our run skipped those tests
19:17 airlied: though my llvm build might be debug heavy
19:18 jenatali: airlied: A non-wimpy conversions test would probably take months at that rate :)
19:18 airlied: jenatali: currently I think conversions crashes for me and fails to build kernels :-P
19:21 jekstrand: jenatali, karolherbst: AFAICT, I'm passing everything now except for library_function which requires some linking stuff, I think, and a few which crash in our back-end in RA.
19:22 jenatali: Awesome
19:22 jenatali: Sounds like we'll need to recompile libclc to add generic pointer overloads to the library functions
19:23 airlied: yeah not sure clc kept up with 2.0 stuff
19:23 jenatali: Ah, might need to actually implement it then too...
19:23 jekstrand: Oh, and I'm doing it with opt_copy_prop_vars disabled because of the copy-to-the-wrong-address-space issue.
19:23 jenatali: Or maybe we can just fake it in nir :P
19:23 jekstrand: I'm still not sure how I want to handle that
19:23 jekstrand: But I think I'm ok kicking that can down the road
19:23 karolherbst: jekstrand: cool
19:24 airlied: jenatali: where's the Linux WARP port :-P
19:24 jenatali: airlied: You joke but we talked about it. I started on it at one point but the platform abstraction that was there in the past got reallllly leaky over time
19:38 karolherbst: curro: yeah.. so the event code seems to be a bit busted. even calling status() triggers the chain and I kind of don't see where I could clean up things earlier. Maybe there is something I overlooked, but it doesn't look good overall.
19:41 karolherbst: mhh.. maybe in "clover::event::wait_signalled" we could actually do something..
19:42 karolherbst: or rather event::trigger_self..
19:48 jekstrand: cmarcelo: For opt_find_array_copies, I think you're right. We should be able to use foreach_aliasing_node, I was just being lazy and not thinking through it.
19:48 curro: karolherbst: i'm not aware of status() triggering the chain
19:48 curro: karolherbst: what problem are you trying to fix?
19:50 curro: karolherbst: is this about the stack overflow you noticed the other day? if so did you try my suggestion of clearing the dependency list on ::fence()?
19:51 karolherbst: I think I did.. let me try again
19:55 curro: karolherbst: if my suggestion doesn't work it would indicate that the problem occurs because the application is not flushing the command queue for a huge amount of time (you should be able to confirm this by setting a breakpoint at command_queue::flush()), in which case we should probably flush more often
19:56 karolherbst: curro: yeah, your suggestion doens't help.. now breaking on flush
19:56 karolherbst: and see what happens
19:56 jekstrand: cmarcelo: Ok, I've fixed find_array_copies. Can you look at my reasoning and see if you agree with it?
19:56 jekstrand: cmarcelo: If you're good with that one, I'd like to land it after one final Jenkins run.
19:56 karolherbst: curro: yeah... it never flushes
19:57 curro: karolherbst: aha -- then i suggest adding an implicit flush whenever the command queue reaches some arbitrary large threshold of commands in order to limit its size
19:58 cmarcelo: jekstrand: looking
19:58 karolherbst: curro: in command_queue::sequence?
19:58 karolherbst: guess checking how big queued_events is and stuff
19:59 karolherbst: yeah.. let's see
19:59 curro: karolherbst: yeah that would make sense
19:59 karolherbst: ahhhh
19:59 karolherbst: recursive locking :D
20:01 curro: karolherbst: you're free to switch to a recursive_mutex to avoid deadlock
20:02 karolherbst: I just add flush_unlocked being a private method and call it inside flush and sequence
20:02 curro: karolherbst: you can do that too, but recursive mutex would probably be less typing
20:03 karolherbst: yeah, just recursive locks have this "I was too lazy to fix the bug, so I just use a recurisve lock" taste
20:03 karolherbst: anyway, already done
20:03 curro: heh
20:04 karolherbst: oh wow
20:04 karolherbst: I flush with 500 entries
20:04 karolherbst: every other subtest triggers this threshold
20:04 karolherbst: and the test runs like 200 sub tests
20:04 karolherbst: maybe even more
20:05 karolherbst: not sure what is a good threshold though
20:05 curro: karolherbst: may make sense to increase the threshold an order of magnitude if you hit it so often
20:05 karolherbst: yeah
20:05 airlied: jenatali: just port llvmpipe to d3d and fix it up :-P
20:06 karolherbst: how big should stack grow?
20:06 karolherbst: :D
20:08 karolherbst: let's go with 5000.. I think that's good enough
20:09 curro: karolherbst: should probably be enough for traversing an event graph with hundreds of thousands of nodes typically, so yeah a limit of a few thousands seems safe
20:11 airlied: dcbaker[m]: is the 20.3 release schedule going to go ahead to plan
20:11 karolherbst: heh
20:11 karolherbst: "clFinish failed: -14PASSED 475 of 475 sub-tests."
20:11 karolherbst: *sgh*
20:12 curro: seems like the test doesn't care about stuff actually finishing? ;)
20:12 karolherbst: doubtful
20:12 karolherbst: I expect it to be something dumb
20:12 karolherbst: it does find bugs
20:12 karolherbst: so I think at least the subtests are fine
20:14 dcbaker[m]: airlied: assuming I can get my system set back up in time, yes
20:14 dcbaker[m]: my hard drive decided yesterday would be a great day to die
20:15 jenatali: karolherbst: Which test is that?
20:15 karolherbst: image_read_write
20:15 karolherbst: but only 1D
20:15 jenatali: That seems wrong...
20:15 karolherbst: what seems wrong?
20:15 jenatali: That it doesn't care about finishing
20:16 karolherbst: yeah, I am sure it does care
20:16 airlied: dcbaker[m]: ouch
20:16 karolherbst: but my flushing change might have broken something
20:16 karolherbst: will debug
20:16 karolherbst: the problem is, it just takes a while :)
20:16 jenatali: For the record, my runtime doesn't have any implicit flushing behaviors, it only submits work from the queue when asked
20:16 jenatali: And I haven't seen any problems with that test
20:16 karolherbst: yeah...
20:17 karolherbst: I don't think CL requires any ordering unless explicitly stated, no?
20:17 jenatali: Ordering is implied unless you create an out-of-order queue
20:17 dcbaker[m]: airlied: thank God for backups. If only I'd backed up all of that stuff you never think to back up like vpn certificates... :/
20:17 karolherbst: ahh
20:17 karolherbst: right
20:17 karolherbst: jenatali: we just do some dependency chaining internally
20:17 karolherbst: and this breaks if there are no flushes
20:18 karolherbst: you have to flush to the hardware anyway, and this all works, it's just the event tracking in clover which is a bit busted
20:18 airlied: dcbaker[m]: I overly rely on git push as my backup mechanism, and possibly gmail, I should rectify that
20:19 jenatali: karolherbst: Yeah, same
20:19 dcbaker[m]: I have an ssd in an enclosure with rsync. rsyncing the right stuff is hard though
20:20 karolherbst: heh
20:20 karolherbst: the test uses CL_MEM_KERNEL_READ_AND_WRITE...
20:20 karolherbst: *sigh*
20:20 airlied:has a synology I use for my wife's mac, I should probaly use it for my laptop as well :-P
20:20 karolherbst: explains the "ERROR: Unable to get count of supported image formats! (CL_INVALID_VALUE from ../test_conformance/images/common.cpp:137)" error
20:21 karolherbst: curro: okay.. so 5000 is too much
20:21 karolherbst: my stack has ~8800 entries
20:21 karolherbst: 2 for each event
20:21 karolherbst: so maybe we should go for 1000...
20:21 karolherbst: just to be safe
20:21 jenatali: How do you have 5000 events without flushing any of them...?
20:21 karolherbst: the test does a lot of calls
20:22 jenatali: Without waiting for a map or anything to read back data?
20:22 karolherbst: no
20:22 karolherbst: we do submit to the hardware
20:22 karolherbst: this is really just the api level tracking of events
20:22 curro: karolherbst, jenatali: i don't see any evidence of event tracking being broken if it were to run on a turing machine with unlimited storage :P, we just want to make sure the size of the event graph remains manageable for it to work on real machines with limited storage if the application forgets to flush periodically
20:22 karolherbst: so we just insert everything into the queue
20:23 jenatali:shrugs
20:23 karolherbst: yeah.. I don't think we should insert everything..
20:23 karolherbst: and I don't even think we need the graph at all
20:23 jenatali: I picked a very different paradigm, using a dedicated thread that runs in lockstep with the device, responsible for submitting work to the hardware and updating event state after completion
20:23 karolherbst: but mhh
20:23 agd5f: bnieuwenhuizen, airlied I'll send a PR this week
20:24 karolherbst: jenatali: sounds sane
20:24 karolherbst: and I guess it drops stuff once the hw reaches a fence or something
20:24 jenatali: karolherbst: The event tracking in Clover is one of the places where I wasn't convinced it'd work out well, and trying to reshape Clover didn't seem feasible either, so we did our own thing
20:25 karolherbst: yeah... it feels quite overengineered tbh
20:25 jenatali: Yeah, after each chunk of GPU work is done the corresponding events are marked completed
20:25 karolherbst: mhh
20:26 karolherbst: not sure if gallium would allow us to wait on a fence or something...
20:26 karolherbst: mhh
20:26 karolherbst: probably
20:26 curro: jenatali: so what's the limitation of Clover's event handling that your paradigm solves specifically?
20:26 karolherbst: but then it depends on the driver to use some smart blocking or busy looping
20:26 jenatali: curro: I wouldn't necessarily say limitation, just trying to make something that's spec'd to be asynchronous, by being lazy about updating state seemed... hard to get right
20:28 curro: karolherbst: we need some sort of acyclic graph since an event may have an arbitrary number of dependencies and something needs to make sure the corresponding work isn't executed until the dependencies complete
20:28 karolherbst: ohh, explicit dependencies we have to track, sure
20:29 karolherbst: I just get the feeling things could have been a bit simplier, that's all
20:29 curro: karolherbst: sure, we could omit implicit dependencies and combine linear chains of nodes into single nodes in the graph, but AFAIA that requires more algorithmic effort than just using the same graph mechanism for all events, not less
20:30 jenatali: I came to the same conclusion
20:30 jenatali: I tried to build that first but gave up :P
20:30 curro: karolherbst: actually that would probably be an alternative way to fix your problem, but i have a feeling it would be more complicated than what we have right now
20:30 karolherbst: probably
20:31 karolherbst: uf...
20:42 curro: karolherbst: another alternative fix would be to change the implementation of ::wait() to be non-recursive, but once again the result would be more complicated than what we have right now, and unlike adding the implicit flush it wouldn't solve the fundamental problem in applications that forget to flush, since the non-recursive implementation would still require O(n) storage, it would just delay the point at which the program gets
20:42 curro: killed
20:43 curro: that's likely to be the reason why jenatali's implementation survives this test-case even though it does no implicit flushing so it's likely to suffer from the same fundamental problem
20:43 igorkovalenko: airlied: hi would you be able to ack this trivial one-liner https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/7417 for old r600?
20:44 gitbot: Mesa issue (Merge request) 7417 in mesa "r600: amend space check for chips older than EVERGREEN" [R600, Opened]
20:44 jenatali: curro: I don't understand how you can get 5000 events pending if the app is doing readback... doesn't that require waiting for readback events to be complete?
20:44 jenatali: Even if there's no explicit flushes
20:45 karolherbst: curro: I think there might actually be a more annoying problem...
20:45 karolherbst: like fence being NULL
20:45 curro: jenatali: no idea... apparently it's not reading back often enough
20:46 karolherbst: well, it does read back
20:47 curro: karolherbst: that shouldn't happen after the event has been flushed unless the pipe driver isn't giving us a fence
20:47 karolherbst: yeah.. currently trying to debug this
20:48 airlied: igorkovalenko: done
20:48 ajax: what exactly is pipe_loader for
20:49 airlied: ajax: for some backends with load the gallium pipe drivers
20:50 airlied: other than that I'm not sure what it does, I think ties src/loader to gallium pipe drivers
20:51 curro: yes, it's an auxiliary library to load pipe drivers for the available DRM or software devices in the system
20:54 ajax: curro: so if the driver i'm trying to load is backed by a VkPhysicalDevice (rather than a /dev node or just software) i would probably want to add another pipe_loader_device_type and .../pipe-loader/pipe_loader_vk.c ?
20:55 ajax: which would do mostly the same things as far as driconf parsing, just against a different "device" class
20:56 curro: ajax: yeah that would make sense
20:56 ajax: aight. one more thing to subclass. thanks
21:27 airlied: ajax: I updated the clear buffer llvmpipe patch in !7416 to do 1 + 4 byte clears without memcpy in a loop
21:31 ajax: airlied: sweet
21:48 jekstrand: cmarcelo: Did you have more comments on generic pointers or are we good to go there?
21:55 cmarcelo: jekstrand: no extra comments from me
21:57 jekstrand: cmarcelo: Cool, thanks for reviewing! That series is much much better for your and jenatali's review!
21:59 cmarcelo: \o/
22:01 airlied: ajax: btw there is a #zink
22:02 jekstrand:joins #zink just for grins
22:24 karolherbst: curro: mhhh.. seems like there are some events in the chain without a fence object :/
22:24 karolherbst: and we wait on that
22:24 karolherbst: so screen->fence_finish fails
22:24 karolherbst: and never got a _fence
22:29 karolherbst: ohhh
22:29 karolherbst: wait...
22:30 karolherbst: jenatali: do you call clFlush inside clFinish?
22:31 jenatali: karolherbst: Yeah, indirectly
22:31 karolherbst: yeah....
22:31 karolherbst: I don't think we do
22:31 jenatali: karolherbst: https://github.com/microsoft/OpenCLOn12/blob/master/src/queue.cpp#L199
22:31 jenatali: Where clWaitForEvents sees that the event is still enqueued, and therefore flushes that queue
22:32 karolherbst: yeah...
22:32 karolherbst: I think this is actually the bug
22:32 curro: karolherbst: we should, indirectly, from hard_event::wait()
22:33 karolherbst: curro: well.. ...
22:33 karolherbst: status() == CL_QUEUED
22:33 karolherbst: status() doesn't work if there is no _fence
22:33 karolherbst: huh
22:33 karolherbst: no
22:33 karolherbst: wait breaks
22:33 karolherbst: which is before the flush
22:34 karolherbst: event::wait(); -> ev.wait(); -> if (!_fence) throw error(CL_EXEC_STATUS_ERROR_FOR_EVENTS_IN_WAIT_LIST)
22:35 curro: karolherbst: hum? that throw exception is thrown right after the flush() call if there still is no fence
22:36 karolherbst: only if status is CL_QUEUED
22:36 karolherbst: but still, there is no _fence object
22:36 curro: which should be the case unless it already has an error status
22:36 karolherbst: mhhh
22:37 curro: (or unless it already has a _fence object)
22:37 karolherbst: yeah... let me run into the issue again, but I already inserted a flush and see if it changes anything
22:42 karolherbst: curro: yeah....
22:43 karolherbst: calling "obj(d_q).flush();" in clFinish fixes it
22:44 curro: karolherbst: hm, so why isn't the flush() call already in there having the same effect?
22:45 karolherbst: I guess that status returns something else
22:46 karolherbst: but yeah..
22:46 karolherbst: it's odd
22:46 karolherbst: let's see
22:46 curro: karolherbst: which should only happen if the event has an error status in which case throwing an exception seems like the right behavior, or..?
22:47 karolherbst: maybe.. let's see
22:47 karolherbst: nope
22:48 karolherbst: clFinish doesn't really have any errors
22:48 karolherbst: except thouse out of resrouces, or "invalid command queue" one
22:48 karolherbst: so we can't return CL_EXEC_STATUS_ERROR_FOR_EVENTS_IN_WAIT_LIST
22:49 curro: hm, i guess you can return CL_SUCCESS instead, but we cannot wait for the event to finish if it's not going to run because it has an error status
22:50 karolherbst: right
22:50 karolherbst: but not sure if it's even in an error state
22:50 karolherbst: flushing doesn't produce an error
22:50 karolherbst: so it does return CL_SUCCESS then
22:51 karolherbst: mhh
22:51 karolherbst: I think we just run out of memory or something in the end... or something stupid
22:51 karolherbst: calling status() in gdb also runs into an issue
22:51 karolherbst: it's all a bit odd
22:52 curro: karolherbst: can you maybe print the status at wait time to make sure it's really due to an error status that it doesn't flush the queue?
22:53 karolherbst: yeah
22:53 karolherbst: waiting is annoying :D
22:53 karolherbst: I hate those bug where one iteration takes like 3-4 minutes
22:58 karolherbst: curro: soo. I got "3" which is... CL_QUEUED...
22:58 karolherbst: but I think I can debug into it
22:58 karolherbst: wait isn't called as often anyway... maybe that works
22:59 karolherbst: I am sure it's something stupid
23:02 curro: oh... so flush *is* being called, it just doesn't want to give you a fence
23:05 curro: karolherbst: i have a hunch what's going on. is the event that fails to get a fence the same one that causes the queue to go above the flushing threshold you set?
23:05 karolherbst: might be
23:07 karolherbst: mhhh
23:07 karolherbst: so that NULL comes from nouveau actually
23:08 karolherbst: I think.. let's see
23:09 karolherbst: curro: I mean, things are getting flushed correctly an the stack is significanntly smaller. that's all fine, it's just odd why the flush earlier seems to help if the queue is getting flushed for real
23:13 curro: karolherbst: could it be that the driver doesn't give us a fence whenever the hardware doesn't have any real work batched up because the only thing in the queue is the no-op hard event emitted by clFinish()? but that doesn't explain why flushing earlier helps at all...
23:16 karolherbst: sooo
23:16 karolherbst: you get only a fence as long as _wait_count is 0
23:16 karolherbst: right?
23:17 karolherbst: that's the queued_events.front()().signalled() check inside flush
23:17 karolherbst: but that's 0 as well
23:17 karolherbst: mhhh
23:18 karolherbst: huh
23:18 karolherbst: but all need ot have a wait_count of 0
23:19 curro: karolherbst: yes, that's related to the other hunch i was referring to a minute ago -- if you're doing the implicit flush at command_queue::sequence() time, the event that triggered the sequence() won't be signalled during the flush, so it won't get submitted to the hardware nor get a fence yet
23:20 karolherbst: even if I flush after adding the event?
23:20 karolherbst: or should I flush first, then add?
23:21 curro: karolherbst: it won't be signalled until it is already in the queue and command_queue::sequence() returns
23:21 karolherbst: mhh, but mhh
23:22 karolherbst: I think the event might just not be in the queue
23:22 curro: yes, it shouldn't matter anyway
23:22 karolherbst: so queued_events is now empty after the loop
23:22 karolherbst: but no _fence
23:24 curro: karolherbst: how would that happen? hard events are always added to the command queue on construction, and only removed by ::flush() at the same point they are assigned a fence
23:25 karolherbst: right, thats inside the flush code of the queue
23:25 karolherbst: so after the queue was worked thorugh, it's empty
23:25 karolherbst: meaning all events should have a fence, no?
23:25 curro: yeah
23:25 karolherbst: maybe I should assert on fence() the argument not being NULL or something...
23:26 curro: yeah