03:33 imirkin: karolherbst: depends - are you replaying on BE? or replaying on LE? if replaying on LE, you need to byteswap everything
03:33 imirkin: karolherbst: https://github.com/apitrace/apitrace/issues/601
04:17 jekstrand: karolherbst: So... Apparently we have an instruction to fetch the surface format. Who knew?
04:17 jekstrand: karolherbst: Of course, it's not in a form useful for CL but it's interesting nonetheless. :-)
04:18 jekstrand: Unless the CL enums happen to match the intel ones. That'd be creepy
04:19 HdkR: Gotta help those fallback sampling paths :P
04:21 jekstrand: HdkR: It makes image_load_store without a format more implementable.
04:22 jekstrand: Still 95% terrible, but it helps, right?
04:23 imirkin: jekstrand: nvidia has queries which allow reading values out of the texture/sampler descriptors
04:24 jekstrand: imirkin: That's basically what our RSI message does, I guess
04:24 airlied: on amd you can just bitshift :-P
04:24 jekstrand: I knew about the resinfo message through the sampler, but the one I found tonight gives more stuff
04:24 airlied: granted you'd probably need a lookup table to get cl values
04:24 jekstrand: Like format
04:24 imirkin: right, on amd the starting point is the texture/sampler descriptor :p
04:24 imirkin: jekstrand: resinfo takes a LOD?
04:25 jekstrand: And instruction base address. Why is that in the surface info message? I have no idea....
04:25 jekstrand: imirkin: It does
04:25 imirkin: yeah. that mirrors the nvidia situation. there's an op which takes LOD, one that doesn't
04:25 jekstrand: Nah, they both take LOD
04:25 imirkin: oh heh
04:25 imirkin: per-lod formats? :)
04:25 jekstrand: They send to different units and give back the info differently
04:26 jekstrand: we have resinfo which is a sampler message and just returns size
04:26 jekstrand: sampleinfo which is a sampler message and returns number of samples and some misc bits from the surface state. Literally, 3 bits whose only purpose is to be returned by that message.
04:27 jekstrand: Then we have RSI which I just found tonight which returns size, type (1/2/3D, etc.), mip count, surface format, and instruction state base address.
04:29 jekstrand: Kayden: With a message that returns ISBA, we can do our shader patching trick for constant data even with relocations! 🙃
04:29 jekstrand: The number of uses for this I've come up with so far is astounding. None of them are good. :-)
04:32 imirkin: self-modifying shader code?
04:32 imirkin: definitely no downsides to that
04:33 imirkin: thankfully there's no silly protection stuff, unlike on the CPU where there's ... pfft ... page permissions. no fun.
04:33 HdkR: I've made self modifying shader code, it works
04:34 imirkin: HdkR: icache flushing seems like it'd be the biggest blocker
04:34 jekstrand: We've got messages for that, I think.
04:34 HdkR: Just jump to a PC that lives outside of icache :P
04:35 imirkin: HdkR: that's not so much self-modifying as self-writing
04:35 HdkR: nah, then jump back once you've ensured it is safe and modified
04:36 HdkR: EZ if you have documentation on caching behaviour :D
04:37 imirkin: one might ask ... why
04:38 HdkR: anti emulation is always a good response
04:38 imirkin: lol
04:38 imirkin: SafeDisc or whatever?
04:40 HdkR: There are plenty of things that do anti-emulation. I believe Safedisc was copy protection
04:43 imirkin: ah ok
05:01 Kayden: jekstrand: Neat...yeah, we probably could do a lot with that
05:01 Kayden: jekstrand: whether or not we should is another question :)
05:02 Kayden: jekstrand: hmm, yeah, we could actually do constant data patching that way
05:02 Kayden: I guess if that let us simplify some code by dropping old paths...
05:03 Kayden: might be able to do some misc texture fixups too
05:10 Kayden: jekstrand: looks like Read Surface Info is new on Broadwell, I don't see it in the IVB docs.
05:12 Kayden: it might exist on Haswell?
05:12 Kayden: there's an encoding but I don't see docs for it
07:36 MrCooper: airlied mareko: if you build LLVM/clang with -Wl,--gdb-index in CMAKE_EXE/SHARED_LINKER_FLAGS, gdb starts up quickly even with debugging symbols
07:42 MrCooper: (which might only be supported by lld or gold, not the bfd linker)
09:43 j4ni: repeating my question from yesterday, has anyone seen a display using DisplayID v2.0 in the wild?
10:00 airlied: j4ni: nope or any references to the spec at all
10:01 airlied: not sure how they expect it to work outside an EDID block, as afaik displayid used a different i2c addr
10:56 j4ni: airlied: so, if it's *not* in EDID, surely there then has to be a regular EDID possibly with displayid v1.3 in extension blocks, for backwards compatibility with legacy sources
11:00 j4ni: airlied: I'm trying to understand a feature that strictly by the specs is only defined in displayid v2.0. no idea how real world displays are going to do this stuff.
11:17 karolherbst: imirkin: trace taken on BE doesn't replay on BE
11:17 vsyrjala: j4ni: ddc spec specifies how you can have both while dispid is not an extension block
11:17 karolherbst: apparently none of the APIs are decoded
11:17 karolherbst: either the trace is all 0 or...
11:18 vsyrjala: j4ni: also one option is "just dispid with no compatibility with old edid-only sources"
11:37 j4ni: vsyrjala: interesting, have to look that up. does it say you could still have displayid v2.0 in EDID extension block?
11:38 vsyrjala: i don't think it says anything about the edid ext block option. that's only part of the edid spec iirc
11:43 j4ni: it's in displayid v1.3 spec
11:47 vsyrjala: ah right
11:54 j4ni: vsyrjala: side note, is there some historical reason why DDC spec and i2c subsystem have addresses shifted by 1 bit position?
11:54 j4ni: or, have a difference in the addresses shifted by one
11:57 vsyrjala: the lsb is the r/w bit. some things specify the 7bit address so you need 'address<<1|r/w' others specify it as already shifted to 8bit so you need 'address|r/w'
11:57 vsyrjala: no idea why anyone chose one over the other
11:58 j4ni: ah, I've known this and forgotten :)
11:59 j4ni: anyway, looks like in the future we'll need to probe 0x52 (or 0xa4) for displayid first
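[editor's sketch] The two address conventions vsyrjala describes can be shown with a few lines of Python. The helper names `to_8bit` and `to_7bit` are made up for illustration; only the bit layout (R/W bit in the LSB, 1 meaning read) comes from the discussion. It shows why 0x50 and A0h/A1h, and 0x52 and A4h/A5h, name the same devices.

```python
# Sketch: converting between 7-bit I2C addresses and 8-bit DDC-style addresses.
# Helper names are hypothetical; they are not part of any kernel or library API.

def to_8bit(addr7: int, read: bool) -> int:
    """Shift the 7-bit address left and place the R/W bit in the LSB (1 = read)."""
    return ((addr7 & 0x7F) << 1) | (1 if read else 0)

def to_7bit(addr8: int) -> int:
    """Drop the R/W bit to recover the 7-bit address."""
    return (addr8 >> 1) & 0x7F

for name, addr7 in (("EDID", 0x50), ("DisplayID", 0x52)):
    write, read = to_8bit(addr7, False), to_8bit(addr7, True)
    print(f"{name}: 7-bit 0x{addr7:02X} -> 8-bit pair {write:02X}h/{read:02X}h")
```

The i2c subsystem style is the 7-bit form; the DDC spec text quotes the already-shifted write/read pair.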
12:00 j4ni: I wonder if we can bolt that within drm_get_edid() transparently enough for the drivers
12:02 vsyrjala: hmm. wasn't it so that the normal address can either be edid or dispid. and if it's edid then there could be an extra dispid at the second address?
12:03 j4ni: iiuc you should first try the, uh, second address 0x52, which should only have displayid
12:04 j4ni: if that fails, try 0x50, which should only have edid
12:04 j4ni: of course the edid may have displayid in the extension blocks
12:04 j4ni: based on E-DDC v1.3 section 3.1
12:05 vsyrjala: hmm. did they change that?
12:05 j4ni: - EDID structure, if present, shall be located at DDC address pair A0h/A1h
12:05 j4ni: - DisplayID structure, if present, shall be located at DDC address pair A4h/A5h
12:05 j4ni: chapter 3
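[editor's sketch] The probe order j4ni reads out of E-DDC v1.3 (DisplayID address first, then the EDID address) might look roughly like the following. This is a toy model, not kernel code: `probe_display`, `make_fake_bus`, and the `read_block` callback are hypothetical names, and the "bus" is just a dict keyed by 7-bit address.

```python
from typing import Callable, Dict, Optional, Tuple

DISPLAYID_ADDR = 0x52  # 7-bit form of the A4h/A5h pair
EDID_ADDR = 0x50       # 7-bit form of the A0h/A1h pair

def probe_display(
    read_block: Callable[[int], Optional[bytes]],
) -> Optional[Tuple[str, bytes]]:
    """Try the DisplayID address first, then fall back to the EDID address.

    read_block(addr) is assumed to return None when nothing answers.
    """
    for kind, addr in (("displayid", DISPLAYID_ADDR), ("edid", EDID_ADDR)):
        data = read_block(addr)
        if data is not None:
            return kind, data
    return None

def make_fake_bus(devices: Dict[int, bytes]) -> Callable[[int], Optional[bytes]]:
    """Build a read_block callback backed by a dict (stand-in for real I2C)."""
    return lambda addr: devices.get(addr)

legacy = make_fake_bus({0x50: b"edid-bytes"})
print(probe_display(legacy))  # falls back to the EDID address
```

The sketch does not cover the older behaviour vsyrjala worries about, where a display might expect the source to parse both addresses.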
12:06 vsyrjala: i have 1.2 here which says different
12:06 j4ni: idk if it has changed
12:07 vsyrjala: sigh. they changed it
12:08 vsyrjala: although the new behaviour is clearer because it removes the ambiguity as to which one (edid vs. dispid) we should trust
12:08 vsyrjala: but if any displays implement the old behaviour we might be screwed
12:09 vsyrjala: as in they might have an incomplete dispid at the second address, and expect the source to parse both
12:09 j4ni: ugh, was *that* possible per the spec?
12:09 j4ni: I don't have ddc 1.2
12:10 j4ni: can't find it on vesa site either
12:11 j4ni: how many old displays are going to crap out on trying to access 0xA4?
12:11 j4ni: err 0x52
12:14 vsyrjala:mailed some specs to j4ni
12:14 j4ni: thanks
12:15 j4ni: so any quick thoughts on trying to do the displayid reading transparently within drm_get_edid() vs. basically adding a new function to do it and converting drivers individually?
12:16 j4ni: also, EDID blob property, what if we stick a pure displayid into it?
12:17 j4ni: it should be possible for a parser to distinguish between the two trivially
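[editor's sketch] One way a parser could tell the two apart, assuming the blob is either an EDID base block or a standalone DisplayID section. The fixed 8-byte EDID header is well known; the DisplayID check is an assumption here, namely that the first byte carries the structure version in its high nibble (for example 0x12 for v1.2, 0x20 for v2.0). `classify_blob` is a hypothetical name.

```python
# Fixed header that opens every EDID base block.
EDID_HEADER = bytes([0x00, 0xFF, 0xFF, 0xFF, 0xFF, 0xFF, 0xFF, 0x00])

def classify_blob(blob: bytes) -> str:
    """Guess whether a blob is EDID or DisplayID from its first bytes."""
    if blob[:8] == EDID_HEADER:
        return "edid"
    # Assumption: a DisplayID section starts with a version/revision byte
    # whose high nibble is the major version (1 or 2).
    if blob and (blob[0] >> 4) in (1, 2):
        return "displayid"
    return "unknown"

print(classify_blob(EDID_HEADER + bytes(120)))  # edid
print(classify_blob(bytes([0x20, 0x00])))       # displayid
```

Since an EDID block always begins with 0x00, the two checks cannot both match under this assumption.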
12:26 vsyrjala: i guess it depends on whether we may need to expose both
12:52 j4ni: right
13:05 MrCooper: karolherbst: sounds like endianness bugs in apitrace itself then, or maybe in a library used by it
13:12 karolherbst: probably
14:08 ajax: j4ni: iirc the display possibly exposing both is a real thing, so please don't cram it into the edid blob. it is not edid by definition
14:25 karolherbst: I always wondered why that gl get device info piglit test fails: https://gist.github.com/karolherbst/413b11d728d336ffd41a87f312ceb067 ...
14:25 karolherbst: ...
14:26 karolherbst: ohh "CL_DEVICE_MAX_READ_IMAGE_ARGS: failed, expected at least 128, got 32 " right...
14:26 karolherbst: Uff
14:43 karolherbst: EdB_, curro, jekstrand, airlied: cleaned up my OpenCL image MR to really only contain some fixes and enablement without adding support for the CL 1.2 features: https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/7069
14:43 karolherbst: still not sure if the pipe_image_view breaks r600/radeonsi, but I think images there are considered broken anyway
14:43 karolherbst: and with that it might actually start working
14:44 karolherbst: also, no inline sampler support yet as the patch is still a bit whacky.. will do a proper CL 1.2 image support MR later and fix those things up there
16:52 dcbaker[m]: xexaxo1: is there a way with waffle to tell that a window you've created has been closed?
17:35 xexaxo1: dcbaker: closed as in destroyed, or simply hidden?
17:36 dcbaker[m]: I guess hidden? If someone presses the [x] button in the corner, for example?
17:37 xexaxo1: you're correct - closed is perhaps the best. Nope, I don't think so.
17:39 dcbaker[m]: I can get that information from the native window handles though, I assume? using waffle_window_get_native()
17:39 dcbaker[m]: assuming the underlying thing has visible windows? (not gbm, obviously)
17:40 xexaxo1: yes you can get the native handle, which then you'll get to poke around via winsys specific code
17:40 dcbaker[m]: cool, thanks
17:42 xexaxo1: out of curiosity - is the window really closed by the user, or the DE closes it. Is there a particular usecase that you're thinking?
17:43 xexaxo1:could swear that piglit was using a hidden window by default...
17:44 xexaxo1: ... since windows popping on your screen is a bit distracting
17:44 dcbaker[m]: I'm trying to implement gears on top of waffle
17:44 dcbaker[m]: so having a window is kinda the point :)
17:46 dcbaker[m]: I just need to figure out when the window is closed so that the program can exit the draw loop at that point
17:48 xexaxo1: dcbaker[m]: jljusten had a few branches in the past for wflgears. don't think he ever sent them out for review
17:49 dcbaker[m]: interesting, I guess he didn't
17:50 jekstrand: dcbaker[m]: I think you need a gear-shaped waffle iron. Sadly, I can't find one on Amazon or else you'd have one in 1-2 business days. :-P
17:50 xexaxo1: you can check if he's got a branch somewhere. alternatively I think I may have them on another machine
17:51 dcbaker[m]: jekstrand: lol
17:52 xexaxo1: jekstrand: only if it ships with chocolate, cream and fruits ;-)
17:52 dcbaker[m]: I'll ask Jordan and see if he still has one
17:53 jljusten: xexaxo1, dcbaker[m]: not waffle so much. but, gears3d supports gbm (and X and Wayland) on gles, gl and vulkan. https://gears3d.org/
17:54 dcbaker[m]: I might steal some of that then
17:56 dcbaker[m]: xexaxo1: another waffle question. If I say WAFFLE_DONTCARE for the window system, is there a way to know what waffle picked for me?
17:57 xexaxo1: even in 2020 gears remains the pinnacle benchmark :troll:
17:57 xexaxo1: dcbaker[m]: from memory - a window system must be provided by user
17:57 dcbaker[m]: I thought it was the pinnacle of functional testing
17:58 dcbaker[m]: Ah, you're right, I am picking the window system myself
17:58 dcbaker[m]: apparently trying to remember what I did ~20 minutes before stopping yesterday is hard
18:04 xexaxo1: dcbaker[m]: found the branches and pushed to origin/jljusten/wflgears{,2}
18:05 dcbaker[m]: thanks!
18:05 xexaxo1: they're from 2014, so needless to say things have changed a bit
18:08 xexaxo1:cracks open a bottle of single malt scotch o/
18:08 jljusten: xexaxo1: Heh. I forgot about that. I guess I decided to skip the waffle dep and make a separate project. Possibly when thinking about Vulkan support.
18:44 xexaxo1: jljusten: with Vulkan in mind, it makes sense.
18:49 airlied: j4ni: find out what windows does
18:50 airlied: monitor probing is already slow enough
18:51 airlied: adding a pointless in 99% of cases i2c probe is annoying
18:56 airlied: agd5f: can you look at linus clang warning email asap
18:56 agd5f: airlied, yeah, already looking at it
18:58 airlied: thx!
19:01 airlied: agd5f: cc him on any patch, he can direct apply, as im off today in theory
19:11 agd5f: airlied, will do
19:36 AndrewR: hm ..https://pastebin.com/4ew4rBrg - > CL_KERNEL_PRIVATE_MEM_SIZE / CL_KERNEL_LOCAL_MEM_SIZE = 0 ..is it normal (for llvmpipe) ?
19:55 agd5f: airlied, sent
20:28 Venemo: I see that nir_analyze_range only takes a nir_alu_instr - is there some way to range analyze any random nir ssa that I have?
20:41 Kayden: would be pretty easy to add a wrapper that says *shrug* if ssa->parent_instr->type != nir_instr_type_alu
20:43 karolherbst: AndrewR: depends on the kernel
20:45 AndrewR: karolherbst, ok, cool
21:01 Venemo: Kayden: can it handle phis?
21:03 Kayden: I don't think so.
21:03 Kayden: that might be a nice thing to add
21:05 idr: Right... it doesn't look back through phis because it's hard.
21:07 idr: I've been thinking of adding some support for some non-ALU instructions.
21:08 idr: For example, in many cases we know that the values returned by texture instructions are in the range [0, 1].
21:08 idr: At least that's always the case in older GL apps.
21:09 idr: I noticed that when I was doing some work to add is_a_number and is_finite analysis.
21:10 Venemo: hmm
21:11 Venemo: I need to know whether the number of GS output primitives can be possibly 0 or not
21:12 Venemo: that does come from alu instructions, but very possibly some control flow is involved in determining it
21:12 bnieuwenhuizen: idr: how do you know though, since we don't know the texture format at shader compile time?
21:13 Venemo: right now I just check whether it's a non-zero constant, which is the case for a surprising number of apps
21:14 idr: bnieuwenhuizen: Because before OpenGL 3.0 / GL_ARB_texture_float, it was *impossible* to have a value outside [0, 1].
21:14 idr: (Hence the comment about "older GL apps.")
21:14 idr: So it would probably need a driconf or something.
21:15 idr: It's still just an idea. :)
21:16 Venemo: idr, what does the range analysis do right now if it finds a phi?
21:17 idr: Venemo: If it hits anything that isn't a load_const or an ALU, it returns "unknown."
21:17 Venemo: ah, okay
21:17 Venemo: so it's not useless, just a little limited
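[editor's sketch] A toy model of the behaviour idr describes: walk back through load_const and ALU instructions, and answer "unknown" for anything else (phis, texture ops, and so on). This is not the NIR implementation. `Node`, `analyze`, and the two-value range lattice are invented for illustration, and NaN handling is ignored entirely.

```python
from dataclasses import dataclass
from typing import Tuple

UNKNOWN, GE_ZERO = "unknown", "ge_zero"

@dataclass(frozen=True)
class Node:
    kind: str                      # "load_const", "alu", "phi", "tex", ...
    op: str = ""                   # ALU opcode name when kind == "alu"
    value: float = 0.0             # constant when kind == "load_const"
    srcs: Tuple["Node", ...] = ()  # source operands

def analyze(node: Node) -> str:
    """Return GE_ZERO if the value is provably >= 0, else UNKNOWN."""
    if node.kind == "load_const":
        return GE_ZERO if node.value >= 0.0 else UNKNOWN
    if node.kind != "alu":
        return UNKNOWN             # phis, texture ops, etc. are not followed
    if node.op == "fabs":
        return GE_ZERO             # toy rule: |x| >= 0 (NaN ignored)
    if node.op == "fmul":
        # The product of two non-negative values is non-negative.
        if all(analyze(s) == GE_ZERO for s in node.srcs):
            return GE_ZERO
    return UNKNOWN

tex = Node("tex")
two = Node("load_const", value=2.0)
print(analyze(Node("alu", "fmul", srcs=(two, Node("alu", "fabs", srcs=(tex,))))))
print(analyze(Node("phi", srcs=(two, two))))
```

In this model a phi of two constants still comes back unknown, which is the limitation Venemo runs into when control flow feeds the value being analyzed.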
21:19 idr: I think the most valuable addition to make would be to propagate information through conditional flow control.
21:19 Venemo: idr: different question, but do you think it can be extended so that it can prove whether a value fits in 24 bits?
21:19 idr: I think someone already did that.
21:20 Venemo: where?
21:21 idr: So, your question about phis has an issue: https://gitlab.freedesktop.org/mesa/mesa/-/issues/2348
21:21 idr: I think there was an issue for the bit-mask thing too. I'm looking.
21:22 idr: The other issue is https://gitlab.freedesktop.org/mesa/mesa/-/issues/2399
21:22 idr: And Rhys had a related MR: https://gitlab.freedesktop.org/mesa/mesa/merge_requests/2720
21:23 Venemo: I see, that doesn't seem to deal with the 24 bits yet
21:24 idr: If memory serves, I think it adds most of the necessary infrastructure.
21:24 Venemo: yeah
21:24 idr: Some optimizations that use it to generate imul32 (or whatever) still need to be added.
21:24 idr: *imul24
21:25 Venemo: good to know that you care about this too :)
21:25 idr: On most of our platforms we'd like to be able to generate imul_32x16 when possible.
21:26 idr: I just keep forgetting to come back to it. :)
21:27 Venemo: the reason we care is because we have a 24-bit multiplication instruction that takes fewer cycles than the 32-bit one
21:30 idr: That was my guess. :)
21:41 robclark: Venemo: btw for 24 int mul, there is some limited support (limited to address/offset calculation).. see `amul`
21:42 robclark: (freedreno has 24b mul/mad, but 32b turns into multiple instructions)
21:43 Venemo: in our case, imul24 corresponds exactly to the hw instr we have
21:44 robclark: you should probably hook in nir_lower_amul.. that will get you at least some benefit
21:46 Venemo: ok, will take a look
21:47 robclark: but better range tracking and converting imul to imul24 would ofc be useful in more cases
21:48 idr: I think we might just need to add a predicate in nir_search_helpers.h, then add some patterns in nir_opt_algebraic.
21:50 Venemo: robclark: what exactly generates amul? it seems we don't support it
21:50 pendingchaos: if we do the opt in NIR, I think we might want a nir_op_imul24_or_32, since we don't have a 24-bit multiplication opcode that can write to a uniform register
21:51 robclark: Venemo: it is used for deref lowering, ie. calculating an offset into an array
21:52 Venemo: pendingchaos: I think s_mul_i32 can still work on 24-bit ints
21:52 robclark: basically, when we see the size of the var is known to be small enough to fit into 24b math nir_lower_amul converts amul to imul24
21:52 Venemo: or we could make the lowering depend on divergence
21:53 Venemo: robclark: I get what the lowering does (read the comment at the beginning), I just don't see how you get an amul in the first place
21:53 pendingchaos: Venemo: it doesn't mask the operands, so it will differ from actual 24-bit multiplication if the result is larger than 2**24-1
21:53 Venemo: pendingchaos: that matches imul24, afaiu
21:53 robclark: Venemo: see nir_build_deref_offset()
21:54 Venemo: hm
21:54 pendingchaos: I don't see anything saying imul24 is undefined with overflow
21:55 Venemo: that's not what I meant
21:56 Venemo: I meant that I believe imul24 can be implemented with s_mul_i32 for uniforms and v_mul_u32_u24 for divergents
21:59 pendingchaos: s_mul_i32(0x1000000, 0x1)->0x1000000, v_mul_u32_u24(0x1000000, 0x1)->0, we would need to mask by 0xffffff to match v_mul_u32_u24
22:00 Venemo: I think 0x1000000 is not a valid operand of imul24, though
22:01 Venemo: neither imul(0x1000000, 0x1) nor amul(0x1000000, 0x1) would be lowered to imul24
22:01 Venemo: unless I misunderstand something here
22:05 pendingchaos: none of those would be lowered to imul24/umul24, but assuming it's in-range when nir_opcodes.py says the 8 msb are ignored feels like a hack
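[editor's sketch] pendingchaos's example can be replayed with simple integer models of the two instructions. These are assumptions based only on the values quoted above: `s_mul_i32` is modelled as the low 32 bits of the product with unmasked operands, and `v_mul_u32_u24` as the same with each operand first truncated to 24 bits. `s_mul_masked` is a hypothetical helper showing the extra mask.

```python
MASK32 = 0xFFFFFFFF
MASK24 = 0xFFFFFF

def s_mul_i32(a: int, b: int) -> int:
    """Model: low 32 bits of the product, operands used as-is."""
    return (a * b) & MASK32

def v_mul_u32_u24(a: int, b: int) -> int:
    """Model: operands truncated to 24 bits, then low 32 bits of the product."""
    return ((a & MASK24) * (b & MASK24)) & MASK32

def s_mul_masked(a: int, b: int) -> int:
    """Scalar multiply with explicit 0xffffff masks on both operands."""
    return s_mul_i32(a & MASK24, b & MASK24)

a, b = 0x1000000, 0x1
print(hex(s_mul_i32(a, b)))      # 0x1000000
print(hex(v_mul_u32_u24(a, b)))  # 0x0
print(hex(s_mul_masked(a, b)))   # 0x0
```

For operands that already fit in 24 bits the three models agree, which is the in-range assumption being debated.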
22:08 Venemo: ok, you have a point there
22:15 Venemo: maybe we could choose to only lower divergent values to imul24
22:15 pendingchaos: looks like there are algebraic opts assuming the sources are in-range, and the CL function assumes they are
22:16 pendingchaos: I guess we can make that assumption about nir_op_imul24 too then