02:56 yshui`: Hi, the EGL_EXT_platform_xcb spec has been merged upstream, can someone give this MR a look: https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6474 ?
02:56 yshui`: Thanks.
03:04 curro: karolherbst: hey, sorry i wasn't around when you reported that stack overflow in clFinish()
03:05 curro: karolherbst: there is a simple micro-optimization i was meaning to implement that i think might help in this case: you could just clear the dependency list of an event at ::fence() time, since that implies the corresponding work was just submitted to the hardware so there is no need to wait for any dependencies in order to synchronize with it, we can just fence_finish()
03:07 curro: karolherbst: that should make sure that the event dependency graph never extends beyond a single flush (unless it really needs to due to user event dependencies and such), which i suspect might be causing the problem you were seeing
03:07 jekstrand: curro: That's roughly what karolherbst and I talked about though we might have been looking at different places to put it.
03:12 curro: jekstrand: maybe you talked about something with a similar graph-limiting effect but it seemed like karolherbst was trying to add additional event list tracking to the command queue which sounded far more complicated than that
03:13 jekstrand: curro: Yeah, I don't think it needs to be that complicated. I just couldn't find a place to put it.
03:14 jekstrand: curro: I looked at clearing it out in wait() but constness wasn't going to be happy.
03:14 jekstrand: I hadn't looked at ::fence()
03:15 AndrewR: curro, hello. Do you have few minutes for n00b?
03:15 curro: jekstrand: i suspect wait() may have been too late for that case, since it's probably never called until the clFinish() after it has batched up a huge amount of work
03:15 curro: AndrewR: sure
03:17 AndrewR: curro, I tried to add few missing cases into nv50-specific code (because benchmark I used hit those) but may be I completely misread disassembler code? https://pastebin.com/PESugQkM
03:19 AndrewR: curro, https://github.com/envytools/envytools/blob/master/envydis/g80.c - online version of file I was looking into
03:20 AndrewR: curro, so, there is this struct: static struct insn tabcvtiisrc[] and table below ... but how to combine two to do desired effect (format conversion)?
03:32 curro: AndrewR: hmm, the s16/u16 destination encodings you've added make sense to me roughly, though my ability to parse nv50 assembly in hex has gotten kinda rusty in the last few years :P, maybe you'd have better luck in #nouveau
03:33 curro: AndrewR: which specific encoding are you having trouble with, and how is it disassembled (if at all)?
03:35 curro: AndrewR: your encoding for s8->u16 seems rather suspicious, and the ones for s8/u8 destination too
03:51 AndrewR: sorry was walking with my dog ...
03:53 AndrewR: oh, yeah, probabaly just some random numbers I forgot to properly edit
04:07 AndrewR: curro, updated ... https://pastebin.com/PESugQkM
04:08 AndrewR: curro, I was mostly looking for dmesg errors during benchmark run. It has some option to get binary, but I'm not sure if clover/mesa have such functionality wired up
04:12 AndrewR: curro, https://github.com/Randrianasulu/MPBenchmarks/blob/master/MPBenchmarks/GPUTask.cpp - line 63 -- and this goes down to ....
04:12 AndrewR: https://github.com/Randrianasulu/MPBenchmarks/blob/master/BealtoOpenCL/src/CLProgram.cpp - line 49 and down
05:02 rak-zero: hey, i got a polarised 3D tv and thought it would be as easy as in 200x to have stereoscopic support on driver level
05:02 rak-zero: i had a elsa 3d revelator back then and it was just a hotkey to have it work in almost any software/game
05:02 rak-zero: is there really no support for this in linux?!
05:03 rak-zero: all i found was this: https://lists.freedesktop.org/archives/dri-devel/2017-January/130125.html
05:03 rak-zero: but this is more about the flag sent through HDMI same like audio
05:46 AndrewR: rak-zero, I only found this https://marc.info/?l=mesa3d-dev&m=109817901921774 via https://dri.freedesktop.org/wiki/StereoSupport/ (and those patches against old UMS/DRI1 version of radeon r200 driver ...)
06:33 khnazile: mesa 20.3 no longer builds with older gcc: ../src/gallium/drivers/nouveau/codegen/nv50_ir_from_nir.cpp:3118:4: sorry, unimplemented: non-trivial designated initializers not supported
10:37 EdB: airlied: hello
10:38 EdB: So, is NIT the prefered future for CL on AMD cards ?
10:38 EdB: *NIR
11:43 karolherbst: curro: clearing deps doesn't help
11:44 karolherbst: ohh, earlier
11:44 karolherbst: nmhhh
11:44 karolherbst: will take a look
14:30 jekstrand: karolherbst: Do you remember the name of the NIR pass to re-materialize constants at their uses?
14:31 karolherbst: not sure if we have it for constants
14:31 jekstrand: :-/
14:32 jekstrand: I thoguht there was a pass
14:32 jekstrand: I thought radv used it. Maybe bnieuwenhuizen or pendingchaos would know
14:32 karolherbst: there is and I think it was for inputs mainly
14:33 karolherbst: but maybe it can do more now
14:34 jekstrand: Meh. I'll write a quick pass.
14:34 karolherbst: mhh, but yeah, I can't find it either
14:35 karolherbst: maybe it's part of nir_schedule now?
14:35 jekstrand: There's opt_move but it doesn't re-materialize anything
14:36 karolherbst: mhh
14:36 karolherbst: I just did it in the backend
14:36 jekstrand: Yeah, I'm thinking of doing that
14:37 jekstrand: I'm looking at a shader that's spilling piles of constants. :-/
14:37 karolherbst: just constructed a map of nir_ssa_def -> constants or something
14:37 jekstrand: So it's either something cheap like re-emitting constants at their uses or I have to make back-end spilling smarter. :-/
14:38 karolherbst: should be easy to write a nir pass though
14:38 karolherbst: then I could maybe use it as well
14:38 karolherbst: the way I am doing it is a bit expensive :/
14:42 jekstrand: What I really want is for our back-end to do a bit of analysis prior to RA that determines things like "this register is actually a constant" and then, instead of spilling, re-materialize.
14:43 jekstrand: But that's more smarts than I feel like writing today
14:48 jekstrand: Actually.... that only makes things worse. :-/
14:49 jekstrand: I guess those weren't just constants. They just have been things initialized with constants. It gets hard to tell. :-/
14:55 karolherbst: btw image_format and image_order are annoying to deal with :/
14:56 karolherbst: I think I have the argument stuff right, but now I have to rewrite the source
14:56 karolherbst: and if we assume we might get weird indirects through ?: or so, we would have to copy the entire deref chain and swap the variables, no?
15:03 jekstrand: Depends on how we decide it works
15:03 jekstrand: When I wrote my image code, I assumed that image indices didn't have to be constant and just rewrote the derefs like we do in explicit_io
15:37 karolherbst: jekstrand: ohh, I have an idea.. maybe I replace those after we lower images, because then we sould know the constants
15:37 karolherbst: mhh
15:38 karolherbst: but I also have to do it before we lower io
15:38 karolherbst: :/
15:38 karolherbst: argh
15:39 karolherbst: ohh
15:39 karolherbst: we lower uniforms after images, good
15:39 karolherbst: then that could work.. let's see
17:11 wb9688: Is using P2PDMA for copying stuff between GPUs supported by Linux?
17:15 jekstrand: karolherbst: What does CLC clang do with unions?
17:15 karolherbst: good question..
17:20 jekstrand: Whatever LLVM is generating, we come up with a different size and different offsets than the CPU. :-(
17:21 karolherbst: ehh...
17:21 karolherbst: wait..
17:22 karolherbst: it seems like that llvm just uses the biggest field
17:22 karolherbst: but that's for unions of scalar
17:22 jekstrand: I think it may be a nested-struct problem
17:22 jekstrand:wishes these kernels weren't so stupid big
17:23 karolherbst: %union.Data = type { %struct.anon.0 }
17:23 karolherbst: %struct.anon.0 = type { [20 x i64] }
17:23 karolherbst: source: https://gist.github.com/karolherbst/3b3fcb5f9bd7a15cee100d7bf7238756
17:24 karolherbst: uff
17:24 karolherbst: when I use 4 longs it is getting even weirder
17:24 karolherbst: %union.Data = type { %struct.anon.0, [8 x i8] }
17:24 karolherbst: %struct.anon.0 = type { [4 x i64] }
17:24 karolherbst: but that looks wrong
17:24 jekstrand: I really wish we had shader-db for kernels
17:24 karolherbst: ahh no
17:25 jekstrand: shader_runner rather
17:25 karolherbst: we have that
17:25 jekstrand: Actually.... I think piglit has a thing, doesn't it?
17:25 karolherbst: yeah
17:25 karolherbst: cl file and you declare input/outputs
17:26 karolherbst: I think llvm just converts the union to a structs and throws in a bunch of bitcasts
17:26 karolherbst: but I guess with more complex unions this can become quite annoying
17:31 jekstrand: Ugh... what's the name of the executable?
17:31 karolherbst: cl-program-tester
17:33 jekstrand: Yeah, didn't have it built. Re-building piglit on my dev machine now.
17:34 jekstrand: Really not sure how I didn't have it built. I've used it before....
17:39 karolherbst: uhm...
17:40 karolherbst: why is dce running forwards through the shader?
17:40 karolherbst: ohh wait
17:40 karolherbst: doesn't matter
17:40 karolherbst: it works differently than I thought it would
17:44 jekstrand: karolherbst: What's the thing to dump LLVM?
17:45 karolherbst: CLOVER_DEBUG
17:45 karolherbst: but that one is painful to use :/
17:45 karolherbst: CLOVER_DEBUG_FILE=out CLOVER_DEBUG=llvm
17:50 jekstrand: I think the problem is with our handling of nested structs
17:51 jekstrand: In particular, whether or not you align the size of the struct to the largest member when assigning locations of members after the substruct
17:51 karolherbst: mhh
17:51 jekstrand: The struct that's going sideways has a substruct with some uint64_ts but it ends with a couple of chars
17:51 jekstrand: And then after the substruct are more chars
17:51 karolherbst: how does the llvm struct look like?
17:52 karolherbst: or rather
17:52 karolherbst: how does the spirv look like
17:55 jekstrand: karolherbst: https://paste.centos.org/view/220bc79b
17:56 karolherbst:sees dword, feels like running away
17:56 jekstrand: karolherbst: Sorry. Intelisms...
17:56 karolherbst: "dword" gives me this 1980 vibe
17:56 jekstrand: karolherbst: Wouldyou feel better if I called them u32 and u64?
17:57 karolherbst: jekstrand: how does the spirv look like?
17:57 jekstrand: karolherbst: How do I dump that easily?
17:57 karolherbst: CLOVER_DEBUG=spirv
17:57 karolherbst: should give you a spvasm file or something
17:58 jekstrand: karolherbst: Here's the LLVM https://paste.centos.org/view/e325f3a4
17:58 karolherbst: huh
17:58 karolherbst: the llvm is wrong
17:59 karolherbst: ohh wait
17:59 karolherbst: qword vs dword
17:59 karolherbst: the heck...
17:59 jekstrand: karolherbst: Here's everything
17:59 jekstrand: https://paste.centos.org/view/bd6765b1
17:59 karolherbst: yeah okay, so.. that's not an union issue as no union exists
17:59 jekstrand: qword = quad-word = u64
17:59 jekstrand: Yeah, I'm sorry
18:00 karolherbst: or not really..
18:00 karolherbst: I guess a little..
18:00 karolherbst: really
18:01 jekstrand: yeah, I think LLVM just uses the biggest substruct of the union or something like that
18:01 karolherbst: yeah
18:01 karolherbst: the third member is just bitcastet
18:01 karolherbst: or so
18:02 karolherbst: or well.. "conv"
18:02 karolherbst: mhh
18:02 karolherbst: maybe it's not used
18:03 karolherbst: but I don't see why that should cause any issues?
18:03 karolherbst: the alignment is a little wild though of the load_ubo
18:05 karolherbst: so.. that struct should be aligned to 64 bits
18:06 karolherbst: and I also don't see why the cl_alignment code would be wrong
18:07 karolherbst: or are the rules slightly different than we think they are?
18:09 karolherbst: jekstrand: btw, I think the result of that test is 0
18:10 karolherbst: 11 is occupied by T, and F will be bits 8-15, which are 0
18:13 jekstrand: karolherbst: Yeah, it should read T not F
18:14 karolherbst: mhh
18:14 jekstrand: karolherbst: So I can fix it with a one-line change to explicit_type_for_size_align.
18:14 karolherbst: but expecting 11 and reading T still gives me 0
18:14 jekstrand: Just align the size of the struct type to its alignment
18:14 karolherbst: ahh
18:14 karolherbst: right, as alignmet is still screwed
18:14 jekstrand: I'm not sure if we want to do that universally though
18:14 jekstrand: Maybe we do?
18:14 karolherbst: no
18:14 jekstrand: Why not?
18:15 karolherbst: because the size is not aligned
18:15 jekstrand: Right, sizeof() isn't aligned
18:15 karolherbst: yep
18:15 jekstrand: But why are the members after it aligned?
18:15 karolherbst: because the field has an alignment
18:15 karolherbst: so the struct field is aligned to 64 bit
18:16 karolherbst: but yeah.. mhh
18:16 karolherbst: let me think
18:16 karolherbst: ahh yeah
18:16 karolherbst: T and F are not 11
18:16 karolherbst: they are bits 16-31 of 10
18:17 karolherbst: at least that's how mesa behaves today
18:17 jekstrand: yes but I think that's wrong
18:17 jekstrand: Or at least it doesn't match the layout from GCC
18:17 karolherbst: I just don't know if this is following OpenCL C rules or not
18:17 karolherbst: as T and F are aligned to 8 bits, they can just follow directly after the union
18:17 jekstrand: I don't know either
18:18 karolherbst: where does GCC put T and F?
18:18 jekstrand: bytes 48 and 49 respectively
18:18 karolherbst: hum...
18:18 karolherbst: that's 12
18:18 karolherbst: wait..
18:19 karolherbst: there was this magic gcc option
18:19 jekstrand: We're purtting them at 43 and 44
18:19 jekstrand: Oh, I don't like the sound of magic gcc options
18:19 karolherbst: nono, it's a good one
18:22 karolherbst: jekstrand: clang has -fdump-record-layouts
18:22 karolherbst: okay, it is clang, not gcc which has it
18:22 karolherbst: maybe gcc has it now as well? dunno
18:23 karolherbst: seems like with gcc you can use pahole on the object file
18:23 jekstrand: The OpenCL C spec says this about sizeof(): "When applied to an operand that has structure or union type, the result is the total number of bytes in such an object, including internal and trailing padding."
18:24 karolherbst: mhhh
18:24 karolherbst: I think we ignore trailing padding?
18:25 jekstrand: Yeah, I think the "align up to the biggest member size" is the trailing padding we're missing. :)
18:26 karolherbst: so we need to adjust the struct handling of glsl_type::cl_size a little
18:26 karolherbst: just store the biggest alignment and append?
18:29 jekstrand: karolherbst: https://paste.centos.org/view/e43fb611
18:29 jekstrand: karolherbst: I'm not sure if I want to go that scorched-earth
18:30 karolherbst: mhhh
18:30 jekstrand: karolherbst: I'm off to find some lunch. I'll be back after a while.
18:51 karolherbst: the heck..
18:56 karolherbst: what the ,,,
18:56 karolherbst: ahhhhhhhhh
18:56 karolherbst: _why_
18:59 karolherbst: why is there an implicit add CL_SNORM_INT8 and add CL_R
18:59 karolherbst: ...
18:59 karolherbst: why...
18:59 karolherbst: it's in the spirv, but not llvm
19:00 karolherbst: and I think jenatali said something about that
19:07 karolherbst: jekstrand: what an ugly patch: https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/7241/diffs?commit_id=07c451d3c7a105db725ef881baddcf9eb9e3fe49
19:07 karolherbst: :D
19:08 karolherbst: still.. who inserts that offset where
19:09 karolherbst: ahh the translator
19:09 karolherbst: https://github.com/KhronosGroup/SPIRV-LLVM-Translator/blob/master/lib/SPIRV/OCLUtil.h#L299
19:10 karolherbst: oh well
19:10 karolherbst: at least constant folding takes care of that
19:10 airlied: karolherbst: spirv and cl have different enums
19:10 karolherbst: but doing that offset inside the kernel is just dumb
19:11 karolherbst: airlied: since when
19:11 airlied: for ever
19:11 airlied: spirv starts at 0
19:11 karolherbst: it has nothing to do with spirv though
19:11 airlied: CL C has CL enum
19:11 airlied: yeah it has
19:12 karolherbst: uh.. right
19:12 airlied: its even calledcout in a soec
19:12 karolherbst: we use the spirv functions...
19:12 airlied: spec
19:12 karolherbst: ehhh
19:12 karolherbst: annoying
19:12 airlied: just not sure whixh spec
19:12 airlied: maybe cl spirv env
19:13 karolherbst: but the formats don't match anyway...
19:13 karolherbst: ohh wait
19:13 airlied: they are in same order
19:13 karolherbst: yeah.. they do
19:13 karolherbst: still annoying
19:13 airlied: just different base
19:14 karolherbst: oh well
19:14 karolherbst: at least it gets constant folded away
20:22 AndrewR: so, when I get 'Reviewed-by" line on gitlab I should add this line to my commit(s) and push?
20:23 karolherbst: yeah
20:34 jekstrand: karolherbst: I was going to pull the "real" OpenCL driver for Intel and try my test against that just for sanity. However, the copr I found appears to be buested for f33
20:34 karolherbst: :/
20:49 karolherbst: looking in the event stuff, now I am wondering if it does anything at all...
20:52 jekstrand: And... my fisfinite implementation was busted.
20:53 jekstrand: One day, I will fix all the bugs. :)
20:53 karolherbst: :)
20:53 jekstrand:accidentally implemented isinfinite :)
21:03 jekstrand: karolherbst: Do you have isfinite implemented in your back-end?
21:04 AndrewR: https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/7297/diffs?commit_id=32368780795657ebce2b6cbcf38089f6613aa811 - why "Fixes' and 'Reviewed-by' are all together (without space between them)?
21:05 jekstrand: karolherbst: Hrm... Is there any reason why abs(x) < inf isn't a valid isfinite implementation? For x = NaN, it should return false because all comparisons with NaN are false, right?
21:05 karolherbst: not all comparisons with NaN are false
21:06 jekstrand: ?
21:06 karolherbst: a != NaN -> true
21:06 jekstrand: karolherbst: Yes, != is weird because it's unordered
21:06 karolherbst: ohh, you meant the <? mhh
21:06 jekstrand: But assuming an ordered <, I think abs(x) < inf should work
21:07 karolherbst: for which values does it fail?
21:07 jekstrand: karolherbst: I know of none.
21:07 jekstrand: karolherbst: I'm just trying to vet the idea.
21:07 karolherbst: ahh
21:07 karolherbst: yeah, I think this should work
21:08 jekstrand: Because |a| < inf is something we can do in a single instruction.
21:08 jekstrand: And the natural NIR lowering will get us that one instruction
21:08 karolherbst: and we can ignore denormals here anyway
21:08 karolherbst: because worst case they become 0
21:08 karolherbst: but I hope hw leaves them alone with abs
21:08 jekstrand: Yeah
21:08 jekstrand: But abs(denorm) = 0 is fine for this check
21:08 pendingchaos: AndrewR: "git log" shows they aren't
21:09 jekstrand: Ok, good. I'm not crazy.
21:09 karolherbst: jekstrand: just... we don't have a plain abs :p
21:09 pendingchaos: sometimes gitlab messes up the formatting of commit messages
21:09 karolherbst: but anyway
21:09 karolherbst: should be fine
21:09 jekstrand: karolherbst: Neither do we. We have MOV with a modifier. :)
21:09 karolherbst: we have f2f with a modifier which sucks
21:09 karolherbst: or we do high perf abs with fadd
21:09 jekstrand: Why is f2f bad?
21:10 pendingchaos: AndrewR: something like "Fixes: 0eccd158" (without the "commit") is a more typical formatting of a Fixes tag, btw
21:10 karolherbst: jekstrand: variable runtime
21:10 jekstrand: Oh, ugh.
21:10 karolherbst: yeah
21:10 karolherbst: f2f is for real conversions :p
21:10 karolherbst: and as the modifier is on each alu op, why bother with another instruction?
21:10 jekstrand: Sure
21:11 jekstrand:wonders if his better fisfinite implementation will lead to less spilling in this awful kernel.
21:11 karolherbst: I wrote the optimization once to use add instead of i2i/f2f
21:11 karolherbst: got a 0.2% perf increase in pixmark_piano :D
21:11 jekstrand: That's sad
21:12 karolherbst: well, the shader is busy doing sqrt
21:12 karolherbst: and sin/cos
21:12 karolherbst: so I doubt it is busy with other crazy stuff
21:12 karolherbst: uhm
21:12 karolherbst: *think
21:13 AndrewR: pendingchaos, oh, well ..I can repush with yet another change, just was confused by web-look
21:15 AndrewR: I hope I will not break gitlab .... :} :(
23:10 DPA: I've sent a patch for implementing .format_mod_supported in mxsfb now (It's my first kernel patch)
23:10 DPA: About setting linear modifiers in etnaviv in the scanout + no modifiers case, I'm currently looking into that.
23:10 DPA: It seams to actually be using the linear modifier in that case, but does another odd thing before that which
23:10 DPA: causes it to add extra padding, which is what actually caused problems with mxsfb in some cases...
23:40 karolherbst: DPA: I hope you don't use broken userspace?
23:40 karolherbst: Xorg 1.20 is considered broken, also some wayland compositors don't use modifiers correctly by default
23:42 karolherbst: MOD_LINEAR is kind of the fallback when userspace does't request anything in particular