03:22 imirkin: karolherbst: i don't suppose you had a chance to trace that test?
03:26 imirkin: curious. for RGBA32UI and a 1x1 buffer it works, but 1x8192 and 8192x1 don't work
03:37 imirkin: INTERESTING. only blue/alpha are messed up. red/green are ok. but blue/alpha are returning *0*. What. The. Fuck.
03:42 airlied: not enough bits!
03:44 imirkin: something like that.
03:45 imirkin: i think something dumb's happening in how the test works
03:48 imirkin: ok, looks like it has to be something in the blit from MS -> non-ms
03:49 imirkin: time to pay the piper...
03:49 imirkin: that blit logic is ... all kinds of broken =/
03:51 imirkin: but ... question remains ... how is it that only blue/alpha are messed up
03:55 airlied: something blitting 64-bits instead of 128?
03:56 imirkin: ooooh, do we use the 2d blitter here?
04:17 imirkin: ok, well THAT totally makes sense... the issue is with rgba32i msaa x1, 2, 4, but NOT 8
04:26 imirkin: right. we forced the 3d blitter for 8x, but not for smaller resolves.
04:27 imirkin: none of which explains how red/green worked, but blue/alpha didn't
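airlied's 03:55 guess (a path moving 64 bits per texel instead of 128) would at least produce the observed symptom. The sketch below is only an illustration of that hypothesis, not a confirmed diagnosis: `blit_texel` and its `bytes_copied` parameter are hypothetical names, and the texel values are made up.

```python
import struct

TEXEL_FORMAT = "<4I"  # RGBA32UI: four little-endian 32-bit unsigned channels
TEXEL_SIZE = struct.calcsize(TEXEL_FORMAT)  # 16 bytes, i.e. 128 bits


def blit_texel(src: bytes, bytes_copied: int) -> tuple:
    """Copy the first `bytes_copied` bytes of one texel into a zeroed destination.

    Hypothetical model of a blit; not the actual nouveau code path.
    """
    dst = bytearray(TEXEL_SIZE)
    dst[:bytes_copied] = src[:bytes_copied]
    return struct.unpack(TEXEL_FORMAT, bytes(dst))


texel = struct.pack(TEXEL_FORMAT, 0x11111111, 0x22222222, 0x33333333, 0x44444444)
full = blit_texel(texel, 16)  # a correct 128-bit copy keeps all four channels
half = blit_texel(texel, 8)   # a 64-bit copy keeps red/green, leaves blue/alpha at 0
```

With a 64-bit copy, red and green survive while blue and alpha read back as 0, which matches what the test reported at 03:37.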
04:41 imirkin: hrmph. NEXT. incomplete textures and maxwell images.
05:55 imirkin: bleh. that's annoying to fix. "don't do that".
09:36 karolherbst: pmoreau: nice!
09:36 karolherbst: imirkin: not yet, planned to do that today, but somehow I fail to mmt trace nvidia again
09:37 karolherbst: imirkin: or do you have a solution now?
09:58 karolherbst: imirkin: today I will run the CTS on Kepler1/Kepler2/Maxwell and update/add trello cards
11:11 karolherbst: imirkin: on Kepler1: Failed: 8/7445 (0.1%) :)
12:12 karolherbst: imirkin: uhm... post RA: st u8 # g[$r0d+0x0] %r118 :(
12:15 karolherbst: why isn't that %r118 converted to a register :(
12:20 karolherbst: mhh livei(%118): [27 32)
12:21 karolherbst: but then it is just ignored? weird
12:21 karolherbst: uhh, but this just vanishes in RA: 25: ld u8 %r118 g[%r116d+0x1]
12:46 pmoreau: karolherbst: Make sure that %r118 is a full 32-bit register, and not declared as an 8-bit one for example. That’s known to end up badly.
12:46 karolherbst: pmoreau: uhh :(
12:46 karolherbst: so instruction type u8 but register 32bit?
12:46 pmoreau: Yup
12:47 pmoreau: Or fix the code to properly handle u8
12:47 pmoreau: ;-)
12:47 pmoreau: But I chickened out on that
12:50 karolherbst: mhh
12:52 pmoreau: (Mostly because I didn’t want to end up with one more big task to do, further delaying all the other current tasks I’m trying to finish.)
12:53 karolherbst: better now :)
12:53 pmoreau: ;-)
12:54 karolherbst: char3.hi of hiloeo fails now
12:56 pmoreau: Yeah, it failed for me initially as well
12:56 karolherbst: also
12:56 karolherbst: we have a higher CTS pass rate on pascal than on maxwell
12:56 karolherbst: ...
12:56 pmoreau: In most? cases, type3 should be considered as a type4, with the 4th element being undefined.
12:57 karolherbst: pmoreau: vec3 8 ssa_22 = intrinsic load_global (ssa_21) () (0) /* base=0 */ ;)
12:57 pmoreau: (And type3 are aligned as if they were type4)
12:58 karolherbst: doesn't really matter
12:58 karolherbst: but mhh
12:58 pmoreau: Really? Maybe not in your test case, but otherwise, it does
12:58 karolherbst: my nvir thing does the same thing nir tells me to do
12:58 karolherbst: weird
12:58 karolherbst: vec2 8 ssa_23 = vec2 ssa_22.z, ssa_1
12:58 karolherbst: vec1 8 ssa_1 = undefined
12:58 karolherbst: this is a bit weirdo
12:58 karolherbst: because of this: intrinsic store_global (ssa_23, ssa_30) () (0, 3) /* base=0 */ /* wrmask=xy */
12:58 pmoreau: No, this is exactly how it should be
12:58 karolherbst: so the result is the third component + random?
12:59 pmoreau: Yep
12:59 karolherbst: so you write random stuff into bits 8-15?
12:59 pmoreau: Or you just don’t write to it either
12:59 karolherbst: well
12:59 karolherbst: we can't do that though
13:00 karolherbst: let me check the spir-v
13:00 karolherbst: OpStore %arrayidx2 %25 Aligned 2
13:00 karolherbst: OpVectorShuffle %v2uchar %23 %24 2 4294967295
13:00 karolherbst: %24 = OpUndef %v3uchar
13:00 karolherbst: so yeah, spir-v does the same
13:00 karolherbst: %23 = OpLoad %v3uchar %arrayidx Aligned 4
13:01 karolherbst: for the other value
13:02 karolherbst: imirkin: we have a lot of KHR-GL45.shader_image_size.advanced-nonMS fails on Maxwell/Pascal now
13:02 karolherbst: maybe a regression from one of your patches?
13:03 pmoreau: “The suffixes .lo (or .even) and .hi (or .odd) for a 3-component vector type operate as if the 3-component vector type is a 4-component vector type with the value in the w component undefined.”
13:04 karolherbst: imirkin: ohh maybe they crashed before or were internalFails or whatever...
13:06 karolherbst: pmoreau: I see
13:09 karolherbst: pmoreau: Failure for char3.hi { 47, -86, -60, -105 } --> { -86, 46 } exp { -60, -105 } :(
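The rule pmoreau quotes at 13:03 can be modelled in a few lines. This is a minimal sketch, assuming a Python list stands in for the vector and `UNDEF` (a hypothetical marker) stands in for the undefined w component; the helper names are illustrative only.

```python
UNDEF = None  # stand-in for an undefined component


def widen(vec):
    """Treat a 3-component vector as a 4-component one with w undefined."""
    return list(vec) + [UNDEF] if len(vec) == 3 else list(vec)


def lo(vec):
    """Lower half of the (widened) vector."""
    wide = widen(vec)
    return wide[: len(wide) // 2]


def hi(vec):
    """Upper half of the (widened) vector."""
    wide = widen(vec)
    return wide[len(wide) // 2 :]


def even(vec):
    """Even-indexed components of the (widened) vector."""
    return widen(vec)[0::2]


def odd(vec):
    """Odd-indexed components of the (widened) vector."""
    return widen(vec)[1::2]


char3_value = [47, -86, -60]  # first three components from the failure line
```

For this input `hi` yields `[-60, UNDEF]`: the first element must be the z component, which is where the failing result `{ -86, 46 }` goes wrong.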
13:09 pmoreau: And yeah, don’t forget that if you have `char3 foo[10];`, `char3 bar = foo[4];` will return the value stored at `foo + 16 bytes`, not `foo + 12 bytes`.
13:10 karolherbst: duh
13:10 karolherbst: robclark: ^^
13:10 karolherbst: pmoreau: but it is weird
13:11 karolherbst: pmoreau: does this look kind of correct? https://gist.githubusercontent.com/karolherbst/ee2318b1853d5c9f0ca58fff01f4c88a/raw/79ac5e517697817a21f2591f8baaafd7254b7290/gistfile1.txt
13:11 pmoreau: “For 3-component vector data types, the size of the data type is 4 * sizeof(component). This means that a 3-component vector data type will be aligned to a 4 * sizeof(component) boundary.”
13:11 pmoreau: Let’s see
13:12 karolherbst: ohh
13:13 karolherbst: ohhhh
13:13 pmoreau: “15: mul u32 $r0 $r6 0x00000003 (8)” I’m not sure where that 3 comes from, but I would assume it’s the issue I’m talking about, and it should be 4
13:13 karolherbst: I see
13:13 karolherbst: yeah
13:13 karolherbst: that has to be a 4
13:13 karolherbst: the global array of char3s is also char4 aligned then
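The size rule quoted at 13:11 explains why the multiplier has to be 4 rather than 3. A small sketch of the resulting address arithmetic, with hypothetical helper names (`vector_size`, `element_offset`):

```python
def vector_size(component_size: int, num_components: int) -> int:
    """Size in bytes of an OpenCL C vector type.

    A 3-component vector is sized (and aligned) like the 4-component one.
    """
    effective = 4 if num_components == 3 else num_components
    return component_size * effective


def element_offset(index: int, component_size: int, num_components: int) -> int:
    """Byte offset of array element `index` for an array of such vectors."""
    return index * vector_size(component_size, num_components)


# char3 foo[10]; foo[4] lives at foo + 16 bytes, not foo + 12 bytes.
correct = element_offset(4, component_size=1, num_components=3)
naive = 4 * (1 * 3)  # what a plain "index * 3" multiply computes
```

Here `correct` is 16 and `naive` is 12, the same discrepancy as the `mul u32 $r0 $r6 0x00000003` in the dump.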
13:14 karolherbst: %23 = OpLoad %v3uchar %arrayidx Aligned 4
13:15 karolherbst: but mhh
13:15 karolherbst: pmoreau: I think this needs to be fixed on the spir-v level?
13:15 RSpliet: karolherbst: RE higher pass rate, is that because we're not pretending to support stuff that is buggy in Maxwell? Or are you talking absolute number of tests?
13:15 karolherbst: RSpliet: well, I was wrong in the end, because I was sure we had less fails on pascal
13:15 karolherbst: but now a lot of other tests don't cause errors
13:16 karolherbst: pmoreau: https://gist.githubusercontent.com/karolherbst/a9a1f03d2c06cd6990bf2615e6ddc766/raw/ad0490864bdee70f7ee49b7176bf1638c2b6b1f9/gistfile1.txt
13:16 karolherbst: %arrayidx = OpInBoundsPtrAccessChain %_ptr_CrossWorkgroup_v3uchar %srcA %idxprom
13:16 pmoreau: Was going to ask you for that, thanks :-)
13:16 karolherbst: but mhh
13:16 karolherbst: it looks okay
13:16 karolherbst: robclark: we have to take a different approach to OpInBoundsPtrAccessChain
13:16 karolherbst: OpInBoundsPtrAccessChain shouldn't care about alignment of the type
13:16 karolherbst: we get the alignment in OpLoad
13:17 pmoreau: So the load is properly aligned. so that looks fine
13:17 karolherbst: I think this might be the error here
13:17 karolherbst: pmoreau: the nir is wrong
13:18 pmoreau: %_ptr_CrossWorkgroup_v3uchar should probably have an alignment decoration to it.
13:18 karolherbst: pmoreau: https://github.com/karolherbst/mesa/commit/ea6595c7ddf0cff786d76dfa01a6feed6915c9e3#diff-20e6d2c405c49e60dcdbfae7b9e2565aR345
13:19 pmoreau: Unless GLSL has the same rules that an array of type3 is similarly aligned to their corresponding type4
13:19 karolherbst: pmoreau: no
13:19 karolherbst: pmoreau: doesn't have to
13:19 karolherbst: we get the alignment from OpLoad
13:19 pmoreau: Well, you get that the load is 4-byte aligned, which doesn’t tell the whole story
13:19 karolherbst: OpLoad loads at %arrayidx * alignment
13:20 karolherbst: which is fine
13:20 karolherbst: OpInBoundsPtrAccessChain creates an index
13:20 karolherbst: where you also don't have to know anything about alignment
13:20 karolherbst: because how does the alignment matter for an index?
13:21 robclark: karolherbst, if you have a spv I can switch branch and have a look at it in a bit.. the whole handling of deref chains would be considerably easier if nir used deref instructions like spirv instead of load_var w/ deref chain
13:21 karolherbst: robclark: well, we don't have to care about alignment until OpLoad
13:21 pmoreau: I see it more as a pointer that you receive, than an index
13:21 karolherbst: robclark: I think currently with your patches the alignment is taken into account at OpInBoundsPtrAccessChain time?
13:22 karolherbst: so we end up with a */+ 3 there already
13:22 robclark: I'd have to look at it again, I've been working on different things
13:22 karolherbst: robclark: spirv https://gist.githubusercontent.com/karolherbst/a9a1f03d2c06cd6990bf2615e6ddc766/raw/ad0490864bdee70f7ee49b7176bf1638c2b6b1f9/gistfile1.txt
13:22 robclark: the spv binary would be easier to feed into parser ;-)
13:22 karolherbst: OpLoad should be translated into imul idx alignment
13:23 karolherbst: + load_var
13:24 karolherbst: robclark: sent
13:24 robclark: thx
13:27 pmoreau: karolherbst: Let’s say you have `struct { int bar1; long bar2 } foo[42]; long foo2 = foo[4].bar2;`. That could be translated into a single `OpInBoundsPtrAccessChain foo 4 1`, which doesn’t really give you an index.
13:27 karolherbst: pmoreau: mhh
13:28 pmoreau: I think there is a reason why OpInBoundsPtrAccessChain and others give you back a pointer, and not an integer.
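pmoreau's struct example shows why an access chain resolves to an address rather than a bare index. The sketch below assumes natural alignment for scalar members (OpenCL C `int` is 4 bytes, `long` is 8 bytes); `layout` is a hypothetical helper, not part of any SPIR-V or NIR API.

```python
def layout(members):
    """Lay out struct members, aligning each one to its own size.

    `members` is a list of (name, size_in_bytes). Returns (offsets, struct_size),
    where struct_size is padded to the largest member alignment.
    """
    offsets = {}
    offset = 0
    max_align = 1
    for name, size in members:
        align = size  # natural alignment assumed for scalars
        offset = (offset + align - 1) // align * align
        offsets[name] = offset
        offset += size
        max_align = max(max_align, align)
    total = (offset + max_align - 1) // max_align * max_align
    return offsets, total


# struct { int bar1; long bar2 } foo[42]; long foo2 = foo[4].bar2;
offsets, stride = layout([("bar1", 4), ("bar2", 8)])

# "OpInBoundsPtrAccessChain foo 4 1": element 4 of the array, then member 1.
byte_offset = 4 * stride + offsets["bar2"]
```

The two indices `4` and `1` combine into a single byte offset (72 here), so the result cannot be expressed as one array index scaled later by the load's alignment.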
13:29 karolherbst: I see
13:29 karolherbst: this sounds now more annoying than it has to be...
13:29 pmoreau: So, having an alignment decoration on the pointer would be nice to have ;-)
13:30 karolherbst: why does intmath_long MUL fail now :(
13:30 karolherbst: ahh
13:30 karolherbst: optimize=0
13:30 karolherbst: pmoreau: RUN_PASS(1, Split64BitOpPreRA, run); has to be RUN_PASS(0, Split64BitOpPreRA, run);
13:30 pmoreau: Or you hardcode in the translation pass, that for OpenCL, when aligning a type3, it should be done as a type4. I have done that in my pass.
13:30 pmoreau: karolherbst: I know, I even sent a patch for that.
13:31 karolherbst: where is the patch?
13:31 karolherbst: I will push it
13:32 pmoreau: Sitting on the mailing list. Ilia replied that that code shouldn’t be there in the first place, but part of something else (I can’t remember what), and I was waiting on your lowering helper to land to send the v2 of the patch.
13:32 karolherbst: ahh
13:32 karolherbst: yeah, maybe we want to move that stuff in there
13:32 karolherbst: but I don't use it for TGSI yet
13:35 pmoreau: imirkin: Should we land https://patchwork.freedesktop.org/patch/191743/ for now, and move it to the lowering helper once it has landed and we start using it for TGSI?
13:35 karolherbst: robclark: ohh, did you push the patches to use the llvm structurizer?
13:35 karolherbst: I need it :)
13:36 robclark: yeah, I have a github tree..
13:36 karolherbst: mhh
13:36 karolherbst: I was sure I got all the patches from there
13:36 robclark: karolherbst, https://github.com/freedreno/SPIRV-LLVM/commits/master
13:36 karolherbst: ohh
13:36 karolherbst: okay, that makes sense
13:36 robclark: maybe there is something newer since then.. idk if tomeu or pmoreau touched it
13:37 karolherbst: robclark: you have the llvm tree, right?
13:37 robclark: karolherbst, btw, are there newer spirv_to_nir/etc patches somewhere?
13:37 karolherbst: robclark: yes
13:37 karolherbst: robclark: https://github.com/karolherbst/mesa/commits/nouveau_nir_spirv_opencl
13:37 robclark: k
13:38 karolherbst: robclark: even with calling spirv_to_nir inside clover
13:38 karolherbst: and OpenCL C to spir-v integration
13:38 robclark: ahh, nice
13:38 robclark: as far as llvm.. I think I was using https://github.com/llvm-mirror/llvm.git ?
13:38 karolherbst: I see
13:39 robclark: you should have asked me these things a month ago :-P
13:39 karolherbst: pmoreau: which git repo do we use for llvm-spirv?
13:39 karolherbst: https://gitlab.collabora.com/tomeu/llvm-spirv.git ?
13:39 karolherbst: or do we have a new one?
13:39 tomeu: robclark: I'm not doing any coding, I think pmoreau's fork is the canonical atm, and hopetech has plans to improve it as well
13:39 karolherbst: ahh so pmoreau is the new one
13:39 robclark: yeah, my llvm-spirv patches were based on top of tomeu's tree
13:39 karolherbst: robclark: right, we moved away from a full llvm tree and just have a llvm-spirv "plugin"
13:39 tomeu: yeah, it's in github, so it's easier for everybody
13:40 karolherbst: robclark: ohh right, the master branch looks fine
13:40 robclark: karolherbst, I did at one point build all of llvm for some reason.. although I think the conclusion was that my problems were due to weird mix of llvm and clang versions so that probably wasn't needed
13:41 karolherbst: yeah
13:41 karolherbst: robclark: [DEBUG] At word No.260: "Block 16[Flow93] appears in the binary before its dominator 18[Flow109]" :(
13:42 karolherbst: https://gist.github.com/karolherbst/0cf8dd01efbe2f0488865abeb552b8bb
13:46 karolherbst: robclark: oh and we also need support for OpInBoundsPtrAccessChain on SSA values
13:46 karolherbst: %src = OpFunctionParameter %_ptr_CrossWorkgroup_uchar
13:47 karolherbst: %18 = OpBitcast %_ptr_CrossWorkgroup_uint %src
13:47 karolherbst: %arrayidx = OpInBoundsPtrAccessChain %_ptr_CrossWorkgroup_uint %18 %idxprom
13:47 karolherbst: casting a pointer to int
13:47 karolherbst: very smart ;)
13:47 robclark: hmm, I thought ssa values should work..
13:47 karolherbst: maybe they do, I just hit an error
13:48 karolherbst: well
13:48 karolherbst: error or segfault
13:53 robclark: oh.. doh.. I guess there is a new meson arg to enable cmdline compiler..
13:53 robclark:was wondering why OpenCL.std extension was making it die after switching branches and rebuilding
13:57 karolherbst: pmoreau: is bufferreadwriterect supposed to take quite some time?
13:59 karolherbst: duh
13:59 karolherbst: INT2FLOAT test failed 3967067136.000000 (3ad1d10) != 3967067136.000000 (20)
14:00 karolherbst: %f is kind of broken, isn't it?
14:07 karolherbst: uhm
14:08 karolherbst: pmoreau: I think I found a bug for real now
14:08 robclark: karolherbst, any idea what I'm missing here.. https://hastebin.com/xujepizibi.hs .. getting a glsl_type::_error_type ..
14:08 robclark: (that is for test_hi_char3)
14:08 karolherbst: robclark: my updated https://github.com/karolherbst/mesa/commit/590da3e062a4e4c8b97f16b15a797360ed2a24c0
14:09 robclark: ahh, ok, any of the other patches on your branch newer versions of what was on my wip-spirv branch?
14:09 karolherbst: just this one I think
14:09 robclark: k
14:10 karolherbst: ohh %conv1 = OpConvertSToF %float %22
14:10 karolherbst: this looks correct
14:10 karolherbst: nir: vec1 32 ssa_21 = u2f32 ssa_20
14:10 karolherbst: this looks wrong
14:10 karolherbst: should be i2f32
14:12 pmoreau: karolherbst: I don’t think I ever ran that test. I haven’t run the CTS that much.
14:13 pmoreau: I have been running it a bit yesterday, to see which parts of the API need improvement, and would benefit clover as a platform, rather than just Nouveau.
14:13 karolherbst: uhhh
14:13 karolherbst: robclark: %22 = OpLoad %uint %arrayidx Aligned 4
14:13 karolherbst: %conv1 = OpConvertSToF %float %22
14:13 pmoreau: But now I need to run to supervise some lab. ttyl
14:13 karolherbst: spirv_to_nir assumes %22 is signed, but it is unsigned
14:13 karolherbst: so we end up with u2f32 :(
14:14 karolherbst: but the global array is int
14:14 karolherbst: not uint
14:18 robclark: hmm, perhaps shouldn't infer the op from src/dst type.. but that should be easy enough to fix
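The difference between the two conversions only shows up once the top bit is set. A minimal sketch, assuming `u2f32` and `i2f32` simply reinterpret the same 32 bits as unsigned or signed before converting; the Python functions below model that behaviour and are not the NIR implementation.

```python
import struct


def _to_float32(value: int) -> float:
    """Round a Python number to the nearest 32-bit float."""
    return struct.unpack("<f", struct.pack("<f", float(value)))[0]


def u2f32(bits: int) -> float:
    """Interpret the 32 bits as an unsigned integer, then convert."""
    return _to_float32(bits & 0xFFFFFFFF)


def i2f32(bits: int) -> float:
    """Interpret the same 32 bits as a signed integer, then convert."""
    signed = struct.unpack("<i", struct.pack("<I", bits & 0xFFFFFFFF))[0]
    return _to_float32(signed)


all_ones = 0xFFFFFFFF   # -1 when read as a signed int
top_bit = 0x80000000    # INT_MIN when read as a signed int
```

For values with the top bit clear the two agree, which is why picking the opcode from the declared operand type can go unnoticed until a test feeds it negative inputs.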
14:20 robclark: btw, maybe we should have different system values for things that are 64b in cl and 32b in gl/vk..
14:22 karolherbst: maybe
14:22 karolherbst: or we do whatever the spir-v tells us
14:23 karolherbst: the driver would have to deal with this one way or the other anyway
14:23 karolherbst: either all sys values can be 32 bit or 64 bit, or we add 64 bit variants for the same thing
14:24 karolherbst: robclark: anyway, the spirv is wrong
14:24 karolherbst: OpLoad should load a signed int
14:25 karolherbst: not unsigned
14:25 karolherbst: __global int *src in the kernel
14:25 karolherbst: but I think technically we shouldn't assume the type in OpConvertSToF, because the spec is very vague about the source type
14:30 robclark: as far as SToF.. probably ask jekstrand or cwabbott, they know the spec better (and were involved in the creation, or at least cwabbott was)
14:30 robclark: karolherbst, does nv hw actually have 64b values for those intrinsics? afaict it is always 32b on adreno
14:31 karolherbst: sys vals?
14:31 karolherbst: well there are 64 bit sys vals, but you load two 32 bit values
14:32 karolherbst: I think
14:33 robclark: hmm, for me we just program the hw to put those values in particular registers before the shader starts, it is 3 consecutive 32b registers for that one
14:35 robclark: "nir: allow 64 bit shifts".. presumably that also needs some work in the pass to lower int64..
14:41 karolherbst: robclark: right... I think you could just kill the high bits
14:41 karolherbst: though
14:42 robclark: yeah, if you are trying to launch a grid that is more than 2^32 on adreno, you have bigger problems :-P
14:42 karolherbst: :D
14:42 karolherbst: well we are talking about shifts though
14:43 robclark: karolherbst, so probably OpLoad should just pass down the align value and we should align the stride of the src type to the alignment in the instruction when generating the deref..
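What robclark proposes here amounts to rounding the element stride up to the alignment carried by the load. A sketch under that assumption, with hypothetical helper names; as pmoreau noted at 13:19, the load's alignment alone may not tell the whole story for every type.

```python
def align_up(value: int, alignment: int) -> int:
    """Round `value` up to the next multiple of a power-of-two `alignment`."""
    if alignment <= 0 or alignment & (alignment - 1):
        raise ValueError("alignment must be a positive power of two")
    return (value + alignment - 1) & ~(alignment - 1)


def deref_offset(index: int, type_size: int, load_alignment: int) -> int:
    """Byte offset of element `index` when the stride is padded to the load's alignment."""
    stride = align_up(type_size, load_alignment)
    return index * stride


# %23 = OpLoad %v3uchar %arrayidx Aligned 4  -> 3-byte type, 4-byte stride
char3_offset = deref_offset(4, type_size=3, load_alignment=4)
# %22 = OpLoad %uint %arrayidx Aligned 4     -> already 4 bytes, unchanged
uint_offset = deref_offset(4, type_size=4, load_alignment=4)
```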
14:43 karolherbst: robclark: yeah
14:43 robclark: oh, right, I was referring to sysvals
14:43 karolherbst: ohh
14:43 karolherbst: I see
14:43 robclark:crossing the streams
14:47 imirkin: pmoreau: re always splitting, yes
14:48 imirkin: karolherbst: looks like my SUQ change is broken for imageBuffer
14:49 imirkin:goes to cry
14:49 karolherbst: :(
14:51 robclark: SUQy
14:51 imirkin: i get the width/etc via TXQ for those, since i have an image handle
14:52 imirkin: but ... how does textureSize() work with imageBuffer? is it undefined maybe?
14:52 imirkin: no, it should work
14:53 imirkin: maybe it's actually not imageBuffer that's broken
14:53 imirkin: i hate how these tests print errors
14:53 karolherbst: yeah
14:54 imirkin: anyways, i'll dig into it tonight
14:54 imirkin: thanks for testing
14:55 imirkin: karolherbst: you indicated that KHR-GL45.direct_state_access.queries_functional fails on kepler2? worksforme
14:55 imirkin: perhaps i have a patch i didn't send (or i updated a patch)
14:55 karolherbst: mhh
14:55 karolherbst: let me check
14:56 imirkin: it works with this patch: https://github.com/imirkin/mesa/commit/6dab8f9e1e1e08179d459bab4bbf80962f4042de
14:58 karolherbst: still fails for me
14:58 imirkin: booooo
14:58 imirkin: works for both GK208 and GM107 for me
14:59 imirkin: ok, well perhaps there's more fun bugs in there
14:59 imirkin: or your branch has things that break it :p
14:59 karolherbst: fails on GP108, GK208 and GM107 for me
14:59 karolherbst: imirkin: I have two patches now
14:59 karolherbst: GL 4.5 and that one
14:59 imirkin: oh, well you might need the other one...
14:59 karolherbst: even with the three series it was still broken
15:00 imirkin: oh, no. the clear one is unrelated. nevermind.
15:00 imirkin: ohhh well
15:00 imirkin: what GPUs?
15:00 karolherbst: huh? I just wrote that, didn't I?
15:00 imirkin: you did.
15:00 imirkin: but i didn't see.
15:00 karolherbst: ahh
15:01 imirkin: so ... same GPUs as me, but different results. EVEN BETTER!
15:01 karolherbst: mhh
15:01 karolherbst: maybe it is because I use DRI_PRIME?
15:01 imirkin: i use DRI_PRIME for the GM107
15:01 karolherbst: but that would be weird
15:01 karolherbst: ahh
15:01 imirkin: (with dri3)
15:01 karolherbst: your patches also on top of master?
15:01 imirkin: ./glcts --deqp-visibility=hidden --deqp-case=KHR-GL45.direct_state_access.queries_functional
15:02 imirkin: i have a couple of highly unrelated patches
15:02 karolherbst: mhh
15:02 karolherbst: interesting
15:02 karolherbst: well it fails though
15:04 imirkin: anyways, other than my little imageSize screwup, things are looking better
15:07 karolherbst: yeah
15:10 karolherbst: pmoreau: bufferreadwriterect cause OOM :(
15:11 imirkin: karolherbst: when you get a chance - https://trello.com/c/UIswujHs/19-khr-gl45shaderatomiccounteropstestsshaderatomiccounteropsadditionsubstractiontestcase
15:11 karolherbst: yeah, just need to get mmt working again
15:11 imirkin: k
15:16 karolherbst: robclark: ohh more fun is heading towards us: OpTypeVector %uchar 8
15:20 robclark: hmm, vec8? fun..
15:20 karolherbst: yeah
15:21 karolherbst: there is also float16, just saying
15:22 karolherbst: robclark: I think people would kill us if we add support for vec8 and vec16 to nir :D
15:22 karolherbst: robclark: SpvOpIsNormal do we have something for that in nir?
15:23 karolherbst: mhh, I guess not
15:24 robclark: not yet, I don't think
15:25 karolherbst: okay
15:25 karolherbst: well it is part of the spir-v spec so there shouldn't be an issue implementing it I think
15:26 robclark: some opc's may simply not exist only because they were not needed yet ;-)
15:26 karolherbst: yeah
15:28 robclark: I think we might want vtn_value_type_access_chain to defer generating the deref instructions to OpLoad.. what else besides OpLoad can Op*AccessChain be consumed by?
15:29 karolherbst: no clue
15:30 robclark: oh, well I guess OpStore
15:30 karolherbst: FAILED 20 of 66 tests. wuhu
15:30 robclark: the glass is 2/3rd full :-P
15:30 karolherbst: well
15:31 karolherbst: I disabled a few tests
15:31 karolherbst: 32
15:31 karolherbst: mhh well less
15:31 karolherbst: 29
15:31 karolherbst: so 49/95
15:31 karolherbst: which isn't so bad though
15:33 robclark: ok, yeah, so just OpLoad and OpStore.. so this might not be so bad
15:33 karolherbst: yeah
15:34 karolherbst: robclark: most of the fails are failed compiles though
15:34 karolherbst: or spir-v not supporting something
15:34 karolherbst: uhm
15:34 karolherbst: spirv to nir
15:35 karolherbst: like this At word No.1211: "OpenCL.std vloadn: expected operand P storage class to be UniformConstant or Generic"
15:36 karolherbst: result[4] = (double8)(src[0],src[1],src[2],src[3],vload2(0,src+4),src[6],src[7]);
15:39 robclark: did you already start implementing some of OpenCL.std?
15:40 karolherbst: no
15:40 karolherbst: I think this is llvm-spirv being unhappy
15:41 robclark: oh, I thought you were talking about vtn
15:48 orbea: so I updated 4.14.18 to 4.15.4 and now whenever I exit mplayer my whole system crashes... I'm guessing its a nouveau change, but as I cant ssh in I cant get any dmesg output....
15:49 imirkin: ssh first?
15:50 orbea: yea, I guess I should, even though I dont like that I cant reboot cleanly...
16:14 orbea: i guess it was not nouveau, sorry for the false alarm... http://dpaste.com/0JCE7FN
16:34 pmoreau: karolherbst: re “is bufferreadwriterect supposed to take quite some time?” I said earlier I never ran it. :-p
16:34 karolherbst: :D
16:34 pmoreau: Oh wait, I went too far back in the logs, nevermind :-D
16:35 karolherbst: pmoreau: any idea about Log: [DEBUG] At word No.98: "OpenCL.std vloadn: expected operand P storage class to be UniformConstant or Generic"?
16:35 pmoreau: One sec, still catching up with the logs :-D
16:35 karolherbst: you should run into the same problem
16:35 karolherbst: pmoreau: ./test_conformance/math_brute_force/conformance_bruteforce asin
16:36 pmoreau: imirkin: re always splitting, feel free to push it when you have time then, as I don’t have push rights
16:37 pmoreau: Okay, caught up with the logs
16:41 pmoreau: karolherbst: Right, the spec agrees with the storage class having to be UniformConstant or Generic
16:41 karolherbst: pmoreau: well, I am sure it actually is
16:41 pmoreau: What’s the whole SPIR-V?
16:41 karolherbst: I doubt they wrote invalid C code in the CTS
16:42 pmoreau: It could be an error in the validator from SPIRV-Tools.
16:42 karolherbst: CLOVER_DEBUG doesn't work at that point :(
16:42 karolherbst: don't get the stuff out
16:42 pmoreau: The validation of the OpenCL extension was done quite recently.
16:42 pmoreau: Really? Even if you run the test executable manually?
16:43 karolherbst: they do strange threading stuff in the math tests
16:43 pmoreau: :-/
16:43 pmoreau: Extract the OpenCL C code from the test, and build it manually?
16:44 karolherbst: wherever that is :(
16:44 karolherbst: CLOVER_DEBUG=cl is correct, right?
16:44 pmoreau: The test looks fine in spirv-val: https://github.com/KhronosGroup/SPIRV-Tools/blob/master/source/validate_ext_inst.cpp#L1506
16:44 pmoreau: CLOVER_DEBUG=clc
16:45 karolherbst: ahhh
16:45 karolherbst: pmoreau: https://gist.github.com/karolherbst/5cceb8b7462ec2d058af68b669fb4cb2
16:46 pmoreau: Which one is the one failing?
16:46 karolherbst: no idea
16:46 karolherbst: let me try to figure that out
16:47 karolherbst: should be math_kernel though
16:47 karolherbst: but
16:47 karolherbst: it doesn't matter
16:47 karolherbst: because everything gets compiled
16:47 karolherbst: ;)
16:48 karolherbst: we have just two vload3 anyway
16:48 karolherbst: duh
16:48 karolherbst: in + 3 * i the fucking crap
16:48 karolherbst: pls stahp
16:48 karolherbst: __global double* in
16:49 karolherbst: pmoreau: double3 f0 = vload3( 0, in + 3 * i );
16:49 karolherbst: looks like constant and pointer, no?
16:49 karolherbst: ohh wait
16:50 karolherbst: 2nd arg is a pointer
16:50 karolherbst: pmoreau: yeah, that validate thing is wrong
16:51 karolherbst: it can be crossworkgroup and workgroup as well
16:51 pmoreau: karolherbst: “p must be a pointer(constant, generic) to floating-point, integer.”
16:51 pmoreau: https://www.khronos.org/registry/spir-v/specs/unified1/OpenCL.ExtendedInstructionSet.100.html#_a_id_vector_a_vector_data_load_and_store_instructions
16:52 karolherbst: this is wrong
16:52 karolherbst: https://www.khronos.org/registry/OpenCL/sdk/1.2/docs/man/xhtml/vloadn.html
16:52 karolherbst: or is generic the same as global or local or private?
16:52 karolherbst: or anything
16:53 pmoreau: Generic it can be anything IIRC
16:53 karolherbst: right
16:53 karolherbst: so also workgroup and crossworkgroup
16:53 karolherbst: the code is clearly wrong then
16:54 pmoreau: Not anything, sorry: Function, Workgroup and CrossWorkgroup
16:54 karolherbst: ahh
16:54 karolherbst: I add function as well
16:54 pmoreau: Generic is a special storage class in SPIR-V
16:54 karolherbst: pmoreau: search for generic in that file :D
16:55 karolherbst: oh well
16:55 pmoreau: “https://www.khronos.org/registry/OpenCL/sdk/1.2/docs/man/xhtml/vloadn.html”
16:55 pmoreau: Argh
16:55 pmoreau: “pointer(generic) denotes an OpTypePointer with Generic Storage Class.”
16:55 pmoreau: Better
16:55 pmoreau: So the code is correct ;-)
16:55 karolherbst: ...
16:55 pmoreau: llvm-spirv is wrong
16:55 pmoreau: It should insert a cast to GenericPtr
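The conclusion reached here can be summarised as a small decision table. This is a sketch of the rule as discussed in the channel, not the SPIRV-Tools validator or llvm-spirv code; the function name and its return strings are hypothetical. In SPIR-V the cast pmoreau refers to would be an `OpPtrCastToGeneric`.

```python
# Storage classes the vloadn pointer operand may have directly.
ALLOWED_DIRECTLY = {"UniformConstant", "Generic"}
# Storage classes that can be converted to Generic first.
CASTABLE_TO_GENERIC = {"Function", "Workgroup", "CrossWorkgroup"}


def vloadn_pointer_fixup(storage_class: str) -> str:
    """Say what a producer would need to do with the pointer passed to vloadn."""
    if storage_class in ALLOWED_DIRECTLY:
        return "ok"
    if storage_class in CASTABLE_TO_GENERIC:
        return "cast-to-generic"
    return "invalid"


# `__global double* in` is a CrossWorkgroup pointer, so it needs the cast.
global_pointer = vloadn_pointer_fixup("CrossWorkgroup")
```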
16:56 karolherbst: ...
16:56 karolherbst: what a hack
16:56 karolherbst: I say the spec is being stupid here
16:56 karolherbst: now I only get CFG errors, nice
16:57 pmoreau: The generic thing was meant for that: that you don’t have to implement the same function 3 times, for private, global and constant storage spaces.
16:57 karolherbst: you wouldn't do that anyway
16:57 pmoreau: Feel free to open an issue on my GitHub repo about the missing cast to GenericPtr.
16:58 karolherbst: but you have the overloads in the OpenCL spec anyhow
16:58 pmoreau: Yes, because generic was introduced in OpenCL 2.0 IIRC
16:58 pmoreau: https://www.khronos.org/registry/OpenCL/sdk/2.0/docs/man/xhtml/vloadn.html
16:59 karolherbst: ahh
17:01 pmoreau: Anyway, I’m off for now. Might be back later tonight.
17:03 karolherbst: robclark: now I will handle opencl opcodes :)
17:03 karolherbst: robclark: Vloadn and VStoren :(
18:28 imirkin__: tagr: those "fixup!" patches were just accidentally sent, i take it?
18:29 imirkin_: orbea: someone else reported something like that. was still nouveau's fault. try enabling KASAN.
18:29 tagr: imirkin_: no, I sent them on purpose to make it more obvious what I had changed in v2
18:29 tagr: not sure if that's useful, but I've sent v2 as well, so feel free to look at either of them
18:29 imirkin_: orbea: i think ben's got a fix in the works (by accident), but not 100% sure.
18:29 imirkin_: tagr: ok, makes sense.
18:30 orbea:looks up how to enable KASAN
18:35 orbea: so pretty much build kernel with CONFIG_KASAN = y and reproduce dmesg output...
18:49 imirkin_: orbea: pretty much
18:49 imirkin_: i think you need a gcc newer than something not horribly old
18:49 orbea: 7.3.0 is probably new enough :)
18:50 imirkin_: probably, yes.
18:50 imirkin_: but 4.3.0 might not be
19:04 imirkin_: tagr: and presumably all this modifier stuff makes zero sense for MS or zeta textures?
19:05 imirkin_: tagr: is there any reason not to stick the memtype + tile_mode in and be done with it all btw?
19:05 imirkin_: tagr: other interesting memtypes get used, like 0x20 for video, and something that's not 0xfe for e.g. 16bpp floats which could become interesting for scanout.
19:40 librin: Hello!
19:41 librin: I have jerky mouse movement, especially visible on small movements with GLXVBlank set to 1, while it's perfectly smooth when GLXVBlank is set to 0
19:41 librin: is this a known issue?
19:41 imirkin_: no
19:41 imirkin_: what kernel?
19:41 librin: 4.15.2
19:42 librin: should also mention that crosshair movement in games such as CSGO is perfectly smooth regardless if GLXVBlank is set to 1 or 0
19:42 imirkin_: pastebin xorg log, just for kicks?
19:42 librin: it's only the visible mouse pointer that is jerky
19:42 librin: it's gigabytes in size :V
19:43 imirkin_: xorg log??
19:43 imirkin_: wtf is in there? should be messages since X last started
19:43 librin: I get the `input-thread: InputThreadDoWork waiting for devices` line being written every few milliseconds
19:44 librin: which grows the log to ridiculous sizes
19:44 imirkin_: wtf. never seen that.
19:44 imirkin_: given that mouse is jerky
19:44 librin: I have that since as long as I can remember
19:44 imirkin_: and you're getting an error about input ...
19:44 librin: well, I am getting it when GLXVBlank is set to 0, too, i.e. when it's fine
19:45 librin: and it doesn't sound like an error – more like mundane logging of the state
19:46 imirkin_: could be your compositor doing stupid things
19:47 librin: I'm running mate in tiling mode
19:48 librin: err
19:48 librin: I mean stacking
19:51 librin: I was running with GLXVBlank set to 0 for a very long time, but unfortunately, this breaks programs that expect vsync to actually work, when it's requested
21:18 karolherbst: imirkin_: uhh ../src/gallium/drivers/nouveau/codegen/nv50_ir_ssa.cpp:90: nv50_ir::DominatorTree::DominatorTree(nv50_ir::Graph*): Assertion `i == count' failed.
21:19 karolherbst: ohh
21:19 karolherbst: block block_8:
21:19 karolherbst: block block_0:
21:19 karolherbst: two empty bbs at the end of the nir shader
21:35 karolherbst: huh
21:35 karolherbst: 10 nodes, but 9 bbs...
21:44 karolherbst: ohh
21:45 karolherbst: it is 10 bbs indeed
21:45 karolherbst: ohh I know that problem
22:09 imirkin_: karolherbst: when you have a while loop that doesn't have any break's in it, only returns?
22:22 karolherbst: imirkin_: actually there is a loop with a break
22:22 karolherbst: https://gist.githubusercontent.com/karolherbst/41adddac7c8a2bc440ae66cafa41bbe0/raw/e9e9381e501d6ab6c8fc21b5a8ed75861bfe7efa/gistfile1.txt
22:22 karolherbst: but there is a stale BB somewhere
22:27 karolherbst: imirkin_: ohh, an empty BB at the end of the loop
22:28 karolherbst: imirkin_: ohh
22:28 karolherbst: ...
22:28 karolherbst: no wonder
22:28 karolherbst: there is nothing pointing to that BB
22:28 karolherbst: https://gist.githubusercontent.com/karolherbst/00932ff342fabf66b4b05839431100f2/raw/68aeecf6013269331f0621f3161de0d0f5adb50d/gistfile1.txt
22:28 karolherbst: block_5
22:28 karolherbst: so nothing has an edge to that one
22:30 imirkin_: need a DBE run? :)
22:30 karolherbst: no I am wondering what I can attach to that one...
22:30 karolherbst: although
22:30 karolherbst: I think I could just skip it
22:32 karolherbst: but having a nir block without a nvir BB sounds kind of dangerous
22:33 karolherbst: on the other hand, if it's the last block of a loop there is no way something could actually point to it anyway if nothing does after evaluating all nested blocks
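A block with no incoming edge can be found with a plain reachability walk from the entry block. The sketch below uses a made-up CFG (not the shader from the gists) in which a loop body's `then` branch breaks and its `else` branch continues, leaving the block after the `if` unattached; all block names are hypothetical.

```python
def reachable_blocks(successors, entry):
    """Return the set of blocks reachable from `entry` by following edges."""
    seen = set()
    stack = [entry]
    while stack:
        block = stack.pop()
        if block in seen:
            continue
        seen.add(block)
        stack.extend(successors.get(block, ()))
    return seen


# Hypothetical CFG: block -> list of successor blocks.
cfg = {
    "entry": ["loop_header"],
    "loop_header": ["then_break", "else_continue"],
    "then_break": ["after_loop"],      # break leaves the loop
    "else_continue": ["loop_header"],  # continue jumps back to the header
    "after_if": ["loop_header"],       # emitted after the if, but nothing jumps here
    "after_loop": [],
}

reachable = reachable_blocks(cfg, "entry")
unattached = set(cfg) - reachable
```

Any pass that compares the total block count with the number of blocks it can actually visit will see a mismatch of the "10 nodes, but 9 bbs" kind, so skipping or pruning such blocks before building the graph avoids it.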
22:34 karolherbst: PASSED :)
22:38 karolherbst: imirkin_: nir is somewhat good at creating unattached blocks :(
22:38 karolherbst: not the first time
22:38 imirkin_: boo
22:38 imirkin_: and nvir is somewhat bad at dealing with unattached blocks :)
22:38 karolherbst: nir tends to create a block after a if;then;else clause
22:39 karolherbst: and if the then contains a break and else a continue, you can imagine what happens to the block following that :)
22:39 karolherbst: yeah
22:46 karolherbst: imirkin_: glsl doesn't know a shadd builtin, right?
22:46 imirkin_: shladd you mean? i don't think so
22:46 karolherbst: hadd returns (x+y) >> 1. The intermediate sum does not modulo overflow.
22:46 imirkin_: there's no single function for that in glsl, nor the glsl ir
22:47 imirkin_: oh. the "average" thing. heh.
22:47 imirkin_: i don't think so.
22:48 karolherbst: mhh, nir also doesn't know anything like that I think
22:48 imirkin_: not exactly a common op
22:48 karolherbst: but something you probably don't want to lower
22:48 karolherbst: though
22:49 karolherbst: mhh
22:50 karolherbst: pmoreau converted it into mad(add(a, b), 2, add(a, b)) >> 1
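The point of `hadd` is that the intermediate sum must not wrap. One well-known way to get that in 32-bit arithmetic is the bitwise identity `(x & y) + ((x ^ y) >> 1)`; this is a different technique from the lowering pmoreau used, and it is shown here for unsigned 32-bit operands only. All function names are illustrative.

```python
MASK32 = 0xFFFFFFFF


def hadd_reference(x: int, y: int) -> int:
    """(x + y) >> 1 on unbounded Python ints, so the sum cannot wrap."""
    return ((x & MASK32) + (y & MASK32)) >> 1


def hadd_wrapping(x: int, y: int) -> int:
    """Naive version: the 32-bit sum wraps, so the carry is lost before the shift."""
    return ((x + y) & MASK32) >> 1


def hadd_bitwise(x: int, y: int) -> int:
    """Overflow-free version: shared bits plus half of the differing bits."""
    return ((x & y) + ((x ^ y) >> 1)) & MASK32


pairs = [(0, 0), (1, 2), (0xFFFFFFFF, 1), (0xFFFFFFFF, 0xFFFFFFFF), (0x80000000, 0x80000000)]
```

For `(0xFFFFFFFF, 1)` the reference and bitwise versions give `0x80000000`, while the wrapping version gives 0.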
22:50 karolherbst: the mad with subop = NV50_IR_SUBOP_MUL_HIGH
22:50 imirkin_: oh wait, this is in the SPIR-V?
22:50 karolherbst: OpenCL extinst :)
22:50 imirkin_: pretty sure there's an op in the nvidia isa that does this
22:51 karolherbst: yeah, was wondering about that
22:51 karolherbst: we probably don't use it, right?
22:52 imirkin_: nope
22:52 imirkin_: we don't even use SAD
22:52 imirkin_: which makes me ... sad.
22:52 karolherbst: mhh
22:52 imirkin_: :)
22:52 karolherbst: :D
22:52 karolherbst: could be used in an algebraicopt
22:52 imirkin_: i'll believe it when i see it
22:52 imirkin_: also i think it was nv50-only
22:53 karolherbst: ohh
22:53 karolherbst: the hadd thing?
22:53 imirkin_: sad
22:53 karolherbst: ahh
22:53 karolherbst: seems to be available on fermi as well
22:53 karolherbst: { 0x3800000000000003ull, 0xf800000000000007ull, N("sad"), T(us32_5), DST, T(acout30), SRC1, T(is2w3), T(is3) },
22:54 imirkin_: ah no. looks like it's still there
22:54 imirkin_: yea
22:54 karolherbst: and kepler2 :)
22:54 imirkin_: and i've never ever seen it get used, despite the AlgebraicOpt thing
22:54 karolherbst: it is even a PTX one
22:54 karolherbst: like mad24
22:54 imirkin_: well that just makes me mad :p
22:54 karolherbst: wondering why they put those two into PTX
22:55 imirkin_: [ok, too easy]
22:55 karolherbst: is there a way to add a carry to the high bit?
22:56 karolherbst: in a shf?
22:56 imirkin_: mmm... let's think here
22:56 imirkin_: oh, i bet you can just do it with ADD.CC + SHR.X; is SHR.X a thing?
22:57 karolherbst: let me check
22:57 imirkin_: yep
22:57 imirkin_: so ... dunno what it does, but it might do this.
22:57 karolherbst: mhh interesting
22:57 imirkin_: or it might add the carry bit to something else.
22:57 karolherbst: worth trying out
22:58 karolherbst: because if there is such a combo on nv hardware, I would add a new nir instruction for hadd :)
22:58 imirkin_: (make sure the .X actually gets emitted. looks like it's done on maxwell, i wouldn't count on other gens)
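The ADD.CC + SHR.X idea can be modelled in software to see what the hoped-for behaviour would compute. This is purely an assumption-driven sketch: `shr_x` below assumes the `.X` variant shifts the carry flag in as bit 32, which is exactly the part imirkin says is unverified, and neither function reflects documented hardware semantics.

```python
MASK32 = 0xFFFFFFFF


def add_cc(a: int, b: int):
    """32-bit add that returns (wrapped sum, carry-out), like an add setting a carry flag."""
    total = (a & MASK32) + (b & MASK32)
    return total & MASK32, total >> 32


def shr_x(value: int, shift: int, carry: int) -> int:
    """ASSUMED behaviour: shift right with the carry flag acting as bit 32 of the input."""
    return (((carry & 1) << 32) | (value & MASK32)) >> shift & MASK32


def hadd_via_carry(a: int, b: int) -> int:
    """hadd built from the two modelled instructions."""
    wrapped, carry = add_cc(a, b)
    return shr_x(wrapped, 1, carry)


samples = [(0, 0), (3, 4), (0xFFFFFFFF, 1), (0xFFFFFFFF, 0xFFFFFFFF), (0x80000000, 0x80000000)]
```

If the hardware behaves like this model, the pair would implement unsigned `hadd` in two instructions; testing on real hardware is still needed to confirm where the carry actually ends up.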