00:09 karolherbst: imirkin: mhh, I found this FIXME: https://cgit.freedesktop.org/mesa/mesa/tree/src/gallium/drivers/nouveau/nvc0/nvc0_surface.c#n1630
00:10 karolherbst: allthough it shouldn't matter because eng3d = true is set by a different condition
00:10 karolherbst: just wondering what that fixme is actually about
01:17 imirkin: the fixme is to not flip it into eng3d unnecessarily
11:20 karolherbst: imirkin: so the 2D engine is a bit more capable than what we are making use of it
13:48 almeida1: karolherbst: it does not hurt to look into those files seen in the miaow as a real hw implementation, you will know how to feed the hw that way
13:49 almeida1: allthough i understood how it works without looking into it, but it still was useful confirming read off
14:08 almeida1: actually it is documented that pointers and also radeon terms shared registers are private, both in patents and the isa docs of earlier vliw and also in vulkan docs
14:08 almeida1: the thing what every implementation can do, since there are scalar loads, you can load identifiers for warps
14:21 imirkin: karolherbst: maybe, maybe not
14:41 karolherbst: imirkin: mhh, I have a small GL problem, when I provide data for a texture via "glTexImage2D(GL_TEXTURE_2D, 0, GL_RGBA32UI, 1, 1, 0, GL_RGBA_INTEGER, GL_UNSIGNED_INT, pixels_in);" I can't read the data out via glGetTexImage(GL_TEXTURE_2D, 0, GL_RED, GL_UNSIGNED_BYTE, pixels_out);
14:41 karolherbst: only GL_RED_INTEGER seems to work
14:42 imirkin: correct.
14:43 imirkin: int textures and non-int textures are incompatible
14:43 imirkin: if you use a GL_RGBA32UI texture format, you must use GL_RED_INTEGER or whatever
14:43 karolherbst: mhhh
14:43 karolherbst: interesting
14:43 karolherbst: because the CTS does exactly this
14:44 imirkin: there could be more to it, but the way you describe it, that's illegal
14:44 imirkin: you could totally make a texture view though
14:44 imirkin: to "convert" between them
14:44 karolherbst: mhh
14:44 karolherbst: well, the CTS just gens the texture and then reads it out via glGetTexture
14:44 karolherbst: and compares to whatever it expects
14:45 karolherbst: it reuses the buffer though, so maybe it fails because it gets the old result
14:45 karolherbst: GL_DEPTH_STENCIL + GL_UNSIGNED_INT_10_10_10_2 was there before
14:50 karolherbst: ohh wait, it actually doesn't test the GL_RED case... my mistake
14:51 karolherbst: imirkin: ohhh crap... I think I know what the issue is
14:52 karolherbst: imirkin: internal format is unsigned, reading out via signed byte. Stored should become -0x80, but 0x7f is read out
14:52 karolherbst: uhm, returned value
14:54 karolherbst: or maybe it is even the other way around
14:54 imirkin: that's not extremely surprising.
14:55 karolherbst: yeah.. currently I try to extract one of the faulty cases into a piglit test
15:12 imirkin: lmk when you have something working
15:12 imirkin: although my knowledge of the details of all the rounding stuff is ... imperfect
15:13 imirkin: i.e. what is supposed to get clamped where and when
15:29 karolherbst: imirkin: https://github.com/karolherbst/piglit/commit/8ad95053d0d388b55f269a84421079296a149ac9
15:29 karolherbst: output: https://gist.githubusercontent.com/karolherbst/8350f01b62922069926573f0f47fe517/raw/847304b24f5140358050c765cede6006b98e679a/gistfile1.txt
15:30 imirkin: glPixelStorei(GL_PACK_ALIGNMENT, 1);
15:30 imirkin: is that stuff necessary?
15:30 karolherbst: it makes it easier for me
15:30 karolherbst: otherwise I have to keep track of alignment while comparing those pixels
15:30 karolherbst: so 7*3 isn't enough to store stuff, but 8*3 and then I need to skip that
15:30 imirkin: anyways
15:31 imirkin: that makes sense. i'm sure the clamp is missing.
15:31 karolherbst: I am sure it isn't
15:31 imirkin: if you just do &0xff -- does it pass?
15:31 karolherbst: "int32_t in = pixels_in[i*4];" is the important part
15:31 imirkin: i.e. in = in & 0xff
15:32 imirkin: and make it an int8
15:32 imirkin: int8_t in = pixels_in[i*4] & 0xff;
15:32 imirkin: and lose the min/max stuff - does it pass?
15:32 karolherbst: no
15:33 imirkin: huh ok
15:33 karolherbst: no every check fails :)
15:33 karolherbst: *now
15:33 karolherbst: (0,1: original: 0x24924940) 0x40 != 0x7f
15:33 karolherbst: ohh well, x,0 passes
15:34 karolherbst: int32_t -> uint32_t + remove MAX2, then it passes
15:37 imirkin: can you show me the prints for the original version that you had?
15:37 imirkin: (at least some prints)
15:37 karolherbst: "original version"?
15:39 karolherbst: well I gave you a link to the output below the github link
15:39 karolherbst: if that's what you meant
15:44 imirkin: oh, so you did
15:44 imirkin: missed that
15:45 imirkin: riiiight.
15:45 imirkin: ok
15:45 imirkin: so we're clamping it up
15:45 imirkin: and they're clamping it down
15:45 imirkin: since the underlying format is GL_RGBA32UI, i think we're doing the right thing
15:45 imirkin: but like i said, i don't know all the rules.
15:47 karolherbst: llvmpipe passes this case afaik though
15:47 karolherbst: ohh wait
15:47 imirkin: ok. that doesn't change my opinion of what is correct and what's not.
15:47 karolherbst: I think this combination would be ivnalid or whatever there
15:47 karolherbst: weird
15:47 imirkin: if it passes the test as-is, i think it's buggy :)
15:48 karolherbst: ohh now I see
15:48 karolherbst: I make that test fail on intel and pass on nouveau :)
15:48 karolherbst: ahhh
15:48 karolherbst: now it is better
15:50 imirkin: i think the idea is that you're supposed to take the value in the texture, and clamp it to the output range of the type you're fetching to
15:50 imirkin: the value in the texture is a very positive number (it's a UI texture), hence we return the max value
15:50 karolherbst: updated test: https://github.com/karolherbst/piglit/commit/f4e42a795a9710ee62c98c4c699d65248cc39b23
15:50 karolherbst: updated output: https://gist.githubusercontent.com/karolherbst/8350f01b62922069926573f0f47fe517/raw/93610d459ebc3d7ed358474851d10409544b2022/gistfile1.txt
15:51 karolherbst: passes on intel/llvmpipe, fails on nouveau
15:52 imirkin: ok. that's buggy :)
15:53 karolherbst: nvidia passes it as well
16:28 karolherbst: imirkin: where is the source of the 3d blitter by the way?
16:31 karolherbst: or is it still a hw blitter, just being in the 3d engine and not the 2d one?
16:36 karolherbst: ohh, found it
17:10 imirkin: yeah
17:10 imirkin: mostly in nv50_surface.c
17:10 imirkin: (i think?)
17:10 imirkin: but basically it just configures the driver and does draws
17:11 imirkin: it's a lot like u_blitter -- i suspect it was written before u_blitter existed
17:11 imirkin: we can't really use u_blitter directly though
17:11 imirkin: at least not without extending it considerably
17:15 karolherbst: I see
17:15 karolherbst: would be nice though to just use u_blitter
17:15 karolherbst: or well, instead of our 3d one
17:16 karolherbst: imirkin: I guess that clamping would have to happen inside the blitter, right? currently it simply does a tex and returns the output
17:16 karolherbst: actually wondering where the 32 -> 8 clamping happens
17:16 karolherbst: *bit
17:16 imirkin: yeah, it would be. but like i said, it won't work
17:17 imirkin: iirc the main issue is stencil handling
17:17 imirkin: u_blitter really wants support for exporting stencil separately
17:17 imirkin: which means it's only useable by radeon
17:17 imirkin: or by drivers that don't need to support blitting stencil
17:17 imirkin: we handle it by cheating big-time
17:18 imirkin: the clamping would happen at RT time i would assume
17:18 imirkin: not sure.
17:18 imirkin: check what the blitter tgsi looks like
17:18 karolherbst: just a tex
17:19 imirkin: right, so then the tex loads the rgba32ui value
17:19 imirkin: and then it's written to a r8i target. the clamping might happen there? not sure.
17:19 imirkin: are you sure it doesn't use the pbo image write thing?
17:20 karolherbst: well, the shader ends up simply doing a "2: tex 2D $r8 $s0 f32 $r0q $r0d"
17:20 karolherbst: didn't really check what happens around the shader invocation yet
17:22 imirkin: could be many shaders :)
17:22 imirkin: this stuff is quite hard to track down.
17:23 karolherbst: yeah, but I set some breakpoints, so I am quite sure
17:23 karolherbst: before that is just that one fragment and vertex shader which always shows up
17:30 imirkin: so ... uhm ... we're expecting that the RT writing logic clamps things correctly
17:30 imirkin: however it has no clue what the source format is
17:30 imirkin: it just knows what bits are in the output register
17:31 imirkin: and it'd be well within its rights to just take the low 8 bits
17:31 imirkin: i dunno how the hw actually does things
17:31 imirkin: sounds like the tgsi generation needs to be slightly fancier
17:44 karolherbst: imirkin: well, it doesn't simply take the lower bits, because the 32 bit values are correctly set to 0x7f for values above 0x7f
17:44 karolherbst: but
17:44 karolherbst: values below 0 aren't set to 0
18:00 imirkin: do we have separate INT and UINT rt formats? i forget
18:13 karolherbst: dst format is PIPE_FORMAT_R8_SINT and src format is PIPE_FORMAT_R32G32B32A32_UINT in the blitting code, I guess those are the values which specify those things?
18:15 karolherbst: anyway, the miptree bing used has a format of PIPE_FORMAT_R8_SINT
18:16 imirkin: i mean in hw - are there separate formats for R8_SINT and UINT?
18:16 karolherbst: don't know, for what do I have to look?
18:18 karolherbst: I guess NVC0_3D_RT_FORMAT is a good start
18:19 karolherbst: or maybe not
18:23 imirkin: SURFACE_FORMAT_*
18:23 imirkin: or something
18:23 imirkin: yeah, looks like there are separate surface formats there
18:24 karolherbst: yeah
18:24 karolherbst: so we just need to handle unsigned to signed better
18:24 karolherbst: (for now)
18:25 imirkin: i guess
18:26 karolherbst: wondering if there is some value we have to set on the hw
18:26 karolherbst: and then it just works (tm)
18:26 imirkin: ;)
18:27 imirkin: well, i don't think that's possible
18:27 imirkin: because TEX just lands the value into registers
18:27 imirkin: at that point, the values are entirely typeless
18:27 karolherbst: okay sure
18:27 imirkin: so whether the source is RGBA32UI or RGBA32I is totally lost
18:27 karolherbst: but what exactly handles the bit size?
18:27 imirkin: so then the S8 integer encoder thing takes a 32-bit value and treats it *somehow*, but no matter what it does, it does only one thing
18:28 imirkin: so it can't simultaneously do the "right" thing for signed and unsigned source
18:28 imirkin: which means we need an explicit clamp
18:28 karolherbst: right, that's why I was thinking maybe there is some switch we can use for that
18:28 imirkin: i think a CVT should do the trick
18:29 imirkin: with the appropriate stype/dtype
18:29 karolherbst: mhhh
18:29 karolherbst: I don't think so, because integer conversions just cut the bits afaik
18:29 imirkin: uhhhhh
18:29 imirkin: they shouldn't!
18:29 imirkin: i don't think you're quite right there
18:30 karolherbst: let me check, I was dealing a lot with this stuff when working on the opencl conversions
18:31 karolherbst: imirkin: yep, they simply cut
18:31 karolherbst: what we need is a cvt.sat
18:31 imirkin: right, yes
18:31 imirkin: you need sat to saturate ;)
18:31 imirkin: sorry, i forgot about that
18:32 karolherbst: here is the stuff I did for full 8/16/32/64 float/uint/int conversion lowering: https://github.com/karolherbst/mesa/blob/670908fb3a83088e5e7cf2913c325d8d176c509d/src/gallium/drivers/nouveau/codegen/nv50_ir_lowering_helper.cpp#L87-L213
18:32 karolherbst: with saturate support
18:33 karolherbst: but I guess 32 -> 8 bit saturate should just work (tm)
18:33 imirkin: right
18:33 imirkin: the source format shouldn't matter
18:34 karolherbst: well, it does
18:34 imirkin: oh, but whether its signed or not does
18:34 karolherbst: ;)
18:34 imirkin: so u32/s32 always
18:34 imirkin: and then the proper destination format... or maybe also just signed/unsigned
18:34 karolherbst: well
18:34 imirkin: and skip the conversion if they're the same
18:34 karolherbst: yeah
18:34 karolherbst: I would just do some if(whatever) {} int he blitter code generation
18:35 karolherbst: and do a conversion if I detect unsigned -> signed integer conversion
18:35 imirkin: i.e. u32/s32 -> u32/s32 and ignore the number of bits
18:35 karolherbst: or so
18:35 karolherbst: ahhh
18:35 imirkin: since i suspect the hw does that correctly
18:35 karolherbst: yeah, maybe that would work
18:35 imirkin: would need to be tested
18:35 karolherbst: well, the cTS tests that
18:36 karolherbst: ohh wait
18:36 karolherbst: except conversion is applied twice then or something
19:16 karolherbst: imirkin: uhm... does TGSI even know about integer saturation?
19:16 karolherbst: does it even know about types?
19:18 karolherbst: from_tgis doesn't even set cvt->saturate
19:18 karolherbst: *from_tgsi
19:26 karolherbst: imirkin: I guess we have to throw in a UMIN then
19:26 karolherbst: imirkin: passes with this blitter: https://gist.githubusercontent.com/karolherbst/e1849d8f0f8fdf5cdcf0e3ddc055f462/raw/558ba3b652c885251ae3f32731ba52e68f8089c2/gistfile1.txt
19:28 karolherbst: we end up with 1mov + 4mins then
19:28 karolherbst: mhh min(a, 0x7fffffff) -> sat.u32.s32(a)
19:28 karolherbst: mhh
19:28 karolherbst: questionable optimization
19:40 imirkin: the tex overhead is much higher
19:41 karolherbst: right
19:41 karolherbst: just wondering if such an opt would make sense general, allthough I doubt that there is much code out there where this would help
20:11 karolherbst: imirkin: any suggestions to make this less ugly? https://github.com/karolherbst/mesa/commit/201eb08589654e9bcaa7f3b94b4007693b2a88f8
20:11 imirkin: karolherbst: also need signed -> unsigned right?
20:12 imirkin: anyways, that seems surprisingly reasonable
20:12 karolherbst: I get to it when there is a CTS test failing due to that ;) but I actually wanted to know what you think about that code before I continue further on that road
20:12 imirkin: preferably SINT_CLAMP
20:12 karolherbst: my version before was much cleaner, but then the cached shader hit me when I ran the CTS test :)
20:15 karolherbst: imirkin: mhh, with that all remaining packed_pixels fails are fixed though
20:16 karolherbst: but unsigned -> signed should be already handled
20:16 karolherbst: because that's already get clamped
20:16 imirkin: ok
20:16 imirkin: it must work itself out based on how the fixed function conversion works
20:16 karolherbst: the issue was the negative values
20:17 karolherbst: because the src was unsigned, hence the hw interpreted big values as negative ones
20:42 RSpliet: karolherbst: https://bugs.freedesktop.org/show_bug.cgi?id=107016 <- Think that might be one for you? It's a GK107 with BOOST and CSTEP tables
20:43 karolherbst: RSpliet: yeah, looks like it, thanks
20:43 karolherbst: RSpliet: this is a laptop though
20:43 RSpliet: Mind you, he reported some problems with a 4.15 kernel, and I have no idea whether anything in 4.16 or 4.17 might have fixed things
20:44 RSpliet: Yeah as it stands it's not a great bug report. Hope we can extract sufficient information...
20:45 karolherbst: uhh, only 3 VID pins, nice
20:46 karolherbst: oh crap
20:46 karolherbst: this is one of those vbios
20:47 karolherbst: huh
20:47 karolherbst: VID and then a PWM?
20:47 karolherbst: this is mupufs bug now :p
20:47 RSpliet: I *think* the hint might be in nouveau 0000:01:00.0: volt: Type mismatch between the voltage table type and the GPIO table. Fallback to GPIO mode.
20:48 karolherbst: yeah
20:48 karolherbst: I guess this would be fine
20:48 karolherbst: problem is...
20:48 karolherbst: how to set the voltage?
20:48 karolherbst: sure, GPIO, but what VID mask belongs to which voltage?
20:49 karolherbst: RSpliet: the CSTEP table is also super evil
20:49 karolherbst: reverse direction of the voltage entries
20:49 karolherbst: allthought hat should be just fine
20:52 karolherbst: this is just stupid
20:53 RSpliet: Yes. :-D
20:53 RSpliet: I can't tell you much about it tbh, in Tesla land everything was a lot easier to comprehend
20:54 karolherbst: I have a guess
20:54 RSpliet: Meanwhile I'm fighting a battle with the Fedora peeps pushing forward a new 4.17 kernel that doesn't even recognise my laptop's keyboard/touchpad.
20:54 RSpliet: Keep Hans occupied ;-)
20:55 karolherbst: :D
20:55 karolherbst: did you run git bisect on the kernel?
20:55 karolherbst: should be fairly easy to track down
20:56 RSpliet: I don't run dev. kernels on this one, but I fear it might be an ACPI problem. I'll let the chaps @ Red Hat tell me what to do next, probably a better use of my time
20:57 karolherbst: well, you have the hardware
20:58 karolherbst: RSpliet: is this one of those I2C connected devices? or even PS2?
21:01 karolherbst: now we are back to 21 fails, nice
21:01 karolherbst: 16 of which are fp64 precision fails
21:03 karolherbst: imirkin: we also fail "KHR-GL44.packed_depth_stencil.blit.depth32f_stencil8", I might just look into that one as well
21:04 imirkin: that one is likely annoying.
21:05 karolherbst: yeah well, the other fails are also annoying
21:05 karolherbst: fp64, pipeline_statistics for compute shaders, 3D images :)
21:11 RSpliet: karolherbst: Yes I do, but not the time or stamina to dig into the problem at great lengths. They can tell me what they need based off logs. If they need more, I can be a rbot and give them their data
21:16 karolherbst: imirkin: meh, multisample stuff :(
21:16 karolherbst: src samples: 0 dst samples: 3
21:51 imirkin: karolherbst: oh yeah, that's a known issue
21:51 imirkin: unfortunately beyond the fact that there's something with with depth and MS, i know little
21:51 imirkin: some various piglits fail in the depth arena as well
21:52 imirkin: since i barely understand depth, i haven't fixed them
21:52 imirkin: although ... i remember depth32f_stencil8 blits being particularly challenging
21:57 karolherbst: okay
21:59 imirkin: iirc we emit it as a RG32 output? something like that.
22:45 karolherbst: imirkin: from the apitrace it looked like we color the upper half wrongly in the framebuffer :/ but yeah, can be something super annoying in the end
22:45 karolherbst: imirkin: any ideas how we could add support for compute shaders in pipeline_statistics_query?
22:45 karolherbst: tracking it inside the driver?
22:46 karolherbst: I tried that once, but then there is this GPU buffer way of extracting that stuff
22:46 imirkin: check with pendingchaos - he's thought about it
22:50 airlied: can you count it somewhere in the command submission stream?
22:51 airlied: after every dispatch increment some counter, (though not sure how indirect would work)
22:51 karolherbst: airlied: maybe we could track it inside VRAM?
22:52 karolherbst: but yeah, probably we should just check what nvidia is doing here
22:52 airlied: yeah that seems like the best first option, at least spot if they are doing anything really simple
22:52 karolherbst: thing is, if it would be simple, it would have been already implemented
22:53 karolherbst: I hope it is not the case, but usually it is
22:53 airlied: well hopefully there are least some signs of it happening in traces
22:55 karolherbst: yeah.. thing is just, that the mmt tracing situation is kind of sad, because most of the time I try to use it, it doesn't work
23:01 karolherbst: airlied: if you are interested, currently I track the status of the GL44 run here (I think GL45 shouldn't introduce new fails though): https://trello.com/c/r056ZtSR/24-cts-master-461x-khronos-mustpass-gl44-master-status
23:01 karolherbst: airlied: uhm, one question: would it be fine to disable pipeline_statistics_query, submit CTS, wait for the result to be positive, and then enable that extension again?
23:02 karolherbst: we would need to take care of that for 4.6 anyway
23:02 karolherbst: I mean, we could also just keep it disabled and take care of the other issues first
23:03 airlied: don't be surprised if gl45 adds test for older extensions in its mustpass
23:04 airlied: well in theory since pipeline stats is busted you should disable the extension until it's fixed
23:05 karolherbst: okay well, bindless_textures is also busted in some ways ;) doesn't prevent us to expose it for many drivers too (allthough arguably the fails there are less important than our pipeline_statistics fails)
23:05 airlied: yeah the 4.5 mustpass has a bunch of stuff that isn't 4.5 specific
23:06 airlied: or maybe not I fail at diff
23:07 karolherbst: well I ran the 45er in the past and the results were pretty much the same (compared to 44)
23:07 karolherbst: maybe newly added tests since then might fail
23:07 karolherbst: there are some tests requiring 4.5 in the 4.4 test though
23:07 karolherbst: and only for silly reasons actually
23:48 mslusarz: karolherbst: have you tried updating mmt recently? ;)
23:52 imirkin: airlied: it gets variously tricky, esp with indirect dispatch, and the very sad capabilities of the macro language
23:52 imirkin: iirc pendingchaos implemented a shifting multiplier in the macro language