IRC Logs of #nouveau on irc.freenode.net for 2023-04-26

00:52 avoidr: I had a display freeze again. On mpv quit, like last time. But this time I didn't use vdpau in mpv explicitly, mpv just said "[gpu]". I don't know if it'd use vdpau implicitly. It seems that a high system load makes the problem much more likely. Before my display froze at one quit, at a previous one the quit took a few seconds. I thought that was just high system load, but then I got one that never quit
00:52 avoidr: fully and nothing happened for too long. I'll remove vdpau completely from the system now and see if I ever get a freeze again.
00:58 karolherbst: avoidr: what mesa version is that?
00:58 karolherbst: though I can imagine that our video acceleration code is kinda broken in many ways
00:59 avoidr: media-libs/mesa-22.3.7-r1
04:44 fdobridge: <gfxstrand> `Pass: 305291, Fail: 4752, Crash: 3407, Skip: 1673845, Timeout: 32, Flake: 48, Duration: 1:16:57`
15:47 fdobridge: <gfxstrand> How do immediates and constants work with double instructions?
15:47 fdobridge: <karolherbst🐧🦀> in which regard?
15:48 fdobridge: <gfxstrand> immediates are 32-bit
15:48 fdobridge: <karolherbst🐧🦀> correct
15:48 fdobridge: <gfxstrand> Are they f32? Top 32 bits of a f64?
15:48 fdobridge: <karolherbst🐧🦀> well
15:48 fdobridge: <karolherbst🐧🦀> some
15:48 fdobridge: <karolherbst🐧🦀> let's see...
15:48 fdobridge: <karolherbst🐧🦀> yes, it's the upper 32 bit
15:49 fdobridge: <gfxstrand> Ok, so it's got all the exponent bits of a f64, just less mantissa?
15:49 fdobridge: <gfxstrand> That makes sense, I guess.
15:49 fdobridge: <karolherbst🐧🦀> correct
15:49 fdobridge: <gfxstrand> Cool
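The exchange above says a 32-bit immediate on a double instruction supplies the upper 32 bits of the f64, so the sign and all exponent bits survive and only the low mantissa bits are lost. A minimal Python sketch of that encoding follows, assuming IEEE-754 binary64 and assuming the missing low 32 bits read as zero; the helper names are hypothetical and not taken from any driver code.

```python
import struct


def f64_bits(value: float) -> int:
    """Return the raw 64-bit IEEE-754 encoding of a Python float."""
    (bits,) = struct.unpack("<Q", struct.pack("<d", value))
    return bits


def f64_to_imm32(value: float) -> int:
    """Keep only the upper 32 bits: sign, 11 exponent bits, top 20 mantissa bits."""
    return f64_bits(value) >> 32


def imm32_to_f64(imm: int) -> float:
    """Expand a 32-bit immediate back to f64 (assumption: low 32 bits are zero)."""
    (value,) = struct.unpack("<d", struct.pack("<Q", (imm & 0xFFFFFFFF) << 32))
    return value


def fits_in_imm32(value: float) -> bool:
    """True if the value is exactly representable, i.e. its low 32 bits are zero."""
    return (f64_bits(value) & 0xFFFFFFFF) == 0
```

Under these assumptions, values such as 1.0 or 0.5 round-trip exactly, while 0.1 does not fit and would have to come from a register or constant buffer instead.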
15:49 fdobridge: <gfxstrand> And cbuf sources just load 8B?
15:49 fdobridge: <karolherbst🐧🦀> correct
15:49 fdobridge: <gfxstrand> Cool
15:49 fdobridge: <karolherbst🐧🦀> well...
15:49 fdobridge: <karolherbst🐧🦀> mhhh
15:49 fdobridge: <karolherbst🐧🦀> wait
15:49 fdobridge: <karolherbst🐧🦀> there is an exception
15:49 fdobridge: <gfxstrand> Naturally....
15:49 fdobridge: <karolherbst🐧🦀> funky
15:50 fdobridge: <karolherbst🐧🦀> sooo
15:50 fdobridge: <karolherbst🐧🦀> if offset & 4 == 0 then yes, it loads 8B
15:50 fdobridge: <karolherbst🐧🦀> if offset & 4 == 4, then it only loads 4B
15:50 fdobridge: <karolherbst🐧🦀> same as imm32 then
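The cbuf rule described above can be modelled roughly as follows. This is only a sketch of the behaviour as stated in the conversation: an 8-byte-aligned offset loads the full 8 bytes, while an offset with bit 2 set loads 4 bytes that are treated like an imm32, meaning they become the upper half of the f64. The little-endian layout, the zero-filled low half, and the function name are assumptions, and the log itself notes the hardware might instead report this case as an error.

```python
import struct


def load_f64_cbuf_source(cbuf: bytes, offset: int) -> float:
    """Model an f64 constant-buffer source operand (hypothetical helper)."""
    if offset % 4 != 0:
        raise ValueError("offset must be at least 4-byte aligned")
    if (offset & 4) == 0:
        # 8-byte-aligned: a normal 8-byte load of the whole double.
        (value,) = struct.unpack_from("<d", cbuf, offset)
        return value
    # Bit 2 set: only 4 bytes are loaded and used as the upper 32 bits,
    # like an imm32 (assumption: the lower 32 bits are zero).
    (hi,) = struct.unpack_from("<I", cbuf, offset)
    (value,) = struct.unpack("<d", struct.pack("<Q", hi << 32))
    return value


# Example buffer: the double 2.5 at offset 0, then two 32-bit words.
example_cbuf = struct.pack("<dII", 2.5, 0, 0x3FF00000)
```

With this buffer, offset 0 yields 2.5 through the full 8-byte load, offset 4 also yields 2.5 because the upper word of 2.5 is enough to represent it, and offset 12 yields 1.0 from the single word 0x3FF00000.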
15:50 fdobridge: <gfxstrand> Oh, hell
15:50 fdobridge: <gfxstrand> That's not a giant corner case 🙄
15:50 fdobridge: <karolherbst🐧🦀> well.. all loads have to be aligned by default anyway
15:51 fdobridge: <gfxstrand> thinks
15:51 fdobridge: <karolherbst🐧🦀> and apparently the hardware might report it as an error, not sure about some of the phrases I see
15:51 fdobridge: <gfxstrand> I think I'm going to assume we don't care about that case
15:51 fdobridge: <gfxstrand> I don't see why it'd be valuable
15:51 fdobridge: <karolherbst🐧🦀> mhh
15:51 fdobridge: <karolherbst🐧🦀> yeah don't see it either
15:51 fdobridge: <gfxstrand> Good to know it happens but I don't think there's any reason we'd want to generate that code.
15:52 fdobridge: <karolherbst🐧🦀> I'm sure there is some wonky optimization being able to make use of it, but...
15:52 fdobridge: <karolherbst🐧🦀> it was more pressing in the pre volta days
15:52 fdobridge: <karolherbst🐧🦀> now that we have 32bit imms everywhere it's kinda ... pointless
15:53 fdobridge: <karolherbst🐧🦀> before that there were some instructions able to load from cbuf, but not able to use 32 bit imms
15:53 fdobridge: <gfxstrand> Right
15:56 fdobridge: <karolherbst🐧🦀> it's kinda a super oddball case... I can't imagine nvidia engineers putting that into hardware because they needed it, it's probably just how the hardware works and it was for free 😄
16:25 fdobridge: <gfxstrand> Could be
22:46 fdobridge: <gfxstrand> The nvidia increment atomic is funky...
23:13 fdobridge: <karolherbst🐧🦀> how so?
23:14 fdobridge: <karolherbst🐧🦀> because it's 32 bit only?
23:38 RSpliet: iirc, NVIDIA could do 32-bit atomic increment on floating point numbers too...
23:38 RSpliet: or maybe just reduce, not increment. Not 100% sure on that one. Also, that's Kepler-knowledge, which I realised recently is about a decade old
23:41 fdobridge: <karolherbst🐧🦀> well.. you have an add atomic
23:42 karolherbst: the fun part: atomics on floats only work on FTZ.RN
23:43 karolherbst: mhhh.. min/max works on F16x2.. interesting
23:44 karolherbst: ohh wait.. now I know what Faith meant with INC/DEC
23:44 karolherbst: how funky is that...
23:44 karolherbst: INC/DEC actually takes a source
23:46 karolherbst: interesting...
23:46 karolherbst: .DEC is even weirder
23:47 fdobridge: <karolherbst🐧🦀> @gfxstrand have you figured it out already or should I spoil it? 😄
23:50 karolherbst: it's kinda a cool feature tho
23:50 karolherbst: this is really neat for some use cases I figure
23:55 fdobridge: <gfxstrand> Yeah, I found it in the PIX docs. It's definitely funky
23:55 fdobridge: <karolherbst🐧🦀> I'm sure it can be useful for some loops where you loop multiple times over the same range
23:55 fdobridge: <karolherbst🐧🦀> not sure how often that's useful tho
23:56 fdobridge: <karolherbst🐧🦀> mhh
23:56 fdobridge: <gfxstrand> If you're implementing something with a fixed-size ring buffer.
23:57 fdobridge: <karolherbst🐧🦀> yeah.. fair point
23:57 fdobridge: <gfxstrand> If it's power-of-two, you can just let it wrap. If it's not, you need something funky like that.
23:57 fdobridge: <karolherbst🐧🦀> sharing between multiple threads/devices
23:57 fdobridge: <karolherbst🐧🦀> yeah
23:59 fdobridge: <karolherbst🐧🦀> I can see it being useful to implement CL pipes
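The log does not spell out what INC/DEC do with their source operand. The sketch below assumes they behave like CUDA's documented `atomicInc` and `atomicDec`, where the source is a wrap limit; that matches the ring-buffer use case mentioned above but is an assumption, not something confirmed in the conversation. It is a sequential Python model only, with hypothetical names, and does not provide real atomicity.

```python
def atomic_inc(mem: list, idx: int, limit: int) -> int:
    """Model of a wrapping increment: store 0 if old >= limit, else old + 1."""
    old = mem[idx]
    mem[idx] = 0 if old >= limit else old + 1
    return old  # like other atomics, the previous value is returned


def atomic_dec(mem: list, idx: int, limit: int) -> int:
    """Model of a wrapping decrement: store limit if old == 0 or old > limit, else old - 1."""
    old = mem[idx]
    mem[idx] = limit if (old == 0 or old > limit) else old - 1
    return old


def claim_slots(ring_size: int, count: int) -> list:
    """Claim `count` slots in a ring buffer whose size need not be a power of two."""
    head = [0]  # shared counter, modelled as a one-element list
    return [atomic_inc(head, 0, ring_size - 1) for _ in range(count)]
```

Under these assumptions, a ring of three slots hands out indices 0, 1, 2, 0, 1 without any explicit modulo, which is the non-power-of-two case where simply letting a plain counter wrap would not work.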