00:17mandelstays: over engineering it appears it's the first 4bits for the clue, if those are one generate ones and remove and inverse, if those are some other values, generate them and remove and inverse, dma controller should handle unsigned sub fine. in that case you have 4higher bits removed on max minus value, and you can work on inverse values :) just revert back later , just eliminating higher 4bits is definitely enough , but then one pins another conditional
00:18mandelstays: those are in the hash, yeah its possible to be carried out it seems
01:00fdobridge_: <gfxstrand> Okay, I'm very confused as to how anything works in fragment shaders. We're binding FS to index 5 but not binding cbufs to index 5
01:02fdobridge_: <gfxstrand> `PIPELINE_SHADER_TYPE` does 1 2 3 4 5 where 0 is reserved for some magic vertex thing.
01:03fdobridge_: <gfxstrand> But then `CONSTANT_BUFFER` is zero-indexed?!?
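[editor's note] A minimal sketch of the off-by-one described above, written under loud assumptions: the enum names and the stages at values 2 to 4 are hypothetical, and the rule "constant buffer group = shader type - 1" is one reading of "zero-indexed", not something confirmed in this log or taken from the NVK source.

```python
from enum import IntEnum


class PipelineShaderType(IntEnum):
    """Hypothetical names; only 'FS is 5' and '0 is reserved' come from the log."""
    VERTEX_RESERVED = 0  # the "magic vertex thing"
    VERTEX = 1
    TESS_CONTROL = 2     # assumed stage order
    TESS_EVAL = 3        # assumed stage order
    GEOMETRY = 4         # assumed stage order
    FRAGMENT = 5


def cbuf_group(stage: PipelineShaderType) -> int:
    """Assumed mapping: constant buffer groups count the same stages from zero."""
    if stage == PipelineShaderType.VERTEX_RESERVED:
        raise ValueError("the reserved slot has no constant buffer group in this sketch")
    return int(stage) - 1


print(cbuf_group(PipelineShaderType.FRAGMENT))
```

Under that assumption the fragment shader is bound at shader index 5 while its constant buffers sit at index 4, which would explain why nothing is ever bound to cbuf index 5.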
01:03fdobridge_: <airlied> @gfxstrand pipe_shader_type_from_mesa is totally redundant now
01:03fdobridge_: <airlied> just noticed we have one of those in nvk
01:04fdobridge_: <airlied> oh it's not even used
01:05fdobridge_: <airlied> will throw an MR up in a while
01:06fdobridge_: <gfxstrand> I'll delete it
01:06fdobridge_: <gfxstrand> I'm doing a bunch of stuff in that area anyway
02:24fdobridge_: <airlied> ah codegen has some of it, but we can drop the conversions
02:28fdobridge_: <airlied> 26583 has it all removed
03:46fdobridge_: <airlied> oh fp64 is turned on with zink now, some extra fails
04:06fdobridge_: <airlied> oh wait fp64 was already on, guess I get to bisect 😛
04:09fdobridge_: <gfxstrand> Yeah, we don't have FP64.
04:09fdobridge_: <gfxstrand> TBH, it's probably not that hard to wire through at this point.
04:09fdobridge_: <gfxstrand> Mostly typing.
04:11fdobridge_: <airlied> I wonder what caused the regression then, I'll bisect in a few minutes
04:32fdobridge_: <airlied> @gfxstrand commit 00be041ffcb01aa70b582361755e71cc672f49d1
04:32fdobridge_: <airlied> Author: Benjamin Lee <benjamin@computer.surgery>
04:32fdobridge_: <airlied> Date: Wed Nov 15 13:25:17 2023 -0800
04:32fdobridge_: <airlied>
04:32fdobridge_: <airlied> nak: implement SHL and SHR on SM50
04:32fdobridge_: <airlied> causes a regression on turing with GL CTS fp64 tests
04:36fdobridge_: <airlied> https://gitlab.freedesktop.org/mesa/mesa/-/issues/10266
04:36fdobridge_: <airlied> @vdpafaor ^
04:51fdobridge_: <benjaminl> hmm... it looks like there were two changes to the SM70 codegen
04:52fdobridge_: <benjaminl> disabling `wrap` on SHF.* and swapping the data type for SHF.L to I32 (edited)
04:52fdobridge_: <benjaminl> I don't have SM70 hardware to test with, but would expect both of those to be equivalent
04:58fdobridge_: <airlied> the wrap change seems to fix it
05:02fdobridge_: <benjaminl> definitely missing something here
05:02fdobridge_: <benjaminl> `nir_{i,u}sh{l,r}` are non-wrapping, right?
05:03fdobridge_: <benjaminl> and my understanding is that `wrap` on `SHF` wraps the 64-bit value, so it wouldn't be correct for 32-bit wrapping shifts anyway...
05:04fdobridge_: <airlied> https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/26586 fixes it anyways, @gfxstrand might have a better idea
05:05Liver_K: Still don't know how to apply that conf o/
05:06Liver_K: "TearFree" whatever that is an option to
05:07airlied: I doubt it'll do anything, not even sure when modesetting started to support it
05:10Liver_K: Well nobody's offered anything else so I need to try it
05:10Liver_K: Also I'm still wondering my last question about why the nouveau drv module isn't being loaded
05:11airlied: we don't use it anymore
05:11airlied: might be worth checking with a live usb if another distro has the same bugs
05:12airlied: not sure what debian that is, but debian and being up to date never inspire my confidence
05:13airlied: is it a new install, or has it happened before?
05:18Liver_K: It's bookworm
05:19Liver_K: airlied: This has been a problem since I put i3 on here
05:19Liver_K: Just dealt with it for a few months since then
05:20airlied: ah so other wm's didn't have the problem?
05:20Liver_K: I didn't use any other wm's on this machine
05:21Liver_K: I installed it with GNOME by accident and then nuked all of GNOME and installed i3-wm
05:21Liver_K: Installed Debian with GNOME by accident that is
05:39airlied: will be a pain to track down then, since it's old hw + X.org + i3, probably a mesa bug but could also be glamor
05:42Liver_K: Love to track it down anyway xD
05:45airlied: you could try install xserver-xorg-video-nouveau and see does it fix it
05:47Liver_K: That's already installed
05:50fdobridge_: <airlied> https://paste.centos.org/view/raw/b6cb6e92
05:50fdobridge_: <airlied> maybe try xorg.conf with that
05:51fdobridge_: <gfxstrand> @karolherbst Can F2F still do FRnd things on Turing? We don't have DRnd so I either need to use that or emulate it. If Turing F2F can do those things, why did they make a separate FRnd?
05:52Liver_K: airlied: Does the logfile not tell you if it is already using the nouveau driver at that verbosity?
05:52airlied: nope it's not using it
05:54Liver_K: Is it the "[ 691.977] X.Org Video Driver: 25.2" line?
06:01fdobridge_: <airlied> there should be a LoadModule: line for each module it loads
06:04Liver_K: Ok so the modesetting module is mutually exclusive to the nouveau one? As in they are both video driver DDXes that you can pick between?
06:05airlied: yes only one can be used at a time, modesetting should work, but bugs in the 3D driver might be breaking it
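[editor's note] A minimal sketch of the check described above: scan an X server log for `LoadModule:` lines to see which DDX was actually loaded. The function name and the sample log lines are made up for illustration; only the `LoadModule:` marker comes from the conversation.

```python
import re

# Matches lines such as: [   691.980] (II) LoadModule: "modesetting"
LOADMODULE_RE = re.compile(r'LoadModule:\s*"([^"]+)"')


def loaded_modules(log_text: str) -> list[str]:
    """Return module names from LoadModule lines, in order of first appearance."""
    seen: list[str] = []
    for line in log_text.splitlines():
        match = LOADMODULE_RE.search(line)
        if match and match.group(1) not in seen:
            seen.append(match.group(1))
    return seen


# Hypothetical sample, not taken from the user's actual log.
sample = '''[   691.970] (II) LoadModule: "glx"
[   691.977] X.Org Video Driver: 25.2
[   691.980] (II) LoadModule: "modesetting"
'''

modules = loaded_modules(sample)
print(modules)
print("nouveau DDX loaded:", "nouveau" in modules)
```

To use it on a real system, pass in the contents of the X server log; where that log lives depends on the distro and on how X is started.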
06:06Liver_K: Well I've checked out the feature matrix for nouveau and my GF110 is totally supported by nouveau, and I have all of it installed after all and I kind of thought I was already using it lol
06:06Liver_K: Anyway I will apply that xorg.conf next thanks
06:06airlied: the project name is nouveau but that covers a lot more than the ddx
06:07airlied: it just happens the ddx is called after the project
06:07airlied: supported doesn't mean it all works or keeps working :)
06:07Liver_K: ... What other video driver ecosystem components does nouveau cover?
06:09airlied: kernel driver, and 3d driver
06:09airlied: in fact the DDX is deprecated in favour of the generic modesetting one
06:10airlied: just happens the 3D driver still likely has bugs, and not many people care
06:15Liver_K: What function does the kernel driver serve if not what the 2d acceleration DDX and 3d driver do?
06:16airlied: they are all separate pieces of a driver stack
06:17airlied: https://blogs.igalia.com/itoral/2014/07/29/a-brief-introduction-to-the-linux-graphics-stack/ probably a good background
07:11Liver_K: Ah thanks, I'll read that in a bit
07:27mandelstays: To be honest , i do not calculate or expect what you want or need, you are garbage to me, completely deluded set of trash. And i know you get handled by vaccines too, which is super fun. Trash should be behaved alike with, you bullied you got owned. It's not like you are not suicidal, we understand that you want to go, if i was as big as crook i'd be too, we just need to work against your will that you can't bring me down before you get killed.
09:18fdobridge_: <dadschoorse> they are, they use DXBC SM5 behavior, which always wraps
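[editor's note] A minimal sketch of the two shift behaviours being contrasted in this thread, assuming "wrapping" means the shift count is taken modulo the bit width (for 32-bit values, only the low 5 bits of the count are used) and the alternative clamps large counts. The function names are hypothetical; this is not NAK or NIR code.

```python
MASK32 = 0xFFFFFFFF


def shl32_wrap(x: int, count: int) -> int:
    """32-bit left shift that uses only the low 5 bits of the shift count."""
    return (x << (count & 31)) & MASK32


def shl32_clamp(x: int, count: int) -> int:
    """32-bit left shift where a count of 32 or more shifts every bit out."""
    if count >= 32:
        return 0
    return (x << count) & MASK32


# In-range counts agree; out-of-range counts are where the two diverge.
print(shl32_wrap(1, 4), shl32_clamp(1, 4))
print(shl32_wrap(1, 33), shl32_clamp(1, 33))
```

The two only differ for counts of 32 or more, so a test has to use out-of-range shift counts to tell them apart.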
12:15fdobridge_: <!DodoNVK (she) 🇱🇹> :triangle_nvk:
12:15fdobridge_: <!DodoNVK (she) 🇱🇹> https://cdn.discordapp.com/attachments/1034184951790305330/1182656610242408498/Screenshot_20231208_141244.png?ex=65857dcc&is=657308cc&hm=c91e129212b059c4f4964d0e8e213a18f42f53d4cf1cad1f7406054cb911e775&
12:28fdobridge_: <huntercz122> since gpu stats work?
12:32fdobridge_: <!DodoNVK (she) 🇱🇹> They don't on nouveau (temperature reporting is missing too on GSP)
12:33fdobridge_: <!DodoNVK (she) 🇱🇹> So I put fake values in MangoHUD to make it clear they don't work
12:45fdobridge_: <!DodoNVK (she) 🇱🇹> Has anyone tried benchmarking NVK with vkoverhead? :triangle_nvk:
12:55fdobridge_: <huntercz122> ok that 69 number now makes sense lol
12:56fdobridge_: <huntercz122> 😼
14:19mandelstays: I do not care if you wonder that i work on stone age hw, such limitations like no overflow, underflow , control flow , no bitshifts is cause to offload that lifting computation to dma, and there only sub and add, are easy to expose on all hardware with any number of channels. But the least four to 3 bits of highest significance possibly need to be removed if they are set, to fit field permutes/possibilities, so dri-devel attempt appeared to be
14:19mandelstays: random even incorrect.
14:19mandelstays: there are likely better ways to read 4 top bits.
15:08fdobridge_: <gfxstrand> Nice!
15:10fdobridge_: <gfxstrand> Not that I know of but I'm really not worried about it.
15:13fdobridge_: <!DodoNVK (she) 🇱🇹> You might be missing those 1000% performance increases though 😅
15:13fdobridge_: <!DodoNVK (she) 🇱🇹> https://youtu.be/Z6XLwkyo6Nw?t=1367
15:27fdobridge_: <gfxstrand> Yeah... That's the thing. Vkoverhead is good for memes. Less good for actual users.
15:29fdobridge_: <gfxstrand> Like, it does matter
15:29fdobridge_: <!DodoNVK (she) 🇱🇹> So vkoverhead is a meme but zink isn't?
15:29fdobridge_: <gfxstrand> Just... Not exactly top priority and a 1000% improvement in vkoverhead likely translates to 1-2 FPS at most.
15:30fdobridge_: <gfxstrand> It can be valuable and it's probably caught or at least highlighted real issues.
15:31fdobridge_: <gfxstrand> And there are apps that shove so many draw calls to the hardware that that stuff does legitimately matter.
15:45fdobridge_: <rhed0x> vkoverhead is something for mature drivers
15:47fdobridge_: <gfxstrand> Yeah, there are a lot more pressing performance things to look at before we try to optimize the shit out of vkCmdDraw.
15:47fdobridge_: <gfxstrand> And I'm really not worried about it. We have exactly one per-draw bottleneck and I know what it is without that benchmark.
15:48fdobridge_: <gfxstrand> Like maybe it'd point out something I'm not aware of but I find it unlikely. (edited)
15:59fdobridge_: <!DodoNVK (she) 🇱🇹> On a more serious note enjoy this crashfest :triangle_nvk:
15:59fdobridge_: <!DodoNVK (she) 🇱🇹> https://cdn.discordapp.com/attachments/1034184951790305330/1182713074084479036/nvk_vkd3dtest_journaldmesg.log?ex=6585b262&is=65733d62&hm=a80c0e19305487bff0f072d363095b9d07785b4f36fb9d2e29b963330a917153&
16:32fdobridge_: <gfxstrand> That's fun...
18:56fdobridge_: <esdrastarsis> Another d3d12 sample, but without textures
18:56fdobridge_: <esdrastarsis> https://cdn.discordapp.com/attachments/1034184951790305330/1182757698857599107/20231208_15h55m26s_grim.jpeg?ex=6585dbf1&is=657366f1&hm=7b491dc73ad71d52ce6f0c8ed7a866ea7e0e01c97e71816a6be903ad08ac4a67&
19:18fdobridge_: <airlied> Do you have the tlb flush patch in 6.7rc4?
19:25fdobridge_: <!DodoNVK (she) 🇱🇹> Do you mean this one?: https://lore.kernel.org/dri-devel/20231130010852.4034774-1-airlied@gmail.com/
19:29fdobridge_: <airlied> yeah, it might avoid one of the explosions
19:30fdobridge_: <!DodoNVK (she) 🇱🇹> I guess I should update my kernel then
22:37merineitsi: I realized actually that there is no way to handle the 0 vs max in unsigned with restrictions such as no control flow, no underflow , no overflow and only sub/add , except interpolating a function into hash, i.e doing two samples and making conditional permutes, so i am ready to work out the stuff now. So you do "max-0 is max , 0+half.of.max-half.of.max-3 is 3" but samewise max-max is max+half.of.max-half.of.max-3 is half.of.max-3 ,
22:37merineitsi: it's not yet done or anywhere close to perfect, but its along such lines, need to sleep, two samples are needed. and a conditional index, nothing works otherwise. This is my last stone on road or was before starting coding, i am a brick at times.