IRC Logs of #nouveau on irc.freenode.net for 2023-08-09

02:23 fdobridge: <gfxstrand> `Pass: 402081, Fail: 829, Crash: 37, Skip: 1729861, Timeout: 2, Flake: 560, Duration: 1:14:14`
02:24 fdobridge: <gfxstrand> Things are looking pretty stable vs. old API now. 🥳
02:24 fdobridge: <gfxstrand> :triangle_nvk:
02:25 fdobridge: <gfxstrand> Kicking off conditional render now.
04:12 fdobridge: <gfxstrand> CTS looks good. Added features.txt and assigned Marge.
04:15 fdobridge: <airlied> yay GL4.5 should be showing up now with zink
04:15 fdobridge: <airlied> 4.6 needs vk1.1 subgroup stuff
04:52 fdobridge: <airlied> Scalar block layout would also be good, probably NAK territory
05:47 fdobridge: <gfxstrand> Yeah. We could enable it now but I don't trust codegen.
05:50 fdobridge: <![NVK Whacker] Echo (she) 🇱🇹> To not do wacky stuff with images?
06:00 fdobridge: <airlied> Nah wacky stuff with loads and stores
06:34 fdobridge: <airlied> @gfxstrand is write_image_view_desc: Assertion `view->planes[plane].storage_desc_index > 0' failed.
06:34 fdobridge: <airlied> a codegen limit?
07:50 airlied: dakr, gfxstrand : I suspect a possible locking problem between the intr spin lock and the fence context spin locks, I'll see if I can summon the brain power to figure it out tomorrow
09:17 ilgaz: Hello everyone. There seems to be an issue on latest openSUSE Tumbleweed kernel on nvidia 9400/macbook 5,1 laptop. A SuSE person fixed it and I thought I should share with you. https://bugzilla.opensuse.org/show_bug.cgi?id=1214073
10:13 raket: karolherbst: Hello! is there any hope for re-introducing gm200 reclocking again? i've run it since 2019/2020 and it works fine and haven't crashed a single time. however, with newer kernels, 6.5, i can't get it to work. is it better to just buy a gtx 780?
10:57 karolherbst: not sure, it was quite a bit of a hack to begin with as it messes around with the firmware loading and nothing we really want to support I think
10:58 raket: ok! thanks! i will get rid of the gm200/980ti and replace it with a gtx 780.
10:58 raket: ;-)
11:08 ilgaz: Hello karolherbst, should I find the 3 patches mentioned by the openSUSE engineer at https://bugzilla.opensuse.org/show_bug.cgi?id=1214073
11:23 ilgaz: I am not sure if they are openSUSE patches or nouveau kernel patches. Once I switched to the kernel he built, system started to boot fine.
11:24 karolherbst: ilgaz: what is it that they changed?
11:31 ilgaz: I better find out. I get really lost in kernel rpm stuff. He also mentions someone on mailing list (it is their primary communications platform). While on it, I found a big issue with "nomodeset" thing with nvidia 9400, I will report it to UEFI people as he said.
11:32 ilgaz: sorry not UEFI, EFI (as this is a Mac)
12:07 ilgaz: karolherbst: found something, I am still looking +xxx patches.kernel.org/6.4.7-047-drm-nouveau-disp-PIOR-DP-uses-GPIO-for-HPD-not-.patch
12:08 ilgaz: on line 1227 https://build.opensuse.org/package/view_file/home:tiwai:bsc1214073/kernel-source/series.conf?expand=1
12:09 karolherbst: so they probably just removed that patch
12:21 ilgaz: karolherbst: Here is the patch, Ben Skeggs <!!!@redhat.com> https://paste.opensuse.org/pastes/4599d5c02d2b
12:21 karolherbst: that's just the patch causing the regression
12:24 ilgaz: Oh, you are the reviewer. Sorry, the weather is causing my brain firmware to underclock :-)
12:24 ilgaz: I shouldn't mail anyone I guess?
14:04 fdobridge: <gfxstrand> That probably means they didn't set STORAGE in the view usage flags.
16:03 culuar: you land an exponent function on those contiguous bits, so 1 represents bit two in power of zero, which is one, and you just make exponent through the rippling elimination technique, i figured maybe you do not understand that part, it's the easiest, like what is the formula to feed to solver after the stream has been lifted like bitwuzla does it well
16:06 culuar: so 32 in minified, represents the bit the last bit in power of 32 from two
16:07 culuar: zero is missing, but it can be 33 for an example
16:14 culuar: that kind of sum is 528 + 33 makes some number, and their indexes, with a subtract a subproblem will ripple through to next index, once you make the correct , and the exponent functionality just later makes a 32 bit value from it, but you can make all the arithmetic by treating the sums , multiplies etc. from of 528, you can pack a whole lot of performance into it, but solvers are rather hard to read, likely glasgow solver is thin and nice, which can do
16:14 culuar: it, handles events too, all commented
16:18 culuar: the solvers to code are very difficult, out of my current league, but luckily or probably it suits, but to test it through it takes at least to winter time for me
16:18 culuar: then i could say i can offer the full version
16:23 culuar: anyways bounds is exactly the same thing as mask or inverse value, they do it correctly, but show me the guy who says that writing solver of that kind is easy!
16:23 culuar: its a very difficult thing to do
16:42 culuar: inference is a runtime routine, only a little happens in the compiler, but propagation is a compile time routine except when the events of variable uploads kick in
16:43 culuar: it is pretty difficult to test the solver too, it was already hard to google those things
16:43 culuar: that such people exist, who do in unis out of magazines great code
16:44 culuar: in general this is hard work to plumb it in
16:45 culuar: i can see, that the solver is meant to handle it, but it may have bugs i dunno
16:52 culuar: it's not like i am a retard cause of struggling with it, cause it's all hard task
16:56 culuar: in theory such thing should be marked, y=528 t=1,t2=2 ...t32=32 x=y OR t1 OR y OR t2.......
16:56 culuar: many ways how to enter it, but it is meant to be solved like this
16:57 culuar: so the basic block is annotated with utter big line of such relations, and you just infer the solution and can loop print every variable and lift it to maximized values
17:06 culuar: and finally when you decompose those into smaller subproblems, and take a trace of those buffers, you can shortcut the compiler into very little procedures
17:06 culuar: but it is still very difficult to find those lines, though i am somewhat working
17:06 culuar: on it
17:07 culuar: i am sure if i worked every day i am done in 2024 and no longer performance issues would exist
17:09 culuar: but none know what i am or understand what i do here, and i am nearly dead cause of their nonsense and terror
17:11 culuar: anyways i am off to working more today, and i am freaking tired of this crap, simply put, it's a needed feature under linux or whatever os cpu and gpu stack, and it would work even better due to data locality on FPGAs
17:12 culuar: but up to this point , i never saw something more capable than fpga's, never know what tetramem does, and nuclear chips the most powerful there is already regulation that most never reach to there
17:14 culuar: i mean you just have a look at those files i land here, and just try to participate too, drivers have been stable for ages to me
17:14 culuar: would just need to get more performance on lower end hw
17:28 culuar: today is a last day i work on 12 hours on reading this code, due to financial troubles from where i soon recover, next month i work almost full month and forward from there, so freely have a look at the glasgow solver too
17:41 culuar: https://github.com/ciaranm/glasgow-constraint-solver/blob/main/gcs/innards/state.cc that's an interesting file so far ctrl-f for word, you see how they pack bits into machine word
17:41 culuar: actually better to just trace with some debugger
17:42 culuar: and isolate favourably loopless runtime procedures to start with, which inference the results
17:43 culuar: then add a bitformat to annotate, their container i have not heard before OPB
17:45 culuar: i once calculated, that roughly 10000 instructions can be executed in few cycles like this
17:46 culuar: there is even better encoding than 528 mentioned , but it's not so simple to envision
17:46 culuar: mass programming demands simplicity
17:50 fdobridge: <gfxstrand> So, those test results look pretty good but I don't like that flake number.
17:58 fdobridge: <gfxstrand> I've got three theories:
17:58 fdobridge: <gfxstrand> 1. Something's going wrong with descriptor tables where we aren't cache invalidating them enough or something.
17:58 fdobridge: <gfxstrand> 2. GPU hangs from one context are destroying another.
17:58 fdobridge: <gfxstrand> 3. The GPU hang actually happens in an earlier test and we just don't detect the hang in time to abort.
18:13 fdobridge: <gfxstrand> I'm going to disable graphicsfuzz and see if that stabilizes results at all.
18:15 dakr: @gfxstrand: wrote you in IRC, still being banned from Discord... :/
18:17 fdobridge: <airlied> I think it's often number 3
18:28 fdobridge: <gfxstrand> Yeah, most of the failures seem to be in big test groups where there's just a lot of tests so statistics aren't in our favor.
19:36 fdobridge: <airlied> think we just have to fix more of the gpu crashers 🙂
21:00 fdobridge: <gfxstrand> Well, disabling graphicsfuzz helped reduce runtime by 2 minutes but didn't really reduce flakes. 😕
21:03 fdobridge: <airlied> okay fence, intr and event locks have a mexican standoff situation at least on gsp + uapi merged
21:03 fdobridge: <airlied> also causes iwlwifi flakey backtraces
21:04 fdobridge: <airlied> since I think they stall out some workqueue
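A standoff between three locks like the one described above amounts to a cycle in the lock-order graph. As a minimal sketch only: the Python below searches a set of "held, then wanted" pairs for such a cycle. The lock names come from the discussion, but the acquisition orders in `edges` and the helper name `find_lock_cycle` are hypothetical and are not taken from the nouveau source.

```python
def find_lock_cycle(order_edges):
    """Return a list of locks forming a cycle, or None if the order is consistent.

    order_edges: iterable of (held, wanted) pairs, meaning some code path
    takes `wanted` while already holding `held`.
    """
    graph = {}
    for held, wanted in order_edges:
        graph.setdefault(held, []).append(wanted)
        graph.setdefault(wanted, [])

    WHITE, GREY, BLACK = 0, 1, 2  # unvisited, on the current path, finished
    color = {node: WHITE for node in graph}
    stack = []

    def visit(node):
        color[node] = GREY
        stack.append(node)
        for nxt in graph[node]:
            if color[nxt] == GREY:
                # Back edge: nxt is already on the current path, so we looped.
                return stack[stack.index(nxt):] + [nxt]
            if color[nxt] == WHITE:
                found = visit(nxt)
                if found:
                    return found
        stack.pop()
        color[node] = BLACK
        return None

    for node in graph:
        if color[node] == WHITE:
            cycle = visit(node)
            if cycle:
                return cycle
    return None


# Hypothetical acquisition orders, for illustration only.
edges = [("intr", "fence"), ("fence", "event"), ("event", "intr")]
print(find_lock_cycle(edges))
```

Dropping any one of the three edges makes the function return `None`, which is the usual fix in practice: pick one global order and make every path follow it.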
21:05 fdobridge: <gfxstrand> Oh, iwlwifi doesn't need gsp+uapi. I had to shut off iwlwifi on just uapi
21:25 fdobridge: <georgeouzou> Fixed some crashes on pipeline multisample sample-locations tests
21:29 fdobridge: <airlied> yeah this problem is probably not gsp related, just haven't ruled it out
21:32 fdobridge: <esdrastarsis> Is there any kernel tree updated with uapi+gsp?
21:34 airlied: nope, not unless you count my hack branches
21:34 airlied: which if you build from you agree never to ask me anything about
21:41 airlied: dakr: did you send the fence workaround patch?
21:42 dakr: airlied: sending it out after supper.
21:43 airlied: cool!
22:16 fdobridge: <gfxstrand> @karolherbst Do CVT instructions with 16-bit destinations leave the top 16 bits zero or do they smash over it with a sign extension or whatever?
22:19 dakr: airlied: https://lore.kernel.org/dri-devel/20230809221729.3657-1-dakr@redhat.com/T/#u
22:20 fdobridge: <karolherbst🐧🦀> sign extended
22:21 fdobridge: <karolherbst🐧🦀> but
22:21 fdobridge: <karolherbst🐧🦀> there are exceptions
22:21 fdobridge: <gfxstrand> ?
22:21 fdobridge: <karolherbst🐧🦀> a NaN input only sets the highest bit of the result type and doesn't sign extend
22:22 fdobridge: <karolherbst🐧🦀> so a NaN .s16 gives you 0x8000
22:22 fdobridge: <gfxstrand> Weird
22:23 fdobridge: <karolherbst🐧🦀> yeah... CVT is weird and best avoided if you don't absolutely need it
22:24 fdobridge: <karolherbst🐧🦀> we've used CVT in the past to implement NEG/ABS, but it turns out doing an ADD instead is faster as well, so I hope you are not planning on doing that as well 😛
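A rough model of the CVT behaviour described above, as a Python sketch. It assumes the 16-bit result lands in a 32-bit register and that out-of-range inputs clamp to the s16 range. The function name `cvt_f32_to_s16_model`, the truncation toward zero, and the clamping rule are illustrative assumptions, not documented hardware behaviour. Only the sign extension and the NaN case come from the discussion.

```python
import math


def cvt_f32_to_s16_model(x: float) -> int:
    """Toy model: 32-bit register contents after a float -> .s16 conversion."""
    if math.isnan(x):
        # From the discussion: NaN sets only the top bit of the 16-bit
        # result and is NOT sign-extended into the upper half.
        return 0x8000
    # Assumption: clamp to the s16 range, otherwise truncate toward zero.
    if x >= 32767.0:
        value = 0x7FFF
    elif x <= -32768.0:
        value = -0x8000
    else:
        value = int(x)
    # Sign-extend the 16-bit result into the full 32-bit register.
    return value & 0xFFFFFFFF


print(hex(cvt_f32_to_s16_model(-1.0)))
print(hex(cvt_f32_to_s16_model(float("nan"))))
```

In this model a converted `-32768.0` yields `0xffff8000`, while NaN yields `0x8000`, so the upper 16 bits are what tell the two cases apart.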
23:02 fdobridge: <gfxstrand> Okay, robustness for 1D and 2D array textures fixed.
23:03 fdobridge: <gfxstrand> Not a source of hangs, unfortunately.
23:04 Mangix: karolherbst: rusticl have a channel?
23:14 karolherbst: yeah, #rusticl
23:55 culuar: https://en.wikipedia.org/wiki/Unit_propagation it looks like wikipedia has great content, inference is also called a resolution, which is referenced/linked from the offered page.