IRC Logs of #dri-devel on irc.freenode.net for 2023-09-28

00:05 kode54: I'm also waiting on gsc-mei or similar for a "single gen of hw" for xe.ko
00:05 kode54: so that HuC firmware works
00:06 kode54: upstream doesn't want to do that because that "single gen of hw" is different from preceding and following generations
00:06 kode54: except it's literally the first two generations of dedicated GPUs from Intel
00:06 kode54: well, no
00:06 kode54: DG1 is early gen, but has the benefit of working with the old method
00:07 kode54: and apparently DG3 or Xe2 is going to use something new, similar to MTL
00:07 kode54: nobody seems to have surfaced media or CL support for Xe.ko yet, either
00:08 kode54: I'm currently using a Polaris 10 GPU with PRIME for emulators, since those otherwise run like crap on the Arc cards, even though they should be technically faster
00:33 alyssa: kode54: likely dxvk bug, if the VVL is complaining
00:34 alyssa: though crashing the GuC seems.. extreme
00:35 kode54: It only crashes the GuC on xe.ko, not i915.ko
00:36 kode54: I posted an apitrace of enough dx11 to cause the error (four whole frames, crashes on the first)
00:36 kode54: Doesn’t crash with wined3d either
00:37 kode54: But wined3d has other issues with this game, such as the gui occasionally causing polygon explosions that obscure the entire viewport and don’t go away unless I either dismiss the elements by picking up the item spawning them, or quit the game
00:38 kode54: It’s basically what happens when a dev takes an early unreal engine game and ports it to newer unreal engine, from dx9 to dx11
00:40 DemiMarie: kode54: gsc-mei is only needed for restricted content, which is something that I suspect most upstream developers only support reluctantly. The GuC can authenticate the HuC sufficiently for unrestricted media workloads.
00:41 DemiMarie: In fact, one could make an argument that all restricted content support should be dropped because (IIUC) it requires closed source userspace and drivers/gpu requires open source userspace for all APIs.
00:43 DemiMarie: alyssa: what I meant is that crashing the GuC is a Xe bug by definition, irrespective of whether the Vulkan API usage was valid or not.
00:43 kode54: DG2 needs gsc to upload the HuC firmware
00:43 DemiMarie: Ah
00:43 DemiMarie: I had forgotten about those GPUs
00:43 kode54: HuC firmware is needed for things like bitrate control
00:43 DemiMarie: why is that important?
00:43 kode54: Some people want to use this for streaming
00:44 DemiMarie: How does the quality of the produced video compare to software encoding?
00:44 kode54: It can encode 1080p av1 at over 600fps
00:45 DemiMarie: I’m talking about quality vs bandwidth of the output
00:45 kode54: Guess I’ll stick to taking a whole 24 hours to encode a movie
00:45 DemiMarie: Does SW encoding parallelize well?
00:45 kode54: I don’t know how the quality compares to software
00:46 kode54: Av1 can’t parallelize at all unless I use tiled encoding
00:46 DemiMarie: Tiled encoding?
00:46 kode54: Splitting the frame into tiles and encoding them separately
00:46 kode54: Ffmpeg does support this for svt av1
00:47 kode54: It also requires a massive amount of memory
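Tiled encoding splits each frame into a grid of rectangles that are coded independently, so each tile can be handed to a separate worker. Below is a minimal Python sketch of only the partitioning step. It assumes 64x64 superblocks and a uniform split in superblock units. The helper name `av1_style_tiles` is hypothetical, and this is not SVT-AV1's or FFmpeg's actual code.

```python
import math


def av1_style_tiles(width, height, tile_cols, tile_rows, sb=64):
    """Split a frame into tile rectangles (x0, y0, x1, y1).

    Tile edges are snapped to superblock boundaries (sb x sb pixels,
    assumed 64 here) using a uniform split in superblock units; the last
    tile row/column is clipped to the frame edge.
    """
    sb_cols = math.ceil(width / sb)
    sb_rows = math.ceil(height / sb)
    col_step = math.ceil(sb_cols / tile_cols)  # superblocks per tile column
    row_step = math.ceil(sb_rows / tile_rows)  # superblocks per tile row
    tiles = []
    for r in range(0, sb_rows, row_step):
        for c in range(0, sb_cols, col_step):
            x0, y0 = c * sb, r * sb
            x1 = min((c + col_step) * sb, width)
            y1 = min((r + row_step) * sb, height)
            tiles.append((x0, y0, x1, y1))
    return tiles
```

For a 1920x1080 frame split 2x2, this yields four tiles that together cover every pixel exactly once. Each tile could then be submitted to its own encoder thread.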
00:47 kode54: But the quality is probably orders of magnitude better than the hardware
00:48 kode54: I’ll have to test it
00:49 DemiMarie: Context: in Qubes OS I want to expose the minimum acceleration necessary, because each GPU feature that is exposed is extra attack surface. Battery life is terrible anyway because of e.g. wakeups not getting batched by the hypervisor.
00:50 DemiMarie: kode54: does the HuC firmware do any parsing of media during decode?
00:51 DemiMarie: If so, that is a hard no for Qubes OS and will require that hardware decode remain unused.
00:51 DemiMarie: How much memory are you talking about?
00:52 kode54: https://gist.github.com/mrintrepide/b3009f5d0f08d437ebbb4c17cbf36e18
00:53 kode54: HuC isn’t needed for decode
00:53 kode54: But you will need GuC anyway for xe.ko, since it uses GuC scheduling only
00:54 kode54: Not sure how much of i915 needs it for newer cards
00:54 kode54: Does Qubes just throw out all proprietary firmware?
00:54 DemiMarie: GuC doesn’t talk to untrusted inputs so doesn’t bother me
00:54 kode54: Oh
00:55 DemiMarie: kode54: No, but we are very skeptical of exposing proprietary firmware to potentially hostile inputs.
00:55 kode54: Gotcha
00:56 DemiMarie: Hence all of my concerns about userspace command submission.
00:57 DemiMarie: gfxstrand finally persuaded me that the inputs to the firmware are so simple that this is not a serious concern in practice, especially since the doorbells must be proxied when virtualization is in use.
00:57 DemiMarie: However, if the video firmware were to e.g. parse H.265 inputs, that would be a hard no.
01:01 DemiMarie: Since I don’t do any video production (only calls), the metric that matters to me is realtime quality: maximum quality that can be encoded in real time at a given bandwidth constraint.
01:02 DemiMarie: Is SW or HW better for that?
01:02 kode54: For av1, software can’t really do real time
01:02 DemiMarie: Is that why video calls use other codecs like H.264?
01:03 kode54: H.264 is the lowest common denominator, yes
01:03 DemiMarie: What is the best codec for software real-time encoding?
01:03 kode54: Good enough for general use, fast enough to do in software
01:04 DemiMarie: Gotcha
01:08 DemiMarie: Ah, apparently AV1 can be software encoded in real time but there are tradeoffs
01:49 mareko: zmike goes all in on those antimodifiers ;)
01:55 ascent12: Is there any equivalent of USE_SCANOUT for vulkan? Or is it always expected to use GBM for allocation of buffers for KMS?
02:01 zmike: mareko: just let it end 🤕
02:05 kode54: yes, let those crappy old cards end
02:06 kode54: and let vendors not bother to come to market if their contribution is going to be the Arc
02:06 kode54: otherwise I would have just bought another AMD card
02:07 kode54: or maybe a nice 3060
02:12 mareko: more modifier-dependent stuff isn't a bad thing
02:34 mareko: also the linear modifier isn't compatible between GPUs, so interop isn't always guaranteed to work
02:36 mareko: we should have LINEAR_64B, LINEAR_128B, LINEAR_256B, etc.
02:39 mareko: #define DRM_FORMAT_MOD_LINEAR_ALIGNED(pitch_align) fourcc_mod_code(NONE, pitch_align) // there I fixed the linear modifier for you
03:04 kode54: I'm kind of mad I bought the wrong GPU though
03:05 kode54: wish I knew why ANV was so CPU dependent
03:11 ids1024[m]: Instead of having different modifiers for different linear pitch alignments, would it make sense to have an EGL extension method to query what alignment is required for import, and a gbm function that allocated with a given minimum alignment? Are alignment issues like this applicable to any modifiers other than linear?
03:14 ids1024[m]: And I guess the same minimum alignment would apply to both offset and pitch?
03:19 mareko: only linear
03:20 mareko: we haven't run into an issue with offset alignment, though there is probably something too
03:21 mareko: modifiers contain all information about themselves, I don't think any query API can impose additional restrictions
03:23 mareko: AMD only has 256B pitch alignment and Intel had to increase their alignment to match AMD to make interop work, but it's not sustainable or compatible with anything else
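The pitch mismatch mareko describes comes down to simple arithmetic. A linear buffer allocated with one device's pitch alignment may not satisfy another device's requirement. The minimal sketch below assumes a 1366-pixel-wide XRGB8888 buffer (4 bytes per pixel) and uses the 256-byte requirement mentioned above for AMD. The 64-byte exporter alignment is an arbitrary example value, and the helper names are illustrative.

```python
def align_up(value, alignment):
    """Round value up to the next multiple of alignment (a power of two)."""
    return (value + alignment - 1) & ~(alignment - 1)


def linear_pitch(width_px, bytes_per_px, pitch_align):
    """Row pitch in bytes for a linear buffer with the given pitch alignment."""
    return align_up(width_px * bytes_per_px, pitch_align)


def importable(pitch, required_align):
    """True if a buffer with this pitch meets the importer's alignment rule."""
    return pitch % required_align == 0


# Exporter aligns rows to 64 bytes; importer needs 256-byte aligned rows.
pitch_64 = linear_pitch(1366, 4, 64)    # 5504 bytes
pitch_256 = linear_pitch(1366, 4, 256)  # 5632 bytes
```

In this example, the 64-byte-aligned pitch is rejected by a 256-byte importer. The 256-byte pitch, however, works in both directions.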
03:25 soreau: vkICantBelieveItsNotModifiersEXT
04:46 kode54: https://share.icloud.com/photos/0e4zn_Tf3J9bidOAxFuLlLyEQ
04:46 kode54: flickering on the bottom of the compositor
06:27 kode54: damn
06:27 kode54: this annoys the hell out of me
06:28 kode54: I had plugged in my Radeon card as a second GPU because I thought my primary GPU was too slow at running Yuzu
06:28 kode54: turns out it was running too slow even with xe.ko because I left Perfetto support enabled
06:52 mripard: airlied: could you ack/review https://lore.kernel.org/all/20230921105743.2611263-1-mripard@kernel.org/ ?
07:04 airlied: mripard: a-b
07:04 mripard: thanks :)
07:06 lumag: pinchartl, just another ping for https://lore.kernel.org/linux-arm-msm/CAA8EJprBGrG0qMO3yrPxcPZu8kqcOZNw6htZZSKutYfFcZxBfQ@mail.gmail.com/ (and another answer in that thread).
07:07 lumag: The mesa CI is now hitting this issue (cc DavidHeidelberg), so we'd like to solve this somehow. Either using this approach or some other one.
07:31 kode54: um
07:31 kode54: some person is trying to merge new functionality into wlroots to support tearing protocol
07:31 kode54: they seem to think that they need `DRM_MODE_PAGE_FLIP_ASYNC` and that atomic modesetting doesn't support that
07:53 javierm: kode54: I don't think https://lore.kernel.org/lkml/20230707224059.305474-2-andrealmeid@igalia.com/ ever landed ?
07:54 kode54: hmm
07:55 kode54: I didn't realize it wasn't implemented
07:59 kode54: oh I see
08:00 kode54: that patch set is currently dependent on some comment changes
08:00 kode54: and emersion has been mostly afk for a while
08:00 kode54: or at least, out of the picture
08:00 kode54: I wish him the best
08:11 psykose: on vacation
08:24 kode54: cool
08:27 pq: DemiMarie, karolherbst, I don't think GBM is the best EGL platform for headless rendering apps, because it's geared towards dmabuf import to KMS. The better choices are surfaceless platform, and maybe something with EGLDevice. Then just use glReadPixels in the app, and do wl_shm yourself. No need to hassle trying to get pixels from dmabuf or gbm_bo copied to CPU. You can probably even pipeline that
08:27 pq: glReadPixels, too, instead of stalling?
08:29 pq: OTOH, to avoid needing to touch app code, that's something you'd need to do in the Mesa EGL implementation or in the guest side Wayland proxy if you have one.
08:30 pq: karolherbst, btw. gbm_bo_map() will do a de-tiling blit when necessary.
08:31 pq: dmabuf mmap() won't
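pq's distinction matters because a tiled buffer's bytes are not stored in scanline order. Mapping the dma-buf directly exposes the raw tile layout, while `gbm_bo_map()` may perform a de-tiling blit first. Below is a minimal sketch of what a de-tiling copy does. It assumes a hypothetical layout of row-major tiles with row-major texels inside each tile. Real vendor layouts, such as Intel Y-tiling or AMD swizzle modes, are more involved.

```python
def detile(tiled, width, height, tile_w, tile_h):
    """Convert a buffer stored as row-major tiles into linear row-major order.

    Hypothetical layout: tiles are stored one after another, left to right
    then top to bottom, and texels inside each tile are row-major.
    width and height must be multiples of tile_w and tile_h.
    """
    tiles_per_row = width // tile_w
    linear = [None] * (width * height)
    for i, texel in enumerate(tiled):
        tile_index, within = divmod(i, tile_w * tile_h)
        ty, tx = divmod(tile_index, tiles_per_row)  # which tile
        iy, ix = divmod(within, tile_w)             # position inside the tile
        x = tx * tile_w + ix
        y = ty * tile_h + iy
        linear[y * width + x] = texel
    return linear
```

Take a 4x2 image whose linear texels are 0..7, stored as two 2x2 tiles. Its memory then reads `[0, 1, 4, 5, 2, 3, 6, 7]`, and that is exactly what a naive mmap() reader would see.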
08:39 karolherbst: ah, fair enough
08:47 MrCooper: mareko: one issue with your linear-with-pitch-alignment modifier proposal is that different pitch alignments would always be considered incompatible, even if one is a multiple of the other, so drivers would have to advertise every possible multiple as well
08:48 MrCooper: the general feeling has been that this kind of restriction would need to be handled separately from modifiers
08:52 kode54: multiple of the other should be tested by testing if source is an even multiple of the destination alignment
08:53 MrCooper: not how modifiers work
08:54 kode54: I literally don't know what modifiers even do
08:55 kode54: if it's just recast buffer as another format, why didn't GPUs already support that from the get-go
08:56 pq: kode54, a modifier is an opaque number. Applications filter lists of modifiers without understanding what they mean. Either they match, or they don't.
08:57 pq: then they pass the list forward, e.g. to a driver to allocate something
08:57 kode54: and what the heck are they for
08:57 pq: they are to agree on a buffer layout that everyone involved can understand
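pq describes modifiers as opaque numbers that are only compared for equality. That is the basis of MrCooper's objection to per-alignment linear modifiers. A 256-byte-aligned buffer would satisfy a 64-byte requirement, but the two codes never match. The minimal sketch below mirrors the `fourcc_mod_code()` packing from `drm_fourcc.h` in Python. It treats mareko's `DRM_FORMAT_MOD_LINEAR_ALIGNED` proposal as hypothetical (`linear_aligned` here); it is not an upstream definition.

```python
DRM_FORMAT_MOD_VENDOR_NONE = 0


def fourcc_mod_code(vendor, val):
    """Python mirror of the drm_fourcc.h macro: vendor in the top 8 bits,
    vendor-specific value in the low 56 bits."""
    return (vendor << 56) | (val & 0x00FFFFFFFFFFFFFF)


DRM_FORMAT_MOD_LINEAR = fourcc_mod_code(DRM_FORMAT_MOD_VENDOR_NONE, 0)


def linear_aligned(pitch_align):
    """Hypothetical per-alignment linear modifier from the proposal above."""
    return fourcc_mod_code(DRM_FORMAT_MOD_VENDOR_NONE, pitch_align)


def negotiate(*modifier_lists):
    """Keep modifiers present in every list, compared as opaque integers,
    preserving the order of the first list."""
    common = set(modifier_lists[0])
    for mods in modifier_lists[1:]:
        common &= set(mods)
    return [m for m in modifier_lists[0] if m in common]
```

An importer that only advertises the 64-byte variant gets no common modifier with a 256-byte exporter. It would have to list every multiple (128, 256, 512, ...) for negotiation to succeed, which is the "long list of linear modifiers" problem.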
08:57 kode54: oh
08:58 kode54: I was under the mistaken impression that buffers were just arbitrarily configurable by a lengthy descriptor block of some sort
08:58 karolherbst: it's more about tiling format
08:58 pq: kode54, please, mind your language. It's making me not want to reply to you.
08:58 pq: layout is tiling format, yeah
08:59 kode54: saying heck is a swear now?
08:59 pq: pixel format is separate
08:59 pq: yes
08:59 karolherbst: though in theory we could make GPUs eat tiling formats of other vendors by retiling through shaders instead of going through linear, but....
09:01 karolherbst: actually..... that shouldn't be too hard, just need a shader reading a tiled buffer through raw ssbos (or whatever) and the GPU automatically tiles it if rendering into a tiled surface...
09:01 karolherbst: not sure if it's worth the effort, but might help laptops
09:02 pq: karolherbst, what about texture filtering?
09:02 karolherbst: why would that be a concern?
09:02 pq: you'd lose the benefits of using dedicated filtering hardware?
09:03 karolherbst: atm you blit into linear for scanout, and the receiving side tries to display it
09:03 karolherbst: that would just skip that linear blit there
09:03 pq: oh, you mean just for blits for display purposes
09:03 karolherbst: yeah
09:03 pq: alright
09:03 karolherbst: like some hardware can't even render to linear e.g. nvidia
09:04 karolherbst: well.. it can as long as you have no depth buffer
09:04 karolherbst: but then again.. no idea if it's worth the effort
09:04 karolherbst: but instead of going the linear route, Intel could e.g. just detile directly for displaying it
09:10 karolherbst: s/for/when/
10:18 zamundaaa[m]: <ascent12> "Is there any equivalent of..." <- Sadly there is not. If you want to have a guarantee that the buffers you get are scanout capable, you have to use GBM
10:19 ascent12: Yeah I was skimming through the WSI code, and I saw some internal struct added onto pNext which seems to do it.
10:20 ascent12: An API would be nice, but it would just be me saving a little bit of code setting up/using GBM, probably too niche a thing to really bother :P
10:49 cwabbott: dj-death: while rebasing the vulkan VK_EXT_feedback_attachment_loop_dynamic_state series I realized that we totally forgot to update the common renderpass/pipeline flag handling code to handle maintenance5 pipeline create flags :/
10:51 cwabbott: I'm adding commits to introduce common helpers for that, switch all of that stuff over to the new pipeline flags enum, and then rebase everything on top of that
10:53 dj-death: cwabbott: thanks, I can review
11:09 cwabbott: dj-death: done, it's part of !25436 now
11:10 cwabbott: I wrote the changes to other drivers blind, so I'm build-testing now
11:34 dj-death: cwabbott: looks correct to me
13:28 snaibc:    ,    ,
13:28 snaibc:    /(_,   ,_)\
13:28 snaibc: \ _/ \_ / irc.d⁠e⁠ft.com
13:28 snaibc: //    \\
13:28 snaibc: \\ (@)(@) // #supe⁠rbowl
13:28 snaibc:   \'=\"==\"='/
13:28 snaibc: ,===/ \===,
13:28 snaibc: \",===\ /===,\"
13:29 snaibc: \" ,==='------'===, \"
13:29 snaibc: \"       \"
13:29 snaibc: snaibc enunes CATS DodoGTA shashanks_ alyssa orbea nchery lstrano_ zf rsripada agd5f CosmicPenguin rcn-ee___ ccaione steve--w macromorgan minecrell mal JohnnyonFlame apinheiro nir2042 gio anarsoul sarahwalker vliaskov tursulin sgruszka Company mvlad sghuge rasterman mripard pcercuei tzimmermann YuGiOhJCJ fab egbert crabbedhaloablut alanc lemonzest rsalvaterra co1umbarius psykose
13:29 snaibc: OftenTimeConsuming jernej_ Dark-Show Daanct12 DemiMarie rcf pochu xroumegue dtmrzgl RSpliet Leopold_ neniagh ungeskriptet jfalempe mairacanal novaisc imre atipls praneeth_ tales-aparecida glennk cef guru_ bbrezillon mauld robmur01 larunbe DavidHeidelberg tonyk ascent12 kos_tom Kayden rgallaispou kem Peuc anholt a-865 Ryback_[WORK] shankaru1 fdu_ dolphin aswar002 pzanoni mattrope_ tristianc6704
13:29 snaibc: noord simon-perretta-img yogesh_m1 sumoon kchibisov ella-0 rosefromthedead rpigott kennylevinsen ifreund MrCooper bnieuwenhuizen sre ndufresne nuclearcat2 i-garrison qyliss hch12907 illwieckz sravn akselmo Ristovski digetx dos1 ced117 dv_ pinchartl italove8 the_sea_peoples Armote[m] treeq[m] karolherbst soreau nightquest cyrozap Mangix Emantor sarnex dhmltb^ Cyrinux9474 danylo Kwiboo mwk_ t
13:29 snaibc: arceri libv eukara rz xxmitsu KitsuWhooa exit70 dwlsalmeida jkhsjdhjs DragoonAethis RAOF mattst88 Namarrgon phryk xypron lcn dliviu leo60228 konstantin TMM melonai339 dschuermann arnd robher kerneltoast jimjams eric_engestrom dianders zx2c4 zmike rodrigovivi rib daniels steev austriancoder zzag markco mdnavare hfink hashar rg3igalia olv linusw kxkamil neggles DPA q66 linkmauve immibis azerov
13:29 snaibc: Surkow|laptop Stary tyalie lina zehortigoza Lyude vdavid003[m] Net147 invertedoftc09691 xantoz kugel pixelcluster mareko dri-logger glisse vup doras pankart[m] T_UNIX Sofi[m] moben[m] pushqrdx[m] siddh shoffmeister[m] orowith2os[m] masush5[m] gnustomp[m] swick[m] msizanoen[m] tomba BilalElmoussaoui[m] Mershl[m] aura[m] YHNdnzj[moz] Eighth_Doctor Vin[m] MayeulC ids1024[m] go4godvin EricCurtin[m]
13:29 snaibc: cmeissl[m] Tooniis[m] YaLTeR[m] nielsdg ram15[m] tintou enick_991 jenatali sknebel q4a dantob koike Anson[m] nyorain[m] kelbaz[m] aradhya7[m] nick1343[m] isinyaaa[m] jtatz[m] kusma Newbyte zzxyb[m] Quinten[m] xerpi[m] Vanfanel gallo[m] yshui` ajhalaney[m] samueldr tomeu JosExpsito[m] dabrain34[m]1 Hazematman kallisti5[m] viciouss[m] sergi1 vidal72[m] znullptr[m] KunalAgarwal[m][m] c
13:29 snaibc: wfitzgerald[m] AlexisHernndezGuzmn[m] pp123[m] Targetball[m] AlaaEmad[m] kunal_10185[m] zzoon_OOO_till_03_Oct[m] krh ogabbay vignesh benettig ernstp tfiga SanchayanMaity vgpu-arthur pundir norris NishanthMenon linyaa kathleen_ tchar ddavenport_ jhugo appusony____ naseer__ angular_mike______ _alice lvrp16 jluthra haasn lileo pendingchaos seanpaul hwentlan_ cheako sskras jstultz cwabbott TimurTabi
13:29 snaibc: vaishali ezequielg khilman mvchtz enick_185 Wallbraker tak2hu[m] zamundaaa[m] dcbaker devarsht[m] robertmader[m] Sumera[m] x512[m] heftig FloGrauper[m] jeeeun841351 halfline[m] ohadsharabi[m] K0bin[m] undvasistas[m] sigmoidfunc[m] tleydxdy ttayar[m] gdevi reactormonk[m] dhirschfeld2[m] bubblethink[m] daniliberman[m] jasuarez kunal10710[m] knr urja bylaws egalli MotiH[m] onox[m] ella-0[m]
13:29 snaibc: naheemsays[m] talcohen[m] Ella[m] nicofee[m] DUOLabs[m] nekit[m] fkassabri[m] ofirbitt[m] exp80[m] vjaquez ishitatsuyuki tlwoerner MoeIcenowy _whitelogger UndeadLeech kurufu _isinyaaa bcheng airlied Frogging101 jolan gfxstrand KungFuJesus wens Shibe rossy graphitemaster narmstrong rcombs kisak SolarAquarion greaser|q gabertron Lightsword clever jrayhawk schaeffer thaytan smaeul JPEW bwidawsk
13:29 snaibc: robclark nirmoy nicolejadeyee jbarnes caseif_ _lemes codingkoopa32 robink sh-zam Simonx22 radii_ siqueira MTCoster demarchi andrey-konovalov kallisti5 sumits ayaka melissawen cmarcelo ManMower abhinav__ lumag quantum5 jessica_24 jljusten ZeZu hays JTL flto anujp jhli swivel phire gerddie6 paulk adavy cazzacarna pq milek7 lplc padovan Omax kbingham bbhtt Rayyan moony shoragan mceier turol APic
13:29 snaibc: aissen tjaalton marex yoslin Prf_Jakob dakr opotin65 BobBeck tango_ mriesch evadot_ gerddie Lynne FLHerne neobrain mstoeckl skinkie Adrinael jadahl mort_ tnt sigmaris_ Venemo lanodan calebccff ccr robertfoss ds` iokill mlankhorst Plagman llyyr yang3 mmind00 dj-death Koniiiik sven a1batross wv rawoul vapid haagch BobBeck9 aleasto sebastiencs Mary klounge kgz LaserEyess lucaceresoli hakzsam
13:29 kallisti5[m]: nods
13:29 kennylevinsen: has this kind of spam strategy *ever* worked to justify the scripting cost?
13:29 mattst88: can someone give me ops here?
13:29 karolherbst: same
13:29 kallisti5[m]: I think the problem is the cost is 0
13:29 ndufresne: don't you want to watch the super bowl now ?
13:30 zmike: it's not even the right season
13:31 tlwoerner: maybe they're advocating for more rust usage?
13:31 kennylevinsen: I imagine I wouldn't want to watch it over IRC regardless
13:31 kennylevinsen: kallisti5[m]: I mean, it took time away from improving their spam email techniques, no?
13:31 hch12907: _sigh_
13:31 heftig: could also be a false flag attack to get angry people to join that channel
13:31 kallisti5[m]: also, by the responses here... it's working
13:31 kallisti5[m]: I should start asking that channel for mesa support
13:31 hch12907: not if we don't care about the message at all
13:31 hch12907: (other than being tagged)
13:31 kallisti5[m]: Your time isn't valuable if you're worthless 🤗
13:31 kennylevinsen: kallisti5[m]: that is a *great* idea
13:32 pixelcluster: smh not even their ascii art works
13:32 zmike: oh I get it
13:33 zmike: it's superb owl
13:33 zmike: not superbowl
13:33 kennylevinsen: 00000000 23 73 75 70 65 e2 81 a0 72 62 6f e2 80 8b 77 6c |#supe...rbo...wl|
13:33 kennylevinsen: unicode fun?
13:34 simon-perretta-img: Padding to get around word/pattern filters I guess
13:34 pixelcluster: an attempt at irc color codes maybe?
13:35 Newbyte: I see colour here in my Matrix client
13:52 hashar: e2 81 a0 is unicode word joiner, it is not visible and merely an indication to prevent line breaking at that point
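The bytes in kennylevinsen's hexdump can be checked directly. As hashar says, `e2 81 a0` decodes to U+2060 WORD JOINER, and `e2 80 8b` decodes to U+200B ZERO WIDTH SPACE. The small sketch below uses Python's standard `unicodedata` module to find and strip such invisible format (category `Cf`) characters. The helper name is illustrative.

```python
import unicodedata


def reveal_invisible(raw: bytes):
    """Decode UTF-8 and report invisible format (category Cf) characters.

    Returns the text with those characters removed, plus a list of
    (index, codepoint, name) tuples for each one found.
    """
    text = raw.decode("utf-8")
    hidden = [(i, f"U+{ord(ch):04X}", unicodedata.name(ch))
              for i, ch in enumerate(text)
              if unicodedata.category(ch) == "Cf"]
    visible = "".join(ch for ch in text if unicodedata.category(ch) != "Cf")
    return visible, hidden
```

Feeding it the hexdump bytes recovers `#superbowl`, with a word joiner and a zero width space hidden inside. This kind of padding defeats naive substring filters.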
14:05 karolherbst: zmike: soo.. it looks like in lvp 1Darray, 2Darray and 3D images are kinda broken, or at least I have crashes in JIT code... but the nir llvmpipe generates is also... "interesting"
14:05 karolherbst: 32x4 %44 = (float32)txl %29 (0x3, 0x1, 0x0) (texture_handle), %43 (0x3, 0x0, 0x0) (sampler_handle), %42 (coord), %0 (0.000000) (lod), 0 (texture), 0 (sampler)
14:06 karolherbst: in case you have any ideas
14:08 karolherbst: uhhh
14:08 karolherbst: I think I know what's up
14:08 karolherbst: yo.. pain
14:08 karolherbst: it's unnormalized coordinates
14:09 karolherbst: are unnormalized coordinates an optional vulkan feature or something?
14:09 karolherbst: though the vvl should have complained
14:10 karolherbst: maybe some weird lowering somewhere messing it up
14:24 karolherbst: same issue on radv
14:24 karolherbst: nice
14:34 Ristovski: It seems like the spam was actually a 1000IQ move to revive this channel
14:34 karolherbst: that must be it
14:34 Ristovski: joke was on us all along
14:36 austriancoder: karolherbst: In an old MR I found this commit: https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/18986/diffs?commit_id=1d3f6fa7d68471e3d7b6d2868bd9beb0aae95181 do still plan to do it in the frontend or should that be done in the backend? I am happy with everything ... just want to know
14:37 karolherbst: austriancoder: good question. I think it might make sense for drivers to report what sys vals they want to see lowered in the frontend
14:37 karolherbst: a lot of drivers are basically lowering them via push constants or uniforms or other similar things...
14:38 karolherbst: and every driver lowers some
14:38 karolherbst: so we could just make it a frontend's problem in the future, but I also don't know if anybody actually cares enough to make that change
14:38 karolherbst: or if drivers want to be in control to do something better than just putting those into ubos
14:39 austriancoder: so leave it to the driver .. fine
14:39 karolherbst: yeah... how much work would that be?
14:39 austriancoder: 5-10 minutes
14:39 karolherbst: though I'm really considering it, because I might need this for non-uniform work groups anyway
14:39 karolherbst: ahh
14:39 karolherbst: yeah, that's fine then
14:40 austriancoder: another question
14:40 austriancoder: is there a spirv to nir cache somewhere used by rusticl that I need to enable? It takes about 7-8 seconds until I see the first nir_print_shader(..) outputs on my dut.
14:40 karolherbst: it uses the driver cache
14:41 karolherbst: the 7-8 seconds startup time is for converting libclc to nir
14:41 karolherbst: it's like a 2MiB spirv
14:42 karolherbst: rusticl also has its own cache for some CLC to spirv stuff I think? But anyway, it uses the driver cache for driver specific binaries
14:44 austriancoder: ah okay .. I have disabled the driver cache as I want it most of the time disabled and I was too lazy to set up my env with MESA_SHADER_CACHE_DISABLE
14:44 karolherbst: yeah.... I'm wondering if I want to make the libclc -> nir thing driver independent, but...
14:45 karolherbst: I think the reason is, that we need the driver's compile options here
14:46 karolherbst: everything OpenCL to SPIR-V gets cached by rusticls internal cache
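karolherbst's point is that the libclc-to-NIR conversion depends on the driver's compiler options. The cached result therefore cannot simply be shared across drivers. The minimal sketch below builds a content-addressed cache key that folds those options in. The option names in the example are illustrative assumptions, and this is not rusticl's or Mesa's actual disk-cache code.

```python
import hashlib


def nir_cache_key(spirv: bytes, compiler_options: dict) -> str:
    """Derive a cache key from a SPIR-V module and the driver's compile options.

    Options are serialized in sorted order so identical settings always
    hash identically, while any change to them yields a different key.
    """
    h = hashlib.sha256()
    h.update(spirv)
    for name in sorted(compiler_options):
        h.update(f"{name}={compiler_options[name]}\n".encode())
    return h.hexdigest()
```

With a key like this, two drivers with different options never collide. The catch is that the expensive libclc conversion is then paid once per distinct option set.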
15:11 mareko: MrCooper: yes, advertising all possible multiples of the pitch would be necessary up to a certain number e.g. 1024, isn't it how it's supposed to work? or do you want the kernel, mesa, and other drivers to each have their own pitch alignment query?
15:12 MrCooper: that'd be a long list of linear modifiers; the idea would be some kind of other mechanism which better fits this kind of constraint
15:14 zmike: karolherbst: huh
15:15 mareko: MrCooper: that would require a change in every user of modifiers
15:15 karolherbst: zmike: yeah.. I have no idea what's causing it.. the nir looks fine
15:16 zmike: I feel like I've asked this before but is there a reason you can't use the same lowering for unnormalized that every other state tracker uses
15:17 karolherbst: because hardware actually supports it?
15:17 karolherbst: and I don't really want to do shader variants just for that
15:18 karolherbst: it's unknown at compile time, so that would be a huge pain and everything
15:19 karolherbst: anyway.. it works with all the drivers, just radv/lvp are kinda.. either broken there or something else is going on, but anv is entirely fine here
15:20 karolherbst: mhh.. even if all coords are 0 it defaults in lvp...
15:20 karolherbst: *segfaults
15:21 karolherbst: uhhh actually...
15:22 karolherbst: yeah.. no idea what's up there
15:22 karolherbst: probably some weirdo driver bug
15:24 pekkari: can I have your thoughts in the following kasan report? http://paste.debian.net/1293407/
15:24 pekkari: it seems a legit bug, and more or less easy to reproduce in my vm, but I fail to find where update_plane could free a crtc structure
15:29 karolherbst: fixed it for lvp.. uhhh
15:32 karolherbst: zmike: anyway.. it's a lvp driver bug.. I just fixed it, and I expect the same to be true for radv
15:33 zmike: 🤔
15:33 zmike: seems like this should be tested by vkcts if it's a driver bug
15:33 karolherbst: something with the JIT
15:33 karolherbst: https://gitlab.freedesktop.org/karolherbst/mesa/-/commit/49d5fe694edf960e80cf3f47d448216ff4f7cc6e
15:33 karolherbst: this was changed like 4 months ago
15:34 karolherbst: maybe a regression
15:34 karolherbst: maybe not
15:34 karolherbst: but maybe nobody cares/notices if it's not 1D/2D
15:34 zmike: huh
15:35 zmike: oh okay
15:35 zmike: you're violating spec
15:35 zmike: but there's no explicit VU
15:35 karolherbst: I am?
15:35 zmike: When unnormalizedCoordinates is VK_TRUE, images the sampler is used with in the shader have the following requirements:
15:35 zmike: The viewType must be either VK_IMAGE_VIEW_TYPE_1D or VK_IMAGE_VIEW_TYPE_2D.
15:35 zmike: The image view must have a single layer and a single mip level.
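The restrictions zmike quotes reduce to a simple predicate over the image view. The minimal sketch below shows the kind of check a layered driver or validation layer could perform. It uses a hypothetical `ImageView` record with string view types rather than the real `VkImageViewCreateInfo` structure.

```python
from dataclasses import dataclass


@dataclass
class ImageView:
    view_type: str    # hypothetical names, e.g. "1D", "2D", "2D_ARRAY", "3D"
    layer_count: int
    level_count: int


def unnormalized_ok(view: ImageView) -> bool:
    """Check the quoted requirements for samplers created with
    unnormalizedCoordinates = VK_TRUE: 1D or 2D view, one layer, one mip level."""
    return (view.view_type in ("1D", "2D")
            and view.layer_count == 1
            and view.level_count == 1)
```

Under these rules, the 1D-array, 2D-array, and 3D cases from the rusticl-on-zink crashes above would all be rejected.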
15:35 karolherbst: pain
15:35 zmike: but there's no VU
15:36 zmike: so I'll raise an issue and see what happens
15:36 karolherbst: yeah.. I kinda need it for all image types
15:36 zmike: most likely any solution here would involve a new extension
15:36 karolherbst: but the vvl didn't complain
15:36 zmike: or maybe a maintenance extension as a shortcut
15:36 zmike: yeah like I said there's no VU for it
15:36 karolherbst: ahh
15:36 zmike: I expect that to be resolved within the next week or two
15:36 karolherbst: cool
15:37 karolherbst: okay.. so anything non 1D/2D we'd have to lower it
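Lowering here means rewriting the shader so that it divides texel-space coordinates by the image size before sampling with a normalized sampler. The size could come, for example, from a texture-size query. The catch karolherbst mentions is that whether a sampler is unnormalized is not known at compile time, hence the shader variants. The minimal sketch below covers only the arithmetic for a 3D or arrayed image, using a hypothetical helper.

```python
def normalize_coords(coord, size, is_array):
    """Convert unnormalized (texel-space) coordinates to normalized ones.

    coord: texel-space coordinates, followed by the array layer if is_array.
    size: image extent per spatial dimension (width[, height[, depth]]).
    The array layer passes through unchanged, since Vulkan never
    normalizes the layer coordinate.
    """
    spatial = coord[:-1] if is_array else coord
    normalized = [c / s for c, s in zip(spatial, size)]
    if is_array:
        normalized.append(coord[-1])
    return normalized
```

For a 64x64x32 3D image, the texel coordinate (32, 16, 8) becomes (0.5, 0.25, 0.25).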
15:43 karolherbst: I'm inclined to not lower it and wait until anybody actually files a bug requiring it... or I'll write a CL specific vulkan extension to allow this kind of nonsense :D but probably the lesser evil than shader variants, given there is hardware supporting it
15:43 karolherbst: I wonder what clvk is doing here...
15:44 zmike: given that approximately zero people will be using rusticl+zink seriously for the next year+ I'd say just go with it for now
15:44 zmike: leave a ticket open in the tracker
15:44 karolherbst: we'd only have to care if we want to file for conformance
15:44 karolherbst: probably
15:44 zmike: they care about validation errors?
15:44 zmike: in that case file now since there's no errors
15:45 zmike: :P
15:45 karolherbst: good question
15:46 karolherbst: "For layered implementations, for each OS, there must be Successful Submissions using at least two (if
15:46 karolherbst: available) independent implementations of the underlying API from different vendors (if available), at
15:46 karolherbst: least one of which must be hardware accelerated." is all I can find
15:47 zmike: so you're good
15:47 karolherbst: if radv and anv count as "independent implementations of the underlying API from different vendors"
15:47 karolherbst: because they sure don't sound like independent to me :D
15:47 zmike: dunno
15:48 zmike: not sure anyone's ever hit this before
15:48 karolherbst: yeah...
15:48 karolherbst: I could probably ask
15:48 karolherbst: zink-lvp: Pass 2403 Fails 49 Crashes 2
15:51 karolherbst: still want to get luxmark to run on radv without crashing the GPU :D
15:51 karolherbst: probably another bug lurking somewhere
16:01 karolherbst: funky.. the rendering is broken on anv :D... I guess there is a real bug somewhere then
16:03 karolherbst: not even texture related...
19:24 mareko: karolherbst: the fix for radeonsi compute-only contexts is in main, FYI
21:04 karolherbst: yeah, I've created https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25466 now
21:08 robclark: pekkari: see 4e076c73e4f6e90816b30fcd4a0d7ab365087255
21:14 mareko: karolherbst: it also allows compute jobs to run in parallel with gfx and increases the GPU hang timeout from 10s to 60s
21:15 karolherbst: ohh, so radeonsi allocates a compute queue thing then?
21:15 mareko: karolherbst: yes
21:15 karolherbst: okay, that's going to be useful indeed
21:16 karolherbst: could we also enable hmm/mmu_notifiers with that to support SVM? Or would that require more work?
21:16 mareko: karolherbst: I'm not familiar with that
21:17 karolherbst: it's about replayable page faults
21:17 airlied: I think that's more work
21:17 karolherbst: okay
21:17 airlied: not sure if that stuff is only exposed via amdkfd at the moment
21:17 mareko: karolherbst: RDNA can't do that anyway, not very well at least
21:17 karolherbst: ahh
21:17 karolherbst: okay
21:17 airlied: oh yeah it's also limited on non-compute gpus
21:18 karolherbst: I also have a userptr based SVM implementation I need to finish, but that doesn't require kernel support, just the driver to manage the VM
21:18 karolherbst: on the mesa side I mean
21:21 mareko: I'm sure KFD doesn't support and won't support replayable page faults on RDNA due to hw limitations
21:21 airlied: yeah that would be difficult :-)
21:21 airlied: one more reason CUDA will continue to eat people's lunch :-P
21:22 mareko: it's also not a good idea to have that complexity in a gaming GPU in general
21:22 karolherbst: well.. nvidia has it on all GPUs
21:22 karolherbst: since pascal
21:22 mareko: do they have a TLB in every SM?
21:22 karolherbst: it's only for compute tho
21:23 karolherbst: mhh.. good question
21:24 airlied: mareko: it is if you want to beat NVIDIA and provide a consistent behaviour across devices :-P
21:24 mareko: you can either have a TLB per SM or higher in the hierarchy
21:24 airlied: I understand it might not be easy or trivial to get working :-)
21:24 mareko: e.g. 128 SM is 128 TLBs vs 16 TLBs per memory channel
21:24 karolherbst: I think on nvidia it's one level higher than SMs
21:25 karolherbst: they have pairs of SMs
21:25 karolherbst: called TPC
21:26 mareko: getting it working isn't the issue, putting it in the chip and increasing power, cost, and latencies is
21:26 karolherbst: but I think the HPC gpus are a bit special there, just not in terms of functionality
21:28 karolherbst: but anyway.. no idea about the specifics of the TLB arrangement here
21:34 DemiMarie: karolherbst: could there be a kernel command line option to ensure that compute and gfx are serialized? I don’t want gfx crashes killing long running compute workloads.
21:34 karolherbst: depends on the driver I suspect
21:35 DemiMarie: The correct fix is of course to be able to kill individual crashed workloads without taking unrelated workloads with them, but I don’t know which consumer GPUs support that (and do not care about datacenter GPUs)
21:35 karolherbst: on nvidia there isn't really a strict separation between compute/gfx and you can't really run it in parallel anyway
21:36 DemiMarie: AGX apparently allows for running multiple contexts in parallel, as shown by faults in one job killing others
21:37 DemiMarie: Which to me is just a flat out bug in the firmware and/or hardware — the GPU should provide the same amount of isolation as the CPU.
21:37 DemiMarie: It wouldn’t help for Apple, but hopefully a future Vulkan spec requires this level of isolation.
21:37 karolherbst: I doubt you'd get enough people on the WG to sign up on it
21:38 DemiMarie: why?
21:38 karolherbst: sounds like a lot of work for everybody :P
21:39 karolherbst: also.. how would you be able to ensure this
21:39 karolherbst: would have to write a CTS tests which tests this
21:41 mareko: DemiMarie: amdgpu only kills shaders of one process and only if that fails, it resets everything
21:41 DemiMarie: mareko: when can that fail?
21:41 mareko: DemiMarie: any situation that can't be resolved by killing shaders, such as a rasterizer hang
21:42 DemiMarie: karolherbst: add a new SPIR-V intrinsic that deliberately crashes the shader, implemented on Mesa as an illegal instruction
21:42 DemiMarie: mareko: when can the rasterizer hang?
21:42 karolherbst: DemiMarie: not the same
21:42 mareko: if the driver is bad
21:42 DemiMarie: mareko: kernel or userspace driver?
21:42 mareko: either
21:42 mareko: or vulkan app
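mareko's description of amdgpu recovery is an escalation policy. The driver first kills the shaders of the offending process, and it resets the whole GPU only if that fails, for example on a rasterizer hang. The Python sketch below models that policy under stated assumptions. `Gpu`, `HangKind`, `kill_process_waves` and `full_reset` are hypothetical names for illustration, not amdgpu interfaces.

```python
from enum import Enum, auto


class HangKind(Enum):
    # Hypothetical classification of a detected hang.
    SHADER = auto()      # stuck shader waves; killable per process
    FIXED_FUNC = auto()  # e.g. a rasterizer hang; killing shaders cannot clear it


class Gpu:
    """Hypothetical GPU model with per-process shader kill and full reset."""

    def __init__(self):
        self.running = {}  # pid -> HangKind, or None if that process is healthy
        self.log = []

    def kill_process_waves(self, pid):
        # Soft recovery: terminate only this process's shader waves.
        # In this model it succeeds only if the hang lives in the shader cores.
        self.log.append(f"kill waves of pid {pid}")
        if self.running.get(pid) is HangKind.SHADER:
            del self.running[pid]
            return True
        return False

    def full_reset(self):
        # Hard recovery: every process with work on the GPU loses it.
        self.log.append("full GPU reset")
        lost = sorted(self.running)
        self.running.clear()
        return lost


def recover(gpu, guilty_pid):
    """Try the narrow recovery first and escalate only if it fails.

    Returns the sorted list of pids whose GPU work was lost."""
    if gpu.kill_process_waves(guilty_pid):
        return [guilty_pid]
    return gpu.full_reset()
```

With `{1: HangKind.SHADER, 2: None}` only pid 1 loses its work. With `{1: HangKind.FIXED_FUNC, 2: None}` the soft step fails and the healthy pid 2 is lost as well, which is the collateral damage DemiMarie is objecting to.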
21:42 DemiMarie: I assume that having the kernel-mode driver prevent this would require moving too much of Mesa into the kernel?
21:43 karolherbst: it would mean validating the entire command submission
21:43 mareko: vulkan allows hanging the GPU entirely and not just shaders
21:43 karolherbst: which means you get CPU rendering speed
21:43 DemiMarie: karolherbst: why is command validation so slow?
21:43 karolherbst: seriously, you can't validate that userspace won't be able to crash the GPU
21:43 karolherbst: DemiMarie: because you'd have to execute the shader on the CPU
21:44 karolherbst: to check it doesn't do anything bad, like a null pointer access
21:44 DemiMarie: karolherbst: then it is a hardware bug that the GPU cannot recover from any crashes
21:44 karolherbst: or passes a null/invalid buffer where it shouldn't
21:44 ccr: :P
21:44 karolherbst: well...
21:44 karolherbst: yeah, but we have that hardware
21:44 DemiMarie: karolherbst: so obviously I am very much missing something that is presumably obvious to everyone else here, but to me this just seems like GPU hardware is horrible
21:45 karolherbst: well
21:45 mareko: there is specialized hw that has better QoS like CDNA
21:45 karolherbst: let me put it this way
21:45 karolherbst: why can't the kernel verify a userspace application won't access memory OOB before running it?
21:45 karolherbst: same thing
21:45 DemiMarie: mareko: not helpful for any of my use-cases
21:45 karolherbst: why is it okay for the CPU to allow such applications, but not for the GPU?
21:46 DemiMarie: karolherbst: my point is that a faulting application’s impact _should_ be limited to that single application
21:46 DemiMarie: and on the CPU, it is
21:46 ccr: the halting problem called and wants its Turing machine back
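ccr's quip is the formal version of karolherbst's argument. Whether an arbitrary program ever performs an invalid access is undecidable in general (Rice's theorem), so a validator can only simulate the program, and a simulation that has not finished yet proves nothing. The toy Python illustration below assumes a hypothetical "shader" whose buffer index is the number of steps a Collatz loop takes. Whether that loop terminates for every input is an open problem.

```python
def collatz_steps(n, budget):
    """Count Collatz steps from n down to 1, giving up after `budget` steps.

    Returns the step count, or None if the budget ran out first."""
    steps = 0
    while n != 1:
        if steps >= budget:
            return None
        n = n // 2 if n % 2 == 0 else 3 * n + 1
        steps += 1
    return steps


def validate_submission(n, buf_len, budget):
    """Hypothetical validator for a toy 'shader' that loops until its
    Collatz walk reaches 1 and then reads buf[steps].

    The read is out of bounds when steps >= buf_len. The validator has no
    shortcut, so it runs the loop on the CPU. With a finite budget it may
    have to answer 'unknown', and no single budget is known to suffice
    for every n."""
    steps = collatz_steps(n, budget)
    if steps is None:
        return "unknown"  # could not decide within the budget
    return "fault" if steps >= buf_len else "ok"
```

For example, `n = 6` reaches 1 in 8 steps and is accepted for a 64-element buffer. `n = 27` needs 111 steps, so it faults with a budget of 1000 and comes back "unknown" with a budget of 50. In both cases the validator paid the full execution cost on the CPU, which is karolherbst's "CPU rendering speed" point.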
21:46 karolherbst: DemiMarie: ohh sure.. in the perfect world GPU contexts are completely isolated and you just need to reap that context
21:46 karolherbst: but sometimes hardware can't do it, or there are bugs
21:46 mareko: you can buy 1 GPU per process :D
21:47 DemiMarie: karolherbst: why can’t the hardware do it?
21:47 karolherbst: because it can't
21:47 HdkR: Expose SR-IOV on consumer class chips, SR-IOV every process :D
21:47 karolherbst: can't fix hardware after it was produced, can you
21:47 mareko: buy 1 GPU per process thread
21:47 DemiMarie: no, but this kind of bug never happens on a CPU
21:47 karolherbst: I'm sure they do happen on CPUs
21:48 DemiMarie: okay, almost never happens
21:48 mareko: it's not a bug, it's a feature :)
21:48 karolherbst: but yeah.. on CPU the matter is more sensitive
21:48 karolherbst: so the vendors generally care more
21:48 karolherbst: and it's also not bad on all GPUs
21:49 DemiMarie: which GPUs is it not bad on?
21:50 HdkR: H100 because then you don't have graphics to muck things up ;)
21:50 karolherbst: good question... I want to say nvidia, but the driver is terrible in terms of recovery and it generally just crashes the entire GPU without actually recovering... no idea why, because it's not really a hardware problem, just the driver being bad here
21:50 mareko: I suggest you go in front of NVIDIA HQ and protest
21:50 karolherbst: well.. nvidia's driver that is
21:50 karolherbst: nouveau also has bugs in the recovery code and it's generally less stable if you have multiple processes crash their context
21:51 mareko: you can buy TV ads to increase awareness
21:51 karolherbst: anyway.. it's not really considered a problem yet, so I doubt much will change, but vendors are slowly moving towards making things more robust
21:53 karolherbst: but anyway, GPUs are fully programmable hardware, and it's not feasible to reject command buffers triggering full GPU resets
21:53 DemiMarie: karolherbst: unless _no_ command buffer can trigger a GPU reset
21:53 DemiMarie: which is the case on the CPU
21:53 DemiMarie: the worst you can get is SIGSEGV/SIGABRT/etc
21:54 DemiMarie: is there a reason that this analogy does not hold?
21:54 karolherbst: well.. then you kill the process
21:54 karolherbst: and (some?) gpu drivers can just kill the context triggering those
21:54 karolherbst: it's just not perfect
21:54 DemiMarie: I guess my question is why it is not perfect.
21:54 karolherbst: ask the vendors
21:55 DemiMarie: Or is this a question that only the HW vendors can answer?
21:55 karolherbst: there are always hardware limitations on how you can recover from what type of faults.
21:55 karolherbst: GPUs are also way more complex than CPUs
21:56 mareko: DemiMarie: cost
21:56 mareko: if you are willing to pay 10x or 100x the price for a GPU, we can make it perfect
21:57 DemiMarie: karolherbst mareko: why are there these limitations? why is it so much harder on a GPU?
21:57 mareko: nobody wants to pay for such a feature
21:57 DemiMarie: Or is it just that GPUs are where CPUs were 20 years ago, because they are so much younger than CPUs are?
21:58 karolherbst: ehh no, GPUs are just more complex
21:58 DemiMarie: because of fixed function?
21:58 mareko: like seriously, we can make it perfect
21:58 karolherbst: well.. the graphic pipeline consists of many pieces
21:58 karolherbst: not just the shader
21:58 DemiMarie: mareko: why does it add such a huge overhead on a GPU but not on a CPU?
21:59 mareko: not huge overhead
21:59 mareko: just a little bit
21:59 DemiMarie: how much, ish?
22:00 mareko: the cost is in R&D, die size, and volume produced
22:00 karolherbst: doesn't really matter in the end, because as long as vendors don't care enough, we have to deal with what we got
22:00 DemiMarie: ugh
22:01 karolherbst: but they are getting better
22:01 mareko: R&D is pricey, the die size increases cost and power, and low volume increases the cost since you need to buy whole wafers and the first one is like 20 million or so
22:01 karolherbst: 10 years ago not all GPUs had MMUs :')
22:01 karolherbst: well
22:01 kisak: There's one place where it makes sense to go through all the extra costs to be completely fault tolerant, and that's if you're sticking it in mission critical hardware for space flight, but who needs a GPU to run a spacecraft or satellite?
22:01 karolherbst: or was it 5?
22:01 karolherbst: do we still have GPUs without MMUs produced today?
22:02 DemiMarie: BTW, software fault isolation is a thing, so one can have a secure system with no MMU
22:03 karolherbst: mhhh
22:03 karolherbst: well
22:03 karolherbst: in theory
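Software fault isolation, as DemiMarie mentions, replaces the MMU with code rewriting: every load and store is forced into the sandbox's region before it executes. The sketch below shows the classic address-masking variant. It assumes a power-of-two sandbox size, and `Sandbox` is a hypothetical name. Real SFI systems insert the mask at the machine-code level and also have to constrain indirect jumps, which this toy omits.

```python
class Sandbox:
    """Toy software-fault-isolation region: every access is masked into a
    power-of-two sized buffer, so no address can reach memory outside it."""

    def __init__(self, size_log2):
        self.size = 1 << size_log2
        self.mask = self.size - 1  # keeps only the low size_log2 address bits
        self.mem = bytearray(self.size)

    def store(self, addr, value):
        # An SFI rewriter would insert this AND before every store. A wild
        # address wraps around inside the sandbox instead of escaping it.
        self.mem[addr & self.mask] = value & 0xFF

    def load(self, addr):
        # Loads are masked the same way.
        return self.mem[addr & self.mask]
```

For example, with `Sandbox(8)` a store to `0x10005` lands at offset 5 of the 256-byte region. Masking does not detect the bug. It only guarantees that a faulty program corrupts its own region and nothing else, which is the isolation property under discussion.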
22:03 DemiMarie: but anyway, now I understand that this is not a technical problem but an economic problem
22:06 mareko: that's exactly right
22:06 Dark-Show: In general when an issue is fixed and a merge request is submitted, does the issue owner close the issue then or wait until the merge is in main? (Submitted my first issue and want to close it properly)
22:08 mareko: Dark-Show: if the fix is in main, you can close manually, but usually pushing the fix closes tickets automatically if the commit message references it
22:08 Dark-Show: Ok, I'll leave it alone then, thanks!
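mareko's point about automatic closing refers to GitLab's issue closing pattern. A commit message or merge request description containing, for example, `Closes #123` or `Fixes: <issue URL>` closes the issue once the change lands in the project's default branch. Mesa commits conventionally use a `Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/...` trailer. The regex below is a simplified approximation of GitLab's default keyword list, not GitLab's actual pattern. The real pattern is configurable per instance and accepts more forms.

```python
import re

# Simplified approximation of GitLab's default closing keywords
# (close/fix/resolve and their inflections), followed by either "#N"
# or a ".../-/issues/N" URL. Not the exact upstream pattern.
CLOSING_RE = re.compile(
    r"\b(?:[Cc]los(?:e[sd]?|ing)|[Ff]ix(?:e[sd]|ing)?|[Rr]esolv(?:e[sd]?|ing))"
    r":? +(?:#(\d+)|https?://\S+/-/issues/(\d+))"
)


def issues_closed_by(message):
    """Return the issue numbers a commit message would close (approximately)."""
    return [int(short or url) for short, url in CLOSING_RE.findall(message)]
```

A message ending in `Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/1234` yields `[1234]`, while a plain mention such as `Related to #5` closes nothing.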