00:09tlwoerner: gfxstrand: let me know what email address you'd like to use, i'll add you to the mentors list, google will send you an invite, you'll need to agree to their terms, then i can add you to the 2023 roster
00:30karolherbst: "Authentication error: expired or invalid token" this gsoc website :)
01:00mohamexiety: it's impressively bad, honestly
01:00mohamexiety: keeps logging you out randomly too
01:01karolherbst: maybe somebody should submit a gsoc project to google to "write a new gsoc webpage"
01:01mohamexiety: ahahahaha yes
07:52ishitatsuyuki: if I'm changing a CI image and want to test the updated pipeline in CI, do I need to do it in the default branch of my fork?
10:16pundir: robclark: hi.. reported a mesa/main crash on db845c running AOSP with upstream kernel https://gitlab.freedesktop.org/mesa/mesa/-/issues/8776. let me know if it sounds familiar.
10:17pundir: fwiw the crash is reported with a very very old version of popular benchmarking app, not sure if it is the application which is at fault here
10:49javierm: tzimmermann: sigh, I just pushed that typos fix patch and I see you have comments. It seemed trivial enough...
10:51javierm: tzimmermann: ah, I fixed one of your comments before applying indeed
11:00tzimmermann: javierm, happens. maybe ask him to type up an update
11:03javierm: tzimmermann: it's OK. As mentioned I fixed before applying the issue you pointed out and for the other I belive a separate patch would be better anyways
11:15tzimmermann: javierm, do you remember danvet's series about fixing the vfio-aperture problems. we talked about it not too long ago. i wanted to see if this can be merged somehow
11:17javierm: tzimmermann: I do, yes. There are some patches in that series that I believe you can just merge
11:19tzimmermann: javierm, do you remember the name?
11:20javierm: tzimmermann: https://patchwork.kernel.org/project/dri-devel/list/?series=711019&archive=both
11:20tzimmermann: javierm, that's the one. thank you
11:21javierm: tzimmermann: you are welcome. Patch #1 and #2 are straightforwared and you said that even tested it
11:21javierm: so I'll just apply those
11:21javierm: *I would
11:22tzimmermann: javierm, the gma500 patch does not work
11:23javierm: tzimmermann: right, I'm reading your comments now. I was misremembering
11:25javierm: hmm, what's the solution for patch #2? I guess the driver refactor is the only way to unblock that
11:25javierm: because the two options proposed seems like a step backward to me...
11:28tzimmermann: javierm, there's quite a bit of controversy in the discussion
11:30javierm: tzimmermann: yeah, is that really needed to have something like patch https://patchwork.kernel.org/project/dri-devel/patch/20230111154112.90575-11-daniel.vetter@ffwll.ch/ though?
11:30javierm: I understand is a nice cleanup but could we keep the open code in that driver while still achieving the same that patch #11 ?
11:30tzimmermann: javierm, we should merge patches 1, 6 and 7
11:31tzimmermann: and for the rest, i have to review them again
11:33javierm: tzimmermann: agreed
11:34javierm: tzimmermann: are you asking due https://www.spinics.net/lists/linux-fbdev/msg36462.html ?
11:34javierm: I had that email in my TODO list to answer
11:35tzimmermann: javierm, kind of. the bug comes up occationally and I wanted to fix it for a while.
11:43javierm: tzimmermann: yeah, me too. I've answered to Samuel now
11:45tzimmermann: javierm, let me see if i can get something to work. daniel's solution was that we only ever kick out aperture owners on the primary device. that seems fine
11:47javierm: tzimmermann: something that I don't understand / remember, for these dual GPU machines, the aperture range is shared between PCI devices?
11:47tzimmermann: javierm, IDK. how those that work?
11:47tzimmermann: 'does'
11:50tzimmermann: javierm, IIRC the nvidia.ko modules picks up any framebuffer has been configured by efifb. even if the memory is on a different card
11:51javierm: tzimmermann: yes, I think there are two differnt issues there and are somewhat related
11:51tzimmermann: and if vfio releases the apertures of 'nvidia-hw-2', it will remove the efifb of 'nvidia-hw-1'
11:52tzimmermann: nvidia.ko never tests if the console fb is actually within its PCI BARs
11:52tzimmermann: i don't have the code here, but that's how i remember
11:52javierm: tzimmermann: I don't think is nvidia-hw-{1,2} but integrated GPU (i.e: Intel iGPU) and discrete Nvidia GPU
11:52tzimmermann: is was
11:52javierm: tzimmermann: and you can connect different display outputs to different GPUs, i.e: DVI to intel and HDMI to Nvidia
11:53tzimmermann: same problem then: nvidia-hw will pick up the efifb that operates on intel-hw. vfio then removes efifb when it forwards nvidia-hw to the guest system
11:53javierm: and when people want to use vfio as you said, it will kick out efifb but I thought that only if matched the aperture range for the booting device
11:54tzimmermann: ok, could be
11:55javierm: tzimmermann: so if you pass-through the device that owns the efifb aperture, then is expected that it will go away right?
11:58javierm: tzimmermann: because you just aperture_detach_devices(base, size), it shouldn't conflict...
11:59tzimmermann: i'm reading through the bug report: sysfb_disable() removes the efifb device unconditionally
11:59tzimmermann: even for the secondary card
11:59tzimmermann: we only want to to that on the primary one
12:00danvet: tzimmermann, yeah sorry still trying to find some bw to revisit that series
12:01danvet: if you feel like there's a subset that you can land feel free to push them
12:01danvet: and yeah part of it is that I didn't come up with a good idea for gam500 yet :-/
12:03tzimmermann: danvet, no problem. i'll pick the easy patches and try to figure out something for the rest
12:03tzimmermann: javierm, why do we call https://elixir.bootlin.com/linux/latest/source/drivers/firmware/sysfb.c#L66 in the first place?
12:04tzimmermann: it should be enough to disable sysfb
12:04javierm: tzimmermann: it was to prevent a race because otherwise "simple-framebuffer" could be registered after the native DRM driver was probed
12:04tzimmermann: oh i remember! because there might not yet be a driver on the device
12:04javierm: tzimmermann: yeah...
12:05javierm: tzimmermann, danvet: primary is always 0 for !x86 right ?
12:05tzimmermann: i think we should do what daniel did and only call sysfb_disable() for the primary device
12:05danvet: javierm, it's always 0 for !pci afaiui
12:05javierm: tzimmermann: yeah, I was typing that but then it would regress that case for !pci
12:05tzimmermann: javierm, no there are architectures with detection code
12:06tzimmermann: sparc IIRC
12:06javierm: tzimmermann: ahh!
12:06javierm: does aarch64 detect that ?
12:06danvet: tzimmermann, for gma500 I think the least ugly idea I had is just two calls, one for the pci and one for the fb range or something like that
12:06tzimmermann: javierm, search through https://elixir.bootlin.com/linux/latest/A/ident/fb_is_primary_device
12:06danvet: but not sure that gets us the same behaviour
12:06tzimmermann: danvet, i didn't liek this
12:06javierm: tzimmermann: https://elixir.bootlin.com/linux/latest/source/arch/arm64/include/asm/fb.h#L18
12:06danvet: javierm, I think vgaarb also does some work to keep track of the primary
12:06danvet: tzimmermann, yeah but it's gma500
12:07danvet: seems better to hide the ugly there than inflict it on common code
12:07javierm: tzimmermann: so if we have have a SoC with two GPU / display controllers, then it may be a problem :)
12:07tzimmermann: i'd just push calls to fb_is_primary device() into aperture helpers
12:07danvet: tzimmermann, isn't that more or less dropping the cleanup?
12:07javierm: danvet: it is indeed kicking out from our plate :)
12:08tzimmermann: danvet, the cleanup has been controversial AFAICT
12:08danvet: tzimmermann, I guess you could expose a __ helper just for gma500 to be able to open code whatever it needs
12:08danvet: tzimmermann, well the reason I've done it is that a lot of drivers just get it wrong
12:09javierm: tzimmermann: now I remember that I proposed a patch that was if (primary) sysfb_disable(); but with a #ifdef CONFIG_X86 guard
12:09javierm: and danvet said that it was horrible :)
12:09tzimmermann: danvet, for gma500, we'd have to read the framebuffer range from HW regs. that's all
12:09tzimmermann: the _pci aperture helper doesn't work
12:09danvet: yeah hence the two calls or something
12:09tzimmermann: so we kick out the full range
12:10javierm: tzimmermann: I think the two calls proposed by danvet is the least bad option
12:11danvet: I guess I really need to respin this thing instead of slacking on other stuff now :-)
12:11tzimmermann: why are you so eager to chage gma500?
12:11tzimmermann: change
12:11danvet: it's the dead driver
12:11danvet: keeps the special case away from new drivers
12:11danvet: fewer changes for new drivers to be screwed up or inconsistent
12:12tzimmermann: just let it kick the full range. no one cares about gma500
12:12danvet: yeah, hence 2 calls
12:12tzimmermann: it's not worse than some SoCs
12:12danvet: one call to kick the full range
12:12danvet: one call to kick the sysfb primary
12:13danvet: which might be vgacon or some horror like that
12:13danvet: and vgacon is pretty much the reason why we need to track pci primary, since that's really another word for "legacy vga decoding device"
12:13javierm: tzimmermann, danvet: btw, why we have aperture_detach_devices(VGA_FB_PHYS_BASE, VGA_FB_PHYS_SIZE) in the non-pci aperture_remove_conflicting_devices() ?
12:14javierm: I guess there are some arches with VGA that are no x86 ?
12:14tzimmermann: what's wrong with making sysfb_disable() conditional in aperture_remove_conflicting_devices() ?
12:14tzimmermann: only call it for primary devices
12:15tzimmermann: same for vgacon
12:16tzimmermann: there's also code in sparc that potentially detects primary cards that are not PCI : https://elixir.bootlin.com/linux/latest/source/arch/sparc/include/asm/fb.h#L18
12:16tzimmermann: not sure if it's related
12:16javierm: tzimmermann: the problem is that for !x86 primary is really always 0
12:16javierm: for example aarch64 just return 0 for fb_is_primary_device()
12:16tzimmermann: javierm: https://elixir.bootlin.com/linux/latest/source/arch/sparc/include/asm/fb.h#L18
12:17javierm: so if someone has for example a rockchip DRM driver built-in and simpledrm as a module then it would be a problem
12:17tzimmermann: i'm no expert for sparc, but it looks like it could be relevant
12:18javierm: tzimmermann: yeah, sparc seems to detect it but aarch64 no as mentioned
12:18javierm: we can call fb_is_primary_device() and call it a platform bug :P
12:19tzimmermann: javierm, it's 0 for all the others IIRC
12:20tzimmermann: javierm, what we maybe want is something like is_not_primary_device(), which tells us to ignore sysfb_deisbale() and vgacon
12:20javierm: tzimmermann: https://elixir.bootlin.com/linux/latest/source/arch/x86/video/fbdev.c#L14
12:20tzimmermann: any be default, we kick out everything. for non-primary devices, we are more careful
12:20javierm: probably we should use fb_is_primary_device() in aperture_remove_conflicting_pci_devices()
12:21tzimmermann: javierm, that's what i've been advocating the whole time :)
12:22javierm: tzimmermann: ah ok :)
12:22tzimmermann: we want to do the full treatment by default. and if we detect a non-primary device, we ignore sysfb and vgacon. it's just that fb_is_primary_device() gets it wrong, because it returns 0 by default
12:23tzimmermann: it should return 1 by default
12:23tzimmermann: so that 'is-primary' is the default
12:24javierm: tzimmermann: indeed
12:24tzimmermann: my bios has options to disable the ROM shadowing. would the detection still work?
12:26tzimmermann: i'm looking at this line: https://elixir.bootlin.com/linux/latest/source/arch/x86/video/fbdev.c#L34
12:27javierm: yeah, I never fully understood what the pdev->resource[PCI_ROM_RESOURCE].flags & IORESOURCE_ROM_SHADOW meant
12:28javierm: and why that was a hint about being a primary device
12:28danvet: javierm, the bios copies the vbios of the primary device into system ram
12:28danvet: or something like that
12:28danvet: and then reroutes pci decoding
12:28danvet: so that the vbios runs much faster
12:29tzimmermann: this ^
12:29tzimmermann: but if it's disabled in the bios, would the detection code fail?
12:30javierm: I suppose
12:30tzimmermann: (do i really want to find out)
12:30danvet: I think a fallback is the legacy vga decoding in vgaarb.c?
12:30danvet: at least for drm drivers
12:30danvet: ah no it checks everything
12:31danvet: nah it's or'ed
12:31danvet: either it's the primary or it's shadowed the rom
12:31danvet: or am I blind?
12:31tzimmermann: danvet, i looked at vgaarb.c . it's only used by vgaswitcheroo
12:31danvet: nah there should be two things
12:32danvet: one is the pure vgaarb
12:32danvet: most drivers don't care, because they don't need to access any of the legacy vga registers
12:32danvet: but it's also exposed to userspace
12:32danvet: so that old ums Xorg drivers which do need legacy vga can do the right thing
12:32danvet: we never ended up porting those to kms
12:32tzimmermann: danvet, maybe https://elixir.bootlin.com/linux/latest/source/drivers/pci/vgaarb.c#L546 ?
12:33danvet: or well, the patches never landed, vague memories of a drm vga modeset helper I have
12:34javierm: danvet: maybe you are thinking about this? https://elixir.bootlin.com/linux/latest/source/drivers/pci/vgaarb.c#L606
12:34danvet: tzimmermann, hm grepping for vga_get shows some in-tree users: i915, qxl, virtio
12:35danvet: tzimmermann, javierm I'm lost on what you're trying to tell me
12:36tzimmermann: this appears to be yet another rabbit hole
12:36javierm: danvet: in response to your either it's the primary or it's shadowed the rom
12:36javierm: but just ignore me :)
12:39danvet: javierm, well with primary I meant "whatever vgaarb thinks is the boot device" and then there's the shadow fb check on top
12:39danvet: I have no idea whether that makes sense
12:39danvet: maybe the shadow fb check should be moved into vgaarb?
12:40danvet: but also this is probably an orthogonal can of worms
12:40javierm: danvet: right, that's what I understood and that's why I pointed to the comment of that vga_is_boot_device() function in vgaarb
12:42danvet: javierm, fw device is another thing than rom shadowing?
12:42danvet: s/?//
12:42danvet: i.e. I'm not seeing the relevance ...
12:42danvet: well it sometimes is the fw device, but it's kinda different things
12:42javierm: danvet: I see. Got it, sorry for the confusion then
12:43javierm: thought that were basically the same and we could replace that check for primary by a is_boot_device()
12:43danvet: like rom shadowing is just a way to make the pci option rom on the vga device run faster
12:43danvet: afaiui at least
12:43danvet: which generally then installs the necessary handlers to provide the fw screen
12:43danvet: which is then hopefully captured by the boot code correctly and passed on
12:44danvet: javierm, yeah I think in theory they should always match
12:44javierm: danvet: right, but if there are bios that disables that optimization as tzimmermann said, then it isn't a great heuristic
12:44danvet: maybe it's just different heritage
12:44danvet: javierm, yeah but we don't require both, just one or the other
12:44danvet: I guess we could try deleting the shadow rom check and see whether anyone complains
12:44javierm: danvet: agreed, hence my comment to use that is_boot_device() instead of the shadow rom check
12:45danvet: and if they do, maybe better to fix up vgaarb to get that situation right
12:45danvet: javierm, well atm it uses both
12:45danvet: combined with an OR
12:45javierm: danvet: ah! I finally understood your point. So then it should be safe to just drop that rom check
12:47javierm: tzimmermann, danvet: guess the outcome of this conversation would be to realize that there are too many can of worms and just let nvidia loss its VT if doesn't register an emulated fbdev
12:47javierm: and then will discuss again in a few months and have to remember all this :)
12:47tzimmermann: :D
12:47danvet: yeah I'm trying to at least respin the patch series and add a lot more comments
12:47danvet: for the record and all that
12:47javierm: danvet: that could be great
12:50tzimmermann: danvet, there are too many different devices througout the code IMHO: primary default, default device, boot device, firmware device. it's hard to understand what they stand for
12:50tzimmermann: javierm ^
12:54danvet: tzimmermann, I'm tempted to just leave fb_is_primary_device as-is
12:54danvet: since it's really just about what the default fbcon should be
12:54danvet: unless you're thinking of cleaning up the x86 implementation to just use the vga boot device?
12:54danvet: and drop the shadow rom check
12:55javierm: tzimmermann: agree that's confusing
12:56tzimmermann: danvet, i'm not going to touch this code
12:56tzimmermann: it's too confusing for now
12:58danvet: yeah same tbh
12:58danvet: like I frankly don't really care how fbcon picks the "right" driver :-)
13:03javierm: yeah, same. It's too confusing for me and for sure if we touch it, we will break someone setup
13:16danvet: I just stumbled over the hyperv pci stub driver
13:16danvet: is this like on windows where you need dummy drivers to avoid a warning in the device manager because there's no driver for a device?
14:16danvet: tzimmermann, since I just noticed loongson driver ready for merging?
14:18tzimmermann: danvet, i wanted to take another look. i told him about a better design, but nothing has moved AFAIK. hopefully this won't backfire. but in general, this should be mergable
14:24danvet: tzimmermann, javierm has the switch to simpledrm happened? asking since I wonder whether I really should fight geert on fbdev validation or just move on to only fixing them for drm and ignoring everything else
14:25javierm: danvet: for at least fedora and opensuse (AFAIK) it has happeed for a couple of releases now
14:25tzimmermann: danvet, at least opensuse has switched
14:26javierm: I believe arch did it too
14:26tzimmermann: ubuntu has support as well IIRC
14:26danvet: cool
15:16macromorgan: so I'm having an odd issue I was hoping someone could help direct me where best to troubleshoot. When I suspend my system I get a completely blank screen when I resume. If I use modetest to test a test image with a different size plane though the screen works again after I ctrl+c out of it.
15:17macromorgan: as best I can tell with a register dump the registers themselves on the hardware are identical before and after suspend. (this is for a Rockchip system with the VOP2 driver)
15:17macromorgan: and it affects both DSI and HDMI outputs after suspend
16:47danvet: tzimmermann, javierm https://paste.debian.net/hidden/2d8ff5a3/ too much snark?
16:47danvet: I have a few more patches which just fix more obvious stuff
16:47danvet: all in drm fb helper since I'm kinda fed up arguing whether things should be check in fbmem.c or not :-)
16:51javierm: danvet: a little bit :) but what you wrote is true though and the rationale for doing the change in that layer
16:51danvet: javierm, I mean probably that x/yoffset overflow should be in fbmem.c
16:52danvet: but geert just shot down silently handling y/x/res_virtual and so I'm not in the mood to try arguing
16:53tzimmermann: danvet, looks ok AFAICT
16:53danvet: I do wonder whether strictly speaking we need an atomic_check in there
16:53danvet: because that's really what our fb_pan_display does
16:54danvet: but that's maybe a bit much work ...
16:54tzimmermann: danvet, i think we do. but no one care so far
16:54javierm: danvet: yeah, read that thread. Looks good to me too
16:54danvet: tzimmermann, I mean in fb_check_var
16:54tzimmermann: yes
16:54danvet: nah we don't do any atomic_check in there at all
16:55danvet: I guess whichever driver is the one with funny pan restrictions gets to sort this out first :-)
16:55tzimmermann: it's in some way another interface for atomic_check. but noone has implemented this yet
16:55danvet: atm we also set info->fix.x/ypanstep = 1
16:55danvet: tzimmermann, yeah ... if we ever get to supporting real modesets through fbdev then it'd likely only be for atomic drivers
16:56danvet: since you can't check for legacy drivers properly in fb_check_var
16:57javierm: danvet: btw, jfalempe has made some progress with drmlog + a panic handler: https://github.com/kdj0c/linux/commits/drmlog
16:59vsyrjala: not sure there's much point in atomic_check in check_var since you'd do that in set_par anyway when you commit
16:59danvet: vsyrjala, yeah but fbmem.c calls check_var before set_par
17:00danvet: and it expects set_par to not fail
17:00vsyrjala: pretty sure set_par is allowed to fail
17:00danvet: plus ... *drumrolls* fbdev uapi actually has check_only support
17:00danvet: vsyrjala, it results in dmesg warnings
17:00vsyrjala: ah
17:00danvet: which piss off syzcaller
17:01vsyrjala: check_only == that activate_now stuff?
17:01danvet: which is why I'm in this mess to begin with, helge added more of them to encourage drivers to fix their check_var
17:01danvet: instead of just closing the gaps and moving on
17:01danvet: some people just like fbdev too much
17:01danvet: vsyrjala, well the other one, ACTIVATE_TEST
17:02danvet: except it still updates the sw state so it's kinda hilarious
17:02vsyrjala: sure. don't recall anyone ever using that. i certainly could have used it in the past but didn't realize it was a thing
17:03danvet: I just realized my patch nukes that option
17:03danvet: but also that code looks really fishy
17:03vsyrjala: i take it back. directfb did use it. seems my memory is going
17:04danvet: ah no sw state isn't updated
17:04vsyrjala: but it used it only for verifying the mode list, not for testing actual commits
17:05danvet: yeah in kernel
17:05danvet: but couldn't userspace do funny stuff with it?
17:05vsyrjala: i mean userspace == directfb
17:05danvet: oh
17:11alyssa: khronos: dynamic state all the things!
17:11alyssa: cwabbott: i'm going to prebake your state and you're going to like it!
17:11alyssa: interesting week in vk, yes
17:25dj-death: alyssa: can I bother you with the midguard change on https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21853
17:25dj-death: alyssa: just not using the right MAX_COMPONENT define for the backend instructions
17:42alyssa: i love midgard changes
17:43alyssa: r-b
17:43alyssa: THAT said
17:43alyssa: I am very much uncomfortable bumping up to vec32 in NIR
17:43karolherbst: wat
17:43alyssa: this has the potential to bloat NIR's memory usage a *lot*
17:44karolherbst: yeah. I spent already enough time to make it less bad
17:44karolherbst: but uhhh
17:44zmike: just close a browser tab smh
17:45karolherbst: who thought that hw vec32 is a good idea
17:45karolherbst: heck
17:45karolherbst: who thought that even hw vec8 is a good idea?
17:45alyssa: Arm
17:45karolherbst: NAK: pls tell your hw engineers to not do stupid things
17:46karolherbst: oh well
17:46karolherbst: saying that
17:46karolherbst: we'll need vec32 but for other reasons
17:46karolherbst: sorry not sorry
17:46alyssa: why
17:46karolherbst: I am not allowed to talk about it I think
17:46alyssa: nak this is a bad idea
17:46alyssa: https://xkcd.com/2217/
17:46karolherbst: but I think I could talk to you about it if nobody else listens
17:49vsyrjala: lalala
17:49alyssa: dj-death: but, um, nak on the nir change
17:50anholt: https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21853#note_1853576 -- I think NIR could be fine, but I'd like a bit more evidence for it
17:50robclark: alyssa: so, vec53?
17:50alyssa: noooo
17:50alyssa: robclark: ^^
17:51alyssa: anholt: My concern is memory usage, which has knock-on effects on compile time because of malloc etc
17:51alyssa: this adds 32 bytes to every fadd instruction
17:53dj-death: actually I think the LSC unit on DG2 can do up to vec64 in some case if I'm not mistaken
17:53anholt: for the swizzle. also, nir_constant is concerning and I worry we're going to want to have a dynamically allocated values[] :(
17:53dj-death: just for the lulz
17:54alyssa: dj-death: thing is, nobody's hardware is actually doing ALU of that size
17:54alyssa: and certainly not doing unconstrained swizzles at that size
17:55alyssa: actually, even Midgard which has native i8vec16 ALU (!) has *tons* of constraints on the swizzle
17:55karolherbst: yes
17:55karolherbst: if you increase the vec size, please also reduce memory consumption
17:55karolherbst: optimize against the vec4 case
17:56jenatali: Yeah, seems like there needs to be some kind of special return type for loads that return larger datasets
17:56alyssa: karolherbst: my point is that even vec4 hardware doesn't necessarily want this
17:56robclark: swizzle could be packed (assuming no one introduces vec256)..
17:56alyssa: robclark: it's already only 8-bits per component
17:57karolherbst: yeah, but for vec4 it doesn't matter much
17:57robclark: right, but it could be.. 5b?
17:57alyssa: 10b with the new vec16 then
17:57karolherbst: robclark: sooo.... uhm... some people think about that
17:57alyssa: er
17:57alyssa: vec32
17:57robclark: but maybe the better plan is just have union { something[4]; ptr *more_than_vec4_case; }
17:57alyssa: 20 bytes?
17:58alyssa: robclark: with my scalar ISA hat on, I don't want swizzles in NIR at all
17:58robclark: anholt: 5 bits will hold 0..31
17:58robclark: err, alyssa
17:58karolherbst: mood
17:58karolherbst: yes, let's just remove the swizzle and lower it
17:58karolherbst: let backends optimize it :P
17:58alyssa: I can't tell if you're joking
17:59karolherbst: maybe we can just assume I'm not and see how far we'll get with it?
17:59alyssa: I don't think it'd be catastrophic to make that change for my compilers, only concern for Midgard is that its copyprop can't see through phis which might regress moves but I'm not super worried
17:59karolherbst: yeah sooo..
17:59alyssa: bigger issue is that it *is* a significant change for vec4 compilers and we have a bunch of those that are minimally maintained
17:59karolherbst: my hw doesn't have swizzling at all, soo...
17:59alyssa: same here for everything but midgard
17:59alyssa: well
17:59alyssa: sort of
18:00alyssa: bifrost/valhall have a 2-bit swizzle for i16vec2
18:00karolherbst: uhhh
18:00karolherbst: pain
18:00alyssa: yes
18:00alyssa: that compiler is ingesting SSA directly so if I need to copyprop harder, whatever
18:00karolherbst: yeah
18:00karolherbst: whatever
18:00alyssa: Anyway, there is a part of me that would really like to have no swizzle on the nir_alu_src
18:00karolherbst: anyway
18:01alyssa: (implicitly an identity swizzle)
18:01alyssa: (or maybe an optional broadcast?)
18:01karolherbst: there needs to be some trade. You add more components, you also reduce memory consumption :D
18:01alyssa: and a nir_op_swizzle that does vec->vec with the swizzle passed specially
18:01alyssa: IDK if that'd be ALU or intrinsic or what
18:01karolherbst: I think when I bumped wiht all the opts it was like ~5%
18:01alyssa: I guess ALU with a constant source for the swizzle itself would be fine
18:01alyssa: although would need special handling in nir_print src to not suck
18:02alyssa: Oh, wait, duh
18:02alyssa: different swizzle ALU op per vector size, maybe
18:02karolherbst: we kinda traded -10% with +15%
18:02alyssa: swizzle{2,3,4,8,16} ops taking that many constant sources, acting like a vec+extract together
18:02karolherbst: anyway
18:02karolherbst: bumping the max vec size has a huge impact
18:03alyssa: on scalar hardware, swizzle ops would never be seen since the vectors get lowered
18:03alyssa: on vector hardware, swizzle ops translate to a vector move
18:03anholt: alyssa: texture ops (for example) return vectors, you'd still need swizzle ops to access their components.
18:03alyssa: i.e. "foo = swizzle4 bar, 3, 2, 1, 0" translates to "foo = mov bar.wzyx"
18:04karolherbst: why?
18:04jenatali: FWIW, in DXIL, which is a scalar SSA, when an instruction needs to return multiple values (e.g. a load of 4 values at once), it returns a special struct which has extract functions to retrieve the relevant scalars. Seems something like that (a return value that's multiple max-size vectors) with extract helpers would work
18:04karolherbst: there is nothing that says that you need swizzle ops for tex results
18:04karolherbst: that's a hw detail
18:04alyssa: anholt: right, you need a swizzle1 ("extract"), also just a move
18:04alyssa: the SSA-based RA compilers are already doing this in the backend
18:05karolherbst: yeah, that's just a nop
18:05karolherbst: on sane hw at least
18:05anholt: yeah. I was just confused by "swizzle ops would never be seen since the vectors get lowered"
18:05anholt: they wouldn't be lowered in NIR, the backend would definitely see them
18:05anholt: I'm not saying this is a terrible idea, just that it's invasive to every driver.
18:05alyssa: yeah, absolutely
18:06alyssa: this isn't really "we should do this" and more "if I were building NIR today knowing what I know now, I wouldn't have put that damn array in nir_alu_src, and now you want to make it BIGGER?"
18:06anholt: given that we haven't even eliminated nir_register from src/dsts yet, I'm skeptical of making this change
18:06alyssa: even lower hanging fruit, the abs/neg flags
18:06alyssa: none of the modern backends use them and we got so close to deleting them
18:07alyssa: but a pile of legacy hw is using them for stuff and yeah.
18:08karolherbst: I wouldn't be opposed to make evertyhing vec4+ very special in nir and reduce max components to 4 again
18:08karolherbst: or rather
18:08karolherbst: swizzles are 4 comps max or any other restrictions
18:08karolherbst: only thing you'd get above 4 is linear allocation of values
18:09karolherbst: or we just remove the swizzle and make it an alu op only thing
18:09karolherbst: and then backends deal with it themselves
18:09jenatali: That might work... CL stuff might get tricky though
18:10jenatali: I don't remember if they can do swizzles on arbitrary ops though or if they end up as dedicated mov instructions
18:10karolherbst: doesn't really matter
18:10alyssa: yeah, how does vec16 spirv look like?
18:10jenatali: The same as vec4
18:11karolherbst: if we remove swizzles from the IR and just do vec4 ssa_1 = nir_swizzle(ssa_0, 0x5) whatever, then backends can inline it in the consumer or something
18:11jenatali: Fair
18:11alyssa: karolherbst: that's my above "wishful thinking, would take months of rewriting everybody's backends to land"
18:11karolherbst: yeah...
18:11alyssa: and I agree with anholt, if we can't even remove abs/neg and nir_register, no way we can do that.
18:11karolherbst: make it a req to land vec32 🙃
18:12cwabbott: gfxstrand: there are a bunch of common changes in https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/22301
18:12karolherbst: but yeah.. without a proper analysis of the increase in peak memory consumption I don't think this should land at all
18:12cwabbott: hopefully it's interesting food for thought on how to pre-bake things on HW that has some state entanglement
18:16alyssa: aside: if someone could remove abs/neg from nir it would make me happy ;p
18:18karolherbst: also
18:18karolherbst: I shouldn't have made it simple to bump the max vec size 🙃
18:18alyssa: karolherbst: I forgive you
18:18karolherbst: thanks, it means a lot to me
18:20DemiMarie: karolherbst alyssa: is NIR a bunch of heap-allocated linked nodes or a compact array?
18:20alyssa: I don't really remember why I gave up on https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4825
18:20alyssa: Not wanting to rewrite a bunch of other drivers I guess
18:20dj-death: there is no other example of loading loads for data and accessing it in NIR?
18:21karolherbst: DemiMarie: kinda both
18:21dj-death: that wouldn't be exposed to ALUs
18:21alyssa: the "needs rewrite" list: ntt, etnaviv, a2xx, lima, r600/sfn
18:21alyssa: mostly (all?) vec4 hw, spicy.
18:21karolherbst: if codegen needs a rewrite to land some clean ups, please ping me hard
18:22alyssa: codegen is fine I think
18:22alyssa: just those 5 backends
18:22karolherbst: nice
18:22alyssa: I have none of the affected hardware :d
18:23alyssa: hopefully there's drm-shims.
18:27alyssa: oh and lima is two compilers, right
18:29alyssa: seemingly a2xx backend copyprop already handles fabs/fneg propagation
18:30alyssa: r600/sfn doesnt
18:30jenatali: Lima devices look out of disk space? https://gitlab.freedesktop.org/mesa/mesa/-/jobs/39329881
18:34jenatali: enunes: ^^ IIRC lima contact is you?
18:42enunes[m]: jenatali yes, I'm away for a few days though, David Heidelberg should have disabled it for now
18:47jenatali: Right, I missed that it was disabled, the MR I was looking at just hadn't been rebased, thanks!
19:31danvet: robclark, I've done a quick diff for the fence deadline modeset regression
19:31danvet: do you want to take over or should I like bake this into a patch if it turns out correct?
19:31danvet: well maybe two patches
19:33gfxstrand:reads backlog
19:33gfxstrand: alyssa said I might have opinions...
19:35karolherbst: yes, tldr: vec32
19:39gfxstrand: Ok, it's about what I thought. 32-wide uniform SEND messages.
19:42gfxstrand: On the one hand, vec32 makes total sense in that context. On the other hand, I really don't want to add another 16B to nir_alu_src.
19:42karolherbst: we need a more general solution for wide vectors I think
19:43karolherbst: or ehh
19:43karolherbst: a different solution
19:43gfxstrand: here's a crazy thought... We could make nir_alu_src::swizzle a pointer and just allocate all the memory for it at the end of the instruction but only allocate as much as needed.
19:43karolherbst: Maybe we just figure out what's the max component hw can swizzle and above that we just manage it differently?
19:43karolherbst: mhhh
19:44gfxstrand: Then the only things to get 32B of swizzle would be things that actually need it which is pretty much nothing.
19:44gfxstrand: It'd make copying ALU sources around a giant PITA, though. :-/
19:44karolherbst: wouldn't it make nir_alu_instr a giant pita generally?
19:45gfxstrand: Yeah, but the sources is where it'd really hurt
19:45gfxstrand: IDK that we copy them around all that much today but we do some
19:45karolherbst: what I mean is.. we can't make nir_alu_src variable in length without rewriting how sources of nir_alu_instr are accessed
19:46karolherbst: but yeah...
19:46gfxstrand: Did you read what I typed?
19:47gfxstrand: It would make it work for 95% of cases. We'd just have to fix up the few times that we copy whole nir_alu_src around which isn't that many.
19:47karolherbst: ohh, right...
19:48karolherbst: I wonder if we should just dtich the swizzle from the IR
19:48gfxstrand: That's tricky
19:48karolherbst: it's alerady a huge memory consumption and you almost never need it
19:49gfxstrand: You never need it for scalar back-ends
19:49gfxstrand: We really want to keep it for vec4
19:49karolherbst: right, but even then
19:49karolherbst: I do wonder if converting swizzles to an alu instruction backends could optimize into hw swizzles is actually the solution
19:49karolherbst: or if we'd come up with something better
19:50gfxstrand: IDK that I want to trust vec4 back-ends to be smart about copy-prop and coalescing.
19:50gfxstrand: Not that smart, anyway
19:50karolherbst: the idea was.. if the source is a nir_swizzle, just use that
19:50karolherbst: but yeah.. I see your point
19:51gfxstrand: But we could probably cut the swizzle down to uint8_t[4] if we wired through a nir_op_has_swizzle()
19:51gfxstrand: Or `nir_alu_instr_has_swizzle()`
19:51gfxstrand: or something
19:52karolherbst: mhh
19:52karolherbst: here is an idea
19:52karolherbst: we just allocate all possible combination of swizzles and then point to that...
19:52karolherbst: might actually need less memory
19:53karolherbst: mhh
19:53gfxstrand:does math
19:53karolherbst: well probably not with 32 components
19:53karolherbst: but we could e.g. do it for all 4 component ones
19:53gfxstrand: Yeah, no, 32^32 is way too many. :-P
19:53karolherbst: and if somebody needs more, it's dynamically allocated
19:54karolherbst: could even share it across all shaders
19:55gfxstrand: That's basically the same pain as my "make it part of nir_alu_instr at the end" plan
19:55karolherbst: though currently we use 128 bit per alu src
19:55gfxstrand: Just even more compact
19:55karolherbst: and a pointer would be 32/64
19:55karolherbst: yeah...
19:55gfxstrand: It's not a horrible idea, it just has the same problem where we'd need to do a tricky refactor
19:55gfxstrand: But refactors can be done
19:56karolherbst: might want to keep in mind that we might have to grow that sooner or later :(
20:14gfxstrand:is contemplating adding nir_tex_instr::backend_flags
20:14gfxstrand: To go with src_type_backend0 and src_type_backend1
20:14gfxstrand: backend1 and backend2, rather
20:33robclark: danvet: half your diff is patch I've already sent.. but nerfing it for modeset also sounds like a good idea (but won't be able to look until a bit later so feel free to send a patch)
20:35danvet: robclark, oh right yours actually fixing something
20:36danvet: robclark, you've pushed it somewhere already?
20:36danvet: I guess I can take care of the needs_modeset one if I get a tested-by or so
20:36robclark: no.. I think it has some t-b's.. maybe someone could push it to drm-misc for me?
20:37danvet: robclark, tzimmermann r-b'ed it already but I guess assumed you have commit rights
20:37danvet: robclark, I'll push it
20:38DemiMarie: Sorry if there is a bad place to ask, but I don’t know of a better one! Why can’t implicit derivatives be in non-uniform control flow?
20:39gfxstrand: Helper pixels.
20:39robclark: thx
20:40DemiMarie: gfxstrand: what do you mean?
20:40gfxstrand: The way implicit (or indeed explicit) derivatives work is that hte hardware is dispatched in blocks of 2x2 pixels, usually tightly packed within a wave. The dFdx/dFdy as well as texture ops with implicit derivatives look at the other pixels in that 2x2 to compute the derivative.
20:40danvet: robclark, kinda one reason I don't like topic pulls, it means I somehow also end up being responsible for applying the fixes :-P
20:41gfxstrand: If a triangle only covers some of the pixels in any give 2x2, all 4 are dispatched and the inputs are interpolated to exend past the edge of the triangles into those extra pixels.
20:41gfxstrand: These extra pixels are called "helper" pixels and their corresponding fragment shader invocations are called helper invocations.
20:41gfxstrand: Now, back to your original question...
20:42gfxstrand: If a dFdx, dFdy, or texture op with implicit derivatives is executed in non-uniform control flow, we have nothing guaranteeing that all 4 pixels within the 2x2 group actually reach that operation. If, say, only 3 of them do then the 4th will have garbage data and the derivative will be no good.
20:42gfxstrand: Uniform control-flow guarantees that all the data we need is there.
20:43robclark: danvet: if you want I can send PR for fixes.. but hopefully there is only one or two
20:43gfxstrand: Of course, uniform is way more strict than we actulaly need. Typically, what you really need is just 2x2-subspan-uniform but all these specs were written long before we had that specific language in the spec.
20:44gfxstrand: I made an attempt a few years ago to loosen things up a bit so that 2x2-uniform was all that was required but ran into problems. It's been a while so I don't remember which hardware struggled with it.
20:45danvet: robclark, then I'm still responsible for applying them if it's just a pr with a single patch
20:49DemiMarie: gfxstrand: Would an accurate summary be “if not all of the pixels compute the implicit derivative at the same time, the ones to arrive first will get garbage data from those that have not arrived yet, and the pixels that come later might see data that has been clobbered”?
20:50robclark: danvet: I guess you get to apply it in some form or another regardless ;-)
20:50jenatali: "Yet" isn't necessarily accurate, other pixels might not ever arrive if the non-uniform control flow is a branch
20:51gfxstrand: DemiMarie: Some may not get there ever
20:51danvet: robclark, well the big integration pulls are kinda unavoidable, but I don't mean those
20:51DemiMarie: gfxstrand jenatali: true!
20:53Lynne: any advice on debugging a piece of shader code that keeps crashing GPUs and llvmpipe (it's a black box even with debug info on)?
20:53Lynne: the exact same piece of code runs fine elsewhere
20:53Lynne: just not in the spot I'd like it to run in
20:54Lynne: gpu assisted validation is out, I use descriptor buffers + bda
20:54Lynne: (running any descriptor code with validation just hangs the GPU currently)
21:04zmike: is it a descriptor issue?
21:15Lynne: no, my descriptors are fine, I've tried both BDA + ssbo, both crash
21:16dj-death: Lynne: what driver ?
21:16Lynne: it's a parallel prefix sum shader I'm running, if I remove it, the code runs fine
21:17Lynne: RADV and lavapipe (this one just segfaults)
21:17zmike: does lavapipe segv inside llvm?
21:17Lynne: possibly, I'm seeing nothing in gdb, just pointers
21:17karolherbst: jenatali: if you got some time reviewing the clc patches in https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/22280
21:18karolherbst: uhm.. one patch even
21:18zmike: if it's ???????? pointers then that's inside llvm
21:20zmike: which could be either descriptor issues or other types of invalid access
21:21zmike: or stuff like divide by zero
21:21zmike: I assume the shader in question uses descriptors?
21:22Lynne: no, I just give the shader a device address via push memory
21:22zmike: ah
21:22Lynne: (though I've tried descriptors, didn't help)
21:22zmike: you could try instrumenting src/gallium/auxiliary/gallivm/lp_bld_nir_soa.c with lp_build_print_value calls to bisect the runtime execution
21:23zmike: though that semi depends on having some familiarity with nir
21:24zmike: I'm heading out for the day, but if you're still having issues with it in ~12h ping me with a branch and some vague instructions and I can try hunting it down
21:24Lynne: I speak native x86 and arm64 assembly, if that helps
21:25zmike: hm maybe close enough?
21:26Lynne: how do I turn on lp_bld_nir_soa?
21:26zmike: basically around the bottom of lp_bld_nir.c in the same dir you can stick a nir_print_shader(nir,stdout); call (https://gitlab.freedesktop.org/mesa/mesa/-/blob/main/src/gallium/auxiliary/gallivm/lp_bld_nir.c#L2773) which will print the shader ir
21:26zmike: and then using that you can gdb through the shader's build from that file
21:27zmike: lp_build_print_value(gallivm, "somename", LLVMValueRef) will print whatever value to stdout in all the active lanes
21:27zmike: best used with MESA_SHADER_CACHE_DISABLE=1 LP_NUM_THREADS=0
21:27zmike: this should let you determine where in the shader it's crashing pretty easily
21:29Lynne: thanks, I'll give it a go
21:29zmike: good luck
21:36Lynne: I'm printing, how do I gdb using that?
22:31anholt: dj-death: heads up, since you also have done perfetto stuff: I'm working on getting command buffer handles attached to command buffer events, and getting debug names associated with handles.
22:39zmike: Lynne: you don't gdb using it, you just keep adding prints until you've isolated the instr that's causing the crash (since you don't see any prints that you've added to it)
23:24Lynne: hmm, I was able to fix the crash in llvmpipe by grossly overallocating (expecting to read 128mib, had to allocate a gig to keep it happy)
23:25Lynne: output looks correct though, so I wonder why it's overreading that hugely
23:25Lynne: radv still crashes, even if allocating 8gib
23:26Lynne: the device address has 0x8xxxx... as the top bit, surely it's not taking that literally and accessing petabytes of memory
23:28zmike: could be stride issue
23:29zmike: you can print the loaded offsets from gallivm
23:50Lynne: debugPrintfEXT reveals my push data is junk
23:51Lynne: but then how are the addresses fine if they come after the stride (also in the push data)?