IRC Logs of #dri-devel on irc.freenode.net for 2023-04-17

07:36 hakzsam: FYI, I'm going to stress the vkcts-navi21-valve job this week because a regression was introduced last week and it now hangs randomly.
07:38 javierm: tzimmermann: hi, when you have some time could you please take a look at https://lists.freedesktop.org/archives/dri-devel/2023-April/399428.html ?
07:38 javierm: tzimmermann: I tried to fix with https://lists.freedesktop.org/archives/dri-devel/2023-April/399963.html but the reporter said that didn't fix it
07:39 javierm: tzimmermann: so then he proposed https://lists.freedesktop.org/archives/dri-devel/2023-April/400088.html and that works for him. But I don't know which info from the FW can be trusted and which can't
07:39 tzimmermann: javierm, hi. i'm just back from vacation. could take a bit
07:41 javierm: tzimmermann: yeah I figured. I'm summarizing the threads since there were many mails there
07:41 eric_engestrom: hakzsam: thanks! fyi I've hit this in 3 out of 3 runs of `vkcts-navi21-valve 3/3` (https://gitlab.freedesktop.org/mesa/mesa/-/jobs/40079004), while the other 2 worked on the first try; perhaps that can help narrow it down
07:42 hakzsam: yeah, we are working on it
07:50 tzimmermann: javierm, looking at it
07:52 javierm: tzimmermann: thanks, no rush
08:02 tzimmermann: javierm, thanks for looking at this bug report. i think we should try that fixed version you posted
08:07 tzimmermann: err, it was posted by pierre
08:10 javierm: tzimmermann: the max(max(max... ?
08:10 javierm: that's horrible IMO :)
08:11 tzimmermann: it is :) but it fixes the problem and is clear once you get it
08:11 tzimmermann: let me type a reply
08:11 javierm: tzimmermann: but if you think that's the correct version, I'm not going to argue, since you looked at this issue more than me
08:16 tzimmermann: javierm, it's been a while
08:19 javierm: tzimmermann: I think the question is what can be trusted and what can't
08:19 tzimmermann: the proposed fix is closer to the original code. IIRC i discarded lfb_depth because it didn't work with cases of 15-bit rgb. these cases used bpp=16 and one of the tests failed. adding it back in this max3() statement should still work
08:20 javierm: tzimmermann: yeah, you explained why lfb_depth can't be trusted, and it seems there are BIOSes that don't report some channels (e.g. for xRGB, the filler bits)
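The max3() approach discussed above (take the highest bit covered by any color channel, by the reserved/filler bits, or by the reported lfb_depth) can be sketched in plain C. The struct is a hypothetical, simplified stand-in for the relevant firmware-reported fields, and `compute_bpp()` is an illustration of the idea under that assumption, not the patch that was actually posted or merged.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical, simplified subset of the firmware-reported mode fields. */
struct fw_mode {
	uint16_t lfb_depth;           /* depth as reported by the firmware */
	uint8_t red_size, red_pos;
	uint8_t green_size, green_pos;
	uint8_t blue_size, blue_pos;
	uint8_t rsvd_size, rsvd_pos;  /* filler bits, e.g. the X in XRGB */
};

static unsigned int max_u(unsigned int a, unsigned int b)
{
	return a > b ? a : b;
}

/*
 * Bits per pixel = the highest bit used by any color channel or by the
 * reserved bits, or the reported depth, whichever is largest. This copes
 * with firmware reporting depth 15 for XRGB1555 (filler bit in rsvd_*)
 * and with firmware leaving rsvd_* at zero but reporting depth 32.
 */
static unsigned int compute_bpp(const struct fw_mode *m)
{
	unsigned int bpp;

	assert(m);
	bpp = max_u(max_u(m->red_pos + m->red_size,
			  m->green_pos + m->green_size),
		    m->blue_pos + m->blue_size);
	bpp = max_u(bpp, m->rsvd_pos + m->rsvd_size);
	return max_u(bpp, m->lfb_depth);
}
```

For XRGB1555 with lfb_depth = 15 and one filler bit at position 15, the channels give 15, the filler bit gives 16, so the result is 16. For XRGB8888 where the filler bits are not reported, the channels give 24 and lfb_depth = 32 wins.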
08:20 javierm: but Pierre's patch still assumes that either of those can be trusted
08:21 javierm: while my patch assumes that lfb_linelength and lfb_width can be trusted (we are relying on those anyways)
08:22 javierm: tzimmermann: if we go with Pierre's patch, then I think that we should also apply my v1 (recalculate lfb_linelength if BPP is calculated)
08:22 javierm: in other words, we should either trust lfb_linelength or recalculate it
08:22 javierm: if we trust it, then it can be used to calculate the BPP; if we don't, we shouldn't use it as provided
08:24 tzimmermann: javierm, IDK which information is trustworthy. i'll try to send pierre's patch through a number of systems
08:24 tzimmermann: javierm, i think your patch failed at the test at : https://elixir.bootlin.com/linux/v6.3-rc1/source/drivers/firmware/sysfb_simplefb.c#L73
08:24 javierm: tzimmermann: can you also do it with my patch? Because otherwise it doesn't make sense to me
08:24 javierm: with my v1 I mean
08:25 tzimmermann: which one is v1?
08:25 javierm: tzimmermann: https://lists.freedesktop.org/archives/dri-devel/2023-April/399963.html
08:26 tzimmermann: javierm, yeah. i assume that your patch still fails at https://elixir.bootlin.com/linux/v6.3-rc1/source/drivers/firmware/sysfb_simplefb.c#L73 .
08:26 javierm: the commit message isn't accurate because that wasn't the cause after all, but we are calculating BPP and then blindly using lfb_linelength to create the I/O resource
08:26 javierm: tzimmermann: yeah, patch v2 worked for Pierre though
08:27 javierm: because BPP is calculated from lfb_linelength and lfb_width
08:27 tzimmermann: and then it still picks an incorrect format for the sysfb framebuffer. hence, there's a sysfb but with the wrong color format
08:28 javierm: tzimmermann: correct, but at least the stride matches the (wrong) color format's BPP
08:28 javierm: that's why I think we need both, to make sure the calculated BPP and the reported stride match
08:29 javierm: tzimmermann: let me answer in the thread
08:29 tzimmermann: javierm, i cannot follow 100%, but the stride has no effect on the internal pixel format: if you need xrgb8888, but select rgb888, each pixel will still look wrong. stride only affects the overall line length
08:30 javierm: tzimmermann: I know, but if you pick a wrong color format (e.g. rgb8888) then the line length will be bigger than format * resolution
08:31 javierm: it should match regardless of whether it was picked correctly or not (ideally it's correct, of course, but the selected pixel format and stride should match)
08:40 MrCooper: karolherbst: piglit/bin/cl-api-enqueue-fill-image fails an assertion in radeonsi because depth == 0
08:59 javierm: tzimmermann: answered in the thread. Maybe I'm missing something silly, but I don't understand how we can't trust lfb_depth and then happily use lfb_linelength rather than a stride computed from the calculated BPP
09:02 tzimmermann: javierm, stride is an arbitrary value. it is only vaguely connected to the bpp.
09:07 javierm: tzimmermann: it is used to calculate the size of the I/O resource memory though: https://elixir.bootlin.com/linux/latest/source/drivers/firmware/sysfb_simplefb.c#L93
09:08 tzimmermann: because we trust it ;)
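The two uses of the stride being argued about can be written as small C helpers: the I/O resource size derived from the trusted lfb_linelength, and a sanity check that a visible line at the selected format's bpp fits inside that stride. Both function names are hypothetical; this is a sketch of the arithmetic, not the sysfb code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Framebuffer memory size, derived only from the reported stride. */
static uint64_t fb_resource_size(uint32_t linelength, uint32_t height)
{
	return (uint64_t)linelength * height;
}

/*
 * True if one visible line in the selected format fits in the stride.
 * The stride may be larger than width * bytes-per-pixel (padding at the
 * end of each line), but never smaller.
 */
static bool stride_fits_format(uint32_t linelength, uint32_t width,
			       unsigned int bpp)
{
	uint64_t min_pitch = ((uint64_t)width * bpp + 7) / 8;

	assert(bpp > 0);
	return linelength >= min_pitch;
}
```

This check only rejects a bpp that is too large for the stride. As tzimmermann points out, choosing too small a format (RGB888 instead of XRGB8888) still passes the check and still produces wrong colors, because the stride says nothing about the per-pixel layout.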
09:09 javierm: tzimmermann: ah, I see. It's not that we don't trust lfb_depth, it's just that it's wrong to assume that it's the BPP?
09:10 javierm: so it's only a problem of format selection
09:12 javierm: IOW, lfb_depth is always == (lfb_linelength * 8 / lfb_width), but the error is assuming that lfb_depth == bpp, and that's why you used the color bits to calculate it?
09:13 tzimmermann: javierm, i think format selection is the problem here. lfb_depth can be 15, but bits_per_pixel is 16. consequently the test failed at https://elixir.bootlin.com/linux/v6.2/source/drivers/firmware/sysfb_simplefb.c#L40 that's why i replaced the direct use of lfb_depth
09:17 javierm: tzimmermann: perfect, then if you trust lfb_linelength and lfb_width, you can just do: bpp = (lfb_linelength * 8 / lfb_width)
09:17 javierm: I don't see why the calculation has to be so complicated with 3 max and using the color bits
09:18 javierm: tzimmermann: since the SIMPLEFB_FORMATS array has the bpp that can be used to match the format
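Format selection by table lookup, as discussed above, can be sketched as follows. The table is hypothetical and only loosely modeled on SIMPLEFB_FORMATS (bits per pixel plus per-channel offset and length); it shows why feeding lfb_depth = 15 in as the bpp finds no match for a 16-bit XRGB1555 mode, while bpp = 16 does.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical format table entry, loosely modeled on SIMPLEFB_FORMATS. */
struct fb_format {
	const char *name;
	unsigned int bpp;
	unsigned int red_off, red_len;
	unsigned int green_off, green_len;
	unsigned int blue_off, blue_len;
};

static const struct fb_format formats[] = {
	{ "r5g6b5",   16, 11, 5, 5, 6, 0, 5 },
	{ "x1r5g5b5", 16, 10, 5, 5, 5, 0, 5 },
	{ "r8g8b8",   24, 16, 8, 8, 8, 0, 8 },
	{ "x8r8g8b8", 32, 16, 8, 8, 8, 0, 8 },
};

/* Return the first entry whose bpp and color channels all match, or NULL. */
static const struct fb_format *match_format(const struct fb_format *want)
{
	assert(want);
	for (size_t i = 0; i < sizeof(formats) / sizeof(formats[0]); i++) {
		const struct fb_format *f = &formats[i];

		if (f->bpp == want->bpp &&
		    f->red_off == want->red_off && f->red_len == want->red_len &&
		    f->green_off == want->green_off && f->green_len == want->green_len &&
		    f->blue_off == want->blue_off && f->blue_len == want->blue_len)
			return f;
	}
	return NULL;
}
```

Note that r8g8b8 and x8r8g8b8 have identical color channels in this table and differ only in bpp, so a wrong bpp silently selects the wrong one of the two.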
09:21 javierm: we are trusting those anyway to calculate the I/O resource mem size, as mentioned, so I don't see why they can't be trusted for the BPP too. That was my rationale in v2
09:21 javierm: tzimmermann: but up to you, I'm OK with Pierre's patch too
09:24 tzimmermann: javierm, in an email reply, i've given the example of allocating 800x600, which gives a stride of 832 pixels (at 4 bytes per pixel): 832 × 4 × 8 ÷ 800 = 33.28
09:26 tzimmermann: that's 33 bits_per_pixel. that's what I meant with 'bpp and linelength are only vaguely connected'.
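tzimmermann's counterexample, written out as the integer arithmetic the proposed `bpp = (lfb_linelength * 8 / lfb_width)` formula would perform. The helper name is hypothetical. Without padding the formula recovers 32; with the stride padded to 832 pixels (3328 bytes) it truncates 33.28 to 33, which matches no format.

```c
#include <assert.h>
#include <stdint.h>

/*
 * The formula proposed above: derive bits per pixel from the stride.
 * Only exact when lines carry no padding (linelength == width * Bpp).
 */
static unsigned int bpp_from_stride(uint32_t linelength, uint32_t width)
{
	assert(width > 0);
	return (unsigned int)(((uint64_t)linelength * 8) / width);
}
```

With `width = 800`, `bpp_from_stride(800 * 4, 800)` gives 32, while `bpp_from_stride(832 * 4, 800)` gives 33.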
09:26 tzimmermann: sorry for all the mess and confusion here
09:30 javierm: tzimmermann: got it. On the contrary, thanks for the clarifications and sorry for my confusion
11:12 karolherbst: MrCooper: ohh interesting, maybe I should run piglit on radeonsi then
11:13 karolherbst: but normally that shouldn't happen :)
11:13 karolherbst: I'll check what's wrong there
12:55 MrCooper: karolherbst: to be clear, that was without your !22506
12:56 karolherbst: right, but that one doesn't fix anything really
12:56 karolherbst: just adds more validation
12:57 karolherbst: the CL spec already requires the unused dimension to be set correctly, so I'm mostly curious what happens with piglit here
12:57 karolherbst: could be also a piglit bug
12:58 MrCooper: interesting
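karolherbst notes that the CL spec already requires the unused dimension to be set correctly (for a 2D image, origin[2] must be 0 and region[2] must be 1), and later in this log the failing piglit case turns out to be a negative test that the API validation in !22506 handles. A minimal sketch of such front-end validation for a 2D fill, assuming a hypothetical image struct and local constants with the same values as CL_SUCCESS and CL_INVALID_VALUE; this is not rusticl's code.

```c
#include <assert.h>
#include <stddef.h>

/* Local stand-ins with the same values as CL_SUCCESS / CL_INVALID_VALUE. */
#define SKETCH_CL_SUCCESS        0
#define SKETCH_CL_INVALID_VALUE  (-30)

/* Hypothetical minimal description of a 2D image. */
struct image2d {
	size_t width, height;
};

/*
 * Reject bad origin/region before the driver ever sees them, so a
 * depth of 0 never reaches the driver. This sketch also rejects empty
 * regions and out-of-bounds rectangles (written to avoid overflow).
 */
static int validate_fill_2d(const struct image2d *img,
			    const size_t origin[3], const size_t region[3])
{
	assert(img && origin && region);

	if (origin[2] != 0 || region[2] != 1)
		return SKETCH_CL_INVALID_VALUE;
	if (region[0] == 0 || region[1] == 0)
		return SKETCH_CL_INVALID_VALUE;
	if (origin[0] > img->width || region[0] > img->width - origin[0] ||
	    origin[1] > img->height || region[1] > img->height - origin[1])
		return SKETCH_CL_INVALID_VALUE;
	return SKETCH_CL_SUCCESS;
}
```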
13:01 tzimmermann: javierm, can you review some fbdev patchsets?
13:01 karolherbst: could also be that the crash is when piglit checks for the error code to be returned :)
13:02 javierm: tzimmermann: sure, I didn't because I thought they were all reviewed by the drivers' maintainers
13:03 karolherbst: either way, that MR doesn't fix the problems I'm seeing with darktable. Interestingly enough, I also see them on llvmpipe and older intel gens.. so it might be some weird general problem. Maybe something annoying like clamping behavior
13:03 javierm: tzimmermann: I see that Marek only provided his tested-by but not ack/review for "[PATCH 0/5] drm/exynos: Convert fbdev to DRM client"
13:03 tzimmermann: javierm, yeah, those client conversions. i did not get much response for armada: https://patchwork.freedesktop.org/series/115848/
13:04 tzimmermann: javierm, exynos is on its way into drm-next
13:04 tzimmermann: the maintainer sent the PR today
13:04 javierm: tzimmermann: ah, Ok. I'll review armada then after a meeting
13:05 tzimmermann: thanks a lot
13:12 karolherbst: uhm.. I think I just realized I pushed to the wrong drm-misc branch. If I want to get in a fix for something that's already in 6.3-rc1, it should go through drm-misc-next-fixes, no?
13:12 karolherbst: or drm-misc-fixes?
13:13 karolherbst: or is next-fixes only for before rc1 is tagged?
14:04 eric_engestrom: am I missing something, or is there zero CI coverage of the WSI right now?
14:04 eric_engestrom: from what I can tell, not a single vulkan driver sets HWCI_START_WESTON or HWCI_START_XORG in its test jobs; is there another way to start them that I'm missing?
14:07 javierm: tzimmermann: .fb_destroy was the callback that was executed only when the last client closed the fbdev fb right ?
14:07 javierm: even after module removal / unregister
14:09 tzimmermann: javierm, yes, it's the cleanup after the fbdev device has been removed. it's called from framebuffer_release() IIRC
14:11 javierm: tzimmermann: yeah, just checked it. Just wanted to be sure that I remembered it correctly
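The lifetime rule discussed above (the destroy callback runs only when the last reference is dropped, which can be after the device has been unregistered) is the usual reference-counting pattern. The sketch below illustrates that pattern with hypothetical names; it is not the fbdev implementation, and where exactly the kernel invokes .fb_destroy should be checked in the fbdev core.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical object with a destroy hook, in the spirit of .fb_destroy. */
struct fb_obj {
	int refcount;
	void (*destroy)(struct fb_obj *obj);
};

static int destroyed_count; /* illustration only: counts destroy calls */

static void fb_obj_destroy(struct fb_obj *obj)
{
	destroyed_count++;
	free(obj);
}

static struct fb_obj *fb_obj_create(void)
{
	struct fb_obj *obj = malloc(sizeof(*obj));

	if (!obj)
		return NULL;
	obj->refcount = 1;            /* reference held by the registration */
	obj->destroy = fb_obj_destroy;
	return obj;
}

/* A client opening the device takes a reference. */
static void fb_obj_get(struct fb_obj *obj)
{
	obj->refcount++;
}

/* Dropping the last reference runs the destroy callback. */
static void fb_obj_put(struct fb_obj *obj)
{
	assert(obj->refcount > 0);
	if (--obj->refcount == 0)
		obj->destroy(obj);
}

/* Unregistering only drops the registration's reference. */
static void fb_obj_unregister(struct fb_obj *obj)
{
	fb_obj_put(obj);
}
```

If a client still has the device open when the driver unregisters it, the object survives until that client closes it, and only then does the destroy callback run.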
14:12 javierm: tzimmermann: r-b patches #3 and #4, didn't for #1 and #2 because I see that Sui already did for those
14:13 tzimmermann: great, thank you so much. I'll leave the patchset around for a bit. maybe the maintainers still want to comment
14:13 javierm: tzimmermann: you are welcome
14:13 javierm: tzimmermann: any other patchset that is missing an r-b, or is that the last one from your series?
14:18 tzimmermann: javierm, so let me introduce you to that 100+ patches series that i've been working on for a while....
14:19 tzimmermann: just kidding
14:20 tzimmermann: there's still a similar series for i915. jani didn't have the time to review it. but i got errors from the CI. i'd first have to look through them and see if they are related
14:24 javierm: tzimmermann: Ok
15:16 jenatali: David Heidelberg: https://github.com/microsoft/DirectX-Headers/pull/93
15:24 DavidHeidelberg[m]: jenatali: nice, thank you very much ;)
15:25 DavidHeidelberg[m]: I'll try it today
15:31 jenatali: Not sure if there's a better way to detect the use of the MinGW headers which is the real problem, 'cause there's also a Clang included in e.g. MSYS2
15:31 jenatali: There's other workarounds like changing the include order that I guess we could do instead
16:12 karolherbst: I wonder, are fixes like this valid enough to push them through fixes? https://patchwork.freedesktop.org/series/116536/ I don't think any of those actually fix bugs, they just mostly clean up bad style code or resolve undefined behavior
16:15 mripard: karolherbst: yeah, if it doesn't actually fix anything there's no real reason to make them go through fixes
16:17 mripard: take extra care with Markus Elfring patches though, he has (or used to have) very bad reputation
16:17 karolherbst: yeah
16:17 karolherbst: the patches are all fine tho
16:18 karolherbst: just dealing with &NULL->some_field things
16:18 karolherbst: mostly
16:18 karolherbst: not sure what the reason for the bad reputation is, but if it's because of stuff like that, I wouldn't mind as much. Though I'd still drop the Fixes tags unless people don't care either way
16:19 karolherbst: or rather, those patches just move such dereferences behind null pointer checks
16:19 karolherbst: and the other part is seq_printf -> seq_puts/putc
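The "&NULL->some_field" cleanups described above move a member access behind the NULL check. Forming `&ptr->field` when `ptr` is NULL is undefined behavior in C, and a compiler may use it to assume `ptr` is non-NULL and drop a later check. A small sketch with hypothetical struct and function names:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical driver structures for illustration. */
struct dev_priv {
	int irq;
};

struct device_ctx {
	struct dev_priv *priv;
};

/*
 * Problematic pattern (not compiled here):
 *
 *     struct dev_priv **pp = &ctx->priv;   // evaluated even if ctx == NULL
 *     if (!ctx)
 *             return -1;
 *
 * Fixed pattern: check first, then access members.
 */
static int get_irq(const struct device_ctx *ctx)
{
	const struct dev_priv *priv;

	if (!ctx || !ctx->priv)
		return -1;
	priv = ctx->priv;
	return priv->irq;
}
```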
16:26 MrCooper: karolherbst: the bad reputation is because his patches tend to be trivial and some of them incorrect
16:28 karolherbst: not sure if that alone justifies bad rep
16:36 MrCooper: indeed, he also tends to turn review feedback into pointless arguments
16:38 karolherbst: ah yeah, that's more of an issue 🙃
16:57 mdnavare: daniels: Were you able to update the ssh keys for me so I can set up my dim?
17:02 mdnavare: daniels: My fd.o username is mdnavare, how do I change that to match my login name from my Google linux machine?
17:03 mdnavare: and I will upload the ssh keys in the plain text format to the ticket
17:05 daniels: mdnavare: https://gitlab.freedesktop.org/freedesktop/freedesktop/-/issues/603#note_1871233
17:05 daniels: thanks
17:05 daniels: you can put 'User mdnavare' in your ~/.ssh/config
17:05 daniels: you don't need to have the same username
17:06 mdnavare: Okay so I can update the ~/.ssh/config to add mdnavare username and then regenerate the ssh keys?
17:07 daniels: yes
17:11 mdnavare: I didn't save my .ssh/config from my old laptop, could you share the example config for reference so I can update mine?
17:16 mdnavare: what should be the format to add fd.o host and hostname in ssh/config: Host example
17:16 mdnavare: HostName example.net
17:16 mdnavare: User buck
17:18 daniels: Host *.freedesktop.org
17:18 daniels: User mdnavare
17:18 daniels: it's also documented through man ssh_config
17:21 mdnavare: okay added, attaching the ssh keys to the ticket now
17:21 mdnavare: daniels: Also with this I should still be able to retain the commit rights right?
17:22 mdnavare: I need to merge something to drm-misc
17:23 daniels: yes, your account is still there from Intel and it still has the same permissions
17:24 mdnavare: Okay great thanks a lot daniels , I have just uploaded the new keys directly so should be in plain txt
17:25 daniels: thanks, I've added your new key now but it'll take ~10min until it's active
17:26 mdnavare: Okay great, i will try my dim setup in a few mins
17:39 DavidHeidelberg[m]: jenatali: tested; works fine. Thank you again
17:39 jenatali: 👍
17:47 mdnavare: daniels: Still failing, probably need to wait a bit longer
17:47 mdnavare: ./maintainer-tools/dim setup
17:47 mdnavare: Setting up drm-rerere ...
17:47 mdnavare: mdnavare@git.freedesktop.org: Permission denied (publickey).
17:47 mdnavare: fatal: Could not read from remote repository.
17:47 mdnavare: Please make sure you have the correct access rights
17:47 mdnavare: and the repository exists.
17:47 mdnavare: dim: Failed to fetch drm-tip
17:50 jenatali: David Heidelberg: Any known issues with one of the lima machines? https://gitlab.freedesktop.org/mesa/mesa/-/jobs/40121453
17:50 jenatali: Seems consistently failing but unrelated to payload
17:50 DavidHeidelberg[m]: jenatali: https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/22531
17:50 DavidHeidelberg[m]: waiting :/
17:50 jenatali: Aha, thanks
17:51 DavidHeidelberg[m]: jenatali: sadly, the kernel bump from 6.1 to 6.3 worked great for everything, except the disabled lima EGL tests :D
17:51 jenatali: Yeah no worries, just wanted to make sure it was known
17:51 DavidHeidelberg[m]: so when lima got brought up, it passed, but started flaking. enunes will check it when he gets back
18:06 daniels: mdnavare: please attach the output somewhere of ssh -vvv git.freedesktop.org
18:09 mdnavare: https://www.irccloud.com/pastebin/82t99x0i/Output_ssh_fdo
18:10 mdnavare: daniels: here's the output
18:14 daniels: mdnavare: your ssh client isn't trying to send that ssh key to us
18:14 daniels: mdnavare: I guess you saved it somewhere that SSH doesn't know how to find it
18:15 mdnavare: Hmm I ran ssh-keygen -t rsa and then gave a filename to save, should have kept that as default?
18:16 mdnavare: It has saved the private key and public key in HOME for me
18:16 daniels: if you've saved it to a different path, then you need to tell ssh where you saved it - it doesn't find it automatically
18:16 mdnavare: is it trying to find in .ssh/id_dsa ?
18:17 DavidHeidelberg[m]: mdnavare: use -i path/file
18:17 DavidHeidelberg[m]: dsa? these days we usually use rsa or ed25519
18:17 daniels: mdnavare: look at lines 265 onwards of the output you pasted; it tells you where it searches by default
18:18 daniels: IdentityFile (documented in man ssh_config) alters the search path
18:19 mdnavare: let me regenerate and save it in default location
18:19 mdnavare: ?
18:20 karolherbst: MrCooper: yeah.. so it was a negative test so the API validation part of that MR fixes it
18:20 daniels: mdnavare: you can just move it to the default location, no need to regenerate it
18:20 daniels: it's a file like any other
18:22 mdnavare: just move the private key to .ssh/ and save as id_rsa right?
18:22 daniels: yep, that should work
18:24 mdnavare: cool, yes done! dim setup doesn't error out now so it's probably fetching the repos 🤞
18:28 mdnavare: Thanks a lot daniels its fetching everything now
18:33 daniels: np
22:38 kchibisov: What is :cs0 thread created by mesa responsible for?
22:50 Kayden: that's the radeonsi command submission thread created by util_queue in src/gallium/winsys/amdgpu
22:50 Kayden: radeonsi enqueues jobs, and that thread takes those and actually submits them to the kernel
22:51 Kayden: (assuming you're on radeonsi; maybe other drivers have "cs" threads too that I'm not aware of)
22:51 kchibisov: Yeah, I only have amdgpu hardware.
22:52 Kayden: at one point it was created to hide the latency of command submission in the kernel, but at this point threaded submission is assumed to exist for some fencing stuff, I think.
22:53 kchibisov: So it's basically a transmission channel of everything my application is doing to the kernel.
22:54 Kayden: not quite everything - radeonsi constructs batches of GPU commands (set state, draw, and so on) and asks the kernel to execute those batches. those batch submissions are what go through the :cs0 thread, IIRC.
22:55 Kayden: other things, like "allocate me a buffer" can just go directly to the kernel without going through that queue
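Kayden's description (the driver enqueues command batches, and a dedicated thread created with util_queue takes them and submits them to the kernel, hiding submission latency) is a producer/consumer queue. The sketch below shows that pattern with plain pthreads; it is not Mesa's util_queue API, all names are hypothetical, and "submitting" a batch here just records its id where a real driver would call into the kernel.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>

#define QUEUE_CAP 16
#define SUBMIT_LOG_CAP 64

struct submit_queue {
	int jobs[QUEUE_CAP];            /* hypothetical batch ids */
	int head, count;
	bool done;
	pthread_mutex_t lock;
	pthread_cond_t not_empty, not_full;
	pthread_t thread;
	int submitted[SUBMIT_LOG_CAP];  /* stand-in for the kernel */
	int num_submitted;
};

/* The submission thread (the role :cs0 plays): pop batches and submit. */
static void *submit_thread(void *arg)
{
	struct submit_queue *q = arg;

	for (;;) {
		int job;

		pthread_mutex_lock(&q->lock);
		while (q->count == 0 && !q->done)
			pthread_cond_wait(&q->not_empty, &q->lock);
		if (q->count == 0 && q->done) {
			pthread_mutex_unlock(&q->lock);
			return NULL;
		}
		job = q->jobs[q->head];
		q->head = (q->head + 1) % QUEUE_CAP;
		q->count--;
		pthread_cond_signal(&q->not_full);
		pthread_mutex_unlock(&q->lock);

		/* Stand-in for the (potentially slow) submission to the kernel. */
		assert(q->num_submitted < SUBMIT_LOG_CAP);
		q->submitted[q->num_submitted++] = job;
	}
}

static int queue_init(struct submit_queue *q)
{
	q->head = 0;
	q->count = 0;
	q->done = false;
	q->num_submitted = 0;
	pthread_mutex_init(&q->lock, NULL);
	pthread_cond_init(&q->not_empty, NULL);
	pthread_cond_init(&q->not_full, NULL);
	return pthread_create(&q->thread, NULL, submit_thread, q);
}

/* Called by the driver thread: enqueue a batch and return without waiting. */
static void queue_push(struct submit_queue *q, int job)
{
	pthread_mutex_lock(&q->lock);
	while (q->count == QUEUE_CAP)
		pthread_cond_wait(&q->not_full, &q->lock);
	q->jobs[(q->head + q->count) % QUEUE_CAP] = job;
	q->count++;
	pthread_cond_signal(&q->not_empty);
	pthread_mutex_unlock(&q->lock);
}

/* Let the thread drain the remaining batches, then join and clean up. */
static void queue_finish(struct submit_queue *q)
{
	pthread_mutex_lock(&q->lock);
	q->done = true;
	pthread_cond_broadcast(&q->not_empty);
	pthread_mutex_unlock(&q->lock);
	pthread_join(q->thread, NULL);
	pthread_cond_destroy(&q->not_full);
	pthread_cond_destroy(&q->not_empty);
	pthread_mutex_destroy(&q->lock);
}
```

Operations such as buffer allocation would bypass a queue like this and call the kernel directly, as Kayden notes.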
22:55 Kayden: having trouble with it?
22:56 kchibisov: I have frequent page faults with my pure gles2 boring applications.
22:57 kchibisov: They mention the cs0 thread. Though I have a feeling that the GPU firmware is at fault.
22:58 kchibisov: At this point, I'm trying to reduce vector or maybe developing some workaround in the application(at least for myself).
23:00 Kayden: ah, GPU hangs?
23:01 kchibisov: They don't really hang.
23:01 kchibisov: It's more like "we try to handle" -> reset or recovered.
23:02 Kayden: right
23:03 Kayden: #radeon may be able to help too, though sounds like filing a bug against radeonsi at https://gitlab.freedesktop.org/mesa/mesa/-/issues with some way to reproduce it might be the way to go (if you haven't already)
23:03 kchibisov: Kayden: oh, there're bugs already for that.
23:03 Kayden: ah
23:03 kchibisov: But I have a feeling that unless you fix it yourself there's no luck.
23:05 * Kayden doesn't actually work on amdgpu... just read through that code a bit trying to understand how other drivers handle certain things
23:06 kchibisov: I could easily make my program robust, though, the issue is that it won't help me :p
23:06 kchibisov: Because my compositor is not robust yet.
23:07 kchibisov: But it's not really a "solution".
23:11 airlied: if you are getting faults from a gl or gles program, there is either a bug in your app or the shader compiler
23:11 kchibisov: airlied: the app works fine for 2-3 years and crashes only for me on a specific GPU.
23:12 kchibisov: And the crashes are recent thing.
23:12 airlied: okay so might be a bug in the shader compiler or just some undefined behaviour raising its head
23:12 kchibisov: I could share the shaders, they are sort of simple.
23:13 kchibisov: I know that I have crash with gles2 shaders, OpenGL 3.3 shaders, and when using zink as well.
23:14 kchibisov: we could continue in #radeon, airlied if it's better.