IRC Logs of #dri-devel on irc.freenode.net for 2024-04-09

07:22 likelyso: Not a lot of complexity is there to spot, only intermediate results, some kind of dependency tree needs to be hashed, cause offsets are generated in the hash for dependent instructions, so to get output results of pcX, it needs operands loaded before to be performed the async ones as well as dependent ones generated in hash, but UpTo only pcX not to deeper, so to blacklist a deeper intermediate result, it needs to remove their offset, but
07:22 likelyso: those are based off the dependency graph, but there's an SSA or where that's natural, out of SSA would leave the users after the write. So intermediate results fetching need to be fast, need to decide over the mapping there, Linus has been giving criticism for dwarf too, SSA dwarf otherwise would be fast way to there likely. So pc is defined by its writes virtual registers index, and reusing dwarf structures it's possible to fetch where c variable
07:22 likelyso: a is in SSA. That should of course do it, unless I develop something thinner but at the moment nothing clever has hit me as of yet.
07:26 MrCooper: columbarius: gbm_bo_map only really works with single-plane formats
07:35 columbarius: would the implicit api prevent me from creating a buffer with a multiplane format, or would it just return me a bo with a single fd, which I could import with gl, but not map manually (at least via a public api)?
08:11 MrCooper: columbarius: the GBM YCbCr formats were added long before modifiers were a thing, presumably it was possible to use them somehow
08:37 tzimmermann: jfalempe, hi. about the recent mgag200 patchset: thanks for testing. do you know which chip version the iDrac console uses?
08:57 jfalempe: tzimmermann: lspci says: 03:00.0 VGA compatible controller [0300]: Matrox Electronics Systems Ltd. Integrated Matrox G200eW3 Graphics Controller [102b:0536] (rev 04)
08:57 tzimmermann: jfalempe
08:57 tzimmermann: thanks ^
08:58 tzimmermann: there are only two chips that actively use the bmc clock
08:59 tzimmermann: ew3 and wb
08:59 tzimmermann: this appears to be an ew3
09:00 tzimmermann: i'll add bmc functionality for those two chips
10:14 jfalempe: tzimmermann: ok these are the two models using mgag200_bmc_enable_vidrst(). But even for other models, I'm wondering what the userspace does if the connector is unconnected and mode is empty.
10:14 tzimmermann: jfalempe, the connector is then disconnected and userspace wouldn't use it
10:15 tzimmermann: i assume that chip models without vidrst support don't have bmcs connected
10:16 tzimmermann: otherwise, the driver should use vidrst during modesetting
10:16 jfalempe: hum, I think all mgag200 have bmc connected
10:16 tzimmermann: that's why i asked about the model
10:16 jfalempe: at least I used bmc on all mgag200 variant, when I tested DMA
10:17 tzimmermann: i also have to check my local test system
10:17 tzimmermann: but then, mgag200 should program the vidrst bits for those models as well
10:18 tzimmermann: there's also plain g200 support, which likely doesn't have a bmc. it's the desktop chip
10:18 jfalempe: yes, but I'm not sure there are still users of plain g200 around.
10:18 tzimmermann: but the bmc support will be trivial once implemented. like in ast
10:19 tzimmermann: jfalempe, not many. but it's helpful to use a desktop sometimes
10:20 jfalempe: there is probably the same issue as with ast, that you can't really know if there is a bmc connected or not.
10:21 tzimmermann: indeed. there's no bit to test for a bmc
10:22 jfalempe: so apart from the PCI G200, others may have a bmc, and we should add a virtual bmc connector for them?
10:24 tzimmermann: so far i assumed that only wb and ew3 have bmcs. they are the only ones with vidrst code. for the others, i have to investigate.
14:07 mattst88: with the massive increase in the number of tests (and therefore skipped tests) in dEQP-VK, I tested what would happen if we trimmed the caselist by removing known-to-skip tests
14:07 mattst88: on a 14-thread Intel system, the runtime dropped from 1h52 to 1h
14:08 mattst88: has anyone done any investigation of improving dEQP-VK runtimes?
14:08 mattst88: e.g. is it possible to speed up test skipping somehow?
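The trimming step described above amounts to a set difference over caselist entries. A minimal sketch, assuming one fully qualified test name per line; the function name and the sample test names are hypothetical and not part of dEQP or deqp-runner:

```python
def trim_caselist(caselist_lines, known_skip_lines):
    """Return the caselist entries that are not in the known-to-skip list.

    Both inputs are iterables of lines, one test name per line.
    Blank lines are ignored and the original order is preserved.
    """
    skips = {line.strip() for line in known_skip_lines if line.strip()}
    kept = []
    for line in caselist_lines:
        name = line.strip()
        if name and name not in skips:
            kept.append(name)
    return kept


# Hypothetical sample data, for illustration only.
full_caselist = [
    "dEQP-VK.example.group_a.case_1",
    "dEQP-VK.example.group_a.case_2",
    "dEQP-VK.example.group_b.case_1",
]
known_skips = [
    "dEQP-VK.example.group_a.case_2",
]

trimmed = trim_caselist(full_caselist, known_skips)
for name in trimmed:
    print(name)
```

The known-to-skip list would have to be regenerated whenever the driver gains features, otherwise newly supported tests would be silently dropped from the run.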
14:12 zmike: there was some work done on this last year
14:12 zmike: because after shader object tests landed the test skipping took literal hours
14:12 zmike: I'm not sure if skipping could be improved further? probably worth opening a ticket
14:12 mattst88: wow
14:13 mattst88: I saw some patches land upstream that reduced the number of useless/duplicated tests after the shader object tests landed, but even still... on the most capable Intel GPU we're passing ~1.2 million tests and skipping ~2.5 million /o\
14:14 zmike: 😬
14:14 zmike: https://gitlab.khronos.org/Tracker/vk-gl-cts/-/issues/4704
14:14 mattst88: thanks!
14:15 zmike: cc also rg3igalia
14:15 mattst88: guess the first thing I'll do is run deqp-vk under perf with a caselist that is only known-to-skip tests and see what it's doing
14:17 rg3igalia: my guess is that most time will be spent on two steps: (a) generating the caselist hierarchy and (b) running the support check methods to see if the test is supported
14:18 rg3igalia: i.e. implementations of ::checkSupport for the TestCase class
14:18 mattst88: thanks
14:19 mattst88: might require too long of an answer for IRC, but what is the purpose of generating the caselist hierarchy?
14:21 rg3igalia: assigning each case a test name, basically, organizing them in a tree
14:22 rg3igalia: i.e. when you run dEQP-VK.pipeline.shader_object_whatever.blah, there's a "pipeline" test group, followed by a "shader_object_whatever" test group, etc etc
14:22 rg3igalia: and each of those has a subcase, and that's generated on runtime
14:23 rg3igalia: it takes a few seconds to generate the full hierarchy, which is annoying when you want to run a single case, but not a very long time if you want to run millions
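The tree organization described above can be pictured as nested groups keyed by the dot-separated parts of a test name. This is only an illustrative sketch in Python, not how the CTS framework (which is C++) actually builds its hierarchy; `build_hierarchy` is a hypothetical name, and the example test name is the one quoted in the discussion:

```python
def build_hierarchy(case_names):
    """Organize dot-separated test names into a tree of nested dicts.

    Each path component becomes a node; leaves are empty dicts.
    """
    root = {}
    for name in case_names:
        node = root
        for part in name.split("."):
            # Create the child group on first use, then descend into it.
            node = node.setdefault(part, {})
    return root


tree = build_hierarchy([
    "dEQP-VK.pipeline.shader_object_whatever.blah",
])
print(tree)
```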
14:23 mattst88: okay. and the resulting data structure is used to determine all the tests that should be run?
14:23 rg3igalia: yes
14:23 rg3igalia: for example, if you run deqp-vk --deqp-runmode=stdout-caselist >/dev/null you can measure the time it takes to generate the full list of cases
14:23 mattst88: ah, cool
14:24 mattst88: I wonder if deqp-runner magnifies that cost when it divides the deqp invocations into groups of e.g. 500?
14:24 rg3igalia: depends on how much time it takes to run those 500 tests
14:25 rg3igalia: for example, I know a few test groups where each test runs in milliseconds, so running 500 tests or skipping 500 tests takes a fraction of a second, so generating the test hierarchy may take comparatively long
14:26 rg3igalia: while others are slower, so 500 tests is not too bad
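The trade-off above can be put into rough numbers. A minimal sketch, assuming a fixed startup cost per deqp-vk invocation and a uniform per-test cost; the 3 s, 1 ms and 100 ms figures are illustrative assumptions, not measurements:

```python
def startup_fraction(startup_s, per_test_s, batch_size=500):
    """Fraction of one invocation's wall time spent on startup
    (e.g. generating the test hierarchy) rather than on the tests."""
    total = startup_s + per_test_s * batch_size
    return startup_s / total


# Assumed values: "a few seconds" of startup, fast vs. slow tests.
for per_test_s in (0.001, 0.1):
    frac = startup_fraction(3.0, per_test_s)
    print(f"per-test {per_test_s:g}s: startup is {frac:.0%} of the batch")
```

Under these assumptions, startup dominates batches of very fast (or skipped) tests and is nearly negligible for slow ones.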
14:27 rg3igalia: taking a look at the checkSupport code, I think we could throw in a couple of optimizations to make skipping shader object tests much faster
14:27 rg3igalia: for example, isCoreDeviceExtension is super slow because it doesn't cache the extension list per api version, so we could do that
14:29 rg3igalia: and then we could cache the results for a given extension in a hash or set
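The caching idea above could look roughly like the following. This is a Python sketch of the general technique (memoizing the per-API-version extension set and answering lookups from a set), not the CTS code; `core_extensions_for` and the small `CORE_EXTENSIONS` table are hypothetical stand-ins for whatever the framework uses:

```python
from functools import lru_cache

# Illustrative subset: extensions promoted to core, keyed by API version.
CORE_EXTENSIONS = {
    (1, 1): ["VK_KHR_multiview", "VK_KHR_maintenance1"],
    (1, 2): ["VK_KHR_timeline_semaphore"],
    (1, 3): ["VK_KHR_dynamic_rendering"],
}


@lru_cache(maxsize=None)
def core_extensions_for(api_version):
    """Build the set of core extensions for an API version.

    Thanks to lru_cache this runs once per distinct api_version;
    later calls return the cached frozenset.
    """
    return frozenset(
        ext
        for version, exts in CORE_EXTENSIONS.items()
        if version <= api_version
        for ext in exts
    )


def is_core_device_extension(api_version, extension):
    """Set-membership lookup against the cached per-version set."""
    return extension in core_extensions_for(api_version)


print(is_core_device_extension((1, 2), "VK_KHR_multiview"))
print(is_core_device_extension((1, 1), "VK_KHR_dynamic_rendering"))
```

With millions of support checks per run, the point is that the list is assembled once per API version instead of once per query.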
14:29 mattst88: interesting, yeah. I will take a look at that and see what improvements I can get
14:29 rg3igalia: do you have access to khronos stuff?
14:29 mattst88: just a quick 2-minute deqp-vk run shows 5% of time spent in tcu::(anonymous namespace)::channelToFloat
14:29 mattst88: yeah
14:30 rg3igalia: could you open an issue in the tracker explaining your shader object case? the 1h vs 1h 52m
14:30 mattst88: yeah, will do
14:30 rg3igalia: someone from my team could take a look at this in the coming days/weeks and improve the situation, thanks!
14:31 mattst88: thank you!
14:37 mattst88: filed https://gitlab.khronos.org/Tracker/vk-gl-cts/-/issues/5069
14:43 robclark: mattst88: at least depending on the test, deqp spends a lot of time in its reference sw rasterizer.. but I think the problem here is just (re)initializing all the tests for each invocation of deqp-vk. Ie. if you have that many skips, odds are you have a bunch of invocations that end up running few or no tests
14:44 robclark: mattst88: it would be clever if there could be a "zygote" deqp process which forks instances for each batch of tests, perhaps?
14:44 mattst88: yeah, I suspect so
14:49 robclark: I suppose you should mention this is with parallel deqp-runner in the khronos issue.. I guess the issues are different if you are running deqp the "official" way (ie. less startup overhead, but single threaded)
14:54 mattst88: oh, very true
15:20 willy: does anyone have suggestions for how i might go about testing changes to fb_defio? I don't think I have any of the hardware that uses it
15:29 javierm: willy: the DRM fbdev emulation layer uses it IIRC? So you could test it by enabling DRM_FBDEV_EMULATION and using the emulated /dev/fb? dev
15:30 willy: aha, thanks
15:32 javierm: willy: https://elixir.bootlin.com/linux/v6.9-rc3/source/drivers/gpu/drm/drm_fbdev_generic.c#L119
15:33 willy: nods. I can grep, I just have no idea how all the graphics stuff fits together
15:34 willy: I come to you from MM land; I need to stop fb_defio from touching struct page
15:34 javierm: willy: sorry, I didn't imply that you can't grep :) But it was mostly meant to confirm that the generic fbdev emulation did use defio
15:35 javierm: because I could be misremembering
17:32 mattst88: rg3igalia: I might be misreading the perf report, but it seems to me that for any invocation of deqp-vk by deqp-runner, if any of the 500 tests is e.g. created by ::createExtendedDynamicStateTests, then all extended dynamic state tests will be instantiated
17:33 mattst88: does that seem correct?
17:34 mattst88: if so, I wonder what would happen if we modified deqp-runner to run groups of related tests together
17:43 zmike: on one hand sorting might be beneficial for runtimes, but on the other hand it would eliminate some cases where weird caselist ordering finds driver bugs
17:44 mattst88: yeah, definitely true
18:27 zmike: that's probably more an issue on the GL side though
18:30 Sachiel: but the tests already run sorted
18:53 tzimmermann: willy, how does defio still use struct page?
18:53 tzimmermann: it doesn't really modify it
18:53 tzimmermann: it only holds a reference
18:54 willy: it sets page->mapping, it sets PageDirty(), it calls page_mkclean()
18:55 tzimmermann: willy, can/will you replace that code?
18:55 willy: that's the plan, once i can test that i've not broken it
18:56 willy: i've just got the dusty old laptop into a state where it can boot a -next kernel. the iwlwifi driver is bitching, but ignore that ...
18:56 tzimmermann: i've done most of that stuff in recent releases. can i help you somehow?
19:00 willy: that would be nice! i have a sketch of a design, but haven't coded it up. wanted to get this laptop to a point where i could prove it was actually running the fb_defio code first
19:00 willy: lsmod says i don't even have any fb modules loaded (wayland is running, but i guess it's not using fbdev)
19:01 airlied: tzimmermann: can/do any of the virtual gpu drivers use fb_defio?
19:01 tzimmermann: you can boot most modern test systems with DRM and enable simpledrm
19:01 tzimmermann: airlied, anything with generic fbdev uses defio.
19:02 airlied: willy: so you could use a vm to test
19:02 willy: qemu?
19:02 tzimmermann: willy, booting with the kernel's nomodeset parameter should only load simpledrm
19:02 tzimmermann: qemu + bochs
19:04 tzimmermann: fbdev-generic emulates fbdev support for bochs. it uses fb_defio
19:05 tzimmermann: if you grep the DRM directory for 'drm_fbdev_generic' you can see the drivers that use fbdev-generic
19:05 mattst88: Sachiel: when run from deqp-runner, they run sorted?
19:06 willy: hm, i get a blank screen on dustylaptop when passing "nomodeset"
19:06 tzimmermann: willy, that might be worth a bug report on its own
19:07 tzimmermann: what HW is it?
19:07 Sachiel: mattst88: iirc, deqp-runner will call deqp-vk once for each caselist, deqp-vk runs the tests in that caselist sorted by the order they are defined in the framework
19:07 willy: Sunrise Point -- 2016 era
19:07 willy: Intel HD Graphics 620
19:08 mattst88: Sachiel: right, but each caselist of 500 tests given to an invocation of deqp-vk won't be 500 consecutive tests from a plain deqp-vk run, right?
19:08 willy: i say blank screen, it's actually black with a flashing text cursor
19:08 Sachiel: mattst88: no, it'll be those 500 tests, just not in the order the caselist shows them
19:09 tzimmermann: do you have CONFIG_DRM=y CONFIG_DRM_SIMPLEDRM=y and CONFIG_SYSFB_SIMPLEFB=y ?
19:10 mattst88: right. I'm suggesting that deqp-runner could group tests together that are created by the same function (like ::createExtendedDynamicStateTests) and feed a group of 500 of those to deqp-vk
19:10 willy: i have the Debian config; let me just verify those ...
19:10 Sachiel: mattst88: ah, yeah, that would lose coverage that has found real issues
19:10 willy: argh, no, Debian didn't enable SIMPLEDRM
19:10 mattst88: Sachiel: right
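The grouping idea discussed above might be sketched as follows. Which function creates a given test cannot be read off its name, so this sketch assumes the first few name components are a good enough proxy; `batch_by_group`, `group_key` and the `depth` parameter are hypothetical and not deqp-runner options. As noted in the discussion, batching related tests together gives up the mixed orderings that have exposed real driver bugs.

```python
from itertools import groupby


def group_key(name, depth=3):
    """Use the first `depth` dot-separated components as the group id."""
    return ".".join(name.split(".")[:depth])


def batch_by_group(case_names, batch_size=500, depth=3):
    """Split tests into batches that never mix two groups.

    sorted() is stable, so tests keep their relative order inside a group.
    """
    ordered = sorted(case_names, key=lambda n: group_key(n, depth))
    batches = []
    for _, members in groupby(ordered, key=lambda n: group_key(n, depth)):
        members = list(members)
        for start in range(0, len(members), batch_size):
            batches.append(members[start:start + batch_size])
    return batches


# Hypothetical names, small batch size to keep the output readable.
names = ["a.b.c.1", "a.b.d.1", "a.b.c.2", "a.b.c.3"]
for batch in batch_by_group(names, batch_size=2):
    print(batch)
```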
19:13 tzimmermann: willy, when you rebuild the kernel with the config options i mentioned, you should also set CONFIG_DRM_I915=n and CONFIG_XE=n. that will disable any intel driver for the HD 620
19:13 tzimmermann: and your system should be fb_defio-only :)
19:16 tzimmermann: you should also set CONFIG_FB_DEVICE=y so that you get a device file for the fbdev code under /dev/fb0
19:16 tzimmermann: you can trigger fb_defio simply by writing to /dev/fb0
19:17 tzimmermann: fb_defio will track the modified pages
19:17 tzimmermann: no wait... you need to do an mmap and write to the mmap'ed range
19:18 tzimmermann: the regular I/O writes won't go through fb_defio
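The mmap-and-write step above can be sketched in a few lines. This is a minimal sketch, assuming a Unix system and that the caller has read/write permission on the device; `touch_mapped_byte` is a hypothetical helper, and mapping a single page of /dev/fb0 is an assumption about what is sufficient to exercise the deferred-I/O path:

```python
import mmap
import os


def touch_mapped_byte(path, length, index=0, value=0xFF):
    """Map `length` bytes of `path` shared and writable, then store one byte.

    The store goes through the mapping, not through write(2).
    """
    fd = os.open(path, os.O_RDWR)
    try:
        with mmap.mmap(fd, length,
                       flags=mmap.MAP_SHARED,
                       prot=mmap.PROT_READ | mmap.PROT_WRITE) as mapping:
            mapping[index] = value
    finally:
        os.close(fd)


# Intended use on a test system (not executed here):
#   touch_mapped_byte("/dev/fb0", mmap.PAGESIZE)
```

The same helper works on a regular file, which makes it possible to check the script itself before pointing it at the framebuffer device.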
19:20 willy: hm. now with those two config options enabled, adding "nomodeset" to the command line results in freezing after "Loading initial ramdisk ..."
19:20 willy: the cursor isn't blinking
19:22 tzimmermann: willy, you can boot the kernel with fb.lockless_register_fb=1 to maybe get some more information out of the console code
19:23 tzimmermann: but IDK what the debian kernel does
19:23 willy: well, i'm booting next-20240408 just using the Debian config file as a start
19:25 airlied: willy: is that thing booting using bios?
19:25 willy: grub from efi, I believe
19:25 airlied: ah at least there is some hope, maybe check you get efifb on a normal boot before the gpu driver loads
19:25 willy: i think i do, let me reboot without nomodeset
19:27 tzimmermann: willy, see https://etherpad.opensuse.org/p/QId-BMXnhlupH6FIqrbx for my current x86-64 config
19:27 willy: i didn't get anything extra from it with "fb.lockless_register_fb=1"
19:30 willy: oh! journalctl shows it did boot last time! i had a hunch that it might be fine, just waiting at the dm-crypt prompt, so i entered my drive passphrase
19:30 willy: it never displayed anything, but it did boot
19:30 tzimmermann: ok
19:31 willy: heh. "Setting dangerous option fb.lockless_register_fb - tainting kernel"
19:31 tzimmermann: yeah, it's just for debugging
19:32 tzimmermann: it will avoid taking some locks that would otherwise prevent output during the console handover between drivers
19:33 willy: yeah, it doesn't look like it loaded the simpledrm module
19:38 columbarius: MrCooper: thx
19:40 tzimmermann: willy, ok see the config options that i mentioned. i's getting late here. i'll be around tomorrow again. good luck
19:40 tzimmermann: "it's getting"
21:23 bwidawsk: do I need to lease the `writeback connector`, or just the resources that I'm trying to writeback from (if I want to use it without root)?
21:51 zmike: mareko: are you planning to land https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/28436