01:43tertl3: hi, I would like to casually chat about this option in GNOME 40: "Launch using Discrete Graphics Card"
01:44imirkin: they just mean secondary graphics card
01:44imirkin: making that the primary rather than the main display card
01:44imirkin: useful if you have an optimus-type setup and don't care about battery
01:44imirkin: (or vsync)
01:54tertl3: so it runs it on the cpu instead of the 1060 when I choose that?
01:54imirkin: should run on the 1060, assuming primary is intel or whatever
01:55tertl3: so if I don't choose that then it runs it on the cpu?
01:55imirkin: it runs on the intel *gpu*
01:55tertl3: it even lets you choose for chrome
01:55imirkin: "run on the cpu" implies it's running software rasterization, which is something different
01:56tertl3: do you notice a difference?
01:56imirkin: software rast is super-slow
01:56imirkin: and very power inefficient
01:56tertl3: but it wouldn't do that either way, right?
01:57tertl3: unless you didn't have a gpu at all
01:57tertl3: of either type
01:57tertl3: discrete or otherwise integrated
02:00imirkin: right, you'd have to go out of your way to make that happen
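The three cases discussed above (primary GPU, secondary/discrete GPU, CPU software rasterization) can be sketched as environment setup before launching a program. This is a minimal sketch, assuming Mesa drivers (e.g. Intel + nouveau), where `DRI_PRIME=1` asks Mesa to render on the secondary GPU and `LIBGL_ALWAYS_SOFTWARE=1` forces the software rasterizer. The function and enum names are hypothetical, and GNOME's own menu entry may set different variables depending on the driver.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdlib.h>

/* Which device Mesa should render on for a program launched afterwards
   (hypothetical names, for illustration only). */
enum render_target {
    RENDER_DEFAULT,  /* primary GPU, e.g. the Intel iGPU on an Optimus laptop */
    RENDER_DISCRETE, /* secondary GPU via PRIME offload: DRI_PRIME=1 */
    RENDER_SOFTWARE  /* CPU software rasterizer: LIBGL_ALWAYS_SOFTWARE=1 */
};

/* Prepare the current environment so that a program exec'd afterwards
   renders on `target`. Returns 0 on success, -1 if setenv/unsetenv fails. */
int select_render_target(enum render_target target)
{
    /* Start from a clean slate so the modes don't combine. */
    if (unsetenv("DRI_PRIME") != 0 || unsetenv("LIBGL_ALWAYS_SOFTWARE") != 0)
        return -1;

    switch (target) {
    case RENDER_DEFAULT:
        return 0;
    case RENDER_DISCRETE:
        return setenv("DRI_PRIME", "1", 1);
    case RENDER_SOFTWARE:
        return setenv("LIBGL_ALWAYS_SOFTWARE", "1", 1);
    }
    assert(!"unknown render target");
    return -1;
}
```

A launcher would call `select_render_target(RENDER_DISCRETE)` and then `execvp()` the application. From a shell, prefixing the command (e.g. `DRI_PRIME=1 glxinfo`) has the same effect, and the "OpenGL renderer" line shows which device was picked. Note that software rendering only happens if something explicitly asks for it, which matches the point that you'd have to go out of your way.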
03:02tertl3: interesting
03:02tertl3: im watching this dude https://www.youtube.com/watch?v=x1oXByIJcHU
03:02tertl3: it's a FOSDEM talk, "Back to the Linux Framebuffer! Linux Framebuffer support in free software" by Nicolas Caramelli
03:03tertl3: he is obsessed with the frame buffer
03:03tertl3: i do want to try out his examples though
03:03tertl3: where is the frame buffer stuff in the kernel source?
03:05tertl3: oh here we go https://github.com/torvalds/linux/tree/fcadab740480e0e0e9fa9bd272acd409884d431a/drivers/video/fbdev/nvidia
03:06tertl3: nvidiafb occurs 77 times in this file
03:07imirkin: nouveau provides a fbdev as well
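Whichever driver provides it (nvidiafb, or nouveau's fbdev emulation), user space sees the same fbdev interface: query the screen layout with two ioctls, `mmap()` the device, and write pixels at computed byte offsets. The sketch below assumes a 32-bits-per-pixel mode and a device path such as `/dev/fb0`; the helper names `fb_pixel_offset` and `fb_fill` are hypothetical, while the ioctls and structs come from `<linux/fb.h>`.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <fcntl.h>
#include <linux/fb.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>
#include <sys/ioctl.h>
#include <sys/mman.h>
#include <unistd.h>

/* Byte offset of visible pixel (x, y) inside the mapped framebuffer. */
size_t fb_pixel_offset(const struct fb_var_screeninfo *var,
                       const struct fb_fix_screeninfo *fix,
                       unsigned x, unsigned y)
{
    assert(var->bits_per_pixel % 8 == 0);
    return (size_t)(y + var->yoffset) * fix->line_length +
           (size_t)(x + var->xoffset) * (var->bits_per_pixel / 8);
}

/* Paint the whole visible area of the fbdev at `path` with one 32-bpp value.
   Returns 0 on success, -1 on any error (including non-32-bpp modes). */
int fb_fill(const char *path, uint32_t color)
{
    struct fb_var_screeninfo var;
    struct fb_fix_screeninfo fix;

    int fd = open(path, O_RDWR);
    if (fd < 0)
        return -1;
    if (ioctl(fd, FBIOGET_VSCREENINFO, &var) < 0 ||
        ioctl(fd, FBIOGET_FSCREENINFO, &fix) < 0 ||
        var.bits_per_pixel != 32) {
        close(fd);
        return -1;
    }

    uint8_t *fb = mmap(NULL, fix.smem_len, PROT_READ | PROT_WRITE,
                       MAP_SHARED, fd, 0);
    close(fd); /* the mapping stays valid after the fd is closed */
    if (fb == MAP_FAILED)
        return -1;

    for (unsigned y = 0; y < var.yres; y++)
        for (unsigned x = 0; x < var.xres; x++)
            memcpy(fb + fb_pixel_offset(&var, &fix, x, y), &color, sizeof color);

    munmap(fb, fix.smem_len);
    return 0;
}
```

To try it, call `fb_fill("/dev/fb0", 0x00ff0000)` from a text console with no compositor running; writing to the device usually needs root or membership in the `video` group. Which channel a given value lands in depends on the `red`/`green`/`blue` bitfields in `fb_var_screeninfo`, so the color is not guaranteed to be red.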
03:09tertl3: ah ok
03:09tertl3: it would be "nice" if Linux folders had a README on GitHub
03:10imirkin: send patches.
03:11tertl3: or even for the files too
03:11imirkin: see above.
03:11tertl3: i think Torvalds would yell at me
03:11tertl3: for being so naive
03:11imirkin: doubtful.
03:11imirkin: you'd have to do something much more actively dumb to get yelled at
03:12tertl3: like if this file had a README, i might be able to learn something from it nv17_fence.c
03:12imirkin: unlikely.
03:13tertl3: no comments or anything
03:14tertl3: i do notice that the fbcon.c and fence.c files only go up to nvc0
03:16tertl3: there are some questions I have seriously, like, have the nouveau devs attempted to add power management and video decoding acceleration for my card/generation, or are they working their way towards the newer cards?
03:17tertl3: or is there something complicating adding those features to Pascal and newer cards?
03:18imirkin: what card do you have again?
03:18tertl3: 1060 6GB, GP106, nv136
03:18imirkin: video decoding accel: nope
03:19imirkin: no plans on it ever being implemented
03:19tertl3: what do you mean nope?
03:19tertl3: oh
03:19imirkin: power states: requires signed firmware from nvidia, and there are additional complications that would make this tricky even if we had that
03:19tertl3: ah ok
03:20imirkin: video decoding should be achievable
03:20imirkin: it's just that nobody's stepping up to the task
03:20imirkin: i'm probably the best person to do it (since i did VP2, and a bunch of fixing on the vp4/5 stuff)
03:21imirkin: but i've lost any sort of motivation for it
03:22tertl3: oh nice
03:22tertl3: the H.264 acceleration is big
03:23tertl3: i think that's what my screen recorder uses
03:23imirkin: video encoding is another one that needs doing
03:27tertl3: ive been using wf-recorder on my sway setup to record my music production program
03:27tertl3: obs-studio is just now getting wayland support
03:27tertl3: so it's not working on my machine yet
03:28tertl3: hopefully when fedora 34 ships it will be all good
03:28tertl3: to be honest, wf-recorder works pretty snappy on this machine
03:35tertl3: https://usercontent.irccloud-cdn.com/file/hF8thQsP/output.gif
06:40pmoreau: imirkin: I probably need something similar for loads, but so far the issue is with stores (which is why I’m mentioning `combineSt()` and not `combineLd()` 😉). So I guess your comment from a month ago was also about loads which makes more sense, but even flipping the values (i.e. setting dType to U8 and sType to U32) still results in an error as `sizeSt = typeSizeof(st->dType)` but then it subtracts `st->getSrc(s + 1)->reg.size`.
06:53pmoreau: imirkin: I added both MRs to my to-do list, and rebased my compute branch on top of yours.
07:00imirkin: pmoreau: cool
07:00imirkin: pmoreau: it'd probably be helpful if you could provide a sample program pre-memory opt
07:00imirkin: since it's not immediately clear to me what the situation is
07:00imirkin: and a program is worth a thousand words :)
07:01imirkin: (or dwords?)
07:02pmoreau: :-)
07:02pmoreau: Sure, one sec
07:09pmoreau: imirkin: Here we go: https://gitlab.freedesktop.org/pmoreau/mesa/-/snippets/1861; note that this is without the changes to dType and sType.
07:10imirkin: pmoreau: ok, and is %r106 a 32-bit ssa value or 8-bit?
07:10pmoreau: Should be a 32-bit one
07:11imirkin: are 8-bit l[] stores possible?
07:11imirkin: (i don't know offhand)
07:13pmoreau: I don’t think the hardware complains about those, as some of the tests using them do succeed.
07:13imirkin: pmoreau: ok, so i think using ->reg.size there is wrong
07:13imirkin: i think it should be looking at the sType / dType
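The point about `->reg.size` versus the store's type can be illustrated with a toy model of the snippet's situation: an 8-bit store whose source, `%r106`, is a 32-bit SSA value. The enum, struct, and function names below are hypothetical, simplified stand-ins (loosely mirroring nv50_ir's `DataType` and `typeSizeof()`), not the real `combineSt()` code.

```c
#include <assert.h>

/* Hypothetical stand-ins for nv50_ir's DataType and typeSizeof(). */
enum data_type { TYPE_U8, TYPE_U16, TYPE_U32, TYPE_U64 };

unsigned type_sizeof(enum data_type t)
{
    switch (t) {
    case TYPE_U8:  return 1;
    case TYPE_U16: return 2;
    case TYPE_U32: return 4;
    case TYPE_U64: return 8;
    }
    assert(!"unknown type");
    return 0;
}

/* A store as a memory optimization pass might see it (hypothetical layout). */
struct store {
    enum data_type type;   /* width actually written to memory */
    unsigned src_reg_size; /* size of the source register, e.g. 4 for a 32-bit SSA value */
};

/* Bytes the store writes: taken from its type, because a u8 store can be fed
   a 32-bit register and only writes the low byte of it. */
unsigned store_bytes(const struct store *st)
{
    return type_sizeof(st->type);
}
```

With `struct store st = { TYPE_U8, 4 }`, `store_bytes(&st)` is 1 while `st.src_reg_size` is 4, so any size arithmetic that mixes the two (as in `sizeSt` minus `reg.size`) ends up inconsistent.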
07:14imirkin: but ... it's slightly tricky
07:14pmoreau: I assume there are no load/store ops that could take differently-sized arguments, right?
07:14imirkin: well, load/store do a single op
07:15imirkin: the problem
07:15imirkin: is that you have a bunch of u32 values where you only care about the lower u8
07:15imirkin: and then you e.g. want to load that whole range
07:15imirkin: the memoryopt logic is likely to get quite confused
07:15imirkin: honestly i'd just run purgeRecords() anytime i saw a < 32-bit load/store
07:16imirkin: and leave solving it to another day
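The suggested stopgap, calling `purgeRecords()` whenever a load/store narrower than 32 bits shows up, amounts to the policy below. This is a toy model under stated assumptions: the record list, `purge_records()`, and `note_access()` are hypothetical and much simpler than the real memory optimization pass; they only show that tracked state is dropped instead of reasoned about for sub-32-bit accesses.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_RECORDS 16

/* Hypothetical record of an earlier access the pass could combine with. */
struct record { unsigned offset, size; };

struct memopt {
    struct record recs[MAX_RECORDS];
    size_t count;
};

/* Forget everything tracked so far (models purgeRecords()). */
void purge_records(struct memopt *m) { m->count = 0; }

/* Called for every load/store with its byte offset and access size.
   Sub-32-bit accesses wipe all tracked state rather than risk combining
   32-bit registers whose upper bits are don't-care.
   Returns 1 if the access was recorded, 0 if it caused a purge. */
int note_access(struct memopt *m, unsigned offset, unsigned size)
{
    assert(m != NULL);
    if (size < 4) {
        purge_records(m);
        return 0;
    }
    if (m->count == MAX_RECORDS)
        purge_records(m); /* keep the toy model bounded */
    m->recs[m->count++] = (struct record){ offset, size };
    return 1;
}
```

The trade-off is lost optimization opportunities around 8- and 16-bit accesses, in exchange for never producing a wrong combined access.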
07:17pmoreau: Could do that 👍️
19:41imirkin: pmoreau: btw, heads-up, i moved the offset stuff into nv50_ir_from_tgsi, so you'll need to make an analogous change in from_nir
19:42pmoreau: 👍️ I saw your change, and I already had a similar change locally, so I’m ready for it. 🙂
19:44imirkin: cool
19:45imirkin: i think we're getting closer
19:48pmoreau: Currently going through your series
19:50imirkin: the only stuff after that are "bug fixes" at least as far as core goes
19:51imirkin: and i can start looking at your work getting CL going
19:52imirkin: but let me know what to look at :)
19:56pmoreau: Sounds good! Branch-wise, it would be the nv50_compute_support branch. Feature-wise, I am mostly trying to at least get the basic test from the OpenCL CTS to fully run without errors (ignoring running out of reg space for now).
19:57imirkin: which ones concretely?
19:57imirkin: (which basic tests)
19:58imirkin: i picked a random basic-seeming test and it was a pretty complete fail
20:15pmoreau: The one called test_basic :-) (which does test quite a few things and is made of sub-tests that one can run individually).
20:16pmoreau: I do not remember what the current status is with that test, but it definitely got some passes.
20:16pmoreau: I see I will need to implement something similar to “nv50: add remapping of buffers/images into unified space” for the NIR frontend.
20:28pmoreau: https://github.com/KhronosGroup/OpenCL-CTS/blob/master/test_conformance/basic/main.cpp is the one I am referring to, which indeed does quite a few things. You might want to grab https://github.com/pierremoreau/OpenCL-CTS/tree/support_opencl_1.0_and_1.1 rather than the version from Khronos, as I added a couple of missing version restrictions.
20:35imirkin: pmoreau: ah ok
21:03pmoreau: imirkin: I’m putting together a list of all the current fails with basic on my G96 here https://gitlab.freedesktop.org/pmoreau/mesa/-/snippets/1865. So far they seem to mostly fall into 1) uses too many registers, or 2) indexing issues for non-32-bit values to/from local or private memory.
21:04imirkin: pmoreau: cool
21:04imirkin: well let's not worry about fixing everything at once
21:04imirkin: i'd rather get your stuff in which gets it this far
21:05imirkin: and then we can evolve a common base
21:05pmoreau: A few missing features as well such as u2u16, or SV_WORK_DIM in `getSRegEncoding()`.
21:05imirkin: what's SV_WORK_DIM?
21:05imirkin: it's that stupid extra arg, right?
21:05imirkin: can we pass it in via the argument param?
21:06imirkin: if it's 16-bit then it fits
21:06pmoreau: I think it says whether the grid is 1D, 2D, or 3D.
21:06imirkin: oh, so literally "1, 2, 3"?
21:06pmoreau: If it’s what I believe it is, yes
21:06imirkin: heh ok
21:06imirkin: yeah, that should be easy
21:06imirkin: worst case we burn an extra user param
21:06imirkin: we could pack it along with the 'z' value
21:06imirkin: which doesn't need all 32 bits
21:07pmoreau: Seems reasonable
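Packing the work dimension (literally 1, 2 or 3) alongside the grid's `z` value could look like the sketch below. The bit layout is an assumption made for illustration: the chat only says `z` "doesn't need all 32 bits", so this sketch reserves the low 16 bits for `z` and puts the work dimension above them. The helper names and masks are hypothetical; on the shader side, the same shift and mask would extract both values from the one user param.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical layout: bits 0..15 = grid z size, bits 16..17 = work_dim. */
#define GRID_Z_MASK    0xffffu
#define WORK_DIM_SHIFT 16

uint32_t pack_z_workdim(uint32_t grid_z, uint32_t work_dim)
{
    assert(grid_z <= GRID_Z_MASK);          /* assumes z fits in 16 bits */
    assert(work_dim >= 1 && work_dim <= 3); /* 1D, 2D or 3D */
    return (work_dim << WORK_DIM_SHIFT) | grid_z;
}

uint32_t unpack_grid_z(uint32_t packed)   { return packed & GRID_Z_MASK; }
uint32_t unpack_work_dim(uint32_t packed) { return packed >> WORK_DIM_SHIFT; }
```

This avoids burning an extra user param, at the cost of one shift or mask per read in the shader.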
21:18imirkin: pmoreau: btw, i dunno that you need to do anything right now for the gmem index remapping stuff -- until you hit images, it's all out of g15 for now, which is special
21:20pmoreau: Ah okay, then that will be for later as I haven’t started looking at images yet.
21:22imirkin: for GL, you have multiple buffers, all in their own happy g[] spaces
21:23imirkin: but in CL, it's all one happy space
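The difference between "each buffer in its own g[] space" (GL) and "one space" (CL) can be modeled as translating a (buffer slot, offset) pair into a single flat address by giving each bound buffer a base address. This is a hypothetical sketch of the concept only; the struct, the slot limit, and `remap_to_unified()` are invented for illustration and are not how the nv50 patch "add remapping of buffers/images into unified space" is actually implemented.

```c
#include <assert.h>
#include <stdint.h>

#define MAX_BUFFERS 8 /* hypothetical slot count */

/* Where each bound buffer starts in the flat (CL-style) address space. */
struct unified_space {
    uint64_t base[MAX_BUFFERS];
    uint32_t size[MAX_BUFFERS];
};

/* GL-style access: (slot, offset inside that buffer's own space).
   CL-style access: one flat address. Translate the former into the latter. */
uint64_t remap_to_unified(const struct unified_space *u,
                          unsigned slot, uint32_t offset)
{
    assert(slot < MAX_BUFFERS);
    assert(offset < u->size[slot]); /* stay inside the bound buffer */
    return u->base[slot] + offset;
}
```

In the other direction, a CL pointer can point into any buffer, which is why a kernel can work entirely out of one flat space (per the chat, g15 for now) until images come into play.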
21:29pmoreau: There we go, snippet updated with all the sub-tests from basic now, and the overall picture does not change: the two main sources of failures are 1) using too many regs, and 2) addressing issues in local memory for non-4-byte values. If ignoring images, it seems like very little is otherwise missing feature-wise, and there might be an issue or two that lie in clover.
22:13imirkin: pmoreau: "nv50: add compute support" -- did you mean "nv50/ir: offset accesses to shared memory"?
22:17imirkin: (for the patches which i should add your r-b to)
23:46karolherbst: why is this hda discussion running into red herring arguments again? : *sigh*
23:46karolherbst: :/