09:38josey: For a long time now I get the following messages on boot about my graphics card; "nouveau 0000:01:00.0: bus: MMIO read of 00000000 FAULT at 3e6684 [ IBUS ]" and "nouveau 0000:01:00.0: bus: MMIO read of 00000000 FAULT at 10ac08 [ IBUS ]". Is that normal?
09:38josey: My card is a Nvidia GM107 [GeForce GTX 750 Ti]
11:10RSpliet: josey: I believe these warnings should be harmless.
11:11RSpliet: Someone should make sure these "MMIO reads" to non-existing locations no longer occur on graphics cards like yours, but they're not a source of erroneous behaviour as far as I know.
11:13josey: Thank you RSpliet :)
13:56ryszard: hey, which nvidia card should I take to get the best nouveau support but on low-powered card similar to nvs295?
13:58karolherbst: ryszard: define "best support"
14:00karolherbst: we can't even promise to fix any bug, because of the randomness and lack of documentation
14:00ryszard: I'm looking for similar to nvs295 card with the least missing functionalities/hangs/the most acceleration
14:01karolherbst: all of that is too random to be able to make a definite statement in terms of reliability
14:02ryszard: I don't play games, need acceleration only, i.e. in firefox or gnome3
14:02karolherbst: I mean, from a functionality point of view every more or less modern GPU is fine (except Turing)
14:02linkmauve: ryszard, why not go for Intel or AMD instead?
14:03ryszard: I use AMD AM3/AM3+ only then no Intel possible, AMD cards all are power hungry as I see
14:03karolherbst: ryszard: that's not true
14:03ryszard: nvs295 is good example of lowpower card
14:04karolherbst: and NVS 295 is not a low power card
14:04karolherbst: it's old and power inefficient compared to any new card
14:04ryszard: but is passive cooled
14:05karolherbst: so?
14:05karolherbst: it will still use more power when idling than a 2080 ti
14:06karolherbst: probably
14:06ryszard: hmm
14:07ryszard: if I will buy any newer GPU with official driver support which one should I buy to use CUDA on it?
14:07HdkR: 2080ti uses ~17w idling according to reviewers :P
14:08karolherbst: HdkR: yeah well.. the 295 will be constantly at full load anyway with a modern desktop
14:08karolherbst: so...
14:08HdkR: yea, bad times
14:08karolherbst: my point was just, that old GPUs because of the process are less efficient in terms of perf/W
14:09karolherbst: and just having a 23W TDP GPU doesn't mean you save power compared to a 180W TDP one as long as the avg power consumption is higher on the older one
14:09HdkR: Definitely
14:09karolherbst: and the 295 is quite slow, so it's not even fun to use it
14:09linkmauve: Also, you won’t get CUDA support.
14:09karolherbst: ryszard: the question is rather do you want to use CUDA or not
14:10ryszard: ok, but which should I get to be able to learn and use CUDA on it?
14:10karolherbst: linkmauve: the 295 supports CUDA
14:10karolherbst: all tesla ones do I think
14:10linkmauve: karolherbst, but not Nouveau, right?
14:10ryszard: I'm not a professional programmer, wish to learn to use CUDA then no bleeding edge CUDA version support is needed
14:10karolherbst: well, you won't get CUDA with any GPU when using Nouveau :p
14:10ryszard: ok
14:11ryszard: how long nvidia provide official support for their cards?
14:11karolherbst: depends on how much you pay I'd say
14:11karolherbst: but tesla is out of support already
14:12ryszard: great ;p
14:12karolherbst: and fermi is probably next
14:12karolherbst: https://www.nvidia.com/en-us/drivers/unix/
14:12ryszard: do they provide any table with EOLs ?
14:12karolherbst: mhhh
14:13karolherbst: well, I couldn't talk about it anyway
14:13karolherbst: ohh actually, the 340 branch supports the nvs 295, but...
14:14karolherbst: the 304 is already out of support for two years
14:14karolherbst: so you can make your own assumptions from that
14:14ryszard: yeah, I know
14:14karolherbst: 390 doesn't support the nvs 295
14:14ryszard: it sucks tbh
14:14karolherbst: well, old hardware also sucks :p
14:15karolherbst: ryszard: why not a GT 1030?
14:15karolherbst: there are some passive ones as well
14:16ryszard: but I have iGPU on AM3 mb Radeon HD4200 which has still great support and provides excellent performance on gnome3 and firefox - then it is possible to support old hardware very well
14:16ryszard: the case I'm talking about is another machine on AM3+ with AMD FX
14:16ryszard: where I don't have iGPU then need regular one on pcie
14:16karolherbst: sure, but if you want to use cuda, you have to play by the rules the vendor is giving you
14:16ryszard: I know
14:17HdkR: Palit has a passive GTX 1650 if you want the latest toys :P
14:17karolherbst: HdkR: how? :D
14:17karolherbst: that's still a 75W GPU, no?
14:17ryszard: even if I try to use ROCm then I have to get Ryzen platform (expensive) or get GFX9 GPU (which is expensive) :p
14:18HdkR: karolherbst: honkin huge heatsink
14:18HdkR: Way bigger than a regular PCIe slot
14:18karolherbst: ROCm is just a fancy OpenCL stack
14:18karolherbst: nobody uses their proprietary API
14:18ryszard: GT1030? when nVidia cuts off support for it ?
14:18karolherbst: ryszard: nobody will be able to answer this
14:18karolherbst: but it will take longer than the nvs295
14:19HdkR: Give it like five generations and it'll fall back to the "legacy" driver probably :)
14:19karolherbst: yeah.. probably
14:19HdkR: Then give it another 2-3 generations and it'll be deader than dead
14:20HdkR: and you can buy another $100 GPU at that point to replace it
14:20HdkR: the cycle continues
14:20karolherbst: if we assume the nvs295 falls out of support soon, you have like 12 years per gen
14:20ryszard: I see that gt1030 is only in 2GB version?
14:20karolherbst: give you around 10 years with the 1030
14:20karolherbst: ryszard: I guess so? why?
14:21HdkR: That's more ram than the 295 at least
14:21karolherbst: mhhh
14:21karolherbst: the biggest issue with the 1030 is that it's a gp108 though
14:22HdkR: If they want to run the proprietary stack for cuda then nothing nouveau related matters though
14:22karolherbst: but it's the newest lower power GPU at least
14:22karolherbst: HdkR: ohh, gp108 has no video accel
14:22karolherbst: even on nvidia
14:22karolherbst: not even decoding
14:22ryszard: radeon rx550 is 50W as I see
14:22HdkR: Yea I know, I have a turdy Razer with the thing
14:23ryszard: but no GPU computations possible
14:23ryszard: ehh
14:24ryszard: looks like every choice is bad :|
14:24karolherbst: yep
14:24HdkR: Oh right, firefox was a pre-req. People like hardware decoded video
14:24karolherbst: ;)
14:24HdkR: I'm too used to my 2990wx just not caring about cpu load
14:25HdkR: Just buy a Titan RTX and be set for a decade :P
14:26HdkR: underclock it to GT1030 perf levels, run it with the fans unplugged
14:26karolherbst: :D
14:26karolherbst: just detach the fan, the driver will underclock for you
14:26HdkR: perfect
14:27ryszard: hehe
14:27linkmauve: HdkR, but Firefox doesn’t do hardware decoding on Linux yet.
14:28linkmauve: There are patches floating for using vaapi on Wayland, but that’s only available for free drivers.
14:28HdkR: linkmauve: Neither does chrome sadly
14:28karolherbst: linkmauve: there is a vdpau to vaapi bridge
14:28linkmauve: karolherbst, but no Wayland.
14:28karolherbst: ehh, k, that might be
14:28linkmauve: (On proprietary Nvidia.)
14:29HdkR: So many problems
14:31ryszard: gt1030 has latest official driver support
14:31ryszard: but is also last on the list
14:31ryszard: then will be dropped soon as well
14:32karolherbst: legacy branches are maintained for quite some time though
14:34ryszard: I see that there is 1050Ti in reasonable price
14:35HdkR: karolherbst: Did the GT710 have video decode? I don't think i ever checked
14:35karolherbst: maybe?
14:35karolherbst: dunno
14:35karolherbst: but the gt710 is... stupid
14:35HdkR: It's such an ultra trash tier GPU that I never even booted with it
14:36karolherbst: ohh wait, 710 is kepler
14:36karolherbst: the 705 was fermi :D
14:36ryszard: gt710?
14:36ryszard: this is gpu on 1050Ti?
14:36HdkR: no
14:36karolherbst: fun fact, the gt 710 has more perf and consumes less power than the gt 705
14:36HdkR: hah
14:37HdkR: GT710 is less powerful than Tegra. It need not exist
14:37ryszard: but gt710 is out of support
14:37ryszard: only nouveau, then no cuda
14:38HdkR: I'm not suggesting it :)
14:38ryszard: heh
14:38HdkR: It just comes with random OEM PCs for no reason
14:38HdkR: It's like a laptop coming with an MX130
14:38HdkR: Or MX330
14:40karolherbst: uff
14:40karolherbst: we only have this stupid situation because Intel sucked
14:41HdkR: The 96EU laptop parts can't make it to market fast enough
14:42HdkR: We were stuck on 24EU parts for too long
14:42karolherbst: yeah.. because it wasn't possible to do so 5 years ago
14:42karolherbst: and if they started with their iris GPU, they totally messed up as well
14:42karolherbst: putting it only in their high end mobile chips
14:42karolherbst: so stupid
14:43HdkR: So the only company that shipped it was Apple
14:43karolherbst: in the beginning yes
14:43karolherbst: but they were forced to
14:43karolherbst: alternative would be crappy GPU or dual GPU
14:43karolherbst: I have no idea what they were thinking
14:44karolherbst: they could have prevented most of the dual GPU laptop market
14:44karolherbst: I am still angry at them for this :p
14:44linkmauve: Same. :|
14:44HdkR: "Surely normies don't need a powerful GPU, there's no market for this"
14:44karolherbst: ohh, I am sure most put dual GPUs in there for better external display support as intel sucked there as well, especially with HDMI
14:45karolherbst: and intel also messed up TB
14:45karolherbst: Intel just deserves that AMD overtook them
14:45karolherbst: :p
14:45HdkR: Lenovo just announced Ryzen laptops with Thunderbolt finally
14:45HdkR: Sadly the AMD models don't ship a 4k display D:
14:45linkmauve: What does that mean exactly?
14:46HdkR: linkmauve: Messing up thunderbolt?
14:46HdkR: Oh the woes
14:46karolherbst: ridiculous license fees
14:46karolherbst: for most, buying a desktop was cheaper than an eGPU case
14:47karolherbst: it's better now
14:47karolherbst: but back then: 400€ for a proper case
14:47karolherbst: yeah.. no way
14:47HdkR: addin cards on desktop are still a mess
14:48HdkR: requires motherboard to provide additional headers. Which each only ever provide one header, so you can't have more than one add in card
14:48karolherbst: oh well
14:49HdkR: Mac Pro surprisingly gets this right by just sticking a ton of ports in to the device
14:50HdkR: Hopefully the USB-IF sorts that dumpster fire out :)
14:50HdkR: Then calls it USB 4.0 2x2x2 or something dumb
14:54linkmauve: karolherbst, how much does one cost nowadays?
14:55karolherbst: linkmauve: I got one for 280€ or o
14:55karolherbst: *so
14:55linkmauve: I think I have Thunderbolt on my computer, so maybe I could avoid having to buy a 50€ desktop and then changing everything inside.
14:55karolherbst: has a 450W PSU
14:55karolherbst: well, it sucks with linux
14:56karolherbst: can't recommend
14:56karolherbst: first issue: kernel drivers are crashing on hotunplug :p
14:56karolherbst: maybe amdgpu survives these days
14:56linkmauve: (I might obtain a Sandy Bridge Xeon desktop for 50€, but it has a crappy workstation Nvidia GPU and the PSU will probably not be good enough for the AMD GPU I’ll replace it with.)
14:56karolherbst: but probably not
14:56linkmauve: (If I go that way.)
14:56HdkR: Absolute cheapest eGPU box I see is $190 and only has a 60w PSU :P
14:56karolherbst: HdkR: well.. pay $100 more
14:56linkmauve: Ok, so the desktop is still much cheaper than that.
14:57HdkR: Yea, $100 more and you have a large pick
14:57linkmauve: But probably as bothersome, since I would need to figure out streaming and stuff for gaming.
14:57karolherbst: HdkR: I have the Sonnet eGFX Breakaway Box 350
14:57karolherbst: ohh, only 350W?
14:57karolherbst: oh well
14:57karolherbst: HdkR: https://www.amazon.com/Sonnet-eGFX-Breakaway-550W-GPU-550W-TB3/dp/B0764J5QVD/ref=pd_sbs_147_t_0/135-6536509-2775008?_encoding=UTF8&pd_rd_i=B0764J5QVD&pd_rd_r=5600ef91-929c-445e-9994-f2b01f66dd45&pd_rd_w=lrK1g&pd_rd_wg=LeSub&pf_rd_p=5cfcfe89-300f-47d2-b1ad-a4e27203a02a&pf_rd_r=E9FZTK2ERXTD3RSGZR70&psc=1&refRID=E9FZTK2ERXTD3RSGZR70
14:57HdkR: sonnet has a 550w...yea
14:57HdkR: that one
14:58HdkR: :D
14:58karolherbst: jep
14:58karolherbst: that's probably the best perf/money if you really only care about the GPU
14:58karolherbst: others also have internal SATA and fancy shit
14:58karolherbst: ethernet...
14:58karolherbst: you name it
14:58HdkR: I still have my random razer one that doesn't have any additional ports
14:58karolherbst: ahh, but you pay $200 for the razer logo
14:58karolherbst: falls under "fancy shit" :p
14:58HdkR: It's terribly overpriced
14:59karolherbst: no shit :D
14:59HdkR: Save $60 and just get the Sonnet
15:00linkmauve: karolherbst, what do they use SATA and Ethernet for?
15:01HdkR: Obviously you won't just saturate 4 PCIe lanes with a puny GPU, stick some HDDs, USB 3.0 ports, Gigabit ethernet, and LEDs in that
15:01HdkR: (Copying a 4k render back over the line causes huge perf hits btw)
15:03karolherbst: linkmauve: HDDs
15:03karolherbst: so, you know, because your laptop only has a sucky 256GB SSD, you also need a 4TB HDD for all your games :p
15:03karolherbst: why have an external case if everything can be in your eGPU one
15:04linkmauve: Oh.
15:04karolherbst: and ethernet for obvious reasons :p
15:04karolherbst: the eGPU case is also like a full dock so to speak
15:04HdkR: That explains the Radeon Pro SSG </s>
15:04karolherbst: HdkR: well, you can also have external displays on the GPU
15:04karolherbst: then you even save the copies
15:05karolherbst: I mean.. it makes perfect sense
15:05HdkR: Yea, it works even
15:05karolherbst: instead of having a crappy dock only doing ethernet, power and maybe some display.. it also has a beefy GPU
15:05linkmauve: Ok, so I’m decided, I’ll do the copies over Ethernet instead of Thunderbolt. :p
15:06karolherbst: :p
15:06linkmauve: Sounds both cheaper and easier.
15:06karolherbst: yeah.. Intel's license policies on TB were.... stupid
15:06HdkR: no no no, what you want is to buy one of the PCIe 16x lane network switches
15:06karolherbst: I think it got better now
15:06karolherbst: HdkR: and put it in an eGPU case, right? :p
15:07karolherbst: there are awesome USB-C docks out there though
15:07HdkR: yea, eGPU goes out one end
15:07karolherbst: even.. light ones
15:07HdkR: Oh sorry, it's only 8x lanes https://www.dolphinics.com/products/IXS600.html
15:09karolherbst: and only gen3
15:09karolherbst: shame on you
15:09HdkR: Intel isn't shipping gen4 yet so nobody cares right? :)
15:09karolherbst: right
15:10karolherbst: well.. except,
15:10karolherbst: there are pcie 4 nvme drives already :p
15:10karolherbst: those are insane btw
15:10karolherbst: 5GB/s read/write...
15:14HdkR: Yea, they have some decent numbers
15:14HdkR: Just make sure to cool them properly otherwise the controller hard crashes and you need to power cycle it :D
15:14imirkin_: just read a bit of scrollback ... NVS295 didn't have a fan, so that makes it low power enough.
15:15imirkin_: and the best nvidia card to buy is an AMD one. hopefully that was covered.
15:15karolherbst: imirkin_: yeah.. but one of the req was CUDA support.. so
15:16imirkin_: HdkR: GT 710 needs to exist, otherwise what card would i be able to afford to hack on nouveau?
15:16HdkR: You've got me there
15:17imirkin_: i'm sure nvidia was thinking of me
15:17karolherbst: of cours
15:17karolherbst: e
15:17imirkin_: shitty enough to end up in random dells, imirkin will order dells, and grab the free card
15:18imirkin_: seems like the logic is simple enough :)
15:18HdkR: Now I imagine you hoarding shelves of these cards
15:19imirkin_: i just grab one of each kind
15:19imirkin_: so just the one shelf
15:21imirkin_: HdkR: i think the list is NV5, NV17, NV34 (x2), NV42, NV44, NV4A (PCI!), G84 (x2), G96, G98, GT215, GF108 (x2), GK208, GM107, GP108
15:21HdkR: Oh, you did finally get your hands on a GP108? Nice
15:21imirkin_: dell!
15:22imirkin_: the GK208, GM107 and GP108 are all from dell
15:22imirkin_: "extras" as it were
15:22HdkR: Discarded bonus hardware is always good stuff
15:22imirkin_: GT730, GTX745, and GT1030, respectively
15:23imirkin_: not exactly the cream of the crop
15:23karolherbst: I think the only real PCIe cards I own are the NV4x ones :D
15:24imirkin_: one of my nv34's is PCIe, the other is PCI
15:24imirkin_: but it's through that PCIe <-> PCI bridge
15:24imirkin_: so MSI thinks it works, but doesn't
15:24karolherbst: ufff
15:25imirkin_: oh, and i forgot the G92 in that list. oops.
15:45imirkin_: i keep wanting to get a Quadro K4000 (GK104), but can't find them for less than like $75 on ebay =/
15:46imirkin_: and of course right as i say that, there's one for $46 expiring in 2h
15:46imirkin_: tempting.
15:48imirkin_: very very tempting.
15:49imirkin_: karolherbst: have you tried running CTS again?
15:49imirkin_: karolherbst: and did you figure out the fermi issue?
15:52karolherbst: I am looking into the fermi stuff
15:52karolherbst: now
15:53imirkin_: ah ok
15:53karolherbst: sadly I had random other things to do, so I haven't really found any time for the CTS
15:53imirkin_: yeah ok
15:53imirkin_: i may plug the GM107 in, in case the memory issue i hit is pascal-specific somehow
15:53imirkin_: (since it does have a different VM)
15:54karolherbst: yeah
15:55imirkin_: (and it's a lot easier to blame the VM than it is to blame myself...)
16:18karolherbst: heh.. weird.. now that I test on the fermi it works.. but I also have a 5.5 kernel running not nouveau master
16:18karolherbst: but it's also getting late.. tomorrow then
16:25imirkin_: reasonable.
18:03karolherbst: imirkin_: dunno if you saw (or if I even asked), but do you know if we can do indirect ubo accesses on the index level in compute shaders without having to fall back to global memory?
18:03karolherbst: like if we are sure that we only have 4 ubos bound
18:03karolherbst: or well.. only 4 cbs filled with data
18:03karolherbst: or is this more of a robustness issue where an access could overflow
18:04imirkin_: sure
18:04karolherbst: (but then we have the same issue in graphics as well)
18:04imirkin_: but if you have to support 16, then the indirect could span the physical cb's and gmem cb's
18:04karolherbst: right
18:04karolherbst: but normally we know how wide the bound ubo is, or is that not clear in glsl?
18:04karolherbst: or not always
18:06karolherbst: anyway, for CL it matters even less as you can't indirectly access a const buffer, just indirect offset is possible
18:07karolherbst: so I was wondering if spending some time on making it less painful for compute shaders would make sense as well
18:09karolherbst: and eg, if compute kernel only h 6 constant args, we can be sure that 5 is the highest ubo index accessed
18:09karolherbst: *has
18:09imirkin_: it's clear in glsl, but it might still span the boundary
18:10imirkin_: and iirc we do use the physical ubo's if they fit
18:10imirkin_: but i don't fully remember
18:10imirkin_: i don't think indirect ubo accesses are too common though
18:11karolherbst: not for indirect access on the index I think
18:11imirkin_: right
18:11imirkin_: that's what i meant by indirect
18:11karolherbst: yep.. (fileIndex >= 6 || ind)
18:11karolherbst: ahh
18:11karolherbst: imirkin_: well in CL it's always indirect
18:11imirkin_: 7 is taken
18:11karolherbst: or at least ... how we do stuff in clover
18:12karolherbst: it's stupid because the first arg is at index 0 anyway
18:12karolherbst: maybe I just make use of this "ABI"
18:12karolherbst: and only pass in the offset
18:12imirkin_: so anyways, with the indirect, i guess we could optimize it further by knowing the range (which TGSI gives us)
18:12imirkin_: (i think, at least)
18:12imirkin_: maybe not. it's definitely there in glsl
18:13karolherbst: yeah.. but I decided that clover is stupid and I just get rid of the indirect instead
18:13karolherbst: less painful anyway
18:13karolherbst: just pass in the offset and the first arg is c1, the second is c2, etc...
18:13karolherbst: solves the issue in a less painful way
18:13karolherbst: but yeah.. it would still be an interesting opt for compute shaders
18:13karolherbst: although.. maybe super uncommon
18:14imirkin_: yeah, i don't remember seeing a practical application that made use of indirect UBO's
18:14imirkin_: not saying they don't exist, but ...
18:16karolherbst: well.. easy to check
18:20karolherbst: well.. at least nothing in the shaders we've got is doing that
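[editor's sketch] The check discussed above can be illustrated roughly as follows. This is not the nouveau code: the threshold of 6 is taken only from the condition quoted in the chat ("fileIndex >= 6 || ind"), and the function names and the max_index parameter are hypothetical. The second function shows the kind of range-based refinement being considered, assuming the highest index an indirect access can reach is known.

```c
#include <stdbool.h>

/* Taken from the quoted condition "fileIndex >= 6 || ind"; an assumption here. */
#define NUM_DIRECT_CBS 6u

/* Check in the spirit of the quoted condition: any indirect index, or a
 * constant index of 6 or more, goes through global memory. */
static bool needs_gmem_fallback(unsigned file_index, bool indirect_index)
{
    return file_index >= NUM_DIRECT_CBS || indirect_index;
}

/* Hypothetical refinement: if the highest buffer index an indirect access
 * can reach (max_index) is known, the access can stay on the hardware
 * constant buffers as long as that index is below the threshold. */
static bool needs_gmem_fallback_ranged(unsigned file_index, bool indirect_index,
                                       unsigned max_index)
{
    if (!indirect_index)
        return file_index >= NUM_DIRECT_CBS;
    return max_index >= NUM_DIRECT_CBS;
}
```

With six constant kernel args the highest index is 5, so the ranged variant would keep an indirect access on the constant buffers, while the plain check would always fall back.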
18:21karolherbst: but uhm.. the code looks weird
18:21karolherbst: imirkin_: this looks wrong, doesn't it? https://cgit.freedesktop.org/mesa/mesa/tree/src/gallium/drivers/nouveau/codegen/nv50_ir_lowering_nvc0.cpp#n2732
18:22karolherbst: "i->getIndirect(0, 0)" is used as an operand in a 32 bit and 64 bit add
18:22imirkin_: mmmmm yeah, that's questionable.
18:23imirkin_: chances are good that it works out
18:23imirkin_: since 64-bit adds get split
18:23imirkin_: and iirc the split handles this
18:24karolherbst: mhh
18:24karolherbst: add u64 %r93d %r90d %r82
18:24imirkin_: that's pre-split
18:24imirkin_: split happens super-duper late, kinda annoying actually
18:24imirkin_: almost during RA
18:24imirkin_: or even after
18:24karolherbst: yeah.. it seems okay after RA
18:25imirkin_: split64bitsomething
18:25karolherbst: high bits: add u32 $r3 $r3 $r255 $c0
18:25imirkin_: right
18:26imirkin_: $c0 = carry bit
18:26karolherbst: yep
18:26imirkin_: and with a bit of luck we even emit that correctly :)
18:26karolherbst: well, we emit a 0 as the high bits of the second operand
18:26karolherbst: so.. this part is fine
18:26karolherbst: I guess there is some magic somewhere
18:26imirkin_: search for "split64Bit"
18:27imirkin_: in build_util.cpp
18:27karolherbst: yeah..
18:28karolherbst: mhhh
18:29karolherbst: if (lo->getSrc(s)->reg.size < 8)
18:30karolherbst: so yeah.. we handle that
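[editor's sketch] The split described above (a 64-bit add with one 32-bit operand, lowered to two 32-bit adds linked by a carry) can be illustrated in plain C. This is only a model of the arithmetic, not the split64BitOpPostRA code; the struct and function names are hypothetical. The narrow operand is treated as zero-extended, which matches the "we emit a 0 as the high bits of the second operand" observation.

```c
#include <stdint.h>

/* Result of a 64-bit add carried out as two 32-bit adds. */
struct split_add {
    uint32_t lo;
    uint32_t hi;
};

/* Hypothetical helper: add a 32-bit value b to a 64-bit value held as two
 * 32-bit halves (a_lo, a_hi). b is zero-extended, so the high half only
 * receives 0 plus the carry out of the low add. */
static struct split_add add_u64_u32_split(uint32_t a_lo, uint32_t a_hi, uint32_t b)
{
    struct split_add r;
    uint32_t carry;

    r.lo = a_lo + b;          /* low add, wraps modulo 2^32 */
    carry = r.lo < a_lo;      /* 1 if the low add overflowed, else 0 */
    r.hi = a_hi + 0u + carry; /* high add: zero operand plus carry bit */
    return r;
}
```

The high half here corresponds to the "add u32 $r3 $r3 $r255 $c0" line quoted above, with the zero register standing in for the missing upper bits.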
19:17imirkin_: so that K4000 still went for like $72, including shipping. too rich for me.
19:18karolherbst: :/
21:32karolherbst: imirkin_: any idea on how to properly support user const buffers inside nve4_compute_validate_constbufs? Without uploading data, that is, as the size of the buffer is not known
21:33karolherbst: I am sure that's not possible, even hardware wise, but maybe you have an idea?
21:42imirkin_: could you use a persistent buffer instead?
21:47imirkin_: what's the CL side of the API?
21:53karolherbst: imirkin_: SVM
21:53karolherbst: clSetKernelArgSVMPointer to be precise
21:53karolherbst: which you can do on a constant* kernel parameter
21:54karolherbst: and this is basically what gives me the biggest headache overall right now
21:54imirkin_: what object is an SVMPointer?
21:54imirkin_: is it a nouveau_bo?
21:54karolherbst: malloced memory
21:54imirkin_: but then that has to get upgraded/pinned somehow, right?
21:54imirkin_: so that it's mirrored?
21:54karolherbst: no
21:55karolherbst: that's the HMM stuff
21:55karolherbst: since pascal the compute context is able to do recoverable page faults
21:55karolherbst: so some engine faults, the kernel migrates the memory pages to VRAM and then continues
21:55karolherbst: but in the end you just get a pointer pointing to valid memory within the application context
21:56karolherbst: and GPU, as with HMM there is just one virtual memory across the CPU and GPU essentially
21:56karolherbst: and then it gets annoying if you have CL code like this: https://gist.githubusercontent.com/karolherbst/94facaf19aa02afc87754ec4d882f2c0/raw/9e2ae982f0f7dd624dd43f3f9154fec96d1986a1/tmp.cl
21:56imirkin_: so it's _basically_ a coherent mapping
21:57karolherbst: yeah
21:57karolherbst: just reverse
21:57imirkin_: it needs to be tracked in some structure
21:57imirkin_: and you just feed that pointer to the ubo address, right?
21:57imirkin_: er, in the descriptor
21:57karolherbst: well....
21:57karolherbst: here's the thing, nvidia's CL implementation doesn't use ubos
21:57imirkin_: right, coz caching this through the ubo machinery would be bad
21:57karolherbst: maybe
21:58imirkin_: you'd spend more time flushing than caching
21:58karolherbst: I don't think you have to
21:58karolherbst: you just flush once you start the kernel
21:58imirkin_: ubo updates have to go through a carefully controlled thing
21:58karolherbst: the spec doesn't mandate anything here
21:58imirkin_: we flush all the time for coherent too
21:58karolherbst: right, but it doesn't matter for CL as kernel could run for minutes or hours as well
21:58karolherbst: or at least way longer than compute shaders would
21:59imirkin_: sucks for progress bars :p
21:59karolherbst: well
22:01karolherbst: imirkin_: the issue is just, that inside the spirv a constant indirectly accessed array looks the same as a constant* kernel arg :/
22:01karolherbst: it's all very annoying
22:02karolherbst: what if you have something like int constant* tmp = some_cond ? &constant_arg[id] : &constant_arr[id];
22:02imirkin_: then you're in for some fat pointers
22:02karolherbst: yep
22:03imirkin_: mmmm bacon
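[editor's sketch] A "fat pointer" in the sense used above carries a tag saying which memory it points into, so a load can be dispatched at run time when the compiler cannot tell the two cases apart. The following is a host-side model only, under the assumption that a constant* kernel arg lives in global memory and an in-kernel constant array lives in a constant buffer; all names are hypothetical and none of this is clover or nouveau code.

```c
#include <stdint.h>

/* Which memory a pointer refers to. */
enum mem_space { SPACE_CBUF, SPACE_GMEM };

/* Fat pointer: an address-space tag plus an element offset. */
struct fat_ptr {
    enum mem_space space;
    uint32_t offset;
};

/* Stand-ins for the two memories a kernel could read from. */
struct memories {
    const int32_t *cbuf; /* models the in-kernel constant array */
    const int32_t *gmem; /* models the constant* kernel argument */
};

/* Models: some_cond ? &constant_arg[id] : &constant_arr[id] */
static struct fat_ptr make_fat_ptr(int some_cond, uint32_t id)
{
    struct fat_ptr p;
    p.space = some_cond ? SPACE_GMEM : SPACE_CBUF;
    p.offset = id;
    return p;
}

/* The load has to branch on the tag because the space is only known at run time. */
static int32_t fat_load(const struct memories *m, struct fat_ptr p)
{
    return p.space == SPACE_CBUF ? m->cbuf[p.offset] : m->gmem[p.offset];
}
```

The cost is visible in fat_load: every dereference pays for a tag check, which is why getting rid of the ambiguity is the preferred route.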
22:08karolherbst: mhhh, I could check what nvidia is doing actually...
22:27karolherbst: imirkin_: fun... what nvidia does:
22:27karolherbst: IADD32I R4, R3, 0x30000 ;
22:27karolherbst: LDC.IL R3, c[0x0][R4] ;
22:27karolherbst: :D
22:27karolherbst: but that's for an in kernel constant
22:27karolherbst: and with my indirection they also do two cbs..
22:28karolherbst: which is weird as I have no idea how that works with SVM
22:29karolherbst: mhh, but I see nvidia using g[] for constant* args as well...
22:29imirkin_: i forget what IL is (or never knew)
22:29imirkin_: IS is the cross-constbuf one
22:29karolherbst: well...
22:29karolherbst: apparently they do cross const buffer shit there
22:29karolherbst: as the constant is in cb3
22:30karolherbst: not cb0
22:30karolherbst: ohh wait, I need to enable SVM still
22:52karolherbst: nice.. nvidia returns CL_INVALID_ARG_VALUE if I call clSetKernelArgSVMPointer on a constant* parameter :/
22:52karolherbst: ahhhhh
22:54karolherbst: but I was sure that worked in the past