09:38 josey: For a long time now I have been getting the following messages on boot about my graphics card: "nouveau 0000:01:00.0: bus: MMIO read of 00000000 FAULT at 3e6684 [ IBUS ]" and "nouveau 0000:01:00.0: bus: MMIO read of 00000000 FAULT at 10ac08 [ IBUS ]". Is that normal?
09:38 josey: My card is an Nvidia GM107 [GeForce GTX 750 Ti]
11:10 RSpliet: josey: I believe these warnings should be harmless.
11:11 RSpliet: Someone should make sure these "MMIO reads" to non-existing locations no longer occur on graphics cards like yours, but they're not a source of erroneous behaviour as far as I know.
11:13 josey: Thank you RSpliet :)
13:56 ryszard: hey, which nvidia card should I take to get the best nouveau support on a low-powered card similar to the nvs295?
13:58 karolherbst: ryszard: define "best support"
14:00 karolherbst: we can't even promise to fix any bug, because of the randomness and lack of documentation
14:00 ryszard: I'm looking for a card similar to the nvs295 with the fewest missing functionalities/hangs and the most acceleration
14:01 karolherbst: all of that is too random to be able to make a definite statement in terms of reliability
14:02 ryszard: I don't play games, I need acceleration only, e.g. in firefox or gnome3
14:02 karolherbst: I mean, from a functionality point of view every more or less modern GPU is fine (except Turing)
14:02 linkmauve: ryszard, why not go for Intel or AMD instead?
14:03 ryszard: I use AMD AM3/AM3+ only, so no Intel is possible, and AMD cards are all power hungry as far as I can see
14:03 karolherbst: ryszard: that's not true
14:03 ryszard: nvs295 is good example of lowpower card
14:04 karolherbst: and NVS 295 is not a low power card
14:04 karolherbst: it's old and power inefficient compared to any new card
14:04 ryszard: but is passive cooled
14:05 karolherbst: so?
14:05 karolherbst: it will still use more power when idling than a 2080 ti
14:06 karolherbst: probably
14:06 ryszard: hmm
14:07 ryszard: if I buy a newer GPU with official driver support, which one should I buy to use CUDA on it?
14:07 HdkR: 2080ti uses ~17w idling according to reviewers :P
14:08 karolherbst: HdkR: yeah well.. the 295 will be constantly at full load anyway with a modern desktop
14:08 karolherbst: so...
14:08 HdkR: yea, bad times
14:08 karolherbst: my point was just, that old GPUs because of the process are less efficient in terms of perf/W
14:09 karolherbst: and just having a 23W TDP GPU doesn't mean you save power compared to a 180W TDP one as long as the avg power consumption is higher on the older one
14:09 HdkR: Definitely
14:09 karolherbst: and the 295 is quite slow, so it's not even fun to use it
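The TDP-versus-average-draw point above can be put into numbers with a rough sketch. It assumes, as simplifications taken from the remarks in the log, that the NVS 295 sits at its 23 W TDP the whole time and that the 2080 Ti stays at the roughly 17 W idle figure quoted by reviewers; the 8-hour session length and the `energy_wh` helper are illustrative assumptions, not measurements.

```c
/* Energy in watt-hours for a constant average draw over a number of hours.
 * Hypothetical helper for illustration only. */
static double energy_wh(double avg_watts, double hours)
{
    return avg_watts * hours;
}

/* Assumed figures: 23 W = NVS 295 pinned at its TDP,
 * 17 W = 2080 Ti idle draw as quoted in the log. */
static double nvs295_session_wh(double hours)
{
    return energy_wh(23.0, hours);
}

static double rtx2080ti_idle_session_wh(double hours)
{
    return energy_wh(17.0, hours);
}
```

Under those assumptions an 8-hour desktop session comes to 184 Wh on the old card against 136 Wh on the new one, which is the sense in which a lower TDP alone does not guarantee lower consumption.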
14:09 linkmauve: Also, you won’t get CUDA support.
14:09 karolherbst: ryszard: the question is rather do you want to use CUDA or not
14:10 ryszard: ok, but which should I get to be able to learn and use CUDA on it?
14:10 karolherbst: linkmauve: the 295 supports CUDA
14:10 karolherbst: all tesla ones do I think
14:10 linkmauve: karolherbst, but not Nouveau, right?
14:10 ryszard: I'm not a professional programmer, I wish to learn to use CUDA, so no bleeding edge CUDA version support is needed
14:10 karolherbst: well, you won't get CUDA with any GPU when using Nouveau :p
14:10 ryszard: ok
14:11 ryszard: how long does nvidia provide official support for their cards?
14:11 karolherbst: depends on how much you pay I'd say
14:11 karolherbst: but tesla is out of support already
14:12 ryszard: great ;p
14:12 karolherbst: and fermi is probably next
14:12 karolherbst: https://www.nvidia.com/en-us/drivers/unix/
14:12 ryszard: do they provide any table with EOLs ?
14:12 karolherbst: mhhh
14:13 karolherbst: well, I couldn't talk about it anyway
14:13 karolherbst: ohh actually, the 340 branch supports the nvs 295, but...
14:14 karolherbst: the 304 has already been out of support for two years
14:14 karolherbst: so you can make your own assumptions from that
14:14 ryszard: yeah, I know
14:14 karolherbst: 390 doesn't support the nvs 295
14:14 ryszard: it sucks tbh
14:14 karolherbst: well, old hardware also sucks :p
14:15 karolherbst: ryszard: why not a GT 1030?
14:15 karolherbst: there are some passive ones as well
14:16 ryszard: but I have iGPU on AM3 mb Radeon HD4200 which has still great support and provides excellent performance on gnome3 and firefox - then it is possible to support old hardware very well
14:16 ryszard: the case I'm talking about is another machine on AM3+ with AMD FX
14:16 ryszard: where I don't have iGPU then need regular one on pcie
14:16 karolherbst: sure, but if you want to use cuda, you have to play by the rules the vendor is giving you
14:16 ryszard: I know
14:17 HdkR: Palit has a passive GTX 1650 if you want the latest toys :P
14:17 karolherbst: HdkR: how? :D
14:17 karolherbst: that's still a 75W GPU, no?
14:17 ryszard: even if I try to use ROCm then I have to get Ryzen platform (expensive) or get GFX9 GPU (which is expensive) :p
14:18 HdkR: karolherbst: honkin huge heatsink
14:18 HdkR: Way bigger than a regular PCIe slot
14:18 karolherbst: ROCm is just a fancy OpenCL stack
14:18 karolherbst: nobody uses their proprietary API
14:18 ryszard: GT1030? when nVidia cuts off support for it ?
14:18 karolherbst: ryszard: nobody will be able to answer this
14:18 karolherbst: but it will take longer than the nvs295
14:19 HdkR: Give it like five generations and it'll fall back to the "legacy" driver probably :)
14:19 karolherbst: yeah.. probably
14:19 HdkR: Then give it another 2-3 generations and it'll be deader than dead
14:20 HdkR: and you can buy another $100 GPU at that point to replace it
14:20 HdkR: the cycle continues
14:20 karolherbst: if we assume the nvs295 falls out of support soon, you have like 12 years per gen
14:20 ryszard: I see that gt1030 is only in 2GB version?
14:20 karolherbst: give you around 10 years with the 1030
14:20 karolherbst: ryszard: I guess so? why?
14:21 HdkR: That's more ram than the 295 at least
14:21 karolherbst: mhhh
14:21 karolherbst: the biggest issue with the 1030 is that it's a gp108 though
14:22 HdkR: If they want to run the proprietary stack for cuda then nothing nouveau related matters though
14:22 karolherbst: but it's the newest lower power GPU at least
14:22 karolherbst: HdkR: ohh, gp108 has no video accel
14:22 karolherbst: even on nvidia
14:22 karolherbst: not even decoding
14:22 ryszard: radeon rx550 is 50W as I see
14:22 HdkR: Yea I know, I have a turdy Razer with the thing
14:23 ryszard: but no GPU computations possible
14:23 ryszard: ehh
14:24 ryszard: looks like every choice is bad :|
14:24 karolherbst: yep
14:24 HdkR: Oh right, firefox was a pre-req. People like hardware decoded video
14:24 karolherbst: ;)
14:24 HdkR: I'm too used to my 2990wx just not caring about cpu load
14:25 HdkR: Just buy a Titan RTX and be set for a decade :P
14:26 HdkR: underclock it to GT1030 perf levels, run it with the fans unplugged
14:26 karolherbst: :D
14:26 karolherbst: just detach the fan, the driver will underclock for you
14:26 HdkR: perfect
14:27 ryszard: hehe
14:27 linkmauve: HdkR, but Firefox doesn’t do hardware decoding on Linux yet.
14:28 linkmauve: There are patches floating for using vaapi on Wayland, but that’s only available for free drivers.
14:28 HdkR: linkmauve: Neither does chrome sadly
14:28 karolherbst: linkmauve: there is a vdpau to vaapi bridge
14:28 linkmauve: karolherbst, but no Wayland.
14:28 karolherbst: ehh, k, that might be
14:28 linkmauve: (On proprietary Nvidia.)
14:29 HdkR: So many problems
14:31 ryszard: gt1030 has latest official driver support
14:31 ryszard: but is also last on the list
14:31 ryszard: then will be dropped soon as well
14:32 karolherbst: legacy branches are maintained for quite some time though
14:34 ryszard: I see that there is 1050Ti in reasonable price
14:35 HdkR: karolherbst: Did the GT710 have video decode? I don't think i ever checked
14:35 karolherbst: maybe?
14:35 karolherbst: dunno
14:35 karolherbst: but the gt710 is... stupid
14:35 HdkR: It's such an ultra trash tier GPU that I never even booted with it
14:36 karolherbst: ohh wait, 710 is kepler
14:36 karolherbst: the 705 was fermi :D
14:36 ryszard: gt710?
14:36 ryszard: this is gpu on 1050Ti?
14:36 HdkR: no
14:36 karolherbst: fun fact, the gt 710 has more perf and consumes less power than the gt 705
14:36 HdkR: hah
14:37 HdkR: GT710 is less powerful than Tegra. It need not exist
14:37 ryszard: but gt710 is out of support
14:37 ryszard: only nouveau, then no cuda
14:38 HdkR: I'm not suggesting it :)
14:38 ryszard: heh
14:38 HdkR: It just comes with random OEM PCs for no reason
14:38 HdkR: It's like a laptop coming with an MX130
14:38 HdkR: Or MX330
14:40 karolherbst: uff
14:40 karolherbst: we only have this stupid situation because Intel sucked
14:41 HdkR: The 96EU laptop parts can't make it to market fast enough
14:42 HdkR: We were stuck on 24EU parts for too long
14:42 karolherbst: yeah.. because it wasn't possible to do so 5 years ago
14:42 karolherbst: and if they started with their iris GPU, they totally messed up as well
14:42 karolherbst: putting it only in their high end mobile chips
14:42 karolherbst: so stupid
14:43 HdkR: So the only company that shipped it was Apple
14:43 karolherbst: in the beginning yes
14:43 karolherbst: but they were forced to
14:43 karolherbst: alternative would be crappy GPU or dual GPU
14:43 karolherbst: I have no idea what they were thinking
14:44 karolherbst: they could have prevented most of the dual GPU laptop market
14:44 karolherbst: I am still angry at them for this :p
14:44 linkmauve: Same. :|
14:44 HdkR: "Surely normies don't need a powerful GPU, there's no market for this"
14:44 karolherbst: ohh, I am sure most put dual GPUs in there for better external display support as intel sucked there as well, especially with HDMI
14:45 karolherbst: and intel also messed up TB
14:45 karolherbst: Intel just deserves that AMD overtook them
14:45 karolherbst: :p
14:45 HdkR: Lenovo just announced Ryzen laptops with Thunderbolt finally
14:45 HdkR: Sadly the AMD models don't ship a 4k display D:
14:45 linkmauve: What does that mean exactly?
14:46 HdkR: linkmauve: Messing up thunderbolt?
14:46 HdkR: Oh the woes
14:46 karolherbst: ridiculous license fees
14:46 karolherbst: for most, buying a desktop was cheaper than an eGPU case
14:47 karolherbst: it's better now
14:47 karolherbst: but back then: 400€ for a proper case
14:47 karolherbst: yeah.. no way
14:47 HdkR: addin cards on desktop are still a mess
14:48 HdkR: requires motherboard to provide additional headers. Which each only ever provide one header, so you can't have more than one add in card
14:48 karolherbst: oh well
14:49 HdkR: Mac Pro surprisingly gets this right by just sticking a ton of ports in to the device
14:50 HdkR: Hopefully the USB-IF sorts that dumpster fire out :)
14:50 HdkR: Then calls it USB 4.0 2x2x2 or something dumb
14:54 linkmauve: karolherbst, how much does one cost nowadays?
14:55 karolherbst: linkmauve: I got one for 280€ or so
14:55 linkmauve: I think I have Thunderbolt on my computer, so maybe I could avoid having to buy a 50€ desktop and then changing everything inside.
14:55 karolherbst: has a 450W PSU
14:55 karolherbst: well, it sucks with linux
14:56 karolherbst: can't recommend
14:56 karolherbst: first issue: kernel drivers are crashing on hotunplug :p
14:56 karolherbst: maybe amdgpu survives these days
14:56 linkmauve: (I might obtain a Sandy Bridge Xeon desktop for 50€, but it has a crappy workstation Nvidia GPU and the PSU will probably not be good enough for the AMD GPU I’ll replace it with.)
14:56 karolherbst: but probably not
14:56 linkmauve: (If I go that way.)
14:56 HdkR: Absolute cheapest eGPU box I see is $190 and only has a 60w PSU :P
14:56 karolherbst: HdkR: well.. pay $100 more
14:56 linkmauve: Ok, so the desktop is still much cheaper than that.
14:57 HdkR: Yea, $100 more and you have a large pick
14:57 linkmauve: But probably as bothersome, since I would need to figure out streaming and stuff for gaming.
14:57 karolherbst: HdkR: I have the Sonnet eGFX Breakaway Box 350
14:57 karolherbst: ohh, only 350W?
14:57 karolherbst: oh well
14:57 karolherbst: HdkR: https://www.amazon.com/Sonnet-eGFX-Breakaway-550W-GPU-550W-TB3/dp/B0764J5QVD/ref=pd_sbs_147_t_0/135-6536509-2775008?_encoding=UTF8&pd_rd_i=B0764J5QVD&pd_rd_r=5600ef91-929c-445e-9994-f2b01f66dd45&pd_rd_w=lrK1g&pd_rd_wg=LeSub&pf_rd_p=5cfcfe89-300f-47d2-b1ad-a4e27203a02a&pf_rd_r=E9FZTK2ERXTD3RSGZR70&psc=1&refRID=E9FZTK2ERXTD3RSGZR70
14:57 HdkR: Sonnet has a 550w...yea
14:57 HdkR: that one
14:58 HdkR: :D
14:58 karolherbst: jep
14:58 karolherbst: that's probably the best perf/money if you really only care about the GPU
14:58 karolherbst: others also have internal SATA and fancy shit
14:58 karolherbst: ethernet...
14:58 karolherbst: you name it
14:58 HdkR: I still have my random razer one that doesn't have any additional ports
14:58 karolherbst: ahh, but you pay $200 for the Razer logo
14:58 karolherbst: falls under "fancy shit" :p
14:58 HdkR: It's terribly overpriced
14:59 karolherbst: no shit :D
14:59 HdkR: Save $60 and just get the Sonnet
15:00 linkmauve: karolherbst, what do they use SATA and Ethernet for?
15:01 HdkR: Obviously you won't just saturate 4 PCIe lanes with a puny GPU, stick some HDDs, USB 3.0 ports, Gigabit ethernet, and LEDs in that
15:01 HdkR: (Copying a 4k render back over the line causes huge perf hits btw)
15:03 karolherbst: linkmauve: HDDs
15:03 karolherbst: so, you know, because your laptop only has a sucky 256GB SSD, you also need a 4TB HDD for all your games :p
15:03 karolherbst: why having an external case if everything can be in your eGPU one
15:04 linkmauve: Oh.
15:04 karolherbst: and ethernet for obvious reasons :p
15:04 karolherbst: the eGPU case is also like a full dock so to speak
15:04 HdkR: That explains the Radeon Pro SSG </s>
15:04 karolherbst: HdkR: well, you can also have external displays on the GPU
15:04 karolherbst: then you even save the copies
15:05 karolherbst: I mean.. it makes perfect sense
15:05 HdkR: Yea, it works even
15:05 karolherbst: instead of having a crappy dock only doing ethernet, power and maybe some display.. it also has a beefy GPU
15:05 linkmauve: Ok, so I’m decided, I’ll do the copies over Ethernet instead of Thunderbolt. :p
15:06 karolherbst: :p
15:06 linkmauve: Sounds both cheaper and easier.
15:06 karolherbst: yeah.. Intel's license policies on TB were.... stupid
15:06 HdkR: no no no, what you want is to buy one of the PCIe 16x lane network switches
15:06 karolherbst: I think it got better now
15:06 karolherbst: HdkR: and put it in an eGPU case, right? :p
15:07 karolherbst: there are awesome USB-C docks out there though
15:07 HdkR: yea, eGPU goes out one end
15:07 karolherbst: even.. light ones
15:07 HdkR: Oh sorry, it's only 8x lanes https://www.dolphinics.com/products/IXS600.html
15:09 karolherbst: and only gen3
15:09 karolherbst: shame on you
15:09 HdkR: Intel isn't shipping gen4 yet so nobody cares right? :)
15:09 karolherbst: right
15:10 karolherbst: well.. except,
15:10 karolherbst: there are pcie 4 nvme drives already :p
15:10 karolherbst: those are insane btw
15:10 karolherbst: 5GB/s read/write...
15:14 HdkR: Yea, they have some decent numbers
15:14 HdkR: Just make sure to cool them properly otherwise the controller hard crashes and you need to power cycle it :D
15:14 imirkin_: just read a bit of scrollback ... NVS295 didn't have a fan, so that makes it low power enough.
15:15 imirkin_: and the best nvidia card to buy is an AMD one. hopefully that was covered.
15:15 karolherbst: imirkin_: yeah.. but one of the req was CUDA support.. so
15:16 imirkin_: HdkR: GT 710 needs to exist, otherwise what card would i be able to afford to hack on nouveau?
15:16 HdkR: You've got me there
15:17 imirkin_: i'm sure nvidia was thinking of me
15:17 karolherbst: of course
15:17 imirkin_: shitty enough to end up in random dells, imirkin will order dells, and grab the free card
15:18 imirkin_: seems like the logic is simple enough :)
15:18 HdkR: Now I imagine you hoarding shelves of these cards
15:19 imirkin_: i just grab one of each kind
15:19 imirkin_: so just the one shelf
15:21 imirkin_: HdkR: i think the list is NV5, NV17, NV34 (x2), NV42, NV44, NV4A (PCI!), G84 (x2), G96, G98, GT215, GF108 (x2), GK208, GM107, GP108
15:21 HdkR: Oh, you did finally get your hands on a GP108? Nice
15:21 imirkin_: dell!
15:22 imirkin_: the GK208, GM107 and GP108 are all from dell
15:22 imirkin_: "extras" as it were
15:22 HdkR: Discarded bonus hardware is always good stuff
15:22 imirkin_: GT730, GTX745, and GT1030, respectively
15:23 imirkin_: not exactly the cream of the crop
15:23 karolherbst: I think the only real PCIe cards I own are the NV4x ones :D
15:24 imirkin_: one of my nv34's is PCIe, the other is PCI
15:24 imirkin_: but it's through that PCIe <-> PCI bridge
15:24 imirkin_: so MSI thinks it works, but doesn't
15:24 karolherbst: ufff
15:25 imirkin_: oh, and i forgot the G92 in that list. oops.
15:45 imirkin_: i keep wanting to get a Quadro K4000 (GK104), but can't find them for less than like $75 on ebay =/
15:46 imirkin_: and of course right as i say that, there's one for $46 expiring in 2h
15:46 imirkin_: tempting.
15:48 imirkin_: very very tempting.
15:49 imirkin_: karolherbst: have you tried running CTS again?
15:49 imirkin_: karolherbst: and did you figure out the fermi issue?
15:52 karolherbst: I am looking into the fermi stuff
15:52 karolherbst: now
15:53 imirkin_: ah ok
15:53 karolherbst: sadly I had random other things to do, so I haven't really found any time for the CTS
15:53 imirkin_: yeah ok
15:53 imirkin_: i may plug the GM107 in, in case the memory issue i hit is pascal-specific somehow
15:53 imirkin_: (since it does have a different VM)
15:54 karolherbst: yeah
15:55 imirkin_: (and it's a lot easier to blame the VM than it is to blame myself...)
16:18 karolherbst: heh.. weird.. now that I test on the fermi it works.. but I also have a 5.5 kernel running not nouveau master
16:18 karolherbst: but it's also getting late.. tomorrow then
16:25 imirkin_: reasonable.
18:03 karolherbst: imirkin_: dunno if you saw (or if I even asked), but do you know if we can do indirect ubo accesses on the index level in compute shaders without having to fall back to global memory?
18:03 karolherbst: like if we are sure that we only have 4 ubos bound
18:03 karolherbst: or well.. only 4 cbs filled with data
18:03 karolherbst: or is this more of a robustness issue where an access could overflow
18:04 imirkin_: sure
18:04 karolherbst: (but then we have the same issue in graphics as well)
18:04 imirkin_: but if you have to support 16, then the indirect could span the physical cb's and gmem cb's
18:04 karolherbst: right
18:04 karolherbst: but normally we know how wide the bound ubo is, or is that not clear in glsl?
18:04 karolherbst: or not always
18:06 karolherbst: anyway, for CL it matters even less as you can't indirectly access a const buffer, just indirect offset is possible
18:07 karolherbst: so I was wondering if spending some time on making it less painful for compute shaders would make sense as well
18:09 karolherbst: and eg, if a compute kernel only has 6 constant args, we can be sure that 5 is the highest ubo index accessed
18:09 imirkin_: it's clear in glsl, but it might still span the boundary
18:10 imirkin_: and iirc we do use the physical ubo's if they fit
18:10 imirkin_: but i don't fully remember
18:10 imirkin_: i don't think indirect ubo accesses are too common though
18:11 karolherbst: not for indirect access on the index I think
18:11 imirkin_: right
18:11 imirkin_: that's what i meant by indirect
18:11 karolherbst: yep.. (fileIndex >= 6 || ind)
18:11 karolherbst: ahh
18:11 karolherbst: imirkin_: well in CL it's always indirect
18:11 imirkin_: 7 is taken
18:11 karolherbst: or at least ... how we do stuff in clover
18:12 karolherbst: it's stupid because the first arg is at index 0 anyway
18:12 karolherbst: maybe I just make use of this "ABI"
18:12 karolherbst: and only pass in the offset
18:12 imirkin_: so anyways, with the indirect, i guess we could optimize it further by knowing the range (which TGSI gives us)
18:12 imirkin_: (i think, at least)
18:12 imirkin_: maybe not. it's definitely there in glsl
18:13 karolherbst: yeah.. but I decided that clover is stupid and I just get rid of the indirect instead
18:13 karolherbst: less painful anyway
18:13 karolherbst: just pass in the offset and the first arg is c1, the second is c2, etc...
18:13 karolherbst: solves the issue in a less painful way
18:13 karolherbst: but yeah.. it would still be an interesting opt for compute shaders
18:13 karolherbst: although.. maybe super uncommon
18:14 imirkin_: yeah, i don't remember seeing a practical application that made use of indirect UBO's
18:14 imirkin_: not saying they don't exist, but ...
18:16 karolherbst: well.. easy to check
18:20 karolherbst: well.. at least nothing in the shaders we've got is doing that
18:21 karolherbst: but uhm.. the code looks weird
18:21 karolherbst: imirkin_: this looks wrong, doesn't it? https://cgit.freedesktop.org/mesa/mesa/tree/src/gallium/drivers/nouveau/codegen/nv50_ir_lowering_nvc0.cpp#n2732
18:22 karolherbst: "i->getIndirect(0, 0)" is used as an operand in a 32 bit and 64 bit add
18:22 imirkin_: mmmmm yeah, that's questionable.
18:23 imirkin_: chances are good that it works out
18:23 imirkin_: since 64-bit adds get split
18:23 imirkin_: and iirc the split handles this
18:24 karolherbst: mhh
18:24 karolherbst: add u64 %r93d %r90d %r82
18:24 imirkin_: that's pre-split
18:24 imirkin_: split happens super-duper late, kinda annoying actually
18:24 imirkin_: almost during RA
18:24 imirkin_: or even after
18:24 karolherbst: yeah.. it seems okay after RA
18:25 imirkin_: split64bitsomething
18:25 karolherbst: high bits: add u32 $r3 $r3 $r255 $c0
18:25 imirkin_: right
18:26 imirkin_: $c0 = carry bit
18:26 karolherbst: yep
18:26 imirkin_: and with a bit of luck we even emit that correctly :)
18:26 karolherbst: well, we emit a 0 as the high bits of the second operand
18:26 karolherbst: so.. this part is fine
18:26 karolherbst: I guess there is some magic somewhere
18:26 imirkin_: search for "split64Bit"
18:27 imirkin_: in build_util.cpp
18:27 karolherbst: yeah..
18:28 karolherbst: mhhh
18:29 karolherbst: if (lo->getSrc(s)->reg.size < 8)
18:30 karolherbst: so yeah.. we handle that
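The split discussed above can be sketched in plain C. This is only an illustration of the general technique (a low 32-bit add followed by a high add that consumes the carry), not the code in build_util.cpp; the name `add64_split` is hypothetical, and the reading of `$r255` as a zero operand follows the remark in the log that a 0 is emitted as the high bits of the second operand.

```c
#include <stdint.h>

/* Add a 32-bit value to a 64-bit value using only 32-bit operations:
 * a low add, then a high add that takes the carry from the low add. */
static uint64_t add64_split(uint64_t a, uint32_t b)
{
    uint32_t a_lo = (uint32_t)a;
    uint32_t a_hi = (uint32_t)(a >> 32);

    uint32_t lo = a_lo + b;        /* low half; wraps around on overflow */
    uint32_t carry = lo < a_lo;    /* 1 if the low add overflowed (the carry bit) */
    uint32_t b_hi = 0;             /* a 32-bit operand has no high half, so use 0 */
    uint32_t hi = a_hi + b_hi + carry;

    return ((uint64_t)hi << 32) | lo;
}
```

For example, `add64_split(0xFFFFFFFF, 1)` overflows the low half, so the carry pushes the high half to 1 and the result is `0x100000000`. Treating the narrower source as having a zero high half is what makes mixing a 32-bit operand into a 64-bit add harmless.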
19:17 imirkin_: so that K4000 still went for like $72, including shipping. too rich for me.
19:18 karolherbst: :/
21:32 karolherbst: imirkin_: any idea on how to properly support user const buffers inside nve4_compute_validate_constbufs? Without uploading data, that is, as the size of the buffer is not known
21:33 karolherbst: I am sure that's not possible, even hardware wise, but maybe you have an idea?
21:42 imirkin_: could you use a persistent buffer instead?
21:47 imirkin_: what's the CL side of the API?
21:53 karolherbst: imirkin_: SVM
21:53 karolherbst: clSetKernelArgSVMPointer to be precise
21:53 karolherbst: which you can do on a constant* kernel parameter
21:54 karolherbst: and this is basically which gives me the biggest headache overall right now
21:54 imirkin_: what object is an SVMPointer?
21:54 imirkin_: is it a nouveau_bo?
21:54 karolherbst: malloced memory
21:54 imirkin_: but then that has to get upgraded/pinned somehow, right?
21:54 imirkin_: so that it's mirrored?
21:54 karolherbst: no
21:55 karolherbst: that's the HMM stuff
21:55 karolherbst: since pascal the compute context is able to do recoverable page faults
21:55 karolherbst: so some engine faults, the kernel migrates the memory pages to VRAM and then continues
21:55 karolherbst: but in the end you just get a pointer pointing to valid memory within the application context
21:56 karolherbst: and GPU, as with HMM there is just one virtual memory across the CPU and GPU essentially
21:56 karolherbst: and then it gets annoying if you have CL code like this: https://gist.githubusercontent.com/karolherbst/94facaf19aa02afc87754ec4d882f2c0/raw/9e2ae982f0f7dd624dd43f3f9154fec96d1986a1/tmp.cl
21:56 imirkin_: so it's _basically_ a coherent mapping
21:57 karolherbst: yeah
21:57 karolherbst: just reverse
21:57 imirkin_: it needs to be tracked in some structure
21:57 imirkin_: and you just feed that pointer to the ubo address, right?
21:57 imirkin_: er, in the descriptor
21:57 karolherbst: well....
21:57 karolherbst: here's the thing, nvidia's CL implementation doesn't use ubos
21:57 imirkin_: right, coz caching this through the ubo machinery would be bad
21:57 karolherbst: maybe
21:58 imirkin_: you'd spend more time flushing than caching
21:58 karolherbst: I don't think you have to
21:58 karolherbst: you just flush once when you start the kernel
21:58 imirkin_: ubo updates have to go through a carefully controlled thing
21:58 karolherbst: the spec doesn't mandate anything here
21:58 imirkin_: we flush all the time for coherent too
21:58 karolherbst: right, but it doesn't matter for CL as kernel could run for minutes or hours as well
21:58 karolherbst: or at least way longer than compute shaders would
21:59 imirkin_: sucks for progress bars :p
21:59 karolherbst: well
22:01 karolherbst: imirkin_: the issue is just, that inside the spirv a constant indirectly accessed array looks the same as a constant* kernel arg :/
22:01 karolherbst: it's all very annoying
22:02 karolherbst: what if you have something like int constant* tmp = some_cond ? &constant_arg[id] : &constant_arr[id];
22:02 imirkin_: then you're in for some fat pointers
22:02 karolherbst: yep
22:03 imirkin_: mmmm bacon
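The "fat pointer" idea mentioned above can be illustrated with a small C sketch: when a pointer may refer either to a constant buffer or to ordinary memory, it carries a tag saying which, and every load dispatches on that tag. All names here (`enum space`, `struct fat_ptr`, `fat_load`, `select_ptr`, `const_buf`) are hypothetical and not taken from Mesa, clover or any driver; the array merely stands in for an in-kernel constant array.

```c
#include <stdint.h>

/* Hypothetical address-space tags, for illustration only. */
enum space { SPACE_CONST_BUF, SPACE_GLOBAL };

/* A "fat" pointer: the location plus which kind of memory it refers to. */
struct fat_ptr {
    enum space space;
    uint32_t cb_offset;   /* byte offset, meaningful for SPACE_CONST_BUF */
    const int *global;    /* address, meaningful for SPACE_GLOBAL */
};

/* Stands in for a constant array that would live in a constant buffer. */
static const int const_buf[4] = { 10, 11, 12, 13 };

/* Every load has to check the tag to pick the right access path. */
static int fat_load(struct fat_ptr p)
{
    if (p.space == SPACE_CONST_BUF)
        return const_buf[p.cb_offset / sizeof(int)];
    return *p.global;
}

/* Mirrors: tmp = some_cond ? &constant_arg[id] : &constant_arr[id]; */
static struct fat_ptr select_ptr(int some_cond, const int *constant_arg, uint32_t id)
{
    struct fat_ptr p;
    if (some_cond) {
        p.space = SPACE_GLOBAL;
        p.cb_offset = 0;
        p.global = &constant_arg[id];
    } else {
        p.space = SPACE_CONST_BUF;
        p.cb_offset = (uint32_t)(id * sizeof(int));
        p.global = 0;
    }
    return p;
}
```

The cost is visible in `fat_load`: the choice of memory path can no longer be made at compile time, so each access pays for a runtime branch and the pointer itself grows.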
22:08 karolherbst: mhhh, I could check what nvidia is doing actually...
22:27 karolherbst: imirkin_: fun... what nvidia does:
22:27 karolherbst: IADD32I R4, R3, 0x30000 ;
22:27 karolherbst: LDC.IL R3, c[0x0][R4] ;
22:27 karolherbst: :D
22:27 karolherbst: but that's for an in kernel constant
22:27 karolherbst: and with my indirection they also do two cbs..
22:28 karolherbst: which is weird as I have no idea how that works with SVM
22:29 karolherbst: mhh, but I see nvidia using g[] for constant* args as well...
22:29 imirkin_: i forget what IL is (or never knew)
22:29 imirkin_: IS is the cross-constbuf one
22:29 karolherbst: well...
22:29 karolherbst: apparently they do cross const buffer shit there
22:29 karolherbst: as the constant is in cb3
22:30 karolherbst: not cb0
22:30 karolherbst: ohh wait, I need to enable SVM still
22:52 karolherbst: nice.. nvidia returns CL_INVALID_ARG_VALUE if I call clSetKernelArgSVMPointer on a constant* parameter :/
22:52 karolherbst: ahhhhh
22:54 karolherbst: but I was sure that worked in the past