15:11karolherbst: imirkin: mind taking a quick look at this commit? (caps VRAM allocation limits) https://github.com/karolherbst/mesa/commit/21b1c81e22c4815f43940cc2eb042c5d8f91096e
15:11karolherbst: for GL I doubt it matters at all, but with CL we can run into big issues with the current values
15:11karolherbst: the CTS has a test where they allocate like 10 buffers and some of them are like 24GB big :/
15:12karolherbst: okay.. the biggest buffer was actually 94GB :D
15:13imirkin_: it's dumb, but should we just dump those into GART?
15:13imirkin_: or are those buffers specified as vram allocs?
15:14karolherbst: well, if we dump those in GART we have to still check how much system memory we have
15:14imirkin_: wouldn't you just oom?
15:15karolherbst: PIPE_COMPUTE_CAP_MAX_GLOBAL_SIZE seems to be clover-only
15:15imirkin_: i mean, what are those tests expecting?
15:15imirkin_: or rather, what is CL expecting?
15:15karolherbst: PIPE_COMPUTE_CAP_MAX_GLOBAL_SIZE specifies the size of global memory
15:15karolherbst: and there is a test allocating buffers according to those rules
15:15imirkin_: on linux, with a regular program, you can do malloc(100TB)
15:15karolherbst: and that should obviously pass
15:15imirkin_: but then you crash if you use that
15:15karolherbst: they fill the buffer with random data
15:15karolherbst: and do some tests then
15:15karolherbst: like blitting and stuff
15:16imirkin_: ok, so there's no way to properly utilize swap, i assume
15:16karolherbst: nvidia exposes similar limits to what I do in the patch
15:16imirkin_: anyways, it's fine, but please add a TODO
15:16imirkin_: to figure out whether the limits should include system memory
15:16imirkin_: also this will break CL on tegra
15:16imirkin_: since vram_size == 0
15:16karolherbst: ahh, PIPE_COMPUTE_CAP_MAX_MEM_ALLOC_SIZE is also clover-only
15:16imirkin_: not that you super-care, but just pointing it out
15:16karolherbst: oh well
15:17karolherbst: vram_size is still set on tegra
15:17karolherbst: I think?
15:17imirkin_: ... to 0, i think
15:17imirkin_: and we use vram_size == 0 to move all VRAM allocations to GART instead
15:17imirkin_: that's my recollection, at least
15:18karolherbst: vram_size == NOUVEAU_GETPARAM_FB_SIZE
15:18imirkin_: it was one of the changes gnurou did to make things work on tegra
15:18karolherbst: and FB_SIZE is getparam->value = drm->gem.vram_available
15:18karolherbst: I mean, I have a tegra here anyway, so I could check there as well
15:20karolherbst: ahh, so vram_size is whatever we report in nouveau as "VRAM" in dmesg
15:21karolherbst: okay.. so "DRM: VRAM: 0 MiB" on tegra
15:21karolherbst: so vram_size should be 0 :/
15:22imirkin_: and then we set "vram_domain" (in screen) to GART
15:22imirkin_: and all the "vram" buffers are allocated against the vram_domain
15:22karolherbst: yeah.. but gart_size (or however that field is called) isn't a useful value either :/
15:22imirkin_: i didn't say i had solutions.
15:22imirkin_: i just bring you problems. :)
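
The fallback imirkin_ describes above can be sketched in a few lines of C. This is only an illustration under stated assumptions: `mem_domain` and `pick_vram_domain` are hypothetical names, not nouveau's actual identifiers. The one rule taken from the discussion is that a `vram_size` of 0 (as reported on tegra) redirects "vram" buffers to GART.

```c
#include <stdint.h>

/* Hypothetical memory domains, named for illustration only. */
enum mem_domain { DOMAIN_VRAM, DOMAIN_GART };

/* Pick the domain that "vram" buffers are allocated against.
 * With no dedicated VRAM (vram_size == 0), fall back to GART. */
static enum mem_domain pick_vram_domain(uint64_t vram_size)
{
    return vram_size == 0 ? DOMAIN_GART : DOMAIN_VRAM;
}
```

As noted in the conversation, this does not answer what size limit to report in the GART case; it only shows where the buffers end up.
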
15:28karolherbst: ohh, right, I have that nvidia jetson image on a different SD card... let's check there
15:28imirkin_: something about me and sd cards... i don't think i've ever gotten an sd card to work
15:28imirkin_: either it doesn't get recognized, or i can't write, or the data's messed up after writing
15:29karolherbst: well, the jetson nano only boots from microSD :)
15:29karolherbst: and I think TFTP
15:29karolherbst: super.. the ssh daemon doesn't run on the nvidia image :(
15:29karolherbst: ohh.. it does, the SD card is just terribly slow
15:39karolherbst: imirkin_: heh.. no OpenCL support on tegra
15:39karolherbst: so there is that
15:40karolherbst: I'll leave a TODO then
16:50imirkin_: karolherbst: would it be enough to limit MAX_MEM_ALLOC_SIZE?
16:50imirkin_: and leave MAX_GLOBAL_SIZE alone?
16:50imirkin_: btw2 - MAX_GLOBAL_SIZE should be 1 << 48 on pascal+
16:51karolherbst: imirkin_: I don't think so, as the runtime/application wants to know how much global memory there is
16:51imirkin_: right, but what does it use it for?
16:52karolherbst: performance mainly. So instead of executing your CL kernel for your 1TiB database, you split it up ;)
16:52karolherbst: anyway, I don't know if the spec is that precise about this value
16:52karolherbst: in the end it ends up being the reported value for CL_DEVICE_GLOBAL_MEM_SIZE
16:53karolherbst: and the spec says "Size of global device memory in bytes."
16:53karolherbst: I guess because it says "device" we should report the physical limit, not the virtual one...
16:53imirkin_: can you see what nv/amd report for this?
16:54karolherbst: nv reports VRAM size
16:54karolherbst: no idea about AMD
16:55karolherbst: mhh, whatever HSA_AMD_MEMORY_POOL_INFO_SIZE is for rocm
16:55karolherbst: yeah.. no idea, ROCm's code is weird
16:56karolherbst: the other value is 25% up to 100% of CL_DEVICE_GLOBAL_MEM_SIZE
16:56karolherbst: and I think it kind of depends on ... stuff
16:57karolherbst: actually.. the spec only specifies the minimum to be 25%
16:57karolherbst: but CL is generally more explicit than GL, and applications want to get limited by physical properties due to performance anyway
17:02imirkin_: well, i don't super-care, just don't paint yourself into a funny corner.
17:09karolherbst: nah, it's fine. Without that the CTS allocates 95GB of memory and tries to use it inside a kernel... and I don't see other implementations passing that without limiting those values
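
The limits discussed above can be sketched roughly as follows. This is not the code from the linked commit: `cl_mem_limits`, `compute_limits` and the `fallback_size` parameter are hypothetical, and choosing exactly 25% for the per-allocation limit is an assumption (the conversation only says the value lies between 25% and 100% of the global size). The sketch reports physical VRAM as the global size and carries the TODO imirkin_ asked for.

```c
#include <stdint.h>

/* Hypothetical container for the two reported limits. */
struct cl_mem_limits {
    uint64_t max_global_size;    /* ends up as CL_DEVICE_GLOBAL_MEM_SIZE */
    uint64_t max_mem_alloc_size; /* largest single buffer allocation */
};

/* Derive both limits from the physical VRAM size.
 * fallback_size is an assumed stand-in for devices that report
 * vram_size == 0 (tegra); the discussion leaves that case open. */
static struct cl_mem_limits compute_limits(uint64_t vram_size,
                                           uint64_t fallback_size)
{
    struct cl_mem_limits limits;

    /* TODO: figure out whether the limits should include system memory. */
    limits.max_global_size = vram_size ? vram_size : fallback_size;

    /* Assumption: use the 25% minimum mentioned in the discussion. */
    limits.max_mem_alloc_size = limits.max_global_size / 4;

    return limits;
}
```

With an 8 GiB card this reports 8 GiB of global memory and a 2 GiB maximum allocation, so a test that sizes its buffers from these values no longer asks for tens of gigabytes.
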
19:06joepublic: I want to confirm: if fastest 3d with linux-libre is my goal, I want a GTX 780?
19:08imirkin_: 780 Ti
19:08imirkin_: or a titan (the kepler kind)
19:09chithead: is the kepler titan supported now? I thought nobody had one to test
19:09imirkin_: it always has been...
19:09joepublic: Thank you sir. Ebaying for a used one now.
19:11HdkR: It's a sad time that the Titan Black is only a 5TFLOP card now
19:12HdkR: Current top end GPUs being 3x faster in raw compute :|
19:17imirkin_: joepublic: to be clear, the experience will not be a good one
19:17imirkin_: lots of hangs, etc
19:17imirkin_: if you're using gnome or kde or something like that
19:17imirkin_: as i've probably mentioned before, i would recommend getting an AMD GPU
19:18imirkin_: and dumping linux-libre, since what it's doing makes absolutely no sense.
19:18imirkin_: but - your call. as long as you have the relevant info.
19:23joepublic: I am upgrading from a GTX 760, which I imagine would be a similar experience.
19:24imirkin_: yes, largely identical
19:24imirkin_: actually you should run into fewer shader compilation failures
19:24imirkin_: since GTX 780 has 256 ISA regs, vs 64 on the earlier keplers
19:24imirkin_: which doesn't stress our spilling logic quite as hard
19:25joepublic: Nice. I have been happy with the 760, would recommend it for a fsf-style free-software-only desktop. Just looking to eke out the last bit of 3d speed I can.
19:26imirkin_: presumably you're aware that reclocking is a thing?
19:26imirkin_: i.e. setting pstates
19:27joepublic: I have a process running that looks for software that needs 0f/fast and sets as needed; 07/slow and cool otherwise. No crashes* ever. (knock on wood for luck)
19:28imirkin_: you might see monitor blinks sometimes
19:28joepublic: yes, it blinks on every change. I consider that a form of positive feedback that "means it is working."
19:28imirkin_: in theory it shouldn't
19:28imirkin_: but in practice we don't control the line buffer
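
The kind of pstate-switching helper joepublic describes could look something like the sketch below. It is not his actual script: `choose_pstate` and `set_pstate` are hypothetical names, and how the "needs fast" decision is made is left to the caller. The path to write to is also an assumption; on nouveau the pstate file is usually exposed through debugfs (for example `/sys/kernel/debug/dri/0/pstate`), but that depends on the system.

```c
#include <stdio.h>

/* Return the pstate id to request: "0f" (fast) when demanding software
 * is running, "07" (slow and cool) otherwise. */
static const char *choose_pstate(int needs_fast)
{
    return needs_fast ? "0f" : "07";
}

/* Write the pstate id to the given file.
 * Returns 0 on success, -1 if the file cannot be opened or written. */
static int set_pstate(const char *path, const char *pstate)
{
    FILE *f = fopen(path, "w");
    if (!f)
        return -1;

    int ok = fputs(pstate, f) >= 0;
    if (fclose(f) != 0)
        ok = 0;

    return ok ? 0 : -1;
}
```

A caller would do something like `set_pstate(path, choose_pstate(game_running))` whenever the set of running programs changes; each change may cause the monitor blink mentioned above.
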
19:29joepublic: I play some 3d games, drive 4 monitors (one of them 4k), really give the card a workout when I need to get work/play done.
19:29joepublic: Totally happy. Thanks to everyone involved for making it possible.
19:30imirkin_: if there are rendering bugs with games or whatever, let us know
19:30imirkin_: if there are hangs, then we already know :)
19:31joepublic: my free-software-only stance means the games I play are for the most part already in debian main, but I would of course come ask about weird rendering.
19:34imirkin_: joepublic: i can't speak for others, but i actually play very few games. so misrendering would go unnoticed by me unless brought up directly.
22:08Lyude: imirkin_: btw, sent an email off to nvidia about the crc stuff to see if maybe they know of a way to avoid losing crcs when changing contexts
22:11Lyude: otherwise though it sounds like it might not be that big of a deal, which would be nice :)
22:12Lyude: just would need to make sure we match up each entry with its actual vblank correctly
23:28karolherbst: imirkin_: btw, mind reviewing the patches I sent out today? I could also just push them.. but just in case I missed something
23:32karolherbst: heh, that might be interesting to watch: https://www.nvidia.com/en-us/gtc/session-catalog/?search.sessiontype=option_1559593594815&search.primarytopic=option_1564595677056#/