15:11 karolherbst: imirkin: mind taking a quick look at this commit? (caps VRAM allocation limits) https://github.com/karolherbst/mesa/commit/21b1c81e22c4815f43940cc2eb042c5d8f91096e
15:11 karolherbst: for GL I doubt it matters at all, but with CL we can run into big issues with the current values
15:11 karolherbst: the CTS has a test where they allocate like 10 buffers and some of them are like 24GB big :/
15:12 karolherbst: okay.. the biggest buffer was actually 94GB :D
15:13 imirkin_: it's dumb, but should we just dump those into GART?
15:13 imirkin_: or are those buffers specified as vram allocs?
15:14 karolherbst: well, if we dump those in GART we have to still check how much system memory we have
15:14 imirkin_: wouldn't you just oom?
15:15 karolherbst: PIPE_COMPUTE_CAP_MAX_GLOBAL_SIZE seems to be clover-only
15:15 imirkin_: i mean, what are those tests expecting?
15:15 imirkin_: or rather, what is CL expecting?
15:15 karolherbst: PIPE_COMPUTE_CAP_MAX_GLOBAL_SIZE specifies the size of global memory
15:15 karolherbst: and there is a test allocating buffers according to those rules
15:15 imirkin_: on linux, with a regular program, you can do malloc(100TB)
15:15 karolherbst: and that should obviously pass
15:15 imirkin_: but then you crash if you use that
15:15 karolherbst: right..
15:15 karolherbst: they fill the buffer with random data
15:15 karolherbst: and do some tests then
15:15 karolherbst: like blitting and stuff
15:16 imirkin_: ok, so there's no way to properly utilize swap, i assume
15:16 karolherbst: nvidia exposes similar limits to what I do in the patch
15:16 imirkin_: anyways, it's fine, but please add a TODO
15:16 imirkin_: to figure out whether the limits should include system memory
15:16 imirkin_: also this will break CL on tegra
15:16 imirkin_: since vram_size == 0
15:16 karolherbst: ahh, PIPE_COMPUTE_CAP_MAX_MEM_ALLOC_SIZE is also clover only
15:16 imirkin_: not that you super-care, but just pointing it out
15:16 karolherbst: oh well
15:17 karolherbst: vram_size is still set on tegra
15:17 karolherbst: I think?
15:17 imirkin_: ... to 0, i think
15:17 imirkin_: and we use vram_size == 0 to move all VRAM allocations to GART instead
15:17 imirkin_: that's my recollection, at least
15:18 karolherbst: vram_size == NOUVEAU_GETPARAM_FB_SIZE
15:18 imirkin_: it was one of the changes gnurou did to make things work on tegra
15:18 karolherbst: and FB_SIZE is getparam->value = drm->gem.vram_available
15:18 karolherbst: mhh..
15:18 karolherbst: I mean, I have a tegra here anyway, so I could check there as well
15:20 karolherbst: ahh, so vram_size is whatever we report in nouveau as "VRAM" in dmesg
15:21 karolherbst: okay.. so "DRM: VRAM: 0 MiB" on tegra
15:21 karolherbst: so vram_size should be 0 :/
15:22 imirkin_: and then we set "vram_domain" (in screen) to GART
15:22 imirkin_: and all the "vram" buffers are allocated against the vram_domain
15:22 karolherbst: yeah.. but gart_size (or however that field is called) isn't a useful value either :/
15:22 imirkin_: i didn't say i had solutions.
15:22 imirkin_: i just bring you problems. :)
15:22 karolherbst: right
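
[Editor's note] The vram_size == 0 fallback described above can be sketched roughly as follows. This is a minimal illustration of the idea as imirkin_ recalls it, not the actual nouveau code: the enum, its values, and the function name are hypothetical stand-ins.

```c
#include <stdint.h>

/* Hypothetical stand-ins for the driver's real memory-domain flags. */
enum mem_domain { DOMAIN_VRAM, DOMAIN_GART };

/* Pick the domain that "VRAM" buffers get allocated against.
 * Assumption taken from the discussion: a reported vram_size of 0
 * (as on Tegra, "DRM: VRAM: 0 MiB") means there is no dedicated
 * VRAM, so all such allocations are redirected to GART. */
enum mem_domain pick_vram_domain(uint64_t vram_size)
{
    return vram_size == 0 ? DOMAIN_GART : DOMAIN_VRAM;
}
```

Any limit derived purely from vram_size would then be 0 on such a device, which is the breakage being pointed out.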
15:27 karolherbst: mhhhh
15:28 karolherbst: ohh, right, I have that nvidia jetson image on a different SD card... let's check there
15:28 imirkin_: something about me and sd cards... i don't think i've ever gotten an sd card to work
15:28 imirkin_: ever
15:28 imirkin_: either it doesn't get recognized, or i can't write, or the data's messed up after writing
15:29 karolherbst: well, the jetson nano only boots from microSD :)
15:29 karolherbst: and I think TFTP
15:29 karolherbst: super.. the ssh daemon doesn't run on the nvidia image :(
15:29 karolherbst: ohh.. it does, the SD card is just terribly slow
15:39 karolherbst: imirkin_: heh.. no OpenCL support on tegra
15:39 karolherbst: so there is that
15:40 karolherbst: I leave a todo then
16:50 imirkin_: karolherbst: would it be enough to limit MAX_MEM_ALLOC_SIZE?
16:50 imirkin_: and leave MAX_GLOBAL_SIZE alone?
16:50 imirkin_: btw2 - MAX_GLOBAL_SIZE should be 1 << 48 on pascal+
16:51 karolherbst: imirkin_: I don't think so, as the runtime/application wants to know how much global memory there is
16:51 imirkin_: right, but what does it use it for?
16:52 karolherbst: performance mainly. So instead of executing your CL kernel for your 1TiB database, you split it up ;)
16:52 karolherbst: anyway, I don't know if the spec is that precise about this value
16:52 karolherbst: in the end it ends up being the reported value for CL_DEVICE_GLOBAL_MEM_SIZE
16:53 karolherbst: and the spec says "Size of global device memory in bytes."
16:53 karolherbst: I guess because it says "device" we should report the physical limit, not the virtual one...
16:53 imirkin_: can you see what nv/amd report for this?
16:54 karolherbst: nv reports VRAM size
16:54 karolherbst: no idea about AMD
16:54 imirkin_: ok
16:55 karolherbst: mhh, whatever HSA_AMD_MEMORY_POOL_INFO_SIZE is for rocm
16:55 karolherbst: yeah.. no idea, ROCm's code is weird
16:56 karolherbst: the other value is 25% up to 100% of CL_DEVICE_GLOBAL_MEM_SIZE
16:56 karolherbst: and I think it kind of depends on ... stuff
16:57 karolherbst: actually.. the spec only specifies the minimum to be 25%
16:57 karolherbst: but CL is generally more explicit than GL, and applications want to get limited by physical properties due to performance anyway
17:02 imirkin_: right
17:02 imirkin_: well, i don't super-care, just don't paint yourself into a funny corner.
17:09 karolherbst: nah, it's fine. Without that the CTS allocates 95GB of memory and tries to use it inside a kernel... and I don't see other implementations passing that without limiting those values
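
[Editor's note] The capping approach discussed above could look something like the sketch below. It only illustrates the relationship described in the conversation (global size reported as the physical VRAM size, maximum single allocation somewhere between 25% and 100% of that). The struct, the function name, and the alloc_divisor parameter are hypothetical; the values in the linked commit may differ.

```c
#include <stdint.h>

/* Hypothetical container for the two limits discussed:
 * the values behind MAX_GLOBAL_SIZE and MAX_MEM_ALLOC_SIZE. */
struct cl_mem_limits {
    uint64_t max_global_size;    /* reported global memory size */
    uint64_t max_mem_alloc_size; /* largest single allocation   */
};

/* alloc_divisor is an illustrative knob: 1 allows a single buffer to
 * take all of global memory, 4 allows a quarter of it. It is clamped
 * to [1, 4] so the result never drops below the 25% minimum mentioned
 * in the discussion.
 * TODO (from the discussion): decide whether system memory should
 * count towards these limits; with vram_size == 0 (Tegra) both
 * values come out as 0. */
struct cl_mem_limits compute_cl_mem_limits(uint64_t vram_size,
                                           unsigned alloc_divisor)
{
    struct cl_mem_limits limits;

    if (alloc_divisor < 1)
        alloc_divisor = 1;
    if (alloc_divisor > 4)
        alloc_divisor = 4;

    limits.max_global_size = vram_size;
    limits.max_mem_alloc_size = vram_size / alloc_divisor;
    return limits;
}
```

With limits like these, a test that sizes its buffers from the reported values stays within what the device physically has, instead of attempting allocations in the tens of gigabytes.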
19:06 joepublic: I want to confirm: if fastest 3d with linux-libre is my goal, I want a GTX 780?
19:08 imirkin_: 780 Ti
19:08 imirkin_: or a titan (the kepler kind)
19:09 chithead: is the kepler titan supported now? I thought nobody had one to test
19:09 imirkin_: it always has been...
19:09 joepublic: Thank you sir. Ebaying for a used one now.
19:11 HdkR: It's a sad time that the Titan Black is only a 5TFLOP card now
19:12 HdkR: Current top end GPUs being 3x faster in raw compute :|
19:17 imirkin_: joepublic: to be clear, the experience will not be a good one
19:17 imirkin_: lots of hangs, etc
19:17 imirkin_: if you're using gnome or kde or something like that
19:17 imirkin_: as i've probably mentioned before, i would recommend getting an AMD GPU
19:18 imirkin_: and dumping linux-libre, since what it's doing makes absolutely no sense.
19:18 imirkin_: but - your call. as long as you have the relevant info.
19:23 joepublic: I am upgrading from a GTX 760, which I imagine would be a similar experience.
19:24 imirkin_: yes, largely identical
19:24 imirkin_: actually you should run into fewer shader compilation failures
19:24 imirkin_: since GTX 780 has 256 ISA regs, vs 64 on the earlier keplers
19:24 imirkin_: which doesn't stress our spilling logic quite as hard
19:25 joepublic: Nice. I have been happy with the 760, would recommend it for a fsf-style free-software-only desktop. Just looking to eke out the last bit of 3d speed I can.
19:26 imirkin_: presumably you're aware that reclocking is a thing?
19:26 imirkin_: i.e. setting pstates
19:27 joepublic: I have a process running that looks for software that needs 0f/fast and sets as needed; 07/slow and cool otherwise. No crashes* ever. (knock on wood for luck)
19:27 imirkin_: cool
19:28 imirkin_: you might see monitor blinks sometimes
19:28 joepublic: yes, it blinks on every change. I consider that a form of positive feedback that "means it is working."
19:28 imirkin_: hehe
19:28 imirkin_: in theory it shouldn't
19:28 imirkin_: but in practice we don't control the line buffer
19:29 joepublic: I play some 3d games, drive 4 monitors (one of them 4k), really give the card a workout when I need to get work/play done.
19:29 joepublic: Totally happy. Thanks to everyone involved for making it possible.
19:29 imirkin_: coolio
19:30 imirkin_: if there are rendering bugs with games or whatever, let us know
19:30 imirkin_: if there are hangs, then we already know :)
19:31 joepublic: my free-software-only stance means the games I play are for the most part already in debian main, but I would of course come ask about weird rendering.
19:34 imirkin_: joepublic: i can't speak for others, but i actually play very few games. so misrendering would go unnoticed by me unless brought up directly.
22:08 Lyude: imirkin_: btw, sent an email off to nvidia about the crc stuff to see if maybe they know of a way to avoid losing crcs when changing contexts
22:10 imirkin_: cool
22:11 Lyude: otherwise though it sounds like it might not be that big of a deal, which would be nice :)
22:12 Lyude: just would need to make sure we match up each entry with its actual vblank correctly
22:19 imirkin_: yeahhhh
23:28 karolherbst: imirkin_: btw, mind reviewing the patches I sent out today? I could also just push them.. but just in case I missed something
23:32 karolherbst: heh, that might be interesting to watch: https://www.nvidia.com/en-us/gtc/session-catalog/?search.sessiontype=option_1559593594815&search.primarytopic=option_1564595677056#/