09:12ask6155: hello
09:17ask6155: karolherbst told me I could help fix the problem I have (https://gitlab.freedesktop.org/drm/nouveau/-/issues/88) if I could recompile the kernel...
09:18ask6155: I found on the Arch wiki that to patch a single module I can just compile that module and replace it... if that's the case then I can compile it, I just don't know which lines I have to edit to fix the bug
09:20karolherbst: yeah, I've spoken with skeggsb about the issue and there was something we could do about it, but I can't really remember what the actual fix was. You could always increase the sleep period inside that one function.. but let me see...
09:21karolherbst: RSpliet also did some tests and it didn't appear to really help
09:24ask6155: I can still test it out... I have the kernel source. which line do I have to edit?
09:25karolherbst: https://gitlab.freedesktop.org/drm/nouveau/-/blob/master/drivers/gpu/drm/nouveau/dispnv50/base507c.c#L155 inside base507c_ntfy_wait_begun
09:26karolherbst: and change it to something like usleep_range(1000, 10000);
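A sketch of the change being suggested, assuming the polling loop in base507c_ntfy_wait_begun currently sleeps with a much shorter usleep_range; the surrounding code is paraphrased and the condition helper name is made up:

    /* drivers/gpu/drm/nouveau/dispnv50/base507c.c, inside base507c_ntfy_wait_begun() (sketch) */
    s64 time = nvif_msec(device, 2000ULL,
            if (ntfy_reports_begun(bo, offset))     /* hypothetical check for the BEGUN status */
                    break;
            usleep_range(1000, 10000);              /* suggested change: sleep 1-10 ms per poll
                                                       instead of the original microsecond range */
    );
    return time < 0 ? time : 0;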
09:51ask6155: okay, the module has been compiled. I should have asked this before, but I'd like to add a print statement at the initialization stage of the module to identify that it is patched (for debugging), since I'm overwriting the module for the first time. In which file can I add a print statement?
09:51ask6155: I have ccache so I think I won't have to compile it again?
09:54karolherbst: ohh, if you just change one file you don't have to compile from scratch anyway
09:55karolherbst: and a good place would be nouveau_drm_probe inside drivers/gpu/drm/nouveau/nouveau_drm.c
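For the debug marker, a minimal sketch of what could go near the top of nouveau_drm_probe() in drivers/gpu/drm/nouveau/nouveau_drm.c (the message text is made up; any printk level works):

    /* proves at boot that the rebuilt/patched module is the one actually loaded */
    pr_info("nouveau: patched module (longer base507c notifier sleep) loaded\n");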
10:11ask6155: success?
10:14karolherbst: ask6155: so is it better now or still the same?
10:15ask6155: better(?)
10:16ask6155: https://imgur.com/FKmnmLF
10:16ask6155: perf report
10:16karolherbst: well, at least no nouveau
10:16karolherbst: not sure what the swapper is doing
10:17ask6155: does it have anything to do with swappiness?
10:17karolherbst: could be
10:17karolherbst: maybe wait a bit and do it again and see what changes
10:18ask6155: okay
10:18karolherbst: ohh wait no
10:18karolherbst: swapper is the "idle" task
10:18karolherbst: so that means your kernel isn't doing anything really
10:18karolherbst: just check with htop/top your overall CPU load
10:18karolherbst: if that's a one-digit number you're good
10:19karolherbst: perf doesn't care about CPU load
10:19karolherbst: so even if your system isn't doing anything, it sums up to 100%
10:20ask6155: 2 cores add up to 15%
10:20karolherbst: what are they busy with?
10:21ask6155: X compositor and window manager
10:21karolherbst: okay
10:22karolherbst: then I guess it's "better" indeed, your system is just busy with other things
10:22karolherbst: or your compositor is doing more stuff than actually needed
10:22karolherbst: okay..
10:22karolherbst: but I guess without that patch, nouveau/the kernel were using more CPU time overall?
10:23ask6155: well, the compositor needs to be configured better. I run picom and there is still screen tearing, and I have to look into how to stop that... There are also glitches ;(
10:23ask6155: yeah I think they were using more cpu
10:23karolherbst: ahh yeah...
10:24karolherbst: usually it's better to just use a compositor from a normal desktop. They are also tuned to use as few CPU cycles as possible
10:24karolherbst: or use something like compton
10:24karolherbst: but yeah...
10:24ask6155: compton is dead I think
10:24karolherbst: ahh
10:24ask6155: so I switched to picom
10:24karolherbst: I am just using what the desktop gives me as this usually works best
10:25karolherbst: and with wayland you don't have much of a choice anyway :p
10:25orbea: karolherbst: picom is the continuation of compton
10:25karolherbst: I see...
10:25karolherbst: why does it tear then? :D
10:25karolherbst: and why does it use so much CPU?
10:25orbea: you need special arguments to get it to stop tearing with nouveau
10:25ask6155: they say there is a secret set of options in picom which will stop screen tearing... I don't know them
10:25orbea: the arch wiki might have them?
10:25karolherbst: orbea: ahh could be as we have no workarounds inside the DDX
10:25orbea: under the nvidia section
10:26karolherbst: tearing under X is a lost cause anyway
10:26ask6155: no... the ones given there are for compton and anyway don't work for me ;(
10:26orbea: I know I have seen it work before, but I have AMD plugged in right now
10:26ask6155: proprietary drivers don't have screen tearing
10:27ask6155: maybe there is a setting to fix it...
10:33ask6155: I think you can take this as a fix?
10:33karolherbst: not sure
10:34karolherbst: I think there is a reason the sleep is so low...
10:34karolherbst: but...
10:34karolherbst: not sure what the disadvantage here is
10:35karolherbst: but at least we know it's indeed this wait causing issues
10:36karolherbst: I think skeggsb has some ideas here.. but...
10:36ask6155: is the swapper an issue or is that normal behaviour?
10:37karolherbst: normal
10:37karolherbst: it's essentially a "the kernel doesn't have anything better to do" task
10:37karolherbst: and it handles the CPU freq stuff and so on
10:37karolherbst: afaik
10:38ask6155: okay
10:38ask6155: what does the sleep *do*?
10:39karolherbst: waiting
10:39karolherbst: or well
10:39karolherbst: nothing
10:39ask6155: what does the range do then?
10:39karolherbst: it's essentially a hint to the scheduler that the task has nothing to do for a given period of time
10:39karolherbst: it's a min/max range
10:40karolherbst: so if the CPU is woken up anyway, and your task has already slept for the min amount, your task gets scheduled
10:40karolherbst: if nothing woke the CPU up until the max count, the timer triggers
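Roughly, as a paraphrase of the semantics described above (not the kernel's actual implementation):

    /* usleep_range(min_us, max_us):
     *  - the task sleeps at least min_us microseconds;
     *  - the wakeup may happen anywhere in [min_us, max_us], so it can piggyback
     *    on whatever else already pulled the CPU out of idle;
     *  - only if nothing woke the CPU by max_us does the timer itself fire. */
    usleep_range(1000, 10000);   /* at least 1 ms, at most ~10 ms before forcing a wakeup */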
10:40ask6155: aah
10:40karolherbst: yeah, it's some nice power saving feature :D
10:40karolherbst: apple also has something similar within their OS
10:40ask6155: and what does it do when it wakes up?
10:41karolherbst: well, the task continues working on stuff
10:42ask6155: so I guess there is a performance hit then since it sleeps for longer?
10:42karolherbst: not necessarily
10:42karolherbst: it might hurt responsiveness
10:43karolherbst: usually when it comes to displays you want low-latency stuff and such.. so maybe you can get dropped frames or so? dunno
10:47ask6155: I think there is?
10:53ask6155: I got 250 on unigine valley on high
10:53ask6155: I used to get 300+ I think
10:54ask6155: I'll take a control test to confirm
11:09ask6155: um this is awkward
11:10karolherbst: ask6155: benchmarking is hard
11:10ask6155: the control and the patch got the same score
11:10karolherbst: yeah
11:10ask6155: but the thing is... now I'm on the control and the cpu usage is the same as the patch
11:11karolherbst: :/
11:11karolherbst: mhhh
11:11karolherbst: maybe it's just some random state the GPU is sometimes in?
11:11karolherbst: or maybe it gets worse over time?
11:11ask6155: I did update the kernel so that the module could be loaded in since it needs to be the same version
11:12karolherbst: that could have changed something... but maybe it's just a coincidence
11:12karolherbst: sometimes those issues are also a bit random :/
11:12karolherbst: hardware being hardware and all that
11:13ask6155: I was on 5.10.24-lts when I took the perf screenshot in the issue
11:13ask6155: now I'm on 31
11:14karolherbst: well if you can reliably reproduce it with the older kernel that might give us a better idea
11:14karolherbst: but chances are it won't reproduce
11:14ask6155: I'll try that
11:18ask6155: um
11:19ask6155: well now kworker is using 40-60% of the cpu
11:19ask6155: it is the kernel
11:20ask6155: I guess it is *fixed*?
11:25karolherbst: mhhh
11:25karolherbst: sample size 1 is always... open for errors, but yeah
11:25karolherbst: maybe some patch fixed it
11:25karolherbst: RSpliet mentioned having the issue as well
11:25karolherbst: not sure what kernel was tested
11:27ask6155: I saw the FOSDEM talk on nouveau and I felt I'd seen that name... It was you! :D
11:29ask6155: it was pretty good...
11:34karolherbst: ask6155: thanks
11:35ask6155: this is weird...
11:35ask6155: the high kworker usage happens on both kernel versions
11:35ask6155: BUT
11:35ask6155: it only happens when I run picom
11:36ask6155: when I kill it it goes away
11:36karolherbst: ahh
11:36karolherbst: that makes sense
11:36karolherbst: picom probably makes use of the atomic kms API
11:36karolherbst: and that triggers it
11:36karolherbst: without using that, there is no point doing all the stuff I guess
11:36karolherbst: I am not an expert in this area, so just making wild assumptions :p
11:37ask6155: so is picom at fault here or the driver?
11:37ask6155: I'm on control
11:38karolherbst: the driver
11:38karolherbst: probably
11:38karolherbst: it might be that picom is just using an API way too often
11:38ask6155: but then why is it not counted as its cpu usage?
11:39karolherbst: like... imagine it tries to page flip 1000 times a second
11:39karolherbst: ask6155: because what picom is doing is probably very cheap
11:39karolherbst: and kernel space is counted extra
11:39karolherbst: that's always the issue with IPC
11:39karolherbst: attribution is hard
11:40ask6155: maybe I can try different picom flags
11:40karolherbst: yeah...
11:40karolherbst: my assumption is, if you get rid of the tearing and get proper vsync
11:40karolherbst: it might fix the CPU issue as well
11:42ask6155: okay I've figured stuff out
11:42ask6155: when picom uses glx as the backend the problem occurs
11:43ask6155: and it puts a warning that might be related
11:43ask6155: GLX_EXT_buffer_age not supported by your driver,`use-damage` has to be disabled
11:45karolherbst: ufff
11:46karolherbst: ask6155: ahh.. you need dri3 enabled
11:46karolherbst: check your X logs
11:46karolherbst: but I assume you will have to use the modesetting driver for that
11:47karolherbst: dunno if the nouveau ddx even supports dri3
11:49ask6155: I found something
11:49ask6155: on the wiki
11:50ask6155: glFinish, glClientWaitSync, etc. use busy waiting, causing high CPU usage in the client
11:50karolherbst: yeah well.. that's a picom bug then
11:50ask6155: also it has a number of other quirks
11:50karolherbst: but for that you have the damage extension in X
11:50karolherbst: using dri3 is really not optional anymore
11:51karolherbst: and everybody saying anything else has no clue :p
11:51karolherbst: ehh
11:51karolherbst: present extension actually
11:51ask6155: Doesn't implement the Present extension, or DRI3.
11:51ask6155: also given on page
11:51karolherbst: picom?
11:51ask6155: yeah
11:51karolherbst: okay, use something else then :D
11:52ask6155: I dunno if that means picom or the drivers
11:52karolherbst: well, mesa supports it
11:52karolherbst: as long as your xorg server enables it
11:52karolherbst: this all works with other compositors
11:52ask6155: how do I check that
11:52karolherbst: the X log should contain something about dri3
11:52ask6155: it contains nothing
11:52karolherbst: "(II) Initializing extension DRI3" e.g.
11:53ask6155: wait no
11:53ask6155: it says that
11:53karolherbst: mhh
11:53karolherbst: weird
11:53karolherbst: mind sharing your entire current Xorg log?
11:54ask6155: this is the running log I think
11:54ask6155: http://ix.io/2WQl
11:54karolherbst: "Allowed maximum DRI level 2." :/
11:55karolherbst: I think dri3 might need to be enabled explicitly...
11:55ask6155: what is it?
11:56karolherbst: it will give you GLX_EXT_buffer_age
11:56karolherbst: but with dri3 all the tearing stuff is... less worse
11:56karolherbst: so compositors usually make use of that
11:56ask6155: sign me up!
11:56karolherbst: with dri2 it's all... broken and workarounds involve using a lot of CPU
11:57ask6155: how do I enable it?
11:57karolherbst: yeah.. I don't know how to enable that with the nouveau ddx except using modesetting :/
11:57karolherbst: or well... I prefer to just use wayland and ignore X completely as it's not really worth the effort :p...
11:58ask6155: isn't modesetting like adding a kernel parameter?
11:58karolherbst: but I think there was an option inside xorg.conf
11:58karolherbst: ask6155: there is also an x driver
11:58karolherbst: but some people hate it
11:58ask6155: I found this page
11:58ask6155: https://github.com/yshui/picom/wiki/Xorg-Rants
11:58karolherbst: ask6155: maybe like this? https://gist.githubusercontent.com/karolherbst/eb7f7668ca4b5acd6ddd423180f434a0/raw/d15b356ddab9866fbf81c0efd3d3d69f885ab02b/gistfile1.txt
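The gist contents aren't reproduced in the log; the usual way to raise the DRI level for the nouveau DDX is a Device-section option along these lines (treat the exact snippet as an assumption):

    Section "Device"
        Identifier "nouveau-card"    # use whatever Identifier your config already has
        Driver     "nouveau"
        Option     "DRI" "3"         # allow DRI3 instead of the default level 2
    EndSection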
12:00karolherbst: ask6155: you also want "--experimental-backends" as it seems
12:00karolherbst: ask6155: https://github.com/yshui/picom/wiki/Vsync-Situation
12:04RSpliet: karolherbst: RE high CPU utilisation: I was on 5.11.12. I now have a 5.14 with increased usleep ranges, but to no avail - still reproducible
12:04karolherbst: RSpliet: it seems like the issue is the compositor going on a CPU-wasting spree
12:04RSpliet: Wayland, F33, Gnome Shell/Mutter
12:04karolherbst: ahhh
12:05ask6155: why is dri3 not enabled by default
12:06karolherbst: no clue
12:06karolherbst: ask6155: did the snippet help?
12:06ask6155: yeah
12:06karolherbst: cool
12:06karolherbst: also with the tearing?
12:06ask6155: glx backend doesn't use high cpu now
12:06karolherbst: nice
12:07ask6155: the tearing is there
12:07karolherbst: ahh sad :/
12:07karolherbst: mind sharing your new x log though?
12:07karolherbst: just wanting to confirm some stuff
12:07RSpliet: karolherbst: sorry, I mean 5.11.14 of course. 5.14 doesn't exist
12:12ask6155: did you get the link?
12:12karolherbst: nope
12:12ask6155: http://ix.io/2WQr
12:12ask6155: I think the client glitched
12:12ask6155: irc client that is
12:13karolherbst: ask6155: ahh yeah, seems everything is in order
12:13karolherbst: you might have to follow those picom instructions to get rid of tearing
12:14ask6155: experimental yeah
12:14ask6155: *experimental backend and vsync = true fixes it
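For reference, that combination corresponds roughly to this invocation (a sketch; the same can go in picom.conf as backend = "glx"; and vsync = true;):

    picom --experimental-backends --backend glx --vsync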
12:14karolherbst: cool
12:14karolherbst: still low CPU usage?
12:16ask6155: well the perf report says ioread32 has gone down to 4.9%
12:17karolherbst: okay.. I guess that's good enough
12:17ask6155: I was reading the xorg rants page and it was not pretty
12:17ask6155: is wayland viable?
12:17karolherbst: well, using it for over a year now
12:17karolherbst: seems good enough
12:18karolherbst: there are some caveats
12:18ask6155: I mean...
12:18ask6155: do things just werk?
12:18karolherbst: like if you use chrome/chromium you might not want to use wayland yet unless you accept that on 4k the window is a bit blurry
12:18karolherbst: ahh, yeah, mostly
12:18karolherbst: X applications are limited
12:18ask6155: what counts as X applications?
12:18karolherbst: like steam has tons of X only features which you can't really fake that easily
12:19karolherbst: closed source games are mostly X
12:19karolherbst: stuff like stadia doesn't really work well if chromium uses the experimental wayland backend
12:19karolherbst: random stuff nobody cares about
12:19karolherbst: as long as you simply use your desktop it's fine
12:19karolherbst: even playing games usually works
12:20karolherbst: there are just some niche features being broken
12:20karolherbst: like steam does some gamepad emulation stuff
12:20ask6155: what do you consider as *games*?
12:20karolherbst: well.. games :D
12:20karolherbst: they only target X
12:20karolherbst: but that's not an issue directly
12:20ask6155: I really only care about minecraft and wine
12:20karolherbst: ahh...
12:21karolherbst: that should be fine.... I think
12:21karolherbst: you can always try and see how it works, but usually there is not that huge a benefit unless you have a system not supported by X
12:21karolherbst: like mixed DPI display setups
12:21ask6155: also I use bspwm and I've customized it to be /comfy/. I don't think bspwm can do wayland
12:21karolherbst: mixing a FHD and 4K screen just doesn't work in X
12:22karolherbst: yeah. probably not
12:22karolherbst: the mixed DPI display setup was my reason to finally move over
12:22karolherbst: as this is really not bearable with X
12:22ask6155: I've never had a monitor higher than 1366x768
12:22karolherbst: okay... sad
12:23karolherbst: and I am like "I won't use anything below 4k anymore" :D
12:23ask6155: I think for a monitor 1080p is more than enough...
12:23karolherbst: depends on the size
12:23karolherbst: but you really see the difference
12:24karolherbst: for a 13" one, yeah, probably 1080p is enough
12:24karolherbst: but I have a 27" one here
12:24karolherbst: and there you clearly see the difference
12:24ask6155: for a tv 4K maybe worth it
12:24karolherbst: the biggest advantage is not really having more pixels, but the font doesn't need to be as big to still be readable
12:25karolherbst: so smaller font sizes still work just as well
12:25karolherbst: and for gaming it makes a _huge_ difference actually :D
12:25karolherbst: but without the GPU it's all useless anyway
12:26karolherbst: using 2x SS was always so much better than 8x MSAA or whatever other options you had
12:26karolherbst: but with 4k it's even better
12:27ask6155: I'm more of the 'ugly games can be good too' category
12:27karolherbst: ohh, sure
12:27karolherbst: not arguing that
12:27karolherbst: but if I can make the games look even better :p
12:27karolherbst: last new game I tried was everspace 2
12:27karolherbst: that's... ui.... it just looks so awesome :D
12:27ask6155: 'a 60GB game is too big'
12:28karolherbst: it's worth it
12:28karolherbst: sometimes
12:28karolherbst: for everspace 2 I gladly sacrifice 50GB of space
12:28karolherbst: just look at the pictures :D
12:29karolherbst: big fan of space shooter games
12:29karolherbst: but yeah, there are also a lot of garbage games whose only positive aspect is that they look good
12:30karolherbst: soo...
12:30karolherbst: I also have a ton of 2d games I really like
12:30ask6155: due to my low resources I've been looking into no graphics games like dwarf fortress, nethack...
12:30karolherbst: dwarf fortress is "fun" :3
12:31ask6155: _haha_
12:31ask6155: text based games have a special charm...
12:32karolherbst: yeah, they have
12:32ask6155: I mostly look at games at this website: osgameclones.com
12:33ask6155: anyways I guess my *that* issue is fixed...
12:33karolherbst: cool
12:34ask6155: I'll come back with different ones soon
12:34karolherbst: yeah.. not sure why dri3 isn't enabled by default
12:34karolherbst: maybe we should look into how to do it
12:34karolherbst: but I fear the answer is "not that simple" :/
12:35ask6155: there are random errors which pop up in dmesg, minecraft 1.12 doesn't work with mods, etc etc
12:36karolherbst: yeah.. minecraft is hitting a bigger issue I am working on a fix for atm
12:36karolherbst: it's just a lot of work
12:36karolherbst: well.. the fix itself not so much, but not regressing anything is
12:36ask6155: but maybe my pc issue will be fixed and I can switch back to nvidia drivers >:)
12:38karolherbst: maybe
12:38karolherbst: but probably not as you are already on a legacy branch, no?
12:41me_: well the whole pc froze
12:43ask6155: this issue is complicated and for now I'm using nouveau drivers because they are more 'configurable'
12:44ask6155: I'm on the main branch
12:44ask6155: the issue is unrelated to the driver
12:45karolherbst: ohh right.. you had a kepler2 GPU...
12:46karolherbst: imirkin: is there a way to default to dri3?
12:46karolherbst: mhh.. actually there should be because modesetting is doing this as well
12:47ask6155: I thought he'd be asleep at this time?
12:47karolherbst: sure, but might see it in the backlog :D
12:48ask6155: also the minecraft modding problems get fixed in 1.13 or a higher version... somehow
12:50ask6155: i guess I'm done here for today... thank you
12:50karolherbst: yw
15:43imirkin: karolherbst: dri3 is the default.
15:52karolherbst: imirkin: mhhh... why did a user have to explicitly enable dri3 then?
15:53karolherbst: downstream patches or sw too old?
15:53imirkin: karolherbst: oh, in the nouveau ddx?
15:54karolherbst: yes
15:54imirkin: sorry, i'm like half-asleep
15:54imirkin: dri3 is the default in mesa
15:54karolherbst: in mesa, sure
15:54imirkin: nouveau ddx does not expose dri3 by default
15:54karolherbst: any reason why?
15:54imirkin: dri3 + exa = sometimes fail
15:54karolherbst: ohh, I remember :/ ufff
15:54imirkin: not usually
15:54imirkin: but sometimes
15:54karolherbst: that's... bad :/
15:55karolherbst: we really should default to dri3 though
15:55imirkin: iirc it was only kde that was triggering the badness
15:55imirkin: yeah, i could consider flipping it
15:55imirkin: and/or adding a compile option for the default
15:55karolherbst: I'd just flip it and work out the bugs
15:55imirkin: er, s/compile/configure/
15:55imirkin: the bugs are unfixable
15:55karolherbst: ahh, reasonable
15:55imirkin: they are architectural-level
15:55karolherbst: imirkin: yeah, well, X is unfixable, so yeah
15:55imirkin: that's ... not what i'm talking about
15:55imirkin: dri3 is incompatible with exa's architecture. as i understand it, at least.
15:56imirkin: X is the definition of perfection, and thus does not require fixing ;)
15:56karolherbst: sure
15:57karolherbst: imirkin: worst case, we add some stupid workaround? dunno :/
15:57karolherbst: having to use dri2 is not.. great
15:57imirkin: i'll be honest - i'm weak on the details of what's broken about it
15:58imirkin: MrCooper is the only person i'm aware of who could explain what's wrong
15:58karolherbst: but some might argue if you run KDE you can also just use modesetting anyway :D
15:58imirkin: coz of full-page compositor?
15:58imirkin: that's probably fair
16:00karolherbst: yeah
16:00karolherbst: you run 20 GL applications anyway, so one more won't matter much
19:19pmoreau: Nice, tls could end up under-allocated due to how tls_size was computed.
20:31pmoreau: Does someone know how l[] addressing works? I was under the impression that the shader would have something like l[0x0] and the hardware would automatically compute an offset added to that so that each thread ends up accessing their own private data.
20:31pmoreau: But I am getting confused by the description of LOCAL_WARPS_LOG_ALLOC: “number of bits in l[] addressing used for warp number”.
20:32imirkin: i don't know anything about that, fwiw
20:32imirkin: maybe you can hit other threads' private data by messing with the address?
20:33pmoreau: How come you don’t know every single bit of information about the hardware! :-D
20:33pmoreau: No worries
20:33imirkin: because i'm not mwk
20:33imirkin: i know every other bit :)
20:33pmoreau: ;-)
20:33imirkin: this one appears to have gotten skipped
20:34pmoreau: Only going for the odd or even ones?
20:34imirkin: ("every other", i.e. "half", not "N-1")
20:34imirkin: nah, i try to have a more sophisticated scheme for partitioning integers into 2
20:34pmoreau: I see
20:35imirkin: odds/evens is just so boring
20:35pmoreau: True
20:36pmoreau: I’ll need to do some experimenting with a simple shader. I am a bit confused why the current compute code adds a 65536 offset to the TLS bo address, when the graphics code does no such thing.
20:36imirkin: oh
20:36imirkin: that's unrelated
20:36imirkin: i'm guessing it's so that they don't step over each other? dunno
20:37RSpliet: pmoreau: you probably need to tell it how many warps fit in a work-group, so that they split the allocation evenly
20:37imirkin: pmoreau: btw, 50c0 docs are published
20:37pmoreau: 👀
20:37imirkin: https://nvidia.github.io/open-gpu-doc/classes/compute/
20:37RSpliet: I suspect they don't just split the memory into equal chunks, but interleave data from different warps using some magical mapping to reduce the chance of bank conflicts.
20:37imirkin: ok, maybe not *docs*
20:38imirkin: but at least class method names
20:38imirkin: which are better than UNK0360
20:38imirkin: (it's not a high bar)
20:38pmoreau: RSpliet: That’s a good point
20:41mwk: pmoreau: for G80, I've got it basically figured out
20:41mwk: or... had, at least; I'm not sure it's actually written down
20:41imirkin: still all in your head, presumably, assuming you've been wearing ear plugs?
20:42pmoreau: If you have pointers to the information or remember it, I would gladly read/hear it. :-)
20:42mwk: anyway, this is exactly what it says on the tin; determines how many bits of address correspond to warp id
20:42RSpliet: pmoreau: from the name I'd say set it to the log2 of the number of warps in a WG, rounded up to the nearest integer. (or round up the number of warps to the nearest POT, whatever you fancy :-P)
20:42imirkin: [so that it doesn't spill out]
20:42mwk: which directly influences how many warps can be used at once
20:42mwk: ie. if you set it to 3, only warps 0-7 can ever become active on a given MP
20:43mwk: so you can reduce memory usage but it costs you parallelism
20:43mwk: (and lowers max group size)
20:44mwk: there is also the “LOCAL_WARPS_NO_CLAMP” switch, which explicitly tells it to use fewer bits for addressing, but *ignore* the limit, meaning that some warps will walk over each others' data
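A back-of-the-envelope sketch of the trade-off described above, with the field names taken from the discussion and everything else assumed rather than taken from documentation:

    /* inferred from the conversation, not verified against hardware docs */
    unsigned warps_log_alloc    = 3;                       /* LOCAL_WARPS_LOG_ALLOC */
    unsigned max_resident_warps = 1u << warps_log_alloc;   /* here: warps 0..7 per MP */
    /* smaller warps_log_alloc -> less l[] memory reserved, but fewer warps in flight
     * (lower parallelism, lower max group size); LOCAL_WARPS_NO_CLAMP keeps the small
     * addressing but drops the limit, so extra warps alias each other's data */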
20:46pmoreau: Ooh I think I had missed the bit: when you say “how many bits of address correspond to warp id”, you are talking about the final addressing done by the GPU so that each thread can access its own region, not the address specified in the load/store to local mem instruction, right?
20:46mwk: yes
20:46pmoreau: That makes a lot more sense!
20:46pmoreau: Thank you :-)
20:48pmoreau: And the limit of local memory one thread can access is 64KiB?
20:49mwk: it is any power of two between 16 bytes and 64kiB, as determined by LOCAL_SIZE_LOG
20:50pmoreau: Gotcha
20:51pmoreau: I was going to say: why isn’t 8 bytes the minimum since the size is only divided by 8 in the formula, but the result still needs to be a power of 2, so 16 it is.
20:51mwk: because the minimum non-0 value of the log is 1
20:52mwk: 0 just disables local storage entirely
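Putting the two numbers from this exchange together, a guess at how the per-thread limit falls out (inferred from the “divided by 8” remark and the 16 B–64 KiB range; not confirmed here):

    /* guess: per-thread l[] bytes = 8 << LOCAL_SIZE_LOG, with log 0 meaning "disabled" */
    unsigned local_size_log   = 1;                                       /* minimum non-zero value */
    unsigned per_thread_bytes = local_size_log ? (8u << local_size_log) : 0;
    /* log = 1  ->    16 bytes (the minimum quoted above)
     * log = 13 -> 65536 bytes (the 64 KiB maximum) */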
20:52pmoreau: Oh right
20:52pmoreau: Does it have any impact (disabling it)?