00:11 pmoreau: RSpliet: I think I solved my memory issues, so shared memory should be usable. Will have to double check tomorrow, but the generated code looks good at least. :-)
00:13 RSpliet: Ah perfect!
00:14 pmoreau: I don’t know whether immediates are allowed as an argument to store, but Nouveau does not complain about it and thinks it’s generating proper code, yet the instruction is just garbage
00:14 pmoreau: Nouveau’s point of view: `st u32 # g[$r0d+0x0] 0x40490fd0` vs envydis' point of view: `st b32 wb g[$r63d+0x10] $r16`
00:14 RSpliet: Does that include the async_copy OpenCL method being called?
00:14 RSpliet: (does clover/llvm take care of translating that to something sane for you?)
00:14 pmoreau: I haven’t looked at the async_copy yet
00:16 RSpliet: cool cool!
00:17 RSpliet: but that's good news :-)
00:18 pmoreau: I would guess, I would end up with a SPIR-V OpCopyMemory or OpCopyMemorySized
01:02 pmoreau: RSpliet: And working! I hadn’t realised I needed to say how much shared memory I was using. :-)
01:02 pmoreau: Plus I needed to tweak the Nouveau code a bit, so that I could report back how much shared memory was being used.
01:10 RSpliet: pmoreau: oh yes of course it needs to know :-) There's only 16KiB* to divide over all active threads. Just like reporting the number of registers you use directly affects how much parallelism you can have on a single SM
01:11 RSpliet: (configurable to 48KiB in Cuda by snooping off the L1, but NVIDIA never exposed this for OpenCL. Here's your moment to make nouveau shine over the blob :-P)
01:11 pmoreau: I do know that, since it has been limiting me sometimes when writing kernels.
01:12 pmoreau: And since Maxwell (v2? or maybe v1), it is no longer split with L1, but it has its own space
01:12 RSpliet: Oh I hadn't realised that
01:12 RSpliet: (meanwhile my leg fell asleep, time for my body to follow, like I intended half an hour ago :')
01:13 pmoreau: + you get at least 64KiB (GM200, GP100), and 96KiB of shared on the other ones ;-)
01:14 pmoreau: A block still remains limited to 48KiB, but at least you can have 2-3 blocks of that size now. :-)
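The shared-memory arithmetic discussed above can be sketched in a few lines. This is only an illustration, assuming shared memory is the sole limiting resource (registers and thread counts are ignored); the function name `blocks_per_sm` and the 32 KiB example block are hypothetical, while the 16/48/64/96 KiB figures are the ones quoted in the conversation.

```python
KIB = 1024

def blocks_per_sm(shared_per_sm, shared_per_block, block_limit=48 * KIB):
    """Blocks that fit on one SM if shared memory were the only limit.

    Hypothetical helper: real occupancy also depends on registers,
    thread counts and hardware-specific allocation granularity.
    """
    if shared_per_block > block_limit:
        return 0  # a single block may not exceed the per-block cap
    if shared_per_block == 0:
        return None  # shared memory imposes no limit at all
    return shared_per_sm // shared_per_block

# Per-SM sizes mentioned in the chat, with a block using 32 KiB.
for label, per_sm in [("16 KiB", 16 * KIB), ("48 KiB", 48 * KIB),
                      ("64 KiB", 64 * KIB), ("96 KiB", 96 * KIB)]:
    print(label, "per SM ->", blocks_per_sm(per_sm, 32 * KIB), "block(s)")
```

This is why the driver has to report shared memory usage: without it, the hardware cannot decide how many blocks may be resident on an SM at once.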
03:08 Horizon_Brave: greets everyone
11:35 pmoreau: Wooooot? I don’t remember any patch going in 4.10 to enable vga_switcheroo for Retina MBPs!
11:35 pmoreau: l1k, Lekensteyn: Did I completely miss/forget about one of your patches?
11:40 pmoreau: Though, I’ll have to disable it for now as it fails… :-/
14:38 dboyan_: I sent the glsl/tgsi shader cache enablement to the list. Reduced load time of Portal 2 from 1min to 50s on my laptop.
14:38 karolherbst: dboyan_: awesome :)
14:38 dboyan_: However, a shader binary cache like that of radeonsi still needs more work
14:39 dboyan_: seems some validation is needed
14:39 karolherbst: is the git tag/version string already hashed in?
14:40 karolherbst: ohh wait
14:40 dboyan_: it checked the validity of content, based on binary length and crc
14:40 karolherbst: it's just reading out the stuff
14:40 karolherbst: or... something else?
14:41 karolherbst: mhh
14:41 karolherbst: we need something like a "compiler version"
14:41 karolherbst: so that broken compilations are removed from the cache and so on
14:42 karolherbst: or that new optimizations are applied and stuff like that
14:44 dboyan_: yeah, that'll be also needed in shader binary cache
14:48 dboyan_: There isn't a problem between different builds of mesa. However, toggling of nouveau compile options and hash collisions should be taken into account.
14:54 karolherbst: dboyan_: there is also the option to enable/disable optimisations for nv50 codegen
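The validation ideas raised here (a length-plus-CRC check on the stored binary, and a "compiler version" plus compile options folded into the key) can be sketched as below. This is a minimal illustration, not Mesa's actual cache code: `cache_key`, `pack_entry`, `unpack_entry` and the on-disk layout are all hypothetical.

```python
import hashlib
import struct
import zlib

def cache_key(shader_source: bytes, compiler_version: str, options: tuple) -> str:
    """Hypothetical key: changes whenever the source, the compiler
    version or any compile option (e.g. optimisation level) changes."""
    h = hashlib.sha1()
    parts = (compiler_version.encode(), repr(sorted(options)).encode(), shader_source)
    for part in parts:
        # Length-prefix each part so adjacent fields cannot be confused.
        h.update(struct.pack("<I", len(part)))
        h.update(part)
    return h.hexdigest()

def pack_entry(binary: bytes) -> bytes:
    """Store the binary behind a small header: length and CRC32."""
    return struct.pack("<II", len(binary), zlib.crc32(binary)) + binary

def unpack_entry(blob: bytes):
    """Return the binary, or None if the entry fails validation."""
    if len(blob) < 8:
        return None
    length, crc = struct.unpack_from("<II", blob)
    binary = blob[8:]
    if len(binary) != length or zlib.crc32(binary) != crc:
        return None  # truncated or corrupted: treat as a cache miss
    return binary
```

With such a key, a binary produced by an older or differently configured compiler is never looked up in the first place, while the header check only guards against damaged entries.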
15:00 dboyan_: karolherbst: I guess some investigation is needed before implementing shader binary cache.
15:00 karolherbst: yeah
15:00 dboyan_: The glsl/tgsi cache should be safe, though
15:00 karolherbst: debug builds shouldn't use it as well
15:00 karolherbst: etc...
15:01 karolherbst: and with debug builds I mean builds with --enable-debugging
15:01 dboyan_: I don't think radeonsi or r600 disable them, or did I overlook something?
15:01 karolherbst: or maybe environmental variable "NV50_IR_USE_SHADER_CACHE" which defaults to !isDebugBuild
15:02 karolherbst: no idea
15:02 karolherbst: but as a developer I would be annoyed if I couldn't disable it
15:02 dboyan_: There is an environment variable to disable shader cache
15:04 karolherbst: okay, nice
15:04 karolherbst: that should be enough then
15:04 dboyan_: I saw some discussion about whether a radeon-specific flag should be added. In the end people decided that if they don't want the shader cache, they should disable them all.
15:05 karolherbst: yeah
15:05 karolherbst: dboyan_: https://cgit.freedesktop.org/mesa/mesa/tree/src/gallium/drivers/nouveau/codegen/nv50_ir_peephole.cpp#n3670
15:05 karolherbst: the numbers are for the optimization level
15:06 tarragon: karolherbst: hei sweety
15:38 whompy: well that's a tad creepy for irc.
15:42 karolherbst: whompy: especially since you really need the right channel for greetings on IRC anyway
16:04 spinat: hello guys
16:04 spinat: anybody on?
16:13 mupuf: spinat: hey
16:14 mupuf: just ask your question and hang around until someone answers ;)
16:14 spinat: oh ok sorry
16:15 mupuf: don't be ;)
16:15 spinat: tryin to get my gtx960 to work with nouveau on arch linux
16:15 spinat: and get this message (EE) Unknown chipset: NV126
16:16 spinat: i found this old bug report here
16:16 spinat: https://bugs.freedesktop.org/show_bug.cgi?id=94728
16:16 spinat: but it is marked as not a bug
16:16 spinat: i installed the newest mesa and nouveau today
16:17 spinat: but still no luck
16:17 karolherbst: spinat: did you update xf86-video-nouveau?
16:17 karolherbst: to like .14 or so?
16:17 karolherbst: or .13
16:17 spinat: is gtx960 supported?
16:17 karolherbst: yes
16:17 spinat: i have .13
16:18 karolherbst: spinat: anyway, you should remove your xorg.conf file if you have any
16:18 spinat: aha
16:18 spinat: and that should work?
16:18 karolherbst: ohh wait, xf86-video-nouveau only supports GM starting with .14
16:18 karolherbst: before that, modesetting should be used
16:18 karolherbst: yes
16:18 karolherbst: unless your hw is broken and you need dirty workarounds
16:18 karolherbst: but usually nobody needs a xorg.conf today
16:18 spinat: so then:
16:19 spinat: 1) where to get .14?
16:19 karolherbst: 1. ask your distribution ;)
16:19 spinat: 2) how to use modesetting?
16:19 karolherbst: 2. by removing your xorg.conf file
16:19 karolherbst: you could pastebin your xorg.conf
16:19 spinat: got it
16:19 karolherbst: maybe something important is inside it
16:20 spinat: well actually
16:20 karolherbst: but usually Xorg autoconfigures itself
16:20 spinat: i never touched xorg.conf
16:20 karolherbst: mhh odd
16:20 spinat: it should be default
16:20 karolherbst: it should fall back to modesetting
16:20 karolherbst: wait
16:20 karolherbst: is your X server starting?
16:20 spinat: but i previously had nvidia drivers
16:20 karolherbst: ohh
16:20 karolherbst: nvidia installs a xorg.conf file most of the time
16:20 spinat: with nvidia driver yes
16:21 spinat: but never with nouveau
16:21 spinat: got it
16:21 spinat: will give it a try
16:22 spinat: thx a lot
16:23 spinat: where is xorg.conf located?
16:24 whompy: /etc/X11
16:24 whompy: Look for stuff in xorg.d/ as well
16:25 karolherbst: well files in xorg.d tend to be important though
16:25 karolherbst: spinat: by any chance, do you have a xf86-video-modesetting package?
16:25 spinat: well X11 has no xorg.conf
16:25 karolherbst: spinat: if not, you may want to install it, unless xorg-server is new enough
16:26 spinat: only xorg.conf.de
16:26 spinat: *d
16:26 karolherbst: could you pastebin your Xorg.0.log file?
16:27 spinat: you are right
16:27 spinat: i cleared conf.d
16:27 spinat: and it worked
16:27 spinat: thx
16:28 karolherbst: mhh
16:28 karolherbst: well
16:28 karolherbst: bad thing is, some files from there might be important
16:28 karolherbst: but the only ones I know of are for touchpad related stuff
16:28 karolherbst: and so
16:29 karolherbst: oh well
16:29 karolherbst: package updates might fix it
16:35 rpirea: hi
16:35 rpirea: i have a sli setup with 2xgtx 970. how can i help you to support sli in nouveau?
16:42 pmoreau: spinat: The xf86-video-nouveau package has been flagged as out-of-date, but it hasn’t been updated yet. Hopefully some time next week, maybe?
16:43 imirkin: dboyan_: that was a good example with the histogram... doesn't work on fermi either. but for some other reason - the "bug" program you wrote works fine.
16:43 imirkin: Lyude: have you fixed the tgz situation for 1.0.14?
16:45 imirkin: mwk: any thoughts on what might be special about a NV44A with a PCI connector? the fifo can't read the pushbufs we give it (or the memory we give it) or something. it allegedly used to work, haven't bisected yet, but was hoping to get some opinions...
16:45 imirkin: mwk: you can see some info in https://bugs.freedesktop.org/show_bug.cgi?id=70388#c35
16:55 karolherbst: rpirea: by implementing SLI within nouveau
16:56 karolherbst: rpirea: this would be a good first step: https://trello.com/c/GXe2bbEO/161-use-sli-for-prime-offloading
16:56 karolherbst: rpirea: currently you can use (or should be able to use) DRI_PRIME offloading to render something on the other GPU
16:56 karolherbst: this could be sped up by using SLI
16:57 karolherbst: vblank_mode=0 DRI_PRIME=0 glxgears should be significantly faster than vblank_mode=0 DRI_PRIME=1 glxgears
16:57 karolherbst: the idea behind this is, just to get the data transfer working between both cards. The next step could be to do some OpenGL stuff on both gpus
16:58 karolherbst: but this will be like a _huge_ project
16:59 karolherbst: rpirea: by any chance, are you a student? Then you could do something related to that as a GSoC project
16:59 rpirea: karolherbst yes.
17:00 karolherbst: sadly I have only minor knowledge about SLI in general. I think imirkin knows more? Somebody reverse engineered a few registers related to SLI already
17:00 rpirea: karolherbst and i am not interested in GSoC.
17:01 karolherbst: rpirea: well... you would not be working for Google, if that is your concern
17:05 imirkin: SLI the way that it works (or used to work) on windows is largely a waste of time
17:05 imirkin: the difficulty, esp in modern games, comes from figuring out how to split up a single workload among multiple GPUs
17:05 imirkin: with single-pass games it was pretty straightforward
17:05 imirkin: but with multi-pass it's much harder
17:07 imirkin: that said, SLI does offer a bus for inter-gpu communication (i think?), and modeling that out properly would be nice, if not immediately useful
17:08 imirkin: one could imagine a system where a dmabuf from one GPU can be transferred over the SLI bus to the other GPU if necessary
17:08 imirkin: however that is so far removed in the future, that it's not particularly interesting to think about (to me)
17:40 pmoreau: I am getting **so** confused by the code using sometimes "shared" for shared memory (CUDA/OpenGL denomination), and sometimes "local" (OpenCL denomination). Especially since "local" in CUDA means "spill-memory"…
17:42 calim: it's not so bad considering they're actually the same part of memory
17:42 calim: just divided up for different purposes
17:42 pmoreau: Isn’t local off-chip, while shared is definitely on-chip?
17:43 calim: on pre-Maxwell you had to configure how much of it to use for L1 cache and how much for shared memory
17:43 pmoreau: I always thought of local as a part of global memory, but reserved for the driver's use
17:43 pmoreau: Right, but it was shared vs L1, not shared vs local
17:44 imirkin: pmoreau: it's all the same
17:44 calim: yeah buuut L1 is mainly used for local memory
17:45 pmoreau: Oh, ok, so local is on chip as well, and not part of global as I was thinking. Good to know!
17:45 calim: of course local/private is also backed by normal device memory
17:45 calim: because there's not enough of it otherwise
17:46 pmoreau: I guess I never thought that "local memory" would be cached, but it makes sense.
17:46 calim: it doesn't need to be coherent
17:47 pmoreau: Right
17:47 pmoreau: Thanks for the clarification! :-)
17:47 calim: I still always write shared/local and local/private when it's not clear ;)
17:51 pmoreau: So, now that everything is clearer, back to looking into how I should report shared/local memory through clover, and why tlsSize of nv50_ir::Program is not properly initialised…
18:29 john_cephalopoda: Hey.
20:38 Lekensteyn: pmoreau: Lukas Wunner has been working on some vgaswitcheroo patches (mainly related to Thunderbolt)
20:41 Lekensteyn: pmoreau: see https://lists.freedesktop.org/archives/nouveau/2017-February/027374.html
20:49 pmoreau: Lekensteyn: I have seen those, but they haven’t been merged yet, and they definitely aren’t part of 4.10. :-)
20:50 Lekensteyn: oh right, these are still pending. I am not aware of vgaswitcheroo patches for 4.10
20:51 pmoreau: Hum… weird. I’ll need to investigate it
20:53 Lekensteyn: pmoreau: btw, what does "enable vga_switcheroo" mean? do you mean runtime pm?
20:53 pmoreau: It was not really clear, I agree. Indeed, I meant runtime pm.
20:53 karolherbst: imirkin: :O I found the error
20:53 karolherbst: imirkin: or at least I know what the error _might_ be
20:54 karolherbst: imirkin: this looks fine right? $2 = {resource = 0x7ffdab0e5e60, level = 0, usage = 6146, box = {x = 0, y = 0, z = 0, width = 901120, height = 1, depth = 1}, stride = 0, layer_stride = 0}
20:54 karolherbst: _but_: p *(struct nouveau_transfer *)transfer is $3 = {base = {resource = 0x7ffdab0e5e60, level = 0, usage = 6146, box = {x = 0, y = 0, z = 0, width = 901120, height = 1, depth = 1}, stride = 0, layer_stride = 0}, map = 0xffffffff <error: Cannot access memory at address 0xffffffff>, bo = 0x0, mm = 0xcc9c00000000, offset = 0}
20:54 mlankhorst: MAP_FAILED?
20:55 karolherbst: seems like we cast something to nouveau_transfer which isn't a nouveau_transfer? (struct nouveau_transfer *tx = nouveau_transfer(transfer);)
20:55 karolherbst: ?
20:57 karolherbst: or it didn't get copied properly
20:57 karolherbst: some silly issue like this
20:59 pmoreau: Lekensteyn: If I do not boot with `nouveau.runpm=0`, I get https://hastebin.com/igelibuzut.go, and launching anything on the discrete GPU results in the corresponding process locking up; if any OpenGL animation is running, even on the integrated GPU, the screen just freezes (I haven’t checked whether the computer would still respond).
21:15 Lekensteyn: pmoreau: hm, I'm still on 4.9 (also Arch), have not checked with 4.10 yet. only 4.10 feature that I can think of is 3a6536c51d5db3adf58dcd466a3aee6233b58544
21:17 karolherbst: imirkin: another think: it always crashes if size==901120
21:18 karolherbst: *thing
21:29 pmoreau: Lekensteyn: Hum… it seems unlikely that that commit would cause that issue, but who knows
21:50 karolherbst: mhh, doesn't this look a little odd? box = {x = 0, y = 0, z = 0, width = 901120, height = 1, depth = 1}
23:40 pmoreau: It seems that I will never be satisfied with my series for clover… :-(
23:47 lotus: Hi there~
23:48 calim: why not ?
23:51 pmoreau: calim: I always find some things that are missing, or bugs, and I am not happy with how the code looks. :-D
23:52 calim: at some point you'll run out of the first 2 things though
23:53 calim: (with a lenient definition of "bug")
23:54 calim: it's so hard to make code not look ugly, all those imperfectly aligned lines ...
23:54 pmoreau: :-)
23:57 karolherbst: this one bug is killing me :(
23:58 karolherbst: pmoreau: did you take care of getting Shadow of Mordor working or was it hakzsam?
23:59 pmoreau: I did not take care of it, so most likely hakzsam.