00:07 Tom^: imirkin: idk if its just a coincidence but i got this in cs:go now https://gist.github.com/gulafaran/74020177c1b94d45d0459336191d4fd1 :p
00:07 Tom^: and very weird lags directly
00:08 imirkin: yes. coincidence.
00:08 imirkin: you have a ttm leak
00:38 karolherbst: Tom^: split debug symbols
00:38 karolherbst: kernel option
00:38 Tom^: we shall see dropped bunch of unnecessary drivers for various devices.
00:38 karolherbst: Tom^: you don't understand :p
00:38 Tom^: nope
00:39 Tom^: =D
00:39 karolherbst: Tom^: CONFIG_DEBUG_INFO_SPLIT
00:40 karolherbst: Tom^: and CONFIG_GDB_SCRIPTS might also make sense
00:41 karolherbst: stupid kernel, why doesn't the kernel tell me why the system was woken up
00:42 karolherbst: but I think unplugging my USB keyboard helps
01:01 Tom^: hm that didnt work still just prints bunch of garble even tho i left debug symbols in hm
01:04 imirkin: are you still seeing issues even after a reboot?
01:04 Tom^: oh im just trying to figure out the ttm crash when replaying the trace :p
01:05 imirkin: well, i'm guessing that ttm got messed up
01:11 Tom^: would it help if i find which gldraw call that does it?
01:17 imirkin: no, i'm sure it's random. just reboot and it'll all be better.
01:17 Tom^: yea but it happens everytime i replay the trace :p
01:17 Tom^: i mean even after reboot
01:19 Tom^: just sorta thought it could be useful to pinpoint why but idk
01:20 imirkin: oh. very weird.
01:21 Tom^: its this bit that is persistent https://gist.github.com/gulafaran/715a2b46e1348f1cc1b1c8112de84e74 on the replays, that steam cs:go ttm thing was just random.
01:25 imirkin: tbh i dunno what that means. i'm hoping skeggsb can make sense of it.
01:37 imirkin: it *sounds* like ttm is leaking
02:48 mwk: ugh, I never noticed how weird the Falcon ISA is until I tried to make a compiler for it
02:49 mwk: the flags seem to be a total waste
02:49 mwk: $p0-$p7
02:50 mwk: could've been much more useful if it supported mov/and/xor/or on these
02:53 mwk: also [$sp+$rX*4] is a horrible addressing mode
02:54 mwk: no absolute addressing mode... on an MCU!
02:56 imirkin: well, at least things improve with revisions, no?
02:57 mwk: that's true
02:59 mwk: but not by much
02:59 mwk: unless I missed some opcodes on v4, which is quite possible
02:59 imirkin: there's a v5 too, which i'm sure you're aware of
03:00 mwk: v5 is mostly an encoding optimization
03:00 imirkin: no better time to sneak in some new ops :)
03:00 mwk: yeah, I haven't scanned the opcode space on v5
03:00 mwk: it could have something nice
03:01 mwk: but I see nothing off when I decompile things, so maybe not
03:01 imirkin: have you analyzed pmu and vdec as well?
03:01 mwk: I'll probably write some hwtests for Falcon soon
03:02 imirkin: what happened to the gf100+ gpu hw tests?
03:02 mwk: I'm encoding a lot of knowledge in LLVM that I took straight from my own Falcon docs, hoping I got it right back then
03:02 mwk: that's still on the todo list
03:02 mwk: so is G80 doc
03:05 mwk: eh
03:05 mwk: I need to go through all Falcon doc and code and swap REG1 with REG2
03:05 mwk: as it turns out, the operand ordering makes much more sense when you do that
03:06 mwk: and count destination operands last
03:07 mwk: it actually helped clean up the patterns in LLVM, since I could then reuse rri8 for ri8 and rrr for rr
03:10 * imirkin has never looked at falcon encodings
03:11 imirkin: hopefully i can keep it that way ;)
11:23 night199uk: hrm, odd question - maybe badly phrased - do different nvidia cards have different framebuffer alignments?
11:24 night199uk: seems like the framebuffer size on fermi+ (or maybe kepler+) needs to be 256 byte aligned, whereas before it was 64 byte aligned?
11:33 karolherbst: night199uk: I am pretty sure that maybe you may find the answer in the nouveau code :D
11:33 night199uk: yeah - any idea where i’d look?
11:33 karolherbst: drm/nouveau/nvkm/subdev/fb
11:34 karolherbst: it is a beast though
11:34 night199uk: feared that might be the answer but thought i’d see if someone knew ‘off-hand’ first :-)
11:34 karolherbst: well
11:34 karolherbst: there is a "size = min(size, 0x1000);"
11:35 karolherbst: but that's for the mmu
11:35 night199uk: hrm
11:35 karolherbst: nvkm_mask(device, 0x100c80, 0x00000001, 0x00000000); /* 128KiB lpg */
11:35 karolherbst: maybe this is it?
11:35 night199uk: there is some code in the driver that seems to calculate the number of bytes per row
11:35 night199uk: related to 0x610758
11:35 karolherbst: no idea what is lpg
11:35 night199uk: hrm, sec
11:36 night199uk: hrm, nah
11:36 night199uk: that is to do with setting the size of the FB I think
11:36 night199uk: the driver calculates a buffer size for storing a row of display data
11:36 night199uk: i call it BytesPerRow
11:37 night199uk: it’s then aligned which seems to be chipset specific (this is related to my chipset ID query yesterday)
11:37 night199uk: well, the row is padded out
11:37 night199uk: for below fermi it seems to be padded to the next 64-byte boundary
11:37 night199uk: for fermi+ it is padded out to a 256-byte boundary
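[editor's note] The padding night199uk describes is a round-up of the row size to a power-of-two boundary. The sketch below is only an illustration of that arithmetic: the helper names and the `fermi_or_newer` flag are made up here, and the 64/256 values are the ones observed in the discussion, not taken from nouveau or the EFI driver.

```python
def align_up(value: int, alignment: int) -> int:
    """Round value up to the next multiple of alignment (a power of two)."""
    if alignment <= 0 or alignment & (alignment - 1):
        raise ValueError("alignment must be a positive power of two")
    return (value + alignment - 1) & ~(alignment - 1)


def padded_bytes_per_row(width_px: int, bytes_per_px: int, fermi_or_newer: bool) -> int:
    """Hypothetical helper: pad a scanline to 64 bytes pre-Fermi, 256 bytes on Fermi+."""
    alignment = 256 if fermi_or_newer else 64
    return align_up(width_px * bytes_per_px, alignment)


# A 1366-pixel row at 4 bytes per pixel is 5464 bytes before padding.
print(padded_bytes_per_row(1366, 4, fermi_or_newer=False))  # 5504
print(padded_bytes_per_row(1366, 4, fermi_or_newer=True))   # 5632
```

Widths whose row size is already a multiple of the alignment (1920 pixels at 4 bytes, for example) come out unchanged.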
11:38 karolherbst: well I never looked into this area of nouveau really
11:38 night199uk: yah :-)
11:38 night199uk: i figure i’m way off on the long road by now ;-)
11:39 night199uk: this value is programmed into what I guess is PDISP reg 0x455/0x456 by the mode setting scripts
11:40 night199uk: where does nouveau do similar to the mode setting scripts that exist in the driver, any ideas?
11:55 pmoreau: imirkin: I’ll need to spend some more time in nv50_ir_peephole.cpp to add U64/S64 support. Probably worth to get that working before returning to the CVT U64/S64, since I saw some CVT handling in there.
12:00 karolherbst: night199uk: if the script is in the vbios, nouveau usually just executes them
12:02 mwk: night199uk: framebuffer alignment depends on the type of surface involved
12:03 mwk: for simple pitch surface, it used to be 64 I think, but maybe you need to start on a GOB, ie. 0x200 byte boundary for Fermi+
12:03 mwk: for tiled surfaces, you most definitely need GOB alignment
12:17 night199uk: mwk: hrm, this is essentially 0x100 aligned
12:17 night199uk: mwk: so it sounds like theres definitely a difference in generations
12:17 night199uk: which kind of confirms what i’m seeing
12:17 night199uk: 64-bytes was pre-fermi?
12:18 night199uk: what’s a GOB?
12:18 night199uk: and i know nothing about surface types :-)
12:18 night199uk: all the code so far has been on mode setting and display standards
12:18 night199uk: so i don’t understand much about PDISP / PGRAPH arch
12:19 night199uk: karolherbst: nah… these mode setting scripts are contained in the EFI driver itself, as opposed to the ‘VBIOS’ scripts
12:20 night199uk: so they’re in the vbios but probably not what you’re meaning
12:34 mwk: night199uk: about surface types, see http://envytools.readthedocs.io/en/latest/hw/memory/g80-surface.html
12:35 mwk: tiled == blocklinear
12:35 mwk: gob is 0x100 bytes for Teslas, 0x200 bytes for Fermis and up
12:36 mwk: it's an important unit for the memory subsystem
12:37 mwk: blocklinear aka tiled surfaces always need to be gob aligned, due to their structure
12:37 mwk: IIRC for pitch surfaces you could get away with "only" 0x40-byte alignment, but I wouldn't be surprised if they require gob alignment too in some circumstances - that might include framebuffer
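[editor's note] mwk's numbers can be written down as a small lookup: 0x100-byte GOBs on Tesla, 0x200-byte GOBs on Fermi and up, and 0x40 bytes as the minimum for pitch surfaces. This is a sketch under those stated figures only. The generation labels and function names are invented for illustration, and it deliberately ignores the caveat that pitch surfaces may need GOB alignment in some cases.

```python
# GOB sizes as quoted in the discussion; the dictionary keys are illustrative labels.
GOB_BYTES = {"tesla": 0x100, "fermi+": 0x200}
PITCH_MIN_ALIGN = 0x40  # "only" 0x40-byte alignment for pitch surfaces, per mwk's IIRC


def required_alignment(generation: str, blocklinear: bool) -> int:
    """Return the start-address alignment assumed for a surface."""
    if blocklinear:
        return GOB_BYTES[generation]
    return PITCH_MIN_ALIGN


def is_aligned(address: int, generation: str, blocklinear: bool) -> bool:
    """Check whether an address satisfies the assumed alignment."""
    return address % required_alignment(generation, blocklinear) == 0


print(is_aligned(0x1100, "tesla", blocklinear=True))    # True
print(is_aligned(0x1100, "fermi+", blocklinear=True))   # False
print(is_aligned(0x1140, "fermi+", blocklinear=False))  # True
```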
12:44 night199uk: interesting, thanks mwk that really helps
12:45 night199uk: i guess since what i’m looking at is 2d this is a pitch surface?
12:45 mwk: it could be blocklinear
12:45 mwk: whose surface are you looking at?
12:45 night199uk: well, the value of this is used in a few places
12:46 night199uk: most notably as the size of a buffer used for hardware accelerated 2d drawing
12:46 night199uk: the driver keeps a 1 line size buffer used for simple screen scrolling in 2d
12:46 night199uk: so it calculates the total bytes per row and then does this alignment dance
12:46 night199uk: but the aligned bytes per row value is used in a bunch of places
12:49 night199uk: the other place its used is in mode setting, it’s pushed into what i guess is PDISP reg 0x455
12:49 night199uk: ahh yeah, that 0x200 makes more sense now looking at this code again
12:53 night199uk: this page is really useful mwk
12:53 night199uk: thanks
12:54 night199uk: i can see some of what i’m looking at relates to this but i can’t figure out how yet :-)
14:28 Tom^: imirkin: https://gist.github.com/gulafaran/3c14aecc7cd9b41d69b79f496dec30dd im getting these ttm issues more often now since your patch and/or since i recompiled mesa from git with it.
14:28 imirkin: my guess is the latter :)
14:28 imirkin: what mesa did you have before?
14:29 Tom^: -git with that was compiled just a few days ago
14:30 imirkin: hmmmm
14:30 Tom^: i was on this commit https://cgit.freedesktop.org/mesa/mesa/commit/?id=59156b2
14:31 Tom^: according to my freedesktop bugreport :p https://bugs.freedesktop.org/show_bug.cgi?id=95403
14:32 imirkin: well, it's only i965 patches since then...
14:32 karolherbst: Tom^: maybe you should reboot and it happens less :D
14:32 Tom^: karolherbst: i dont think ive had an uptime longer than ~16 hours for the past 2 weeks
14:33 karolherbst: and now it is how long?
14:33 Tom^: 2 minutes.
14:33 Tom^: xorg and the gpu sort of freezes when these ttm issues comes so im sort of required to reboot :p
14:33 karolherbst: ahh
14:36 Tom^: wait
14:37 Tom^: imirkin: googling seems to suggest some laptop owner had similar issue in the past where the compositor asked for too much vram.
14:37 Tom^: imirkin: *cough* shader allocation perhaps *cough.
14:37 Tom^: xD
14:38 imirkin: well, that's what fail_validate usually means
14:38 imirkin: but you have a ton of vram
14:38 imirkin: so... not sure how that'd be happening
14:38 Tom^: im allocating 1 << 30
14:38 Tom^: =D =D
14:38 karolherbst: ...
14:38 karolherbst: Tom^: you are aware of that this is per context?
14:38 Tom^: yes
14:39 Tom^: i said i was gonna test an obscene amount of it to rule it out :P
14:39 karolherbst: this is 1GB,right?
14:40 karolherbst: I already thought about saying that this might be the reason for your ttm problems, but I suspected you already took care of that
14:40 karolherbst: imirkin: well he has only 3GB though
14:41 imirkin: Tom^: errrr what??
14:41 imirkin: Tom^: that's definitely not good
14:42 Tom^: yes yes, im setting it to more normal levels before i proceed.
14:44 Tom^: 1 << 30 was the highest i could go before hitting int max
14:44 Tom^: hehe
14:49 imirkin: Tom^: yeah, so basically you got a TON of thrashing as it wants the code segment to live in vram
14:49 imirkin: and you were allocating 1GB of it
14:50 imirkin: and only 3GB of vram
14:50 imirkin: so each application had its own 1GB code segment
14:50 imirkin: and you can see how much fun that was.
14:50 Tom^: :)
14:50 imirkin: (and on occasion, applications also want to store other things, like textures and whatnot)
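[editor's note] The arithmetic behind imirkin's explanation is short: a 1 GiB code segment per context on a 3 GiB card leaves room for at most three contexts before anything else is stored, and 1 << 30 is the largest power of two that still fits in a signed 32-bit int. The sketch below only restates those numbers; the function name is made up, and real VRAM accounting involves far more than this division.

```python
GIB = 1 << 30
INT32_MAX = (1 << 31) - 1


def max_resident_contexts(vram_bytes: int, code_segment_bytes: int) -> int:
    """How many per-context code segments fit in VRAM, ignoring everything else."""
    return vram_bytes // code_segment_bytes


code_segment = 1 << 30  # the value Tom^ was testing with
print(max_resident_contexts(3 * GIB, code_segment))  # 3
print(code_segment <= INT32_MAX)                     # True
print((1 << 31) <= INT32_MAX)                        # False
```

With textures, framebuffers and a compositor also competing for the same 3 GiB, even the second or third context would already force buffers to be evicted, which matches the thrashing described above.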
14:51 Tom^: atleast i thoroughly tested that i aint leaking in ttm.
14:52 Tom^: i was simply merely running out of vram out of own stupidity
14:53 imirkin: ok, well i just pushed out that fix
14:54 imirkin: so no longer need to patch
14:54 Tom^: cool
15:05 Tom^: imirkin: ive also found another tiny visual glitch with msaa 2x turned on they disappear without it on. http://i.imgur.com/U6KmB5o.png see those red lines appearing on edges of stuff? , is this uh traceable?
15:17 karolherbst: Tom^: the question is, does it happen with nvidia?
15:19 Tom^: gonna have to install blob and see before answering.
15:19 imirkin: Tom^: sure, make a trace, see if it happens when replaying the apitrace
15:19 imirkin: Tom^: i can definitely see us messing something up causing that to happen
15:19 imirkin: but it could also easily be the game
15:19 Tom^: yea il test the blob first
15:21 imirkin: also try 4x
15:22 imirkin: our resolve "algorithm" isn't what one would call... great.
15:22 imirkin: Tom^: btw, why is the image so small? i figured you'd do full-screen stuff :)
15:23 imirkin: [but if you make a trace, small window is def better, coz then it doesn't take me a year to replay]
15:26 Tom^: its 1920x1080 but cut small with scrot
15:26 Tom^: to focus on the red lines :P
15:26 imirkin: aha
15:27 Tom^: but that raises a valid point, i could trace at a very low resolution
15:33 karolherbst: Tom^: is this like from a PS2 emulator or something?
15:33 Tom^: its cs:go
15:33 tinnuel: Hello. I am trying to get my NVIDIA graphics card (GF108M) working with nouveau on Debian testing amd64. I am running linux kernel version 4.5.0-2-amd64. I have installed bumblebee and primus. I made sure that nouveau was not blacklisted. The graphics card does not show up under xrandr --listproviders, but it does show up under lspci. Nouveau seems to throw up a number of vague read-write and time out error messages in dmesg. I am not sure
15:33 tinnuel: what they mean. Here is the pastebin for dmesg: http://pastebin.com/wM2JwMdC
15:33 karolherbst: tinnuel: ahh
15:33 karolherbst: ...
15:33 karolherbst: Tom^: ahh
15:34 karolherbst: tinnuel: don't use bumblebee with nouveau
15:35 tinnuel: karolherbst: so, do I just purge it, cross my fingers and hope for the best?
15:35 karolherbst: it won't change a thing though
15:35 karolherbst: well I assume you already followed the optimus wiki page?
15:35 karolherbst: https://nouveau.freedesktop.org/wiki/Optimus/
15:36 karolherbst: tinnuel: I think your Xorg log would help here more
15:43 Lekensteyn: karolherbst: you had an issue with nouveau reporting "unknown chipset (ffffffff)" after disabling the graphics card. Have you ever figured out what was going on?
15:43 Lekensteyn: right now I get the exact same message after powering off/on the PCIe port, and then loading nouveau
15:43 xaphir100: karolherbst: are u wine expert?
15:44 karolherbst: Lekensteyn: not really, I just know that this happens
15:44 karolherbst: xaphir100: nope
15:44 karolherbst: Lekensteyn: but
15:44 karolherbst: Lekensteyn: I think it only happens the second time
15:44 Lekensteyn: the second time what? off, on, load -> broken?
15:45 karolherbst: Lekensteyn: nope, on -> off -> on -> off -> broken
15:45 karolherbst: Lekensteyn: turning the card on again works though
15:45 karolherbst: maybe this is some caching problem on the kernel side when there is no driver loaded for that gpu?
15:46 Lekensteyn: seems unlikely, if I do ACPI magic behind the scenes I can reproduce the same issue
15:47 karolherbst: yeah, no idea why that happens
15:47 karolherbst: but I also don't really care that much about this
15:47 Lekensteyn: here is the logs where you can see the _OFF/_ON (invoked via acpi_call), then loading nouveau triggers the unknown chipset issue http://sprunge.us/QRTi
15:47 karolherbst: vgaswitcheroo could be a little bit less strict about it
15:47 karolherbst: Lekensteyn: yeah, because the card is off at loading time
15:48 Lekensteyn: I wonder if there is a sequence that could somehow reinitialize the card (I saw something about NvPost thing, would that help?)
15:48 karolherbst: Lekensteyn: well the card needs to be posted after it was off
15:48 karolherbst: if there is an ACPI method for posting the GPU, this might help
15:48 karolherbst: Lekensteyn: but I have no idea when nouveau posts
15:49 karolherbst: Lekensteyn: you can turn on the card with bbswitch without problems though
15:49 karolherbst: and then load nouveau
15:50 tinnuel: karolherbst: Here is the log for Xorg http://pastebin.com/0kEw2Jsb .
15:50 karolherbst: Lekensteyn: lspci -vv: "01:00.0 VGA compatible controller: NVIDIA Corporation GK106M [GeForce GTX 770M] (rev ff) (prog-if ff) !!! Unknown header type 7f
15:50 karolherbst: tinnuel: you need the nouveau DDX
15:51 karolherbst: tinnuel: do you use a custom Xorg.conf ?
15:51 karolherbst: tinnuel: ohh, the issue is more severe than I thought
15:52 karolherbst: tinnuel: and your kernel is like a little old
15:53 karolherbst: tinnuel: mhhh your software stack seems a bit old in general too
15:57 tinnuel: karolherbst: I have not customised my Xorg.conf. My graphics card is quite old, so I thought the kernel would not be the most critical. Under Xorg the Release date is 4 April 2016
16:01 Tom^: tinnuel: according to the Xorg.0.log you are not on kernel 4.5 you are on 3.16
16:02 tinnuel: Tom^: Hi, just saw that. But the Xorg.1.log file is dated with 2015, which although newer, is still not new enough. when checking Xorg, it states release date 4 April 2016. The kernel info I got from checking through terminal.
16:04 tinnuel: Tom^: addendum: Xorg -version gives Build operating system Linux 3.16.0-4-amd64 x86_64 Debian, but under Current operating system it states Linux mimir 4.5.0-2-amd64
16:11 imirkin: Lekensteyn: you get -1's from PCI when the device is off
16:12 Lekensteyn: imirkin: yes, except that the pci config space does not return -1s
16:13 karolherbst: tinnuel: "Kernel command line: BOOT_IMAGE=/vmlinuz-3.16.0-4-amd64 root=/dev/mapper/mimir--vg-root ro initrd=/install/gtk/initrd.gz quiet"
16:13 karolherbst: tinnuel: maybe the log is just old?
16:13 imirkin: Lekensteyn: how did you check? linux kernel caches it...
16:13 karolherbst: tinnuel: also "Current Operating System: Linux mimir 3.16.0-4-amd64 #1 SMP Debian 3.16.7-ckt11-1+deb8u3 (2015-08-04) x86_64"
16:14 imirkin: tinnuel: looks like nouveau is very upset with the display bits of your GPU... but if you're not attaching screens to it, it shouldn't really matter.
16:15 tinnuel: karolherbst: Is there a way to manually generate a new log file?
16:15 Lekensteyn: lspci -nnvvvxxxx (just after remove and rescan PCI device) http://sprunge.us/GNNV
16:15 Lekensteyn: lspci http://sprunge.us/ZbdQ (after modprobe nouveau; rmmod nouveau; \_SB.PCI0.PEG0.PG00._OFF ; \_SB.PCI0.PEG0.PG00._ON ; modprobe nouveau (reports unknown chipset issue); rmmod nouveau)
16:16 imirkin: Lekensteyn: try lspci -H1 or lspci -H2
16:16 Lekensteyn: using -H1 (reading directly from /dev/mem also reports a valid value (not rev ff))
16:16 imirkin: ok
16:17 Lekensteyn: `git diff --color-words lspci-nv-00-rescanned.txt lspci-nv-03-offon.txt` shows some differences for the memory regions, no idea if that is significant
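[editor's note] imirkin's point that "you get -1's from PCI when the device is off" is what ties together "unknown chipset (ffffffff)", "rev ff" and "Unknown header type 7f": reads from a device that does not respond come back as all-ones. The sketch below checks a raw config-space header for that pattern. The function names are invented here, and the sample bytes are illustrative rather than copied from Lekensteyn's dumps. Note that in his case the config space looked valid, so the all-ones value nouveau reported presumably came from a different read.

```python
def is_all_ones(value: int, width_bits: int = 32) -> bool:
    """True if every bit of a width_bits-wide value is set, e.g. 0xffffffff."""
    return value == (1 << width_bits) - 1


def describe_config_header(config: bytes) -> dict:
    """Decode a few standard fields from the start of PCI config space."""
    vendor = int.from_bytes(config[0x00:0x02], "little")
    device = int.from_bytes(config[0x02:0x04], "little")
    header_type = config[0x0E] & 0x7F  # top bit is the multi-function flag
    return {
        "vendor": vendor,
        "device": device,
        "header_type": header_type,
        "looks_absent": is_all_ones(vendor, 16),  # 0xffff is never a valid vendor ID
    }


dead = bytes([0xFF] * 64)
info = describe_config_header(dead)
print(hex(info["header_type"]), info["looks_absent"])  # 0x7f True

# Illustrative live header: vendor 0x10de (NVIDIA), made-up remaining bytes.
alive = bytes.fromhex("de10e011") + bytes(60)
print(hex(describe_config_header(alive)["vendor"]))  # 0x10de
```

On Linux the raw bytes could be obtained from the device's sysfs `config` file, although, as noted in the log, the kernel may serve cached values there.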
16:21 karolherbst: tinnuel: there should be one called xorg.0.log or maybe it is inside the journal
16:24 tinnuel: karolherbst: the xorg.0.log is the file I put on paste bin. The xorg.1.log seems to be slightly newer, but still too old.
16:27 karolherbst: tinnuel: yeah, I guess it is in your journal then :/
16:55 tinnuel: karolherbst: I have copied the sections that are possibly related to Xorg from the my journal on pastebin here: http://pastebin.com/JU3z4SS8
16:58 Lekensteyn: nouveau is using pci_resource_start without pci_request_regions, is that legal?
17:34 karolherbst: tinnuel: seems fine, thanks
17:34 karolherbst: tinnuel: do you have xserver-xorg-video-nouveau installed?
17:39 tinnuel: karolherbst: Yes, xserver-xorg-video-nouveau is installed (version 1:1.0.12-1, installed from debian stretch)
17:42 karolherbst: tinnuel: output of "find /etc/modprobe.d/ -type f -exec grep -i nouveau {} +" please
17:42 karolherbst: hey..., somebody with a name similar to mine
17:49 tinnuel: karolherbst: ohh... result is "/etc/modprobe.d/bumblebee.conf:# do not automatically load nouveau as it may prevent nvidia from loading
17:49 tinnuel: /etc/modprobe.d/bumblebee.conf:blacklist nouveau" does that mean purging bumblebee (as it is not needed) would sort this out?
17:56 Soukyuu: about that "NV50 suddenly starting to drop frames" issue - it seems the only variable is time. And it's getting worse. I now get graphical hangs exactly every 2 seconds.
17:56 Soukyuu: restarting helps, then it starts getting worse over time again
17:57 Soukyuu: changing clocks does not, and it happens on mid + high clocks (didnt test low clocks because they're too slow to do anything)
17:57 Soukyuu: ah, more specifically, it's a 260GTX
17:58 imirkin: Soukyuu: what was the issue exactly?
17:59 Soukyuu: the performance is great right after booting, but starts to degrade over time
17:59 imirkin: that's weird
17:59 Soukyuu: nothing in dmesg except for an occasional nouveau 0000:01:00.0: fifo: CACHE_ERROR - ch 7 [kwin_x11[916]] subc 0 mthd 0060 data beef0201
18:00 Soukyuu: according to pcsx2 i now lost about 25% of performance
18:01 imirkin: hmmmmm i wonder if there's some sort of resource leak in the nv50 driver :(
18:01 imirkin: can you flip kwin to not use GL for compositing?
18:01 imirkin: also, is restarting kwin enough to get your perf back, or do you need a full reboot?
18:02 Soukyuu: is ctrl+alt+f12 enough to disable GL usage?
18:02 Soukyuu: and no, kwin_x11 --replace doesn't solve it
18:02 imirkin: no idea what either of those would do
18:03 Soukyuu: first disables compositing, second restarts kwin
18:03 * imirkin avoids all applications that start with the letter 'k'
18:03 Soukyuu: heh
18:03 Soukyuu: even the kernel?
18:03 imirkin: linux :)
18:03 imirkin: and it's not an application
18:04 imirkin: anyways... if kwin_x11 --replace *really* restarts kwin, then it's unfortunate it doesn't solve the issue
18:04 imirkin: i.e. is it a fresh process and everything?
18:04 Soukyuu: as far as i know, yes
18:04 imirkin: just restart X :)
18:04 imirkin: could be some other application causing the leak
18:05 imirkin: anyways, my theory is that you're leaking vram
18:05 imirkin: which in turn causes things to go slower over time
18:05 orbea: could try it in a minimal wm to test if its something else
18:05 Soukyuu: can i see vram usage somewhere?
18:05 imirkin: nope
18:05 Soukyuu: too bad
18:05 imirkin: yes.
18:05 Soukyuu: what would running nvidia-settings on nouveau do?
18:06 imirkin: skeggsb: this request has probably been made before, but visibility into the VM would be great from debugfs... iirc mlankhorst had some semi-related patches
18:06 Tom^: Soukyuu: it would do nothing
18:06 imirkin: Soukyuu: it would cause you to see an error dialog about a missing NV-CONTROL X extension
18:06 Tom^: Soukyuu: maybe print a cute error to stderr
18:06 Soukyuu: would have been too easy
18:18 Soukyuu: imirkin: I think you might be right, logging out and logging in again made it work smoothly again
18:18 tinnuel: karolherbst: If I should not use bumblebee with nouveau, what can I use to switch between GPU and CPU?
18:18 imirkin: Soukyuu: ok... so next step is to figure out what's causing the leak
18:18 imirkin: Soukyuu: how much vram do you have btw?
18:19 Soukyuu: imirkin: 896MB
18:19 karolherbst: tinnuel: lsmod | grep nouveau
18:19 karolherbst: tinnuel: but I thought nouveau gets loaded :/ odd
18:20 Soukyuu: imirkin: I'm not quite sure it's an application though, i never had this issue with the blob
18:20 Soukyuu: the only thing I changed in my system recently is nouveau
18:20 Soukyuu: hmm and pcsx2
18:20 imirkin: Soukyuu: (a) blob is a lot better at dealing with this stuff, (b) i'm fully expecting this to be something in mesa leaking textures or something
18:20 Soukyuu: i see
18:21 imirkin: if you're looking for a high-quality driver, stick with the blob
18:21 Soukyuu: the first time it happened, i haven't used pcsx2 - so i dont think that's it
18:21 Soukyuu: ah well, the blob is kind of broken for 32-bit
18:21 imirkin: =/
18:21 Tom^: shouldnt be
18:22 tinnuel: karolherbst: nouveau is not listed when using "lsmod | grep nouveau" :/
18:22 Tom^: or rather i doubt it
18:22 Soukyuu: Tom^: https://github.com/PCSX2/pcsx2/issues/1355
18:23 karolherbst: tinnuel: then remove the blacklist entry
18:23 karolherbst: tinnuel: and reboot
18:23 Soukyuu: Tom^: I get like 1fps on it and perf showed the blob is what's wasting all the cycles
18:23 Soukyuu: Tom^: I'm not an expert on this stuff though
18:27 Tom^: Soukyuu: did you typo or is "lib32-libgl" installed because you need lib32-nvidia-340xx-libgl and lib32-nvidia-340xx-utils
18:27 Tom^: Soukyuu: but yea old blob is old ;p
18:28 Soukyuu: Tom^: I meant the nvidia-340xx version
18:29 Soukyuu: Tom^: double checked, I had "local/lib32-nvidia-340xx-libgl 340.96-1" installed. And yeah, legacy blob for a legacy GPU
18:39 imirkin: neat that nouveau works better than the blob
18:39 imirkin: looking at that bug, it looks like at least one of the issues is that it wants ARB_texture_barrier, while blob driver will only have NV_texture_barrier (at least that one)
18:57 tinnuel: karolherbst: removed blacklist entry in /etc/modprobe.d/bumblebee.conf and made sure that driver is set to nouveau in /etc/bumblebee/bumblebee.conf. Rebooted. System started showing errors on nouveau. Result of dmesg | grep nouveau is here: http://pastebin.com/hk1C1RsC .
18:58 imirkin: tinnuel: bumblebee + nouveau = fail
18:58 imirkin: although in this case, the failures are unlikely to be related
18:59 tinnuel: imirkin: any suggestions on what to replace bumblebee with? I thought it is the linux replacement for Nvidia Optimus
19:00 imirkin: nah
19:00 imirkin: it's just not necessary at all
19:00 imirkin: at least not with nouveau
19:00 imirkin: nouveau should auto-suspend the gpu when it's not in use
19:00 imirkin: and bring it back up when you use it
19:02 tinnuel: imirkin: interesting. I originally only had nouveau and no bumblebee. Did not seem to work either. :/
19:02 imirkin: right, like i said the failures seem unrelated :)
19:03 imirkin: but adding bumblebee to the mix just increases chances of fail.
19:09 karolherbst: imirkin: although in that case the kernel module fails to load
19:09 karolherbst: ohhh
19:09 karolherbst: I know
19:09 karolherbst: maybe nouveau forgets to POST the card
19:10 karolherbst: tinnuel: try to boot with nouveau.config=NvForcePost=1
19:10 karolherbst: tinnuel: or blacklist bbswitch
19:11 karolherbst: tinnuel: try to former first
19:11 karolherbst: *the
19:18 tinnuel: karolherbst: where would the nouveau.config be found? Also, I noted that under etc/bumblebee/bumblebee.conf PMMethod=auto. Don't know whether this is related.
19:19 karolherbst: tinnuel: doesn't matter much, you could just disable the bumblebee service or uninstall it completely
19:19 imirkin: tinnuel: you can put it in your kernel cmdline, or add something like "options nouveau config=NvForcePost=1" to a modprobe.d/* file
19:20 karolherbst: and don't forget to update your initramfs after it
19:20 tinnuel: karolherbst: Just to double check: (1) purge bumblebee (2) update initramfs?
19:22 karolherbst: tinnuel: (2) add the modprobe config (3) update initramfs
19:25 imirkin: or however you would normally get options to the kernel module you're loading :)
19:26 imirkin: diff people set their systems up differently.
19:41 tinnuel: karolherbst: I created the file modprobe.d/nouveau.conf and added the "nouveau.config=NvForcePost=1". But when trying to update initramfs using "sudo update-initramfs -u" it gives me an error: libkmod: ERROR ../libkmod/libkmod-config.c:635 kmod_config_parse: /etc/modprobe.d/nouveau.conf line 3: ignoring bad line starting with 'nouveau.config=NvForcePost=1'
19:47 karolherbst: tinnuel: options
19:47 karolherbst: tinnuel: options nouveau nouveau.config=NvForcePost=1
19:47 karolherbst: ohh wait
19:47 karolherbst: tinnuel: options nouveau config=NvForcePost=1
19:49 tinnuel: karolherbst: thanks. initramfs updated. will reboot now.
19:49 karolherbst: tinnuel: I hope you put in the last
19:50 karolherbst: and not my messed up ones :D
19:50 tinnuel: karolherbst: hehe. used "options nouveau config=NvForcePost=1" ;)
19:51 karolherbst: good
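[editor's note] The libkmod error tinnuel hit came from a modprobe.d line that did not start with a configuration command. On the kernel command line the option is spelled `nouveau.config=NvForcePost=1`, while in a modprobe.d file it needs the `options <module>` prefix. The sketch below is a rough pre-check for that mistake. It is not libkmod's parser, the function name is made up, and the command list is the set documented in modprobe.d(5) as far as the editor recalls.

```python
# Commands a modprobe.d line may start with (assumed from modprobe.d(5)).
KNOWN_COMMANDS = {"alias", "blacklist", "install", "options", "remove", "softdep"}


def looks_like_valid_modprobe_line(line: str) -> bool:
    """Accept blank lines, comments, and lines starting with a known command."""
    stripped = line.strip()
    if not stripped or stripped.startswith("#"):
        return True
    return stripped.split()[0] in KNOWN_COMMANDS


print(looks_like_valid_modprobe_line("nouveau.config=NvForcePost=1"))          # False
print(looks_like_valid_modprobe_line("options nouveau config=NvForcePost=1"))  # True
print(looks_like_valid_modprobe_line("blacklist nouveau"))                     # True
```

The check would not catch karolherbst's first correction (`options nouveau nouveau.config=...`), since that line is syntactically fine and only the parameter name is wrong.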
19:59 mupuf: imirkin_: cool, you implemented GL_ARB_robust_buffer_access_behavior :)
19:59 imirkin: mmmm
19:59 imirkin: not really
19:59 imirkin: i implemented a very tiny component of it
20:01 imirkin: i just looked through the spec... it has some annoying provisions
20:01 mupuf: what else is missing? We already have program isolation (yeah for hw contexts and VMAs)
20:01 imirkin: well, for one:
20:02 mupuf: the writes are not discarded yet though
20:02 imirkin: "Reads from unbound resources return zero and writes are discarded."
20:02 imirkin: but you could def architect a situation where you could read an unbound resource and have it work
20:03 mupuf: oh, yeah, this will require a bit more work
20:03 imirkin: basically you could go out of bounds on a texture array
20:03 imirkin: which would pick a previously-allocated TIC/TSC
20:03 imirkin: which hadn't been 0'd out
20:03 imirkin: i could zero stuff out in the TIC/TSC tables, but ... yeah, dunno.
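[editor's note] The requirement imirkin quotes ("Reads from unbound resources return zero and writes are discarded") and the stale-entry problem he describes can be modelled with a toy binding table. This is purely a conceptual sketch: the class and its methods are invented here and have nothing to do with how nouveau manages its TIC/TSC tables.

```python
class BindingTable:
    """Toy model of a table of resource slots with robust-access semantics."""

    def __init__(self, slots: int) -> None:
        self._slots = [None] * slots  # None marks an unbound slot

    def bind(self, slot: int, data: list) -> None:
        self._slots[slot] = list(data)

    def unbind(self, slot: int, clear: bool = True) -> None:
        # clear=False models a stale entry that was never zeroed out.
        if clear:
            self._slots[slot] = None

    def _lookup(self, slot: int):
        if 0 <= slot < len(self._slots):
            return self._slots[slot]
        return None  # out-of-range slot is treated as unbound

    def read(self, slot: int, index: int) -> int:
        resource = self._lookup(slot)
        return 0 if resource is None else resource[index]

    def write(self, slot: int, index: int, value: int) -> None:
        resource = self._lookup(slot)
        if resource is not None:
            resource[index] = value  # writes to unbound slots are dropped


table = BindingTable(4)
table.bind(1, [7, 8, 9])
print(table.read(1, 0), table.read(2, 0), table.read(99, 0))  # 7 0 0
table.unbind(1, clear=False)
print(table.read(1, 0))  # 7
```

The last line is the situation imirkin is worried about: an entry that is logically unbound but was never cleared still returns old data instead of zero.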
20:04 mupuf: the write semantic is funny. You may either discard or overwrite data in your own process
20:04 * mupuf has no knowledge about TIC/TSC
20:04 imirkin: yeah, that part's handled
20:05 mupuf: it is very convenient that we do not need to check for how the context was created
20:05 mupuf: since the behaviour with this extension enabled is compatible with when it is not enabled
20:05 imirkin: well... if it reduces perf, we should look at how the context is created
20:05 imirkin: but the current stuff is fine
20:06 mupuf: but, if when we find a case where this becomes a performance issue, we will need to do something about it
20:06 imirkin: yeah. not likely.
20:07 mupuf: yeah, but I will never say never. Some applications are just a big WTF
20:08 karolherbst: SR3! :D
20:09 imirkin: i just mean... the protections i've added now aren't perf losers
20:09 imirkin: since the global memory access will be WAY heavier
20:09 imirkin: however if we start messing with texture state/caches, that could become more of a thing.
20:10 mupuf: imirkin_: that assumes that the application is not going to read over and over again the same data with a dynamically computed index
20:10 mupuf: so as the compiler could not optimize it
20:10 mupuf: but the cache would always be hot
20:10 mupuf: but as you said, unlikely
20:10 imirkin: heh. you could probably construct such a case, but even if you were explicitly going for it, i suspect you'd have a hard time
20:11 mupuf: yeah :D
20:11 mupuf: and maybe, like for image, nvidia will add instructions that will do the lookup in a cache for you
20:11 mupuf: intel added something like this in SKL IIRC
20:11 mupuf: not for graphics though
20:12 mupuf: like for image on maxwell*
20:12 imirkin: well, you do have access to the cache...
20:12 imirkin: both a local cache and a SM-wide cache
20:12 imirkin: or MP-wide
20:12 imirkin: i can never remember
20:12 imirkin: and you can flush it, etc
20:12 imirkin: but it's still a lot slower than, say, a register
20:13 mupuf: right
20:14 mupuf: would be nice to be able to annotate shaders with how many instructions are considered as "overhead" and check throughout the shaderdb if it ever amounts to a significant number of instructions
20:14 mupuf: but that will require serious changes to our compiler
20:14 mupuf: and there may not be a lot of those cases
20:15 imirkin: putting the cart before the horse
20:16 mupuf: it is way worse than this :D
20:16 mupuf: it is putting the cart hundreds of meters before the horses, and facing another direction too
20:18 imirkin: :)
20:19 mupuf: just saying this because that's what we sometimes see when analyzing shaders
20:19 imirkin: "we"?
20:19 mupuf: intel finland
20:20 mupuf: I work with the performance team
20:20 mupuf: https://patchwork.freedesktop.org/patch/79453/ <-- an example of thing that should not matter but has a real performance impact
20:21 imirkin: mupuf: does it?
20:21 imirkin: on a real application?
20:44 karolherbst: imirkin: well I don't think it matters in the end if this causes a significant perf impact in real applications, because stuff like that sums up and if you cause a 0.5% perf impact in general for each hack like that, after 20 hacks it is significant
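[editor's note] karolherbst's "0.5% per hack, 20 hacks" figure can be checked directly. Assuming each change costs an independent 0.5% of the remaining performance (an assumption made here for the sake of the arithmetic, since real costs rarely compose this cleanly), the combined loss is a little under 10%.

```python
def cumulative_slowdown(per_change: float, count: int) -> float:
    """Total fractional performance loss after `count` multiplicative slowdowns."""
    return 1.0 - (1.0 - per_change) ** count


print(f"{cumulative_slowdown(0.005, 20):.1%}")  # 9.5%
```

Simply adding the costs would give 10%, so at this scale the compounding makes little difference to his point.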
20:46 imirkin: karolherbst: i doubt this is visible at all.
20:46 karolherbst: this is a mul for each cos/sin, right?
20:47 karolherbst: I count 607 sin and cos in saints row 3/4 + bioshock alone
20:47 imirkin: yeah, but rarely in a shader by themselves
20:48 imirkin: in many cases you should be able to hide the latency
20:48 karolherbst: yeah, but you still add one instruction
20:50 karolherbst: I am not saying that this alone decreases performance significantly, but there might be other things where you decrease perf for precision
20:50 karolherbst: or security or whatever
20:54 karolherbst: nvidia has always had an option to decrease quality to gain performance; I never noticed what it does, but it did affect performance, and this might be a collection of stuff like that
20:55 imirkin: unlikely
20:55 imirkin: it probably has to do with sampler quality settings
20:55 imirkin: and which algorithms are used for MSAA resolve
20:55 karolherbst: "Image Settings" on linux
20:55 karolherbst: "Use conformant texture clamping"
20:56 karolherbst: then there is also AA settings
20:56 mupuf: imirkin_: just a benchmark
20:56 karolherbst: anisotropic filtering, but that's the classic one with AA
20:56 karolherbst: "Texture sharpening"
20:56 karolherbst: but that's all on linux
20:56 karolherbst: on windows there are like 4x more options
20:56 mupuf: imirkin_: yeah, sometimes, the latency can be hidden
20:57 mupuf: but not always, especially if the program is ALU-bound already
20:57 imirkin: mupuf: is there non-benchmark software that's ALU-bound and uses sin/cos?
20:57 karolherbst: I am sure the eon based ones are
20:57 mupuf: imirkin_: compute loads?
20:58 karolherbst: they usually only show high core load
20:58 karolherbst: and minimal memory loads
20:58 mupuf: has limited knowledge about how recent games use the hw
20:58 karolherbst: mupuf: crazy
20:58 karolherbst: :D
20:58 mupuf: will get to this at some point
21:38 Calinou: <karolherbst> "Use conformant texture clamping"
21:39 Calinou: heh, the tooltip mentions Quake 3
21:39 Calinou: it's so up to date! :>
21:43 karolherbst: Calinou: yeah well, on linux there isn't much to change anyway, but on windows you have tons of options for whatever
21:45 Calinou: yeah, I wish graphics drivers on GNU/Linux had more options :(
22:22 karolherbst: ehm...
22:23 karolherbst: imirkin: where would I want to put such an opt? https://gist.github.com/karolherbst/3efc680f1147cba2206a84043918c87c
22:23 imirkin: algebraic
22:23 imirkin: i can't tell what the opt is
22:23 imirkin: but it looks algebraic in nature :)
22:23 karolherbst: yeah well
22:24 karolherbst: look at the first two instructions
22:24 imirkin: (a - b) * (b - a)
22:24 karolherbst: this is a bit more tricky than a simple algebraic opt
22:24 karolherbst: nope, look carefully
22:24 karolherbst: the defs have nothing in common anymore
22:24 imirkin: becomes |a - b| * |a - b|
22:25 imirkin: i don't know that that's true
22:25 imirkin: oh
22:25 imirkin: i see
22:25 karolherbst: yeah...
22:25 imirkin: (a - b)^2 == (b - a)^2
22:25 imirkin: algebraic :)
22:26 imirkin: well
22:26 karolherbst: right
22:26 imirkin: it's actually a global opt
22:26 imirkin: so...
22:26 imirkin: we need a GVN
22:26 imirkin: or something.
22:26 imirkin: a clever CSE pass
22:26 karolherbst: GlobalCSE, just smarter?
22:26 karolherbst: ahh
22:26 imirkin: no
22:26 imirkin: LocalCSE
22:26 imirkin: OR
22:26 imirkin: you could be smart about it
22:26 imirkin: and somehow normalize things
22:26 imirkin: so that CSE auto-picks up on it
22:26 imirkin: i.e. make sure that they BOTH appear as a - b
22:27 imirkin: instead of one as b - a and one as a - b
22:27 imirkin: er hm. no, that won't work.
22:27 karolherbst: right
22:27 imirkin: but you can teach CSE to be smarter
22:27 karolherbst: it's more like -a + b and +a - b
22:27 karolherbst: and then just ^2
22:27 imirkin: it already knows about commutative ops
22:27 imirkin: you could teach it about some more things.
22:27 imirkin: dunno.
22:28 karolherbst: mhh
22:28 imirkin: i think ultimately we need a GVN, which could assist in such matters. but i'm not really sure, it's a bit out of my knowledge range
22:28 imirkin: (GVN = global value numbering)
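As a rough illustration of the value-numbering idea mentioned above, here is a minimal sketch of the hashing core: equivalent expressions receive the same number, and commutative operands are put into a canonical order first. The names Op, Expr and ValueTable are hypothetical and not taken from nouveau; this is a toy model, not how any existing CSE pass is implemented.

```cpp
#include <map>
#include <tuple>
#include <utility>

enum class Op { Add, Sub, Mul };

// Operands are value numbers of earlier expressions.
struct Expr { Op op; int lhs; int rhs; };

class ValueTable {
public:
    // Hands out a number for a value with no known expression (e.g. an input).
    int fresh() { return next++; }

    // Returns the value number for `e`, reusing an existing one when an
    // equivalent expression has already been numbered.
    int number(Expr e)
    {
        // Commutative ops: sort the operands so a+b and b+a share a key.
        if ((e.op == Op::Add || e.op == Op::Mul) && e.lhs > e.rhs)
            std::swap(e.lhs, e.rhs);
        auto key = std::make_tuple(e.op, e.lhs, e.rhs);
        auto it = table.find(key);
        if (it != table.end())
            return it->second;
        int vn = next++;
        table.emplace(key, vn);
        return vn;
    }

private:
    std::map<std::tuple<Op, int, int>, int> table;
    int next = 0;
};
```

With this table, a + b and b + a get the same number, but a - b and b - a do not, which is exactly the gap discussed here: spotting that they differ only by a negation needs extra knowledge beyond operand ordering.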
22:28 karolherbst: yeah
22:28 karolherbst: but I think it helps in other situations better
22:28 karolherbst: although
22:29 karolherbst: in fact we need something like that:
22:29 karolherbst: look at a source
22:29 karolherbst: and see if you get to the same result with a mod by using an older source
22:30 airlied: is there amny of you leaving in a group? I heard where you were going via rumours
22:30 airlied: oops
22:30 airlied: ignore me
22:32 karolherbst: imirkin: GVN sounds interesting though
22:32 karolherbst: imirkin: any estimate of how much work this would be?
22:37 karolherbst: imirkin: well in the end we know how values are used
22:45 karolherbst: well
22:45 karolherbst: this one is easy though, just need to generalize it
22:46 karolherbst: imirkin: https://gist.github.com/karolherbst/3efc680f1147cba2206a84043918c87c
22:46 karolherbst: and the other opts will take care of everything else
22:54 imirkin: karolherbst: you want neg, not mov. but yeah.
22:57 karolherbst: right..
22:57 karolherbst: that would be part of localCSE then?
22:57 imirkin: mmmmmmmmmmmm
22:57 imirkin: right
22:57 imirkin: not sure how that'd get integrated tbh
22:57 karolherbst: imirkin: maybe Instruction::isResultEqual could be better
22:58 karolherbst: and also return a mod
22:58 karolherbst: in the meaning of: if this mod is applied, the result is equal
22:58 imirkin: maybe. i'm open to suggestions.
22:58 imirkin: i don't have a great way to do it
22:58 karolherbst: well, I will think a bit
23:11 karolherbst: imirkin: or we just handle a few known cases in LocalCSE directly, but there might be a lot, although we still have to declare that either way
23:23 karolherbst: imirkin: yeah I think we want to have this specialized and do switches like in the other opts for this, because we can't do this in a general way
23:25 karolherbst: and just have a function for this: bool haveEqualResultsWithMod(Instruction *a, Instruction *b, Modifier &mod), and mod gets set to what we need to map "mod b a" if the function returns true
23:53 karolherbst: imirkin: I think this should be a separate pass from localCSE, because it does something else. localCSE also replaces the defs, and I just want to reuse already calculated values a bit smarter
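A minimal sketch of what the proposed haveEqualResultsWithMod could look like on a toy IR. Only the function name and the general shape of the signature come from the discussion; Opcode, Mod and Instr are hypothetical stand-ins, not nouveau's Instruction or Modifier classes, and only the sub/neg case and commutative ops are handled.

```cpp
enum class Opcode { Add, Sub, Mul };
enum class Mod { None, Neg };

// Toy instruction: two sources identified by value id.
struct Instr { Opcode op; int src0; int src1; };

// Returns true when a's result equals `mod` applied to b's result,
// and sets `mod` accordingly. Leaves `mod` untouched on failure.
bool haveEqualResultsWithMod(const Instr *a, const Instr *b, Mod &mod)
{
    if (a->op != b->op)
        return false;

    bool same = a->src0 == b->src0 && a->src1 == b->src1;
    bool swapped = a->src0 == b->src1 && a->src1 == b->src0;

    switch (a->op) {
    case Opcode::Sub:
        if (same) { mod = Mod::None; return true; }
        // x - y == -(y - x), so b's result can be reused through a neg.
        if (swapped) { mod = Mod::Neg; return true; }
        return false;
    case Opcode::Add:
    case Opcode::Mul:
        // Commutative: operand order does not matter.
        if (same || swapped) { mod = Mod::None; return true; }
        return false;
    }
    return false;
}
```

For a = sub x, y and b = sub y, x this reports Mod::Neg, matching the "you want neg, not mov" remark. One caveat worth checking before relying on it for floats: when x == y, the two forms can differ in the sign of zero.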
23:57 karolherbst: imirkin: by the way, does nouveau do Loop-invariant code motion?
23:57 karolherbst: that sounds like something trivial enough to implement
23:58 imirkin: nope
23:58 imirkin: but it could quite easily, yeah
23:58 karolherbst: well
23:58 karolherbst: For example, if all reaching definitions for the operands of some simple expression are outside of the loop, the expression can be moved out of the loop.
23:58 karolherbst: ^ quote from the wiki page
23:58 imirkin: right
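The quoted rule can be sketched on a toy IR as follows. Inst and hoistInvariants are hypothetical names, and the sketch assumes SSA form (one def per value) and side-effect-free instructions that are safe to execute even if the loop body never runs; a real pass would have to verify both.

```cpp
#include <set>
#include <vector>

// Toy SSA instruction: defines one value from a list of source values.
struct Inst { int def; std::vector<int> srcs; };

// Moves every instruction whose sources are all defined outside the loop
// from `body` to the end of `preheader`. Repeats until nothing changes, so
// instructions that only depend on already hoisted ones move out as well.
void hoistInvariants(std::vector<Inst> &body, std::vector<Inst> &preheader)
{
    bool changed = true;
    while (changed) {
        changed = false;

        std::set<int> definedInLoop;
        for (const Inst &i : body)
            definedInLoop.insert(i.def);

        for (auto it = body.begin(); it != body.end(); ++it) {
            bool invariant = true;
            for (int s : it->srcs) {
                if (definedInLoop.count(s)) {
                    invariant = false;
                    break;
                }
            }
            if (invariant) {
                preheader.push_back(*it);
                body.erase(it);
                changed = true;
                break; // erase invalidated the iterator; rescan the body
            }
        }
    }
}
```

Rescanning after each hoist keeps the code short at the cost of quadratic work; a worklist would be the usual choice for larger loops.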