01:45 RSpliet1: orbea: if you're using X.org or mesa, that probably means there's a sw bug
03:57 karolherbst: hakzsam: do you have something ready or maybe any ideas about measuring stalls on the gpu?
04:19 RSpliet: karolherbst: if you do your experiments with the official driver (not sure if you can insert NOPs easily there), use nvprof to monitor the cycle count and the active_cycles signal
04:19 karolherbst: RSpliet: from envytools?
04:19 RSpliet: no, from nvidia
04:19 karolherbst: ohh wait, I have it installed already
04:20 RSpliet: doesn't work with nouveau
04:20 karolherbst: I was already thinking about downloading the linux debugger tools from nvidia, but that requires membership and stuff and I don't know if that's a smart move
04:20 RSpliet: but if you then play with your cuda params (workgroup size, warp size, whatever you can tweak), you can both get an indication of the individual instruction latency and the effectiveness of hiding that latency when having multiple warps in flight
04:22 RSpliet: not sure how useful that is going to be in practice, but it might also help find the right counters so you can do it in nouveau with hand-crafted shaders (not sure how well NVIDIA lets you execute unoptimised shaders with NOPs :-P)
04:23 karolherbst: ohh nvprof is only for cuda :/
04:25 RSpliet: well, Cuda is a lot easier to control than OpenGL, I don't see why that's a problem
04:25 RSpliet: you can try and feed it your own PTX
04:26 RSpliet: hand-disoptimised
04:26 karolherbst: mhh yeah that should help finding out the latencies and stuff
04:27 RSpliet: of course if hakzsam has ways of accessing those counters from nouveau you might be able to feed it shaders in different ways :-)
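
A minimal C sketch of the measurement RSpliet is describing, assuming you already have the counter values out of nvprof: both counter numbers below are hypothetical, and the "latency / issue interval warps in flight" rule of thumb is the standard CUDA latency-hiding argument, not something nouveau-specific.

    #include <stdio.h>

    int main(void)
    {
        /* counter values you would read out of nvprof for a single-warp,
         * dependency-chained kernel -- both numbers are hypothetical */
        double active_cycles = 120000.0;
        double instructions  = 10000.0;

        /* with one warp and a serial dependency chain, cycles per instruction
         * approximates the instruction latency */
        double latency = active_cycles / instructions;

        /* the usual occupancy argument: with several warps resident the
         * scheduler issues from another warp while one waits, so roughly
         * latency / issue_interval warps are needed to hide the latency
         * (issue_interval = cycles between issues from the same warp) */
        double issue_interval = 1.0;
        double warps_needed = latency / issue_interval;

        printf("estimated latency: %.1f cycles, ~%.0f warps to hide it\n",
               latency, warps_needed);
        return 0;
    }
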
04:28 karolherbst: my first target is to find out what differs between nouveau and nvidia though, so that I know what is a good idea to focus on (and even if there are 10 things equally worthy, it is still good to know those things)
04:29 karolherbst: like I want to know how many instructions nvidia and nouveau execute per frame or how many stalls there are and such things
04:29 hakzsam: karolherbst, except the perf counters which are already exposed by the HUD, no
04:29 karolherbst: hakzsam: k
04:29 hakzsam: karolherbst, my plan is to merge my stuff just after the compute shaders series
04:30 hakzsam: karolherbst, in one month I hope
04:30 karolherbst: hakzsam: is there a nice way to run those counters on the nvidia driver?
04:30 hakzsam: karolherbst, nvprof and cupti are fine
04:30 hakzsam: but cuda only
04:30 karolherbst: :/
04:30 hakzsam: karolherbst, you can start to play with them, it's easy
04:31 karolherbst: yeah I know, but I would like to test things out on "real world" test cases, or rather run some games I somehow care about and see where the issues are
04:57 karolherbst: at least we should be able to tweak RA a bit. For example it should try harder to have the same reg for a mad's dest and src2, or select regs in a way that we don't get those moves for alignment to fetch or tex instructions
04:58 karolherbst: this is something I noticed in most shaders
04:58 No1RL355: any ETA for wayland support in nouveau?
04:59 karolherbst: ?
04:59 karolherbst: why shouldn't nouveau support wayland?
05:01 No1RL355: okay, then it was unstable wayland :/
05:02 karolherbst: well it should work, if you have any issues then you might try to find out what it is
05:03 No1RL355: okie, thanks
05:05 pq: No1RL355, nouveau does not need to specifically support wayland, Mesa does it in generic code. However, graphics stacks built with wayland might have components that hit issues with nouveau, since they use things X11 stacks don't. Broadly speaking.
06:25 imirkin: orbea: if you can repro, please run mpv inside gdb and get a full stacktrace (type "bt full")
06:26 imirkin: chrisb2244: nouveau supports the 540MHz frequency of DP 1.2, but it does not, currently, support MST.
06:30 orbea: I can't reproduce consistently, but I will try later
06:30 urjaman: https://xkcd.com/583/
08:48 karolherbst: okay, zculling alignment is height 0x20 and width is 0x50 :)
08:49 imirkin: unlikely.
08:50 imirkin: more likely is that it's based on the tiling parameters of the depth buffer
08:52 karolherbst: mhh
08:52 karolherbst: I didn't check that
08:52 karolherbst: but yeah, could be
08:52 karolherbst: in the example I created it was like that though
08:52 karolherbst: I still have the mmt
08:52 karolherbst: where do I find the tiling parameters of the depth buffer?
08:52 imirkin: check what's written into ZETA_TILE_MODE
08:53 imirkin: is the tiling mode 0x20?
08:53 karolherbst: mhhh, I don't find it in the trace :/
08:54 imirkin: did you not have a depth buffer attached?
08:54 imirkin: search for ZETA
08:54 karolherbst: yep, there are several hits
08:54 imirkin: oh. it got renamed
08:54 imirkin: ZETA_BLOCK_DIMENSIONS
08:55 karolherbst: 0, 0x4, 0
08:55 imirkin: hm odd.
08:55 imirkin: that means height 128 tiles.
08:55 karolherbst: well I was slowly downsizing the window and just checked all widths and heights of the zcull stuff
08:55 imirkin: ok
08:55 imirkin: so you're probably right then
08:55 imirkin: or at least not completely wrong.
08:56 imirkin: fwiw calim agreed with the height 0x20 thing
08:56 karolherbst: k
08:56 imirkin: but aligned width up to 0xe0
08:56 imirkin: and not 0x50
08:56 karolherbst: yeah
08:57 imirkin: perhaps it's different based on... who knows what
08:57 karolherbst: mhhh but why 0xe0...
08:57 karolherbst: odd
08:57 imirkin: why 0x50 :)
08:57 karolherbst: I have three width samples: 0x500, 0x4b0, 0x460
08:57 karolherbst: kind of tells me it is 0x50
08:58 imirkin: yeah dunno
08:58 imirkin: we can figure it out later, go with 0x50 for now :)
08:58 karolherbst: yep
08:59 imirkin: could be based on something dumb like memory layout
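
A small C sketch of the rounding being debated here; the 0x50 and 0x20 values (and calim's 0xe0) come from eyeballing traces rather than from documentation, so treat the constants as placeholders.

    #include <stdio.h>

    static unsigned align_up(unsigned value, unsigned alignment)
    {
        return (value + alignment - 1) / alignment * alignment;
    }

    int main(void)
    {
        /* the three traced widths are all multiples of 0x50, which is what
         * suggested the 0x50 alignment in the first place */
        unsigned widths[] = { 0x500, 0x4b0, 0x460 };

        for (int i = 0; i < 3; i++)
            printf("width 0x%x -> 0x%x (align 0x50), 0x%x (align 0xe0)\n",
                   widths[i], align_up(widths[i], 0x50),
                   align_up(widths[i], 0xe0));

        printf("height 0x21 -> 0x%x (align 0x20)\n", align_up(0x21, 0x20));
        return 0;
    }
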
08:59 karolherbst: but now I still have to figure out how to enable that validation stuff, because I always ran into segfaults
08:59 karolherbst: this "nv50_miptree(sf->base.texture)" sometimes gave me 0x0
08:59 imirkin: well you need to make a zcull buffer
08:59 karolherbst: and I was hoping I could just use the depth buffer
08:59 imirkin: oh sure, well there might not be a depth buffer attached at all
08:59 imirkin: no, this is a separate item
09:00 karolherbst: okay
09:00 karolherbst: well it was within unigine, I tried out different invalidation flags, but nothing really worked out
09:00 karolherbst: but I also had no idea what I was doing there
09:01 imirkin: yeah
09:01 imirkin: the whole resource management machine is... tricky
09:01 karolherbst: I got the idea though, it just doesn't tell me much...
09:02 imirkin: right so
09:02 imirkin: step 1
09:02 karolherbst: anyway I just need a way to let the zcull validate function be called often enough so that I can play around in there
09:02 imirkin: allocate a zcull buffer
09:02 imirkin: probably attach it alongside the fb state
09:02 imirkin: maybe only init it when there's a clear
09:03 imirkin: [that involves the depth buffer]
09:04 imirkin: or maybe store it alongside the depth texture in the miptree structure
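
A rough C skeleton of the plan imirkin lays out here, just to make the ordering concrete; every type and function name in it is made up for illustration (the real nvc0 state tracker looks nothing like this). The only real content is the sequence: allocate a zcull buffer when a depth buffer shows up, and only treat it as valid after a full depth clear.

    #include <stdbool.h>
    #include <stddef.h>
    #include <stdlib.h>

    struct fake_bo { size_t size; };      /* stand-in for a VRAM allocation */

    struct fake_zcull {
        struct fake_bo *bo;
        bool valid;    /* only true once a full depth clear initialised it */
    };

    static struct fake_bo *fake_bo_alloc(size_t size)
    {
        struct fake_bo *bo = malloc(sizeof(*bo));
        if (bo)
            bo->size = size;
        return bo;
    }

    /* step 1: when a depth buffer is bound, make sure a zcull region exists,
     * and treat its contents as unknown */
    static void fake_bind_depth_buffer(struct fake_zcull *z, size_t zcull_size)
    {
        if (!z->bo)
            z->bo = fake_bo_alloc(zcull_size);
        z->valid = false;
    }

    /* step 2: a full depth clear is the one point where the zcull contents
     * are known, so only here would the buffer be marked usable and the
     * hardware pointed at it */
    static void fake_full_depth_clear(struct fake_zcull *z)
    {
        if (z->bo)
            z->valid = true;
    }

    int main(void)
    {
        struct fake_zcull z = { NULL, false };
        fake_bind_depth_buffer(&z, 64 * 1024);  /* size is arbitrary here */
        fake_full_depth_clear(&z);
        free(z.bo);
        return 0;
    }
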
09:05 karolherbst: does nouveau do EarlyZ?
09:05 imirkin: the chip does, yeah
09:05 imirkin: when allowed, or when forced
09:05 karolherbst: okay
09:05 karolherbst: so the depth buffer is uploaded and the gpu does its EarlyZ thing with that
09:06 imirkin: (except the ext for forcing it is images, which isn't enabled yet)
09:06 imirkin: right
09:06 imirkin: only way it can
09:06 imirkin: zcull just makes that faster.
09:06 karolherbst: the EarlyZ thing?
09:06 imirkin: i think so yeah
09:07 karolherbst: yeah makes sense
09:07 karolherbst: I was just hoping that Zcull could drop more work somehow
09:07 karolherbst: ohhh
09:07 karolherbst: I think it reduces needed bandwidth too
09:07 karolherbst: HiZ does actually
09:10 imirkin: and less fixed function work
09:11 imirkin: but it shouldn't reduce the number of shader invocations
09:11 karolherbst: yay, I found the zcull patent
09:12 karolherbst: is there any danger in reading it?
09:12 imirkin: misinformation
09:13 karolherbst: k
09:18 karolherbst: okay, so in the end I would have to allocate some vram on the gpu, put the depth buffer for zcull there (whatever has to be moved into it), tell the gpu where this buffer is and profit?
09:18 imirkin: right
09:18 imirkin: looks like the current thing expects the zcull buffer to be at the end of the depth buffer
09:18 imirkin: note the align(size, 1<<17)
09:18 imirkin: that's obviously a short-term hack
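
As a concrete reading of that align(size, 1<<17): if the zcull data sits at the end of the depth buffer's allocation, its offset is just the depth buffer size rounded up to 128 KiB. The buffer size below is only an example.

    #include <stdint.h>
    #include <stdio.h>

    int main(void)
    {
        /* example depth buffer size only; the real size comes from the miptree */
        uint64_t depth_size = 1920 * 1080 * 4;
        uint64_t alignment  = 1 << 17;          /* 128 KiB, as in the hack above */
        uint64_t zcull_off  = (depth_size + alignment - 1) & ~(alignment - 1);

        printf("depth buffer %llu bytes -> zcull data at offset 0x%llx\n",
               (unsigned long long)depth_size, (unsigned long long)zcull_off);
        return 0;
    }
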
09:18 karolherbst: and if I put random stuff into the zcull buffer, there should be random stuff disappearing from the rendered stuff as well?
09:19 imirkin: mmmm... sort of
09:19 imirkin: the tricky bit is when to invalidate the zcull buffer
09:19 karolherbst: then I want to do this first, because otherwise I won't know if the hardware is doing zcull at all or not, not that I hack something up and it doesn't change a thing
09:19 imirkin: the way we e.g. blit onto depth buffers won't cause the zcull buffer to get properly updated
09:21 imirkin: so basically the initial thing, i think, will be to only enable it if you do a full depth clear
09:21 imirkin: which will properly initialize it
09:21 karolherbst: okay, maybe we should start with when nvc0_validate_zcull should be called, because that's somehow a critical thing I haven't managed yet
09:21 karolherbst: ohhhh
09:21 karolherbst: ...
09:21 karolherbst: k
09:22 imirkin: if you like i can write an (untested) patch tonight which sorta does what i mean
09:22 karolherbst: from some pdf slides by an nvidia and an amd guy: "Always Clear Z buffer to enable ZCULL"
09:22 imirkin: and then you can play with it from there
09:22 karolherbst: would be awesome
09:23 imirkin: ok
09:23 imirkin: not right now tho
09:24 karolherbst: yeah no worries, I still have stuff to cleanup
09:25 karolherbst: by the way, I sent the saturate and the neg(and(set, 1)) to set thing to the ML yesterday
09:25 karolherbst: but I really doubt there will be much in shader-db for the last one
09:27 imirkin: yeah, just want to test them locally on nv50 before pushing
09:28 karolherbst: k
09:56 wvvu: imirkin: another culprit, "mplayer using -vo vdpau". This time it's different because it still allows moving the mouse.
10:33 karolherbst: hakzsam: in which range should the achieved_occupancy be?
10:33 karolherbst: i am asking because it is 1 for me and I have no clue if that's a good or a bad thing
10:38 pmoreau: karolherbst: I would go with [0,1]. I can't see what else it could be, apart from a percentage.
10:39 pmoreau: In that case, having an occupancy of 1 is a good thing
10:40 karolherbst: yeah
10:40 karolherbst: I guessed as much, but if it's a percentage, then it would make more sense to display it as such
10:41 pmoreau: IIRC, there was/are some issues with gallium HUD which can't handle percentages
10:41 pmoreau: s/was/were
10:42 karolherbst: well it works for me usually
10:42 karolherbst: other values are displayed as percentage just fine
10:43 karolherbst: like metric-branch_efficiency
10:43 pmoreau: You can have a look at http://docs.nvidia.com/cuda/cuda-c-programming-guide/index.html#occupancy-calculator for finding factors impacting the occupancy; mostly designed for CUDA, but can still be interesting
10:43 pmoreau: Ah, don't know then.
10:49 binaryplease: Hello, I'm having trouble with a nouveau + displaylink setup. I have an internal screen, one connected to HDMI and one connected to a displaylink usb adapter (udl driver). As soon as I switch on the displaylink monitor the performance gets extremely bad on ALL screens. I get the screen to show up with "xrandr --setprovideroutputsource 1 0 " System info: http://vpaste.net/pUWK0 The proprietary nvidia driver does not
10:49 binaryplease: work well with displaylink.
10:50 imirkin: binaryplease: pastebin dmesg
10:50 imirkin: and xorg log
10:50 binaryplease: http://vpaste.net/8JCjf
10:50 hakzsam: karolherbst, achieved_occupancy is between 0 and 1, 1 is good, 0 is really bad :-)
10:50 imirkin: oh fun, you have one of those crazy *dual* nvidia gpu laptops
10:51 karolherbst: hakzsam: yeah, but shouldn't it be more like 0 and 100?
10:51 imirkin: and where the intel gpu is hard-disabled in the bios. excellent.
10:51 karolherbst: hakzsam: or are 0 and 1 the only possible values?
10:51 hakzsam: karolherbst, but.. as you can see the HUD doesn't support floats... so this should be a percentage
10:51 karolherbst: hakzsam: yeah, I figured :)
10:51 hakzsam: karolherbst, I would prefer to have floats but it's my opinion
10:51 binaryplease: imirkin: http://vpaste.net/ZEPUq xorg.log
10:51 karolherbst: hakzsam: yeah but it isn't displayed as a percentage
10:52 binaryplease: imirkin: I tried the proprietary drivers before
10:52 imirkin: ok cool, looks like both of your gpu's come up ok
10:52 imirkin: that xorg log is from nvidia blob
10:52 hakzsam: karolherbst, right, I know
10:53 karolherbst: metric-branch_efficiency works as expected though
10:53 binaryplease: imirkin: shouldn't the logs be in /var/log/Xorg.0.log?
10:53 hakzsam: karolherbst, yeah
10:53 imirkin: binaryplease: should. but perhaps you're using systemd which helpfully puts the logs in god-knows-where
10:54 hakzsam: karolherbst, I compute the same values as NVIDIA, that's why achieved_occupancy is [0,1]
10:54 karolherbst: okay...
10:54 karolherbst: so having a 1 there is good?
10:54 karolherbst: mhhh
10:54 hakzsam: karolherbst, yeah, it's really good
10:54 karolherbst: weird, I would have assumed that nouveau doesn't hit 1 at all
10:54 karolherbst: but it does with heaven
10:54 karolherbst: all the time
10:55 hakzsam: yeah, probably
10:56 hakzsam: karolherbst, I'll improve these metrics and add more as soon as possible
10:56 karolherbst: k, thanks :)
10:56 hakzsam: with the graphics counters I'll be able to expose a bunch of new metrics :-)
10:56 karolherbst: but occupancy means that there are enough threads and warps in flight that instruction latencies are completely hidden by them?
10:56 karolherbst: yay
10:57 binaryplease: imirkin: and yeah it's a lenovo Y500 with SLI. maybe I can use that somehow.
10:57 binaryplease: imirkin: there it is http://vpaste.net/N5Srj
10:58 hakzsam: karolherbst, it's regarding the number of active warps and the max. number of warps per MP (i.e. (active_warps / active_cycles) / max. number of warps on an MP)
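
Spelled out as code, the formula hakzsam gives is roughly the following; the counter values are invented and the 64-warp limit per multiprocessor is the Kepler/Maxwell-era figure (older chips have lower limits), so take it as a sketch rather than the exact CUPTI computation.

    #include <stdio.h>

    int main(void)
    {
        /* hypothetical counter values read over one measurement interval */
        double active_warps     = 6.1e9;
        double active_cycles    = 1.0e8;
        double max_warps_per_mp = 64.0;   /* Kepler/Maxwell-class limit */

        /* average resident warps per cycle, normalised to the hw maximum */
        double occupancy = (active_warps / active_cycles) / max_warps_per_mp;

        printf("achieved_occupancy = %.2f (1.0 = fully occupied)\n", occupancy);
        return 0;
    }
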
10:58 AndrewR: hello. after updating libdrm I can't build mesa git (gcc 4.8.4, 32-bit): http://fpaste.org/317629/
10:58 imirkin: binaryplease: ok so... i don't think you want --setprovideroutputsource 1 0
10:58 imirkin: binaryplease: can you pastebin the output of xrandr --listproviders ?
10:58 karolherbst: AndrewR: update libdrm
10:59 binaryplease: imirkin: http://vpaste.net/wnutq
10:59 karolherbst: AndrewR: ohh sry
10:59 karolherbst: AndrewR: you have to update mesa then :/
10:59 karolherbst: mhh weird
10:59 binaryplease: imirkin: I used that command because otherwise I couldn't select the screen in xrandr or arandr
10:59 karolherbst: I should learn to read more carefully
11:00 karolherbst: AndrewR: I think I had this issue too, you might want to remove libdrm
11:00 karolherbst: AndrewR: and check if all files are gone
11:00 imirkin: binaryplease: yeah i get it... i just assumed nouveau would be the first 2 providers
11:00 karolherbst: AndrewR: just to make sure there aren't some old header files left over
11:01 AndrewR: karolherbst, thanks, I just did make install-strip over them ... will remove and reinstall.
11:03 imirkin: binaryplease: [ 65.299] (EE) Failed to initialize GLX extension (Compatible NVIDIA X driver not found)
11:03 imirkin: that's probably not *great*
11:03 imirkin: sounds like you didn't completely nuke nvidia blob
11:04 imirkin: but i also don't know if that'd be causing your issues
11:04 imirkin: binaryplease: otherwise you'll need to try to find someone more knowledgeable than me on all this stuff... maybe airlied .
11:05 imirkin: [he's def more knowledgeable, just perhaps not interested in figuring out your issue]
11:06 binaryplease: imirkin: Well that would be great, if he is interested. I can provide all the information needed
11:08 imirkin: binaryplease: oh wait, this sounds familiar.... you need this patch for nouveau:
11:08 imirkin: binaryplease: http://cgit.freedesktop.org/nouveau/xf86-video-nouveau/commit/?id=b824d36c28124955eda4aced5e637aa75eea4d6c
11:08 imirkin: maybe.
11:08 imirkin: perhaps that was for a diff issue
11:10 binaryplease: imirkin: I must admit I don't know how to do that
11:11 imirkin: binaryplease: what distro?
11:11 binaryplease: arch linux
11:12 imirkin: any arch users around? i think there's a way to patch + rebuild, but i don't use that distro.
11:27 binaryplease: imirkin: ok I think I figured out how to compile from source and make a package. Where can I get the complete drmmode_display.c that is patched?
11:27 imirkin: binaryplease: just apply the patch :)
11:30 binaryplease: Just copy the lines on that site to a file, call it something.patch and do 'patch drmmode_display.c < something.patch'?
11:30 imirkin: or click on the patch link...
11:31 imirkin: ideally the patch would be applied as part of the build
11:31 imirkin: and not by you manually
11:31 orbea: imirkin: the patch doesn't seem to apply against this, which is what arch is using... http://xorg.freedesktop.org/archive/individual/driver/xf86-video-nouveau-1.0.12.tar.gz
11:32 orbea: patch: **** malformed patch at line 6: if (max_height < iter->mode.VDisplay)
11:32 imirkin: orbea: well, then you're doing something wrong
11:32 orbea: patch -Np1 -i ../drmmode_display.c.diff
11:37 orbea: yea, idk what I'm doing wrong here, doesn't patch manually either
11:38 imirkin: perhaps you copy-pasted it
11:38 imirkin: instead of downloading the patch?
11:40 orbea: that is it....oops
11:41 imirkin: orbea: it's only relevant in reverse-prime scenarios btw
11:42 orbea: well, here is a PKGBUILD with the patch applied, I dont have any nvidia cards here so I cant test it beyond compiling... http://dpaste.com/0T67K4C
11:44 orbea: binaryplease: yaourt -G xf86-video-nouveau , then replace the PKGBUILD with the above one and put the patch in that same directory with the name 'drmmode_display.c.diff' and then run 'makepkg', install missing dependencies and then run makepkg again...
11:44 orbea: install the pkg directly with pacman -U xf86-video-nouveau-1.0.12-1-x86_64.pkg.tar.xz
11:46 orbea: somehow learned to use arch while not liking it at all...
11:47 glennk: imirkin, aside from some old mb pros, what laptops have two nvidia gpus?
11:47 imirkin: glennk: apparently some lenovo y500's... a pair of GK107's
11:48 binaryplease: orbea: That worked and is installed
11:48 binaryplease: thank you
11:48 orbea: cool :)
11:49 imirkin: but did it help :)
11:49 binaryplease: imirkin: So do I reboot now and see what happens, or just quit the xserver?
11:49 imirkin: binaryplease: just restart X
11:49 imirkin: binaryplease: btw, after the issues happen, would be great to get a copy of dmesg and xorg logs...
11:49 imirkin: you might be able to ssh in or whatever
11:51 binaryplease: imirkin: I killed and restarted x. Shall I do the xrandr --setprovideroutputsource 1 0 now or not? because everything is fine until I start the third screen and that works only with that command
11:52 imirkin: right, go ahead
11:52 glennk: imirkin, ah so the second one is on a removable card
11:52 imirkin: glennk: mmaybe...
11:52 imirkin: you mean those MXM thingies?
11:52 glennk: lenovo ofc call it something else, but probably yes
11:53 binaryplease: imirkin: YEAH! That's way better! NO lag at all. The only issue is that the cursor flickers now
11:54 imirkin: binaryplease: that's expected. first off, there's no hw cursor on reverse prime. secondly udl has no hw cursor :)
11:54 glennk: hmm, apparently not, some custom cartridge thing
11:54 imirkin: binaryplease: or you mean it flickers on the nvidia-powered screens too?
11:54 binaryplease: imirkin: it flickrs on all screens
11:55 imirkin: that's... unexpected. i think.
11:55 binaryplease: imirkin: It also flickers at the exact same rate that the led on the displaylink adapter flashes
11:55 imirkin: hehe
11:56 imirkin: just sync your blinking with it, and you won't notice at all :)
11:56 binaryplease: imirkin: actually it ONLY blinks on the nvidia monitors
11:56 binaryplease: the display link screen = no flickering at all
11:56 imirkin: weird.
11:56 imirkin: airlied_: ---^ thoughts?
12:00 binaryplease: imirkin: also one more question: Can I just buy another one of these adapters and connect it, or is there some limit?
12:00 imirkin: you mean displaylink?
12:00 imirkin: it's USB, which has limited bandwidth per separate bus
12:03 binaryplease: yes displaylink. I thought there might be problems if I add a 3rd (actually 4th counting sli) graphics adapter
12:04 binaryplease: also these guys seem to have the same problem https://bugs.launchpad.net/ubuntu/+source/xorg/+bug/1278223
12:05 imirkin: in THEORY it should all work
12:05 imirkin: in practice... no clue
12:05 imirkin: various bugs related to this stuff have been fixed, various more remain
12:06 binaryplease: hmm the flicker is gone if I kill compton
12:08 imirkin: some compositor redirection something issue... dunno
12:25 karolherbst: fun times, there are more spilling related crashes :/
12:27 imirkin: stop spilling!
12:27 binaryplease: imirkin: Is it normal to have a very high processor load with these adapters? htop shows Xorg in first position with about 30% cpu usage (intel i7)
12:28 imirkin: binaryplease: it might be doing something dumb. it's definitely compositing the cursor "by hand", but that shouldn't be too bad
12:28 imirkin: binaryplease: you're going to have to be more specific as to where the time is going... maybe profile it with 'perf'
12:28 karolherbst: https://github.com/karolherbst/mesa/blob/master/src/gallium/drivers/nouveau/codegen/nv50_ir_emit_nvc0.cpp#L3100
12:28 imirkin: binaryplease: i suspect it's creating/destroying buffers and going mega-slow
12:28 karolherbst: r is 1073741824
12:28 imirkin: karolherbst: it shouldn't even try emitting
12:29 karolherbst: and I got a "ERROR: no viable spill candidates left" before
12:29 imirkin: karolherbst: if spilling fails, it should just abort the compile
12:29 karolherbst: there are three tries I assumed?
12:30 karolherbst: b is also 1073741824
12:30 binaryplease: imirkin: what info from perf do you need?
12:30 karolherbst: sadly the other stuff is optimized away ....
12:31 imirkin: binaryplease: where the cpu time is going
12:34 binaryplease: imirkin: this is perf top's output http://vpaste.net/ZAxGf udl seems to be taking a huge amount of cpu
12:35 binaryplease: without doing anything on the screen besides showing a wallpaper
12:35 imirkin: that's... not great
12:35 glennk: is that a buffer from a discrete gpu?
12:36 imirkin: glennk: it's shared, so it should be in sysmem
12:36 glennk: numbers like that look like uncached reads, possibly also over pcie
12:36 imirkin: binaryplease: sorry, i dunno if that's typical or not
12:37 imirkin: binaryplease: i think it's pretty uncommon to do nvidia + udl
12:37 binaryplease: imirkin: well I don't know how to connect a 3rd monitor otherwise :)
12:38 imirkin: if nouveau supported DP-MST...
12:38 imirkin: but it doesn't =/
12:38 imirkin: you should be able to get a dock for your laptop
12:39 binaryplease: imirkin: it has no plug for docking stations
12:39 imirkin: sadness
12:40 imirkin: is one of the outputs DP?
12:41 binaryplease: The display link one is
12:41 imirkin: heh
12:41 imirkin: that one doesn't count :p
12:42 binaryplease: no, only hdmi and usb and vga
12:42 karolherbst: seriously gcc....
12:42 karolherbst: I recompiled with -Og and -g2 but still "v->reg.size" is optimized out
12:42 karolherbst: ....
12:42 imirkin: karolherbst: -O0
12:43 imirkin: binaryplease: hah, that's the worst.
12:43 karolherbst: imirkin: well if O0 fixes that, then Og is broken
12:43 imirkin: Og has nothing to do with it
12:43 binaryplease: hm performance is really bad, if I leave the something like htop ruinning on the display port screen like htop, I get noticable lag and tearing on all other screens
12:43 karolherbst: it sure has, because it shouldn't annoy me while I debug something
12:44 karolherbst: -Og: "Optimize debugging experience."
12:45 imirkin: karolherbst: -Og doesn't make anything worse...
12:45 imirkin: it's unrelated to the opt level...
12:46 karolherbst: Og is an opt level
12:46 imirkin: do you have a -O2 in there somewhere?
12:46 karolherbst: no
12:47 karolherbst: CFLAGS: -Og -g2 -ggdb2 -Wall -std=c99 -Werror=implicit-function-declaration -Werror=missing-prototypes -fno-strict-aliasing -fno-math-errno -fno-trapping-math -fno-builtin-memcmp
12:47 imirkin: hm ok. dunno.
12:47 imirkin: if you want everything to be there, use -O0
12:48 karolherbst: yeah, will do
12:48 karolherbst: but Og really shouldn't optimize such values away though :/
12:49 binaryplease: Can I make any use of the second sli card with nouveau? (gt 650m)
12:52 karolherbst: yay, -O0 works
12:52 karolherbst: as I thought, data.id is messed up big
12:53 karolherbst: well the entire data seems to be messed up
12:56 karolherbst: imirkin: did you have a solution upstreamed for this? https://github.com/karolherbst/mesa/commit/bd4a130db2f839fb661b6f5a1abc7b065b94bf7e
12:56 imirkin: karolherbst: i think so, yeah
12:56 karolherbst: I really don't know what the result of this was though
12:56 karolherbst: k
13:04 karolherbst: https://gist.github.com/karolherbst/c7d692a02a8dc5132e1a I've added a merge->print() in GCRA::resolveSplitsAndMerges after merge = *it
13:04 karolherbst: somehow that looks wrong
13:06 imirkin: karolherbst: no viable spill candidates means "shader's messed up"
13:06 imirkin: karolherbst: what part looks wrong? the fact that it's not allocated? or the other merges?
13:07 karolherbst: I didn't see r1077 being defined, but I guess this could be RA's doing
13:07 imirkin: yeah
13:07 imirkin: and failing to allocate
13:07 karolherbst: I guess this is another memory corruption deep inside RA
13:08 imirkin: not memory corruption
13:08 imirkin: just... failure.
13:08 imirkin: i'm guessing there are some call's in there?
13:08 karolherbst: well I think RA is run more than once if it fails
13:08 imirkin: we currently don't properly handle pre-allocated nodes
13:09 imirkin: i have a patch, but it's not quite there yet
13:09 karolherbst: ohh there is a todo
13:10 karolherbst: okay
16:36 Umeaboy: Hi!
16:38 Umeaboy: I have the GeForce GTX 850M and the version of x11-server-xorg is 1.16.4 so that's updated.
16:39 Umeaboy: The problem with installing either Mageia or Ubuntu is that the standard kernel 3.19 freezes entirely if I don't blacklist the nouveau driver.
16:39 Umeaboy: I was wondering if this has been fixed upstream already.
16:40 imirkin: Umeaboy: what gpu is that? lspci -nn -d 10de: should tell you
16:40 Umeaboy: 01:00.0 3D controller [0302]: NVIDIA Corporation GM107M [GeForce GTX 850M] [10de:1391] (rev a2)
16:41 imirkin: kernel 3.19 should have had *very* basic support for it... no accel at all
16:41 imirkin: kernel 4.1+ should have somewhat better support, including some accel
16:41 Umeaboy: I know, but it doesn't even work in 4.15 as standard.
16:41 Umeaboy: I hav to add it anyway.
16:42 Umeaboy: have
16:42 imirkin: there was also an issue where we wouldn't run the init tables on it sometimes
16:42 Umeaboy: You need the Xorg.conf?
16:42 imirkin: which might have been fixed later on
16:42 imirkin: nope
16:42 imirkin: 4.15? that's not a thing quite yet
16:42 Umeaboy: Sorry. Missed a number in between.
16:42 Umeaboy: 4.1.15-desktop-2.mga5 is my kernel.
16:43 imirkin: ah ok
16:43 imirkin: i'd try a fresh 4.4 kernel and see how it goes
16:45 Umeaboy: The newest unstable kernel that Mageia offers is 4.4.1.
16:46 imirkin: cool
16:46 imirkin: however just as fair warning, there's no reclocking on maxwell
16:46 imirkin: so it won't actually be any faster than the intel gpu
16:46 imirkin: but on the bright side, nouveau should be able to suspend your chip which should save power :)
16:56 Umeaboy: imirkin: If I'm not mistaken the linus kernel is better used in laptops right?
16:58 imirkin: as opposed to?
16:59 Umeaboy: kernel-desktop-4.1.15-2.mga5
17:00 imirkin: ah, well, like i said, try a later kernel.
17:04 Umeaboy: imirkin: The unstable kernel won't build in the stable environment due to deps not being up to date. I can manage with this temporary solution until the 4.4.1 kernel is released into Core Updates.
17:04 Umeaboy: I was just curious whether the issue I reported, with the kernel not booting unless I blacklist the driver, has been fixed.
17:09 imirkin: Umeaboy: that's too broad. if you provide a dmesg with the failed boot, perhaps i could tell. but really the best way to tell is to just update and see.
17:17 Umeaboy: Right.
17:17 Umeaboy: Gotta go.
19:02 orbea: imirkin: nouveau froze my system by spamming dmesg excessively. I wrote about 10 seconds of it to a file over ssh and got a 100 mb file. lol. http://ks392457.kimsufi.com/orbea/stuff/nouveau-dmesg.xz Happened trying to run those videos with mpv again... sorry, no backtrace yet...
19:06 imirkin: wow, the ctxsw got really angry at you
19:06 imirkin: FECS ucode error 2
19:57 jeremySal: imirkin: So for the fragment_shader_interlock extension (https://developer.nvidia.com/sites/default/files/akamai/opengl/specs/GL_NV_fragment_shader_interlock.txt) the idea is that normally fragment shaders are executed in arbitrary order, but with this extension they are guaranteed to execute in rasterization order?
19:57 jeremySal: imirkin: at least for the enclosed block of code?
19:58 imirkin: something like that
19:58 imirkin: btw, i think skeggsb is going to look at that maxwell texture header thing
19:58 imirkin: not sure how far you'd gotten with it
19:59 jeremySal: oh ok, I'd only looked at the code
19:59 imirkin: ok
19:59 jeremySal: I work full time :p
19:59 imirkin: he actually has a pure-software ctxsw impl on gm20x, so he can actually test it. probably better placed to do it... at least for now.
20:01 jeremySal: I'm confused what pure software context switching means?
20:01 imirkin: cpu-side switching of gpu contexts
20:01 imirkin: instead of having the gpu do it itself
20:01 imirkin: it's a ton of data, so it's way slow
20:02 jeremySal: why would he have an implementation but nobody else?
20:02 imirkin: because he wrote it
20:02 imirkin: and i guess didn't want to make it public for one reason or another
20:03 jeremySal: how does context switching work on a GPU?
20:03 jeremySal: I thought the shaders executed in their entirety
20:03 jeremySal: without interrupts or context switches
20:04 jeremySal: or does it mean something different than what I'm used to
20:04 imirkin: yeah, so a particular shader invoc might not get interrupted
20:04 imirkin: but a single draw is tons of shader invocations
20:05 imirkin: and there's a ton of other associated state
20:05 jeremySal: so who is in charge of keeping track of the state?
20:05 jeremySal: the cpu or the microprocessor on the gpu?
20:05 imirkin: well, the GPU has state
20:05 imirkin: and there's an internal engine which accelerates switching of contexts
20:06 skeggsb: i didn't release it because a) it's stupidly slow, and not useful for people to *actually* use, b) it was awful and hacked in, and never rebased onto more recent work, and c) i was really hoping nvidia would have released firmware long before now :)
20:06 imirkin: skeggsb: i'm a lot more cynical than you are, i guess
20:07 imirkin: i wouldn't be terribly surprised if they don't release it at all
20:07 jeremySal: why is it not possible to get it from the nvidia blob?
20:07 imirkin: it's not... *not possible*, it's just that our regular tracing mechanism doesn't pick it up
20:09 imirkin: the card probably DMA's it from system memory
20:09 imirkin: so we'd have to dump that system memory at the right time
20:10 jeremySal: I see
20:13 imirkin: the tool in question is mmiotrace btw
20:13 imirkin: basically a clever PTE trick to capture all MMIO reads/writes
20:13 imirkin: anyways, i'm out
20:13 jeremySal: is that the same as the valgrind tool?
20:14 jeremySal: peace
20:17 Javantea: jeremySal: it's a kernel module in mainline
20:17 Javantea: different than the valgrind tool
20:20 jeremySal: Javantea: thanks
20:28 jeremySal: but it does output the same format?
20:29 Javantea: envytools decodes both, I believe they are different formats though I don't know the valgrind format