00:00 mwk: and when you're done, you store it using st unlock s[]
00:00 mwk: no one else can touch the word in the meantime
00:00 mwk: and if you are not successful, well, just branch back to the load and try again
00:00 mwk: but, here comes the important part
00:01 mwk: you'll always be unsuccessful in all but one thread in the warp
00:01 mwk: and you have to take care to make sure the thread group with the successful thread executes first
00:01 mwk: otherwise you'll be spinning
00:02 mwk: so - when you copy what blob's doing, make sure you keep the polarity of the branch as it is
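The retry loop mwk describes can be illustrated with a toy model. This is only a sketch of the reasoning in the messages above, not of real hardware behaviour: it assumes exactly one thread per attempt wins the lock, and that if the losing group is scheduled first, the winner never reaches its unlocking store. The function name and parameters are hypothetical.

```python
def rounds_to_finish(num_threads, winner_group_first=True, max_rounds=1000):
    """Toy model of a warp retrying a locked read-modify-write.

    Returns the number of attempts until every thread has done its update,
    or None if the warp is still spinning after max_rounds attempts.
    """
    pending = set(range(num_threads))
    rounds = 0
    while pending:
        if rounds >= max_rounds:
            return None  # gave up: the warp is spinning
        rounds += 1
        winner = min(pending)  # arbitrary pick: exactly one thread gets the lock
        if winner_group_first:
            # The winner runs first, stores with unlock, and leaves the loop;
            # the losers branch back to the load and try again.
            pending.remove(winner)
        # Otherwise the losers run first and retry the load while the lock
        # is still held, so nobody makes progress in this round.
    return rounds
```

Under these assumptions a 32-thread warp needs 32 attempts when the branch polarity is right, and never finishes when it is wrong.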
01:02 hakzsam: mwk, thanks for the clarification :)
03:21 metalhead33: http://pastebin.com/ZaTcKd6L
03:22 imirkin: did you forget to include 'intel' in your VIDEO_CARDS?
03:22 metalhead33: Nope
03:22 metalhead33: I did include it.
03:22 metalhead33: VIDEO_CARDS="intel i915 nouveau"
03:23 metalhead33: If the Intel driver didn't work, how would I be typing in Pidgin now, from an X server?
03:23 imirkin: the usual way?
03:24 imirkin: this has nothing to do with X
03:24 imirkin: looks like your i965_dri.so is messed up, so you don't get acceleration when running 3d applications targeted to the intel gpu
03:25 mwk: imirkin: are you a gentoo user?
03:25 metalhead33: Hmmm... also, when I tried to play a game on Wine with DRI_PRIME=1, as soon as I got to something 3D, I got dropped to the login screen.
03:26 imirkin: mwk: i am.
03:27 mwk: good to know
03:27 metalhead33: http://pastebin.com/aj4XqPkB dmesg
03:28 mwk:has a few gentoo VMs since recently
03:28 imirkin: metalhead33: probably the xorg log from the crashed xorg would be good... usually backed up in a .old file
03:28 karolherbst: metalhead33: VIDEO_CARDS=i965
03:28 karolherbst: not i915
03:29 karolherbst: i915 is for really old intel gpus
03:29 metalhead33: http://pastebin.com/aiKggz57
03:29 metalhead33: ah
03:29 mwk: turns out it's the only distro that works on s390 and has reasonably recent versions of everything
03:29 karolherbst: this has nothing to do with X, but with mesa
03:29 metalhead33: so, Xorg.0.log.old
03:29 mwk: although getting it set up on qemu was... painful
03:29 imirkin: mwk: heh. s390. very sad.
03:30 mwk: imirkin: s390 is a boat of fun
03:30 karolherbst: metalhead33: i915 is the kernel driver used for both, i915 and i965 gpus, it is a bit confusing, but that's the way it is
03:30 mwk: curiously ppc is giving me much more grief
03:31 imirkin: karolherbst: what's confusing is that people assume that there must be a 1:1 relation between kernel drivers and userspace programs.
03:31 metalhead33: so, replace i915 with i965
03:31 karolherbst: in VIDEO_CARDS, yes
03:31 imirkin: mwk: el or be?
03:31 karolherbst: and rebuild mesa
03:31 mwk: with s390, it's horribly slow with qemu, but stuff just works... but all my ppc VMs just come apart as I try to set them up
03:31 metalhead33: anyway, I replaced the log with the old log
03:31 karolherbst: mesa should be the only one using this
03:31 mwk: imirkin: I tried to install both
03:32 metalhead33: hang on one second, I need to check what kind of Intel GPU I have
03:32 karolherbst: metalhead33: you don't have to
03:32 mwk: be was sorta-working, but I had trouble with *everything*, and just when I thought I finally set it up properly and updated everything, it started randomly segfaulting in all programs
03:32 karolherbst: metalhead33: i965 will work, trust me
03:32 mwk: le... gentoo doesn't provide stages for le
03:32 metalhead33: Ehm... okay, let's try it this way
03:32 metalhead33: Do I have to edit the kernel?
03:32 karolherbst: metalhead33: no
03:32 imirkin: mwk: hm, my ppc g5 also dies randomly :)
03:32 mwk: so I tried to bootstrap it with crossdev, but ran out of patience
03:32 karolherbst: just rebuild mesa
03:33 imirkin: mwk: i can give you my crossdev root for ppc :)
03:33 metalhead33: going for it
03:33 karolherbst: just make sure portage says that it will remove i915 and add i965
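A small sketch of the change karolherbst suggests: swap `i915` for `i965` in the `VIDEO_CARDS` value before rebuilding mesa. The helper name is hypothetical and only manipulates the string; editing the actual portage configuration and rebuilding are left to the user.

```python
def swap_video_card(video_cards, old="i915", new="i965"):
    """Return the VIDEO_CARDS value with `old` replaced by `new`, without duplicates."""
    updated = []
    for card in video_cards.split():
        card = new if card == old else card
        if card not in updated:  # avoid listing the same driver twice
            updated.append(card)
    return " ".join(updated)


print(swap_video_card("intel i915 nouveau"))
```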
03:33 mwk: imirkin: somehow I managed to compile a glibc for ppc-be that requires POWER7 VSX instructions
03:33 metalhead33: done
03:33 mwk: while running a kernel without VSX support
03:33 mwk: SIGILLs are fun
03:34 karolherbst: :D
03:34 mwk: imirkin: I do have a crossdev root
03:34 karolherbst: mwk: I had some fun with march/mtune native too where gcc just added illegal instructions for my host :/
03:34 mwk: so unless you have a ppc64le stage3 laying around, I'll pass
03:35 mwk: oh, and the damn kernel
03:35 mwk: I tried identical configuration for be and le
03:35 mwk: le boots fine, be crashes hard in bootloader
03:35 mwk: even though stock distro be kernels work
03:36 RSpliet: mwk: is Fedora s390 not going anywhere?
03:36 mwk: RSpliet: I think I tried fedora, but it just kept crashing in the installer
03:36 imirkin: sounds like loads-o-fun. my stuff is for ppc g5, so... ppc64be, power4.
03:37 mwk: tbh I don't trust qemu-s390x to be doing a good job at all
03:37 mwk: but then gentoo did work without any problem
03:37 RSpliet: mwk: I was surprised they even have a build for it in the first place, seems quite a niche for a distro like it (although, it's probably because it's in Red Hat's turf)
03:38 mwk: other than, you know, compile times on the order of a week... but distcc helps with that
03:38 mwk: yeah, I keep wondering who uses gentoo on that too
03:39 mwk: ... maybe no one ever ran it on an actual s390 and it works only on qemu
03:40 RSpliet:hopes to get cracking on RiscV soon...
03:41 mwk: huh
03:41 mwk: never heard of that
03:41 mwk: sounds like fun
03:41 mwk: "128-bit address space", eh...
03:42 imirkin: for all those times when 64-bit is just not enough
03:42 mwk: I had the... pleasure... of reading about HP-PA architecture recently
03:42 mwk: they are real proud of their 96-bit VMAs
03:42 RSpliet: it's a Berkeley university project, open source ISA, several (both in order and OoO) implementations provided
03:43 RSpliet: there's a chap sitting behind me who's got Linux running on it... but it's very much in its infancy right now
03:43 RSpliet: implementations are BSD licensed as well
03:43 RSpliet: iirc
03:46 zeq: Hi! I've put together a system for a friend from old parts I had laying around, and the best gfx card I had available is a Geforce FX 5900 (nv35) AGP. I kind of assumed it would be OpenGL2.1/GLES2 capable and would work with mutter/weston etc. Apparently I was wrong since it only exposes OpenGL 1.5/GLES1, which surprises me. What's missing to enable at least GLES2 for nv35? Should it be possible?
03:47 imirkin: zeq: it's conceivable that GLES2 would be possible
03:47 imirkin: i'd have to think about it
03:47 imirkin: (aka read up on GLES2)
03:47 imirkin: the main thing that nv3x doesn't have is support for NPOT textures
03:47 mwk: weren't there some problems with NPOT?
03:47 mwk: ... right
03:48 zeq: ... and emulation would be slow?
03:48 mwk: with that kind of stuff, you might as well emulate the entire GPU
03:48 imirkin: i haven't given it much thought
03:48 imirkin: mwk: can't you just divide the coords?
03:49 zeq: presumably the nvidia binary driver did *something* to expose OpenGL2.1?
03:49 zeq: or did they just fail for NPOT?
03:49 imirkin: zeq: blob drivers do all sorts of fixups
03:49 karolherbst: zeq: shouldn't weston run with the drm backend?
03:49 imirkin: zeq: and any time you did something that wasn't super-supported, you'd fall off the fast path into slow-land
03:49 zeq: it was a pretty quick card back in the day
03:50 imirkin: zeq: if you're interested in improving the nv3x backend, you're more than welcome to :)
03:50 imirkin: it has a few deficiencies on top of not emulating certain features
03:50 imirkin: like if the color and depth formats don't match up quite right, it just ignores depth entirely.
03:51 imirkin: that seemed better than doing illegal draws on the gpu which sometimes result in hangs, but still a bit unsatisfying :)
03:51 zeq: imirkin: I would love to, and even if I knew where to start (which with some guidance I probably could) but unfortunately I'll have to be handing the machine over to my friend - with xfce4 I guess...
03:51 karolherbst: zeq: can you start weston like this? "weston --backend=drm-backend.so"
03:52 imirkin: karolherbst: no quantity of arguments to weston will provide GL 2 or GLES 2 support with nv3x.
03:52 karolherbst: why should gles2 be a requirement at all?
03:52 imirkin: zeq: if you can get your hands on a nv4x, that should get you GL 2.1 and GLES 2.0
03:52 zeq: karolherbst: it didn't work when I tried it yesterday, I'm just re-building mesa, so I'll give it another go when I'm done, but I'm sure gles2 is a requirement.
03:53 imirkin: zeq: many of those were PCI-E, but there were also AGP variants
03:53 karolherbst: afaik weston doesn't require gles2 at all if you don't want to
03:54 zeq: imirkin: It's what I have. I'm giving it to my friend, if I start spending actual money on it, he'll feel he needs to contribute, and he doesn't have any...
03:54 imirkin: zeq: ok
03:54 imirkin: zeq: you could also install the blob, although i doubt they've updated the nv3x one in quite a while
03:55 zeq: imirkin: as far as I know the legacy blob hasn't supported nv3x for a while, it would have to be a really old userspace.
03:55 imirkin: zeq: i think the 173.x series should do nv3x
03:56 imirkin: looks like GLES 2 needs npot btw
03:57 zeq: I'm a Gentoo guy, so I'm trying to squeeze all the performance out of the whole system I can, so it's definitely looking like xfce4. It's what he was using before anyway, since his old Celeron Coppermine-128 just died!
03:57 mwk: blob tends to handle unusual unsupported cases with raise(SIGSEGV);
03:57 karolherbst: zeq: xfce4 is nice for old hardware
03:57 zeq: It'll be quite an upgrade for him anyway. I just hoped to squeeze a little more perf from the gfx card.
03:57 imirkin: mwk: er, obviously ignore the division comment
03:58 imirkin: zeq: i'm still using the same WM i've been using for over a decade... WindowMaker. works great.
03:58 mwk: iirc it just lies about full 2.1 support and hopes no one actually tries to use wrapping NPOTs or whatever it was that they didn't support
03:58 zeq: So modern software just assumes NPOT support and fails on older chips?
03:59 imirkin: well, NPOT is part of GL 2.0 and GLES 2.0
03:59 imirkin: so modern software just requires those
03:59 mwk: it's not about NPOTs, it's about NPOTs with some wrapping modes IIRC
03:59 imirkin: mwk: yeah.... you MAY be right. otoh you specify the size as log2, so... dunno
04:00 zeq: imirkin: so nvidia would have had a fallback in the driver?
04:00 imirkin: mwk: except for rect
04:00 mwk: software wants NPOTs, but the GL extension for NPOT support requires wrapping support
04:00 imirkin: mwk: but then you don't have mipmaps
04:00 mwk: there's no GL extension that says "I support NPOT, but not NPOT with wrapping"
04:00 mwk: oh, mipmaps... yeah, could be mipmaps, not wrapping
04:00 imirkin: right, well that's the whole point of npot
04:01 imirkin: if you just want rect textures, there's ARB_texture_rectangle
04:01 imirkin: which everything supports, including nv3x
04:01 zeq: imirkin: I wonder though. What would be the intent from a compositor?
04:01 imirkin: i guess the way to emulate npot would be to scale the textures on upload to the closest POT size, and then munge the coordinates? dunno.
04:02 zeq: do they just want rect textures?
04:02 imirkin: zeq: rect textures should be more than sufficient
04:02 imirkin: zeq: however that's not how things work. people just say "the GPU's i'm targeting support GL 2.0, so i'm going to require GL 2.0"
04:02 imirkin: zeq: that said, you can use MESA_GLES_VERSION_OVERRIDE=2.0
04:02 imirkin: (or MESA_GL_VERSION_OVERRIDE=2.0)
04:03 imirkin: which will force the driver to claim things it doesn't support
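A minimal sketch of using the override variables imirkin mentions when launching a program from a script. The helper name is hypothetical; only the two environment variable names come from the discussion. As noted above, this only changes what the driver claims, not what the hardware can do.

```python
import os


def mesa_override_env(version="2.0", gles=True, base_env=None):
    """Return a copy of the environment with a Mesa version override set."""
    env = dict(os.environ if base_env is None else base_env)
    key = "MESA_GLES_VERSION_OVERRIDE" if gles else "MESA_GL_VERSION_OVERRIDE"
    env[key] = version
    return env
```

The result can be passed as `env=` to `subprocess.run(...)` for whichever compositor or application is being tested.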
04:03 karolherbst: zeq: xfwm4 doesn't use opengl by the way
04:03 zeq: but it won't provide rect textures when the software asks for NPOT; it will just fail, right?
04:03 imirkin: you'll get a GL error probably, yeah
04:03 mwk: imirkin: you can get mipmapping by just choosing the next higher POT size
04:03 zeq: unless the driver provides a fast rect texture for NPOT when that's really all that's wanted?
04:04 imirkin: when you try to do glTexImage(GL_TEXTURE_2D, npot size)
04:04 mwk: but not wrapping
04:04 mwk: maybe that was the problem
04:04 imirkin: yeah that could be it
04:04 imirkin: i think it'd be reasonable to emulate it without the wrapping
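A rough sketch of the emulation idea floated above, assuming the NPOT image is placed in the corner of the next larger power-of-two texture and normalized coordinates are rescaled to match. The helper names `next_pot` and `pad_coord` are made up for illustration and are not part of nouveau or Mesa.

```python
def next_pot(n):
    """Smallest power of two that is >= n (n must be at least 1)."""
    if n < 1:
        raise ValueError("texture dimension must be positive")
    return 1 << (n - 1).bit_length()


def pad_coord(u, size):
    """Map a normalized coordinate on the NPOT image to the padded POT texture."""
    return u * size / next_pot(size)


print(next_pot(640), pad_coord(1.0, 640))
```

In this model a coordinate outside [0, 1] lands in the padding instead of wrapping around the image, which matches the point that mipmapping could work this way but repeat wrapping would not.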
04:04 imirkin: GL 2.0 is useful enough for that
04:05 imirkin: i've been trying to locate a GeForce FX PCI card so that i can plug it into my comp alongside everything else
04:05 imirkin: but keep getting outbid on ebay
04:05 mwk: I think I have an nv34 like that somewhere
04:05 imirkin: i have a PCIe one, but don't want to unplug one of the other cards :)
04:05 mwk: oh, you found a PCIe one?
04:05 mwk: I never managed to do that
04:05 imirkin: only one i found actually ;)
04:05 imirkin: it's an NVS 280 or whatever... bridge chip obviously
04:06 imirkin: the nice thing about nv3x is that i should be able to simultaneously test nouveau_vieux on it :)
04:06 imirkin: although that theory is unconfirmed
04:07 mwk: heh
04:07 mwk: ... assuming the necessary kernel support is there
04:07 zeq: while I'm on here, I also have a GF108GLM in this laptop. Is there anything I can do to help get reclocking working? Obviously, it will have to wait until after I get my friend's PC finished...
04:07 mwk: IIRC you had to do some special dance on NV1x to create a properly-working NV04 3D object
04:07 imirkin: zeq: you could send patches that make reclock work :p
04:07 mwk: wouldn't be surprised if it applied to NV2x/NV3x pairing too
04:08 imirkin: mwk: well, when i locate a relevant card, i'll try it, and we'll see what happens
04:08 zeq: imirkin: yeah, right! have you got datasheets?
04:08 imirkin: zeq: plenty. just none that relate to the nvidia stuff :)
04:09 imirkin: zeq: talk to karolherbst and RSpliet -- they've been looking at fermi afaik
04:09 imirkin: i'm just the 3d guy :)
04:09 zeq: imirkin: I don't know why, but the BIOS sets the clocks to the lowest possible values (I think). The memory is clocked at something like 50MHz; it's a bit slower than the integrated IvyBridge graphics!
04:10 imirkin: zeq: to save power :)
04:10 mwk: zeq: have you seen an nvidia Thermi at full throttle? :)
04:10 zeq: well it's turned off most the time!
04:11 imirkin: mwk: presumably by the time they were making GF108s they kinda figured it out
04:11 imirkin: i suspect a GF100 is a sight to see though
04:11 mwk: they're very useful during the winter
04:11 zeq: I have a Win7 install on here (just for when strictly necessary) and there it can be clocked very high (memory width is only 64bit) memory especially
04:12 imirkin: yeah, GF108 isn't going to be a beast under any conditions
04:12 imirkin: it's the cheapest slowest fermi of them all
04:13 mwk: nah, there's also GF119
04:13 imirkin: i was under the impression it was somehow better
04:13 imirkin: by at least 11 :)
04:13 mwk: yeah
04:13 mwk: just like GF108 is 8 better than GF100
04:13 imirkin: exactly!
04:13 zeq: imirkin: It's still pretty quick when clocked to the highest frequencies. Quite a lot faster than the IVB in Windows anyway.
04:14 imirkin: zeq: oh yeah, definitely
04:14 metalhead33: Okay, I re-emerged Mesa.
04:14 metalhead33: Now, should I log in and log back? Or reboot to be safe?
04:14 mwk: I think once they figured they can clock these at 50MHz, they saw no reason to change that
04:14 imirkin: metalhead33: neither
04:14 imirkin: metalhead33: should work immediately
04:14 metalhead33: Oh?
04:14 metalhead33: Hmmm... let me try then.
04:14 metalhead33: yep
04:14 metalhead33: works immediately
04:15 metalhead33: Now, I will try a game. Not sure if the two have anything to do with each other though.
04:15 metalhead33: Likely, I will get the login screen again though. If I "quit" from the IRC, that's what happened.
04:20 zeq: Oh, talking of reclocking, what happened to the nv35 support? The last time I tried the card, there was experimental support, which partially worked. But now it seems to be gone.
04:21 glennk: NPOT in gles 2.0 doesn't require mipmaps or wrap modes other than clamp_to_edge
04:21 imirkin: glennk: isn't that just rectangle?
04:21 zeq: glennk: does that mean it should be possible to wire it up for nv3x?
04:22 imirkin: glennk: but with normalized coords
04:22 glennk: rectangle really just means the coords aren't normalized
04:23 imirkin: and no mipmaps
04:24 glennk: https://www.opengl.org/registry/specs/NV/texture_rectangle.txt is the one you are thinking of?
04:25 metalhead33: back
04:25 metalhead33: Well yep... it looks like Wine doesn't like Nouveau.
04:26 RSpliet: pmoreau: would you like me to return you your G96 next week?
04:28 karolherbst: metalhead33: why shouldn't it?
04:28 karolherbst: metalhead33: do you use staging?
04:29 metalhead33: As I mentioned, when ever I try "DRI_PRIME=1 wine game.exe"... I get thrown back to the login screen.
04:30 karolherbst: metalhead33: x log then
04:30 metalhead33: I did try it with the Firefox though, and WebGL worked just fine.
04:30 metalhead33: Xorg.0.log.old? Okay, just one minute...
04:30 karolherbst: metalhead33: by any chance, does the game use physx?
04:31 metalhead33: Possibly, but I think that's unlikely. This happened even with games that don't.
04:31 karolherbst: k
04:31 metalhead33: http://pastebin.com/pzbVBaXa
04:32 metalhead33: Looking at the Nouveau list... it does not contain GeForce 5xx
04:32 metalhead33: it has GeForce 3, 4Ti and 6, but no 5. Especially 5xxM
04:33 karolherbst: metalhead33: well I wanted the old log though :/
04:33 metalhead33: This is the old log.
04:33 karolherbst: ohh strange
04:33 metalhead33: Xorg.0.log.old
04:34 karolherbst: what happens without DRI_PRIME?
04:35 metalhead33: Game runs just fine... Except when it's a graphically demanding game.
04:35 metalhead33: Then it bitches about my Intel card not supporting double buffering.
04:38 karolherbst: no clue, I don't have any problems with that on my end :/
04:38 karolherbst: I use wine staging though
04:38 imirkin: metalhead33: perhaps your compositor crashes?
04:38 pmoreau: RSpliet: Why not, even though I don't know what I'll be doing with it. :-D Maybe run some automatic tests on it, should be easier than on the one from my laptop, since I always have my laptop with me.
04:39 metalhead33: Compositor?
04:39 karolherbst: ohh right
04:39 imirkin: like gnome-shell or something
04:39 metalhead33: I use Xfce
04:39 karolherbst: metalhead33: maybe something odd is inside ~/.xsession-errors
04:39 metalhead33: oh
04:39 imirkin: metalhead33: is there something in dmesg about some process being killed?
04:41 metalhead33: Hmmm...
04:41 metalhead33: pci 0000:01:00.0: optimus capabilities: enabled, status dynamic power, hda bios codec supported
04:41 metalhead33: I will post them to pastebin
04:43 metalhead33: http://pastebin.com/jPwarbqF
04:43 metalhead33: http://pastebin.com/AJMS7qqv
04:46 metalhead33: Oh right, let me translate the parts of the errors that are not in English... "Erőforrás átmenetileg nem érhető el" means "Resource temporarily unavailable/unreachable", "Félbeszakított rendszerhívás" means "Aborted system call", and "Nincs ilyen fájl vagy könyvtár" means "No such file or folder"
05:00 vita_cell: karolherbst
05:21 karolherbst: mupuf: yay, I power gated my gpu :O
05:21 karolherbst: well a bit
05:22 RSpliet: karolherbst: which way? "block-level clock gating"? thermal protect?
05:22 karolherbst: graphics power gating
05:23 karolherbst: enabled it, power consumption went down by 0.4W and I couldn't start a gl app anymore on the gpu
05:23 karolherbst: RSpliet: ftp://download.nvidia.com/open-gpu-doc/gk104-disable-graphics-power-gating/1/gk104-disable-graphics-power-gating.txt
05:23 karolherbst: :D
05:25 karolherbst: but this isn't used by nvidia at runtime though :/
05:25 sfix: hey guys, does nouveau support maxwell cards? I've seen a couple of posts about it but couldn't find anything definitive. I had some problems with nouveau and tried to use the proprietary driver instead but discovered I had secure boot enabled. Keen to revert to nouveau if possible.
05:28 RSpliet: karolherbst: ah yes, no, it's not very useful during runtime having to wait 100ms before the graphics engine starts doing work for you
05:28 RSpliet: you'd miss 6 frames just waiting for that :-P
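The "6 frames" figure follows from simple arithmetic if one assumes a 60 Hz refresh rate (the rate is an assumption here; only the 100 ms delay is stated above). A tiny sketch, with a hypothetical helper name:

```python
def frames_elapsed(delay_ms, refresh_hz=60):
    """Number of display frames that pass during a delay of delay_ms milliseconds."""
    return delay_ms * refresh_hz / 1000


print(frames_elapsed(100))
```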
05:28 karolherbst: mhhh
05:28 karolherbst: ohh why did I disable the clock gates locally :O
05:29 RSpliet: sfix: depends on which generation of Maxwell you are trying to use
05:30 RSpliet: first gen should be usable with nouveau if you have everything updated to the final version
05:30 RSpliet: second gen is problematic, we're waiting for important bits of firmware before anything useful can be done with the cards under nouveau
05:38 Lekensteyn: Where is the latest nouveau tree? linus' tree contains more changes than http://cgit.freedesktop.org/nouveau/linux-2.6/?h=linux-4.4
05:41 RSpliet: Lekensteyn: I think your best bet right now is drm-next or Ben's GitHub
05:41 imirkin_: Lekensteyn: ben's latest tree is at https://github.com/skeggsb/linux
05:44 Lekensteyn: thanks, is it temporary or will it be like that for a longer time? Maybe some of these trees could be mentioned at http://nouveau.freedesktop.org/wiki/Source/
05:45 imirkin_: i dunno, ben changes things up with certain regularity
05:51 pmoreau: Lekensteyn: The GitHub repo is the replacement of the "out-of-tree module git" from the Source webpage you linked
06:08 metalhead33: Karolherbst
06:08 metalhead33: (13.43.29) metalhead33: http://pastebin.com/jPwarbqF
06:08 metalhead33: (13.43.29) metalhead33: http://pastebin.com/AJMS7qqv
06:08 metalhead33: "Erőforrás átmenetileg nem érhető el" means "Resource temporarily unavailable/unreachable", "Félbeszakított rendszerhívás" means "Aborted system call", and "Nincs ilyen fájl vagy könyvtár" means "No such file or folder"
06:15 metalhead33: Oh yes... DRI_PRIME=1 glxgears gives me a black window.
06:17 karolherbst: metalhead33: do you have compositing enabled in xfwm?
06:17 metalhead33: compositing?
06:18 metalhead33: How do I check that?
06:18 karolherbst: xfwm settings
06:18 imirkin_: metalhead33: i'd also encourage you to set up DRI3. it'll require rebuilding xf86-video-intel -- unfortunately the gentoo ebuild forces --disable-dri3
06:18 karolherbst: not quite sure anymore, it should be enabled by default by now, but I want to make sure
06:18 karolherbst: metalhead33: https://github.com/karolherbst/F.U.N.-overlay/blob/master/x11-drivers/xf86-video-intel/xf86-video-intel-2.99.917-r3.ebuild
06:18 metalhead33: Oh wait... maybe I should also update xfwm
06:19 metalhead33: xfce-base/xfwm4 is not the newest version
06:19 metalhead33: okey, gonna compile that too
06:19 metalhead33: Does it require me to rebuild MESA too?
06:19 imirkin_: no.
06:20 imirkin_: metalhead33: actually, yeah - you'd need to build mesa with +dri3
06:20 imirkin_: metalhead33: so depends on your current use flag settings
06:25 metalhead33: Gonna emerge it now then
06:27 sfix: RSpliet: ah great, on a 1st gen chip so will try that when I get the chance. I guess secure boot being enabled would be the reason for nouveau not loading successfully too?
06:27 RSpliet: I don't know why that'd make a difference tbh, but I might be overlooking details about secure boot
06:28 sfix: hm, maybe I have other issues there then which the proprietary drivers solve
06:29 imirkin_: sfix: GM10x doesn't have the secure stuff that GM20x has
06:29 imirkin_: or are you talking about efi secure boot?
06:29 metalhead33: Oh boy... trying to install third-party ebuilds is a pain.
06:29 imirkin_: if so, i have no idea what restrictions that places, if any
06:30 imirkin_: i think it just requires the blob that efi loads to be signed, that kernel can then go on to do whatever it pleases.
06:30 vita_cell: guys, something wrong with 4.4.0
06:30 sfix: imirkin_: yes EFI secure boot, which is what solved my problems with the nvidia driver (so took a guess at that being the issue with nouveau too)
06:31 vita_cell: I can compile nouveau with 4.2 and 4.3, but not with 4.4
06:31 sfix: I'll give it a go when I get home though and hopefully it was just some oversight
06:31 imirkin_: sfix: i suspect it was an unrelated issue. but i guess i dunno. note that GM10x support in nouveau isn't great. is this for a primary gpu or a secondary one?
06:31 RSpliet: vita_cell: there's nothing wrong with the numbers 4.4.0, nor is there something wrong with that kernel version. I take it you are trying to do an out-of-tree build of the nouveau kernel-module against the 4.4.0 kernel?
06:31 imirkin_: if primary, it will effectively be unusable without a mesa patch which pours cement into the gpu pipeline
06:32 sfix: imirkin_: primary (no on board graphics) -- would you recommend then that I stick with the nvidia module instead, then?
06:32 sfix: ah
06:32 RSpliet: in that case, make sure that your out-of-tree git repository is up to date, the drm<->driver interface tends to be a bit of a moving target, so different versions are incompatible
06:32 sfix: well that explains why X is using gallium
06:33 imirkin_: sfix: depends on your goals. there's a patch here: https://bugs.freedesktop.org/show_bug.cgi?id=93373
06:33 imirkin_: sfix: not sure what gallium has to do with anything....
06:33 vita_cell: I compile it with my headers; tried with 4.4 and it throws an error. I'm compiling karolherbst's nouveau folder, he did fixes for me.
06:33 vita_cell: http://hastebin.com/oqilelufic.lua
06:34 sfix: well if it couldn't initialize the nouveau driver because of that I guess it falls back to the next video driver available, but thanks, looking at the patch
06:34 imirkin_: sfix: gallium is not a video driver, it's an API.
06:35 vita_cell: I tried to compile my nouveau with "LINUXDIR=..." pointing at a git clone of http://cgit.freedesktop.org/~airlied/linux/log/?h=drm-next
06:35 RSpliet: vita_cell: yes, that's a typical case of version mismatch
06:35 metalhead33: guys... why is that that despite following every instruction to get third-party ebuilds working, it still does not want to recognize it? "emerge: there are no ebuilds to satisfy "=xf86-video-intel-2.99.917-r3"."
06:35 sfix: oh, well today I learned... was going off of the OpenGL renderer string result from glxinfo
06:35 vita_cell: so, how do I compile my nouveau with 4.4 headers?
06:36 imirkin_: sfix: yeah... it might say something like "OpenGL renderer string: Gallium 0.4 on NV108". The NV108 is the key part here.
06:36 imirkin_: (in my case that's a GK208... GM107 would be NV117)
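A small sketch of picking the chipset out of the renderer string, going only by the example format imirkin quotes ("Gallium 0.4 on NV108"). Other Mesa versions may format the string differently, so the pattern and the helper name are assumptions for illustration.

```python
import re


def chipset_from_renderer(renderer):
    """Return the NVxxx chipset name from a renderer string, or None if absent."""
    match = re.search(r"Gallium\s+\S+\s+on\s+(NV[0-9A-Fa-f]+)", renderer)
    return match.group(1) if match else None


print(chipset_from_renderer("OpenGL renderer string: Gallium 0.4 on NV108"))
```

A result of None suggests the application is not being rendered by nouveau, which is the distinction being made above: "Gallium" alone does not identify the driver.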
06:36 RSpliet: vita_cell: is that the ~skeggsb/nouveau repository? if so, update your remote to https://github.com/skeggsb/nouveau and pull the latest update
06:36 vita_cell: I compiled this https://github.com/skeggsb/nouveau with 4.4, but it doesnt work, I think that it is incomplete folder
06:37 RSpliet: vita_cell: what does git log --pretty=oneline -1 say?
06:38 metalhead33: okay, screw that, I gonna just use layman to add the FUN overlay
06:38 vita_cell: how?
06:38 sfix: right. that makes sense, I'll have another go with the patch above and with secure disabled and see how I get on. Thanks for the help.
06:38 RSpliet: oh, and having said that, yes, that directory is only supposed to contain a user-space module. You probably don't want to build that as an end-user
06:39 imirkin_: sfix: you'll really want to apply the patch in the bug i mentioned to your mesa build.
06:39 RSpliet: never paid much attention to that repo myself
06:39 imirkin_: sfix: otherwise there will be a lot of visual glitches
06:39 RSpliet: metalhead33: ebuilds are not a nouveau thing I think, consult your distro's IRC channel for help with that kind of stuff please
06:40 metalhead33: I think the problem might be with DRI3
06:40 vita_cell: "karolherbst" did a voltage fix, core and memory reclock too, and I don't want to lose it. The skeggsb/nouveau repository doesn't work: when I boot, I have only the basic graphics, only 77hz, and it doesn't detect my gpu
06:42 RSpliet: vita_cell: that fix has already been applied to the upstream 4.4 kernel; if you're running that you don't need to also build a separate module
06:43 vita_cell: it crashes at 0f pstate
06:43 RSpliet: then that fix didn't solve your particular issue, sorry to hear
06:43 RSpliet: does it work with the 0a pstate?
06:43 RSpliet: (if that exists for your board)
06:44 RSpliet: I expect that to be an issue with the voltage settings, karolherbst has had a look at that, but didn't find a proper fix yet
06:45 vita_cell: first, I installed 4.3, compiled my nouveau folder with 4.3 headers, put nouveau.ko in place, updated, rebooted, and all worked. Later I installed 4.4, which worked, but I wanted to recompile with 4.4 headers
06:46 vita_cell: but my nouveau source don't want to compile with 4.4 headers
06:46 RSpliet: okay, let me reiterate what I've said before
06:47 vita_cell: you saying that, if I install 4.4, I don't need to compile nothing?
06:47 RSpliet: 1) I don't see why you'd want to recompile an out of tree nouveau module against 4.4. Karolherbst's fixes have been brought upstream in kernel 4.4, so the nouveau module you find in kernel 4.4 should work fine
06:48 vita_cell: (I can compile my fixed nouveau source folder with 4.3 and all working, stock reclock for core and memory, 07, 0a, 0e, 0f, working fine)
06:49 vita_cell: I will try it right now
06:49 RSpliet: 2) different versions of the out-of-tree nouveau module (that thing you try to compile) "work" for different kernel versions. The interface between kernel and module changes all the time, so they must be developed in lock-step. What you experience is that you try to build an out-of-tree module for kernel 4.3 on a 4.4 kernel, and that is not expected to work. Hence, update your out-of-tree module if you want to build
06:49 RSpliet: against 4.4
06:51 vita_cell: I compiled "nouveau-master" from skeggsb/nouveau with "LINUXDIR=gitcloned http://cgit.freedesktop.org/~airlied/linux/log/?h=drm-next folder"
06:51 vita_cell: but doesnt work with my 4.4
06:52 RSpliet: I asked you to execute "that git command" in the out-of-tree repository directory to verify it's up to date
06:52 vita_cell: what command?
06:52 RSpliet: git log --pretty=oneline -1
06:53 metalhead33: Hey karolherbst, I did what was said in your overlay's readme, yet eix-sync did not show me the updated Intel.
06:54 RSpliet: but regardless of that, point 1 still holds, first just test your stock 4.4 kernel and see what happens. If it fails, look for an appropriate solution. This is most likely not rebuilding skeggsb's out-of-tree module, since it hasn't had any fixes for kepler reclocking since kernel 4.4 (and certainly not since the soon-to-be-4.5rc1 that you built)
06:54 vita_cell: I have 4.4 deblobed kernel, and with previous versions I had no problems
06:55 vita_cell: so, I don't use stock kernel
06:56 metalhead33: Okay, gonna try again.
07:01 vita_cell: Yes, with 4.4 it works, 0f pstate too. So, I don't need anymore my nouveau source folder?
07:01 vita_cell: all fixes are already in 4.4?
07:02 RSpliet: vita_cell: all fixes you seem to need are in 4.4. 4.5 will get PCI link speed changing to improve perf a little further, but nothing breaks if you don't have it, so I'd recommend you to just wait until it's out
07:03 RSpliet: and no, I can't think of a reason for you to keep your "nouveau source" folder any longer, but then again I have no idea what you want to do next in life ;-)
07:03 vita_cell: karol did a fix for me with pcie3. My computer supports it, but not my CPU, an i7 2600
07:04 vita_cell: the reason is, that I need it for 4.2 and 4.3
07:05 RSpliet: I try to not get stuck in the past longer than necessary, but for future kernels you don't need any special builds for changing performance states
07:05 vita_cell: the next thing to do is use nouveau, cuz I hate non-free software, I'm using a deblobbed kernel too
07:05 vita_cell: ok, thank you, you helped me
07:06 RSpliet: no worries, enjoy your blazing fast graphics
07:06 vita_cell: this is gtx770 4gb
07:06 orbea: even if you're willing to use non-free software nouveau is just more trustworthy... :)
07:06 vita_cell: nice machine to play Steam's non-free games
07:06 RSpliet: orbea: glad to hear you have this experience
07:07 karolherbst: metalhead33: you could just add it to your local overlay
07:07 vita_cell: Nvidia blobed driver are bad, and AMD gpus and drivers are very very useless
07:07 imirkin_: just don't try to run a fuzzer on it :)
07:07 metalhead33: Did that.
07:08 metalhead33: However, trying to use third-party ebuilds is just too big pain. No matter what I do, it always says that there is no ebuild.
07:08 metalhead33: So it's easier to just add the overlay to layman and emerge from that.
07:21 metalhead33: Maybe I should just give up...
07:28 metalhead33: Damnit, how do I unmask an ebuild?
07:30 prg: how is it masked? put an appropriate atom into /etc/portage/package.keywords, .unmask or both
07:35 metalhead33: I unmasked it now, but now there are other errors
07:35 metalhead33: http://pastebin.com/YEuFcKJT
07:36 prg: so put the patch where it's expected?
07:37 prg: you just put an ebuild into your personal overlay without all the required files?
07:41 metalhead33: Oh how I wish I could just clone one part of a git repository...
07:41 metalhead33: instead of all of it
07:43 karolherbst: metalhead33: you can also just use /etc/portage/repos.conf/
07:43 karolherbst: there should be a layman.conf file where you can simply add repositories
07:44 metalhead33: I tried that, and it was a disaster
07:46 metalhead33: nice, finally emerging it
07:47 metalhead33: Okay, so I am going to emerge the nvidia driver that supports dri3, and then mesa, and then what?
07:47 metalhead33: Just hope it will all work out?
07:48 imirkin_: intel
07:48 imirkin_: you need the intel ddx to support dri3
07:48 metalhead33: ah sorry, intel driver
07:48 metalhead33: I meant the intel driver
07:48 metalhead33: libtool: warning: remember to run 'libtool --finish /usr/lib64/xorg/modules/drivers'
07:48 metalhead33: Should I do this one too?
07:48 metalhead33: before re-emerging mesa?
07:49 imirkin_: order doesn't matter
07:50 metalhead33: Alright, going for Mesa now.
07:50 metalhead33: And then it won't cause errors again with games and such?
07:50 imirkin_: it's something to try.
07:51 imirkin_: as you've been unable to provide any useful information re your issues, i have no clue what's wrong
07:53 metalhead33: I did in fact link the xorg log and the dmesg, I think
07:53 metalhead33: (13.43.29) metalhead33: http://pastebin.com/jPwarbqF
07:53 metalhead33: (13.43.29) metalhead33: http://pastebin.com/AJMS7qqv
07:53 metalhead33: (13.46.35) metalhead33: Oh right, let me translate the parts of the errors that are not in English... "Erőforrás átmenetileg nem érhető el" means "Resource temporarily unavailable/unreachable", "Félbeszakított rendszerhívás" means "Aborted system call", and "Nincs ilyen fájl vagy könyvtár" means "No such file or folder"
07:54 metalhead33: oh wait, it was xfwm-errors
07:55 metalhead33: I think I'll go have dinner until MESA compiles again...
07:58 imirkin_: metalhead33: i was careful to qualify it with "useful" :p none of that is useful... you need to figure out what's dying and why
07:58 imirkin_: none of those logs assist with that
08:31 metalhead33: Back
08:35 metalhead33: So, what will qualify as useful information?
08:35 metalhead33: Like, which logs should I seek out?
08:36 imirkin_: i dunno - you need to figure out what's dying and why
08:39 metalhead33: DRI_PRIME=1 glxgears - XIO: fatal IO error 11 (Resource temporarily unavailable) on X server ":0.0"
08:39 imirkin_: ok, so the X server exits
08:39 imirkin_: (probably)
08:39 imirkin_: why does the X server exit?
08:50 metalhead33: I give up... I will give Nouveau one last try (by using Bumblebee), and if that doesn't work, I will concede that it just does not support my system.
08:51 imirkin_: actually it's highly unlikely that the issue has anything to do with nouveau
08:52 metalhead33: According to karolherbst, the issue is with my Intel driver not supporting DRI3
08:52 imirkin_: that's not a problem in and of itself
08:52 imirkin_: the issue is that for some reason X dies when you try to do offloading
08:52 imirkin_: but that death isn't captured in the xorg log
08:53 imirkin_: or perhaps a different part of your stack dies, causing X to exit
08:53 metalhead33: actually, it only dies with full-screen applications
08:53 metalhead33: It does not die with glxgears, only displays an all-black window.
08:53 imirkin_: ah, so regular glxgears works?
08:53 metalhead33: Almost.
08:53 imirkin_: so then it's most likely your compositor
08:53 metalhead33: As I said, it displays an all-black window, refusing to render anything
08:53 metalhead33: BUT
08:53 metalhead33: but
08:53 imirkin_: full-screen windows tend to bypass composition
08:53 metalhead33: it tells me how high the FPS is - 5000~
08:58 metalhead33: Let's see bumblebee... modprobe: FATAL: Module bbswitch not found. Oh.
08:59 scaroo_: what does "DRI_PRIME=1 glxinfo | grep vendor" say?
09:00 metalhead33: server glx vendor string: SGI
09:00 metalhead33: client glx vendor string: Mesa Project and SGI
09:00 metalhead33: OpenGL vendor string: nouveau
09:01 scaroo_: metalhead33: oh that's good :) the black window might indeed be related to compositing. did you enable it in your wm and/or use a standalone compositor (compton or xcompmgr)?
09:02 metalhead33: Nope, I wasn't even aware of those things.
09:02 imirkin_: for DRI2 you *have* to use a compositor for prime offloading to work
09:02 metalhead33: Ooooh...
09:02 imirkin_: with DRI3 that should no longer be required
09:02 scaroo_: metalhead33: if I remember well, you use xfce, right? If so, did you enable compositing in your wm?
09:02 metalhead33: Yep, XFCE
09:02 metalhead33: But I did not enable anything
09:02 scaroo_: oops, i meant https://wiki.archlinux.org/index.php/Xfwm#Composite_manager
09:02 metalhead33: and... I don't even have xcompmgr emerged. Yet.
09:03 metalhead33: Now I gonna get it
09:03 scaroo_: no no, no need, xfwm does compositing, if enabled
09:03 imirkin_: just... don't run xcompmgr and then ^Z it... i made that mistake once.
09:03 imirkin_: meant to do ^Z and then bg
09:03 imirkin_: but... heh
09:04 metalhead33: Got it. Compositing enabled.
09:04 metalhead33: And now glxgears runs fine too
09:04 scaroo_: Victory !
09:04 metalhead33: Thanks!
09:04 imirkin_: this was covered on that optimus page =/
09:05 metalhead33: Now, I'm gonna try a game. If it fails, I will get logged out, because... you know. If it doesn't, I'll still be around, despite not being able to write back until I manually quit the game.
09:05 scaroo_: imirkin_: RTFM isnt often followed, sadly ;)
09:05 imirkin_: "Everything seems to work but the output is black"
09:05 imirkin_: oh well.
09:06 imirkin_: i guess his X server still died
09:06 scaroo_: guess it crashed
09:07 scaroo_: what is he running that makes his server crash ?
09:07 imirkin_: some game
09:08 imirkin_: no clue how to debug it remotely though
09:08 scaroo_: i guess something would show up in dmesg if it is related to nouveau, right ?
09:08 imirkin_: i'm quite sure it has nothing to do with nouveau
09:10 karlmag: multiple ssh-sessions from a remote machine with tail -f on logs + kernel debugger (etc)?
09:10 karlmag: might be a tall order even then :-P
09:10 scaroo_: well, if the game runs with its main gpu but crashes while priming...
09:10 metalhead33: back
09:10 metalhead33: Well
09:10 metalhead33: I have some screenshots to share
09:12 scaroo_: metalhead33: what is the game you are running? Did your xserver just crash?
09:13 metalhead33: Yep, it crashed. I was running Medieval 2 Total War.
09:14 scaroo_: native or wine (not that it matters I guess)
09:14 metalhead33: Wine.
09:14 metalhead33: http://pastebin.com/mKApjSZZ
09:14 metalhead33: http://imgur.com/a/bfMMb
09:15 metalhead33: [  1572.494] (EE) Caught signal 11 (Segmentation fault). Server aborting - A ha!
09:15 karolherbst: exa?
09:15 karolherbst: please use sna
09:16 karolherbst: mhhh
09:16 karolherbst: ohhh
09:16 karolherbst: this comes from the nouveau ddx
09:16 imirkin_: ok interesting
09:16 imirkin_: this seems somehow familiar
09:17 imirkin_: an easy thing to try would be to upgrade to xf86-video-nouveau 1.0.12
09:17 imirkin_: although i honestly don't know if this would be fixed there or not
09:17 metalhead33: this full-screen commandline thing was part of X, because I could still use CTRL+Alt+F3 to get to the (other) command line and reboot
09:17 metalhead33: Oke, gotta update nouveau
09:18 imirkin_: i believe this is the same bug as https://bugs.freedesktop.org/show_bug.cgi?id=91756
09:19 metalhead33: And I will have to re-emerge MESA again :v
09:19 imirkin_: nope
09:19 scaroo_: but how come the nouveau ddx is involved, as it is the intel driver that is driving the output, right? so if i get it right, the nvidia hw is only accessed through drm/render nodes or...?
09:19 imirkin_: that's totally separate
09:19 imirkin_: scaroo_: DRI2
09:19 imirkin_: with DRI3 it's different
09:20 imirkin_: metalhead33: i highly recommend trying to build the intel ddx with DRI3 support
09:20 scaroo_: ah alright
09:20 metalhead33: I did that
09:20 imirkin_: oh, you did!
09:20 imirkin_: now you need a small xorg.conf:
09:20 metalhead33: Don't you remember karolherbst?
09:20 metalhead33: He gave me his Intel driver and I emerged it
09:20 imirkin_: er wait, maybe you don't
09:21 scaroo_: Section "Device" Option "DRI3" EndSection or smth
09:21 metalhead33: so theoretically I should have DRI3 support... theoretically. None of the logs say that tho. All logs say DRI2.
09:21 imirkin_: scaroo_: actually "DRI" "3" :)
09:21 metalhead33: so, gotta get rid of xorg.conf and xorg.conf.d again
09:22 imirkin_: metalhead33: http://hastebin.com/woxibuwapo.cmake
09:22 imirkin_: stick that and only that into an xorg.conf file
09:22 scaroo_: metalhead33: also i would get rid of the nouveau ddx to make sure access to your gpu is not mediated
09:22 imirkin_: scaroo_: that's the AutoAddGPU bit
09:23 karolherbst: imirkin_: $r63 was always 0?
09:23 metalhead33: done
09:23 scaroo_: thats new to me :) well, I am under wayland anyway :)
09:23 metalhead33: I have that small xorg.conf inside /etc/X11
09:23 imirkin_: karolherbst: on fermi and kepler1, yes
09:23 metalhead33: now I just reboot and it should work fine just fine...
09:23 imirkin_: karolherbst: on kepler2 there are 255 regs :)
09:23 karolherbst: imirkin_: found stuff like this: "69: mad ftz f32 $r16 $r11 $r8 neg $r63"
09:24 imirkin_: karolherbst: yeah... that can happen, we don't run optimizations to a fixed point
09:24 imirkin_: karolherbst: it's rare though
09:24 karolherbst: is mul somewhat faster than mad?
09:25 imirkin_: karolherbst: probably?
09:26 imirkin_: karolherbst: but it's _really_ rare that stuff like that happens
09:26 imirkin_: basically the 0 has to appear *after* the constant folding pass
09:26 imirkin_: which is... unlikely
09:26 karolherbst: well I found two of that in pixmark_piano
09:26 imirkin_: the only real way is through MemoryOpt, where you have some idiotic array which has indirect accesses
09:27 metalhead33: I reboot my computer, and /etc/X11/xorg.conf is gone
09:27 scaroo_: metalhead33: in the future, to save you some time you may also only restart the xserver instead of a full reboot. If you are under a systemd system with gdm, try # systemctl restart gdm.service. Else # init 3 then # init 5
09:27 pmoreau:realised he had "Modern compiler implementation in C" from Appel, and there is a section about going out of SSA… --"
09:27 imirkin_: i wonder if MemoryOpt should run before all the folding stuff
09:27 imirkin_: er hrm, probably not
09:27 karolherbst: imirkin_: will check where that comes from
09:28 metalhead33: So... I assume my settings did not carry over.
09:29 imirkin_: i dunno what's going on with your system, but i can't help you with that
09:29 metalhead33: [ 27.667] (**) intel(0): Option "DRI" "3"
09:29 imirkin_: ok cool
09:30 metalhead33: nope not cool, not yet
09:30 imirkin_: now do DRI_PRIME=1 glxinfo
09:30 metalhead33: Let me upload the log to show you
09:31 scaroo_: metalhead33: so how does Xorg.0.log look now? Any reference to DRI3?
09:31 metalhead33: That's what I'm about to show you guys
09:32 metalhead33: http://pastebin.com/Ns451Pgq
09:32 metalhead33: Only one reference
09:32 metalhead33: otherwise, its all DRI2
09:32 metalhead33: also, I still have to manually set the offloading
09:32 metalhead33: And now it can't find provider with name nouveau
09:33 scaroo_: metalhead33: also install pastebinit or nopaste, so you can pastebin a file from the command line
09:33 imirkin_: metalhead33: no you no longer have to do that
09:33 imirkin_: metalhead33: it should Just Work (tm)
09:33 scaroo_: metalhead33: it is normal, with dri3, the second gpu doesnt show up as an xrandr provider
09:33 metalhead33: aah
09:34 karolherbst: imirkin_: it is still there without MemoryOpt
09:34 metalhead33: well, let me test it out (after emerging nopaste - I hit the limit with pastebinit)
09:34 imirkin_: karolherbst: that's... surprising.
09:34 imirkin_: karolherbst: oh. i thought i fixed that...
09:34 imirkin_: karolherbst: is this with git master? perhaps i never pushed the fix
09:35 karolherbst: yeah
09:35 imirkin_: hmmmmm
09:36 imirkin_: i thought i fixed it with d50e6128
09:36 metalhead33: oh yeah, DRI_PRIME=1 glxgears still works
09:37 scaroo_: so now that the nouveau ddx (xorg driver) is out of the way, you may try to launch the game again
09:38 imirkin_: metalhead33: does it say "nouveau"?
09:38 imirkin_: (for glxinfo)
09:38 metalhead33: yep, game works just fine
09:39 metalhead33: nope
09:39 metalhead33: it says Intel
09:39 scaroo_: metalhead33: with DRI_PRIME=1 ?
09:39 metalhead33: YEP
09:39 metalhead33: even with DRI_PRIME=1, it says intel
09:40 imirkin_: metalhead33: you're sure you built mesa with +dri3?
09:40 karolherbst: imirkin_: Loadpropagation
09:40 metalhead33: I emerged karolherbst's Intel drive and then rebuilt mesa.
09:40 imirkin_: karolherbst: can i see the tgsi for the shader?
09:40 metalhead33: Should I have done some more tweaking?
09:41 karolherbst: imirkin_: d50e6128 was for another issue :D
09:41 imirkin_: karolherbst: loadpropagation doesn't actually generate the 0
09:41 karolherbst: imirkin_: remember, that pixmark_piano shader produces like 4k instructions, so I think there will be a lot of tiny things in there
09:41 imirkin_: if it moves a 0 in, that means that ConstantPropagation somehow missed it
09:41 karolherbst: imirkin_: okay
09:42 karolherbst: imirkin_: you mean ConstantFolding?
09:42 imirkin_: karolherbst: oh haha
09:42 imirkin_: karolherbst: we never consider that case
09:42 metalhead33: How do I see if my MESA has DRI3 support?
09:42 imirkin_: karolherbst: i.e. that the 3rd arg is an immediate
09:42 imirkin_: metalhead33: equery u mesa
09:42 karolherbst: imirkin_: constantfolding cuts this shader from 5000 to 4000 instructions :/
09:42 zeq: metalhead33: I maintain an overlay of live ebuilds which I know work fine with DRI3+PRIME+nouveau at least with the modesetting driver. The kernel which introduced fencing broke PRIME for me with the intel ddx driver.
09:43 metalhead33: it does have dri3 in it
09:43 imirkin_: hrm. and you're quite sure you built xf86-video-intel with --enable-dri3?
09:43 metalhead33: oooh...
09:43 metalhead33: Nope.
09:43 karolherbst: imirkin_: " 81: mov f32 $r17 0.000000" " 84: mad ftz f32 $r17 $r12 $r10 neg $r17"
09:43 metalhead33: I just emerged it.
09:44 zeq: as fine as GF108GLM works with default clocks that is! ;-)
09:44 imirkin_: karolherbst: yeah, we never look at the third arg
09:44 imirkin_: karolherbst: i suppose i should add something for that, but opnd() is really designed for 2 args... lots of assumptions
09:44 metalhead33: well then... emerge "=xf86-video-intel-2.99.917-r3" --enable-dri3, right?
09:45 karolherbst: mhh
09:45 imirkin_: no
09:45 imirkin_: the ebuild itself needs to enable dri3
09:45 imirkin_: heh
09:45 metalhead33: but doesn't it do that already?
09:45 imirkin_: no.
09:45 imirkin_: it, in fact, disables dri3
09:45 karolherbst: imirkin_: well reducing a mad to mul may make other optimizations possible, doesn't it? I will check if that shader benefits from this
09:46 metalhead33: karolherbst's ebuild enables it
09:46 xexaxo: metalhead33, imirkin: if one's using mesa 11.0 or later just $LIBGL_DEBUG=verbose glxinfo
09:46 imirkin_: karolherbst: i'm not saying it's a bad idea
09:46 zeq: imirkin_, metalhead33: might do better with modesetting driver
09:46 karolherbst: imirkin_: I know
09:46 xexaxo: props to mupuf for that one :)
09:46 imirkin_: karolherbst: i'm saying that we just don't do it, and it's a bit of a pain to integrate it into the code
09:46 imirkin_: karolherbst: i think a new opnd3 function would make sense
09:47 imirkin_: since it really is just going to be for MAD... i can't think of any other ops
09:48 karolherbst: maybe I will manage to write that
09:48 scaroo_: metalhead33, xexaxo is right, try 'LIBGL_DEBUG=verbose glxinfo | grep DRI'
09:48 karolherbst: imirkin_: so ConstantFolding::opnd3 should be added
09:49 karolherbst: and called for mad instead of opnd?
09:49 scaroo_: metalhead33: you should read 'libGL: Using DRI3 for screen 0'
09:49 metalhead33: libGL: Using DRI2 for screen 0
09:49 metalhead33: libGL: Can't open configuration file /root/.drirc: No such file or directory.
09:49 scaroo_: metalhead33: you should run those as user, not rooy
09:50 scaroo_: root
09:50 metalhead33: libGL: Using DRI2 for screen 0
09:50 metalhead33: libGL: Can't open configuration file /home/metalhead33/.drirc: No such file or directory.
09:50 scaroo_: (but you ll also have those file missing warning, disregard them)
09:50 karolherbst: imirkin_: shouldn't fma also benefit from that? or could that lead to a different result in the fma case?
09:51 imirkin_: karolherbst: yeah. fma + mad :)
09:51 karolherbst: k
09:51 scaroo_: metalhead33: anyway you are indeed in dri2 mode, so yeah rebuild mesa with the right flags....
09:51 imirkin_: the only other 3-op thing i can think of is bfi, but that wouldn't benefit from knowing the 3rd operand was constant
09:52 scaroo_: metalhead33: (or get a nice rawhide, where everything is up to date, well flagged and already compiled for you :P)
09:52 karolherbst: imirkin_: so opnd checks either the 1st or 2nd arg, where opnd3 will check the 3rd one, right?
09:52 imirkin_: scaroo_: i suspect the DDX doesn't have the support.
09:52 imirkin_: karolherbst: right.
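The opnd/opnd3 split discussed here can be sketched roughly as follows. This is a toy model under stated assumptions, not Mesa's nv50/ir code: `Op`, `Operand`, `Instr`, `regSrc`, `immSrc` and `foldThirdOperand` are all hypothetical names made up for illustration. It only shows the idea of inspecting the third source of a mad/fma and rewriting to a mul when that source is a zero immediate, and it ignores signed-zero and fma rounding subtleties.

```cpp
#include <cassert>
#include <vector>

// Toy stand-ins for the IR. These are NOT Mesa's real classes.
enum class Op { Mad, Fma, Mul };

struct Operand {
    bool isImm;   // true if this source is an immediate
    int reg;      // register number, unused for immediates
    float imm;    // immediate value, unused for registers
    bool neg;     // "neg" source modifier
};

inline Operand regSrc(int r) { return Operand{false, r, 0.0f, false}; }
inline Operand immSrc(float v, bool neg = false) { return Operand{true, -1, v, neg}; }

struct Instr {
    Op op;
    int dst;
    std::vector<Operand> srcs;
};

// Hypothetical opnd3-style fold that looks only at the third source:
//   mad/fma d, a, b, (+/-)0  ->  mul d, a, b
// Returns true if the instruction was rewritten.
bool foldThirdOperand(Instr &i)
{
    if (i.op != Op::Mad && i.op != Op::Fma)
        return false;
    assert(i.srcs.size() == 3);

    const Operand c = i.srcs[2];
    if (!c.isImm || c.imm != 0.0f)   // the neg modifier does not matter for zero
        return false;

    i.op = Op::Mul;
    i.srcs.pop_back();               // drop the now-unused third source
    return true;
}
```

Fed the pattern quoted earlier (a mad whose third source is a negated zero), the sketch rewrites the op to a two-source mul; a mad with three register sources is left untouched.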
09:53 metalhead33: okay, gonna emerge mesa again
09:53 metalhead33: even though equery u mesa already says it has dri3
09:53 zeq: karolherbst, RSpliet: imirkin mentioned earlier that I should speak to you guys about anything I can do to help get reclocking going on fermi. Is there anything that might help? I do have a little experience with Linux driver coding (mostly fixing bugs and forward/backporting). I can certainly test patches etc. I also have an nv35 for a little while, but as I mentioned earlier, it's going to a friend, but I should be able to try out patches on it
09:53 zeq: occasionally.
09:54 karolherbst: zeq: you could try out if this patch series works for you: http://lists.freedesktop.org/archives/nouveau/2016-January/023773.html
09:54 scaroo_: imirkin_: but but but the ddx aint involved AFAIK? metalhead33: before remerging, try to force dri3 init in xorg as previously instructed
09:54 karolherbst: and if it does, we really only have to concentrate on memory reclocking for now
09:54 scaroo_: imirkin_: here i dont even have an xserver (wayland gnome-shell) and dri3 is up and running
09:55 metalhead33: how do I force it to do that? I get it, I am supposed to edit .xinitrc
09:55 metalhead33: but other than that?
09:56 scaroo_: metalhead33: nop, xorg.conf with what imirkin_ linked you to)
09:56 metalhead33: I already did that and when I logged in, the file got deleted
09:56 scaroo_: metalhead33: Section "Device" Option "DRI" "3" EndSection or something
09:56 metalhead33: http://hastebin.com/woxibuwapo.cmake this one, yes
09:57 scaroo_: metalhead33: try putting that in /etc/X11/xorg.conf.d/00-dri3.conf instead
09:58 scaroo_: metalhead33: xorg should NOT delete your conf file, Ever.
09:58 metalhead33: okay, got it...
09:59 imirkin_: karolherbst: btw, i've pushed a handful of minor opts in recent times. i guess they didn't move the needle on any of these benchmarks?
09:59 karolherbst: mhh let me check
09:59 imirkin_: it was mostly stuff to do with indirect register access, so probably not
09:59 imirkin_: coz no serious app is dumb enough to do that
10:01 metalhead33: how do I force the X server to restart again?
10:01 karolherbst: imirkin_: I think pixmark_piano is mainly arithmetic stuff lots of crazy things like cos(sqrt(sin(wtvr(bla))))
10:01 scaroo_: # systemctl restart gdm.service (if systemd and gdm of course)
10:02 scaroo_: metalhead33: else #init 3 then #init 5
10:02 scaroo_: or Ctrl-Alt-BackSpace if not inhibited in xorg.conf
10:02 metalhead33: Failed to restart gdm.service: Unit gdm.service failed to load: No such file or directory.
10:03 zeq: karolherbst: I'll give it a try. That's everything *but* memory reclocking?
10:04 karolherbst: yeah
10:04 zeq: karolherbst: It should help, but I think it's memory that's the biggest killer, at least on my laptop
10:05 metalhead33: And it still uses DRI2 mode...
10:05 metalhead33: Why?
10:05 karolherbst: zeq: yeah it helps like around 25%
10:06 karolherbst: zeq: depends on your clocks though
10:06 zeq: karolherbst: is there a public git repo I can pull from?
10:06 karolherbst: zeq: https://github.com/karolherbst/nouveau
10:06 zeq: yeah, I'm lazy
10:06 karolherbst: fermi branch
10:06 zeq: ok
10:06 karolherbst: but there is a bunch of other fermi stuff in there
10:06 karolherbst: shouldn't affect you though
10:07 karolherbst: zeq: my branches are usually 4.4 compatible
10:07 metalhead33: It did not delete /etc/X11/xorg.conf this time, but it still uses dri2
10:07 zeq: I rebase quite often with Linus, I won't have a problem merging if necessary
10:08 karolherbst: imirkin_: I think this shader lost one instruction after rebase on master
10:08 imirkin_: karolherbst: heh
10:08 karolherbst: or not...
10:08 metalhead33: Why didn't this new setting force dri3?
10:08 karolherbst: well hard to remember if it was like 3901 or 3902 instructions :D
10:08 karolherbst: metalhead33: because some say dri3 is unstable
10:09 karolherbst: metalhead33: ohh wait
10:09 metalhead33: In other words, all hope is lost
10:09 scaroo_: metalhead33: somehow your mesa does not support dri3, re-emerge making sure it compiles that in
10:09 zeq: karolherbst: from testing in windows7, the memory is stable right up to the maximum. I don't know what chips Dell used but they're more than quick enough
10:09 karolherbst: metalhead33: that's what I use as /etc/X11/xorg.conf: https://gist.github.com/karolherbst/7dbca92163a7239f86ff
10:09 metalhead33: http://hastebin.com/babicekamu.coffee
10:10 zeq: default is something like 50MHz though as I mentioned earlier
10:10 karolherbst: zeq: that's really low
10:10 metalhead33: hmmm
10:10 karolherbst: zeq: maybe the improvement will be much more
10:11 zeq: maybe it was higher than that, but it's what I remember. It's definitely really slow!
10:11 metalhead33: well, I modified my xorg files accordingly... let's test it out that way then
10:12 metalhead33: also, while I'm briefly gone for 1-2 minutes, please see if this has anything that indicates the problems: http://hastebin.com/babicekamu.coffee
10:15 metalhead33: This is driving me nuts... Still DRI2.
10:17 karolherbst: metalhead33: maybe your xorg-server doesn't support dri3 :/
10:17 metalhead33: Then I have no choice... I must give up on Nouveau.
10:17 metalhead33: Unless re-emerging Xorg solves it
10:17 karolherbst: yeah, try that :D
10:17 zeq: metalhead33: make sure xorg-server is emerged with glamor and try the modesetting driver
10:18 metalhead33: okay, modesetting driver first then
10:18 karolherbst: zeq: why glamor and why modesetting?
10:18 karolherbst: this is intel
10:19 scaroo_: BTW i was pleasantly surprised that modesetting + glamor performs as well as, if not better than, intel sna for RENDER workloads
10:19 zeq: I had trouble with interop between recent kernels intel ddx and nouveau PRIME
10:19 zeq: modesetting just works
10:19 metalhead33: x11-base/xorg-drivers or x11-base/xorg-server?
10:19 zeq: glamor is fast enough on my ivb
10:20 scaroo_: yep, hw specific ddx should die :)
10:21 metalhead33: [blocks B ] x11-drivers/xf86-video-modesetting ("x11-drivers/xf86-video-modesetting" is blocking x11-base/xorg-server-1.17.4)
10:22 imirkin_: remove it
10:22 imirkin_: xf86-video-modesetting was moved into xserver
10:22 metalhead33: oh...
10:22 metalhead33: But zeq keeps telling me to emerge x11-drivers/xf86-video-modesetting-0.9.0
10:22 metalhead33: and glamor
10:23 metalhead33: karolherbst says I should re-emerge the xorg-server (or xorg-drivers?) before giving up on nouveau
10:23 imirkin_: you don't want that... xf86-video-intel is good.
10:23 metalhead33: What should I do?
10:23 zeq: metalhead33: are you emerging the latest xorg-server?
10:23 metalhead33: I will be
10:23 metalhead33: also, re-emerging
10:23 metalhead33: since I already have the latest one
10:24 zeq: metalhead33: I only use live ebuilds, so I'm not sure what the current portage tree version supports
10:24 metalhead33: 1.17.4
10:24 zeq: recent upstream has merged modesetting
10:24 karolherbst: it works here for me
10:24 karolherbst: intel ddx, with sna + xorg-server 1.17.4
10:24 karolherbst: zeq: we don't need modesetting here
10:24 karolherbst: zeq: seriously
10:26 metalhead33: well, I am re-emerging the xorg-server
10:27 metalhead33: but I doubt it will work that well
10:27 metalhead33: maybe if I re-emerged the xorg-drivers...
10:27 zeq: karolherbst: okay
10:28 metalhead33: okay, re-emerged xorg-server
10:28 metalhead33: now gotta restart the X-server
10:28 karolherbst: I think his xorg-server just doesn't have the dri3 support
10:28 karolherbst: for whatever reasons
10:28 karolherbst: there should be a line like "Loading sub module "dri3"" even when the intel ddx doesn't support it
10:29 metalhead33: And it's STILL ****in' DRI2
10:29 scaroo_: karolherbst: well intel ddx has not much edge, if any, over modesetting+glamor. But well, this is another debate.
10:29 karolherbst: metalhead33: do you have minimal set?
10:29 metalhead33: Yes.
10:29 karolherbst: ...
10:30 karolherbst: don't set that on the xorg-server
10:30 metalhead33: wait
10:30 metalhead33: What do you mean by minimal set?
10:30 metalhead33: Like... that tiny xorg.conf?
10:30 karolherbst: xorg-server[minimal]
10:30 metalhead33: oh, nope, not that
10:30 metalhead33: I don't know about that
10:30 karolherbst: k
10:31 metalhead33: as I said, Xorg.0.log says DRI2
10:31 karolherbst: metalhead33: could you do this: ebuild /usr/portage/x11-base/xorg-server/xorg-server-1.17.4.ebuild configure
10:31 karolherbst: and give us the output
10:31 scaroo_: metalhead33: can you try replacing Option "DRI" 3 by Option "DRI3" "On"
10:31 zeq: is 1.17.4 new enough?
10:31 karolherbst: yes
10:32 zeq: just checking :)
10:32 metalhead33: http://hastebin.com/ikudedodeq.vhdl
10:32 karolherbst: imirkin_: how can I check if the ImmediateValue is 0?
10:33 metalhead33: ah
10:33 imirkin_: imm.isInteger(0)
10:33 metalhead33: trying it in root
10:33 metalhead33: one moment
10:33 karolherbst: imirkin_: thanks
10:33 metalhead33: http://hastebin.com/disuwirawu.js
10:34 karolherbst: imirkin_: yay, two in the entire shader
10:34 karolherbst: mhh "configure: DRI3 enabled"
10:35 imirkin_: karolherbst: yeah, it's not phenomenally frequent occurrence
10:35 karolherbst: imirkin_: fma is also converted to mul?
10:35 imirkin_: yes
10:35 imirkin_: same exact thing as mad
10:35 imirkin_: some day we should probably start caring
10:35 imirkin_: but not today
10:36 karolherbst: imirkin_: so I do op = OP_MUL; setSrc(2, NULL); ?
10:36 karolherbst: or did I forget anything
10:36 metalhead33: okay, I followed Zeq's instructions by replacing
10:36 imirkin_: karolherbst: no, you're good...
10:37 metalhead33: anyway, I did what scaroo suggested, replacing Option "DRI" 3 by Option "DRI3" "On"
10:37 metalhead33: let's see if it works or not
10:37 imirkin_: nope, it won't
10:37 metalhead33: ...
10:37 metalhead33: Then my only option really is to give up on Nouveau.
10:37 imirkin_: intel uses DRI, not DRI3. i think DRI3 has been deprecated as an option.
10:37 karolherbst: imirkin_: yeah, it worked :)
10:38 metalhead33: well clearly, using Option "DRI" "3" has failed
10:38 imirkin_: metalhead33: might be best. sounds like you're using a system you're not fully capable of operating =/
10:39 metalhead33: Well then... I'll have to disable Nouveau in the kernel.
10:39 karolherbst: imirkin_: commit message should start with nv50/ir?
10:39 imirkin_: karolherbst: ya
10:42 karolherbst: imirkin_: https://github.com/karolherbst/mesa/commit/29fd482f328f825f550d91bfae4bade89cb6857d
10:42 karolherbst: imirkin_: I doubt there are any shader-db changes, so I won't check that
10:43 imirkin_: fine by me
10:43 imirkin_: that commit kinda sucks though
10:43 imirkin_: you want a separate if, i think
10:44 imirkin_: e.g. just coz arg0 or arg1 is an imm doesn't mean you don't want to do the arg2 thing
10:44 metalhead33: Is unmerging the Nouveau driver and getting it out of the Kernel enough to get the job done?
10:44 karolherbst: mhh yeah right
10:44 karolherbst: makes sense
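A minimal sketch of why the separate `if` matters, assuming a hypothetical dispatcher (`hooksRun` and the hook names are invented for illustration and are not the Mesa code): with a chained `else if`, an immediate in arg0 or arg1 prevents the arg2 check from ever running, so a mad with both a constant multiplicand and a zero addend would miss the mul rewrite.

```cpp
#include <string>
#include <vector>

struct Src {
    bool isImm;
    float value;
};

// Lists which (hypothetical) folding hooks would run for a 3-source op.
// chained == true  models "if (src0) ... else if (src1) ... else if (src2)".
// chained == false models a separate "if" for src2.
std::vector<std::string> hooksRun(const Src (&s)[3], bool chained)
{
    std::vector<std::string> ran;
    bool handled = false;

    if (s[0].isImm) {
        ran.push_back("opnd(src0)");
        handled = true;
    } else if (s[1].isImm) {
        ran.push_back("opnd(src1)");
        handled = true;
    }

    // In the chained form, an earlier immediate hides the third operand.
    if (s[2].isImm && !(chained && handled))
        ran.push_back("opnd3(src2)");

    return ran;
}
```

For sources like `{reg, imm 6.2831, imm 0}`, the chained form runs only the src1 hook, while the separate check also reaches src2.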
10:54 karolherbst: imirkin_: you were right, now there is a third hit
10:54 imirkin_: lol
10:54 imirkin_: out of... 3000 instructions?
10:54 karolherbst: 3900
10:55 imirkin_: my bad :)
10:55 karolherbst: :D
10:55 karolherbst: but now I messed up
10:55 karolherbst: "mad ftz f32 $r30 $r28 $r29 $r63 " => "mul ftz f32 $r30 $r28 6.283100"
10:55 imirkin_: that makes sense
10:55 karolherbst: ohh wait
10:55 karolherbst: right
10:55 imirkin_: :)
10:55 karolherbst: one reg less used
10:55 karolherbst: yay
10:55 imirkin_: there is presently no LIMM mad supported
10:56 imirkin_: coz on some gpu revisions it requires src2 == dst
10:56 imirkin_: but on gk110 it's fully supported
10:57 imirkin_: so i have a commit to add emission for it
10:57 imirkin_: and then i should update the target
10:57 imirkin_: but on gm107 it's unsupported again
10:57 imirkin_: so actually maybe gk110 isn't quite there either
10:57 karolherbst: https://github.com/karolherbst/mesa/commit/4bb6f4e4b6c5bb0bae60f9b0d256052c67e50492 :)
10:58 RSpliet: karolherbst: not using reg 63 doesn't make a difference though, it's hard-wired to 0 anyway and will always be there
10:59 karolherbst: RSpliet: $r29
10:59 RSpliet: you might win ever-so-slightly on compression though :-)
11:00 RSpliet: karolherbst: ehhhh...
11:01 imirkin_: RSpliet: you also gain opportunities to merge it with a *2 down the line
11:01 karolherbst: though in that shader it doesn't matter cause $r29 is used elsewhere
11:01 imirkin_: RSpliet: this isn't some huge super-impactful op, but it may help a handful of cases
11:01 karolherbst: RSpliet: I think this mad/fma => mul makes it possible to optimize even more later
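The mad => mul rewrite discussed above can be modelled outside the compiler. This is a minimal sketch in Python, not the actual nv50_ir code: the `Insn` tuple and `fold_mad_zero` are hypothetical names, registers are plain strings, and `$r63` stands for the hard-wired zero register mentioned in the log.

```python
from collections import namedtuple

# Hypothetical toy instruction: opcode, destination, list of sources.
Insn = namedtuple("Insn", ["op", "dst", "srcs"])

ZERO_REG = "$r63"  # hard-wired zero register, as mentioned in the log


def is_zero(src):
    """True if the source is the zero register or a literal 0."""
    return src == ZERO_REG or src == 0


def fold_mad_zero(insn):
    """Rewrite mad d, a, b, 0 into mul d, a, b; leave everything else alone."""
    if insn.op == "mad" and len(insn.srcs) == 3 and is_zero(insn.srcs[2]):
        return Insn("mul", insn.dst, insn.srcs[:2])
    return insn


before = Insn("mad", "$r30", ["$r28", "$r29", "$r63"])
after = fold_mad_zero(before)
print(after)
```

Once only two sources remain, a later pass is free to inline an immediate in place of a register source, which is the step the log shows with 6.283100.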
11:02 RSpliet: imirkin_, karolherbst: oh right, I'm with G80 in the back of my mind, where the imm is always SRC2
11:03 imirkin_: RSpliet: actually that's true with fermi too
11:03 imirkin_: depending how you count
11:03 RSpliet: SRC2 in envydocs terms
11:03 karolherbst: and I am sure this will also help pipelining or dual issuing somehow or something really unnoticeable :D
11:03 RSpliet: karolherbst: I actually expect it to use the same hardware :-P
11:03 karolherbst: :D
11:04 karolherbst: what is slct?
11:04 imirkin_: x = a ? b : c
11:05 imirkin_: although that might not be the literal instruction order
11:05 imirkin_: and it's like x = a < 0 ? b : c
11:05 imirkin_: or something
11:05 karolherbst: cuda says
11:05 karolherbst: d = (c >= 0) ? a : b;
11:05 karolherbst: Read more at: http://docs.nvidia.com/cuda/parallel-thread-execution/index.html#ixzz3y0AI8Tcd
11:05 karolherbst: Follow us: @GPUComputing on Twitter | NVIDIA on Facebook
11:05 karolherbst: ....
11:05 karolherbst: stupid js copy hacks
11:05 karolherbst: how I hate those
11:05 RSpliet: karolherbst: why'd you say $r29 though? if i->src(2) is an imm() of 0, it'd be emitted as $r63
11:05 imirkin_: there are other variants though -- you can flip the sign at least
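For reference, the select semantics quoted from the CUDA documentation above, `d = (c >= 0) ? a : b`, can be written as a tiny Python sketch. The function name and argument order simply follow the quoted line; the sign-flipped variants mentioned in the log are not modelled.

```python
def slct(a, b, c):
    """Return a when c >= 0, otherwise b (operand order as in the quoted line)."""
    return a if c >= 0 else b


print(slct("a", "b", 1.5))
print(slct("a", "b", -1.5))
```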
11:06 karolherbst: RSpliet: read again
11:06 karolherbst: RSpliet: "mad ftz f32 $r30 $r28 $r29 $r63 " => "mul ftz f32 $r30 $r28 6.283100"
11:06 imirkin_: RSpliet: it allows another arg to be inlined
11:07 RSpliet: ah, the example wasn't complete ;-)
11:07 karolherbst: well it is, because $r29 might be still in use and needed
11:08 karolherbst: in fact it is
11:08 karolherbst: but maybe there are some shaders out there which benefit from this a bit more
11:09 karolherbst: imirkin_: right, what is the benefit of doing this? mov u32 $r3 0x00000000
11:09 RSpliet: karolherbst: possibly :=_
11:09 RSpliet: :-)
11:09 imirkin_: karolherbst: none.
11:09 karolherbst: I saw something like that also in nvidia binaries :/
11:10 karolherbst: where registers are just set to 0
11:10 imirkin_: karolherbst: it can happen due to idiocy
11:10 imirkin_: karolherbst: it can also come up for e.g. tex instructions
11:10 imirkin_: if you need to stick a 0 in there
11:10 imirkin_: you can't just use $r63 since they're a group
11:10 karolherbst: it is in a vertex shader though
11:10 karolherbst: ohh no, fragment
11:11 karolherbst: imirkin_: I am not sure here though: https://gist.github.com/karolherbst/92a6e80b714ee8cf3ba4#file-gistfile1-txt-L26
11:11 karolherbst: there are three movs which depend on $r3 after a prebreak
11:16 imirkin_: karolherbst: yeah, that's one of the dumb places it can come from
11:16 imirkin_: karolherbst: it gets introduced in one of the out-of-ssa/ra passes
11:17 imirkin_: don't fight it, just learn to like it :)
11:17 karolherbst: imirkin_: well, I'm thinking that if that mov $r3 is dropped and those mov $r.. $r3 are changed to mov $r.. $r63, it could free the $r3 register, but I have no clue how all of that branching works
11:17 imirkin_: yeah, but they're all inserted way after optimizations
11:17 karolherbst: mhh okay
11:17 imirkin_: i'd let it be :)
11:18 imirkin_: i've looked into it before
11:18 karolherbst: there are so many max/min $r1 $r1 $r63 thingies in that shader :/
11:18 imirkin_: and i didn't see a clear path to fixing it
11:18 karolherbst: yeah, I first try to find trivial stuff
11:18 karolherbst: maybe non trivial stuff gets more trivial that way :D
11:20 imirkin_: karolherbst: unfortunately i've picked off a lot of the low-lying fruit
11:20 imirkin_: generally nowadays i don't see obvious fail in shaders
11:20 imirkin_: it's more of the non-obvious fail like a lot of pointless moves
11:20 imirkin_: suboptimal RA
11:20 imirkin_: etc
11:21 karolherbst: what does sub do? d = s1 - s2?
11:21 imirkin_: yes
11:21 karolherbst: sub ftz f32 $r24 $r63 $r27
11:21 imirkin_: that should be neg
11:21 karolherbst: right
11:21 karolherbst: as you see there is still some trivial stuff :D
11:22 imirkin_: yeah, but const folding happens too late for it to matter
11:22 imirkin_: we do modifier folding first
11:23 karolherbst: "sub ftz f32 $r24 $r63 $r27" => "neg ftz f32 $r24 $r27"
11:23 karolherbst: next instruction is join mad ftz f32 $r26 $r24 $r19 $r27
11:24 imirkin_: yeah, but the folding happens too late
11:24 imirkin_: the modifier folding happens *before* this pass
11:24 imirkin_: we don't run to a fixed point
11:24 karolherbst: yeah I know
11:24 karolherbst: so the neg won't be moved into the mad in the end
11:25 imirkin_: you could try using that knowledge and propagate it right there and then
11:25 imirkin_: basically iterate over all the uses of that def and just push it forward
11:25 imirkin_: make some of the ModifierFolding thing a utility function you can call
11:26 imirkin_: and run it whenever you generate a OP_NEG or OP_ABS or whatever
11:26 karolherbst: okay, so first step would be "sub ftz f32 $r24 $r63 $r27" => "neg ftz f32 $r24 $r27" in const folding
11:26 imirkin_: same thing would make sense if you had an op like add dst, src0, neg src1 -- and src0 == 0
11:26 imirkin_: that'd become OP_NEG dst src1
11:28 imirkin_: karolherbst: i.e. look at the OP_ADD case, it's the same thing
11:28 imirkin_: if it comes out as a OP_CVT, try folding
11:28 imirkin_: into the dests
11:28 imirkin_: or... try running ModifierFolding a second time? dunno.
11:29 karolherbst: right, there is no OP_SUB case
11:31 karolherbst: imirkin_: whats with that if (i->usesFlags()) break in OP_ADD?
11:31 imirkin_: heh
11:32 imirkin_: so like if one of the sources is a carry
11:32 imirkin_: you can't just get rid of it
11:32 karolherbst: ohhh k
11:32 imirkin_: (yeah, annoying, i know)
11:32 imirkin_: the ir isn't *super* great about listing out all the available options
11:33 karolherbst: mhhh
11:33 karolherbst: now I have to think
11:33 imirkin_: or validating the various cases for legality
11:34 karolherbst: what if $r63 has any modifier in that case?
11:34 imirkin_: no amount of modifiers will make 0 be not 0
11:34 karolherbst: mhh or is there any modifier which would matter?
11:34 karolherbst: ahh k
11:35 karolherbst: I have to handle sub $r1 $r63 $r2 and sub $r1 $r2 $r63, and s tells me where the immediate value is, okay
11:36 karolherbst: imirkin_: how do I do "i->op = i->src(0).mod.getOp();" in the OP_SUB case?
11:36 imirkin_: just like that :p
11:36 karolherbst: just do OP_NEG and leave the mod in src1?
11:37 karolherbst: src0
11:37 karolherbst: imirkin_: "i->op = i->src(0).mod.getOp();" is from OP_ADD ;)
11:37 imirkin_: you want to merge a NEG modifier in too though
11:37 imirkin_: and you want src1's modifier, not src0
11:37 karolherbst: imirkin_: src1 is moved to src0 if the immediate is at src0
11:38 imirkin_: modifier is separate
11:38 imirkin_: you want to do like
11:38 karolherbst: ohh k
11:38 imirkin_: Modifier mod = i->src(1).mod | NEG
11:39 karolherbst: imirkin_: why src(1)? I do the same src1->src0 copy as with OP_ADD: https://github.com/karolherbst/mesa/blob/4bb6f4e4b6c5bb0bae60f9b0d256052c67e50492/src/gallium/drivers/nouveau/codegen/nv50_ir_peephole.cpp#L999-L1012
11:40 imirkin_: ah, you copy the mod. ok.
11:40 imirkin_: but you forget to add the neg
11:40 imirkin_: oh. that's still OP_ADD. no neg required.
11:40 karolherbst: yes
11:40 imirkin_: but with OP_SUB you have to throw an extra OP_NEG in
11:40 karolherbst: I know that the SUB case is a bit more tricky
11:40 karolherbst: :d
11:40 imirkin_: er, NV50_IR_MODIFIER_NEG or whatever
11:41 imirkin_: the Modifier() class has all sorts of overloads too
11:41 imirkin_: which are neat
11:41 imirkin_: i wouldn't be surprised if there were an operator-() overload :)
11:41 karolherbst: imirkin_: why not leave the mods of src0 and just do a op = OP_NEG?
11:41 imirkin_: heh
11:41 imirkin_: you can't have an OP_NEG with mods
11:41 imirkin_: that will just fail
11:42 imirkin_: elsewhere
11:42 karolherbst: ohhh okay
11:42 imirkin_: in truth, there shouldn't be an OP_NEG at all
11:42 imirkin_: we should just have an OP_CVT and move on
11:42 imirkin_: but... here we are.
11:43 karolherbst: CVT means what?
11:43 imirkin_: "it was like that when i got here"
11:43 imirkin_: convert
11:43 karolherbst: ahh okay
11:43 imirkin_: but it can also apply various modifiers
11:43 imirkin_: as part of the conversion
11:43 imirkin_: like neg, abs, sat
11:43 karolherbst: Modifier(0).getOp is OP_CVT?
11:43 imirkin_: or OP_MOV
11:43 imirkin_: Modifier(0).getOp() should be OP_MOV
11:44 karolherbst: okay
11:45 karolherbst: then I do "i->op = i->src(0).mod.getOp(); if (i->op != OP_CVT) i->src(0).mod = Modifier(0); i->src(0).mod |= NV50_IR_MODIFIER_NEG;" ?
11:45 imirkin_: wrong order.
11:45 imirkin_: first add the neg in
11:45 imirkin_: then do the stuff
11:46 karolherbst: do I have to do Modifier(NV50_IR_MODIFIER_NEG) ?
11:47 karolherbst: ohh there is no operator|= overload :/ sad
11:50 karolherbst: ohhh wait
11:50 karolherbst: imirkin_: I can only do that when the immediate is at src0
11:50 karolherbst: because at src1 it is a different story...
11:51 imirkin_: right
11:51 imirkin_: you need to deal with those differently
11:52 imirkin_: or you can just add in the op_neg if s == 1
11:52 imirkin_: er
11:52 imirkin_: if s == 0
11:52 imirkin_: er. figure out which is which
11:52 imirkin_: heh
11:52 imirkin_: i can never remember
11:52 karolherbst: src0 == 0 => neg
11:52 karolherbst: otherwise $r1 - 0x0 ;)
11:53 imirkin_: exactly.
11:53 karolherbst: ahh this actually makes the code a bit simpler
11:53 imirkin_: :)
11:54 karolherbst: I can just move the "| Modifier(NV50_IR_MOD_NEG)" up into the if (s==0) case
11:54 karolherbst: i->src(0).mod = i->src(1).mod | Modifier(NV50_IR_MOD_NEG);
11:54 imirkin_: yup
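The OP_SUB folding worked out above can be sketched as a small model. This is Python, not the real nv50_ir code: the flag values, `get_op` and `fold_sub_zero` are hypothetical stand-ins, and `s` is the index of the zero operand as in the log. One deliberate difference: the sketch toggles NEG with XOR so that a pre-existing negation on src1 cancels, whereas the log uses `|`, which gives the same result whenever src1 carries no NEG yet.

```python
MOD_ABS = 1 << 0  # hypothetical flag values, for illustration only
MOD_NEG = 1 << 1


def get_op(mod):
    """Opcode a lone modifier maps to; anything combined falls back to cvt."""
    if mod == 0:
        return "mov"
    if mod == MOD_NEG:
        return "neg"
    if mod == MOD_ABS:
        return "abs"
    return "cvt"


def fold_sub_zero(src0, mod0, src1, mod1, s):
    """Fold sub with a zero operand at index s; returns (op, src, mod)."""
    if s == 0:
        # 0 - x == -x: keep src1 and its modifier, then flip the sign.
        src, mod = src1, mod1 ^ MOD_NEG
    else:
        # x - 0 == x: keep src0 and its modifier unchanged.
        src, mod = src0, mod0
    op = get_op(mod)
    if op != "cvt":
        # mov/neg/abs already express the modifier, so none is left on the source.
        mod = 0
    return op, src, mod


print(fold_sub_zero("$r63", 0, "$r27", 0, 0))
print(fold_sub_zero("$r2", 0, "$r63", 0, 1))
```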
11:54 imirkin_: some day i should prune this junk out of the ops list
11:55 imirkin_: but... meh
11:57 karolherbst: https://github.com/karolherbst/mesa/commit/e0c736f0ceacbed4db50bba81506e85238de45de
11:58 karolherbst: okay, now I have this nice "neg ftz f32 $r24 $r27 (8) join mad ftz f32 $r26 $r24 $r19 $r27 (8)"
11:59 karolherbst: imirkin_: running ModifierFolding after ConstantFolding: 3902 => 3878 instructions
12:00 karolherbst: yeah, this is somewhat noticeable
12:01 imirkin_: neat.
12:02 karolherbst: but I think there was another thing we could do
12:03 karolherbst: imirkin_: algebraic after the last dead code: 3902 => 3856 instructions
12:03 karolherbst: before dead code: 3902 => 3900
12:04 imirkin_: karolherbst: heh. stick a loop on it? :)
12:04 karolherbst: no, these are the only two things
12:04 karolherbst: I checked that already some months ago
12:05 karolherbst: with both: ModifierFolding after ConstantFolding and algebraic after dead code: 3902 => 3833 instructions
12:05 imirkin_: i could see doing algebraic/modifier/constant folding in a loop like 2x or so
12:05 imirkin_: they're relatively cheap
12:05 karolherbst: cuts frame time in pixmark_piano by about 1ms (from 62.5ms to 61.5ms)
12:05 imirkin_: wow
12:06 imirkin_: that's like 2%
12:06 imirkin_: i'll try it on shader-db and see what happens
12:06 imirkin_: not right now tho
12:08 karolherbst: but I guess it would be possible to make the other things a bit smarter depending on what does change
12:09 karolherbst: imirkin_: I ran the optimization now three times: 3833 instructions
12:09 imirkin_: karolherbst: i'm talking about something like this:
12:10 imirkin_: http://hastebin.com/iqimirijej.coffee
12:10 karolherbst: I was just checking if there is more potential
12:10 imirkin_: but a little smarter
12:10 karolherbst: nope
12:10 imirkin_: so that it checks if there's progress
12:10 karolherbst: AlgebraicOpt has to be after DeadCodeElim
12:10 imirkin_: uhhhhhhhhh
12:10 karolherbst: before DeadCodeElim it only cuts 2 instructions
12:10 imirkin_: oh, lame
12:10 karolherbst: after, a lot more
12:10 imirkin_: coz of the stupid thing
12:10 imirkin_: gr
12:10 imirkin_: coz it makes decisions based on refcounts
12:10 karolherbst: yeah
12:11 imirkin_: stick a dead code thing in there at the front
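The idea of looping the cheap passes, checking for progress, and putting a dead-code cleanup at the front can be sketched as follows. This is a toy Python model and makes no claim about the linked paste or the real pass manager: the "program" is a list of opcode strings, and `drop_nops`, `fuse_mul_add` and `run_until_stable` are hypothetical names.

```python
def drop_nops(prog):
    """Stand-in for dead code elimination: remove 'nop' entries in place."""
    before = len(prog)
    prog[:] = [op for op in prog if op != "nop"]
    return len(prog) != before


def fuse_mul_add(prog):
    """Stand-in for an algebraic pass: fuse adjacent mul, add into mad."""
    out, k, changed = [], 0, False
    while k < len(prog):
        if k + 1 < len(prog) and prog[k] == "mul" and prog[k + 1] == "add":
            out.append("mad")
            k += 2
            changed = True
        else:
            out.append(prog[k])
            k += 1
    prog[:] = out
    return changed


def run_until_stable(prog, passes, max_rounds=4):
    """Run all passes in order until a full round makes no progress."""
    rounds = 0
    for _ in range(max_rounds):
        rounds += 1
        progress = False
        for p in passes:
            progress |= p(prog)  # every pass runs each round
        if not progress:
            break
    return rounds


prog = ["mul", "nop", "add"]
rounds = run_until_stable(prog, [drop_nops, fuse_mul_add])
print(prog, rounds)
```

With the cleanup first, the fusing pass sees `mul` and `add` next to each other in the same round; the extra round only confirms that nothing is left to do.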
12:13 karolherbst: imirkin_: this is enough: https://github.com/karolherbst/mesa/commit/17119cdc96450da007f54bcf78124a970eb59a69
12:13 karolherbst: even if I run all of them like three times, I still get only 3833 instructions
12:14 imirkin_: kk
12:16 karolherbst: what the
12:16 karolherbst: imirkin_: 68: mul ftz f32 $r17 neg $r13 neg $r13
12:16 karolherbst: ...
12:16 karolherbst: there are several of them
12:16 imirkin_: yeah, that happens and not a problem
12:16 imirkin_: at emission time the neg's get xor'd
12:17 karolherbst: k
12:17 karolherbst: it just looks funny
12:17 imirkin_: yea
12:17 imirkin_: but neg's are free anyways
12:17 imirkin_: they're just modifiers
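Why `mul neg $r13 neg $r13` is harmless: for a product, the two neg modifiers collapse into a single sign flip that is the XOR of both. A short Python sketch of that equivalence, with hypothetical helper names:

```python
def emitted_neg(neg0, neg1):
    """Single sign flip equivalent to neg modifiers on both mul sources."""
    return neg0 ^ neg1


def mul_with_mods(a, neg0, b, neg1):
    """Reference semantics: apply each neg modifier, then multiply."""
    return (-a if neg0 else a) * (-b if neg1 else b)


for n0 in (False, True):
    for n1 in (False, True):
        print(n0, n1, emitted_neg(n0, n1), mul_with_mods(3.0, n0, 2.0, n1))
```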
12:18 karolherbst: okay, I think I removed all stupid $r63 from that shader
12:18 karolherbst: the remaining are in slct, min or max
12:22 karolherbst: imirkin_: seems like most of the algebraic stuff is mul+add => mad
12:24 karolherbst: okay, now those mov $r1 0x0 thingies
12:25 imirkin_: they get added after opt
12:25 imirkin_: check nv50_ir_ra.cpp
12:25 imirkin_: enjoy :)
12:25 karolherbst: wait wait ;)
12:25 karolherbst: imirkin_: what about this? https://gist.github.com/karolherbst/92a6e80b714ee8cf3ba4#file-gistfile1-txt-L3625-L3627
12:26 karolherbst: same thing?
12:26 imirkin_: i think so yeah
12:27 karolherbst: k, seems like you are right
12:28 karolherbst: okay, then there is also no 0x0 thingy I could optimize
12:28 imirkin_: see, it's stuff like this that's annoying:
12:28 imirkin_: 3054: mov u32 $r0 $r25 (8)
12:29 imirkin_: 3055: mov u32 $r1 $r26 (8)
12:29 imirkin_: 3056: tex 2D $r0 $s0 f32 $r28 $r0d (8)
12:29 imirkin_: instead it could have had those values in $r26 + $r27
12:29 imirkin_: and used $r26d
12:30 karolherbst: how did I run shader-db again?
12:30 imirkin_: ./run shaders
12:30 karolherbst: and if I want to have a nice result file?
12:30 imirkin_: pipe the output of the 2 runs to some file
12:30 imirkin_: and then do python nv-report.py old new
12:32 karolherbst: k, I got crashes :D
12:32 scaroo_: https://paste.gnome.org/pbjnhx9ij
12:32 karolherbst: seems like I did something wrong
12:33 karolherbst: ohhh
12:33 karolherbst: imirkin_: running those both runs twice makes it crash
12:33 scaroo_: oops, polari does suck but at least has pastebin integration, here it is again:
12:34 imirkin_: scaroo_: difficult to do since it depends a lot on the specific board.
12:34 karolherbst: in the algebraic :/
12:34 imirkin_: karolherbst: not surprised =/
12:34 imirkin_: it sees something it doesn't expect
12:34 karolherbst: well at least the modifier one can be run without a crash
12:35 scaroo_: imirkin_: but i guess the results are quite coherent within a chip family, at least the idea would be to have a coarse granularity, so any fermi, let's say, is a 300 at the current time, to be compared with the intel igp in the user's machine
12:36 karolherbst: imirkin_: wuhu total instructions in shared programs : 106909 -> 106877 (-0.03%) :D
12:36 imirkin_: scaroo_: not really. some fermi boards boot to 50mhz, others to 500mhz
12:36 imirkin_: scaroo_: as you can imagine, there's a bit of a perf difference between those
12:38 scaroo_: well then i guess fermis should be rated low, this way the igpu would take precedence until reclock is in and the perfdb reflects it
12:44 karolherbst: imirkin_: https://github.com/karolherbst/mesa/commit/2c1595a0f9c4c916ca4750af806158aa8a170c1e
12:44 karolherbst: I get more gprs used :/
12:44 karolherbst: but why=
12:44 karolherbst: ?
12:52 karolherbst: ohh some bad reordering
12:56 karolherbst: imirkin_: crash is here: https://github.com/karolherbst/mesa/blob/55216593885b836cef66054e2159ad91effcac18/src/gallium/drivers/nouveau/codegen/nv50_ir_peephole.cpp#L1843
12:56 karolherbst: insn == NULL
12:57 imirkin_: uhhhhh
12:57 imirkin_: did an immediate make it to the cvt?
12:57 karolherbst: handleCVT_NEG is run before
12:57 karolherbst: handleCVT_NEG(i); handleCVT_CVT(i);
12:57 imirkin_: oh wait yeah
12:58 imirkin_: it can be cvt a[0] or something
12:58 imirkin_: or c[]
12:58 imirkin_: which won't have an insn
12:58 imirkin_: are you running LoadPropagation before this?
12:58 imirkin_: LoadPropagation has to be after
12:58 karolherbst: ohh uhhh
12:58 karolherbst: this is after deadcode
12:59 karolherbst: the last one
12:59 imirkin_: yeah you can't do that
13:00 karolherbst: but... total instructions in shared programs : 103007 -> 102020 (-0.96%)
13:03 imirkin_: cool
13:03 karolherbst: https://github.com/karolherbst/mesa/commit/5f23cb7ae5763a4766f50aa65d7bfccbc2d368c9
13:03 karolherbst: hacky if(!insn) though
13:03 imirkin_: yeah you can't do that.
13:03 karolherbst: I know
13:04 imirkin_: the proper fix is to do it for real
13:04 imirkin_: with a for loop
13:04 imirkin_: very much like i had it
13:04 karolherbst: yeah, this just shows the potential
13:04 imirkin_: and stick the DCE into the for loop
13:04 imirkin_: before AlgebraicOpt or something
13:04 karolherbst: the algebraicopt only works after the last deadcode though
13:05 karolherbst: otherwise: total instructions in shared programs : 103007 -> 102983 (-0.02%)
13:05 imirkin_: ugh
13:05 karolherbst: and this is like nothing
13:05 imirkin_: you are not listening to what i'm saying.
13:05 imirkin_: please reread what i wrote
13:05 imirkin_: and internalize
13:05 imirkin_: and then apply
13:06 imirkin_: the thing i'm saying will work, probably better than what you've done.
13:06 karolherbst: ohh dce is dead code elimination
13:06 karolherbst: ...
13:06 karolherbst: now it makes sense
13:07 karolherbst: okay, still crashing
13:08 imirkin_: same spot?
13:08 karolherbst: didn't check yet
13:08 karolherbst: yes
13:08 karolherbst: same spot
13:08 imirkin_: so wtf is in that cvt?
13:08 imirkin_: and can i see your diff?
13:09 karolherbst: ohh wait, I had some leftovers from the last change :/
13:10 karolherbst: uhh -1.02%
13:10 imirkin_: yay
13:10 imirkin_: not *much* better, but... better :)
13:10 karolherbst: but now more gprs used
13:11 karolherbst: but I still forgot something
13:11 imirkin_: can't win 'em all
13:13 karolherbst: mhh -1.03% instructions, but 0.13% gprs
13:13 karolherbst: and I bet this is some ordering bs again
13:14 imirkin_: can i see your diff?
13:14 karolherbst: imirkin_: https://github.com/karolherbst/mesa/commit/6a765769fa2306665f3dbb2790100535a20c6d3a
13:14 imirkin_: more registers isn't necessarily bad -- you could do everything with just like 4 regs and use a lot of movs :)
13:14 karolherbst: didn't update message
13:15 karolherbst: yeah, but I checked the shaders
13:15 karolherbst: some mad thingies got moved way up as muls
13:15 karolherbst: ohh that was for the other optimization
13:15 imirkin_: hm weird.
13:15 karolherbst: will check what changes
13:15 imirkin_: that's roughly how i was thinking about it.
13:16 karolherbst: we don't reorder instructions yet, so something like that can happen
13:16 karolherbst: I will show you the difference and you will understand
13:20 karolherbst: imirkin_: there are some with less gprs though
13:20 imirkin_: karolherbst: that's the hurt/helped thing
13:21 karolherbst: right
13:21 imirkin_: it counts the # of shaders hurt or helped for that particular metric
13:22 scaroo_: Alex is a photographer who loves to play. He buys quite a good laptop that, as the sticker says, has "hybrid uberduper graphics". He doesn't care much, his friends said it should be fine to enjoy his xonotic addiction. Xonotic he installs then. It plays quite smooth and makes him happy. Now a couple of weeks later, after being nicely asked by his OS to reboot, as usual, he grabs an appletini and launches the game. And wow, it does feel better, smoother in a way, and he is finally able to stick it to preteen killah42. What happened he doesn't know, but it is a great improvement.
13:22 scaroo_: Now the backstory is that using the perfdb, the igpu was at first deemed to be the better candidate. Later on, the nouveau heroes (:P) push reclocking and/or tessellation support for the gpu family found in Alex's laptop. They also update the perfdb to reflect the bump incurred. Distro packagers do their own magic, and both the newer driver and the perfdb find their way to Alex's storage device. The generic launcher (shell script|gnome-shell|plasma|...), instructed by a .desktop metadata field (FATGL), identifies the "stronger" (gpu,driver) couple, probes dri for the display topology (basically which one drives the display), and picks the right DRI_PRIME value accordingly. And launches the fragfest. Alex smiles.
13:23 imirkin_: scaroo_: tl;dr?
13:23 imirkin_: you want a thing that figures out if DRI_PRIME is beneficial or not?
13:23 scaroo_: exactly :)
13:23 karolherbst: imirkin_: or maybe it just counts wrongly?
13:23 imirkin_: karolherbst: inconceivable
13:23 imirkin_: scaroo_: good luck :)
13:23 karolherbst: imirkin_: gpr: 10 means there has to be a $r10?
13:24 imirkin_: karolherbst: $r9
13:24 imirkin_: 0..9
13:24 karolherbst: ahh
13:24 karolherbst: and gpr: 9 means there is no $r9?
13:24 imirkin_: hopefully!
13:24 karolherbst: 10: tex 2D $r9 $s0 f32 $r7 $r0d (8)
13:24 karolherbst: just found that
13:24 imirkin_: for something where it said gpr: 9?
13:24 karolherbst: yes
13:25 imirkin_: that's... disturbing
13:25 imirkin_: are you _sure_ you're looking at the right shader?
13:25 imirkin_: each thing will have multiple ones
13:25 karolherbst: yes
13:25 imirkin_: one vertex, one fragment, for example
13:25 karolherbst: yes
13:25 imirkin_: ok, well if our maxGPR counts are off, that's very bad
13:26 karolherbst: the other shader has gprs: 18
13:26 imirkin_: maybe it is inclusive then?
13:26 imirkin_: :)
13:26 karolherbst: :D
13:26 karolherbst: imirkin_: what is the best way to print the maxgpr through NV50_PROG_DEBUG?
13:26 karolherbst: then I will check that
13:27 imirkin_: there isn't one i think
13:27 karolherbst: well I can hack one in then
13:27 karolherbst: but where?
13:31 karolherbst: ahh I know
13:32 karolherbst: imirkin_: mhhh nouveau prints it right, but somehow shader-db adds 1 to it?
13:33 karolherbst: imirkin_: anyway, this looks wrong
13:34 karolherbst: imirkin_: https://gist.github.com/karolherbst/8fc17a04e206fb622e4b
13:34 karolherbst: reg usage 8
13:34 karolherbst: 10: tex 2D $r9 $s0 f32 $r7 $r0d (8)
13:34 imirkin_: ohhhhhhh
13:34 imirkin_: hehehe
13:34 imirkin_: no, it's right
13:34 karolherbst: gpr is prog->maxGPR
13:34 imirkin_: the print is wrong
13:34 imirkin_: that $r9
13:34 imirkin_: is not a register
13:34 karolherbst: ohhhh
13:34 karolherbst: k
13:34 imirkin_: it's actually a texture reference
13:35 karolherbst: anyway
13:35 karolherbst: this becomes this after running those passes again: https://gist.github.com/karolherbst/54cfc2b271a2af0fb111
13:36 karolherbst: change in 26,27
13:36 imirkin_: 0x3f0 = 0.5
13:36 imirkin_: weird
13:37 imirkin_: 21: mul ftz f32 $r8 $r7 4.000000 (8)
13:37 imirkin_: 23: mul ftz f32 $r0 $r0 $r8 (8)
13:37 imirkin_: that could be mul x^2 $r0 $r7
13:37 imirkin_: oh but $r8 is used a bunch of times. i see.
13:38 imirkin_: but... that would still work. ugh.
13:38 karolherbst: I am more worried about 26,27 though ;)
13:38 imirkin_: tryCollapseChainedMULs needs better heuristics
13:39 imirkin_: i don't see how that could be made any shorter...
13:39 karolherbst: no, the change adds a new gpr being used
13:39 karolherbst: $r9
13:39 imirkin_: oh
13:39 imirkin_: yeah
13:39 imirkin_: wtvr
13:39 imirkin_: don't worry about that
13:39 karolherbst: k
13:40 imirkin_: it's very difficult to super-optimize that stuff
13:40 karolherbst: so -1% instructions is "better" than 0.13% gprs more?
13:40 imirkin_: not sure why tryCollapseChainedMULs checks for refcount==1
13:44 imirkin_: and for stuff like "add ftz f32 $r0 $r0 1.000000" there's actually a "addpo" version of mad which adds 1 at the end
13:45 karolherbst: ohhh
13:45 karolherbst: that may come in handy
13:45 imirkin_: nouveau doesn't support that fwiw
13:46 imirkin_: but one could add that in OP_ADD handling
13:46 imirkin_: if one were inclined
13:46 imirkin_: (and add the relevant subOp handling everywhere)
13:49 karolherbst: "and for stuff like "add ftz f32 $r0 $r0 1.000000" there's actually a "addpo" version of mad which adds 1 at the end" this again, so there is a mad which takes its normal stuff like mad $r1 $r2 $r3 $r4, but then adds another 1 at the end
13:49 imirkin_: yep
13:49 imirkin_: for good measure :)
13:49 karolherbst: well, this really sounds useful to eliminate those +1 things
13:51 karolherbst: imirkin_: by the way: pixmark piano 1024x640 benchmark: 974 => 1004 points with all those changes
13:51 imirkin_: cool :)
13:52 karolherbst: now lets check heaven
13:53 karolherbst: wow heaven really hates debug builds
13:53 karolherbst: perf drop over 20%
13:54 imirkin_: don't forget about the drirc thing :)
13:55 karolherbst: yeah, but I will benchmark both runs with same drirc files
13:55 karolherbst: but now I got like 6 fps in the first scene
13:55 karolherbst: with drirc change I get 8, non debug and without drirc change 10
14:02 karolherbst: imirkin_: do you know how that addpo stuff gets encoded?
14:02 karolherbst: because then I could add this maybe
14:03 imirkin_: check envydis
14:03 karolherbst: but somehow I get the feeling that arithmetic optimizations make more sense somehow :/
14:08 karolherbst: imirkin_: in gf100.c tabaddop, tabaddop2 or tabvmop?
14:08 imirkin_: heh
14:08 imirkin_: i dunno, check
14:08 karolherbst: or the tabm struct
14:08 imirkin_: probably addop
14:08 imirkin_: look for T(addop)
14:08 imirkin_: and see which ops it's on
14:08 imirkin_: repeat for addop2
14:08 imirkin_: i suspect vmop is for vector ops
14:10 karolherbst: I have no idea how to read this stuff yet :D
14:24 karolherbst: imirkin_: any idea what we could do with "add ftz f32 $r7 $r7 1.000000" ?
14:25 imirkin_: if $r7 comes from a mad, you can make it a FMAD.PO thing
14:25 karolherbst: mhh it does, but $r7 is used in between
14:26 imirkin_: oh wait, sad!
14:26 imirkin_: the .PO thing is for int only
14:26 karolherbst: k
14:28 karolherbst: mov f32 $r0 0.062500; add ftz f32 $r7 neg $r0 -1.000000
14:28 imirkin_: whaaaa
14:28 imirkin_: that's a bunch of dirty lies
14:28 karolherbst: ohh right
14:28 karolherbst: $r0 is set in between
14:28 karolherbst: sorry
14:28 imirkin_: :p
14:29 karolherbst: but really funny
14:29 karolherbst: there is potential
14:29 karolherbst: too complicated though
14:30 imirkin_: karolherbst: ken pushed an alternate fix for the heaven drirc situation
14:30 imirkin_: so you should be getting your perf back
14:31 karolherbst: http://cgit.freedesktop.org/mesa/mesa/commit/?id=b3340cd32acf5935891f19833de0cfc500a93e0b
14:31 karolherbst: ?
14:31 imirkin_: yes.
14:32 karolherbst: nice
14:34 karolherbst: nice
14:34 karolherbst: mov u32 $r22 0x3c8b4396; mad ftz f32 $r21 $r22 $r14 $r21; rsq f32 $r22 abs $r21
14:34 karolherbst: =>
14:35 karolherbst: mad ftz $r21 $r14 0x3c8b4396 $r21; rsq f32 $r22 abs $r21 ?
14:35 karolherbst: or where can mad take immediates?
14:35 imirkin_: no limm fmad
14:35 imirkin_: it can take immediates, but they can't have more than 20 bits set
14:35 karolherbst: ohhhh
14:35 karolherbst: k
14:35 imirkin_: limm = long immediate
14:35 imirkin_: i.e. all 32 bits
14:35 karolherbst: meh :/
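A quick way to see why 0x3c8b4396 cannot be inlined while 0.5 can. This Python sketch rests on an assumption not spelled out in the log: that the 20-bit short float immediate holds the upper 20 bits of the 32-bit pattern, so a value fits only if its low 12 bits are zero. `f32_bits` and `fits_short_imm` are hypothetical helper names.

```python
import struct


def f32_bits(value):
    """Bit pattern of a Python float rounded to IEEE 754 single precision."""
    return struct.unpack("<I", struct.pack("<f", value))[0]


def fits_short_imm(bits):
    """Assumed rule: only the top 20 bits are encoded, the low 12 must be 0."""
    return (bits & 0xFFF) == 0


for bits in (f32_bits(0.5), f32_bits(4.0), 0x3C8B4396):
    print(hex(bits), fits_short_imm(bits))
```

Anything that fails the check would need the long immediate (limm) form, which carries all 32 bits.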
14:35 imirkin_: actually there is one
14:36 imirkin_: but it requires that dstreg = src2reg
14:36 karolherbst: who comes up with such restrictions and thinks it is a good idea :D
14:36 Wonka: hw engineers
14:36 imirkin_: well, when you have a 64-bit opcode
14:36 karolherbst: imirkin_: doesn't it count for the src2?
14:36 imirkin_: using 32 of those bits up for something
14:36 imirkin_: doesn't leave a *ton* for everything else
14:37 ravior: Hi. I've retested https://bugs.freedesktop.org/show_bug.cgi?id=71659 with the latest kernel and the defect is still reproducible.
14:37 karolherbst: because dest == src2
14:37 karolherbst: mad ftz f32 $r21 $r22 $r14 $r21, or is it dest src1 src2 src3?
14:37 imirkin_: ravior: you could try using blob firmware, see if that helps
14:38 ravior: If there is a way I can help with the debugging process, just ping me. Thanks and sorry for the interruption.
14:38 imirkin_: karolherbst: yes, but... that's not known at opt time
14:38 ravior: imirkin_: I should try that. I'll reply with more info.
14:38 karolherbst: imirkin_: .. so we can't move that limm into that, because we don't know that dest == src2?
14:39 imirkin_: ravior: you can get the firmware using my script... described here: http://nouveau.freedesktop.org/wiki/VideoAcceleration/#firmware
14:39 imirkin_: ravior: (yeah, i know it says video, but ignore that. it gets pgraph fw too)
14:39 imirkin_: ravior: also if you're on 4.3 or later, you'll have to adjust the filenames
14:39 imirkin_: karolherbst: right
14:39 karolherbst: :/
14:39 imirkin_: karolherbst: RSpliet added a pass post-ra to do fixups like that
14:39 imirkin_: karolherbst: which is used for nv50
14:39 karolherbst: ohh
14:39 karolherbst: but not for nvc0?
14:40 imirkin_: no
14:40 karolherbst: okay
14:40 imirkin_: coz i wasn't aware of the situation at the time
14:40 imirkin_: but i am now!
14:40 karolherbst: I see
14:40 karolherbst: :D
14:40 karolherbst: then I will keep diggin into that shader :D
14:40 imirkin_: anyways... hand-optimized tends to do better than machine-generated
14:41 ravior: imirkin_: That is nice! I have Arch on the other hand which makes things a bit easier. Thanks anyway.
14:42 imirkin_: ravior: you'd think that, but iirc in their infinite wisdom and against my advice they don't use the 325.15 version
14:42 karolherbst: imirkin_: abs ftz f32 $r22 $r12; mul ftz f32 $r22 $r22 0.600000
14:42 karolherbst: => mul ftz f32 $r22 abs $r12 0.6?
14:42 karolherbst: or can't mul accept abs?
14:42 imirkin_: ravior: which means that it won't extract pgraph fw
14:43 imirkin_: karolherbst: it can, but not when there's an immediate. or maybe just not when there's a limm.
14:43 imirkin_: karolherbst: or wait, no, it can't
14:43 karolherbst: like 100% sure or nobody saw it yet?
14:44 imirkin_: karolherbst: try fuzzing nvdisasm, see if we missed some bits
14:44 karolherbst: well my blob dump of that shader doesn't contain mul abs
14:45 imirkin_: :)
14:45 imirkin_: does it contain unknown bits?
14:45 ravior: imirkin_: I didn't know that. For now I'll try without pgraph to see if it works. Afterwards I'll try using the 325.15. Covering all bases shouldn't hurt.
14:45 karolherbst: imirkin_: how do I find those unknowns?
14:45 imirkin_: ravior: well... the pgraph firmware is the one i'm talking about using
14:45 imirkin_: karolherbst: they'll be loudly printed
14:45 karolherbst: the only unknown is that: joinat 0xdf0 [unknown: 00001c00 00000000]
14:45 imirkin_: ah yeah. that's ok.
14:45 imirkin_: that's an always-true predicate
14:46 imirkin_: ravior: and you'll have to move it to a diff location, and boot with nouveau.config=NvGrUseFW=1
14:49 ravior: imirkin_: I will do that. Thanks for tips.
14:50 imirkin_: ravior: iirc 409 -> fecs, 41a -> gpccs, c = inst, d = data. it will want the files in /lib/firmware/nvidia/gf119/{fecs,gpccs}_{inst,data}.bin
14:50 imirkin_: that's the 4.3 and 4.4 kernels
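The renaming rule above (409 -> fecs, 41a -> gpccs, c = inst, d = data) can be written down as a small Python helper. The input spelling, e.g. "409c", is an assumption about how the unit and kind are combined in the extracted file names; `new_firmware_path` is a hypothetical name, and the target directory is the gf119 path given in the log.

```python
UNITS = {"409": "fecs", "41a": "gpccs"}
KINDS = {"c": "inst", "d": "data"}


def new_firmware_path(tag, chipset="gf119"):
    """Map a tag such as '409c' to the path expected by 4.3/4.4 kernels."""
    unit, kind = tag[:-1], tag[-1]
    return "/lib/firmware/nvidia/{}/{}_{}.bin".format(
        chipset, UNITS[unit], KINDS[kind]
    )


for tag in ("409c", "409d", "41ac", "41ad"):
    print(tag, "->", new_firmware_path(tag))
```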
14:50 karolherbst: imirkin_: how can I follow values across branches? :/
14:50 imirkin_: look at the pre-RA stuff
14:50 imirkin_: it's a lot easier to read
14:50 karolherbst: ohh okay
14:50 imirkin_: phi nodes merge values from multiple branches
14:51 imirkin_: they're not real instructions
14:51 imirkin_: although sometimes they can become those stupid mov's you see
14:54 karolherbst: imirkin_: ehm.. max ftz f32 $r25 abs $r19 $r63
14:54 karolherbst: that looks like mov ftz f32 $r25 $r19
14:55 karolherbst: I meant mov ftz f32 $r25 abs $r19
14:55 imirkin_: heh
14:55 imirkin_: feel free to add an algebraic opt
14:56 imirkin_: max(abs(x), 0) = abs(x)
14:56 karolherbst: yeah
14:56 imirkin_: i'm sure that one comes up _all the time_ :p
14:56 imirkin_: actually i guess const folding makes more sense than algebraic
14:57 karolherbst: yeah
14:57 karolherbst: I guess we could do the same with min
14:57 karolherbst: min(abs(x), 0) = 0
14:57 imirkin_: :)
14:58 imirkin_: it might not behave well wrt NaN, but... meh
14:58 karolherbst: ohh
14:58 imirkin_: i'm inclined to disregard
14:58 karolherbst: nvidia will do the same optimization for sure
14:58 imirkin_: graphics shaders and NaN's never mix on purpose
14:58 imirkin_: only by accident
14:58 imirkin_: for compute it's more of an issue
14:59 karolherbst: imirkin_: ConstantFolding::opnd ?
15:00 imirkin_: yes
15:00 imirkin_: since one arg will be 0, and one will be not-immediate
15:01 karolherbst: k
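The fold discussed above, and the NaN caveat that comes with it, can be sketched in a few lines. This is only an illustration, not the Mesa code: `fmax` and `fmin` are hypothetical helpers that assume the hardware max/min return the non-NaN operand when exactly one input is NaN (IEEE 754 maxNum/minNum style), which the log does not confirm.

```python
import math

def fmax(a, b):
    """max that returns the non-NaN operand if one input is NaN (assumed semantics)."""
    if math.isnan(a):
        return b
    if math.isnan(b):
        return a
    return a if a > b else b

def fmin(a, b):
    """min that returns the non-NaN operand if one input is NaN (assumed semantics)."""
    if math.isnan(a):
        return b
    if math.isnan(b):
        return a
    return a if a < b else b

def folded_max_abs_zero(x):
    # Proposed fold: max(abs(x), 0) -> abs(x)
    return abs(x)

def folded_min_abs_zero(x):
    # Proposed fold: min(abs(x), 0) -> 0
    return 0.0

# For ordinary inputs the folded and unfolded forms agree.
for x in (-2.5, 0.0, 3.0, math.inf):
    assert fmax(abs(x), 0.0) == folded_max_abs_zero(x)
    assert fmin(abs(x), 0.0) == folded_min_abs_zero(x)

# With a NaN input the max fold changes the result under the assumed semantics.
unfolded = fmax(abs(math.nan), 0.0)     # picks the non-NaN operand
folded = folded_max_abs_zero(math.nan)  # abs(NaN) stays NaN
```

Under these assumptions only the max case differs for NaN, which matches the "meh" in the log: it matters for compute more than for graphics shaders.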
15:01 Lekensteyn: I am getting "unknown connector type 70" after "DCB conn 15: 00000f70" (apparently a virtual connector for Wireless Display); shouldn't it be silenced?
15:02 imirkin_: meh
15:02 imirkin_: bbl
15:06 karolherbst: imirkin_: 3840 => 3770
15:07 karolherbst: but I did something wrong :D
15:08 karolherbst: looks very funny though
15:22 karolherbst: does this look okay? https://github.com/karolherbst/mesa/commit/9edc3b6829d47dadfc1f2c0121cf25842eef4bd6
15:22 karolherbst: I am not sure with this line: i->src(0).mod = i->src(0).mod - Modifier(NV50_IR_MOD_ABS);
15:27 imirkin: - ?
15:27 karolherbst: yeah as I said, I am not sure
15:27 karolherbst: I want to remove the ABS modifier
15:27 karolherbst: I thought I have to use | or ^
15:27 karolherbst: but with both I get a different output
15:28 imirkin: wtf does operator - do?
15:28 karolherbst: no idea, I guess it removes that modifier
15:28 karolherbst: no idea
15:28 imirkin: check!
15:28 karolherbst: that's why I am asking :D
15:28 imirkin: the code's in front of you
15:29 karolherbst: but the generated code looks right
15:29 karolherbst: 299: max ftz f32 $r18 abs $r18 $r63 => 299: abs ftz f32 $r18 $r18
15:29 karolherbst: imirkin: you would have used ^ I assume?
15:30 imirkin: i dunno, read through the modifier class
15:30 imirkin: see what suits your needs
15:30 karolherbst: k
15:31 karolherbst: there is no operator-
15:31 karolherbst: ...
15:31 imirkin: exactly
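As a generic illustration of the question above (how to remove one modifier flag from a bitmask), here is a small sketch. The flag values `MOD_NEG` and `MOD_ABS` are hypothetical stand-ins, not the real `NV50_IR_MOD_*` constants, and plain integers are used instead of the `Modifier` class.

```python
# Hypothetical bit values; the real NV50_IR_MOD_* constants may differ.
MOD_NEG = 0x1
MOD_ABS = 0x2

def clear_abs(mod):
    """Clear the ABS bit and leave every other bit untouched."""
    return mod & ~MOD_ABS

both = MOD_NEG | MOD_ABS

# AND with the inverted mask works whether or not the bit was set.
cleared_when_set = clear_abs(both)       # only NEG remains
cleared_when_unset = clear_abs(MOD_NEG)  # unchanged

# OR can only set the bit, so it never removes ABS.
or_result = both | MOD_ABS               # still NEG | ABS

# XOR toggles: it clears ABS if set, but sets it if it was clear.
xor_when_set = both ^ MOD_ABS            # NEG
xor_when_unset = MOD_NEG ^ MOD_ABS       # NEG | ABS

# Integer subtraction only happens to work when the bit is known to be set.
sub_when_unset = MOD_NEG - MOD_ABS       # not a valid mask any more
```

If the code has already checked that ABS is present, XOR gives the same answer as the AND form; the AND form is just the one that is safe in both cases.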
15:31 karolherbst: well I will head to bed now anyway, will continue tomorrow I guess :D
15:32 imirkin: when in doubt
15:32 imirkin: look at the actual bytecode generated
15:32 karolherbst: :D
15:34 glennk: when in doubt, look at r600 asm, then be happy you are poking about nouveau :-)
15:36 imirkin: glennk: well this might be about the actual instruction encoding :)
15:37 imirkin: i.e. are we emitting the instructions we *think* we're emitting
15:37 glennk: i've had a few of those lately
15:37 imirkin: those are the worst
15:38 glennk: i can say in comparison the fifo queue thing is way worse than a few mangled bits in the encoding
15:38 imirkin: hehe
15:39 Lekensteyn: ok, apparently WiDi connector 15 is used by output 5 ("DCB outp 05: 01d1fff8 00000000") which is of type 8 ("Reserved"; "failed to create encoder 1/8/0: -19")
15:40 imirkin: Lekensteyn: afaik it's a purely virtual thing to help with drivers
15:41 Lekensteyn: seems like it
15:41 imirkin: i think i've only seen it on GM20x's... is that what you have?
15:42 Lekensteyn: yes, GTX 965M of GM204
15:42 imirkin: kk
15:42 Lekensteyn: there is a dmesg at http://article.gmane.org/gmane.linux.kernel/2129609 if you are interested
15:43 Lekensteyn: nothing interesting really (except maybe that this hardware has a mux for the internal display panel)
15:43 ravior: imirkin: Since I've started using their firmware I haven't seen any warning/error. The downside is a small dip in performance.
15:43 imirkin: ravior: you were able to get it to load successfully?
15:44 ravior: I'll test it more to see if I'm able to reproduce it and I'll let you know
15:44 ravior: seems so
15:44 imirkin: does it say "using external firmware"?
15:45 ravior: [ 0.983025] nouveau 0000:01:00.0: gr: using external firmware
15:45 Lekensteyn: Did you get docs for the Optimus-related ACPI functions? I saw that nouveau_acpi.c contains more detailed names for the caps function
15:45 ravior: [ 0.983089] nouveau 0000:01:00.0: Direct firmware load for nvidia/gf119/fecs_inst.bin failed with error -2
15:45 imirkin: ravior: so... unsuccessful
15:45 imirkin: ravior: does it proceed to fail GR init?
15:46 imirkin: maybe that's why you're seeing a dip in perf - you lost gpu accel
15:47 ravior: doesn't seem to reach the init
15:47 imirkin: yeah, and i bet "glxinfo" shows llvmpipe
15:47 ravior: and you're right about the GPU accel.
15:47 imirkin: ravior: you need to adjust the file paths like i mentioned
15:47 imirkin: <imirkin_> ravior: iirc 409 -> fecs, 41a -> gpccs, c = inst, d = data. it will want the files in /lib/firmware/nvidia/gf119/{fecs,gpccs}_{inst,data}.bin
15:48 imirkin: the files you have now are /lib/firmware/nouveau/nvd9_fuc409c or something along those lines
15:52 ravior: replacing all files containing 409 with fecs and those with 41a with gpccs. Got it. Regarding /lib/firmware/nvidia/gf119/, I don't have that firmware.
15:52 imirkin: mkdir -p :p
15:52 imirkin: so like nouveau/nvd9_fuc409c -> nvidia/gf119/fecs_inst.bin
15:53 imirkin: etc
15:53 ravior: oh :D. So add the files to that path
15:53 imirkin: (nvd9 = gf119)
15:54 imirkin: this stuff got renamed in 4.3 for maximal user confusion
15:54 ravior: So I see... I'll do that now. I'll reply as soon as I finish it.
15:59 ravior: I finally understood the conversion. Still can't understand why this change was made. Doing it now.
16:01 ravior: So I'm interested in renaming only in nvd9 firmware.
16:02 imirkin: yes.
16:02 imirkin: and i recommend copying, not renaming
16:02 imirkin: so that if you should switch to a pre-4.3 kernel it'll also work
16:03 ravior: That means 4 files
16:03 imirkin: yes.
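The naming scheme described above (409 -> fecs, 41a -> gpccs, c = inst, d = data, copy rather than rename) can be sketched as follows. The old file names are assumed to follow the `nvd9_fuc409c` pattern that the log only gives as "something along those lines", and `new_name` and `copy_firmware` are hypothetical helper names.

```python
import shutil
from pathlib import Path

# Mapping taken from the log: 409 -> fecs, 41a -> gpccs, c = inst, d = data.
UNIT = {"409": "fecs", "41a": "gpccs"}
KIND = {"c": "inst", "d": "data"}

def new_name(old_name, chip="nvd9"):
    """Translate e.g. 'nvd9_fuc409c' into 'fecs_inst.bin' (old pattern is assumed)."""
    prefix = f"{chip}_fuc"
    if not old_name.startswith(prefix):
        raise ValueError(f"unexpected firmware name: {old_name}")
    rest = old_name[len(prefix):]
    unit, kind = rest[:-1], rest[-1]
    return f"{UNIT[unit]}_{KIND[kind]}.bin"

def copy_firmware(fw_root, chip="nvd9", new_chip="gf119"):
    """Copy the four files so both pre-4.3 and newer kernels find them."""
    src_dir = Path(fw_root) / "nouveau"
    dst_dir = Path(fw_root) / "nvidia" / new_chip
    dst_dir.mkdir(parents=True, exist_ok=True)  # same idea as mkdir -p
    copied = []
    for unit in UNIT:
        for kind in KIND:
            src = src_dir / f"{chip}_fuc{unit}{kind}"
            dst = dst_dir / new_name(src.name, chip)
            shutil.copy2(src, dst)  # copy, do not rename
            copied.append(dst)
    return copied
```

On a real system this would be called as `copy_firmware("/lib/firmware")` with sufficient permissions; the originals under `nouveau/` stay in place.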
16:03 ravior: I did copy them. Thanks for the detailed walkthrough.
16:04 imirkin: np. i was surprised to hear you say you got it to work on the first try :)
16:04 imirkin: while conceivable, it seemed unlikely
16:04 imirkin: heh
16:08 ravior: Thanks. :) Now with that done, after reboot the proprietary firmware should be used. Nothing else is required?
16:09 imirkin: well, make sure the fw is available when the module loads
16:09 imirkin: e.g. if the module loads from initrd, the fw needs to be in the initrd
16:09 imirkin: or if it's built-in, the firmware needs to be in the kernel
16:09 imirkin: etc
16:10 ravior: imirkin: I already made sure to add nouveau to the list of mkinitcpio modules. I should be safe for now.
16:11 ravior: I'll reply in a moment. Time to see if it works.
16:11 imirkin: ok, well those files in the nvidia dir need to be there too
16:18 ravior: imirkin: It may be a stupid question, but in mkinitcpio.conf should I put the path for all 4 files in the "FILES:" option?
16:18 ravior: Or am I thinking about this the wrong way?
16:29 ravior: imirkin: It was a stupid question.
16:29 ravior: The firmware was loaded correctly now.
16:30 ravior: No issues yet.
16:31 imirkin: and accel works now?
16:34 ravior: yes
16:34 imirkin: cool
16:35 ravior: I haven't seen any warning/error until now.
16:36 ravior: I should play some multimedia stream to see if the problem is occurring with the external firmware.
16:56 ravior: I've spoken too soon. I'm getting both types of error in my dmesg. The computer hasn't frozen yet though, but it's just a matter of time.
17:00 ravior: Unfortunately it's 3 am here, so I'm out. If you want to take a look at the logs I've put the dmesg output here: https://gist.github.com/Steinhagen/62172fbf7c1fcce38c45
17:01 ravior: Thanks again for the help and I hope we'll return on this topic.
17:12 imirkin: hakzsam: happen to have the blob loaded perchance?
17:15 imirkin: or anyone happen to be running on the blob atm and can trace a piglit for me?
17:41 karlmag: http://www.slackware.com/~karlmag/nouveau/codenames/nouveau-codenames-reworked.txt
17:41 karlmag: oh I *hate* those people coming up with these naming schemes...
17:42 karlmag: had it been unique at the very least :-P
17:43 karlmag: it *will* be full of errors. Been trying to interpret http://nouveau.freedesktop.org/wiki/CodeNames/ as well as I could, but I probably have misinterpreted some of it and also have typos, etc, etc..
17:44 imirkin: not sure i understand the ordering
17:46 karlmag: sorted?
17:46 karlmag: sorted on column 2
17:47 karlmag: that is... first on column 1, then column 2 (but I was a lazy bastard and didn't put e.g. "GeForce" on all the first column lines)
17:47 karlmag: that can be fixed, obviously
17:49 karlmag: and "dup?" means "are these duplicates, or are these cards with different codes but same names?"
17:57 karlmag: d'oh.. checked for my own graphics card (in my current workstation) and found that I had omitted that (and some others) :-P Fixed and updated
18:09 imirkin: hakzsam: could really use a mmt trace of 'bin/arb_framebuffer_no_attachments-atomic -fbo -auto'
22:37 jayhost: What is status on GM107 reclock?